Model Pruning and Neural Network Compression Cheat Sheet

Updated 2026-05-02

Model pruning is a neural network compression technique that systematically removes weights, neurons, channels, or entire structures from trained networks to reduce computational cost and memory footprint while preserving accuracy. Originally inspired by biological synaptic pruning, modern pruning methods balance sparsity (the percentage of parameters removed) against performance degradation, enabling deployment on resource-constrained devices and reducing inference latency. Unlike quantization, which stores parameters at lower precision, or knowledge distillation, which trains a smaller model to mimic a larger one, pruning removes redundant parameters outright. The lottery ticket hypothesis suggests that dense networks contain sparse subnetworks ("winning tickets") that, when trained in isolation from their original initialization, can match the accuracy of the full network. This idea has shaped how researchers think about why over-parameterized networks train successfully.

What This Cheat Sheet Covers

This topic spans 13 focused tables and 85 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

  • Table 1: Core Pruning Types
  • Table 2: Pruning Criteria and Importance Metrics
  • Table 3: Pruning Schedules and Strategies
  • Table 4: Pruning Methods for LLMs and Transformers
  • Table 5: Pruning for Vision Models
  • Table 6: Combined Compression Techniques
  • Table 7: Pruning Schedules and Recovery
  • Table 8: Sparsity Patterns and Implementations
  • Table 9: Benchmarking and Evaluation
  • Table 10: Pruning-Specific Tools and Frameworks
  • Table 11: Specialized Pruning Research Methods
  • Table 12: Domain-Specific Pruning Applications
  • Table 13: Handling Normalization and Special Layers

Table 1: Core Pruning Types

Pruning methods differ mainly in what they remove and how regular the resulting sparsity is. Unstructured approaches drop individual weights for maximum compression but need special kernels to run fast, while structured ones—cutting whole channels, filters, neurons, attention heads, or even entire layers—keep the math dense and so deliver real speedups on ordinary hardware. The granularity you pick is the central trade-off between compression ratio and practical inference gains.
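The trade-off described above can be seen on a single weight matrix. The following is a minimal sketch, assuming PyTorch is installed; the matrix size, the 50% pruning ratio, and the use of row L2 norms as the structured importance score are illustrative choices, not something this cheat sheet prescribes.

```python
import torch

torch.manual_seed(0)

# Toy weight matrix for a linear layer: 8 output neurons, 16 inputs.
weight = torch.randn(8, 16)

# Unstructured: zero the 50% of individual weights with the smallest magnitude.
k = weight.numel() // 2
threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest |w|
mask = weight.abs() > threshold                        # True = keep
unstructured = weight * mask

# Structured: drop the 4 output rows (neurons) with the smallest L2 norm.
row_norms = weight.pow(2).sum(dim=1).sqrt()
keep = torch.topk(row_norms, k=4).indices.sort().values
structured = weight[keep]

# Sparsity = fraction of parameters removed by the mask.
sparsity = 1.0 - mask.float().mean().item()
print(f"unstructured: shape={tuple(unstructured.shape)}, sparsity={sparsity:.2f}")
print(f"structured:   shape={tuple(structured.shape)}")
```

The masked matrix keeps its original 8 x 16 shape, so a dense matrix multiply does the same amount of work unless a sparse kernel skips the zeros. The structured result is a genuinely smaller 4 x 16 dense matrix, which is why it tends to translate into speedups on ordinary hardware.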

Type | Example | Description
--- | --- | ---
Unstructured magnitude pruning | mask = torch.abs(weight) > threshold | Removes individual weights whose magnitude falls below a threshold; creates irregular sparse matrices with the highest compression, but requires specialized sparse kernels for any speedup.
Structured channel pruning | prune_channels(conv_layer, indices)  # Remove entire output channels | Removes entire output channels in convolutional layers, reducing both FLOPs and actual inference time on standard hardware without sparse kernel support.
Filter pruning | remove_filters(conv_layer, bottom_k)  # Prune complete 3D filters | Eliminates complete convolutional filters (kernels), reducing layer width and the number of output feature maps; hardware-friendly because the remaining operations stay dense.
Neuron pruning | mask_neurons(fc_layer, importance < t) | Removes entire neurons from fully connected layers based on activation patterns or output contribution; more aggressive than weight pruning, but the layer keeps its dense structure.
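The structured rows in Table 1 use helper names such as `prune_channels` and `remove_filters` that are illustrative rather than library functions. Below is one possible sketch of what such a helper might do, assuming PyTorch, two consecutive `nn.Conv2d` layers with `groups=1` and default dilation, no normalization layer between them, and L1 filter norm as the importance score. The names `l1_filter_scores` and `prune_filters` are hypothetical.

```python
import torch
import torch.nn as nn


def l1_filter_scores(conv: nn.Conv2d) -> torch.Tensor:
    """L1 norm of each output filter; shape (out_channels,)."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))


def prune_filters(conv: nn.Conv2d, next_conv: nn.Conv2d, n_prune: int):
    """Hypothetical helper: return smaller copies of two consecutive convs
    with the n_prune lowest-scoring filters of `conv` removed.
    Assumes groups=1, default dilation, and no BatchNorm in between."""
    n_keep = conv.out_channels - n_prune
    scores = l1_filter_scores(conv)
    keep = torch.topk(scores, k=n_keep).indices.sort().values

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned_next = nn.Conv2d(n_keep, next_conv.out_channels, next_conv.kernel_size,
                            stride=next_conv.stride, padding=next_conv.padding,
                            bias=next_conv.bias is not None)
    with torch.no_grad():
        # Removing a filter removes one output channel of `conv` ...
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
        # ... so the next layer must drop the matching input channel.
        pruned_next.weight.copy_(next_conv.weight[:, keep])
        if next_conv.bias is not None:
            pruned_next.bias.copy_(next_conv.bias)
    return pruned, pruned_next, keep


torch.manual_seed(0)
conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
small1, small2, keep = prune_filters(conv1, conv2, n_prune=8)

x = torch.randn(1, 3, 32, 32)
out = small2(torch.relu(small1(x)))
print(tuple(small1.weight.shape), tuple(small2.weight.shape), tuple(out.shape))
```

Both pruned layers are ordinary dense convolutions, so no sparse kernel is needed. The point worth noting is the bookkeeping: every output channel removed from one layer has to be removed from the input of whatever consumes it, and a real network would also need its normalization layers and skip connections adjusted.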
