Model Pruning and Neural Network Compression Cheat Sheet

Model pruning is a neural network compression technique that removes weights, neurons, channels, or entire structures from a trained network to reduce computational cost and memory footprint while preserving as much accuracy as possible. Originally inspired by biological synaptic pruning, modern pruning methods trade sparsity (the fraction of parameters removed) against accuracy loss, enabling deployment on resource-constrained devices. Inference latency also drops when the sparsity is structured or when the runtime and hardware can exploit sparse weights. Unlike quantization, which stores parameters at lower precision, or knowledge distillation, which trains a separate smaller network, pruning deletes redundant parameters from the existing network. The lottery ticket hypothesis suggests that dense networks contain sparse subnetworks ("winning tickets") that, when trained in isolation from their original initialization, can match or sometimes exceed the accuracy of the full network, which offers one explanation for why over-parameterized networks train successfully.
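
To make the sparsity definition concrete, here is a minimal sketch of unstructured pruning on a single weight matrix. It assumes magnitude-based selection (drop the weights with the smallest absolute value), which is one common criterion and not the only one. The function name `magnitude_prune` and the 4x8 random matrix are hypothetical, chosen only for illustration.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries of `weights`.

    `sparsity` is the fraction of entries to remove, in [0, 1).
    Returns (pruned_weights, mask); mask is True where a weight is kept.
    Assumption: magnitude is used as the importance score.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(round(sparsity * weights.size))  # number of entries to remove
    keep = np.ones(weights.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest |w|; argpartition avoids a full sort.
        smallest = np.argpartition(np.abs(weights).ravel(), k - 1)[:k]
        keep[smallest] = False
    mask = keep.reshape(weights.shape)
    return weights * mask, mask


# Hypothetical example: prune 75% of a random 4x8 weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
pruned, mask = magnitude_prune(w, sparsity=0.75)

achieved = 1.0 - mask.mean()  # fraction of weights removed
print(f"kept {int(mask.sum())} of {w.size} weights, sparsity = {achieved:.2f}")
```

In practice the mask is usually kept alongside the weights so that pruned entries stay at zero during any fine-tuning that follows. Zeroed entries alone do not shrink the stored matrix; the memory and speed gains depend on a sparse storage format or on removing whole rows, channels, or layers.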
