Knowledge distillation is a model-compression technique in which knowledge from a large, complex teacher model is transferred to a smaller, more efficient student model. Introduced by Geoffrey Hinton and colleagues in 2015, the approach enables deploying powerful AI capabilities on resource-constrained devices while maintaining competitive performance. The core insight is that the soft probability distributions ("dark knowledge") produced by a teacher encode rich inter-class relationships that are lost when training on hard labels alone. Distillation has become essential for deploying deep learning models at scale, with applications spanning NLP transformers, computer-vision CNNs, and large language models, where compression ratios of 10x or more are achievable with minimal accuracy loss.
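As a minimal sketch of the idea, the standard distillation objective blends a cross-entropy term on the hard label with a KL-divergence term that matches the student to the teacher's temperature-softened distribution. The temperature, blending weight `alpha`, and the example logits below are illustrative choices, not values from this article:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with an optional temperature; higher T softens the distribution."""
    z = logits / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=4.0, alpha=0.5):
    """Blend of soft-target KL loss and hard-label cross-entropy (a sketch)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
    soft_loss = (temperature ** 2) * kl
    # Standard cross-entropy against the hard label, at T = 1
    hard_loss = -np.log(softmax(student_logits)[hard_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss

teacher = np.array([8.0, 2.0, 1.0])  # confident teacher logits (illustrative)
student = np.array([5.0, 3.0, 2.0])  # less certain student logits
loss = distillation_loss(student, teacher, hard_label=0)
```

The softened teacher distribution assigns non-trivial probability to the wrong-but-related classes, which is exactly the inter-class information the hard label discards.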