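# Knowledge Distillation

- Status: public
- Confidence: high (0.86) (verified)
- Last verified: 2026-05-28
- Generation: ai_structured

## TL;DR

Knowledge distillation compresses a model by training a smaller student network to reproduce the behavior of a larger teacher network. This entry focuses on the core distillation technique and one BERT-family example, DistilBERT.

## Core Explanation

An earlier version of this entry mixed broad, duplicate, future, or mismatched evidence. The current entry keeps three public claims, each mapped directly to one of the primary sources listed below:

- The original formulation (Hinton et al., 2015) trains the student on the teacher's class probabilities softened by a temperature in the softmax ("soft targets"), usually combined with the standard loss on the true labels.
- The survey by Gou et al. (2021) organizes distillation methods by the kind of knowledge transferred, the training scheme, and the teacher-student architecture.
- DistilBERT (Sanh et al., 2019) applies distillation during pre-training to produce a BERT model that is 40% smaller and 60% faster while retaining 97% of BERT's language-understanding capability.

## Further Reading

- [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
- [Knowledge Distillation: A Survey](https://doi.org/10.1007/s11263-021-01453-z)
- [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108)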
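The teacher-to-student transfer summarized in the TL;DR can be made concrete with the soft-target loss described by Hinton et al. (2015). The following is a minimal pure-Python sketch for a single example. It does not come from any of the listed sources. The function names `softmax` and `distillation_loss` and the defaults `temperature=4.0` and `alpha=0.5` are illustrative assumptions. The sketch also uses the KL divergence, which differs from the paper's soft-target cross-entropy only by a constant (the teacher's entropy) and so yields the same gradients.

```python
import math
from collections.abc import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    true_label: int,
    temperature: float = 4.0,  # assumed illustrative value
    alpha: float = 0.5,        # assumed weight on the soft-target term
) -> float:
    """Hypothetical helper: weighted sum of a soft-target term and a hard-label term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Soft-target gradients shrink roughly as 1/T^2, so rescale by T^2
    # to keep the two terms comparable (as suggested by Hinton et al.).
    soft *= temperature ** 2
    # Ordinary cross-entropy against the true label at temperature 1.
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.2]
    print(distillation_loss(student, teacher, true_label=0))
```

When the student's logits match the teacher's exactly, the soft term vanishes and only the weighted hard-label term remains. A real training setup would compute this over batches with a deep-learning framework's automatic differentiation. This version only spells out the arithmetic.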