Knowledge Distillation

Status: public · Confidence: high (0.86) · Basis: verified_sources

## TL;DR

Knowledge distillation transfers the behavior of a large teacher model into a smaller student model, which is trained to reproduce the teacher's outputs. This entry stays focused on core distillation and one BERT-family example (DistilBERT).

## Core Explanation

The previous version of this entry relied on evidence that was too broad, duplicated, future-dated, or mismatched to its claims. This version keeps three public claims, each of which maps directly to a primary source listed under Further Reading.
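
To make the teacher-to-student idea concrete, here is a minimal sketch of a soft-target distillation loss in plain Python, loosely following the temperature-softened formulation described in the first Further Reading entry. The function names (`log_softmax`, `distillation_loss`), the example logits, and the defaults `temperature=2.0` and `alpha=0.5` are illustrative assumptions, not values taken from the sources. The soft term is written as a KL divergence, a common variant that differs from cross-entropy against the teacher's soft targets only by a term that does not depend on the student.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_total = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are hypothetical defaults chosen for illustration.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # Soft-target term: KL(teacher || student) on the softened distributions.
    soft = sum(math.exp(lt) * (lt - ls)
               for lt, ls in zip(log_p_teacher, log_p_student))

    # Hard-label term: ordinary cross-entropy at temperature 1.
    hard = -log_softmax(student_logits)[label]

    # Scaling by T^2 keeps the soft-term gradients comparable as T changes.
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard


# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
matched = distillation_loss(teacher, teacher, label=0)
mismatched = distillation_loss([0.5, 1.0, 4.0], teacher, label=0)
print(matched < mismatched)  # True
```

A student whose logits match the teacher's pays no soft-target penalty, so only the hard-label term remains. In practice this loss would be averaged over a batch and minimized with respect to the student's parameters while the teacher stays fixed.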

## Further Reading

- [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
- [Knowledge Distillation: A Survey](https://doi.org/10.1007/s11263-021-01453-z)
- [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108)