---
atomic_facts:
  - id: fact-kd-1
    statement: Knowledge distillation trains a smaller student model to match information from a larger teacher model.
    source_title: Distilling the Knowledge in a Neural Network
    source_url: https://arxiv.org/abs/1503.02531
    confidence: high
  - id: fact-kd-2
    statement: >-
      Gou et al. survey knowledge-distillation methods across teacher-student architectures and learning
      objectives.
    source_title: "Knowledge Distillation: A Survey"
    source_url: https://doi.org/10.1007/s11263-021-01453-z
    source_doi: 10.1007/s11263-021-01453-z
    confidence: high
  - id: fact-kd-3
    statement: DistilBERT reports a smaller and faster distilled version of BERT for language-understanding tasks.
    source_title: "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"
    source_url: https://arxiv.org/abs/1910.01108
    confidence: high
category: ai
completeness: 0.84
confidence: high
conflict_of_interest: none_declared
created_date: "2026-05-22"
data_period: static
derived_from_human_seed: true
disputed_statements: []
generation_method: ai_structured
id: kb-2026-00286
is_live_document: false
known_gaps:
  - This compact repair keeps only source-mapped public claims from the sampled audit entry.
language: en
last_verified: "2026-05-28"
primary_sources:
  - title: Distilling the Knowledge in a Neural Network
    type: academic_paper
    year: 2015
    url: https://arxiv.org/abs/1503.02531
    institution: arXiv
  - title: "Knowledge Distillation: A Survey"
    type: academic_paper
    year: 2021
    url: https://doi.org/10.1007/s11263-021-01453-z
    doi: 10.1007/s11263-021-01453-z
    institution: International Journal of Computer Vision
  - title: "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"
    type: academic_paper
    year: 2019
    url: https://arxiv.org/abs/1910.01108
    institution: arXiv
schema_type: TechArticle
secondary_sources: []
title: Knowledge Distillation
updated: "2026-05-28"
---

## TL;DR

Knowledge distillation compresses the behavior of a larger teacher model into a smaller student model. This entry covers core distillation and one BERT-family example, DistilBERT.

## Core Explanation

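Knowledge distillation trains a smaller student model to match information from a larger teacher model, as introduced in *Distilling the Knowledge in a Neural Network* (2015). The survey by Gou et al. (2021) covers distillation methods across teacher-student architectures and learning objectives. DistilBERT (2019) applies the approach to BERT and reports a smaller, faster distilled model for language-understanding tasks. This entry is limited to these three claims, each of which maps directly to a listed primary source.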
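
To make the teacher-student idea concrete, here is a minimal sketch of a distillation loss for a single example, assuming the common soft-target formulation: both models' logits are softened with a temperature, the student is penalized by the KL divergence from the teacher's softened distribution, and that term is mixed with an ordinary cross-entropy on the true label. The function names, the `alpha` mixing weight, and the default values are illustrative assumptions, not taken from the sources above, and real training code would use a tensor library rather than plain Python lists.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a soft-target and a hard-label loss.

    alpha and temperature are illustrative hyperparameters (assumptions).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Soft-target term: KL(teacher || student) on the softened distributions.
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )

    # Hard-label term: cross-entropy of the student at temperature 1.
    hard = -math.log(softmax(student_logits)[label])

    # Scaling the soft term by T^2 keeps its gradient magnitude comparable
    # to the hard term when the temperature changes.
    return alpha * (temperature ** 2) * soft + (1.0 - alpha) * hard


# Example usage with made-up logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.2, 0.3]
loss = distillation_loss(student, teacher, label=0)
print(round(loss, 4))
```

A higher temperature flattens both distributions, so the student also sees how the teacher ranks the incorrect classes instead of only its top prediction. Setting `alpha` to 0 reduces this sketch to plain supervised training on the labels.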

## Further Reading

- [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
- [Knowledge Distillation: A Survey](https://doi.org/10.1007/s11263-021-01453-z)
- [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108)