Batch Normalization

Status: public · Confidence: medium (0.815) · Basis: verified_sources

## TL;DR

Batch normalization is a neural-network layer technique that normalizes activations using mini-batch statistics and then applies learned scale and shift parameters. It is historically important for making deep networks easier to train. Later normalization methods, such as layer normalization and group normalization, address settings where batch statistics are unreliable or inconvenient, for example very small batches.

## Core Claims

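During training, batch normalization computes the mean and variance of each activation over the mini-batch and uses them to normalize that activation. The normalized activation is then rescaled and shifted by learned parameters, so the layer can still represent the identity transform and the model keeps its representational capacity. At inference, fixed estimates of the mean and variance collected during training are used in place of mini-batch statistics.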
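
The training-time computation described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not a reference implementation: the function name `batch_norm_train`, the list-of-lists layout (one inner list per example), and the stability constant `eps=1e-5` are assumptions made for the example, and the inference path with fixed statistics is omitted.

```python
import math


def batch_norm_train(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    batch: list of examples, each a list of feature values (assumed layout).
    gamma, beta: per-feature scale and shift; learned in a real model,
                 passed in as plain lists here.
    eps: small constant added to the variance for numerical stability
         (assumed value).
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        # Statistics are taken across the batch dimension for feature j.
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n  # biased variance
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma[j] * (column[i] - mean) * inv_std + beta[j]
    return out


batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm_train(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to ones and `beta` to zeros, each feature column of `normalized` has mean close to zero and variance close to one, even though the two input features differ in scale by a factor of ten. Other values of `gamma` and `beta` move each feature to a different scale and offset.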

Ioffe and Szegedy framed batch normalization as a way to reduce internal covariate shift and reported training that was faster and less sensitive to initialization. Later work introduced alternatives that compute statistics over different dimensions and therefore do not depend on the mini-batch.

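Layer normalization computes statistics over the features of a layer for each example individually rather than across a mini-batch, so it performs the same computation at training and inference. Group normalization divides channels into groups and computes statistics within each group for each example, which avoids dependence on batch size and can matter for small-batch computer-vision training.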
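
The difference from batch normalization is which values the statistics are computed over. The sketch below makes that concrete for a single example treated as a flat list of channel values. The helper names (`_normalize`, `layer_norm`, `group_norm`), the contiguous grouping, and `eps=1e-5` are assumptions for illustration. The learned scale and shift are left out, and in real vision models group statistics also span the spatial positions of each channel, which this simplified version does not model.

```python
import math


def _normalize(values, eps=1e-5):
    """Shift and scale a list of values to roughly zero mean, unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in values]


def layer_norm(example, eps=1e-5):
    """Normalize over all features of one example; no batch is involved."""
    return _normalize(example, eps)


def group_norm(example, num_groups, eps=1e-5):
    """Split one example's channels into contiguous groups and
    normalize each group separately; no batch is involved."""
    if len(example) % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = len(example) // num_groups
    out = []
    for g in range(num_groups):
        out.extend(_normalize(example[g * size:(g + 1) * size], eps))
    return out


example = [1.0, 3.0, 10.0, 30.0]
ln = layer_norm(example)               # one mean/variance for all 4 values
gn = group_norm(example, num_groups=2)  # separate statistics per pair
```

Neither function takes a batch argument, so the result for one example is the same whether the batch holds one example or thousands. In this simplified setting, `group_norm` with a single group matches `layer_norm`.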

## Citation Boundaries

Use this article for stable normalization-layer concepts. Do not use it to claim that batch normalization is the best choice for all modern architectures; Transformers, small-batch vision models, and sequence models often use different normalization choices.

## Further Reading

- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
- [Layer Normalization](https://arxiv.org/abs/1607.06450)
- [Group Normalization](https://arxiv.org/abs/1803.08494)