## TL;DR
Backpropagation is the fundamental algorithm for training neural networks, computing gradients of the loss with respect to every weight via repeated application of the calculus chain rule.

## Core Explanation
The forward pass computes predictions; the backward pass computes gradients. For each layer, backprop multiplies the upstream gradient by the local derivative (activation function, weight matrix). The chain rule connects output error to every parameter.

## Detailed Analysis
The vanishing gradient problem occurs when activation functions like sigmoid/tanh saturate — derivatives approach zero, stopping learning in early layers. ReLU (f(x)=max(0,x)) alleviates this with constant gradient of 1 for positive inputs. Batch normalization further stabilizes gradient flow by normalizing layer inputs.

## Further Reading
- Stanford CS231n: Backpropagation Notes
- Goodfellow et al., Deep Learning, Ch.6
- Distill.pub: Why Momentum Really Works