---
id: loss-functions
title: Loss Functions in Machine Learning
schema_type: TechArticle
category: ai
language: en
confidence: high
last_verified: "2026-05-24"
created_date: "2026-05-24"
generation_method: ai_assisted
ai_models:
  - claude-opus
derived_from_human_seed: true
conflict_of_interest: none_declared
is_live_document: false
data_period: static
atomic_facts:
  - id: f1
    statement: >-
      Cross-entropy loss is the standard objective for classification, minimizing the negative log-likelihood of the true class. It provides strong gradient signals and is theoretically connected to
      maximum likelihood estimation.
    source_title: Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning (Chapter 6.2). MIT Press 2016
    source_url: https://www.deeplearningbook.org/contents/mlp.html
    confidence: high
  - id: f2
    statement: >-
      Focal Loss (Lin et al. 2017, ICCV) down-weights easy examples and focuses training on hard negatives, addressing extreme class imbalance in object detection. It was key to RetinaNet achieving
      state-of-the-art results.
    source_title: Lin, Tsung-Yi, et al. Focal Loss for Dense Object Detection. ICCV 2017
    source_url: https://arxiv.org/abs/1708.02002
    confidence: high
  - id: f3
    statement: >-
      Contrastive loss (InfoNCE, Oord et al. 2018, DeepMind) maximizes mutual information between positive pairs while pushing away negatives. It is the foundation of SimCLR, CLIP, and most
      self-supervised learning methods.
    source_title: van den Oord, Aaron, et al. Representation Learning with Contrastive Predictive Coding. 2018
    source_url: https://arxiv.org/abs/1807.03748
    confidence: high
completeness: 0.9
known_gaps:
  - Contrastive losses (InfoNCE, SimCLR)
  - Perceptual losses for generative models
disputed_statements:
  - statement: No major disputed statements identified
primary_sources:
  - title: Pattern Recognition and Machine Learning (Bishop)
    type: textbook
    year: 2006
    url: https://link.springer.com/book/9780387310732
    institution: Springer
  - title: Focal Loss for Dense Object Detection
    type: academic_paper
    year: 2017
    url: https://arxiv.org/abs/1708.02002
    institution: ICCV
secondary_sources:
  - title: A Comprehensive Survey of Loss Functions in Machine Learning
    type: survey_paper
    year: 2024
    authors:
      - multiple
    institution: Annals of Data Science (Springer)
    url: https://doi.org/10.1007/s40745-024-00537-4
  - title: Focal Loss for Dense Object Detection
    type: conference_paper
    year: 2017
    authors:
      - Lin, Tsung-Yi
      - Goyal, Priya
      - Girshick, Ross
      - He, Kaiming
      - Dollár, Piotr
    institution: Facebook AI Research / ICCV
    url: https://arxiv.org/abs/1708.02002
  - title: "Deep Learning (Textbook) — Chapter 6.2.1: Loss Functions & MLE"
    type: textbook
    year: 2016
    authors:
      - Goodfellow, Ian
      - Bengio, Yoshua
      - Courville, Aaron
    institution: MIT Press
    url: https://www.deeplearningbook.org/
  - title: A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)
    type: conference_paper
    year: 2020
    authors:
      - Chen, Ting
      - Kornblith, Simon
      - Norouzi, Mohammad
      - Hinton, Geoffrey
    institution: Google Research / ICML
    url: https://arxiv.org/abs/2002.05709
updated: "2026-05-24"
---
## TL;DR
Loss functions quantify the difference between model predictions and ground truth, guiding optimization. Cross-entropy dominates classification; MSE dominates regression; specialized losses handle imbalanced, structured, or adversarial tasks.

## Core Explanation
MSE (mean squared error) penalizes large errors quadratically — sensitive to outliers. MAE (mean absolute error) is more robust but has non-differentiable points. Huber loss combines both. For generative models, GAN losses (adversarial, Wasserstein) and diffusion losses (noise prediction) require task-specific formulations.

## Detailed Analysis
Triplet loss (FaceNet, Schroff et al., 2015) learns embeddings by ensuring anchor-positive distance is less than anchor-negative by a margin. CTC loss (Connectionist Temporal Classification) handles sequence alignment without explicit segmentation — fundamental to speech recognition.

## Further Reading
- PyTorch Loss Functions Documentation
- Papers With Code: Loss Functions
- Keras: Losses Guide