# Causal Representation Learning: Deep Causal Discovery, Intervention, and Counterfactuals

## TL;DR
Causal Representation Learning bridges deep learning and causality: it moves beyond correlational patterns to learn representations that encode cause-effect relationships. Unlike standard deep learning, which captures statistical associations, causal representations aim to support robust generalization, reasoning about interventions, and counterfactual "what-if" predictions.

## Core Explanation
Standard deep learning learns representations that predict outputs well, which means it captures correlations. The problem is that spurious correlations (e.g., predicting pneumonia from X-rays using hospital-specific text markers rather than lung pathology) produce brittle models that fail under distribution shift. The causal approach instead learns representations that capture the underlying causal generative factors: independent mechanisms that remain invariant under interventions.

Pearl's causal hierarchy distinguishes three levels:

- Level 1 (Association): P(y|x). This is standard ML.
- Level 2 (Intervention): P(y|do(x)). What happens if we change x?
- Level 3 (Counterfactual): P(y_x'|x,y). What would have happened had x been different?

Causal representation learning targets Levels 2 and 3.
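The gap between the three levels can be sketched with a small simulation. The structural causal model below is a hypothetical example (the variables `Z`, `X`, `Y` and all coefficients are assumptions chosen for illustration, not taken from any dataset): a confounder `Z` drives both `X` and `Y`, so the observational regression slope overstates the true causal effect of `X` on `Y`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical linear SCM (coefficients are illustrative assumptions):
#   Z := N_z                 (confounder)
#   X := Z + N_x
#   Y := 2*X + 3*Z + N_y     (true causal effect of X on Y is 2)
def sample(n, do_x=None):
    z = rng.normal(size=n)
    n_x = rng.normal(size=n)
    n_y = rng.normal(size=n)
    # Under do(X = x), the mechanism for X is replaced by a constant.
    x = z + n_x if do_x is None else np.full(n, float(do_x))
    y = 2.0 * x + 3.0 * z + n_y
    return x, y, z

# Level 1 (association): slope of Y on X in observational data.
x_obs, y_obs, _ = sample(n)
assoc_slope = np.cov(x_obs, y_obs)[0, 1] / np.var(x_obs, ddof=1)

# Level 2 (intervention): E[Y | do(X=1)] - E[Y | do(X=0)].
_, y_do1, _ = sample(n, do_x=1.0)
_, y_do0, _ = sample(n, do_x=0.0)
causal_effect = y_do1.mean() - y_do0.mean()

# Level 3 (counterfactual) for one observed unit (x, y, z):
# abduction (recover the unit's noise), action (set X), prediction.
def counterfactual_y(x, y, z, x_new):
    n_y = y - 2.0 * x - 3.0 * z           # abduction
    return 2.0 * x_new + 3.0 * z + n_y    # action + prediction

print(f"associational slope: {assoc_slope:.2f}")
print(f"interventional effect: {causal_effect:.2f}")
print(f"counterfactual Y: {counterfactual_y(1.0, 6.0, 1.0, 0.0):.2f}")
```

In this toy model the associational slope comes out near Cov(X,Y)/Var(X) = 7/2 = 3.5, while the interventional contrast recovers the structural coefficient 2. The counterfactual step assumes the structural equations and the unit's `Z` are known, which is rarely true in practice; learning such a model from data is the hard part.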

## Detailed Analysis
Key methods:

1. Invariant Risk Minimization (IRM): learn a representation for which the same classifier is simultaneously optimal in every training environment.
2. Variational causal inference: treat latent confounders as latent variables inferred by a variational model.
3. CausalVAE: jointly learn a causal graph and latent representations.
4. CITRIS (Causal Identifiability from Temporal Intervened Sequences): identify causal factors from interventional time-series data.

Identifiability theory builds on Independent Component Analysis (ICA). Nonlinear ICA is not identifiable in general from i.i.d. observations alone. Recovering the true latent variables, typically only up to permutation and component-wise transformations, requires additional assumptions such as auxiliary variables, temporal structure, multiple environments, or interventions.

The ACM Computing Surveys 2025 review emphasizes three pillars: how deep learning tackles identifiability, how deep architectures encode causal structure, and how causal principles improve robustness.

Applications include healthcare (treatment effect estimation from EHR data), economics (policy impact evaluation), and autonomous driving (predicting consequences of actions).

A critical open problem is moving from "small bottleneck" causal representations to high-dimensional representations comparable to those of self-supervised models (e.g., CLIP, GPT embeddings).
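To make the IRM idea described above more concrete, here is a minimal sketch of the gradient penalty used in the practical IRMv1 variant, written in plain NumPy for a linear predictor with squared loss. The two-feature toy data (`x1` causes `y`, `x2` is an effect of `y`), the noise scales, and the helper names `irmv1_penalty`, `make_env`, and `total_penalty` are all assumptions made for illustration. The full IRMv1 objective also adds the per-environment risks and a penalty weight, which are omitted here.

```python
import numpy as np

def irmv1_penalty(pred, y):
    """Squared gradient of one environment's risk w.r.t. a dummy scale w at w = 1.

    With R(w) = mean((w * pred - y)^2), dR/dw at w = 1 is
    mean(2 * (pred - y) * pred). It is zero when rescaling the
    predictor cannot lower this environment's risk.
    """
    grad = np.mean(2.0 * (pred - y) * pred)
    return grad ** 2

def make_env(sigma, n, rng):
    # Hypothetical environment: only the noise scale sigma changes.
    x1 = rng.normal(scale=sigma, size=n)        # causal feature
    y = x1 + rng.normal(scale=sigma, size=n)    # invariant mechanism
    x2 = y + rng.normal(size=n)                 # spurious feature (effect of y)
    return np.column_stack([x1, x2]), y

rng = np.random.default_rng(0)
envs = [make_env(sigma, 50_000, rng) for sigma in (0.5, 2.0)]

def total_penalty(beta):
    # Sum of per-environment penalties for the linear predictor X @ beta.
    return sum(irmv1_penalty(X @ beta, y) for X, y in envs)

# Predictor that uses only the causal feature.
beta_invariant = np.array([1.0, 0.0])

# Ordinary least squares on the pooled data (ERM baseline).
X_all = np.vstack([X for X, _ in envs])
y_all = np.concatenate([y for _, y in envs])
beta_pooled, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

penalty_invariant = total_penalty(beta_invariant)
penalty_pooled = total_penalty(beta_pooled)
print("pooled OLS weights:", beta_pooled)
print("penalty (invariant):", penalty_invariant)
print("penalty (pooled OLS):", penalty_pooled)
```

In this setup the pooled least-squares fit leans on the spurious feature `x2` and incurs a clearly nonzero penalty, because its optimal rescaling differs between environments. The predictor restricted to `x1` has a penalty close to zero. A real IRM implementation would minimize risk plus this penalty over a learned representation, usually with automatic differentiation.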

## Further Reading
- The Book of Why (Judea Pearl and Dana Mackenzie, 2018)
- Causal Inference in Statistics: A Primer (Pearl, Glymour, Jewell, 2016)
- CausalAI Conference & DoWhy/PyWhy Python Libraries