## TL;DR
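Model evaluation metrics quantify how well a model performs. Classification: accuracy, precision, recall, F1-score, ROC-AUC. Regression: mean squared error (MSE), mean absolute error (MAE), R². Confusion matrix: counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). Choose metrics aligned with business goals: medical diagnosis needs high recall (miss fewer actual positives), while spam detection needs high precision (fewer false alarms on legitimate mail).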
## Core Explanation
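Precision = TP/(TP+FP): when the model predicts positive, how often is it correct? Recall = TP/(TP+FN): how many of the actual positives did the model find? F1 = 2·precision·recall/(precision+recall), the harmonic mean of precision and recall, which is high only when both are high. ROC-AUC is the area under the receiver operating characteristic curve; it equals the probability that a randomly chosen positive is scored above a randomly chosen negative. Cross-validation: k-fold CV trains and evaluates on k different train/validation splits and averages the scores, which gives a more robust performance estimate than a single hold-out split.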
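
The definitions above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names (`confusion_counts`, `precision_recall_f1`, `roc_auc`), the toy labels and scores, the 0.5 decision threshold, and the choice to return 0.0 when a denominator is zero are all assumptions made for the example. The AUC function uses the pairwise-ranking interpretation directly, counting ties as half a win.

```python
from itertools import product


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    Assumption for this sketch: an undefined ratio (zero denominator)
    is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. This is O(P * N) and only meant for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and model scores for eight examples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]

# Turn scores into hard predictions with an assumed threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
auc = roc_auc(y_true, scores)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Precision, recall, and F1 depend on the chosen threshold, while ROC-AUC is computed from the raw scores and so summarizes ranking quality across all thresholds.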
## Further Reading
- [Pattern Recognition and Machine Learning (Bishop)](https://www.microsoft.com/en-us/research/people/cmbishop/prml-book/)