# Object Detection: YOLO, R-CNN, and DETR

## TL;DR
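Object detection classifies the objects in an image and localizes each one with a bounding box. Two-stage detectors (the R-CNN family) prioritize accuracy; single-stage detectors (YOLO, SSD) prioritize speed; transformer-based detectors (DETR) simplify the pipeline by removing hand-designed components such as anchor boxes and non-maximum suppression.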

## Core Explanation
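R-CNN progression: R-CNN (a CNN run separately on each externally generated region proposal) → Fast R-CNN (one shared convolutional feature map per image, with per-region pooling) → Faster R-CNN (a Region Proposal Network replaces the external proposal step) → Mask R-CNN (adds a mask branch for instance segmentation). YOLO divides the image into a grid and, in a single forward pass, predicts bounding boxes and class probabilities per cell; the cell containing an object's center is responsible for detecting it. DETR feeds a fixed set of learned object queries to a transformer decoder, and each query yields one class-and-box prediction or "no object".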
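The grid assignment can be made concrete with a small sketch. It assumes a YOLOv1-style setup (a 7×7 grid over a 448×448 input) and boxes given as `(x_min, y_min, x_max, y_max)` in pixels; the function name `responsible_cell` and its parameters are illustrative and not taken from any YOLO codebase.

```python
def responsible_cell(box, image_size, grid_size=7):
    """Find the grid cell that contains the center of a ground-truth box.

    Returns (row, col, x_offset, y_offset), where the offsets give the
    center's position inside that cell as a fraction of the cell size.
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size

    # Box center, normalized to [0, 1] relative to the image.
    cx = (x_min + x_max) / 2 / width
    cy = (y_min + y_max) / 2 / height

    # Cell indices; clamp so a center on the far edge stays inside the grid.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)

    # Offsets of the center relative to the cell's top-left corner.
    x_offset = cx * grid_size - col
    y_offset = cy * grid_size - row
    return row, col, x_offset, y_offset


# A 200x200 box centered at pixel (200, 300) lands in row 4, column 3.
row, col, dx, dy = responsible_cell((100, 200, 300, 400), image_size=(448, 448))
print(row, col, round(dx, 4), round(dy, 4))
```

During training, only the cell returned here would be asked to predict this object; later YOLO versions change the grid resolution and add anchors, so treat this as the original formulation only.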

## Detailed Analysis
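For years, anchor boxes were the dominant approach: predefined bounding box templates at multiple scales and aspect ratios, which the network refines into final predictions. Anchor-free methods (CornerNet, CenterNet) and transformer methods (DETR, Deformable DETR) have simplified detection pipelines by removing these templates. Non-maximum suppression (NMS) remains the standard post-processing step for removing duplicate predictions in anchor-based and most other dense detectors, although DETR drops it by training with one-to-one bipartite matching between predictions and ground-truth objects.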
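The anchor templates and NMS described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not any library's implementation: the function names, the default scales and ratios, and the 0.5 IoU threshold are all chosen for the example, and boxes are `(x_min, y_min, x_max, y_max)` tuples.

```python
import math


def make_anchors(cx, cy, scales=(64, 128), ratios=(0.5, 1.0, 2.0)):
    """Anchor templates centered at (cx, cy).

    Each anchor has area scale**2 and width/height equal to ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale * math.sqrt(ratio)
            h = scale / math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


anchors = make_anchors(0, 0)  # 2 scales x 3 ratios = 6 templates

boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # kept == [0, 2]: box 1 overlaps box 0 too much
print(len(anchors), kept)
```

In practice NMS is applied per class, and production code uses a vectorized routine rather than this quadratic loop; the sketch is meant only to show the logic.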

## Further Reading
- Papers With Code: Object Detection
- YOLO Official Documentation (Ultralytics)
- MMDetection Open-source Toolbox