## TL;DR
AI-powered video surveillance goes far beyond simple motion detection -- modern systems track multiple people and objects across camera networks, recognize specific behaviors (fighting, falling, loitering), detect anomalies in real time, and even answer natural language questions about what happened in a video. The rise of edge AI and privacy-preserving techniques is making intelligent surveillance both more powerful and more accountable.

## Core Explanation
Traditional video surveillance relies on motion detection and manual review, which means sifting through hours of footage to find seconds of relevant events. AI-powered systems add six capabilities: (1) Object detection and classification -- locating people, vehicles, animals, and packages with bounding boxes and confidence scores; (2) Multi-object tracking (MOT) -- following the same person or vehicle across frames and, via re-identification (ReID) on appearance features, across cameras; (3) Action/behavior recognition -- classifying actions (walking, running, fighting, falling) from video clips; (4) Anomaly detection -- identifying unusual events without pre-defined categories, typically by learning normal patterns from unlabeled footage and flagging deviations; (5) Face recognition -- matching faces against watchlists of known individuals; (6) Crowd analytics -- counting, density estimation, and flow patterns.
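To make the tracking step concrete, here is a minimal sketch of frame-to-frame association by bounding-box overlap (IoU). It is an illustration under simplifying assumptions, not how any particular product works: the class name `GreedyIoUTracker`, the 0.3 threshold, and the greedy matching strategy are all choices made for this example. Production trackers typically add motion models, appearance (ReID) features, and occlusion handling.

```python
from itertools import count
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Hypothetical minimal tracker: matches detections to tracks by IoU."""

    def __init__(self, iou_threshold: float = 0.3) -> None:
        self.iou_threshold = iou_threshold  # assumed value, tune per scene
        self.tracks: Dict[int, Box] = {}    # track ID -> last known box
        self._ids = count(1)

    def update(self, detections: List[Box]) -> List[int]:
        """Return one track ID per detection, in the order given."""
        # Score every (track, detection) pair, best overlap first.
        pairs = sorted(
            (
                (iou(box, det), tid, i)
                for tid, box in self.tracks.items()
                for i, det in enumerate(detections)
            ),
            reverse=True,
        )
        assigned: Dict[int, int] = {}  # detection index -> track ID
        used_tracks = set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to match
            if tid in used_tracks or i in assigned:
                continue
            assigned[i] = tid
            used_tracks.add(tid)
        # Detections that matched nothing start new tracks.
        for i in range(len(detections)):
            if i not in assigned:
                assigned[i] = next(self._ids)
        # Simplification: tracks not seen in this frame are dropped at once.
        self.tracks = {assigned[i]: det for i, det in enumerate(detections)}
        return [assigned[i] for i in range(len(detections))]


tracker = GreedyIoUTracker()
ids_frame1 = tracker.update([(10, 10, 50, 90), (200, 20, 240, 100)])
ids_frame2 = tracker.update([(205, 22, 245, 102), (14, 12, 54, 92)])
ids_frame3 = tracker.update([(16, 14, 56, 94), (400, 400, 440, 480)])
print(ids_frame1, ids_frame2, ids_frame3)  # [1, 2] [2, 1] [1, 3]
```

The second frame lists the detections in a different order, yet each keeps its ID because the boxes overlap strongly with the previous frame. The third frame shows the limitation of pure overlap matching: a box with no overlapping predecessor gets a fresh ID, which is where appearance-based ReID is needed to reconnect identities across cameras or after occlusion.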

## Detailed Analysis
Transformer architectures for video: (1) Video Swin Transformer -- extends Swin Transformer to 3D by computing self-attention within local 3D (spatio-temporal) windows and hierarchically merging patches; (2) TimeSformer -- applies temporal attention and spatial attention separately ("divided space-time attention"), which is cheaper than joint space-time attention: with N patches per frame and F frames, each patch attends to roughly N + F others instead of N × F; (3) VideoMAE -- masked-autoencoder pretraining on video, reconstructing masked spatio-temporal patches.

Anomaly detection approaches: (1) Reconstruction-based -- an autoencoder is trained on normal video only, and frames with high reconstruction error are flagged as anomalous; (2) Prediction-based -- the model predicts future frames, and a large prediction error signals an anomaly; (3) Weakly supervised -- the model is trained with only video-level labels (normal/anomalous) and learns to localize the anomalous segments within a video.

Privacy-preserving surveillance: edge AI hardware in or next to the camera (NVIDIA Jetson, Google Coral) runs inference locally and transmits only metadata (counts, events, trajectories) rather than video streams, which can reduce bandwidth by as much as 99% and helps meet privacy regulations. The "smart city" surveillance vision faces tension between public safety benefits and mass surveillance concerns -- technical privacy measures (federated learning, on-device processing, blurring of non-target individuals) are prerequisites for ethical deployment.
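The reconstruction-based anomaly detection approach described above can be sketched in a few lines. This is an illustration only, and several pieces are assumptions made for the example: frames are represented as short flattened feature vectors, the threshold rule (mean plus `k` standard deviations of the errors on normal data) is one common heuristic rather than a prescribed method, and `make_mean_reconstructor` is a deliberately trivial stand-in for a trained autoencoder (it always outputs the average normal frame). In a real system the `reconstruct` callable would wrap a learned model.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence

Frame = Sequence[float]  # assumed: a flattened frame or feature vector
Reconstructor = Callable[[Frame], Frame]


def reconstruction_error(frame: Frame, reconstruct: Reconstructor) -> float:
    """Mean squared error between a frame and its reconstruction."""
    recon = reconstruct(frame)
    return sum((x - r) ** 2 for x, r in zip(frame, recon)) / len(frame)


def fit_threshold(normal_frames: List[Frame], reconstruct: Reconstructor,
                  k: float = 3.0) -> float:
    """Calibrate a threshold from errors on normal footage (mean + k * std)."""
    errors = [reconstruction_error(f, reconstruct) for f in normal_frames]
    return mean(errors) + k * stdev(errors)


def flag_anomalies(frames: List[Frame], reconstruct: Reconstructor,
                   threshold: float) -> List[bool]:
    """True for every frame whose reconstruction error exceeds the threshold."""
    return [reconstruction_error(f, reconstruct) > threshold for f in frames]


def make_mean_reconstructor(normal_frames: List[Frame]) -> Reconstructor:
    """Stand-in for a trained autoencoder: always returns the mean normal frame."""
    dim = len(normal_frames[0])
    avg = [mean(f[j] for f in normal_frames) for j in range(dim)]
    return lambda frame: avg


# Toy data: four "normal" frames that look alike, then two test frames.
normal_frames = [
    [0.10, 0.20, 0.30, 0.40],
    [0.12, 0.18, 0.31, 0.39],
    [0.09, 0.21, 0.29, 0.41],
    [0.11, 0.19, 0.30, 0.40],
]
test_frames = [
    [0.10, 0.20, 0.30, 0.41],  # close to the normal pattern
    [0.90, 0.80, 0.10, 0.05],  # very different from anything seen
]

reconstruct = make_mean_reconstructor(normal_frames)
threshold = fit_threshold(normal_frames, reconstruct)
flags = flag_anomalies(test_frames, reconstruct, threshold)
print(flags)  # [False, True]
```

The key design point is that the model only ever sees normal data, so no anomaly labels are needed; anything it reconstructs poorly is treated as suspicious. The prediction-based variant follows the same pattern, with the `reconstruct` callable replaced by a model that predicts the next frame from previous ones.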