## TL;DR
Reinforcement Learning (RL) trains agents to make sequential decisions by maximizing cumulative reward through trial-and-error interaction with an environment. Key concepts: Agent, Environment, State, Action, Reward, Policy. Famous successes: AlphaGo, Dota 2 (OpenAI Five), robotics.
## Core Explanation
Markov Decision Process (MDP): formalizes RL as (S, A, P, R, γ). Value functions: V(s) expected return from state, Q(s,a) from state-action pair. Bellman equation: recursive relationship of value functions. Q-learning: model-free, learns optimal policy without environment model. Deep Q-Network (DQN, DeepMind 2013): combines Q-learning with deep neural networks, mastered Atari games from pixels.
## Further Reading
- [Reinforcement Learning: An Introduction (2nd Ed, Sutton & Barto)](http://incompleteideas.net/book/the-book-2nd.html)