Reinforcement Learning

Learning to act by maximizing cumulative reward through interaction with an environment.

Imagine teaching a puppy to sit. You can't explain the word 'sit' — instead, whenever it happens to sit, you give a treat, and over time the puppy learns which actions earn treats. Reinforcement learning works the same way. A computer program, called an agent, tries actions inside some world, and each action earns a small reward or penalty. Nobody tells the agent the correct answer; it discovers, through lots of trial and error, which sequences of actions lead to the most reward over time. The tricky part is patience: sometimes the best move gives no reward now but sets up a big reward later, so the agent must learn to plan ahead rather than grab the nearest treat. This is how programs learn to play games, steer robots, and improve chatbots.
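The "patience" idea can be made concrete with a little arithmetic. The sketch below is only an illustration, not part of any particular library: it adds up a list of rewards, counting each later reward slightly less than the one before it. The function name `discounted_return` and the discount value `gamma=0.9` are assumptions chosen for this example.

```python
def discounted_return(rewards, gamma=0.9):
    """Add up rewards, shrinking each later reward by a factor of gamma per step."""
    total = 0.0
    # Walk backwards so each step folds the (discounted) future into the present.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Two made-up reward sequences (assumed values, purely for illustration).
grab_now = [1, 0, 0, 0]       # small treat immediately, nothing afterwards
wait_for_it = [0, 0, 0, 10]   # nothing at first, a big treat at the end

print("grab now:   ", discounted_return(grab_now))
print("wait for it:", discounted_return(wait_for_it))
```

Even though the big treat arrives three steps late and is counted for less, the patient sequence still scores higher here, which is the kind of trade-off an agent has to learn to weigh.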

The main ideas

Markov decision processes — The formal framework: states, actions, rewards, and transitions, plus the policies that choose which action to take.
Value & policy methods — Q-learning, policy gradients, actor-critic, and when to use each.
Deep RL — Combining RL with deep networks (DQN, PPO) for high-dimensional problems.
RLHF — Reinforcement learning from human feedback: how LLMs are tuned toward human preferences.
Multi-agent RL — Many agents learning together: cooperation, competition, and emergent behaviour.
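To give a feel for how the first two ideas fit together, here is a minimal sketch of tabular Q-learning on a tiny made-up Markov decision process. Everything about the environment is an assumption for illustration: a corridor of five cells with a treat in the last one, a hypothetical `step` function, and hand-picked values for the learning rate, discount, and exploration rate. It uses only the Python standard library and is not tuned for anything beyond this toy problem.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 holds the treat (assumed layout)
ACTIONS = (-1, +1)    # action 0 = step left, action 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # assumed learning rate, discount, exploration


def step(state, action):
    """Hypothetical environment: move, stop at the walls, reward 1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # q[state][action] is the agent's current guess at the long-run reward.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally; otherwise take the best-known action,
            # breaking ties at random so early episodes are not stuck.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the guess toward reward + discounted best future.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Read off the greedy policy for the non-goal cells.
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

In this toy corridor the learned policy should settle on "right" in every cell, since that is the only way to reach the treat. Deep RL methods such as DQN replace the small table `q` with a neural network when there are too many states to list.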

Related areas

Machine Learning · AI Agents & Autonomy · Robotics & Embodied AI

