Module 1: Reinforcement Learning

MDPs and Bellman Equations

Learn the formal language that all RL algorithms build on.

Why this module matters

Without MDP intuition, RL algorithms feel disconnected and magical.

Prerequisites

  • Basic probability

Learning objectives

  • Define states, actions, rewards, and returns
  • Interpret Bellman recursion
  • Connect long-horizon decisions to local updates

Core concepts

  • Return
  • Value function
  • Bellman expectation
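To make the first concept concrete, here is a minimal sketch of the discounted return, G_t = r_t + γ·G_{t+1}, computed backwards over one episode's rewards. The function name and the example reward stream are illustrative, not part of any particular RL library.

```python
def discounted_return(rewards, gamma):
    """Return G_0 for one episode, using the recursion G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):  # accumulate from the final step backwards
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))  # 1 + 0.9 + 0.81 = 2.71
```

The backwards accumulation is exactly the recursion that the Bellman expectation equation generalizes: the value of now is the immediate reward plus the discounted value of what comes next.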

Hands-on practice

  • Write Bellman updates by hand on a tiny GridWorld
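Once you have done a few updates by hand, a short script can check your worksheet. This is a minimal sketch of iterative policy evaluation on a 4-cell corridor GridWorld (states 0..3, state 3 terminal, reward −1 per step, uniform random left/right policy, γ = 1); the environment and all names are illustrative assumptions, not a standard benchmark.

```python
GAMMA = 1.0
N_STATES = 4     # states 0..3; state 3 is terminal
TERMINAL = 3

def step(s, a):
    """Deterministic transition: a = -1 (left) or +1 (right), clipped at the left wall."""
    return min(max(s + a, 0), N_STATES - 1)

def policy_evaluation(tol=1e-8):
    """Apply Bellman expectation updates in place until the values stop changing."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue  # terminal state keeps value 0
            # Average over the two equiprobable actions: reward -1, then discounted successor value
            v = sum(0.5 * (-1.0 + GAMMA * V[step(s, a)]) for a in (-1, 1))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

print(policy_evaluation())  # converges to roughly [-12, -10, -6, 0]
```

Each value is the negative expected number of steps to reach the terminal cell under the random policy, which you can verify against your hand-computed updates.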

Expected output

A small worksheet that makes Bellman thinking concrete.

Study checklist

  • Define states, actions, rewards, and returns
  • Interpret Bellman recursion
  • Connect long-horizon decisions to local updates

Common mistakes

  • ⚠️ Confusing reward and return
  • ⚠️ Ignoring discount-factor interpretation
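Both mistakes show up clearly on one fixed reward stream. The sketch below (illustrative names, a made-up three-step episode) contrasts the immediate reward with the return, and shows how γ sets the effective horizon.

```python
def discounted_return(rewards, gamma):
    """G_0 = sum_k gamma^k * r_k, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [0.0, 0.0, 10.0]  # a single reward of 10, two steps in the future

# Reward vs. return: the reward at t=0 is 0, but the return from t=0 is not.
print(rewards[0])                       # 0.0
print(discounted_return(rewards, 0.9))  # 8.1  = 10 * 0.9^2

# Discount-factor interpretation: smaller gamma means a more myopic agent.
print(discounted_return(rewards, 0.5))  # 2.5  = 10 * 0.5^2
print(discounted_return(rewards, 1.0))  # 10.0
```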

Module rhythm

  • 1. Read the summary and why-it-matters section first.
  • 2. Work through concepts before rushing into practice.
  • 3. Use the checklist to verify real understanding, not just completion.

How to continue

Next, solve small environments exactly with dynamic programming (policy evaluation and value iteration).


How to use this page well

Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.