Module 1: Reinforcement Learning
MDPs and Bellman Equations
Learn the formal language that all RL algorithms build on.
Why this module matters
Without MDP intuition, RL algorithms feel disconnected and magical.
Prerequisites
- ▸ Basic probability
Learning objectives
- ▸ Define states, actions, rewards, and returns
- ▸ Interpret Bellman recursion
- ▸ Connect long-horizon decisions to local updates
Core concepts
- ▸ Return: the (discounted) sum of future rewards from a time step onward
- ▸ Value function: the expected return from a state under a given policy
- ▸ Bellman expectation: a recursion relating a state's value to the values of its successor states
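The three concepts above can be made concrete with a short sketch. This is a minimal example, assuming a hypothetical two-state MDP (states `"A"`, `"B"` and actions `"stay"`, `"go"` are made up for illustration) evaluated under a uniform-random policy with the Bellman expectation update.

```python
# A minimal sketch, assuming a hypothetical two-state MDP with a
# uniform-random policy. It repeats the Bellman expectation update
#   V(s) <- sum_a pi(a|s) * sum_s' P(s'|s,a) * [R(s,a,s') + gamma * V(s')]
# until the values stop changing.

GAMMA = 0.9

# transitions[state][action] = list of (prob, next_state, reward);
# all state/action names here are hypothetical.
transitions = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 0.0)], "go": [(1.0, "A", 0.0)]},
}
policy = {s: {a: 0.5 for a in acts} for s, acts in transitions.items()}

V = {s: 0.0 for s in transitions}
for _ in range(500):  # plenty of sweeps for convergence at gamma = 0.9
    delta = 0.0
    for s in transitions:
        new_v = sum(
            policy[s][a] * sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes)
            for a, outcomes in transitions[s].items()
        )
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < 1e-10:
        break

print({s: round(v, 3) for s, v in V.items()})
```

Solving the two Bellman equations by hand gives V(A) = 2.75 and V(B) = 2.25, which is exactly what the iteration converges to — a useful check that the local update really does encode the long-horizon values.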
Hands-on practice
- ▸ Write Bellman updates by hand on a tiny GridWorld
Expected output
A small worksheet that makes Bellman thinking concrete.
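If you want to check your worksheet, the same updates can be run in code. This is a sketch under one possible setup (not the only valid one): a hypothetical 1×4 corridor GridWorld with states 0–3 left to right, state 3 terminal, reward −1 per step, a uniform-random policy over left/right, and no discounting.

```python
# A sketch of the by-hand exercise, assuming a hypothetical 1x4 corridor
# GridWorld: states 0..3, state 3 terminal, reward -1 per step,
# uniform-random policy over {left, right}, gamma = 1. Each sweep is
# exactly the Bellman expectation update you would write on the worksheet.

TERMINAL = 3

def step(s, a):
    """Deterministic move; bumping the left wall leaves the state unchanged."""
    s2 = max(0, s - 1) if a == "left" else min(TERMINAL, s + 1)
    return s2, -1.0  # every step costs -1 until the terminal state is reached

V = [0.0] * 4
for _ in range(10_000):
    delta = 0.0
    for s in range(TERMINAL):  # the terminal state keeps V = 0
        new_v = sum(
            0.5 * (r + V[s2]) for s2, r in (step(s, a) for a in ("left", "right"))
        )
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < 1e-9:
        break

print([round(v, 2) for v in V])  # converges to [-12.0, -10.0, -6.0, 0.0]
```

Doing two or three sweeps by hand and then comparing against the converged values is a good way to see how local updates propagate information backward from the terminal state.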
Study checklist
- ✅ Define states, actions, rewards, and returns
- ✅ Interpret Bellman recursion
- ✅ Connect long-horizon decisions to local updates
Common mistakes
- ⚠️ Confusing reward and return
- ⚠️ Ignoring discount-factor interpretation
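The first mistake above is easy to see numerically. This sketch uses a made-up reward sequence: the reward is the per-step signal, while the return is the discounted sum of all rewards from that step onward, and the discount factor controls how much future rewards count.

```python
# A tiny sketch of the reward-vs-return distinction, using a hypothetical
# reward sequence from one episode.

GAMMA = 0.9
rewards = [1.0, 0.0, 0.0, 10.0]  # made-up per-step rewards

# The return G_t = r_t + gamma*r_{t+1} + ... is computed by the backward
# recursion G_t = r_t + gamma * G_{t+1}.
returns = []
g = 0.0
for r in reversed(rewards):
    g = r + GAMMA * g
    returns.append(g)
returns.reverse()

# The immediate reward at t=0 is 1.0, but the return G_0 is about 8.29:
# most of its value comes from the discounted future reward of 10.
print(rewards[0], returns[0])
```

Recomputing `returns` with a smaller `GAMMA` (say 0.5) is a quick way to build intuition for the second mistake: the discount factor decides how strongly that future 10 shows up in today's value.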
Module rhythm
- 1. Read the summary and why-it-matters section first.
- 2. Work through concepts before rushing into practice.
- 3. Use the checklist to verify real understanding, not just completion.
How to continue
Next, solve small MDPs exactly with dynamic programming.
How to use this page well
Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.