Module 2: Reinforcement Learning
Dynamic Programming
Use exact tabular methods to build value-function intuition.
Why this module matters
Dynamic programming makes value updates tangible: with a known model and a small state space you can watch Bellman backups converge exactly, before function approximation complicates everything.
Prerequisites
- ▸ Markov decision processes (MDPs): states, actions, transition probabilities, rewards, and discounting
Learning objectives
- ▸ Run policy evaluation
- ▸ Compare policy iteration and value iteration
- ▸ Visualize value improvement
Core concepts
- ▸ Policy evaluation: compute the value function of a fixed policy by iterating the Bellman expectation update
- ▸ Policy improvement: act greedily with respect to the evaluated values to obtain a policy at least as good
- ▸ Optimality: in a finite MDP, alternating evaluation and improvement converges to the optimal policy and value function
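To make policy evaluation concrete, here is a minimal sketch on an invented 4-state chain MDP (the states, transitions, and rewards below are illustrative, not part of the module): each sweep applies the Bellman expectation update until no value changes by more than a small threshold.

```python
import numpy as np

# Hypothetical 4-state chain: states 0..3, state 3 terminal.
# The fixed policy always moves right, with reward -1 per step.
# P[s] = list of (prob, next_state, reward) under the policy's action.
P = {
    0: [(1.0, 1, -1.0)],
    1: [(1.0, 2, -1.0)],
    2: [(1.0, 3, -1.0)],
}

def policy_evaluation(P, n_states=4, gamma=1.0, theta=1e-8):
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s, transitions in P.items():
            # Bellman expectation backup for state s
            v_new = sum(p * (r + gamma * V[s2]) for p, s2, r in transitions)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:  # sweep changed no value by more than theta
            return V

V = policy_evaluation(P)
print(V)  # → [-3. -2. -1.  0.]
```

Each state's value is minus its distance to the terminal state, which is what the per-step reward of -1 predicts.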
Hands-on practice
- ▸ Implement value iteration for GridWorld
Expected output
A tabular solver with policy/value visualization.
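A sketch of what the hands-on task could look like, under assumed conventions (4x4 grid, terminal corners, reward -1 per step, bumping a wall keeps you in place); the layout and rewards are assumptions, not a prescribed spec:

```python
import numpy as np

N = 4
TERMINALS = {(0, 0), (3, 3)}           # assumed terminal corner states
ACTIONS = [(-1, 0), (1, 0), (0, 1), (0, -1)]  # up, down, right, left

def step(s, a):
    r, c = s
    nr, nc = r + a[0], c + a[1]
    if not (0 <= nr < N and 0 <= nc < N):  # wall bump: stay in place
        nr, nc = r, c
    return (nr, nc)

def value_iteration(gamma=1.0, theta=1e-8):
    V = np.zeros((N, N))
    while True:
        delta = 0.0
        for r in range(N):
            for c in range(N):
                if (r, c) in TERMINALS:
                    continue
                # Bellman optimality backup: max over actions
                best = max(-1.0 + gamma * V[step((r, c), a)] for a in ACTIONS)
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best
        if delta < theta:
            return V

V = value_iteration()
print(V.astype(int))

# Simple policy visualization: greedy arrows over the converged values.
ARROWS = {(-1, 0): "↑", (1, 0): "↓", (0, 1): "→", (0, -1): "←"}

def greedy_policy(V, gamma=1.0):
    grid = [["·" for _ in range(N)] for _ in range(N)]
    for r in range(N):
        for c in range(N):
            if (r, c) in TERMINALS:
                continue
            best = max(ACTIONS, key=lambda a: -1.0 + gamma * V[step((r, c), a)])
            grid[r][c] = ARROWS[best]
    return "\n".join(" ".join(row) for row in grid)

print(greedy_policy(V))
```

The converged values equal minus the Manhattan distance to the nearest terminal corner, and the arrow grid shows the greedy policy flowing toward those corners.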
Study checklist
- ✅ Run policy evaluation
- ✅ Compare policy iteration and value iteration
- ✅ Visualize value improvement
Common mistakes
- ⚠️ Updating policy before evaluation converges
- ⚠️ Treating DP as irrelevant because it does not scale: DP backups are the template that sample-based methods such as temporal-difference learning approximate
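The first mistake above is easy to make, so here is a sketch of the correct loop structure on a tiny invented MDP (states, actions, and rewards are illustrative): the inner evaluation loop runs to convergence before any improvement step.

```python
import numpy as np

# Hypothetical deterministic MDP: T[s][a] = (next_state, reward); state 2 terminal.
# From state 0, "risky" ends immediately for -3; "safe" detours via state 1 for -2 total.
T = {
    0: {"risky": (2, -3.0), "safe": (1, -1.0)},
    1: {"go": (2, -1.0)},
}

def policy_iteration(T, n_states=3, gamma=1.0, theta=1e-8):
    policy = {s: next(iter(acts)) for s, acts in T.items()}  # arbitrary start
    V = np.zeros(n_states)
    while True:
        # 1. Policy evaluation: iterate until V stops changing,
        #    NOT just one sweep -- this is the common mistake.
        while True:
            delta = 0.0
            for s, a in policy.items():
                s2, r = T[s][a]
                v_new = r + gamma * V[s2]
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break
        # 2. Policy improvement: act greedily w.r.t. the converged V.
        stable = True
        for s, acts in T.items():
            best = max(acts, key=lambda a: acts[a][1] + gamma * V[acts[a][0]])
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V

policy, V = policy_iteration(T)
print(policy, V)  # the "safe" detour (-2) beats "risky" (-3)
```

Truncating the inner loop to a fixed small number of sweeps still converges in many problems (that variant is generalized policy iteration), but the greedy step is then made against stale values, so it should be a deliberate choice rather than an accident.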
Module rhythm
- 1. Read the summary and why-it-matters section first.
- 2. Work through concepts before rushing into practice.
- 3. Use the checklist to verify real understanding, not just completion.
How to use this page well
Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.