Module 2: Reinforcement Learning

Dynamic Programming

Use exact tabular methods to build value-function intuition.

Why this module matters

Dynamic programming makes value updates tangible before function approximation complicates everything.

Prerequisites

  • Markov decision processes (MDPs)

Learning objectives

  • Run policy evaluation
  • Compare policy iteration and value iteration
  • Visualize value improvement

Core concepts

Policy evaluation
Policy improvement
Optimality
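The three concepts above chain together into full policy iteration: evaluate a policy until its value estimate converges, improve the policy greedily, and stop when the greedy policy is stable (optimality). A minimal sketch on a hypothetical 4-state chain MDP; the states, rewards, discount, and helper names are illustrative, not from the course:

```python
import numpy as np

# Hypothetical chain MDP: states 0..3, state 3 terminal, actions 0 = left /
# 1 = right, reward -1 per move, discount 0.9 (assumptions for this sketch).
N_STATES, GAMMA, THETA = 4, 0.9, 1e-8

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    if s == N_STATES - 1:                        # absorbing terminal state
        return s, 0.0
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, -1.0

def evaluate(policy):
    """Policy evaluation: sweep until the largest update falls below THETA."""
    V = np.zeros(N_STATES)
    while True:
        delta = 0.0
        for s in range(N_STATES):
            s2, r = step(s, policy[s])
            v_new = r + GAMMA * V[s2]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < THETA:
            return V

def improve(V):
    """Policy improvement: act greedily with respect to V."""
    return [max((0, 1), key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
            for s in range(N_STATES)]

policy = [0] * N_STATES                          # start with "always left"
while True:
    V = evaluate(policy)
    new_policy = improve(V)
    if new_policy == policy:                     # greedy policy stable: optimal
        break
    policy = new_policy

print(policy, V)   # the optimal policy moves right in every non-terminal state
```

Note the structure of the loop: improvement only runs after `evaluate` has converged, which is exactly the discipline the "common mistakes" section below warns about.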

Hands-on practice

  • Implement value iteration for GridWorld
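One possible shape for the exercise, assuming a 4x4 GridWorld with terminal corners, reward -1 per move, and no discounting (the layout and names here are an assumption, not the course's reference solution):

```python
import numpy as np

# Hypothetical 4x4 GridWorld: terminals at (0,0) and (3,3), reward -1 per
# move, undiscounted; moving into a wall leaves the agent in place.
SIZE, GAMMA, THETA = 4, 1.0, 1e-8
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right
ARROWS = "↑↓←→"
TERMINAL = {(0, 0), (SIZE - 1, SIZE - 1)}

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    if s in TERMINAL:
        return s, 0.0
    r, c = s[0] + a[0], s[1] + a[1]
    r, c = min(max(r, 0), SIZE - 1), min(max(c, 0), SIZE - 1)
    return (r, c), -1.0

# Value iteration: back up the best one-step return until values stop moving.
V = np.zeros((SIZE, SIZE))
while True:
    delta = 0.0
    for r in range(SIZE):
        for c in range(SIZE):
            if (r, c) in TERMINAL:
                continue
            best = max(rw + GAMMA * V[s2]
                       for s2, rw in (step((r, c), a) for a in ACTIONS))
            delta = max(delta, abs(best - V[r, c]))
            V[r, c] = best
    if delta < THETA:
        break

# Greedy policy extraction plus a simple text visualization of the policy.
for r in range(SIZE):
    row = ""
    for c in range(SIZE):
        if (r, c) in TERMINAL:
            row += " · "
        else:
            best_a = max(range(4),
                         key=lambda i: step((r, c), ACTIONS[i])[1]
                         + GAMMA * V[step((r, c), ACTIONS[i])[0]])
            row += f" {ARROWS[best_a]} "
    print(row)
print(V)   # each value is minus the step count to the nearest terminal
```

The arrow grid plus the value table is one lightweight way to produce the "policy/value visualization" the expected output asks for.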

Expected output

A tabular solver with policy/value visualization.

Study checklist

  • Run policy evaluation
  • Compare policy iteration and value iteration
  • Visualize value improvement

Common mistakes

  • ⚠️ Updating policy before evaluation converges
  • ⚠️ Treating DP as irrelevant because it does not scale
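The first mistake is avoided by an explicit convergence guard around the evaluation loop. A minimal sketch; the `sweep` argument and the toy update below are hypothetical stand-ins for a real evaluation pass:

```python
# `sweep` stands for any function that applies one evaluation pass in place
# and returns the largest absolute change it made (a hypothetical helper).
THETA = 1e-8

def evaluate_until_converged(V, sweep):
    while True:
        delta = sweep(V)
        if delta < THETA:       # evaluation has converged...
            return V            # ...only now is it safe to improve the policy

# Toy check: each sweep halves the distance of V[0] to its fixed point 1.0.
def halve_toward_one(V):
    old = V[0]
    V[0] = (V[0] + 1.0) / 2.0
    return abs(V[0] - old)

V = evaluate_until_converged([0.0], halve_toward_one)
print(V[0])   # within THETA of the fixed point 1.0
```

Truncating this loop on purpose is legitimate (that is essentially what value iteration does); the mistake is truncating it by accident while believing you ran full policy iteration.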

Module rhythm

  1. Read the summary and why-it-matters section first.
  2. Work through concepts before rushing into practice.
  3. Use the checklist to verify real understanding, not just completion.

How to continue

From here, shift from exact dynamic-programming solutions to learning from sampled returns.


How to use this page well

Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.