Module 2: Reinforcement Learning

Dynamic Programming

Use exact tabular methods to build value-function intuition.

Why this module matters

Dynamic programming makes value updates tangible before function approximation complicates everything.

Prerequisites

  • Markov decision processes (MDPs)

Learning objectives

  • Run policy evaluation
  • Compare policy iteration and value iteration
  • Visualize value improvement

Core concepts

Policy evaluation
Policy improvement
Optimality
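The three concepts above chain together into full policy iteration: evaluate a policy until its value estimate converges, improve the policy greedily, and stop when the greedy policy is stable (optimality). A minimal sketch on a hypothetical 4-state chain MDP; the states, rewards, discount, and helper names are illustrative, not from the course:

```python
import numpy as np

# Hypothetical chain MDP: states 0..3, state 3 terminal, actions 0 = left /
# 1 = right, reward -1 per move, discount 0.9 (assumptions for this sketch).
N_STATES, GAMMA, THETA = 4, 0.9, 1e-8

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    if s == N_STATES - 1:                        # absorbing terminal state
        return s, 0.0
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, -1.0

def evaluate(policy):
    """Policy evaluation: sweep until the largest update falls below THETA."""
    V = np.zeros(N_STATES)
    while True:
        delta = 0.0
        for s in range(N_STATES):
            s2, r = step(s, policy[s])
            v_new = r + GAMMA * V[s2]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < THETA:
            return V

def improve(V):
    """Policy improvement: act greedily with respect to V."""
    return [max((0, 1), key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
            for s in range(N_STATES)]

policy = [0] * N_STATES                          # start with "always left"
while True:
    V = evaluate(policy)
    new_policy = improve(V)
    if new_policy == policy:                     # greedy policy stable: optimal
        break
    policy = new_policy

print(policy, V)   # the optimal policy moves right in every non-terminal state
```

Note the structure of the loop: improvement only runs after `evaluate` has converged, which is exactly the discipline the "common mistakes" section below warns about.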

Hands-on practice

  • Implement value iteration for GridWorld
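One possible shape for the exercise, assuming a 4x4 GridWorld with terminal corners, reward -1 per move, and no discounting (the layout and names here are an assumption, not the course's reference solution):

```python
import numpy as np

# Hypothetical 4x4 GridWorld: terminals at (0,0) and (3,3), reward -1 per
# move, undiscounted; moving into a wall leaves the agent in place.
SIZE, GAMMA, THETA = 4, 1.0, 1e-8
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right
ARROWS = "↑↓←→"
TERMINAL = {(0, 0), (SIZE - 1, SIZE - 1)}

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    if s in TERMINAL:
        return s, 0.0
    r, c = s[0] + a[0], s[1] + a[1]
    r, c = min(max(r, 0), SIZE - 1), min(max(c, 0), SIZE - 1)
    return (r, c), -1.0

# Value iteration: back up the best one-step return until values stop moving.
V = np.zeros((SIZE, SIZE))
while True:
    delta = 0.0
    for r in range(SIZE):
        for c in range(SIZE):
            if (r, c) in TERMINAL:
                continue
            best = max(rw + GAMMA * V[s2]
                       for s2, rw in (step((r, c), a) for a in ACTIONS))
            delta = max(delta, abs(best - V[r, c]))
            V[r, c] = best
    if delta < THETA:
        break

# Greedy policy extraction plus a simple text visualization of the policy.
for r in range(SIZE):
    row = ""
    for c in range(SIZE):
        if (r, c) in TERMINAL:
            row += " · "
        else:
            best_a = max(range(4),
                         key=lambda i: step((r, c), ACTIONS[i])[1]
                         + GAMMA * V[step((r, c), ACTIONS[i])[0]])
            row += f" {ARROWS[best_a]} "
    print(row)
print(V)   # each value is minus the step count to the nearest terminal
```

The arrow grid plus the value table is one lightweight way to produce the "policy/value visualization" the expected output asks for.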

Expected output

A tabular solver with policy/value visualization.

Study checklist

  • Run policy evaluation
  • Compare policy iteration and value iteration
  • Visualize value improvement

Common mistakes

  • ⚠️ Updating policy before evaluation converges
  • ⚠️ Treating DP as irrelevant because it does not scale
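The first mistake is avoided by an explicit convergence guard around the evaluation loop. A minimal sketch; the `sweep` argument and the toy update below are hypothetical stand-ins for a real evaluation pass:

```python
# `sweep` stands for any function that applies one evaluation pass in place
# and returns the largest absolute change it made (a hypothetical helper).
THETA = 1e-8

def evaluate_until_converged(V, sweep):
    while True:
        delta = sweep(V)
        if delta < THETA:       # evaluation has converged...
            return V            # ...only now is it safe to improve the policy

# Toy check: each sweep halves the distance of V[0] to its fixed point 1.0.
def halve_toward_one(V):
    old = V[0]
    V[0] = (V[0] + 1.0) / 2.0
    return abs(V[0] - old)

V = evaluate_until_converged([0.0], halve_toward_one)
print(V[0])   # within THETA of the fixed point 1.0
```

Truncating this loop on purpose is legitimate (that is essentially what value iteration does); the mistake is truncating it by accident while believing you ran full policy iteration.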

Module rhythm

  1. Read the summary and why-it-matters section first.
  2. Work through concepts before rushing into practice.
  3. Use the checklist to verify real understanding, not just completion.

How to continue

From here, shift from exact dynamic-programming solutions to learning from sampled returns.


How to use this page well

Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.