Module 3 · Reinforcement Learning

Monte Carlo Methods

Estimate values from full trajectories without bootstrapping.

Why this module matters

Monte Carlo methods estimate values from complete sampled returns, so they show how much variance return-based learning carries before TD methods reduce it by bootstrapping.

Prerequisites

  • Returns and MDPs

Learning objectives

  • Compare first-visit and every-visit estimates
  • Understand variance across episodes
  • Connect sampling to unbiased estimation

Core concepts

  • Episode return
  • Unbiased estimation
  • Variance
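
The three concepts above can be tied together in a few lines of math. This is a sketch in a common textbook notation that this page does not itself define: the discount factor \gamma, terminal time T, policy \pi, and the estimator name \hat{V}_n are all assumptions made here for illustration.

```latex
% Episode return from time t (T is the final time step of the episode)
G_t = \sum_{k=0}^{T-t-1} \gamma^{k} R_{t+k+1}

% State value under policy \pi
v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]

% Monte Carlo estimate from n independent sampled returns G^{(1)}, \dots, G^{(n)} starting in s
\hat{V}_n(s) = \frac{1}{n} \sum_{i=1}^{n} G^{(i)}

% Unbiasedness and variance of the estimate
\mathbb{E}\left[ \hat{V}_n(s) \right] = v_\pi(s),
\qquad
\operatorname{Var}\left[ \hat{V}_n(s) \right] = \frac{\operatorname{Var}\left[ G_t \mid S_t = s \right]}{n}
```

The independence assumption holds for first-visit returns taken from separate episodes. Every-visit returns within one episode are correlated, so the variance formula applies to them only approximately.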

Hands-on practice

  • Estimate state values from sampled episodes
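
One way to approach this task is sketched below. The environment is an assumption chosen for illustration, not something this module prescribes: a five-state random walk that starts in the middle, moves left or right with equal probability, and pays a reward of 1 only when it exits on the right. The names `sample_episode` and `mc_prediction` are hypothetical helpers. With no discounting, the true value of state `s` in this walk is `(s + 1) / 6`, which gives a reference to compare against.

```python
import random
from collections import defaultdict

N_STATES = 5  # assumed toy environment: non-terminal states 0..4


def sample_episode(rng, start=2):
    """Sample one episode as a list of (state, reward) pairs.

    The reward stored with a state is the one received on leaving it.
    """
    episode = []
    state = start
    while 0 <= state < N_STATES:
        next_state = state + rng.choice((-1, 1))
        reward = 1.0 if next_state == N_STATES else 0.0
        episode.append((state, reward))
        state = next_state
    return episode


def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Average sampled returns per state (first-visit or every-visit)."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Sweep backwards so each return G_t is built from the full trajectory.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            g = episode[t][1] + gamma * g
            returns[t] = g
        seen = set()
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                continue  # first-visit: use only the earliest occurrence
            seen.add(state)
            returns_sum[state] += returns[t]
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


rng = random.Random(0)
episodes = [sample_episode(rng) for _ in range(5000)]
v_first = mc_prediction(episodes, first_visit=True)
v_every = mc_prediction(episodes, first_visit=False)

print("state  first-visit  every-visit  true")
for s in range(N_STATES):
    print(f"{s:5d}  {v_first[s]:11.3f}  {v_every[s]:11.3f}  {(s + 1) / 6:.3f}")
```

Both estimators average complete returns and never bootstrap from other value estimates. They differ only in whether repeated visits to a state within one episode contribute additional samples.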

Expected output

A notebook comparing Monte Carlo estimates across seeds.
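
A minimal sketch of what such a comparison could look like follows. It assumes a toy five-state random walk (reward 1 on a right-side exit, start in the middle, true start-state value 0.5) and hypothetical helper names `sample_return` and `mc_estimate`; the seed count and episode budgets are arbitrary choices for illustration.

```python
import random
import statistics

N_STATES = 5  # assumed toy environment: non-terminal states 0..4


def sample_return(rng, start=2):
    """Undiscounted return of one episode: 1.0 on a right exit, else 0.0."""
    state = start
    while 0 <= state < N_STATES:
        state += rng.choice((-1, 1))
    return 1.0 if state == N_STATES else 0.0


def mc_estimate(seed, n_episodes, start=2):
    """Monte Carlo estimate of the start-state value for one seed."""
    rng = random.Random(seed)
    total = sum(sample_return(rng, start) for _ in range(n_episodes))
    return total / n_episodes


seeds = range(20)
mean_est = {}
spread = {}
for n in (10, 100, 1000):
    estimates = [mc_estimate(seed, n) for seed in seeds]
    mean_est[n] = statistics.mean(estimates)
    spread[n] = statistics.stdev(estimates)  # disagreement between seeds
    print(f"episodes={n:5d}  mean={mean_est[n]:.3f}  std across seeds={spread[n]:.3f}")
```

The mean across seeds should stay near the true value at every budget, while the spread across seeds should shrink as the number of episodes grows. That pattern is what an unbiased but high-variance estimator looks like in practice.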

Study checklist

  • Compare first-visit and every-visit estimates
  • Understand variance across episodes
  • Connect sampling to unbiased estimation

Common mistakes

  • ⚠️ Expecting low-variance learning from sparse episodic returns
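
A short worked example shows why this expectation fails. The setup is hypothetical: assume each episode returns 1 with probability p and 0 otherwise, and that the value is estimated as the mean of n independent episode returns.

```latex
% Return of a single episode: G ~ Bernoulli(p)
\mathbb{E}[G] = p, \qquad \operatorname{Var}[G] = p(1 - p)

% Mean of n independent returns
\operatorname{Var}\left[ \hat{V}_n \right] = \frac{p(1 - p)}{n}

% Standard error relative to the true value p
\frac{\sqrt{\operatorname{Var}[\hat{V}_n]}}{p} = \sqrt{\frac{1 - p}{p\,n}}

% Example: p = 0.01, n = 100
\sqrt{\frac{0.99}{0.01 \cdot 100}} = \sqrt{0.99} \approx 0.995

% Episodes needed for a 10% relative standard error at p = 0.01
n = \frac{1 - p}{p \cdot 0.1^{2}} = \frac{0.99}{0.0001} = 9900
```

Under these assumptions, 100 episodes leave a standard error about as large as the value being estimated, and roughly 9,900 episodes are needed to bring it down to 10%.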

Module rhythm

  1. Read the summary and why-it-matters section first.
  2. Work through concepts before rushing into practice.
  3. Use the checklist to verify real understanding, not just completion.

How to continue

Next, add bootstrapping and move on to the temporal-difference (TD) family.


How to use this page well

Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.