Module 3: Reinforcement Learning
Monte Carlo Methods
Estimate values from full trajectories without bootstrapping.
Why this module matters
Monte Carlo methods estimate values by averaging complete episode returns, which makes them unbiased but often high-variance. Experiencing that variance firsthand makes it clear what temporal-difference (TD) methods are designed to reduce.
Prerequisites
- ▸ Returns and MDPs
Learning objectives
- ▸ Compare first-visit and every-visit estimates
- ▸ Understand variance across episodes
- ▸ Connect sampling to unbiased estimation
Core concepts
Episode return: the (discounted) sum of rewards collected from a time step until the episode ends.
Unbiased estimation: each complete-episode return is an unbiased sample of the true state value, so their average converges to it.
Variance: individual returns can differ widely across episodes, so many samples are needed before the average settles.
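As a formal anchor for these three concepts (standard notation, assumed here rather than quoted from this page): the return is the discounted reward sum, and the Monte Carlo value estimate is the mean of observed returns.

```latex
% Return from time step t, with discount factor \gamma \in [0, 1]
% and terminal time T:
G_t = R_{t+1} + \gamma R_{t+2} + \cdots = \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1}

% Monte Carlo estimate of v_\pi(s): the mean of the N(s) returns observed
% after visits to s. Each return is an unbiased sample of v_\pi(s), and for
% independent samples (as in first-visit MC) the variance falls like 1/N(s).
\hat{v}(s) = \frac{1}{N(s)} \sum_{i=1}^{N(s)} G_i
```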
Hands-on practice
- ▸ Estimate state values from sampled episodes
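To make the task concrete, here is a minimal sketch, not course-provided code: the random-walk environment and the helper names `sample_episode` and `mc_estimate` are invented for illustration. Episodes from a small random walk feed both a first-visit and an every-visit estimator, so the two can be compared side by side.

```python
import random
from collections import defaultdict

def sample_episode(n_states=5, start=2):
    """One random-walk episode: states 0..n_states-1, random +/-1 steps.
    Stepping off either edge ends the episode; only the right edge pays +1.
    Returns a list of (state, reward) pairs in time order."""
    episode, s = [], start
    while True:
        nxt = s + random.choice([-1, +1])
        reward = 1.0 if nxt == n_states else 0.0
        episode.append((s, reward))
        if nxt < 0 or nxt == n_states:
            return episode
        s = nxt

def mc_estimate(episodes, gamma=1.0, first_visit=True):
    """Average sampled returns per state. With first_visit=True, only the
    first occurrence of a state in an episode contributes a return;
    otherwise every occurrence does (every-visit MC)."""
    returns = defaultdict(list)
    for episode in episodes:
        states = [s for s, _ in episode]
        # Accumulate discounted returns in one backward sweep of the episode.
        G, step_returns = 0.0, [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            G = episode[t][1] + gamma * G
            step_returns[t] = G
        for t, s in enumerate(states):
            if first_visit and states.index(s) != t:
                continue  # skip repeat visits in first-visit mode
            returns[s].append(step_returns[t])
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

random.seed(0)
episodes = [sample_episode() for _ in range(2000)]
# For this undiscounted walk the true values are (i + 1) / 6 for states 0..4.
print("first-visit:", mc_estimate(episodes, first_visit=True))
print("every-visit:", mc_estimate(episodes, first_visit=False))
```

Sweeping each episode backwards lets the return be accumulated in a single pass instead of re-summing rewards at every time step.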
Expected output
A notebook comparing Monte Carlo value estimates across random seeds.
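A sketch of that seed comparison, assuming the hypothetical `sample_episode` and `mc_estimate` helpers from the block above are already defined:

```python
import random
import statistics

# Repeat the whole estimation under different seeds; the spread across seeds
# is the variance the checklist asks about, and it shrinks with more episodes.
for n_episodes in (50, 2000):
    start_state_values = []
    for seed in range(10):
        random.seed(seed)
        eps = [sample_episode() for _ in range(n_episodes)]
        start_state_values.append(mc_estimate(eps)[2])  # state 2 = start state
    print(f"{n_episodes:>4} episodes:",
          f"mean={statistics.mean(start_state_values):.3f}",
          f"stdev={statistics.stdev(start_state_values):.3f}")
```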
Study checklist
- ✅ Compare first-visit and every-visit estimates
- ✅ Understand variance across episodes
- ✅ Connect sampling to unbiased estimation
Common mistakes
- ⚠️ Expecting low-variance learning from sparse episodic returns
Module rhythm
- 1. Read the summary and why-it-matters section first.
- 2. Work through concepts before rushing into practice.
- 3. Use the checklist to verify real understanding, not just completion.
How to use this page well
Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.