Module 11: Reinforcement Learning
Capstone: PPO on MuJoCo
Apply everything you have learned to a harder continuous-control benchmark.
Why this module matters
This is where RL theory, training discipline, and engineering judgment all collide.
Prerequisites
- ▸ All previous RL modules
Learning objectives
- ▸ Train a PPO agent on MuJoCo
- ▸ Analyze reward curves and instability
- ▸ Compare scratch and library implementations
Core concepts
Continuous control
Reward scaling
Training instability analysis
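At the heart of PPO is its clipped surrogate objective, which limits how far a policy update can move the new policy from the one that collected the data. A minimal NumPy sketch of that objective (function name and example values are illustrative, not from a specific library):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO (value to be maximized).

    ratio     = pi_new(a|s) / pi_old(a|s), one entry per sample
    advantage = advantage estimate A(s, a), one entry per sample
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum removes the incentive to push the
    # ratio outside [1 - eps, 1 + eps].
    return np.minimum(unclipped, clipped).mean()

# A ratio well above 1 + eps earns no extra credit for a positive advantage:
print(ppo_clip_loss(np.array([1.5]), np.array([2.0])))  # → 2.4 (clipped at 1.2 * 2.0)
```

Note the asymmetry: for negative advantages the minimum keeps the *more* pessimistic term, so the objective still penalizes large ratio moves in either direction.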
Hands-on practice
- ▸ Train a PPO agent on HalfCheetah-v4 or a similar environment and write a capstone report
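For the capstone report, aggregate episode returns across several seeds rather than plotting a single run. One way to do this is a moving-average smooth per seed, then a mean and std across seeds (the function name and the synthetic data below are illustrative):

```python
import numpy as np

def summarize_runs(curves, window=10):
    """Aggregate episode-return curves from several seeds.

    curves: array-like of shape (n_seeds, n_episodes)
    Returns the smoothed per-episode mean and std across seeds.
    """
    curves = np.asarray(curves, dtype=float)
    kernel = np.ones(window) / window
    # Moving-average smoothing per seed before aggregating
    smoothed = np.array([np.convolve(c, kernel, mode="valid") for c in curves])
    return smoothed.mean(axis=0), smoothed.std(axis=0)

# Synthetic stand-in for 5 seeds x 200 episodes of noisy, improving returns:
rng = np.random.default_rng(0)
fake = rng.normal(loc=np.linspace(0, 100, 200), scale=10, size=(5, 200))
mean, std = summarize_runs(fake)
print(mean.shape)  # → (191,)
```

Plotting `mean` with a `mean ± std` band makes seed-to-seed instability visible, which is exactly what the analysis section of the report should discuss.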
Expected output
A full PPO continuous-control project and analysis report.
Study checklist
- ✅ Train a PPO agent on MuJoCo
- ✅ Analyze reward curves and instability
- ✅ Compare scratch and library implementations
Common mistakes
- ⚠️ Jumping to MuJoCo before mastering small environments
- ⚠️ Trusting a single seed's reward curve
- ⚠️ Ignoring normalization and reward scale
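The normalization mistake above is easy to demonstrate: MuJoCo rewards can be large and unbounded, so a common remedy is to divide rewards by a running estimate of the std of the discounted return. Below is a minimal, stdlib-only sketch of that idea (class name is hypothetical; libraries such as Gymnasium ship a tuned version as a reward wrapper):

```python
import math

class RunningRewardScaler:
    """Scale rewards by a running std estimate of the discounted return.

    A minimal Welford-style sketch, not a production implementation.
    """

    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma, self.eps = gamma, eps
        self.ret = 0.0    # running discounted return
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0     # running sum of squared deviations (Welford)

    def scale(self, reward):
        self.ret = self.gamma * self.ret + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        var = self.m2 / self.count if self.count > 1 else 1.0
        return reward / (math.sqrt(var) + self.eps)

# Large constant rewards quickly shrink toward O(1) magnitude:
scaler = RunningRewardScaler()
scaled = [scaler.scale(r) for r in [100.0, 100.0, 100.0]]
```

Scaling by the return's std (rather than subtracting a mean) keeps the sign of the reward intact while taming its magnitude, which tends to stabilize value-function targets.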
Module rhythm
- 1. Read the summary and why-it-matters section first.
- 2. Work through concepts before rushing into practice.
- 3. Use the checklist to verify real understanding, not just completion.
How to continue
From here, you can move into advanced RL research, alignment, or agentic decision systems.
How to use this page well
Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.