Module 5 · Reinforcement Learning
Deep Q-Networks
Move from tabular value learning to neural function approximation.
Why this module matters
DQN is the point where tabular RL becomes modern deep RL engineering: replacing the Q-table with a neural network introduces instability, and the fixes — experience replay and target networks — are the same stabilization patterns you will reuse in later algorithms.
Prerequisites
- ▸ TD learning
- ▸ PyTorch basics
Learning objectives
- ▸ Understand replay buffers and target networks
- ▸ Train a neural Q-function
- ▸ Diagnose unstable value estimates
Core concepts
Experience replay
Target network
Q-value estimation
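Experience replay stores transitions and samples them uniformly at random, which breaks the temporal correlation of consecutive environment steps. A minimal sketch of such a buffer (the class name and tuple layout here are illustrative, not a fixed API):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO buffer; uniform sampling decorrelates training batches."""

    def __init__(self, capacity):
        # deque with maxlen silently evicts the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling without replacement from the stored transitions
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

In practice you wait until the buffer holds at least one batch (often many thousands of transitions) before the first gradient step.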
Hands-on practice
- ▸ Train DQN on CartPole or Pong-lite
Expected output
A scratch DQN baseline with reward curves.
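One training step of a scratch DQN can be sketched as below, assuming CartPole-like dimensions (4-dim observations, 2 actions); the network sizes, learning rate, and function names are illustrative choices, not a reference implementation. The key points are that the bootstrapped target comes from a frozen target network and that gradients flow only through the online Q-network:

```python
import torch
import torch.nn as nn

# Assumed CartPole-like dimensions; adjust for your environment.
obs_dim, n_actions, gamma = 4, 2, 0.99

q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # start the two networks in sync
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones):
    # Q(s, a) for the actions actually taken in the batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped TD target uses the frozen target network;
        # (1 - done) zeroes the bootstrap at terminal states.
        max_next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * max_next_q
    loss = nn.functional.smooth_l1_loss(q_sa, targets)  # Huber loss, as in the DQN paper
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Every N steps (a tunable interval) you copy the online weights into the target network with `target_net.load_state_dict(q_net.state_dict())`; without that periodic sync the target chases its own updates.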
Study checklist
- ✅ Understand replay buffers and target networks
- ✅ Train a neural Q-function
- ✅ Diagnose unstable value estimates
Common mistakes
- ⚠️ Using correlated online samples only
- ⚠️ No target network
- ⚠️ Over-interpreting one lucky run
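A cheap way to catch the instability these mistakes cause is to log the batch-mean max Q-value during training and alarm when it blows up relative to early training. The function below is an illustrative heuristic (the window size and ratio threshold are arbitrary assumptions, not standard values):

```python
def q_divergence_alarm(max_q_history, window=100, ratio=5.0):
    """Flag runs whose recent max-Q estimates explode relative to early ones.

    max_q_history: per-step batch-mean max-Q values logged during training.
    A large recent/early ratio is a common symptom of value overestimation.
    """
    if len(max_q_history) < 2 * window:
        return False  # not enough history to compare
    early = sum(max_q_history[:window]) / window
    recent = sum(max_q_history[-window:]) / window
    return early > 0 and recent / early > ratio
```

This also guards against the "one lucky run" trap: compare the logged curves across several seeds rather than trusting a single reward plot.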
Module rhythm
- 1. Read the summary and why-it-matters section first.
- 2. Work through concepts before rushing into practice.
- 3. Use the checklist to verify real understanding, not just completion.
How to continue
After value-based deep RL, switch to direct policy optimization.
How to use this page well
Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.