Module 5: Reinforcement Learning

Deep Q-Networks

Move from tabular value learning to neural function approximation.

Why this module matters

DQN is the point where tabular RL becomes modern deep RL engineering: the same TD targets, now computed with a neural network, plus the machinery (replay buffer, target network) needed to keep training stable.

Prerequisites

  • TD learning
  • PyTorch basics

Learning objectives

  • Understand replay buffers and target networks
  • Train a neural Q-function
  • Diagnose unstable value estimates

Core concepts

  • Experience replay: store transitions and sample random minibatches to break temporal correlation
  • Target network: a periodically synced copy of the Q-network that stabilizes bootstrap targets
  • Q-value estimation: a neural network approximates Q(s, a) from raw observations
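
The replay idea above can be sketched as a small buffer class. This is a minimal illustration, not a prescribed interface; the class and method names are assumptions:

```python
# Minimal replay buffer sketch (illustrative API, not from this module).
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions and samples decorrelated minibatches."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation of online data
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

A fixed-capacity deque keeps memory bounded; uniform sampling is the vanilla DQN choice (prioritized variants exist but are out of scope here).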

Hands-on practice

  • Train DQN on CartPole or Pong-lite
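
The core of that practice task is the TD update against a frozen target network. A hedged PyTorch sketch of a single update step (layer sizes and hyperparameters are illustrative, chosen for a CartPole-like problem):

```python
# One DQN update step with a frozen target network (PyTorch sketch;
# obs_dim, n_actions, and network sizes are assumptions, not fixed by the module).
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99  # CartPole-like dimensions

q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # start the copy in sync

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones):
    # Q(s, a) for the actions actually taken in the batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the frozen target network; zero out terminal states
        max_next = target_net(next_states).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * max_next
    loss = nn.functional.mse_loss(q_sa, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full training loop you would call this on minibatches drawn from a replay buffer and copy q_net's weights into target_net every few hundred steps; the environment interaction is omitted here.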

Expected output

A from-scratch DQN baseline with reward curves logged over training.

Study checklist

  • I can explain why replay buffers and target networks are needed
  • I have trained a neural Q-function end to end
  • I can diagnose unstable or diverging value estimates

Common mistakes

  • ⚠️ Using correlated online samples only
  • ⚠️ No target network
  • ⚠️ Over-interpreting one lucky run
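
The last mistake has a simple remedy: run several seeds and report the mean curve with its spread. A small NumPy sketch with synthetic data standing in for real per-seed reward curves:

```python
# Averaging reward curves over seeds (synthetic data for illustration;
# in practice each row is one training run's episode returns).
import numpy as np

rng = np.random.default_rng(0)
curves = rng.normal(loc=100.0, scale=15.0, size=(5, 200))  # 5 seeds x 200 episodes

mean_curve = curves.mean(axis=0)  # report this curve...
std_curve = curves.std(axis=0)    # ...with this as a shaded band around it
```

A single run that clears the environment's reward threshold can easily be seed luck; the mean-and-band view makes that visible.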

Module rhythm

  • 1. Read the summary and why-it-matters section first.
  • 2. Work through concepts before rushing into practice.
  • 3. Use the checklist to verify real understanding, not just completion.

How to continue

After value-based deep RL, switch to direct policy optimization.


How to use this page well

Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.