Module 5: Reinforcement Learning

Deep Q-Networks

Move from tabular value learning to neural function approximation.

Why this module matters

DQN is the point where tabular RL becomes modern deep RL engineering: the same TD targets, now computed with a neural network, plus the machinery (replay buffer, target network) needed to keep training stable.

Prerequisites

  • TD learning
  • PyTorch basics

Learning objectives

  • Understand replay buffers and target networks
  • Train a neural Q-function
  • Diagnose unstable value estimates

Core concepts

  • Experience replay: store transitions and sample random minibatches to break temporal correlation
  • Target network: a periodically synced copy of the Q-network that stabilizes bootstrap targets
  • Q-value estimation: a neural network approximates Q(s, a) from raw observations
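
The replay idea above can be sketched as a small buffer class. This is a minimal illustration, not a prescribed interface; the class and method names are assumptions:

```python
# Minimal replay buffer sketch (illustrative API, not from this module).
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions and samples decorrelated minibatches."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation of online data
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

A fixed-capacity deque keeps memory bounded; uniform sampling is the vanilla DQN choice (prioritized variants exist but are out of scope here).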

Hands-on practice

  • Train DQN on CartPole or Pong-lite
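
The core of that practice task is the TD update against a frozen target network. A hedged PyTorch sketch of a single update step (layer sizes and hyperparameters are illustrative, chosen for a CartPole-like problem):

```python
# One DQN update step with a frozen target network (PyTorch sketch;
# obs_dim, n_actions, and network sizes are assumptions, not fixed by the module).
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99  # CartPole-like dimensions

q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # start the copy in sync

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones):
    # Q(s, a) for the actions actually taken in the batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the frozen target network; zero out terminal states
        max_next = target_net(next_states).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * max_next
    loss = nn.functional.mse_loss(q_sa, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full training loop you would call this on minibatches drawn from a replay buffer and copy q_net's weights into target_net every few hundred steps; the environment interaction is omitted here.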

Expected output

A from-scratch DQN baseline with reward curves logged over training.

Study checklist

  • I can explain why replay buffers and target networks are needed
  • I have trained a neural Q-function end to end
  • I can diagnose unstable or diverging value estimates

Common mistakes

  • ⚠️ Using correlated online samples only
  • ⚠️ No target network
  • ⚠️ Over-interpreting one lucky run
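
The last mistake has a simple remedy: run several seeds and report the mean curve with its spread. A small NumPy sketch with synthetic data standing in for real per-seed reward curves:

```python
# Averaging reward curves over seeds (synthetic data for illustration;
# in practice each row is one training run's episode returns).
import numpy as np

rng = np.random.default_rng(0)
curves = rng.normal(loc=100.0, scale=15.0, size=(5, 200))  # 5 seeds x 200 episodes

mean_curve = curves.mean(axis=0)  # report this curve...
std_curve = curves.std(axis=0)    # ...with this as a shaded band around it
```

A single run that clears the environment's reward threshold can easily be seed luck; the mean-and-band view makes that visible.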

Module rhythm

  • 1. Read the summary and why-it-matters section first.
  • 2. Work through concepts before rushing into practice.
  • 3. Use the checklist to verify real understanding, not just completion.

How to continue

After value-based deep RL, switch to direct policy optimization.


How to use this page well

Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.