Module 7: Reinforcement Learning

Actor-Critic Methods

Combine policy learning with value estimation for lower-variance updates.

Why this module matters

Most practical modern RL algorithms rely on an actor-critic structure in one form or another.

Prerequisites

  • Policy gradients
  • Value functions

Learning objectives

  • Explain the actor/critic split
  • Use advantage estimates
  • Implement a simple A2C-style loop

Core concepts

  • Critic targets: the regression targets the critic fits, e.g. the one-step TD target r + γV(s')
  • Advantages: A(s,a) = Q(s,a) - V(s), which centers policy-gradient updates on the state value
  • Bias-variance balance: bootstrapped targets lower variance but introduce bias; n-step returns trade between the two
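
For concreteness, here is a minimal sketch of how these quantities are typically computed; PyTorch and the tensor names (rewards, values, next_values, dones) are assumptions for illustration, not this module's reference code.

```python
import torch

def one_step_targets(rewards, values, next_values, dones, gamma=0.99):
    """One-step critic targets and advantage estimates.

    All arguments are 1-D float tensors of equal length; dones marks
    terminal transitions with 1.0.
    """
    # Critic target: r + gamma * V(s'), cut off at episode boundaries.
    targets = rewards + gamma * next_values * (1.0 - dones)
    # Advantage estimate: the TD error r + gamma * V(s') - V(s).
    advantages = targets - values
    return targets, advantages
```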

Hands-on practice

  • Train a small actor-critic baseline
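
As a starting point for this exercise, here is a minimal sketch of an online, one-step actor-critic loop. It assumes gymnasium's CartPole-v1 and PyTorch; the network size, learning rate, loss weighting, and episode count are illustrative choices, not a reference solution.

```python
import gymnasium as gym
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared torso with separate policy (actor) and value (critic) heads."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, 128), nn.Tanh())
        self.policy = nn.Linear(128, n_actions)  # action logits
        self.value = nn.Linear(128, 1)           # state-value estimate

    def forward(self, obs):
        h = self.torso(obs)
        return self.policy(h), self.value(h).squeeze(-1)

env = gym.make("CartPole-v1")
net = ActorCritic(env.observation_space.shape[0], env.action_space.n)
opt = torch.optim.Adam(net.parameters(), lr=3e-4)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    done = False
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        logits, value = net(obs_t)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()

        next_obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated

        # One-step critic target; bootstrap unless the episode terminated.
        with torch.no_grad():
            _, next_value = net(torch.as_tensor(next_obs, dtype=torch.float32))
            target = reward + gamma * next_value * (1.0 - float(terminated))

        advantage = target - value
        # Detach the advantage so critic noise stays out of the actor update.
        policy_loss = -dist.log_prob(action) * advantage.detach()
        value_loss = advantage.pow(2)
        loss = policy_loss + 0.5 * value_loss

        opt.zero_grad()
        loss.backward()
        opt.step()
        obs = next_obs

    if (episode + 1) % 50 == 0:
        # Last-step losses as a rough progress signal.
        print(f"ep {episode + 1}: policy_loss={policy_loss.item():.3f} "
              f"value_loss={value_loss.item():.3f}")
```

Batched A2C collects several steps from parallel environments before each update; this single-step online variant trades gradient quality for a loop whose structure is easy to follow.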

Expected output

An actor-critic implementation with tracked policy/value losses.

Study checklist

  • I can explain the actor/critic split
  • I can use advantage estimates in a policy-gradient update
  • I can implement a simple A2C-style loop

Common mistakes

  • ⚠️ Regressing the actor on noisy critic targets without detaching them, so value errors leak into the policy gradient
  • ⚠️ Confusing the state value V(s) with the advantage A(s,a) = Q(s,a) - V(s)
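
To make the first pitfall concrete, here is a sketch assuming PyTorch tensors named log_probs, targets, and values (illustrative names, not from this module's code):

```python
# Pitfall: the advantage still carries gradients from the value head,
# so critic noise flows straight into the actor update.
policy_loss_wrong = -(log_probs * (targets - values)).mean()

# Fix: treat the advantage as a constant in the actor loss.
advantages = (targets - values).detach()
policy_loss = -(log_probs * advantages).mean()
```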

Module rhythm

  1. Read the summary and the why-it-matters section first.
  2. Work through the concepts before rushing into practice.
  3. Use the checklist to verify real understanding, not just completion.

How to continue

Next, move on to PPO, the workhorse of practical policy optimization.


How to use this page well

Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.