Module 7: Reinforcement Learning

Actor-Critic Methods

Combine policy learning with value estimation for lower-variance updates.

Why this module matters

Most practical modern RL algorithms rely on an actor-critic structure in one form or another.

Prerequisites

  • Policy gradients
  • Value functions

Learning objectives

  • Explain the actor/critic split
  • Use advantage estimates
  • Implement a simple A2C-style loop

Core concepts

  • Critic targets: the regression targets the critic fits, e.g. the one-step TD target r + γV(s')
  • Advantages: A(s,a) = Q(s,a) - V(s), which centers policy-gradient updates on the state value
  • Bias-variance balance: bootstrapped targets lower variance but introduce bias; n-step returns trade between the two
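
For concreteness, here is a minimal sketch of how these quantities are typically computed; PyTorch and the tensor names (rewards, values, next_values, dones) are assumptions for illustration, not this module's reference code.

```python
import torch

def one_step_targets(rewards, values, next_values, dones, gamma=0.99):
    """One-step critic targets and advantage estimates.

    All arguments are 1-D float tensors of equal length; dones marks
    terminal transitions with 1.0.
    """
    # Critic target: r + gamma * V(s'), cut off at episode boundaries.
    targets = rewards + gamma * next_values * (1.0 - dones)
    # Advantage estimate: the TD error r + gamma * V(s') - V(s).
    advantages = targets - values
    return targets, advantages
```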

Hands-on practice

  • Train a small actor-critic baseline
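
As a starting point for this exercise, here is a minimal sketch of an online, one-step actor-critic loop. It assumes gymnasium's CartPole-v1 and PyTorch; the network size, learning rate, loss weighting, and episode count are illustrative choices, not a reference solution.

```python
import gymnasium as gym
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared torso with separate policy (actor) and value (critic) heads."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, 128), nn.Tanh())
        self.policy = nn.Linear(128, n_actions)  # action logits
        self.value = nn.Linear(128, 1)           # state-value estimate

    def forward(self, obs):
        h = self.torso(obs)
        return self.policy(h), self.value(h).squeeze(-1)

env = gym.make("CartPole-v1")
net = ActorCritic(env.observation_space.shape[0], env.action_space.n)
opt = torch.optim.Adam(net.parameters(), lr=3e-4)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    done = False
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        logits, value = net(obs_t)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()

        next_obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated

        # One-step critic target; bootstrap unless the episode terminated.
        with torch.no_grad():
            _, next_value = net(torch.as_tensor(next_obs, dtype=torch.float32))
            target = reward + gamma * next_value * (1.0 - float(terminated))

        advantage = target - value
        # Detach the advantage so critic noise stays out of the actor update.
        policy_loss = -dist.log_prob(action) * advantage.detach()
        value_loss = advantage.pow(2)
        loss = policy_loss + 0.5 * value_loss

        opt.zero_grad()
        loss.backward()
        opt.step()
        obs = next_obs

    if (episode + 1) % 50 == 0:
        # Last-step losses as a rough progress signal.
        print(f"ep {episode + 1}: policy_loss={policy_loss.item():.3f} "
              f"value_loss={value_loss.item():.3f}")
```

Batched A2C collects several steps from parallel environments before each update; this single-step online variant trades gradient quality for a loop whose structure is easy to follow.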

Expected output

An actor-critic implementation with tracked policy/value losses.

Study checklist

  • I can explain the actor/critic split
  • I can use advantage estimates in a policy-gradient update
  • I can implement a simple A2C-style loop

Common mistakes

  • ⚠️ Regressing the actor on noisy critic targets without detaching them, so value errors leak into the policy gradient
  • ⚠️ Confusing the state value V(s) with the advantage A(s,a) = Q(s,a) - V(s)
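
To make the first pitfall concrete, here is a sketch assuming PyTorch tensors named log_probs, targets, and values (illustrative names, not from this module's code):

```python
# Pitfall: the advantage still carries gradients from the value head,
# so critic noise flows straight into the actor update.
policy_loss_wrong = -(log_probs * (targets - values)).mean()

# Fix: treat the advantage as a constant in the actor loss.
advantages = (targets - values).detach()
policy_loss = -(log_probs * advantages).mean()
```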

Module rhythm

  1. Read the summary and the why-it-matters section first.
  2. Work through the concepts before rushing into practice.
  3. Use the checklist to verify real understanding, not just completion.

How to continue

Next, move on to PPO, the workhorse of practical policy optimization.


How to use this page well

Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.