Module 7: PyTorch Foundations

GPU Acceleration Basics

Write device-aware code and understand transfer costs and throughput.

Why this module matters

Hardware awareness is where toy notebooks stop and real ML engineering starts.

Prerequisites

  • Training loop patterns
  • Tensor basics

Learning objectives

  • Move models and tensors across devices safely
  • Understand MPS vs CUDA differences
  • Measure speedups and bottlenecks
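The first objective, moving models and tensors across devices safely, starts with picking the device once and using it everywhere. A minimal sketch (the helper name `pick_device` is our own, not a PyTorch API):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(8, 2).to(device)  # move parameters once, up front
x = torch.randn(4, 8, device=device)      # create data directly on the target device
y = model(x)
print(device, y.shape)
```

Creating tensors on the device (rather than creating on CPU and copying) avoids one host-device transfer per batch.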

Core concepts

  • Host-device transfer: copying tensors between CPU (host) memory and accelerator memory, often the hidden cost in a training loop
  • Pinned memory: page-locked host memory that allows faster, asynchronous copies to the GPU
  • Throughput vs latency: samples processed per second vs the time for a single step; optimizing one can hurt the other
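The pinned-memory idea can be sketched as follows; this only exercises the CUDA path when a GPU is present, since page-locking is a CUDA feature:

```python
import torch

batch = torch.randn(1024, 256)  # ordinary pageable host memory

if torch.cuda.is_available():
    pinned = batch.pin_memory()                     # page-locked host copy
    on_gpu = pinned.to("cuda", non_blocking=True)   # async copy can overlap with compute
    torch.cuda.synchronize()                        # wait before reading the result
    print(on_gpu.device)
else:
    print("CUDA not available; pinned-memory transfer skipped")
```

In real training you usually get this for free by passing `pin_memory=True` to `DataLoader` and `non_blocking=True` to `.to()`.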

Hands-on practice

  • Benchmark one step on CPU vs GPU
  • Profile dataloader transfer time
  • Test mixed precision support on available hardware
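For the first practice task, a benchmark sketch that times a matmul per device. GPU kernels launch asynchronously, so the timing only means something if you synchronize before reading the clock (the function name `time_matmul` and the sizes are our own choices):

```python
import time
import torch

def time_matmul(device: torch.device, n: int = 512, iters: int = 10) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    for _ in range(3):          # warm-up: exclude allocation and launch overhead
        a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()  # flush queued kernels before stopping the clock
    return (time.perf_counter() - start) / iters

cpu_t = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_t * 1e3:.2f} ms/step")
if torch.cuda.is_available():
    gpu_t = time_matmul(torch.device("cuda"))
    print(f"GPU: {gpu_t * 1e3:.2f} ms/step, speedup {cpu_t / gpu_t:.1f}x")
```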

Expected output

A device-agnostic training script with basic timing instrumentation.
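One possible shape for that script, reduced to a single instrumented training step; the model, sizes, and learning rate are placeholders:

```python
import time
import torch
from torch import nn

device = ("cuda" if torch.cuda.is_available()
          else "mps" if torch.backends.mps.is_available()
          else "cpu")

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(64, 16)
labels = torch.randint(0, 2, (64,))

start = time.perf_counter()
inputs, labels = inputs.to(device), labels.to(device)  # move inputs AND labels
loss = loss_fn(model(inputs), labels)
opt.zero_grad()
loss.backward()
opt.step()
if device == "cuda":
    torch.cuda.synchronize()  # make the measured time include queued GPU work
elapsed = time.perf_counter() - start
print(f"[{device}] step time: {elapsed * 1e3:.1f} ms, loss {loss.item():.3f}")
```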

Study checklist

  • I can move models and tensors across devices safely
  • I can explain the key differences between MPS and CUDA
  • I can measure speedups and identify bottlenecks

Common mistakes

  • ⚠️ Forgetting to move labels to device
  • ⚠️ Synchronizing too often during timing
  • ⚠️ Assuming MPS and CUDA behave identically
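The last pitfall shows up quickly with mixed precision, where backends diverge in `device_type` and supported dtypes. A small sketch, assuming CPU autocast with bfloat16 as the portable fallback:

```python
import torch

# autocast is one place backends differ: the device_type must match the
# hardware, and supported dtypes vary (CUDA favors float16/bfloat16, MPS
# and CPU have their own restrictions).
device_type = "cuda" if torch.cuda.is_available() else "cpu"
with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
    a = torch.randn(8, 8, device=device_type)
    b = torch.randn(8, 8, device=device_type)
    out = a @ b  # matmul runs in reduced precision inside the context
print(out.dtype)
```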

Module rhythm

  1. Read the summary and why-it-matters section first.
  2. Work through the concepts before rushing into practice.
  3. Use the checklist to verify real understanding, not just completion.

How to continue

Now apply all of this to convolutional models that actually solve vision tasks.


How to use this page well

Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.