Module 7: PyTorch Foundations

GPU Acceleration Basics

Write device-aware code and understand transfer costs and throughput.

Why this module matters

Hardware awareness is where toy notebooks stop and real ML engineering starts.

Prerequisites

  • Training loop patterns
  • Tensor basics

Learning objectives

  • Move models and tensors across devices safely
  • Understand MPS vs CUDA differences
  • Measure speedups and bottlenecks
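The first objective, moving models and tensors across devices safely, starts with picking the device once and using it everywhere. A minimal sketch (the helper name `pick_device` is our own, not a PyTorch API):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(8, 2).to(device)  # move parameters once, up front
x = torch.randn(4, 8, device=device)      # create data directly on the target device
y = model(x)
print(device, y.shape)
```

Creating tensors on the device (rather than creating on CPU and copying) avoids one host-device transfer per batch.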

Core concepts

  • Host-device transfer: copying tensors between CPU (host) memory and accelerator memory, often the hidden cost in a training loop
  • Pinned memory: page-locked host memory that allows faster, asynchronous copies to the GPU
  • Throughput vs latency: samples processed per second vs the time for a single step; optimizing one can hurt the other
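The pinned-memory idea can be sketched as follows; this only exercises the CUDA path when a GPU is present, since page-locking is a CUDA feature:

```python
import torch

batch = torch.randn(1024, 256)  # ordinary pageable host memory

if torch.cuda.is_available():
    pinned = batch.pin_memory()                     # page-locked host copy
    on_gpu = pinned.to("cuda", non_blocking=True)   # async copy can overlap with compute
    torch.cuda.synchronize()                        # wait before reading the result
    print(on_gpu.device)
else:
    print("CUDA not available; pinned-memory transfer skipped")
```

In real training you usually get this for free by passing `pin_memory=True` to `DataLoader` and `non_blocking=True` to `.to()`.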

Hands-on practice

  • Benchmark one step on CPU vs GPU
  • Profile dataloader transfer time
  • Test mixed precision support on available hardware
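For the first practice task, a benchmark sketch that times a matmul per device. GPU kernels launch asynchronously, so the timing only means something if you synchronize before reading the clock (the function name `time_matmul` and the sizes are our own choices):

```python
import time
import torch

def time_matmul(device: torch.device, n: int = 512, iters: int = 10) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    for _ in range(3):          # warm-up: exclude allocation and launch overhead
        a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()  # flush queued kernels before stopping the clock
    return (time.perf_counter() - start) / iters

cpu_t = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_t * 1e3:.2f} ms/step")
if torch.cuda.is_available():
    gpu_t = time_matmul(torch.device("cuda"))
    print(f"GPU: {gpu_t * 1e3:.2f} ms/step, speedup {cpu_t / gpu_t:.1f}x")
```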

Expected output

A device-agnostic training script with basic timing instrumentation.
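One possible shape for that script, reduced to a single instrumented training step; the model, sizes, and learning rate are placeholders:

```python
import time
import torch
from torch import nn

device = ("cuda" if torch.cuda.is_available()
          else "mps" if torch.backends.mps.is_available()
          else "cpu")

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(64, 16)
labels = torch.randint(0, 2, (64,))

start = time.perf_counter()
inputs, labels = inputs.to(device), labels.to(device)  # move inputs AND labels
loss = loss_fn(model(inputs), labels)
opt.zero_grad()
loss.backward()
opt.step()
if device == "cuda":
    torch.cuda.synchronize()  # make the measured time include queued GPU work
elapsed = time.perf_counter() - start
print(f"[{device}] step time: {elapsed * 1e3:.1f} ms, loss {loss.item():.3f}")
```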

Study checklist

  • I can move models and tensors across devices safely
  • I can explain the key differences between MPS and CUDA
  • I can measure speedups and identify bottlenecks

Common mistakes

  • ⚠️ Forgetting to move labels to device
  • ⚠️ Synchronizing too often during timing
  • ⚠️ Assuming MPS and CUDA behave identically
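The last pitfall shows up quickly with mixed precision, where backends diverge in `device_type` and supported dtypes. A small sketch, assuming CPU autocast with bfloat16 as the portable fallback:

```python
import torch

# autocast is one place backends differ: the device_type must match the
# hardware, and supported dtypes vary (CUDA favors float16/bfloat16, MPS
# and CPU have their own restrictions).
device_type = "cuda" if torch.cuda.is_available() else "cpu"
with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
    a = torch.randn(8, 8, device=device_type)
    b = torch.randn(8, 8, device=device_type)
    out = a @ b  # matmul runs in reduced precision inside the context
print(out.dtype)
```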

Module rhythm

  1. Read the summary and why-it-matters section first.
  2. Work through the concepts before rushing into practice.
  3. Use the checklist to verify real understanding, not just completion.

How to continue

Now apply all of this to convolutional models that actually solve vision tasks.


How to use this page well

Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.