Module 7: PyTorch Foundations
GPU Acceleration Basics
Write device-aware code and understand transfer costs and throughput.
Why this module matters
Hardware awareness is where toy notebooks stop and real ML engineering starts.
Prerequisites
- ▸ Training loop patterns
- ▸ Tensor basics
Learning objectives
- ▸ Move models and tensors across devices safely
- ▸ Understand MPS vs CUDA differences
- ▸ Measure speedups and bottlenecks
Core concepts
Host-device transfer
Pinned memory
Throughput vs latency
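The concepts above come together in device-agnostic setup code. Here is a minimal sketch (the helper name `pick_device` is ours, not a PyTorch API) that prefers CUDA, falls back to Apple's MPS backend, and otherwise stays on CPU, then creates the model and tensors directly on the chosen device to avoid extra host-device transfers:

```python
import torch

def pick_device() -> torch.device:
    # Prefer CUDA, then Apple's MPS backend, then CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(8, 2).to(device)  # move parameters once, up front
x = torch.randn(4, 8, device=device)      # allocate tensors on-device directly
y = model(x)                              # no hidden transfer: everything colocated
```

Creating tensors with `device=...` is preferable to creating them on CPU and calling `.to(device)` in a loop, since each transfer crosses the host-device boundary and pays latency even for small tensors.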
Hands-on practice
- ▸ Benchmark one step on CPU vs GPU
- ▸ Profile dataloader transfer time
- ▸ Test mixed precision support on available hardware
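The first practice task can be sketched as a timing helper. This is one reasonable way to do it, not the only one: the key detail is synchronizing before starting and before stopping the clock, because CUDA kernels launch asynchronously and `time.perf_counter()` would otherwise measure only launch overhead. (The function name `time_one_step` and its argument list are illustrative.)

```python
import time
import torch

def time_one_step(model, batch, target, device, loss_fn, opt):
    model.train()
    batch, target = batch.to(device), target.to(device)
    if device.type == "cuda":
        torch.cuda.synchronize()  # drain pending kernels before timing starts
    start = time.perf_counter()
    opt.zero_grad()
    loss = loss_fn(model(batch), target)
    loss.backward()
    opt.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for async GPU work to actually finish
    return time.perf_counter() - start
```

Run it once with `torch.device("cpu")` and once with your GPU device on the same batch; the ratio of the two timings is your speedup. Discard the first GPU call, which includes one-time kernel compilation and memory-allocator warmup.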
Expected output
A device-agnostic training script with basic timing instrumentation.
Study checklist
- ✅ Move models and tensors across devices safely
- ✅ Understand MPS vs CUDA differences
- ✅ Measure speedups and bottlenecks
Common mistakes
- ⚠️ Forgetting to move labels to device
- ⚠️ Synchronizing too often during timing
- ⚠️ Assuming MPS and CUDA behave identically
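The first mistake in the list above is worth seeing concretely. A sketch, using a tiny synthetic dataset as a stand-in for a real loader: `pin_memory` page-locks host buffers so host-to-CUDA copies can be faster and `non_blocking=True` can overlap them with compute, but both only matter for CUDA targets, so the example gates pinning on the device type rather than assuming MPS behaves the same way.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ds = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))

# pin_memory helps host->CUDA copies only; it is pointless for CPU or MPS.
loader = DataLoader(ds, batch_size=16, pin_memory=(device.type == "cuda"))

model = torch.nn.Linear(8, 2).to(device)
loss_fn = torch.nn.CrossEntropyLoss()

for xb, yb in loader:
    xb = xb.to(device, non_blocking=True)
    # Moving xb but forgetting yb is the classic mistake: on GPU, loss_fn
    # would then see tensors on two devices and raise a runtime error.
    yb = yb.to(device, non_blocking=True)
    loss = loss_fn(model(xb), yb)
```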
Module rhythm
- 1. Read the summary and why-it-matters section first.
- 2. Work through concepts before rushing into practice.
- 3. Use the checklist to verify real understanding, not just completion.
How to continue
Now apply all of this to convolutional models that actually solve vision tasks.
How to use this page well
Treat each module as a compact learning system: understand the intuition, verify the concepts, do one hands-on task, then use the checklist and mistakes section to pressure-test your understanding.