PyTorch Foundations
This course is designed for Python engineers who want to become dangerous with modern deep learning tooling, but without pretending that abstractions are enough. You start with tensors and end with a reproducible, GPU-aware training workflow that supports real projects.
How beginners should use this course
- ▸ Do the modules in order. They were designed to reduce cognitive load, not maximize novelty.
- ▸ Re-type the code, do not just read it. Muscle memory matters in PyTorch.
- ▸ Keep one running notebook of mistakes, fixes, and shape notes. That becomes your real learning asset.
- ▸ Finish the capstone before moving to Transformers. Otherwise the abstraction gap stays too high.
Mathematical Foundations
Tensors as N-dimensional arrays
A tensor generalizes scalars, vectors, and matrices into one consistent abstraction.
If you understand shape, rank, and broadcasting, most PyTorch code stops feeling magical.
Many deep learning bugs are just silent tensor-shape misunderstandings in disguise.
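For orientation, here is a minimal sketch of the shape reasoning this section is about; the tensor values and shapes are just placeholders.

```python
import torch

# A (3, 1) column broadcasts against a (1, 4) row to give a (3, 4) result.
col = torch.arange(3.0).reshape(3, 1)    # shape (3, 1)
row = torch.arange(4.0).reshape(1, 4)    # shape (1, 4)
grid = col + row                         # broadcasting -> shape (3, 4)
print(grid.shape)                        # torch.Size([3, 4])

# Rank-4 example: a per-channel bias added to a batch of feature maps.
x = torch.randn(8, 3, 32, 32)            # (batch, channels, H, W)
bias = torch.randn(3, 1, 1)              # broadcasts over batch, H, W
print((x + bias).shape)                  # torch.Size([8, 3, 32, 32])
```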
Backpropagation as chain rule
Autograd is just repeated chain rule application across a computational graph.
The model is not learning by magic. It is receiving gradient signals layer by layer.
When training breaks, you often need to reason about where the gradient is vanishing, exploding, or detached.
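A tiny sketch of the idea: compute a derivative by hand with the chain rule and check it against autograd. The function used here is only an example.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = (3 * x ** 2 + 1).sin()           # y = sin(3x^2 + 1)

y.backward()                         # autograd applies the chain rule for us

# Chain rule by hand: dy/dx = cos(3x^2 + 1) * 6x
manual = torch.cos(3 * x.detach() ** 2 + 1) * 6 * x.detach()
print(x.grad, manual)                # the two values should match
```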
Gradient descent and optimization
The optimizer moves parameters in the direction that reduces loss, but the step size and geometry matter.
This is why learning rate choice often matters more than architecture choice in early experiments.
Momentum and adaptive optimizers change how quickly and smoothly training converges.
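One gradient-descent step written out by hand makes the role of the step size concrete; the quadratic loss and learning rate below are placeholders.

```python
import torch

w = torch.tensor([2.0, -1.0], requires_grad=True)
target = torch.tensor([0.5, 0.5])
lr = 0.1                                   # the step size is a choice, not a detail

loss = ((w - target) ** 2).sum()           # toy quadratic loss
loss.backward()

with torch.no_grad():                      # update parameters outside the graph
    w -= lr * w.grad                       # w <- w - lr * dloss/dw
    w.grad.zero_()                         # clear gradients before the next step
print(w)                                   # parameters moved toward the target
```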
Detailed Modules
Tensor Fundamentals
Understand tensors as the core data structure behind every neural network operation.
You will learn
- ▸ Create tensors from Python lists, NumPy arrays, and random initializers
- ▸ Read shape, dtype, and device information without confusion
- ▸ Use indexing, slicing, reshape, permute, and broadcasting correctly
Hands-on practice
Write a small tensor workbook that reproduces matrix multiply, elementwise ops, and broadcasting edge cases by hand.
Expected output
A notebook that explains 10 core tensor operations with input/output shape annotations.
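A sketch of the shape-annotation style the workbook asks for; the shapes are illustrative, not a required layout.

```python
import torch

a = torch.randn(4, 3)                # (4, 3)
b = torch.randn(3, 5)                # (3, 5)
print((a @ b).shape)                 # matmul: (4, 3) @ (3, 5) -> (4, 5)

x = torch.randn(2, 3, 4)             # (2, 3, 4)
print(x.reshape(6, 4).shape)         # reshape keeps elements, changes layout: (6, 4)
print(x.permute(2, 0, 1).shape)      # permute reorders dims: (4, 2, 3)

u = torch.randn(5, 1)                # (5, 1)
v = torch.randn(1, 5)                # (1, 5)
print((u * v).shape)                 # elementwise with broadcasting: (5, 5)
```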
Autograd & Backpropagation
See how PyTorch builds computational graphs and computes gradients automatically.
You will learn
- ▸ Track which tensors require gradients and why
- ▸ Interpret .grad, .backward(), and detach() in practical training code
- ▸ Manually compute a simple derivative and compare it with autograd output
Hands-on practice
Build a one-layer linear regression example and print every gradient during training.
Expected output
A minimal script that demonstrates gradient accumulation, zeroing, and graph detachment.
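A compressed version of what that script covers; the layer size and data are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
x, y = torch.randn(16, 3), torch.randn(16, 1)
loss_fn = nn.MSELoss()

# Gradients accumulate across backward() calls until you zero them.
loss_fn(model(x), y).backward()
loss_fn(model(x), y).backward()
print(model.weight.grad)             # sum of the two backward passes

model.zero_grad()                    # reset before the next step

# detach() cuts a tensor out of the graph, so nothing flows back through it.
feats = model(x).detach()
print(feats.requires_grad)           # False
```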
Building Models with nn.Module
Learn the standard PyTorch abstraction for defining trainable models and reusable blocks.
You will learn
- ▸ Register layers and parameters correctly inside __init__
- ▸ Design clear forward() methods and inspect named parameters
- ▸ Compose small layers into larger reusable modules
Hands-on practice
Implement an MLP classifier twice: once in a flat class, once with reusable blocks.
Expected output
A clean nn.Module-based MLP with parameter counts and shape tracing.
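A compact sketch of the flat version of that MLP; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim=784, hidden=256, num_classes=10):
        super().__init__()
        # Layers assigned in __init__ are registered as submodules automatically.
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} trainable parameters")
print(model(torch.randn(32, 784)).shape)   # (32, 10)
```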
Losses, Optimizers, and Schedulers
Understand how objective choice and parameter updates affect learning dynamics.
You will learn
- ▸ Use CrossEntropyLoss, MSELoss, and HuberLoss in the right contexts
- ▸ Compare SGD, Adam, and AdamW on the same toy problem
- ▸ Apply step, cosine, and warmup schedules intentionally
Hands-on practice
Train the same model with three optimizers and plot the loss curves side by side.
Expected output
A short experiment report explaining optimizer and scheduler tradeoffs.
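One way the three-optimizer comparison might be wired up; the toy model, data, and hyperparameters are placeholders, not recommended settings.

```python
import torch
import torch.nn as nn

def make_optimizer(name, params):
    # Same toy problem, three different update rules.
    if name == "sgd":
        return torch.optim.SGD(params, lr=0.1, momentum=0.9)
    if name == "adam":
        return torch.optim.Adam(params, lr=1e-3)
    return torch.optim.AdamW(params, lr=1e-3, weight_decay=0.01)

x, y = torch.randn(256, 10), torch.randn(256, 1)
for name in ["sgd", "adam", "adamw"]:
    model = nn.Linear(10, 1)
    opt = make_optimizer(name, model.parameters())
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)
    for step in range(100):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
    print(name, loss.item())            # record these for the loss-curve plot
```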
Data Pipeline Design
Build robust Dataset and DataLoader pipelines that do not become training bottlenecks.
You will learn
- ▸ Write custom Dataset classes for structured local data
- ▸ Use transforms, collate_fn, pin_memory, and num_workers effectively
- ▸ Debug shape and label issues before they poison training
Hands-on practice
Create a CIFAR-style image loader plus a custom collate function for variable metadata.
Expected output
A data pipeline template that can be reused in later projects.
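A sketch of a custom Dataset plus collate function; the field names, tensor shapes, and fake metadata are invented for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ImageMetaDataset(Dataset):
    """Pairs a fixed-size image tensor with variable-length metadata."""
    def __init__(self, n=100):
        self.images = torch.randn(n, 3, 32, 32)
        self.labels = torch.randint(0, 10, (n,))
        self.tags = [[f"tag{i % 3}"] * (i % 4 + 1) for i in range(n)]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx], self.tags[idx]

def collate(batch):
    # Stack the tensors, keep the ragged metadata as a plain Python list.
    images, labels, tags = zip(*batch)
    return torch.stack(images), torch.stack(labels), list(tags)

if __name__ == "__main__":   # needed when num_workers > 0 on spawn-based platforms
    loader = DataLoader(ImageMetaDataset(), batch_size=8, shuffle=True,
                        num_workers=2, pin_memory=True, collate_fn=collate)
    images, labels, tags = next(iter(loader))
    print(images.shape, labels.shape, len(tags))   # (8, 3, 32, 32), (8,), 8
```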
Training Loop Patterns
Move beyond toy examples into a training loop you would actually keep in a real project.
You will learn
- ▸ Separate train, validation, and checkpoint logic cleanly
- ▸ Track reproducibility with seeds, config dictionaries, and metrics logging
- ▸ Know when to use early stopping and when not to trust it
Hands-on practice
Refactor a messy training script into train_one_epoch(), evaluate(), and save_checkpoint() functions.
Expected output
A production-style training loop skeleton with metric logging and checkpoint restore.
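A skeleton of the refactor this module asks for; the function names come from the module text, everything else is a placeholder.

```python
import torch

def train_one_epoch(model, loader, optimizer, loss_fn, device):
    model.train()
    total = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)

@torch.no_grad()
def evaluate(model, loader, loss_fn, device):
    model.eval()
    total = sum(loss_fn(model(x.to(device)), y.to(device)).item() for x, y in loader)
    return total / len(loader)

def save_checkpoint(model, optimizer, epoch, path):
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "epoch": epoch}, path)
```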
GPU Acceleration Basics
Learn how to write device-aware code that runs on Mac, CUDA, and larger accelerators.
You will learn
- ▸ Move tensors and models safely across cpu, mps, and cuda devices
- ▸ Profile CPU-GPU transfer overhead and identify bottlenecks
- ▸ Avoid common mistakes around dtype, host-device sync, and batch sizing
Hands-on practice
Benchmark the same training step on CPU and GPU and explain the observed speed difference.
Expected output
A device-agnostic training script with simple timing instrumentation.
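A sketch of device selection and rough timing; the layer size and iteration count are arbitrary, and the explicit synchronize call only applies on CUDA.

```python
import time
import torch
import torch.nn as nn

# Pick the best available device without hard-coding it.
device = ("cuda" if torch.cuda.is_available()
          else "mps" if torch.backends.mps.is_available()
          else "cpu")

model = nn.Linear(1024, 1024).to(device)
x = torch.randn(4096, 1024, device=device)

start = time.perf_counter()
for _ in range(50):
    y = model(x)
if device == "cuda":
    torch.cuda.synchronize()          # GPU kernels run async; wait before timing
print(device, time.perf_counter() - start)
```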
Convolutional Networks in Depth
Understand CNN mechanics before using ResNet-like architectures as black boxes.
You will learn
- ▸ Compute output shapes and receptive fields correctly
- ▸ Understand why batch norm and residual connections help optimization
- ▸ Build a small CNN and then deepen it into a ResNet-style model
Hands-on practice
Implement a CIFAR classifier with conv blocks, batch norm, dropout, and residual shortcuts.
Expected output
A CNN baseline and a ResNet-style upgrade with comparison metrics.
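A minimal residual block sketch; the channel count and the identity-shortcut assumption (input and output shapes match) are simplifications.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> BN -> ReLU -> Conv -> BN, plus an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)        # shortcut keeps gradients flowing

# Output shape check: (32 + 2*1 - 3) // 1 + 1 = 32, so H and W are preserved.
x = torch.randn(8, 64, 32, 32)
print(ResidualBlock(64)(x).shape)          # torch.Size([8, 64, 32, 32])
```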
Mixed Precision and torch.compile
Add speed without losing training stability or turning your loop into mystery meat.
You will learn
- ▸ Use autocast and GradScaler correctly
- ▸ Understand fp16 versus bf16 tradeoffs across different hardware
- ▸ Know when torch.compile gives real wins and when it is not worth it
Hands-on practice
Benchmark the same training run with and without AMP, and measure the torch.compile speedup after warmup.
Expected output
A benchmark note with throughput, VRAM use, and any numerical issues observed.
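A sketch of one AMP training step, assuming a CUDA device and fp16; the older torch.cuda.amp spelling is used here, and newer releases also expose the same tools under torch.amp.

```python
import torch
import torch.nn as nn

device = "cuda"                       # GradScaler as used here assumes CUDA
model = nn.Linear(512, 512).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 512, device=device)
y = torch.randn(64, 512, device=device)

for _ in range(10):
    opt.zero_grad()
    with torch.cuda.amp.autocast():       # run the forward pass in reduced precision
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()         # scale the loss to avoid fp16 underflow
    scaler.step(opt)
    scaler.update()
```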
Experiment Tracking Capstone
Wrap everything into a reproducible experiment workflow that another engineer can rerun.
You will learn
- ▸ Log metrics, configs, checkpoints, and artifacts systematically
- ▸ Compare multiple runs and document what changed
- ▸ Write a simple model card that explains scope and limitations
Hands-on practice
Run a small hyperparameter sweep on CIFAR-10 and compare the top three runs.
Expected output
A reproducible capstone run with logs, checkpoints, and a model card.
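One possible shape for the sweep bookkeeping; the hyperparameter grid, file layout, and the stubbed-out training call are all invented for illustration.

```python
import itertools
import json
import random
from pathlib import Path

import torch

def run_experiment(cfg):
    # Stand-in for the real training run; returns a fake validation metric.
    torch.manual_seed(cfg["seed"])
    random.seed(cfg["seed"])
    return {"val_acc": random.random()}

runs_dir = Path("runs")
runs_dir.mkdir(exist_ok=True)

grid = itertools.product([1e-3, 3e-4], [64, 128])
for i, (lr, batch_size) in enumerate(grid):
    cfg = {"lr": lr, "batch_size": batch_size, "seed": 0}
    metrics = run_experiment(cfg)
    # Keep config and metrics together so any run can be reconstructed later.
    (runs_dir / f"run_{i}.json").write_text(json.dumps({"config": cfg, **metrics}))
```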
Common Pitfalls
Skipping shape checks
Beginners often trust the model too early. Print shapes aggressively until your mental model matches the tensors moving through the network.
Mixing train and eval mode
BatchNorm and dropout behave differently in training and evaluation. If you forget model.eval(), your validation numbers lie to you.
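A self-contained illustration of the difference; the tiny model here exists only to show dropout switching off in eval mode.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()
print(model(x))       # dropout active: some activations are zeroed and rescaled

model.eval()
with torch.no_grad():
    print(model(x))   # dropout disabled: deterministic output for validation
```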
Changing too many variables at once
If you modify model, optimizer, scheduler, augmentations, and batch size together, you learn nothing from the result.
Trusting a single good run
A lucky seed can make a bad setup look competent. Save configs and compare more than one run whenever possible.
🏁 Capstone Project: CNN Classifier on CIFAR-10
Your goal is not just to train a model. It is to prove you can structure data loading, model design, optimization, logging, and evaluation into a coherent pipeline. Once you can do that, Transformers stop feeling like magic and start feeling like just another architecture class.