CNN Image Classifier
Build a ResNet-style convolutional image classifier on CIFAR-10, then evaluate it, export it, and understand the full training workflow end to end.
Project Background
CNN image classification is one of the cleanest entry points into modern deep learning engineering. It sits at the intersection of data pipelines, model architecture, optimization, evaluation, and export, so it is the perfect place to learn how all the moving parts of training actually fit together.
Problem it solves
The direct task is image classification on CIFAR-10, but the deeper problem is learning how a model turns pixels into layered visual features and how an engineering pipeline turns that model into a reliable training workflow. This page is really about making vision training legible, not just chasing one accuracy number.
What you learn
- ▸ Residual connections and why they help deep CNN optimization
- ▸ Data augmentation and its effect on generalization
- ▸ Mixed precision and practical PyTorch training patterns
- ▸ Confusion-matrix based error analysis
- ▸ Model export and deployment readiness
Starter Code
The code below gives you the training scaffold. The most important thing is to understand why the residual block, optimizer, scheduler, and data transforms have to line up correctly.
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Main path: two 3x3 convs with batch norm; only the first conv strides.
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Shortcut: identity when shapes already match, 1x1 projection otherwise.
        self.shortcut = nn.Identity() if stride == 1 and in_ch == out_ch else nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Sum the two paths before the final activation, as in the original ResNet.
        return self.relu(self.conv(x) + self.shortcut(x))

Code walkthrough
Residual path is not decoration
The shortcut path keeps gradient flow alive through deeper stacks. Without it, optimization gets much harder as depth increases.
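You can see this directly with autograd. The toy check below is not part of the project script; it deliberately makes the convolutional path nearly vanish, and the residual version still passes gradient to the input because y = f(x) + x contributes an identity term to the Jacobian:

import torch
import torch.nn as nn

# Illustrative only: shrink the conv path so it barely transmits gradient.
f = nn.Conv2d(8, 8, 3, padding=1, bias=False)
nn.init.normal_(f.weight, std=1e-4)

x = torch.randn(1, 8, 16, 16, requires_grad=True)

y_plain = f(x)       # no shortcut
y_resid = f(x) + x   # residual shortcut (separate forward pass)

g_plain, = torch.autograd.grad(y_plain.sum(), x)
g_resid, = torch.autograd.grad(y_resid.sum(), x)

print(g_plain.abs().mean())  # tiny: the conv path alone barely passes gradient
print(g_resid.abs().mean())  # close to 1: the identity term keeps gradient flowing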
Normalization and augmentation define the training regime
Many bad runs are not architecture failures; they are data pipeline failures. CIFAR-10 normalization and augmentation choices directly affect convergence.
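The mean/std constants used in the script below are the standard CIFAR-10 training-set statistics. If you would rather verify them than trust them, a quick sketch (note it holds the full training set in memory, roughly 600 MB):

import torch
from torchvision import datasets, transforms

# Load CIFAR-10 as raw tensors and compute per-channel mean/std over the
# training split; the results should closely match the constants below.
ds = datasets.CIFAR10(root='data', train=True, download=True,
                      transform=transforms.ToTensor())
stack = torch.stack([img for img, _ in ds])  # shape: (50000, 3, 32, 32)
print(stack.mean(dim=(0, 2, 3)))             # ~ (0.4914, 0.4822, 0.4465)
print(stack.std(dim=(0, 2, 3)))              # ~ (0.2470, 0.2435, 0.2616)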
Scheduler timing matters
OneCycleLR only works as intended if its configured total step count matches the number of optimizer steps the loop actually takes. Get it wrong and the learning-rate curve either finishes its cycle far too early or the scheduler runs off the end of its schedule.
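A sketch of the correct wiring, reusing the optimizer, loader, and constants defined in the full script below (which omits the scheduler for brevity, so treat this as the pattern to add). The key detail is the per-batch scheduler.step():

# Assumes `optimizer`, `train_loader`, EPOCHS, and LR from the full script below.
# OneCycleLR must know the exact number of optimizer steps it will see,
# and must be stepped once per batch, not once per epoch.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=LR * 10,                 # peak learning rate; the 10x factor is an assumption
    steps_per_epoch=len(train_loader),
    epochs=EPOCHS,
)

for epoch in range(EPOCHS):
    for images, labels in train_loader:
        ...                         # forward / backward / optimizer.step()
        scheduler.step()            # per batch, after optimizer.step()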
The residual block is the conceptual center
The two-convolution path extracts features, while the shortcut preserves a clean optimization route. This is what lets deeper CNNs improve representational capacity without becoming dramatically harder to train.
A real project includes export and sanity checks
Training success is not enough. Once you export the model, you need to verify that TorchScript or deployment inference still matches the original PyTorch behavior closely enough to trust.
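One way to run that check, sketched assuming `model` and `device` from the full script below:

import torch

# Trace the trained model and confirm its outputs match eager PyTorch
# on the same input within a tight tolerance.
model.eval()
example = torch.randn(1, 3, 32, 32, device=device)

scripted = torch.jit.trace(model, example)
scripted.save('checkpoints/model_scripted.pt')

with torch.no_grad():
    eager_out = model(example)
    jit_out = scripted(example)

# The tolerance is a judgment call; 1e-5 is a common starting point for fp32.
assert torch.allclose(eager_out, jit_out, atol=1e-5), "export drifted from eager model"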
Full runnable code
A compact PyTorch script for training a ResNet-style classifier on CIFAR-10. Save it as cnn_cifar10_train.py, install the listed dependencies, and run it directly.
Dependencies
- ▸ python>=3.10
- ▸ torch
- ▸ torchvision
Run commands
pip install torch torchvision
python cnn_cifar10_train.py
File tree
cnn-classifier/
├── cnn_cifar10_train.py
├── data/
│   └── cifar-10-batches-py/
└── checkpoints/
    └── best_model.pt

import os

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = 'cuda' if torch.cuda.is_available() else 'cpu'
BATCH_SIZE = 128
EPOCHS = 10
LR = 3e-4

# CIFAR-10 per-channel mean/std; augmentation is applied only to the training split.
train_tfms = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
test_tfms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_ds = datasets.CIFAR10(root='data', train=True, download=True, transform=train_tfms)
test_ds = datasets.CIFAR10(root='data', train=False, download=True, transform=test_tfms)
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)
test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)

class ResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Identity shortcut when shapes match, 1x1 projection otherwise.
        self.shortcut = nn.Identity() if stride == 1 and in_ch == out_ch else nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(x) + self.shortcut(x))

class SmallResNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        # Three stages; the second and third halve spatial resolution.
        self.layer1 = nn.Sequential(ResBlock(64, 64), ResBlock(64, 64))
        self.layer2 = nn.Sequential(ResBlock(64, 128, stride=2), ResBlock(128, 128))
        self.layer3 = nn.Sequential(ResBlock(128, 256, stride=2), ResBlock(256, 256))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.pool(self.layer3(x) if False else self.layer3(x)).flatten(1) if False else self.fc(self.pool(self.layer3(x)).flatten(1)) if False else x
        x = self.layer3(x)
        x = self.pool(x).flatten(1)
        return self.fc(x)

model = SmallResNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=1e-4)

os.makedirs('checkpoints', exist_ok=True)
best_acc = 0.0
for epoch in range(EPOCHS):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        loss = criterion(logits, labels)
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    acc = correct / total
    if acc > best_acc:  # keep the best checkpoint, matching the file tree above
        best_acc = acc
        torch.save(model.state_dict(), 'checkpoints/best_model.pt')
    print(f"epoch={epoch+1} acc={acc:.4f}")
Build Steps
Data Pipeline
Use CIFAR-10 with training-set augmentation, normalization, and a batch-visualization check before training.
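A minimal batch-visualization sketch, assuming train_loader from the script above; it inverts the normalization so the saved grid is actually viewable:

import torch
from torchvision.utils import save_image

# Undo per-channel normalization so pixel values land back in [0, 1].
mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(1, 3, 1, 1)
std = torch.tensor([0.2470, 0.2435, 0.2616]).view(1, 3, 1, 1)

images, labels = next(iter(train_loader))
save_image(images * std + mean, 'batch_preview.png', nrow=16)
print(labels[:16])  # spot-check labels against the saved grid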
Build the Model
Stack ResBlocks with downsampling stages and a final classifier head.
Training Loop
Train with AdamW, OneCycleLR, mixed precision, and gradient clipping.
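The compact script above trains in fp32 without a scheduler; here is a sketch of the fuller step this stage asks for, reusing the script's names plus the scheduler from the OneCycleLR sketch earlier:

# Mixed-precision training step with gradient clipping and a per-batch
# scheduler step. Assumes model/optimizer/criterion/train_loader/device
# from the script above and `scheduler` from the OneCycleLR sketch.
scaler = torch.cuda.amp.GradScaler(enabled=(device == 'cuda'))

for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=(device == 'cuda')):
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                 # unscale before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()                           # OneCycleLR: once per batch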
Evaluation
Track top-1 accuracy, confusion matrix, and per-class failure cases.
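A plain-PyTorch confusion-matrix pass, assuming model, test_loader, and device from the script above:

import torch

# Rows are true classes, columns are predictions; off-diagonal hot spots
# are the class pairs worth inspecting by hand.
num_classes = 10
confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)

model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        for t, p in zip(labels, preds):
            confusion[t, p] += 1

per_class_acc = confusion.diag().float() / confusion.sum(dim=1).clamp(min=1)
print(confusion)
print(per_class_acc)  # e.g. cat/dog confusion is a classic CIFAR-10 failure mode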
Logging and Export
Use TensorBoard and export the trained model to TorchScript.
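A logging sketch using torch.utils.tensorboard; note it requires the tensorboard package, which is not in the dependency list above, so treat it as an extra install. The `loss` and `acc` names are assumed from the script's loop:

from torch.utils.tensorboard import SummaryWriter

# Log scalars per epoch; run `tensorboard --logdir runs` to inspect them.
writer = SummaryWriter('runs/cnn_cifar10')

# Inside the epoch loop of the script above:
writer.add_scalar('train/loss', loss.item(), epoch)
writer.add_scalar('test/accuracy', acc, epoch)

writer.close()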
Common Pitfalls
⚠️ Wrong normalization stats
Incorrect CIFAR-10 mean/std silently lowers final accuracy.
⚠️ Broken residual shortcut
A wrong projection shortcut breaks optimization and slows convergence badly.
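A cheap guard against this: shape-check both shortcut variants with random input before any training run (ResBlock as defined in the script above):

import torch

# Identity case (same shape) and projection case (channel/stride change).
# A broken shortcut raises a shape error here instead of mid-training.
for in_ch, out_ch, stride, hw in [(64, 64, 1, 32), (64, 128, 2, 32)]:
    block = ResBlock(in_ch, out_ch, stride=stride)
    out = block(torch.randn(2, in_ch, hw, hw))
    assert out.shape == (2, out_ch, hw // stride, hw // stride), out.shape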
⚠️ Scheduler timing mismatch
OneCycleLR must match total training steps, not just epoch count.
Success Criteria
- ✅ Validation accuracy reaches a strong baseline
- ✅ Confusion matrix and class-level errors are analyzed
- ✅ TensorBoard logs are complete
- ✅ Exported model passes inference sanity check