🏗️ CNN Architecture Explorer

A layer-by-layer tour of a small convolutional network for CIFAR-10: three conv–BatchNorm–ReLU–pool blocks followed by two fully connected layers.

Architecture flow (shapes are channels × height × width):

Input image (3, 32, 32)
  → Conv1: 3×3 Conv2d → (32, 32, 32) → BatchNorm → ReLU → 2×2 MaxPool → (32, 16, 16)
  → Conv2: 3×3 Conv2d → (64, 16, 16) → BatchNorm → ReLU → 2×2 MaxPool → (64, 8, 8)
  → Conv3: 3×3 Conv2d → (128, 8, 8) → BatchNorm → ReLU → 2×2 MaxPool → (128, 4, 4)
  → Flatten → (2048)
  → FC1: Linear → (256) → ReLU
  → Logits: Linear → (10)

📋 Layer Details

📥 Input Image
The input to a CNN is an image represented as a 3D tensor. For CIFAR-10, images are 32×32 pixels with 3 color channels (RGB).
# Shape: (batch_size, 3, 32, 32)
# Channels: Red, Green, Blue
🔲 Convolution Layer 1
First conv layer detects basic features like edges and gradients. 32 filters slide over the input, each producing a feature map.
nn.Conv2d(3, 32, kernel_size=3, padding=1)
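As a quick sketch (assuming PyTorch is installed), passing a random CIFAR-10-sized batch through this layer shows that the 32×32 spatial size is preserved while the channel count grows from 3 to 32:

```python
import torch
import torch.nn as nn

# A random batch of 16 CIFAR-10-sized images: (batch, channels, height, width)
x = torch.randn(16, 3, 32, 32)

conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
y = conv1(x)

print(y.shape)  # torch.Size([16, 32, 32, 32])
```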
🔲 Convolution Layer 2
Second conv layer combines lower-level features into more complex patterns. 64 filters detect textures and simple shapes.
nn.Conv2d(32, 64, kernel_size=3, padding=1)
🔲 Convolution Layer 3
Deeper layers detect high-level features like object parts. 128 filters capture complex patterns specific to the classes.
nn.Conv2d(64, 128, kernel_size=3, padding=1)
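The spatial size after a convolution follows the standard formula out = (in + 2·padding − kernel) // stride + 1. A plain-Python check (no framework needed) shows why kernel_size=3 with padding=1 keeps height and width unchanged in all three conv layers:

```python
def conv_out_size(in_size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1

# 3x3 convolutions with padding=1 preserve the spatial size
print(conv_out_size(32))  # 32  (Conv1: 32x32 -> 32x32)
print(conv_out_size(16))  # 16  (Conv2)
print(conv_out_size(8))   # 8   (Conv3)
```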
📊 Batch Normalization
Normalizes activations to have zero mean and unit variance. Speeds up training, allows higher learning rates, and acts as regularization.
nn.BatchNorm2d(num_features)  # num_features = channel count, e.g. 32 after Conv1
⚡ ReLU Activation
Introduces non-linearity: f(x) = max(0, x). Simple, fast, and avoids vanishing gradients (for positive values).
nn.ReLU(inplace=True)
📉 Max Pooling
Reduces spatial dimensions by half. Takes maximum value in each 2×2 region, providing translation invariance.
nn.MaxPool2d(kernel_size=2, stride=2)
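The same size formula, with no padding, shows how a 2×2 pool with stride 2 halves each spatial dimension at every stage:

```python
def pool_out_size(in_size, kernel=2, stride=2):
    """Spatial output size of max pooling along one dimension (no padding)."""
    return (in_size - kernel) // stride + 1

print(pool_out_size(32))  # 16  (Pool1: 32x32 -> 16x16)
print(pool_out_size(16))  # 8   (Pool2)
print(pool_out_size(8))   # 4   (Pool3)
```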
📏 Flatten
Converts 3D feature maps to 1D vector for fully connected layers. (128, 4, 4) → (2048).
nn.Flatten()
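The flattened length is simply the product of the final feature-map dimensions:

```python
channels, height, width = 128, 4, 4
flat = channels * height * width
print(flat)  # 2048
```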
🔗 Fully Connected Layer
Combines all features to make final decision. Each neuron connects to all inputs from the flattened layer.
nn.Linear(2048, 256)
🎯 Output Layer
Final layer outputs logits for each class. For CIFAR-10, outputs 10 values (one per class). Apply softmax for probabilities.
nn.Linear(256, 10)
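Putting the pieces above together, here is one possible PyTorch implementation of this architecture (a sketch, not necessarily the exact code behind this explorer):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """CIFAR-10 CNN: three conv-BN-ReLU-pool blocks, then two linear layers."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # -> (32, 16, 16)
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # -> (64, 8, 8)
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # -> (128, 4, 4)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                            # -> (2048)
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_classes),             # logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```

Apply `torch.softmax(logits, dim=1)` only when you need probabilities; losses like `nn.CrossEntropyLoss` expect raw logits.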

📊 Network Statistics

Layer         Output Shape   Parameters
Input         (3, 32, 32)    0
Conv1 + BN    (32, 32, 32)   896 + 64
Pool1         (32, 16, 16)   0
Conv2 + BN    (64, 16, 16)   18,496 + 128
Pool2         (64, 8, 8)     0
Conv3 + BN    (128, 8, 8)    73,856 + 256
Pool3         (128, 4, 4)    0
FC1           (256)          524,544
FC2 (Output)  (10)           2,570
Total                        620,810 (~620K)
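The per-layer counts above can be reproduced by hand in plain Python. A conv layer has out_ch × in_ch × k × k weights plus one bias per output channel; BatchNorm contributes a learnable scale and shift per channel, so 2 parameters per feature; pooling and flatten have none:

```python
def conv_params(in_ch, out_ch, k=3):
    # weights: out_ch * in_ch * k * k, plus one bias per output channel
    return out_ch * in_ch * k * k + out_ch

def bn_params(ch):
    # learnable gamma (scale) and beta (shift) per channel
    return 2 * ch

def linear_params(in_f, out_f):
    return in_f * out_f + out_f

counts = {
    "Conv1": conv_params(3, 32),      # 896
    "BN1":   bn_params(32),           # 64
    "Conv2": conv_params(32, 64),     # 18,496
    "BN2":   bn_params(64),           # 128
    "Conv3": conv_params(64, 128),    # 73,856
    "BN3":   bn_params(128),          # 256
    "FC1":   linear_params(2048, 256),  # 524,544
    "FC2":   linear_params(256, 10),    # 2,570
}
print(sum(counts.values()))  # 620810
```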