🏗️ CNN Architecture Explorer

A layer-by-layer tour of a small convolutional network for CIFAR-10: three conv–BatchNorm–ReLU–pool blocks followed by two fully connected layers.

Architecture flow (shapes are channels × height × width):

Input image (3, 32, 32)
  → Conv1: 3×3 Conv2d → (32, 32, 32) → BatchNorm → ReLU → 2×2 MaxPool → (32, 16, 16)
  → Conv2: 3×3 Conv2d → (64, 16, 16) → BatchNorm → ReLU → 2×2 MaxPool → (64, 8, 8)
  → Conv3: 3×3 Conv2d → (128, 8, 8) → BatchNorm → ReLU → 2×2 MaxPool → (128, 4, 4)
  → Flatten → (2048)
  → FC1: Linear → (256) → ReLU
  → Logits: Linear → (10)

📋 Layer Details

📥 Input Image
The input to a CNN is an image represented as a 3D tensor. For CIFAR-10, images are 32×32 pixels with 3 color channels (RGB).
# Shape: (batch_size, 3, 32, 32)
# Channels: Red, Green, Blue
🔲 Convolution Layer 1
First conv layer detects basic features like edges and gradients. 32 filters slide over the input, each producing a feature map.
nn.Conv2d(3, 32, kernel_size=3, padding=1)
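As a quick sketch (assuming PyTorch is installed), passing a random CIFAR-10-sized batch through this layer shows that the 32×32 spatial size is preserved while the channel count grows from 3 to 32:

```python
import torch
import torch.nn as nn

# A random batch of 16 CIFAR-10-sized images: (batch, channels, height, width)
x = torch.randn(16, 3, 32, 32)

conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
y = conv1(x)

print(y.shape)  # torch.Size([16, 32, 32, 32])
```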
🔲 Convolution Layer 2
Second conv layer combines lower-level features into more complex patterns. 64 filters detect textures and simple shapes.
nn.Conv2d(32, 64, kernel_size=3, padding=1)
🔲 Convolution Layer 3
Deeper layers detect high-level features like object parts. 128 filters capture complex patterns specific to the classes.
nn.Conv2d(64, 128, kernel_size=3, padding=1)
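The spatial size after a convolution follows the standard formula out = (in + 2·padding − kernel) // stride + 1. A plain-Python check (no framework needed) shows why kernel_size=3 with padding=1 keeps height and width unchanged in all three conv layers:

```python
def conv_out_size(in_size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1

# 3x3 convolutions with padding=1 preserve the spatial size
print(conv_out_size(32))  # 32  (Conv1: 32x32 -> 32x32)
print(conv_out_size(16))  # 16  (Conv2)
print(conv_out_size(8))   # 8   (Conv3)
```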
📊 Batch Normalization
Normalizes activations to have zero mean and unit variance. Speeds up training, allows higher learning rates, and acts as regularization.
nn.BatchNorm2d(num_features)  # num_features = channel count, e.g. 32 after Conv1
⚡ ReLU Activation
Introduces non-linearity: f(x) = max(0, x). Simple, fast, and avoids vanishing gradients (for positive values).
nn.ReLU(inplace=True)
📉 Max Pooling
Reduces spatial dimensions by half. Takes maximum value in each 2×2 region, providing translation invariance.
nn.MaxPool2d(kernel_size=2, stride=2)
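The same size formula, with no padding, shows how a 2×2 pool with stride 2 halves each spatial dimension at every stage:

```python
def pool_out_size(in_size, kernel=2, stride=2):
    """Spatial output size of max pooling along one dimension (no padding)."""
    return (in_size - kernel) // stride + 1

print(pool_out_size(32))  # 16  (Pool1: 32x32 -> 16x16)
print(pool_out_size(16))  # 8   (Pool2)
print(pool_out_size(8))   # 4   (Pool3)
```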
📏 Flatten
Converts 3D feature maps to 1D vector for fully connected layers. (128, 4, 4) → (2048).
nn.Flatten()
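The flattened length is simply the product of the final feature-map dimensions:

```python
channels, height, width = 128, 4, 4
flat = channels * height * width
print(flat)  # 2048
```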
🔗 Fully Connected Layer
Combines all features to make final decision. Each neuron connects to all inputs from the flattened layer.
nn.Linear(2048, 256)
🎯 Output Layer
Final layer outputs logits for each class. For CIFAR-10, outputs 10 values (one per class). Apply softmax for probabilities.
nn.Linear(256, 10)
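Putting the pieces above together, here is one possible PyTorch implementation of this architecture (a sketch, not necessarily the exact code behind this explorer):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """CIFAR-10 CNN: three conv-BN-ReLU-pool blocks, then two linear layers."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # -> (32, 16, 16)
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # -> (64, 8, 8)
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # -> (128, 4, 4)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                            # -> (2048)
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_classes),             # logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```

Apply `torch.softmax(logits, dim=1)` only when you need probabilities; losses like `nn.CrossEntropyLoss` expect raw logits.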

📊 Network Statistics

Layer         Output Shape   Parameters
Input         (3, 32, 32)    0
Conv1 + BN    (32, 32, 32)   896 + 64
Pool1         (32, 16, 16)   0
Conv2 + BN    (64, 16, 16)   18,496 + 128
Pool2         (64, 8, 8)     0
Conv3 + BN    (128, 8, 8)    73,856 + 256
Pool3         (128, 4, 4)    0
FC1           (256)          524,544
FC2 (Output)  (10)           2,570
Total                        620,810 (~620K)
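The per-layer counts above can be reproduced by hand in plain Python. A conv layer has out_ch × in_ch × k × k weights plus one bias per output channel; BatchNorm contributes a learnable scale and shift per channel, so 2 parameters per feature; pooling and flatten have none:

```python
def conv_params(in_ch, out_ch, k=3):
    # weights: out_ch * in_ch * k * k, plus one bias per output channel
    return out_ch * in_ch * k * k + out_ch

def bn_params(ch):
    # learnable gamma (scale) and beta (shift) per channel
    return 2 * ch

def linear_params(in_f, out_f):
    return in_f * out_f + out_f

counts = {
    "Conv1": conv_params(3, 32),      # 896
    "BN1":   bn_params(32),           # 64
    "Conv2": conv_params(32, 64),     # 18,496
    "BN2":   bn_params(64),           # 128
    "Conv3": conv_params(64, 128),    # 73,856
    "BN3":   bn_params(128),          # 256
    "FC1":   linear_params(2048, 256),  # 524,544
    "FC2":   linear_params(256, 10),    # 2,570
}
print(sum(counts.values()))  # 620810
```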