The Modern Transformer Block

Interactive architecture diagram: every upgrade from 2017 to 2026 in one view

🏗️ Modern Transformer Block: Click Any Component

Pre-RMSNorm + GQA with RoPE + SwiGLU FFN. Click on any block to see its details, formulas, and what it replaced.

RMSNorm
GQA + RoPE
SwiGLU FFN
Residual Add
RoPE
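As a rough companion to the component list above, here is a minimal pure-Python sketch of three of the pieces (RMSNorm, RoPE, and the SwiGLU FFN) applied to a single token vector. It is an illustration under stated assumptions, not the page's own implementation: the function names, the `eps` and `base` defaults, and the interleaved pairing convention in `rope` are choices made here. Some codebases pair the first and second halves of the vector instead.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned per-channel gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]

def rope(x, position, base=10000.0):
    """RoPE: rotate each pair (x[i], x[i+1]) by the angle position * base**(-i/d).

    Assumes an even-length vector and the interleaved pairing convention.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU FFN: W_down applied to silu(W_gate x) * (W_up x), with no biases."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)
```

Two properties are easy to check by hand: the output of `rms_norm` with unit gain has a root-mean-square of about 1, and `rope` leaves the vector's length unchanged because each pair is only rotated.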

🔀 2017 Original vs 2026 Modern: Side by Side

Same high-level structure, different components in every slot. Hover over each upgrade to learn more.

🔄 Step-by-Step Data Flow Through One Block

Follow a tensor through every operation: see shapes, operations, and the residual highway.

Pre-Norm pattern: Notice how the residual x is saved before normalization and added back after each sublayer. The norm never sits on the residual path, which creates a clean "gradient highway" for stable deep training.
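The pre-norm pattern can be sketched in a few lines of Python. This is a structural illustration only: `pre_norm_block` and its arguments are hypothetical names, the sublayers are passed in as plain callables, and the stand-ins below (`identity`, `zero`, `halve`) are toy functions rather than real attention or feed-forward layers.

```python
def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def pre_norm_block(x, norm1, attention, norm2, ffn):
    """One pre-norm block: x + attention(norm1(x)), then h + ffn(norm2(h))."""
    # Sublayer 1: the saved residual x bypasses the norm entirely.
    h = add(x, attention(norm1(x)))
    # Sublayer 2: same pattern around the feed-forward network.
    return add(h, ffn(norm2(h)))

# Toy stand-ins (assumptions for illustration, not real sublayers).
identity = lambda v: v
zero = lambda v: [0.0] * len(v)
halve = lambda v: [0.5 * t for t in v]

x = [1.0, -2.0, 3.0]

# If both sublayers output zero, the block is exactly the identity:
# nothing on the residual path rescales or shifts x.
same = pre_norm_block(x, identity, zero, identity, zero)

# With non-trivial sublayers, their outputs are simply added on top.
out = pre_norm_block(x, identity, halve, identity, halve)
```

Here `same` equals `x`, which is the "highway" in miniature: the input reaches the output through additions alone, so a gradient flowing backward has a path that no normalization layer touches.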

📊 Parameter Count Breakdown

Where do the parameters live in a modern transformer block? Adjust the model config to explore.
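For readers without the interactive widget, the breakdown can be approximated with a short Python sketch. It assumes a block with no bias terms, a single gain vector per RMSNorm, no learned parameters in RoPE, and a head dimension of `d_model // n_heads`. The function name and the example config values are illustrative choices made here, not values taken from the page.

```python
def block_param_counts(d_model, n_heads, n_kv_heads, d_ff):
    """Per-component parameter counts for one pre-norm GQA + SwiGLU block (no biases)."""
    head_dim = d_model // n_heads
    return {
        "attn RMSNorm gain": d_model,
        "W_q": d_model * n_heads * head_dim,
        "W_k": d_model * n_kv_heads * head_dim,  # GQA: fewer key heads than query heads
        "W_v": d_model * n_kv_heads * head_dim,  # GQA: fewer value heads than query heads
        "W_o": n_heads * head_dim * d_model,
        "ffn RMSNorm gain": d_model,
        "W_gate": d_model * d_ff,
        "W_up": d_model * d_ff,
        "W_down": d_ff * d_model,
    }

# Illustrative config (an assumption, not from the page).
counts = block_param_counts(d_model=4096, n_heads=32, n_kv_heads=8, d_ff=14336)
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:<18} {n:>12,} {100 * n / total:6.2f}%")
print(f"{'total':<18} {total:>12,}")
```

With this config the three SwiGLU matrices hold roughly four fifths of the block's parameters, the attention projections hold most of the rest, and the two norm gains are negligible. Shrinking `n_kv_heads` reduces only `W_k` and `W_v`.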

Component | Shape | Params | %