Interactive architecture diagram: every upgrade from 2017 to 2026 in one view
Pre-RMSNorm + GQA with RoPE + SwiGLU FFN. Click on any block to see its details, formulas, and what it replaced.
Same high-level structure, different components in every slot. Hover over each upgrade to learn more.
Follow a tensor through every operation: see shapes, operations, and the residual highway.
x is saved before normalization and added back after each sublayer.
The norm never sits on the residual path, which creates a clean "gradient highway" for stable deep training.
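The pre-norm residual pattern described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the page's own implementation: the names `rms_norm`, `pre_norm_residual`, and `toy_sublayer` are hypothetical, and the toy linear map merely stands in for attention or the FFN.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Divide each token vector by its root-mean-square, then apply a learned scale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def pre_norm_residual(x, sublayer, weight):
    # The sublayer sees the normalized input, but the skip connection
    # carries the original x, so the norm stays off the residual path.
    return x + sublayer(rms_norm(x, weight))

rng = np.random.default_rng(0)
d_model = 8                                   # illustrative size (assumption)
x = rng.normal(size=(2, 4, d_model))          # (batch, seq, d_model)
w_norm = np.ones(d_model)
w_toy = rng.normal(size=(d_model, d_model)) * 0.1

def toy_sublayer(h):
    # Hypothetical stand-in for attention or the FFN.
    return h @ w_toy

y = pre_norm_residual(x, toy_sublayer, w_norm)
```

Because `x` is added back unmodified, a sublayer that outputs zeros leaves the input exactly unchanged, which is one way to see why the skip path behaves like an identity route for gradients.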
Where do the parameters live in a modern transformer block? Adjust the model config to explore.
| Component | Shape | Params | % |
|---|---|---|---|
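
As a rough companion to the parameter breakdown, the sketch below counts weights for one Pre-RMSNorm + GQA + SwiGLU block. It rests on assumptions not stated on the page: no bias terms, `head_dim = d_model / n_heads`, three FFN matrices (gate, up, down), and one RMSNorm scale vector per sublayer. RoPE contributes no learned parameters. The function name `block_params` and the config values are illustrative only.

```python
def block_params(d_model, n_heads, n_kv_heads, d_ff):
    # Assumption: head_dim is d_model split evenly across query heads.
    head_dim = d_model // n_heads
    q_dim = n_heads * head_dim        # width of the query projection
    kv_dim = n_kv_heads * head_dim    # GQA: fewer K/V heads than Q heads
    return {
        "attn_norm (RMSNorm)": d_model,
        "W_q": d_model * q_dim,
        "W_k": d_model * kv_dim,
        "W_v": d_model * kv_dim,
        "W_o": q_dim * d_model,
        "ffn_norm (RMSNorm)": d_model,
        "W_gate": d_model * d_ff,
        "W_up": d_model * d_ff,
        "W_down": d_ff * d_model,
    }

# Illustrative config (assumption, not taken from the page).
counts = block_params(d_model=4096, n_heads=32, n_kv_heads=8, d_ff=14336)
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:22s} {n:>12,d} {100 * n / total:6.2f}%")
print(f"{'total':22s} {total:>12,d}")
```

With these example values the three SwiGLU matrices hold roughly 81% of the block's weights, and the K and V projections are each a quarter the size of the Q projection because there are four query heads per KV head.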