↓
📍 Positional Encoding
↓
🔲 Encoder Block × N (stack of 6-96 blocks)
    Multi-Head Attention
    ↓
    Feed-Forward Network
↓
📊 Output Projection
Click on any component in the architecture diagram to see its PyTorch implementation and details.
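For readers of this text version, here is a minimal PyTorch sketch of the stack shown above: positional encoding added to token embeddings, N encoder blocks (multi-head attention followed by a feed-forward network), and a final output projection. This is an illustrative assumption, not the demo's own code; the dimensions (d_model=512, n_heads=8, d_ff=2048, n_blocks=6) and the post-norm residual layout are placeholder choices.

```python
# Minimal sketch of the diagram above; sizes and layout are illustrative assumptions.
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding added to the token embeddings."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]


class EncoderBlock(nn.Module):
    """One encoder block: multi-head attention, then a feed-forward network."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)   # self-attention over the sequence
        x = self.norm1(x + attn_out)       # residual connection + layer norm
        x = self.norm2(x + self.ff(x))     # feed-forward, residual + layer norm
        return x


class TransformerEncoder(nn.Module):
    """Embedding + positional encoding, N stacked encoder blocks, output projection."""

    def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8,
                 d_ff: int = 2048, n_blocks: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos_enc = PositionalEncoding(d_model)
        self.blocks = nn.ModuleList(
            [EncoderBlock(d_model, n_heads, d_ff) for _ in range(n_blocks)]
        )
        self.out_proj = nn.Linear(d_model, vocab_size)  # output projection

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.pos_enc(self.embed(tokens))
        for block in self.blocks:  # × N (6-96 blocks in practice)
            x = block(x)
        return self.out_proj(x)


# Usage example: 6-block encoder over a batch of 2 sequences of 16 token ids.
model = TransformerEncoder(vocab_size=10_000, n_blocks=6)
logits = model(torch.randint(0, 10_000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 10000])
```

Increasing `n_blocks` from 6 toward 96 is what distinguishes small encoders from the largest transformer stacks referenced in the diagram; the per-block structure stays the same.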