🎯 Self-Attention Visualization

Interactive exploration of the attention mechanism

📝 Input Sentence

🎯 Query-Key-Value

[Interactive panel: click a word in the input sentence to select it as the Query (Q); the Keys (K) and Values (V) are computed from all tokens in the sentence.]
Attention(Q, K, V) = softmax(QKᵀ / √dₖ) × V
How Self-Attention Works:
1. Each word creates Q, K, V vectors
2. Query asks: "What should I attend to?"
3. Keys answer: "Here's what I contain"
4. Score = Q · K (dot product), scaled by √dₖ
5. Softmax → attention weights; the output is the attention-weighted sum of the Values (see the sketch below)
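
A minimal PyTorch sketch of the five steps above (the function name, weight matrices, and single-head layout are illustrative assumptions, not the page's own implementation):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x:             (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q                              # step 1: Queries, (seq_len, d_k)
    k = x @ w_k                              # step 1: Keys,    (seq_len, d_k)
    v = x @ w_v                              # step 1: Values,  (seq_len, d_k)
    scores = (q @ k.T) / k.size(-1) ** 0.5   # steps 2-4: every Q · K pair, scaled by √d_k
    weights = F.softmax(scores, dim=-1)      # step 5: each row becomes a distribution over tokens
    output = weights @ v                     # attention-weighted sum of the Values
    return output, weights
```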

📊 Attention Weights Matrix

[Heatmap panel: row i, column j shows how strongly token i attends to token j; each row sums to 1 after the softmax.]

📄 PyTorch Implementation
📖 Step Explanation
📤 Tensor Shapes
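
To make the tensor shapes concrete, here is a hypothetical run of the sketch above (all sizes are made up):

```python
torch.manual_seed(0)
seq_len, d_model, d_k = 6, 16, 8     # made-up sizes for a six-token sentence
x = torch.randn(seq_len, d_model)    # stand-in token embeddings
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape)         # torch.Size([6, 8])  -> one d_k-dim vector per token
print(weights.shape)        # torch.Size([6, 6])  -> the attention weights matrix
print(weights.sum(dim=-1))  # each row sums to 1
```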