Input Sentence
Query-Key-Value
Query (Q): the selected word
Keys (K): all tokens
Values (V): all tokens
Attention(Q, K, V) = softmax(QK^T / √d) × V
How Self-Attention Works:
1. Each word creates Q, K, V vectors
2. Query asks: "What should I attend to?"
3. Keys answer: "Here's what I contain"
4. Score = Q · K (dot product)
5. Softmax → attention weights
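
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a definitive implementation: the function names (softmax, dot, attend) and the 2-dimensional example vectors are hypothetical, and the Q, K, V vectors are supplied directly rather than produced by learned projections as they would be in a real model. The final weighted sum of the values corresponds to the "× V" term in the formula.

```python
import math


def softmax(scores):
    # Subtract the maximum score before exponentiating, for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    # Dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Return (weights, output) for one query against all keys and values."""
    d = len(query)
    # Step 4: score = Q . K, scaled by sqrt(d) as in the formula.
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    # Step 5: softmax turns the scores into attention weights.
    weights = softmax(scores)
    # The "x V" term: a weighted sum of the value vectors.
    output = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, output


# Hypothetical vectors for a three-token sentence (assumed for illustration).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]  # the query vector of the selected word

weights, output = attend(query, keys, values)
```

With these example vectors, the first and third keys have the same dot product with the query, so they receive equal weight, and the second key receives less.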