Interactive exploration: rotation encodes position, dot products encode relative distance
RoPE rotates each 2D pair of a vector by an angle proportional to position. The same vector at different positions points in different directions.
Each dimension pair i gets a different rotation frequency. Low indices rotate fast (local patterns); high indices rotate slowly (global patterns).
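The rotation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than any particular library's implementation: the names `rope_frequencies` and `apply_rope` are hypothetical, the base of 10000 is the commonly used default (assumed here, not stated in the text), and consecutive entries are assumed to form the 2D pairs.

```python
import numpy as np

def rope_frequencies(head_dim, base=10000.0):
    # One frequency per 2D pair: theta_i = base^(-2i / head_dim).
    # Assumption: base=10000 is the usual default, not given in the text.
    i = np.arange(head_dim // 2)
    return base ** (-2.0 * i / head_dim)

def apply_rope(x, position, base=10000.0):
    # Assumption: pairs are (x[0], x[1]), (x[2], x[3]), ...
    theta = rope_frequencies(x.shape[-1], base)
    angle = position * theta              # angle grows linearly with position
    cos, sin = np.cos(angle), np.sin(angle)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x, dtype=float)
    out[0::2] = x_even * cos - x_odd * sin  # standard 2D rotation
    out[1::2] = x_even * sin + x_odd * cos
    return out

x = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
freqs = rope_frequencies(8)          # pair 0 is fastest, pair 3 is slowest
rotated = apply_rope(x, position=3)  # same vector, now "at" position 3
```

With head_dim=8 and this base, the four frequencies are 1, 0.1, 0.01, and 0.001 radians per position, so the first pair turns a thousand times faster than the last. The rotation leaves the vector's length unchanged and only changes its direction.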
Heatmap showing rotation angle (mod 2π) for each dimension pair at each position. High-frequency pairs cycle rapidly; low-frequency pairs change slowly.
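The data behind such a heatmap might be computed roughly as follows. The sizes (head_dim=8, 16 positions) and the base of 10000 are illustrative assumptions, not values taken from the visualization.

```python
import numpy as np

# Assumed illustrative sizes; the actual heatmap may use different ones.
head_dim, num_positions, base = 8, 16, 10000.0

# One frequency per dimension pair (assumed standard RoPE schedule).
theta = base ** (-2.0 * np.arange(head_dim // 2) / head_dim)
positions = np.arange(num_positions)

# angles[p, i] = rotation angle of pair i at position p, wrapped into [0, 2*pi).
angles = np.outer(positions, theta) % (2 * np.pi)
```

Each row is a position and each column is a dimension pair. The first column wraps past 2π by position 7, while the last column has barely moved after 16 positions. The array can be passed to a plotting call such as matplotlib's `imshow` to draw the heatmap.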
A head_dim=8 vector split into 4 pairs. Each pair rotates at its own frequency; watch how position affects each pair differently.
The magic of RoPE: when Q at position m and K at position n are both rotated, their dot product depends only on (m − n), the relative distance.
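The relative-distance property can be checked numerically. The sketch below uses a hypothetical `apply_rope` helper, assuming consecutive entries form the pairs and a base of 10000; the positions are arbitrary.

```python
import numpy as np

def apply_rope(x, position, base=10000.0):
    # Hypothetical helper: rotate pairs (x[0], x[1]), (x[2], x[3]), ...
    # by position * theta_i, with theta_i = base^(-2i / head_dim) (assumed schedule).
    theta = base ** (-2.0 * np.arange(x.shape[-1] // 2) / x.shape[-1])
    cos, sin = np.cos(position * theta), np.sin(position * theta)
    out = np.empty_like(x, dtype=float)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal(8)

# Same relative distance (m - n = 3) at very different absolute positions.
score_near = apply_rope(q, 5) @ apply_rope(k, 2)
score_far = apply_rope(q, 105) @ apply_rope(k, 102)

# Equivalent form: rotate only Q by the relative distance.
score_relative = apply_rope(q, 3) @ k
```

All three scores agree up to floating-point error. Rotating Q by m and K by n amounts to rotating Q by (m − n) relative to K, because each 2D rotation is orthogonal and rotations within a pair compose by adding angles.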
With random Q and K vectors, RoPE causes a natural decay in expected dot product as relative distance increases: a soft locality bias emerges without being explicitly programmed.
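One way to probe this decay is to fix the vector content and vary only the distance. The sketch below makes an assumption the text does not: Q and K are the same all-ones vector. If Q and K were independent zero-mean random vectors, their expected dot product would be zero at every distance. The helper name, head_dim=64, and the base of 10000 are also illustrative.

```python
import numpy as np

def apply_rope(x, position, base=10000.0):
    # Hypothetical helper: rotate pairs (x[0], x[1]), (x[2], x[3]), ...
    # by position * theta_i, with theta_i = base^(-2i / head_dim) (assumed schedule).
    theta = base ** (-2.0 * np.arange(x.shape[-1] // 2) / x.shape[-1])
    cos, sin = np.cos(position * theta), np.sin(position * theta)
    out = np.empty_like(x, dtype=float)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

head_dim = 64
v = np.ones(head_dim)             # assumption: identical content for Q and K
distances = np.arange(0, 256)

# Score between the same vector placed at position r and at position 0.
scores = np.array([apply_rope(v, r) @ apply_rope(v, 0) for r in distances])
```

Under these assumptions the score at distance r equals 2 · Σ cos(r · θ_i). It peaks at r = 0, where every pair is aligned, and can never exceed that peak, since rotation preserves length. Plotting `scores` against `distances` shows how the pairs drift out of phase as the distance grows.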