📝 Input Tokens
GLM-5 Routing Steps:
1. Sigmoid(logits): Scores experts independently [0,1].
2. + Bias: Adds load-balancing bias for selection.
3. Top-K: Selects top 2 of 8 experts (in real GLM-5: 8 of 256).
4. Weights: Uses original sigmoid scores (no bias) to weight outputs.
5. Shared Expert: ALWAYS added to the final output.
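The five steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not GLM-5's actual implementation: the function name `route`, its parameters, and the use of scalar expert outputs are all assumptions made for the example. The source does not say whether the selected weights are renormalized, so the sketch leaves them as raw sigmoid scores.

```python
import math


def sigmoid(x):
    """Logistic function, mapping a logit to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def route(logits, bias, expert_outputs, shared_output, k=2):
    """Hypothetical sketch of the routing steps for a single token.

    logits         -- one router logit per routed expert
    bias           -- one load-balancing bias per routed expert
    expert_outputs -- one (scalar, for simplicity) output per routed expert
    shared_output  -- output of the shared expert
    """
    # Step 1: score every expert independently with a sigmoid.
    scores = [sigmoid(z) for z in logits]

    # Step 2: add the load-balancing bias; used for selection only.
    biased = [s + b for s, b in zip(scores, bias)]

    # Step 3: pick the k experts with the highest biased score.
    order = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)
    top = order[:k]

    # Step 4: weight each selected expert by its original (unbiased) score.
    # Assumption: no renormalization of the selected weights.
    weights = {i: scores[i] for i in top}

    # Step 5: the shared expert is always added to the final output.
    output = shared_output + sum(weights[i] * expert_outputs[i] for i in top)
    return top, weights, output


# Toy setup from the diagram: 8 routed experts, top 2 selected.
logits = [2.0, 1.0, 0.0, -1.0, 0.5, -0.5, 1.5, -2.0]
bias = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
top, weights, output = route(logits, bias, [1.0] * 8, shared_output=10.0)
print(top, weights, output)
```

In this toy run the bias lifts expert 2 into the selected set ahead of experts with higher raw scores, yet its contribution is still weighted by its raw sigmoid score of 0.5. That separation between selection and weighting is the point of steps 2 and 4.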
[Figure: Routed Experts (Top-2 Selected). Panel: Sigmoid Router Scores. Legend: Raw Sigmoid Score; With Bias (Used for Selection).]