BERT vs GPT
A comparison of bidirectional (BERT) and autoregressive (GPT) attention patterns, and how each model processes a sentence token by token.
BERT: Encoder-Only, Bidirectional (Understanding)
Training Task: Masked Language Modeling (MLM)
Predict the [MASK] token using context from both sides; a code sketch follows the list below.
Best For:
- Text classification
- Named entity recognition
- Question answering
- Sentence similarity
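To see MLM in practice, here is a minimal sketch assuming the Hugging Face transformers library is installed; bert-base-uncased is the standard public checkpoint.

```python
from transformers import pipeline

# Fill-mask pipeline: BERT predicts the [MASK] token using context
# from both sides of the mask.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill_mask("The cat sat on the [MASK]."):
    print(f"{pred['token_str']:>8}  score={pred['score']:.3f}")
```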
GPT: Decoder-Only, Autoregressive (Generation)
Training Task: Next-Token Prediction
Predict the next token using only the preceding (left-to-right) context; a code sketch follows the list below.
Best For:
- Text generation
- Code completion
- Reasoning & chatbots
- Creative writing
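And next-token prediction in practice, again a sketch assuming transformers is installed and the public gpt2 checkpoint.

```python
from transformers import pipeline

# Text-generation pipeline: GPT-2 extends the prompt one token at a
# time, attending only to tokens to its left.
generator = pipeline("text-generation", model="gpt2")

out = generator("The cat sat on the", max_new_tokens=5)
print(out[0]["generated_text"])
```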