🔄 BERT vs GPT

Compare bidirectional (BERT) vs autoregressive (GPT) attention patterns

Click "Next Step" to see how each model processes the sentence token by token
🔶 BERT
Bidirectional Encoder Representations from Transformers

🎯 Training Task: Masked Language Modeling (MLM)

Predict the [MASK] token using context from both sides

📋 Best For:

  • Text classification
  • Named entity recognition
  • Question answering
  • Sentence similarity
Encoder-Only Bidirectional Understanding
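
For a concrete feel for masked language modeling, here is a minimal sketch using the Hugging Face transformers fill-mask pipeline with the bert-base-uncased checkpoint (both are assumptions for illustration, not part of this demo). BERT can fill the blank only because it attends to the words on both sides of it.

```python
# Minimal sketch (assumes: pip install transformers torch)
from transformers import pipeline

# BERT fills in [MASK] by attending to context on BOTH sides of the blank
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill_mask("The cat sat on the [MASK] and purred."):
    # each prediction carries a candidate token and its probability
    print(f"{pred['token_str']:>10}  p={pred['score']:.3f}")
```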
🟪 GPT
Generative Pre-trained Transformer

🎯 Training Task: Next Token Prediction

Predict the next token using only previous context (left-to-right)

📋 Best For:

  • Text generation
  • Code completion
  • Reasoning & chatbots
  • Creative writing
Decoder-Only Autoregressive Generation
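
The counterpart for GPT, again a sketch assuming the transformers library and the small gpt2 checkpoint: the model extends a prompt one token at a time, with each step conditioned only on the tokens to its left.

```python
# Minimal sketch (assumes: pip install transformers torch)
from transformers import pipeline

# GPT-style generation: predict the next token from the previous context only
generator = pipeline("text-generation", model="gpt2")

out = generator("The cat sat on the", max_new_tokens=10, num_return_sequences=1)
print(out[0]["generated_text"])
```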
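
To make the attention-pattern difference explicit, here is a small NumPy sketch (illustrative only, with an assumed six-token sentence): BERT's bidirectional mask lets every position attend to every other, while GPT's causal mask restricts each position to itself and the positions before it.

```python
import numpy as np

tokens = ["The", "cat", "sat", "on", "the", "mat"]
n = len(tokens)

# BERT (bidirectional): every token may attend to every other token
bert_mask = np.ones((n, n), dtype=int)

# GPT (causal/autoregressive): token i may only attend to tokens 0..i
gpt_mask = np.tril(np.ones((n, n), dtype=int))

print("BERT mask (all positions visible):\n", bert_mask)
print("GPT mask (lower triangle only):\n", gpt_mask)
```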