🔄 BERT vs GPT

Compare bidirectional (BERT) vs autoregressive (GPT) attention patterns

Click "Next Step" to see how each model processes the sentence token by token
🔶 BERT
Bidirectional Encoder Representations from Transformers

🎯 Training Task: Masked Language Modeling (MLM)

Predict the [MASK] token using context from both sides

📋 Best For:

  • Text classification
  • Named entity recognition
  • Question answering
  • Sentence similarity
Encoder-Only Bidirectional Understanding
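
For a concrete feel for masked language modeling, here is a minimal sketch using the Hugging Face transformers fill-mask pipeline with the bert-base-uncased checkpoint (both are assumptions for illustration, not part of this demo). BERT can fill the blank only because it attends to the words on both sides of it.

```python
# Minimal sketch (assumes: pip install transformers torch)
from transformers import pipeline

# BERT fills in [MASK] by attending to context on BOTH sides of the blank
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill_mask("The cat sat on the [MASK] and purred."):
    # each prediction carries a candidate token and its probability
    print(f"{pred['token_str']:>10}  p={pred['score']:.3f}")
```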
🟪 GPT
Generative Pre-trained Transformer

🎯 Training Task: Next Token Prediction

Predict the next token using only previous context (left-to-right)

📋 Best For:

  • Text generation
  • Code completion
  • Reasoning & chatbots
  • Creative writing
Decoder-Only Autoregressive Generation
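
The counterpart for GPT, again a sketch assuming the transformers library and the small gpt2 checkpoint: the model extends a prompt one token at a time, with each step conditioned only on the tokens to its left.

```python
# Minimal sketch (assumes: pip install transformers torch)
from transformers import pipeline

# GPT-style generation: predict the next token from the previous context only
generator = pipeline("text-generation", model="gpt2")

out = generator("The cat sat on the", max_new_tokens=10, num_return_sequences=1)
print(out[0]["generated_text"])
```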
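
To make the attention-pattern difference explicit, here is a small NumPy sketch (illustrative only, with an assumed six-token sentence): BERT's bidirectional mask lets every position attend to every other, while GPT's causal mask restricts each position to itself and the positions before it.

```python
import numpy as np

tokens = ["The", "cat", "sat", "on", "the", "mat"]
n = len(tokens)

# BERT (bidirectional): every token may attend to every other token
bert_mask = np.ones((n, n), dtype=int)

# GPT (causal/autoregressive): token i may only attend to tokens 0..i
gpt_mask = np.tril(np.ones((n, n), dtype=int))

print("BERT mask (all positions visible):\n", bert_mask)
print("GPT mask (lower triangle only):\n", gpt_mask)
```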