🔤 Byte-Pair Encoding (BPE)

Watch how BPE builds a subword vocabulary by merging frequent pairs

Vocab Size: 26
Merges Done: 0

💡 How BPE Works

  1. Split every word into individual characters
  2. Count how often each pair of adjacent symbols occurs
  3. Merge the most frequent pair into a single token
  4. Add the merged token to the vocabulary
  5. Repeat steps 2-4 until the vocabulary reaches the desired size
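
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the function names (`get_pair_counts`, `merge_pair`, `train_bpe`), the toy word frequencies, and the tie-breaking rule (alphabetically smallest pair wins) are all assumptions made for the example. The starting vocabulary here is just the characters that appear in the toy corpus.

```python
from collections import Counter


def get_pair_counts(corpus):
    """Step 2: count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in corpus.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge_pair(corpus, pair):
    """Step 3: replace every occurrence of `pair` with one merged symbol."""
    merged = pair[0] + pair[1]
    new_corpus = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        new_corpus[key] = new_corpus.get(key, 0) + freq
    return new_corpus


def train_bpe(word_freqs, target_vocab_size):
    """Run BPE merges until the vocabulary reaches `target_vocab_size`."""
    # Step 1: split each word into characters.
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    vocab = sorted({ch for symbols in corpus for ch in symbols})
    merges = []
    while len(vocab) < target_vocab_size:
        counts = get_pair_counts(corpus)
        if not counts:  # every word is already a single token
            break
        # Assumed tie-break: highest count, then alphabetically smallest pair.
        best = min(counts, key=lambda p: (-counts[p], p))
        corpus = merge_pair(corpus, best)
        merges.append(best)
        token = best[0] + best[1]
        if token not in vocab:  # Step 4: add the merged token
            vocab.append(token)
    return vocab, merges, corpus


# Hypothetical toy corpus: word -> number of occurrences.
word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
vocab, merges, corpus = train_bpe(word_freqs, target_vocab_size=13)
print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

With this toy corpus there are 10 distinct characters, so a target size of 13 allows exactly three merges. A different tie-breaking rule would change which pair wins when two pairs are equally frequent, and therefore the resulting merges.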