Corpus (Tokenized Words)
Current Pair Frequencies
Vocabulary
Vocab Size: 26
Merges Done: 0
How BPE Works
- Split every word in the corpus into individual characters
- Count how often each pair of adjacent tokens occurs
- Merge the most frequent pair into a single token
- Add the merged token to the vocabulary
- Repeat until the vocabulary reaches the desired size
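
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the function names (`train_bpe`, `get_pair_counts`, `merge_pair`) and the toy word frequencies are hypothetical, and the tie-breaking rule (the pair seen first wins) is an assumption, since the steps do not say how ties are resolved.

```python
from collections import Counter


def get_pair_counts(corpus):
    """Count adjacent token pairs, weighted by how often each word occurs."""
    counts = Counter()
    for tokens, freq in corpus.items():
        for pair in zip(tokens, tokens[1:]):
            counts[pair] += freq
    return counts


def merge_pair(corpus, pair):
    """Return a new corpus with every occurrence of `pair` fused into one token."""
    merged = pair[0] + pair[1]
    new_corpus = {}
    for tokens, freq in corpus.items():
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        key = tuple(out)
        new_corpus[key] = new_corpus.get(key, 0) + freq
    return new_corpus


def train_bpe(word_freqs, target_vocab_size):
    """Run BPE merges until the vocabulary reaches `target_vocab_size`."""
    # Step 1: split every word into characters.
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    vocab = sorted({ch for tokens in corpus for ch in tokens})
    merges = []
    while len(vocab) < target_vocab_size:
        # Step 2: count adjacent pairs.
        counts = get_pair_counts(corpus)
        if not counts:
            break  # every word is a single token; nothing left to merge
        # Step 3: pick the most frequent pair (assumption: first-seen pair wins ties).
        best = max(counts, key=counts.get)
        corpus = merge_pair(corpus, best)
        merges.append(best)
        # Step 4: add the merged token to the vocabulary.
        merged = best[0] + best[1]
        if merged not in vocab:
            vocab.append(merged)
    return vocab, merges, corpus


# Hypothetical toy corpus: word -> number of occurrences.
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
vocab, merges, corpus = train_bpe(toy_freqs, target_vocab_size=12)
print(merges)
```

With this toy corpus the ten starting characters grow to twelve tokens after two merges: first `("e", "s")`, then `("es", "t")`, so `newest` ends up tokenized as `n`, `e`, `w`, `est`. Keeping the corpus as a mapping from token tuples to counts means each distinct word is processed once per merge rather than once per occurrence.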
Characters
Merged Token