🔤 Tokenizer Visualization
See how text is split into tokens for language models
📝 Enter Text
Hello, how are you doing today? I'm learning about tokenization!
GPT-2 (BPE)
Character
Word
🧩 Tokens
Word start
Continuation
Punctuation
Space/Special
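The legend above colors tokens by category. A minimal sketch of how a visualizer might classify a decoded GPT-2-style token (the heuristics here are illustrative, not the page's actual logic; GPT-2's BPE encodes a word-initial space inside the token itself):

```python
def classify_token(token: str) -> str:
    """Rough display classification of a decoded BPE token.
    Category names mirror the legend; rules are simplified."""
    if not token.strip():
        return "Space/Special"        # whitespace-only or empty token
    if all(not c.isalnum() for c in token.strip()):
        return "Punctuation"          # no letters or digits at all
    if token.startswith(" "):
        return "Word start"           # GPT-2 BPE keeps the leading space
    return "Continuation"             # mid-word subword piece

print(classify_token(" Hello"))  # Word start
print(classify_token("ing"))     # Continuation
print(classify_token("!"))       # Punctuation
```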
🔢 Token IDs
[...]
0
Characters
0
Tokens
0
Chars/Token
Fun fact:
GPT-2 and GPT-3 use a vocabulary of about 50,000 tokens; GPT-4's tokenizer uses about 100,000. On average, 1 token ≈ 4 characters, or about ¾ of a word in English.
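The Character and Word modes above, and the chars/token statistic, can be sketched in a few lines. This is a simplified word splitter (words and standalone punctuation via a regex), not the page's actual implementation:

```python
import re

text = "Hello, how are you doing today? I'm learning about tokenization!"

# Character tokenizer: every character is its own token.
char_tokens = list(text)

# Word tokenizer: runs of word characters, or single punctuation marks.
word_tokens = re.findall(r"\w+|[^\w\s]", text)

# Chars/Token statistic shown in the stats row.
for name, toks in [("Character", char_tokens), ("Word", word_tokens)]:
    print(f"{name}: {len(toks)} tokens, {len(text) / len(toks):.2f} chars/token")
```

Note that the word tokenizer lands near the ~4 chars/token rule of thumb, while the character tokenizer is exactly 1 by construction; BPE typically falls between the two.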