Token
A token is the smallest unit of text a language model processes. Most modern models use subword tokenization, where common words map to single tokens and rare words split into multiple subtokens. As a rough rule, 1 token ≈ 0.75 English words ≈ 4 characters.
Tokens matter because models are billed (in cloud APIs) and constrained (in context window) per-token, not per-word. A 100K-token context fits roughly 75,000 English words. Code typically tokenizes denser than prose due to whitespace and operators.
Different model families use different tokenizers: GPT models use BPE; Llama uses SentencePiece; Mistral Nemo introduced the new Tekken tokenizer. Switching tokenizers between training and inference produces gibberish, which is why tools like llama.cpp ship with the model's tokenizer baked in.