Natural language processing

N-gram

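An n-gram is a contiguous sequence of n items (usually tokens or characters) from a text. In local AI, n-grams appear in two places: classical n-gram language models, which predict the next token from the previous n-1 tokens, and subword tokenizers, which build their vocabularies from counts of adjacent symbols. Transformer LLMs do not use a fixed n; they attend over the whole context window. For operators, the practical effect comes through the tokenizer: a vocabulary with longer merged pieces encodes the same text in fewer tokens, which reduces context window usage and generation time, at the cost of a larger embedding table.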
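As a minimal sketch of the definition above, the following Python extracts n-grams from a token list and builds a toy count-based model that predicts the next token from the previous n-1 tokens. The function names and the sample sentence are illustrative, not taken from any library.

```python
from collections import Counter, defaultdict

def ngrams(items, n):
    """Return all contiguous n-item windows of `items` as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

def train_ngram_counts(tokens, n):
    """Map each (n-1)-token context to a Counter of the tokens that followed it."""
    counts = defaultdict(Counter)
    for gram in ngrams(tokens, n):
        counts[gram[:-1]][gram[-1]] += 1
    return counts

def predict_next(counts, context):
    """Return the most frequent continuation of `context`, or None if unseen."""
    followers = counts.get(tuple(context))
    return followers.most_common(1)[0][0] if followers else None

# Toy corpus, split on whitespace (a real system would use a proper tokenizer).
tokens = "the cat sat on the mat and the cat slept".split()
bigram_counts = train_ngram_counts(tokens, 2)

print(ngrams(tokens, 3)[:2])
print(predict_next(bigram_counts, ["the"]))
```

A context that never occurred in the training text returns None here; practical n-gram models handle that case with smoothing or backoff.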

Deeper dive

N-grams are foundational to statistical language models and remain relevant to neural models through tokenization. A unigram model (n=1) treats each token independently; a bigram model (n=2) conditions on the previous token; a trigram model (n=3) conditions on the previous two. Modern LLMs use transformer architectures with attention over the full context, but subword tokenizers still rely on n-gram-like statistics to build their vocabularies: BPE repeatedly merges the most frequent adjacent pair of symbols, and SentencePiece implements both BPE and a unigram language model over candidate subwords. For operators, the number of merges determines vocabulary size and compression ratio: more merges yield longer subwords and fewer tokens per word, but also a larger embedding table, which increases memory use. In practice, most LLMs use subword tokenizers of this kind (e.g., the GPT-2 tokenizer uses byte-level byte-pair encoding, with merges learned from pair frequencies).
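To make the merge idea concrete, here is a toy sketch of BPE training on a four-word corpus with made-up frequencies. It follows the textbook procedure (count adjacent pairs, merge the most frequent, repeat) and is not the exact pipeline of GPT-2 or Llama, which work on bytes and pre-split the text first. All names and the corpus are hypothetical.

```python
from collections import Counter

def count_pairs(words):
    """Count adjacent symbol pairs across a {symbol_tuple: frequency} corpus."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_merges(word_freqs, num_merges):
    """Start from characters and apply `num_merges` greedy pair merges."""
    words = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

# Hypothetical word frequencies, for illustration only.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, words = learn_merges(corpus, 3)
print(merges)
print(words)
```

Each merge adds one entry to the vocabulary and shortens every word that contains the merged pair, which is the vocabulary-size versus tokens-per-word trade-off described above.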

Practical example

When running Llama 3.1 8B via llama.cpp, the tokenizer uses a BPE model with a vocabulary of about 128k tokens learned from pair-frequency statistics. A word like 'unbelievable' might be split into ['un', 'belie', 'vable'] (3 tokens), whereas a character-level tokenizer would use 12 tokens. Fewer tokens for the same text mean fewer decoding steps and a smaller KV cache, so generation is faster and the text takes up less of the context window and less VRAM.
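The arithmetic behind that comparison can be sketched in a few lines. The default parameters below (32 layers, 8 key/value heads, head dimension 128, 16-bit cache values) are assumptions based on the published Llama 3.1 8B configuration; a quantized KV cache or a different model would change the numbers, and the three-piece split is the illustrative one from the text, not real tokenizer output.

```python
def kv_cache_bytes(num_tokens, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate KV-cache size: one key and one value vector per layer, per KV head, per token.

    Defaults are assumed Llama 3.1 8B values with a 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * num_tokens

word = "unbelievable"
char_tokens = len(word)                         # character-level: one token per character
subword_tokens = len(["un", "belie", "vable"])  # illustrative subword split

for label, count in [("character-level", char_tokens), ("subword", subword_tokens)]:
    kib = kv_cache_bytes(count) / 1024
    print(f"{label}: {count} tokens, about {kib:.0f} KiB of KV cache")
```

Under these assumptions each token costs 128 KiB of cache, so a 4x reduction in token count gives a 4x reduction in cache for the same text.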

Workflow example

In LM Studio, when you load a model, the tokenizer's vocabulary and merge rules are loaded together with the model. You can inspect tokenization by typing text in the chat box and watching the token count; for example, 'Hello world' might be 2 tokens. Unexpected splits come from the merge rules learned when the tokenizer was trained. In Ollama, you can run ollama run llama3.1:8b and then enter /show info to see the model's details, such as architecture, parameter count, context length, and quantization.

Reviewed by Fredoline Eruo. See our editorial policy.
