Data & datasets

One-Hot Encoding

One-hot encoding converts categorical data (e.g., token IDs) into binary vectors in which exactly one element is 'hot' (1) and all others are 'cold' (0). In local AI, tokenizers map text to integer IDs, and one-hot encoding is the conceptual step between those IDs and the embedding lookup. For a vocabulary of size V, each token becomes a V-dimensional vector with a single 1 at the token's index. This representation is sparse and memory-intensive, so embedding layers skip it in practice: they look up the token's dense vector directly, which gives the same result as multiplying the one-hot vector by the embedding matrix.
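As a minimal sketch of the definition above, the following pure-Python snippet builds one-hot vectors for a tiny vocabulary. The vocabulary, the token_to_id mapping, and the one_hot helper are all hypothetical and exist only for illustration; a real tokenizer has tens of thousands of entries.

```python
def one_hot(index: int, size: int) -> list[int]:
    """Return a list of `size` zeros with a single 1 at position `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} is outside the range 0..{size - 1}")
    vector = [0] * size
    vector[index] = 1
    return vector


# Hypothetical toy vocabulary (V = 5), not taken from any real tokenizer.
vocab = ["<pad>", "the", "cat", "sat", "mat"]
token_to_id = {token: i for i, token in enumerate(vocab)}

# Encode a short token sequence: one V-dimensional vector per token.
tokens = ["the", "cat", "sat"]
encoded = [one_hot(token_to_id[t], len(vocab)) for t in tokens]

for token, row in zip(tokens, encoded):
    print(f"{token:>4} -> {row}")
```

Each row has length V and sums to 1, which is why the representation grows linearly with vocabulary size even though it carries no more information than the integer ID.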

Deeper dive

One-hot encoding is rarely materialized in modern LLM inference. Instead, token IDs are used directly as indices into an embedding matrix. For example, with a vocabulary of 32,000 tokens, a one-hot vector would hold 32,000 floats (128 KB at FP32) per token, or roughly 524 MB for a 4096-token context, nearly all of it zeros. The embedding layer performs a gather operation: it selects the row of the embedding matrix corresponding to the token ID. This is mathematically equivalent to multiplying the one-hot vector by the embedding matrix, but it avoids both storing the one-hot vector and performing the multiplication. Operators may encounter one-hot encoding in text preprocessing for custom models (e.g., bag-of-words classifiers) or when reasoning about what tokenizer outputs represent, but not in transformer inference pipelines.
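The equivalence between the gather and the one-hot multiplication can be checked numerically. The sketch below assumes NumPy is installed and uses toy sizes (a 10-token vocabulary and 4-dimensional embeddings) with random weights; none of these values come from a real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen for illustration only.
vocab_size, embed_dim = 10, 4
embedding = rng.standard_normal((vocab_size, embed_dim)).astype(np.float32)
token_ids = np.array([3, 7, 3, 0])

# Path 1: materialize the one-hot matrix, then multiply by the embedding matrix.
one_hot = np.eye(vocab_size, dtype=np.float32)[token_ids]  # shape (4, 10)
via_matmul = one_hot @ embedding                            # shape (4, 4)

# Path 2: gather the rows directly, as an embedding layer does.
via_gather = embedding[token_ids]                           # shape (4, 4)

print("same result:", np.allclose(via_matmul, via_gather))
print("one-hot bytes:", one_hot.nbytes, "| ID bytes:", token_ids.nbytes)
```

Both paths select the same rows; the gather simply never allocates the (tokens × vocabulary) intermediate.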

Practical example

Consider Llama 3.1's tokenizer, whose vocabulary has 128,256 tokens (rounded to 128,000 here to keep the arithmetic simple). A one-hot vector for token ID 42 would be a 128,000-element vector with a 1 at position 42. Storing this for a single token at FP32 takes 128,000 × 4 bytes = 512 KB. For a 4096-token context, that's about 2.1 GB of one-hot vectors, a large share of a consumer GPU's VRAM spent almost entirely on zeros. The embedding layer avoids this by using a lookup table: the token ID directly indexes into a 128,000 × 4096 matrix (4096 being the embedding dimension of the 8B model), so each token costs only a few bytes for the ID itself (4 bytes as a 32-bit integer).
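The arithmetic above can be reproduced in a few lines. This is a back-of-the-envelope sketch: the helper names are made up for illustration, the vocabulary size is the rounded 128,000 figure used in the text, and IDs are assumed to be stored as 32-bit integers.

```python
def one_hot_bytes(vocab_size: int, num_tokens: int, bytes_per_element: int = 4) -> int:
    """Bytes needed to materialize one-hot vectors (FP32 by default)."""
    return vocab_size * num_tokens * bytes_per_element


def id_bytes(num_tokens: int, bytes_per_id: int = 4) -> int:
    """Bytes needed to store the integer token IDs alone (int32 assumed)."""
    return num_tokens * bytes_per_id


VOCAB_SIZE = 128_000   # rounded figure from the text, not the exact vocabulary size
CONTEXT_LENGTH = 4096

per_token = one_hot_bytes(VOCAB_SIZE, 1)
full_context = one_hot_bytes(VOCAB_SIZE, CONTEXT_LENGTH)
ids_only = id_bytes(CONTEXT_LENGTH)

print(f"one-hot, 1 token:      {per_token / 1e3:.0f} KB")
print(f"one-hot, full context: {full_context / 1e9:.2f} GB")
print(f"token IDs only:        {ids_only / 1e3:.1f} KB")
```

Under these assumptions the IDs take about 16 KB for the whole context, several orders of magnitude less than the one-hot form.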

Workflow example

When you run ollama run llama3.1:8b and type a prompt, the runtime tokenizes the text into integer IDs. These IDs are fed to the embedding layer, which performs a gather operation rather than one-hot encoding. You'll never see one-hot vectors in the output. However, if you use Hugging Face Transformers' BertTokenizer with return_tensors="pt" and inspect the resulting input_ids tensor, you can manually create one-hot vectors with torch.nn.functional.one_hot(input_ids, num_classes=vocab_size), where vocab_size is the tokenizer's vocabulary size. This is useful for educational purposes only; it is not part of the inference pipeline.
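A minimal sketch of that manual experiment follows, assuming PyTorch is installed. To stay self-contained it does not download a tokenizer: the input_ids tensor holds placeholder values standing in for a tokenizer's output, vocab_size is set to 30,522 (the vocabulary size of bert-base-uncased), and the 8-dimensional embedding layer is a randomly initialized stand-in, not a real model's weights.

```python
import torch
import torch.nn.functional as F

# Assumed vocabulary size (bert-base-uncased); with a real tokenizer you would
# read this from tokenizer.vocab_size instead.
vocab_size = 30522

# Placeholder token IDs with shape (batch=1, sequence=4). one_hot expects an
# integer (int64) tensor, which torch.tensor produces for Python ints.
input_ids = torch.tensor([[101, 7592, 2088, 102]])

# Materialize the one-hot vectors: shape (1, 4, vocab_size), dtype int64.
one_hot = F.one_hot(input_ids, num_classes=vocab_size)
print(one_hot.shape, one_hot.dtype)

# Stand-in embedding layer with random weights, for comparison only.
embedding = torch.nn.Embedding(vocab_size, 8)

with torch.no_grad():
    via_lookup = embedding(input_ids)                  # gather, as used in inference
    via_matmul = one_hot.float() @ embedding.weight    # explicit one-hot product

print(torch.allclose(via_lookup, via_matmul))
```

The one-hot tensor here already holds over 120,000 elements for four tokens, which illustrates why inference runtimes pass the IDs straight to the lookup.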

Reviewed by Fredoline Eruo. See our editorial policy.