Logits
Logits are the raw, unnormalized scores output by the final linear layer of a transformer model, before the softmax function converts them into probabilities. Each token in the vocabulary gets a logit value, and higher logits indicate higher likelihood. Operators encounter logits when adjusting sampling parameters like temperature or top-k, which directly modify logits to control generation randomness.
Deeper dive
In transformer language models, the final layer is typically a linear projection that maps the hidden state to a vector with one entry per vocabulary token. The entries of this vector are the logits. The softmax function exponentiates the logits and normalizes them so they sum to 1, producing a probability distribution over tokens. Temperature scaling divides every logit by a positive temperature value before softmax: a temperature below 1 sharpens the distribution (more deterministic), a temperature above 1 flattens it (more random), and a temperature of exactly 1 leaves it unchanged. Top-k sampling keeps only the k highest logits and sets the rest to negative infinity, so those tokens receive zero probability after softmax. Top-p (nucleus) sampling selects the smallest set of tokens whose cumulative probability exceeds p; because it works on probabilities, it is computed from the softmax of the logits. Since all of these controls act on logits or on the probabilities derived from them, understanding logits is key to controlling output diversity and coherence.
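The operations described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used by any particular runtime: the function names and the four-element logit list are made up for the example, real vocabularies hold tens of thousands of tokens, and the order in which a runtime applies these filters varies. The top-p cutoff here stops once the cumulative probability reaches or exceeds p, which is an assumption about tie handling.

```python
import math

NEG_INF = float("-inf")


def softmax(logits):
    """Turn logits into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_temperature(logits, temperature):
    """Divide every logit by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [x / temperature for x in logits]


def top_k_filter(logits, k):
    """Keep the k highest logits and set the rest to negative infinity."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(ranked[:k])
    return [x if i in keep else NEG_INF for i, x in enumerate(logits)]


def top_p_filter(logits, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    probs = softmax(logits)
    ranked = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in ranked:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:  # assumption: stop at "reaches or exceeds p"
            break
    return [x if i in keep else NEG_INF for i, x in enumerate(logits)]


# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

print("baseline:   ", softmax(logits))
print("temp 0.5:   ", softmax(apply_temperature(logits, 0.5)))
print("temp 2.0:   ", softmax(apply_temperature(logits, 2.0)))
print("top-k (k=2):", softmax(top_k_filter(logits, 2)))
print("top-p (0.8):", softmax(top_p_filter(logits, 0.8)))
```

Masking with negative infinity rather than deleting entries keeps the list the same length as the vocabulary, so token indices still line up after filtering.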
Practical example
When running Llama 3.1 8B via llama.cpp with --temp 0.7, the runtime divides each logit by 0.7 before softmax. Because 0.7 is less than 1, this widens the gaps between logits and reduces randomness. With --top-k 40, only the 40 highest logits survive. A logit of 10.0 for token 'A' vs 9.5 for 'B' means 'A' is more likely. After scaling at temperature 0.7, the gap grows from 0.5 to about 0.71, so 'A' becomes even more favored; a temperature above 1 would shrink the gap instead.
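To make the arithmetic concrete, here is a small sketch that treats 'A' and 'B' as the only two candidate tokens, which is a simplification (a real model spreads probability over the whole vocabulary). The helper name two_token_prob is hypothetical and is not part of llama.cpp.

```python
import math


def two_token_prob(logit_a, logit_b, temperature=1.0):
    """Probability of token A when only A and B compete (two-way softmax)."""
    gap = (logit_a - logit_b) / temperature
    return 1.0 / (1.0 + math.exp(-gap))


# Logits from the example above: 'A' = 10.0, 'B' = 9.5.
for t in (0.7, 1.0, 1.5):
    gap = (10.0 - 9.5) / t
    print(f"T={t}: scaled gap={gap:.3f}, P(A)={two_token_prob(10.0, 9.5, t):.3f}")
```

Under this two-token assumption, 'A' goes from roughly 62% at temperature 1.0 to roughly 67% at temperature 0.7, and drops to roughly 58% at temperature 1.5.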
Workflow example
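In Ollama, you set temperature in the Modelfile: PARAMETER temperature 0.8. In LM Studio, the slider for temperature adjusts logit scaling. In Hugging Face Transformers, you access logits via outputs.logits from the model's forward pass; for a causal language model this is a tensor of shape (batch size, sequence length, vocabulary size), and the logits for the next token are those at the last sequence position. In vLLM, sampling parameters like temperature and top_p are set on a SamplingParams object that is passed to the generate call, where they are applied to the logits internally.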
Reviewed by Fredoline Eruo. See our editorial policy.