Deterministic Decoding
Deterministic decoding means same prompt → same output, every time. It is typically achieved by setting temperature to 0 (greedy decoding: always pick the highest-probability token) and pinning the random seed for any step that still draws random numbers, such as tiebreaks.
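As a minimal sketch of what "always pick the highest-probability token" amounts to, the snippet below picks the largest logit and resolves exact ties by taking the lowest index, so no random number is needed. The function name greedy_token and the sample logits are hypothetical, chosen only for illustration; real runtimes may break ties differently.

```python
def greedy_token(logits):
    """Return the index of the largest logit; exact ties go to the lowest index.

    Hypothetical helper for illustration, not any runtime's actual sampler.
    """
    if not logits:
        raise ValueError("logits must not be empty")
    best_index = 0
    for index, value in enumerate(logits):
        # Strict '>' keeps the earliest index when two logits are equal.
        if value > logits[best_index]:
            best_index = index
    return best_index


# Made-up logits with an exact tie between positions 1 and 2.
sample_logits = [1.5, 3.2, 3.2, -0.7]
print(greedy_token(sample_logits))
```

With a fixed tie rule like this, the choice depends only on the logits themselves, which is why any run-to-run difference in the logits matters.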
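It sounds simple, but it isn't. Even at temperature 0, GPU floating-point arithmetic is non-associative, so the same computation can produce slightly different logits across runs (especially when batch size varies), and that can flip the choice between near-tied tokens. Once one token flips, everything generated after it can diverge. True bit-exact reproducibility requires a single batch, deterministic kernels (for example, the CUBLAS_WORKSPACE_CONFIG environment variable for cuBLAS and cuDNN's deterministic mode), and a pinned seed everywhere.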
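The non-associativity effect can be reproduced on a CPU with ordinary Python floats. The sketch below adds the same three numbers in two different orders and shows that the tiny difference is enough to change which of two near-tied "logits" wins. The values and the helper argmax_first are illustrative assumptions, not output from a real model.

```python
def argmax_first(values):
    """Index of the maximum value; on an exact tie the first index wins."""
    return max(range(len(values)), key=lambda i: values[i])


# Same three terms, two summation orders, as a parallel reduction might use.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# A rival token whose logit sits right at the boundary (illustrative value).
rival_logit = 0.6

pick_a = argmax_first([rival_logit, left_to_right])
pick_b = argmax_first([rival_logit, right_to_left])

print(left_to_right == right_to_left)  # the two sums are not bit-identical
print(pick_a, pick_b)                  # the selected token index differs
```

On a GPU the summation order is decided by the kernel and can change with batch size, which is how identical settings can still yield different tokens.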
For local AI evaluation, "deterministic enough" usually means temperature 0 + single batch + same hardware/runtime version. Cross-runtime reproducibility (llama.cpp ↔ vLLM) is essentially never bit-exact even with identical sampling settings.
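One way to check whether a setup is "deterministic enough" is to run the same prompt several times and compare hashes of the outputs. The sketch below assumes a hypothetical generate(prompt) callable that wraps whatever local runtime is in use and returns a string; stub_generate merely stands in for it here.

```python
import hashlib


def output_digests(generate, prompt, runs=3):
    """Call generate(prompt) `runs` times; return the set of SHA-256 digests."""
    return {
        hashlib.sha256(generate(prompt).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }


def is_reproducible(generate, prompt, runs=3):
    """True if every run produced byte-identical text."""
    return len(output_digests(generate, prompt, runs)) == 1


def stub_generate(prompt):
    # Hypothetical stand-in for a real model call at temperature 0.
    return prompt.upper()


print(is_reproducible(stub_generate, "hello world"))
```

Comparing digests rather than raw strings keeps logs short, and a set with more than one digest shows how many distinct outputs appeared.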
Reviewed by Fredoline Eruo. See our editorial policy.