How to set temperature to zero for deterministic responses
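Prerequisites
Ollama installed and running locally, with the llama3.2 model pulled. The commands below also use curl and jq.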
What this does
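Setting temperature to 0 makes the model pick the most probable next token at every step (greedy decoding) instead of sampling from the distribution. On the same machine, model, and runtime version, this gives repeatable, consistent output, which suits factual queries, code generation, and testing.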
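A short worked equation may make the effect concrete. This is the standard temperature-scaled softmax, not something specific to Ollama; the symbols z_i (the logit of token i) and T (the temperature) are notation introduced here for illustration, and the limit assumes a single highest logit.

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)},
\qquad
\lim_{T \to 0^{+}} p_i(T) =
\begin{cases}
1 & \text{if } i = \arg\max_j z_j \\
0 & \text{otherwise}
\end{cases}
```

For two tokens with logits 2 and 1, T = 1 gives the top token a probability of about 0.73, while T = 0.1 gives it about 0.99995. T = 0 itself would divide by zero, so runtimes typically treat it as a special case and take the argmax directly.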
Steps
1. Set temperature to 0 via the API.

    curl -s http://localhost:11434/api/generate \
      -d '{"model": "llama3.2", "prompt": "What is 2+2?", "options": {"temperature": 0}, "stream": false}' \
      | jq -r '.response'

2. Add a fixed seed for full reproducibility. Where it is supported, the seed option makes sampling repeatable even at non-zero temperatures.

    curl -s http://localhost:11434/api/generate \
      -d '{"model": "llama3.2", "prompt": "Explain gravity", "options": {"temperature": 0, "seed": 42}, "stream": false}'

3. Set it in an Ollama interactive session. Start the session, then enter the /set command at the >>> prompt.

    ollama run llama3.2
    >>> /set parameter temperature 0

4. Verify determinism by running the same prompt twice.

    # First call
    curl -s ... -d '{"options":{"temperature":0}}' | jq -r '.response' > out1.txt
    # Second call
    curl -s ... -d '{"options":{"temperature":0}}' | jq -r '.response' > out2.txt
    # Compare (on Windows, use fc instead of diff)
    diff out1.txt out2.txt
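The requests in the steps above can be wrapped in a small helper so the options are not retyped each time. This is a minimal sketch, not an official Ollama tool: OLLAMA_URL, build_payload, and ask are hypothetical names chosen for this example, and the default seed of 42 simply mirrors the value used above.

```shell
#!/bin/sh
# Sketch only: OLLAMA_URL, build_payload and ask are hypothetical names.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Usage: build_payload MODEL PROMPT [SEED]
# Prints a compact JSON request body with temperature 0 and a fixed seed.
# jq takes care of quoting, so the prompt may contain quotes or newlines.
build_payload() {
  jq -cn --arg model "$1" --arg prompt "$2" --argjson seed "${3:-42}" \
    '{model: $model, prompt: $prompt, stream: false,
      options: {temperature: 0, seed: $seed}}'
}

# Usage: ask MODEL PROMPT [SEED]
# Sends one non-streaming request and prints only the response text.
ask() {
  build_payload "$@" \
    | curl -s "$OLLAMA_URL/api/generate" -d @- \
    | jq -r '.response'
}
```

With the server running, `ask llama3.2 "What is 2+2?"` should behave like the first curl command, with the seed added.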
Verification
diff out1.txt out2.txt
# Expected: no output and exit status 0, meaning both responses are byte-for-byte identical.
# On Windows, fc out1.txt out2.txt prints "FC: no differences encountered" instead.
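Two runs can agree by chance, so it may be worth repeating the check a few more times. The sketch below runs any command several times and counts how many distinct outputs it saw; count_distinct is a hypothetical helper name, and checksums from cksum stand in for a full byte-by-byte comparison.

```shell
#!/bin/sh
# Sketch only: count_distinct is a hypothetical helper name.
# Usage: count_distinct RUNS COMMAND [ARGS...]
# Runs COMMAND that many times and prints the number of distinct outputs.
# The body is a subshell, so its variables do not leak into the caller.
count_distinct() (
  runs="$1"
  shift
  i=0
  while [ "$i" -lt "$runs" ]; do
    # One checksum line per run; identical outputs give identical lines.
    "$@" | cksum
    i=$((i + 1))
  done | sort -u | wc -l | tr -d ' '
)
```

A result of 1 means every run matched. Compare only the response text, because the full JSON reply also carries timing fields that change on every call. One way to do that is to wrap the curl and jq pipeline in a shell function and pass the function name as COMMAND.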
Common failures
- Outputs still differ at temperature 0: Some GPU kernels are non-deterministic. If the backend uses cuBLAS, setting the environment variable CUBLAS_WORKSPACE_CONFIG=:4096:8 before starting the server may help, though it is not guaranteed to remove every source of variation.
- seed parameter ignored: Not all models support seed. Check the model documentation, or use temperature=0 alone.
- Too deterministic for creative tasks: Use temperature 0 only for factual recall, code, and math. Creative tasks need a non-zero temperature.
Operator checkpoint
Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.
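The details listed above can be collected into one text record with a short script. This is a sketch under stated assumptions: checkpoint_record is a hypothetical helper name, the GPU line is filled in only when nvidia-smi is on the PATH, and the verification line assumes out1.txt and out2.txt from the steps are in the current directory.

```shell
#!/bin/sh
# Sketch only: checkpoint_record is a hypothetical helper name.
# Usage: checkpoint_record [MODEL]
checkpoint_record() (
  model="${1:-llama3.2}"
  echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "os: $(uname -srm)"
  if command -v ollama >/dev/null 2>&1; then
    echo "runtime: $(ollama --version 2>&1 | head -n 1)"
  else
    echo "runtime: ollama not found on PATH"
  fi
  echo "model: $model"
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "gpu: $(nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | head -n 1)"
  else
    echo "gpu: none detected via nvidia-smi"
  fi
  # cmp -s is silent and reports the result through its exit status.
  if [ -f out1.txt ] && [ -f out2.txt ]; then
    if cmp -s out1.txt out2.txt; then
      echo "verification: identical"
    else
      echo "verification: differ"
    fi
  else
    echo "verification: out1.txt or out2.txt not found"
  fi
)
```

Running `checkpoint_record llama3.2 >> checkpoint.txt` after each verification keeps a running log that can be compared across machines and versions.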