RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
What is Local AI — And Why It Matters

13. Temperature and Sampling

Chapter 13 of 20 · 18 min
KEY INSIGHT

Temperature is a dial from "most predictable output" to "most creative output"—understanding it helps you get the behavior you want for different tasks rather than accepting whatever the defaults produce.

Understanding Temperature

Temperature controls how much randomness goes into picking each next token.

High temperature (e.g., 0.9-1.2):

  • More creative, varied output
  • Good for brainstorming, creative writing
  • Higher chance of unexpected (sometimes wrong) responses

Low temperature (e.g., 0.1-0.3):

  • More focused, deterministic output
  • Good for factual responses, code, structured tasks
  • More consistent across runs

Temperature = 0:

  • Greedy decoding: always picks the most likely next token
  • Repeatable across runs, but often lower quality (prone to repetitive loops)

How It Works

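At each step, the model produces a probability distribution over possible next tokens. Before sampling, the model's raw scores (logits) are divided by the temperature and then normalized into probabilities. With temperature = 1, sampling uses the model's unmodified probabilities. Lower temperature sharpens the distribution, so high-probability tokens become even more likely. Higher temperature flattens it, giving low-probability tokens a better chance.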

Token probabilities (example):
"the": 0.15, "a": 0.08, "cat": 0.05, "dog": 0.04, ...

Temperature 0.1: "the" rises above 0.99 probability (assuming every unlisted token is below 0.04)
Temperature 1.0: Original distribution is kept
Temperature 2.0: Flatter but not uniform; "the" is still the most likely token, with a smaller lead
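The effect can be checked numerically. The following is a minimal sketch, not how any particular runtime implements it: the function name `apply_temperature` is hypothetical, and it assumes the four listed tokens are the whole vocabulary (renormalized to sum to 1), since the example elides the rest.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a token distribution by temperature and renormalize."""
    if temperature == 0:
        # Greedy decoding: all probability goes to the most likely token.
        best = max(probs, key=probs.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in probs}
    # Dividing logits by T is equivalent to raising probabilities to 1/T.
    scaled = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: s / total for tok, s in scaled.items()}

# Assumption: only the four listed tokens exist, renormalized to sum to 1.
listed = {"the": 0.15, "a": 0.08, "cat": 0.05, "dog": 0.04}
base = {tok: p / sum(listed.values()) for tok, p in listed.items()}

for t in (0.1, 1.0, 2.0):
    dist = apply_temperature(base, t)
    print(t, {tok: round(p, 3) for tok, p in dist.items()})
```

Running it shows "the" taking nearly all of the probability at 0.1, the unchanged distribution at 1.0, and a flatter distribution that keeps the same ranking at 2.0.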

Setting Temperature in Ollama

# Set temperature inside an interactive session
ollama run llama3.2:3b
>>> /set parameter temperature 0.9
>>> Write a poem about stars

# Or in a Modelfile (build it with: ollama create <name> -f Modelfile)
echo 'PARAMETER temperature 0.7' >> Modelfile
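Sampling parameters can also be passed per request through Ollama's local HTTP API, using the `options` field of `/api/generate`. The sketch below assumes a server on the default address `http://localhost:11434` and a model that has already been pulled; the helper names `build_payload` and `generate` are hypothetical, and the `llama3.2:3b` tag in the usage note is only an example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model, prompt, **options):
    """Build a non-streaming /api/generate request body with sampling options."""
    return {"model": model, "prompt": prompt, "stream": False, "options": options}

def generate(model, prompt, **options):
    """Send the request to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(model, prompt, **options)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("llama3.2:3b", "Write a poem about stars", temperature=0.9)` should return the poem as a string. Passing options per request avoids editing a Modelfile for one-off experiments.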

Other Sampling Parameters

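top_p (nucleus sampling): Keeps the smallest set of most likely tokens whose combined probability reaches top_p, and samples only from that set. top_p 0.9 means the least likely tokens, which together make up roughly the last 10% of probability mass, are dropped.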

ollama run llama3.2:3b
>>> /set parameter top_p 0.9
>>> Continue this story
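To make the top_p cutoff concrete, here is a minimal sketch of nucleus filtering on a small made-up distribution. The function name `top_p_filter` and the probabilities are hypothetical, chosen only for illustration; real runtimes apply this over the full vocabulary.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    kept, cumulative = {}, 0.0
    # Walk tokens from most to least likely, stopping once top_p is covered.
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving tokens sum to 1 before sampling.
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Hypothetical distribution, chosen only for illustration.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
print(top_p_filter(probs, 0.9))
```

Here "the" and "a" cover only 0.8 of the mass, so "cat" is also kept, and "dog" is dropped. The number of surviving tokens adapts to how confident the model is at each step.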

top_k: Limits sampling to the k most likely tokens. top_k 40 means only the 40 most likely tokens can be chosen, no matter how probability is spread among them.

ollama run llama3.2:3b
>>> /set parameter top_k 40
>>> Explain recursion
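A top_k filter is simpler than top_p because the cutoff is a fixed count. The sketch below uses hypothetical names (`top_k_filter`, `sample`) and a made-up distribution, and it illustrates the idea rather than any runtime's actual sampler.

```python
import random

def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}

def sample(probs, rng=random):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical distribution, chosen only for illustration.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
filtered = top_k_filter(probs, 2)
print(filtered, sample(filtered))
```

With k = 2, only "the" and "a" can ever be sampled. Unlike top_p, the count stays fixed even when the model is very unsure, which is why the two filters are often used together.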

Typical values for creative tasks: temperature 0.8-1.0, top_p 0.9-1.0
Typical values for factual/coding: temperature 0.2-0.5, top_p 0.9

Common Issues

Too high temperature:

  • Nonsensical output
  • Rambling or off-topic drift
  • Incoherence

Too low temperature:

  • Repetitive, formulaic responses
  • "Safe" but boring
  • May miss creative solutions

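Interaction with top_p: Both settings change how adventurous sampling is, so tune one at a time: fix top_p (0.9 is a common choice) and adjust temperature, or the reverse. Changing both at once makes it hard to tell which one caused a difference. Default Ollama behavior is usually fine.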

Practical Guidelines

Task                            | Recommended settings
Creative writing, brainstorming | temp 0.8-1.0, top_p 0.95
Code generation                 | temp 0.2-0.5, top_p 0.9
Factual Q&A                     | temp 0.1-0.3, top_p 0.9
Summarization                   | temp 0.3-0.5, top_p 0.9
Translation                     | temp 0.2-0.4, top_p 0.9
EXERCISE

Take the same creative prompt and run it three times with Ollama: once with temperature 0.2, once with temperature 0.7, once with temperature 1.2. Compare the outputs for creativity, coherence, and variation. Then try the same with a factual question (e.g., "What is the capital of Brazil?")—notice how temperature affects accuracy.
