Ollama truncates input — default context length is only 2048
(no error — long inputs get silently truncated)
By Fredoline Eruo · Last verified Jun 12, 2026
Cause
Ollama's default num_ctx is 2048 tokens, regardless of what the underlying model supports. A model that advertises a 128K context window still runs with a 2K window under ollama run unless you override it. Prompts longer than the window are truncated without any error in the response.
Solution
Set context per-request via API:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "...",
"options": { "num_ctx": 32768 }
}'
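The same per-request override can be sent from Python. The sketch below uses only the standard library and assumes the server listens on the default http://localhost:11434; the helper names build_generate_request and generate are hypothetical, chosen for illustration. It sets "stream" to false so the reply arrives as a single JSON object rather than a stream of chunks.

```python
import json
import urllib.request

# Assumed default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, num_ctx: int) -> dict:
    """Build an /api/generate payload that overrides num_ctx for one request."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object instead of a chunked stream
        "options": {"num_ctx": num_ctx},
    }


def generate(payload: dict, url: str = OLLAMA_URL) -> dict:
    """POST the payload to a running Ollama server and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, generate(build_generate_request("llama3.1:8b", prompt, 32768)) returns a dict whose "response" key holds the generated text. The reply also carries a prompt_eval_count field reporting how many prompt tokens were evaluated; if that number looks far smaller than your prompt should need, treat it as a rough hint that the input was cut off, not as proof.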
Or create a Modelfile to make it stick for a model:
# Save as Modelfile
FROM llama3.1:8b
PARAMETER num_ctx 32768
Then build and run the new variant:
ollama create llama3.1:8b-32k -f Modelfile
ollama run llama3.1:8b-32k
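Tradeoff: a larger context means a larger KV cache, and the KV cache grows linearly with num_ctx, so VRAM use rises with it. A 7B model with 32K context needs roughly 12 GB VRAM (vs about 5 GB at 2K); exact figures depend on the model's architecture and quantization. Use Will it run? to find your sweet spot.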
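The growth in VRAM can be estimated with back-of-the-envelope arithmetic: the KV cache stores one key vector and one value vector per layer, per KV head, per token. The sketch below assumes Llama 3.1 8B's published shape (32 layers, 8 KV heads, head dimension 128) and a 16-bit cache. Those defaults and the function name kv_cache_bytes are assumptions for illustration, not values read from Ollama, and the estimate covers only the cache, not the weights or runtime buffers.

```python
def kv_cache_bytes(num_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Estimate KV cache size: 2 (key + value) per layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * num_ctx


for ctx in (2048, 32768):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"num_ctx={ctx}: ~{gib:.2f} GiB of KV cache")
# prints:
# num_ctx=2048: ~0.25 GiB of KV cache
# num_ctx=32768: ~4.00 GiB of KV cache
```

Under these assumptions the cache grows 16x for a 16x larger window. Models without grouped-query attention have more KV heads and a proportionally larger cache, which is one reason total VRAM figures differ from model to model.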
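Or set it globally via an environment variable (supported in recent Ollama releases; applies to every model this server process loads, unless a Modelfile or request sets num_ctx):
OLLAMA_CONTEXT_LENGTH=32768 ollama serve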