Ollama truncates input — default context length is only 2048

(no error — long inputs get silently truncated)

By Fredoline Eruo · Last verified Jun 12, 2026

Cause

Ollama's default num_ctx is 2048 tokens, regardless of what the underlying model supports. A model that "supports 128K context" still defaults to 2K when run via ollama run. Your long prompts get silently truncated.

Solution

Set context per-request via API:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "...",
  "options": { "num_ctx": 32768 }
}'
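The same request can be sent from a script. Below is a minimal Python sketch, assuming a local Ollama server on the default port and using only the standard library. The helper names (build_request, looks_truncated, generate) and the 16-token margin are illustrative, not part of Ollama. It reads the prompt_eval_count field from the response as a rough signal: if the number of prompt tokens evaluated sits at or near num_ctx, the prompt was probably cut.

```python
import json
import urllib.request

# Default local endpoint, same as in the curl example.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, num_ctx: int) -> dict:
    """Build a non-streaming /api/generate payload with an explicit num_ctx."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def looks_truncated(response: dict, num_ctx: int, margin: int = 16) -> bool:
    """Heuristic only (an assumption, not an official check).

    Flags a response whose evaluated prompt token count is within
    `margin` tokens of the context limit. The margin is arbitrary.
    """
    return response.get("prompt_eval_count", 0) >= num_ctx - margin


def generate(model: str, prompt: str, num_ctx: int = 32768) -> dict:
    """Send the request to a running Ollama server and return the parsed JSON."""
    body = json.dumps(build_request(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With the server running, generate("llama3.1:8b", long_prompt) returns the response dict, and looks_truncated(result, 32768) gives a quick warning sign. Treat a positive as a prompt to investigate, not as proof.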

Or create a Modelfile to make it stick for a model:

# Save as Modelfile
FROM llama3.1:8b
PARAMETER num_ctx 32768

ollama create llama3.1:8b-32k -f Modelfile
ollama run llama3.1:8b-32k

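Tradeoff: a larger context uses more VRAM, because the KV cache grows linearly with num_ctx. As a rough guide, a 7B model with 32K context needs ~12 GB VRAM (vs ~5 GB at 2K); the exact figure depends on the model's architecture and quantization. Use Will it run? to find your sweet spot.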
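To see where the extra VRAM goes, here is a back-of-the-envelope estimate of the KV cache alone. It is a sketch under stated assumptions: the Llama 3.1 8B shape (32 layers, 8 KV heads, head dimension 128) and a 16-bit cache. The function name kv_cache_bytes is illustrative. It leaves out model weights, compute buffers, and any KV cache quantization, so real usage will be higher than these numbers and will differ for other architectures.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   num_ctx: int, bytes_per_value: int = 2) -> int:
    """Estimate KV cache size in bytes for a single sequence.

    The leading 2 accounts for storing both a key and a value
    vector per layer, per KV head, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * num_ctx


# Assumed shape for Llama 3.1 8B (grouped-query attention, 8 KV heads).
LLAMA31_8B = {"n_layers": 32, "n_kv_heads": 8, "head_dim": 128}

for ctx in (2048, 32768):
    gib = kv_cache_bytes(num_ctx=ctx, **LLAMA31_8B) / 2**30
    print(f"num_ctx={ctx}: {gib:.2f} GiB")
# prints:
# num_ctx=2048: 0.25 GiB
# num_ctx=32768: 4.00 GiB
```

Under these assumptions the cache costs 128 KiB per token, so going from 2K to 32K adds about 3.75 GiB before any other overhead. Models without grouped-query attention store more KV heads per layer and grow several times faster.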

Set globally via an environment variable (affects every model served by this ollama serve process):

OLLAMA_CONTEXT_LENGTH=32768 ollama serve

Did this fix it?

If your case was different, contact support with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.