What this does

Streaming returns tokens as they are generated instead of waiting for the full response. Users see output begin before generation finishes, which makes chat applications and real-time tools feel more responsive.

Steps

Enable streaming in an Ollama API request. Set "stream": true in the JSON body. The REST endpoints stream by default, so this only makes the behavior explicit.
```
curl -N http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Write a short poem", "stream": true}'
```
Expected: Tokens arrive incrementally as newline-delimited JSON objects.
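
To make the newline-delimited format concrete, here is a minimal sketch that reassembles text from a stream of JSON lines. The sample lines are illustrative stand-ins shaped like /api/generate output, not captured from a real run, and `collect_text` is a hypothetical helper name.

```python
import json

def collect_text(lines):
    """Concatenate the "response" field of each newline-delimited JSON chunk."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk carries done: true
    return "".join(parts)

# Illustrative sample (assumed shape, not real server output)
sample = [
    b'{"model": "llama3.2", "response": "Roses", "done": false}',
    b'{"model": "llama3.2", "response": " are red", "done": false}',
    b'{"model": "llama3.2", "response": "", "done": true}',
]
print(collect_text(sample))
```

Each line is a complete JSON object, so a client can parse and display it without waiting for the rest of the stream.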

Stream from the chat endpoint for multi-turn conversations.

```
curl -N http://localhost:11434/api/chat \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Tell me a joke"}], "stream": true}'
```
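
Chat chunks carry their text under `message.content` rather than `response`. The sketch below extracts it, assuming that chunk shape; the sample chunks are illustrative, not captured output, and `chat_delta` is a hypothetical helper name.

```python
import json

def chat_delta(chunk):
    """Return the text fragment from one /api/chat stream chunk, or "" if absent."""
    return (chunk.get("message") or {}).get("content", "")

# Illustrative chunks (assumed shape, not real server output)
sample = [
    '{"message": {"role": "assistant", "content": "Why did"}, "done": false}',
    '{"message": {"role": "assistant", "content": " the chicken"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
reply = "".join(chat_delta(json.loads(line)) for line in sample)
print(reply)
```

To continue the conversation, append {"role": "assistant", "content": reply} to the messages list before sending the next user turn.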

Stream in Python, processing each chunk as it arrives.

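```
import json

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Write a haiku", "stream": True},
    stream=True,  # let requests yield the body as it arrives
)
response.raise_for_status()  # fail early on HTTP errors

for line in response.iter_lines():
    if not line:
        continue  # skip blank lines
    chunk = json.loads(line)
    if chunk.get("response"):
        print(chunk["response"], end="", flush=True)
    if chunk.get("done"):
        # The final chunk carries timing stats; eval_duration is in nanoseconds
        print()
        print(f"Tokens: {chunk.get('eval_count', 0)}, "
              f"Duration: {chunk.get('eval_duration', 0) / 1e9:.2f}s")
```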
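
The final chunk reports `eval_duration` in nanoseconds, which is why the code above divides by 1e9. A small sketch of turning the two fields into a generation rate follows; `tokens_per_second` is a hypothetical helper and the numbers are made up for illustration.

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Generation rate from the final chunk's eval_count and eval_duration (ns)."""
    if not eval_duration_ns:
        return 0.0  # avoid dividing by zero when the field is missing or zero
    return eval_count / (eval_duration_ns / 1e9)

# Made-up values: 120 tokens generated in 2.4 seconds
print(f"{tokens_per_second(120, 2_400_000_000):.1f} tokens/s")
```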

Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
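
One way to capture that evidence is a small JSON record. This is a sketch, not a required format: `build_run_record` and its field names are hypothetical, and the runtime version is expected to be pasted in by hand (for example from `ollama --version`).

```python
import json
import platform
from datetime import datetime, timezone

def build_run_record(command, model, output, runtime_version="unknown"):
    """Bundle the details needed to reproduce a streaming run."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "model": model,
        "runtime_version": runtime_version,  # pasted in manually
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "observed_output": output,
    }

record = build_run_record(
    command="(paste the exact command here)",
    model="llama3.2",
    output="(paste the streamed text here)",
)
print(json.dumps(record, indent=2))
```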

Verification

```
curl -N -s http://localhost:11434/api/generate \
  -d '{"model":"llama3.2","prompt":"Count to 5","stream":true}' \
  | python -c "import sys,json; [print(json.loads(l)['response'],end='',flush=True) for l in sys.stdin if l.strip()]"
# Expected: text appears token by token, not all at once
```
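
To check incremental arrival in a script rather than by eye, one option is to timestamp each chunk and look at how spread out the arrivals are. The sketch below works under that assumption; `looks_streamed` and its thresholds are hypothetical, and the timestamps are made up.

```python
def looks_streamed(arrival_times, min_chunks=3, min_spread_s=0.05):
    """Heuristic: several chunks spread over time suggests real streaming.

    arrival_times: seconds (e.g. from time.monotonic()) recorded per chunk.
    """
    if len(arrival_times) < min_chunks:
        return False
    return (arrival_times[-1] - arrival_times[0]) >= min_spread_s

# Made-up timings: chunks trickling in vs. everything landing at once
trickle = looks_streamed([0.00, 0.04, 0.09, 0.13])
burst = looks_streamed([0.500, 0.501, 0.501])
print(trickle, burst)
```

In a real check, append time.monotonic() to a list each time a chunk is read, then pass the list to the helper. The thresholds will need tuning for short responses or very fast hardware.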

Common failures

No streaming, response arrives all at once: Verify "stream": true is in the request body. Some clients default to stream: false.
First chunk arrives late: The first response includes model loading time. Send a warm-up request beforehand so the model is already loaded.
Connection closed prematurely: Network proxies may buffer streaming responses. Use -N (--no-buffer) with curl, set proxy_buffering off in nginx, and pass stream=True to requests in Python.
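
For the model-loading delay, a warm-up can be as small as a request with no prompt. The sketch below only builds the request. It assumes Ollama loads the model when /api/generate receives a request without a prompt and that `keep_alive` controls how long the model stays in memory, so check both against the API documentation for your version. `warmup_request` is a hypothetical helper and the "10m" value is arbitrary.

```python
def warmup_request(model, keep_alive="10m", host="http://localhost:11434"):
    """Return (url, payload) for a prompt-less request that preloads the model."""
    payload = {"model": model, "keep_alive": keep_alive}
    return f"{host}/api/generate", payload

url, payload = warmup_request("llama3.2")
print(url, payload)
# To send it: requests.post(url, json=payload, timeout=300)
```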

Operator checkpoint

Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.
