What this does

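Streaming APIs return a response incrementally, a few tokens at a time, instead of one final payload. Many use Server-Sent Events (SSE); the endpoint in this guide (Ollama's /api/generate on localhost:11434) streams newline-delimited JSON, one object per line. This guide covers parsing, accumulating, and handling errors in both Python and JavaScript applications.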

Steps

Parse streamed chunks in Python. Each non-empty line is a JSON object with "response" and "done" fields; the last object has "done" set to true and carries statistics such as eval_count.

import json

import requests

def stream_completion(model, prompt):
    """Yield response text fragments as they arrive from the server."""
    with requests.post("http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": True},
            stream=True) as r:
        r.raise_for_status()  # Surface HTTP errors before parsing the body
        for line in r.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("response"):
                yield chunk["response"]
            if chunk.get("done"):
                # eval_count may be missing, so avoid a KeyError
                print(f"\nTotal tokens: {chunk.get('eval_count', 'unknown')}")
                break

# A for loop discards a generator's return value, so accumulate in the caller
full_response = []
for token in stream_completion("llama3.2", "Explain streaming"):
    full_response.append(token)
    print(token, end="", flush=True)
text = "".join(full_response)

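Handle the stream in JavaScript (browser or Node.js 18+). The snippet uses top-level await, so run it as an ES module or wrap it in an async function.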

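const response = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  body: JSON.stringify({ model: 'llama3.2', prompt: 'Hello', stream: true }),
  headers: { 'Content-Type': 'application/json' }
});
if (!response.ok) throw new Error(`Request failed: HTTP ${response.status}`);

// process.stdout exists only in Node.js; fall back to console.log in the browser
const write = (text) =>
  globalThis.process?.stdout ? process.stdout.write(text) : console.log(text);

// Parse one complete line and emit its text
const handleLine = (line) => {
  if (!line.trim()) return;
  const chunk = JSON.parse(line);
  if (chunk.response) write(chunk.response);
};

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // Keep incomplete line
  lines.forEach(handleLine);
}
handleLine(buffer + decoder.decode()); // Flush a final line that has no trailing newline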

Accumulate chunks for the final response. Concatenate the "response" field of every chunk, in arrival order, and strip the trailing newline.
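The accumulation step can be sketched as a small helper. This is a minimal sketch, not part of any library: the name accumulate is hypothetical, and it assumes each input line is one complete JSON object shaped like the chunks shown earlier. It also reports whether a "done" object was seen, which helps detect a stream that closed early.

```python
import json

def accumulate(lines):
    """Join the "response" fields of streamed lines.

    Returns (text, saw_done). `accumulate` is a hypothetical helper name;
    it assumes every non-empty line is one complete JSON object.
    """
    parts = []
    saw_done = False
    for line in lines:
        if not line or not line.strip():
            continue  # Skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            saw_done = True
            break
    # Strip only trailing newlines; interior whitespace is preserved
    return "".join(parts).rstrip("\n"), saw_done
```

A caller could pass r.iter_lines() from the Python example directly, since json.loads accepts both bytes and str.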

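Handle streaming errors with a timeout, and retry when the failure is likely transient. The example enforces an overall 30-second limit with signal.alarm, which works only on Unix and only in the main thread.

import signal

class StreamTimeout(Exception):
    """Raised when the stream exceeds the overall time limit."""

def handler(signum, frame):
    raise StreamTimeout()

signal.signal(signal.SIGALRM, handler)
signal.alarm(30)  # 30 second timeout
try:
    for token in stream_completion("llama3.2", "Long prompt"):
        pass
except StreamTimeout:
    print("Stream timed out; consider shorter prompts")
finally:
    signal.alarm(0)  # Cancel the pending alarm so it cannot fire later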
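The retry half of this step could look like the sketch below. It is one possible approach under stated assumptions: collect_with_retry, make_stream, and the backoff values are hypothetical names and defaults, not part of requests or the server API. Each attempt restarts the stream from scratch and discards partial output, so the caller never sees duplicated tokens.

```python
import time

def collect_with_retry(make_stream, max_attempts=3, base_delay=1.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Run a token stream to completion, retrying on transient errors.

    make_stream is a zero-argument callable returning a fresh iterator of
    text fragments. All names and defaults here are illustrative.
    """
    for attempt in range(1, max_attempts + 1):
        parts = []  # Partial output from a failed attempt is thrown away
        try:
            for token in make_stream():
                parts.append(token)
            return "".join(parts)
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller see the error
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With the generator from the Python example, a call might be collect_with_retry(lambda: stream_completion("llama3.2", "Long prompt"), retry_on=(requests.exceptions.RequestException,)). The explicit retry_on matters because the exceptions raised by requests do not inherit from the built-in ConnectionError.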

Verification

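# Expected: tokens print to stdout as they arrive, and the accumulated text equals the non-streamed output
# Compare: send the same request with "stream": false and check that its "response" field matches the streamed text
# Note: the two match exactly only when generation is deterministic (for example, a fixed seed and temperature 0)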
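When the two texts differ, it helps to know where they diverge rather than only that they do. The helper below is a minimal sketch; first_divergence is a hypothetical name, and it assumes both texts have already been collected as strings (one from the streamed run, one from a non-streamed run).

```python
def first_divergence(streamed, reference):
    """Return the index where two texts first differ, or None if equal.

    Hypothetical verification helper; both arguments are plain strings.
    """
    for i, (a, b) in enumerate(zip(streamed, reference)):
        if a != b:
            return i
    if len(streamed) != len(reference):
        # One text is a prefix of the other: they diverge where it ends
        return min(len(streamed), len(reference))
    return None
```

A result of None means the streamed and non-streamed outputs are identical; an index equal to the length of the streamed text suggests the stream was cut short.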

Common failures

Partial JSON at buffer boundary: A single JSON line may be split across network chunks. Always buffer incoming data, split on \n, and parse only complete lines.
Missing done signal: The stream may close without a final {"done": true}. Set a timeout as a safety net and treat a stream that ends without it as incomplete.
Unbounded memory growth from accumulation: For very long responses, periodically flush accumulated text to disk or a database instead of holding it all in memory.
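The last point can be sketched as follows, assuming a local file is an acceptable destination. The function name spool_tokens, the path argument, and the flush interval are all hypothetical choices for illustration.

```python
def spool_tokens(tokens, path, flush_every=64):
    """Append streamed text fragments to a file instead of a growing list.

    Returns the number of characters written. The name and the
    flush_every default are illustrative, not taken from any library.
    """
    written = 0
    with open(path, "a", encoding="utf-8") as f:
        for i, token in enumerate(tokens, start=1):
            f.write(token)
            written += len(token)
            if i % flush_every == 0:
                f.flush()  # Hand buffered text to the OS periodically
    return written
```

Any iterator of text fragments works as the tokens argument, including the generator from the Python example. Memory use stays roughly constant because each fragment is written out and then dropped.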
