What this does

Streaming yields LLM output token-by-token as it is generated, enabling real-time display in UIs and reducing perceived latency for end users.

Steps

Set up the chain as usual with LCEL composition: prompt, then model, then output parser.

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOllama(model="llama3.2", temperature=0.7, streaming=True)
prompt = ChatPromptTemplate.from_template("Tell me a short story about {topic}")

chain = prompt | llm | StrOutputParser()

Stream tokens with .stream(). Iterate over the generator.

for chunk in chain.stream({"topic": "a brave robot"}):
    print(chunk, end="", flush=True)
# Output: Each token printed as it arrives

Collect streaming chunks into a complete response. Useful when you need both streaming display and the full result.

full_response = ""
for chunk in chain.stream({"topic": "space exploration"}):
    full_response += chunk
    print(chunk, end="", flush=True)

print("\n--- Full response ---")
print(full_response)
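The loop above can be wrapped in a small helper so that display and accumulation live in one place. The following is a minimal sketch that works on any iterable of string chunks. The names stream_and_collect and fake_stream are hypothetical, and fake_stream only stands in for chain.stream(...) so the example runs without a model.

```python
from typing import Callable, Iterable, Iterator


def _print_chunk(chunk: str) -> None:
    # Default sink: print each chunk immediately, without a trailing newline.
    print(chunk, end="", flush=True)


def stream_and_collect(
    chunks: Iterable[str],
    emit: Callable[[str], None] = _print_chunk,
) -> str:
    """Forward every chunk to `emit` as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        emit(chunk)          # real-time display (or any other side effect)
        parts.append(chunk)  # keep the chunk for the final result
    return "".join(parts)


def fake_stream() -> Iterator[str]:
    # Hypothetical stand-in for chain.stream({...}); yields fixed chunks.
    yield from ["Once ", "upon ", "a ", "time"]


if __name__ == "__main__":
    full_response = stream_and_collect(fake_stream())
    print("\n--- Full response ---")
    print(full_response)
```

With a real chain, the call would presumably look like stream_and_collect(chain.stream({"topic": "space exploration"})). Passing a different emit callable lets the same helper feed a UI widget or a log instead of stdout.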

Stream in async context. Use .astream() for async applications.

import asyncio

async def stream_async():
    async for chunk in chain.astream({"topic": "ocean life"}):
        print(chunk, end="", flush=True)

asyncio.run(stream_async())
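One practical reason to prefer the async form is that several streams can be consumed concurrently on one event loop. The sketch below illustrates this with stand-in async generators. fake_astream and collect_async are hypothetical names, and fake_astream only imitates chain.astream(...) so the example runs without a model.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_astream(words: List[str]) -> AsyncIterator[str]:
    # Hypothetical stand-in for chain.astream({...}).
    for word in words:
        await asyncio.sleep(0)  # hand control back to the loop, like a network read
        yield word


async def collect_async(chunks: AsyncIterator[str]) -> str:
    """Consume an async stream and return the concatenated text."""
    parts = []
    async for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


async def main() -> List[str]:
    # Both streams advance concurrently; results come back in argument order.
    return await asyncio.gather(
        collect_async(fake_astream(["deep ", "sea"])),
        collect_async(fake_astream(["coral ", "reef"])),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```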

Handle streaming in a FastAPI endpoint.

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/generate/{topic}")
async def generate(topic: str):
    async def event_stream():
        async for chunk in chain.astream({"topic": topic}):
            yield f"data: {chunk}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
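One caveat with the endpoint above: model output often contains newlines, and a raw newline inside a data: line ends the Server-Sent Events frame early. A minimal sketch of a framing helper follows. sse_event is a hypothetical name, not part of FastAPI or LangChain. It emits one data: line per line of the chunk, and an SSE client rejoins those lines with newlines.

```python
def sse_event(data: str) -> str:
    """Frame `data` as one Server-Sent Events message.

    Each line of the payload gets its own "data: " prefix, and a blank
    line terminates the event.
    """
    # Normalize CRLF and bare CR so that only "\n" separates lines.
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    lines = normalized.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


if __name__ == "__main__":
    print(repr(sse_event("Hello")))
    print(repr(sse_event("line one\nline two")))
```

Inside event_stream(), the f-string would then be replaced by yield sse_event(chunk).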

Verification

python -c "
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser
llm = ChatOllama(model='llama3.2', streaming=True)
chain = llm | StrOutputParser()
count = 0
# No prompt template in this chain, so pass the prompt as a plain string
for chunk in chain.stream('Count to 5'):
    count += 1
print(f'Streamed {count} chunks')
# Expected: Streamed >1 chunks
"
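A chunk count greater than one shows that output arrived in pieces, but not when. A complementary check is to compare the time to the first chunk with the total time: if the two are nearly equal, something upstream is probably buffering. The sketch below assumes only an iterable of chunks. measure_stream and slow_fake_stream are hypothetical names, and the fake stream uses short sleeps in place of a real model.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[int, float, float]:
    """Return (chunk_count, seconds_to_first_chunk, total_seconds)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            first = time.perf_counter() - start  # latency of the first chunk
        count += 1
    total = time.perf_counter() - start
    # An empty stream has no first chunk; report the total time instead.
    return count, (first if first is not None else total), total


def slow_fake_stream(n: int = 5, delay: float = 0.01) -> Iterator[str]:
    # Hypothetical stand-in for chain.stream(...): one chunk every `delay` seconds.
    for i in range(n):
        time.sleep(delay)
        yield f"{i} "


if __name__ == "__main__":
    count, first, total = measure_stream(slow_fake_stream())
    print(f"{count} chunks, first after {first:.3f}s, done after {total:.3f}s")
```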

Common failures

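streaming=True has no visible effect with .invoke(). .invoke() returns only after generation finishes, so the caller receives the complete response as a single value whether or not the flag is set. Use .stream() or .astream() to receive chunks as they are generated.
Output parser breaks streaming. Some output parsers buffer the response because they need the complete text before they can parse it. StrOutputParser() passes each chunk through as it arrives.
Async event loop conflict. Calling asyncio.run() from code that already runs inside an event loop (for example a Jupyter notebook or a FastAPI handler) raises RuntimeError. In those contexts, consume .astream() with async for inside a coroutine and await it; reserve asyncio.run() for the top-level entry point of a script.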
Version mismatch. The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
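The event loop conflict can be detected before it happens. The sketch below uses asyncio.get_running_loop(), which raises RuntimeError when no loop is running. in_running_loop and run_sync are hypothetical helper names for illustration, not part of LangChain.

```python
import asyncio
from typing import Any, Coroutine


def in_running_loop() -> bool:
    """Return True when called from code that an event loop is executing."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run `coro` to completion from synchronous code.

    Fails with a clear message if a loop is already running, since
    asyncio.run() cannot be nested.
    """
    if in_running_loop():
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise RuntimeError(
            "An event loop is already running; use 'await' instead of run_sync()."
        )
    return asyncio.run(coro)
```

In a plain script, run_sync(stream_async()) behaves like asyncio.run(stream_async()). In a notebook or a request handler, the error message points to the fix: await the coroutine directly.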

Related guides

How to Create Basic LangChain Chains with LCEL
How to Debug LangChain Chain Execution