HOW-TO · RAG
How to Add Streaming to LangChain Chains
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES
LangChain installed, streaming-capable LLM (Ollama, OpenAI), Python 3.10+
What this does
Streaming yields LLM output token-by-token as it is generated, enabling real-time display in UIs and reducing perceived latency for end users.
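To make the perceived-latency point concrete, here is a minimal sketch that uses only the standard library. `fake_token_stream` and `measure` are hypothetical stand-ins (not LangChain APIs) that simulate a model emitting one token at a fixed interval, so the time to the first token can be compared with the time to the full text.

```python
import time
from typing import Iterator


def fake_token_stream(tokens: list[str], delay: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for an LLM stream: one token every `delay` seconds."""
    for token in tokens:
        time.sleep(delay)
        yield token


def measure(tokens: list[str], delay: float = 0.05) -> tuple[float, float]:
    """Return (seconds until the first token, seconds until the last token)."""
    start = time.perf_counter()
    first = None
    for _ in fake_token_stream(tokens, delay):
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    # With no tokens at all, report the total for both values.
    return (first if first is not None else total), total


first, total = measure(["Once", " upon", " a", " time", "."])
print(f"first token after {first:.2f}s, full text after {total:.2f}s")
```

With streaming, the user starts reading after the first interval. Without it, nothing is shown until the whole text is ready.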
Steps
- Set up a chain normally. Use LCEL composition.
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
llm = ChatOllama(model="llama3.2", temperature=0.7, streaming=True)
prompt = ChatPromptTemplate.from_template("Tell me a short story about {topic}")
chain = prompt | llm | StrOutputParser()
- Stream tokens with .stream(). Iterate over the generator.
for chunk in chain.stream({"topic": "a brave robot"}):
    print(chunk, end="", flush=True)
# Output: Each token printed as it arrives
- Collect streaming chunks into a complete response. Useful when you need both streaming display and the full result.
full_response = ""
for chunk in chain.stream({"topic": "space exploration"}):
    full_response += chunk
    print(chunk, end="", flush=True)
print("\n--- Full response ---")
print(full_response)
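The collect-while-displaying pattern can be wrapped in a small reusable function. The sketch below assumes the chunks are plain strings, as they are when the chain ends in StrOutputParser(). `stream_and_collect` is a hypothetical helper name, and the list of strings stands in for `chain.stream(...)` so the example runs without a model.

```python
from typing import Callable, Iterable


def stream_and_collect(chunks: Iterable[str], on_chunk: Callable[[str], None]) -> str:
    """Pass each chunk to `on_chunk` as it arrives and return the joined text."""
    parts: list[str] = []
    for chunk in chunks:
        on_chunk(chunk)       # e.g. print it or push it to a UI
        parts.append(chunk)   # keep it for the final result
    # Joining once at the end avoids repeated string concatenation.
    return "".join(parts)


# Stand-in for chain.stream({"topic": "space exploration"})
fake_chunks = ["Space", " is", " big", "."]
full_response = stream_and_collect(
    fake_chunks, lambda c: print(c, end="", flush=True)
)
print("\n--- Full response ---")
print(full_response)
```

Under that assumption, any iterable of string chunks, including the generator returned by `chain.stream(...)`, can be passed as the first argument.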
- Stream in async context. Use .astream() for async applications.
import asyncio
async def stream_async():
    async for chunk in chain.astream({"topic": "ocean life"}):
        print(chunk, end="", flush=True)
asyncio.run(stream_async())
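asyncio.run() only works from synchronous code. If an event loop is already running (a notebook or a web framework handler, for instance), it raises RuntimeError. The sketch below shows one way to guard the entry point. `fake_astream`, `collect`, `loop_is_running`, and `run_from_sync` are hypothetical names, and the async generator stands in for `chain.astream(...)` so the example needs no model.

```python
import asyncio
from typing import AsyncIterator


async def fake_astream(tokens: list[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for chain.astream(...): yields tokens asynchronously."""
    for token in tokens:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def collect(tokens: list[str]) -> str:
    """Consume the async stream and return the joined text."""
    parts = []
    async for chunk in fake_astream(tokens):
        parts.append(chunk)
    return "".join(parts)


def loop_is_running() -> bool:
    """True when called from inside a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def run_from_sync(tokens: list[str]) -> str:
    """Synchronous entry point: start a loop only if none is running."""
    if loop_is_running():
        raise RuntimeError("Already inside an event loop: await collect(...) instead")
    return asyncio.run(collect(tokens))


print(run_from_sync(["ocean", " ", "life"]))
```

Inside async code, the same work is simply `await collect(...)`, with no call to asyncio.run().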
- Handle streaming in a FastAPI endpoint.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
app = FastAPI()
@app.get("/generate/{topic}")
async def generate(topic: str):
    async def event_stream():
        async for chunk in chain.astream({"topic": topic}):
            yield f"data: {chunk}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
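One detail to watch with the text/event-stream format: a blank line ends an event, so a chunk that itself contains a newline will not survive `data: {chunk}\n\n` intact. A possible fix, sketched below with standard-library code only, is to emit one `data:` line per line of the chunk. `sse_event` and `parse_sse_data` are hypothetical helper names, and the parser is a simplified illustration of what a client does, not a full implementation of the format.

```python
def sse_event(chunk: str) -> str:
    """Frame one chunk as a server-sent event, one data: line per text line."""
    lines = chunk.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


def parse_sse_data(event: str) -> str:
    """Simplified client side: rejoin the data: lines of a single event."""
    prefix = "data: "
    data_lines = [
        line[len(prefix):] for line in event.split("\n") if line.startswith(prefix)
    ]
    return "\n".join(data_lines)


chunk = "First line\nSecond line"
event = sse_event(chunk)
print(repr(event))
print(parse_sse_data(event) == chunk)
```

In the endpoint above, this would mean yielding `sse_event(chunk)` instead of the f-string, assuming the client reassembles multi-line data the way a standard EventSource does.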
Verification
python -c "
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser
llm = ChatOllama(model='llama3.2', streaming=True)
chain = llm | StrOutputParser()
count = 0
# Without a prompt template, the chat model takes a string (or messages), not a dict
for chunk in chain.stream('Count to 5'):
    count += 1
print(f'Streamed {count} chunks')
# Expected: Streamed >1 chunks
"
Common failures
- streaming=True ignored in non-streaming calls. .invoke() returns the complete response in one piece, regardless of streaming=True. Use .stream() or .astream() to receive chunks as they arrive.
- Output parser breaks streaming. Some output parsers buffer tokens until the output is complete. Use StrOutputParser(), which passes chunks through as they arrive.
- Async event loop conflict. Calling asyncio.run() from inside an already-running event loop (for example a Jupyter notebook or a FastAPI handler) raises RuntimeError. In that context, iterate over .astream() with async for directly and keep asyncio.run() for synchronous entry points.
- Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
- Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
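For the version-mismatch and environment-drift checks, a short script can print what is actually active. This is a minimal sketch using only the standard library. The package list is an assumption based on the imports in this guide, and `installed_versions` and `environment_report` are hypothetical helper names.

```python
import shutil
import sys
from importlib import metadata

# Assumed distribution names, based on the imports used in this guide.
PACKAGES = ["langchain", "langchain-core", "langchain-ollama", "fastapi"]


def installed_versions(packages: list[str]) -> dict[str, str]:
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def environment_report() -> dict[str, str]:
    """Collect the interpreter, the ollama binary path, and package versions."""
    report = {
        "python": sys.version.split()[0],
        "interpreter": sys.executable,
        "ollama binary": shutil.which("ollama") or "not found on PATH",
    }
    report.update(installed_versions(PACKAGES))
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Run it with the same interpreter that runs the chain. If the interpreter path or a version is not what you expected, fix that before changing any step in the guide.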
Related guides
- How to Create Basic LangChain Chains with LCEL
- How to Debug LangChain Chain Execution