RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

© 2026 runlocalai.co · Independently operated
HOW-TO · RAG

How to Add Streaming to LangChain Chains

intermediate·15 min·By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

LangChain and the langchain-ollama package installed, a streaming-capable LLM (Ollama, OpenAI), Python 3.10+; FastAPI for the endpoint step

What this does

Streaming yields LLM output token-by-token as it is generated, enabling real-time display in UIs and reducing perceived latency for end users.
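
To make "perceived latency" concrete, here is a minimal sketch that times the first chunk against the full response. It does not call a model: fake_llm_stream is a hypothetical stand-in that sleeps between words, and measure_stream is an illustrative helper (not a LangChain API) that accepts any iterator of strings.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_llm_stream(text: str, delay: float = 0.02) -> Iterator[str]:
    """Stand-in for a model stream (hypothetical): yields one word at a time."""
    for word in text.split(" "):
        time.sleep(delay)  # simulate per-chunk generation time
        yield word + " "


def measure_stream(stream: Iterable[str]) -> Tuple[float, float, str]:
    """Return (seconds to first chunk, total seconds, full text)."""
    start = time.perf_counter()
    first_chunk_at = None
    parts = []
    for chunk in stream:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    # An empty stream has no first chunk; report the total instead.
    if first_chunk_at is None:
        first_chunk_at = total
    return first_chunk_at, total, "".join(parts)


first, total, text = measure_stream(fake_llm_stream("one two three four five"))
print(f"first chunk after {first:.3f}s, full response after {total:.3f}s")
```

With a real chain, the same helper should accept chain.stream({...}) unchanged, since it only needs an iterator of strings. The gap between the two numbers is the wait that streaming hides from the user.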

Steps

  • Set up a chain normally. Use LCEL composition.
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# No streaming flag is needed here: calling .stream() or .astream() requests streamed output.
llm = ChatOllama(model="llama3.2", temperature=0.7)
prompt = ChatPromptTemplate.from_template("Tell me a short story about {topic}")

chain = prompt | llm | StrOutputParser()
  • Stream tokens with .stream(). Iterate over the generator.
for chunk in chain.stream({"topic": "a brave robot"}):
    print(chunk, end="", flush=True)
# Output: each chunk is printed as soon as it arrives
  • Collect streaming chunks into a complete response. Useful when you need both streaming display and the full result.
full_response = ""
for chunk in chain.stream({"topic": "space exploration"}):
    full_response += chunk
    print(chunk, end="", flush=True)

print("\n--- Full response ---")
print(full_response)
  • Stream in async context. Use .astream() for async applications.
import asyncio

async def stream_async():
    async for chunk in chain.astream({"topic": "ocean life"}):
        print(chunk, end="", flush=True)

asyncio.run(stream_async())
  • Handle streaming in a FastAPI endpoint.
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/generate/{topic}")
async def generate(topic: str):
    async def event_stream():
        async for chunk in chain.astream({"topic": topic}):
            # JSON-encode each chunk so embedded newlines cannot break SSE framing;
            # the client calls JSON.parse on event.data to recover the text.
            yield f"data: {json.dumps(chunk)}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
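
Raw text sent over Server-Sent Events needs some care: a newline inside a chunk would end the data: field early. As a minimal sketch of plain-text framing (an alternative to encoding each chunk), the helper below gives every line of a chunk its own data: field. The names sse_event and sse_stream are hypothetical, and the "[DONE]" end marker is an assumed convention, not something SSE or LangChain requires.

```python
from typing import Iterable, Iterator


def sse_event(chunk: str) -> str:
    """Frame one chunk as a Server-Sent Event.

    Each line of the chunk gets its own "data:" field, so newlines inside
    the chunk cannot terminate the event early.
    """
    lines = chunk.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


def sse_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap an iterator of chunks, then send a final marker event."""
    for chunk in chunks:
        yield sse_event(chunk)
    # "[DONE]" is an assumed end-of-stream convention, not part of SSE itself.
    yield sse_event("[DONE]")


events = list(sse_stream(["Once", " upon\na time"]))
for event in events:
    print(repr(event))
```

A browser EventSource joins consecutive data: lines with a newline, so the client sees the original chunk text. An async variant for the endpoint would follow the same shape with async for.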

Verification

python -c "
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser
llm = ChatOllama(model='llama3.2')
chain = llm | StrOutputParser()
count = 0
# With no prompt template in the chain, the model takes a plain string (or messages), not a dict
for chunk in chain.stream('Count to 5'):
    count += 1
print(f'Streamed {count} chunks')
# Expected: Streamed >1 chunks
"

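The chunk count is the signal in the check above: a count of 1 usually means some stage collected the whole output before passing it on. A minimal sketch of that difference using plain generators follows; passthrough_stage and buffering_stage are hypothetical stand-ins for chain steps, not LangChain classes.

```python
from typing import Iterable, Iterator


def passthrough_stage(chunks: Iterable[str]) -> Iterator[str]:
    """Forward each chunk as soon as it arrives."""
    for chunk in chunks:
        yield chunk


def buffering_stage(chunks: Iterable[str]) -> Iterator[str]:
    """Wait for the whole input, then emit it once (defeats streaming)."""
    yield "".join(chunks)


def count_chunks(stream: Iterable[str]) -> int:
    """Consume a stream and report how many chunks it produced."""
    return sum(1 for _ in stream)


source = ["1", ", 2", ", 3", ", 4", ", 5"]
print("pass-through:", count_chunks(passthrough_stage(source)))
print("buffering:", count_chunks(buffering_stage(source)))
```

Both stages produce the same final text; only the pass-through one lets a caller display output while it is still being generated.
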
Common failures

  • streaming=True ignored in non-streaming calls. .invoke() returns the complete response in one piece, whatever flags are set on the model. Call .stream() or .astream() to receive chunks.
  • Output parser breaks streaming. Some output parsers buffer the whole response before emitting anything. StrOutputParser() passes chunks through as they arrive.
  • Async event loop conflict. Calling asyncio.run() from code that is already running inside an event loop (a Jupyter cell, a FastAPI handler) raises RuntimeError. In those contexts, iterate .astream() with async for directly and reserve asyncio.run() for the top-level entry point of a script.
  • Version mismatch. The installed package or runtime differs from the one the commands were written for. Check versions first (for example with pip show langchain-ollama), then rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is in use. Print the active binary path and configuration (for example with which python and ollama list) before changing the guide steps.
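
The event loop conflict is easy to reproduce without a model. In this minimal sketch, fake_astream is a hypothetical stand-in for chain.astream(), and handler plays the role of any code that already runs inside a loop, such as a route handler or a notebook cell.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple


async def fake_astream(text: str) -> AsyncIterator[str]:
    """Stand-in for chain.astream() (hypothetical): yields words asynchronously."""
    for word in text.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield word


async def collect(text: str) -> List[str]:
    """Gather every chunk from the stand-in stream."""
    return [chunk async for chunk in fake_astream(text)]


async def handler() -> Tuple[Optional[str], List[str]]:
    """Runs inside an event loop, like a FastAPI route or a notebook cell."""
    error = None
    coro = collect("nested call")
    try:
        asyncio.run(coro)  # wrong: a loop is already running
    except RuntimeError as exc:
        error = str(exc)
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
    chunks = await collect("await instead")  # right: await in place
    return error, chunks


error, chunks = asyncio.run(handler())  # the single top-level entry point
print("nested asyncio.run failed with:", error)
print("awaited chunks:", chunks)
```

The rule of thumb this illustrates: one asyncio.run() at the top of a script, and await or async for everywhere below it.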

Related guides

  • How to Create Basic LangChain Chains with LCEL
  • How to Debug LangChain Chain Execution