RUNLOCALAI · v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated
LangGraph for Local Agents

16. Streaming Agent Output

Chapter 16 of 18 · 20 min
KEY INSIGHT

Three of LangGraph's streaming modes map to three audiences: `"values"` for debugging the full state, `"updates"` for observing node-level activity, and `"messages"` for end-user token streaming.

LangGraph's `.stream()` method chooses what it emits through the `stream_mode` parameter. The three modes covered here are `"values"` (emits the full state after each step), `"updates"` (emits only the state update each node returned, keyed by node name), and `"messages"` (streams LLM output token by token as `(message_chunk, metadata)` tuples). LangGraph also offers other modes, such as `"custom"` and `"debug"`, but these three are the ones you will use most. For end-user applications, `"messages"` is the mode to use: it gives you real-time, token-by-token output from the language model.

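# Assumes `graph` is a compiled LangGraph graph and `config` carries a thread_id.
# In "messages" mode each item is a (message_chunk, metadata) tuple.
for chunk, metadata in graph.stream(
    {"messages": [{"role": "user", "content": "Write a hello world in Rust"}]},
    config=config,
    stream_mode="messages",
):
    # chunk.content is the newly streamed text;
    # metadata["langgraph_node"] names the node that produced it.
    print(chunk.content, end="", flush=True)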
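A common next step is to keep the finished reply as well as printing it, and to skip chunks that did not come from the model node, since the `"messages"` stream may also carry messages produced by other nodes, such as tool results. The sketch below runs without a model: `FakeChunk` and the simulated events are hypothetical stand-ins shaped like the `(message_chunk, metadata)` pairs described above, and the node name `"agent"` is an assumption, so substitute whatever your graph's LLM node is called.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FakeChunk:
    # Hypothetical stand-in for a LangChain message chunk; only .content is used
    content: str


def collect_reply(events: Iterable[Tuple[FakeChunk, dict]], node: str = "agent") -> str:
    """Print one node's tokens as they arrive and return the joined reply.

    `events` mimics graph.stream(..., stream_mode="messages"): each item is a
    (message_chunk, metadata) pair whose metadata["langgraph_node"] names the
    node that produced the chunk.
    """
    parts: List[str] = []
    for chunk, metadata in events:
        if metadata.get("langgraph_node") != node:
            continue  # skip output from other nodes, e.g. tool results
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    print()  # end the line once the stream is exhausted
    return "".join(parts)


# Simulated stream: model tokens interleaved with one tool result
events = [
    (FakeChunk("Here is"), {"langgraph_node": "agent"}),
    (FakeChunk(" the code"), {"langgraph_node": "agent"}),
    (FakeChunk("[search results]"), {"langgraph_node": "tools"}),
    (FakeChunk("."), {"langgraph_node": "agent"}),
]
reply = collect_reply(events)
```

Filtering on the node name keeps the printed output limited to model text, which matters once the agent starts calling tools.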

For debugging and monitoring an agent, `stream_mode="updates"` is more useful: it shows which node produced which update, without repeating the full state on every step:

# Assumes `graph`, `config`, and a `query` string are already defined.
for step in graph.stream(
    {"messages": [{"role": "user", "content": query}]},
    config=config,
    stream_mode="updates",
):
    # Each step maps a node name to the partial state update that node returned
    for node_name, update in step.items():
        print(f"Node: {node_name}")
        messages = update.get("messages", []) if isinstance(update, dict) else []
        if messages:
            last = messages[-1]
            print(f"  Last message: {last.type} - {str(last.content)[:80]}")
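To see why `"updates"` is lighter than `"values"`, here is a toy, pure-Python model of what each mode emits for one pass through an agent loop. It is not LangGraph code: the node names and message strings are made up, and the append-only merge stands in for a list reducer such as `add_messages`.

```python
from typing import Dict, List, Tuple

# Hypothetical node outputs for one pass through a ReAct-style loop;
# each node returns only the keys it changes, as LangGraph nodes do.
NODE_UPDATES: List[Tuple[str, Dict[str, List[str]]]] = [
    ("agent", {"messages": ["ai: call the search tool"]}),
    ("tools", {"messages": ["tool: 3 results"]}),
    ("agent", {"messages": ["ai: final answer"]}),
]


def simulate_stream(initial: Dict[str, List[str]], mode: str) -> list:
    """Toy model of "values" vs "updates" output; not LangGraph itself."""
    state = {key: list(value) for key, value in initial.items()}
    emitted = []
    if mode == "values":
        emitted.append({k: list(v) for k, v in state.items()})  # input state first
    for node, update in NODE_UPDATES:
        for key, value in update.items():
            state[key] = state.get(key, []) + value  # append, like a list reducer
        if mode == "values":
            emitted.append({k: list(v) for k, v in state.items()})  # whole state
        else:
            emitted.append({node: update})  # only this node's contribution
    return emitted


values = simulate_stream({"messages": ["user: hi"]}, "values")
updates = simulate_stream({"messages": ["user: hi"]}, "updates")
print(len(values[-1]["messages"]), "messages in the last values payload")
print([next(iter(step)) for step in updates])
```

The last `values` payload repeats every message so far, while each `updates` payload holds only what one node added; on a long agent run that difference adds up quickly.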

For streaming over HTTP from a web server such as FastAPI, wrap `graph.astream()` (the async counterpart of `.stream()`) in an async generator and return it as Server-Sent Events (SSE):

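import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    session_id: str  # reused as the checkpointer thread_id
    message: str


@app.post("/chat/stream")
async def stream_chat(request: ChatRequest):
    async def event_stream():
        config = {"configurable": {"thread_id": request.session_id}}
        # `graph` is the compiled graph; astream() is the async form of stream()
        async for chunk, metadata in graph.astream(
            {"messages": [{"role": "user", "content": request.message}]},
            config=config,
            stream_mode="messages",
        ):
            # JSON-encode so newlines in the output cannot break SSE framing
            yield f"data: {json.dumps(chunk.content)}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")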
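If a chunk's text is written raw after `data: `, any newline inside it (common when the model emits code) ends the SSE line early and corrupts the event. One fix is to JSON-encode each chunk; another, sketched below, follows the SSE format directly by giving each line of text its own `data:` field, which the browser's `EventSource` rejoins with newlines. The helper name `format_sse` is hypothetical.

```python
from typing import List, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Events message, safely for multi-line text.

    A blank line ends an SSE event, so each line of `data` gets its own
    "data:" field; EventSource on the client rejoins them with newlines.
    """
    # Normalize CRLF and CR, which SSE also treats as line terminators
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    fields: List[str] = [] if event is None else [f"event: {event}"]
    fields.extend(f"data: {line}" for line in normalized.split("\n"))
    return "\n".join(fields) + "\n\n"


# A token containing a newline becomes two data: lines in one event
print(repr(format_sse("fn main() {\n}")))  # prints 'data: fn main() {\ndata: }\n\n'
```

Inside `event_stream()`, you would yield `format_sse(...)` on the chunk's text instead of building the string by hand.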

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
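Package versions and runtime details can be captured with the standard library rather than copied by hand. A minimal sketch; the default package list (`langgraph`, `langchain-core`, `fastapi`) is an assumption, so extend it with whatever your example imports.

```python
import importlib.metadata
import platform
import sys


def environment_report(packages: tuple = ("langgraph", "langchain-core", "fastapi")) -> dict:
    """Collect the runtime and package versions to record beside a result."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Paste the printed lines next to the observed output; they cover the version and runtime part of the checkpoint, while the data path and hardware constraints still need a manual note.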

EXERCISE

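Build a FastAPI endpoint that streams a ReAct agent's output over SSE. Frame each event as a `data:` line followed by a blank line, and serve the response with a `Content-Type: text/event-stream` header. Test it with `curl -N` (which turns off curl's output buffering) or Postman to confirm that tokens arrive incrementally rather than all at once.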
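To check incremental arrival from Python instead of curl, the response has to be split back into events. The parser below is a minimal sketch that handles only the SSE fields this exercise uses; `iter_sse_data` is a hypothetical helper, not part of FastAPI or LangGraph.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event.

    Handles only what this exercise needs: "data:" fields (several in one
    event are joined with newlines), blank lines that end an event, and ":"
    comment lines. Other fields such as "event:" or "id:" are ignored, and
    an event left unterminated at end of stream is discarded.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:  # a blank line dispatches the buffered event
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # SSE strips a single leading space
            buffer.append(value)


# Simulated response lines, as read one at a time from a streaming body
sample = ["data: Hello\n", "\n", ": keep-alive\n", "data: line one\n", "data: line two\n", "\n"]
for payload in iter_sse_data(sample):
    print(repr(payload))
```

To use it against your endpoint, pass it the decoded lines of a streaming HTTP response, for example `iter_sse_data(line.decode("utf-8") for line in resp)` where `resp` comes from `urllib.request.urlopen()` on a POST to `/chat/stream`, assuming the `ChatRequest` body carries `session_id` and `message` fields. If the server JSON-encodes each chunk, apply `json.loads` to each payload. Printing each payload with `flush=True` as it is yielded shows whether tokens trickle in or arrive in one burst.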

← Chapter 15: Persistence
Chapter 17: LangGraph vs LangChain →