Streaming Agent Output — LangGraph for Local Agents (Chapter 16)

LangGraph's .stream() method takes a stream_mode parameter that selects what each emitted chunk contains. Three modes matter most here: "values" (emits the full state after each step of the graph), "updates" (emits only the update each node returned, keyed by node name), and "messages" (streams model output token by token as (message_chunk, metadata) pairs). For terminal applications where a user watches the reply appear, "messages" is the mode to use: it gives you real-time, token-by-token output from the language model.
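
To make the difference between "values" and "updates" concrete, here is a toy, pure-Python model of a two-node graph. It is not LangGraph's implementation: the toy_stream function, the node names, and the list-appending merge rule are assumptions chosen only to show the shape of each chunk.

```python
from typing import Any, Callable, Iterator

# Hypothetical toy graph: an ordered list of (node_name, node_fn) pairs.
# Each node_fn reads the current state and returns a partial update dict.
Node = tuple[str, Callable[[dict], dict]]


def toy_stream(nodes: list[Node], inputs: dict, stream_mode: str) -> Iterator[Any]:
    """Run nodes in order and emit chunks shaped like "values" or "updates"."""
    if stream_mode not in ("values", "updates"):
        raise ValueError(f"toy_stream does not model mode {stream_mode!r}")
    # Copy the input so list-valued keys can be appended to safely.
    state = {key: list(value) for key, value in inputs.items()}
    if stream_mode == "values":
        yield {key: list(value) for key, value in state.items()}  # input state first
    for name, node_fn in nodes:
        update = node_fn(state)
        # Append list values, mimicking an "add" reducer on the messages key.
        for key, value in update.items():
            state.setdefault(key, []).extend(value)
        if stream_mode == "values":
            yield {key: list(value) for key, value in state.items()}  # full snapshot
        else:
            yield {name: update}  # only this node's delta, keyed by node name


nodes = [
    ("agent", lambda s: {"messages": ["ai: calling search"]}),
    ("tools", lambda s: {"messages": ["tool: 3 results"]}),
]
inputs = {"messages": ["user: hi"]}

for chunk in toy_stream(nodes, inputs, "values"):
    print(chunk)
for chunk in toy_stream(nodes, inputs, "updates"):
    print(chunk)
```

The "values" chunks grow with every step, while each "updates" chunk carries only one node's delta, which is why "updates" is the lighter mode to log.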

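# stream_mode="messages" yields (message_chunk, metadata) tuples, not plain strings
config = {"configurable": {"thread_id": "demo-1"}}  # thread_id value is arbitrary; used by a checkpointer

for chunk, metadata in graph.stream(
    {"messages": [{"role": "user", "content": "Write a hello world in Rust"}]},
    config=config,
    stream_mode="messages",
):
    if chunk.content:
        print(chunk.content, end="", flush=True)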
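
In an agent with several nodes, more than one of them may call a model, and "messages" mode streams tokens from all of them. The metadata half of each pair carries a langgraph_node key naming the node that made the call, so you can keep only the tokens you want to show. A minimal sketch, assuming the user-facing node is named "agent"; the helper print_node_tokens and the SimpleNamespace stand-ins for message chunks are hypothetical, used so the example runs without a model.

```python
from types import SimpleNamespace
from typing import Iterable


def print_node_tokens(stream: Iterable[tuple], node: str = "agent") -> str:
    """Print one node's streamed tokens and return the joined reply text.

    `stream` yields (message_chunk, metadata) pairs, the shape produced by
    stream_mode="messages". The default node name "agent" is an assumption.
    """
    parts = []
    for chunk, metadata in stream:
        # Skip tokens from model calls made inside other nodes.
        if metadata.get("langgraph_node") != node:
            continue
        # Only plain-string content is printed; other content shapes are skipped.
        if isinstance(chunk.content, str) and chunk.content:
            print(chunk.content, end="", flush=True)
            parts.append(chunk.content)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in chunks so the example runs without a model or graph.
fake_stream = [
    (SimpleNamespace(content="fn main() "), {"langgraph_node": "agent"}),
    (SimpleNamespace(content="Summary: ok"), {"langgraph_node": "summarizer"}),
    (SimpleNamespace(content="{ }"), {"langgraph_node": "agent"}),
]
reply = print_node_tokens(fake_stream)
```

With a real graph, pass the iterator returned by graph.stream(..., stream_mode="messages") in place of fake_stream.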

For debugging agents that call tools, stream_mode="updates" is more useful: it shows which node produced which update, without repeating the full state at every step:

for step in graph.stream(
    {"messages": [{"role": "user", "content": query}]},
    config=config,
    stream_mode="updates",
):
    # Each chunk maps node name -> the partial state update that node returned
    for node_name, update in step.items():
        print(f"Node: {node_name}")
        # An update may be None or lack a messages key, so guard before reading it
        messages = (update or {}).get("messages", [])
        if messages:
            last = messages[-1]
            print(f"  Last message: {last.type} - {str(last.content)[:80]}")
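
When the point of watching updates is to follow tool use, it helps to log tool calls rather than raw message text. The sketch below assumes the update contains LangChain-style messages, where an AI message requesting tools exposes a tool_calls list of dicts with name and args keys and a tool result has type "tool"; describe_update is a hypothetical helper, and SimpleNamespace objects stand in for real messages.

```python
from types import SimpleNamespace


def describe_update(step: dict) -> list[str]:
    """Turn one stream_mode="updates" chunk into short log lines about tool use."""
    lines = []
    for node_name, update in step.items():
        # A node may return None or an update without a messages key.
        for msg in (update or {}).get("messages", []):
            # AI messages that request tools carry a list of tool-call dicts.
            for call in getattr(msg, "tool_calls", None) or []:
                lines.append(f"{node_name}: call {call['name']}({call['args']})")
            if getattr(msg, "type", "") == "tool":
                lines.append(f"{node_name}: tool result {str(msg.content)[:40]!r}")
    return lines


# Hypothetical stand-ins for two consecutive "updates" chunks.
step_agent = {"agent": {"messages": [SimpleNamespace(
    type="ai", content="", tool_calls=[{"name": "search", "args": {"q": "rust"}, "id": "1"}])]}}
step_tools = {"tools": {"messages": [SimpleNamespace(type="tool", content="3 results")]}}

for step in (step_agent, step_tools):
    print("\n".join(describe_update(step)))
```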

For production streaming from an async web server such as FastAPI, wrap the stream in an async generator and return it as a server-sent events (text/event-stream) response:

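import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    session_id: str
    message: str


@app.post("/chat/stream")
async def stream_chat(request: ChatRequest):
    async def event_stream():
        # One conversation thread per client session
        config = {"configurable": {"thread_id": request.session_id}}
        # astream in "messages" mode yields (message_chunk, metadata) tuples
        async for chunk, metadata in graph.astream(
            {"messages": [{"role": "user", "content": request.message}]},
            config=config,
            stream_mode="messages",
        ):
            if chunk.content:
                # JSON-encode so newlines inside a token cannot break SSE framing
                yield f"data: {json.dumps(chunk.content)}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")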
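
On the client side, each event arrives as a data: line followed by a blank line. Raw tokens are risky here: a token containing a newline would end the event early or produce a line the client ignores. A minimal sketch, assuming each token is JSON-encoded before it is framed; sse_event and parse_sse are hypothetical helpers, and parse_sse handles only single-line data fields, not the full SSE format (event:, id:, retry:, multi-line data).

```python
import json


def sse_event(token: str) -> str:
    """Frame one token as a server-sent event with a JSON-encoded payload."""
    # json.dumps escapes newlines, so the payload never contains a raw line break.
    return f"data: {json.dumps(token)}\n\n"


def parse_sse(body: str) -> list[str]:
    """Recover tokens from a body built by sse_event (single-line data fields only)."""
    tokens = []
    for event in body.split("\n\n"):
        for line in event.split("\n"):
            if line.startswith("data: "):
                tokens.append(json.loads(line[len("data: "):]))
    return tokens


tokens = ["fn main() {\n", '    println!("Hello");\n', "}"]
body = "".join(sse_event(t) for t in tokens)
print(parse_sse(body) == tokens)  # True: newlines inside tokens survive the round trip
```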

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
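
A small script can capture the version and runtime details in one step. This is a sketch using only the standard library; environment_report is a hypothetical helper, the package names passed in are assumptions, a missing package is reported as None, and the data path, observed output, GPU backend, and memory limits still need to be noted by hand.

```python
import importlib.metadata
import json
import os
import platform
import sys


def environment_report(packages: list[str]) -> dict:
    """Collect package versions and runtime details for a reproducibility note."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # Assumed package list: extend it with your model runtime and vector store client.
    print(json.dumps(environment_report(["langgraph", "langchain-core", "fastapi"]), indent=2))
```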