16. Streaming Agent Output
LangGraph's .stream() method (and its async counterpart, .astream()) chooses what to emit through the stream_mode parameter. LangGraph supports several modes, including "custom" and "debug", but the three used in this section are "values" (emits the full state after each node), "updates" (emits only the update each node returned, keyed by node name), and "messages" (streams LLM output token by token as (message_chunk, metadata) tuples). For end-user chat applications, "messages" is the mode to use: it gives you real-time, token-by-token output from the language model.
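```python
# Each item in "messages" mode is a (message_chunk, metadata) tuple
for chunk, metadata in graph.stream(
    {"messages": [{"role": "user", "content": "Write a hello world in Rust"}]},
    config=config,
    stream_mode="messages",
):
    # Print only the token text, not the tuple's repr
    print(chunk.content, end="", flush=True)
```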
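In "messages" mode each item LangGraph yields is a (message_chunk, metadata) tuple, and the metadata dict names the node that produced the token under the "langgraph_node" key. That matters once an agent has more than one LLM-calling node. The sketch below filters a token stream down to a single node. It runs without a model by feeding in simulated events; FakeChunk, tokens_from_node, and the node names "agent" and "summarizer" are hypothetical, and in real use you would pass the output of graph.stream(..., stream_mode="messages") instead.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Tuple


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed message chunk (only .content is used)."""
    content: Any


def tokens_from_node(events: Iterable[Tuple[Any, dict]], node: str) -> Iterator[str]:
    """Yield non-empty token text from (chunk, metadata) pairs emitted by `node`."""
    for chunk, metadata in events:
        if metadata.get("langgraph_node") != node:
            continue  # skip tokens produced by other nodes
        text = chunk.content
        # This sketch skips providers that return content as a list of blocks
        if isinstance(text, str) and text:
            yield text


# Simulated "messages"-mode output standing in for graph.stream(...)
events = [
    (FakeChunk("Hel"), {"langgraph_node": "agent"}),
    (FakeChunk("ignored"), {"langgraph_node": "summarizer"}),
    (FakeChunk("lo"), {"langgraph_node": "agent"}),
    (FakeChunk(""), {"langgraph_node": "agent"}),
]
print("".join(tokens_from_node(events, node="agent")))  # prints "Hello"
```

Filtering on metadata rather than on message content keeps the rule independent of what the model says.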
For debugging agents that call tools, stream_mode="updates" is more useful: it shows which node produced each update without repeating the full state on every step:
```python
for step in graph.stream(
    {"messages": [{"role": "user", "content": query}]},
    config=config,
    stream_mode="updates",
):
    # Each step maps a node name to the state update that node returned
    for node_name, update in step.items():
        print(f"Node: {node_name}")
        messages = update.get("messages", []) if isinstance(update, dict) else []
        if messages:
            last = messages[-1]
            content = str(getattr(last, "content", ""))
            print(f"  Last message: {last.type} - {content[:80]}")
```
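The difference in volume between "values" and "updates" is easiest to see on a toy example. The sketch below does not use LangGraph: it is a simplified, hypothetical model of a two-node linear graph whose only state key, messages, is merged with an append-style reducer, and it records what each mode would emit after each node according to the descriptions earlier in this section. The function run_toy_graph, the node names "agent" and "tools", and the string messages are all made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model, NOT LangGraph's implementation: a linear graph whose only
# state key is "messages", merged with an append-style reducer.
NodeFn = Callable[[Dict[str, list]], Dict[str, list]]


def run_toy_graph(
    start: Dict[str, list], nodes: List[Tuple[str, NodeFn]], mode: str
) -> List[dict]:
    """Return the chunks a "values" or "updates" stream would emit after each node."""
    state = {"messages": list(start["messages"])}
    emitted: List[dict] = []
    for name, fn in nodes:
        update = fn(state)
        state["messages"] = state["messages"] + update["messages"]  # append reducer
        if mode == "values":
            emitted.append({"messages": list(state["messages"])})  # full state snapshot
        elif mode == "updates":
            emitted.append({name: update})  # only this node's delta, keyed by node name
        else:
            raise ValueError(f"unsupported mode: {mode!r}")
    return emitted


nodes = [
    ("agent", lambda s: {"messages": ["AI: calling search"]}),
    ("tools", lambda s: {"messages": ["Tool: 3 results"]}),
]
start = {"messages": ["Human: hi"]}
for chunk in run_toy_graph(start, nodes, "values"):
    print(chunk)
for chunk in run_toy_graph(start, nodes, "updates"):
    print(chunk)
```

In this model the "values" chunks grow with every step (two messages, then three), while each "updates" chunk carries only the single message its node added, which is why "updates" stays readable on long agent runs.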
For streaming over HTTP from an async web framework such as FastAPI, wrap graph.astream() in an async generator and return it as a Server-Sent Events (SSE) response:
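```python
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_core.messages import AIMessageChunk
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    session_id: str
    message: str

@app.post("/chat/stream")
async def stream_chat(request: ChatRequest):
    async def event_stream():
        # `graph` is the compiled agent graph built earlier; one thread per session
        config = {"configurable": {"thread_id": request.session_id}}
        async for chunk, metadata in graph.astream(
            {"messages": [{"role": "user", "content": request.message}]},
            config=config,
            stream_mode="messages",
        ):
            # Forward only non-empty model tokens; JSON-encode so newlines can't break framing
            if isinstance(chunk, AIMessageChunk) and chunk.content:
                yield f"data: {json.dumps(chunk.content)}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```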
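A token can contain newline characters, and a raw newline inside a single data: line would end that field early. The SSE format handles this by letting one event span several data: lines: the browser's EventSource joins them back together with newlines and strips exactly one space after each colon, so a token that starts with a space survives the "data: " prefix. A minimal framing helper following those rules might look like this; format_sse is a hypothetical name, not part of FastAPI or LangGraph.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame `data` as one Server-Sent Events message (hypothetical helper).

    Each line of the payload gets its own "data: " field, so a token containing
    newlines still arrives as a single event; the trailing blank line ends it.
    """
    fields = [] if event is None else [f"event: {event}"]
    # Normalize CRLF/CR to LF, then emit one data field per line
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    fields.extend(f"data: {line}" for line in normalized.split("\n"))
    return "\n".join(fields) + "\n\n"


# Usage inside an async generator such as event_stream:
#     yield format_sse(token_text)
print(repr(format_sse("line one\nline two")))  # 'data: line one\ndata: line two\n\n'
```

Multi-line data fields keep the payload as plain text on the wire; JSON-encoding each token is an equally valid choice if the client already expects JSON.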
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
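The package versions and runtime that this checkpoint asks for can be captured with a few lines of standard-library Python. The sketch below uses importlib.metadata and platform; the function name environment_report and the default package list (langgraph, langchain-core, fastapi) are assumptions based on the libraries used in this section, so adjust the list to whatever your example imports.

```python
import platform
from importlib import metadata


def environment_report(packages=("langgraph", "langchain-core", "fastapi")) -> dict:
    """Collect runtime and package versions for a reproducibility note."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"  # record the gap instead of failing
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```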
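Build a FastAPI endpoint that streams a ReAct agent's output using Server-Sent Events (SSE). Frame each message as a data: line followed by a blank line, and serve the response with the Content-Type: text/event-stream header. Test it with curl -N (which disables curl's output buffering) or Postman to confirm that tokens arrive incrementally rather than all at once.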
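Besides curl, a small Python client can confirm that tokens arrive one event at a time. The sketch below assumes the endpoint runs locally at http://localhost:8000/chat/stream (8000 is uvicorn's default port) and accepts a JSON body with message and session_id fields, as the request.message and request.session_id attributes used earlier imply. parse_sse and stream_tokens are hypothetical helper names, and the parser handles only the data field of the SSE format. If the server JSON-encodes each payload, pass each one through json.loads before printing.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in a text/event-stream.

    Minimal sketch: only "data" fields and ":" comments are handled; "event",
    "id", and "retry" fields are ignored.
    """
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line dispatches the pending event
            if data:
                yield "\n".join(data)
                data = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the SSE format strips exactly one leading space
        if field == "data":
            data.append(value)
    # An event without a terminating blank line is discarded, as the SSE format specifies


def stream_tokens(url: str, message: str, session_id: str) -> Iterator[str]:
    """POST a chat request and yield each SSE data payload as it arrives."""
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Iterating the response reads one line at a time as bytes arrive
        yield from parse_sse(raw.decode("utf-8") for raw in response)


# Usage, with the server running locally (URL and port are assumptions):
#   for payload in stream_tokens("http://localhost:8000/chat/stream",
#                                "Write a hello world in Rust", "demo"):
#       print(payload, end="", flush=True)
```

Printing each payload with flush=True as it is yielded makes incremental delivery visible: text should appear in pieces, not in one burst when the response closes.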