08. Checkpointing
Checkpointing is the mechanism by which LangGraph saves graph state to a storage backend after each step of execution. This serves two purposes: survivability (resume after a crash or restart) and branching (fork a run from a saved checkpoint). LangGraph ships MemorySaver for in-memory checkpointing during development, and SqliteSaver and PostgresSaver for durable storage. The SQLite and Postgres savers are distributed as separate packages (langgraph-checkpoint-sqlite and langgraph-checkpoint-postgres).
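import os
from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.postgres import PostgresSaver
# Development: in-memory, lost when the process exits
memory_checkpointer = MemorySaver()
# Pass the checkpointer to compile(); `builder` is the StateGraph built earlier
graph = builder.compile(checkpointer=memory_checkpointer)
# Production: Postgres. In recent releases from_conn_string() is a context
# manager, and setup() must be called once to create the checkpoint tables.
with PostgresSaver.from_conn_string(os.environ["DATABASE_URL"]) as prod_checkpointer:
    prod_checkpointer.setup()
    prod_graph = builder.compile(checkpointer=prod_checkpointer)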
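The checkpointer stores a serialized snapshot of the state after each step. Every run is identified by a thread_id in its config, and checkpoints are stored per thread, so concurrent runs with different thread IDs do not see each other's state. To resume:
from langgraph.types import Command
config = {"configurable": {"thread_id": "unique-session-123"}}
graph.invoke(initial_state, config=config)
# Later, same thread: passing None as input continues from the last checkpoint
resumed_state = graph.invoke(None, config=config)
# If the run paused at interrupt(), resume with Command instead; the resume
# value is what the interrupt() call returns inside the node
resumed_state = graph.invoke(Command(resume={}), config=config)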
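The snapshot-per-step idea can be illustrated without LangGraph at all. The following is a minimal sketch using only the standard library; ToyCheckpointStore, run, and the step functions are hypothetical names invented for this illustration, and the code is not how LangGraph implements its checkpointers. It shows a run that fails midway, resumes from its last saved snapshot, and stays isolated from a second thread.

```python
import json


class ToyCheckpointStore:
    """Hypothetical stand-in for a checkpointer (not LangGraph's implementation)."""

    def __init__(self):
        self._by_thread = {}  # thread_id -> list of serialized snapshots

    def put(self, thread_id, state):
        # Serialize so later mutations of `state` cannot alter the snapshot.
        self._by_thread.setdefault(thread_id, []).append(json.dumps(state))

    def latest(self, thread_id):
        snapshots = self._by_thread.get(thread_id)
        return json.loads(snapshots[-1]) if snapshots else None

    def history(self, thread_id):
        # Oldest first in this toy store.
        return [json.loads(s) for s in self._by_thread.get(thread_id, [])]


def run(steps, store, thread_id, initial_state=None):
    """Run `steps` in order, skipping any step already covered by a snapshot."""
    saved = store.latest(thread_id)
    state = saved if saved is not None else {"step": 0, **(initial_state or {})}
    for index in range(state["step"], len(steps)):
        state = steps[index](state)
        state["step"] = index + 1
        store.put(thread_id, state)  # snapshot after every completed step
    return state


def add_one(state):
    return {**state, "total": state["total"] + 1}


crash_once = {"armed": True}


def double(state):
    # Fails on its first call to simulate a process crash mid-run.
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("simulated crash")
    return {**state, "total": state["total"] * 2}


store = ToyCheckpointStore()
pipeline = [add_one, double]

try:
    run(pipeline, store, "session-a", {"total": 1})
except RuntimeError:
    pass  # add_one was checkpointed before double failed

final_a = run(pipeline, store, "session-a")                # resumes at double
final_b = run(pipeline, store, "session-b", {"total": 10})  # separate thread
```

Because the snapshot for "session-a" was written after add_one completed, the second call re-runs only double. The run under "session-b" starts from its own initial state and never reads the first thread's snapshots.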
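To list all saved checkpoints for a thread:
checkpoints = list(graph.get_state_history(config))
Each entry is a StateSnapshot. Entries are returned newest first: the first entry is the most recent checkpoint and the last is the oldest. You can branch a new run from any checkpoint by passing its checkpoint_id in the config, together with the thread_id the checkpoint belongs to:
# Take the checkpoint just before the most recent one
checkpoint_id = checkpoints[1].config["configurable"]["checkpoint_id"]
branch_config = {"configurable": {"thread_id": "unique-session-123", "checkpoint_id": checkpoint_id}}
# Passing None as input replays from that checkpoint instead of starting over
graph.invoke(None, config=branch_config)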
A failure mode: the SQLite checkpointer does not support concurrent writes to the same thread_id from multiple processes. Use Postgres for multi-process production deployments. Another common mistake is using interrupt() in a graph that was compiled without a checkpointer. With no checkpointer passed to compile(), no checkpoints are saved, so a paused run has nothing to resume from and the resume call fails.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Build a two-node graph with a checkpointer. Run it with thread_id "ckpt-demo", inspect list(graph.get_state_history(config)), then start a new branch from one of the saved checkpoint IDs and confirm that it replays the remaining steps correctly.