14. Error Handling

Chapter 14 of 18 · 15 min

By default, LangGraph does not retry a failed node: an unhandled exception stops the run. Recent releases let you attach a RetryPolicy to a node, but that only re-runs the node and gives you no escalation path. The standard approach to recovery is therefore to handle errors inside nodes, or in a dedicated error-recovery node that the graph routes to after a failure. The pattern has three parts: define an error: str | None field in the state schema, let nodes catch exceptions and populate that field, and add a conditional edge that routes to a recovery node when error is set.
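The code in this chapter refers to a TeamState schema, a SearchError exception, and a risky_search helper without showing them. A minimal sketch of what they might look like follows. Everything here is an assumption made for illustration: the field list is inferred from how the chapter's nodes use the state, and risky_search is a stand-in stub, not a real search client.

```python
from typing import Optional, TypedDict


class TeamState(TypedDict, total=False):
    """Assumed state schema; total=False lets nodes return partial updates."""
    task: str
    findings: str
    error: Optional[str]   # same meaning as `str | None`; None means the last step succeeded
    retry_count: int       # number of retries already performed


class SearchError(Exception):
    """Hypothetical exception raised by the search helper."""


def risky_search(task: str) -> dict:
    """Stand-in for a real search call; fails on an empty query."""
    if not task.strip():
        raise SearchError("empty query")
    return {"content": f"results for {task!r}"}
```

Replacing the stub with a real client should not change the error-handling pattern, as long as the client's failures are raised as exceptions the node can catch.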

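from typing import Literal

from langgraph.graph import END

# TeamState, risky_search, and SearchError are assumed to be defined elsewhere.

def safe_researcher(state: TeamState) -> TeamState:
    """Run the search and report failures through state instead of raising."""
    try:
        result = risky_search(state["task"])
        findings = result["content"]
        # Clear any error left over from a previous failed attempt.
        return {"findings": findings, "error": None}
    except SearchError as e:
        return {"error": f"Search failed: {e}"}
    except TimeoutError as e:
        return {"error": f"Search timed out: {e}"}

def error_router(state: TeamState) -> Literal["retry_researcher", "escalate", END]:
    # Check for an error first so a successful retry is never escalated.
    if not state.get("error"):
        return END
    # Two retries already used: stop looping and hand off.
    if state.get("retry_count", 0) >= 2:
        return "escalate"
    return "retry_researcher"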

The retry counter increments in the recovery node. After two retries, the router sends the graph to escalate, which might send a notification or write the error to a log. This gives you bounded retries with an escalation path instead of an infinite loop.
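The recovery and escalation nodes are described above but not shown. The sketch below is one possible shape for them, together with the graph wiring. The names retry_researcher and escalate come from the chapter; MAX_RETRIES, route_after_research, the "done" label, and build_graph are hypothetical names introduced here. The langgraph import sits inside build_graph so the node functions can be exercised on plain dicts without the library installed.

```python
from typing import Any, Callable, Dict

MAX_RETRIES = 2  # assumed to match the threshold used by the chapter's router

State = Dict[str, Any]


def retry_researcher(state: State) -> State:
    """Recovery node: count the retry; the next edge loops back to researcher."""
    return {"retry_count": state.get("retry_count", 0) + 1}


def escalate(state: State) -> State:
    """Terminal node: record that retries are exhausted (a real one might notify or log)."""
    retries = state.get("retry_count", 0)
    return {"error": f"Escalated after {retries} retries: {state.get('error')}"}


def route_after_research(state: State) -> str:
    """Return a routing label; build_graph maps each label to a node."""
    if not state.get("error"):
        return "done"
    if state.get("retry_count", 0) >= MAX_RETRIES:
        return "escalate"
    return "retry_researcher"


def build_graph(state_schema: type, researcher: Callable[[State], State]):
    """Wire researcher, retry, and escalate nodes into a compiled graph."""
    # Imported lazily so the functions above stay testable without langgraph.
    from langgraph.graph import END, START, StateGraph

    builder = StateGraph(state_schema)
    builder.add_node("researcher", researcher)
    builder.add_node("retry_researcher", retry_researcher)
    builder.add_node("escalate", escalate)

    builder.add_edge(START, "researcher")
    builder.add_conditional_edges(
        "researcher",
        route_after_research,
        {"retry_researcher": "retry_researcher", "escalate": "escalate", "done": END},
    )
    builder.add_edge("retry_researcher", "researcher")  # loop back for another attempt
    builder.add_edge("escalate", END)
    return builder.compile()
```

With a state schema and a researcher node in hand, something like build_graph(TeamState, safe_researcher).invoke({"task": "..."}) would run the loop. Passing an explicit label-to-node mapping to add_conditional_edges keeps the router free of any langgraph import, at the cost of one extra dictionary to maintain.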

START → researcher → error_router
                       ├─ no error → END
                       ├─ error, retry_count < 2 → retry_researcher → researcher
                       └─ error, retry_count ≥ 2 → escalate → END

For transient errors (network timeouts, rate limits), exponential backoff can be implemented by tracking last_attempt in state and comparing timestamps in the retry node's routing logic. For permanent errors (bad API keys, malformed input), skip retry and escalate immediately.
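As a rough illustration of both ideas, the sketch below doubles the wait on each retry and sends permanent errors straight to escalation. It makes several assumptions not stated in the chapter: the state carries a hypothetical error_kind field that the failing node sets, last_attempt is a Unix timestamp, the delay constants are arbitrary, and the wait happens inside the retry node rather than in the router. The now and sleep parameters exist only so the timing can be tested without real delays.

```python
import time
from typing import Any, Callable, Dict

BASE_DELAY = 1.0    # seconds before the first retry (assumed value)
MAX_DELAY = 30.0    # upper bound on any single wait (assumed value)
MAX_RETRIES = 2
PERMANENT_KINDS = {"auth", "bad_input"}  # hypothetical error categories

State = Dict[str, Any]


def backoff_delay(retry_count: int) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at MAX_DELAY."""
    return min(MAX_DELAY, BASE_DELAY * 2 ** retry_count)


def remaining_wait(state: State, now: float) -> float:
    """Seconds still to wait before the next attempt is allowed."""
    due = state.get("last_attempt", 0.0) + backoff_delay(state.get("retry_count", 0))
    return max(0.0, due - now)


def retry_with_backoff(
    state: State,
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> State:
    """Retry node: wait out the backoff window, then bump the counter."""
    wait = remaining_wait(state, now())
    if wait > 0:
        sleep(wait)  # blocks this node; an async node would await instead
    return {"retry_count": state.get("retry_count", 0) + 1, "last_attempt": now()}


def route_by_error_kind(state: State) -> str:
    """Router: permanent errors skip the retry loop entirely."""
    if not state.get("error"):
        return "done"
    if state.get("error_kind") in PERMANENT_KINDS:
        return "escalate"
    if state.get("retry_count", 0) >= MAX_RETRIES:
        return "escalate"
    return "retry_researcher"
```

For this to work, the node that catches the exception would need to set error_kind alongside error, for example "timeout" for a TimeoutError and "auth" for a rejected API key.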

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Add a retry_count field to the state schema of an existing graph. Wrap one node's logic in try/except so that failures populate the error field. Add a retry node that increments the counter and a conditional edge that escalates after two retries. Verify that the graph terminates the retry loop when the third attempt fails, and that a successful retry ends the run without escalating.