What this does

AI APIs enforce rate limits to prevent abuse and ensure fair resource allocation. When a rate limit is exceeded, the API returns a 429 status code. This guide explains how to implement exponential backoff with jitter for AI API calls so that transient rate limit errors resolve automatically without crashing workflows or losing request data. The pattern is language-agnostic and works with OpenAI, Anthropic, or any other HTTP API that signals rate limiting with a 429 response.

Steps

Identify the exact HTTP status code that the AI provider returns when the rate limit is hit. Most AI providers return 429 Too Many Requests, and many also include a Retry-After header.
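Choose a retry strategy. Exponential backoff doubles the wait time after each failure, starting from a base interval (1 second is a reasonable default). Jitter adds a random component to each wait so that many clients throttled at the same moment do not all retry at once (the thundering herd problem).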
Implement the retry wrapper. In Python, use tenacity decorators; in Node.js, use a custom async function or axios-retry with a custom retryCondition.
Configure the wrapper to check for the 429 status code and read the Retry-After header. If the header is present and valid, use it as the wait time instead of computing one. The value can be either a whole number of seconds or an HTTP date, so handle both forms.
Set a maximum number of retry attempts (5 is a common choice) and a global timeout so that retries cannot continue indefinitely.
Wrap every AI API call site with the retry-enabled HTTP client.
Add structured logging so that each retry and each ultimate failure is captured with timestamp, attempt count, and error code.
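
The steps above can be sketched with the Python standard library alone. This is a minimal illustration rather than a definitive implementation: it is written by hand instead of with tenacity so that the moving parts are visible. RateLimitError, parse_retry_after, backoff_delay, and call_with_retry are hypothetical names; the 60-second cap and the "full jitter" variant (a uniform random wait between zero and the exponential bound) are assumptions; and the limit of 5 is read here as 5 attempts in total.

```python
import email.utils
import logging
import random
import time
from datetime import datetime, timezone

logger = logging.getLogger("ai_retry")


class RateLimitError(Exception):
    """Hypothetical error that an HTTP client layer raises on a 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.status = 429
        self.retry_after = retry_after  # raw Retry-After header value, or None


def parse_retry_after(value, now=None):
    """Return the Retry-After delay in seconds, or None if absent or invalid."""
    if value is None:
        return None
    try:  # delay-seconds form, e.g. "7"
        return max(0.0, float(int(value)))
    except ValueError:
        pass
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full jitter: uniform wait in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(func, max_attempts=5, total_timeout=120.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call func(); on RateLimitError wait and retry within both limits."""
    deadline = clock() + total_timeout
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError as err:
            # Prefer the server's hint; fall back to computed backoff.
            delay = parse_retry_after(err.retry_after)
            if delay is None:
                delay = backoff_delay(attempt)
            last_attempt = attempt == max_attempts - 1
            if last_attempt or clock() + delay > deadline:
                logger.error("giving up attempt=%d status=%d",
                             attempt + 1, err.status)
                raise
            logger.warning("retrying attempt=%d status=%d wait=%.2fs",
                           attempt + 1, err.status, delay)
            sleep(delay)


if __name__ == "__main__":
    # asctime in the format string supplies the timestamp for each record.
    logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
    state = {"calls": 0}

    def flaky():
        # Simulated API call: rate limited twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RateLimitError(retry_after="0")
        return "ok"

    print(call_with_retry(flaky))
```

In real use, func would wrap the actual API request and raise RateLimitError (or the client library's own equivalent) when the response status is 429. Passing sleep and clock as parameters is a design choice that keeps the wrapper testable without real waiting.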

Verification

# Verify retry script exits cleanly with successful status
python3 -c "
import subprocess, sys
result = subprocess.run(['python3', 'scripts/ai_retry_demo.py'],
    capture_output=True, text=True, timeout=30)
print(result.stdout)
print(result.stderr)
sys.exit(result.returncode)
"
echo "Exit code: $?"
# Expected: Exit code: 0

Common failures

Retry-After header is missing or invalid. Some AI providers do not send the header. Fall back to computing the wait time from the retry count if the header is absent.
Retries consume excessive API quota during an outage. Implement a circuit breaker that trips after three consecutive 429 errors and returns a clear error to the caller instead of sending further requests.
The retry loop never terminates in offline or network-partition scenarios. Always enforce a hard timeout (120 seconds total) and a maximum retry count (5 attempts) to break the loop.
Retries duplicate non-idempotent requests. Use idempotency keys (via the Idempotency-Key header where the API supports it), and send the same key on every retry of a given request, so that POST requests can be retried safely.
The installed package or runtime version differs from the one the commands assume. Check the version first, then rerun the smallest verification command.
The local environment has drifted, so a different service, virtual environment, model, or path is in use. Print the active binary path and configuration before changing the guide steps.
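
The circuit breaker mentioned above can be sketched as a small class. This is an illustration under stated assumptions, not a definitive implementation: RateLimitBreaker and CircuitOpenError are hypothetical names, the caller supplies an is_rate_limit function that recognises its own client's 429 errors, and the 30-second cooldown before a trial call is allowed is an assumed value that the guide does not specify.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without contacting the API."""


class RateLimitBreaker:
    """Opens after `threshold` consecutive rate-limit failures."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown  # seconds to stay open (assumed value)
        self.clock = clock
        self.failures = 0         # current streak of rate-limit errors
        self.opened_at = None     # time the breaker tripped, or None

    def call(self, func, is_rate_limit):
        """Run func() unless open; is_rate_limit(exc) classifies errors."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError(
                    "rate limit breaker is open; not calling the API")
            # Cooldown elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.threshold - 1  # one more 429 reopens
        try:
            result = func()
        except Exception as exc:
            # Only rate-limit errors count toward the streak.
            if is_rate_limit(exc):
                self.failures += 1
                if self.failures >= self.threshold:
                    self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the streak
        return result
```

One breaker instance would typically be shared by all call sites that hit the same API, so that the failure streak reflects the provider's state rather than a single request's. Callers can catch CircuitOpenError to surface a clear "rate limited, try again later" message.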

Related guides

Build a type-safe API client using an AI assistant that reads your OpenAPI spec
Integrate an AI assistant into your API client library to auto-generate usage examples