HOW-TO · DEV
How to handle API rate limiting and retry logic in AI-integrated API calls
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES
Application making AI API calls, documented rate limits (requests/minute or tokens/minute), Node.js 18+ or Python 3.10+
What this does
AI APIs enforce rate limits to prevent abuse and ensure fair resource allocation. When a rate limit is exceeded, the API returns a 429 status code. This guide explains how to implement exponential backoff with jitter for AI API calls so that transient rate limit errors resolve automatically without crashing workflows or losing request data. The pattern is language-agnostic and works with OpenAI, Anthropic, or any compatible AI endpoint.
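As a rough illustration of the backoff calculation described above, the following Python sketch computes the wait before each retry. The function name `backoff_delay`, the 60-second cap, and the choice of additive jitter (a random 0 to 1 second on top of the exponential term) are assumptions made for this example, not values prescribed by any provider.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, jitter=1.0, rng=random.random):
    """Return the wait in seconds before retry number `attempt` (0-based).

    The exponential term doubles on every attempt (base, 2*base, 4*base, ...)
    and is capped at `cap` (an assumed ceiling). A random value in
    [0, jitter) is then added so that clients do not retry in lockstep.
    """
    exponential = min(cap, base * (2 ** attempt))
    return exponential + rng() * jitter
```

Calling `backoff_delay(0)`, `backoff_delay(1)`, and `backoff_delay(2)` yields waits of roughly 1, 2, and 4 seconds, each with up to one extra second of jitter. The `rng` parameter exists only so the random source can be replaced in tests.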
Steps
- Identify the exact HTTP status code that the AI provider returns when the rate limit is hit. Most AI providers return 429 Too Many Requests and include a Retry-After header.
- Choose a retry strategy. Exponential backoff doubles the wait time after each failure, starting from a base interval (1 second recommended). Jitter adds a random component to prevent a thundering herd, where many clients retry at the same moment.
- Implement the retry wrapper. In Python, use tenacity decorators; in Node.js, use a custom async function or axios-retry with a custom retryCondition.
- Configure the wrapper to check for the 429 status code and read the Retry-After header. If the header is present, use its integer value (in seconds) as the wait time instead of computing one.
- Set a maximum number of retry attempts (5 is standard) and a global timeout to prevent indefinite loops.
- Wrap every AI API call site with the retry-enabled HTTP client.
- Add structured logging so that each retry and each ultimate failure is captured with timestamp, attempt count, and error code.
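The steps above can be sketched as a single wrapper. This example deliberately uses only the Python standard library rather than tenacity, so it is a hand-rolled stand-in for the recommended decorator, not a replacement for it. The names `call_with_retry`, `RetryError`, and the logger name are hypothetical, and `send` is assumed to be a zero-argument callable that performs one request and returns `(status_code, headers, body)` with `headers` as a plain dict. The sketch treats "5 attempts" as one initial call plus up to five retries, which is one possible reading.

```python
import logging
import random
import time

logger = logging.getLogger("ai_retry")  # hypothetical logger name


class RetryError(Exception):
    """Raised when retries or the global timeout are exhausted."""


def call_with_retry(send, max_retries=5, base=1.0, total_timeout=120.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call `send()` until it stops returning 429 or a limit is reached."""
    deadline = clock() + total_timeout
    for retry in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if retry == max_retries:
            break  # no retries left
        # Prefer the server-provided wait (integer seconds) when present.
        # Assumption: a plain dict with this exact key; real HTTP clients
        # usually expose case-insensitive header objects.
        retry_after = str(headers.get("Retry-After", "")).strip()
        if retry_after.isdecimal():
            delay = float(retry_after)
        else:
            # Exponential backoff (1s, 2s, 4s, ...) plus additive jitter.
            delay = base * 2 ** retry + random.uniform(0, base)
        if clock() + delay > deadline:
            logger.error("timeout attempt=%d status=%d", retry + 1, status)
            raise RetryError("global timeout would be exceeded")
        logger.warning("retrying attempt=%d status=%d delay=%.2fs",
                       retry + 1, status, delay)
        sleep(delay)
    logger.error("giving up attempt=%d status=429", max_retries + 1)
    raise RetryError(f"still rate limited after {max_retries} retries")
```

To get timestamps on every log line, the application can call `logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")` once at startup. The `sleep` and `clock` parameters are injectable so the wrapper can be exercised in tests without real waiting.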
Verification
# Verify retry script exits cleanly with successful status
python3 -c "
import subprocess, sys
result = subprocess.run(['python3', 'scripts/ai_retry_demo.py'],
                        capture_output=True, text=True, timeout=30)
print(result.stdout)
print(result.stderr)
sys.exit(result.returncode)
"
echo "Exit code: $?"
# Expected: Exit code: 0
Common failures
- Retry-After header is missing or invalid. Some AI providers do not send the header. Fall back to computing the wait time from the retry count if the header is absent.
- Retries consume excessive API quota during an outage. Implement a circuit breaker that stops retrying after three consecutive 429 errors and returns a clear error to the caller.
- The retry loop never terminates in offline or network-partition scenarios. Always enforce a hard timeout (120 seconds total) and a maximum retry count (5 attempts) to break the loop.
- Idempotency is not preserved across retries for non-idempotent requests. Use idempotency keys (via the Idempotency-Key header where the API supports it) to safely retry POST requests.
- Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
- Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
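Two of the failure modes above, a missing or invalid Retry-After header and retries that burn quota during an outage, can be guarded against with small helpers. The following is a minimal sketch: `parse_retry_after` and `CircuitBreaker` are hypothetical names, and the breaker is deliberately simplified (it has no cool-down or half-open state, which a production circuit breaker would normally include).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return seconds to wait, or None if the header is missing or invalid.

    Accepts both forms HTTP allows: integer seconds or an HTTP-date.
    A None result tells the caller to fall back to computed backoff.
    """
    if not value:
        return None
    value = value.strip()
    try:
        seconds = int(value)
        return float(seconds) if seconds >= 0 else None
    except ValueError:
        pass  # not an integer, try the HTTP-date form next
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


class CircuitBreaker:
    """Opens after `threshold` consecutive 429 responses."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_429 = 0

    @property
    def is_open(self):
        return self.consecutive_429 >= self.threshold

    def record(self, status_code):
        # Any non-429 response resets the streak.
        if status_code == 429:
            self.consecutive_429 += 1
        else:
            self.consecutive_429 = 0
```

A caller would check `breaker.is_open` before each request, return a clear error instead of sending when it is open, and call `breaker.record(status)` after every response.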