Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.
© 2026 runlocalai.co · Independently operated
HOW-TO · DEV

How to handle API rate limiting and retry logic in AI-integrated API calls

intermediate · 20 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

An application that makes AI API calls, the provider's documented rate limits (requests/minute or tokens/minute), and Node.js 18+ or Python 3.10+.

What this does

AI APIs enforce rate limits to prevent abuse and ensure fair resource allocation. When a client exceeds a limit, the API returns a 429 status code. This guide explains how to implement exponential backoff with jitter for AI API calls so that transient rate limit errors resolve automatically without crashing workflows or losing request data. The pattern is language-agnostic and works with OpenAI, Anthropic, or any other HTTP endpoint that signals rate limiting with a 429 response.
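
To make the idea concrete, here is a minimal sketch of how a wait time could be computed. It assumes the "full jitter" variant (a uniform random wait between zero and the exponential ceiling) and a 60-second cap; the function name `backoff_delay` and the cap are illustrative choices, not something a provider mandates.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Return a wait time in seconds for a zero-based retry attempt.

    The ceiling doubles with each attempt (base, 2*base, 4*base, ...) up to
    `cap`. The actual wait is drawn uniformly from [0, ceiling] so that many
    clients hitting the limit together do not all retry at the same moment.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


if __name__ == "__main__":
    # Print one sample schedule; the sleep values differ on every run.
    for attempt in range(5):
        ceiling = min(60.0, 1.0 * 2 ** attempt)
        print(f"attempt {attempt}: ceiling {ceiling:.0f}s, "
              f"sleep {backoff_delay(attempt):.2f}s")
```

Full jitter trades a slightly less predictable wait for better spreading of retries. If a guaranteed minimum wait matters more, adding a small random offset to the exponential value is a reasonable alternative.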

Steps

  1. Identify the exact HTTP status code that the AI provider returns when the rate limit is hit. Most AI providers return 429 Too Many Requests, and many also include a Retry-After header.
  2. Choose a retry strategy. Exponential backoff doubles the wait time after each failure, starting from a base interval (1 second recommended). Jitter adds a random component so that many clients hitting the limit together do not all retry at the same instant (the thundering herd problem).
  3. Implement the retry wrapper. In Python, use tenacity decorators; in Node.js, use a custom async function or axios-retry with a custom retryCondition.
  4. Configure the wrapper to check for the 429 status code and read the Retry-After header. The header carries either a whole number of seconds or an HTTP date; if it is present and valid, wait for that long instead of computing a delay.
  5. Set a maximum number of retry attempts (5 is a common default) and an overall timeout so the loop cannot run indefinitely.
  6. Wrap every AI API call site with the retry-enabled HTTP client.
  7. Add structured logging so that each retry and each final failure is captured with timestamp, attempt count, and status code.
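
The steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not a provider's official client: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, the names `call_with_retry`, `parse_retry_after` and `RateLimitError` are made up for this example, and "5 attempts" is read as five calls in total.

```python
import logging
import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional, Tuple

logger = logging.getLogger("ai_retry")
Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


class RateLimitError(Exception):
    """Raised when the attempt limit or the overall timeout is reached."""


def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Return seconds to wait, or None if the header is absent or invalid."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "7"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base: float = 1.0,
    total_timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-429 status or a limit is hit."""
    deadline = time.monotonic() + total_timeout
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        # Plain dicts are case-sensitive; real HTTP clients usually are not.
        wait = parse_retry_after(headers.get("Retry-After"))
        if wait is None:
            # Header absent or invalid: exponential backoff with full jitter.
            wait = random.uniform(0.0, base * 2 ** (attempt - 1))
        logger.warning("rate limited: status=%d attempt=%d/%d wait=%.2fs",
                       status, attempt, max_attempts, wait)
        if attempt == max_attempts or time.monotonic() + wait > deadline:
            break
        sleep(wait)
    logger.error("giving up: status=429 attempts=%d", attempt)
    raise RateLimitError(f"still rate limited after {attempt} attempt(s)")


if __name__ == "__main__":
    # asctime in the format string supplies the timestamp for each log line.
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    # Simulated endpoint: two 429 replies, then success.
    replies = iter([(429, {"Retry-After": "0"}, ""), (429, {}, ""), (200, {}, "ok")])
    print(call_with_retry(lambda: next(replies), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting. In an application, `send` would wrap whichever HTTP client is already in use; tenacity or axios-retry can express the same policy declaratively.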

Verification

# Verify retry script exits cleanly with successful status
python3 -c "
import subprocess, sys
result = subprocess.run(['python3', 'scripts/ai_retry_demo.py'],
    capture_output=True, text=True, timeout=30)
print(result.stdout)
print(result.stderr)
sys.exit(result.returncode)
"
echo "Exit code: $?"
# Expected: Exit code: 0

Common failures

  • Retry-After header is missing or invalid. Some AI providers do not send the header. Fall back to computing the wait time from the retry count if the header is absent or cannot be parsed.
  • Retries consume excessive API quota during an outage. Implement a circuit breaker that stops retrying after three consecutive 429 errors and returns a clear error to the caller.
  • The retry loop never terminates in offline or network-partition scenarios. Always enforce a hard timeout (120 seconds total) and a maximum retry count (5 attempts) to break the loop.
  • Retries duplicate the effect of non-idempotent requests. A POST that is retried may be processed twice. Send an idempotency key (via the Idempotency-Key header where the API supports it) and reuse the same key on every retry of the same request.
  • Version mismatch. The installed package or runtime differs from the one the command assumes; check the version first, then rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
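
Two of the failures above lend themselves to a short sketch: the circuit breaker and the idempotency key. The class and function names here are hypothetical, and the 30-second cooldown before a trial call is an assumption added for illustration; the three-consecutive-429 threshold comes from the list above.

```python
import time
import uuid
from typing import Callable, Dict, Optional


class CircuitOpenError(Exception):
    """Raised when calls are blocked because the breaker is open."""


class CircuitBreaker:
    """Open after `threshold` consecutive 429s; allow a trial call after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.consecutive_429 = 0
        self.opened_at: Optional[float] = None

    def before_call(self) -> None:
        """Call before each request; raises if the breaker is open."""
        if self.opened_at is None:
            return
        if self.clock() - self.opened_at < self.cooldown:
            raise CircuitOpenError("rate limit circuit is open; not calling the API")
        # Cooldown elapsed: let one trial call through. A single further 429
        # reopens the breaker because the counter sits one below the threshold.
        self.opened_at = None
        self.consecutive_429 = self.threshold - 1

    def record(self, status: int) -> None:
        """Call after each response with its HTTP status code."""
        if status == 429:
            self.consecutive_429 += 1
            if self.consecutive_429 >= self.threshold:
                self.opened_at = self.clock()
        else:
            self.consecutive_429 = 0


def idempotency_headers(key: Optional[str] = None) -> Dict[str, str]:
    """Build the header for one logical request; reuse the result on every retry."""
    return {"Idempotency-Key": key or str(uuid.uuid4())}
```

A caller would invoke `before_call()` ahead of each request, `record(status)` after each response, and create the idempotency headers once per logical request, outside the retry loop, so that every retry carries the same key.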

Related guides

  • Build a type-safe API client using an AI assistant that reads your OpenAPI spec
  • Integrate an AI assistant into your API client library to auto-generate usage examples