How do I prevent a $30K cloud LLM bill from a runaway agent?
The answer
The $30K bill story (r/artificial, May 2026): a developer wired an LLM into a job queue. A bug caused the queue to fan out an order of magnitude more work than intended. The LLM happily processed every job. Result: $30K in API charges over a weekend.
Five controls that prevent this:
1. Cloud-provider budget alerts (cheapest, fastest, most ignored). Anthropic / OpenAI / AWS all let you set monthly spend caps and email alerts at 50% / 75% / 90%. Set them on every API key, every workspace, every account. Set the cap below what would hurt to lose, and do it within 60 seconds of creating any new API key. (AWS sketch after this list.)
2. Per-key rate limits. Anthropic + OpenAI both let you cap requests-per-minute on an API key. Set it to 1.5-2× your normal peak. A bug that explodes traffic by 10× hits the rate limit, fails, and you find out in minutes instead of days. (Client-side guard sketch after this list.)
3. Local-first defaults for non-critical paths. Background jobs, batch summarization, classification, embedding: none of these need frontier-cloud quality. Route them to local Ollama via LiteLLM (routing sketch after this list). Cloud stays for the 5% of paths that genuinely need frontier quality.
4. Observability you actually look at. A Langfuse / Helicone / Phoenix dashboard that surfaces "what did we spend this hour" prominently. If the dashboard is buried, you won't see the spike. (One-line LiteLLM-to-Langfuse hookup after this list.)
5. A kill switch. A circuit-breaker pattern: if hourly cost exceeds threshold X, automatically downgrade to local-only or fail fast. Better to fail loudly than spend silently. (Circuit-breaker sketch after this list.)
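Control 1 is scriptable on the AWS side. A minimal sketch using boto3's Budgets API to create a monthly cost budget with alerts at 50/75/90%; the account ID, budget name, $500 cap, and email address are all placeholders. (The Anthropic and OpenAI caps are set in their respective web consoles.)

```python
# Minimal sketch: AWS monthly cost budget with 50/75/90% email alerts.
# Every identifier below is a placeholder -- swap in your own.
import boto3

budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "llm-api-spend",
        "BudgetLimit": {"Amount": "500", "Unit": "USD"},  # below what would hurt to lose
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": pct,  # percent of the cap
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "you@example.com"}
            ],
        }
        for pct in (50.0, 75.0, 90.0)
    ],
)
```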
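The provider-side limit in control 2 is a console setting, not code, but a client-side guard catches the same failure mode from your end. A minimal sketch, with the 120 RPM cap as an illustrative "1.5-2× normal peak" value; it fails fast rather than queueing, so a runaway loop surfaces as loud errors:

```python
# Minimal sketch: client-side requests-per-minute guard.
# Complements (does not replace) the provider-side per-key limit.
import time
from collections import deque

class RpmGuard:
    def __init__(self, max_rpm: int):
        self.max_rpm = max_rpm
        self.calls: deque = deque()  # timestamps of recent calls

    def check(self) -> None:
        now = time.monotonic()
        while self.calls and now - self.calls[0] > 60:  # drop calls older than 1 min
            self.calls.popleft()
        if len(self.calls) >= self.max_rpm:
            raise RuntimeError(f"client RPM cap {self.max_rpm} exceeded; failing fast")
        self.calls.append(now)

guard = RpmGuard(max_rpm=120)  # placeholder: ~1.5-2x your normal peak
# Call guard.check() before every API request and let the exception propagate.
```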
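For control 3, a routing sketch using LiteLLM's completion(), which reaches Ollama via the ollama/ model prefix. Both model names and the critical flag are placeholders; it assumes an Ollama server on its default port and an Anthropic key in the environment:

```python
# Minimal sketch: local-first routing. Non-critical paths go to local Ollama;
# only flagged-critical calls hit the cloud. Model names are placeholders.
from litellm import completion

def run(prompt: str, critical: bool = False):
    model = "anthropic/claude-sonnet-4-20250514" if critical else "ollama/llama3"
    return completion(model=model, messages=[{"role": "user", "content": prompt}])

# Background batch work stays local and costs nothing per token:
resp = run("Summarize this ticket: ...")
```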
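For control 4, LiteLLM ships a built-in Langfuse callback, so getting per-call cost onto a dashboard can be a one-liner. Assumes LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY (and optionally LANGFUSE_HOST) are set in the environment:

```python
# Minimal sketch: log every LiteLLM call (tokens + computed cost) to Langfuse.
import litellm

litellm.success_callback = ["langfuse"]  # see the LiteLLM callback docs

# From here, every completion() is traced. Pin the hourly-spend chart
# somewhere you will actually see it.
```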
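And a sketch of control 5's circuit breaker: a rolling one-hour spend counter fed by LiteLLM's completion_cost(), with the $25 threshold and local fallback model as placeholders. Downgrading (rather than raising) keeps the pipeline alive while capping the bleed; swap the downgrade for an exception if you prefer fail-fast:

```python
# Minimal sketch: hourly-cost circuit breaker around LiteLLM calls.
import time
import litellm
from litellm import completion

HOURLY_LIMIT_USD = 25.0   # placeholder threshold X
_window_start = time.monotonic()
_window_spend = 0.0

def guarded_completion(**kwargs):
    global _window_start, _window_spend
    if time.monotonic() - _window_start > 3600:        # roll the window hourly
        _window_start, _window_spend = time.monotonic(), 0.0
    if _window_spend >= HOURLY_LIMIT_USD:
        kwargs["model"] = "ollama/llama3"              # breaker open: local-only
    resp = completion(**kwargs)
    try:
        _window_spend += litellm.completion_cost(completion_response=resp)
    except Exception:
        pass  # local models may have no price entry; count them as $0
    return resp
```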
The cost-vs-cloud calculator on this site answers the decision question (when does local pay back). The defensive question is just as important: assume your cloud usage will misbehave and configure for that.
Explore the numbers for your specific stack
Where we got the numbers
The $30K bill story: r/artificial thread, May 2026. Budget-control best practices: Anthropic, OpenAI, and AWS billing documentation. LiteLLM cost tracking: docs.litellm.ai.
Also see
The TCO compounder — at what daily volume does local-rig hardware beat cloud?
Route non-critical jobs to local Ollama; keep cloud for the 5% that needs frontier quality.
Paste a Claude Code session — see per-tool cost attribution + insights.
The dashboard + alerting setup that catches runaway loops before they bill.
Found this via a forum search? Bookmark the URL — we update these pages as new data lands. Have a question that should live here? Open a GitHub issue.