How do I prevent a $30K cloud LLM bill from a runaway agent?
The answer
The $30K bill story (r/artificial, May 2026): a developer wired an LLM into a job queue. A bug caused the queue to fan out an order of magnitude more work than intended. The LLM happily processed every job. Result: $30K in API charges over a weekend.
Five controls that prevent this:
1. Cloud-provider budget alerts (cheapest, fastest, most ignored). Anthropic / OpenAI / AWS all let you set monthly spend caps and email alerts at 50% / 75% / 90%. Set them on every API key, every workspace, every account. Set the cap below what would hurt to lose, and do it within 60 seconds of creating any new API key. (AWS sketch after this list.)
2. Per-key rate limits. Anthropic + OpenAI both let you cap requests-per-minute on an API key. Set it to 1.5-2× your normal peak. A bug that explodes traffic by 10× hits the rate limit, fails, and you find out in minutes instead of days. (Client-side guard sketch after this list.)
3. Local-first defaults for non-critical paths. Background jobs, batch summarization, classification, embedding: none of these need frontier-cloud quality. Route them to local Ollama via LiteLLM (routing sketch after this list). Cloud stays for the 5% of paths that genuinely need frontier quality.
4. Observability you actually look at. A Langfuse / Helicone / Phoenix dashboard that surfaces "what did we spend this hour" prominently. If the dashboard is buried, you won't see the spike. (One-line LiteLLM-to-Langfuse hookup after this list.)
5. A kill switch. A circuit-breaker pattern: if hourly cost exceeds threshold X, automatically downgrade to local-only or fail fast. Better to fail loudly than spend silently. (Circuit-breaker sketch after this list.)
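Control 1 is scriptable on the AWS side. A minimal sketch using boto3's Budgets API to create a monthly cost budget with alerts at 50/75/90%; the account ID, budget name, $500 cap, and email address are all placeholders. (The Anthropic and OpenAI caps are set in their respective web consoles.)

```python
# Minimal sketch: AWS monthly cost budget with 50/75/90% email alerts.
# Every identifier below is a placeholder -- swap in your own.
import boto3

budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "llm-api-spend",
        "BudgetLimit": {"Amount": "500", "Unit": "USD"},  # below what would hurt to lose
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": pct,  # percent of the cap
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "you@example.com"}
            ],
        }
        for pct in (50.0, 75.0, 90.0)
    ],
)
```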
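The provider-side limit in control 2 is a console setting, not code, but a client-side guard catches the same failure mode from your end. A minimal sketch, with the 120 RPM cap as an illustrative "1.5-2× normal peak" value; it fails fast rather than queueing, so a runaway loop surfaces as loud errors:

```python
# Minimal sketch: client-side requests-per-minute guard.
# Complements (does not replace) the provider-side per-key limit.
import time
from collections import deque

class RpmGuard:
    def __init__(self, max_rpm: int):
        self.max_rpm = max_rpm
        self.calls: deque = deque()  # timestamps of recent calls

    def check(self) -> None:
        now = time.monotonic()
        while self.calls and now - self.calls[0] > 60:  # drop calls older than 1 min
            self.calls.popleft()
        if len(self.calls) >= self.max_rpm:
            raise RuntimeError(f"client RPM cap {self.max_rpm} exceeded; failing fast")
        self.calls.append(now)

guard = RpmGuard(max_rpm=120)  # placeholder: ~1.5-2x your normal peak
# Call guard.check() before every API request and let the exception propagate.
```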
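For control 3, a routing sketch using LiteLLM's completion(), which reaches Ollama via the ollama/ model prefix. Both model names and the critical flag are placeholders; it assumes an Ollama server on its default port and an Anthropic key in the environment:

```python
# Minimal sketch: local-first routing. Non-critical paths go to local Ollama;
# only flagged-critical calls hit the cloud. Model names are placeholders.
from litellm import completion

def run(prompt: str, critical: bool = False):
    model = "anthropic/claude-sonnet-4-20250514" if critical else "ollama/llama3"
    return completion(model=model, messages=[{"role": "user", "content": prompt}])

# Background batch work stays local and costs nothing per token:
resp = run("Summarize this ticket: ...")
```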
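For control 4, LiteLLM ships a built-in Langfuse callback, so getting per-call cost onto a dashboard can be a one-liner. Assumes LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY (and optionally LANGFUSE_HOST) are set in the environment:

```python
# Minimal sketch: log every LiteLLM call (tokens + computed cost) to Langfuse.
import litellm

litellm.success_callback = ["langfuse"]  # see the LiteLLM callback docs

# From here, every completion() is traced. Pin the hourly-spend chart
# somewhere you will actually see it.
```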
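And a sketch of control 5's circuit breaker: a rolling one-hour spend counter fed by LiteLLM's completion_cost(), with the $25 threshold and local fallback model as placeholders. Downgrading (rather than raising) keeps the pipeline alive while capping the bleed; swap the downgrade for an exception if you prefer fail-fast:

```python
# Minimal sketch: hourly-cost circuit breaker around LiteLLM calls.
import time
import litellm
from litellm import completion

HOURLY_LIMIT_USD = 25.0   # placeholder threshold X
_window_start = time.monotonic()
_window_spend = 0.0

def guarded_completion(**kwargs):
    global _window_start, _window_spend
    if time.monotonic() - _window_start > 3600:        # roll the window hourly
        _window_start, _window_spend = time.monotonic(), 0.0
    if _window_spend >= HOURLY_LIMIT_USD:
        kwargs["model"] = "ollama/llama3"              # breaker open: local-only
    resp = completion(**kwargs)
    try:
        _window_spend += litellm.completion_cost(completion_response=resp)
    except Exception:
        pass  # local models may have no price entry; count them as $0
    return resp
```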
The cost-vs-cloud calculator on this site answers the decision question (when does local pay back). The defensive question is just as important: assume your cloud usage will misbehave and configure for that.
Explore the numbers for your specific stack
Where we got the numbers
The $30K bill story: r/artificial thread, May 2026. Budget-control best practices: Anthropic, OpenAI, and AWS billing documentation. LiteLLM cost tracking: docs.litellm.ai.
Also see
The TCO compounder — at what daily volume does local-rig hardware beat cloud?
Route non-critical jobs to local Ollama; keep cloud for the 5% that needs frontier quality.
Paste a Claude Code session — see per-tool cost attribution + insights.
The dashboard + alerting setup that catches runaway loops before they bill.
Found this via a forum search? Bookmark the URL — we update these pages as new data lands. Have a question that should live here? Open a GitHub issue.