My cloud LLM provider just changed pricing. What are my local options?
The answer
One paragraph. No hedging beyond what the data actually warrants.
Cloud LLM pricing changes are a normal part of the market. Local AI is the structural hedge — but only for the workloads where local actually works.
When a vendor changes pricing (splitting a free tier into paid, deprecating a "free" mode, raising per-token rates), the framing matters: it isn't a betrayal; it's how SaaS economics work. Cloud providers held prices below serving cost long enough to capture market share, and eventually the unit economics catch up. The same pattern played out with Anthropic's --print mode shift to credits in May 2026, and earlier with OpenAI's GPT-4 pricing tiers.
What's the structural defense? Run as much of your workload locally as the work itself allows. By workload class:
Coding (your highest-value workload):
- Tool: Aider / Cline / Continue / Tabby / Twinny (all in our /apps directory)
- Model: Qwen 2.5 Coder 32B Q4_K_M on 24GB VRAM, or Qwen 2.5 Coder 7B Q4_K_M on 12GB
- Cost crossover: at typical Claude Sonnet 4.5 coding usage (a few hundred thousand tokens/day across agent loops), a $700 used RTX 3090 typically pays back inside a year. Heavy users running multi-million-token days recoup in months. Plug your actual monthly token volume into /cost-vs-cloud for the exact crossover, or see the worked sketch after this list.
- Editorial honest take: local 32B coding is genuinely competitive with cloud Sonnet/Opus for most coding tasks, especially with agentic loops
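For the impatient, here is the shape of the math /cost-vs-cloud runs, as a minimal Python sketch. The pricing constants (Sonnet-class rates of roughly $3/M input and $15/M output), the 80/20 input/output split, and the power figure are all assumptions; substitute your own numbers from your provider's pricing page and your usage logs.

```python
# Back-of-envelope payback math for a local coding rig vs. a cloud API.
# Every constant here is an assumption -- replace with your own figures.

GPU_COST_USD = 700.0            # used RTX 3090 (assumed street price)
POWER_COST_PER_MONTH = 15.0     # rough guess: ~350W under load, a few hours/day

# Assumed Sonnet-class API pricing (USD per million tokens); check current rates.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00
INPUT_FRACTION = 0.8            # agent loops are input-heavy; adjust to taste

def monthly_api_cost(tokens_per_day: float) -> float:
    """Cloud spend per 30-day month at the assumed pricing and token split."""
    monthly_tokens = tokens_per_day * 30
    input_cost = monthly_tokens * INPUT_FRACTION * INPUT_PRICE_PER_M / 1e6
    output_cost = monthly_tokens * (1 - INPUT_FRACTION) * OUTPUT_PRICE_PER_M / 1e6
    return input_cost + output_cost

def payback_months(tokens_per_day: float) -> float:
    """Months until the GPU pays for itself, net of local power cost."""
    savings = monthly_api_cost(tokens_per_day) - POWER_COST_PER_MONTH
    return float("inf") if savings <= 0 else GPU_COST_USD / savings

for tpd in (100_000, 300_000, 500_000, 1_000_000, 3_000_000):
    print(f"{tpd:>9,} tokens/day -> ${monthly_api_cost(tpd):7.2f}/mo cloud, "
          f"payback in {payback_months(tpd):6.1f} months")
```

Under these assumptions the under-a-year payback arrives around the half-million-tokens/day mark, and 100K tokens/day never pays back at all; your actual input/output split and rates will move both points.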
Chat (the easy migration):
- Tool: Open WebUI, Jan, LibreChat, AnythingLLM (all in /apps)
- Model: Llama 3.1 8B Q4_K_M, Qwen 3 14B Q4_K_M, or Qwen 3 32B Q4_K_M (if you have the VRAM)
- Cost crossover: nearly any consumer GPU pays back within months for daily chat use (plumbing sketch after this list)
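All of the chat front-ends above can point at a local runtime; if you want to sanity-check the plumbing without a UI, Ollama exposes an OpenAI-compatible endpoint on its default port. A minimal sketch, assuming a stock Ollama install and that you've already pulled the model tag (the api_key value is a placeholder the server ignores):

```python
# Minimal chat call against a local Ollama server through its
# OpenAI-compatible endpoint (a default install listens on port 11434).
# Assumes `ollama pull llama3.1:8b` has already been run.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client library, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Summarize the tradeoffs of local vs. cloud LLMs."}],
)
print(response.choices[0].message.content)
```

The same base_url trick is how most of the listed apps attach to a local runtime, which is why migrating chat is usually a config change rather than a rewrite.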
RAG / docs:
- Tool: PrivateGPT, Verba, Khoj, AnythingLLM
- Model: 7-8B base model + local embedder (bge-small, nomic-embed)
- Editorial honest take: this is where local wins decisively. Your documents stay on your hardware, with no per-query cost and no rate limits (toy retrieval loop after this list)
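The moving parts are small enough to sketch. Below is a toy retrieval loop using bge-small via sentence-transformers plus plain numpy; the documents and query are made-up examples, and a real setup (what PrivateGPT and friends do for you) adds chunking and a persistent vector store on top:

```python
# Bare-bones local RAG: embed documents with a small local model,
# retrieve by cosine similarity, and stuff the hits into a prompt.
# bge-small runs comfortably on CPU; nothing leaves the machine.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # small, CPU-friendly

docs = [  # illustrative stand-ins for your real document chunks
    "Invoices are archived under /finance/2025 as signed PDFs.",
    "The VPN config was rotated in March; see the ops runbook.",
    "Quarterly OKRs live in the planning wiki, not in email.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

query = "Where do we keep old invoices?"
query_vec = embedder.encode([query], normalize_embeddings=True)[0]

# With normalized vectors, cosine similarity is just a dot product.
scores = doc_vecs @ query_vec
best = np.argsort(scores)[::-1][:2]

context = "\n".join(docs[i] for i in best)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # feed this to any local 7-8B chat model
```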
Voice / transcription:
- Tool: MacWhisper, Buzz, OpenedAI-Speech (TTS)
- Cost crossover: pays back almost instantly for any sustained transcription workload (minimal pipeline sketch below)
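The desktop apps above wrap Whisper-family models; the equivalent pipeline in Python is a few lines via the open-source openai-whisper package. A minimal sketch (the filename is a placeholder, and ffmpeg must be on the PATH):

```python
# Minimal local transcription with the open-source whisper package
# (pip install openai-whisper; also requires ffmpeg installed).
import whisper

model = whisper.load_model("small")    # "medium"/"large" trade speed for accuracy
result = model.transcribe("meeting.mp3")  # placeholder filename
print(result["text"])

# Segment-level timestamps, if the workload needs them:
for seg in result["segments"]:
    print(f"[{seg['start']:7.1f}s -> {seg['end']:7.1f}s] {seg['text']}")
```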
What stays cloud (be honest with yourself):
- Frontier-quality one-shot tasks where local 32B genuinely isn't enough
- Cutting-edge multimodal (vision-language at frontier scale)
- 405B+ model needs (datacenter-only locally)
- Workloads where latency dominates cost: if waiting on a slower local model costs you more in time than the API costs in dollars, stay cloud
The migration order: start with the workload where local genuinely beats cloud on quality + cost + latency. That's coding for most operators. Run Aider with Qwen 2.5 Coder 32B for a week. If the gap is small enough (it usually is), expand from there. Don't try to migrate everything at once.
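If you'd rather script that trial week than babysit a terminal, Aider also exposes a Python scripting interface. A sketch, assuming a default local Ollama install with qwen2.5-coder:32b already pulled (the filename and prompt are placeholders):

```python
# Drive Aider from Python against a local Qwen coder via Ollama.
# The ollama_chat/ model prefix and scripting interface follow Aider's
# documented usage; adjust the model tag to what you've pulled.
import os
from aider.coders import Coder
from aider.models import Model

os.environ["OLLAMA_API_BASE"] = "http://127.0.0.1:11434"

model = Model("ollama_chat/qwen2.5-coder:32b")
coder = Coder.create(main_model=model, fnames=["app.py"])  # placeholder file
coder.run("add input validation to the config-parsing function")
```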
Sanity check first: before any panic-migration, run /cost-vs-cloud with your actual monthly volume. The break-even on a $700 used 3090 vs Claude Sonnet 4.5 typically lands at 5-10M tokens/month of API usage. Below that, the migration overhead isn't worth it.
Explore the numbers for your specific stack
Where we got the numbers
Anthropic --print mode → credits change: r/ClaudeAI May 2026 threads (533+ upvotes, 463+ comments). General SaaS-pricing-unit-economics observation: industry analysis since 2020. Local-vs-cloud crossover math: /cost-vs-cloud calculator + /compounder.
Also see
- Aider, Cline, Continue, Tabby, Twinny: all run fully against local Ollama / vLLM.
- Decision rule by agent style + model pairing.
- The TCO compounder: drag a daily-volume slider, watch the crossover point.
- Full rig recipe: GPU + runtime + model + install script for replacing cloud coding.
Found this via a forum search? Bookmark the URL — we update these pages as new data lands. Have a question that should live here? Open a GitHub issue.