How much does local AI actually cost
An honest cost breakdown of running AI on your own hardware. Hardware tiers from $0 to $4000+, electricity math, maintenance time, and the comparison to ChatGPT Plus. With ranges, not point values, and explicit conditions for when local AI does and does not save money.
The internet's standard answer to "how much does local AI cost" is either a marketing-flavored "$0 — it's free!" or a doomer "you need a $5,000 workstation to do anything useful." Both are wrong. The real answer is a small grid of ranges that depend on what you're trying to do, how often you do it, and which costs you remember to count. This page is the grid, with the working shown.
We are not making ROI claims here. Whether local AI saves you money depends on usage that varies by an order of magnitude across people who claim to need the same thing. We will list conditions, not verdicts.
The four cost categories
Any honest cost analysis has to count four things, not one. Most of the comparisons you'll see online count one and call it a day, which is why the answers come out wrong.
- Hardware upfront. The card or the Mac, plus PSU, case, RAM, NVMe, sometimes a UPS. Treat it as amortized over 3-4 years for desktop GPU builds and 4-6 years for Apple Silicon (Apple machines depreciate more slowly because the OS stays supported longer).
- Electricity. The card under load, plus idle draw when the box is on, times your local kWh price. This is the cost most "local AI saves money" claims forget to compute.
- Your time. Setup, troubleshooting, model downloads, driver upgrades, the inevitable Friday-night Docker puzzle. This is the cost most "ChatGPT is just easier" claims quietly assume is zero.
- Opportunity cost. What you would have done with the hardware budget and the time. Real if your alternative is a subscription that meets all your actual needs; less real if your alternative is "more cloud bills for the same tasks."
Skip any of the four and the comparison is fiction. Add all four and the answer is usually "depends on your usage tier" — which is why the next section is a tier table, not a single number.
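If it helps to see the skeleton of that arithmetic, here is a minimal sketch in Python. Every input is a placeholder for your own numbers, not a recommendation; the specific wattages and prices are explained in the sections below, and the fourth category, what you would have spent instead, enters through the break-even comparison further down.

```python
# Rough all-in monthly figure covering hardware, electricity, and your time.
# All values are placeholders; plug in your own.

def monthly_cost(hardware_usd, amortize_years,
                 load_watts, hours_per_day, idle_watts, kwh_price,
                 maintenance_hours, hourly_rate):
    """Hardware amortization + electricity (load and idle) + time, per month."""
    hardware = hardware_usd / (amortize_years * 12)
    load_kwh = load_watts * hours_per_day * 30 / 1000
    idle_kwh = idle_watts * (24 - hours_per_day) * 30 / 1000
    electricity = (load_kwh + idle_kwh) * kwh_price
    your_time = maintenance_hours * hourly_rate
    return hardware + electricity + your_time

# Example: a $1,500 used-4090 build amortized over 4 years, 4 h/day of use at
# $0.15/kWh, left on 24/7 (~90 W idle), 2 h/month of maintenance valued at $50/h.
print(round(monthly_cost(1500, 4, 350, 4, 90, 0.15, 2, 50), 2))  # ~145.65
```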
Hardware tiers and what they actually run
Five realistic starting points. Capabilities below are the comfortable envelope, not the absolute ceiling — running 70B Q4 on an 8 GB card is technically possible with offload at 1-2 tok/s, but you'll hate it.
$0 — CPU-only on a machine you already own
16 GB system RAM, any reasonably modern CPU, no discrete GPU. Runs 7B-class models in Q4 quants at 4-12 tok/s through Ollama or LM Studio. Real workloads it handles: chat, light coding assist, single-document summarization, cover-letter drafts (see /workflows/private-job-search-assistant). Workloads it doesn't: anything where you need 13B+ for quality, anything multi-stream, image generation, real-time voice. Total marginal cost: the electricity, which for CPU-only inference works out to a cent or two per hour of active use at typical rates.
~$300 — used RTX 3060 12 GB
The cost-per-VRAM-GB sweet spot on the new-and-used market. 12 GB of VRAM is the floor for serious 8B-14B work. Pair with a $50-150 used PSU/board if your existing system can't accept it. Total entry cost realistically lands in the $300-500 range once you account for a case fan and a reseat. Runs Llama 3.1 8B, Qwen 2.5 14B Q4_K_M, Whisper large-v3-turbo, and most embedding models comfortably. Will not run 32B-class for chat at usable speeds.
~$1,500 — used RTX 4090 24 GB
The "I want everything" tier on a single card. 24 GB of VRAM runs 32B-class at Q4 (Qwen 2.5 32B Coder, Mixtral 8x7B), 14B at full precision, and image generation at comfortable resolutions. New 4090s sit higher when available; used pricing in 2026 has settled into the $1,300-1,700 band. Add ~$200-400 for a 1000 W gold PSU if you don't have one — 4090 transients are real, see /systems/local-ai-maintenance. See /guides/choosing-a-gpu-for-local-ai-2026 for the full new-vs-used analysis.
~$2,500 — Mac Studio M4 Max 64 GB unified memory
The quiet path. Unified memory means no VRAM/RAM split, so you can load 70B-class models in 4-bit that wouldn't fit a 4090. Decode speeds are 30-50% slower than a 4090 on the same model class, but the box runs at ~120 W under inference instead of ~450 W and is silent. Tradeoff: software ecosystem narrower than CUDA; image-generation tooling lags by months. See /hardware/apple-m3-ultra for the higher tier.
$4,000+ — dual 3090 NVLink or single 5090
70B-and-above territory. Dual-3090 NVLink runs Llama 3.1 70B at Q4 at usable single-stream speeds. Single 5090 has 32 GB and is faster per-card but caps at ~32B-class without offload. Power draw crosses 600 W under sustained load; budget for a 1200 W PSU and consider a UPS. This tier is for people who genuinely need 70B locally — most of those people are fine-tuning, running an agent on a real codebase, or serving several users.
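A rough way to sanity-check which tier a given model needs, before you reach for /will-it-run/custom: estimate the quantized weight footprint and add headroom for KV cache and runtime buffers. This is a rule of thumb we are assuming here, not a guarantee; long contexts and concurrency push the real number up fast.

```python
def estimate_vram_gb(params_billion, bits_per_weight=4.5, overhead=1.2):
    """Very rough VRAM estimate for a quantized (~Q4) model: weight bytes plus
    ~20% headroom for KV cache and runtime buffers at modest context lengths."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * overhead

for size in (8, 14, 32, 70):
    print(f"{size}B at ~Q4: ~{estimate_vram_gb(size):.0f} GB")
# roughly 5, 9, 22, and 47 GB: consistent with the 12 GB, 24 GB, and
# dual-24 GB tiers above
```

If the estimate lands within a gigabyte or two of your card's VRAM, assume it will not fit comfortably once the context grows.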
Electricity math
Show the working, refuse to invent point estimates. The formula is:
monthly_cost = avg_watts_under_load × hours_per_day × 30 × kWh_price / 1000
Three representative scenarios:
- RTX 3060 12 GB at 170 W average, 2 hours/day inference, $0.10-0.20/kWh: roughly $1.00 - $2.00 per month under load. Add ~$1-2/month idle if you leave the PC on. Total: a few dollars a month, less than the dollar value of one ChatGPT Plus week.
- RTX 4090 at 350 W average, 4 hours/day inference, $0.15/kWh: roughly $6.30 per month under load. At $0.30/kWh (parts of California, EU): roughly $12.60 per month. At 8 hours/day instead of 4: double both numbers.
- M4 Max at 70 W average, 4 hours/day inference, $0.15/kWh: roughly $1.30 per month. The unified-memory architecture is genuinely cheap to run; the cost is upfront, not ongoing.
Idle draw matters if you leave the machine on 24/7. A typical gaming-tower-with-4090 idles around 80-110 W; that's another $9-12/month at $0.15/kWh just for being plugged in and powered. A Mac Studio idles below 10 W. The honest comparison is "system idle + load on top," not "load watts × usage."
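The same formula as code, with the idle draw added on so the "system idle + load on top" framing is explicit; the wattages are the example figures above, not measurements.

```python
def monthly_electricity(load_watts, hours_per_day, idle_watts, kwh_price,
                        always_on=True):
    """Monthly electricity: load draw during use, idle draw the rest of the day."""
    load_kwh = load_watts * hours_per_day * 30 / 1000
    idle_hours = (24 - hours_per_day) if always_on else 0
    idle_kwh = idle_watts * idle_hours * 30 / 1000
    return (load_kwh + idle_kwh) * kwh_price

# 4090 tower: 350 W for 4 h/day, ~90 W idle the other 20 h, $0.15/kWh
print(round(monthly_electricity(350, 4, 90, 0.15), 2))  # ~14.4
# Mac Studio: 70 W for 4 h/day, under 10 W idle, $0.15/kWh
print(round(monthly_electricity(70, 4, 8, 0.15), 2))    # ~2.0
```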
Source for U.S. retail kWh: the EIA monthly tables. Your number is the one on your bill — not the national average. We will not invent a single "local AI costs $X/month" number because the variance is 15× across realistic households.
Maintenance time
The cost everyone forgets. Setting up local AI well takes time; keeping it running takes more. Honest ranges:
- Casual user, single-machine, model-of-the-month updates: 1-2 hours/month. Mostly pulling new model weights, the occasional driver bump.
- Active user, multiple workflows, custom configs: 3-4 hours/month. Workflow tweaks, embedding re-ingest, the occasional Docker mystery, light reading on a new release.
- Production-adjacent (small-team or always-on): 4-6 hours/month, sometimes more in the months around a major driver or runtime release. The full failure-mode catalog is in /systems/local-ai-maintenance.
Translate to dollars at your hourly rate to make the comparison fair. At $40/hour, casual maintenance is $40-80/month of opportunity cost. At $150/hour, production-adjacent maintenance is $600-900/month — a large enough number to dominate the comparison.
Comparison to ChatGPT Plus
ChatGPT Plus is $20/month, $240/year. The structural comparison question is: at what cumulative usage does $1,500 of hardware (the "comfortable everything" tier) recover its cost vs the subscription?
Honest answer: often never, for casual users. If your ChatGPT Plus subscription is meeting your needs and you are using it under an hour a day, the subscription is dramatically cheaper than the hardware over any reasonable horizon. $1,500 of hardware is over 6 years of ChatGPT Plus.
Honest answer for heavy users: break-even arrives quickly. If you are hitting Plus rate limits, paying for ChatGPT Team or API tokens on top, running multiple workflows, or doing anything that you'd otherwise hand to the OpenAI API at meter rates, the picture flips. A team of 4 sharing one $1,500 box vs four $20 subscriptions is roughly a 19-month payback before you even count API spend.
The cleanest mental model: subscription cost scales with users and intensity; local hardware cost is fixed. Below some usage threshold, subscription wins. Above it, local wins. The threshold for an individual is somewhere in the 2-4 hours-of-active-use-per-day band, give or take, depending on hardware tier and electricity cost.
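That threshold is easy to compute for your own situation. A sketch, assuming the only alternative cost is a flat recurring cloud bill and your own estimate of local running costs:

```python
def breakeven_months(hardware_usd, cloud_monthly, local_monthly_running=0.0):
    """Months until a fixed hardware cost is recovered against recurring cloud
    spend. Returns None when local running costs meet or exceed the cloud bill."""
    saved_per_month = cloud_monthly - local_monthly_running
    if saved_per_month <= 0:
        return None  # never breaks even at this usage level
    return hardware_usd / saved_per_month

# One user on ChatGPT Plus vs a $1,500 box costing ~$15/month to run:
print(breakeven_months(1500, 20, 15))   # 300 months: effectively never
# Team of four sharing the box, ignoring running costs (the figure quoted above):
print(breakeven_months(1500, 80))       # 18.75 months
# Same team once ~$15/month of electricity is counted:
print(breakeven_months(1500, 80, 15))   # ~23 months
```

The maintenance hours from the previous section belong in the running-cost term too; at a high hourly rate they can push break-even out of reach on their own.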
When local AI does NOT save money
Be willing to admit this, because it is true for a sizable fraction of people who ask the question.
- You only chat 30 minutes a day. A $20 subscription is cheaper than $300 of hardware plus electricity plus your time even over four years.
- You can't troubleshoot Docker. The maintenance time crosses into "this is now my hobby," which is fine if you want a hobby, expensive if you don't. The honest version of the tradeoff is: are you willing to spend the time, or are you trying to save money?
- You don't need privacy. If your prompts are birthday-card drafts and code-review notes for public OSS, the privacy benefit isn't a benefit, and the subscription is just easier.
- You need frontier-model quality. Local 8B-14B-32B models are not GPT-5 / Claude Opus class. If your work depends on frontier reasoning, the local stack supplements rather than replaces.
- Your room can't absorb the extra heat. If you live in a hot apartment with no AC, the additional 350 W of an RTX 4090 makes the room less comfortable in July, which is a real cost even if it isn't on your power bill.
When local AI DOES save money
The conditions where the math actually works out:
- 200K+ tokens a day across multiple use cases. Coding assistant + writing draft + RAG over your notes + the occasional image — at that intensity, the API meter or the multiple subscriptions add up faster than the hardware does.
- Privacy-required workflows. If you can't legally or sensibly send the data to a cloud provider — patient records, legal discovery, internal source code at a regulated firm, multi-year career data (see the job-search workflow above) — the comparison isn't "local vs subscription," it's "local vs not using AI at all." Local wins against zero.
- Small team sharing one box. A single 4090 workstation with vLLM serves 5-15 concurrent chats; the per-seat cost drops sharply. See /will-it-run/custom to size for concurrency.
- Always-on workloads. If you have a stream of inference that runs unattended (RAG, classification, embedding ingestion of a corpus), API rates compound and local hardware flattens.
Value beyond cost
Some of the reasons people run AI locally don't reduce to dollars. Worth naming so you can decide whether they apply to you.
- Privacy. Your data does not leave your machine. For some workloads (career data, medical, legal) this is the only real reason; the cost question is secondary.
- Offline capability. Flights, internet outages, travel to places with bad bandwidth. Local AI works on a plane. Cloud AI does not.
- Reliability. No outages, no rate limits, no service deprecations forcing model switches mid-project. The model you have today is the model you have tomorrow.
- Learning. Running the inference engine yourself is the fastest way to actually understand how these models work. Hard to put a dollar value on, real value to people who want it.
- No silent training. Cloud providers' terms of service vary; some train on your inputs, some don't, some used to and don't anymore, some say they don't but your prompts still get logged. Local sidesteps the question.
Bottom line
Don't pick local AI to save money unless you've done the math honestly — all four cost categories, your real usage, your real electricity rate, your real hourly value of time. Most people who run the math discover one of three things: the cost case is real for them; the cost case is irrelevant because the subscription is fine; or the cost is real but secondary to privacy, offline capability, or learning.
If you want to size hardware to your actual workload before spending, /will-it-run/custom takes a model + context + concurrency target and tells you the minimum viable build. If you want to learn what the long-term ownership actually feels like, read /systems/local-ai-maintenance. If you hit setup pain along the way, the /errors catalog covers most common failures with concrete fixes.
We will not give you a single dollar figure. We will give you the ranges, the working, and the conditions. Your number is yours to compute.