Local AI hardware under $1,000

Honest framing for sub-$1,000 builds: used RTX 3060 12 GB, RTX 4060 Ti 16 GB, used RTX 3090, M2 Mac mini 16 GB. What each runs at what tok/s, what each can't run, and which one matches your real workload.

By Fredoline Eruo · Last reviewed 2026-05-08 · ~1,300 words

Answer first

Four genuine sub-$1,000 paths. A used RTX 3060 12 GB (~$200-280) added to a PC you already own gets you a 12 GB tier that runs 14B Q4 models comfortably. A new or open-box RTX 4060 Ti 16 GB (~$380-450) gives you 16 GB and the same model class with more headroom for context. A used RTX 3090 24 GB (~$700-900) is the highest-leverage pick at this budget — it's the floor for running 32-70B-class models. An M2 Mac mini with 16 GB unified memory (~$600-800 used or refurbished) is the all-in-one option that also doubles as a desktop machine.

The right one depends on what you actually want to run. Honest matching is in the “Picking by workload” section below. If you want the fast path, hit /choose-my-gpu and let the recommender narrow it for your situation.

The four candidates and what they cost in 2026

Prices below are honest ranges from the used market (eBay, Craigslist, Marketplace) and current retail in May 2026. Used prices vary $50-150 either way depending on condition, region, and how aggressively you negotiate.

  • Used RTX 3060 12 GB: $200-280 used, $300-340 new. 12 GB GDDR6, 360 GB/s memory bandwidth, ~170W TDP. Dual-slot card, fits most cases.
  • RTX 4060 Ti 16 GB: $380-450 new. 16 GB GDDR6, 288 GB/s memory bandwidth (lower than the 3060 because of a narrower bus), ~165W TDP. The bandwidth is a real downgrade vs the 3060 12 GB; the extra 4 GB matters more for capacity than speed. Why bandwidth sets decode speed is sketched just below this list.
  • Used RTX 3090 24 GB: $700-900 used. 24 GB GDDR6X, 936 GB/s memory bandwidth, ~350W TDP. Three-slot card, large, hot, loud. Often pulled from mining; verify the seller.
  • M2 Mac mini 16 GB unified: $600-800 refurbished or used, $799 new from Apple. ~100 GB/s memory bandwidth (much lower than the GPUs above) but the unified-memory architecture means the “VRAM” is the system memory.

Add ~$50-100 for a power supply upgrade if your existing PC can't feed a 3090. Add nothing for the Mac mini — it draws under 100W flat out.
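
A quick way to see why those bandwidth numbers matter: dense-model decode is roughly memory-bandwidth-bound, because every generated token has to read the whole quantized weight file once. The sketch below is a back-of-the-envelope ceiling, not a benchmark; the ~4.9 GB file size for an 8B Q4_K_M GGUF is an assumed round number, and real runs land below the bound.

```python
# Back-of-the-envelope decode ceiling: tok/s <= memory bandwidth / bytes of
# weights read per token (roughly the quantized model file size for a dense model).
def decode_ceiling_tok_s(bandwidth_gb_s: float, model_file_gb: float) -> float:
    return bandwidth_gb_s / model_file_gb

# Assumed inputs: the 3060's 360 GB/s and a ~4.9 GB Llama 3.1 8B Q4_K_M file.
print(f"~{decode_ceiling_tok_s(360, 4.9):.0f} tok/s ceiling")  # ~73, so the observed 50-80 range is plausible
```

The same arithmetic is why the 4060 Ti's 288 GB/s shows up in its decode numbers despite being the newer card.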

What each runs and at what tok/s

Honest ranges from operator reports. Tok/s varies 20-30% by quantization, context length, and runtime; numbers below are for Q4_K_M GGUF on llama.cpp/Ollama unless noted. A sketch for measuring your own numbers follows the list.

  • RTX 3060 12 GB: Llama 3.1 8B at 50-80 tok/s. Qwen 2.5 14B at 25-45 tok/s. Llama 3.3 70B does not fit. 16K-32K context comfortably on the 8B; 8K-16K on the 14B.
  • RTX 4060 Ti 16 GB: Llama 3.1 8B at 45-70 tok/s (the bandwidth disadvantage shows). Qwen 2.5 14B at 25-40 tok/s. The 4 GB extra over the 3060 12 GB lets you run longer contexts (32K+ on 14B) and try borderline 24-32B AWQ quantizations that don't fit on 12 GB.
  • RTX 3090 24 GB: Qwen 2.5 14B at 60-90 tok/s. Qwen 2.5 32B AWQ at 25-45 tok/s. Llama 3.3 70B Q4 at 12-22 tok/s with tight context. This is the tier where you stop apologizing for the model.
  • M2 Mac mini 16 GB: Llama 3.1 8B at 18-30 tok/s. Qwen 2.5 14B at 8-15 tok/s (memory bandwidth-limited). The unified-memory advantage shows up at the floor: an 8B model fits with room to spare, where a 16 GB Windows machine without a discrete GPU has to share that same memory with the OS.

Confirm any specific model+hardware combo at /will-it-run/custom; the full hardware-tier comparison is at /compare/hardware-tiers.

What each can't run

The honest negative space, because aspirational buys are how people end up unhappy.

  • None of these run Llama 3.3 70B Q4 comfortably. The 3090 fits it but with very tight context (4K-8K) and the experience is on the slow side of usable. A 24 GB card runs 70B; a 24 GB card running 70B at 32K context is a different conversation.
  • None of these run vLLM with PagedAttention serving multiple concurrent users at production speed. If you want multi-user serving for a homelab API, the 3090 is the floor and even there you're limited.
  • None of these run image generation at the speed of a 4090 or 5090. SDXL on a 3060 takes 8-15 seconds per image; on a 3090 it's 2-5 seconds; on a 4090 under 1.5 seconds. If image generation is the primary workload, this budget tier is genuinely cramped.
  • None of these are good for fine-tuning anything past a small LoRA. Full-parameter fine-tuning of even a 7B needs more VRAM than any single card here, even with optimizer-state offload tricks; the per-parameter arithmetic is sketched after this list. A used 3090 handles small QLoRA jobs; the others don't, really.
  • None of these handle 128K-context loading on heavy models. The KV cache for long context grows fast; 128K context on a 32B model can need more memory than the model itself. Stay under 32K context unless you've actually run the numbers; the KV-cache arithmetic is sketched below.
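
The KV-cache growth in that last point is easy to sanity-check. A minimal sketch, assuming an fp16 cache and a Qwen 2.5 32B-style config (64 layers, 8 KV heads under GQA, head dim 128); runtimes that quantize the KV cache cut these numbers roughly in half, but the shape of the curve is the point.

```python
# KV cache size: 2 tensors (K and V) per layer, each n_kv_heads * head_dim
# values per token, stored at bytes_per_value.
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_value: int = 2) -> float:
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_len / 2**30

# Qwen 2.5 32B-style config, fp16 cache.
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_gib(64, 8, 128, ctx):5.1f} GiB")
# ~2, ~8, and ~32 GiB: at 128K the cache alone outweighs the ~19 GB of Q4 weights.
```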

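The fine-tuning point two bullets up is the same kind of arithmetic. A rough sketch, assuming standard mixed-precision Adam (fp16 weights and gradients, fp32 master weights plus two optimizer moments) and ignoring activations entirely:

```python
# Rough full-parameter fine-tune memory for a dense 7B model with Adam:
#   fp16 weights (2 B) + fp16 grads (2 B) + fp32 master weights (4 B)
#   + Adam first moment (4 B) + Adam second moment (4 B) = 16 bytes/param.
params = 7e9
bytes_per_param = 2 + 2 + 4 + 4 + 4
print(f"~{params * bytes_per_param / 2**30:.0f} GiB before activations")  # ~104 GiB
```

QLoRA fits on a 3090 because the base weights stay frozen in 4-bit and only the small adapter matrices carry gradients and optimizer states.
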
Picking by workload

Concrete matching of card to use case.

You want a daily-driver chat replacement for ChatGPT Plus. RTX 3060 12 GB used. Cheapest entry to the 14B class. Pays back inside a year if you're a regular ChatGPT user.

You want longer context (résumé corpora, long-document RAG, multi-file coding) on a budget. RTX 4060 Ti 16 GB. The extra 4 GB matters more than the bandwidth dip. Honest tradeoff but the right one for context-heavy work.

You want to run 32-70B-class models without compromise. Used RTX 3090 24 GB. The single highest-leverage pick at this budget. Ask whether the card was used for mining; check power-on hours if possible; expect to need a 750-850W PSU.

You want a quiet, integrated setup that doubles as a desktop machine. M2 Mac mini 16 GB. Slower than the GPUs at this budget, but the all-in-one factor (no separate desktop, no fan noise, low power draw) is real. Step up to 24 GB or 32 GB unified if you can stretch the budget.

You want to run a coding agent on a local model. The 3090 is the strongest pick because it fits Qwen 2.5 Coder 32B, which is meaningfully better at code than the 14B class. The 4060 Ti 16 GB is the next-best pick. Below 16 GB the experience works but you're at the 14B coder ceiling.

Common mistakes at this price point

Operator-grade honesty about budget-tier purchases that go wrong.

  • Buying an 8 GB GPU and discovering it's the wrong tier. The 3060 8 GB and 4060 8 GB look like budget options but they're too small for the 14B class that's actually useful. Spend $80-100 more and get the 12 GB or 16 GB version.
  • Buying a 3090 without checking PSU and case clearance. The card is large, hot, and power-hungry. A 550W PSU is not enough; 750W is the floor for a clean build, and the rough power budget after this list shows why. Three-slot case clearance is required.
  • Buying a Mac mini with 8 GB instead of 16 GB. The base Mac mini is too small to run anything past a 3B model with system overhead. The 16 GB or 24 GB version is what makes the unified-memory math work.
  • Stretching to a used 4090 at $1,100-1,300 when a used 3090 runs the same model class. A 4090 is not in this budget; a 3090 is. If your budget can stretch past $1,000, the buy decision is in /guides/choosing-a-gpu-for-local-ai-2026, not here.
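
On the PSU point, a rough power budget shows where the 750W floor comes from. All wattages below are assumed typical values, not measurements of any particular build; the 3090's transient spikes are the reason for the generous margin.

```python
import math

# Rough PSU sizing for a used-3090 build, using assumed typical wattages.
gpu_w = 350       # RTX 3090 board power; transient spikes run well above this
cpu_w = 125       # mid-range desktop CPU under load
rest_w = 75       # motherboard, RAM, drives, fans
load_w = gpu_w + cpu_w + rest_w               # ~550 W sustained
min_psu = math.ceil(load_w * 1.35 / 50) * 50  # ~35% margin, rounded to a standard size
print(f"sustained ~{load_w} W -> buy at least a {min_psu} W unit")  # 750 W
```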

The full ladder from $0 to $4,000 is in /guides/best-hardware-for-running-local-ai-models; the buyer-engine recommender is at /choose-my-gpu; the cost-payback math is in /guides/does-running-ai-locally-save-money.

Next recommended step

Answer four questions about your workload at /choose-my-gpu and get a per-card rationale.