NVIDIA RTX A6000 (Ampere) for local AI

Q: What models can NVIDIA RTX A6000 (Ampere) run?

With 48 GB of VRAM, the NVIDIA RTX A6000 (Ampere) runs 70B models in 4-bit quantization, plus everything smaller. See the model list below for tested combinations.
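The "what fits" question is mostly arithmetic: weight footprint is roughly parameter count times bits per weight. A minimal sketch, assuming illustrative bits-per-weight values (real quantization formats add per-block overhead, so a nominal 4-bit file is often closer to 4.5 bits) and ignoring KV cache and runtime overhead, which also need room:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


VRAM_GB = 48  # RTX A6000

# The bits-per-weight figures below are assumptions for illustration,
# not measured file sizes for any specific model or quant format.
CANDIDATES = [
    ("70B at ~4.5 bits", 70, 4.5),
    ("70B at 8 bits", 70, 8),
    ("32B at 16 bits (FP16)", 32, 16),
    ("32B at 8 bits", 32, 8),
]

for label, params_b, bits in CANDIDATES:
    gb = weight_gb(params_b, bits)
    verdict = "fits" if gb < VRAM_GB else "does not fit"
    print(f"{label}: {gb:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Anything that "fits" here still needs headroom for context, so treat a result close to 48 GB as a no.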

Q: Does NVIDIA RTX A6000 (Ampere) support CUDA?

Yes. The NVIDIA RTX A6000 (Ampere) has full CUDA support, the most mature local-AI backend. llama.cpp, Ollama, vLLM, and ExLlamaV2 all run natively.
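To confirm what a given machine actually exposes, one option is to query the driver with `nvidia-smi` and parse the result. The sketch below assumes a driver recent enough to support the `compute_cap` query field; the sample line is illustrative only, not captured from real hardware, and the helper names are hypothetical. An Ampere workstation card is expected to report compute capability 8.6.

```python
import subprocess

QUERY_FIELDS = "name,memory.total,compute_cap"


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.strip().splitlines():
        name, mem_mib, compute_cap = [field.strip() for field in line.split(",")]
        gpus.append(
            {"name": name, "vram_mib": int(mem_mib), "compute_cap": compute_cap}
        )
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi and parse its output; needs an installed NVIDIA driver."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)


# Illustrative sample line (an assumption, not real captured output).
SAMPLE = "NVIDIA RTX A6000, 49140, 8.6\n"

for gpu in parse_gpu_csv(SAMPLE):
    print(gpu)
```

On a machine with the card installed, calling `query_gpus()` instead of parsing `SAMPLE` returns the live values.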

Q: How much does NVIDIA RTX A6000 (Ampere) cost?

Current street price for NVIDIA RTX A6000 (Ampere) is around $3,500 (MSRP $4,650). Prices vary by region and supply.

What it does well

The RTX A6000 (Ampere) is the cheapest path to 48 GB of CUDA-compatible workstation VRAM in 2026. Used pricing has settled around $3,500–$4,500, roughly half the cost of a new RTX 6000 Ada at the same memory tier. 768 GB/s of bandwidth on 48 GB of GDDR6 ECC is enough for comfortable 70B Q4 inference at workstation speeds (25–35 tok/s decode), 32B Q8 with long context, or several smaller models loaded at once. (32B at FP16 needs about 64 GB for weights alone, so it does not fit on a single card.) The full CUDA stack works: vLLM, SGLang, and TensorRT-LLM all support sm_86 Ampere, so you're not stuck on consumer paths. An NVLink bridge pairs two cards for 96 GB combined at $7,000–$9,000 used, which is the cheapest CUDA path to 96 GB if you can hunt down two A6000s and a bridge. The 300 W TDP is workstation-friendly, with no datacenter cooling required. ECC, a 3-year warranty (when bought new), and Studio driver lineage make this a serious tool, not a consumer card cosplaying as one.

Where it breaks

Two architecture generations behind in 2026. Ampere (sm_86) launched in 2020. RTX 6000 Ada (sm_89) and RTX PRO 6000 Blackwell (sm_120) both have meaningfully better tensor compute, FP8 support, and architecture-specific optimizations. New CUDA features land on Hopper/Blackwell first; A6000 gets second-class support.
Bandwidth ceiling. 768 GB/s is comfortable but lower than the RTX 6000 Ada's 960 GB/s and well below the PRO 6000 Blackwell's 1.79 TB/s. For memory-bound decode on 70B Q4, expect 25–35 tok/s vs RTX 6000 Ada's 30–45 vs PRO 6000 Blackwell's 50–70.
No native FP8. Ampere has FP16/BF16 but no native FP8. Inference frameworks that exploit FP8 (TRT-LLM, ExLlamaV2 newer modes, vLLM's FP8 paths) don't get the speedup here.
Used market only at the value price point. Buying one new at $4,650 retail is bad value; buy used, or step up to RTX 6000 Ada / PRO 6000 Blackwell. eBay, r/hardwareswap, and used Dell Precision pulls are the real sources.
End-of-life risk on driver support. Ampere is still supported in 2026 but it's the oldest tier NVIDIA actively prioritizes. A 5-year horizon for new feature support is doubtful.
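The bandwidth ceiling above can be estimated with a back-of-envelope bound: in memory-bound, single-stream decode, each generated token reads the full set of weights once, so tokens per second cannot exceed bandwidth divided by weight size. A minimal sketch, assuming a hypothetical 40 GB weight footprint for 70B Q4 (an assumption, not a measured file size) and the bandwidth figures quoted in this section:

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode speed when weight reads dominate."""
    return bandwidth_gb_s / weights_gb


# Assumed weight footprint for a 70B model at roughly 4-bit quantization.
WEIGHTS_GB = 40.0

# Memory bandwidth in GB/s, as quoted in the text above.
CARDS = {
    "RTX A6000": 768.0,
    "RTX 6000 Ada": 960.0,
    "RTX PRO 6000 Blackwell": 1790.0,
}

baseline = CARDS["RTX A6000"]
for name, bandwidth in CARDS.items():
    ceiling = decode_ceiling_tok_s(bandwidth, WEIGHTS_GB)
    ratio = bandwidth / baseline
    print(f"{name}: at most {ceiling:.1f} tok/s ({ratio:.2f}x the A6000)")
```

With that assumed footprint the single-stream ceiling for the A6000 lands near 19 tok/s, so the absolute figures quoted above would need a smaller footprint, batching, or speculative decoding to hold. The ratios between cards are the more robust takeaway, and a benchmark on your own model and quant is the only reliable number.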

Ideal model range

Sweet spot: 70B Q4 with 32K context, single-card workstation. 25–35 tok/s decode is comfortable for interactive work.
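Sweet spot: 32B Q8 (roughly 32–34 GB of weights) for long-document workflows, with the remaining VRAM going to KV cache. How much context fits depends on the model's KV size per token and on KV-cache quantization. 32B at FP16 needs about 64 GB for weights alone, so it does not fit on one card without offload.
Sweet spot (NVLink pair): 70B Q8 (about 70 GB of weights) across 2× A6000 NVLinked (96 GB combined) with TP = 2. Expect ~10–15% NVLink overhead vs theoretical, but it is functional. 70B at FP16 needs about 140 GB for weights alone and does not fit in 96 GB.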
Stretch: 70B Q8 with paged offload, or 13B QLoRA fine-tuning.
Comfortable: Anything an RTX 3090 does, but at 2× memory and ECC.
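Context length in the ranges above is limited by KV cache, which shares the same 48 GB as the weights. A minimal sketch of the usual sizing formula, assuming a Llama-style 70B shape with grouped-query attention (80 layers, 8 KV heads, head dimension 128); these shape values are assumptions, so check the config of the model you actually run:

```python
def kv_cache_gib(
    tokens: int,
    layers: int,
    kv_heads: int,
    head_dim: int,
    bytes_per_elem: int = 2,
) -> float:
    """KV cache size in GiB: one K and one V vector per layer per token."""
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return tokens * bytes_per_token / 2**30


CONTEXT_TOKENS = 32 * 1024  # 32K context

# Assumed model shape (Llama-style 70B with grouped-query attention).
SHAPE = dict(layers=80, kv_heads=8, head_dim=128)

fp16_cache = kv_cache_gib(CONTEXT_TOKENS, **SHAPE)                  # 2 bytes per element
q8_cache = kv_cache_gib(CONTEXT_TOKENS, **SHAPE, bytes_per_elem=1)  # 1 byte per element

print(f"32K context, FP16 KV cache: {fp16_cache:.1f} GiB")
print(f"32K context, 8-bit KV cache: {q8_cache:.1f} GiB")
```

Adding the result to the weight footprint shows how tight a given combination is; when the total approaches 48 GB, a quantized KV cache or a shorter context is the usual way out.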

Bad use cases

Buying new at retail. Don't pay $4,650 new. Either find used at $3,500–$4,500, or step up to RTX 6000 Ada for the meaningful architecture upgrade.
Hobbyists fitting in 24 GB. RTX 4090 at $1,800 (or used 3090 at $700–$1,000) wins for everything that fits 24 GB.
Production rack inference. L40S is the datacenter-tier 48 GB card. A6000 in a rack is a workstation card pretending to be a datacenter card.
Long-horizon investment. Architecture support has a sunset. Pick newer for anything you'll keep 4+ years.
Frontier-model training. Wrong tool. Rent on cloud or use proper datacenter SKUs.

Verdict

Buy this if you find a used RTX A6000 at $3,500–$4,500, you need 48 GB CUDA VRAM on one card without paying RTX 6000 Ada / PRO 6000 Blackwell prices, you're inference-focused (not training), and a 4-year operational horizon is enough. The A6000 is the canonical "I want 48 GB CUDA on a budget" used-market pick — and at the right price, it's a great deal.

Skip this if the only one you can find is at the $4,650 retail price (step up to RTX 6000 Ada for the architecture upgrade), your model fits 24 GB (RTX 4090 wins), you need native FP8 (Ada or Blackwell), you're deploying production racks (L40S wins), or you want long-horizon driver support (newer tier).

How it compares

vs RTX 6000 Ada (48 GB) → 6000 Ada wins on bandwidth (1.25×), tensor compute (2.4× FP16), FP8 support, and architecture pedigree. A6000 wins on price (~$2,000–$3,000 less). Pick A6000 if you find it at <$4,500 used; pick 6000 Ada for new builds. See /compare/rtx-a6000-vs-rtx-6000-ada.
vs RTX 3090 (24 GB) → 2× the VRAM on the same Ampere architecture, but about 18% less bandwidth (768 GB/s vs 936 GB/s on the 3090). Pick A6000 if you need 48 GB; pick 3090 for hobbyist 24 GB use cases, where a used 3090 at $700–$1,000 is dramatically better value for everything that fits 24 GB. See /compare/rtx-a6000-vs-rtx-3090.
vs Dual RTX 3090 homelab → Dual 3090 = 48 GB combined for $1,400–$2,000 used. A6000 = 48 GB on one card for $3,500–$4,500. Dual 3090 wins on $/VRAM by 2× or more; A6000 wins on power (300 W vs 700 W combined), single-card simplicity, and ECC. For homelabs, dual 3090 dominates. For workstations, A6000 is cleaner.
vs Mac Studio M3 Ultra (96–512 GB) → A Mac Studio with 96 GB of unified memory costs similar money and offers 2× the memory ceiling, but no CUDA. Pick A6000 for CUDA-required workflows; Mac Studio for unified-memory workflows where MLX/Metal works.
vs RTX 6000 Ada Generation → Similar names, different products. The Ada Generation card (no "A" in its name) is the successor to the Ampere A6000; we cover it as "RTX 6000 Ada". Don't confuse the original Ampere RTX A6000 with the newer RTX 6000 Ada.
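The $/VRAM comparisons above reduce to simple division. A minimal sketch using only the used-price ranges quoted in this section (prices move, so these inputs are assumptions to replace with current listings):

```python
def usd_per_gb(price_low: float, price_high: float, vram_gb: float) -> tuple[float, float]:
    """Return the (low, high) cost per GB of VRAM for a price range."""
    return price_low / vram_gb, price_high / vram_gb


# (low price, high price, total VRAM in GB), taken from the ranges quoted above.
OPTIONS = {
    "Used RTX A6000 (48 GB)": (3500, 4500, 48),
    "Dual used RTX 3090 (2 x 24 GB)": (1400, 2000, 48),
    "NVLink pair of used A6000 (96 GB)": (7000, 9000, 96),
}

for name, (low, high, vram_gb) in OPTIONS.items():
    low_per_gb, high_per_gb = usd_per_gb(low, high, vram_gb)
    print(f"{name}: ${low_per_gb:.0f} to ${high_per_gb:.0f} per GB")
```

Cost per GB deliberately ignores power draw, slot count, and ECC, which are the axes where the single A6000 is argued to win.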

Specs

VRAM	48 GB
Power draw (peak)	300 W
Released	2020
MSRP	$4,650
Backends	CUDA, Vulkan