NVIDIA RTX A6000 (Ampere)
Ampere-generation workstation card with 48 GB of VRAM. Common in AI labs; used pricing is reasonable for 48 GB at this point.
Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.
Sub-scores sum to 682 / 1000. Headline = 682 × 0.70 (Estimated-confidence discount) = 477, rounded from 477.4. This is an algorithmic performance-tier score, distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet).
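As a minimal sketch of the headline arithmetic above, the snippet below multiplies the sub-score total by the confidence discount and rounds to the nearest integer. The function name and the use of plain rounding are assumptions for illustration; the page states only the inputs and the result.

```python
def headline_score(sub_score_total: int, confidence_factor: float) -> int:
    """Apply the confidence discount and round to the nearest whole point.

    Rounding to nearest is an assumption; the page only shows 682 x 0.70 = 477.
    """
    return round(sub_score_total * confidence_factor)


# Inputs taken from the scoring line above.
print(headline_score(682, 0.70))
```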
Extrapolated from 768 GB/s bandwidth — 92.2 tok/s estimated. No measured benchmarks yet.
Plain-English: Runs 70B with care — snappy enough for a coding agent; vision models supported.
Verdicts extrapolated from catalog VRAM + bandwidth + ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.
What it does well
The RTX A6000 (Ampere) is the cheapest path to 48 GB of CUDA-compatible workstation VRAM in 2026. Used pricing has settled around $3,500–$4,500, roughly half the cost of a new RTX 6000 Ada at the same memory tier. 768 GB/s of bandwidth on 48 GB of GDDR6 ECC is enough for comfortable 70B Q4 inference at workstation speeds (an estimated 25–35 tok/s decode), 32B Q8 with long context, or running multiple smaller models simultaneously. Note that 32B at FP16 needs about 64 GB for weights alone, so it does not fit on one card. The full CUDA stack works (vLLM, SGLang, and TensorRT-LLM all support sm_86 Ampere), so you're not stuck on consumer paths. An NVLink bridge pairs two cards for 96 GB combined at $7,000–$9,000 used, which is the cheapest CUDA path to 96 GB if you can hunt down two A6000s and the bridge. The 300 W TDP is workstation-friendly, with no datacenter cooling required. ECC, a 3-year warranty (when bought new), and Studio driver lineage make this a serious tool, not a consumer card cosplaying as one.
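One rough way to sanity-check decode estimates like the one above is a bandwidth ceiling: at batch size 1, a dense model reads roughly all of its weights for each generated token, so tokens per second cannot exceed memory bandwidth divided by weight size. The sketch below assumes about 4.5 bits per weight for a Q4-style quantization (an assumption; real formats vary) and ignores KV-cache reads and kernel overhead. The function names are illustrative and not part of any tool mentioned on this page.

```python
# Bandwidth-ceiling sketch: assumes batch-1 decode of a dense model reads
# every weight once per generated token. KV-cache traffic is ignored.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/second when decode is purely memory-bound."""
    return bandwidth_gb_s / model_gb


# Bandwidth figures quoted on this page, in GB/s.
CARDS = {
    "RTX A6000 (Ampere)": 768,
    "RTX 6000 Ada": 960,
    "RTX PRO 6000 Blackwell": 1790,
}

model_gb = weights_gb(70, 4.5)  # assumed ~4.5 bits/weight for a Q4-style quant
for name, bandwidth in CARDS.items():
    ceiling = decode_ceiling_tok_s(bandwidth, model_gb)
    print(f"{name}: <= {ceiling:.1f} tok/s")
```

Under these assumptions the ceiling for 70B on the A6000 comes out near 19.5 tok/s, which is below the 25–35 tok/s range quoted here. Treat that range as an optimistic, unmeasured estimate until real benchmark runs are available.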
Where it breaks
- Two architecture generations behind in 2026. Ampere (sm_86) launched in 2020. RTX 6000 Ada (sm_89) and RTX PRO 6000 Blackwell (sm_120) both have meaningfully better tensor compute, FP8 support, and architecture-specific optimizations. New CUDA features land on Hopper/Blackwell first; A6000 gets second-class support.
- Bandwidth ceiling. 768 GB/s is comfortable but lower than the RTX 6000 Ada's 960 GB/s and well below the PRO 6000 Blackwell's 1.79 TB/s. For memory-bound decode on 70B Q4, our extrapolated (not measured) estimates are 25–35 tok/s, versus 30–45 for the RTX 6000 Ada and 50–70 for the PRO 6000 Blackwell.
- No native FP8. Ampere has FP16/BF16 but no native FP8. Inference frameworks that exploit FP8 (TRT-LLM, ExLlamaV2 newer modes, vLLM's FP8 paths) don't get the speedup here.
- Used market only at the value price point. Buying one new at $4,650 retail is bad value: buy used or step up to RTX 6000 Ada / PRO 6000 Blackwell. eBay, r/hardwareswap, and used Dell Precision pulls are the real sources.
- End-of-life risk on driver support. Ampere is still supported in 2026 but it's the oldest tier NVIDIA actively prioritizes. A 5-year horizon for new feature support is doubtful.
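The architecture and FP8 points in the list above can be expressed as a small lookup on CUDA compute capability. This is a sketch, not a vendor API: the table covers only the cards named on this page, and `has_native_fp8` encodes the rule that FP8 tensor-core support starts at sm_89 (Ada). Both names are hypothetical helpers.

```python
# Compute capabilities for the cards discussed on this page.
ARCHITECTURES = {
    (8, 6): "Ampere (RTX A6000, RTX 3090)",
    (8, 9): "Ada (RTX 6000 Ada)",
    (12, 0): "Blackwell (RTX PRO 6000)",
}


def has_native_fp8(capability: tuple[int, int]) -> bool:
    """True when the (major, minor) compute capability is sm_89 or newer."""
    return capability >= (8, 9)


for capability, name in ARCHITECTURES.items():
    major, minor = capability
    label = "native FP8" if has_native_fp8(capability) else "no native FP8"
    print(f"sm_{major}{minor} {name}: {label}")
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that can be passed straight to `has_native_fp8`.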
Ideal model range
- Sweet spot: 70B Q4 with 32K context, single-card workstation. 25–35 tok/s decode is comfortable for interactive work.
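- Sweet spot: 32B Q8 (about 32 GB of weights) with long context for long-document workflows. How far past 64K you can go depends on the model's KV-cache size per token and whether the cache is quantized. 32B at FP16 needs about 64 GB for weights alone, so it does not fit on a single 48 GB card.
- Sweet spot (NVLink pair): 70B Q8 across 2× A6000 NVLinked (96 GB combined) with TP = 2. 70B at FP16 needs about 140 GB for weights, which exceeds 96 GB. Expect ~10–15% NVLink overhead vs theoretical, but functional.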
- Stretch: 70B Q8 with paged offload, or 13B QLoRA fine-tuning.
- Comfortable: Anything an RTX 3090 does, but at 2× memory and ECC.
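Whether a model and context length from the list above fits comes down to weights plus KV cache plus some runtime overhead. The sketch below is a back-of-envelope check under stated assumptions: the layer counts, KV-head counts, and head dimension are illustrative shapes for 32B-class and 70B-class models with grouped-query attention, the 2 GB overhead is a guess, and the KV cache is stored at 16 bits. Substitute the real values from the model's config before relying on the result.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV-cache size in GB: one K and one V vector per layer per token."""
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return bytes_per_token * context_tokens / 1e9


def fits(vram_gb: float, model_gb: float, cache_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """True if weights + KV cache + assumed runtime overhead fit in VRAM."""
    return model_gb + cache_gb + overhead_gb <= vram_gb


# Assumed shapes (illustrative only): 32B-class = 64 layers, 70B-class = 80 layers,
# both with 8 KV heads of dimension 128, at a 32K-token context.
kv_32b = kv_cache_gb(layers=64, kv_heads=8, head_dim=128, context_tokens=32_768)
kv_70b = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, context_tokens=32_768)

print("32B FP16 on 48 GB:", fits(48, weights_gb(32, 16), kv_32b))
print("32B Q8 on 48 GB:  ", fits(48, weights_gb(32, 8), kv_32b))
print("70B FP16 on 96 GB:", fits(96, weights_gb(70, 16), kv_70b))
print("70B Q8 on 96 GB:  ", fits(96, weights_gb(70, 8), kv_70b))
```

The same functions show why long contexts get tight quickly: the KV cache grows linearly with context length, so doubling the context doubles `kv_cache_gb` while the weight footprint stays fixed.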
Bad use cases
- Buying new at retail. Don't pay $4,650 new. Either find used at $3,500–$4,500, or step up to RTX 6000 Ada for the meaningful architecture upgrade.
- Hobbyists fitting in 24 GB. RTX 4090 at $1,800 (or used 3090 at $700–$1,000) wins for everything that fits 24 GB.
- Production rack inference. L40S is the datacenter-tier 48 GB card. A6000 in a rack is a workstation card pretending to be a datacenter card.
- Long-horizon investment. Architecture support has a sunset. Pick newer for anything you'll keep 4+ years.
- Frontier-model training. Wrong tool. Rent on cloud or use proper datacenter SKUs.
Verdict
Buy this if you find a used RTX A6000 at $3,500–$4,500, you need 48 GB CUDA VRAM on one card without paying RTX 6000 Ada / PRO 6000 Blackwell prices, you're inference-focused (not training), and a 4-year operational horizon is enough. The A6000 is the canonical "I want 48 GB CUDA on a budget" used-market pick — and at the right price, it's a great deal.
Skip this if you find one at retail $4,650 (step up to RTX 6000 Ada for the architecture upgrade), your model fits 24 GB (RTX 4090 wins), you need FP8 native (Ada or Blackwell), you're deploying production racks (L40S wins), or you want long-horizon driver support (newer tier).
How it compares
- vs RTX 6000 Ada (48 GB) → 6000 Ada wins on bandwidth (1.25×), tensor compute (2.4× FP16), FP8 support, and architecture pedigree. A6000 wins on price (~$2,000–$3,000 less). Pick A6000 if you find it at <$4,500 used; pick 6000 Ada for new builds. See /compare/rtx-a6000-vs-rtx-6000-ada.
- vs RTX 3090 (24 GB) → 2× the VRAM, same Ampere architecture, but about 18% less bandwidth (768 GB/s vs 936 GB/s on the 3090). Pick A6000 if you need 48 GB; pick 3090 for hobbyist 24 GB use cases, where a used 3090 at $700–$1,000 is dramatically better value for everything that fits 24 GB. See /compare/rtx-a6000-vs-rtx-3090.
- vs Dual RTX 3090 homelab → Dual 3090 = 48 GB combined for $1,400–$2,000 used. A6000 = 48 GB on one card for $3,500–$4,500. Dual 3090 wins on $/VRAM by roughly 2×; A6000 wins on power (300 W vs 700 W combined), single-card simplicity, and ECC. For homelabs, dual 3090 dominates. For workstations, A6000 is cleaner.
- vs Mac Studio M3 Ultra (96–512 GB) → A Mac Studio with 96 GB of unified memory costs about the same for 2× the memory ceiling. No CUDA. Pick A6000 for CUDA-required workflows; Mac Studio for unified-memory workflows where MLX/Metal works.
- vs "RTX A6000 Ada" (RTX 6000 Ada Generation) → Similar names, different products. The RTX 6000 Ada Generation is the successor to the Ampere A6000; we cover it as "RTX 6000 Ada". NVIDIA dropped the "A" prefix for the Ada card, so there is no "A6000 Ada". Don't confuse the original Ampere RTX A6000 with the newer RTX 6000 Ada.
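The price and bandwidth comparisons in the list above reduce to simple ratios. The sketch below recomputes them from the figures quoted on this page (used price ranges, VRAM sizes, and bandwidth in GB/s); the function names are illustrative.

```python
def dollars_per_gb(price_low: float, price_high: float,
                   vram_gb: float) -> tuple[float, float]:
    """Return the (low, high) cost per GB of VRAM for a price range."""
    return price_low / vram_gb, price_high / vram_gb


def bandwidth_ratio(card_gb_s: float, baseline_gb_s: float) -> float:
    """Bandwidth of one card relative to a baseline card."""
    return card_gb_s / baseline_gb_s


# Used-market price ranges quoted on this page, both for 48 GB total.
a6000 = dollars_per_gb(3500, 4500, 48)
dual_3090 = dollars_per_gb(1400, 2000, 48)
print(f"A6000:     ${a6000[0]:.0f}-${a6000[1]:.0f} per GB")
print(f"Dual 3090: ${dual_3090[0]:.0f}-${dual_3090[1]:.0f} per GB")

# Bandwidth relative to the A6000's 768 GB/s.
print(f"RTX 6000 Ada vs A6000: {bandwidth_ratio(960, 768):.2f}x")
print(f"A6000 vs RTX 3090:     {bandwidth_ratio(768, 936):.2f}x")
```

The per-GB figures cover only purchase price; power draw, ECC, and the cost of a motherboard and case that can host two 3090s are outside this calculation.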
Specs
| Spec | Value |
| --- | --- |
| VRAM | 48 GB |
| Power draw (peak) | 300 W |
| Released | 2020 |
| MSRP | $4,650 |
| Backends | CUDA, Vulkan |
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.