NVIDIA A100 80GB SXM
Ampere datacenter flagship. 80GB HBM2e at 2 TB/s. Still common at cloud providers.
Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.
Sub-scores sum to 939 / 1000. Headline = 939 × 0.70 (Estimated-confidence discount) = 657. This is an algorithmic performance-tier score. It is distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet).
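The headline arithmetic can be reproduced in a few lines. This is a minimal sketch: the function name and the round-to-nearest rule are assumptions for illustration; only the 939 sub-score sum and the 0.70 discount come from this page.

```python
def headline_score(subscore_sum: int, confidence_discount: float) -> int:
    """Apply the confidence discount to the summed sub-scores.

    Rounding to the nearest integer is an assumption; the page only
    shows the final figure.
    """
    return round(subscore_sum * confidence_discount)


# Values quoted on this page: 939 / 1000 with the 0.70 "Estimated" discount.
print(headline_score(939, 0.70))  # 657
```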
Extrapolated from 2039 GB/s bandwidth — 244.7 tok/s estimated. No measured benchmarks yet.
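For intuition on why bandwidth drives the estimate: in single-stream decode, each generated token has to read roughly all model weights once, so bandwidth divided by weight bytes gives a ceiling on tokens per second. The sketch below is that back-of-envelope bound, not the formula behind the 244.7 tok/s figure (which this page does not spell out). The model sizes and bits-per-weight values are illustrative assumptions.

```python
def decode_tok_s_upper_bound(bandwidth_gb_s: float,
                             params_billion: float,
                             bits_per_weight: float) -> float:
    """Bandwidth-bound ceiling on single-stream decode speed.

    Assumes every weight is read once per token and ignores KV-cache
    reads, kernel overhead, and batching, so real numbers land lower.
    """
    weight_gb = params_billion * bits_per_weight / 8  # billions of params * bytes each
    return bandwidth_gb_s / weight_gb


A100_BW = 2039  # GB/s, from this page

# Hypothetical model/quantization pairs, chosen only for illustration.
for name, params, bits in [("8B @ FP16", 8, 16), ("70B @ ~4.5-bit", 70, 4.5)]:
    bound = decode_tok_s_upper_bound(A100_BW, params, bits)
    print(f"{name}: <= {bound:.0f} tok/s")
```

Batched serving amortizes the weight reads across concurrent requests, which is why aggregate throughput under continuous batching can sit well above this single-stream ceiling.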
Plain-English: Runs 70B comfortably — snappy enough for a coding agent; vision models supported.
Verdicts are extrapolated from catalog VRAM, bandwidth, and ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.
What it does well
The A100 80GB SXM is the GPU that defined the modern LLM era: a large share of the models from the GPT-3.5 through Llama 2 period were trained or first deployed on this hardware. In 2026 it's a legacy SKU, but it is still ubiquitous on cloud providers and still legitimately good for many production inference workloads.

80 GB of HBM2e at 2.0 TB/s sits very close to H100 PCIe's bandwidth, which means inference performance on memory-bound workloads (the dominant case) is much closer to H100 than the generation gap suggests. The full CUDA stack works: vLLM, SGLang, and TRT-LLM all support sm_80, and many providers' default deployment images target A100 first. NVLink at 600 GB/s between SXM cards enables genuine multi-card tensor parallelism at scale; an 8× A100 80GB DGX node with full-mesh NVLink remains a serious 70B–200B production setup.

Cloud rental at ~$1.50–$2.50/hr for SXM is roughly half the H100 SXM price, and the gap in inference $/throughput often makes A100 the right pick for budget-conscious production. Used-market A100 80GB SXM has settled around $14,000–$17,000. That is still a serious cap-ex, but it is the cheapest path to 80 GB of HBM datacenter memory.
Where it breaks
- No native FP8. Ampere tensor cores cover TF32/BF16/FP16/INT8 but have no FP8 path and no Transformer Engine. Modern inference frameworks that exploit FP8 (TRT-LLM, vLLM FP8 paths) lose substantial throughput here vs H100 / H200 / B200. FP8 quantization is emulated in software, not executed in hardware.
- Bandwidth is good, not best. 2 TB/s vs H100's 3.35 TB/s vs H200's 4.8 TB/s vs B200's 8 TB/s. For long-context decode where bandwidth dominates, newer cards win cleanly.
- Architecture EOL is approaching. NVIDIA still supports sm_80 in CUDA 12.x but feature parity with newer architectures is fading. New optimizations (FP4, Blackwell-specific Transformer Engine 2, etc.) skip A100.
- Cap-ex is hard to justify in 2026. $14,000–$17,000 buys a used A100 80GB SXM with no warranty, a roughly six-year-old architecture (released 2020), and no FP8; about $25,000 buys an H100 PCIe or H200 PCIe NVL with full warranty, Hopper architecture, and FP8. Buying A100 retail in 2026 is rarely the right call; renting is the dominant pattern.
- Power and cooling are datacenter-grade. 400 W TDP in SXM form; requires an SXM4 motherboard or DGX-class server. Not suited to office workstation deployment.
Ideal model range
- Sweet spot: 70B Q4–Q5 production inference. A100 still serves this beautifully — 80 GB fits 70B Q5 with 32K context comfortably; 2 TB/s bandwidth keeps it fed.
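- Sweet spot: 405B at INT8 or 4-bit across an 8× A100 NVLinked DGX node. FP16 weights alone are about 810 GB, which exceeds the node's 640 GB, so full FP16 needs two nodes. A widely deployed 405B inference setup as of late 2025 / early 2026.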
- Sweet spot: 32B–70B production multi-tenant serving via vLLM continuous batching with 16–32 concurrent users.
- Sweet spot: BF16 fine-tuning at 7B–70B QLoRA, 7B FP16 full fine-tuning. The proven training tier.
- Comfortable: Embedding models, classifiers, smaller LMs at very high concurrency.
- Stretch: 671B at Q3 across 8× A100 (640 GB combined). Workable, slower than H100 cluster.
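The "fits with context" claims in this list can be sanity-checked with a rough memory budget: weights plus KV cache. A minimal sketch, assuming a Llama-style 70B layout (80 layers, 8 KV heads via GQA, head dimension 128), an FP16 KV cache, about 5.5 effective bits per weight for Q5, and about 3.5 for Q3. All of these are assumptions for illustration, and real runtimes add activation and framework overhead on top.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(context_tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache for one sequence: K and V tensors per layer, per token."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return context_tokens * per_token_bytes / 1e9


VRAM_GB = 80  # from the spec table on this page

# Assumed Llama-style 70B shape; not taken from this page.
w = weights_gb(70, 5.5)
kv = kv_cache_gb(32 * 1024, layers=80, kv_heads=8, head_dim=128)
print(f"70B Q5: weights {w:.1f} GB + 32K KV {kv:.1f} GB "
      f"= {w + kv:.1f} GB of {VRAM_GB} GB")

# Stretch case from the list: 671B at ~3.5 bits on an 8-card node.
print(f"671B Q3 weights: {weights_gb(671, 3.5):.0f} GB of {8 * VRAM_GB} GB")
```

The headroom left after weights and a single 32K sequence is what multi-tenant serving spends on additional concurrent KV caches.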
Bad use cases
- Buying retail in 2026. Pick H200 for new datacenter cap-ex; rent A100 if your workload is intermittent.
- Workloads that need FP8 throughput. Pick H100 or newer.
- Anything that fits in 48 GB. L40S at roughly half the cap-ex (~$7,500) wins for 48 GB-tier production serving.
- Frontier training. B200 is the right tier; H200 is the value-conscious pick.
- Single-user / hobbyist workloads. Rent for a few hours at $1.50–$2.50/hr; don't buy.
Verdict
Use this (rental) if you're running production inference for 70B–200B models at moderate concurrency, your serving stack already targets sm_80 (most do), $/throughput at $1.50–$2.50/hr beats your H100/H200 rental rate, and you don't need FP8. A100 is still the silent workhorse of cloud LLM inference in 2026 — most providers default to A100 unless you specifically request newer.
Buy this (used) if you're building an 8× A100 DGX-class node for $80k–$120k all-in (vs $200k+ for an 8× H100 SXM node), you have steady-state utilization above 70%, and you plan for a 3–4 year operational horizon. Hard to justify for smaller deployments.
Skip this if you're standing up new datacenter cap-ex (H200 is the right tier), you need FP8 throughput (Hopper or Blackwell), your workload fits L40S (better $/throughput at the 48 GB tier), you're a hobbyist (rent or buy consumer), or you're frontier-training (B200 cluster).
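The rent-versus-buy reasoning comes down to how long it takes for rental spend to match the purchase price at a given utilization. A rough sketch using midpoints of the price ranges quoted on this page; it ignores power, hosting, resale value, and failure risk, and the function name and midpoint values are assumptions for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def breakeven_years(capex_usd: float, rental_usd_per_hr: float,
                    utilization: float) -> float:
    """Years until cumulative rental cost equals the purchase price.

    utilization is the fraction of wall-clock hours you would otherwise
    be renting (0 < utilization <= 1). Operating costs are ignored.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    rented_hours_per_year = HOURS_PER_YEAR * utilization
    return capex_usd / (rental_usd_per_hr * rented_hours_per_year)


# Midpoints of the ranges on this page: ~$15,500 used, ~$2.00/hr rental.
for util in (0.2, 0.5, 0.7):
    years = breakeven_years(15_500, 2.00, util)
    print(f"utilization {util:.0%}: break-even after ~{years:.1f} years")
```

Because operating costs are left out, the real break-even point sits later than this sketch suggests, which strengthens the case for renting at low utilization.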
How it compares
- vs H100 SXM (80 GB) → H100 SXM has ~67% more bandwidth (3.35 TB/s vs 2 TB/s), native FP8, Transformer Engine 1, and roughly 3× the dense FP16 tensor compute. Both are 80 GB. Pick H100 SXM for new builds; pick A100 for cost-conscious rental or value used cap-ex. See /compare/nvidia-a100-80gb-sxm-vs-nvidia-h100-sxm.
- vs H200 (141 GB) → H200 has 76% more memory, 140% more bandwidth, FP8, and the newer Hopper architecture, at higher rental ($3–$4.50/hr) and cap-ex ($31,000 retail). Pick H200 for new builds and frontier inference; A100 for cost-conscious 70B-class rental.
- vs A100 40GB → Same architecture, somewhat lower bandwidth (~1.6 TB/s), half the memory, ~$11,000 used vs ~$15,000 used. Pick 80GB SXM for serious production use; 40GB is a cost-floor pick that gets memory-constrained on 70B-class models.
- vs L40S (48 GB) → L40S at $7,500 is roughly 1/2 the price and adds Ada-generation features, but with 60% of the memory ceiling and 43% of the bandwidth. For 70B Q4 / 32B Q8 inference under 48 GB (32B at FP16 needs about 64 GB and does not fit), L40S wins on $/throughput. For >48 GB workloads, A100 80GB is the floor.
- vs renting on Runpod / Lambda / Together → A100 80GB SXM rents at ~$1.50–$2.50/hr on most providers — the most-available serious-LLM rental tier. For workloads under 50% utilization or short-horizon experiments, rent A100 first; only buy after sustained high utilization makes cap-ex pencil out.
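The percentage comparisons in this list follow directly from the memory and bandwidth figures. A small sketch that recomputes them; the L40S bandwidth of 864 GB/s is not stated on this page and is included as an assumption, while the other figures are the rounded values quoted above.

```python
# (memory GB, bandwidth GB/s). A100/H100/H200 figures are quoted on this page;
# the L40S bandwidth (864 GB/s) is an assumption, not stated here.
CARDS = {
    "A100 80GB SXM": (80, 2000),
    "H100 SXM": (80, 3350),
    "H200": (141, 4800),
    "L40S": (48, 864),
}


def pct_more(other: float, base: float) -> float:
    """How much larger `other` is than `base`, in percent (negative if smaller)."""
    return (other / base - 1) * 100


base_mem, base_bw = CARDS["A100 80GB SXM"]
for name, (mem, bw) in CARDS.items():
    if name == "A100 80GB SXM":
        continue
    print(f"{name}: memory {pct_more(mem, base_mem):+.1f}%, "
          f"bandwidth {pct_more(bw, base_bw):+.1f}% vs A100")
```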
Specs
| Spec | Value |
| --- | --- |
| VRAM | 80 GB |
| Power draw (peak) | 400 W |
| Released | 2020 |
| MSRP | $17,000 |
| Backends | CUDA |
Models that fit
Open-weight models small enough to run on NVIDIA A100 80GB SXM with usable context.
Frequently asked
What models can NVIDIA A100 80GB SXM run?
Does NVIDIA A100 80GB SXM support CUDA?
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.