UNIT · NVIDIA · GPU
80 GB VRAM · workstation · Reviewed June 2026

NVIDIA H100 PCIe

PCIe Hopper. Lower power, lower bandwidth than SXM. Server-tier.

Released 2022 · 2039 GB/s memory bandwidth

Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.

RUNLOCALAI SCORE
662 / 1000 · BB-tier · Estimated
Throughput: 500 / 500
VRAM-fit: 190 / 200
Ecosystem: 200 / 200
Efficiency: 56 / 100

Sub-scores sum to 946 / 1000. Headline = 946 × 0.70 (Estimated-confidence discount) = 662. This is an algorithmic performance-tier score, distinct from (and often lower than) the editorial “Our verdict” below, which weighs value and real-world fit, especially for hardware we haven’t measured yet.
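The arithmetic above can be reproduced from the four sub-scores. This is a minimal sketch: the dictionary layout, the function name, and rounding to the nearest integer are assumptions made for illustration, not the site's actual scoring code.

```python
# Sub-scores as shown on this page: (points earned, points available).
SUB_SCORES = {
    "Throughput": (500, 500),
    "VRAM-fit": (190, 200),
    "Ecosystem": (200, 200),
    "Efficiency": (56, 100),
}

# Discount for "Estimated" confidence, as stated in the note above.
ESTIMATED_DISCOUNT = 0.70


def headline_score(sub_scores, discount):
    """Return (raw sum, discounted headline rounded to an integer).

    Rounding to the nearest integer is an assumption; the page only
    shows the final value.
    """
    raw = sum(earned for earned, _available in sub_scores.values())
    return raw, round(raw * discount)


raw, headline = headline_score(SUB_SCORES, ESTIMATED_DISCOUNT)
print(raw, headline)  # 946 662
```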

Extrapolated from 2039 GB/s bandwidth — 244.7 tok/s estimated. No measured benchmarks yet.
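The page does not state how the 244.7 tok/s figure is derived. A common rule of thumb for memory-bound decoding is tokens per second ≈ memory bandwidth ÷ bytes read per token. The sketch below applies that rule, which is an assumption and not the site's documented method, to show what per-token read size the quoted figure would imply and what the same rule gives for a hypothetical 35 GB set of weights (roughly a 70B model at 4 bits per weight).

```python
def decode_tokens_per_second(bandwidth_gb_s, gb_read_per_token, efficiency=1.0):
    """Bandwidth-bound decode estimate: each generated token streams
    `gb_read_per_token` gigabytes from VRAM. `efficiency` is a
    hypothetical fudge factor (1.0 = ideal upper bound)."""
    return efficiency * bandwidth_gb_s / gb_read_per_token


def implied_gb_per_token(bandwidth_gb_s, tokens_per_second):
    """Invert the rule: the read size that would yield the given rate."""
    return bandwidth_gb_s / tokens_per_second


BANDWIDTH_GB_S = 2039        # catalog bandwidth quoted on this page
PAGE_ESTIMATE_TOK_S = 244.7  # estimate quoted on this page

implied = implied_gb_per_token(BANDWIDTH_GB_S, PAGE_ESTIMATE_TOK_S)
print(round(implied, 2))  # 8.33 (GB per token under this rule)

# Hypothetical example: 35 GB of weights read once per token.
ideal_70b_q4 = decode_tokens_per_second(BANDWIDTH_GB_S, 35.0)
print(round(ideal_70b_q4, 1))  # 58.3
```

Read this way, the quoted estimate looks like a single-stream figure for a small model rather than for a 70B model; measured numbers will be lower than any ideal bound of this kind.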

Plain-English: Runs 70B comfortably — snappy enough for a coding agent; vision models supported.

7B chat: Comfortable
14B chat: Comfortable
32B chat: Comfortable
70B chat: Comfortable
Coding agent: Comfortable
Vision (≤8B VLM): Comfortable
Long context (32K): Comfortable

Legend:
Comfortable: fits with headroom
~Tight: works, no slack
Marginal: needs aggressive quant
Doesn't fit usefully

Verdicts are extrapolated from catalog VRAM, bandwidth, and ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.

BLK · VERDICT

Our verdict

OP · Fredoline Eruo · VERIFIED JUN 12, 2026
10.0/10

What it does well

The H100 PCIe is the most-deployed Hopper SKU outside hyperscaler cap-ex. 80 GB of HBM2e at 2.0 TB/s in a standard PCIe Gen 5 x16 form factor means it slots into any 2026-era server with a PCIe Gen 5 chassis: no SXM5 motherboard, no DGX premium. The full Hopper feature set is here: native FP8, the first-generation Transformer Engine, fourth-generation NVLink over a bridge (paired up to 188 GB on the H100 NVL), MIG (multi-instance GPU partitioning), and confidential computing extensions. CUDA, cuDNN, TensorRT-LLM, vLLM, and SGLang all have first-class H100 support, and frameworks are aggressively tuned for Hopper. The 350 W TDP is half the SXM5 H100's 700 W, which makes 4×–8× H100 PCIe servers practical in conventional rack cooling. Cap-ex of around $25,000 retail (vs $30,000+ for SXM5) and ~$2.50–$3.50/hr cloud rental make this the "serious GPU without DGX" tier. For most production inference workloads in 2026, the kind that don't need B200's FP4, the H100 PCIe is a well-sized buy.

Where it breaks

  • Bandwidth ceiling vs H200 / B200. 2 TB/s is good but the H200 at 4.8 TB/s and B200 at 8 TB/s are now the architecture-current options. For long-context decode at scale, newer cards win cleanly.
  • No NVLink at scale on the PCIe form factor. PCIe Gen 5 x16 delivers about 64 GB/s in each direction (vs SXM5 NVLink's 900 GB/s). Multi-card tensor parallelism takes the standard PCIe-only TP penalty (10–20%) vs SXM5 H100s. Pair-NVLink via the H100 NVL SKU is workable but only at the 2-card 188 GB tier.
  • Power and thermals are still real. 350 W TDP needs proper rack cooling. Not for a workstation tower without thoughtful airflow.
  • No native FP4. Blackwell B200 added FP4 with the second-gen Transformer Engine. For workloads that exploit FP4 (and many production stacks do in 2026), B200 wins meaningfully.
  • Pricing has held vs consumer cards. $25,000 for H100 PCIe vs $2,500 for RTX 5090: 10× the price for ~1.1× the bandwidth and 2.5× the memory, plus ECC, NVLink, and MIG. Worth it for production but not for hobbyists.

Ideal model range

  • Sweet spot: 70B production multi-tenant serving via vLLM continuous batching — 16–48 concurrent users at 32K context, ~30–60 tok/s decode each. The most-deployed Hopper inference workload.
  • Sweet spot: 200B-class production at FP8. The full Hopper Transformer Engine pays off here vs A100.
  • Sweet spot: Long-context production inference (32K–128K) where bandwidth dominates. H100's 2 TB/s is comfortable for this.
  • Stretch: 405B inference across 4× H100 PCIe with PCIe-only TP. Workable, slower than 4× SXM5 with full NVLink mesh.
  • Stretch: 70B FP16 fine-tuning across 2× H100 NVLinked, or 70B QLoRA fine-tuning on a single H100 PCIe.
  • Comfortable: Anything an A100 80GB does, but with FP8 throughput improvements and modern Transformer Engine optimizations.
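For the 70B serving and long-context bullets above, the binding constraint is usually how much VRAM remains for the KV cache once the weights are loaded. The sketch below is a rough budget under stated assumptions: a hypothetical Llama-style 70B shape (80 layers, 8 KV heads with grouped-query attention, head dimension 128), about 35 GB of 4-bit weights, a 4 GB reserve for activations and runtime overhead, and GB treated as interchangeable with GiB. None of these figures come from this page.

```python
def kv_cache_gib_per_sequence(layers, kv_heads, head_dim,
                              context_tokens, bytes_per_value=2):
    """KV cache for one sequence: one key and one value vector per
    layer, per KV head, per token."""
    total_bytes = (2 * layers * kv_heads * head_dim
                   * bytes_per_value * context_tokens)
    return total_bytes / 2**30


def max_full_length_sequences(vram_gib, weights_gib, per_sequence_gib,
                              reserve_gib=4.0):
    """How many sequences can hold a full-length context at the same time."""
    free = vram_gib - weights_gib - reserve_gib
    return max(0, int(free // per_sequence_gib))


# Hypothetical 70B model shape (assumption, not from this page).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
CONTEXT = 32 * 1024

fp16_kv = kv_cache_gib_per_sequence(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, 2)
fp8_kv = kv_cache_gib_per_sequence(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, 1)

print(fp16_kv, max_full_length_sequences(80, 35, fp16_kv))  # 10.0 4
print(fp8_kv, max_full_length_sequences(80, 35, fp8_kv))    # 5.0 8
```

Under these assumptions only a handful of sequences can hold a full 32K context on one card at once, which suggests that higher concurrency relies on paged KV allocation with typical requests well below 32K, on KV-cache quantization, or on more than one card.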

Bad use cases

  • Single-developer hobby workloads. Rent, or buy an RTX 4090 / 5090. An H100 PCIe is wasted on a single user.
  • Anything that fits in 48 GB. L40S at $7,500 is dramatically better $/throughput. Don't overprovision.
  • Frontier training where FP4 throughput matters. B200 is the right tier.
  • Cap-ex without sustained utilization. At $2.50–$3.50/hr rental, $25k of cap-ex (hardware cost only) breaks even after roughly 7,100–10,000 rented hours, i.e. about 10–14 months of 24×7 use. Most workloads don't justify this.
  • Multi-card SXM-tier deployments. Pick H100 SXM5 if you need full NVLink mesh; don't try to outwit the SXM5 advantage with PCIe-only TP at 4×+ scale.
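The break-even point in the cap-ex bullet above is a simple division, and it is sensitive to both the rental rate and utilization. A minimal sketch using the $25,000 and $2.50–$3.50/hr figures from this page; the 730 hours-per-month constant, the 50% utilization case, and the decision to ignore power, hosting, and resale value are assumptions for illustration.

```python
def break_even_hours(capex_usd, rental_usd_per_hour):
    """Rented hours whose cost equals the purchase price."""
    return capex_usd / rental_usd_per_hour


def hours_to_months(hours, utilization=1.0, hours_per_month=730):
    """Calendar months needed to accumulate `hours` of use at the
    given utilization (1.0 = running 24x7)."""
    return hours / (hours_per_month * utilization)


CAPEX_USD = 25_000

results = {}
for rate in (2.50, 3.50):
    hours = break_even_hours(CAPEX_USD, rate)
    results[rate] = (
        round(hours),
        round(hours_to_months(hours), 1),        # 24x7
        round(hours_to_months(hours, 0.5), 1),   # 50% utilization
    )

print(results[2.50])  # (10000, 13.7, 27.4)
print(results[3.50])  # (7143, 9.8, 19.6)
```

At $3.50/hr the result is close to 7,000 hours; at $2.50/hr it is 10,000 hours, and halving utilization doubles the calendar time in both cases.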

Verdict

Buy this if you're operating production datacenter inference for 70B–200B models with multi-tenant serving, you need ECC + 5-year warranty + datacenter pedigree, your workloads benefit from FP8 (most modern frameworks do), and you've calculated cap-ex vs rental over a 12+ month horizon. H100 PCIe is the pragmatic "serious production inference without SXM5 motherboard premium" pick for 2026 datacenters.

Skip this if you're standing up new builds and budget allows (H200 at +25% price gives 76% more memory + 140% more bandwidth — almost always the better buy in 2026), your workload fits L40S (much better $/throughput), you're frontier-training (B200 is the right tier), or you're at <50% utilization (rent on Runpod / Lambda instead).

How it compares

  • vs H100 SXM (80 GB) → Same chip, same 80 GB memory. SXM5 has full NVLink mesh (900 GB/s between cards) and 700 W TDP at higher cap-ex ($30,000+) and DGX motherboard requirement. PCIe is half the TDP, fits any PCIe Gen 5 server, ~$25,000. Pick SXM5 for 4×–8× clusters with multi-card TP; pick PCIe for single-card or pair-NVLink (via H100 NVL) deployments. See /compare/nvidia-h100-pcie-vs-nvidia-h100-sxm.
  • vs H200 (141 GB) → H200 has 76% more memory + 140% more bandwidth on the same architecture at +25% price. Pick H200 for any new build; pick H100 PCIe only when you find it discounted vs H200 or you're matching an existing H100 cluster. See /compare/nvidia-h100-pcie-vs-nvidia-h200.
  • vs A100 80GB SXM → A100 is one architecture generation older, with no FP8 and roughly the same ~2 TB/s memory bandwidth, but ~$15,000 used vs $25,000 retail H100 PCIe. Pick H100 PCIe for new builds and FP8-capable workloads; pick A100 for cost-conscious rental or older-stack production where FP8 isn't exploited. See /compare/nvidia-h100-pcie-vs-nvidia-a100-80gb-sxm.
  • vs B200 (192 GB) → B200 has 2.4× memory + 4× bandwidth + native FP4 + Transformer Engine 2 at +60% price. Pick B200 for frontier training and FP4-exploiting production; pick H100 PCIe for proven Hopper-tier inference at lower cap-ex.
  • vs L40S (48 GB) → L40S at $7,500 is roughly 1/3 the price + Ada FP8 support but 60% the memory + 43% the bandwidth. For 70B Q4 / 32B FP8 production inference under 48 GB, L40S wins $/throughput. For workloads that need full 80 GB (or H100's bandwidth advantage on long-context), H100 PCIe is the right pick.
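The percentage comparisons in the list above follow directly from the memory and bandwidth figures quoted on this page (80 GB and 2.0 TB/s for the H100 PCIe, 141 GB and 4.8 TB/s for the H200, 192 GB and 8 TB/s for the B200). A small sketch that recomputes them; the dictionary layout and function name are illustrative assumptions.

```python
# Figures as quoted on this page.
H100_PCIE = {"memory_gb": 80, "bandwidth_tb_s": 2.0}
ALTERNATIVES = {
    "H200": {"memory_gb": 141, "bandwidth_tb_s": 4.8},
    "B200": {"memory_gb": 192, "bandwidth_tb_s": 8.0},
}


def percent_more(card, baseline):
    """Percent increase of each spec relative to the baseline card,
    rounded to the nearest whole percent."""
    return {
        key: round(card[key] / baseline[key] * 100 - 100)
        for key in baseline
    }


comparison = {
    name: percent_more(specs, H100_PCIE)
    for name, specs in ALTERNATIVES.items()
}

for name, deltas in comparison.items():
    print(name, deltas)
# H200 {'memory_gb': 76, 'bandwidth_tb_s': 140}
# B200 {'memory_gb': 140, 'bandwidth_tb_s': 300}
```

A 140% increase is the same as 2.4×, and a 300% increase is 4×, matching the multipliers quoted for the B200.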
BLK · OVERVIEW

Overview

PCIe Hopper. Lower power, lower bandwidth than SXM. Server-tier.

Retailers we'd check: Amazon

BLK · SPECS

Specs

VRAM: 80 GB
Power draw (peak): 350 W
Released: 2022
MSRP: $25,000
Backends: CUDA

Models that fit

Open-weight models small enough to run on NVIDIA H100 PCIe with usable context.

Frequently asked

What models can NVIDIA H100 PCIe run?

With 80 GB of VRAM, the NVIDIA H100 PCIe runs 70B models in 4-bit quantization, plus everything smaller. See the "Models that fit" section above for compatible models.
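The 70B-at-4-bit claim can be sanity-checked with a weights-only estimate: parameters × bits per weight ÷ 8. This is a rough sketch that ignores quantization metadata (scales and zero points), the KV cache, and runtime overhead, so real footprints are somewhat larger; the function names are illustrative.

```python
def weights_gb(params_billion, bits_per_weight):
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, vram_gb=80):
    """True if the weights alone fit in VRAM (no headroom for KV cache)."""
    return weights_gb(params_billion, bits_per_weight) <= vram_gb


for bits in (16, 8, 4):
    print(bits, weights_gb(70, bits), fits(70, bits))
# 16 140.0 False
# 8 70.0 True
# 4 35.0 True
```

At 8 bits a 70B model technically fits but leaves only about 10 GB for everything else, which is why 4-bit is the comfortable single-card configuration.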

Does NVIDIA H100 PCIe support CUDA?

Yes — NVIDIA H100 PCIe is an NVIDIA card with full CUDA support, the most mature local-AI backend. llama.cpp, Ollama, vLLM, and ExLlamaV2 all run natively.

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.