NVIDIA B300 (Blackwell Ultra)
The Blackwell Ultra datacenter refresh of the B200. 288GB HBM3e per GPU, ~8 TB/s, up to 1,400W; GB300 NVL72 racks reach 1.1 ExaFLOPS FP4. The current top-end reference for large-model serving, volume-shipping since late 2025.
Sub-scores sum to 955 / 1000. Headline = 955 × 0.70 (Estimated-confidence discount) = 668.5, rounded to 669. This is an algorithmic performance-tier score. It is distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet).
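The headline arithmetic can be reproduced in a few lines. This is a minimal sketch that assumes round-half-up on the discounted value. The page shows 669 for 668.5 but does not state its rounding rule. `headline_score` is a hypothetical helper, not part of any RunLocalAI tool, and because only the 955 total is published, a single placeholder entry stands in for the individual sub-scores.

```python
from decimal import Decimal, ROUND_HALF_UP


def headline_score(sub_scores, confidence_discount):
    """Sum sub-scores, apply the confidence discount, round half up (assumed rule)."""
    raw = sum(sub_scores)
    # Decimal avoids float error: 0.70 is not exactly representable as a float.
    discounted = Decimal(raw) * Decimal(str(confidence_discount))
    return int(discounted.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Hypothetical: the real sub-score breakdown is not published, only its 955 total.
print(headline_score([955], 0.70))  # 669
```

Python's built-in `round()` uses round-half-to-even, so `round(668.5)` gives 668. That is why the sketch rounds through `Decimal` with `ROUND_HALF_UP` to match the published 669.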
Estimated decode speed: about 960 tok/s, extrapolated from 8,000 GB/s of memory bandwidth. No measured benchmarks yet.
Plain-English: runs 70B models comfortably, snappy enough for a coding agent; vision models are supported.
Verdicts extrapolated from catalog VRAM + bandwidth + ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.
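A common way to extrapolate decode speed from bandwidth is the memory-bound rule of thumb. Each generated token streams roughly all active weights from HBM, so tokens/s ≈ bandwidth × efficiency ÷ bytes read per token. The sketch below illustrates that rule. It is an assumption about the general technique, not the catalog's actual formula. The page does not say which model size or efficiency factor produced the 960 tok/s figure, so the inputs are placeholders and the output is not expected to match it.

```python
def decode_tokens_per_sec(bandwidth_gb_s: float,
                          bytes_read_per_token_gb: float,
                          efficiency: float) -> float:
    """Memory-bound estimate for single-stream decode.

    Each token reads (roughly) every active weight once, so throughput is
    bandwidth / bytes-per-token, scaled by an efficiency factor (assumed)
    that covers kernel and framework overheads.
    """
    if bytes_read_per_token_gb <= 0:
        raise ValueError("bytes read per token must be positive")
    return bandwidth_gb_s * efficiency / bytes_read_per_token_gb


# Illustrative assumptions: a 70B model at ~4-bit (~35 GB of weights), 60% efficiency.
print(f"{decode_tokens_per_sec(8000, 35, 0.6):.1f} tok/s")  # 137.1 tok/s
```

This estimates one request at a time. Batched serving reuses each weight read across many sequences, so aggregate throughput on a part like this can be far higher than the single-stream number.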
What it is
The B300 is NVIDIA's Blackwell Ultra — the mid-cycle datacenter upgrade over the B200, with 288GB of HBM3e per GPU (up from 192GB) and ~50% more inference throughput. In the GB300 NVL72 rack-scale form it delivers 1.1 ExaFLOPS of FP4 and roughly 1.5x the B200 system. It's been volume-shipping since September 2025 across CoreWeave, Azure, AWS, and Google.
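The rack-level figures can be sanity-checked per GPU with simple arithmetic. This sketch assumes the 1.1 ExaFLOPS FP4 figure is spread evenly over the 72 GPUs in a GB300 NVL72 rack, and it inherits whatever sparsity convention the rack figure uses. It counts GPU HBM only, not Grace CPU memory.

```python
# Figures from the text; the even per-GPU split is an assumption.
GPUS_PER_RACK = 72           # GB300 NVL72
HBM_PER_GPU_GB = 288         # B300
B200_HBM_PER_GPU_GB = 192    # B200
RACK_FP4_EXAFLOPS = 1.1

rack_hbm_tb = GPUS_PER_RACK * HBM_PER_GPU_GB / 1000             # total GPU HBM per rack
per_gpu_fp4_pflops = RACK_FP4_EXAFLOPS * 1000 / GPUS_PER_RACK   # 1 EFLOPS = 1000 PFLOPS
hbm_uplift = HBM_PER_GPU_GB / B200_HBM_PER_GPU_GB               # per-GPU memory vs B200

print(f"Rack HBM: {rack_hbm_tb:.1f} TB")              # Rack HBM: 20.7 TB
print(f"FP4 per GPU: {per_gpu_fp4_pflops:.1f} PFLOPS")  # FP4 per GPU: 15.3 PFLOPS
print(f"HBM vs B200: {hbm_uplift:.2f}x")              # HBM vs B200: 1.50x
```

The 1.5x here is the memory step from 192GB to 288GB. It happens to match the quoted system-level speedup, but the two figures measure different things.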
Relevance to local AI
This is a hyperscale serving part, not something an individual buys. It belongs here as the current ceiling reference: the hardware frontier labs serve giant models on, and the yardstick local hardware is measured against. The 288GB-per-GPU figure is the useful anchor. It explains why frontier models are trained and served on these parts, and why local users, with far less VRAM, have to quantize. If you're speccing on-prem inference for a well-funded org, the B300/GB300 is the top option; for everyone else it's the line on the chart showing how far datacenter VRAM has pulled ahead of consumer cards.
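A rough weights-only footprint shows why 288GB per GPU matters and why local users quantize. This minimal sketch uses footprint = parameters × bits ÷ 8. The 0.9 headroom factor, the 24 GB comparison card, and the 405B example size are assumptions for illustration. It ignores KV cache, activations, and the scale/metadata overhead that real quantized files carry.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only size in GB (1e9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits: float, vram_gb: float, headroom: float = 0.9) -> bool:
    """True if the weights fit in `headroom` of VRAM (assumed 10% reserve)."""
    return weight_footprint_gb(params_billion, bits) <= vram_gb * headroom


B300_GB = 288
CONSUMER_GB = 24  # assumption: a high-end consumer card, for comparison

for name, params in [("70B", 70), ("405B", 405)]:
    for bits in (16, 8, 4):
        gb = weight_footprint_gb(params, bits)
        print(f"{name} @ {bits}-bit: {gb:6.1f} GB | "
              f"B300: {fits(params, bits, B300_GB)} | "
              f"24 GB card: {fits(params, bits, CONSUMER_GB)}")
```

Under these assumptions, a 405B model fits a single B300 only at 4-bit (202.5 GB), and even a 4-bit 70B (35 GB) overshoots a 24 GB card.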
Bottom line
The top-end datacenter reference point. Not a buyable local-AI card — included for context and for the rare on-prem org speccing frontier-scale serving.
Specs
| Spec | Value |
|---|---|
| VRAM | 288 GB HBM3e |
| Memory bandwidth | ~8 TB/s |
| Power draw (peak) | 1,400 W |
| Released | 2025 |
| Backends | CUDA |
Models that fit
Open-weight models small enough to run on NVIDIA B300 (Blackwell Ultra) with usable context.
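Whether a model fits "with usable context" depends on the KV cache as well as the weights. This minimal sketch uses the standard per-token KV size: 2 (K and V) × layers × KV heads × head dim × bytes per element. The model shape is an assumption modeled on a Llama-3-70B-style configuration (80 layers, 8 grouped-query KV heads, head dim 128) with an FP16 cache and FP16 weights, at batch size 1.

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    # Factor 2: each layer caches one K and one V vector per KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_context_tokens(vram_gb: float, weights_gb: float, per_token_bytes: int) -> int:
    """Tokens of KV cache that fit in the VRAM left after the weights (batch of 1)."""
    free_bytes = (vram_gb - weights_gb) * 1e9
    return max(0, int(free_bytes // per_token_bytes))


# Assumed Llama-3-70B-style shape, FP16 cache.
per_token = kv_cache_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
weights_gb = 70 * 16 / 8  # FP16 weights: 140 GB

kv_128k_gb = per_token * 131_072 / 1e9
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"128K-token context: {kv_128k_gb:.1f} GB KV + {weights_gb:.0f} GB weights "
      f"= {kv_128k_gb + weights_gb:.1f} GB of 288 GB")
print(f"Theoretical max context (batch 1): "
      f"{max_context_tokens(288, weights_gb, per_token):,} tokens")
```

Treat the maximum as a ceiling. Serving engines reserve memory for activations and runtime buffers, and the model's own trained context window also caps how much context is usable.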
Frequently asked
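What models can NVIDIA B300 (Blackwell Ultra) run?
Based on the extrapolated verdict above, 70B-class models run comfortably and vision models are supported. No measured benchmarks exist yet.
Does NVIDIA B300 (Blackwell Ultra) support CUDA?
Yes. CUDA is the listed backend.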
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.