NVIDIA B300 (Blackwell Ultra) for local AI

The Blackwell Ultra datacenter refresh of the B200. 288GB HBM3e per GPU, ~8 TB/s, up to 1,400W; GB300 NVL72 racks reach 1.1 ExaFLOPS FP4. The current top-end reference for large-model serving, volume-shipping since late 2025.

Released 2025 · 8,000 GB/s memory bandwidth
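
Why the bandwidth figure matters: single-stream token generation is usually memory-bound, so ~8 TB/s sets a rough ceiling on decode speed. The sketch below is a generic back-of-envelope roofline, not a measured B300 result. It assumes every weight byte is read once per generated token and ignores KV-cache traffic, compute limits, and batching (which lifts aggregate throughput well past the single-stream figure).

```python
# Back-of-envelope decode ceiling for a memory-bound LLM at batch size 1.
# Assumptions (not measurements): every weight byte is read from HBM once per
# generated token; KV-cache reads, compute time, and kernel overhead are ignored.

B300_BANDWIDTH_GB_S = 8000  # ~8 TB/s, the figure quoted on this page


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(params_billion: float, bits_per_weight: float,
                         bandwidth_gb_s: float = B300_BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens/s if each token costs one full pass over the weights."""
    return bandwidth_gb_s / weight_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"70B @ {bits}-bit: ~{decode_ceiling_tok_s(70, bits):.0f} tok/s ceiling")
    # 70B @ 16-bit: ~57 tok/s ceiling
    # 70B @ 8-bit: ~114 tok/s ceiling
    # 70B @ 4-bit: ~229 tok/s ceiling
```

Real single-stream numbers land below these ceilings. The useful takeaway is the ratio: halving bits per weight roughly doubles the bandwidth-bound ceiling.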

What it is

The B300 is NVIDIA's Blackwell Ultra, the mid-cycle datacenter upgrade over the B200, with 288GB of HBM3e per GPU (up from 192GB) and ~50% more inference throughput. In the GB300 NVL72 rack-scale form it delivers 1.1 ExaFLOPS of FP4, roughly 1.5x a GB200 NVL72 rack. It's been volume-shipping since September 2025 across CoreWeave, Azure, AWS, and Google.
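
For on-prem sizing, the rack-scale totals follow from simple multiplication. A sketch, assuming 72 B300 GPUs per GB300 NVL72 rack (the "72" in the name) and the per-GPU numbers on this page. These are naive sums, not vendor-published rack specs, and the power total covers the GPUs only, not the Grace CPUs, networking, or cooling.

```python
# Aggregate GB300 NVL72 figures derived from this page's per-GPU numbers.
# Assumption: 72 GPUs per rack; results are naive sums, not vendor specs.

GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 288
HBM_BW_PER_GPU_TB_S = 8.0
PEAK_POWER_PER_GPU_W = 1400
RACK_FP4_EXAFLOPS = 1.1


def rack_totals() -> dict:
    return {
        # Total HBM3e in the rack, in TB (1 TB = 1000 GB).
        "hbm_tb": GPUS_PER_RACK * HBM_PER_GPU_GB / 1000,
        # Aggregate HBM bandwidth, in TB/s.
        "hbm_bw_tb_s": GPUS_PER_RACK * HBM_BW_PER_GPU_TB_S,
        # GPU-only peak power in kW (excludes CPUs, switches, cooling).
        "gpu_power_kw": GPUS_PER_RACK * PEAK_POWER_PER_GPU_W / 1000,
        # Rack FP4 figure divided back out per GPU, in PFLOPS.
        "fp4_pflops_per_gpu": RACK_FP4_EXAFLOPS * 1000 / GPUS_PER_RACK,
    }


if __name__ == "__main__":
    for key, value in rack_totals().items():
        print(f"{key}: {value:.1f}")
    # hbm_tb: 20.7
    # hbm_bw_tb_s: 576.0
    # gpu_power_kw: 100.8
    # fp4_pflops_per_gpu: 15.3
```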

Relevance to local AI

This is a hyperscale serving part, not something an individual buys. It belongs here as the current ceiling reference, the hardware frontier labs serve giant models on, against which local hardware can be measured. The 288GB-per-GPU figure is the useful anchor: one B300 holds roughly 9x the 32GB of an RTX 5090, which is why frontier models are trained and served on parts like this and why local users quantize to fit. If you're speccing on-prem inference for a well-funded org, the B300/GB300 is the top option; for everyone else it's the line on the chart showing how far datacenter VRAM has pulled ahead of consumer cards.
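
To put the 288GB anchor in numbers, weights-only arithmetic gives the largest model each memory pool can hold at common precisions. A sketch, assuming 1 GB = 1e9 bytes and ignoring KV cache, activations, and runtime overhead, so practical limits sit well below these figures. Most 4-bit formats also spend slightly more than 4 bits per weight once quantization scales are counted.

```python
# Largest parameter count (in billions) whose *weights alone* fit in a memory pool.
# Assumptions: 1 GB = 1e9 bytes; KV cache, activations, and runtime overhead ignored.


def max_params_billion(memory_gb: float, bits_per_weight: float) -> float:
    return memory_gb * 8 / bits_per_weight


POOLS_GB = {"B300 (288 GB HBM3e)": 288, "RTX 5090 (32 GB)": 32}

if __name__ == "__main__":
    for name, gb in POOLS_GB.items():
        row = ", ".join(f"{bits}-bit: ~{max_params_billion(gb, bits):.0f}B"
                        for bits in (16, 8, 4))
        print(f"{name}: {row}")
    # B300 (288 GB HBM3e): 16-bit: ~144B, 8-bit: ~288B, 4-bit: ~576B
    # RTX 5090 (32 GB): 16-bit: ~16B, 8-bit: ~32B, 4-bit: ~64B
```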

Bottom line

The top-end datacenter reference point. Not a buyable local-AI card — included for context and for the rare on-prem org speccing frontier-scale serving.

Frequently asked

What models can NVIDIA B300 (Blackwell Ultra) run?

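With 288GB of HBM3e, a single B300 holds a 70B model at full FP16 precision (about 140GB of weights) with room left for long-context KV cache, and 4-bit quantization stretches that to models of several hundred billion parameters. See the model list below for tested combinations.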
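
A rough fit check helps here, because at long context the KV cache, not the weights, often decides whether a workload fits. The sketch assumes a Llama-3-70B-class shape (80 layers, 8 KV heads, head dimension 128), an FP16 KV cache, and a flat 10% allowance for activations and runtime buffers. All of these are assumptions; real serving stacks budget memory differently.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    params_billion: float
    layers: int
    kv_heads: int  # grouped-query attention: KV heads, not query heads
    head_dim: int


# Assumed Llama-3-70B-class shape; swap in your model's config values.
LLAMA3_70B = ModelShape(params_billion=70, layers=80, kv_heads=8, head_dim=128)


def kv_cache_gb(m: ModelShape, context_tokens: int, batch: int = 1,
                kv_bytes: int = 2) -> float:
    """KV cache size in GB: K and V per layer, kv_heads * head_dim values each."""
    per_token_bytes = 2 * m.layers * m.kv_heads * m.head_dim * kv_bytes
    return per_token_bytes * context_tokens * batch / 1e9


def fits(m: ModelShape, weight_bits: float, context_tokens: int, batch: int = 1,
         memory_gb: float = 288, overhead: float = 0.10) -> tuple[float, bool]:
    """Estimated total GB and whether it fits; `overhead` is an assumed flat buffer."""
    weights_gb = m.params_billion * weight_bits / 8
    total_gb = (weights_gb + kv_cache_gb(m, context_tokens, batch)) * (1 + overhead)
    return total_gb, total_gb <= memory_gb


if __name__ == "__main__":
    for bits, batch in ((16, 1), (16, 4), (4, 4)):
        total, ok = fits(LLAMA3_70B, bits, context_tokens=131072, batch=batch)
        print(f"{bits}-bit weights, 128k ctx, batch {batch}: ~{total:.0f} GB, fits={ok}")
    # 16-bit weights, 128k ctx, batch 1: ~201 GB, fits=True
    # 16-bit weights, 128k ctx, batch 4: ~343 GB, fits=False
    # 4-bit weights, 128k ctx, batch 4: ~227 GB, fits=True
```

The batch-4 rows show the point: at long context, KV cache for concurrent requests is what exhausts even 288GB, which is why serving stacks quantize the KV cache or cap concurrency.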

Does NVIDIA B300 (Blackwell Ultra) support CUDA?

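Yes. The B300 is an NVIDIA datacenter GPU with full CUDA support, the most mature local-AI backend, and llama.cpp, Ollama, vLLM, and ExLlamaV2 all target CUDA. As with any new architecture, Blackwell Ultra support depends on a recent enough CUDA toolkit and framework build.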

Specs

VRAM	288 GB
Power draw (peak)	1400 W
Released	2025
Backends	CUDA
