NVIDIA A100 40GB

The original A100, launched with 40 GB of HBM2 at 1.55 TB/s of memory bandwidth. It was used to train an early generation of frontier models.

Released 2020

Specs

VRAM	40 GB
Power draw	400 W (SXM4 module; the PCIe card is rated at 250 W)
Released	2020
MSRP	$11,000
Backends	CUDA

Models that fit

Open-weight models small enough to run on NVIDIA A100 40GB with usable context.

Llama 3.1 8B Instruct

Llama 3.2 3B Instruct

Mistral Small 3 24B

Qwen 2.5 7B Instruct

DeepSeek R1 Distill Qwen 7B
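
How much context counts as "usable" depends largely on the KV cache, which grows linearly with context length and sits in VRAM next to the weights. The sketch below estimates that cost. It assumes an fp16 cache and the published Llama 3.1 8B shape (32 layers, 8 KV heads, head dimension 128); the function name `kv_cache_bytes` is illustrative and not part of any library.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size: one key and one value vector per layer per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens

GIB = 1024 ** 3

# Assumed shape for Llama 3.1 8B: 32 layers, 8 KV heads (GQA), head dim 128.
cache_8k = kv_cache_bytes(32, 8, 128, 8192)
cache_128k = kv_cache_bytes(32, 8, 128, 131072)
print(f"8K context:   {cache_8k / GIB:.1f} GiB")
print(f"128K context: {cache_128k / GIB:.1f} GiB")
```

Under these assumptions the cache costs about 1 GiB at 8K tokens and about 16 GiB at 128K tokens, so long contexts can consume a large share of the 40 GB even for a small model. Quantized caches and runtime overhead will shift the numbers.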

Frequently asked

What models can NVIDIA A100 40GB run?

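With 40 GB of VRAM, the NVIDIA A100 40GB comfortably runs models up to roughly 30B parameters in 4-bit quantization, plus everything smaller. A 70B model at 4 bits needs about 35 GB for the weights alone, which leaves very little room for the KV cache and runtime overhead, so it is a tight fit at best and usually calls for a lower-bit quantization or a second GPU. See the model list above for tested combinations.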
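
A quick way to sanity-check a fit claim is to compare the weight footprint with the 40 GB budget. The sketch below is a back-of-the-envelope estimate only: it assumes exactly 4 bits per weight and ignores quantization metadata, the KV cache, and runtime overhead. The helper `weight_gb` is a hypothetical name used for illustration.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), ignoring quantization metadata."""
    return params_billion * bits_per_weight / 8

VRAM_GB = 40  # from the spec table

for name, params in [("8B", 8), ("24B", 24), ("70B", 70)]:
    need = weight_gb(params, 4)
    left = VRAM_GB - need
    print(f"{name} @ 4-bit: {need:.1f} GB weights, {left:.1f} GB left for KV cache and runtime")
```

Real 4-bit formats store scales alongside the weights, so the effective bits per weight are somewhat above 4 and the headroom is smaller than this estimate suggests.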

Does NVIDIA A100 40GB support CUDA?

Yes. The A100 40GB is an NVIDIA data-center GPU with full CUDA support, the most mature local-AI backend. llama.cpp, Ollama, vLLM, and ExLlamaV2 all run natively.
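
Before installing any of these runtimes, it can help to confirm that the NVIDIA driver sees the card. The sketch below shells out to `nvidia-smi` with its `--query-gpu` and `--format` options and parses the result. The helper names `parse_gpu_lines` and `list_gpus` are illustrative, and the code returns an empty list when `nvidia-smi` is not on the PATH.

```python
import shutil
import subprocess

def parse_gpu_lines(text):
    """Parse 'name, memory.total' CSV rows into (name, MiB) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        parts = line.rsplit(",", 1)
        if len(parts) != 2:
            continue  # skip rows that are not 'name, memory'
        name, mem = parts[0].strip(), parts[1].strip()
        if mem.isdigit():
            gpus.append((name, int(mem)))
    return gpus

def list_gpus():
    """Return detected GPUs, or an empty list if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)

for name, mib in list_gpus():
    print(f"{name}: {mib} MiB")
```

If the loop prints nothing, either the driver is not installed or no NVIDIA GPU is visible to the current user or container.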

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.