Describe your build — any GPUs, CPU, RAM, OS, runtime, use case. We'll compute effective VRAM honestly, recommend a runtime, and tell you which models fit comfortably, which are borderline, and which aren't practical.
Total VRAM ≠ pooled VRAM. We never sum VRAM unless the silicon truly pools (Apple unified memory). We always explain why effective capacity comes in below the raw total.
Add GPUs, set CPU/RAM/OS, optionally pick a runtime + use case. URL updates as you change fields — share a build by copying the URL.
4× NVIDIA H100 SXM = 320 GB total VRAM, but if the cards can't talk over NVLink, cross-card traffic is PCIe-bound (PCIe 5.0 x16 at ~64 GB/s versus ~900 GB/s over NVLink 4). With tensor parallelism, each card holds ~1/4 of the model weights and replicates activations + KV cache. After ~17% TP overhead, effective model capacity is ~266 GB. The largest single tensor on one card is capped at ~78 GB.
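The effective-VRAM math above can be sketched in a few lines. This is an illustrative reconstruction, not the tool's exact formula; the ~17% tensor-parallel overhead factor is inferred from the fit tables below.

```python
# Illustrative sketch of the effective-VRAM math: total VRAM minus a
# tensor-parallel overhead factor gives the capacity the fit tables
# are judged against.

def effective_vram_gb(cards: int, gb_per_card: float, tp_overhead: float = 0.17) -> float:
    """Capacity left for model weights + KV cache after TP overhead."""
    total = cards * gb_per_card
    return total * (1.0 - tp_overhead)

def headroom_pct(vram_needed_gb: float, effective_gb: float) -> float:
    """Positive = spare room, negative = overshoot."""
    return (1.0 - vram_needed_gb / effective_gb) * 100.0

effective = effective_vram_gb(4, 80)          # 4x H100 SXM
print(round(effective, 1))                    # 265.6 -- the ~266 GB figure
print(round(headroom_pct(169.9, effective)))  # 36 -- matches the Mixtral 8x22B row
```

Plugging any row's VRAM estimate into `headroom_pct` reproduces the headroom percentages quoted in the tables.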
Best engine for this topology + skill level + use case.
Tensor-parallel across NVLink/PCIe — works on every recent consumer + datacenter pair. AWQ-INT4 + 70B fits dual 3090 / dual 4090 cleanly.
Single-stream king. EXL2 4.0bpw + 70B fits dual 3090 with NVLink and beats vLLM on solo-user throughput.
Layer-split via --tensor-split is the experimentation-friendly path. Worse throughput than vLLM but easier to debug.
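The engine recommendations above boil down to a small decision rule. The sketch below is a hypothetical simplification of that logic (the function name and rules are illustrative, not the tool's actual code): vLLM for multi-GPU serving, ExLlamaV2 for solo-user throughput on a pair, llama.cpp for easy experimentation.

```python
# Hedged sketch of the engine-picking logic described above.
# All names and thresholds here are illustrative assumptions.

def recommend_engine(num_gpus: int, single_user: bool, experimenting: bool) -> str:
    if experimenting:
        return "llama.cpp"   # layer-split via --tensor-split, easiest to debug
    if single_user and num_gpus <= 2:
        return "ExLlamaV2"   # best single-stream throughput on an NVLink'd pair
    return "vLLM"            # tensor-parallel serving across all cards

print(recommend_engine(4, single_user=False, experimenting=False))  # vLLM
```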
183 models considered. Categorized by headroom at the recommended quant + a sensible context for your use case.
| Model | Params | Quant | VRAM est. | Context | Note |
|---|---|---|---|---|---|
| Mixtral 8x22B Instruct | 141B | Q4_K_M | 169.9 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 36% headroom. |
| WizardLM-2 8x22B | 141B | Q4_K_M | 169.9 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 36% headroom. |
| DBRX Base | 132B | Q4_K_M | 159.7 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 40% headroom. |
| DBRX Instruct | 132B | AWQ-INT4 | 214.6 GB | 8,192 | Fits cleanly at AWQ-INT4 + 8,192 ctx with 19% headroom. |
| Mistral Large 2 (123B) | 123B | Q4_K_M | 149.5 GB | 8,192 | Comfortable fit with 44% headroom — room to extend context or run alongside other workloads. |
| Nemotron 3 Super (120B-A12B) | 120B | Q4_K_M | 146.1 GB | 8,192 | Comfortable fit with 45% headroom — room to extend context or run alongside other workloads. |
| Llama 4 Scout | 109B | Q5_K_M | 143.2 GB | 8,192 | Comfortable fit with 46% headroom — room to extend context or run alongside other workloads. |
| Command R+ (Aug 2024) | 104B | AWQ-INT4 | 171.2 GB | 8,192 | Fits cleanly at AWQ-INT4 + 8,192 ctx with 36% headroom. |
| Command R+ 104B | 104B | Q4_K_M | 127.9 GB | 8,192 | Comfortable fit with 52% headroom — room to extend context or run alongside other workloads. |
| Llama 3.2 90B Vision Instruct | 90B | Q4_K_M | 112 GB | 8,192 | Comfortable fit with 58% headroom — room to extend context or run alongside other workloads. |
| Llama 3.2 90B Vision | 90B | AWQ-INT4 | 149.5 GB | 8,192 | Comfortable fit with 44% headroom — room to extend context or run alongside other workloads. |
| InternVL 2.5 78B | 78B | Q4_K_M | 98.4 GB | 8,192 | Comfortable fit with 63% headroom — room to extend context or run alongside other workloads. |
| Qwen 2.5 Math 72B | 72B | Q4_K_M | 69.5 GB | 4,096 | Comfortable fit with 74% headroom — room to extend context or run alongside other workloads. |
| Qwen 3 72B | 72B | AWQ-INT4 | 121.6 GB | 8,192 | Comfortable fit with 54% headroom — room to extend context or run alongside other workloads. |
| Qwen 2.5 72B Instruct | 72B | Q5_K_M | 98 GB | 8,192 | Comfortable fit with 63% headroom — room to extend context or run alongside other workloads. |
| Molmo 72B | 72B | Q4_K_M | 69.5 GB | 4,096 | Comfortable fit with 74% headroom — room to extend context or run alongside other workloads. |
| Qwen 2.5-VL 72B | 72B | AWQ-INT4 | 121.6 GB | 8,192 | Comfortable fit with 54% headroom — room to extend context or run alongside other workloads. |
| Hermes 3 Llama 3.1 70B | 70B | Q4_K_M | 89.4 GB | 8,192 | Comfortable fit with 66% headroom — room to extend context or run alongside other workloads. |
| Hermes 4 Llama 3.3 70B | 70B | AWQ-INT4 | 118.5 GB | 8,192 | Comfortable fit with 55% headroom — room to extend context or run alongside other workloads. |
| Llama 3.3 70B Instruct | 70B | Q8_0 | 90.8 GB | 8,192 | Comfortable fit with 66% headroom — room to extend context or run alongside other workloads. |
| DeepSeek R1 Distill Llama 70B | 70B | Q5_K_M | 95.5 GB | 8,192 | Comfortable fit with 64% headroom — room to extend context or run alongside other workloads. |
| Dolphin 3 Llama 3.3 70B | 70B | AWQ-INT4 | 118.5 GB | 8,192 | Comfortable fit with 55% headroom — room to extend context or run alongside other workloads. |
| EVA Llama 3.3 70B | 70B | AWQ-INT4 | 118.5 GB | 8,192 | Comfortable fit with 55% headroom — room to extend context or run alongside other workloads. |
| Llama 4 70B | 70B | AWQ-INT4 | 118.5 GB | 8,192 | Comfortable fit with 55% headroom — room to extend context or run alongside other workloads. |
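The "room to extend context" notes above come down to KV-cache growth, which scales linearly with context length. A hedged sketch, assuming Llama-3-70B-like dimensions (80 layers, 8 KV heads under GQA, head_dim 128, FP16 cache); check your model's actual config before relying on the numbers.

```python
# FP16 KV-cache size vs context length, using assumed 70B-class
# dimensions (80 layers, 8 KV heads, head_dim 128) -- illustrative only.

def kv_cache_gb(ctx: int, n_layers: int = 80, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """K and V tensors, for every layer, for every cached token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return ctx * per_token / 1e9

print(round(kv_cache_gb(8_192), 1))    # 2.7  -- modest at the tables' default context
print(round(kv_cache_gb(131_072), 1))  # 42.9 -- at 128K, headroom disappears fast
```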
| Model | Params | Quant | VRAM est. | Context | Note |
|---|---|---|---|---|---|
| GLM-5 | 200B | Q4_K_M | 236.8 GB | 8,192 | Tight fit at Q4_K_M — only 11% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| GLM-5 Pro | 144B | AWQ-INT4 | 233.2 GB | 8,192 | Tight fit at AWQ-INT4 — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
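The three fit categories map to headroom bands. The thresholds below are reverse-engineered from the tables and are illustrative, not the tool's published cutoffs.

```python
# Illustrative fit classifier against the build's ~266 GB effective VRAM.
# Threshold values are assumptions inferred from the tables, not official.

def classify_fit(vram_needed_gb: float, effective_gb: float = 266.0) -> str:
    headroom = 1.0 - vram_needed_gb / effective_gb
    if headroom < 0:
        return "doesn't fit"
    if headroom < 0.15:
        return "tight"       # e.g. GLM-5 at 11% headroom
    if headroom < 0.40:
        return "fits cleanly"
    return "comfortable"

print(classify_fit(236.8))  # tight -- the GLM-5 row above
```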
| Model | Params | Quant | VRAM est. | Context | Note |
|---|---|---|---|---|---|
| Kimi K1.5 | 200B | AWQ-INT4 | 320 GB | 8,192 | ~320.0 GB needed at AWQ-INT4 + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build. |
| Qwen 3 235B-A22B | 235B | Q4_K_M | 276.5 GB | 8,192 | ~276.5 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 4%. Drop quant or move to a larger build. |
| DeepSeek Coder V2 236B | 236B | Q4_K_M | 277.6 GB | 8,192 | ~277.6 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 4%. Drop quant or move to a larger build. |
| DeepSeek V2.5 236B | 236B | Q4_K_M | 277.6 GB | 8,192 | ~277.6 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 4%. Drop quant or move to a larger build. |
| Llama 3.1 Nemotron Ultra 253B | 253B | Q4_K_M | 296.9 GB | 8,192 | ~296.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 12%. Drop quant or move to a larger build. |
| DeepSeek V4 Flash (284B MoE) | 284B | Q4_K_M | 332 GB | 8,192 | ~332.0 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 25%. Drop quant or move to a larger build. |
| Hunyuan Large 389B MoE | 389B | Q4_K_M | 451.1 GB | 8,192 | ~451.1 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 70%. Drop quant or move to a larger build. |
| Qwen 3.5 235B-A17B (MoE) | 397B | Q4_K_M | 460.2 GB | 8,192 | ~460.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 73%. Drop quant or move to a larger build. |
| Jamba 1.5 Large | 398B | Q4_K_M | 461.3 GB | 8,192 | ~461.3 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 73%. Drop quant or move to a larger build. |
| Llama 4 Maverick | 400B | Q4_K_M | 463.6 GB | 8,192 | ~463.6 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 74%. Drop quant or move to a larger build. |
| Llama 4 405B | 405B | AWQ-INT4 | 637.7 GB | 8,192 | ~637.7 GB needed at AWQ-INT4 + 8,192 ctx — overshoots effective VRAM by 140%. Drop quant or move to a larger build. |
| DeepSeek V3 (671B MoE) | 671B | Q4_K_M | 770.9 GB | 8,192 | ~770.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 190%. Drop quant or move to a larger build. |
| DeepSeek R1 (671B reasoning) | 671B | Q4_K_M | 770.9 GB | 8,192 | ~770.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 190%. Drop quant or move to a larger build. |
| Mistral Medium 3.5 (675B MoE) | 675B | Q4_K_M | 775.4 GB | 8,192 | ~775.4 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 192%. Drop quant or move to a larger build. |
| DeepSeek V4 | 745B | AWQ-INT4 | 1164.7 GB | 8,192 | ~1164.7 GB needed at AWQ-INT4 + 8,192 ctx — overshoots effective VRAM by 338%. Drop quant or move to a larger build. |
| Kimi K2.6 | 1000B | Q4_K_M | 1143.9 GB | 8,192 | ~1143.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 330%. Drop quant or move to a larger build. |
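The "move to a larger build" advice can be made concrete: given a model's VRAM estimate from the table, how many 80 GB cards would it take at the same ~17% tensor-parallel overhead? A rough sketch under those assumptions:

```python
import math

# How many 80 GB cards a given VRAM estimate implies, keeping the
# ~17% tensor-parallel overhead assumption. Illustrative only.

def cards_needed(vram_needed_gb: float, gb_per_card: float = 80.0,
                 tp_overhead: float = 0.17) -> int:
    usable_per_card = gb_per_card * (1.0 - tp_overhead)
    return math.ceil(vram_needed_gb / usable_per_card)

print(cards_needed(276.5))  # 5  -- Qwen 3 235B-A22B just misses the 4-card build
print(cards_needed(770.9))  # 12 -- DeepSeek V3 is a different class of cluster
```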
NVLink vs PCIe, tensor- vs pipeline-parallelism, and honest numbers for mixed-card rigs.
Curated multi-GPU / cluster setups with effective-VRAM math.
OS + runtime install commands for your stack.
Runtime × OS × hardware support truth table.
If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole BOM honestly: AI PC build under $1,000 and AI PC build under $2,000 walk through the realistic 2026 budget tiers.
Vertical-fit shopping? AI PC for students covers the budget + portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG / always-on machine.
Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.