Custom build engine — what local AI can your hardware run?

Describe your build

Add GPUs, set CPU, RAM, and OS, and optionally pick a runtime and use case. The URL updates as you change fields, so you can share a build by copying it.

GPUs in your build

CPU class

System RAM (GB)

OS

Use case

Your skill level

Runtime preference
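One way to picture the shareable URL is as the form fields serialized into a query string. The sketch below uses Python's standard urllib.parse for that round trip. The field names (gpu, ram_gb, os) and the sample values are hypothetical; the page does not document its actual URL scheme.

```python
from urllib.parse import urlencode, parse_qs


def build_to_query(build: dict) -> str:
    """Serialize build fields into a query string; list values become repeated keys."""
    return urlencode(build, doseq=True)


def query_to_build(query: str) -> dict:
    """Parse a query string back into fields; every value comes back as a list of strings."""
    return parse_qs(query)


# Hypothetical field names and values, for illustration only.
build = {
    "gpu": ["rtx-5090"],
    "ram_gb": 64,
    "os": "linux",
}

query = build_to_query(build)
restored = query_to_build(query)
```

Because parse_qs returns strings, a real form would still need to convert numeric fields such as RAM back to numbers after loading a shared URL.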

Build summary

Total VRAM

32 GB

Effective VRAM

~31 GB

range 28-30 GB

Topology

single gpu

none

Setup difficulty

beginner

speed penalty ~0%

Why effective VRAM is lower than total

Single NVIDIA GeForce RTX 5090: 32 GB VRAM minus ~1.5 GB runtime overhead leaves ~30.5 GB usable for weights, KV cache, and activations. The 8% headroom we reserve covers the typical OS/driver footprint and gives KV-cache room for an 8K-32K context.
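The explanation above mentions two adjustments: a fixed runtime overhead of about 1.5 GB and an 8% reserve. As a minimal sketch, each can be computed separately. The function names are illustrative, and the page does not say how the tool combines the two figures, so they are shown side by side rather than merged.

```python
def usable_after_overhead(total_vram_gb: float, overhead_gb: float) -> float:
    """Usable VRAM after subtracting a fixed runtime overhead."""
    return total_vram_gb - overhead_gb


def usable_after_reserve(total_vram_gb: float, reserve_fraction: float) -> float:
    """Usable VRAM after holding back a fraction of the card as headroom."""
    return total_vram_gb * (1.0 - reserve_fraction)


total = 32.0  # RTX 5090 VRAM in GB, from the build summary

by_overhead = usable_after_overhead(total, 1.5)  # fixed-overhead view
by_reserve = usable_after_reserve(total, 0.08)   # percentage-reserve view
```

The fixed-overhead view gives 30.5 GB and the percentage view gives about 29.4 GB.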

Recommended runtime

Best engine for this topology + skill level + use case.

ExLlamaV2

primary

involved

Highest single-stream throughput on consumer NVIDIA. EXL2 mixed-bit quants are the leading consumer-tier inference format.

llama.cpp

alternative

moderate

Cross-format flexibility — GGUF works everywhere; the engine that powers Ollama and LM Studio.

Models that fit your build

183 models considered. Categorized by headroom at the recommended quant + a sensible context for your use case.

Comfortable

24 models · ≥15% headroom

Model	Params	Quant	VRAM est.	Context	Note
DeepSeek MoE 16B Base	16B	Q4_K_M	20 GB	4,096	Fits cleanly at Q4_K_M + 4,096 ctx with 35% headroom.
DeepSeek R1 Distill Qwen 14B	14B	Q4_K_M	25.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Phi-4 Multimodal	14B	Q4_K_M	25.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Qwen 2.5 Coder 14B Instruct	14B	Q4_K_M	25.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Phi-4 14B	14B	Q4_K_M	25.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Phi-4 Reasoning 14B	14B	Q4_K_M	25.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Qwen 3 14B	14B	Q4_K_M	25.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
GLM-4V 9B	14B	Q4_K_M	25.8 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
OLMo 2 13B	13B	Q4_K_M	17.4 GB	4,096	Comfortable fit with 44% headroom — room to extend context or run alongside other workloads.
Baichuan 4 13B	13B	Q4_K_M	24.7 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 20% headroom.
Stable LM 2 12B	12B	Q4_K_M	16.5 GB	4,096	Comfortable fit with 47% headroom — room to extend context or run alongside other workloads.
Llama 3.2 11B Vision	11B	Q4_K_M	22.5 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 28% headroom.
Falcon 3 10B	10B	Q4_K_M	21.3 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 31% headroom.
Gemma 2 9B Instruct	9B	Q8_0	24.5 GB	8,192	Fits cleanly at Q8_0 + 8,192 ctx with 21% headroom.
Yi Coder 9B	9B	Q4_K_M	20.2 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 35% headroom.
Nemotron 3 Nano 9B	9B	Q4_K_M	20.2 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 35% headroom.
GLM-4 9B	9B	Q4_K_M	20.2 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 35% headroom.
Tulu 3 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 39% headroom.
Molmo 7B-D	8B	Q4_K_M	13 GB	4,096	Comfortable fit with 58% headroom — room to extend context or run alongside other workloads.
Llama 3.1 Nemotron Nano 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 39% headroom.
Granite 3.0 8B Instruct	8B	Q4_K_M	13 GB	4,096	Comfortable fit with 58% headroom — room to extend context or run alongside other workloads.
DeepSeek R1 Distill Llama 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 39% headroom.
MiniCPM-V 2.6 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 39% headroom.
OpenCoder 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 39% headroom.

Borderline

12 models · tight, may need quant downgrade

Model	Params	Quant	VRAM est.	Context	Note
DeepSeek V3 Lite (16B MoE)	16B	Q4_K_M	28.1 GB	8,192	Tight fit at Q4_K_M — only 9% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
DeepSeek Coder V2 Lite (16B)	16B	Q4_K_M	28.1 GB	8,192	Tight fit at Q4_K_M — only 9% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Granite 3 MoE (3B active)	16B	Q4_K_M	28.1 GB	8,192	Tight fit at Q4_K_M — only 9% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
StarCoder 2 15B	15B	Q4_K_M	27 GB	8,192	Tight fit at Q4_K_M — only 13% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 2.5 14B Instruct	14B	Q5_K_M	27.1 GB	8,192	Tight fit at Q5_K_M — only 13% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Pixtral 12B	12B	Q8_0	29.4 GB	8,192	Tight fit at Q8_0 — only 5% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Gemma 3 12B	12B	Q8_0	29.4 GB	8,192	Tight fit at Q8_0 — only 5% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Mistral Nemo 12B Instruct	12B	Q8_0	29.4 GB	8,192	Tight fit at Q8_0 — only 5% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Llama 3.2 11B Vision Instruct	11B	Q8_0	27.8 GB	8,192	Tight fit at Q8_0 — only 10% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 3 Embedding 8B	8B	FP16	30.8 GB	8,192	Tight fit at FP16 — only 1% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Llama 3.1 8B Instruct	8B	FP16	27.9 GB	8,192	Tight fit at FP16 — only 10% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
NV-Embed v2	8B	FP16	30.4 GB	8,192	Tight fit at FP16 — only 2% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.

Not practical

16 models · oversize for this build

Model	Params	Quant	VRAM est.	Context	Note
PaliGemma 2 10B	10B	BF16	36 GB	8,192	~36.0 GB needed at BF16 + 8,192 ctx — overshoots effective VRAM by 16%. Drop quant or move to a larger build.
Codestral 22B	22B	Q4_K_M	34.9 GB	8,192	~34.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 13%. Drop quant or move to a larger build.
Mistral Small 3 24B	24B	Q4_K_M	37.2 GB	8,192	~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
DeepSeek R1 Distill Mistral 24B	24B	Q4_K_M	37.2 GB	8,192	~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Dolphin 3.0 Mistral 24B	24B	Q4_K_M	37.2 GB	8,192	~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Mistral Medium 3 24B (dense)	24B	Q4_K_M	37.2 GB	8,192	~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Mistral Small 3.2 24B	24B	Q4_K_M	37.2 GB	8,192	~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Devstral Small 2 24B	24B	Q4_K_M	37.2 GB	8,192	~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Mistral Saba 24B	24B	Q4_K_M	37.2 GB	8,192	~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Gemma 4 26B MoE	26B	Q4_K_M	39.5 GB	8,192	~39.5 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 27%. Drop quant or move to a larger build.
InternVL 2.5 26B	26B	Q4_K_M	39.5 GB	8,192	~39.5 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 27%. Drop quant or move to a larger build.
Gemma 3 27B	27B	Q4_K_M	40.6 GB	8,192	~40.6 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 31%. Drop quant or move to a larger build.
MedGemma 27B	27B	Q4_K_M	40.6 GB	8,192	~40.6 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 31%. Drop quant or move to a larger build.
Qwen 3 30B-A3B	30B	Q4_K_M	44 GB	8,192	~44.0 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 42%. Drop quant or move to a larger build.
Nemotron 3 Nano (30B-A3B)	30B	Q4_K_M	44 GB	8,192	~44.0 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 42%. Drop quant or move to a larger build.
Gemma 4 31B Dense	31B	Q4_K_M	45.1 GB	8,192	~45.1 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 46%. Drop quant or move to a larger build.
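The three categories above can be approximated from a model's VRAM estimate and the build's effective VRAM. The sketch below is an assumption about the logic, not the tool's actual code. The 15% cutoff comes from the "Comfortable" heading, and treating any non-negative headroom under 15% as "borderline" is a guess. The 31 GB effective figure is the rounded value from the build summary, so percentages may differ by a point from the table notes.

```python
def headroom_pct(vram_est_gb: float, effective_vram_gb: float) -> float:
    """Spare VRAM as a percentage of effective VRAM; negative means the model overshoots."""
    return (effective_vram_gb - vram_est_gb) / effective_vram_gb * 100.0


def classify(vram_est_gb: float, effective_vram_gb: float,
             comfortable_min_pct: float = 15.0) -> str:
    """Bucket a model by headroom (thresholds are assumptions, see lead-in)."""
    h = headroom_pct(vram_est_gb, effective_vram_gb)
    if h >= comfortable_min_pct:
        return "comfortable"
    if h >= 0.0:
        return "borderline"
    return "not practical"


EFFECTIVE_GB = 31.0  # rounded effective VRAM from the build summary

# VRAM estimates taken from the tables above.
examples = {
    "DeepSeek MoE 16B Base": 20.0,
    "DeepSeek V3 Lite (16B MoE)": 28.1,
    "PaliGemma 2 10B": 36.0,
}
results = {name: classify(gb, EFFECTIVE_GB) for name, gb in examples.items()}
```

With these inputs the three example models land in the same buckets the tables assign them.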

Shopping a full build instead of a single card?

If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole bill of materials: AI PC build under $1,000 and AI PC build under $2,000 address the realistic 2026 budget tiers.

Vertical-fit shopping? AI PC for students covers the budget + portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG / always-on machine.

Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.
