Custom build engine — what local AI can your hardware run?

Describe your build

Add GPUs, set CPU, RAM, and OS, and optionally pick a runtime and use case. The URL updates as you change fields, so you can share a build by copying the URL.
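As a rough illustration of how a build could round-trip through a shareable URL, here is a minimal sketch. The base URL and the parameter names (`gpu`, `cpu`, `ram`, `os`) are hypothetical placeholders; the page's actual query format is not documented here.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical base URL and field names, for illustration only.
BASE_URL = "https://example.com/build"

def build_to_url(build: dict) -> str:
    """Serialize build fields into a query string (list values repeat the key)."""
    return f"{BASE_URL}?{urlencode(build, doseq=True)}"

def url_to_build(url: str) -> dict:
    """Recover the build fields from a shared URL; every value comes back as a list of strings."""
    return parse_qs(urlparse(url).query)

build = {"gpu": ["rtx-3090"], "cpu": "desktop", "ram": 64, "os": "linux"}
shared = build_to_url(build)
restored = url_to_build(shared)
```

Keeping all state in the query string means a copied URL is enough to reproduce the build, with no server-side session involved.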

Fields: GPUs in your build, CPU class, System RAM (GB), OS, Use case, Your skill level, Runtime preference.

Build summary

Total VRAM

24 GB

Effective VRAM

~23 GB

range 20-22 GB

Topology

single GPU

none

Setup difficulty

beginner

speed penalty ~0%

Why effective VRAM is lower than total

Single NVIDIA GeForce RTX 3090: 24 GB VRAM minus ~1.5 GB of runtime overhead leaves ~22.5 GB usable for weights, KV cache, and activations. The 8% headroom we reserve covers the typical OS/driver footprint and leaves KV-cache room for an 8K-32K context.
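The explanation above mentions both a fixed ~1.5 GB overhead and an 8% reserve. The sketch below computes usable VRAM each way for a 24 GB card. The function names are illustrative, and this is only one plausible reading of the arithmetic, not the page's actual calculation.

```python
def after_fixed_overhead(total_gb: float, overhead_gb: float = 1.5) -> float:
    """Usable VRAM if the runtime takes a fixed slice (assumed ~1.5 GB)."""
    if not 0 <= overhead_gb <= total_gb:
        raise ValueError("overhead must be between 0 and total VRAM")
    return total_gb - overhead_gb

def after_percent_reserve(total_gb: float, reserve_frac: float = 0.08) -> float:
    """Usable VRAM if a fraction of the card is held back (assumed 8%)."""
    if not 0 <= reserve_frac < 1:
        raise ValueError("reserve fraction must be in [0, 1)")
    return total_gb * (1 - reserve_frac)

total = 24.0
fixed = after_fixed_overhead(total)      # 22.5 GB
percent = after_percent_reserve(total)   # about 22.08 GB
```

Both readings land between 22 and 23 GB, which is why small differences in rounding can move a model between categories.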

Recommended runtime

Best engine for this topology + skill level + use case.

ExLlamaV2

primary

involved

Among the highest single-stream throughput on consumer NVIDIA GPUs. EXL2 mixed-bit quants are a leading consumer-tier inference format.

llama.cpp

alternative

moderate

Cross-platform flexibility: GGUF models run on CPUs, NVIDIA and AMD GPUs, and Apple Silicon, and llama.cpp is the engine that powers Ollama and LM Studio.

Models that fit your build

183 models considered, categorized by headroom at the recommended quant and a sensible context length for your use case.

Comfortable

24 models · ≥15% headroom

Model	Params	Quant	VRAM est.	Context	Note
OLMo 2 13B	13B	Q4_K_M	17.4 GB	4,096	Fits cleanly at Q4_K_M + 4,096 ctx with 24% headroom.
Stable LM 2 12B	12B	Q4_K_M	16.5 GB	4,096	Fits cleanly at Q4_K_M + 4,096 ctx with 28% headroom.
Tulu 3 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Molmo 7B-D	8B	Q4_K_M	13 GB	4,096	Comfortable fit with 44% headroom — room to extend context or run alongside other workloads.
Llama 3.1 Nemotron Nano 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Granite 3.0 8B Instruct	8B	Q4_K_M	13 GB	4,096	Comfortable fit with 44% headroom — room to extend context or run alongside other workloads.
DeepSeek R1 Distill Llama 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
MiniCPM-V 2.6 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
OpenCoder 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Granite 3.2 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
InternLM 3 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Granite 3.3 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Ministral 8B Instruct	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
MiniCPM-V 3 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Aya 23 8B	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
Llama 3.3 8B Instruct	8B	Q4_K_M	19.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 17% headroom.
EXAONE 3.5 8B	8B	Q4_K_M	18.8 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
CodeGemma 7B	7B	Q4_K_M	17.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom.
Falcon Mamba 7B	7B	Q4_K_M	17.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom.
Mistral 7B Instruct v0.3	7B	Q5_K_M	18.5 GB	8,192	Fits cleanly at Q5_K_M + 8,192 ctx with 19% headroom.
Qwen 2.5 7B Instruct	7B	Q8_0	18.3 GB	8,192	Fits cleanly at Q8_0 + 8,192 ctx with 21% headroom.
Qwen 2.5-VL 7B	7B	Q4_K_M	17.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom.
Codestral Mamba 7B	7B	Q4_K_M	17.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom.
Qwen 3 7B	7B	Q4_K_M	17.9 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom.
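
The headroom figures in this table can be approximated as the unused share of effective VRAM. The sketch below assumes an effective VRAM of 23 GB (the rounded figure in the build summary) and the 15% comfortable threshold. The cutoff between the other two categories is inferred from the tables on this page, so results may differ from the page by a percentage point.

```python
def headroom_pct(vram_est_gb: float, effective_gb: float = 23.0) -> float:
    """Share of effective VRAM left unused, as a percentage (negative if oversize)."""
    return (1 - vram_est_gb / effective_gb) * 100

def classify(vram_est_gb: float, effective_gb: float = 23.0,
             comfortable_min_pct: float = 15.0) -> str:
    """Bucket a model by headroom; thresholds are assumptions inferred from the page."""
    h = headroom_pct(vram_est_gb, effective_gb)
    if h < 0:
        return "not practical"
    if h < comfortable_min_pct:
        return "borderline"
    return "comfortable"

for est in (17.4, 19.1, 20.2, 23.6):
    print(f"{est} GB -> {headroom_pct(est):.0f}% headroom, {classify(est)}")
```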

Borderline

13 models · tight, may need quant downgrade

Model	Params	Quant	VRAM est.	Context	Note
DeepSeek MoE 16B Base	16B	Q4_K_M	20 GB	4,096	Tight fit at Q4_K_M — only 13% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Llama 3.2 11B Vision Instruct	11B	Q4_K_M	22.5 GB	8,192	Tight fit at Q4_K_M — only 2% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Llama 3.2 11B Vision	11B	Q4_K_M	22.5 GB	8,192	Tight fit at Q4_K_M — only 2% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Falcon 3 10B	10B	Q4_K_M	21.3 GB	8,192	Tight fit at Q4_K_M — only 7% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Gemma 2 9B Instruct	9B	Q4_K_M	20.2 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Yi Coder 9B	9B	Q4_K_M	20.2 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Nemotron 3 Nano 9B	9B	Q4_K_M	20.2 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
GLM-4 9B	9B	Q4_K_M	20.2 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Hermes 3 Llama 3.1 8B	8B	Q8_0	22.9 GB	8,192	Tight fit at Q8_0 — only 0% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 3 8B	8B	Q8_0	22.9 GB	8,192	Tight fit at Q8_0 — only 0% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Llama 3.1 8B Instruct	8B	Q8_0	20 GB	8,192	Tight fit at Q8_0 — only 13% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
DeepSeek R1 Distill Qwen 7B	7B	Q8_0	21.3 GB	8,192	Tight fit at Q8_0 — only 7% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 2.5 Coder 7B Instruct	7B	Q6_K	19.6 GB	8,192	Tight fit at Q6_K with just under 15% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
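
The notes above warn that a longer context can exhaust memory because the KV cache grows linearly with context length. The sketch below uses the standard sizing formula for a transformer with grouped-query attention. The example dimensions (32 layers, 8 KV heads, head dimension 128, FP16 cache) are assumptions chosen to resemble a typical 8B model. This does not reproduce the page's VRAM estimates, which appear to include other overheads.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V tensor per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

GIB = 1024 ** 3

# Assumed example architecture: 32 layers, 8 KV heads, head_dim 128, FP16 cache.
for ctx in (8192, 16384, 32768):
    size = kv_cache_bytes(32, 8, 128, ctx)
    print(f"{ctx:>6} tokens -> {size / GIB:.1f} GiB")
```

Under these assumptions, going from 8K to 32K context adds about 3 GiB, which is more than the spare headroom of most rows in the borderline table.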

Not practical

16 models · oversize for this build

Model	Params	Quant	VRAM est.	Context	Note
NV-Embed v2	8B	FP16	30.4 GB	8,192	~30.4 GB needed at FP16 + 8,192 ctx — overshoots effective VRAM by 32%. Drop quant or move to a larger build.
Qwen 3 Embedding 8B	8B	FP16	30.8 GB	8,192	~30.8 GB needed at FP16 + 8,192 ctx — overshoots effective VRAM by 34%. Drop quant or move to a larger build.
PaliGemma 2 10B	10B	BF16	36 GB	8,192	~36.0 GB needed at BF16 + 8,192 ctx — overshoots effective VRAM by 56%. Drop quant or move to a larger build.
Pixtral 12B	12B	Q4_K_M	23.6 GB	8,192	~23.6 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 3%. Drop quant or move to a larger build.
Gemma 3 12B	12B	Q4_K_M	23.6 GB	8,192	~23.6 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 3%. Drop quant or move to a larger build.
Mistral Nemo 12B Instruct	12B	Q4_K_M	23.6 GB	8,192	~23.6 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 3%. Drop quant or move to a larger build.
Baichuan 4 13B	13B	Q4_K_M	24.7 GB	8,192	~24.7 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 8%. Drop quant or move to a larger build.
GLM-4V 9B	14B	Q4_K_M	25.8 GB	8,192	~25.8 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 12%. Drop quant or move to a larger build.
DeepSeek R1 Distill Qwen 14B	14B	Q4_K_M	25.9 GB	8,192	~25.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 12%. Drop quant or move to a larger build.
Qwen 2.5 14B Instruct	14B	Q4_K_M	25.9 GB	8,192	~25.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 12%. Drop quant or move to a larger build.
Phi-4 Multimodal	14B	Q4_K_M	25.9 GB	8,192	~25.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 12%. Drop quant or move to a larger build.
Qwen 2.5 Coder 14B Instruct	14B	Q4_K_M	25.9 GB	8,192	~25.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 12%. Drop quant or move to a larger build.
Phi-4 14B	14B	Q4_K_M	25.9 GB	8,192	~25.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 12%. Drop quant or move to a larger build.
Phi-4 Reasoning 14B	14B	Q4_K_M	25.9 GB	8,192	~25.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 12%. Drop quant or move to a larger build.
Qwen 3 14B	14B	Q4_K_M	25.9 GB	8,192	~25.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 12%. Drop quant or move to a larger build.
StarCoder 2 15B	15B	Q4_K_M	27 GB	8,192	~27.0 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 17%. Drop quant or move to a larger build.
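
To make the "overshoots effective VRAM" and "drop quant" notes concrete, the sketch below computes the overshoot percentage against an assumed 23 GB effective VRAM and a nominal weights-only footprint at different bit widths. The nominal widths (16, 8, 4 bits) ignore quantization block overhead, KV cache, and activations, so treat the results as lower bounds rather than the page's estimates.

```python
def overshoot_pct(vram_est_gb: float, effective_gb: float = 23.0) -> float:
    """How far an estimate exceeds effective VRAM, as a percentage (0 if it fits)."""
    return max(0.0, (vram_est_gb / effective_gb - 1) * 100)

def nominal_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB at a nominal bit width (no overheads)."""
    return params_billion * bits_per_weight / 8

print(f"30.4 GB estimate overshoots by {overshoot_pct(30.4):.0f}%")
for bits in (16, 8, 4):
    print(f"8B params at {bits}-bit: {nominal_weight_gb(8, bits):.0f} GB of weights")
```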

Shopping a full build instead of a single card?

If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole bill of materials (BOM): AI PC build under $1,000 and AI PC build under $2,000 address the realistic 2026 budget tiers.

Vertical-fit shopping? AI PC for students covers the budget and portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG, always-on machine.

Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.
