Describe your build — any GPUs, CPU, RAM, OS, runtime, use case. We'll compute effective VRAM honestly, recommend a runtime, and tell you which models fit comfortably, which are borderline, and which aren't practical.
Total VRAM ≠ pooled VRAM. We never sum VRAM unless the silicon truly pools it (Apple unified memory). We always explain why effective VRAM is lower than the total.
Add GPUs, set CPU/RAM/OS, optionally pick a runtime + use case. URL updates as you change fields — share a build by copying the URL.
Mixed-GPU (asymmetric) configuration. Tensor-parallel doesn't work cleanly here: TP assumes identical cards, and every layer synchronizes across GPUs, so your faster card stalls waiting on the slower one. Use llama.cpp's layer-split with manual --tensor-split tuning to distribute layers in proportion to each card's VRAM. Effective capacity is ~33 GB after layer-split overhead, but the slowest card (22 GB effective) bottlenecks any operation that can't span cards.
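As a sketch of that arithmetic, assuming a 24 GB + 12 GB pair and a flat ~8% per-card derate (both numbers are illustrative; the real estimator is more detailed):

```sh
# Illustrative effective-VRAM math: derate each card for runtime overhead
# (driver context, buffers), then sum. The 8% flat derate is an assumption.
awk 'BEGIN {
  big = 24 * 0.92; small = 12 * 0.92
  printf "pool: %.0f GB, bottleneck card: %.0f GB\n", big + small, big
}'
```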
Best engine for this topology + skill level + use case.
Mixed-GPU configurations need llama.cpp's --tensor-split flag, with ratios tuned manually to each card's VRAM. vLLM's tensor-parallel requires identical cards and won't run cleanly here.
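A hypothetical launch (the GGUF filename is a placeholder; the 24,12 ratio assumes a 24 GB + 12 GB pair):

```sh
# Split layers across mismatched cards in proportion to their VRAM.
# Start at the raw VRAM ratio, then nudge until neither card OOMs.
./llama-server -m qwen2.5-coder-14b-instruct-q4_k_m.gguf \
  --n-gpu-layers 99 --tensor-split 24,12 --ctx-size 8192
```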
Ollama inherits llama.cpp's layer-split path with a friendlier UX. OLLAMA_GPU_OVERHEAD and per-GPU environment variables do most of what the manual flags do.
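A hypothetical equivalent in Ollama (OLLAMA_GPU_OVERHEAD reserves VRAM per GPU, in bytes; the 2 GiB margin is an assumed starting point, not a recommendation):

```sh
# Reserve per-GPU headroom so the scheduler stops short of OOM, and pin
# device order so the faster card is enumerated first (assumed order).
export OLLAMA_GPU_OVERHEAD=$((2 * 1024 * 1024 * 1024))
export CUDA_VISIBLE_DEVICES=0,1
ollama serve
```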
183 models considered, categorized by headroom at the recommended quant plus a sensible context for your use case. First, the comfortable fits (an example of spending that headroom follows the table):
| Model | Params | Quant | VRAM est. | Context | Note |
|---|---|---|---|---|---|
| DeepSeek MoE 16B Base | 16B | Q4_K_M | 20 GB | 4,096 | Fits cleanly at Q4_K_M + 4,096 ctx with 39% headroom. |
| StarCoder 2 15B | 15B | Q4_K_M | 27 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom. |
| DeepSeek R1 Distill Qwen 14B | 14B | Q4_K_M | 25.9 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom. |
| Phi-4 Multimodal | 14B | Q4_K_M | 25.9 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom. |
| Qwen 2.5 Coder 14B Instruct | 14B | Q4_K_M | 25.9 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom. |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 25.9 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom. |
| GLM-4V 9B | 14B | Q4_K_M | 25.8 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom. |
| OLMo 2 13B | 13B | Q4_K_M | 17.4 GB | 4,096 | Comfortable fit with 47% headroom — room to extend context or run alongside other workloads. |
| Baichuan 4 13B | 13B | Q4_K_M | 24.7 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 25% headroom. |
| Stable LM 2 12B | 12B | Q4_K_M | 16.5 GB | 4,096 | Comfortable fit with 50% headroom — room to extend context or run alongside other workloads. |
| Llama 3.2 11B Vision Instruct | 11B | Q8_0 | 27.8 GB | 8,192 | Fits cleanly at Q8_0 + 8,192 ctx with 16% headroom. |
| Llama 3.2 11B Vision | 11B | Q4_K_M | 22.5 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 32% headroom. |
| Falcon 3 10B | 10B | Q4_K_M | 21.3 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 35% headroom. |
| Gemma 2 9B Instruct | 9B | Q8_0 | 24.5 GB | 8,192 | Fits cleanly at Q8_0 + 8,192 ctx with 26% headroom. |
| Yi Coder 9B | 9B | Q4_K_M | 20.2 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 39% headroom. |
| Nemotron 3 Nano 9B | 9B | Q4_K_M | 20.2 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 39% headroom. |
| GLM-4 9B | 9B | Q4_K_M | 20.2 GB | 8,192 | Fits cleanly at Q4_K_M + 8,192 ctx with 39% headroom. |
| Tulu 3 8B | 8B | Q4_K_M | 19.1 GB | 8,192 | Comfortable fit with 42% headroom — room to extend context or run alongside other workloads. |
| Molmo 7B-D | 8B | Q4_K_M | 13 GB | 4,096 | Comfortable fit with 61% headroom — room to extend context or run alongside other workloads. |
| Llama 3.1 Nemotron Nano 8B | 8B | Q4_K_M | 19.1 GB | 8,192 | Comfortable fit with 42% headroom — room to extend context or run alongside other workloads. |
| Granite 3.0 8B Instruct | 8B | Q4_K_M | 13 GB | 4,096 | Comfortable fit with 61% headroom — room to extend context or run alongside other workloads. |
| DeepSeek R1 Distill Llama 8B | 8B | Q4_K_M | 19.1 GB | 8,192 | Comfortable fit with 42% headroom — room to extend context or run alongside other workloads. |
| MiniCPM-V 2.6 8B | 8B | Q4_K_M | 19.1 GB | 8,192 | Comfortable fit with 42% headroom — room to extend context or run alongside other workloads. |
| OpenCoder 8B | 8B | Q4_K_M | 19.1 GB | 8,192 | Comfortable fit with 42% headroom — room to extend context or run alongside other workloads. |
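For these comfortable rows the headroom is spendable. One hypothetical use: double the context on a 42%-headroom model, since KV cache grows roughly linearly with context length (filename and target context are placeholders; step up gradually and watch per-card VRAM):

```sh
# Same layer-split launch as above, with headroom spent on longer context.
./llama-server -m tulu-3-8b-q4_k_m.gguf \
  --n-gpu-layers 99 --tensor-split 24,12 --ctx-size 16384
```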
Borderline: these run, but only with a capped context or a lower quant (both fixes are sketched after the table).

| Model | Params | Quant | VRAM est. | Context | Note |
|---|---|---|---|---|---|
| Qwen 2.5 Coder 32B Instruct | 32B | Q4_K_M | 32.4 GB | 8,192 | Tight fit at Q4_K_M — only 2% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| DeepSeek V3 Lite (16B MoE) | 16B | Q4_K_M | 28.1 GB | 8,192 | Tight fit at Q4_K_M — only 15% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| DeepSeek Coder V2 Lite (16B) | 16B | Q4_K_M | 28.1 GB | 8,192 | Tight fit at Q4_K_M — only 15% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Granite 3 MoE (3B active) | 16B | Q4_K_M | 28.1 GB | 8,192 | Tight fit at Q4_K_M — only 15% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Qwen 2.5 14B Instruct | 14B | Q8_0 | 32.6 GB | 8,192 | Tight fit at Q8_0 — only 1% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Phi-4 14B | 14B | Q8_0 | 32.6 GB | 8,192 | Tight fit at Q8_0 — only 1% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Qwen 3 14B | 14B | Q8_0 | 32.6 GB | 8,192 | Tight fit at Q8_0 — only 1% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Pixtral 12B | 12B | Q8_0 | 29.4 GB | 8,192 | Tight fit at Q8_0 — only 11% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Gemma 3 12B | 12B | Q8_0 | 29.4 GB | 8,192 | Tight fit at Q8_0 — only 11% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Mistral Nemo 12B Instruct | 12B | Q8_0 | 29.4 GB | 8,192 | Tight fit at Q8_0 — only 11% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Qwen 3 Embedding 8B | 8B | FP16 | 30.8 GB | 8,192 | Tight fit at FP16 — only 7% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| NV-Embed v2 | 8B | FP16 | 30.4 GB | 8,192 | Tight fit at FP16 — only 8% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
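The two escape hatches those notes keep repeating, sketched with placeholder filenames:

```sh
# Option 1: keep the quant, cap context so the KV cache fits the headroom.
./llama-server -m phi-4-14b-q8_0.gguf \
  --n-gpu-layers 99 --tensor-split 24,12 --ctx-size 4096

# Option 2: drop one quant level (Q8_0 -> Q6_K) and keep the 8,192 context.
./llama-server -m phi-4-14b-q6_k.gguf \
  --n-gpu-layers 99 --tensor-split 24,12 --ctx-size 8192
```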
Not practical on this build at the listed quant and context (the overshoot math is spot-checked after the table).

| Model | Params | Quant | VRAM est. | Context | Note |
|---|---|---|---|---|---|
| PaliGemma 2 10B | 10B | BF16 | 36 GB | 8,192 | ~36.0 GB needed at BF16 + 8,192 ctx — overshoots effective VRAM by 9%. Drop quant or move to a larger build. |
| Codestral 22B | 22B | Q4_K_M | 34.9 GB | 8,192 | ~34.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 6%. Drop quant or move to a larger build. |
| Mistral Small 3 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | ~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| DeepSeek R1 Distill Mistral 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | ~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Dolphin 3.0 Mistral 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | ~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Mistral Medium 3 24B (dense) | 24B | Q4_K_M | 37.2 GB | 8,192 | ~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Mistral Small 3.2 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | ~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Devstral Small 2 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | ~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Mistral Saba 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | ~37.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Gemma 4 26B MoE | 26B | Q4_K_M | 39.5 GB | 8,192 | ~39.5 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build. |
| InternVL 2.5 26B | 26B | Q4_K_M | 39.5 GB | 8,192 | ~39.5 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build. |
| Gemma 3 27B | 27B | Q4_K_M | 40.6 GB | 8,192 | ~40.6 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 23%. Drop quant or move to a larger build. |
| MedGemma 27B | 27B | Q4_K_M | 40.6 GB | 8,192 | ~40.6 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 23%. Drop quant or move to a larger build. |
| Qwen 3 30B-A3B | 30B | Q4_K_M | 44 GB | 8,192 | ~44.0 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 33%. Drop quant or move to a larger build. |
| Nemotron 3 Nano (30B-A3B) | 30B | Q4_K_M | 44 GB | 8,192 | ~44.0 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 33%. Drop quant or move to a larger build. |
| Gemma 4 31B Dense | 31B | Q4_K_M | 45.1 GB | 8,192 | ~45.1 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 37%. Drop quant or move to a larger build. |
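Every headroom and overshoot percentage in the three tables reduces to one formula against the ~33 GB effective pool; a quick spot-check (rounding may differ slightly from the page):

```sh
# headroom% = (effective - needed) / effective; negative means overshoot.
awk -v eff=33 'BEGIN {
  printf "14B at Q4_K_M (25.9 GB): %+.0f%%\n", (eff - 25.9) / eff * 100
  printf "Gemma 3 27B  (40.6 GB): %+.0f%%\n", (eff - 40.6) / eff * 100
}'
# -> +22% headroom and -23% (overshoot), matching the table notes.
```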
NVLink vs PCIe, tensor- vs pipeline-parallel, mixed-card honesty.
Curated multi-GPU / cluster setups with effective-VRAM math.
OS + runtime install commands for your stack.
Runtime × OS × hardware support truth table.
If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole BOM honestly: AI PC build under $1,000 and AI PC build under $2,000 walk through the realistic 2026 budget tiers.
Vertical-fit shopping? AI PC for students covers the budget + portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG / always-on machine.
Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.