RUNLOCALAI · v38

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.

OP · Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other retail affiliate programs). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we have no affiliate relationship with, and cards that sell well that we refuse to recommend. Read more →


Custom build engine

Describe your build — any GPUs, CPU, RAM, OS, runtime, use case. We'll compute effective VRAM honestly, recommend a runtime, and tell you which models fit comfortably, which are borderline, and which aren't practical.

Total VRAM ≠ pooled VRAM. We never sum VRAM unless the silicon truly pools (Apple unified memory). We always explain why effective is lower than total.

Describe your build

Add GPUs, set CPU/RAM/OS, optionally pick a runtime + use case. URL updates as you change fields — share a build by copying the URL.

Build summary

  • Total VRAM: 16 GB
  • Effective VRAM: ~15 GB (range 13-14 GB)
  • Topology: single GPU (interconnect: none)
  • Setup difficulty: beginner (speed penalty ~0%)
Why effective VRAM is lower than total

Single NVIDIA GeForce RTX 3080 16GB (Mobile) — 16 GB VRAM minus ~1.5 GB runtime overhead ≈ 14.5 GB usable for weights + KV cache + activations. The ~8% headroom we reserve covers the typical OS/driver footprint and leaves KV-cache room for an 8K-32K context.
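The subtraction above can be sketched as a tiny helper. The ~1.5 GB overhead default comes from the example in this section; real overhead varies by OS, driver, and runtime, so treat it as an illustrative assumption rather than a measured constant:

```python
def effective_vram_gb(total_gb: float, overhead_gb: float = 1.5) -> float:
    """Usable VRAM for weights + KV cache + activations on a single GPU.

    overhead_gb models the OS/driver + runtime footprint described above;
    ~1.5 GB is an illustrative default, not a universal figure.
    """
    return total_gb - overhead_gb

# 16 GB card minus ~1.5 GB overhead:
print(effective_vram_gb(16.0))  # 14.5
```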

Recommended runtime

Best engine for this topology, skill level, and use case.

ExLlamaV2 · primary · setup: involved
Highest single-stream throughput on consumer NVIDIA. EXL2 mixed-bit quants are the leading consumer-tier inference format.

llama.cpp · alternative · setup: moderate
Cross-format flexibility — GGUF works everywhere; it's the engine that powers Ollama and LM Studio.

Models that fit your build

183 models considered. Categorized by headroom at the recommended quant + a sensible context for your use case.
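The three buckets below can be sketched as a simple headroom check. The ≥15% cutoff matches the "Comfortable" label in the tables; the function itself is an illustrative sketch under that assumption, not the site's actual engine:

```python
def categorize(vram_needed_gb: float, effective_vram_gb: float) -> str:
    """Bucket a model by headroom against effective VRAM (illustrative)."""
    headroom = (effective_vram_gb - vram_needed_gb) / effective_vram_gb
    if headroom >= 0.15:
        return "comfortable"    # room to extend context or run other workloads
    if headroom > 0.0:
        return "borderline"     # fits, but longer context risks OOM
    return "not practical"      # overshoots effective VRAM

# With ~15 GB effective VRAM:
print(categorize(12.1, 15.0))   # comfortable (~19% headroom)
print(categorize(14.9, 15.0))   # borderline (~1% headroom)
print(categorize(17.9, 15.0))   # not practical
```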

Comfortable
19 models · ≥15% headroom
Model · Params · Quant · VRAM est. · Context · Note
Qwen 2.5 Math 7B · 7B · Q4_K_M · 12.1 GB · 4,096 · Fits cleanly at Q4_K_M + 4,096 ctx with 19% headroom.
Janus-Pro 7B · 7B · Q4_K_M · 12.1 GB · 4,096 · Fits cleanly at Q4_K_M + 4,096 ctx with 19% headroom.
Nemotron Mini 4B Instruct · 4B · Q4_K_M · 9.4 GB · 4,096 · Fits cleanly at Q4_K_M + 4,096 ctx with 37% headroom.
EXAONE 3.5 2.4B · 2B · Q4_K_M · 12.7 GB · 8,192 · Fits cleanly at Q4_K_M + 8,192 ctx with 15% headroom.
Granite 3.0 2B Instruct · 2B · Q4_K_M · 7.7 GB · 4,096 · Comfortable fit with 49% headroom — room to extend context or run alongside other workloads.
Moondream 2 · 2B · Q4_K_M · 5.3 GB · 2,048 · Comfortable fit with 65% headroom — room to extend context or run alongside other workloads.
SmolLM 2 1.7B Instruct · 2B · Q4_K_M · 11.9 GB · 8,192 · Fits cleanly at Q4_K_M + 8,192 ctx with 21% headroom.
Whisper Large v3 · 2B · FP16 · 5.1 GB · 0 · Comfortable fit with 66% headroom — room to extend context or run alongside other workloads.
DeepSeek R1 Distill Qwen 1.5B · 2B · Q4_K_M · 11.7 GB · 8,192 · Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom.
Qwen 2.5 Coder 1.5B · 2B · Q4_K_M · 11.7 GB · 8,192 · Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom.
RWKV 7 'Goose' 1.5B · 2B · Q5_K_M · 11.8 GB · 8,192 · Fits cleanly at Q5_K_M + 8,192 ctx with 21% headroom.
Qwen 2.5 1.5B Instruct · 2B · Q4_K_M · 11.7 GB · 8,192 · Fits cleanly at Q4_K_M + 8,192 ctx with 22% headroom.
Llama 3.2 1B Instruct · 1B · Q8_0 · 11.6 GB · 8,192 · Fits cleanly at Q8_0 + 8,192 ctx with 23% headroom.
Gemma 3 1B · 1B · Q4_K_M · 11.1 GB · 8,192 · Fits cleanly at Q4_K_M + 8,192 ctx with 26% headroom.
Whisper Large v3 Turbo · 1B · FP16 · 3.5 GB · 0 · Comfortable fit with 77% headroom — room to extend context or run alongside other workloads.
BGE M3 · 1B · FP16 · 11.5 GB · 8,192 · Fits cleanly at FP16 + 8,192 ctx with 24% headroom.
BGE Reranker v2 M3 · 1B · FP16 · 11.5 GB · 8,192 · Fits cleanly at FP16 + 8,192 ctx with 24% headroom.
Qwen 2.5 0.5B Instruct · 1B · Q4_K_M · 10.6 GB · 8,192 · Fits cleanly at Q4_K_M + 8,192 ctx with 30% headroom.
SmolLM 2 360M Instruct · 0B · Q4_K_M · 10.4 GB · 8,192 · Fits cleanly at Q4_K_M + 8,192 ctx with 31% headroom.
Borderline
16 models · tight, may need quant downgrade
Model · Params · Quant · VRAM est. · Context · Note
Molmo 7B-D · 8B · Q4_K_M · 13 GB · 4,096 · Tight fit at Q4_K_M — only 14% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Granite 3.0 8B Instruct · 8B · Q4_K_M · 13 GB · 4,096 · Tight fit at Q4_K_M — only 14% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 2.5 7B Instruct · 7B · Q4_K_M · 14.9 GB · 8,192 · Tight fit at Q4_K_M — only 1% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Phi-3.5 Vision · 4B · Q4_K_M · 14.8 GB · 8,192 · Tight fit at Q4_K_M — only 2% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 3 4B · 4B · Q4_K_M · 14.5 GB · 8,192 · Tight fit at Q4_K_M — only 3% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Gemma 4 E4B (Effective 4B) · 4B · Q4_K_M · 14.5 GB · 8,192 · Tight fit at Q4_K_M — only 3% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Gemma 3 4B · 4B · Q4_K_M · 14.5 GB · 8,192 · Tight fit at Q4_K_M — only 3% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
MiniCPM 3 4B · 4B · Q4_K_M · 14.5 GB · 8,192 · Tight fit at Q4_K_M — only 3% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Phi-3.5 Mini Instruct · 4B · Q4_K_M · 14.3 GB · 8,192 · Tight fit at Q4_K_M — only 5% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Phi-4 Mini 4B · 4B · Q4_K_M · 14.3 GB · 8,192 · Tight fit at Q4_K_M — only 5% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Phi-4 Reasoning Mini 4B · 4B · Q4_K_M · 14.3 GB · 8,192 · Tight fit at Q4_K_M — only 5% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Llama 3.2 3B Instruct · 3B · Q8_0 · 14.8 GB · 8,192 · Tight fit at Q8_0 — only 1% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 2.5 Coder 3B · 3B · Q4_K_M · 13.4 GB · 8,192 · Tight fit at Q4_K_M — only 11% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Dolphin 3.0 Llama 3.2 3B · 3B · Q4_K_M · 13.4 GB · 8,192 · Tight fit at Q4_K_M — only 11% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Hermes 3 Llama 3.2 3B · 3B · Q4_K_M · 13.4 GB · 8,192 · Tight fit at Q4_K_M — only 11% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Ministral 3B Instruct · 3B · Q4_K_M · 13.4 GB · 8,192 · Tight fit at Q4_K_M — only 11% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Not practical
16 models · oversize for this build
Model · Params · Quant · VRAM est. · Context · Note
PaliGemma 2 3B · 3B · BF16 · 17.8 GB · 8,192 · ~17.8 GB needed at BF16 + 8,192 ctx — overshoots effective VRAM by 19%. Drop quant or move to a larger build.
CodeGemma 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Falcon Mamba 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
DeepSeek R1 Distill Qwen 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Mistral 7B Instruct v0.3 · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Qwen 2.5 Coder 7B Instruct · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Qwen 2.5-VL 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Codestral Mamba 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Qwen 3 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Qwen 2-VL 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
StarCoder 2 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
CodeQwen 1.5 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
LLaVA 1.6 Mistral 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
LLaVA-OneVision 7B · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
Falcon 3 7B Instruct · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
InternLM 2.5 7B Chat · 7B · Q4_K_M · 17.9 GB · 8,192 · ~17.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 20%. Drop quant or move to a larger build.
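For intuition about where "VRAM est." figures like these come from, a common back-of-envelope estimate is quantized weight bytes plus an FP16 KV cache. This sketch is a lower bound (it ignores activation buffers and runtime scratch, which is why real estimates run higher), and every architecture number in the example call is a hypothetical assumption, not a real model card:

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     n_layers: int, n_kv_heads: int, head_dim: int,
                     ctx: int, kv_cache_bytes: int = 2) -> float:
    """Rough VRAM lower bound in GB: quantized weights + FP16 KV cache."""
    weights = params_b * 1e9 * bits_per_weight / 8                    # weight bytes
    kv = 2 * n_layers * n_kv_heads * head_dim * ctx * kv_cache_bytes  # K and V caches
    return (weights + kv) / 1e9

# Hypothetical 7B model: ~4.8 bits/weight (Q4_K_M-ish), 32 layers,
# 8 KV heads (GQA), head_dim 128, 8,192 context, 2-byte cache entries:
print(round(estimate_vram_gb(7, 4.8, 32, 8, 128, 8192), 1))  # 5.3
```

Activation memory, fragmentation, and safety margins are what push an engine's published estimate well above this floor.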

Related

  • Multi-GPU buying guide → NVLink vs PCIe, tensor- vs pipeline-parallel, mixed-card honesty.
  • Hardware combinations → Curated multi-GPU / cluster setups with effective-VRAM math.
  • Setup path-finder → OS + runtime install commands for your stack.
  • Compatibility matrix → Runtime × OS × hardware support truth table.

Shopping a full build instead of a single card?

If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole BOM honestly: AI PC build under $1,000 and AI PC build under $2,000 walk through the realistic 2026 budget tiers.

Vertical-fit shopping? AI PC for students covers the budget + portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG / always-on machine.

Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.

See something off? Submit a benchmark · Report outdated · Suggest a correction. We read every submission; editorial review takes 1-7 days.