Custom build engine — what local AI can your hardware run?

Describe your build

Add GPUs, set CPU/RAM/OS, and optionally pick a runtime and use case. The URL updates as you change fields, so you can share a build by copying it.

Fields: GPUs in your build, CPU class, System RAM (GB), OS, Use case, Your skill level, Runtime preference

Build summary

Total VRAM: 24 GB

Effective VRAM: ~22 GB (range 20-22 GB)

Topology: single GPU (none)

Setup difficulty: beginner (speed penalty ~0%)

Why effective VRAM is lower than total

Single AMD Radeon RX 7900 XTX: 24 GB VRAM minus ~1.8 GB runtime overhead = ~22 GB usable for weights, KV cache, and activations. The ~8% we reserve covers the typical OS/driver footprint and leaves KV-cache room for an 8K-32K context.
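The subtraction above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name `effective_vram_gb` and the flat-overhead model are assumptions, while the 24 GB and ~1.8 GB figures come from the build summary.

```python
def effective_vram_gb(total_gb: float, overhead_gb: float) -> float:
    """Usable VRAM after subtracting a flat runtime/OS/driver overhead.

    Hypothetical helper: a flat overhead is an assumption made for
    illustration, not the calculator's documented model.
    """
    if overhead_gb < 0 or overhead_gb > total_gb:
        raise ValueError("overhead must be between 0 and total VRAM")
    return total_gb - overhead_gb


total = 24.0     # RX 7900 XTX VRAM, from the build summary
overhead = 1.8   # approximate runtime overhead, from the build summary

usable = effective_vram_gb(total, overhead)
reserved_pct = overhead / total * 100

print(f"usable ~{usable:.1f} GB, reserved ~{reserved_pct:.1f}%")
```

Under these inputs the reserved share works out to 7.5%, which matches the roughly 8% quoted above after rounding.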

Recommended runtime

Best engine for this topology + skill level + use case.

llama.cpp (HIPBLAS)

primary

moderate

The most reliable AMD inference path in 2026. The GGUF format works on every AMD card, and on RDNA3 the HIPBLAS backend comes within ~20% of the speed of llama.cpp's CUDA backend.
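As a rough sketch of what launching this runtime might look like, the snippet below assembles a `llama-server` command line. It assumes a llama.cpp build with the HIP/ROCm backend already on your PATH; the model path is a hypothetical placeholder, and the helper name `llama_server_cmd` is invented for illustration. The flags used (`-m` for the model file, `-c` for context size, `-ngl` for the number of layers offloaded to the GPU) are standard llama.cpp options.

```python
import shlex
from typing import List


def llama_server_cmd(model_path: str, ctx_size: int = 8192,
                     gpu_layers: int = 99) -> List[str]:
    """Build an argv list for llama.cpp's llama-server.

    gpu_layers=99 is a common way to request "offload everything";
    llama.cpp offloads at most as many layers as the model has.
    """
    return [
        "llama-server",
        "-m", model_path,         # path to a GGUF model file
        "-c", str(ctx_size),      # context window in tokens
        "-ngl", str(gpu_layers),  # layers to offload to the GPU
    ]


# Hypothetical model path; substitute a GGUF file you have downloaded.
cmd = llama_server_cmd("models/example-7b.Q4_K_M.gguf")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` would start the server; printing it first makes it easy to check the context size against the table estimates before committing VRAM.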

Models that fit your build

183 models considered. Each is categorized by VRAM headroom at the recommended quant and a sensible context length for your use case.

Comfortable

24 models · ≥15% headroom

Model	Params	Quant	VRAM est.	Context	Note
OLMo 2 13B	13B	Q4_K_M	17.6 GB	4,096	Fits cleanly at Q4_K_M + 4,096 ctx with 20% headroom.
Stable LM 2 12B	12B	Q4_K_M	16.7 GB	4,096	Fits cleanly at Q4_K_M + 4,096 ctx with 24% headroom.
Molmo 7B-D	8B	Q4_K_M	13.2 GB	4,096	Comfortable fit with 40% headroom — room to extend context or run alongside other workloads.
Granite 3.0 8B Instruct	8B	Q4_K_M	13.2 GB	4,096	Comfortable fit with 40% headroom — room to extend context or run alongside other workloads.
CodeGemma 7B	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
Falcon Mamba 7B	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
Qwen 2.5 7B Instruct	7B	Q8_0	18.5 GB	8,192	Fits cleanly at Q8_0 + 8,192 ctx with 16% headroom.
Qwen 2.5-VL 7B	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
Codestral Mamba 7B	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
Qwen 3 7B	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
Qwen 2.5 Math 7B	7B	Q4_K_M	12.3 GB	4,096	Comfortable fit with 44% headroom — room to extend context or run alongside other workloads.
Janus-Pro 7B	7B	Q4_K_M	12.3 GB	4,096	Comfortable fit with 44% headroom — room to extend context or run alongside other workloads.
Qwen 2-VL 7B	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
StarCoder 2 7B	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
CodeQwen 1.5 7B	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
LLaVA 1.6 Mistral 7B	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
LLaVA-OneVision 7B	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
Falcon 3 7B Instruct	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
InternLM 2.5 7B Chat	7B	Q4_K_M	18.1 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 18% headroom.
Phi-3.5 Vision	4B	Q4_K_M	15 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 32% headroom.
Qwen 3 4B	4B	Q8_0	16.7 GB	8,192	Fits cleanly at Q8_0 + 8,192 ctx with 24% headroom.
Gemma 4 E4B (Effective 4B)	4B	Q8_0	16.7 GB	8,192	Fits cleanly at Q8_0 + 8,192 ctx with 24% headroom.
Gemma 3 4B	4B	Q8_0	16.7 GB	8,192	Fits cleanly at Q8_0 + 8,192 ctx with 24% headroom.
MiniCPM 3 4B	4B	Q4_K_M	14.7 GB	8,192	Fits cleanly at Q4_K_M + 8,192 ctx with 33% headroom.

Borderline

16 models · tight, may need quant downgrade

Model	Params	Quant	VRAM est.	Context	Note
DeepSeek MoE 16B Base	16B	Q4_K_M	20.2 GB	4,096	Tight fit at Q4_K_M — only 8% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Falcon 3 10B	10B	Q4_K_M	21.5 GB	8,192	Tight fit at Q4_K_M — only 2% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Gemma 2 9B Instruct	9B	Q4_K_M	20.4 GB	8,192	Tight fit at Q4_K_M — only 7% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Yi Coder 9B	9B	Q4_K_M	20.4 GB	8,192	Tight fit at Q4_K_M — only 7% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Nemotron 3 Nano 9B	9B	Q4_K_M	20.4 GB	8,192	Tight fit at Q4_K_M — only 7% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
GLM-4 9B	9B	Q4_K_M	20.4 GB	8,192	Tight fit at Q4_K_M — only 7% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Tulu 3 8B	8B	Q4_K_M	19.3 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Llama 3.1 Nemotron Nano 8B	8B	Q4_K_M	19.3 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
DeepSeek R1 Distill Llama 8B	8B	Q4_K_M	19.3 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
MiniCPM-V 2.6 8B	8B	Q4_K_M	19.3 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
OpenCoder 8B	8B	Q4_K_M	19.3 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Granite 3.2 8B	8B	Q4_K_M	19.3 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
InternLM 3 8B	8B	Q4_K_M	19.3 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Granite 3.3 8B	8B	Q4_K_M	19.3 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Ministral 8B Instruct	8B	Q4_K_M	19.3 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
MiniCPM-V 3 8B	8B	Q4_K_M	19.3 GB	8,192	Tight fit at Q4_K_M — only 12% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
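The warnings about longer context follow from how the KV cache scales: it grows linearly with context length. A minimal sketch of the standard size formula is below. The architecture numbers (32 layers, 8 KV heads, head dimension 128, FP16 cache) are assumptions chosen to resemble a Llama-3.1-8B-style model; they are not taken from the calculator, whose VRAM estimates include other terms as well.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache for one sequence.

    The leading factor of 2 accounts for one K and one V tensor per layer.
    bytes_per_elem=2 corresponds to an FP16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem


GIB = 1024 ** 3

# Assumed Llama-3.1-8B-style architecture (illustrative values only).
for ctx in (4096, 8192, 32768):
    size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, ctx_len=ctx)
    print(f"{ctx:>6} tokens: {size / GIB:.2f} GiB")
```

With these assumed values the cache is 0.50 GiB at 4,096 tokens, 1.00 GiB at 8,192, and 4.00 GiB at 32,768. Moving from 8K to 32K adds 3 GiB in this sketch, which is more than the roughly 2.7 GB of headroom left by a 19.3 GB estimate on a 22 GB budget.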

Not practical

16 models · oversize for this build

Model	Params	Quant	VRAM est.	Context	Note
NV-Embed v2	8B	FP16	30.6 GB	8,192	~30.6 GB needed at FP16 + 8,192 ctx — overshoots effective VRAM by 39%. Drop quant or move to a larger build.
Qwen 3 Embedding 8B	8B	FP16	31 GB	8,192	~31.0 GB needed at FP16 + 8,192 ctx — overshoots effective VRAM by 41%. Drop quant or move to a larger build.
PaliGemma 2 10B	10B	BF16	36.2 GB	8,192	~36.2 GB needed at BF16 + 8,192 ctx — overshoots effective VRAM by 65%. Drop quant or move to a larger build.
Llama 3.2 11B Vision Instruct	11B	Q4_K_M	22.7 GB	8,192	~22.7 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 3%. Drop quant or move to a larger build.
Llama 3.2 11B Vision	11B	Q4_K_M	22.7 GB	8,192	~22.7 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 3%. Drop quant or move to a larger build.
Pixtral 12B	12B	Q4_K_M	23.8 GB	8,192	~23.8 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 8%. Drop quant or move to a larger build.
Gemma 3 12B	12B	Q4_K_M	23.8 GB	8,192	~23.8 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 8%. Drop quant or move to a larger build.
Mistral Nemo 12B Instruct	12B	Q4_K_M	23.8 GB	8,192	~23.8 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 8%. Drop quant or move to a larger build.
Baichuan 4 13B	13B	Q4_K_M	24.9 GB	8,192	~24.9 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 13%. Drop quant or move to a larger build.
GLM-4V 9B	14B	Q4_K_M	26 GB	8,192	~26.0 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 18%. Drop quant or move to a larger build.
DeepSeek R1 Distill Qwen 14B	14B	Q4_K_M	26.1 GB	8,192	~26.1 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 18%. Drop quant or move to a larger build.
Qwen 2.5 14B Instruct	14B	Q4_K_M	26.1 GB	8,192	~26.1 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 18%. Drop quant or move to a larger build.
Phi-4 Multimodal	14B	Q4_K_M	26.1 GB	8,192	~26.1 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 18%. Drop quant or move to a larger build.
Qwen 2.5 Coder 14B Instruct	14B	Q4_K_M	26.1 GB	8,192	~26.1 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 18%. Drop quant or move to a larger build.
Phi-4 14B	14B	Q4_K_M	26.1 GB	8,192	~26.1 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 18%. Drop quant or move to a larger build.
Phi-4 Reasoning 14B	14B	Q4_K_M	26.1 GB	8,192	~26.1 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by 18%. Drop quant or move to a larger build.
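The three categories above can be approximated with a small headroom check. This is a sketch under stated assumptions, not the calculator's code: the 15% threshold for "Comfortable" comes from the category label, while treating any non-negative headroom below that as "Borderline" and any overshoot as "Not practical" is inferred from the tables. The function names are hypothetical.

```python
def headroom_pct(vram_est_gb: float, effective_gb: float) -> float:
    """Percentage of effective VRAM left over; negative means overshoot."""
    return (1 - vram_est_gb / effective_gb) * 100


def classify(vram_est_gb: float, effective_gb: float = 22.0,
             comfortable_at: float = 15.0) -> str:
    """Bucket a model by headroom (boundaries partly inferred, see lead-in)."""
    h = headroom_pct(vram_est_gb, effective_gb)
    if h >= comfortable_at:
        return "Comfortable"
    if h >= 0:
        return "Borderline"
    return "Not practical"


# VRAM estimates taken from the tables above.
for name, est in [("OLMo 2 13B", 17.6),
                  ("Tulu 3 8B", 19.3),
                  ("Llama 3.2 11B Vision", 22.7)]:
    print(f"{name}: {headroom_pct(est, 22.0):+.0f}% -> {classify(est)}")
```

Because the page rounds effective VRAM to ~22 GB, percentages computed this way can differ by a point from the notes in the tables.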

Shopping a full build instead of a single card?

If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole BOM honestly: AI PC build under $1,000 and AI PC build under $2,000 address the realistic 2026 budget tiers.

Vertical-fit shopping? AI PC for students covers the budget + portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG / always-on machine.

Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.
