RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.co · Independently operated

Custom build engine

Describe your build — any GPUs, CPU, RAM, OS, runtime, use case. We'll compute effective VRAM honestly, recommend a runtime, and tell you which models fit comfortably, which are borderline, and which aren't practical.

Total VRAM ≠ pooled VRAM. We never sum VRAM unless the silicon truly pools (Apple unified memory). We always explain why effective is lower than total.

Describe your build

Add GPUs, set CPU/RAM/OS, optionally pick a runtime + use case. URL updates as you change fields — share a build by copying the URL.

Build summary

Total VRAM: 128 GB
Effective VRAM: ~126 GB (range 116-126 GB)
Topology: single GPU · interconnect: none
Setup difficulty: beginner · speed penalty ~0%
Why effective VRAM is lower than total

Single AMD Instinct MI300A (APU) — 128 GB VRAM minus ~1.8 GB runtime overhead = ~126 GB usable for weights + KV cache + activations. The 8% headroom we reserve covers the typical OS/driver footprint and gives KV-cache room for an 8K-32K context; that reserve is what produces the 116 GB low end of the range.
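As a sketch, that arithmetic can be written out directly — the overhead and reserve values here are this page's stated assumptions (1.8 GB runtime overhead, 8% reserve), not universal constants:

```python
def effective_vram(total_gb, overhead_gb=1.8, reserve_frac=0.08):
    """Usable VRAM after runtime overhead, plus a conservative floor.

    overhead_gb and reserve_frac are assumptions taken from this page,
    not measured constants for every runtime/driver combination.
    """
    usable = total_gb - overhead_gb           # ~126 GB on a 128 GB MI300A
    floor = usable - reserve_frac * total_gb  # conservative low end, ~116 GB
    return usable, floor

usable, floor = effective_vram(128)
```

For the 128 GB build above this returns roughly (126.2, 116.0), matching the ~126 GB figure and the 116 GB low end of the quoted range.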

Coding agents — agentic tool-call burst

Workload-specific bottleneck. Where this kind of work actually breaks first, and what to budget for.

Bottleneck: KV cache

Coding agents emit 5-15 tool calls per task, and each call re-sends the full agent system prompt plus accumulated context. The limit is the KV-cache budget for that prompt multiplied by concurrent requests. The decode side is well-served by any modern card; the prefill side bottlenecks first.

Budget for
  • 32K context with KV-cache room to spare (~3-4 GB on 4090 AWQ-INT4)
  • Prefix cache: prefer SGLang for >5 tool calls / task
  • Decode latency: aim for >40 tok/s sustained
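The KV-cache pressure above can be estimated per request. This sketch assumes a Llama-3.1-8B-style GQA shape (32 layers, 8 KV heads, head dim 128) at FP16 — illustrative numbers, not a measurement of any model in the tables below:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_tokens,
                 bytes_per_elem=2, concurrent=1):
    """Rough KV-cache footprint in GiB for a GQA transformer.

    Model shape here (layers/heads/dims) is an assumed example config,
    not pulled from any specific checkpoint.
    """
    # Per token, each layer stores K and V: 2 * n_kv_heads * head_dim elements.
    elems = 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * concurrent
    return elems * bytes_per_elem / 2**30

# 32K-token context, one request, FP16 cache → 4.0 GiB
kv = kv_cache_gib(32, 8, 128, 32_768)
```

Every concurrent agent request multiplies this figure, which is why the KV/prefill side is the first thing to budget for tool-call bursts.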

Recommended runtime

Best engine for this topology + skill level + use case.

llama.cpp (HIPBLAS)
primary pick · moderate setup difficulty

The most reliable AMD inference path in 2026. GGUF format works on every AMD card; HIPBLAS backend matches llama.cpp's CUDA backend within ~20% on RDNA3.

Models that fit your build

47 models considered (filtered by coding). Categorized by headroom at the recommended quant + a sensible context for your use case.

Comfortable
24 models · ≥15% headroom
Model · Params · Quant · VRAM est. · Context · Note
Phind CodeLlama 34B v2 · 34B · Q4_K_M · 73.9 GB · 16,384 · Comfortable fit with 41% headroom — room to extend context or run alongside other workloads.
Qwen 2.5 Coder 32B Instruct · 32B · Q8_0 · 79.1 GB · 32,768 · Fits cleanly at Q8_0 + 32,768 ctx with 37% headroom.
Devstral Small 2 24B · 24B · Q4_K_M · 98 GB · 32,768 · Fits cleanly at Q4_K_M + 32,768 ctx with 22% headroom.
Codestral 22B · 22B · Q8_0 · 103.3 GB · 32,768 · Fits cleanly at Q8_0 + 32,768 ctx with 18% headroom.
DeepSeek V3 Lite (16B MoE) · 16B · Q4_K_M · 76.9 GB · 32,768 · Fits cleanly at Q4_K_M + 32,768 ctx with 39% headroom.
DeepSeek Coder V2 Lite (16B) · 16B · Q4_K_M · 76.9 GB · 32,768 · Fits cleanly at Q4_K_M + 32,768 ctx with 39% headroom.
StarCoder 2 15B · 15B · Q4_K_M · 42.9 GB · 16,384 · Comfortable fit with 66% headroom — room to extend context or run alongside other workloads.
Qwen 2.5 14B Instruct · 14B · Q8_0 · 78.4 GB · 32,768 · Fits cleanly at Q8_0 + 32,768 ctx with 38% headroom.
Qwen 3 14B · 14B · Q8_0 · 78.4 GB · 32,768 · Fits cleanly at Q8_0 + 32,768 ctx with 38% headroom.
Qwen 2.5 Coder 14B Instruct · 14B · Q4_K_M · 71.6 GB · 32,768 · Comfortable fit with 43% headroom — room to extend context or run alongside other workloads.
Yi Coder 9B · 9B · Q4_K_M · 58.5 GB · 32,768 · Comfortable fit with 54% headroom — room to extend context or run alongside other workloads.
OpenCoder 8B · 8B · Q4_K_M · 55.8 GB · 32,768 · Comfortable fit with 56% headroom — room to extend context or run alongside other workloads.
Qwen 3 8B · 8B · Q8_0 · 59.7 GB · 32,768 · Comfortable fit with 53% headroom — room to extend context or run alongside other workloads.
Llama 3.1 8B Instruct · 8B · FP16 · 55.9 GB · 32,768 · Comfortable fit with 56% headroom — room to extend context or run alongside other workloads.
DeepSeek R1 Distill Llama 8B · 8B · Q4_K_M · 55.8 GB · 32,768 · Comfortable fit with 56% headroom — room to extend context or run alongside other workloads.
Qwen 2.5 Coder 7B Instruct · 7B · Q6_K · 54.8 GB · 32,768 · Comfortable fit with 56% headroom — room to extend context or run alongside other workloads.
Qwen 2.5 7B Instruct · 7B · Q8_0 · 44.5 GB · 32,768 · Comfortable fit with 65% headroom — room to extend context or run alongside other workloads.
CodeGemma 7B · 7B · Q4_K_M · 18.1 GB · 8,192 · Comfortable fit with 86% headroom — room to extend context or run alongside other workloads.
Codestral Mamba 7B · 7B · Q4_K_M · 53.2 GB · 32,768 · Comfortable fit with 58% headroom — room to extend context or run alongside other workloads.
CodeQwen 1.5 7B · 7B · Q4_K_M · 53.2 GB · 32,768 · Comfortable fit with 58% headroom — room to extend context or run alongside other workloads.
StarCoder 2 7B · 7B · Q4_K_M · 29.8 GB · 16,384 · Comfortable fit with 76% headroom — room to extend context or run alongside other workloads.
Qwen 2.5 Coder 3B · 3B · Q4_K_M · 42.7 GB · 32,768 · Comfortable fit with 66% headroom — room to extend context or run alongside other workloads.
StarCoder 2 3B · 3B · Q4_K_M · 23.3 GB · 16,384 · Comfortable fit with 82% headroom — room to extend context or run alongside other workloads.
Qwen 2.5 Coder 1.5B · 2B · Q4_K_M · 38.7 GB · 32,768 · Comfortable fit with 69% headroom — room to extend context or run alongside other workloads.
Borderline
8 models · tight, may need quant downgrade
Model · Params · Quant · VRAM est. · Context · Note
Llama 3.3 70B Instruct · 70B · Q8_0 · 123.6 GB · 32,768 · Tight fit at Q8_0 — only 2% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 3.6 35B-A3B (MTP) · 35B · Q3_K_M · 122.7 GB · 32,768 · Tight fit at Q3_K_M — only 3% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 2.5 32B Instruct · 32B · Q4_K_M · 119.1 GB · 32,768 · Tight fit at Q4_K_M — only 6% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 3 32B · 32B · Q5_K_M · 121.9 GB · 32,768 · Tight fit at Q5_K_M — only 3% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Gemma 4 31B Dense · 31B · Q4_K_M · 116.4 GB · 32,768 · Tight fit at Q4_K_M — only 8% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 3 30B-A3B · 30B · Q4_K_M · 113.8 GB · 32,768 · Tight fit at Q4_K_M — only 10% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Qwen 3.6 27B (MTP) · 27B · Q8_0 · 118.9 GB · 32,768 · Tight fit at Q8_0 — only 6% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Mistral Small 3 24B · 24B · Q8_0 · 109.5 GB · 32,768 · Tight fit at Q8_0 — only 13% headroom. KV cache for longer context will OOM. Cap context tighter or drop one quant level.
Not practical
15 models · oversize for this build
Model · Params · Quant · VRAM est. · Context · Note
DeepSeek R1 Distill Qwen 3 32B · 32B · AWQ-INT4 · 132.4 GB · 32,768 · ~132.4 GB needed at AWQ-INT4 + 32,768 ctx — overshoots effective VRAM by 5%. Drop quant or move to a larger build.
Qwen 3 Coder 32B · 32B · AWQ-INT4 · 132.4 GB · 32,768 · ~132.4 GB needed at AWQ-INT4 + 32,768 ctx — overshoots effective VRAM by 5%. Drop quant or move to a larger build.
DeepSeek Coder V3 · 33B · AWQ-INT4 · 135.4 GB · 32,768 · ~135.4 GB needed at AWQ-INT4 + 32,768 ctx — overshoots effective VRAM by 7%. Drop quant or move to a larger build.
Llama 4 70B · 70B · AWQ-INT4 · 248.3 GB · 32,768 · ~248.3 GB needed at AWQ-INT4 + 32,768 ctx — overshoots effective VRAM by 97%. Drop quant or move to a larger build.
Llama 3.1 70B Instruct · 70B · Q4_K_M · 219.1 GB · 32,768 · ~219.1 GB needed at Q4_K_M + 32,768 ctx — overshoots effective VRAM by 74%. Drop quant or move to a larger build.
Qwen 3 72B · 72B · AWQ-INT4 · 254.4 GB · 32,768 · ~254.4 GB needed at AWQ-INT4 + 32,768 ctx — overshoots effective VRAM by 102%. Drop quant or move to a larger build.
Llama 4 Scout · 109B · Q4_K_M · 321.9 GB · 32,768 · ~321.9 GB needed at Q4_K_M + 32,768 ctx — overshoots effective VRAM by 155%. Drop quant or move to a larger build.
Qwen 3 235B-A22B · 235B · Q4_K_M · 653.7 GB · 32,768 · ~653.7 GB needed at Q4_K_M + 32,768 ctx — overshoots effective VRAM by 419%. Drop quant or move to a larger build.
DeepSeek Coder V2 236B · 236B · Q4_K_M · 656.4 GB · 32,768 · ~656.4 GB needed at Q4_K_M + 32,768 ctx — overshoots effective VRAM by 421%. Drop quant or move to a larger build.
DeepSeek V2.5 236B · 236B · Q4_K_M · 656.4 GB · 32,768 · ~656.4 GB needed at Q4_K_M + 32,768 ctx — overshoots effective VRAM by 421%. Drop quant or move to a larger build.
DeepSeek V4 Flash (284B MoE) · 284B · Q4_K_M · 782.8 GB · 32,768 · ~782.8 GB needed at Q4_K_M + 32,768 ctx — overshoots effective VRAM by 521%. Drop quant or move to a larger build.
DeepSeek V4 · 745B · AWQ-INT4 · 2307 GB · 32,768 · ~2307.0 GB needed at AWQ-INT4 + 32,768 ctx — overshoots effective VRAM by 1731%. Drop quant or move to a larger build.
Kimi K2.6 · 1000B · Q4_K_M · 2668.7 GB · 32,768 · ~2668.7 GB needed at Q4_K_M + 32,768 ctx — overshoots effective VRAM by 2018%. Drop quant or move to a larger build.
Ring-2.6-1T · 1000B · Q3_K_M · 2546.6 GB · 32,768 · ~2546.6 GB needed at Q3_K_M + 32,768 ctx — overshoots effective VRAM by 1921%. Drop quant or move to a larger build.
DeepSeek V4 Pro (1.6T MoE) · 1600B · Q4_K_M · 4249.1 GB · 32,768 · ~4249.1 GB needed at Q4_K_M + 32,768 ctx — overshoots effective VRAM by 3272%. Drop quant or move to a larger build.
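The three buckets follow mechanically from headroom against effective VRAM. A minimal sketch, with thresholds inferred from the table notes (≥15% headroom = comfortable, any smaller positive headroom = borderline, overshoot = not practical) — an assumption about this page's rules, not its actual engine:

```python
def classify(vram_needed_gb, effective_gb=126.0, comfortable_margin=0.15):
    """Bucket a model the way the tables above do.

    effective_gb defaults to this build's ~126 GB; thresholds are
    assumed from the table notes, not confirmed internals.
    """
    headroom = 1 - vram_needed_gb / effective_gb  # fraction of effective VRAM left
    if headroom < 0:
        return "not practical"
    return "comfortable" if headroom >= comfortable_margin else "borderline"

# e.g. Phind CodeLlama 34B at 73.9 GB → "comfortable" (~41% headroom)
```

Spot-checking against the rows above: 73.9 GB lands comfortable, 123.6 GB borderline, 132.4 GB not practical, matching the listed buckets.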

Related

Multi-GPU buying guide →

NVLink vs PCIe, tensor- vs pipeline-parallel, mixed-card honesty.

Hardware combinations →

Curated multi-GPU / cluster setups with effective-VRAM math.

Setup path-finder →

OS + runtime install commands for your stack.

Compatibility matrix →

Runtime × OS × hardware support truth table.

Shopping a full build instead of a single card?

If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole BOM honestly: AI PC build under $1,000 and AI PC build under $2,000 cover the realistic 2026 budget tiers.

Vertical-fit shopping? AI PC for students covers the budget + portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG / always-on machine.

Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.

See something off? Submit a benchmark · Report outdated · Suggest a correction
We read every submission. Editorial review takes 1-7 days.