RUNLOCALAI · v38

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.

OP·Fredoline Eruo


Custom build engine

Describe your build — any GPUs, CPU, RAM, OS, runtime, use case. We'll compute effective VRAM honestly, recommend a runtime, and tell you which models fit comfortably, which are borderline, and which aren't practical.

Total VRAM ≠ pooled VRAM. We never sum VRAM unless the silicon truly pools (Apple unified memory). We always explain why effective is lower than total.

Describe your build

Add GPUs, set CPU/RAM/OS, optionally pick a runtime + use case. URL updates as you change fields — share a build by copying the URL.

Build summary

  • Total VRAM: 48 GB
  • Effective VRAM: ~33 GB (range 29-36 GB)
  • Topology: mixed GPU over PCIe
  • Setup difficulty: advanced (speed penalty ~35%)
Why effective VRAM is lower than total

Mixed-GPU (asymmetric) configuration. Tensor-parallel doesn't work cleanly because TP requires identical cards — your faster card stalls waiting on the slower one every layer. Use llama.cpp's layer-split with manual --tensor-split tuning to distribute layers by VRAM ratio. Effective capacity ~33 GB after layer-split overhead, but the slowest card (22 GB effective) bottlenecks single-tensor operations.
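As a rough sketch of that arithmetic, assuming the pair is two 24 GB cards: each card loses a slice to driver and context overhead, and layer-split duplicates some buffers across cards. The constants below are illustrative assumptions chosen to land near this build's summary, not the engine's published formula.

```shell
# Toy effective-VRAM math for a 2x24 GB mixed pair.
# card_overhead and split_factor are assumed values, not measured ones.
awk 'BEGIN {
  raw1 = 24; raw2 = 24        # GB per card
  card_overhead = 2           # GB lost per card: CUDA context, framebuffer
  split_factor  = 0.75        # fraction usable after layer-split duplication
  eff = ((raw1 - card_overhead) + (raw2 - card_overhead)) * split_factor
  printf "effective ~= %.0f GB\n", eff
}'
```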

Recommended runtime

Best engine for this topology + skill level + use case.

llama.cpp (layer-split) · primary · setup: involved

Mixed-GPU configurations need llama.cpp's --tensor-split flag with manual ratio tuning by VRAM. vLLM's tensor-parallel requires identical cards and won't run cleanly here.
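A sketch of what that looks like on the command line. The flags are real llama.cpp server options; the model path is hypothetical and the split ratio is the part you hand-tune for your cards:

```shell
# Hypothetical model path — substitute your own GGUF.
# -sm layer selects layer-split; -ngl 99 offloads every layer to GPU;
# --tensor-split sets the per-card ratio (here an even 24:24 starting point).
llama-server \
  -m ./models/qwen2.5-coder-14b-instruct-Q4_K_M.gguf \
  -ngl 99 \
  -sm layer \
  --tensor-split 24,24 \
  --ctx-size 8192
```

If the slower card OOMs first, shift the ratio toward the faster card (e.g. `--tensor-split 26,22`) and re-test.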

Ollama · alternative · setup: moderate

Inherits llama.cpp's layer-split path with friendlier UX. OLLAMA_GPU_OVERHEAD and per-card env vars do most of what manual flags do.
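On the Ollama route, the equivalent knobs are environment variables. The variable names are real Ollama settings; the values below are illustrative, not tuned recommendations:

```shell
# Illustrative values — tune per build.
export CUDA_VISIBLE_DEVICES=0,1          # expose both cards, fastest first
export OLLAMA_GPU_OVERHEAD=2147483648    # reserve ~2 GiB per GPU (bytes),
                                         # so the scheduler doesn't over-commit
export OLLAMA_SCHED_SPREAD=1             # spread layers across all GPUs
ollama serve
```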

Models that fit your build

183 models considered. Categorized by headroom at the recommended quant + a sensible context for your use case.
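The bucket logic can be sketched as a comparison against effective VRAM. The thresholds here are inferred from this page's own labels (roughly ≥15% headroom = comfortable); the per-model VRAM estimates are the engine's:

```shell
# Headroom buckets against ~33 GB effective VRAM.
# Thresholds inferred from this page's labels, not an official spec.
eff=33
classify() {   # $1 = estimated VRAM need in GB
  awk -v e="$eff" -v n="$1" 'BEGIN {
    h = (e - n) / e * 100
    if (n > e)        printf "not practical (over by %.0f%%)\n", -h
    else if (h >= 15) printf "comfortable (%.0f%% headroom)\n", h
    else              printf "borderline (%.0f%% headroom)\n", h
  }'
}
classify 25.9   # 14B @ Q4_K_M + 8,192 ctx -> comfortable (22% headroom)
classify 32.4   # 32B coder @ Q4_K_M      -> borderline (2% headroom)
classify 40.6   # 27B @ Q4_K_M            -> not practical (over by 23%)
```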

Comfortable
24 models · ≥15% headroom
| Model | Params | Quant | VRAM est. | Context | Note |
|---|---|---|---|---|---|
| DeepSeek MoE 16B Base | 16B | Q4_K_M | 20 GB | 4,096 | Fits cleanly; 39% headroom. |
| StarCoder 2 15B | 15B | Q4_K_M | 27 GB | 8,192 | Fits cleanly; 18% headroom. |
| DeepSeek R1 Distill Qwen 14B | 14B | Q4_K_M | 25.9 GB | 8,192 | Fits cleanly; 22% headroom. |
| Phi-4 Multimodal | 14B | Q4_K_M | 25.9 GB | 8,192 | Fits cleanly; 22% headroom. |
| Qwen 2.5 Coder 14B Instruct | 14B | Q4_K_M | 25.9 GB | 8,192 | Fits cleanly; 22% headroom. |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 25.9 GB | 8,192 | Fits cleanly; 22% headroom. |
| GLM-4V 9B | 14B | Q4_K_M | 25.8 GB | 8,192 | Fits cleanly; 22% headroom. |
| OLMo 2 13B | 13B | Q4_K_M | 17.4 GB | 4,096 | 47% headroom; room to extend context or run alongside other workloads. |
| Baichuan 4 13B | 13B | Q4_K_M | 24.7 GB | 8,192 | Fits cleanly; 25% headroom. |
| Stable LM 2 12B | 12B | Q4_K_M | 16.5 GB | 4,096 | 50% headroom; room to extend context or run alongside other workloads. |
| Llama 3.2 11B Vision Instruct | 11B | Q8_0 | 27.8 GB | 8,192 | Fits cleanly; 16% headroom. |
| Llama 3.2 11B Vision | 11B | Q4_K_M | 22.5 GB | 8,192 | Fits cleanly; 32% headroom. |
| Falcon 3 10B | 10B | Q4_K_M | 21.3 GB | 8,192 | Fits cleanly; 35% headroom. |
| Gemma 2 9B Instruct | 9B | Q8_0 | 24.5 GB | 8,192 | Fits cleanly; 26% headroom. |
| Yi Coder 9B | 9B | Q4_K_M | 20.2 GB | 8,192 | Fits cleanly; 39% headroom. |
| Nemotron 3 Nano 9B | 9B | Q4_K_M | 20.2 GB | 8,192 | Fits cleanly; 39% headroom. |
| GLM-4 9B | 9B | Q4_K_M | 20.2 GB | 8,192 | Fits cleanly; 39% headroom. |
| Tulu 3 8B | 8B | Q4_K_M | 19.1 GB | 8,192 | 42% headroom; room to extend context or run alongside other workloads. |
| Molmo 7B-D | 8B | Q4_K_M | 13 GB | 4,096 | 61% headroom; room to extend context or run alongside other workloads. |
| Llama 3.1 Nemotron Nano 8B | 8B | Q4_K_M | 19.1 GB | 8,192 | 42% headroom; room to extend context or run alongside other workloads. |
| Granite 3.0 8B Instruct | 8B | Q4_K_M | 13 GB | 4,096 | 61% headroom; room to extend context or run alongside other workloads. |
| DeepSeek R1 Distill Llama 8B | 8B | Q4_K_M | 19.1 GB | 8,192 | 42% headroom; room to extend context or run alongside other workloads. |
| MiniCPM-V 2.6 8B | 8B | Q4_K_M | 19.1 GB | 8,192 | 42% headroom; room to extend context or run alongside other workloads. |
| OpenCoder 8B | 8B | Q4_K_M | 19.1 GB | 8,192 | 42% headroom; room to extend context or run alongside other workloads. |
Borderline
12 models · tight, may need quant downgrade
| Model | Params | Quant | VRAM est. | Context | Note |
|---|---|---|---|---|---|
| Qwen 2.5 Coder 32B Instruct | 32B | Q4_K_M | 32.4 GB | 8,192 | Only 2% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| DeepSeek V3 Lite (16B MoE) | 16B | Q4_K_M | 28.1 GB | 8,192 | Only 15% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| DeepSeek Coder V2 Lite (16B) | 16B | Q4_K_M | 28.1 GB | 8,192 | Only 15% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Granite 3 MoE (3B active) | 16B | Q4_K_M | 28.1 GB | 8,192 | Only 15% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Qwen 2.5 14B Instruct | 14B | Q8_0 | 32.6 GB | 8,192 | Only 1% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Phi-4 14B | 14B | Q8_0 | 32.6 GB | 8,192 | Only 1% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Qwen 3 14B | 14B | Q8_0 | 32.6 GB | 8,192 | Only 1% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Pixtral 12B | 12B | Q8_0 | 29.4 GB | 8,192 | Only 11% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Gemma 3 12B | 12B | Q8_0 | 29.4 GB | 8,192 | Only 11% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Mistral Nemo 12B Instruct | 12B | Q8_0 | 29.4 GB | 8,192 | Only 11% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| Qwen 3 Embedding 8B | 8B | FP16 | 30.8 GB | 8,192 | Only 7% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
| NV-Embed v2 | 8B | FP16 | 30.4 GB | 8,192 | Only 8% headroom; KV cache for longer context will OOM. Cap context tighter or drop one quant level. |
Not practical
16 models · oversize for this build
| Model | Params | Quant | VRAM est. | Context | Note |
|---|---|---|---|---|---|
| PaliGemma 2 10B | 10B | BF16 | 36 GB | 8,192 | Overshoots effective VRAM by 9%. Drop quant or move to a larger build. |
| Codestral 22B | 22B | Q4_K_M | 34.9 GB | 8,192 | Overshoots effective VRAM by 6%. Drop quant or move to a larger build. |
| Mistral Small 3 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | Overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| DeepSeek R1 Distill Mistral 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | Overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Dolphin 3.0 Mistral 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | Overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Mistral Medium 3 24B (dense) | 24B | Q4_K_M | 37.2 GB | 8,192 | Overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Mistral Small 3.2 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | Overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Devstral Small 2 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | Overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Mistral Saba 24B | 24B | Q4_K_M | 37.2 GB | 8,192 | Overshoots effective VRAM by 13%. Drop quant or move to a larger build. |
| Gemma 4 26B MoE | 26B | Q4_K_M | 39.5 GB | 8,192 | Overshoots effective VRAM by 20%. Drop quant or move to a larger build. |
| InternVL 2.5 26B | 26B | Q4_K_M | 39.5 GB | 8,192 | Overshoots effective VRAM by 20%. Drop quant or move to a larger build. |
| Gemma 3 27B | 27B | Q4_K_M | 40.6 GB | 8,192 | Overshoots effective VRAM by 23%. Drop quant or move to a larger build. |
| MedGemma 27B | 27B | Q4_K_M | 40.6 GB | 8,192 | Overshoots effective VRAM by 23%. Drop quant or move to a larger build. |
| Qwen 3 30B-A3B | 30B | Q4_K_M | 44 GB | 8,192 | Overshoots effective VRAM by 33%. Drop quant or move to a larger build. |
| Nemotron 3 Nano (30B-A3B) | 30B | Q4_K_M | 44 GB | 8,192 | Overshoots effective VRAM by 33%. Drop quant or move to a larger build. |
| Gemma 4 31B Dense | 31B | Q4_K_M | 45.1 GB | 8,192 | Overshoots effective VRAM by 37%. Drop quant or move to a larger build. |

Related

  • Multi-GPU buying guide → NVLink vs PCIe, tensor- vs pipeline-parallel, mixed-card honesty.
  • Hardware combinations → Curated multi-GPU / cluster setups with effective-VRAM math.
  • Setup path-finder → OS + runtime install commands for your stack.
  • Compatibility matrix → Runtime × OS × hardware support truth table.

Shopping a full build instead of a single card?

If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole BOM honestly: AI PC build under $1,000 and AI PC build under $2,000 map the realistic 2026 budget tiers.

Vertical-fit shopping? AI PC for students covers the budget + portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG / always-on machine.

Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.

See something off? Submit a benchmark · Report outdated · Suggest a correction. We read every submission. Editorial review takes 1-7 days.