Describe your build — any GPUs, CPU, RAM, OS, runtime, use case. We'll compute effective VRAM honestly, recommend a runtime, and tell you which models fit comfortably, which are borderline, and which aren't practical.
Total VRAM ≠ pooled VRAM. We never sum VRAM unless the silicon truly pools (Apple unified memory), and we always explain why effective VRAM is lower than total VRAM.
Calculations follow the RunLocalAI Will-It-Run Framework: effective VRAM, model working set, runtime constraints, fit tiers, and measured-vs-estimated evidence labels.
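To make "model working set" concrete, here is a minimal sketch of how such an estimate is commonly assembled: quantized weights plus KV cache plus a fixed runtime overhead. This is not the framework's actual formula. The function names, the `overhead_gb` default, and the example architecture numbers are all assumptions chosen for illustration.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_element: int = 2) -> float:
    """KV cache size in GB for a decoder-only transformer.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_element=2 assumes an fp16 cache (an assumption, not a given).
    """
    elements = 2 * layers * kv_heads * head_dim * context_tokens
    return elements * bytes_per_element / 1e9


def working_set_gb(params: float, bits_per_weight: float, layers: int,
                   kv_heads: int, head_dim: int, context_tokens: int,
                   overhead_gb: float = 0.5) -> float:
    """Weights + KV cache + a flat runtime overhead (hypothetical default)."""
    return (weights_gb(params, bits_per_weight)
            + kv_cache_gb(layers, kv_heads, head_dim, context_tokens)
            + overhead_gb)


# Hypothetical 8B-parameter model at an assumed 4.5 bits per weight,
# 32 layers, 8 KV heads of dimension 128, and an 8,192-token context.
estimate = working_set_gb(8e9, 4.5, 32, 8, 128, 8192)
print(f"estimated working set: {estimate:.2f} GB")
```

The context length matters because the KV cache grows linearly with it, so the same model can move between fit tiers when only the context changes.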
Add GPUs, set CPU/RAM/OS, and optionally pick a runtime and use case. The URL updates as you change fields, so you can share a build by copying it.
Apple Silicon unified memory is genuinely pooled: the CPU and GPU share a single memory pool. macOS reserves ~8 GB for the OS and apps, and the remainder is available for inference. Unlike NVIDIA multi-GPU, you don't pay an interconnect penalty here. The trade-off is bandwidth: 800 GB/s vs an RTX 4090's 1 TB/s.
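The unified-memory arithmetic above can be sketched in a few lines. The ~8 GB reserve comes from the text. The function name and the 64 GB example machine are assumptions for illustration, and the real reserve varies with what else is running.

```python
def unified_memory_budget_gb(total_gb: float, os_reserve_gb: float = 8.0) -> float:
    """Memory left for inference on a unified-memory machine.

    Subtracts the OS/app reserve from the total and clamps at zero,
    so a machine smaller than the reserve reports 0 GB, never a negative value.
    """
    if total_gb < 0 or os_reserve_gb < 0:
        raise ValueError("memory sizes must be non-negative")
    return max(total_gb - os_reserve_gb, 0.0)


# Hypothetical 64 GB machine: 64 - 8 leaves 56 GB for inference.
print(unified_memory_budget_gb(64))
```

Because the pool is shared, there is no per-device split to reason about here. The only deduction is the reserve.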
Publicly inspectable measured rows for the selected hardware slug(s). Exact measured rows calibrate the fit table instead of leaving it as pure VRAM estimation.
No publicly inspectable benchmark rows are attached to this exact hardware yet. The engine will still calculate fit and runtime, but speed rows will remain estimated.
Best engine for this topology + skill level + use case.
315 models considered. Categorized by headroom at the recommended quant + a sensible context for your use case.
No model fits comfortably on this build.
No borderline models — clean fit ladder.
| Model | Params | Quant | VRAM est. | Context | Evidence | Note |
|---|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | <1B | Q4_K_M | 0 GB | 256 | No measured row yet | ~0.0 GB needed at Q4_K_M + 256 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| Piper | <1B | Q4_K_M | 0 GB | 0 | No measured row yet | ~0.0 GB needed at Q4_K_M + 0 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| Whisper Tiny | <1B | Q4_K_M | 0 GB | 30 | No measured row yet | ~0.0 GB needed at Q4_K_M + 30 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| Whisper Base | <1B | Q4_K_M | 0 GB | 30 | No measured row yet | ~0.0 GB needed at Q4_K_M + 30 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| Kokoro 82M | <1B | Q4_K_M | 0.1 GB | 0 | No measured row yet | ~0.1 GB needed at Q4_K_M + 0 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| all-mpnet-base-v2 | <1B | Q4_K_M | 0.1 GB | 384 | No measured row yet | ~0.1 GB needed at Q4_K_M + 384 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| paraphrase-multilingual-MiniLM-L12-v2 | <1B | Q4_K_M | 0.1 GB | 128 | No measured row yet | ~0.1 GB needed at Q4_K_M + 128 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| gpt2-base-french | <1B | Q4_K_M | 0.1 GB | 1,024 | No measured row yet | ~0.1 GB needed at Q4_K_M + 1,024 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| GPT-2 Spanish | <1B | Q4_K_M | 0.1 GB | 1,024 | No measured row yet | ~0.1 GB needed at Q4_K_M + 1,024 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| SmolLM2 135M Instruct | <1B | Q4_K_M | 0.2 GB | 8,192 | No measured row yet | ~0.2 GB needed at Q4_K_M + 8,192 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| Nomic Embed Text v1.5 | <1B | Q4_K_M | 0.2 GB | 8,192 | No measured row yet | ~0.2 GB needed at Q4_K_M + 8,192 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| GTE ModernBERT Base | <1B | Q4_K_M | 0.2 GB | 8,192 | No measured row yet | ~0.2 GB needed at Q4_K_M + 8,192 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| Dostoevsky Doesn't Write It GPT2 | <1B | Q4_K_M | 0.1 GB | 1,024 | No measured row yet | ~0.1 GB needed at Q4_K_M + 1,024 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| Whisper Small | <1B | Q4_K_M | 0.2 GB | 30 | No measured row yet | ~0.2 GB needed at Q4_K_M + 30 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| Gemma 3 270M | <1B | Q4_K_M | 0.3 GB | 8,192 | No measured row yet | ~0.3 GB needed at Q4_K_M + 8,192 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
| Jina Reranker v2 Base Multilingual | <1B | Q4_K_M | 0.2 GB | 1,024 | No measured row yet | ~0.2 GB needed at Q4_K_M + 1,024 ctx; overshoot is undefined because effective VRAM is 0 GB. Drop quant or move to a larger build. |
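
The headroom-based categorization and the overshoot figure in the Note column can be sketched as follows. This is an illustrative sketch, not the engine's actual logic. The function names and the 20% "comfortable" threshold are assumptions. The sketch also shows why an effective VRAM of 0 GB needs explicit handling: a naive percentage divides by zero and becomes unbounded.

```python
import math


def overshoot_percent(required_gb: float, effective_gb: float) -> float:
    """How far the requirement exceeds effective VRAM, as a percentage.

    With no usable VRAM the ratio is unbounded, so return infinity
    explicitly instead of dividing by zero.
    """
    if effective_gb <= 0:
        return math.inf if required_gb > 0 else 0.0
    return max(required_gb - effective_gb, 0.0) / effective_gb * 100.0


def fit_tier(required_gb: float, effective_gb: float,
             comfortable_headroom: float = 0.20) -> str:
    """Classify a model by remaining headroom (threshold is hypothetical)."""
    if effective_gb <= 0 or required_gb > effective_gb:
        return "not practical"
    headroom = (effective_gb - required_gb) / effective_gb
    return "comfortable" if headroom >= comfortable_headroom else "borderline"


# Hypothetical 24 GB build checked against three working-set sizes.
for need in (16.0, 22.0, 30.0):
    print(need, fit_tier(need, 24.0), overshoot_percent(need, 24.0))
```

A caller that renders the overshoot would typically check for `math.inf` and show a message such as "no effective VRAM configured" instead of a percentage.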
NVLink vs PCIe, tensor- vs pipeline-parallel, mixed-card honesty.
Curated multi-GPU / cluster setups with effective-VRAM math.
OS + runtime install commands for your stack.
Runtime × OS × hardware support truth table.
If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole BOM honestly: AI PC build under $1,000 and AI PC build under $2,000 address the realistic 2026 budget tiers.
Vertical-fit shopping? AI PC for students covers the budget + portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG / always-on machine.
Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.