RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated

Custom build engine

Describe your build — any GPUs, CPU, RAM, OS, runtime, use case. We'll compute effective VRAM honestly, recommend a runtime, and tell you which models fit comfortably, which are borderline, and which aren't practical.

Total VRAM ≠ pooled VRAM. We never sum VRAM unless the silicon truly pools (Apple unified memory). We always explain why effective is lower than total.
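
The pooling rule above can be sketched in a few lines. This is an illustration only, not the engine's actual formula: the `Device` type, the `effective_vram_range` function, and the 15% `split_overhead` discount are all hypothetical. The sketch sums VRAM only when the build is flagged as pooled; otherwise it reports a range from the largest single device up to a discounted sum.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    vram_gb: float


def effective_vram_range(devices, pooled, split_overhead=0.15):
    """Return a (low, high) effective-VRAM range in GB.

    Assumption: split_overhead is a made-up discount for memory lost
    when a model is split across devices that do not share memory.
    """
    if not devices:
        return (0.0, 0.0)
    total = sum(d.vram_gb for d in devices)
    if pooled or len(devices) == 1:
        # Truly pooled memory (or a single device): total is usable as-is.
        return (total, total)
    # Not pooled: the conservative bound is the largest single device,
    # the optimistic bound is the sum minus the assumed split overhead.
    low = max(d.vram_gb for d in devices)
    high = max(low, total * (1.0 - split_overhead))
    return (low, high)


if __name__ == "__main__":
    cards = [Device("card-a", 24.0), Device("card-b", 12.0)]
    print(effective_vram_range(cards, pooled=False))
    print(effective_vram_range(cards, pooled=True))
```

An empty build returns a 0-0 GB range, which is the state shown in the build summary on this page.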

Calculations follow the RunLocalAI Will-It-Run Framework: effective VRAM, model working set, runtime constraints, fit tiers, and measured-vs-estimated evidence labels.

Describe your build

Add GPUs, set CPU/RAM/OS, optionally pick a runtime + use case. URL updates as you change fields — share a build by copying the URL.
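
As a rough illustration of how a build can round-trip through a shareable URL, the sketch below encodes form fields into a query string and decodes them again. The base URL and the parameter names (`gpu`, `ram`, `os`) are hypothetical; the page's real query format is not documented here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_to_url(base, build):
    # doseq=True repeats the key for list values, e.g. several GPUs.
    return f"{base}?{urlencode(build, doseq=True)}"


def url_to_build(url):
    # parse_qs always returns lists of strings, one list per key.
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    build = {"gpu": ["rtx-3090", "rtx-3090"], "ram": 64, "os": "linux"}
    url = build_to_url("https://example.com/custom", build)
    print(url)
    print(url_to_build(url))
```

Note that values come back as strings, so a real implementation would need to convert fields such as RAM back to numbers.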

Build summary

Total VRAM: 0 GB
Effective VRAM: ~0 GB (range 0-0 GB)
Topology: apple cluster, thunderbolt
Setup difficulty: advanced
Speed penalty: ~60%

Measured evidence on this hardware

Publicly inspectable measured rows for the selected hardware slug(s). Exact measured rows calibrate the fit table instead of leaving it as pure VRAM estimation.

No publicly inspectable benchmark rows are attached to this exact hardware yet. The engine will still calculate fit and runtime, but speed rows will remain estimated.

Recommended runtime

Best engine for this topology + skill level + use case.

Exo Labs · primary · expert

Designed for multi-Mac clustering — shards model layers across Macs over Thunderbolt. Only viable runtime for spanning Apple Silicon machines today.

MLX-LM (single-node) · alternative · moderate

If your largest model fits a single Mac, run on one Mac. Cluster latency makes single-stream inference 3-5× slower; only cluster when capacity demands it.
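
The single-node-first rule can be expressed as a small decision function. This is a sketch of the advice above, not the engine's code; `choose_runtime` and its parameters are hypothetical names, and the cluster capacity is assumed to be the effective (not summed) figure.

```python
def choose_runtime(model_gb, largest_node_gb, cluster_effective_gb):
    """Pick a runtime for a multi-Mac build.

    Prefers a single Mac whenever the model fits on one, because
    clustering slows single-stream inference. Returns None when the
    model exceeds even the cluster's effective capacity.
    """
    if model_gb <= largest_node_gb:
        return "MLX-LM (single-node)"
    if model_gb <= cluster_effective_gb:
        return "Exo Labs (cluster)"
    return None


if __name__ == "__main__":
    print(choose_runtime(30, largest_node_gb=48, cluster_effective_gb=80))
    print(choose_runtime(60, largest_node_gb=48, cluster_effective_gb=80))
```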

WORKLOAD PROFILE
OVERFLOW
all-MiniLM-L6-v2 @ Q4_K_M, 0.3K context on Apple M4 Pro
VRAM ceiling: 0.0 GB
Weights: 0.0 GB
KV cache: 0.0 GB
Activations: 0.0 GB
Runtime: 0.7 GB
Overflow: 0.7 GB
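
The breakdown above reads as a simple sum compared against a ceiling. A minimal sketch, assuming the working set is the sum of the four listed components and overflow is whatever exceeds the VRAM ceiling (the `WorkingSet` type and `overflow_gb` function are hypothetical names):

```python
from dataclasses import dataclass


@dataclass
class WorkingSet:
    weights_gb: float
    kv_cache_gb: float
    activations_gb: float
    runtime_gb: float

    @property
    def total_gb(self):
        return (self.weights_gb + self.kv_cache_gb
                + self.activations_gb + self.runtime_gb)


def overflow_gb(ws, ceiling_gb):
    # Anything above the ceiling spills out of VRAM; never negative.
    return max(0.0, ws.total_gb - ceiling_gb)


if __name__ == "__main__":
    # Values as displayed on this page (rounded to one decimal).
    ws = WorkingSet(weights_gb=0.0, kv_cache_gb=0.0,
                    activations_gb=0.0, runtime_gb=0.7)
    print(overflow_gb(ws, ceiling_gb=0.0))
```

With a 0 GB ceiling, the 0.7 GB runtime overhead alone is enough to overflow, which matches the OVERFLOW status shown for this build.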
ESTIMATED DECODE RATE
15015 tok/s
Bandwidth-derived estimate · efficiency 0.55. Real-world rates land within ±20% on well-tuned runtimes.
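
A bandwidth-derived estimate of this kind is usually memory bandwidth times an efficiency factor, divided by the bytes read per generated token. The sketch below assumes that form; the function names are hypothetical. The 273 GB/s figure is Apple's published M4 Pro memory bandwidth, while the 0.01 GB read size is an assumption chosen only because it is consistent with the figure displayed above.

```python
def decode_rate_tok_s(bandwidth_gb_s, read_per_token_gb, efficiency=0.55):
    """Estimate decode speed for a memory-bandwidth-bound model."""
    if read_per_token_gb <= 0:
        raise ValueError("read_per_token_gb must be positive")
    return bandwidth_gb_s * efficiency / read_per_token_gb


def plausible_range(rate, tolerance=0.20):
    # The page quotes real-world rates within +/-20% of the estimate.
    return (rate * (1.0 - tolerance), rate * (1.0 + tolerance))


if __name__ == "__main__":
    rate = decode_rate_tok_s(273.0, 0.01)  # assumed inputs, see above
    low, high = plausible_range(rate)
    print(f"{rate:.0f} tok/s (range {low:.0f}-{high:.0f})")
```

Because the estimate divides by model size, very small models produce very large token rates that say little about practical throughput.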

Models that fit your build

315 models considered. Categorized by headroom at the recommended quant + a sensible context for your use case.

Comfortable
0 models · ≥15% headroom

No model fits comfortably on this build.

Borderline
0 models · tight, may need quant downgrade

No borderline models — clean fit ladder.

Not practical
16 models · oversize for this build
Model | Params | Quant | VRAM est. | Context | Evidence | Note
all-MiniLM-L6-v2 | 0B | Q4_K_M | 0 GB | 256 | No measured row yet | ~0.0 GB needed at Q4_K_M + 256 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Piper | 0B | Q4_K_M | 0 GB | 0 | No measured row yet | ~0.0 GB needed at Q4_K_M + 0 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Whisper Tiny | 0B | Q4_K_M | 0 GB | 30 | No measured row yet | ~0.0 GB needed at Q4_K_M + 30 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Whisper Base | 0B | Q4_K_M | 0 GB | 30 | No measured row yet | ~0.0 GB needed at Q4_K_M + 30 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Kokoro 82M | 0B | Q4_K_M | 0.1 GB | 0 | No measured row yet | ~0.1 GB needed at Q4_K_M + 0 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
all-mpnet-base-v2 | 0B | Q4_K_M | 0.1 GB | 384 | No measured row yet | ~0.1 GB needed at Q4_K_M + 384 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
paraphrase-multilingual-MiniLM-L12-v2 | 0B | Q4_K_M | 0.1 GB | 128 | No measured row yet | ~0.1 GB needed at Q4_K_M + 128 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
gpt2-base-french | 0B | Q4_K_M | 0.1 GB | 1,024 | No measured row yet | ~0.1 GB needed at Q4_K_M + 1,024 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
GPT-2 Spanish | 0B | Q4_K_M | 0.1 GB | 1,024 | No measured row yet | ~0.1 GB needed at Q4_K_M + 1,024 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
SmolLM2 135M Instruct | 0B | Q4_K_M | 0.2 GB | 8,192 | No measured row yet | ~0.2 GB needed at Q4_K_M + 8,192 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Nomic Embed Text v1.5 | 0B | Q4_K_M | 0.2 GB | 8,192 | No measured row yet | ~0.2 GB needed at Q4_K_M + 8,192 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
GTE ModernBERT Base | 0B | Q4_K_M | 0.2 GB | 8,192 | No measured row yet | ~0.2 GB needed at Q4_K_M + 8,192 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Dostoevsky Doesn't Write It GPT2 | 0B | Q4_K_M | 0.1 GB | 1,024 | No measured row yet | ~0.1 GB needed at Q4_K_M + 1,024 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Whisper Small | 0B | Q4_K_M | 0.2 GB | 30 | No measured row yet | ~0.2 GB needed at Q4_K_M + 30 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Gemma 3 270M | 0B | Q4_K_M | 0.3 GB | 8,192 | No measured row yet | ~0.3 GB needed at Q4_K_M + 8,192 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Jina Reranker v2 Base Multilingual | 0B | Q4_K_M | 0.2 GB | 1,024 | No measured row yet | ~0.2 GB needed at Q4_K_M + 1,024 ctx; overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
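
The three tiers can be sketched as a headroom check. Only the 15% comfortable threshold comes from this page; the borderline rule (fits, but with less than 15% headroom) and all function names are assumptions. The sketch also shows why a build with 0 GB effective VRAM reports an infinite overshoot: the percentage divides by effective VRAM, so every model with a non-zero footprint lands in "not practical".

```python
import math


def headroom(effective_gb, needed_gb):
    """Fraction of effective VRAM left free after loading the model."""
    if effective_gb <= 0:
        return -math.inf  # nothing can fit in zero VRAM
    return (effective_gb - needed_gb) / effective_gb


def overshoot_pct(effective_gb, needed_gb):
    """How far the requirement exceeds effective VRAM, in percent."""
    if needed_gb <= effective_gb:
        return 0.0
    if effective_gb <= 0:
        return math.inf  # division by zero, reported as Infinity%
    return (needed_gb - effective_gb) / effective_gb * 100.0


def fit_tier(effective_gb, needed_gb, comfortable=0.15):
    h = headroom(effective_gb, needed_gb)
    if h >= comfortable:
        return "comfortable"
    if h >= 0:
        return "borderline"  # assumed rule: fits, but under 15% headroom
    return "not practical"


if __name__ == "__main__":
    print(fit_tier(24.0, 20.0), fit_tier(24.0, 23.0), fit_tier(0.0, 0.1))
    print(overshoot_pct(0.0, 0.1))
```

Seen this way, the 16 "not practical" rows above reflect the empty build (0 GB effective VRAM) rather than anything about the models themselves.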

Related

Multi-GPU buying guide →

NVLink vs PCIe, tensor- vs pipeline-parallel, mixed-card honesty.

Hardware combinations →

Curated multi-GPU / cluster setups with effective-VRAM math.

Setup path-finder →

OS + runtime install commands for your stack.

Compatibility matrix →

Runtime × OS × hardware support truth table.

Shopping a full build instead of a single card?

If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole BOM honestly: AI PC build under $1,000 or AI PC build under $2,000 cover the realistic 2026 budget tiers.

Vertical-fit shopping? AI PC for students covers the budget + portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG / always-on machine.

Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.

See something off? Submit a benchmark · Report outdated · Suggest a correction. We read every submission. Editorial review takes 1-7 days.