Custom build engine

Describe your build — any GPUs, CPU, RAM, OS, runtime, use case. We'll compute effective VRAM honestly, recommend a runtime, and tell you which models fit comfortably, which are borderline, and which aren't practical.

Total VRAM ≠ pooled VRAM. We never sum VRAM unless the silicon truly pools it (Apple unified memory), and we always explain why effective VRAM is lower than total VRAM.

Calculations follow the RunLocalAI Will-It-Run Framework: effective VRAM, model working set, runtime constraints, fit tiers, and measured-vs-estimated evidence labels.
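The pooling rule above can be illustrated with a minimal sketch. It is not the engine's actual formula: the `Device` type, the `pooled` flag, the "largest single device" fallback, and the 10% reserve are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    vram_gb: float


def total_vram_gb(devices):
    """Naive sum of every device's VRAM (the headline number)."""
    return sum(d.vram_gb for d in devices)


def effective_vram_gb(devices, pooled=False, reserve_fraction=0.10):
    """Hypothetical effective-VRAM rule.

    - pooled=True: memory is truly shared, so capacities may be summed.
    - pooled=False: assume a model must fit the largest single device.
    A reserve (assumed 10%) is held back for the OS and runtime.
    """
    if not devices:
        return 0.0
    base = total_vram_gb(devices) if pooled else max(d.vram_gb for d in devices)
    return base * (1.0 - reserve_fraction)


cards = [Device("card-a", 24.0), Device("card-b", 24.0)]
print(total_vram_gb(cards))      # headline total
print(effective_vram_gb(cards))  # lower: the cards do not pool
```

Under these assumptions, two 24 GB cards report 48 GB total but far less effective VRAM, which is the gap the build summary is meant to explain.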

Describe your build

Add GPUs, set CPU, RAM, and OS, and optionally pick a runtime and use case. The URL updates as you change fields, so you can share a build by copying it.
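One way a shareable build URL like this could work is to serialize the form fields into the query string. The sketch below is an assumption, not the site's real scheme: the base URL and the parameter names (`gpu`, `ram`, `os`) are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_to_url(build, base="https://example.com/custom-build"):
    """Encode a build dict into a query string (hypothetical parameter names)."""
    query = urlencode(build, doseq=True)  # doseq repeats keys for list values
    return f"{base}?{query}" if query else base


def url_to_build(url):
    """Decode the query string back into a dict of string lists."""
    return parse_qs(urlsplit(url).query)


build = {"gpu": ["rtx-3090", "rtx-3090"], "ram": 64, "os": "linux"}
url = build_to_url(build)
print(url)
print(url_to_build(url))
```

Note that `parse_qs` returns every value as a list of strings, so a real form would convert numeric fields back after decoding.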

GPUs in your build

CPU class

System RAM (GB)

Use case

Your skill level

Runtime preference

Build summary

Total VRAM

0 GB

Effective VRAM

~0 GB

range 0-0 GB

Topology

apple cluster

thunderbolt

Setup difficulty

advanced

speed penalty ~60%

Measured evidence on this hardware

Publicly inspectable measured rows for the selected hardware slug(s). Exact measured rows calibrate the fit table instead of leaving it as pure VRAM estimation.

No publicly inspectable benchmark rows are attached to this exact hardware yet. The engine will still calculate fit and runtime, but speed rows will remain estimated.

Recommended runtime

Best engine for this topology + skill level + use case.

Exo Labs

primary

expert

Designed for multi-Mac clustering — shards model layers across Macs over Thunderbolt. Only viable runtime for spanning Apple Silicon machines today.

MLX-LM (single-node)

alternative

moderate

If your largest model fits a single Mac, run on one Mac. Cluster latency makes single-stream inference 3-5× slower; only cluster when capacity demands it.

WORKLOAD PROFILE

OVERFLOW

all-MiniLM-L6-v2 @ Q4_K_M, 0.3K context on Apple M4 Pro

0 GB · 0.0 GB VRAM ceiling

Weights: 0.0 GB

KV cache: 0.0 GB

Activations: 0.0 GB

Runtime: 0.7 GB

Overflow: 0.7 GB
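The breakdown above is consistent with a simple accounting: the working set is the sum of the four components, and overflow is whatever exceeds the VRAM ceiling. The sketch below assumes exactly that; the function names are hypothetical and the engine may apply additional terms.

```python
def working_set_gb(weights, kv_cache, activations, runtime):
    """Total memory the workload needs, in GB."""
    return weights + kv_cache + activations + runtime


def overflow_gb(working_set, vram_ceiling):
    """Portion of the working set that does not fit under the ceiling."""
    return max(0.0, working_set - vram_ceiling)


# Values taken from the profile above (ceiling is 0.0 GB with no GPU added).
needed = working_set_gb(weights=0.0, kv_cache=0.0, activations=0.0, runtime=0.7)
print(needed, overflow_gb(needed, vram_ceiling=0.0))
```

With a 0.0 GB ceiling, the 0.7 GB runtime overhead alone becomes overflow, which matches the OVERFLOW label on this profile.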

ESTIMATED DECODE RATE

15015 tok/s

Bandwidth-derived estimate · efficiency 0.55. Real-world rates land within ±20% on well-tuned runtimes.
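A bandwidth-derived estimate of this kind is commonly computed as memory bandwidth times an efficiency factor, divided by the bytes read per generated token. The sketch below assumes that form. The 0.55 efficiency and ±20% band come from the text above; the 273 GB/s figure is the M4 Pro's published memory bandwidth; the 0.01 GB read per token is an assumption chosen because it reproduces the displayed figure, not a value the page states.

```python
def estimated_decode_tok_s(bandwidth_gb_s, gb_read_per_token, efficiency=0.55):
    """Bandwidth-bound decode estimate: usable bandwidth / bytes per token."""
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return efficiency * bandwidth_gb_s / gb_read_per_token


def plausible_range(rate, tolerance=0.20):
    """The band within which real-world rates are said to land."""
    return rate * (1 - tolerance), rate * (1 + tolerance)


# Assumed inputs: 273 GB/s bandwidth, ~0.01 GB of weights read per token.
rate = estimated_decode_tok_s(273.0, 0.01)
low, high = plausible_range(rate)
print(round(rate), round(low), round(high))
```

Because the estimate divides by the working set, very small models produce very large token rates; the figure is an upper-bound style estimate rather than a measured one.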

Models that fit your build

315 models considered. Categorized by headroom at the recommended quant + a sensible context for your use case.

Comfortable

0 models · ≥15% headroom

No model fits comfortably on this build.

Borderline

0 models · tight, may need quant downgrade

No borderline models — clean fit ladder.

Not practical

16 models · oversize for this build

Model	Params	Quant	VRAM est.	Context	Evidence	Note
all-MiniLM-L6-v2	0B	Q4_K_M	0 GB	256	No measured row yet	~0.0 GB needed at Q4_K_M + 256 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Piper	0B	Q4_K_M	0 GB	0	No measured row yet	~0.0 GB needed at Q4_K_M + 0 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Whisper Tiny	0B	Q4_K_M	0 GB	30	No measured row yet	~0.0 GB needed at Q4_K_M + 30 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Whisper Base	0B	Q4_K_M	0 GB	30	No measured row yet	~0.0 GB needed at Q4_K_M + 30 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Kokoro 82M	0B	Q4_K_M	0.1 GB	0	No measured row yet	~0.1 GB needed at Q4_K_M + 0 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
all-mpnet-base-v2	0B	Q4_K_M	0.1 GB	384	No measured row yet	~0.1 GB needed at Q4_K_M + 384 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
paraphrase-multilingual-MiniLM-L12-v2	0B	Q4_K_M	0.1 GB	128	No measured row yet	~0.1 GB needed at Q4_K_M + 128 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
gpt2-base-french	0B	Q4_K_M	0.1 GB	1,024	No measured row yet	~0.1 GB needed at Q4_K_M + 1,024 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
GPT-2 Spanish	0B	Q4_K_M	0.1 GB	1,024	No measured row yet	~0.1 GB needed at Q4_K_M + 1,024 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
SmolLM2 135M Instruct	0B	Q4_K_M	0.2 GB	8,192	No measured row yet	~0.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Nomic Embed Text v1.5	0B	Q4_K_M	0.2 GB	8,192	No measured row yet	~0.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
GTE ModernBERT Base	0B	Q4_K_M	0.2 GB	8,192	No measured row yet	~0.2 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Dostoevsky Doesn't Write It GPT2	0B	Q4_K_M	0.1 GB	1,024	No measured row yet	~0.1 GB needed at Q4_K_M + 1,024 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Whisper Small	0B	Q4_K_M	0.2 GB	30	No measured row yet	~0.2 GB needed at Q4_K_M + 30 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Gemma 3 270M	0B	Q4_K_M	0.3 GB	8,192	No measured row yet	~0.3 GB needed at Q4_K_M + 8,192 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
Jina Reranker v2 Base Multilingual	0B	Q4_K_M	0.2 GB	1,024	No measured row yet	~0.2 GB needed at Q4_K_M + 1,024 ctx — overshoots effective VRAM by Infinity%. Drop quant or move to a larger build.
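The three tiers above can be sketched as a headroom check. Only the 15% comfortable threshold comes from the page; treating "borderline" as 0% to 15% headroom and the wording of the zero-VRAM message are assumptions. The sketch also shows why the notes above read "Infinity%": with no GPU added, effective VRAM is 0 GB, and an unguarded overshoot percentage divides by zero.

```python
import math


def headroom_fraction(needed_gb, effective_gb):
    """Fraction of effective VRAM left free; -inf when there is no VRAM."""
    if effective_gb <= 0:
        return -math.inf
    return (effective_gb - needed_gb) / effective_gb


def fit_tier(needed_gb, effective_gb, comfortable=0.15):
    """Classify a model into the three tiers (borderline band is assumed)."""
    headroom = headroom_fraction(needed_gb, effective_gb)
    if headroom >= comfortable:
        return "comfortable"
    if headroom >= 0:
        return "borderline"
    return "not practical"


def overshoot_note(needed_gb, effective_gb):
    """Overshoot message with a guard for an empty build."""
    if effective_gb <= 0:
        return "no effective VRAM configured; add a GPU to the build"
    pct = (needed_gb - effective_gb) / effective_gb * 100
    return f"overshoots effective VRAM by {pct:.0f}%"


print(fit_tier(10.0, 24.0), fit_tier(22.0, 24.0), fit_tier(0.3, 0.0))
print(overshoot_note(30.0, 24.0))
print(overshoot_note(0.3, 0.0))
```

Guarding the zero case turns an uninformative "Infinity%" into a message that tells the reader what to fix.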

Multi-GPU buying guide →

NVLink vs PCIe, tensor- vs pipeline-parallel, mixed-card honesty.

Hardware combinations →

Curated multi-GPU / cluster setups with effective-VRAM math.

Setup path-finder →

OS + runtime install commands for your stack.

Compatibility matrix →

Runtime × OS × hardware support truth table.

Shopping a full build instead of a single card?

If you're sizing a fresh AI build (not just a card to drop into an existing system), the build-budget walkthroughs cover the whole bill of materials (BOM) honestly. AI PC build under $1,000 and AI PC build under $2,000 cover the realistic 2026 budget tiers.

Vertical-fit shopping? AI PC for students covers the budget + portability tradeoffs; AI PC for developers covers the coding workflow specifics; AI PC for small business covers the document-RAG / always-on machine.

Form-factor first? See best laptop for local AI, best Mac for local AI, best mini PC for local AI, or best used GPU for local AI.