RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated

What can the MacBook Pro 16" M4 Max run?

Build: MacBook Pro M4 Max 64GB

Memory: 64 GB unified memory
Runner: MLX-LM (Apple Metal)

Runs comfortably
255 models

Full-VRAM resident, with room for context. No compromises.

#1 all-MiniLM-L6-v2
0.022B · other · Commercial OK
Quant: Q4_K_M · Context: 256 · VRAM: 0.5 GB · Headroom: 55.5 GB
22609 tok/s (Estimated)
Weights: 0.01 GB · KV cache: 0.00 GB · Activations: 0.00 GB · Runtime: 0.50 GB

#2 Piper
0.025B · other · Commercial OK
Quant: Q4_K_M · Context: 0 · VRAM: 0.5 GB · Headroom: 55.5 GB
19896 tok/s (Estimated)
Weights: 0.02 GB · KV cache: 0.00 GB · Activations: 0.00 GB · Runtime: 0.50 GB

#3 Whisper Tiny
0.039B · other · Commercial OK
Quant: Q4_K_M · Context: 30 · VRAM: 0.5 GB · Headroom: 55.5 GB
12754 tok/s (Estimated)
Weights: 0.02 GB · KV cache: 0.00 GB · Activations: 0.00 GB · Runtime: 0.50 GB

#4 Whisper Base
0.074B · other · Commercial OK
Quant: Q4_K_M · Context: 30 · VRAM: 0.5 GB · Headroom: 55.5 GB
6722 tok/s (Estimated)
Weights: 0.04 GB · KV cache: 0.00 GB · Activations: 0.00 GB · Runtime: 0.50 GB

#5 Kokoro 82M
0.082B · other · Commercial OK
Quant: Q4_K_M · Context: 0 · VRAM: 0.6 GB · Headroom: 55.4 GB
6066 tok/s (Estimated)
Weights: 0.05 GB · KV cache: 0.00 GB · Activations: 0.00 GB · Runtime: 0.50 GB

#6 all-mpnet-base-v2
0.109B · other · Commercial OK
Quant: Q4_K_M · Context: 384 · VRAM: 0.6 GB · Headroom: 55.4 GB
4563 tok/s (Estimated)
Weights: 0.07 GB · KV cache: 0.00 GB · Activations: 0.00 GB · Runtime: 0.50 GB

#7 paraphrase-multilingual-MiniLM-L12-v2
0.118B · other · Commercial OK
Quant: Q4_K_M · Context: 128 · VRAM: 0.6 GB · Headroom: 55.4 GB
4215 tok/s (Estimated)
Weights: 0.07 GB · KV cache: 0.00 GB · Activations: 0.00 GB · Runtime: 0.50 GB

#8 Nomic Embed Text v1.5
0.137B · other · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 0.7 GB · Headroom: 55.3 GB
3631 tok/s (Estimated)
Weights: 0.08 GB · KV cache: 0.07 GB · Activations: 0.01 GB · Runtime: 0.50 GB

#9 SmolLM2 135M Instruct
0.135B · other · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 0.7 GB · Headroom: 55.3 GB
3684 tok/s (Estimated)
Weights: 0.08 GB · KV cache: 0.07 GB · Activations: 0.01 GB · Runtime: 0.50 GB

#10 GTE ModernBERT Base
0.149B · other · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 0.7 GB · Headroom: 55.3 GB
3338 tok/s (Estimated)
Weights: 0.09 GB · KV cache: 0.07 GB · Activations: 0.01 GB · Runtime: 0.50 GB

#11 Whisper Small
0.244B · other · Commercial OK
Quant: Q4_K_M · Context: 30 · VRAM: 0.7 GB · Headroom: 55.3 GB
2038 tok/s (Estimated)
Weights: 0.15 GB · KV cache: 0.00 GB · Activations: 0.01 GB · Runtime: 0.50 GB

#12 Gemma 3 270M
0.27B · gemma · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 0.8 GB · Headroom: 55.2 GB
1842 tok/s (Estimated)
Weights: 0.16 GB · KV cache: 0.14 GB · Activations: 0.02 GB · Runtime: 0.50 GB

Runs with tradeoffs
16 models

Tight VRAM, partial CPU offload, or context-limited.

Llama 3.3 70B Instruct
70B · llama · Commercial OK
Quant: Q5_K_M · Context: 8,192 · VRAM: 53.7 GB · Headroom: 2.3 GB
  • Tight VRAM fit: only 2.3 GB headroom left for context growth
ollama run llama3.3:70b
6 tok/s (Estimated)
Weights: 48.13 GB · KV cache: 2.68 GB · Activations: 2.41 GB · Runtime: 0.50 GB

Qwen 3 32B
32B · qwen · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 52.2 GB · Headroom: 3.8 GB
  • Tight VRAM fit: only 3.8 GB headroom left for context growth
ollama run qwen3:32b
9 tok/s (Estimated)
Weights: 34.00 GB · KV cache: 16.00 GB · Activations: 1.71 GB · Runtime: 0.50 GB

DeepSeek R1 Distill Llama 70B
70B · deepseek · Commercial OK
Quant: Q4_K_M · Context: 2,048 · VRAM: 53.6 GB · Headroom: 2.4 GB
  • Tight VRAM fit: only 2.4 GB headroom left for context growth
ollama run deepseek-r1:70b
7 tok/s (Estimated)
Weights: 42.26 GB · KV cache: 8.75 GB · Activations: 2.12 GB · Runtime: 0.50 GB

DeepSeek R1 Distill Qwen 32B
32B · deepseek · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 52.2 GB · Headroom: 3.8 GB
  • Tight VRAM fit: only 3.8 GB headroom left for context growth
ollama run deepseek-r1:32b
9 tok/s (Estimated)
Weights: 34.00 GB · KV cache: 16.00 GB · Activations: 1.71 GB · Runtime: 0.50 GB

Llama 3.1 70B Instruct
70B · llama · Commercial OK
Quant: Q4_K_M · Context: 2,048 · VRAM: 53.6 GB · Headroom: 2.4 GB
  • Tight VRAM fit: only 2.4 GB headroom left for context growth
ollama run llama3.1:70b
7 tok/s (Estimated)
Weights: 42.26 GB · KV cache: 8.75 GB · Activations: 2.12 GB · Runtime: 0.50 GB

Qwen 2.5 32B Instruct
32B · qwen · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 52.2 GB · Headroom: 3.8 GB
  • Tight VRAM fit: only 3.8 GB headroom left for context growth
ollama run qwen2.5:32b
9 tok/s (Estimated)
Weights: 34.00 GB · KV cache: 16.00 GB · Activations: 1.71 GB · Runtime: 0.50 GB

Llama 3.1 Nemotron 70B Instruct
70B · llama · Commercial OK
Quant: Q4_K_M · Context: 2,048 · VRAM: 53.6 GB · Headroom: 2.4 GB
  • Tight VRAM fit: only 2.4 GB headroom left for context growth
ollama run nemotron:70b
7 tok/s (Estimated)
Weights: 42.26 GB · KV cache: 8.75 GB · Activations: 2.12 GB · Runtime: 0.50 GB

Qwen 2.5 72B Instruct
72B · qwen · Commercial OK
Quant: Q4_K_M · Context: 2,048 · VRAM: 55.1 GB · Headroom: 0.9 GB
  • Tight VRAM fit: only 0.9 GB headroom left for context growth
ollama run qwen2.5:72b
7 tok/s (Estimated)
Weights: 43.47 GB · KV cache: 9.00 GB · Activations: 2.18 GB · Runtime: 0.50 GB
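
The VRAM and headroom figures on each card appear to be simple sums: weights, KV cache, activations, and runtime add up to the VRAM total, and headroom is whatever remains of the 56 GB this page reports as available after OS overhead. The sketch below checks that arithmetic for the Llama 3.3 70B Instruct card and estimates how much extra context the headroom could hold. The function names are hypothetical, and the linear KV-cache scaling is an assumption of this sketch, not a rule stated on the page.

```python
# Memory-budget check using figures copied from the Llama 3.3 70B Instruct card.
AVAILABLE_GB = 56.0  # usable unified memory after OS overhead, as reported on this page


def memory_budget(weights_gb, kv_cache_gb, activations_gb, runtime_gb,
                  available_gb=AVAILABLE_GB):
    """Return (total footprint, remaining headroom) in GB."""
    total = weights_gb + kv_cache_gb + activations_gb + runtime_gb
    return total, available_gb - total


def extra_context_tokens(kv_cache_gb, context_tokens, headroom_gb):
    """Estimate how many more tokens fit in the headroom.

    Assumption: KV cache grows linearly with context length and nothing
    else grows, which is optimistic because activations also scale.
    """
    gb_per_token = kv_cache_gb / context_tokens
    return int(headroom_gb / gb_per_token)


total, headroom = memory_budget(48.13, 2.68, 2.41, 0.50)
extra = extra_context_tokens(2.68, 8192, headroom)
print(f"total {total:.1f} GB, headroom {headroom:.1f} GB")
print(f"roughly {extra} extra tokens of context before memory runs out")
```

The totals round to the 53.7 GB and 2.3 GB shown on the card. Under the linear assumption the headroom holds roughly 7,000 more tokens, so treat the listed context length as close to the practical ceiling for this model.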

What if you upgraded?

Hypothetical scenarios. We re-ran the compatibility engine for each.

Move up an Apple memory tier

~$200–400 over base

On Apple Silicon, more unified memory is the only path forward — VRAM and system RAM are the same pool.


Won't run
Top 5 popular models

Need more memory than you have. Shown for orientation.

DeepSeek V4 Pro (1.6T MoE)
1600B · deepseek · Commercial OK
Needs ~1024 GB unified memory minimum at smallest quant; you have 56 GB available after OS overhead.

—
Qwen 3.5 397B-A17B (MoE)
397B · qwen · Commercial OK
Needs ~256 GB unified memory minimum at smallest quant; you have 56 GB available after OS overhead.

—
Qwen 3 235B-A22B
235B · qwen · Commercial OK
Needs ~160 GB unified memory minimum at smallest quant; you have 56 GB available after OS overhead.

—
DeepSeek R1 (671B reasoning)
671B · deepseek · Commercial OK
Needs ~420 GB unified memory minimum at smallest quant; you have 56 GB available after OS overhead.

—
DeepSeek V4 Flash (284B MoE)
284B · deepseek · Commercial OK
Needs ~192 GB unified memory minimum at smallest quant; you have 56 GB available after OS overhead.

—
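
To put the "won't run" gaps in perspective, the minimum-memory figures quoted above can be compared against the 56 GB available on this build. This is a small sketch of that subtraction; the dictionary keys are shortened labels chosen for illustration and the helper names are hypothetical.

```python
# Minimum unified memory (GB) at smallest quant, copied from the entries above.
AVAILABLE_GB = 56  # available after OS overhead, per this page

MIN_MEMORY_GB = {
    "DeepSeek V4 Pro": 1024,
    "Qwen 3.5 (MoE)": 256,
    "Qwen 3 235B-A22B": 160,
    "DeepSeek R1": 420,
    "DeepSeek V4 Flash": 192,
}


def fits(required_gb, available_gb=AVAILABLE_GB):
    """True when the model's minimum footprint fits in available memory."""
    return required_gb <= available_gb


def shortfall_gb(required_gb, available_gb=AVAILABLE_GB):
    """How many GB are missing; 0 when the model already fits."""
    return max(0, required_gb - available_gb)


# List the models from the smallest gap to the largest.
for name, need in sorted(MIN_MEMORY_GB.items(), key=lambda item: item[1]):
    print(f"{name}: needs ~{need} GB, short by ~{shortfall_gb(need)} GB")
```

Even the closest entry, Qwen 3 235B-A22B, is about 104 GB short, so none of these models comes near fitting on this configuration.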

How to read these numbers

  • Measured here: RunLocalAI ran this exact combo on owner hardware with public evidence.
  • Source-backed / community: a reproduced public source supports the speed, but it is not labeled as owner-measured.
  • Extrapolated: predicted from a measured benchmark on similar-bandwidth hardware.
  • Estimated: formula based on VRAM bandwidth and model architecture; not a benchmark row.
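
The page does not publish the formula behind its "Estimated" figures, but a common bandwidth-bound rule of thumb gives a feel for how such numbers arise: each generated token requires reading roughly all the weights once, so decode speed is about memory bandwidth divided by weight size, scaled by an efficiency factor. In the sketch below, the 546 GB/s bandwidth (Apple's published peak for the higher-binned M4 Max) and the 0.55 efficiency factor are both assumptions that do not appear on this page, and the function name is hypothetical.

```python
# Bandwidth-bound decode-speed estimate for large dense models.
# Both defaults are assumptions of this sketch, not values from this page.
ASSUMED_BANDWIDTH_GB_S = 546.0  # assumed peak unified-memory bandwidth
ASSUMED_EFFICIENCY = 0.55       # hypothetical fraction of peak actually achieved


def estimate_tok_per_s(weights_gb,
                       bandwidth_gb_s=ASSUMED_BANDWIDTH_GB_S,
                       efficiency=ASSUMED_EFFICIENCY):
    """Tokens per second if every token reads all weights from memory once."""
    return bandwidth_gb_s * efficiency / weights_gb


# Weight sizes copied from the "Runs with tradeoffs" cards.
for label, weights_gb in [
    ("Llama 3.3 70B Instruct (Q5_K_M)", 48.13),
    ("Qwen 3 32B (Q8_0)", 34.00),
    ("Llama 3.1 70B Instruct (Q4_K_M)", 42.26),
]:
    print(f"{label}: ~{estimate_tok_per_s(weights_gb):.1f} tok/s")
```

With these assumed inputs the results land near the 6, 9, and 7 tok/s listed on those cards. That may be coincidental, and the rule does not apply to tiny models, where fixed overheads dominate over weight reads.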

RunLocalAI Will-It-Run Framework →

Want a specific benchmark we don't have? Email support and we'll prioritize it.