What can MacBook Pro 16" M4 Max run?

Build: MacBook Pro M4 Max, 128 GB

Memory: 128 GB unified memory
Runner: MLX-LM (Apple Metal)

Runs comfortably: 59 models

Fully VRAM-resident, with room for context. No compromises.

#1 Gemma 3 1B
1B · gemma · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 9.8 GB · Headroom: 110.2 GB
ollama run gemma3:1b
865 tok/s (E)
Weights: 0.60 GB · KV cache: 0.50 GB · Activations: 8.22 GB · Runtime: 0.50 GB
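Each card's VRAM figure is simply the sum of its four memory components, and headroom is measured against the usable pool after OS overhead (this page's "won't run" notes put that at 120 GB of the 128 GB total). A minimal sketch of that arithmetic, using the Gemma 3 1B numbers above:

```python
def card_totals(weights_gb, kv_gb, act_gb, runtime_gb, usable_gb=120.0):
    """Sum a card's memory components and compute headroom.

    usable_gb is the unified memory left after OS overhead
    (128 GB total minus roughly 8 GB, per this page).
    """
    vram = weights_gb + kv_gb + act_gb + runtime_gb
    return round(vram, 1), round(usable_gb - vram, 1)

# Gemma 3 1B card: 0.60 + 0.50 + 8.22 + 0.50
print(card_totals(0.60, 0.50, 8.22, 0.50))  # (9.8, 110.2)
```

The same function reproduces every card on this page; only the four component numbers change.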
#2 Llama 3.2 1B Instruct
1B · llama · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 10.3 GB · Headroom: 109.7 GB
ollama run llama3.2:1b
492 tok/s (E)
Weights: 1.06 GB · KV cache: 0.50 GB · Activations: 8.25 GB · Runtime: 0.50 GB
#3 Gemma 4 E2B (Effective 2B)
2B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 11.9 GB · Headroom: 108.1 GB
ollama run gemma4:e2b
246 tok/s (E)
Weights: 2.13 GB · KV cache: 1.00 GB · Activations: 8.30 GB · Runtime: 0.50 GB
#4 Llama 3.2 3B Instruct
3B · llama · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 13.5 GB · Headroom: 106.5 GB
ollama run llama3.2:3b
164 tok/s (E)
Weights: 3.19 GB · KV cache: 1.50 GB · Activations: 8.35 GB · Runtime: 0.50 GB
#5 Phi-3.5 Vision
4.2B · phi · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 13.5 GB · Headroom: 106.5 GB
206 tok/s (E)
Weights: 2.54 GB · KV cache: 2.10 GB · Activations: 8.32 GB · Runtime: 0.50 GB
#6 Phi-3.5 Mini Instruct
3.8B · phi · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 14.8 GB · Headroom: 105.2 GB
ollama run phi3.5:3.8b
129 tok/s (E)
Weights: 4.04 GB · KV cache: 1.90 GB · Activations: 8.39 GB · Runtime: 0.50 GB
#7 Gemma 4 E4B (Effective 4B)
4B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 15.2 GB · Headroom: 104.8 GB
ollama run gemma4:e4b
123 tok/s (E)
Weights: 4.25 GB · KV cache: 2.00 GB · Activations: 8.40 GB · Runtime: 0.50 GB
#8 Qwen 3 4B
4B · qwen · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 15.2 GB · Headroom: 104.8 GB
ollama run qwen3:4b
123 tok/s (E)
Weights: 4.25 GB · KV cache: 2.00 GB · Activations: 8.40 GB · Runtime: 0.50 GB
#9 Gemma 3 4B
4B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 15.2 GB · Headroom: 104.8 GB
ollama run gemma3:4b
123 tok/s (E)
Weights: 4.25 GB · KV cache: 2.00 GB · Activations: 8.40 GB · Runtime: 0.50 GB
#10 Llama 3.1 Nemotron Nano 8B
8B · llama · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 17.8 GB · Headroom: 102.2 GB
108 tok/s (E)
Weights: 4.83 GB · KV cache: 4.00 GB · Activations: 8.43 GB · Runtime: 0.50 GB
#11 Mistral 7B Instruct v0.3
7B · mistral · Commercial OK
Quant: Q5_K_M · Context: 8,192 · VRAM: 17.2 GB · Headroom: 102.8 GB
ollama run mistral:7b
109 tok/s (E)
Weights: 4.81 GB · KV cache: 3.50 GB · Activations: 8.43 GB · Runtime: 0.50 GB
#12 CodeGemma 7B
7B · gemma · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 16.6 GB · Headroom: 103.4 GB
ollama run codegemma:7b
124 tok/s (E)
Weights: 4.23 GB · KV cache: 3.50 GB · Activations: 8.40 GB · Runtime: 0.50 GB
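Every card above pins context at 8,192 tokens, and the KV-cache line grows linearly with context length. A back-of-envelope formula for that line (the layer and head counts in the example are illustrative assumptions, not taken from any card):

```python
def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    # One K and one V vector per layer per token (hence the factor of 2);
    # bytes_per_elem=2 assumes an fp16 cache.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# Hypothetical small model: 16 layers, 8 KV heads, head_dim 64, 8K context
print(round(kv_cache_gb(16, 8, 64, 8192), 2))  # 0.27
```

Doubling the context to 16,384 doubles the result, which is why long-context runs eat headroom faster than the weights line suggests.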

What if you upgraded?

Hypothetical scenarios. We re-ran the compatibility engine for each.

Move up an Apple memory tier

~$200–400 over base

On Apple Silicon, more unified memory is the only upgrade path: VRAM and system RAM are the same pool, and it is fixed at purchase.
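The tier check itself is simple to sketch. The per-model minimums below are copied from this page's "Won't run" list; the 8 GB OS overhead is inferred from its "120 GB available" figure, and the 192 GB tier in the example is hypothetical:

```python
# Minimum unified memory (GB) at smallest quant, from the "Won't run" list.
WONT_RUN_MIN_GB = {
    "Qwen 3 235B-A22B": 160,
    "GLM-5": 140,
    "DeepSeek R1 (671B reasoning)": 420,
    "DeepSeek V3 (671B MoE)": 420,
    "Kimi K2.6": 700,
}

def newly_runnable(total_gb, os_overhead_gb=8):
    """Models from the list that would fit at a given memory tier."""
    usable = total_gb - os_overhead_gb
    return [m for m, need in WONT_RUN_MIN_GB.items() if need <= usable]

print(newly_runnable(192))  # ['Qwen 3 235B-A22B', 'GLM-5']
```

At the current 128 GB tier the list is empty, which is exactly why those five models appear under "Won't run".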


Won't run: top 5 popular models

These need more memory than you have. Shown for orientation.

Qwen 3 235B-A22B
235B
qwen
Commercial OK

Needs ~160 GB unified memory minimum at smallest quant; you have 120 GB available after OS overhead.

DeepSeek R1 (671B reasoning)
671B
deepseek
Commercial OK

Needs ~420 GB unified memory minimum at smallest quant; you have 120 GB available after OS overhead.

GLM-5
200B
other
Commercial OK

Needs ~140 GB unified memory minimum at smallest quant; you have 120 GB available after OS overhead.

DeepSeek V3 (671B MoE)
671B
deepseek
Commercial OK

Needs ~420 GB unified memory minimum at smallest quant; you have 120 GB available after OS overhead.

Kimi K2.6
1000B
other
Commercial OK

Needs ~700 GB unified memory minimum at smallest quant; you have 120 GB available after OS overhead.

How to read these numbers

M (Measured): we ran this exact combo on owner hardware.

~ (Extrapolated): predicted from a measured benchmark on similar-bandwidth hardware.

E (Estimated): pure formula based on VRAM bandwidth and model architecture.
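For the "E" rows, a common first-order estimate (not necessarily the exact formula used here; see the methodology link) is that decode speed is bound by how fast the weights stream from memory: tok/s ≈ effective bandwidth ÷ bytes read per token. A sketch, assuming the M4 Max's published ~546 GB/s unified-memory bandwidth and an adjustable efficiency factor:

```python
M4_MAX_BANDWIDTH_GBPS = 546  # published unified-memory bandwidth

def est_tok_per_s(weights_gb, kv_gb=0.0, efficiency=1.0):
    """First-order decode estimate: each generated token re-reads the
    weights (and, roughly, the KV cache) from memory."""
    bytes_per_token_gb = weights_gb + kv_gb
    return M4_MAX_BANDWIDTH_GBPS * efficiency / bytes_per_token_gb

# Llama 3.2 1B Q8_0 weights are 1.06 GB; ignoring KV reads:
print(round(est_tok_per_s(1.06)))  # 515
```

That lands in the neighborhood of the 492 tok/s shown for that card; real engines fall below the ideal, which is what the efficiency factor is for.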

Full methodology →

Want a specific benchmark we don't have? Email benchmarks@runlocalai.co and we'll prioritize it.