What can MacBook Pro 16" M4 Max run?

Build: MacBook Pro M4 Max, 128 GB

Memory: 128 GB unified memory
Runner: MLX-LM (Apple Metal)

Runs comfortably: 59 models

Fully VRAM-resident, with room for context. No compromises.

#1 Gemma 3 1B
1B · gemma · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 9.8 GB · Headroom: 110.2 GB
ollama run gemma3:1b
865 tok/s (E)
Weights: 0.60 GB · KV cache: 0.50 GB · Activations: 8.22 GB · Runtime: 0.50 GB
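Each card's VRAM figure is simply the sum of its four memory components, and headroom is measured against the usable pool after OS overhead (this page's "won't run" notes put that at 120 GB of the 128 GB total). A minimal sketch of that arithmetic, using the Gemma 3 1B numbers above:

```python
def card_totals(weights_gb, kv_gb, act_gb, runtime_gb, usable_gb=120.0):
    """Sum a card's memory components and compute headroom.

    usable_gb is the unified memory left after OS overhead
    (128 GB total minus roughly 8 GB, per this page).
    """
    vram = weights_gb + kv_gb + act_gb + runtime_gb
    return round(vram, 1), round(usable_gb - vram, 1)

# Gemma 3 1B card: 0.60 + 0.50 + 8.22 + 0.50
print(card_totals(0.60, 0.50, 8.22, 0.50))  # (9.8, 110.2)
```

The same function reproduces every card on this page; only the four component numbers change.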
#2 Llama 3.2 1B Instruct
1B · llama · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 10.3 GB · Headroom: 109.7 GB
ollama run llama3.2:1b
492 tok/s (E)
Weights: 1.06 GB · KV cache: 0.50 GB · Activations: 8.25 GB · Runtime: 0.50 GB
#3 Gemma 4 E2B (Effective 2B)
2B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 11.9 GB · Headroom: 108.1 GB
ollama run gemma4:e2b
246 tok/s (E)
Weights: 2.13 GB · KV cache: 1.00 GB · Activations: 8.30 GB · Runtime: 0.50 GB
#4 Llama 3.2 3B Instruct
3B · llama · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 13.5 GB · Headroom: 106.5 GB
ollama run llama3.2:3b
164 tok/s (E)
Weights: 3.19 GB · KV cache: 1.50 GB · Activations: 8.35 GB · Runtime: 0.50 GB
#5 Phi-3.5 Vision
4.2B · phi · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 13.5 GB · Headroom: 106.5 GB
206 tok/s (E)
Weights: 2.54 GB · KV cache: 2.10 GB · Activations: 8.32 GB · Runtime: 0.50 GB
#6 Phi-3.5 Mini Instruct
3.8B · phi · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 14.8 GB · Headroom: 105.2 GB
ollama run phi3.5:3.8b
129 tok/s (E)
Weights: 4.04 GB · KV cache: 1.90 GB · Activations: 8.39 GB · Runtime: 0.50 GB
#7 Gemma 4 E4B (Effective 4B)
4B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 15.2 GB · Headroom: 104.8 GB
ollama run gemma4:e4b
123 tok/s (E)
Weights: 4.25 GB · KV cache: 2.00 GB · Activations: 8.40 GB · Runtime: 0.50 GB
#8 Qwen 3 4B
4B · qwen · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 15.2 GB · Headroom: 104.8 GB
ollama run qwen3:4b
123 tok/s (E)
Weights: 4.25 GB · KV cache: 2.00 GB · Activations: 8.40 GB · Runtime: 0.50 GB
#9 Gemma 3 4B
4B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 15.2 GB · Headroom: 104.8 GB
ollama run gemma3:4b
123 tok/s (E)
Weights: 4.25 GB · KV cache: 2.00 GB · Activations: 8.40 GB · Runtime: 0.50 GB
#10 Llama 3.1 Nemotron Nano 8B
8B · llama · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 17.8 GB · Headroom: 102.2 GB
108 tok/s (E)
Weights: 4.83 GB · KV cache: 4.00 GB · Activations: 8.43 GB · Runtime: 0.50 GB
#11 Mistral 7B Instruct v0.3
7B · mistral · Commercial OK
Quant: Q5_K_M · Context: 8,192 · VRAM: 17.2 GB · Headroom: 102.8 GB
ollama run mistral:7b
109 tok/s (E)
Weights: 4.81 GB · KV cache: 3.50 GB · Activations: 8.43 GB · Runtime: 0.50 GB
#12 CodeGemma 7B
7B · gemma · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 16.6 GB · Headroom: 103.4 GB
ollama run codegemma:7b
124 tok/s (E)
Weights: 4.23 GB · KV cache: 3.50 GB · Activations: 8.40 GB · Runtime: 0.50 GB
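Every card above pins context at 8,192 tokens, and the KV-cache line grows linearly with context length. A back-of-envelope formula for that line (the layer and head counts in the example are illustrative assumptions, not taken from any card):

```python
def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    # One K and one V vector per layer per token (hence the factor of 2);
    # bytes_per_elem=2 assumes an fp16 cache.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# Hypothetical small model: 16 layers, 8 KV heads, head_dim 64, 8K context
print(round(kv_cache_gb(16, 8, 64, 8192), 2))  # 0.27
```

Doubling the context to 16,384 doubles the result, which is why long-context runs eat headroom faster than the weights line suggests.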

What if you upgraded?

Hypothetical scenarios. We re-ran the compatibility engine for each.

Move up an Apple memory tier

~$200–400 over base

On Apple Silicon, more unified memory is the only upgrade path: VRAM and system RAM are the same pool, and it is fixed at purchase.
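The tier check itself is simple to sketch. The per-model minimums below are copied from this page's "Won't run" list; the 8 GB OS overhead is inferred from its "120 GB available" figure, and the 192 GB tier in the example is hypothetical:

```python
# Minimum unified memory (GB) at smallest quant, from the "Won't run" list.
WONT_RUN_MIN_GB = {
    "Qwen 3 235B-A22B": 160,
    "GLM-5": 140,
    "DeepSeek R1 (671B reasoning)": 420,
    "DeepSeek V3 (671B MoE)": 420,
    "Kimi K2.6": 700,
}

def newly_runnable(total_gb, os_overhead_gb=8):
    """Models from the list that would fit at a given memory tier."""
    usable = total_gb - os_overhead_gb
    return [m for m, need in WONT_RUN_MIN_GB.items() if need <= usable]

print(newly_runnable(192))  # ['Qwen 3 235B-A22B', 'GLM-5']
```

At the current 128 GB tier the list is empty, which is exactly why those five models appear under "Won't run".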


Won't run: top 5 popular models

These need more memory than you have. Shown for orientation.

Qwen 3 235B-A22B
235B
qwen
Commercial OK

Needs ~160 GB unified memory minimum at smallest quant; you have 120 GB available after OS overhead.

DeepSeek R1 (671B reasoning)
671B
deepseek
Commercial OK

Needs ~420 GB unified memory minimum at smallest quant; you have 120 GB available after OS overhead.

GLM-5
200B
other
Commercial OK

Needs ~140 GB unified memory minimum at smallest quant; you have 120 GB available after OS overhead.

DeepSeek V3 (671B MoE)
671B
deepseek
Commercial OK

Needs ~420 GB unified memory minimum at smallest quant; you have 120 GB available after OS overhead.

Kimi K2.6
1000B
other
Commercial OK

Needs ~700 GB unified memory minimum at smallest quant; you have 120 GB available after OS overhead.

How to read these numbers

M (Measured): we ran this exact combo on owner hardware.

~ (Extrapolated): predicted from a measured benchmark on similar-bandwidth hardware.

E (Estimated): pure formula based on VRAM bandwidth and model architecture.
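For the "E" rows, a common first-order estimate (not necessarily the exact formula used here; see the methodology link) is that decode speed is bound by how fast the weights stream from memory: tok/s ≈ effective bandwidth ÷ bytes read per token. A sketch, assuming the M4 Max's published ~546 GB/s unified-memory bandwidth and an adjustable efficiency factor:

```python
M4_MAX_BANDWIDTH_GBPS = 546  # published unified-memory bandwidth

def est_tok_per_s(weights_gb, kv_gb=0.0, efficiency=1.0):
    """First-order decode estimate: each generated token re-reads the
    weights (and, roughly, the KV cache) from memory."""
    bytes_per_token_gb = weights_gb + kv_gb
    return M4_MAX_BANDWIDTH_GBPS * efficiency / bytes_per_token_gb

# Llama 3.2 1B Q8_0 weights are 1.06 GB; ignoring KV reads:
print(round(est_tok_per_s(1.06)))  # 515
```

That lands in the neighborhood of the 492 tok/s shown for that card; real engines fall below the ideal, which is what the efficiency factor is for.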

Full methodology →

Want a specific benchmark we don't have? Email benchmarks@runlocalai.co and we'll prioritize it.