What can MacBook Pro 16" M4 Max run locally?

21 models fit comfortably on MacBook Pro 16" M4 Max with 32 GB system RAM. 14 more run with tradeoffs. Top pick: Gemma 3 1B.

What's the recommended runner for this hardware?

llama.cpp (Metal). This is the runner that gives best tokens-per-second on MacBook Pro 16" M4 Max based on windows support and the GPU backend.

What can MacBook Pro 16" M4 Max run for coding?

Build: MacBook Pro 16" M4 Max + — + 32 GB RAM (windows)

Memory: 32 GB unified memory

Runner: llama.cpp (Metal)

Any Chat Coding Agents Reasoning Vision Long context Creative

Runs comfortably
21 models

Ranked by fit for coding use case + predicted speed. Click a row for VRAM breakdown.

#1Gemma 3 1B

gemma

Commercial OK

Quant: Q4_K_MContext: 8,192VRAM: 10.0 GBHeadroom: 14.0 GB

ollama run gemma3:1b

976

tok/s

Weights

0.60 GB

KV cache

0.50 GB

Activations

8.22 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#2Llama 3.2 1B Instruct

llama

Commercial OK

Quant: Q8_0Context: 8,192VRAM: 10.5 GBHeadroom: 13.5 GB

ollama run llama3.2:1b

554

tok/s

Weights

1.06 GB

KV cache

0.50 GB

Activations

8.25 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#3Gemma 4 E2B (Effective 2B)

gemma

Commercial OK

Quant: Q8_0Context: 8,192VRAM: 12.1 GBHeadroom: 11.9 GB

ollama run gemma4:e2b

277

tok/s

Weights

2.13 GB

KV cache

1.00 GB

Activations

8.30 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#4Phi-3.5 Vision

4.2B

phi

Commercial OK

Quant: Q4_K_MContext: 8,192VRAM: 13.7 GBHeadroom: 10.3 GB

232

tok/s

Weights

2.54 GB

KV cache

2.10 GB

Activations

8.32 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#5CodeGemma 7B

gemma

Commercial OK

Quant: Q4_K_MContext: 8,192VRAM: 16.8 GBHeadroom: 7.2 GB

ollama run codegemma:7b

139

tok/s

Weights

4.23 GB

KV cache

3.50 GB

Activations

8.40 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#6Llama 3.2 3B Instruct

llama

Commercial OK

Quant: Q8_0Context: 8,192VRAM: 13.7 GBHeadroom: 10.3 GB

ollama run llama3.2:3b

185

tok/s

Weights

3.19 GB

KV cache

1.50 GB

Activations

8.35 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#7DeepSeek Coder V2 Lite (16B)

16B

deepseek

Commercial OK

Quant: Q4_K_MContext: 2,048VRAM: 14.9 GBHeadroom: 9.1 GB

ollama run deepseek-coder-v2:16b

tok/s

Weights

9.66 GB

KV cache

2.00 GB

Activations

2.53 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#8Phi-3.5 Mini Instruct

3.8B

phi

Commercial OK

Quant: Q8_0Context: 8,192VRAM: 15.0 GBHeadroom: 9.0 GB

ollama run phi3.5:3.8b

146

tok/s

Weights

4.04 GB

KV cache

1.90 GB

Activations

8.39 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#9Qwen 2.5 7B Instruct

qwen

Commercial OK

Quant: Q8_0Context: 8,192VRAM: 17.2 GBHeadroom: 6.8 GB

ollama run qwen2.5:7b

tok/s

Weights

7.44 GB

KV cache

0.47 GB

Activations

8.56 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#10Gemma 4 E4B (Effective 4B)

gemma

Commercial OK

Quant: Q8_0Context: 8,192VRAM: 15.4 GBHeadroom: 8.6 GB

ollama run gemma4:e4b

139

tok/s

Weights

4.25 GB

KV cache

2.00 GB

Activations

8.40 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#11Qwen 3 4B

qwen

Commercial OK

Quant: Q8_0Context: 8,192VRAM: 15.4 GBHeadroom: 8.6 GB

ollama run qwen3:4b

139

tok/s

Weights

4.25 GB

KV cache

2.00 GB

Activations

8.40 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

#12Gemma 3 4B

gemma

Commercial OK

Quant: Q8_0Context: 8,192VRAM: 15.4 GBHeadroom: 8.6 GB

ollama run gemma3:4b

139

tok/s

Weights

4.25 GB

KV cache

2.00 GB

Activations

8.40 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

Runs with tradeoffs
14 models

Tight VRAM, partial CPU offload, or context-limited.

Qwen 3 8B

qwen

Commercial OK

Quant: Q8_0Context: 8,192VRAM: 21.8 GBHeadroom: 2.2 GB

• Tight VRAM fit — only 2.2 GB headroom left for context growth

ollama run qwen3:8b

tok/s

Weights

8.50 GB

KV cache

4.00 GB

Activations

8.62 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

Qwen 2.5 Coder 32B Instruct

32B

qwen

Commercial OK

Quant: Q4_K_MContext: 2,048VRAM: 23.6 GBHeadroom: 0.4 GB

• Tight VRAM fit — only 0.4 GB headroom left for context growth

ollama run qwen2.5-coder:32b

tok/s

Weights

19.32 GB

KV cache

0.54 GB

Activations

3.01 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

Mistral Small 3 24B

24B

mistral

Commercial OK

Quant: Q4_K_MContext: 2,048VRAM: 21.0 GBHeadroom: 3.0 GB

• Tight VRAM fit — only 3.0 GB headroom left for context growth

ollama run mistral-small:24b

tok/s

Weights

14.49 GB

KV cache

3.00 GB

Activations

2.77 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

Llama 3.2 11B Vision Instruct

11B

llama

Commercial OK

Quant: Q4_K_MContext: 8,192VRAM: 21.4 GBHeadroom: 2.6 GB

• Tight VRAM fit — only 2.6 GB headroom left for context growth

ollama run llama3.2-vision:11b

tok/s

Weights

6.64 GB

KV cache

5.50 GB

Activations

8.52 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

Gemma 3 12B

12B

gemma

Commercial OK

Quant: Q4_K_MContext: 8,192VRAM: 22.5 GBHeadroom: 1.5 GB

• Tight VRAM fit — only 1.5 GB headroom left for context growth

ollama run gemma3:12b

tok/s

Weights

7.25 GB

KV cache

6.00 GB

Activations

8.55 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

Pixtral 12B

12B

mistral

Commercial OK

Quant: Q4_K_MContext: 8,192VRAM: 22.5 GBHeadroom: 1.5 GB

• Tight VRAM fit — only 1.5 GB headroom left for context growth

ollama run pixtral:12b

tok/s

Weights

7.25 GB

KV cache

6.00 GB

Activations

8.55 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

DeepSeek R1 Distill Qwen 7B

deepseek

Commercial OK

Quant: Q8_0Context: 8,192VRAM: 20.2 GBHeadroom: 3.8 GB

• Tight VRAM fit — only 3.8 GB headroom left for context growth

ollama run deepseek-r1:7b

tok/s

Weights

7.44 GB

KV cache

3.50 GB

Activations

8.56 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

Mistral Nemo 12B Instruct

12B

mistral

Commercial OK

Quant: Q5_K_MContext: 8,192VRAM: 23.6 GBHeadroom: 0.4 GB

• Tight VRAM fit — only 0.4 GB headroom left for context growth

ollama run mistral-nemo:12b

tok/s

Weights

8.25 GB

KV cache

6.00 GB

Activations

8.60 GB

Runtime

0.70 GB

Model details →Run-on benchmark page →

What if you upgraded?

Hypothetical scenarios. We re-ran the compatibility engine for each.

Move up an Apple memory tier

~$200–400 over base

On Apple Silicon, more unified memory is the only path forward — VRAM and system RAM are the same pool.

Shop this upgrade↗

Some links above are affiliate links. We may earn a commission at no extra cost to you. How we make money.

Won't run
top 5 popular models

Need more memory than you have. Shown for orientation.

Qwen 3 235B-A22B

235B

qwen

Commercial OK

Needs ~160 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

—

Llama 4 Scout

109B

llama

Commercial OK

Needs ~80 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

—

DeepSeek R1 (671B reasoning)

671B

deepseek

Commercial OK

Needs ~420 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

—

Qwen 3 30B-A3B

30B

qwen

Commercial OK

Needs ~22 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

—

Llama 3.3 70B Instruct

70B

llama

Commercial OK

Needs ~48 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

—

How to read these numbers

Measured — we ran this exact combo on owner hardware.

Extrapolated — predicted from a measured benchmark on similar-bandwidth hardware.

Estimated — pure formula based on VRAM bandwidth and model architecture.

Full methodology →

Want a specific benchmark we don't have? Email benchmarks@runlocalai.co and we'll prioritize it.

Runs comfortably21 models

Runs with tradeoffs14 models

What if you upgraded?

Move up an Apple memory tier

Won't runtop 5 popular models

How to read these numbers

Runs comfortably
21 models

Runs with tradeoffs
14 models

Won't run
top 5 popular models