What can Apple Mac Studio (M3 Ultra) run for reasoning?

Build: Apple Mac Studio (M3 Ultra) · 32 GB RAM (macOS)

Memory: 32 GB unified memory
Runner: llama.cpp (Metal)

Runs comfortably
21 models

Ranked by fit for the reasoning use case plus predicted speed. Each entry includes its VRAM breakdown.

#1 Gemma 3 1B
1B · gemma · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 10.0 GB · Headroom: 14.0 GB
ollama run gemma3:1b
976 tok/s (E)
Weights 0.60 GB · KV cache 0.50 GB · Activations 8.22 GB · Runtime 0.70 GB
#2 Llama 3.2 1B Instruct
1B · llama · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 10.5 GB · Headroom: 13.5 GB
ollama run llama3.2:1b
554 tok/s (E)
Weights 1.06 GB · KV cache 0.50 GB · Activations 8.25 GB · Runtime 0.70 GB
#3 Gemma 3n E2B (Effective 2B)
2B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 12.1 GB · Headroom: 11.9 GB
ollama run gemma3n:e2b
277 tok/s (E)
Weights 2.13 GB · KV cache 1.00 GB · Activations 8.30 GB · Runtime 0.70 GB
#4 Phi-3.5 Vision
4.2B · phi · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 13.7 GB · Headroom: 10.3 GB
232 tok/s (E)
Weights 2.54 GB · KV cache 2.10 GB · Activations 8.32 GB · Runtime 0.70 GB
#5 Llama 3.1 Nemotron Nano 8B
8B · llama · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 18.0 GB · Headroom: 6.0 GB
122 tok/s (E)
Weights 4.83 GB · KV cache 4.00 GB · Activations 8.43 GB · Runtime 0.70 GB
#6 Llama 3.2 3B Instruct
3B · llama · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 13.7 GB · Headroom: 10.3 GB
ollama run llama3.2:3b
185 tok/s (E)
Weights 3.19 GB · KV cache 1.50 GB · Activations 8.35 GB · Runtime 0.70 GB
#7 Phi-4 Reasoning 14B
14B · phi · Commercial OK
Quant: Q4_K_M · Context: 2,048 · VRAM: 13.4 GB · Headroom: 10.6 GB
ollama run phi4-reasoning:14b
70 tok/s (E)
Weights 8.45 GB · KV cache 1.75 GB · Activations 2.47 GB · Runtime 0.70 GB
#8 DeepSeek R1 Distill Qwen 14B
14B · deepseek · Commercial OK
Quant: Q4_K_M · Context: 2,048 · VRAM: 13.4 GB · Headroom: 10.6 GB
ollama run deepseek-r1:14b
70 tok/s (E)
Weights 8.45 GB · KV cache 1.75 GB · Activations 2.47 GB · Runtime 0.70 GB
#9 Phi-3.5 Mini Instruct
3.8B · phi · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 15.0 GB · Headroom: 9.0 GB
ollama run phi3.5:3.8b
146 tok/s (E)
Weights 4.04 GB · KV cache 1.90 GB · Activations 8.39 GB · Runtime 0.70 GB
#10 CodeGemma 7B
7B · gemma · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 16.8 GB · Headroom: 7.2 GB
ollama run codegemma:7b
139 tok/s (E)
Weights 4.23 GB · KV cache 3.50 GB · Activations 8.40 GB · Runtime 0.70 GB
#11 Gemma 3n E4B (Effective 4B)
4B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 15.4 GB · Headroom: 8.6 GB
ollama run gemma3n:e4b
139 tok/s (E)
Weights 4.25 GB · KV cache 2.00 GB · Activations 8.40 GB · Runtime 0.70 GB
#12 Qwen 3 4B
4B · qwen · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 15.4 GB · Headroom: 8.6 GB
ollama run qwen3:4b
139 tok/s (E)
Weights 4.25 GB · KV cache 2.00 GB · Activations 8.40 GB · Runtime 0.70 GB
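Every comfortable-fit entry above follows the same arithmetic: total VRAM is weights plus KV cache plus activations plus runtime overhead, and headroom is the usable budget minus that total. A minimal sketch, assuming the 24 GB budget (32 GB minus ~8 GB OS overhead) implied by the "won't run" notes below; this mirrors the table's numbers, not the site's actual engine:

```python
def vram_fit(weights_gb, kv_cache_gb, activations_gb,
             runtime_gb=0.70, budget_gb=24.0):
    """Total VRAM use and remaining headroom against the usable budget."""
    total = weights_gb + kv_cache_gb + activations_gb + runtime_gb
    return total, budget_gb - total

# Gemma 3 1B at Q4_K_M, 8,192-token context (values from the #1 entry):
total, headroom = vram_fit(0.60, 0.50, 8.22)
print(f"{total:.1f} GB VRAM, {headroom:.1f} GB headroom")  # 10.0 GB, 14.0 GB
```

Plugging in any other entry's breakdown reproduces its listed VRAM and headroom figures.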

Runs with tradeoffs
14 models

Tight VRAM, partial CPU offload, or context-limited.
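One lever for a tight fit is shrinking the context window. The KV cache values in this report scale linearly with context length, and every listed figure matches roughly 0.5 GB per billion parameters at an 8,192-token context. A sketch of that apparent rule (an observation about the table's numbers, not the site's published formula):

```python
def kv_cache_gb(params_b, context, gb_per_b_at_8k=0.5):
    """KV cache estimate: linear in both parameter count and context length."""
    return params_b * gb_per_b_at_8k * (context / 8192)

print(kv_cache_gb(14, 2048))   # Phi-4 Reasoning 14B at 2,048 ctx -> 1.75 GB
print(kv_cache_gb(12, 8192))   # Gemma 3 12B at 8,192 ctx -> 6.0 GB
```

This is why the two 14B entries in the ranked list are run at a 2,048-token context: quartering the context quarters their KV cache.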

DeepSeek R1 Distill Qwen 7B
7B · deepseek · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 20.2 GB · Headroom: 3.8 GB
  • Tight VRAM fit — only 3.8 GB headroom left for context growth
ollama run deepseek-r1:7b
79 tok/s (E)
Weights 7.44 GB · KV cache 3.50 GB · Activations 8.56 GB · Runtime 0.70 GB
Llama 3.2 11B Vision Instruct
11B · llama · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 21.4 GB · Headroom: 2.6 GB
  • Tight VRAM fit — only 2.6 GB headroom left for context growth
ollama run llama3.2-vision:11b
89 tok/s (E)
Weights 6.64 GB · KV cache 5.50 GB · Activations 8.52 GB · Runtime 0.70 GB
Gemma 3 12B
12B · gemma · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 22.5 GB · Headroom: 1.5 GB
  • Tight VRAM fit — only 1.5 GB headroom left for context growth
ollama run gemma3:12b
81 tok/s (E)
Weights 7.25 GB · KV cache 6.00 GB · Activations 8.55 GB · Runtime 0.70 GB
Pixtral 12B
12B · mistral · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 22.5 GB · Headroom: 1.5 GB
  • Tight VRAM fit — only 1.5 GB headroom left for context growth
ollama run pixtral:12b
81 tok/s (E)
Weights 7.25 GB · KV cache 6.00 GB · Activations 8.55 GB · Runtime 0.70 GB
Mistral Nemo 12B Instruct
12B · mistral · Commercial OK
Quant: Q5_K_M · Context: 8,192 · VRAM: 23.6 GB · Headroom: 0.4 GB
  • Tight VRAM fit — only 0.4 GB headroom left for context growth
ollama run mistral-nemo:12b
71 tok/s (E)
Weights 8.25 GB · KV cache 6.00 GB · Activations 8.60 GB · Runtime 0.70 GB
Qwen 3 8B
8B · qwen · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 21.8 GB · Headroom: 2.2 GB
  • Tight VRAM fit — only 2.2 GB headroom left for context growth
ollama run qwen3:8b
69 tok/s (E)
Weights 8.50 GB · KV cache 4.00 GB · Activations 8.62 GB · Runtime 0.70 GB
Hermes 3 Llama 3.1 8B
8B · hermes · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 21.8 GB · Headroom: 2.2 GB
  • Tight VRAM fit — only 2.2 GB headroom left for context growth
ollama run hermes3:8b
69 tok/s (E)
Weights 8.50 GB · KV cache 4.00 GB · Activations 8.62 GB · Runtime 0.70 GB
Gemma 2 9B Instruct
9B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 23.4 GB · Headroom: 0.6 GB
  • Tight VRAM fit — only 0.6 GB headroom left for context growth
ollama run gemma2:9b
62 tok/s (E)
Weights 9.56 GB · KV cache 4.50 GB · Activations 8.67 GB · Runtime 0.70 GB

What if you upgraded?

Hypothetical scenarios. We re-ran the compatibility engine for each.

Move up an Apple memory tier

~$200–400 over base

On Apple Silicon, more unified memory is the only path forward — VRAM and system RAM are the same pool.


Won't run
top 5 popular models

Need more memory than you have. Shown for orientation.

Qwen 3 235B-A22B
235B
qwen
Commercial OK

Needs ~160 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

Llama 4 Scout
109B
llama
Commercial OK

Needs ~80 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

DeepSeek R1 (671B reasoning)
671B
deepseek
Commercial OK

Needs ~420 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

Qwen 3 30B-A3B
30B
qwen
Commercial OK

Needs ~22 GB unified memory for the weights alone at the smallest quant; with KV cache, activations, and runtime overhead it exceeds the 24 GB available after OS overhead.

Llama 3.3 70B Instruct
70B
llama
Commercial OK

Needs ~48 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.
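The "needs ~X GB minimum" figures above are consistent with simple weight-size arithmetic: parameter count times bits per weight, divided by 8, plus some fixed overhead. The 5-bit quant and 2 GB overhead below are illustrative assumptions, not the engine's actual settings:

```python
def min_memory_gb(params_b, bits_per_weight=5.0, overhead_gb=2.0):
    """Back-of-envelope minimum memory: weight bytes plus fixed overhead."""
    return params_b * bits_per_weight / 8 + overhead_gb

# Compare against the listed minimums (~48 GB and ~420 GB respectively):
for name, params in [("Llama 3.3 70B", 70), ("DeepSeek R1 671B", 671)]:
    print(f"{name}: ~{min_memory_gb(params):.0f} GB")
```

Mixture-of-experts entries such as Qwen 3 235B-A22B still need the full parameter count resident in memory, which is why the minimum tracks 235B rather than the 22B active parameters.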

How to read these numbers

M
Measured — we ran this exact combo on owner hardware.

~
Extrapolated — predicted from a measured benchmark on similar-bandwidth hardware.

E
Estimated — pure formula based on VRAM bandwidth and model architecture.
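For memory-bound token generation, that kind of formula is typically speed ≈ effective memory bandwidth divided by the weight bytes read per token. Dividing the estimated speeds in this report by each model's weight size gives an effective bandwidth near 590 GB/s (the M3 Ultra's nominal ~819 GB/s times an efficiency factor); the constant below is inferred from the table, not taken from the published methodology:

```python
def estimated_tok_s(weights_gb, effective_bw_gbs=588.0):
    """Memory-bound decode estimate: bandwidth / bytes of weights per token."""
    return effective_bw_gbs / weights_gb

print(round(estimated_tok_s(0.60)))  # Gemma 3 1B: table lists 976 tok/s
print(round(estimated_tok_s(8.45)))  # Phi-4 Reasoning 14B: table lists 70 tok/s
```

This is also why the Q8_0 variants are roughly half the speed of Q4_K_M at the same parameter count: twice the weight bytes per token means half the tokens per second.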

Full methodology →

Want a specific benchmark we don't have? Email benchmarks@runlocalai.co and we'll prioritize it.