RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
Will it run? / Apple Mac Studio (M3 Ultra) / coding

What can Apple Mac Studio (M3 Ultra) run for coding?

Build: Apple Mac Studio (M3 Ultra) + — + 32 GB RAM (windows)

Memory: 32 GB unified memory
Runner: llama.cpp (Metal)
AnyChatCodingAgentsReasoningVisionLong contextCreative

Runs comfortably
138 models

Ranked by fit for coding use case + predicted speed. Click a row for VRAM breakdown.

#1CodeGemma 7B
7B
gemma
Commercial OK
Quant: Q4_K_MContext: 8,192VRAM: 8.6 GBHeadroom: 15.4 GB
ollama run codegemma:7b
117
tok/s
Estimated
Weights
4.23 GB
KV cache
3.50 GB
Activations
0.22 GB
Runtime
0.70 GB
Model details →
#2DeepSeek Coder V2 Lite (16B)
16B
deepseek
Commercial OK
Quant: Q4_K_MContext: 8,192VRAM: 18.9 GBHeadroom: 5.1 GB
ollama run deepseek-coder-v2:16b
51
tok/s
Estimated
Weights
9.66 GB
KV cache
8.00 GB
Activations
0.49 GB
Runtime
0.70 GB
Model details →
#3Codestral 22B
22B
mistral
Quant: Q4_K_MContext: 2,048VRAM: 17.4 GBHeadroom: 6.6 GB
ollama run codestral:22b
37
tok/s
Estimated
Weights
13.28 GB
KV cache
2.75 GB
Activations
0.67 GB
Runtime
0.70 GB
Model details →
#4Qwen 2.5 7B Instruct
7B
qwen
Commercial OK
Quant: Q8_0Context: 8,192VRAM: 9.0 GBHeadroom: 15.0 GB
ollama run qwen2.5:7b
67
tok/s
Estimated
Weights
7.44 GB
KV cache
0.47 GB
Activations
0.38 GB
Runtime
0.70 GB
Model details →
#5Qwen 3 8B
8B
qwen
Commercial OK
Quant: Q8_0Context: 8,192VRAM: 13.6 GBHeadroom: 10.4 GB
ollama run qwen3:8b
58
tok/s
Estimated
Weights
8.50 GB
KV cache
4.00 GB
Activations
0.43 GB
Runtime
0.70 GB
Model details →
#6Mistral Small 3 24B
24B
mistral
Commercial OK
Quant: Q4_K_MContext: 2,048VRAM: 18.9 GBHeadroom: 5.1 GB
ollama run mistral-small:24b
34
tok/s
Estimated
Weights
14.49 GB
KV cache
3.00 GB
Activations
0.73 GB
Runtime
0.70 GB
Model details →
#7Llama 3.1 8B Instruct
8B
llama
Commercial OK
Quant: FP16Context: 8,192VRAM: 18.6 GBHeadroom: 5.4 GB
ollama run llama3.1:8b
31
tok/s
Estimated
Weights
16.00 GB
KV cache
1.07 GB
Activations
0.81 GB
Runtime
0.70 GB
Model details →
#8StarCoder 2 3B
3B
other
Commercial OK
Quant: Q4_K_MContext: 8,192VRAM: 4.1 GBHeadroom: 19.9 GB
274
tok/s
Estimated
Weights
1.81 GB
KV cache
1.50 GB
Activations
0.10 GB
Runtime
0.70 GB
Model details →
#9Qwen 2.5 Coder 3B
3B
qwen
Commercial OK
Quant: Q4_K_MContext: 8,192VRAM: 4.1 GBHeadroom: 19.9 GB
274
tok/s
Estimated
Weights
1.81 GB
KV cache
1.50 GB
Activations
0.10 GB
Runtime
0.70 GB
Model details →
#10StarCoder 2 7B
7B
other
Commercial OK
Quant: Q4_K_MContext: 8,192VRAM: 8.6 GBHeadroom: 15.4 GB
117
tok/s
Estimated
Weights
4.23 GB
KV cache
3.50 GB
Activations
0.22 GB
Runtime
0.70 GB
Model details →
#11Codestral Mamba 7B
7B
mistral
Commercial OK
Quant: Q4_K_MContext: 8,192VRAM: 8.6 GBHeadroom: 15.4 GB
117
tok/s
Estimated
Weights
4.23 GB
KV cache
3.50 GB
Activations
0.22 GB
Runtime
0.70 GB
Model details →
#12Gervásio 8B PTPT
8B
llama
Commercial OK
Quant: Q4_K_MContext: 4,096VRAM: 7.8 GBHeadroom: 16.2 GB
103
tok/s
Estimated
Weights
4.83 GB
KV cache
2.00 GB
Activations
0.25 GB
Runtime
0.70 GB
Model details →

Runs with tradeoffs
20 models

Tight VRAM, partial CPU offload, or context-limited.

Qwen 2.5 Coder 32B Instruct
32B
qwen
Commercial OK
Quant: Q4_K_MContext: 8,192VRAM: 23.1 GBHeadroom: 0.9 GB
  • • Tight VRAM fit — only 0.9 GB headroom left for context growth
ollama run qwen2.5-coder:32b
26
tok/s
Estimated
Weights
19.32 GB
KV cache
2.15 GB
Activations
0.97 GB
Runtime
0.70 GB
Model details →
Qwen 3 30B-A3B
30B
qwen
Commercial OK
Quant: Q4_K_MContext: 2,048VRAM: 23.5 GBHeadroom: 0.5 GB
  • • Tight VRAM fit — only 0.5 GB headroom left for context growth
ollama run qwen3:30b
27
tok/s
Estimated
Weights
18.11 GB
KV cache
3.75 GB
Activations
0.91 GB
Runtime
0.70 GB
Model details →
Qwen 3 14B
14B
qwen
Commercial OK
Quant: Q8_0Context: 8,192VRAM: 23.3 GBHeadroom: 0.7 GB
  • • Tight VRAM fit — only 0.7 GB headroom left for context growth
ollama run qwen3:14b
33
tok/s
Estimated
Weights
14.88 GB
KV cache
7.00 GB
Activations
0.75 GB
Runtime
0.70 GB
Model details →
Qwen 2.5 14B Instruct
14B
qwen
Commercial OK
Quant: Q8_0Context: 8,192VRAM: 23.3 GBHeadroom: 0.7 GB
  • • Tight VRAM fit — only 0.7 GB headroom left for context growth
ollama run qwen2.5:14b
33
tok/s
Estimated
Weights
14.88 GB
KV cache
7.00 GB
Activations
0.75 GB
Runtime
0.70 GB
Model details →
Nemotron 3 Nano (30B-A3B)
30B
other
Commercial OK
Quant: Q4_K_MContext: 2,048VRAM: 23.5 GBHeadroom: 0.5 GB
  • • Tight VRAM fit — only 0.5 GB headroom left for context growth
ollama run nemotron3:nano
27
tok/s
Estimated
Weights
18.11 GB
KV cache
3.75 GB
Activations
0.91 GB
Runtime
0.70 GB
Model details →
Sarvam 30B
30B
other
Commercial OK
Quant: Q4_K_MContext: 2,048VRAM: 23.5 GBHeadroom: 0.5 GB
  • • Tight VRAM fit — only 0.5 GB headroom left for context growth
27
tok/s
Estimated
Weights
18.11 GB
KV cache
3.75 GB
Activations
0.91 GB
Runtime
0.70 GB
Model details →
GPT-OSS Swallow 20B RL v0.1
20B
other
Commercial OK
Quant: Q4_K_MContext: 8,192VRAM: 23.4 GBHeadroom: 0.6 GB
  • • Tight VRAM fit — only 0.6 GB headroom left for context growth
41
tok/s
Estimated
Weights
12.07 GB
KV cache
10.00 GB
Activations
0.61 GB
Runtime
0.70 GB
Model details →
Sarvam M
24B
mistral
Commercial OK
Quant: Q4_K_MContext: 4,096VRAM: 21.9 GBHeadroom: 2.1 GB
  • • Tight VRAM fit — only 2.1 GB headroom left for context growth
34
tok/s
Estimated
Weights
14.49 GB
KV cache
6.00 GB
Activations
0.73 GB
Runtime
0.70 GB
Model details →

What if you upgraded?

Hypothetical scenarios. We re-ran the compatibility engine for each.

Move up an Apple memory tier

~$200–400 over base

On Apple Silicon, more unified memory is the only path forward — VRAM and system RAM are the same pool.

Shop this upgrade↗

Some links above are affiliate links. We may earn a commission at no extra cost to you. How we make money.

Won't run
top 5 popular models

Need more memory than you have. Shown for orientation.

DeepSeek V4 Pro (1.6T MoE)
1600B
deepseek
Commercial OK

Needs ~1024 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

—
Qwen 3.5 235B-A17B (MoE)
397B
qwen
Commercial OK

Needs ~256 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

—
Qwen 3 235B-A22B
235B
qwen
Commercial OK

Needs ~160 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

—
DeepSeek R1 (671B reasoning)
671B
deepseek
Commercial OK

Needs ~420 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

—
DeepSeek V4 Flash (284B MoE)
284B
deepseek
Commercial OK

Needs ~192 GB unified memory minimum at smallest quant; you have 24 GB available after OS overhead.

—

How to read these numbers

Measured here
Measured here - RunLocalAI ran this exact combo on owner hardware with public evidence.

Source-backed
Source-backed / community - a reproduced public source supports the speed, but it is not labeled as owner-measured.

Extrapolated
Extrapolated - predicted from a measured benchmark on similar-bandwidth hardware.

Estimated
Estimated - formula based on VRAM bandwidth and model architecture; not a benchmark row.

RunLocalAI Will-It-Run Framework →

Want a specific benchmark we don't have? Email Contact support and we'll prioritize it.