What can Apple Mac Studio (M3 Ultra) run locally?

300 models fit comfortably on Apple Mac Studio (M3 Ultra) with 256 GB system RAM. 0 more run with tradeoffs. Top pick: all-MiniLM-L6-v2.

What's the recommended runner for this hardware?

MLX-LM (Apple Metal). This is the runner that gives best tokens-per-second on Apple Mac Studio (M3 Ultra) based on macos support and the GPU backend.

What can Apple Mac Studio (M3 Ultra) run?

Build: Mac Studio M3 Ultra 256GB

Memory: 256 GB unified memory

Runner: MLX-LM (Apple Metal)

Any Chat Coding Agents Reasoning Vision Long context Creative

Runs comfortably
300 models

Full-VRAM resident, with room for context. No compromises.

#1all-MiniLM-L6-v2

0.022B

other

Commercial OK

Quant: Q4_K_MContext: 256VRAM: 0.5 GBHeadroom: 247.5 GB

33126

tok/s

Estimated

Weights

0.01 GB

KV cache

0.00 GB

Activations

0.00 GB

Runtime

0.50 GB

Model details →

#2Piper

0.025B

other

Commercial OK

Quant: Q4_K_MContext: 0VRAM: 0.5 GBHeadroom: 247.5 GB

29151

tok/s

Estimated

Weights

0.02 GB

KV cache

0.00 GB

Activations

0.00 GB

Runtime

0.50 GB

Model details →

#3Whisper Tiny

0.039B

other

Commercial OK

Quant: Q4_K_MContext: 30VRAM: 0.5 GBHeadroom: 247.5 GB

18687

tok/s

Estimated

Weights

0.02 GB

KV cache

0.00 GB

Activations

0.00 GB

Runtime

0.50 GB

Model details →

#4Whisper Base

0.074B

other

Commercial OK

Quant: Q4_K_MContext: 30VRAM: 0.5 GBHeadroom: 247.5 GB

9848

tok/s

Estimated

Weights

0.04 GB

KV cache

0.00 GB

Activations

0.00 GB

Runtime

0.50 GB

Model details →

#5Kokoro 82M

0.082B

other

Commercial OK

Quant: Q4_K_MContext: 0VRAM: 0.6 GBHeadroom: 247.4 GB

8888

tok/s

Estimated

Weights

0.05 GB

KV cache

0.00 GB

Activations

0.00 GB

Runtime

0.50 GB

Model details →

#6all-mpnet-base-v2

0.109B

other

Commercial OK

Quant: Q4_K_MContext: 384VRAM: 0.6 GBHeadroom: 247.4 GB

6686

tok/s

Estimated

Weights

0.07 GB

KV cache

0.00 GB

Activations

0.00 GB

Runtime

0.50 GB

Model details →

#7paraphrase-multilingual-MiniLM-L12-v2

0.118B

other

Commercial OK

Quant: Q4_K_MContext: 128VRAM: 0.6 GBHeadroom: 247.4 GB

6176

tok/s

Estimated

Weights

0.07 GB

KV cache

0.00 GB

Activations

0.00 GB

Runtime

0.50 GB

Model details →

#8Nomic Embed Text v1.5

0.137B

other

Commercial OK

Quant: Q4_K_MContext: 8,192VRAM: 0.7 GBHeadroom: 247.3 GB

5320

tok/s

Estimated

Weights

0.08 GB

KV cache

0.07 GB

Activations

0.01 GB

Runtime

0.50 GB

Model details →

#9SmolLM2 135M Instruct

0.135B

other

Commercial OK

Quant: Q4_K_MContext: 8,192VRAM: 0.7 GBHeadroom: 247.3 GB

5398

tok/s

Estimated

Weights

0.08 GB

KV cache

0.07 GB

Activations

0.01 GB

Runtime

0.50 GB

Model details →

#10GTE ModernBERT Base

0.149B

other

Commercial OK

Quant: Q4_K_MContext: 8,192VRAM: 0.7 GBHeadroom: 247.3 GB

4891

tok/s

Estimated

Weights

0.09 GB

KV cache

0.07 GB

Activations

0.01 GB

Runtime

0.50 GB

Model details →

#11Whisper Small

0.244B

other

Commercial OK

Quant: Q4_K_MContext: 30VRAM: 0.7 GBHeadroom: 247.3 GB

2987

tok/s

Estimated

Weights

0.15 GB

KV cache

0.00 GB

Activations

0.01 GB

Runtime

0.50 GB

Model details →

#12Gemma 3 270M

0.27B

gemma

Commercial OK

Quant: Q4_K_MContext: 8,192VRAM: 0.8 GBHeadroom: 247.2 GB

2699

tok/s

Estimated

Weights

0.16 GB

KV cache

0.14 GB

Activations

0.02 GB

Runtime

0.50 GB

Model details →

What if you upgraded?

Hypothetical scenarios. We re-ran the compatibility engine for each.

Move up an Apple memory tier

~$200–400 over base

On Apple Silicon, more unified memory is the only path forward — VRAM and system RAM are the same pool.

Shop this upgrade↗

Some links above are affiliate links. We may earn a commission at no extra cost to you. How we make money.

Won't run
top 5 popular models

Need more memory than you have. Shown for orientation.

DeepSeek V4 Pro (1.6T MoE)

1600B

deepseek

Commercial OK

Needs ~1024 GB unified memory minimum at smallest quant; you have 248 GB available after OS overhead.

—

Qwen 3.5 235B-A17B (MoE)

397B

qwen

Commercial OK

Needs ~256 GB unified memory minimum at smallest quant; you have 248 GB available after OS overhead.

—

DeepSeek R1 (671B reasoning)

671B

deepseek

Commercial OK

Needs ~420 GB unified memory minimum at smallest quant; you have 248 GB available after OS overhead.

—

Mistral Medium 3.5 (675B MoE)

675B

mistral

Needs ~448 GB unified memory minimum at smallest quant; you have 248 GB available after OS overhead.

—

DeepSeek V3 (671B MoE)

671B

deepseek

Commercial OK

Needs ~420 GB unified memory minimum at smallest quant; you have 248 GB available after OS overhead.

—

How to read these numbers

Measured here

Measured here - RunLocalAI ran this exact combo on owner hardware with public evidence.

Source-backed

Source-backed / community - a reproduced public source supports the speed, but it is not labeled as owner-measured.

Extrapolated

Extrapolated - predicted from a measured benchmark on similar-bandwidth hardware.

Estimated

Estimated - formula based on VRAM bandwidth and model architecture; not a benchmark row.

RunLocalAI Will-It-Run Framework →

Want a specific benchmark we don't have? Email Contact support and we'll prioritize it.

Runs comfortably300 models

What if you upgraded?

Move up an Apple memory tier

Won't runtop 5 popular models

How to read these numbers

Runs comfortably
300 models

Won't run
top 5 popular models