Exo
Overview
Personal AI cluster software. Auto-discovers Apple Silicon devices on a LAN and shards a model across them via pipeline + tensor parallelism on top of MLX. The 2026 unlock: Thunderbolt 5 + macOS 26.2 RDMA dropped inter-device latency by ~99%, making consumer-Mac clusters credible — DeepSeek V3 671B runs at 5.37 tok/s on 8x M4 Pro Mac Minis. The default answer for 'I have several Macs and want to run a frontier model.'
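Once the cluster is up, you talk to it like any OpenAI-style chat-completions server on the node you query. A minimal client sketch, assuming the default local port (52415) and a model id your cluster has already pulled; both are assumptions, so check your node's startup output for the actual values:

```python
# Query an Exo node through its OpenAI-style chat endpoint.
# Port and model id are assumptions -- confirm them against your node's logs.
import json
import urllib.request

ENDPOINT = "http://localhost:52415/v1/chat/completions"  # assumed default port

payload = {
    "model": "deepseek-v3",  # placeholder id; use a model your cluster actually serves
    "messages": [{"role": "user", "content": "Summarise pipeline parallelism in one sentence."}],
    "temperature": 0.7,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```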
Stack & relationships
How Exo relates to other entries in the catalog — recommended pairings, alternatives, dependencies, and edges to avoid. Each edge carries a one-line operator note from our editorial team.
Recommended stack
- Pairs with MLX-LM
Exo is how you scale MLX-LM beyond a single Mac. The 2026 unlock — Thunderbolt 5 + macOS 26.2 RDMA — makes the cluster credible for serious models.
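For context, the single-Mac baseline that Exo scales out looks like plain MLX-LM. A minimal sketch, assuming the mlx-lm package is installed; the model id is an example from the mlx-community hub and the exact generate() keywords can vary by version:

```python
# Single-device baseline: MLX-LM on one Apple Silicon Mac.
# Model id is illustrative; swap in any MLX-format model you have locally.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # assumed model id
text = generate(
    model,
    tokenizer,
    prompt="Explain tensor parallelism briefly.",
    max_tokens=128,
)
print(text)
```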
Alternatives
- Alternative to Petals
Petals shards over WAN volunteers; Exo shards over a controlled LAN cluster. Same architectural shape (pipeline parallel across machines), opposite trust models — public swarm vs personal devices.
- Alternative to vLLM
Different hardware target. vLLM = NVIDIA/Linux datacenter; Exo = Apple Silicon LAN cluster. Pick by which hardware you already own.
- Competes with Petals
Both do multi-machine inference; Exo runs over a controlled LAN with strong privacy, while Petals runs over WAN volunteers with none. Pick by trust model and the hardware you already have.
- Alternative to Hyperspace (P2P inference network)
Different consumer-multi-machine paths. Exo is Apple Silicon LAN clustering; Hyperspace targets WAN P2P. Pick by hardware and trust model.
Depends on
- Depends on MLX-LM
Exo runs MLX under the hood for the per-device inference layer. Pipeline-parallel scheduling is Exo; the actual matmul kernels are MLX.
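A toy sketch of that split, not Exo's internals: the scheduler's job is to assign contiguous layer ranges to stages and hand activations from one stage to the next, while each stage's math is ordinary MLX. Stage count, dimensions, and the ReLU stand-in for a transformer block are all illustrative assumptions:

```python
# Toy illustration (not Exo internals): pipeline parallelism splits a layer stack
# into contiguous stages; each stage's matmuls run through MLX on its own device.
import mlx.core as mx

def make_layers(n_layers: int, dim: int):
    # Random square weight matrices standing in for transformer blocks.
    return [mx.random.normal((dim, dim)) * 0.02 for _ in range(n_layers)]

def split_into_stages(layers, n_stages: int):
    # Contiguous partition -- the kind of assignment a pipeline scheduler produces.
    per_stage = (len(layers) + n_stages - 1) // n_stages
    return [layers[i : i + per_stage] for i in range(0, len(layers), per_stage)]

def run_stage(hidden, stage_layers):
    # The per-device work: plain MLX matmuls (with a ReLU between layers).
    for w in stage_layers:
        hidden = mx.maximum(hidden @ w, 0.0)
    return hidden

layers = make_layers(n_layers=8, dim=256)
stages = split_into_stages(layers, n_stages=2)  # pretend: one stage per Mac

x = mx.random.normal((1, 256))
for stage in stages:
    # In a real cluster, this hand-off is where activations cross Thunderbolt/RDMA.
    x = run_stage(x, stage)

mx.eval(x)
print(x.shape)
```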
Featured in this stack
The L3 execution stacks that pick this tool as a recommended component, with the one-line note explaining the role it plays in each.
- Stack: Build a Mac-native AI stack (May 2026) · L3 · Workstation tier · Role: Distributed serving (multi-Mac cluster)
Exo is what makes multi-Mac credible in 2026: auto-discovers nearby Apple Silicon devices on the LAN, shards models across them via pipeline parallel on top of MLX. Thunderbolt 5 + macOS 26.2 RDMA cuts inter-device latency by ~99%, turning consumer-Mac clusters into a real serving option.
Pros
- Auto-discovers nearby devices, no cluster manager required
- RDMA over Thunderbolt 5 makes inter-Mac latency nearly local
- Runs 670B-class models on consumer hardware that can't fit them otherwise
Cons
- Apple-Silicon-first; Linux/CUDA path is secondary
- Thunderbolt 5 RDMA requires specific Macs (M4 Pro+, macOS 26.2+)
- Not a production-serving solution — designed for personal clusters
Compatibility
| Operating systems | macOS · Linux |
| GPU backends | Apple Metal · NVIDIA CUDA |
| License | Open source · free (GPL-3.0) |
Get Exo
- GitHub: github.com/exo-explore/exo
Frequently asked
Is Exo free?
Yes. Exo is open-source software (GPL-3.0) and free to run.
What operating systems does Exo support?
macOS is the primary target; the Thunderbolt 5 + macOS 26.2 RDMA path requires M4 Pro or later. A Linux path exists but is secondary.
Which GPUs work with Exo?
Apple Silicon GPUs via Metal are the first-class backend; NVIDIA GPUs via CUDA are supported as a secondary path.
Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.