Exo
Overview
Personal AI cluster software. Auto-discovers Apple Silicon devices on a LAN and shards a model across them via pipeline + tensor parallelism on top of MLX. The 2026 unlock: Thunderbolt 5 + macOS 26.2 RDMA dropped inter-device latency by ~99%, making consumer-Mac clusters credible — DeepSeek V3 671B runs at 5.37 tok/s on 8x M4 Pro Mac Minis. The default answer for 'I have several Macs and want to run a frontier model.'
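Once the cluster is up, you talk to it like any OpenAI-style chat-completions server on the node you query. A minimal client sketch, assuming the default local port (52415) and a model id your cluster has already pulled; both are assumptions, so check your node's startup output for the actual values:

```python
# Query an Exo node through its OpenAI-style chat endpoint.
# Port and model id are assumptions -- confirm them against your node's logs.
import json
import urllib.request

ENDPOINT = "http://localhost:52415/v1/chat/completions"  # assumed default port

payload = {
    "model": "deepseek-v3",  # placeholder id; use a model your cluster actually serves
    "messages": [{"role": "user", "content": "Summarise pipeline parallelism in one sentence."}],
    "temperature": 0.7,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```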
Stack & relationships
How Exo relates to other entries in the catalog — recommended pairings, alternatives, dependencies, and edges to avoid. Each edge carries a one-line operator note from our editorial team.
Recommended stack
- Pairs with MLX-LM
Exo is how you scale MLX-LM beyond a single Mac. The 2026 unlock — Thunderbolt 5 + macOS 26.2 RDMA — makes the cluster credible for serious models.
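For context, the single-Mac baseline that Exo scales out looks like plain MLX-LM. A minimal sketch, assuming the mlx-lm package is installed; the model id is an example from the mlx-community hub and the exact generate() keywords can vary by version:

```python
# Single-device baseline: MLX-LM on one Apple Silicon Mac.
# Model id is illustrative; swap in any MLX-format model you have locally.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # assumed model id
text = generate(
    model,
    tokenizer,
    prompt="Explain tensor parallelism briefly.",
    max_tokens=128,
)
print(text)
```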
Alternatives
- Alternative to Petals
Petals shards over WAN volunteers; Exo shards over a controlled LAN cluster. Same architectural shape (pipeline parallel across machines), opposite trust models — public swarm vs personal devices.
- Alternative to vLLM
Different hardware target. vLLM = NVIDIA/Linux datacenter; Exo = Apple Silicon LAN cluster. Pick by which hardware you already own.
- Competes with Petals
Both do multi-machine inference; Exo runs over a controlled LAN with strong privacy, while Petals runs over WAN volunteers with none. Pick by trust model and the hardware you already have.
- Alternative to Hyperspace (P2P inference network)
Different consumer-multi-machine paths. Exo is Apple Silicon LAN clustering; Hyperspace targets WAN P2P. Pick by hardware and trust model.
Depends on
- Depends on MLX-LM
Exo runs MLX under the hood for the per-device inference layer. Pipeline-parallel scheduling is Exo; the actual matmul kernels are MLX.
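A toy sketch of that split, not Exo's internals: the scheduler's job is to assign contiguous layer ranges to stages and hand activations from one stage to the next, while each stage's math is ordinary MLX. Stage count, dimensions, and the ReLU stand-in for a transformer block are all illustrative assumptions:

```python
# Toy illustration (not Exo internals): pipeline parallelism splits a layer stack
# into contiguous stages; each stage's matmuls run through MLX on its own device.
import mlx.core as mx

def make_layers(n_layers: int, dim: int):
    # Random square weight matrices standing in for transformer blocks.
    return [mx.random.normal((dim, dim)) * 0.02 for _ in range(n_layers)]

def split_into_stages(layers, n_stages: int):
    # Contiguous partition -- the kind of assignment a pipeline scheduler produces.
    per_stage = (len(layers) + n_stages - 1) // n_stages
    return [layers[i : i + per_stage] for i in range(0, len(layers), per_stage)]

def run_stage(hidden, stage_layers):
    # The per-device work: plain MLX matmuls (with a ReLU between layers).
    for w in stage_layers:
        hidden = mx.maximum(hidden @ w, 0.0)
    return hidden

layers = make_layers(n_layers=8, dim=256)
stages = split_into_stages(layers, n_stages=2)  # pretend: one stage per Mac

x = mx.random.normal((1, 256))
for stage in stages:
    # In a real cluster, this hand-off is where activations cross Thunderbolt/RDMA.
    x = run_stage(x, stage)

mx.eval(x)
print(x.shape)
```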
Featured in this stack
The L3 execution stacks that pick this tool as a recommended component, with the one-line note explaining the role it plays in each.
- Stack: Build a Mac-native AI stack (May 2026) · L3 · Workstation tier · Role: Distributed serving (multi-Mac cluster)
Exo is what makes multi-Mac credible in 2026: auto-discovers nearby Apple Silicon devices on the LAN, shards models across them via pipeline parallel on top of MLX. Thunderbolt 5 + macOS 26.2 RDMA cuts inter-device latency by ~99%, turning consumer-Mac clusters into a real serving option.
Pros
- Auto-discovers nearby devices, no cluster manager required
- RDMA over Thunderbolt 5 makes inter-Mac latency nearly local
- Runs 670B-class models on consumer hardware that can't fit them otherwise
Cons
- Apple-Silicon-first; Linux/CUDA path is secondary
- Thunderbolt 5 RDMA requires specific Macs (M4 Pro+, macOS 26.2+)
- Not a production-serving solution — designed for personal clusters
Compatibility
| Operating systems | macOS · Linux |
| GPU backends | Apple Metal · NVIDIA CUDA |
| License | Open source · free (GPL-3.0) |
Get Exo
- GitHub: github.com/exo-explore/exo
Frequently asked
Is Exo free?
Yes. Exo is open-source software (GPL-3.0) and free to run.
What operating systems does Exo support?
macOS is the primary target; the Thunderbolt 5 + macOS 26.2 RDMA path requires M4 Pro or later. A Linux path exists but is secondary.
Which GPUs work with Exo?
Apple Silicon GPUs via Metal are the first-class backend; NVIDIA GPUs via CUDA are supported as a secondary path.
Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.