Hardware & infrastructure

Metal (Apple)

Metal is Apple's low-level GPU programming framework and API, analogous to Vulkan on other platforms. For local AI operators, Metal enables GPU-accelerated inference on Apple Silicon (M-series) Macs and on Intel Macs with AMD GPUs. It is the backend that llama.cpp, MLX, and other runtimes use to offload model computation to the GPU, which directly affects tokens per second and the largest model that fits in GPU-accessible memory (on Apple Silicon, the unified memory shared with the CPU).

Deeper dive

Metal provides direct access to the GPU's compute units, allowing neural network operations (matrix multiplications, attention) to run efficiently. On Apple Silicon, Metal uses the unified memory architecture: the CPU and GPU share the same pool of RAM (e.g., 16 GB, 64 GB). This removes the PCIe transfer bottleneck seen with discrete GPUs, but it also means there is no separate VRAM. System RAM is the ceiling, and because macOS by default lets the GPU use only part of that RAM, the usable budget for model weights is somewhat below the installed total. Metal Performance Shaders (MPS) offer optimized kernels for common ML operations. In practice, llama.cpp's Metal backend runs the model's tensor operations as Metal compute kernels, reaching roughly 30-50 tok/s for 4-bit quantized 7B-13B models on M1/M2/M3 Max and Ultra chips. Operators must ensure their build includes Metal support (e.g., LLAMA_METAL=1).
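To make the unified-memory budget concrete, here is a minimal sketch of a fit check. The function name, the 70% GPU share, and the 1.5 GB allowance for KV cache and runtime buffers are illustrative assumptions, not values reported by Metal or llama.cpp; the real limit depends on the macOS version, the machine, and the context length.

```python
def fits_in_unified_memory(model_gb, total_ram_gb, gpu_fraction=0.70, overhead_gb=1.5):
    """Rough check: do the weights plus overhead fit in the GPU's share of RAM?

    gpu_fraction and overhead_gb are assumed, illustrative defaults.
    """
    gpu_budget_gb = total_ram_gb * gpu_fraction
    return model_gb + overhead_gb <= gpu_budget_gb


print(fits_in_unified_memory(8, 24))    # 8 GB model, 24 GB machine  -> True
print(fits_in_unified_memory(40, 128))  # 40 GB model, 128 GB machine -> True
print(fits_in_unified_memory(40, 32))   # 40 GB model, 32 GB machine  -> False
```

A check like this is only a first filter. If a model is close to the budget, shortening the context or choosing a smaller quantization is usually safer than relying on the estimate.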

Practical example

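On an M2 MacBook Air with 24 GB of unified memory, running llama.cpp with Metal enabled (./main -m model.gguf -ngl 99; the binary is named llama-cli in recent releases) loads all layers onto the GPU. A 13B Q4 model (about 8 GB) fits entirely and generates on the order of 10 tok/s. Token generation is memory-bandwidth bound, and the base M2's 100 GB/s bandwidth caps an 8 GB model at roughly 12 tok/s. Without Metal (CPU-only), the same model runs slower, at around 5 tok/s. On an M1 Ultra with 128 GB, a 70B Q4 model (about 40 GB) fits and reaches ~10 tok/s.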
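A quick way to sanity-check throughput figures like these is a bandwidth ceiling: when generating one token at a time, a dense model has to read roughly all of its weights from memory for each token. The sketch below assumes that simplification. The function name is hypothetical, and the bandwidth numbers are Apple's published peak figures (400 GB/s for Max-class M1/M2 chips, 800 GB/s for the M1 Ultra), not values from a benchmark.

```python
def decode_ceiling_tok_per_s(bandwidth_gb_s, model_gb):
    """Upper bound on single-stream decode speed.

    Assumes every generated token reads all model weights once,
    so real throughput will land below this number.
    """
    return bandwidth_gb_s / model_gb


print(decode_ceiling_tok_per_s(800, 40))  # M1 Ultra, 70B Q4 (~40 GB) -> 20.0
print(decode_ceiling_tok_per_s(400, 8))   # Max-class chip, 13B Q4 (~8 GB) -> 50.0
```

Measured speeds typically sit below the ceiling because of compute, KV-cache reads, and scheduling overhead, so a reported figure above the ceiling is a sign that the chip, model size, or quantization was different from what was assumed.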

Workflow example

When building llama.cpp from source, operators set LLAMA_METAL=1 to enable the Metal backend; this applies to older Makefile builds, while recent releases enable Metal by default on macOS. At runtime, the -ngl N flag offloads N layers to the GPU, and a value larger than the model's layer count (such as 99) offloads all of them. In LM Studio, selecting "Metal" as the backend in settings activates GPU acceleration. In MLX (Apple's ML framework), Metal is used automatically. Operators can verify Metal usage by checking for "ggml_metal_init" in the logs or by monitoring GPU utilization in Activity Monitor.
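The log check can be automated with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the sample log lines are placeholders written for illustration, not actual llama.cpp output. Only the "ggml_metal_init" marker comes from the text above.

```python
def metal_init_lines(log_text, marker="ggml_metal_init"):
    """Return the log lines that mention the Metal initialisation marker."""
    return [line for line in log_text.splitlines() if marker in line]


# Placeholder log text; real llama.cpp output contains different details.
sample_log = "\n".join([
    "loading model (placeholder line)",
    "ggml_metal_init: (placeholder, real output differs)",
])

matches = metal_init_lines(sample_log)
print("Metal backend initialised" if matches else "No Metal marker found")
```

In practice the captured stderr of a llama.cpp run would be passed in as log_text. An empty result suggests a CPU-only build or a run with no layers offloaded.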

Reviewed by Fredoline Eruo. See our editorial policy.
