08. Apple Silicon Deep Dive
Apple Silicon delivers strong local AI performance through tight hardware-software integration. Understanding its unified memory architecture is essential, because memory capacity and bandwidth largely determine which models fit and how fast they run.
Current Apple Silicon Lineup
| Chip | CPU Cores | GPU Cores | Unified Memory | Neural Engine |
|---|---|---|---|---|
| M2 | 8 (4P+4E) | 8-10 | Up to 24GB | 16-core |
| M2 Pro | 10-12 (up to 8P+4E) | 16-19 | Up to 32GB | 16-core |
| M3 | 8 (4P+4E) | 8-10 | Up to 24GB | 16-core |
| M3 Pro | 11-12 (up to 6P+6E) | 14-18 | Up to 36GB | 16-core |
| M3 Max | 14-16 (up to 12P+4E) | 30-40 | Up to 128GB | 16-core |
| M4 | 8-10 (up to 4P+6E) | 8-10 | Up to 32GB | 16-core (38 TOPS) |
| M4 Pro | 12-14 (up to 10P+4E) | 16-20 | Up to 64GB | 16-core (38 TOPS) |
| M4 Max | 14-16 (up to 12P+4E) | 32-40 | Up to 128GB | 16-core (38 TOPS) |
Unified Memory Architecture
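Unlike systems with a discrete NVIDIA GPU, where VRAM is separate from system RAM, Apple Silicon uses a single memory pool shared by the CPU, GPU, and Neural Engine, so the GPU can work on model weights without copying them across a PCIe bus. Bandwidth is set by the chip tier (the width of its memory interface), not by how much memory is installed, and Apple quotes a single figure shared by all on-chip units:
| Chip | Unified Memory Bandwidth |
|---|---|
| M2, M3 | 100 GB/s |
| M4 | 120 GB/s |
| M3 Pro | 150 GB/s |
| M2 Pro | 200 GB/s |
| M4 Pro | 273 GB/s |
| M3 Max (30-core GPU) | 300 GB/s |
| M3 Max (40-core GPU) | 400 GB/s |
| M4 Max (32-core GPU) | 410 GB/s |
| M4 Max (40-core GPU) | 546 GB/s |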
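Even the M3 Max's 400 GB/s is well below data center GPUs, whose HBM delivers several TB/s. The advantage of the 128GB M3 Max is capacity: most of that memory is addressable by the GPU, far more than the VRAM on any consumer NVIDIA card.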
Performance Benchmarks
Running Llama 3 8B via llama.cpp with Metal backend:
| Device | Backend | Tokens/sec |
|---|---|---|
| M2 MacBook Air 24GB | Metal | 18-22 |
| M3 Pro MacBook Pro 36GB | Metal | 35-40 |
| M3 Max MacBook Pro 128GB | Metal | 75-85 |
| M4 Max MacBook Pro 128GB | Metal | 95-110 |
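Single-stream token generation is usually memory-bandwidth-bound: each new token requires reading essentially all of the model's weights from memory, so a rough ceiling is bandwidth divided by model size. The sketch below uses a hypothetical helper, `max_decode_tokens_per_sec`, Apple's published bandwidth figures, and an assumed ~4.9 GB model (roughly an 8B model in a 4-bit llama.cpp format). It ignores KV-cache traffic and compute limits, so real throughput lands below the ceiling; a measured figure well above it suggests the model was quantized more aggressively than assumed.

```python
def max_decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on single-stream generation speed.

    Assumes every generated token streams all model weights from memory once,
    so tokens/sec <= bandwidth / bytes read per token. Ignores KV-cache reads,
    compute limits, and caching effects.
    """
    return bandwidth_gb_s / model_size_gb


# Assumption: ~4.9 GB of weights, roughly an 8B model at 4-bit quantization.
MODEL_GB = 4.9

# Published unified memory bandwidth per chip, in GB/s.
CHIPS = [
    ("M2", 100),
    ("M3 Pro", 150),
    ("M3 Max (40-core GPU)", 400),
    ("M4 Max (40-core GPU)", 546),
]

for chip, bandwidth in CHIPS:
    ceiling = max_decode_tokens_per_sec(bandwidth, MODEL_GB)
    print(f"{chip}: <= {ceiling:.0f} tokens/sec")
```

Doubling bandwidth doubles the ceiling, which is why the Max-tier chips pull ahead far more than their GPU core counts alone would suggest.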
GPU Cores and AI Workloads
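For LLM inference, GPU core count matters less than memory: the model has to fit in unified memory, and token generation is limited mainly by memory bandwidth. Approximate memory requirements:
- 7B models at FP16: 14-16 GB (roughly 4-5 GB at 4-bit quantization)
- 13B models at FP16: 26 GB or more (roughly 7-8 GB at 4-bit quantization)
- 24GB of unified memory comfortably holds 7B at FP16 or 13B at 4-bit; 13B at FP16 needs a larger configuration (in practice 36GB or more), since macOS and other apps also need part of unified memory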
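The base M2 is constrained for larger models by its memory ceiling (24GB) and bandwidth (100 GB/s) rather than by its 10-core GPU. M3 Pro (up to 36GB) and Max-tier chips provide more headroom.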
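The requirements above follow from simple arithmetic: weight memory is roughly parameter count times bits per weight divided by 8. The sketch below uses a hypothetical helper, `weight_footprint_gb`, and counts weights only; KV cache, activations, and runtime buffers come on top. Real 4-bit formats also store per-block scales, so llama.cpp's Q4_0, for example, works out to about 4.5 bits per weight.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    simplifies to params_billion * bits / 8.
    Excludes KV cache, activations, and framework overhead.
    """
    return params_billion * bits_per_weight / 8


FORMATS = (("FP16", 16), ("8-bit", 8), ("Q4_0 (~4.5 bits/weight)", 4.5))

for params in (7, 13):
    for label, bits in FORMATS:
        print(f"{params}B {label}: ~{weight_footprint_gb(params, bits):.1f} GB")
```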
Power Efficiency
Running Mistral 7B on battery:
- M2 MacBook Air: 8W average, 4-6 hours
- M3 MacBook Pro 14": 12W average, 8-10 hours
An NVIDIA laptop GPU delivering similar throughput would typically draw 50W or more.
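Battery runtime at a steady load is simply capacity divided by average whole-system draw. The sketch below uses a hypothetical helper, `runtime_hours`, and assumes battery capacities from Apple's published specs (52.6 Wh for the 13-inch MacBook Air M2, 70 Wh for the 14-inch MacBook Pro M3) and that the quoted wattages are whole-system averages. Under those assumptions, 12W on a 70 Wh battery gives under 6 hours, so an 8-10 hour figure would require lower average draw, for example inference running only part of the time.

```python
def runtime_hours(battery_wh: float, avg_watts: float) -> float:
    """Hours until empty if the battery drains at a constant average draw."""
    return battery_wh / avg_watts


# Assumed capacities from Apple's published specs, in watt-hours.
MACBOOK_AIR_M2_WH = 52.6
MACBOOK_PRO_14_M3_WH = 70.0

print(f"M2 Air at 8 W: {runtime_hours(MACBOOK_AIR_M2_WH, 8):.1f} h")      # 6.6 h
print(f"M3 Pro 14in at 12 W: {runtime_hours(MACBOOK_PRO_14_M3_WH, 12):.1f} h")  # 5.8 h
```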
Compare the cost-per-GB of unified memory across M3 Pro (36GB), M3 Max (64GB), and M3 Max (128GB) configurations. Calculate which provides best value for running 13B models.
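The comparison can be scripted once current prices are in hand. This is a minimal sketch with hypothetical names (`MacConfig`, `cost_per_gb`, `fits`, `cheapest_that_fits`, `report`). It assumes the GPU can use about 75% of unified memory, an illustrative figure rather than a documented constant, and takes the model size from the caller (about 26 GB for a 13B model at FP16, or roughly 7-8 GB at 4-bit). No prices are included; supply current ones.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical assumption: by default the GPU can use only part of unified
# memory (the OS keeps the rest); 0.75 is an illustrative value.
GPU_FRACTION = 0.75


@dataclass
class MacConfig:
    name: str
    memory_gb: int
    price: float  # supply a current price yourself; nothing here is looked up


def cost_per_gb(cfg: MacConfig) -> float:
    """Price divided by installed unified memory."""
    return cfg.price / cfg.memory_gb


def fits(cfg: MacConfig, model_gb: float, gpu_fraction: float = GPU_FRACTION) -> bool:
    """True if the model fits in the GPU-usable share of unified memory."""
    return cfg.memory_gb * gpu_fraction >= model_gb


def cheapest_that_fits(configs: Sequence[MacConfig], model_gb: float) -> Optional[MacConfig]:
    """Lowest total price among configurations that can hold the model."""
    candidates = [c for c in configs if fits(c, model_gb)]
    return min(candidates, key=lambda c: c.price, default=None)


def report(configs: Sequence[MacConfig], model_gb: float) -> None:
    """Print cost per GB and whether each configuration can hold the model."""
    for c in configs:
        status = "fits" if fits(c, model_gb) else "too small"
        print(f"{c.name}: {cost_per_gb(c):.2f} per GB, {status}")
```

Lowest cost per GB and lowest total price among configurations that fit can point to different machines; which counts as best value depends on whether you expect to move to larger models later.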