MLX: Memory pressure detected — consider reducing batch size
Cause
Environment: Apple Silicon running mlx-lm batch generation, fine-tuning, or RAG embedding.
Severity: low to medium — not fatal, but throughput collapses when macOS starts swapping.
- macOS detects unified-memory pressure (yellow or red in Activity Monitor's Memory Pressure graph)
- MLX's allocator hasn't reached its memory limit yet, but the OS is already compressing pages and preparing to swap
- Background services (Spotlight indexing, Time Machine backups) competing for memory
- MLX's buffer cache holding on to memory from arrays that are no longer in use
- Batch size × sequence length × hidden dim (activations plus KV cache) exceeds the memory that is actually free
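To see why the last cause bites so quickly, here is a rough KV-cache estimate. It assumes one K and one V vector of width `hidden_dim` per token, per layer, per sequence (no grouped-query attention, which would shrink this) and 2-byte fp16/bf16 entries. The 32-layer, 4096-wide shape is a hypothetical 7B-class example, not a specific model.

```python
def kv_cache_bytes(batch_size: int, seq_len: int, num_layers: int,
                   hidden_dim: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache size: K and V (factor 2) for every token, layer and sequence.

    Assumes no grouped-query attention and fp16/bf16 entries by default.
    """
    return 2 * num_layers * batch_size * seq_len * hidden_dim * bytes_per_elem


GIB = 1024 ** 3

# Hypothetical 7B-class shape: 32 layers, hidden dim 4096, 2048-token sequences.
for batch in (32, 8):
    size = kv_cache_bytes(batch, seq_len=2048, num_layers=32, hidden_dim=4096)
    print(f"batch={batch:>2}: {size / GIB:.1f} GiB of KV cache")
```

The cache scales linearly with batch size, so cutting the batch from 32 to 8 cuts this term by 4×, before counting weights or activations.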
Solution
1. Reduce batch size first (the most direct fix):
import mlx_lm
# Was: batch_size=32 (the batching entry point and argument name vary between mlx-lm versions)
mlx_lm.generate(model, tokenizer, prompts, batch_size=8)
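If your mlx-lm version has no batch-size knob, you can get the same effect by splitting the prompt list yourself and handing each slice to whatever batched generation call you use. A minimal sketch; `chunked` is a hypothetical helper, not part of mlx-lm.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


prompts = [f"prompt {i}" for i in range(20)]
# 20 prompts become batches of 8, 8 and 4 instead of one batch of 20.
for batch in chunked(prompts, 8):
    print(len(batch))
```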
2. Set MLX's memory limit explicitly so MLX reaches its own limit before macOS starts swapping:
import mlx.core as mx
# Cap at 75% of physical RAM (e.g. 24 GB on a 32 GB Mac)
mx.metal.set_memory_limit(int(0.75 * 32 * 1024**3))
mx.metal.set_cache_limit(0)  # don't keep freed buffers cached: more room for live arrays, slower reallocation
# Newer MLX releases also expose these as mx.set_memory_limit / mx.set_cache_limit
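Hardcoding 32 GB breaks when the script moves to a different Mac. A sketch that derives the 75% budget from the machine's physical RAM instead, assuming Python exposes the POSIX `SC_PAGE_SIZE` and `SC_PHYS_PAGES` sysconf names (it does on macOS and Linux). The function names are illustrative.

```python
import os


def physical_ram_bytes() -> int:
    """Total physical memory via POSIX sysconf (macOS and Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def memory_limit_bytes(total_bytes: int, fraction: float = 0.75) -> int:
    """MLX budget: a fraction of RAM, leaving the rest for macOS and other apps."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(fraction * total_bytes)


limit = memory_limit_bytes(physical_ram_bytes())
print(f"MLX memory limit: {limit / 1024**3:.1f} GiB")
```

On the Mac, pass `limit` to `mx.metal.set_memory_limit` in place of the hardcoded value.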
3. Free the cache after each batch:
import gc
import mlx.core as mx
import mlx_lm

for batch in batches:
    out = mlx_lm.generate(model, tokenizer, batch, ...)
    mx.metal.clear_cache()  # release cached buffers back to the system
    gc.collect()  # drop lingering Python references
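The loop above skips the cleanup if a batch raises. A sketch of the same pattern with `try`/`finally`, so the cache is cleared even after a failed batch. `run_batches` is a hypothetical helper; it takes the generation and cache-clearing calls as arguments so it has no MLX dependency of its own.

```python
import gc
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_batches(batches: Iterable[T],
                generate_fn: Callable[[T], R],
                clear_cache_fn: Callable[[], None]) -> List[R]:
    """Run generate_fn per batch and release cached memory after each one.

    The finally block runs even if generate_fn raises, so a failed batch
    doesn't leave the next attempt starting from a full cache.
    """
    results: List[R] = []
    for batch in batches:
        try:
            results.append(generate_fn(batch))
        finally:
            clear_cache_fn()
            gc.collect()
    return results
```

On the Mac you would pass something like `lambda b: mlx_lm.generate(model, tokenizer, b, ...)` and `mx.metal.clear_cache`.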
4. Watch macOS memory pressure live:
vm_stat 1  # one line per second; watch the free, wired and swapouts columns
# Or: open Activity Monitor → Memory → Memory Pressure graph
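To log pressure from inside a long job, you can parse one-shot `vm_stat` output. A sketch, assuming the format recent macOS versions print (a "page size of N bytes" header, then `Label: count.` lines, some labels quoted); labels may differ between releases. `parse_vm_stat` is a hypothetical helper. The counters are cumulative since boot, so a rising `Swapouts` value between two samples means the system is actively swapping.

```python
import re
import subprocess
import sys
from typing import Dict, Tuple

_HEADER = re.compile(r"page size of (\d+) bytes")
_LINE = re.compile(r'^"?([^:"]+)"?:\s+(\d+)\.?$')


def parse_vm_stat(text: str) -> Tuple[int, Dict[str, int]]:
    """Parse one-shot `vm_stat` output into (page_size, {label: count})."""
    header = _HEADER.search(text)
    if header is None:
        raise ValueError("not vm_stat output: page-size header missing")
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return int(header.group(1)), counts


if __name__ == "__main__" and sys.platform == "darwin":
    raw = subprocess.run(["vm_stat"], capture_output=True, text=True, check=True).stdout
    page_size, stats = parse_vm_stat(raw)
    free_gib = stats.get("Pages free", 0) * page_size / 1024**3
    print(f"free: {free_gib:.2f} GiB, swapouts since boot: {stats.get('Swapouts', 0)}")
```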
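5. Keep long jobs from being interrupted by sleep (this does not reduce memory pressure: macOS has no supported switch to turn swap off, and kern.maxvnodes sizes the file vnode cache, not swap):
caffeinate -dimsu mlx_lm.generate ...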
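If you launch jobs from a Python driver script, a small wrapper can add the same `caffeinate` prefix on macOS and leave the command untouched elsewhere. A sketch; `caffeinated` is a hypothetical helper, and the default flags simply mirror the command above.

```python
import shutil
import sys
from typing import List, Sequence


def caffeinated(cmd: Sequence[str], flags: str = "-dimsu") -> List[str]:
    """Prefix cmd with macOS `caffeinate` so the machine stays awake until it exits.

    Returns cmd unchanged on platforms where caffeinate isn't available.
    """
    if sys.platform == "darwin" and shutil.which("caffeinate"):
        return ["caffeinate", flags, *cmd]
    return list(cmd)
```

Usage: `subprocess.run(caffeinated(["mlx_lm.generate", ...]), check=True)`.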
6. Bigger picture: Apple Silicon swaps to a fast SSD, but that SSD is still 10-50× slower than RAM. Once inference starts swapping, throughput collapses. Size the workload so Memory Pressure stays green.
Alternative solutions
On a 16 GB Mac, treat the warning as fatal: swap will dominate, and effective tok/s drops below CPU-only inference. Move the workload to a Mac with ≥ 32 GB of unified memory, or to a Linux machine with a discrete GPU.
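Before moving hardware, a quick check of whether the weights alone fit can help. A rough sketch: weight memory is parameters × bits ÷ 8, compared against the same 75% budget used above. It ignores KV cache, activations and quantization scale overhead, so a "fits" result is optimistic. The 7B size and the helper names are hypothetical.

```python
GB = 10 ** 9


def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Memory for model weights alone: params × bits / 8."""
    return num_params * bits_per_weight // 8


def fits(needed_bytes: int, ram_bytes: int, fraction: float = 0.75) -> bool:
    """True if the estimate stays under `fraction` of RAM."""
    return needed_bytes <= fraction * ram_bytes


ram = 16 * GB
for bits in (16, 8, 4):
    w = weight_bytes(7_000_000_000, bits)
    print(f"7B @ {bits:>2}-bit: {w / GB:.1f} GB, fits in 75% of 16 GB: {fits(w, ram)}")
```

At 16-bit a 7B model needs about 14 GB for weights, which already exceeds a 12 GB budget on a 16 GB Mac; 8-bit and 4-bit leave headroom for the KV cache.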