Out of memory

Process killed (OOM killer) when loading large model

Killed
By Fredoline Eruo · Last verified May 6, 2026

Cause

On Linux, the kernel's OOM (Out-Of-Memory) killer steps in when the system runs out of memory and terminates a process, usually the biggest memory consumer, to free RAM. The terse "Killed" output with no Python traceback is the giveaway: the process was killed from outside, so Python never got a chance to handle the error.

Common scenario: running a 70B model on a machine with 32 GB of RAM. The model file (~40 GB at Q4) has to fit in memory during load, and it can't.
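
Before reaching for swap, a quick back-of-envelope check tells you whether the model can fit at all. A minimal sketch using the 70B-Q4-on-32-GB numbers above (the ~2 GB runtime overhead is a rough assumption, not a measured figure):

```shell
# Rough fit check: model file size plus runtime overhead vs. system RAM.
model_gb=40      # 70B at Q4, per the article
ram_gb=32        # system RAM
overhead_gb=2    # KV cache and runtime overhead (rough assumption)

needed_gb=$((model_gb + overhead_gb))
if [ "$needed_gb" -gt "$ram_gb" ]; then
  echo "will not fit: need ~${needed_gb} GB, have ${ram_gb} GB of RAM"
fi
```

If the check fails, the options below (swap, smaller quantization, more RAM) are the ways out.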

Solution

Confirm OOM was the cause:

sudo dmesg | tail -50
# Look for "Out of memory: Killed process X"
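
If dmesg access is restricted, the kernel log is also available through journalctl (`journalctl -k` filters to kernel messages). The kill line looks roughly like the sample below; the PID, process name, and sizes here are illustrative, and exact fields vary by kernel version:

```shell
# Equivalent check on systemd systems:
#   journalctl -k | grep -i "out of memory"
# A typical OOM-killer log line (sample values, not from a real system):
line='Out of memory: Killed process 12345 (ollama) total-vm:41943040kB, anon-rss:33554432kB'
echo "$line" | grep -o 'Killed process [0-9]*'
```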

Add swap (provides "soft" memory at disk speed):

sudo fallocate -l 32G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile swap swap defaults 0 0' | sudo tee -a /etc/fstab

Loading takes 5-10 minutes the first time, but it works. Expect slower generation too, since whatever doesn't fit in RAM pages in from disk.
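
The 32G above is a generous choice. A rough sizing rule is to cover the shortfall between model size and RAM, plus a margin; the 8 GB margin below is an assumption, and erring larger only costs disk space:

```shell
# Rough swapfile sizing: shortfall between model and RAM, plus a margin.
model_gb=40     # 70B at Q4, per the article
ram_gb=32       # system RAM
margin_gb=8     # headroom for the OS and runtime (assumption)

shortfall_gb=$((model_gb - ram_gb + margin_gb))
echo "suggested swapfile size: ${shortfall_gb}G"
```

After `swapon`, `swapon --show` or `free -h` should list the new swap space.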

Use a smaller quantization so the model fits in RAM:

# 70B Q4 ≈ 40 GB. 70B Q2 ≈ 26 GB. Quality drop is severe at Q2 — use only as fallback.
ollama pull llama3.3:70b-instruct-q3_K_M  # 31 GB, better quality than Q2
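
The sizes above follow from a rule of thumb: file size in GB ≈ parameters (billions) × average bits per weight ÷ 8. The bits-per-weight figures below are approximate llama.cpp K-quant averages (assumptions; real files differ by a few GB from these estimates):

```shell
# Estimate GGUF file sizes for a 70B model at different quantizations.
# bits-per-weight is stored in tenths to keep the arithmetic integer-only.
params_b=70
for q in "Q4_K_M:48" "Q3_K_M:39" "Q2_K:30"; do   # ~4.8, ~3.9, ~3.0 bits/weight (approximate)
  name=${q%%:*}
  bpw_tenths=${q##*:}
  size_gb=$(( params_b * bpw_tenths / 80 ))      # params * (bpw/10) / 8
  echo "${name}: ~${size_gb} GB"
done
```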

Add physical RAM. For 70B-class models the practical floor is 64 GB system RAM. 128 GB is comfortable. Apple Silicon's unified memory bypasses this entirely — 128 GB unified runs 70B without swap tricks.

Did this fix it?

If your case was different, email hello@runlocalai.co with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.