RUNLOCALAI

Out of memory
Verified by owner

Process killed (OOM killer) when loading large model

Killed
By Fredoline Eruo · Last verified Jun 12, 2026

Cause

On Linux, when the system runs out of memory, the kernel's OOM (out-of-memory) killer picks a process, usually the largest memory consumer, and terminates it with SIGKILL. The terse "Killed" output with no Python traceback is the giveaway: SIGKILL cannot be caught, so Python never gets a chance to raise or report an error.

Common scenario: loading a 70B model on a machine with 32 GB of RAM. The model file is about 40 GB at Q4, so it cannot fit in RAM during load.
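A quick pre-flight check can catch this before the kernel does. The sketch below compares a model file's size with the available RAM plus free swap reported in /proc/meminfo. The helper names `avail_kb` and `fits_in_memory` are hypothetical, and the file size is only a lower bound on what the runtime needs, since context buffers and runtime overhead come on top.

```shell
#!/usr/bin/env bash
# Sketch only: avail_kb and fits_in_memory are illustrative helper names.

# Print MemAvailable + SwapFree in kB from a meminfo-format file.
avail_kb() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^MemAvailable:/ {a = $2} /^SwapFree:/ {s = $2} END {printf "%.0f\n", a + s}' "$meminfo"
}

# Return 0 if a file of the given size in bytes fits in available RAM + swap.
fits_in_memory() {
  local model_bytes="$1" meminfo="${2:-/proc/meminfo}"
  local have_kb need_kb
  have_kb="$(avail_kb "$meminfo")"
  need_kb=$(( model_bytes / 1024 ))
  [ "$need_kb" -le "$have_kb" ]
}

# Usage (the path is a placeholder):
#   fits_in_memory "$(stat -c %s /path/to/model.gguf)" || echo "likely to be OOM-killed"
```

Passing the meminfo path as an optional argument keeps the helpers easy to try against a saved copy of /proc/meminfo from another machine.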

Solution

Confirm OOM was the cause:

sudo dmesg | tail -50
# Look for "Out of memory: Killed process X"
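If the kernel log is long, a small filter makes the victim easier to spot. This is a sketch, assuming the log line has the form "Out of memory: Killed process PID (name)"; `oom_victims` is a hypothetical helper name, and older kernels or cgroup-level kills may word the message differently.

```shell
# Sketch only: oom_victims is an illustrative helper name.
# Reads kernel log text on stdin and prints "PID name" for each OOM kill.
oom_victims() {
  sed -n 's/.*Out of memory: Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Usage:
#   sudo dmesg | oom_victims
#   sudo journalctl -k | oom_victims   # kernel messages on systemd systems
```

If the process name printed matches the tool you were running, the OOM killer is confirmed as the cause.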

Add swap (provides "soft" memory at disk speed):

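# Create a 32 GB swap file, restrict it to root, format it, and enable it.
sudo fallocate -l 32G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make the swap file persist across reboots.
echo '/swapfile swap swap defaults 0 0' | sudo tee -a /etc/fstab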

With swap in place, the first load can take 5-10 minutes, but it completes instead of being killed.
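Before retrying the load, it is worth confirming that the kernel actually sees the new swap. A minimal sketch, assuming a standard /proc/meminfo; `swap_total_gib` is a hypothetical helper name.

```shell
# Sketch only: swap_total_gib is an illustrative helper name.
# Prints total swap in GiB, read from a meminfo-format file.
swap_total_gib() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^SwapTotal:/ {printf "%.1f\n", $2 / 1024 / 1024}' "$meminfo"
}

# Usage:
#   swap_total_gib        # total swap the kernel currently reports
#   swapon --show         # lists each active swap area
#   free -h               # RAM and swap totals side by side
```

A value near zero after the swap commands usually means `swapon` failed, so check its output before trying the model again.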

Use a smaller quantization so the model fits in RAM:

# 70B Q4 ≈ 40 GB. 70B Q2 ≈ 26 GB. Quality drop is severe at Q2 — use only as fallback.
ollama pull llama3.3:70b-instruct-q3_K_M  # 31 GB, better quality than Q2
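The sizes above follow from simple arithmetic: parameters times effective bits per weight, divided by 8. The sketch below makes that explicit. `est_model_gb` is a hypothetical helper, and the bits-per-weight values in the usage lines are back-derived from the sizes quoted on this page, not taken from any official table; real quantized files mix precisions per tensor, so treat the results as rough.

```shell
# Sketch only: est_model_gb is an illustrative helper name.
# Args: parameter count in billions, effective bits per weight.
# Prints the approximate file size in GB (10^9 bytes).
est_model_gb() {
  local params_b="$1" bpw="$2"
  awk -v p="$params_b" -v b="$bpw" 'BEGIN {printf "%.1f\n", p * b / 8}'
}

# Usage (bits-per-weight values are assumptions derived from this page's sizes):
#   est_model_gb 70 4.57   # Q4-class
#   est_model_gb 70 3.54   # Q3-class
#   est_model_gb 70 2.97   # Q2-class
```

Compare the estimate with installed RAM and leave a few GB of headroom for the OS and the context cache.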

Add physical RAM. For 70B-class models the practical floor is 64 GB of system RAM, and 128 GB is comfortable. On Apple Silicon the CPU and GPU share one unified memory pool, so a machine with 128 GB of unified memory runs 70B without swap tricks.

Related errors

  • Ollama: model requires more system memory than is available
  • SGLang: RadixAttention KV cache overflow / out of memory
  • CUDA OOM that only happens at long context (KV cache blowup)
  • vLLM AsyncEngineDeadError after large batch / OOM
  • Out of memory specifically at long context lengths
