RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

Out of memory

Ollama: model requires more system memory than is available

Error: model requires more system memory than is available
By Fredoline Eruo · Last verified Jun 12, 2026

Cause

This is a system RAM shortage, not a VRAM out-of-memory error. Ollama has to bring the model file into system RAM (memory-mapped or copied) before any of it is transferred to VRAM. On a machine with little free RAM and a large model, that load step fails before the GPU is even involved.
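The comparison described above can be approximated by hand before loading anything. The following is a minimal sketch for Linux, assuming /proc/meminfo is available; the function names are hypothetical, and this is a rough heuristic based on file size only, not the check Ollama itself performs.

```shell
#!/usr/bin/env bash
# Rough pre-flight check (Linux): does a model file fit in available RAM?
# Heuristic only; it looks at file size and ignores any other memory
# the runtime needs on top of the weights.

# Print MemAvailable in KiB. The optional argument (default /proc/meminfo)
# exists only so the function can be exercised against a sample file.
mem_available_kib() {
  awk '/^MemAvailable:/ {print $2}' "${1:-/proc/meminfo}"
}

# Succeed (exit 0) if a file of size_bytes fits in avail_kib of RAM.
fits_in_ram() {
  local size_bytes=$1 avail_kib=$2
  [ "$size_bytes" -le $(( avail_kib * 1024 )) ]
}

# Hypothetical usage: pass the path of a model file as the first argument.
if [ -n "${1:-}" ] && [ -f "$1" ]; then
  size=$(wc -c < "$1")
  if fits_in_ram "$size" "$(mem_available_kib)"; then
    echo "file fits in available RAM"
  else
    echo "file is larger than available RAM"
  fi
fi
```

Run without arguments, the script only defines the two functions. Given a path to a model file, it prints one of the two messages.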

Solution

Check current system RAM usage:

# Linux
free -h
# macOS (reports pages; multiply by the page size shown on the first line)
vm_stat
# Windows PowerShell
Get-Counter '\Memory\Available MBytes'

Free system RAM by closing other apps. Browsers, IDEs, and Slack can easily use 4-8 GB between them.
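To see which apps are worth closing, resident memory can be summed per command name. This is a small sketch, assuming a ps that accepts -e and -o rss=,comm= (procps on Linux and the BSD ps on macOS both do); the helper name is hypothetical, and command names containing spaces are cut at the first word.

```shell
# Sum resident memory (RSS, reported by ps in KiB) per command name and
# print the largest consumers in MiB. Reads "rss command" lines on stdin.
# The optional argument is how many rows to show (default 10).
sum_rss_by_command() {
  awk '{ rss[$2] += $1 }
       END { for (c in rss) printf "%d %s\n", rss[c] / 1024, c }' |
    sort -rn | head -n "${1:-10}"
}

# Live usage: the trailing "=" on each column suppresses the header row.
if command -v ps >/dev/null 2>&1; then
  ps -eo rss=,comm= | sum_rss_by_command 10
fi
```

Each output line is a size in MiB followed by a command name, largest first. Processes that share a name (browser tabs, for example) are added together.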

Use a smaller quantization so the file is smaller on disk and in RAM:

# Q4_K_M instead of Q8_0: roughly half the file size
ollama pull llama3.1:8b-instruct-q4_K_M
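The size difference follows from simple arithmetic: file size is roughly the parameter count times bits per weight, divided by 8. The sketch below assumes a round 8 billion parameters and approximate figures of 8.5 bits per weight for Q8_0 and about 4.85 for Q4_K_M. Real files differ somewhat because of metadata and mixed-precision layers, and the helper name is hypothetical.

```shell
# Estimate model file size in GB:
#   parameters (in billions) x bits per weight / 8 bits per byte.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", p * b / 8 }'
}

estimate_gb 8 8.5    # Q8_0, assumed 8.5 bits per weight
estimate_gb 8 4.85   # Q4_K_M, assumed about 4.85 bits per weight
```

Under those assumptions the two calls print 8.50 and 4.85, so the Q4 file is a little over half the size of the Q8 one.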

Add swap (Linux):

sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Swap is far slower than RAM, so loading takes longer, but it can let the load complete.
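After enabling swap, it is worth confirming that the kernel actually sees it. A minimal sketch for Linux, assuming /proc/meminfo is readable; the function name is hypothetical.

```shell
# Print total swap in KiB from a meminfo-style file.
# The optional argument (default /proc/meminfo) allows testing on a sample.
swap_total_kib() {
  awk '/^SwapTotal:/ {print $2}' "${1:-/proc/meminfo}"
}

# Report whether any swap is active on this machine.
if [ -r /proc/meminfo ]; then
  total=$(swap_total_kib)
  if [ "${total:-0}" -gt 0 ]; then
    echo "swap active: ${total} KiB"
  else
    echo "no swap active"
  fi
fi

# A swap file enabled with swapon does not survive a reboot on its own.
# The usual /etc/fstab entry to make it persistent is:
#   /swapfile none swap sw 0 0
```

A 16G swap file should show up as 16777216 KiB, plus any swap that was already configured.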

Get more RAM. For local AI work, 32 GB RAM is the practical floor; 64 GB or 128 GB unlocks larger models with CPU offload.

Related errors

  • SGLang: RadixAttention KV cache overflow / out of memory
  • CUDA OOM that only happens at long context (KV cache blowup)
  • vLLM AsyncEngineDeadError after large batch / OOM
  • Process killed (OOM killer) when loading large model
  • Out of memory specifically at long context lengths

Did this fix it?

If your case was different, email support with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.