Local AI errors & fixes
21 common errors when running AI locally — with verified causes and solutions. Paste your error message into Google and you should land on the right page.
Solutions tagged
Configuration (5 entries)
Token generation slows as conversation gets longer
(no error — tok/s drops from 50 to 5 as context fills)
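Why this happens: to generate each new token, the model reads its entire KV cache, so memory traffic per token grows linearly with context length. A rough sketch of that effect, using illustrative (roughly Llama-3-8B-like) dimensions; every number below is an assumption, not a measurement:

```python
def kv_bytes_read_per_token(context_len, n_layers=32, n_kv_heads=8,
                            head_dim=128, bytes_per_elem=2):
    """Bytes of K and V read to generate one token at a given context length.
    All default dimensions are illustrative, not tied to any specific model."""
    per_token_kv = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K + V
    return context_len * per_token_kv

short = kv_bytes_read_per_token(512)
long = kv_bytes_read_per_token(16384)
print(long / short)  # 32.0: 32x more memory traffic per generated token
```

Decode speed is usually memory-bandwidth-bound, so 32x the traffic roughly explains an order-of-magnitude tok/s drop as the context fills.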
Ollama: bind: address already in use (port 11434)
Error: listen tcp 127.0.0.1:11434: bind: address already in use
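This usually means another process (often a stray `ollama serve`) already owns port 11434. A minimal sketch of checking a port from Python before starting the server; the port number is Ollama's default, everything else is generic socket code:

```python
import errno
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; EADDRINUSE means some process already owns it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return False      # bind succeeded, so the port was free
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                return True   # another process (e.g. ollama serve) holds it
            raise

print(port_in_use(11434))
```

To find the owning process on Linux/macOS, `lsof -i :11434` (or `ss -ltnp` on Linux) names it; stopping that process frees the port.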
Ollama truncates input — default context length is only 2048
(no error — long inputs get silently truncated)
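The fix is to raise the context window explicitly. One way is per-request, via the `num_ctx` option in Ollama's REST API; a sketch of the request body (the model name and the 8192 value are illustrative, and a larger `num_ctx` costs more RAM/VRAM):

```python
import json

# Illustrative payload for POST http://localhost:11434/api/generate;
# "num_ctx" is Ollama's context-length option.
payload = {
    "model": "llama3",                      # illustrative model name
    "prompt": "Summarize this long document ...",
    "options": {"num_ctx": 8192},           # raise from the 2048 default
}
body = json.dumps(payload)
print(json.loads(body)["options"]["num_ctx"])  # 8192
```

The same option can be baked into a model permanently with a Modelfile line: `PARAMETER num_ctx 8192`.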
Ollama: connection refused on localhost:11434
Error: connect ECONNREFUSED 127.0.0.1:11434
Ollama: Error: model 'X' not found
Error: model 'X' not found, try pulling it first
Model format / GGUF (1 entry)
Tokenizer mismatches (2 entries)
Out of memory (5 entries)
Process killed (OOM killer) when loading large model
Killed
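A bare "Killed" with no traceback is the Linux kernel's OOM killer, not the program itself; `sudo dmesg | grep -i oom` confirms it. Before retrying, it helps to compare the model's size against available RAM. A minimal sketch that parses `MemAvailable` out of /proc/meminfo (Linux only):

```python
from pathlib import Path

def mem_available_gib(meminfo_text: str) -> float:
    """Parse MemAvailable (reported in kB) out of /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024 ** 2
    raise ValueError("MemAvailable not found (Linux-only field)")

meminfo = Path("/proc/meminfo")
if meminfo.exists():  # Linux only
    print(f"{mem_available_gib(meminfo.read_text()):.1f} GiB available")
```

If the model file is larger than that number, the load was never going to fit; pick a smaller quantization or close other processes.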
CUDA out of memory when loading a model
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate X.XX GiB
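A quick sanity check before tuning anything: do the weights alone even fit in VRAM? Back-of-envelope arithmetic (the 7B figure is an illustrative model size, and KV cache plus activations come on top of this):

```python
def weight_gib(n_params_billion: float, bytes_per_param: float) -> float:
    """Approximate VRAM for model weights alone; KV cache and activations
    are extra. bytes_per_param: 2 for fp16/bf16, 1 for int8, 0.5 for 4-bit."""
    return n_params_billion * 1e9 * bytes_per_param / 1024 ** 3

print(round(weight_gib(7, 2), 1))    # 13.0 GiB in fp16
print(round(weight_gib(7, 0.5), 1))  # 3.3 GiB at 4-bit
```

If fp16 weights exceed your VRAM, no allocator setting will save you; a smaller quantization (or a smaller model) is the real fix.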
vLLM: No available KV cache blocks
RuntimeError: No available KV cache blocks
Ollama: model requires more system memory than is available
Error: model requires more system memory than is available
Out of memory specifically at long context lengths
torch.cuda.OutOfMemoryError or 'cannot allocate KV cache' at >32K tokens
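The culprit at long contexts is usually the KV cache itself, which grows linearly with context length and can dwarf a quantized model's weights. A sizing sketch with illustrative (roughly Llama-3-8B-like, GQA) dimensions; all defaults are assumptions:

```python
def kv_cache_gib(context_len, n_layers=32, n_kv_heads=8, head_dim=128,
                 bytes_per_elem=2, batch=1):
    """Total K+V cache size in GiB; grows linearly with context length.
    Default dimensions are illustrative, not tied to any specific model."""
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len * batch
    return elems * bytes_per_elem / 1024 ** 3

print(kv_cache_gib(4096))   # 0.5 GiB
print(kv_cache_gib(65536))  # 8.0 GiB
```

So a context that works fine at 4K can blow past VRAM at 64K even though the weights never changed; capping the runtime's max context (or using a more aggressively quantized KV cache, where the runtime supports it) is the usual remedy.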
ROCm / AMD (1 entry)
Network / downloads (2 entries)
Driver issues (2 entries)
Quantization issues (1 entry)
Build / compile failures (1 entry)
Metal / Apple Silicon (1 entry)
Hit an error we don't have?
We add ~5 new errors per month based on what readers report.
Email hello@runlocalai.co with the exact error message and what you tried. If it's a common one, we'll write it up; if it's something only you hit, we'll often help directly.