llama.cpp: error loading model — bad magic / unsupported GGUF
Environment: llama.cpp and downstream tools (Ollama, LM Studio, koboldcpp) loading a GGUF file produced by an incompatible llama.cpp version.
Severity: medium. The file is unusable until you update llama.cpp, re-download the file, or re-quantize it.
Cause
- GGUF v1/v2 file loaded by a llama.cpp build that only supports v3+
- Newer GGUF v3 file with new metadata loaded by old llama.cpp
- File is actually a different format renamed to .gguf (e.g. AWQ safetensors)
- Failed or partial download: the file is truncated, or its first bytes are HTML from a CDN error page
- Custom-quantized file with non-standard tensor types your build wasn't compiled with
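Several of the causes above leave different fingerprints in the first bytes of the file. As a rough sketch, a small POSIX shell helper can sort a file into "valid magic", "HTML error page", "zero-filled", or "something else". The function name `classify_header` and its category labels are illustrative assumptions, not part of llama.cpp.

```shell
# Hypothetical helper: classify a file by its first 4 bytes.
# The labels it prints are illustrative, not llama.cpp output.
classify_header() {
  # First 4 bytes as lowercase hex, e.g. "47475546" for "GGUF".
  hex=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  case "$hex" in
    47475546) echo "gguf" ;;    # "GGUF": magic is fine, suspect version or tensor types
    00000000) echo "zeros" ;;   # zero-filled: likely an incomplete download
    3c21444f|3c21646f|3c68746d|3c48544d)
              echo "html" ;;    # "<!DO", "<!do", "<htm", "<HTM": an error page
    "")       echo "empty" ;;   # zero-length file
    *)        echo "other" ;;   # some other format, e.g. a renamed safetensors file
  esac
}
```

Run it as `classify_header model.gguf`. A result of `gguf` means the magic is fine and the problem is more likely a version or tensor-type mismatch; any other result points at the download or at the file format itself.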
Solution
1. Verify the file isn't corrupted — first 4 bytes should be GGUF:
xxd model.gguf | head -1
# 00000000: 4747 5546 ... (i.e. "GGUF")
If you see HTML or zeros, redownload.
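2. Update llama.cpp. Most "bad magic" errors come from stale builds. Recent llama.cpp versions build with CMake instead of make:
cd llama.cpp
git pull
# Drop -DGGML_CUDA=ON for a CPU-only build
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j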
3. Re-quantize from the original safetensors with a current llama-quantize:
# Convert HF model to GGUF F16
python convert_hf_to_gguf.py /path/to/model --outfile model-f16.gguf
# Quantize to your target (CMake builds put the binary in ./build/bin/)
./build/bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
4. Pull a known-good GGUF from a maintained repo:
hf download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF \
  Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
5. If the file came from a vendor build, check their llama.cpp commit pin in the model card and match it:
git checkout <commit-sha>
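# Older pins build with make; newer pins need: cmake -B build && cmake --build build --config Release -j
make clean && make -j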
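To choose between updating (step 2) and re-converting (step 3), it can help to read the GGUF version field, which sits in bytes 4 to 7 of the header as a little-endian unsigned 32-bit integer. The helper below is a minimal sketch. `gguf_version` is a hypothetical name, and the reading of its result is a rule of thumb drawn from the causes listed above, not something llama.cpp reports.

```shell
# Hypothetical helper: print the GGUF version number of a file.
# Returns non-zero if the file does not start with the "GGUF" magic.
gguf_version() {
  # Check the magic first: bytes 0..3 must be 47 47 55 46 ("GGUF").
  [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "47475546" ] || return 1
  # Bytes 4..7 as four unsigned decimal values; word splitting is intended.
  set -- $(od -An -tu1 -j4 -N4 "$1" 2>/dev/null)
  [ "$#" -eq 4 ] || return 1
  # Combine by hand as little-endian so host byte order does not matter.
  echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}
```

For example, `gguf_version model.gguf` printing 1 or 2 suggests an old file that is best re-converted from the original weights. A result of 3 with a load error suggests a stale llama.cpp build or tensor types your build does not support.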
Did this fix it?
If your case was different, contact support with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.