Failed to load model: GGUF version mismatch
Cause
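GGUF has gone through three format versions, all released in 2023: v1 (August), v2 (shortly afterwards; most lengths and counts widened from 32-bit to 64-bit), and v3 (October; added big-endian support). Metadata such as chat templates and tokenizer config is stored as key-value pairs and does not change the format version. The loader reads the version number from the file header, so a llama.cpp build that predates a version rejects files written in it, and current builds no longer load v1 files.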
A common form: download a fresh quant from a recent uploader (bartowski, lmstudio-community), point an old llama.cpp at it, and hit this on load.
Solution
1. Update llama.cpp / Ollama / LM Studio to the latest release:
# llama.cpp from source
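# (recent llama.cpp builds with CMake; binaries land in build/bin/)
cd llama.cpp && git pull && cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j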
# Homebrew
brew upgrade llama.cpp
# Ollama on Linux (re-running the install script upgrades in place)
curl -fsSL https://ollama.com/install.sh | sh
# LM Studio: in-app "Check for updates"
2. If you can't update the runner, look for an older GGUF of the same model, ideally one uploaded around the time your runner was built. Hugging Face often lists several uploaders per model, and one of them may have a file in a version your runner accepts.
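3. Convert from safetensors yourself, using the conversion script and quantize tool from the same llama.cpp checkout as the runner, so the output is in a version that runner can read:
python convert_hf_to_gguf.py /path/to/hf-model --outfile model.gguf --outtype f16
# Adjust the path to llama-quantize for your build (CMake builds put it in build/bin/)
./llama-quantize model.gguf model.Q4_K_M.gguf Q4_K_M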
4. Check the GGUF version from the file header:
xxd model.gguf | head -1
# Bytes 0-3 are the magic "GGUF" (47 47 55 46); bytes 4-7 are the version as a little-endian uint32
# A v3 file starts: 4747 5546 0300 0000
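If xxd is not available (for example on Windows), the same header check can be sketched in Python. This is a minimal sketch, not part of llama.cpp: the function name read_gguf_version is hypothetical, and the big-endian fallback is an assumption based on v3 files being allowed to store the header in big-endian order.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_version(path):
    """Return the GGUF format version stored in the file header.

    Hypothetical helper for illustration; it reads only the first 8 bytes.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        raise ValueError("file is too short to contain a GGUF header")
    if header[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic bytes: {header[:4]!r})")
    # Bytes 4-7: version as an unsigned 32-bit integer, normally little-endian.
    (version,) = struct.unpack("<I", header[4:8])
    # Assumption: an implausibly large value means the file is big-endian,
    # so reinterpret the same four bytes in big-endian order.
    if version > 0xFFFF:
        (version,) = struct.unpack(">I", header[4:8])
    return version
```

Calling read_gguf_version("model.gguf") returns an integer such as 3. Compare it with what your runner supports before deciding whether to update the runner or pick a different file.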
Did this fix it?
If your case was different, contact support with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.