Model produces gibberish or repeats one token forever
Cause
When the runtime uses a different tokenizer than the model was trained with, output looks superficially structured but is meaningless. Common causes:
- Mistral Nemo uses the newer "Tekken" tokenizer, and runners released before support for it was added use the wrong tokenizer
- A LoRA adapter was loaded against the wrong base model
- Special tokens (<|im_start|>, <bos>, <eos>) are not being applied because the chat template is missing or wrong
- The quantization step accidentally stripped the tokenizer files
Solution
1. Update your runner. New tokenizers ship in major llama.cpp / Ollama / vLLM releases.
# Ollama
ollama --version # check
# Update via official installer if behind 0.5.x
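# llama.cpp: pull and rebuild (current releases build with CMake; the old Makefile build was retired)
git pull
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j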
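If you want to script the version check, the sketch below pulls the first X.Y.Z number out of the `ollama --version` output and compares it to the 0.5 threshold mentioned above. The exact wording of that output can vary between releases, so the parser only looks for the version number. The names `MIN_VERSION`, `parse_version`, `is_behind`, and `ollama_version_output` are illustrative, not part of any Ollama tooling.

```python
import re
import subprocess

# Threshold taken from the "0.5.x" note above; adjust for your model.
MIN_VERSION = (0, 5, 0)


def parse_version(text):
    """Return the first X.Y.Z found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_behind(text, minimum=MIN_VERSION):
    """True when the version in text is older than minimum or cannot be parsed."""
    version = parse_version(text)
    return version is None or version < minimum


def ollama_version_output():
    """Run `ollama --version` and return its raw output (stdout plus stderr)."""
    result = subprocess.run(
        ["ollama", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr
```

Calling `is_behind(ollama_version_output())` on a machine with Ollama installed tells you whether an update is worth trying first.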
2. Verify the chat template. The system+user format must match what the model was trained on. For ChatML models:
<|im_start|>system
You are helpful.<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Wrong template = gibberish even with the right tokenizer.
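To compare what your runner actually sends against the expected format, it can help to render the prompt yourself. The sketch below builds a ChatML prompt from a list of messages, following the layout shown above. `render_chatml` is a hypothetical helper, and the trailing newline after the assistant header is an assumption; check your model's own template for the exact whitespace.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render role/content messages in the ChatML layout shown above."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # An open assistant turn tells the model it is its turn to reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello"},
    ]
)
```

If the model ships a Hugging Face tokenizer, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders the template stored with the model, which is usually a better reference than a hand-written one.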
3. Re-download the model. The GGUF file should include the tokenizer. If you converted it yourself with an old llama.cpp, the tokenizer metadata may be stale:
ollama pull mistral-nemo:12b
4. Don't mix LoRA + base model from different versions. A Llama 3.1 LoRA loaded on a Llama 3.0 base produces gibberish even if the parameter shapes match.
Did this fix it?
If your case was different, contact support with what you saw and we'll update the page. If the fix worked but took different commands on your platform, we want to know that too.