RUNLOCALAI · v38
Verified by owner

GGUF model outputs garbage — tokenizer / chat-template mismatch

(no error — generation is fluent gibberish, repeats one token, or emits raw special tokens like <|im_start|>)
By Fredoline Eruo · Last verified Jun 12, 2026

Cause

Environment: llama.cpp, Ollama, LM Studio, koboldcpp running GGUF files.

Severity: medium — model loads but output is unusable.

  • The GGUF was converted with an old llama.cpp that did not yet write the right tokenizer data for the model (Tekken for Mistral Nemo, GPT-2-style BPE merges for Llama 3)
  • The chat template baked into the GGUF doesn't match the model; in Ollama this needs a Modelfile override (step 3)
  • BOS token policy is wrong (add_bos_token is true on a model trained without BOS)
  • Special tokens (<|eot_id|>, <|im_end|>) are not registered as stop tokens, so generation runs past end-of-turn
  • LoRA-merged GGUF where the merge didn't update tokenizer metadata
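
The symptoms listed at the top of this page (raw special tokens in the output, one token repeated over and over) can be checked mechanically before you start swapping templates. The following is a minimal sketch, not part of any tool named here: `diagnose_output`, the special-token regex, and the 0.6 repetition threshold are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Matches markers such as <|im_start|> or <|eot_id|>. This pattern is an
# assumption that covers the ChatML / Llama 3 style tokens named on this page.
SPECIAL_TOKEN_RE = re.compile(r"<\|[A-Za-z0-9_]+\|>")


def diagnose_output(text: str, repeat_threshold: float = 0.6) -> list[str]:
    """Return symptom labels found in generated text (hypothetical helper)."""
    symptoms = []

    # Symptom 1: raw special tokens visible in the output.
    leaked = sorted(set(SPECIAL_TOKEN_RE.findall(text)))
    if leaked:
        symptoms.append("leaked special tokens: " + ", ".join(leaked))

    # Symptom 2: a single whitespace-separated word dominates the output.
    # Very short outputs are skipped to avoid false positives.
    words = text.split()
    if len(words) >= 10:
        word, count = Counter(words).most_common(1)[0]
        if count / len(words) >= repeat_threshold:
            symptoms.append(f"repetition: {word!r} is {count}/{len(words)} words")

    return symptoms
```

An empty list does not prove the tokenizer is correct; fluent gibberish passes both checks and still needs the steps below.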

Solution

1. Re-pull the GGUF from a maintainer who tracks tokenizer updates:

ollama pull mistral-nemo:12b-instruct-2407
# or
hf download bartowski/Mistral-Nemo-Instruct-2407-GGUF \
  Mistral-Nemo-Instruct-2407-Q4_K_M.gguf

2. Inspect the embedded chat template:

pip install gguf   # provides the gguf-dump script
gguf-dump --no-tensors model.gguf | grep chat_template   # long values may be truncated
# or
ollama show llama3.1:8b --modelfile | grep -A5 TEMPLATE
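
If neither tool is at hand, the embedded template can also be read straight from the GGUF header. The sketch below assumes a little-endian GGUF file of version 2 or 3 and uses only the Python standard library. The function names are hypothetical; the metadata keys `tokenizer.chat_template` and `tokenizer.ggml.add_bos_token` are the ones llama.cpp writes.

```python
import struct
from typing import Any, BinaryIO, Iterable

# Scalar GGUF value types mapped to little-endian struct formats.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        # Arrays store the element type, the element count, then the elements.
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO, wanted: Iterable[str]) -> dict[str, Any]:
    """Return the requested metadata keys from an open GGUF file."""
    wanted = set(wanted)
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit lengths; not handled in this sketch")
    _read(f, "<Q")             # tensor count, unused here
    kv_count = _read(f, "<Q")  # number of metadata key/value pairs
    found = {}
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key in wanted:
            found[key] = value
    return found

# Example (the path is a placeholder):
# with open("model.gguf", "rb") as f:
#     print(read_gguf_metadata(f, {"tokenizer.chat_template",
#                                  "tokenizer.ggml.add_bos_token"}))
```

A missing `tokenizer.chat_template` key means the runtime falls back to its own default, which is one way a template mismatch arises.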

3. Override with the correct template via Modelfile:

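FROM ./model.gguf
# If the prompt ends up with a doubled <|begin_of_text|>, drop it from the template below; Ollama may already prepend BOS.
TEMPLATE """<|begin_of_text|><|start_header_id|>system<|end_header_id|>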

{{ .System }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|end_of_text|>"

Save this as Modelfile, then build the fixed model:

ollama create llama3.1:8b-fixed -f Modelfile
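
To see exactly what string the template in step 3 produces, and what the two stop parameters do to the reply, the logic can be mirrored in plain Python. This is an illustrative sketch for a single system message and a single user turn, not Ollama's implementation; `render_llama3_prompt` and `truncate_at_stop` are hypothetical names.

```python
# Stop strings taken from the PARAMETER stop lines in step 3.
STOP_STRINGS = ("<|eot_id|>", "<|end_of_text|>")


def render_llama3_prompt(system: str, prompt: str) -> str:
    """Mirror the step 3 TEMPLATE for one system message and one user turn."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def truncate_at_stop(text: str, stops=STOP_STRINGS) -> str:
    """Cut generated text at the earliest stop string, as a stop parameter would."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


if __name__ == "__main__":
    print(render_llama3_prompt("You are a helpful assistant.", "Hello"))
    # Without a stop on <|eot_id|>, the text after it would leak into the reply.
    print(truncate_at_stop("Hi there!<|eot_id|><|start_header_id|>user"))
```

Comparing the rendered string against the prompt your runtime logs is a quick way to spot a missing header or a missing blank line after `<|end_header_id|>`.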

4. Disable BOS injection if the model was trained without one:

./llama-cli -m model.gguf --override-kv tokenizer.ggml.add_bos_token=bool:false -p "Hello"

5. Update llama.cpp and reconvert if the tokenizer is genuinely missing tokens (rare, but it happens with research models):

git pull && cmake -B build && cmake --build build --config Release -j   # binaries land in build/bin/
python convert_hf_to_gguf.py /path/to/hf-model --outfile model.gguf

Related errors

  • Model loaded but tokenizer vocab size mismatch
  • TypeError: 'NoneType' object is not subscriptable in tokenizer
  • Quantized model produces garbage / never stops generating
  • OSError: Can't load tokenizer for ... / no file named tokenizer.json
  • Model produces gibberish or repeats one token forever
