RUNLOCALAI · v38
Tokenizer mismatches

TypeError: 'NoneType' object is not subscriptable in tokenizer
By Fredoline Eruo · Last verified May 8, 2026

Cause

AutoTokenizer.from_pretrained returned a tokenizer with a None attribute that the caller then subscripted. This is almost always because the tokenizer files weren't actually downloaded, or because a custom tokenizer class wasn't registered.

Common scenarios: download interrupted before tokenizer.json finished, gated model where only the README came through, or a model that requires trust_remote_code=True because its tokenizer ships as a Python file in the repo.
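The exception itself is ordinary Python: an attribute the loader should have populated is still None, and downstream code indexes it. A minimal sketch of the failure mode (BrokenTokenizer is a stand-in for illustration, not a real transformers class):

```python
# Stand-in for a tokenizer whose files never downloaded: the loader
# leaves an attribute as None instead of a dict.
class BrokenTokenizer:
    vocab = None

tok = BrokenTokenizer()
try:
    tok.vocab["hello"]  # subscripting None
except TypeError as e:
    print(e)  # 'NoneType' object is not subscriptable
```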

Solution

1. Confirm the tokenizer files are present:

ls -la ~/.cache/huggingface/hub/models--<org>--<model>/snapshots/*/
# Should include: tokenizer.json, tokenizer_config.json, special_tokens_map.json

If tokenizer.json is missing or zero bytes, re-download:

hf download <org>/<model> --resume-download

2. Pass trust_remote_code=True for models that ship custom tokenizer code (Yi, some Qwen variants, DeepSeek-VL):

from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)

3. Use the right loader for the right format. A GGUF file's tokenizer is embedded in the file — don't try to load it via HuggingFace AutoTokenizer; use llama-cpp-python or the model's own loader.
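A sketch of loading through llama-cpp-python instead (the path is a placeholder; the tokenizer data comes from inside the .gguf file itself):

```python
def tokenize_gguf(path: str, text: str):
    """Tokenize with the tokenizer embedded in a .gguf file."""
    try:
        from llama_cpp import Llama
    except ImportError:
        return None  # pip install llama-cpp-python
    try:
        # vocab_only skips loading the weights; we only need the tokenizer
        llm = Llama(model_path=path, vocab_only=True, verbose=False)
    except (ValueError, FileNotFoundError):
        return None  # bad or missing model file
    return llm.tokenize(text.encode("utf-8"))

# Placeholder path — point it at a real .gguf on your machine.
print(tokenize_gguf("models/llama-3-8b.Q4_K_M.gguf", "hello world"))
```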

4. Check for permissions / gated access:

hf auth whoami  # confirm you're logged in

Llama and Gemma require accepting the license on HF before tokenizer files become accessible.
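The same check from Python, if you're scripting downloads — a sketch using huggingface_hub, guarded so it degrades gracefully when the library or token is absent:

```python
def hf_login_status() -> str:
    """Report the logged-in HF user, mirroring `hf auth whoami`."""
    try:
        from huggingface_hub import whoami
        return "logged in as " + whoami()["name"]
    except Exception:
        return "not logged in (run `hf auth login`, then accept the model license)"

print(hf_login_status())
```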

Related errors

  • Model loaded but tokenizer vocab size mismatch
  • Quantized model produces garbage / never stops generating
  • Model produces gibberish or repeats one token forever
  • OSError: Can't load tokenizer for ... / no file named tokenizer.json

Did this fix it?

If your case was different, email support@runlocalai.co with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.