RUNLOCALAI · v38
Tokenizer mismatches

TypeError: 'NoneType' object is not subscriptable in tokenizer
By Fredoline Eruo · Last verified May 8, 2026

Cause

AutoTokenizer.from_pretrained returned a tokenizer with a None attribute that the caller then subscripted. This is almost always because the tokenizer files weren't actually downloaded, or because a custom tokenizer class wasn't registered.

Common scenarios: download interrupted before tokenizer.json finished, gated model where only the README came through, or a model that requires trust_remote_code=True because its tokenizer ships as a Python file in the repo.
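The exception itself is ordinary Python: an attribute the loader should have populated is still None, and downstream code indexes it. A minimal sketch of the failure mode (BrokenTokenizer is a stand-in for illustration, not a real transformers class):

```python
# Stand-in for a tokenizer whose files never downloaded: the loader
# leaves an attribute as None instead of a dict.
class BrokenTokenizer:
    vocab = None

tok = BrokenTokenizer()
try:
    tok.vocab["hello"]  # subscripting None
except TypeError as e:
    print(e)  # 'NoneType' object is not subscriptable
```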

Solution

1. Confirm the tokenizer files are present:

ls -la ~/.cache/huggingface/hub/models--<org>--<model>/snapshots/*/
# Should include: tokenizer.json, tokenizer_config.json, special_tokens_map.json

If tokenizer.json is missing or zero bytes, re-download:

hf download <org>/<model> --resume-download

2. Pass trust_remote_code=True for models that ship custom tokenizer code (Yi, some Qwen variants, DeepSeek-VL):

from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)

3. Use the right loader for the right format. A GGUF file's tokenizer is embedded in the file — don't try to load it via HuggingFace AutoTokenizer; use llama-cpp-python or the model's own loader.
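A sketch of loading through llama-cpp-python instead (the path is a placeholder; the tokenizer data comes from inside the .gguf file itself):

```python
def tokenize_gguf(path: str, text: str):
    """Tokenize with the tokenizer embedded in a .gguf file."""
    try:
        from llama_cpp import Llama
    except ImportError:
        return None  # pip install llama-cpp-python
    try:
        # vocab_only skips loading the weights; we only need the tokenizer
        llm = Llama(model_path=path, vocab_only=True, verbose=False)
    except (ValueError, FileNotFoundError):
        return None  # bad or missing model file
    return llm.tokenize(text.encode("utf-8"))

# Placeholder path — point it at a real .gguf on your machine.
print(tokenize_gguf("models/llama-3-8b.Q4_K_M.gguf", "hello world"))
```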

4. Check for permissions / gated access:

hf auth whoami  # confirm you're logged in

Llama and Gemma require accepting the license on HF before tokenizer files become accessible.
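The same check from Python, if you're scripting downloads — a sketch using huggingface_hub, guarded so it degrades gracefully when the library or token is absent:

```python
def hf_login_status() -> str:
    """Report the logged-in HF user, mirroring `hf auth whoami`."""
    try:
        from huggingface_hub import whoami
        return "logged in as " + whoami()["name"]
    except Exception:
        return "not logged in (run `hf auth login`, then accept the model license)"

print(hf_login_status())
```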

Related errors

  • Model loaded but tokenizer vocab size mismatch
  • Quantized model produces garbage / never stops generating
  • Model produces gibberish or repeats one token forever
  • OSError: Can't load tokenizer for ... / no file named tokenizer.json

Did this fix it?

If your case was different, email support@runlocalai.co with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.