11. Model Hallucination Debugging
When Hallucination is a Bug
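Some hallucination is expected from generative models. It becomes a debugging problem when the model fabricates information it should have grounded (for example, in the prompt or in retrieved context), or when it contradicts itself on identical inputs under identical decoding settings.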
Reproducible Debugging
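```python
import torch

# Assumes `model`, `tokenizer`, and `input_ids` already exist (e.g., a Hugging Face causal LM).
# Set seeds for reproducibility (torch.manual_seed also seeds every CUDA device)
torch.manual_seed(42)
torch.cuda.manual_seed_all(42)
# Generate with a fixed prompt multiple times, re-seeding before each run
outputs = []
for i in range(5):
    torch.manual_seed(42)
    # do_sample=True so the seed matters; greedy decoding does not use the RNG at all
    output = model.generate(input_ids, max_new_tokens=100, do_sample=True)
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    outputs.append(text)
    print(f"Run {i}: {text}")
    print("---")
print(f"Distinct outputs: {len(set(outputs))}")  # 1 means every run matched
```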
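If the same seed produces different outputs, nondeterminism is entering through something other than the sampling RNG. Common sources are non-deterministic GPU kernels (for example, cuDNN running with torch.backends.cudnn.deterministic = False, which is the default), batched generation in which the order or composition of the batch changes between runs, and KV cache state reused across different prompts.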
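The leak described above can be reproduced without a GPU. The sketch below is a toy analogy, not PyTorch: a hypothetical four-word vocabulary with made-up logits stands in for `model.generate`, and Python's `random.Random` stands in for the framework RNG. Re-seeding before every run yields identical outputs, while a generator whose state persists between runs (analogous to any state carried over from a previous request) does not, even though the same seed is passed in.

```python
import math
import random
from typing import Callable

# Hypothetical toy "model": fixed logits over a tiny vocabulary. Each token is drawn
# independently from the same distribution; a real LM conditions on earlier tokens.
VOCAB = ["Paris", "Lyon", "Rome", "Berlin"]
LOGITS = [2.0, 0.5, 0.3, 0.1]


def sample_tokens(rng: random.Random, n_tokens: int, temperature: float) -> str:
    """Temperature-scaled softmax sampling; every draw comes from `rng`."""
    scaled = [logit / temperature for logit in LOGITS]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized, numerically stable softmax
    return " ".join(rng.choices(VOCAB, weights=weights, k=n_tokens))


def distinct_outputs(generate: Callable[[int], str], seed: int, runs: int) -> int:
    """Call `generate(seed)` `runs` times and count how many different strings appear."""
    return len({generate(seed) for _ in range(runs)})


def seeded_generate(seed: int) -> str:
    # All randomness comes from a generator rebuilt from the seed on every call.
    return sample_tokens(random.Random(seed), n_tokens=8, temperature=0.7)


_leaked_rng = random.Random(0)  # module-level state that is never reset between runs


def leaky_generate(seed: int) -> str:
    # Ignores `seed` and continues from earlier calls' state, so runs differ.
    return sample_tokens(_leaked_rng, n_tokens=8, temperature=0.7)


print("seeded:", distinct_outputs(seeded_generate, seed=42, runs=10))  # → 1
print("leaky: ", distinct_outputs(leaky_generate, seed=42, runs=10))
```

The same diagnosis applies to a real model: if re-seeding does not collapse the outputs to one, look for state or kernels that the seed does not control.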
RAG Pipeline Debugging
In retrieval-augmented generation, hallucinations often trace to retrieval failures:
- Document not retrieved → check embedding model, vector DB query, top-k value
- Wrong document retrieved → check chunking strategy, reranker configuration
- Document retrieved but not used → check prompt template, context window overflow
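The three-way triage above can be expressed as a small function, assuming you already know which document should have answered the question (for example, from a labeled evaluation set). Everything here is hypothetical: `Chunk`, `build_prompt`, and `diagnose` are illustrative names, the character budget is a crude proxy for a token-based context window, and treating "expected document not ranked first" as the wrong-document signal is a simplifying heuristic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_id: str
    text: str


def build_prompt(question: str, chunks: List[Chunk], max_context_chars: int) -> str:
    """Hypothetical template: joins chunks in rank order, then cuts the context to a
    character budget (standing in for a context-window limit)."""
    context = "\n\n".join(chunk.text for chunk in chunks)[:max_context_chars]
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def diagnose(expected_id: str, retrieved: List[Chunk], prompt: str) -> str:
    """Map a known-good source document to one of the three failure modes."""
    ids = [chunk.doc_id for chunk in retrieved]
    if expected_id not in ids:
        return "not retrieved: check embedding model, vector DB query, top-k value"
    expected = retrieved[ids.index(expected_id)]
    if expected.text not in prompt:
        return "retrieved but not used: check prompt template, context window overflow"
    if ids[0] != expected_id:
        return "wrong document ranked first: check chunking strategy, reranker configuration"
    return "retrieval looks fine: inspect the generation step next"


chunks = [
    Chunk("shipping-faq", "Orders ship within 2 business days."),
    Chunk("refund-policy", "Refunds are issued within 30 days of purchase."),
]
question = "How long do refunds take?"
for budget in (500, 40):
    prompt = build_prompt(question, chunks, max_context_chars=budget)
    print(budget, "->", diagnose("refund-policy", chunks, prompt))
```

With the larger budget the refund chunk reaches the prompt but is outranked; with the 40-character budget it is cut off, which is the "retrieved but not used" case.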
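```python
# Debug retrieval by printing the top-k results.
# Assumes `vector_db` is a LangChain-style vector store and `query` is the user question.
# Score semantics depend on the store: some return a distance (lower = closer),
# others a similarity (higher = closer), so check before choosing a threshold.
results = vector_db.similarity_search_with_score(query, k=5)
for i, (doc, score) in enumerate(results):
    print(f"Rank {i}: score={score:.4f}")
    print(f"Content preview: {doc.page_content[:200]}")
    print(f"Source: {doc.metadata}")
    print("---")
```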
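To see how a top-k cutoff can silently drop the right document, here is a minimal in-memory similarity search, assuming cosine similarity (higher means more similar in this sketch). The three-dimensional vectors are hand-made placeholders, not output from a real embedding model, and `top_k` is a hypothetical helper rather than any vector database's API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], index: Dict[str, Sequence[float]],
          k: int) -> List[Tuple[str, float]]:
    """Score every document, sort by similarity (descending), keep the best k."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical hand-made "embeddings"; a real system would use an embedding model.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.2, 0.9, 0.1],
    "warranty": [0.7, 0.0, 0.6],
}
query = [1.0, 0.0, 0.1]

for rank, (doc_id, score) in enumerate(top_k(query, index, k=2)):
    print(f"Rank {rank}: score={score:.4f} id={doc_id}")
```

With k=2, `shipping-times` never reaches the prompt. If it were the document that should answer the query, the next steps would be raising k or checking whether the embedding model places the query near that document at all.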
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
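One way to capture these details is to write them into a small JSON record next to each run. This is a sketch using only the standard library plus an optional PyTorch check; `run_record` and `detect_backend` are hypothetical helpers, and the data path and observed output in the usage line are placeholders to replace with your own.

```python
import json
import platform
from importlib import metadata
from typing import Dict, Sequence


def package_version(name: str) -> str:
    """Installed version of a distribution, or a marker if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def detect_backend() -> str:
    # Optional: only meaningful if PyTorch is installed.
    try:
        import torch
    except ImportError:
        return "unknown (torch not installed)"
    return "cuda" if torch.cuda.is_available() else "cpu"


def run_record(data_path: str, observed_output: str,
               packages: Sequence[str] = ("torch", "transformers")) -> Dict[str, object]:
    """Collect the details the checkpoint asks for into one JSON-serializable dict."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "backend": detect_backend(),
        "packages": {name: package_version(name) for name in packages},
        "data_path": data_path,
        "observed_output": observed_output,
    }


# Hypothetical data path and output; substitute your own.
record = run_record("data/prompts.jsonl", "Run 0: ...")
print(json.dumps(record, indent=2))
```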
Run the same prompt 10 times with identical seeds and identical temperature. Note how often the output varies. If it varies, set torch.backends.cudnn.deterministic = True and retry.
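For this exercise, a small tally makes "how often the output varies" concrete. The sketch is hypothetical: `summarize_runs` and `enable_deterministic_cudnn` are illustrative helpers, and the placeholder strings stand in for your 10 decoded generations. Alongside `cudnn.deterministic`, it also sets `torch.backends.cudnn.benchmark = False`, because cuDNN's autotuner can select different kernels from one run to the next.

```python
from collections import Counter
from typing import List, Tuple


def summarize_runs(outputs: List[str]) -> Tuple[int, float]:
    """Return (number of distinct outputs, share of runs matching the most common one)."""
    counts = Counter(outputs)
    _, modal_count = counts.most_common(1)[0]
    return len(counts), modal_count / len(outputs)


def enable_deterministic_cudnn() -> bool:
    """Switch cuDNN to deterministic kernels if PyTorch is installed; True if applied."""
    try:
        import torch
    except ImportError:
        return False
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # autotuning may pick different kernels per run
    return True


# Placeholder strings standing in for 10 decoded generations.
outputs = ["The capital is Paris."] * 8 + ["The capital is Lyon."] * 2
n_distinct, modal_share = summarize_runs(outputs)
print(f"{n_distinct} distinct outputs; {modal_share:.0%} of runs match the most common one")
if n_distinct > 1:
    print("Output varies; cuDNN deterministic mode applied:", enable_deterministic_cudnn())
```

After enabling the flags, re-run the 10 generations and compare the new tally with the first one.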