Model Hallucination Debugging — Troubleshooting Local AI (Chapter 11)

When Hallucination Is a Bug

Some hallucination is expected from generative models. It becomes a debugging problem when the model fabricates facts that the prompt or retrieved context should have supplied, or when it gives contradictory answers to identical inputs.

Reproducible Debugging

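import torch

# Assumes `model`, `tokenizer`, and `input_ids` are already loaded
# (for example, a Hugging Face causal LM and its tokenizer).

# Generate from the same prompt several times, reseeding before each run
for i in range(5):
    torch.manual_seed(42)           # seeds the CPU and all CUDA devices
    torch.cuda.manual_seed_all(42)  # redundant but explicit
    # Sampling must be enabled for the seed to matter;
    # with greedy decoding the seed has no effect.
    output = model.generate(input_ids, max_new_tokens=100, do_sample=True)
    print(f"Run {i}: {tokenizer.decode(output[0])}")
    print("---")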

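If the same seed produces different outputs, nondeterminism is entering through something other than the sampler. Common sources include nondeterministic GPU kernels (torch.backends.cudnn.deterministic left at False), batches whose composition or order varies between runs, and KV cache state reused across different prompts. Calling torch.use_deterministic_algorithms(True) makes PyTorch raise an error when it reaches an operation that has no deterministic implementation, which helps locate the source.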
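When seeded runs disagree, the position of the first differing token narrows the search: a divergence at the first generated token points at setup or cache state, while a late divergence is more consistent with numerical drift in a kernel. The following is a minimal sketch, assuming each run's output has already been converted to a list of token IDs (for example with output[0].tolist()). The helper name first_divergence is hypothetical and not part of PyTorch or any library.

```python
from typing import Optional, Sequence


def first_divergence(runs: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the earliest token position where any run differs from run 0.

    Returns None when every run is identical to the first one.
    A run that is a strict prefix of another counts as diverging
    at the position where the shorter run ends.
    """
    reference = runs[0]
    earliest: Optional[int] = None
    for other in runs[1:]:
        limit = min(len(reference), len(other))
        # First index where the two token sequences disagree, if any
        pos = next((i for i in range(limit) if reference[i] != other[i]), None)
        if pos is None and len(reference) != len(other):
            pos = limit
        if pos is not None and (earliest is None or pos < earliest):
            earliest = pos
    return earliest


# Illustrative token IDs, not real model output
runs = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 9, 4]]
print(first_divergence(runs))
```

With the illustrative lists above, the third run differs from the first at index 2, so that is the position reported.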

RAG Pipeline Debugging

In retrieval-augmented generation, hallucinations often trace to retrieval failures:

Document not retrieved → check embedding model, vector DB query, top-k value
Wrong document retrieved → check chunking strategy, reranker configuration
Document retrieved but not used → check prompt template, context window overflow

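# Debug retrieval by printing the top-k results.
# Assumes `vector_db` is a LangChain-style vector store and `query` is the user question.
# Whether a higher or lower score is better depends on the vector store.
results = vector_db.similarity_search_with_score(query, k=5)
for i, (doc, score) in enumerate(results):
    print(f"Rank {i}: score={score:.4f}")
    print(f"Content preview: {doc.page_content[:200]}")
    print(f"Source: {doc.metadata}")
    print("---")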
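The three failure modes listed above can be told apart mechanically once the document that should have answered the question is known. The sketch below is one possible heuristic, not a standard method. It assumes a ranked list of candidate document IDs retrieved with a generous k, the smaller top_k that is actually passed to the model, and a short string the correct answer should contain. The function triage_retrieval and the document IDs are hypothetical.

```python
from typing import Sequence


def triage_retrieval(
    expected_id: str,
    candidate_ids: Sequence[str],
    top_k: int,
    answer: str,
    expected_fact: str,
) -> str:
    """Map one failed question onto the three retrieval failure modes."""
    ranked = list(candidate_ids)
    # The right document never surfaced, even in the wide candidate list
    if expected_id not in ranked:
        return "not retrieved: check embedding model, vector DB query, top-k value"
    # The right document surfaced but other documents outranked it
    if ranked.index(expected_id) >= top_k:
        return "wrong document retrieved: check chunking strategy, reranker configuration"
    # The right document reached the prompt but the answer ignores it
    if expected_fact.lower() not in answer.lower():
        return "retrieved but not used: check prompt template, context window overflow"
    return "ok: expected document was retrieved and its fact appears in the answer"


# Made-up IDs and answer text for illustration only
candidates = ["doc-7", "doc-2", "doc-9", "doc-4"]
print(triage_retrieval("doc-4", candidates, top_k=3,
                       answer="The limit is 10 requests.",
                       expected_fact="20 requests"))
```

The substring test for the expected fact is deliberately crude. It is enough to sort failures into buckets, but a paraphrased correct answer would be misclassified as "not used".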

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
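One way to capture that record is a small script that writes the details next to the exercise output. This is a sketch using only the Python standard library. The function name environment_record, the package list, the data path, and the note text are placeholders to be replaced with whatever the exercise actually uses.

```python
import json
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable


def environment_record(packages: Iterable[str], data_path: str, note: str = "") -> Dict[str, object]:
    """Collect runtime details worth recording beside an exercise result."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
        "data_path": str(data_path),
        "note": note,  # free-text constraint, e.g. model size or backend
    }


# Placeholder values: substitute the packages and path used locally
record = environment_record(
    ["torch", "transformers"],
    data_path="data/example.jsonl",
    note="CPU backend",
)
print(json.dumps(record, indent=2))
```

Saving this JSON alongside the observed output makes it easier to explain later why the same exercise behaved differently on another machine.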