11. Model Hallucination Debugging

Chapter 11 of 15 · 15 min

When Hallucination is a Bug

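Hallucination is expected behavior for generative models: they produce plausible text, not verified facts. It becomes a debugging problem when the model fabricates information that it had the means to get right (for example, a fact present in its prompt or retrieved context) or gives contradictory answers to identical inputs.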

Reproducible Debugging

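import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" and the prompt are placeholders; substitute the model and input you are debugging
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

# Generate from a fixed prompt several times, reseeding before every run
for i in range(5):
    torch.manual_seed(42)  # seeds the CPU generator and all CUDA generators
    # do_sample=True makes the seed matter; greedy decoding (the default) never samples
    output = model.generate(input_ids, max_new_tokens=100, do_sample=True)
    print(f"Run {i}: {tokenizer.decode(output[0], skip_special_tokens=True)}")
    print("---")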

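If the same seed produces different outputs, randomness is leaking in through a non-deterministic operation. Common sources: cuDNN running with torch.backends.cudnn.deterministic = False (the default), batches whose order or composition varies between runs, or a KV cache reused across different prompts. Calling torch.use_deterministic_algorithms(True) makes PyTorch raise an error when it reaches an operation with no deterministic implementation, which helps locate the source.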
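To see where two supposedly identical runs part ways, it can help to compare the generated token IDs position by position. The sketch below is a minimal, library-free illustration: `first_divergence` is a hypothetical helper (not part of PyTorch or any tokenizer library), and the token ID lists are made-up stand-ins for what `output[0].tolist()` would return.

```python
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where two token sequences differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a strict prefix of the other: they diverge where the shorter one ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Made-up token IDs standing in for output[0].tolist() from three runs (assumption).
runs = [
    [101, 7592, 2088, 2003, 102],
    [101, 7592, 2088, 2003, 102],
    [101, 7592, 3000, 2003, 102],
]

baseline = runs[0]
for n, run in enumerate(runs[1:], start=1):
    pos = first_divergence(baseline, run)
    status = "identical" if pos is None else f"diverges at token {pos}"
    print(f"Run {n} vs run 0: {status}")
```

A divergence at the very first generated token often points at differing inputs or leftover state, while a late divergence is more consistent with small numerical differences tipping a close token choice. Treat this as a heuristic, not a diagnosis.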

RAG Pipeline Debugging

In retrieval-augmented generation, hallucinations often trace to retrieval failures:

  1. Document not retrieved → check embedding model, vector DB query, top-k value
  2. Wrong document retrieved → check chunking strategy, reranker configuration
  3. Document retrieved but not used → check prompt template, context window overflow

# Debug retrieval by printing top-k results
results = vector_db.similarity_search_with_score(query, k=5)
for i, (doc, score) in enumerate(results):
    print(f"Rank {i}: score={score:.4f}")
    print(f"Content preview: {doc.page_content[:200]}")
    print(f"Source: {doc.metadata}")
    print("---")
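For the third failure mode (document retrieved but not used), one quick check is whether the retrieved chunks fit in the prompt at all. The following is a rough sketch under stated assumptions: `count_tokens` is a crude whitespace-based stand-in for a real tokenizer, `fit_chunks` is a hypothetical helper, and the budget is an arbitrary number chosen for illustration.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer (assumption): counts whitespace-separated words."""
    return len(text.split())


def fit_chunks(chunks: List[str], budget: int) -> Tuple[List[str], List[str]]:
    """Split ranked chunks into those that fit the token budget and those cut off.

    Mimics truncating a concatenated context: once one chunk does not fit,
    every later chunk is treated as dropped too.
    """
    kept: List[str] = []
    dropped: List[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if not dropped and used + cost <= budget:
            kept.append(chunk)
            used += cost
        else:
            dropped.append(chunk)
    return kept, dropped


# Illustrative chunks in rank order; in practice these would be doc.page_content values.
chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
kept, dropped = fit_chunks(chunks, budget=5)
print(f"kept {len(kept)} chunk(s), dropped {len(dropped)} chunk(s)")
for chunk in dropped:
    print(f"never reached the model: {chunk[:50]}")
```

If the chunk containing the answer lands in the dropped list, the model never saw it, and the fix is in the prompt budget or the ranking, not in the model.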

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Run the same prompt 10 times with identical seeds and identical temperature. Count how many distinct outputs you get. If the output varies, set torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False, then retry.
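One way to record the result of this exercise is to tally the distinct outputs across runs. The sketch below is only an illustration: `tally_outputs` is a hypothetical helper, and `fake_generate` is a stand-in that reseeds Python's `random` module instead of calling a real model. In practice you would pass a function that reseeds and calls `model.generate`.

```python
import random
from collections import Counter
from typing import Callable


def tally_outputs(generate: Callable[[], str], runs: int = 10) -> Counter:
    """Call generate() `runs` times and count how often each distinct output appears."""
    return Counter(generate() for _ in range(runs))


def fake_generate() -> str:
    """Hypothetical stand-in for a seeded model call; a fresh seeded RNG makes it repeatable."""
    rng = random.Random(42)
    return " ".join(rng.choice(["red", "green", "blue"]) for _ in range(5))


counts = tally_outputs(fake_generate, runs=10)
print(f"{len(counts)} distinct output(s) across {sum(counts.values())} runs")
for text, n in counts.most_common():
    print(f"{n:>2}x  {text}")
```

A single distinct output means the runs were reproducible. More than one means something other than the seed is influencing generation.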