11. Understanding Model Responses
Why Responses Vary
Even with the same prompt, you may get different responses. This isn't a bug—it's fundamental to how language models work.
Sources of variation:
Non-deterministic generation: Models sample each next token from a probability distribution, so the same input can lead to different outputs on different runs.
Temperature: A setting that controls randomness (more on this in Chapter 13).
Context: What's in your conversation history affects subsequent responses.
Model updates: If you update your model, its behavior may change.
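The first two sources of variation can be illustrated with a toy sampler. This is a minimal sketch, not how any particular model runtime is implemented: the candidate tokens and their scores are made up, and `softmax_with_temperature` and `sample_next_token` are hypothetical helper names. It converts scores to probabilities with a temperature-scaled softmax and draws one token at random.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(tokens, scores, temperature, rng):
    """Draw one token at random, weighted by its probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Made-up candidates and scores, for illustration only.
TOKENS = ["Paris", "London", "Rome"]
SCORES = [2.0, 1.0, 0.5]

if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each run can differ
    for temperature in (0.2, 1.0, 2.0):
        draws = [sample_next_token(TOKENS, SCORES, temperature, rng) for _ in range(10)]
        print(temperature, draws)
```

At the lowest temperature the draws should be dominated by the highest-scoring token; at higher temperatures the other candidates appear more often.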
Reading Responses Critically
Not all model outputs are equal. Learn to evaluate responses:
Signs of a good response:
- Specific details, not vague generalities
- Appropriate confidence (doesn't claim certainty when it's uncertain)
- Acknowledges limitations ("I'm not certain about X")
- Provides reasoning, not just answers
- Handles edge cases appropriately
Signs of a problematic response:
- Overconfident wrong answers (hallucinations)
- Vague, generic text that could apply to anything
- Gives substantively different answers when you ask the same question twice
- Refuses to answer obvious questions without explanation
- Contradicts itself
Common Failure Modes
Hallucination:
>>> Who won the Nobel Prize in Physics in 1950?
[Model produces a confident, specific answer that may or may not be correct]
Language models sometimes produce wrong information with high confidence. Always verify factual claims.
Pattern matching gone wrong:
The model might produce plausible but wrong code:
>>> Write a Python function to check if a number is prime
[Model produces code that looks correct but has an off-by-one error]
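To show how such a bug can hide, here is a hand-written sketch (not actual model output) of a prime check whose loop stops one short, next to a corrected version. The function names are hypothetical.

```python
import math

def is_prime_buggy(n):
    """Looks plausible, but the loop stops one short of the square root."""
    if n < 2:
        return False
    for divisor in range(2, math.isqrt(n)):  # bug: excludes isqrt(n) itself
        if n % divisor == 0:
            return False
    return True

def is_prime(n):
    """Trial division up to and including the integer square root."""
    if n < 2:
        return False
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            return False
    return True

# The two versions agree on every prime and on many composites,
# so a quick spot check can miss the bug. Inputs such as 9 and 25 expose it.
for n in (7, 9, 10, 25):
    print(n, is_prime_buggy(n), is_prime(n))
```

Testing generated code against inputs where you already know the answer, including boundary cases, is one way to catch this kind of error.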
Sensitivity to phrasing:
Two prompts that mean the same thing can get responses of different quality:
>>> What should I do if I'm feeling depressed?
[Good response with appropriate mental health support guidance]
>>> I'm feeling depressed, what should I do?
[May be more or less helpful depending on training]
Incoherence in long contexts:
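Long conversations can cause the model to "lose the thread." This is usually a context window limitation: the model can only take in a limited number of tokens at once, so early parts of a long conversation may be truncated or carry less weight.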
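One way early context can get lost is sketched below. It assumes a client that drops the oldest messages once a token budget is exceeded; real runtimes differ in how they handle overflow. `trim_history` is a hypothetical helper, and counting whitespace-separated words is only a crude stand-in for a real tokenizer.

```python
def trim_history(messages, max_tokens):
    """Keep the most recent messages that fit in the budget, dropping the oldest first."""
    kept = []
    used = 0
    for message in reversed(messages):
        cost = len(message.split())  # crude stand-in for a real tokenizer
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

# Made-up conversation, for illustration only.
history = [
    "My name is Dana and I am allergic to peanuts.",
    "Suggest a snack for a long hike.",
    "Make it something I can prepare the night before.",
]
print(trim_history(history, max_tokens=16))
```

With this budget the first message no longer fits, so a model that sees only the trimmed history has no way to know about the allergy.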
Techniques for Better Responses
Ask for confidence:
>>> List 5 things you know about quantum entanglement. Rate each for
confidence: high, medium, or low.
Request verification:
>>> What is the current population of Tokyo? Don't guess—only tell me
if you're confident.
Ask for alternatives:
>>> Suggest 3 different approaches to fixing a slow database query.
Iterate on responses:
>>> That explanation is too technical. Simplify it.
>>> Give me an example of this concept.
>>> What are the main criticisms of this approach?
Ask your local model the same factual question three times (with the same temperature). Count how many of the answers were identical and how many differed. Then ask: "On a scale of 1-10, how confident are you in that answer?" Compare the model's stated confidence to the actual correctness of its answer.
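The counting step of this exercise can be sketched in a few lines. This is a minimal sketch under stated assumptions: `ask` is a hypothetical callable that sends a question to your model and returns its reply as a string, and the fake version below only cycles through canned placeholder replies so the example runs without a model. Exact string matching is a crude test of agreement, since two answers can be worded differently and still say the same thing.

```python
from collections import Counter

def consistency_report(ask, question, runs=3):
    """Ask the same question several times and tally the distinct answers."""
    answers = [ask(question).strip() for _ in range(runs)]
    counts = Counter(answers)
    most_common_answer, occurrences = counts.most_common(1)[0]
    return {
        "answers": answers,
        "distinct": len(counts),
        "most_common": most_common_answer,
        "agreement": occurrences / runs,
    }

def make_fake_ask(replies):
    """Stand-in for a real model call: returns the canned replies in order."""
    remaining = iter(replies)
    return lambda question: next(remaining)

# Placeholder replies, not real model output.
fake_ask = make_fake_ask(["Answer A", "Answer A", "Answer B"])
report = consistency_report(fake_ask, "Who won the Nobel Prize in Physics in 1950?")
print(report["distinct"], report["agreement"])
```

To use it with a real model, replace `fake_ask` with a function that calls your local model and returns its text.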