Understanding Model Responses — What is Local AI — And Why It Matters (Chapter 11)

Why Responses Vary

Even with the same prompt, you may get different responses. This isn't a bug—it's fundamental to how language models work.

Sources of variation:

Non-deterministic generation: Models select from probability distributions. The same input can lead to different outputs on different runs.
Temperature: A setting that controls randomness (more on this in Chapter 13).
Context: What's in your conversation history affects subsequent responses.
Model updates: If you update your model, behavior changes.

Reading Responses Critically

Not all model outputs are equal. Learn to evaluate responses:

Signs of good response:

Specific details, not vague generalities
Appropriate confidence (doesn't claim certainty when it's uncertain)
Acknowledges limitations ("I'm not certain about X")
Provides reasoning, not just answers
Handles edge cases appropriately

Signs of problematic response:

Overconfident wrong answers (hallucinations)
Vague, generic text that could apply to anything
Inconsistent when you ask the same question twice
Refuses to answer obvious questions without explanation
Contradicts itself

Common Failure Modes

Hallucination:

>>> Who won the Nobel Prize in Physics in 1950?
[Model produces a confident, specific answer that may or may not be correct]

Language models sometimes produce wrong information with high confidence. Always verify factual claims.

Pattern matching gone wrong:

The model might produce plausible but wrong code:

>>> Write a Python function to check if a number is prime
[Model produces code that looks correct but has an off-by-one error]

Sensitivity to phrasing:

>>> What should I do if I'm feeling depressed?
[Good response with appropriate mental health support guidance]

>>> I'm feeling depressed, what should I do?
[May be more or less helpful depending on training]

Incoherence in long contexts:

Long conversations can cause the model to "lose the thread." This is a context window limitation.

Techniques for Better Responses

Ask for confidence:

>>> List 5 things you know about quantum entanglement. Rate each for 
confidence: high, medium, or low.

Request verification:

>>> What is the current population of Tokyo? Don't guess—only tell me 
if you're confident.

Ask for alternatives:

>>> Suggest 3 different approaches to fixing a slow database query.

Iterate on responses:

>>> That explanation is too technical. Simplify it.
>>> Give me an example of this concept.
>>> What are the main criticisms of this approach?