09. Qualitative Analysis

Chapter 9 of 18 · 15 min

KEY INSIGHT

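Qualitative analysis explains the "why" behind quantitative results. Without it, you report observations but not understanding. Quantitative metrics capture aggregate performance; qualitative analysis reveals what your system actually does, and why it sometimes fails.

**Qualitative Analysis Methods:**

1. **Error Analysis:** Categorize and count failure modes.
2. **Qualitative Comparison:** Show side-by-side examples of your output vs. the baseline.
3. **Feature Visualization:** Examine what your model learned.
4. **Ablation Behavior:** Explain performance differences in terms of architectural differences.

**Error Analysis Framework:**

```python
# Category names returned by classify_error (checked in this order).
ERROR_CATEGORIES = (
    "hallucination",
    "accuracy_error",
    "completeness_error",
    "fluency_error",
    "other",
)


def categorize_errors(predictions, references, model_outputs):
    """
    Categorize errors to understand failure modes.

    Correct predictions are skipped, so the returned proportions describe
    failures only (they sum to 1 whenever at least one error exists).
    """
    counts = dict.fromkeys(ERROR_CATEGORIES, 0)
    for pred, ref, output in zip(predictions, references, model_outputs):
        if is_correct(pred, ref):
            continue  # not an error; counting it would inflate "other"
        counts[classify_error(pred, ref, output)] += 1

    # Report proportions; avoid dividing by zero when nothing failed
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


def classify_error(pred, ref, output):
    """Heuristic classification of error type: the first matching rule wins."""
    # Hallucination is checked first: unsupported content is also a factual
    # error, so testing accuracy first would absorb most hallucinations.
    if is_hallucinated(output, ref):
        return "hallucination"
    if contains_factual_error(output):
        return "accuracy_error"
    if is_incomplete(output, ref):
        return "completeness_error"
    if has_grammar_error(output):
        return "fluency_error"
    return "other"


# Task-specific predicates: replace these stubs with your own checks
# (for example exact match, an entailment model, or a grammar checker).
def is_correct(pred, ref):
    raise NotImplementedError

def is_hallucinated(output, ref):
    raise NotImplementedError

def contains_factual_error(output):
    raise NotImplementedError

def is_incomplete(output, ref):
    raise NotImplementedError

def has_grammar_error(output):
    raise NotImplementedError
```

**Example Qualitative Finding:**

"While our method achieves a 1.4 BLEU improvement overall, error analysis by sequence length reveals that the improvement concentrates in long-sequence translation (+3.2 BLEU), while short sequences show a marginal degradation (-0.3 BLEU). This aligns with our design hypothesis: linear attention preserves information better across long-range dependencies."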
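A finding like the one above comes from recomputing the metric within sequence-length buckets instead of only over the whole test set. The sketch below shows that bucketing under stated assumptions: each example is a `(source, hypothesis, reference)` triple of strings, length is counted in whitespace tokens of the source, and `stratify_by_length`, the default bucket edges, and the `exact_match` placeholder metric are all hypothetical. In practice you would pass a corpus-level metric such as BLEU as `metric`.

```python
from typing import Callable, Sequence

Example = tuple[str, str, str]  # (source, hypothesis, reference)
Metric = Callable[[list[str], list[str]], float]


def stratify_by_length(
    examples: Sequence[Example],
    metric: Metric,
    edges: Sequence[int] = (10, 30),
) -> dict[str, tuple[int, float]]:
    """Score each source-length bucket separately.

    Buckets are half-open ranges of whitespace-token counts:
    [0, edges[0]), [edges[0], edges[1]), ..., [edges[-1], inf).
    Returns {bucket_label: (n_examples, score)}; empty buckets are skipped.
    """
    bounds = [0, *edges, float("inf")]
    results = {}
    for lo, hi in zip(bounds, bounds[1:]):
        hyps, refs = [], []
        for src, hyp, ref in examples:
            if lo <= len(src.split()) < hi:
                hyps.append(hyp)
                refs.append(ref)
        if hyps:
            results[f"[{lo}, {hi})"] = (len(hyps), metric(hyps, refs))
    return results


def exact_match(hyps: list[str], refs: list[str]) -> float:
    """Hypothetical placeholder metric: percentage of exact string matches."""
    return 100.0 * sum(h == r for h, r in zip(hyps, refs)) / len(hyps)


examples = [
    ("a b", "x", "x"),                      # 2 source tokens, correct
    ("a b c d e f g h i j k l", "y", "z"),  # 12 source tokens, wrong
]
print(stratify_by_length(examples, exact_match, edges=(10,)))
# → {'[0, 10)': (1, 100.0), '[10, inf)': (1, 0.0)}
```

Passing the metric in as a callable keeps the bucketing independent of the scorer. Reporting the example count per bucket matters: a per-bucket delta computed over a handful of sentences is much noisier than the overall score.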
**Documentation Practice:**

- Include 3-5 representative examples for each major finding
- Use consistent formatting for all examples
- Annotate examples with explanatory comments
- Report confidence when qualitative judgments are subjective
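One common way to report confidence in subjective judgments such as error categories is to have two annotators label the same examples independently and report Cohen's kappa: observed agreement corrected for the agreement expected by chance. The minimal standard-library sketch below assumes exactly two annotators and one label per item; the function name and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected if each annotator labeled at random
    according to their own label frequencies.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # convention: both annotators used one identical label
    return (p_o - p_e) / (1 - p_e)


annotator_1 = ["hallucination", "fluency_error", "hallucination", "other"]
annotator_2 = ["hallucination", "fluency_error", "accuracy_error", "other"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
# → 0.667
```

Raw percent agreement (0.75 here) overstates reliability when a few categories dominate; kappa discounts the agreement two annotators would reach by chance given how often each uses each label.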

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
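The record the checkpoint asks for can be captured with the standard library alone. This is a minimal sketch: `record_environment`, its default package list, the field names, and the `data/test.jsonl` path are hypothetical, and CPU/GPU backend details would need framework-specific calls that are not shown here.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def record_environment(packages=("numpy",), data_path=None, observed_output=None):
    """Collect reproducibility details for a local verification run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
        "data_path": data_path,
        "observed_output": observed_output,
    }


if __name__ == "__main__":
    print(json.dumps(record_environment(data_path="data/test.jsonl"), indent=2))
```

Writing the dictionary out as JSON next to the exercise output makes it easy to diff two runs and spot which package or platform change explains a different result.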


EXERCISE

Perform error analysis on 50 examples from your test set. Create a table with error categories and representative examples. Submit this as supplementary material.
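A minimal sketch of the table, assuming each of the analyzed examples has already been hand-labeled with a single category. The function name `error_table`, the `category`/`output` keys, and the fixed seed are hypothetical choices; the fixed seed keeps the sampled representative examples stable across reruns.

```python
import random
from collections import defaultdict


def error_table(labeled, k=1, seed=0):
    """Render a Markdown table: category, count, share, representative examples.

    `labeled` is a list of dicts with "category" and "output" keys.
    Up to `k` examples per category are sampled with a fixed seed.
    """
    by_cat = defaultdict(list)
    for ex in labeled:
        by_cat[ex["category"]].append(ex["output"])
    total = len(labeled)
    rng = random.Random(seed)
    rows = [
        "| Category | Count | Share | Representative example(s) |",
        "|---|---:|---:|---|",
    ]
    # Most frequent categories first; ties broken alphabetically
    for cat, outputs in sorted(by_cat.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        sample = rng.sample(outputs, min(k, len(outputs)))
        # Escape pipes and flatten newlines so each example stays in one cell
        cell = "; ".join(s.replace("|", "\\|").replace("\n", " ") for s in sample)
        rows.append(f"| {cat} | {len(outputs)} | {len(outputs) / total:.0%} | {cell} |")
    return "\n".join(rows)


labeled = [
    {"category": "hallucination", "output": "Paris is in Spain."},
    {"category": "hallucination", "output": "The meeting was on Mars."},
    {"category": "fluency_error", "output": "He go to school."},
]
print(error_table(labeled))
```

Sorting by count puts the dominant failure modes at the top of the table, which is usually the order in which readers want to see them.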