RUNLOCALAIv38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.coIndependently operated
Capstone: Research AI System

09. Qualitative Analysis

Chapter 9 of 18 · 15 min
KEY INSIGHT

Qualitative analysis explains the "why" behind quantitative results. Without it, you report observations but not understanding. Quantitative metrics capture aggregate performance; qualitative analysis reveals what your system actually does and why it sometimes fails.

**Qualitative Analysis Methods:**

1. **Error Analysis:** Categorize and count failure modes.
2. **Qualitative Comparison:** Show side-by-side examples of your output and the baseline's.
3. **Feature Visualization:** Examine what your model learned.
4. **Ablation Behavior:** Explain performance differences through architecture differences.

**Error Analysis Framework:**

```python
# NOTE: contains_factual_error, is_incomplete, has_grammar_error, and
# is_hallucinated are task-specific checks that you supply; they are
# not defined in this chapter.

def categorize_errors(predictions, references, model_outputs):
    """
    Categorize errors to understand failure modes.

    Pass only examples already judged incorrect: classify_error has no
    "correct" label, so a correct output would be counted as "other".
    Returns the proportion of errors in each category.
    """
    categories = {
        "fluency_error": 0,
        "accuracy_error": 0,
        "completeness_error": 0,
        "hallucination": 0,
        "other": 0,
    }

    for pred, ref, output in zip(predictions, references, model_outputs):
        error_type = classify_error(pred, ref, output)
        categories[error_type] += 1

    # Report proportions; guard against an empty input
    total = sum(categories.values())
    if total == 0:
        return {k: 0.0 for k in categories}
    return {k: v / total for k, v in categories.items()}


def classify_error(pred, ref, output):
    """Heuristic classification of error type; the first matching check wins."""
    if contains_factual_error(output):
        return "accuracy_error"
    elif is_incomplete(output, ref):
        return "completeness_error"
    elif has_grammar_error(output):
        return "fluency_error"
    elif is_hallucinated(output, ref):
        return "hallucination"
    return "other"
```

**Example Qualitative Finding:**

"While our method achieves a 1.4 BLEU improvement overall, error analysis reveals that the improvement concentrates in long-sequence translation (+3.2 BLEU), while short sequences show marginal degradation (-0.3 BLEU). This aligns with our design hypothesis: linear attention preserves information better across long-range dependencies."
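A finding like the one quoted above usually comes from slicing a score by some property of the input. The following is a minimal sketch of that slicing, not the chapter's own method. It assumes you already have a per-example score for both systems. The dictionary keys (`src_len`, `ours`, `baseline`), the 30-token threshold, and the toy numbers are all hypothetical. Note that corpus-level BLEU is not an average of per-sentence scores, so for BLEU itself you would recompute the corpus metric on each bucket instead of averaging.

```python
from collections import defaultdict
from statistics import mean


def length_bucket(n_tokens, threshold=30):
    """Assign an example to 'short' or 'long' (the threshold is an assumption)."""
    return "long" if n_tokens >= threshold else "short"


def delta_by_bucket(examples, threshold=30):
    """
    examples: iterable of dicts with hypothetical keys
      'src_len', 'ours', 'baseline' (per-example scores, higher is better).
    Returns {bucket: (count, mean(ours) - mean(baseline))}.
    """
    buckets = defaultdict(list)
    for ex in examples:
        buckets[length_bucket(ex["src_len"], threshold)].append(ex)

    report = {}
    for name, group in buckets.items():
        ours = mean(ex["ours"] for ex in group)
        base = mean(ex["baseline"] for ex in group)
        report[name] = (len(group), ours - base)
    return report


# Toy data for illustration only; these are not results from any experiment.
toy = [
    {"src_len": 8, "ours": 0.50, "baseline": 0.55},
    {"src_len": 12, "ours": 0.60, "baseline": 0.60},
    {"src_len": 45, "ours": 0.70, "baseline": 0.50},
    {"src_len": 60, "ours": 0.65, "baseline": 0.55},
]
report = delta_by_bucket(toy)
for name, (count, delta) in sorted(report.items()):
    print(f"{name:>5}  n={count}  delta={delta:+.3f}")
```

Reporting the count next to each delta matters: a large difference on a bucket with very few examples is weak evidence.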
**Documentation Practice:**

- Include 3-5 representative examples for each major finding
- Use consistent formatting for all examples
- Annotate examples with explanatory comments
- Report confidence when qualitative judgments are subjective
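One common way to report confidence in subjective labels is to have two people label the same examples and report how often they agree beyond chance. The sketch below computes Cohen's kappa in plain Python. It is an illustration, not something the chapter prescribes, and the two label lists are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for six examples, using the chapter's category names.
annotator_1 = ["hallucination", "accuracy_error", "accuracy_error",
               "fluency_error", "other", "accuracy_error"]
annotator_2 = ["hallucination", "accuracy_error", "completeness_error",
               "fluency_error", "other", "accuracy_error"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Reporting kappa alongside the category proportions lets a reader judge how far the categories themselves can be trusted.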

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Perform error analysis on 50 examples from your test set. Create a table with error categories and representative examples. Submit this as supplementary material.
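The exercise can be organized with a small helper that draws a reproducible sample and turns your manual labels into a table. This is a sketch under assumptions: the function names, the seed, and the pipe-delimited table layout are illustrative choices, and the labels are expected to come from your own review of each sampled example.

```python
import random
from collections import defaultdict


def sample_for_review(example_ids, k=50, seed=0):
    """Draw a reproducible sample of up to k example ids."""
    ids = list(example_ids)
    rng = random.Random(seed)  # fixed seed so the sample can be regenerated
    return rng.sample(ids, min(k, len(ids)))


def error_table(labeled, max_examples=1):
    """
    labeled: list of (category, example_text) pairs from manual review.
    Returns a pipe-delimited table, most frequent category first.
    """
    by_cat = defaultdict(list)
    for category, text in labeled:
        by_cat[category].append(text)

    total = len(labeled)
    lines = [
        "| Category | Count | Share | Representative example |",
        "|---|---|---|---|",
    ]
    for category, texts in sorted(by_cat.items(), key=lambda kv: -len(kv[1])):
        share = len(texts) / total
        shown = " / ".join(texts[:max_examples])
        lines.append(f"| {category} | {len(texts)} | {share:.0%} | {shown} |")
    return "\n".join(lines)


# Usage with invented labels; replace with your own reviewed examples.
picked = sample_for_review(range(200))
table = error_table([
    ("hallucination", "cites a source that is not in the input"),
    ("hallucination", "invents a date"),
    ("other", "empty output"),
])
print(table)
```

Record the seed and the size of the test set with the table so the same 50 examples can be drawn again. If an example text contains a `|` character, escape it before building the row.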
