RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
RAG Evaluation and Metrics

09. Context Recall

Chapter 9 of 18 · 15 min
KEY INSIGHT

Context Recall measures whether retrieval captures all information needed for a complete answer, requiring ground truth annotations for evaluation.

Context Recall measures whether the retrieved context contains all the information needed to answer the query. This requires annotated ground truth: typically a reference answer, ideally marked up to show which context elements support each claim.

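RAGAS calculates Context Recall by splitting the ground truth answer into individual claims and checking whether each claim can be attributed to the retrieved context. The score is the fraction of ground truth claims the context supports, from 0 to 1. Context that covers all ground truth claims scores high. Context that misses information needed for a complete answer scores low.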
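As a rough illustration of the ratio behind the score, here is a minimal sketch in plain Python. It is not RAGAS's implementation: RAGAS uses an LLM to judge whether each claim is supported, while the hypothetical `keyword_supported` helper below substitutes a crude substring match, and the claim list is hand-written for the example.

```python
def naive_context_recall(reference_claims, retrieved_contexts, is_supported):
    """Fraction of reference claims that the retrieved context supports."""
    if not reference_claims:
        raise ValueError("need at least one reference claim")
    context = " ".join(retrieved_contexts)
    supported = sum(1 for claim in reference_claims if is_supported(claim, context))
    return supported / len(reference_claims)


def keyword_supported(claim, context):
    """Hypothetical stand-in for the LLM judgment: case-insensitive substring match."""
    return claim.lower() in context.lower()


# Hand-written claims standing in for a decomposed reference answer
claims = ["credit score", "down payment", "property type", "loan term"]

full = ["Mortgage rates depend on credit score, down payment size, "
        "property type, and loan term."]
partial = ["Mortgage rates depend on credit score, property type, and loan term."]

full_score = naive_context_recall(claims, full, keyword_supported)
partial_score = naive_context_recall(claims, partial, keyword_supported)
print(full_score, partial_score)  # 1.0 0.75
```

Dropping the down payment sentence removes support for one of four claims, so the score falls from 1.0 to 0.75.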

# Assumes ragas >= 0.2 (field names user_input / retrieved_contexts /
# response / reference) and an evaluator LLM available to ragas,
# e.g. the default OpenAI model via OPENAI_API_KEY.
from ragas import EvaluationDataset, evaluate
from ragas.metrics import context_recall

# Ground truth: the complete answer the retrieved context must support
reference_answer = (
    "Mortgage interest rates are affected by credit score, down payment, "
    "property type, loan term, and economic conditions. Higher credit "
    "scores and larger down payments lead to lower rates."
)

# Complete context: covers every claim in the reference answer
complete_context = [
    {
        "user_input": "What affects mortgage interest rates?",
        "retrieved_contexts": [
            "Mortgage rates depend on credit score, down payment size, "
            "property type, loan term, and current economic conditions. "
            "Higher credit scores and larger down payments typically "
            "result in lower rates."
        ],
        "response": "Mortgage interest rates are affected by credit score, "
                   "down payment, property type, loan term, and economic "
                   "conditions with better credit and larger down payments "
                   "leading to lower rates.",
        "reference": reference_answer,
    }
]

# Incomplete context: down payment information is missing, so the
# reference claims about down payments cannot be attributed to it
# and the score should drop
incomplete_simulation = [
    {
        "user_input": "What affects mortgage interest rates?",
        "retrieved_contexts": [
            "Mortgage rates depend on credit score, property type, "
            "loan term, and current economic conditions. "
            "Higher credit scores typically result in lower rates."
        ],
        "response": "Mortgage interest rates are affected by credit score, "
                   "property type, loan term, and economic conditions.",
        "reference": reference_answer,
    }
]

complete_ds = EvaluationDataset.from_list(complete_context)
incomplete_ds = EvaluationDataset.from_list(incomplete_simulation)

complete_result = evaluate(complete_ds, metrics=[context_recall])
incomplete_result = evaluate(incomplete_ds, metrics=[context_recall])

# context_recall scores against the "reference" field, not "response".
# to_pandas() returns one row per sample; average to get a single score.
complete_score = complete_result.to_pandas()["context_recall"].mean()
incomplete_score = incomplete_result.to_pandas()["context_recall"].mean()
print(f"Complete context: {complete_score:.2f}")
print(f"Incomplete context: {incomplete_score:.2f}")

Context Recall requires ground truth annotation beyond the query, retrieved context, and output, unlike reference-free RAGAS metrics such as Faithfulness. Without knowing what a complete answer should contain, the system cannot measure whether retrieved context enables completeness. Plan for annotation effort when incorporating Context Recall.

The metric directly measures retrieval recall: the system's ability to fetch all the information a complete answer needs. Information missed at retrieval cannot be recovered in generation. If Context Recall is low, no prompt engineering or model improvement fixes the gap. The fix lies upstream: better retrieval or, where the corpus lacks the content, better document coverage.

Tracking Context Recall over time reveals whether document corpus changes affect retrieval quality. A document set covering topic X fully may score high on Recall for queries about X. If documents get removed or reorganized, Recall drops before users notice missing answers. Automated monitoring catches these regressions.
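One way to automate that monitoring is to store per-query scores from a known-good run and compare each new run against them. The sketch below is an assumption about how such a check might look, not part of RAGAS: the function name, the 0.05 tolerance, and the example scores are all hypothetical.

```python
def find_recall_regressions(baseline, current, tolerance=0.05):
    """Return {query: drop} for queries whose recall fell by more than `tolerance`."""
    regressions = {}
    for query, old_score in baseline.items():
        new_score = current.get(query)
        if new_score is None:
            continue  # query was not re-evaluated in this run
        drop = old_score - new_score
        if drop > tolerance:
            regressions[query] = round(drop, 4)
    return regressions


# Illustrative scores only; real values would come from evaluation runs
baseline_scores = {"mortgage rates": 1.0, "refinancing": 0.8}
current_scores = {"mortgage rates": 0.75, "refinancing": 0.8}

regressions = find_recall_regressions(baseline_scores, current_scores)
print(regressions)  # {'mortgage rates': 0.25}
```

Run on a schedule or after each corpus change, a non-empty result can fail a CI job or raise an alert before users notice missing answers.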

EXERCISE

Select 10 representative queries and write complete reference answers marking which context elements support each claim. Run Context Recall evaluation on your current retrieval system. Queries with the lowest Recall indicate knowledge gaps in your document corpus or retrieval failures for existing content.
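For the last step of the exercise, a small helper can surface the queries to investigate first. This is a minimal sketch with hypothetical names and made-up scores; in practice the per-query values would come from your evaluation output.

```python
def lowest_recall_queries(scores, k=3):
    """Return the k (query, score) pairs with the lowest recall, worst first."""
    return sorted(scores.items(), key=lambda item: item[1])[:k]


# Illustrative per-query Context Recall scores (not real results)
example_scores = {
    "What affects mortgage interest rates?": 0.75,
    "How does refinancing work?": 1.0,
    "What is private mortgage insurance?": 0.4,
    "How is escrow calculated?": 0.9,
}

worst = lowest_recall_queries(example_scores, k=2)
for query, score in worst:
    print(f"{score:.2f}  {query}")
```

For each low-scoring query, check whether the missing information exists anywhere in the corpus: if it does, the retriever failed to find it; if not, the corpus has a knowledge gap.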
