Context Recall — RAG Evaluation and Metrics (Chapter 9)

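Context Recall measures whether the retrieved context contains all the information needed to answer the query. It requires annotated ground truth: a reference answer for each query, written or verified by a person, that defines what a complete answer contains. The annotator does not need to mark which context passages support each claim, because the metric works out that attribution itself.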

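RAGAS calculates Context Recall by breaking the reference answer into individual claims and asking an LLM judge whether each claim can be attributed to the retrieved context. The score is the fraction of reference claims that are attributable, from 0 to 1. Context that supports every reference claim scores 1.0; context that misses information needed for a complete answer scores proportionally lower. The generated response plays no part in this calculation.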
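
The scoring rule can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the RAGAS implementation: the function names are hypothetical, each reference claim is reduced to a key phrase, and a substring match stands in for the LLM judge that decides attribution in RAGAS. Only the arithmetic (attributed claims divided by total claims) mirrors the metric.

```python
from typing import Callable, List

# A judge decides whether one reference claim is supported by the
# retrieved chunks. RAGAS asks an LLM; the substring judge below is a
# hypothetical stand-in so the arithmetic can run offline.
Judge = Callable[[str, List[str]], bool]


def substring_judge(claim: str, contexts: List[str]) -> bool:
    """Toy judge: the claim's key phrase appears verbatim in some chunk."""
    return any(claim.lower() in chunk.lower() for chunk in contexts)


def context_recall_score(
    reference_claims: List[str], contexts: List[str], judge: Judge
) -> float:
    """Attributed reference claims divided by total reference claims.
    The generated response is deliberately not a parameter."""
    if not reference_claims:
        raise ValueError("at least one reference claim is required")
    attributed = sum(judge(claim, contexts) for claim in reference_claims)
    return attributed / len(reference_claims)


# Reference claims reduced to key phrases (a hypothetical simplification)
claims = ["credit score", "down payment", "loan term"]
full = ["Mortgage rates depend on credit score, down payment size, and loan term."]
partial = ["Mortgage rates depend on credit score and loan term."]

print(f"{context_recall_score(claims, full, substring_judge):.2f}")     # 1.00
print(f"{context_recall_score(claims, partial, substring_judge):.2f}")  # 0.67
```

Swapping `substring_judge` for a function that calls an LLM would bring the sketch closer to RAGAS; the division itself stays the same.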

# Written against RAGAS 0.2+. evaluate() needs an LLM for the judge step;
# by default it calls an OpenAI model, so OPENAI_API_KEY must be set.
from ragas import EvaluationDataset, evaluate
from ragas.metrics import context_recall

# The reference answer is the ground truth Context Recall needs. Both
# samples share it, so only the retrieved context differs between them.
reference = (
    "Mortgage interest rates depend on credit score, property type, loan "
    "term, and economic conditions. Down payment size also affects the "
    "rate. Higher credit scores and larger down payments typically lead "
    "to lower rates."
)

# Complete context: every claim in the reference is supported
complete_context = [
    {
        "user_input": "What affects mortgage interest rates?",
        "retrieved_contexts": [
            "Mortgage rates depend on credit score, down payment size, "
            "property type, loan term, and current economic conditions. "
            "Higher credit scores and larger down payments typically "
            "result in lower rates."
        ],
        # The response is stored for completeness; Context Recall ignores it
        "response": "Mortgage interest rates are affected by credit score, "
                    "down payment, property type, loan term, and economic "
                    "conditions, with better credit and larger down payments "
                    "leading to lower rates.",
        "reference": reference,
    }
]

# Incomplete context: the down payment information was not retrieved,
# so the reference claims about down payments cannot be attributed
incomplete_context = [
    {
        "user_input": "What affects mortgage interest rates?",
        "retrieved_contexts": [
            "Mortgage rates depend on credit score, property type, "
            "loan term, and current economic conditions. "
            "Higher credit scores typically result in lower rates."
        ],
        "response": "Mortgage interest rates are affected by credit score, "
                    "property type, loan term, and economic conditions.",
        "reference": reference,
    }
]

complete_ds = EvaluationDataset.from_list(complete_context)
incomplete_ds = EvaluationDataset.from_list(incomplete_context)

complete_result = evaluate(complete_ds, metrics=[context_recall])
incomplete_result = evaluate(incomplete_ds, metrics=[context_recall])

# to_pandas() returns one row per sample; average the metric column
complete_score = complete_result.to_pandas()["context_recall"].mean()
incomplete_score = incomplete_result.to_pandas()["context_recall"].mean()
print(f"Complete context: {complete_score:.2f}")
print(f"Incomplete context: {incomplete_score:.2f}")

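Reference-free metrics such as Faithfulness and Answer Relevancy work from the query, retrieved context, and response alone. Context Recall needs more: a reference answer for every query. It shares this requirement with reference-based metrics such as Answer Correctness, but unlike Answer Correctness it uses the reference to judge retrieval rather than the generated answer. Without a reference that defines what a complete answer contains, there is nothing to check the retrieved context against. Plan for annotation effort when incorporating Context Recall.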
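
One way to plan the annotation work is to split an evaluation set into samples that can already be scored and samples still missing a reference. The sketch below assumes samples are plain dictionaries using the RAGAS 0.2-style keys from the example above; `split_by_annotation` and `REQUIRED_FIELDS` are hypothetical helpers, not part of RAGAS.

```python
from typing import Dict, List, Tuple

# Fields Context Recall reads, using the RAGAS 0.2-style keys from the
# example above. The generated response is not among them.
REQUIRED_FIELDS = ("user_input", "retrieved_contexts", "reference")


def split_by_annotation(samples: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate samples ready for Context Recall from those that still
    need a reference answer (missing or empty fields count as absent)."""
    ready: List[Dict] = []
    needs_annotation: List[Dict] = []
    for sample in samples:
        if all(sample.get(field) for field in REQUIRED_FIELDS):
            ready.append(sample)
        else:
            needs_annotation.append(sample)
    return ready, needs_annotation


samples = [
    {
        "user_input": "What affects mortgage interest rates?",
        "retrieved_contexts": ["Mortgage rates depend on credit score."],
        "reference": "Credit score affects mortgage interest rates.",
    },
    {
        # Logged query that nobody has written a reference for yet
        "user_input": "Can I refinance with a low credit score?",
        "retrieved_contexts": ["Refinancing requirements vary by lender."],
    },
]

ready, needs_annotation = split_by_annotation(samples)
print(f"{len(ready)} ready, {len(needs_annotation)} awaiting a reference")
```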

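The metric serves as a measure of retrieval recall, the system's ability to fetch everything relevant, though it works at the level of reference claims rather than counting relevant documents. Information missing at retrieval cannot be recovered in generation, and because the response is not an input to the score, prompt engineering or a stronger generator cannot raise a low Context Recall. The fix lies upstream: better retrieval, or adding the missing information to the corpus when no document contains it.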
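
Because the score depends only on the query, the retrieved chunks, and the reference, changing retrieval is what moves it. The sketch below varies one retrieval setting, how many ranked chunks are passed on (k), and recomputes recall at each depth. The ranked chunks are made-up retriever output, `recall_at_k` is a hypothetical helper, and substring matching stands in for the LLM judge RAGAS uses.

```python
from typing import List


def recall_at_k(reference_facts: List[str], ranked_chunks: List[str], k: int) -> float:
    """Fraction of reference facts found in any of the top-k chunks.
    Substring matching is a hypothetical stand-in for an LLM judge."""
    top_k = [chunk.lower() for chunk in ranked_chunks[:k]]
    found = sum(any(fact.lower() in chunk for chunk in top_k) for fact in reference_facts)
    return found / len(reference_facts)


facts = ["credit score", "down payment", "loan term"]
# Hypothetical retriever output, best match first
ranked = [
    "Mortgage rates depend on credit score and loan term.",
    "Lenders often offer lower rates for a larger down payment.",
    "Closing costs vary by state.",
]

# No response is generated anywhere: only retrieval depth changes
for k in range(1, len(ranked) + 1):
    print(f"k={k}: recall={recall_at_k(facts, ranked, k):.2f}")
```

Here the down payment fact sits in the second-ranked chunk, so recall rises from 0.67 at k=1 to 1.00 at k=2. Raising k is only one retrieval lever, and it also adds less relevant text to the prompt; a ranking that placed the down payment chunk first would reach the same recall at k=1.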

Tracking Context Recall over time reveals whether changes to the document corpus affect retrieval quality. A document set that fully covers topic X should score high on Recall for queries about X. If those documents are removed or reorganized, Recall drops, often before users notice missing answers. Re-running a fixed annotated evaluation set after every corpus or index change catches these regressions automatically.
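
A minimal regression check compares per-query scores from a baseline run against the latest run. The sketch assumes scores have already been computed (for example with `evaluate`) and stored as a mapping from a query ID to its Context Recall score; the function name, the query IDs, the scores, and the default tolerance of 0.1 are hypothetical.

```python
from typing import Dict, List


def recall_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.1,  # hypothetical threshold; tune to judge noise
) -> List[str]:
    """Return IDs of queries whose Context Recall fell by more than
    `tolerance` between runs. Queries present in only one run are skipped."""
    shared = baseline.keys() & current.keys()
    return sorted(qid for qid in shared if baseline[qid] - current[qid] > tolerance)


# Hypothetical per-query scores before and after a corpus reorganization
baseline = {"q1": 1.0, "q2": 0.67, "q3": 1.0}
current = {"q1": 1.0, "q2": 0.33, "q3": 0.95}

print(recall_regressions(baseline, current))  # ['q2']
```

The tolerance exists because an LLM judge does not always return the same verdicts on repeated runs, so a small drop like q3's may be noise; a drop like q2's is the kind worth investigating after a corpus change.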