Context Precision — RAG Evaluation and Metrics (Chapter 8)

Context Precision measures whether the retrieved chunks are relevant to the query and whether the relevant ones rank ahead of the irrelevant ones. The RAGAS implementation evaluates each context chunk individually: an LLM judge decides whether the chunk contributes to answering the query, and the per-chunk verdicts are combined into a rank-weighted score between 0 and 1.

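A retrieval system that ranks noise above relevant documents scores lower than one that returns only relevant documents. Likewise, placing critical information at a low rank when it could sit at the top reduces the precision score. Because the RAGAS score averages precision@k over the ranks that hold a relevant chunk, noise that appears only after every relevant chunk does not lower it.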
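
The ranking effect can be made concrete with a small calculation. The sketch below assumes binary relevance verdicts are already available (in RAGAS an LLM judge produces them) and averages precision@k over the ranks that hold a relevant chunk. `rank_weighted_precision` is a hypothetical helper written for illustration, not part of the RAGAS API.

```python
def rank_weighted_precision(relevance: list[int]) -> float:
    """Average precision@k over the ranks that hold a relevant chunk.

    relevance[i] is 1 if the chunk at rank i + 1 was judged relevant, else 0.
    Returns 0.0 when no chunk is relevant.
    """
    hits = 0
    weighted_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # precision@rank, counted only at ranks holding a relevant chunk
            weighted_sum += hits / rank
    return weighted_sum / hits if hits else 0.0


# Hypothetical verdict lists for three rankings of the same query
scores = {
    "relevant only": rank_weighted_precision([1]),
    "noise ranked first": rank_weighted_precision([0, 1, 0]),
    "noise ranked last": rank_weighted_precision([1, 0, 0]),
}

for label, score in scores.items():
    print(f"{label}: {score:.2f}")
```

Under these assumptions, the same relevant chunk scores differently depending only on where the noise sits in the ranking, which is why reranking alone can raise the metric.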

from ragas import EvaluationDataset, evaluate
from ragas.metrics import context_precision

# Assumes ragas 0.2+ (user_input / retrieved_contexts / reference schema)
# and a configured judge LLM (by default, an OpenAI API key in the environment).

# Example comparing clean retrieval vs noisy retrieval
clean_retrieval = [
    {
        "user_input": "What symptoms indicate a heart attack in women?",
        "retrieved_contexts": [
            "Women may experience subtle heart attack symptoms including "
            "unusual fatigue, sleep disturbances, and shortness of breath. "
            "Chest pain may be less severe than in men."
        ],
        # context_precision judges each chunk against a reference answer
        "reference": "Women experiencing heart attack symptoms may have "
                     "unusual fatigue, sleep disturbances, shortness of breath, "
                     "and possibly milder chest pain than commonly expected."
    }
]

noisy_retrieval = [
    {
        "user_input": "What symptoms indicate a heart attack in women?",
        "retrieved_contexts": [
            "The gym is open 24 hours for members.",
            "Women may experience subtle heart attack symptoms including "
            "unusual fatigue, sleep disturbances, and shortness of breath. "
            "Chest pain may be less severe than in men.",
            "The cafeteria serves lunch from 11am-2pm."
        ],
        "reference": "Women experiencing heart attack symptoms may have "
                     "unusual fatigue, sleep disturbances, shortness of breath, "
                     "and possibly milder chest pain than commonly expected."
    }
]

clean_ds = EvaluationDataset.from_list(clean_retrieval)
noisy_ds = EvaluationDataset.from_list(noisy_retrieval)

clean_result = evaluate(clean_ds, metrics=[context_precision])
noisy_result = evaluate(noisy_ds, metrics=[context_precision])

# Scores are reported per sample; average the column of each one-sample dataset
clean_score = clean_result.to_pandas()["context_precision"].mean()
noisy_score = noisy_result.to_pandas()["context_precision"].mean()

print(f"Clean retrieval: {clean_score:.2f}")
print(f"Noisy retrieval: {noisy_score:.2f}")
# Expected if the judge marks only the medical chunk relevant (LLM verdicts can vary):
# Clean retrieval: 1.00
# Noisy retrieval: 0.50

The noisy retrieval example illustrates how irrelevant documents degrade precision. The gym hours and cafeteria information add no value for the medical query, and the gym chunk outranks the one passage that answers it. The RAGAS metric penalizes noise that pushes contributing chunks down the ranking.

Context Precision matters most when context length affects downstream performance. Irrelevant documents increase token count, raise latency, and can distract the language model from the relevant content. For RAG systems that truncate context to a fixed budget, removing noise leaves more room for relevant chunks that would otherwise be cut.
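
The truncation effect can be sketched in a few lines. This example assumes a pipeline that fills the prompt in rank order until a token budget is exhausted. `pack_context`, the whitespace-based token count, and the budget of 15 are all illustrative assumptions rather than any particular framework's behavior.

```python
def pack_context(chunks: list[str], token_budget: int) -> list[str]:
    """Keep chunks in rank order until the next one would exceed the budget."""
    kept = []
    used = 0
    for chunk in chunks:
        cost = len(chunk.split())  # crude whitespace token count (assumption)
        if used + cost > token_budget:
            break  # everything ranked below this point is truncated
        kept.append(chunk)
        used += cost
    return kept


relevant = "Women may experience unusual fatigue and shortness of breath."
noise = [
    "The gym is open 24 hours for members.",
    "The cafeteria serves lunch from 11am-2pm.",
]

# Noise ranked first consumes the budget before the relevant chunk is reached
noisy_prompt = pack_context(noise + [relevant], token_budget=15)
clean_prompt = pack_context([relevant], token_budget=15)

print("relevant chunk kept (noisy ranking):", relevant in noisy_prompt)
print("relevant chunk kept (clean ranking):", relevant in clean_prompt)
```

In this toy setup the two noise chunks use 14 of the 15 available tokens, so the only passage that answers the query never reaches the model.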

Low Context Precision often indicates chunking problems. When documents are split without regard for semantic boundaries, a chunk may carry noise from adjacent content. If a chunk includes a relevant paragraph alongside an irrelevant table, precision suffers. Aligning chunk boundaries with semantic units usually improves Context Precision.
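
The difference between boundary-blind and boundary-aware splitting can be illustrated with a toy document. This is a minimal sketch that treats blank lines as the semantic boundary. `split_fixed` and `split_paragraphs` are hypothetical helpers, and real pipelines typically use more capable splitters.

```python
def split_fixed(text: str, size: int) -> list[str]:
    """Cut the text every `size` characters, ignoring its structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_paragraphs(text: str) -> list[str]:
    """Cut the text at blank lines so each chunk is one paragraph or table."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


# A relevant paragraph followed by an unrelated table (illustrative data)
document = (
    "Women may experience unusual fatigue and shortness of breath.\n\n"
    "Room | Hours\n"
    "Gym | 24h\n"
    "Cafeteria | 11am-2pm"
)


def is_mixed(chunk: str) -> bool:
    """True if the chunk holds both the medical text and part of the table."""
    return "fatigue" in chunk and "Room" in chunk


fixed_chunks = split_fixed(document, size=80)
paragraph_chunks = split_paragraphs(document)

print("mixed chunks, fixed-size split:", sum(is_mixed(c) for c in fixed_chunks))
print("mixed chunks, paragraph split:", sum(is_mixed(c) for c in paragraph_chunks))
```

With the fixed 80-character window, the first chunk carries the medical paragraph plus the start of the table, while the paragraph split keeps the two apart.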