RUNLOCALAI v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated
RAG Evaluation and Metrics

08. Context Precision

Chapter 8 of 18 · 15 min
KEY INSIGHT

Context Precision measures whether retrieved documents actually contribute to answering the query, penalizing irrelevant content that ranks above relevant content.

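Context Precision measures two things: whether the retrieved chunks are relevant to the query, and whether the relevant ones are ranked above the irrelevant ones. The RAGAS implementation asks a judge LLM to evaluate each retrieved chunk individually and return a yes/no verdict on whether it helps answer the query (judged against a reference answer, or against the generated response in the reference-free variant). It then combines those verdicts into a single rank-weighted score.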

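A retrieval system whose relevant chunks are interleaved with noise scores lower than one that returns only relevant chunks, but only when the noise ranks above a relevant chunk. Irrelevant chunks ranked below every relevant chunk leave the score unchanged. By the same logic, placing critical information lower in the ranking (position 3 instead of position 1, say) reduces the score even when the set of retrieved chunks is identical.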
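For reference, the score RAGAS documents for this metric is a rank-weighted average of precision at each rank. K is the number of retrieved chunks, and v_k is 1 if the chunk at rank k was judged useful and 0 otherwise:

```latex
\mathrm{Precision@}k = \frac{1}{k}\sum_{i=1}^{k} v_i
\qquad
\mathrm{Context\ Precision@}K = \frac{\sum_{k=1}^{K} \mathrm{Precision@}k \cdot v_k}{\sum_{k=1}^{K} v_k}
```

Precision@k enters the sum only at ranks where v_k = 1, and the denominator counts relevant chunks rather than retrieved ones. That is why an irrelevant chunk lowers the score only when it sits above a relevant one.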

# Assumes ragas 0.2.x. The metric calls a judge LLM; if none is passed, ragas
# falls back to an OpenAI model, so OPENAI_API_KEY must be set
# (or pass llm=... to evaluate()).
from ragas import EvaluationDataset, evaluate
from ragas.metrics import LLMContextPrecisionWithoutReference

# The reference-free variant judges each retrieved chunk against the generated
# response, so these samples need no ground-truth "reference" field.
context_precision = LLMContextPrecisionWithoutReference()

# Example comparing clean retrieval vs noisy retrieval
clean_retrieval = [
    {
        "user_input": "What symptoms indicate a heart attack in women?",
        "retrieved_contexts": [
            "Women may experience subtle heart attack symptoms including "
            "unusual fatigue, sleep disturbances, and shortness of breath. "
            "Chest pain may be less severe than in men."
        ],
        "response": "Women experiencing heart attack symptoms may have "
                   "unusual fatigue, sleep disturbances, shortness of breath, "
                   "and possibly milder chest pain than commonly expected."
    }
]

noisy_retrieval = [
    {
        "user_input": "What symptoms indicate a heart attack in women?",
        "retrieved_contexts": [
            "The gym is open 24 hours for members.",
            "Women may experience subtle heart attack symptoms including "
            "unusual fatigue, sleep disturbances, and shortness of breath. "
            "Chest pain may be less severe than in men.",
            "The cafeteria serves lunch from 11am-2pm."
        ],
        "response": "Women experiencing heart attack symptoms may have "
                   "unusual fatigue, sleep disturbances, shortness of breath, "
                   "and possibly milder chest pain than commonly expected."
    }
]

clean_ds = EvaluationDataset.from_list(clean_retrieval)
noisy_ds = EvaluationDataset.from_list(noisy_retrieval)

clean_result = evaluate(clean_ds, metrics=[context_precision])
noisy_result = evaluate(noisy_ds, metrics=[context_precision])

# Results hold one score per sample; average the column to get one number.
clean_score = clean_result.to_pandas()[context_precision.name].mean()
noisy_score = noisy_result.to_pandas()[context_precision.name].mean()

print(f"Clean retrieval: {clean_score:.2f}")
print(f"Noisy retrieval: {noisy_score:.2f}")
# If the judge marks only the medical chunk as useful (LLM verdicts can vary):
# Clean retrieval: 1.00
# Noisy retrieval: 0.50

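The noisy retrieval example shows how irrelevant chunks degrade precision. The gym hours and cafeteria information add nothing to a medical query, but they do not affect the score equally. The gym chunk ranks first, which pushes the only relevant chunk down to rank 2, where it contributes the precision at rank 2 (1/2) instead of a full 1. The cafeteria chunk ranks below the relevant chunk, so the RAGAS formula ignores it, even though it still pads the context with tokens the model has to read.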
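The arithmetic can be checked with a short function that follows the averaging formula RAGAS documents. This is a sketch: `rank_weighted_precision` is a hypothetical name, and the 0/1 verdicts are supplied by hand rather than by a judge LLM. Reordering the three chunks from the noisy example shows that, with a single relevant chunk, only its position changes the score.

```python
def rank_weighted_precision(verdicts: list[int]) -> float:
    """Rank-weighted context precision from per-chunk 0/1 verdicts.

    verdicts[i] is 1 if the chunk at rank i + 1 was judged useful, else 0.
    The verdicts are hand-supplied here; RAGAS obtains them from a judge LLM.
    """
    relevant_so_far = 0
    numerator = 0.0
    for k, v in enumerate(verdicts, start=1):
        relevant_so_far += v
        # precision@k is added only at ranks that hold a relevant chunk
        numerator += (relevant_so_far / k) * v
    total_relevant = sum(verdicts)
    # Denominator counts relevant chunks, not retrieved ones
    return numerator / total_relevant if total_relevant else 0.0


# Noisy example order: gym (0), medical (1), cafeteria (0)
print(f"{rank_weighted_precision([0, 1, 0]):.2f}")  # 0.50
# Same chunks, medical chunk moved to rank 1
print(f"{rank_weighted_precision([1, 0, 0]):.2f}")  # 1.00
# Medical chunk pushed to the bottom
print(f"{rank_weighted_precision([0, 0, 1]):.2f}")  # 0.33
```

The middle case is the clean-retrieval score with two trailing noise chunks added: they cost nothing under this metric, which is why token budgets need separate attention.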

Context Precision matters most when context length affects downstream performance. Irrelevant chunks increase token count, raise latency, and dilute the attention a language model can give to the relevant content. In RAG systems that truncate context to a fixed token budget, noise ranked near the top can push relevant chunks past the cutoff, so removing it directly increases how much relevant content reaches the model.
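A minimal sketch of the truncation effect, assuming a hypothetical `pack_context` helper that keeps chunks in rank order until a token budget is spent. Token counts are approximated by whitespace-separated words, which is an illustrative assumption; a real system would count with its model's tokenizer.

```python
def pack_context(chunks: list[str], budget: int) -> list[str]:
    """Keep chunks in rank order until the approximate token budget runs out.

    Tokens are approximated as whitespace-split words (an assumption for
    illustration, not how any particular tokenizer counts).
    """
    packed = []
    used = 0
    for chunk in chunks:
        cost = len(chunk.split())
        if used + cost > budget:
            break  # stop at the first chunk that does not fit
        packed.append(chunk)
        used += cost
    return packed


relevant = ("Women may experience subtle heart attack symptoms including "
            "unusual fatigue, sleep disturbances, and shortness of breath.")
noise = "The gym is open 24 hours for members."

# 8 noise words + 16 relevant words exceed a 20-word budget.
print(relevant in pack_context([noise, relevant], budget=20))  # False
print(relevant in pack_context([relevant, noise], budget=20))  # True
```

Stopping at the first chunk that does not fit keeps the packed context in strict rank order. Skipping an oversized chunk and continuing is a reasonable alternative, but either way the top-ranked noise is what consumes the budget.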

Low Context Precision often points to chunking problems. When documents are split without regard for semantic boundaries, a chunk can pick up noise from adjacent content or keep so little of the relevant passage that the judge marks it as not useful. If a chunk buries a relevant paragraph beside an unrelated table, the judge may not count it as useful, and even when it does, the table still consumes context. Improving chunk boundaries tends to improve Context Precision.
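One common way to respect semantic boundaries is to split on paragraph breaks first and then pack whole paragraphs into chunks up to a size limit, instead of cutting at fixed character offsets. The sketch below assumes blank lines mark paragraph boundaries and measures size in characters; `chunk_by_paragraph` is a hypothetical helper, not a RAGAS function, and real pipelines often split on headings or sentences and measure in tokens.

```python
def chunk_by_paragraph(text: str, max_chars: int) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    Paragraphs are assumed to be separated by blank lines. A single paragraph
    longer than max_chars becomes its own oversized chunk instead of being cut.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate  # start a chunk, or extend it while it fits
        else:
            chunks.append(current)  # close the chunk at a paragraph break
            current = para
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Women may experience subtle heart attack symptoms.\n\n"
    "Chest pain may be less severe than in men.\n\n"
    "The cafeteria serves lunch from 11am-2pm."
)

# Fixed-size windows ignore boundaries and cut the cafeteria sentence in two.
fixed = [doc[i:i + 100] for i in range(0, len(doc), 100)]
semantic = chunk_by_paragraph(doc, max_chars=100)

print(fixed)
print(semantic)
```

With a 100-character limit, the fixed windows leave a stray "The " from the cafeteria sentence inside the medical chunk, while the paragraph-aware version keeps both medical paragraphs together and isolates the cafeteria line in its own chunk.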

EXERCISE

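Inspect sample retrieval outputs from your system. For each retrieved chunk, manually rate whether it contributes to answering the query (1) or not (0). Calculate the proportion of contributing chunks, and also the rank-weighted score that RAGAS reports, since the two differ whenever irrelevant chunks rank above relevant ones. Compare your rank-weighted score with your system's actual RAGAS Context Precision. Where they differ, examine the chunking and retrieval logic, and check whether the judge LLM is reading the chunks the way you do.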
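A small helper for the exercise, assuming you record your manual 0/1 labels per query in rank order and paste in the per-sample RAGAS scores yourself. The query IDs, labels, scores, and the 0.2 tolerance below are hypothetical placeholders, not real results. The helper reports the plain proportion alongside the rank-weighted score, because the rank-weighted one is what RAGAS compares against.

```python
def proportion(labels: list[int]) -> float:
    """Share of retrieved chunks labeled as contributing (ignores rank)."""
    return sum(labels) / len(labels) if labels else 0.0


def rank_weighted(labels: list[int]) -> float:
    """Average of precision@k over the ranks k that hold a contributing chunk."""
    hits, total = 0, 0.0
    for k, label in enumerate(labels, start=1):
        hits += label
        total += (hits / k) * label
    return total / hits if hits else 0.0


def flag_disagreements(manual: dict[str, list[int]],
                       ragas_scores: dict[str, float],
                       tolerance: float = 0.2) -> list[str]:
    """Return query IDs whose manual rank-weighted score differs from the
    RAGAS score by more than `tolerance` (an arbitrary threshold)."""
    flagged = []
    for query_id, labels in manual.items():
        gap = abs(rank_weighted(labels) - ragas_scores[query_id])
        if gap > tolerance:
            flagged.append(query_id)
    return flagged


# Hypothetical placeholders: replace with your own labels and RAGAS output.
manual_labels = {"q1": [1, 1, 0], "q2": [0, 1, 0]}
ragas_scores = {"q1": 1.0, "q2": 1.0}

for qid, labels in manual_labels.items():
    print(qid, f"proportion={proportion(labels):.2f}",
          f"rank_weighted={rank_weighted(labels):.2f}")
print("inspect:", flag_disagreements(manual_labels, ragas_scores))
```

A flagged query means your labels and the judge's verdicts disagree at some rank, which is the case worth opening up and reading chunk by chunk.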
