RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
RAG Evaluation and Metrics

03. Mean Reciprocal Rank

Chapter 3 of 18 · 15 min
KEY INSIGHT

MRR penalizes retrieval systems that bury relevant content, so improvements in ranking position become measurable.

Mean Reciprocal Rank (MRR) improves on Hit Rate by accounting for position. Instead of only checking whether any relevant document appears in the results, MRR scores each query by how high the first relevant document is ranked, so systems that put it at the top score best.

The reciprocal rank for a single query is 1 divided by the position of the first relevant document. If the first document is relevant, the reciprocal rank is 1. If relevant documents appear at positions 3 and 5, the reciprocal rank is 1/3. If no relevant document appears, the reciprocal rank is 0.
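In symbols, the definition above can be written as follows. The notation is assumed here rather than taken from the course: Q is the set of evaluation queries and rank_i is the position of the first relevant document for query i.

```latex
\mathrm{RR}_i =
\begin{cases}
  \dfrac{1}{\mathrm{rank}_i} & \text{if a relevant document is retrieved for query } i \\
  0 & \text{otherwise}
\end{cases}
\qquad
\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \mathrm{RR}_i
```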

def mean_reciprocal_rank(results: list[list[str]], relevance: list[set[str]]) -> float:
    """
    Calculate Mean Reciprocal Rank.
    
    Args:
        results: List of ranked doc IDs for each query
        relevance: List of sets containing relevant doc IDs for each query
    Returns:
        Average reciprocal rank across all queries
    """
    reciprocal_ranks = []
    
    for retrieved, relevant in zip(results, relevance):
        rr = 0.0
        for position, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                rr = 1.0 / position
                break
        reciprocal_ranks.append(rr)
    
    # Guard against an empty query list to avoid dividing by zero
    if not reciprocal_ranks:
        return 0.0

    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Example with comparison to Hit Rate
results = [
    ["doc_A", "doc_B", "doc_C"],
    ["doc_D", "doc_E", "doc_F"],
    ["doc_G", "doc_H", "doc_I"],
]

relevance = [
    {"doc_A"},  # Relevant at position 1
    {"doc_F"},  # Relevant at position 3
    {"doc_K"},  # Relevant document is never retrieved
]

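# hit_rate() is not defined in this chapter; it is assumed to be the
# Chapter 2 function, taking the same (results, relevance) arguments.
hr = hit_rate(results, relevance)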
mrr = mean_reciprocal_rank(results, relevance)

print(f"Hit Rate: {hr:.3f}")  # 0.667 (2 hits out of 3)
print(f"MRR: {mrr:.3f}")      # 0.444 ((1 + 1/3 + 0) / 3)

The example shows why MRR matters. Query 2 has a hit (doc_F appears), so it contributes to Hit Rate. But doc_F appears at position 3, so it contributes only 1/3 to MRR. A system that retrieves doc_F at position 3 is genuinely worse than a system retrieving it at position 1.
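A minimal sketch of that comparison, using two hypothetical systems that return the same documents for Query 2 in a different order. The names system_a and system_b, and the single-query helpers reciprocal_rank and is_hit, are illustrative stand-ins written for this example, not functions from the course.

```python
def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """Return 1/position of the first relevant doc, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def is_hit(retrieved: list[str], relevant: set[str]) -> bool:
    """Return True if any retrieved doc is relevant (position is ignored)."""
    return any(doc_id in relevant for doc_id in retrieved)


relevant = {"doc_F"}
system_a = ["doc_D", "doc_E", "doc_F"]  # hypothetical: doc_F ranked third
system_b = ["doc_F", "doc_D", "doc_E"]  # hypothetical: doc_F ranked first

for name, ranking in [("system_a", system_a), ("system_b", system_b)]:
    hit = is_hit(ranking, relevant)
    rr = reciprocal_rank(ranking, relevant)
    print(f"{name}: hit={hit}, reciprocal rank={rr:.3f}")
```

Both systems count as a hit, but only system_b earns the full reciprocal rank of 1.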

MRR is most sensitive to mistakes at the top of the ranking. A relevant document at position 1 contributes 1.0 to the average. A relevant document at position 10 contributes 0.1. Because the contribution falls off as 1/position, slipping from position 1 to position 2 halves a query's contribution, while Hit Rate does not change at all.
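The arithmetic behind this can be checked with a short sketch. It simply tabulates 1/position for positions 1 through 10; the variable names are chosen for this example.

```python
# Reciprocal-rank contribution of a single query when its first relevant
# document sits at each position from 1 to 10.
contributions = {position: 1.0 / position for position in range(1, 11)}

for position, value in contributions.items():
    print(f"position {position:>2}: {value:.3f}")

# Slipping one place costs far more at the top than further down.
drop_top = contributions[1] - contributions[2]    # 1 - 1/2
drop_tail = contributions[9] - contributions[10]  # 1/9 - 1/10
print(f"drop from 1 to 2:  {drop_top:.3f}")
print(f"drop from 9 to 10: {drop_tail:.3f}")
```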

For RAG applications, MRR matters most when the first retrieved document disproportionately influences the generated answer. If the RAG pipeline passes only the top-k documents to the language model, ranking the best document first improves what the model sees and, in turn, the quality of the generated answer.
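When only the top-k documents reach the language model, one common variant is to truncate each ranking at k before scoring, often written MRR@k. The course text does not define this variant; the function name mrr_at_k and the values of k below are assumptions for illustration.

```python
def mrr_at_k(results: list[list[str]], relevance: list[set[str]], k: int) -> float:
    """Mean Reciprocal Rank computed over only the first k results per query."""
    if k < 1:
        raise ValueError("k must be at least 1")

    reciprocal_ranks = []
    for retrieved, relevant in zip(results, relevance):
        rr = 0.0
        # Documents beyond position k are treated as not retrieved.
        for position, doc_id in enumerate(retrieved[:k], start=1):
            if doc_id in relevant:
                rr = 1.0 / position
                break
        reciprocal_ranks.append(rr)

    if not reciprocal_ranks:
        return 0.0
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


results = [
    ["doc_A", "doc_B", "doc_C"],
    ["doc_D", "doc_E", "doc_F"],
]
relevance = [{"doc_A"}, {"doc_F"}]

print(f"MRR@3: {mrr_at_k(results, relevance, k=3):.3f}")
print(f"MRR@2: {mrr_at_k(results, relevance, k=2):.3f}")
```

With k=2, doc_F falls outside the cutoff, so the second query scores 0, the same as if it had not been retrieved at all.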

EXERCISE

Compare Hit Rate and MRR on your retrieval results. Calculate both metrics for your seed queries. If MRR is significantly lower than Hit Rate, your system retrieves relevant information but ranks it poorly, so improving the sort order would have a substantial impact.
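One way to start the exercise is sketched below. It assumes your seed queries are stored as a list of ranked doc ID lists plus a matching list of relevant-ID sets, mirroring the chapter example. The sample data and the function name first_relevant_positions are placeholders to replace with your own.

```python
from typing import Optional


def first_relevant_positions(
    results: list[list[str]], relevance: list[set[str]]
) -> list[Optional[int]]:
    """1-based position of the first relevant doc per query, or None on a miss."""
    positions: list[Optional[int]] = []
    for retrieved, relevant in zip(results, relevance):
        found = None
        for position, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                found = position
                break
        positions.append(found)
    return positions


# Placeholder seed data: swap in your own retrieval output and labels.
results = [
    ["doc_A", "doc_B", "doc_C"],
    ["doc_D", "doc_E", "doc_F"],
    ["doc_G", "doc_H", "doc_I"],
]
relevance = [{"doc_A"}, {"doc_F"}, {"doc_K"}]

positions = first_relevant_positions(results, relevance)
n = len(positions)
hit_rate_value = sum(p is not None for p in positions) / n
mrr_value = sum(1.0 / p for p in positions if p is not None) / n

print(f"Hit Rate: {hit_rate_value:.3f}")
print(f"MRR:      {mrr_value:.3f}")
print(f"Gap:      {hit_rate_value - mrr_value:.3f}")

# Queries that hit but rank the relevant document below position 1.
poorly_ranked = [i for i, p in enumerate(positions) if p is not None and p > 1]
print(f"Hit but not ranked first (query indexes): {poorly_ranked}")
```

The list of poorly ranked queries is a reasonable place to begin inspecting why the relevant document was not sorted to the top.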
