03. Mean Reciprocal Rank
Mean Reciprocal Rank (MRR) improves on Hit Rate by accounting for position. Instead of checking only whether any relevant document appears in the results, MRR rewards systems that rank a relevant document at or near the top.
The reciprocal rank for a single query is 1 divided by the position of the first relevant document. If the first document is relevant, the reciprocal rank is 1. If relevant documents appear at positions 3 and 5, the reciprocal rank is 1/3. If no relevant document appears, the reciprocal rank is 0.
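Written as a formula, the definition looks like this. The symbols are introduced here for illustration only: $Q$ is the set of queries, and $\mathrm{rank}_i$ is the position of the first relevant document for query $i$.

```latex
\mathrm{RR}_i =
\begin{cases}
\dfrac{1}{\mathrm{rank}_i} & \text{if a relevant document is retrieved} \\
0 & \text{otherwise}
\end{cases}
\qquad
\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \mathrm{RR}_i
```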
def mean_reciprocal_rank(results: list[list[str]], relevance: list[set[str]]) -> float:
    """
    Calculate Mean Reciprocal Rank.

    Args:
        results: List of ranked doc IDs for each query
        relevance: List of sets containing relevant doc IDs for each query

    Returns:
        Average reciprocal rank across all queries (0.0 if there are no queries)
    """
    reciprocal_ranks = []
    for retrieved, relevant in zip(results, relevance):
        rr = 0.0  # Stays 0.0 if no relevant document is retrieved
        for position, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                rr = 1.0 / position
                break  # Only the first relevant document counts
        reciprocal_ranks.append(rr)

    if not reciprocal_ranks:
        return 0.0  # Avoid dividing by zero when there are no queries
    return sum(reciprocal_ranks) / len(reciprocal_ranks)
# Example with comparison to Hit Rate
# (hit_rate is assumed to be the function from the Hit Rate section)
results = [
    ["doc_A", "doc_B", "doc_C"],
    ["doc_D", "doc_E", "doc_F"],
    ["doc_G", "doc_H", "doc_I"],
]
relevance = [
    {"doc_A"},  # Relevant at position 1
    {"doc_F"},  # Relevant at position 3
    {"doc_K"},  # No relevant documents
]
hr = hit_rate(results, relevance)
mrr = mean_reciprocal_rank(results, relevance)
print(f"Hit Rate: {hr:.3f}")  # 0.667 (2 hits out of 3)
print(f"MRR: {mrr:.3f}")  # 0.444 ((1 + 1/3 + 0) / 3)
The example shows why MRR matters. Query 2 has a hit (doc_F appears), so it contributes to Hit Rate. But doc_F appears at position 3, so it contributes only 1/3 to MRR. A system that retrieves doc_F at position 3 is genuinely worse than a system retrieving it at position 1.
MRR is most sensitive to changes near the top of the ranking. A relevant document at position 1 contributes 1.0 to the average; at position 2 it contributes 0.5, and at position 10 only 0.1. Hit Rate treats all of these as the same hit, so ranking problems that leave Hit Rate unchanged can still cut MRR sharply.
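To make the position sensitivity concrete, this minimal sketch prints what a single query would contribute when its first relevant document sits at each position from 1 to 10. The helper name `reciprocal_rank_at` is an illustrative assumption, not part of the lesson's code.

```python
def reciprocal_rank_at(position: int) -> float:
    """Reciprocal rank for a query whose first relevant document is at `position` (1-based)."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    return 1.0 / position


# Hit Rate counts every one of these positions as a full hit (1.0),
# while the reciprocal rank shrinks as the relevant document moves down.
contributions = {pos: reciprocal_rank_at(pos) for pos in range(1, 11)}
for pos, rr in contributions.items():
    print(f"position {pos:2d}: hit = 1.0, reciprocal rank = {rr:.3f}")
```

The largest single drop is between positions 1 and 2 (1.0 to 0.5). Further down the list, adjacent positions differ only slightly (1/9 versus 1/10).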
For RAG applications, MRR matters most when the first retrieved document disproportionately influences the generated answer. If the RAG pipeline passes only the top-k documents to the language model, getting the best document first directly benefits downstream generation quality.
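When only the top k documents reach the language model, one common variant is to compute MRR over the truncated list, so that a relevant document below the cutoff counts as a miss. The sketch below assumes that setup. The function name `mrr_at_k` and the parameter `k` are illustrative and do not appear in the lesson's code.

```python
def mrr_at_k(results: list[list[str]], relevance: list[set[str]], k: int) -> float:
    """Mean Reciprocal Rank computed over only the top-k retrieved documents."""
    if k < 1:
        raise ValueError("k must be 1 or greater")

    reciprocal_ranks = []
    for retrieved, relevant in zip(results, relevance):
        rr = 0.0
        # Documents beyond position k never reach the model, so ignore them.
        for position, doc_id in enumerate(retrieved[:k], start=1):
            if doc_id in relevant:
                rr = 1.0 / position
                break
        reciprocal_ranks.append(rr)

    if not reciprocal_ranks:
        return 0.0
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Illustrative data: doc_F is the only relevant document for the second query.
example_results = [
    ["doc_A", "doc_B", "doc_C"],
    ["doc_D", "doc_E", "doc_F"],
]
example_relevance = [{"doc_A"}, {"doc_F"}]

print(mrr_at_k(example_results, example_relevance, k=3))  # (1 + 1/3) / 2
print(mrr_at_k(example_results, example_relevance, k=2))  # doc_F is cut off: (1 + 0) / 2
```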
Compare Hit Rate and MRR on your retrieval results by calculating both metrics for your seed queries. If MRR is significantly lower than Hit Rate, your system retrieves relevant information but ranks it poorly, so improving the ranking would have a substantial impact.
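This comparison can be sketched as follows. To stay self-contained, the sketch computes both metrics itself, assuming Hit Rate is the fraction of queries with at least one relevant document in the results. The seed data, the names `first_relevant_position` and `compare_metrics`, and the `gap_threshold` value of 0.2 are all hypothetical; pick a threshold that suits your own data.

```python
from typing import Optional


def first_relevant_position(retrieved: list[str], relevant: set[str]) -> Optional[int]:
    """Return the 1-based position of the first relevant document, or None if there is none."""
    for position, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return position
    return None


def compare_metrics(results: list[list[str]], relevance: list[set[str]]) -> tuple[float, float]:
    """Return (hit_rate, mrr) computed from the same per-query positions."""
    positions = [first_relevant_position(r, rel) for r, rel in zip(results, relevance)]
    if not positions:
        return 0.0, 0.0
    hit_rate_value = sum(p is not None for p in positions) / len(positions)
    mrr_value = sum(1.0 / p for p in positions if p is not None) / len(positions)
    return hit_rate_value, mrr_value


# Hypothetical seed queries: every query has a hit, but never at position 1.
seed_results = [
    ["doc_B", "doc_A", "doc_C"],
    ["doc_D", "doc_E", "doc_F"],
    ["doc_H", "doc_G", "doc_I"],
]
seed_relevance = [{"doc_A"}, {"doc_F"}, {"doc_G"}]

seed_hr, seed_mrr = compare_metrics(seed_results, seed_relevance)
gap_threshold = 0.2  # assumed cutoff for "significantly lower"

print(f"Hit Rate: {seed_hr:.3f}, MRR: {seed_mrr:.3f}, gap: {seed_hr - seed_mrr:.3f}")
if seed_hr - seed_mrr > gap_threshold:
    print("Relevant documents are retrieved but ranked low; ranking is the place to look.")
```

A large gap points at ranking rather than recall. If both metrics are low, the relevant documents are not being retrieved at all, and reordering alone will not help.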