09. Reciprocal Rank Fusion
Reciprocal Rank Fusion (RRF) combines rankings from multiple retrieval methods into a single unified ranking. Unlike score-level combination (which requires comparable score distributions), RRF uses only rank positions, making it resilient to heterogeneous retrieval methods.
The RRF Formula
Given retrieval results from multiple rankers, RRF computes:
RRF_score(doc d) = Σ 1 / (k + rank_r(d))
Where:
- k is a constant (typically 60)
- rank_r(d) is the rank of document d in retrieval result r
- The sum is across all result sets being fused
Higher k values make the fusion more conservative, reducing the impact of rank differences.
Implementing RRF
from collections import defaultdict
def reciprocal_rank_fusion(results_list, k=60, top_k=20):
"""
Combine multiple ranked lists using RRF.
Args:
results_list: List of retrieval result lists, each sorted by relevance
Each list contains dicts with 'document' and rank inferred from position
k: RRF constant, higher = more conservative fusion
top_k: Number of results to return
Returns:
Fused results sorted by RRF score
"""
rrf_scores = defaultdict(float)
doc_metadata = {}
for results in results_list:
for rank, doc in enumerate(results):
doc_id = doc.get('document_id', doc['document'])
rrf_scores[doc_id] += 1 / (k + rank + 1) # rank is 0-indexed
doc_metadata[doc_id] = doc.get('metadata', {})
# Sort by RRF score
sorted_docs = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
return [
{
'document': doc_id,
score: score,
'metadata': doc_metadata[doc_id]
}
for doc_id, score in sorted_docs[:top_k]
]
RRF in Hybrid Search Context
RRF naturally fits hybrid search where dense and sparse retrieval produce non-comparable scores:
def hybrid_search_with_rrf(query, vectorstore, bm25_index, documents, k=20):
"""
Hybrid search using RRF to combine dense and sparse rankings.
"""
# Dense retrieval
dense_results = vectorstore.similarity_search(query, k=50)
dense_ranked = [
{'document': doc.page_content, 'metadata': doc.metadata}
for doc in dense_results
]
# Sparse retrieval
tokenized_query = query.split()
bm25_scores = bm25_index.get_scores(tokenized_query)
top_sparse_indices = np.argsort(bm25_scores)[::-1][:50]
sparse_ranked = [
{'document': documents[i]['content'], 'metadata': documents[i].get('metadata', {})}
for i in top_sparse_indices
]
# Fuse with RRF
fused_results = reciprocal_rank_fusion([dense_ranked, sparse_ranked], k=60, top_k=k)
return fused_results
RRF vs. Score Combination
Score-level combination (e.g., alpha * dense_scores + sparse_scores) assumes scores are comparable. They rarely are:
- BM25 scores are unbounded, can be any positive number
- Cosine similarity scores are bounded [-1, 1] but often clustered near specific values
- Cross-encoder scores depend on model calibration
RRF avoids this problem entirely by using only rank positions. A document ranked #1 gets the same RRF contribution regardless of how its underlying score compares to others.
| Method | Requires Score Normalization | Sensitive to Score Outliers | Handles Missing Results |
|---|---|---|---|
| Score Addition | Yes | Yes | No |
| Score Multiplication | Yes | Yes | No |
| RRF | No | No | Yes |
Tuning the k Parameter
The k constant in RRF controls how much bonus a top-ranked document receives relative to a lower-ranked one:
k = 60 (default): Balanced. Documents at rank 1 get 1/61 ≈ 0.016 contribution. Documents at rank 10 get 1/70 ≈ 0.014. The difference is small.
k = 10: Aggressive. Rank 1 gets 1/11 ≈ 0.091. Rank 10 gets 1/20 = 0.050. Top ranks matter much more.
k = 100: Conservative. Rank 1 gets 1/101 ≈ 0.010. Rank 10 gets 1/110 ≈ 0.009. Many documents get similar contributions.
def tune_rrf_k(results_list, relevant_doc_ids, k_values=[10, 20, 40, 60, 80, 100]):
"""Find optimal k for RRF on your evaluation set."""
metrics = {}
for k in k_values:
fused = reciprocal_rank_fusion(results_list, k=k, top_k=50)
fused_ids = [doc['document'] for doc in fused]
recall = len(set(fused_ids) & set(relevant_doc_ids)) / len(relevant_doc_ids)
# Mean Reciprocal Rank
mrr = 0
for i, doc_id in enumerate(fused_ids):
if doc_id in relevant_doc_ids:
mrr = 1 / (i + 1)
break
metrics[k] = {'recall': recall, 'mrr': mrr}
return metrics
Implement hybrid search combining dense semantic search and BM25 using RRF. Evaluate recall at k=10, 20, 50 with different alpha values and different RRF k values. Report which combination performs best.