09. Reciprocal Rank Fusion

Chapter 9 of 22 · 25 min

Reciprocal Rank Fusion (RRF) combines rankings from multiple retrieval methods into a single unified ranking. Unlike score-level combination (which requires comparable score distributions), RRF uses only rank positions, making it resilient to heterogeneous retrieval methods.

The RRF Formula

Given retrieval results from multiple rankers, RRF computes:

RRF_score(doc d) = Σ 1 / (k + rank_r(d))

Where:

  • k is a constant (typically 60)
  • rank_r(d) is the rank of document d in retrieval result r
  • The sum is across all result sets being fused

Higher k values make the fusion more conservative, reducing the impact of rank differences.

Implementing RRF

from collections import defaultdict

def reciprocal_rank_fusion(results_list, k=60, top_k=20):
    """
    Combine multiple ranked lists using RRF.
    
    Args:
        results_list: List of retrieval result lists, each sorted by relevance
                      Each list contains dicts with 'document' and rank inferred from position
        k: RRF constant, higher = more conservative fusion
        top_k: Number of results to return
    
    Returns:
        Fused results sorted by RRF score
    """
    rrf_scores = defaultdict(float)
    doc_metadata = {}
    
    for results in results_list:
        for rank, doc in enumerate(results):
            doc_id = doc.get('document_id', doc['document'])
            rrf_scores[doc_id] += 1 / (k + rank + 1)  # rank is 0-indexed
            doc_metadata[doc_id] = doc.get('metadata', {})
    
    # Sort by RRF score
    sorted_docs = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
    
    return [
        {
            'document': doc_id,
            score: score,
            'metadata': doc_metadata[doc_id]
        }
        for doc_id, score in sorted_docs[:top_k]
    ]

RRF in Hybrid Search Context

RRF naturally fits hybrid search where dense and sparse retrieval produce non-comparable scores:

def hybrid_search_with_rrf(query, vectorstore, bm25_index, documents, k=20):
    """
    Hybrid search using RRF to combine dense and sparse rankings.
    """
    # Dense retrieval
    dense_results = vectorstore.similarity_search(query, k=50)
    dense_ranked = [
        {'document': doc.page_content, 'metadata': doc.metadata}
        for doc in dense_results
    ]
    
    # Sparse retrieval
    tokenized_query = query.split()
    bm25_scores = bm25_index.get_scores(tokenized_query)
    top_sparse_indices = np.argsort(bm25_scores)[::-1][:50]
    sparse_ranked = [
        {'document': documents[i]['content'], 'metadata': documents[i].get('metadata', {})}
        for i in top_sparse_indices
    ]
    
    # Fuse with RRF
    fused_results = reciprocal_rank_fusion([dense_ranked, sparse_ranked], k=60, top_k=k)
    
    return fused_results

RRF vs. Score Combination

Score-level combination (e.g., alpha * dense_scores + sparse_scores) assumes scores are comparable. They rarely are:

  • BM25 scores are unbounded, can be any positive number
  • Cosine similarity scores are bounded [-1, 1] but often clustered near specific values
  • Cross-encoder scores depend on model calibration

RRF avoids this problem entirely by using only rank positions. A document ranked #1 gets the same RRF contribution regardless of how its underlying score compares to others.

Method Requires Score Normalization Sensitive to Score Outliers Handles Missing Results
Score Addition Yes Yes No
Score Multiplication Yes Yes No
RRF No No Yes

Tuning the k Parameter

The k constant in RRF controls how much bonus a top-ranked document receives relative to a lower-ranked one:

  • k = 60 (default): Balanced. Documents at rank 1 get 1/61 ≈ 0.016 contribution. Documents at rank 10 get 1/70 ≈ 0.014. The difference is small.

  • k = 10: Aggressive. Rank 1 gets 1/11 ≈ 0.091. Rank 10 gets 1/20 = 0.050. Top ranks matter much more.

  • k = 100: Conservative. Rank 1 gets 1/101 ≈ 0.010. Rank 10 gets 1/110 ≈ 0.009. Many documents get similar contributions.

def tune_rrf_k(results_list, relevant_doc_ids, k_values=[10, 20, 40, 60, 80, 100]):
    """Find optimal k for RRF on your evaluation set."""
    metrics = {}
    
    for k in k_values:
        fused = reciprocal_rank_fusion(results_list, k=k, top_k=50)
        fused_ids = [doc['document'] for doc in fused]
        
        recall = len(set(fused_ids) & set(relevant_doc_ids)) / len(relevant_doc_ids)
        
        # Mean Reciprocal Rank
        mrr = 0
        for i, doc_id in enumerate(fused_ids):
            if doc_id in relevant_doc_ids:
                mrr = 1 / (i + 1)
                break
        
        metrics[k] = {'recall': recall, 'mrr': mrr}
    
    return metrics
EXERCISE

Implement hybrid search combining dense semantic search and BM25 using RRF. Evaluate recall at k=10, 20, 50 with different alpha values and different RRF k values. Report which combination performs best.