Hybrid Search with RRF — Advanced RAG — Chunking, Retrieval, Re-ranking (Chapter 7)

Reciprocal Rank Fusion (RRF) combines multiple retrieval strategies by combining their result lists rather than their scores directly.

Why score normalization fails: Different retrieval methods produce scores on incompatible scales. Dense retrieval might output values between 0.5 and 0.9, while BM25 outputs values between 0 and 50. Naive score averaging produces unpredictable weighting.

RRF formula: score(d) = Σ(1 / (k + rank_i(d))) where k is a constant (typically 60) and rank_i is the position of document d in list i.

def reciprocal_rank_fusion(results_by_strategy: List[List[dict]], k: int = 60) -> List[dict]:
    """
    Fuse multiple retrieval result lists using RRF.
    
    Args:
        results_by_strategy: List of ranked result lists from different strategies
        k: Constant controlling contribution of lower-ranked results
    
    Returns:
        Fused results sorted by RRF score
    """
    doc_scores = defaultdict(float)
    doc_metadata = {}
    
    for strategy_results in results_by_strategy:
        for rank, result in enumerate(strategy_results, start=1):
            doc_id = result.get('doc_id', result.get('index', rank))
            # RRF contribution
            doc_scores[doc_id] += 1 / (k + rank)
            # Preserve metadata from first occurrence
            if doc_id not in doc_metadata:
                doc_metadata[doc_id] = result
    
    # Sort by fused score
    sorted_docs = sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)
    
    fused_results = []
    for doc_id, score in sorted_docs:
        result = doc_metadata[doc_id].copy()
        result['rrf_score'] = score
        result['doc_id'] = doc_id
        fused_results.append(result)
    
    return fused_results

# Usage example
dense_results = dense_retriever.search(query, top_k=100)
sparse_results = bm25_retriever.search(query, top_k=100)

fused = reciprocal_rank_fusion([dense_results, sparse_results])

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.