07. Hybrid Search with RRF
Reciprocal Rank Fusion (RRF) combines multiple retrieval strategies by combining their result lists rather than their scores directly.
Why score normalization fails: Different retrieval methods produce scores on incompatible scales. Dense retrieval might output values between 0.5 and 0.9, while BM25 outputs values between 0 and 50. Naive score averaging produces unpredictable weighting.
RRF formula: score(d) = Σ(1 / (k + rank_i(d))) where k is a constant (typically 60) and rank_i is the position of document d in list i.
def reciprocal_rank_fusion(results_by_strategy: List[List[dict]], k: int = 60) -> List[dict]:
"""
Fuse multiple retrieval result lists using RRF.
Args:
results_by_strategy: List of ranked result lists from different strategies
k: Constant controlling contribution of lower-ranked results
Returns:
Fused results sorted by RRF score
"""
doc_scores = defaultdict(float)
doc_metadata = {}
for strategy_results in results_by_strategy:
for rank, result in enumerate(strategy_results, start=1):
doc_id = result.get('doc_id', result.get('index', rank))
# RRF contribution
doc_scores[doc_id] += 1 / (k + rank)
# Preserve metadata from first occurrence
if doc_id not in doc_metadata:
doc_metadata[doc_id] = result
# Sort by fused score
sorted_docs = sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)
fused_results = []
for doc_id, score in sorted_docs:
result = doc_metadata[doc_id].copy()
result['rrf_score'] = score
result['doc_id'] = doc_id
fused_results.append(result)
return fused_results
# Usage example
dense_results = dense_retriever.search(query, top_k=100)
sparse_results = bm25_retriever.search(query, top_k=100)
fused = reciprocal_rank_fusion([dense_results, sparse_results])
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Compare RRF fusion against score-level normalization (min-max scaling before averaging) on a set of 100 queries. Measure precision@10 difference.