15. Document Re-ranking
Initial retrieval uses fast embedding similarity. Re-ranking uses a more expensive model to score retrieved documents for actual relevance to the query.
Cross-Encoder Reranking
Cross-encoders jointly encode the query and document, producing a single relevance score. They're slower than bi-encoder similarity but more accurate.
import numpy as np
from sentence_transformers import CrossEncoder


class Reranker:
    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rerank(self, query: str, documents: list[str], top_k: int = 10) -> list[tuple[str, float]]:
        """Re-rank documents by cross-encoder score."""
        if not documents:
            return []
        # Model expects query-document pairs
        pairs = [(query, doc) for doc in documents]
        # Get relevance scores (one score per pair)
        scores = self.model.predict(pairs)
        # Sort by score descending and keep the top_k indices
        ranked_indices = np.argsort(scores)[::-1][:top_k]
        return [(documents[i], float(scores[i])) for i in ranked_indices]
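The ranking step can be tried without downloading a model. The following is a minimal sketch, assuming any callable that maps a list of (query, document) pairs to scores; a cross-encoder's predict method would fit that shape. The names rerank_with_scorer and toy_overlap_scores are hypothetical, and the toy scorer is only a stand-in, not a real relevance model.

```python
from typing import Callable, Sequence


def rerank_with_scorer(
    query: str,
    documents: Sequence[str],
    score_fn: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Score (query, document) pairs with score_fn and return the top_k."""
    if not documents:
        return []
    pairs = [(query, doc) for doc in documents]
    scores = [float(s) for s in score_fn(pairs)]
    # Sort indices by score, highest first. sorted() is stable, so documents
    # with equal scores keep their original retrieval order.
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [(documents[i], scores[i]) for i in order[:top_k]]


def toy_overlap_scores(pairs):
    """Hypothetical stand-in scorer: fraction of query words found in the doc."""
    scores = []
    for query, doc in pairs:
        q_words = set(query.lower().split())
        d_words = set(doc.lower().split())
        scores.append(len(q_words & d_words) / len(q_words) if q_words else 0.0)
    return scores


docs = [
    "bananas are yellow",
    "how to train a cross encoder",
    "cross encoder reranking improves search quality",
]
top = rerank_with_scorer("cross encoder reranking", docs, toy_overlap_scores, top_k=2)
# top[0] is the third document, which contains all three query words
```

Passing the scorer in as an argument keeps the sorting and top_k logic testable on its own, and lets the toy scorer be swapped for a real model later.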
Reciprocal Rank Fusion
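When combining results from multiple retrieval methods (for example BM25 and embedding search), reciprocal rank fusion (RRF) merges their rankings using only rank positions. Each document receives 1 / (k + rank) from every list it appears in, and these contributions are summed. Because raw scores are never compared, the retrievers do not need comparable score scales.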
def reciprocal_rank_fusion(ranking_lists: list[list], k: int = 60) -> list[tuple]:
    """Combine rankings using reciprocal rank fusion."""
    # Score each document across all rankings
    doc_scores = {}
    for ranking in ranking_lists:
        # Ranks are 1-based in the RRF formula, so the top result adds 1 / (k + 1)
        for rank, doc_id in enumerate(ranking, start=1):
            # RRF formula: 1 / (k + rank)
            doc_scores[doc_id] = doc_scores.get(doc_id, 0.0) + 1 / (k + rank)
    # Sort by fused score, highest first
    return sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)
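A small worked example may make the fusion concrete. This sketch is self-contained and assumes 1-based ranks, as in the usual statement of the formula; the function name rrf_scores and the document IDs are made up for illustration.

```python
from collections import defaultdict


def rrf_scores(ranking_lists, k=60):
    """Sum 1 / (k + rank) over every list a document appears in (rank starts at 1)."""
    scores = defaultdict(float)
    for ranking in ranking_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical rankings of the same corpus, best result first
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = rrf_scores([bm25_ranking, dense_ranking])
# d1: 1/61 + 1/62, d3: 1/63 + 1/61, d2: 1/62, d4: 1/63
# fused order: d1, d3, d2, d4
```

Documents that appear in both lists (d1 and d3) outrank those that appear in only one, even though d2 was ranked second by one retriever.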
Learning-to-Rank with LambdaMART
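For production systems with labeled data, train a custom ranker. The example below uses scikit-learn's GradientBoostingRegressor as a simple pointwise stand-in: it regresses on relevance labels rather than optimizing the LambdaMART ranking objective. For true LambdaMART, a library such as LightGBM provides LGBMRanker with the lambdarank objective.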
from sklearn.ensemble import GradientBoostingRegressor

# NOTE: bm25_score, embedding_similarity, term_overlap_ratio and
# first_occurrence_position are feature helpers assumed to be defined elsewhere.


def train_ltr_ranker(training_data: list):
    """Train a simple pointwise LTR model using LambdaMART-style features.

    training_data is a list of (query, doc, label) triples.
    """
    # Features: BM25 score, embedding similarity, term overlap, position
    X = []
    y = []
    for query, doc, label in training_data:
        features = [
            bm25_score(query, doc),
            embedding_similarity(query, doc),
            term_overlap_ratio(query, doc),
            first_occurrence_position(doc),
        ]
        X.append(features)
        y.append(label)
    model = GradientBoostingRegressor(n_estimators=100)
    model.fit(X, y)
    return model
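The feature helpers used above are not defined in the text. As a rough sketch, the two lexical ones might look like the following, assuming whitespace tokenization and lowercase matching. Since first_occurrence_position above takes only the document, its exact definition is not recoverable here; the query-aware variant below is a guess and uses a different, hypothetical name (first_match_position).

```python
def _tokens(text: str) -> list[str]:
    """Lowercase whitespace tokenization (an assumption; real systems vary)."""
    return text.lower().split()


def term_overlap_ratio(query: str, doc: str) -> float:
    """Fraction of distinct query terms that also appear in the document."""
    q_terms = set(_tokens(query))
    if not q_terms:
        return 0.0
    return len(q_terms & set(_tokens(doc))) / len(q_terms)


def first_match_position(query: str, doc: str) -> float:
    """Relative position (0.0 = start) of the first query term in the document.

    Returns 1.0 when no query term occurs, so smaller always means earlier.
    """
    q_terms = set(_tokens(query))
    doc_tokens = _tokens(doc)
    for index, token in enumerate(doc_tokens):
        if token in q_terms:
            return index / len(doc_tokens)
    return 1.0


overlap = term_overlap_ratio("cross encoder", "a cross word puzzle")
position = first_match_position("cross encoder", "a cross word puzzle")
# overlap is 0.5 (one of two query terms matches); position is 0.25 (token 1 of 4)
```

Normalizing the position by document length keeps the feature on a 0 to 1 scale, which makes it comparable across short and long documents.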
Handling Ties and Edge Cases
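Reranking can produce ties when documents receive identical or near-identical scores. Break ties in favor of the document with the higher initial retrieval score. Very long documents should be truncated, or split into passages, to a maximum length before reranking: cross-encoders have a fixed input limit (commonly 512 tokens for BERT-style models), so text beyond it is ignored and the score ends up depending on where the relevant content sits in the document (position bias).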
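One way to sketch both rules is shown here. It assumes that "scoring similarly" means equal after rounding to a fixed number of decimals, and that a word count is an acceptable rough proxy for the model's token limit. Both are simplifications, and the names truncate_words and order_with_tiebreak are hypothetical.

```python
def truncate_words(text: str, max_words: int = 256) -> str:
    """Keep only the first max_words whitespace-separated words."""
    return " ".join(text.split()[:max_words])


def order_with_tiebreak(doc_ids, rerank_scores, initial_scores, decimals=3):
    """Sort by rounded rerank score, then by initial retrieval score, both descending."""
    rows = zip(doc_ids, rerank_scores, initial_scores)
    return sorted(
        rows,
        key=lambda row: (round(row[1], decimals), row[2]),
        reverse=True,
    )


ordered = order_with_tiebreak(
    doc_ids=["a", "b", "c"],
    rerank_scores=[0.9012, 0.9014, 0.5],   # "a" and "b" tie at 0.901 after rounding
    initial_scores=[0.70, 0.80, 0.99],     # "b" had the better initial score
)
# ordered doc IDs: b, a, c
```

Rounding is a coarse way to define a tie: two scores that straddle a rounding boundary are treated as different even when they are close, so an explicit tolerance may suit some applications better.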
Implement a two-stage retriever that uses embeddings for initial retrieval and a cross-encoder for re-ranking. Compare precision@10 with and without re-ranking on a test set of 50 queries.
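For the comparison, precision@k can be computed with a small helper. This is a minimal sketch, assuming each query comes with a ranked list of document IDs and a set of IDs judged relevant; the function names and the sample IDs are illustrative only.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top k results that are relevant (divides by k, not by results returned)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k=10):
    """Average precision@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    values = [precision_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(values) / len(values) if values else 0.0


# Two made-up queries, evaluated at k=4
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d4", "d9"}),  # 2 hits -> 0.5
    (["d5", "d6", "d7", "d8"], {"d7"}),              # 1 hit  -> 0.25
]
score = mean_precision_at_k(runs, k=4)
# score is 0.375
```

Running the same helper on the embedding-only ranking and on the re-ranked list for each of the 50 queries gives the two numbers to compare.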