Document Re-ranking — RAG Systems: Part 2 (Chapter 15)

Initial retrieval uses fast embedding similarity. Re-ranking uses a more expensive model to score retrieved documents for actual relevance to the query.

Cross-Encoder Reranking

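Cross-encoders jointly encode the query and document, producing a single relevance score. Because every query-document pair needs its own forward pass and nothing can be precomputed per document, they are slower than bi-encoder similarity. They are usually more accurate, though, which is why they are applied only to the top candidates from initial retrieval.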

import numpy as np
from sentence_transformers import CrossEncoder

class Reranker:
    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)
    
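    def rerank(self, query: str, documents: list[str], top_k: int = 10) -> list[tuple[str, float]]:
        """Re-rank documents by cross-encoder score, highest first."""
        # Nothing to score: return early instead of calling the model
        if not documents:
            return []

        # Model expects query-document pairs
        pairs = [(query, doc) for doc in documents]

        # Get relevance scores (one score per pair)
        scores = self.model.predict(pairs)

        # Sort by score descending and keep the top_k indices
        ranked_indices = np.argsort(scores)[::-1][:top_k]

        # Convert NumPy scalars to plain floats for easier downstream use
        return [(documents[i], float(scores[i])) for i in ranked_indices]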
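
The two-stage flow described above can be sketched without downloading a model. This is a minimal illustration, not the cross-encoder itself: `toy_overlap_score` is a hypothetical stand-in scorer (fraction of query terms found in the document), and `retrieve_then_rerank` is an assumed helper name. In a real pipeline the scoring function would be the cross-encoder's prediction for each pair.

```python
from typing import Callable


def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query terms present in doc."""
    query_terms = set(query.lower().split())
    doc_terms = set(doc.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def retrieve_then_rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float] = toy_overlap_score,
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every candidate against the query and keep the top_k by score."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Candidates as they might come back from a fast first-stage retriever
candidates = [
    "Bananas are rich in potassium.",
    "Cross-encoders score a query and a document together.",
    "How a cross-encoder can rerank retrieved passages for a query.",
]
top = retrieve_then_rerank("cross-encoder rerank query", candidates, top_k=2)
```

Keeping the scorer as a parameter makes it easy to swap the toy function for a real model later without touching the sorting and top-k logic.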

Reciprocal Rank Fusion

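When combining results from multiple retrieval methods (for example, BM25 and embedding search), reciprocal rank fusion (RRF) merges their rankings without requiring the underlying scores to be comparable. Each document receives the sum of 1 / (k + rank) over every ranking in which it appears, where rank is its 1-based position and the constant k dampens the advantage of the very top positions.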

def reciprocal_rank_fusion(ranking_lists: list[list], 
                           k: int = 60) -> list:
    """Combine rankings using reciprocal rank fusion."""
    
    # Score each document across all rankings
    doc_scores = {}
    
    for ranking in ranking_lists:
        # Ranks are 1-based in the RRF formula, so start counting at 1
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id not in doc_scores:
                doc_scores[doc_id] = 0.0
            # RRF formula: 1 / (k + rank)
            doc_scores[doc_id] += 1 / (k + rank)
    
    # Sort by fused score
    sorted_docs = sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)
    
    return sorted_docs
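
A small worked example may help make the fusion concrete. The sketch below restates the formula in a compact helper (`rrf_scores` is an illustrative name, and ranks are counted from 1 here) and fuses two made-up rankings. The document IDs and the two input lists are assumptions chosen for illustration.

```python
def rrf_scores(ranking_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse rankings by summing 1 / (k + rank), with rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in ranking_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical outputs of a keyword retriever and a dense retriever
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = rrf_scores([bm25_ranking, dense_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
# d1: 1/61 + 1/62, d3: 1/63 + 1/61, d2: 1/62, d4: 1/63
# fused_ids == ["d1", "d3", "d2", "d4"]
```

Documents that appear in both lists (d1, d3) rise above those that appear in only one, even though neither retriever's raw scores were used.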

Learning-to-Rank with LambdaMART

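For production systems with labeled relevance data, train a custom ranker on hand-built features. The example below is a pointwise simplification: it regresses the relevance label of each query-document pair independently with gradient-boosted trees. LambdaMART proper instead optimizes a ranking objective over the documents of each query, and is available, for instance, through LightGBM's LGBMRanker with the lambdarank objective.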

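from sklearn.ensemble import GradientBoostingRegressor

# NOTE: bm25_score, embedding_similarity, term_overlap_ratio, and
# first_occurrence_position are feature helpers assumed to be defined elsewhere.

def train_ltr_ranker(training_data: list) -> GradientBoostingRegressor:
    """Train a simple pointwise LTR model on LambdaMART-style features."""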
    
    # Features: BM25 score, embedding similarity, term overlap, position
    X = []
    y = []
    
    for query, doc, label in training_data:
        features = [
            bm25_score(query, doc),
            embedding_similarity(query, doc),
            term_overlap_ratio(query, doc),
            first_occurrence_position(doc)
        ]
        X.append(features)
        y.append(label)
    
    model = GradientBoostingRegressor(n_estimators=100)
    model.fit(X, y)
    
    return model
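
Once a model is trained, ranking at query time amounts to predicting a score for each candidate's feature vector and sorting. The sketch below shows that step under stated assumptions: `StubModel` is a hypothetical stand-in that only mimics the `predict(X)` shape of a fitted regressor, its weights are invented, and the feature rows follow the four-feature order used above (BM25 score, embedding similarity, term overlap, first-occurrence position).

```python
class StubModel:
    """Hypothetical stand-in exposing predict(X) like a fitted regressor."""

    def __init__(self, weights: list[float]):
        self.weights = weights

    def predict(self, X: list[list[float]]) -> list[float]:
        # Plain weighted sum per row; a real model would be learned from data
        return [sum(w * x for w, x in zip(self.weights, row)) for row in X]


def rank_candidates(model, docs: list[str], feature_rows: list[list[float]]) -> list[tuple[str, float]]:
    """Score each candidate's features and return (doc, score) pairs, best first."""
    scores = model.predict(feature_rows)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(docs[i], float(scores[i])) for i in order]


docs = ["doc_a", "doc_b", "doc_c"]
feature_rows = [
    [2.0, 0.30, 0.5, 10.0],  # bm25, embedding sim, overlap, first position
    [5.0, 0.80, 0.9, 0.0],
    [1.0, 0.60, 0.2, 3.0],
]
model = StubModel(weights=[0.1, 1.0, 0.5, -0.01])
ranked = rank_candidates(model, docs, feature_rows)
ranked_ids = [doc for doc, _ in ranked]
```

Because `rank_candidates` only relies on `predict`, the stub could be replaced by the regressor returned from the training function, provided the features are built in the same order as at training time.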

Handling Ties and Edge Cases

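Reranking can produce ties when documents receive identical or near-identical scores. Break ties by preferring the document with the higher initial retrieval score. For very long documents, truncate or chunk the text to a maximum length before reranking: cross-encoders have a fixed maximum input length, so text beyond it is cut off anyway, and truncating explicitly keeps that behavior under your control.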
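
Both rules can be sketched in a few lines. The helper names, the word-based truncation (a rough proxy for the model's token limit), and the choice to treat scores that agree to four decimals as tied are all assumptions made for illustration.

```python
def truncate_words(text: str, max_words: int = 256) -> str:
    """Keep only the first max_words whitespace-separated words (rough proxy for a token limit)."""
    words = text.split()
    return " ".join(words[:max_words])


def sort_with_tiebreak(
    results: list[tuple[str, float, float]], decimals: int = 4
) -> list[tuple[str, float, float]]:
    """Sort (doc, rerank_score, retrieval_score) by rerank score, then retrieval score.

    Rerank scores equal after rounding to `decimals` places count as ties.
    """
    return sorted(results, key=lambda r: (-round(r[1], decimals), -r[2]))


results = [
    ("a", 0.90001, 0.2),
    ("b", 0.90002, 0.7),  # ties with "a" after rounding; wins on retrieval score
    ("c", 0.50000, 0.9),
]
ordered_ids = [doc for doc, _, _ in sort_with_tiebreak(results)]
short = truncate_words("one two three four", max_words=2)
```

Rounding before comparing is one simple way to define "near-identical"; a fixed tolerance or score bucketing would serve the same purpose.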