RUNLOCALAI · v38
OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
RAG Systems: Part 2

15. Document Re-ranking

Chapter 15 of 22 · 20 min
KEY INSIGHT

Re-ranking applies a more expensive scoring model to initial retrieval results, improving relevance at the cost of additional latency.

Initial retrieval ranks documents by fast embedding similarity, where the query and each document are embedded independently. Re-ranking then passes only the top candidates to a slower model that scores each one for actual relevance to the query.
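
The two stages can be sketched with plain callables. This is a minimal illustration rather than a reference implementation: `two_stage_search`, `cheap_score`, and `expensive_score` are hypothetical names, and the toy scorers only stand in for an embedding similarity and a re-ranking model.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def two_stage_search(query: str, corpus: list[str], cheap_score: Scorer,
                     expensive_score: Scorer, n_candidates: int = 100,
                     top_k: int = 10) -> list[tuple[str, float]]:
    """Shortlist with a cheap scorer, then rescore only the shortlist."""
    # Stage 1: score the whole corpus with the fast scorer, keep the best n.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d),
                        reverse=True)[:n_candidates]
    # Stage 2: the expensive scorer runs at most n_candidates times,
    # not once per document in the corpus.
    rescored = [(doc, expensive_score(query, doc)) for doc in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]


# Toy stand-ins (assumptions for illustration only).
def shared_words(query: str, doc: str) -> float:
    """Cheap scorer: number of query words that appear in the document."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def phrase_match(query: str, doc: str) -> float:
    """'Expensive' scorer: 1.0 if the exact query phrase occurs in the document."""
    return 1.0 if query.lower() in doc.lower() else 0.0


corpus = [
    "ranking models for search",
    "search ranking with neural models",
    "neural ranking models explained",
    "cooking pasta at home",
]
results = two_stage_search("neural ranking", corpus, shared_words, phrase_match,
                           n_candidates=3, top_k=2)
```

In this toy run the cheap scorer cannot separate the two documents that contain both query words; the second stage moves the one with the exact phrase to the top. The latency trade-off is controlled by `n_candidates`.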

Cross-Encoder Reranking

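Cross-encoders encode the query and document together in a single forward pass and output one relevance score per pair. Because document representations cannot be precomputed, they are slower than bi-encoder similarity, but they are typically more accurate.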

import numpy as np
from sentence_transformers import CrossEncoder

class Reranker:
    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rerank(self, query: str, documents: list[str], top_k: int = 10) -> list[tuple[str, float]]:
        """Re-rank documents by cross-encoder score, best first."""
        if not documents:
            return []

        # Model expects query-document pairs
        pairs = [(query, doc) for doc in documents]

        # Get one relevance score per pair
        scores = self.model.predict(pairs)

        # Sort by score descending and keep the top_k indices
        ranked_indices = np.argsort(scores)[::-1][:top_k]

        return [(documents[i], float(scores[i])) for i in ranked_indices]

Reciprocal Rank Fusion

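When several retrieval methods (for example, BM25 and embedding search) each return a ranked list, RRF merges them using only each document's rank in each list, so the methods' raw scores do not need to be comparable.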

def reciprocal_rank_fusion(ranking_lists: list[list[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Combine rankings using reciprocal rank fusion."""

    # Accumulate a fused score per document across all rankings
    doc_scores: dict[str, float] = {}

    for ranking in ranking_lists:
        # Ranks are 1-based: the top document has rank 1
        for rank, doc_id in enumerate(ranking, start=1):
            # RRF formula: 1 / (k + rank)
            doc_scores[doc_id] = doc_scores.get(doc_id, 0.0) + 1.0 / (k + rank)

    # Sort by fused score, best first
    sorted_docs = sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)

    return sorted_docs
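
A small worked example may make the arithmetic concrete. The sketch below is self-contained and uses its own helper, `rrf_scores` (a hypothetical name), with ranks counted from 1 and k = 60; the two input rankings are made-up document IDs.

```python
def rrf_scores(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Sum 1 / (k + rank) per document, with ranks counted from 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


bm25_ranking = ["d1", "d2", "d3"]   # hypothetical keyword results
dense_ranking = ["d3", "d1", "d4"]  # hypothetical embedding results

scores = rrf_scores([bm25_ranking, dense_ranking])
fused = sorted(scores, key=scores.get, reverse=True)

# Contributions per document:
#   d1: 1/61 + 1/62    d3: 1/63 + 1/61    d2: 1/62    d4: 1/63
# Fused order: d1, d3, d2, d4
```

Documents that appear in both lists (d1, d3) outrank those found by only one method, even though neither list's raw scores were used.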

Learning-to-Rank with LambdaMART

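For production systems with labeled relevance data, train a custom ranker on hand-built features. The example below uses scikit-learn's GradientBoostingRegressor as a pointwise stand-in: it predicts a relevance label for each document on its own. It is not LambdaMART itself, which trains gradient-boosted trees against a ranking metric such as NDCG computed over each query's documents.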

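from sklearn.ensemble import GradientBoostingRegressor

def train_ltr_ranker(training_data: list[tuple[str, str, float]]) -> GradientBoostingRegressor:
    """Train a simple pointwise LTR model on hand-built ranking features.

    training_data holds (query, doc, relevance_label) triples.
    """

    # Features: BM25 score, embedding similarity, term overlap, position.
    # The four feature helpers called below are assumed to be defined elsewhere.
    X = []
    y = []

    for query, doc, label in training_data:
        features = [
            bm25_score(query, doc),
            embedding_similarity(query, doc),
            term_overlap_ratio(query, doc),
            first_occurrence_position(doc)
        ]
        X.append(features)
        y.append(label)

    # Fit a regressor that predicts the relevance label from the features
    model = GradientBoostingRegressor(n_estimators=100)
    model.fit(X, y)

    return model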
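
At query time, a trained pointwise model is applied to each candidate and the candidates are sorted by predicted score. The sketch below shows that step under stated assumptions: `rank_with_model` and `StubModel` are hypothetical names, the stub stands in for a fitted regressor with a `predict` method, and `term_overlap_ratio` is one possible definition of the helper named above, not necessarily the one intended.

```python
from typing import Callable


def term_overlap_ratio(query: str, doc: str) -> float:
    """Share of distinct query words that also occur in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def rank_with_model(model, query: str, docs: list[str],
                    featurize: Callable[[str, str], list[float]]) -> list[tuple[str, float]]:
    """Score each candidate with a pointwise model and sort best first."""
    if not docs:
        return []
    # Build one feature row per candidate, in the same order used for training.
    X = [featurize(query, doc) for doc in docs]
    scores = model.predict(X)
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked]


class StubModel:
    """Stand-in for a fitted regressor: returns the first feature as the score."""

    def predict(self, X):
        return [row[0] for row in X]


ranked = rank_with_model(
    StubModel(),
    "local llm inference",
    ["gpu inference for a local llm", "gardening tips", "llm news"],
    featurize=lambda q, d: [term_overlap_ratio(q, d)],
)
```

The feature function passed at query time has to produce the same features, in the same order, as the one used to build the training matrix.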

Handling Ties and Edge Cases

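Reranking can produce ties when documents receive equal or near-equal scores. Break ties in favor of the document with the higher initial retrieval score. Cross-encoders also have a maximum input length, so truncate very long documents to a fixed length before reranking; otherwise the input is typically cut off wherever the model's limit happens to fall.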
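
Both rules fit in a few lines. This is a sketch under assumptions: `truncate_words` and `sort_with_tiebreak` are hypothetical helpers, word count is only a rough proxy for the model's token limit, and rounding to four digits is an arbitrary choice for what counts as a tie.

```python
def truncate_words(text: str, max_words: int = 256) -> str:
    """Keep the first max_words whitespace-separated words."""
    return " ".join(text.split()[:max_words])


def sort_with_tiebreak(candidates: list[tuple[str, float, float]],
                       ndigits: int = 4) -> list[tuple[str, float, float]]:
    """Order by rerank score, then by initial retrieval score on ties.

    Each candidate is (doc_id, rerank_score, initial_score). Rerank scores
    are rounded so that near-identical values count as a tie.
    """
    return sorted(candidates,
                  key=lambda c: (round(c[1], ndigits), c[2]),
                  reverse=True)


candidates = [
    ("a", 0.91234, 0.40),
    ("b", 0.91231, 0.75),  # ties with "a" after rounding to 4 digits
    ("c", 0.95000, 0.10),
]
ordered = [doc_id for doc_id, _, _ in sort_with_tiebreak(candidates)]
# ordered == ["c", "b", "a"]: "b" beats "a" on initial retrieval score
```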

EXERCISE

Implement a two-stage retriever that uses embeddings for initial retrieval and a cross-encoder for re-ranking. Compare precision@10 with and without re-ranking on a test set of 50 queries.
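
For the comparison, precision@k is the fraction of the top k results that are relevant. A minimal helper might look like the following; `precision_at_k` and `mean_precision_at_k` are hypothetical names, and the document IDs are made up.

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    # Divide by k even if fewer than k results came back.
    return hits / k


def mean_precision_at_k(runs: list[tuple[list[str], set[str]]], k: int = 10) -> float:
    """Average precision@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


single = precision_at_k(["d3", "d7", "d1", "d9"], {"d1", "d3"}, k=4)
average = mean_precision_at_k(
    [(["d3", "d7", "d1", "d9"], {"d1", "d3"}),
     (["d2", "d5", "d6", "d8"], {"d2"})],
    k=4,
)
```

Run it once on the first-stage ranking and once on the re-ranked list for each query, then compare the two averages.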
