Cross-Encoder Setup — Advanced RAG — Chunking, Retrieval, Re-ranking (Chapter 9)

Cross-encoders jointly encode query-document pairs, enabling precise relevance scoring at the cost of computation time. They serve as rerankers that refine initial retrieval results.

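Architecture difference: Bi-encoders (used in dense retrieval) encode queries and documents independently, producing embeddings that are compared with a similarity measure such as cosine similarity or dot product. Cross-encoders take the query and document together as a single input, so the model can attend across both texts, and output a single relevance score.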
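To make the interface difference concrete, here is a toy sketch. The functions `embed`, `bi_encoder_score`, and `cross_encoder_score` are hypothetical stand-ins built on word counts and word order, not real neural models. They only illustrate which inputs each architecture sees at scoring time.

```python
import math
from collections import Counter
from typing import Dict


def embed(text: str) -> Dict[str, int]:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a neural encoder)."""
    return Counter(text.lower().split())


def bi_encoder_score(query_vec: Dict[str, int], doc_vec: Dict[str, int]) -> float:
    """Cosine similarity between two vectors that were computed independently."""
    dot = sum(count * doc_vec[term] for term, count in query_vec.items() if term in doc_vec)
    query_norm = math.sqrt(sum(v * v for v in query_vec.values()))
    doc_norm = math.sqrt(sum(v * v for v in doc_vec.values()))
    norm = query_norm * doc_norm
    return dot / norm if norm else 0.0


def cross_encoder_score(query: str, doc: str) -> float:
    """Toy joint scorer: it sees both texts at once and compares adjacent word pairs."""
    q_words = query.lower().split()
    d_words = doc.lower().split()
    q_bigrams = set(zip(q_words, q_words[1:]))
    d_bigrams = set(zip(d_words, d_words[1:]))
    if not q_bigrams:
        return 0.0
    return len(q_bigrams & d_bigrams) / len(q_bigrams)


query = "dog bites man"
docs = {"same_order": "dog bites man", "reversed": "man bites dog"}

# Bi-encoder style: document vectors can be computed ahead of time, without the query.
doc_vectors = {name: embed(text) for name, text in docs.items()}
query_vector = embed(query)

for name, text in docs.items():
    print(
        name,
        "bi:", round(bi_encoder_score(query_vector, doc_vectors[name]), 3),
        "cross:", cross_encoder_score(query, text),
    )
```

In this toy setup the independent vectors cannot tell the two documents apart, while the joint scorer can, because it looks at the query and document together. Real bi-encoders are far more expressive than word counts; the example only shows why joint scoring has access to interactions that a precomputed embedding does not.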

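When to use cross-encoders: Apply them after initial retrieval has narrowed the candidates to a manageable set (typically 50-100). Scoring every document in a corpus of millions is computationally prohibitive, because each query-document pair needs its own forward pass and nothing can be precomputed per document.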
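The cost argument can be checked with simple arithmetic. The sketch below counts cross-encoder forward passes per query; the corpus size of one million is an assumed figure for illustration, and `cross_encoder_forward_passes` is a hypothetical helper, not a library function.

```python
def cross_encoder_forward_passes(num_queries: int, docs_scored_per_query: int) -> int:
    """Each (query, document) pair requires its own forward pass."""
    return num_queries * docs_scored_per_query


corpus_size = 1_000_000  # assumed corpus size, for illustration only
candidate_pool = 100     # upper end of the 50-100 range mentioned above

full_scan = cross_encoder_forward_passes(1, corpus_size)
rerank_only = cross_encoder_forward_passes(1, candidate_pool)

print(f"Full scan:   {full_scan:,} passes per query")
print(f"Rerank only: {rerank_only:,} passes per query")
print(f"Reduction:   {full_scan // rerank_only:,}x")
```

Under these assumptions, reranking only the retrieved candidates cuts the per-query cross-encoder work by four orders of magnitude. The first-stage retriever still has to search the full corpus, but it can do so against embeddings computed ahead of time.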

from typing import List

from sentence_transformers import CrossEncoder

class CrossEncoderReranker:
    def __init__(self, model_name: str, max_length: int = 512):
        """
        Initialize cross-encoder reranker.
        
        Args:
            model_name: Hugging Face model identifier (e.g., 'cross-encoder/ms-marco-MiniLM-L-6-v2')
            max_length: Maximum token length of the combined query-document input; longer pairs are truncated
        """
        self.model = CrossEncoder(model_name, max_length=max_length)
    
    def rerank(self, query: str, candidates: List[dict], top_k: int = 10) -> List[dict]:
        """
        Rerank candidate documents by cross-encoder relevance scores.
        
        Args:
            query: User query string
            candidates: List of dicts with 'text' or 'content' field
            top_k: Number of results to return
        
        Returns:
            The top_k candidates sorted by descending score; each is a shallow copy
            of the input dict with a 'cross_encoder_score' field added
        """
        # Nothing to score: return early instead of calling the model on an empty batch
        if not candidates:
            return []

        # Build (query, document) pairs; fall back to 'content' when 'text' is absent
        pairs = [
            (query, candidate.get('text', candidate.get('content', '')))
            for candidate in candidates
        ]
        
        # Get relevance scores
        scores = self.model.predict(pairs)
        
        # Combine with original metadata and sort
        scored_candidates = []
        for candidate, score in zip(candidates, scores):
            scored = candidate.copy()
            scored['cross_encoder_score'] = float(score)
            scored_candidates.append(scored)
        
        scored_candidates.sort(key=lambda x: x['cross_encoder_score'], reverse=True)
        
        return scored_candidates[:top_k]
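The reranking logic can be exercised without downloading a model. The sketch below is a minimal offline check, assuming only that the model object exposes `predict(pairs)` as `CrossEncoder` does. `FakeCrossEncoder` and `rerank_with` are hypothetical names introduced for this example; the fake model scores by word overlap and says nothing about real model quality.

```python
from typing import Dict, List, Sequence, Tuple


class FakeCrossEncoder:
    """Hypothetical stand-in with the same predict(pairs) shape as the real model."""

    def predict(self, pairs: Sequence[Tuple[str, str]]) -> List[float]:
        scores = []
        for query, doc in pairs:
            q_terms = set(query.lower().split())
            d_terms = set(doc.lower().split())
            # Fraction of query terms that also occur in the document
            scores.append(len(q_terms & d_terms) / max(len(q_terms), 1))
        return scores


def rerank_with(model, query: str, candidates: List[Dict], top_k: int = 10) -> List[Dict]:
    """Same steps as the rerank method above, with the model passed in."""
    if not candidates:
        return []
    pairs = [(query, c.get('text', c.get('content', ''))) for c in candidates]
    scores = model.predict(pairs)
    # Copy each candidate so the caller's dicts are left unmodified
    scored = [{**c, 'cross_encoder_score': float(s)} for c, s in zip(candidates, scores)]
    scored.sort(key=lambda c: c['cross_encoder_score'], reverse=True)
    return scored[:top_k]


candidates = [
    {'id': 'a', 'text': 'how to bake sourdough bread'},
    {'id': 'b', 'content': 'cross-encoder reranking for search'},
    {'id': 'c', 'text': 'reranking search results'},
]

results = rerank_with(FakeCrossEncoder(), 'cross-encoder reranking', candidates, top_k=2)
for item in results:
    print(item['id'], item['cross_encoder_score'])
```

The check confirms that both the 'text' and 'content' fields are read, that metadata such as 'id' survives, and that only top_k results come back. Passing a real `CrossEncoder` instance in place of the fake one should follow the same path, though the scores will differ.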

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
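One way to capture the package versions and runtime for that record is sketched below, using only the standard library. The helper name `describe_environment` and the package list are assumptions for illustration; adjust the list to whatever the exercise actually imports, and add the data path and observed output by hand.

```python
import json
import platform
from importlib import metadata
from typing import Dict, List, Optional


def describe_environment(packages: List[str]) -> Dict:
    """Collect the Python runtime, platform string, and installed package versions."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        'python': platform.python_version(),
        'platform': platform.platform(),
        'packages': versions,
    }


# Assumed package list for this chapter's example
record = describe_environment(['sentence-transformers', 'torch'])
print(json.dumps(record, indent=2))
```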