09. Cross-Encoder Setup
Cross-encoders jointly encode query-document pairs, enabling precise relevance scoring at the cost of computation time. They serve as rerankers that refine initial retrieval results.
Architecture difference: Bi-encoders (used in dense retrieval) encode queries and documents independently, producing embeddings that are compared with a similarity measure such as cosine similarity or dot product. Cross-encoders concatenate the query and document into one input and process them together, producing a single relevance score per pair.
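When to use cross-encoders: After initial retrieval narrows the candidates to a manageable set (typically 50-100). Scoring millions of documents with a cross-encoder is computationally prohibitive, because every query-document pair needs its own forward pass and nothing can be precomputed per document.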
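The retrieve-then-rerank pattern described above can be sketched without any model at all. The following is a minimal illustration, not a production pipeline: `retrieve_then_rerank`, `word_overlap`, and `phrase_aware_score` are hypothetical names, and the two toy scorers merely stand in for a real first-stage retriever and a real cross-encoder. The point is the shape of the control flow: the cheap scorer sees the whole corpus, while the expensive pair scorer only sees the shortlisted candidates.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    cheap_score: Scorer,
    pair_score: Scorer,
    n_candidates: int = 100,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Two-stage ranking: cheap scoring over everything, costly scoring over a shortlist."""
    # Stage 1: rank the full corpus with the inexpensive scorer and keep a shortlist.
    first_pass = sorted(corpus, key=lambda doc: cheap_score(query, doc), reverse=True)
    candidates = first_pass[:n_candidates]

    # Stage 2: run the expensive scorer once per shortlisted document only.
    rescored = [(doc, pair_score(query, doc)) for doc in candidates]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:top_k]


def word_overlap(query: str, doc: str) -> float:
    """Toy first-stage scorer (assumption): number of shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def phrase_aware_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder (assumption): overlap plus an exact-phrase bonus."""
    bonus = 2.0 if query.lower() in doc.lower() else 0.0
    return word_overlap(query, doc) + bonus


CORPUS = [
    "cross encoder reranking improves precision",
    "reranking with a cross encoder",
    "dense retrieval uses bi-encoders",
    "encoder cross talk in audio hardware",
    "cooking pasta at home",
]

for doc, score in retrieve_then_rerank(
    "cross encoder", CORPUS, word_overlap, phrase_aware_score, n_candidates=3, top_k=2
):
    print(f"{score:.1f}  {doc}")
```

In a real system the first stage would typically be a vector index or keyword search, and `pair_score` would wrap a cross-encoder call; `n_candidates` then directly controls how many model forward passes each query costs.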
from typing import List

from sentence_transformers import CrossEncoder


class CrossEncoderReranker:
    def __init__(self, model_name: str, max_length: int = 512):
        """
        Initialize cross-encoder reranker.

        Args:
            model_name: Hugging Face model identifier (e.g., 'cross-encoder/ms-marco-MiniLM-L-6-v2')
            max_length: Maximum sequence length
        """
        self.model = CrossEncoder(model_name, max_length=max_length)

    def rerank(self, query: str, candidates: List[dict], top_k: int = 10) -> List[dict]:
        """
        Rerank candidate documents by cross-encoder relevance scores.

        Args:
            query: User query string
            candidates: List of dicts with 'text' or 'content' field
            top_k: Number of results to return

        Returns:
            Reranked list with relevance scores
        """
        # Nothing to score: return early instead of calling the model on an empty batch
        if not candidates:
            return []

        # Prepare query-document pairs, falling back from 'text' to 'content'
        doc_texts = [
            candidate.get('text', candidate.get('content', ''))
            for candidate in candidates
        ]
        pairs = [(query, doc) for doc in doc_texts]

        # Get relevance scores (one score per pair)
        scores = self.model.predict(pairs)

        # Combine with original metadata; copy so the input dicts are not mutated
        scored_candidates = []
        for candidate, score in zip(candidates, scores):
            scored = candidate.copy()
            scored['cross_encoder_score'] = float(score)
            scored_candidates.append(scored)

        # Highest relevance first
        scored_candidates.sort(key=lambda x: x['cross_encoder_score'], reverse=True)
        return scored_candidates[:top_k]
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Measure latency (P50, P95, P99) for cross-encoder reranking of 100 candidates. Compare against pure dense retrieval latency.
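One possible way to approach this exercise is a small timing harness built on the standard library. This is a sketch under stated assumptions: `measure_latency` is a hypothetical helper, the warmup and run counts are arbitrary defaults, and the workload timed here is a placeholder so the snippet runs without a model. To time real reranking, pass a zero-argument callable that invokes your reranker on a fixed query and 100 candidates.

```python
import time
from statistics import quantiles
from typing import Callable, Dict


def measure_latency(fn: Callable[[], object], runs: int = 50, warmup: int = 5) -> Dict[str, float]:
    """Time repeated calls to fn and report P50/P95/P99 latency in milliseconds."""
    # Warmup calls are discarded so one-time costs (caches, lazy init) do not skew results.
    for _ in range(warmup):
        fn()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    # n=100 yields 99 cut points; index i corresponds to the (i + 1)th percentile.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Placeholder workload (assumption). Replace with something like:
#   lambda: reranker.rerank(query, candidates)
stats = measure_latency(lambda: sum(i * i for i in range(10_000)), runs=30)
print({name: round(value, 3) for name, value in stats.items()})
```

Running the same harness around the dense retrieval call gives numbers that can be compared directly. With few runs the P99 estimate is dominated by the slowest one or two samples, so record the run count alongside the results.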