05. Dense Retrieval Deep Dive
Dense retrieval maps queries and documents into a shared embedding space where semantic similarity translates to geometric proximity.
The embedding model largely determines retrieval quality. General-purpose BERT-based models (e.g., sentence-transformers/all-MiniLM-L6-v2) capture semantic relationships but may miss domain-specific terminology. Models fine-tuned on retrieval tasks (e.g., colbertv2.0, which scores with token-level late interaction rather than a single vector per passage) tend to perform better on passage-level matching.
HNSW (Hierarchical Navigable Small World) indexing enables fast approximate nearest neighbor search. The trade-off is query speed vs. recall. A higher efConstruction during indexing improves recall at the cost of build time; index memory is governed mainly by M, the number of links kept per node. A higher ef at query time improves recall at the cost of latency.
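The effect of ef at query time can be illustrated without an HNSW library. The following is a minimal sketch, not HNSW itself: it assumes a single-layer graph built by exact brute force (real HNSW builds a multi-layer graph incrementally), and the helper names build_knn_graph, beam_search, and recall_at_k are hypothetical. It only shows how a wider candidate list (ef) trades extra distance computations for recall.

```python
import heapq
import numpy as np

def build_knn_graph(vectors, m):
    """Link every point to its m nearest neighbors (exact, brute force)."""
    sq = (vectors ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * vectors @ vectors.T
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :m]

def beam_search(vectors, graph, query, k, ef, entry=0):
    """Best-first graph search that keeps at most ef results (use ef >= k)."""
    def dist(i):
        return float(np.sum((vectors[i] - query) ** 2))

    visited = {entry}
    d0 = dist(entry)
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    best = [(-d0, entry)]       # max-heap via negation: closest nodes seen so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # closest candidate is worse than the worst kept result
        for nb in graph[node]:
            nb = int(nb)
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest kept result
    ranked = sorted((-neg_d, i) for neg_d, i in best)
    return [i for _, i in ranked[:k]]

def recall_at_k(vectors, graph, queries, k, ef):
    """Share of the exact top-k neighbors that the graph search also returns."""
    hits = 0
    for q in queries:
        exact = np.argsort(((vectors - q) ** 2).sum(axis=1))[:k]
        approx = beam_search(vectors, graph, q, k, ef)
        hits += len(set(exact.tolist()) & set(approx))
    return hits / (k * len(queries))

# Synthetic data (an assumption for illustration, not a benchmark)
rng = np.random.default_rng(0)
vectors = rng.normal(size=(500, 16)).astype("float32")
queries = rng.normal(size=(20, 16)).astype("float32")
graph = build_knn_graph(vectors, m=8)
for ef in (5, 20, 80):
    print(f"ef={ef:3d} recall@5={recall_at_k(vectors, graph, queries, k=5, ef=ef):.2f}")
```

On real data, the same measurement (recall against exact brute-force neighbors, at several ef values) is a reasonable way to choose ef for a latency budget.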
from typing import List, Optional

import numpy as np
from sentence_transformers import SentenceTransformer


class DenseRetriever:
    """Brute-force dense retriever over L2-normalized embeddings."""

    def __init__(self, model_name: str, dimension: int = 384):
        self.model = SentenceTransformer(model_name)
        # 384 matches all-MiniLM-L6-v2; set it to your model's output size
        self.dimension = dimension
        # Start with an empty matrix so index() can be called more than once
        self.vectors = np.empty((0, dimension), dtype='float32')
        self.metadata = []

    def index(self, texts: List[str], metadata: Optional[List[dict]] = None):
        """Index documents with their embeddings."""
        if not texts:
            return
        embeddings = self.model.encode(texts, show_progress_bar=True)
        embeddings = np.asarray(embeddings, dtype='float32')
        # Normalize for cosine similarity (guard against zero-length vectors)
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        embeddings = embeddings / np.maximum(norms, 1e-12)
        # Raises if the model's output size differs from self.dimension
        self.vectors = np.vstack([self.vectors, embeddings])
        for i, text in enumerate(texts):
            # Copy so the caller's metadata dicts are not mutated
            meta = dict(metadata[i]) if metadata else {}
            meta['text'] = text
            self.metadata.append(meta)

    def search(self, query: str, top_k: int = 10) -> List[dict]:
        """Retrieve top-k similar documents."""
        query_embedding = self.model.encode([query])[0]
        query_embedding = query_embedding / np.linalg.norm(query_embedding)
        # Cosine similarity = dot product for normalized vectors
        similarities = np.dot(self.vectors, query_embedding)
        # Get top-k indices, highest similarity first
        top_indices = np.argsort(similarities)[::-1][:top_k]
        results = []
        for idx in top_indices:
            results.append({
                'text': self.metadata[idx].get('text', ''),
                'score': float(similarities[idx]),
                'metadata': {k: v for k, v in self.metadata[idx].items()
                             if k != 'text'}
            })
        return results
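The search method ranks every document with np.argsort(similarities)[::-1][:top_k], which sorts the whole score array. For large collections, one common alternative is to select the top-k with np.argpartition and sort only those k candidates. The helper name top_k_indices below is hypothetical; this is a sketch of the idea, not a required change to the class.

```python
import numpy as np

def top_k_indices(similarities: np.ndarray, top_k: int) -> np.ndarray:
    """Return the indices of the top_k largest scores, best first."""
    n = similarities.shape[0]
    top_k = min(top_k, n)
    if top_k == 0:
        return np.empty(0, dtype=np.intp)
    # Partition so the top_k largest scores occupy the last top_k slots (unordered)
    part = np.argpartition(similarities, n - top_k)[n - top_k:]
    # Sort only those top_k candidates, descending by score
    return part[np.argsort(similarities[part])[::-1]]

scores = np.array([0.12, 0.87, 0.45, 0.91, 0.30], dtype="float32")
print(top_k_indices(scores, 3))  # → [3 1 2]
```

The partition step is linear in the number of documents, so the saving grows with collection size; for a few thousand vectors the full argsort is usually fine.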
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
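One way to capture the details listed above is a small script that records the interpreter, platform, and package versions next to the exercise output. This is a sketch using only the standard library; the function name describe_environment and the package list are assumptions to adapt to your own setup.

```python
import json
import platform
from importlib import metadata

def describe_environment(packages):
    """Collect the runtime details worth recording beside an exercise result."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }

# Package names here are examples; list whatever the exercise imports
info = describe_environment(["numpy", "sentence-transformers"])
print(json.dumps(info, indent=2))
```

Saving this JSON alongside the observed output, the data path, and the vector count makes later runs easier to compare.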
Compare retrieval quality (hit rate at k=10) between two embedding models on a benchmark dataset. Use trec_eval for standardized evaluation.
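As a starting point for this exercise, the sketch below computes hit rate at k directly and also formats results as TREC run lines (query ID, the literal Q0, document ID, rank, score, run tag) for use with trec_eval. The query IDs, document IDs, scores, and the two runs are made-up toy data, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Run = Dict[str, List[Tuple[str, float]]]  # query id -> ranked (doc id, score) pairs

def hit_rate_at_k(run: Run, qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Fraction of judged queries with at least one relevant doc in the top k."""
    judged = [qid for qid, relevant in qrels.items() if relevant]
    if not judged:
        return 0.0
    hits = 0
    for qid in judged:
        top_docs = {doc_id for doc_id, _ in run.get(qid, [])[:k]}
        if top_docs & qrels[qid]:
            hits += 1
    return hits / len(judged)

def to_trec_run_lines(run: Run, tag: str = "dense") -> List[str]:
    """Format a run as TREC lines: qid Q0 docid rank score tag."""
    lines = []
    for qid, ranked in run.items():
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {doc_id} {rank} {score:.4f} {tag}")
    return lines

# Toy relevance judgments and two hypothetical model runs
qrels = {"q1": {"d3"}, "q2": {"d9"}}
run_a = {"q1": [("d3", 0.91), ("d7", 0.40)], "q2": [("d2", 0.55), ("d9", 0.52)]}
run_b = {"q1": [("d7", 0.80), ("d3", 0.60)], "q2": [("d2", 0.70), ("d4", 0.10)]}

print(hit_rate_at_k(run_a, qrels, k=10))  # → 1.0
print(hit_rate_at_k(run_b, qrels, k=10))  # → 0.5
print(to_trec_run_lines(run_a, tag="model_a")[0])  # → q1 Q0 d3 1 0.9100 model_a
```

Writing both runs to files in this format lets the same qrels file score each model, so the comparison does not depend on a hand-rolled metric.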