RUNLOCALAI · v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Advanced RAG — Chunking, Retrieval, Re-ranking

05. Dense Retrieval Deep Dive

Chapter 5 of 24 · 15 min
KEY INSIGHT

Embedding model selection has a larger impact on retrieval quality than index implementation details.

Dense retrieval maps queries and documents into a shared embedding space where semantic similarity translates to geometric proximity.
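To make "geometric proximity" concrete, here is a minimal sketch of cosine similarity on toy vectors. The three-dimensional vectors are hand-made for illustration and are not output from any real embedding model; `cosine_similarity` is a hypothetical helper name.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (close to 1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made 3-d "embeddings" for illustration; real models emit hundreds of dimensions.
query = np.array([1.0, 0.0, 1.0])
doc_related = np.array([2.0, 0.0, 2.0])    # same direction as the query, different length
doc_unrelated = np.array([0.0, 1.0, 0.0])  # orthogonal to the query

print(cosine_similarity(query, doc_related))    # close to 1.0
print(cosine_similarity(query, doc_unrelated))  # 0.0
```

Cosine similarity ignores vector length, which is why the related document scores near 1.0 even though it is twice as long as the query.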

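The embedding model determines retrieval quality. General-purpose BERT-based sentence encoders (e.g., sentence-transformers/all-MiniLM-L6-v2) capture semantic relationships but may miss domain-specific terminology. Models fine-tuned on retrieval tasks (e.g., ColBERTv2, published as colbert-ir/colbertv2.0) tend to perform better on passage-level matching. Note that ColBERTv2 is a late-interaction model that stores one vector per token, so it does not drop into a single-vector index unchanged.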

HNSW indexing enables fast approximate nearest neighbor search. The trade-off is query speed vs. recall. Higher efConstruction during indexing improves graph quality and recall at the cost of build time; memory use is governed mainly by M, the number of links per node. Higher ef at query time improves recall at the cost of latency.
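Tuning ef only makes sense if recall is measured against exact search. The sketch below shows that measurement with NumPy alone. It does not build an HNSW index: `approx` is a hand-constructed stand-in for whatever ids an approximate index would return, and `exact_top_k` and `recall_at_k` are hypothetical helper names.

```python
import numpy as np

def exact_top_k(vectors: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Brute-force top-k by dot product; serves as the ground truth."""
    scores = vectors @ query
    return np.argsort(scores)[::-1][:k]

def recall_at_k(approx_ids, exact_ids) -> float:
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = {int(i) for i in exact_ids}
    found = {int(i) for i in approx_ids}
    return len(exact & found) / len(exact)

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 32)).astype("float32")
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit length, so dot = cosine
query = vectors[0]

truth = exact_top_k(vectors, query, k=10)

# Stand-in for an ANN result: 8 correct ids plus 2 ids outside the true top-10.
truth_set = set(truth.tolist())
wrong = [i for i in range(len(vectors)) if i not in truth_set][:2]
approx = [int(i) for i in truth[:8]] + wrong

print(recall_at_k(approx, truth))  # 0.8
```

In practice, `approx` would come from the index queried at a given ef, and the measurement would be averaged over many queries while latency is recorded alongside it.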

from typing import List, Optional

import numpy as np
from sentence_transformers import SentenceTransformer

class DenseRetriever:
    def __init__(self, model_name: str, dimension: int = 384):
        self.model = SentenceTransformer(model_name)
        # Expected embedding size; 384 matches all-MiniLM-L6-v2
        self.dimension = dimension
        # Becomes an (n_docs, dimension) float32 array after the first index() call
        self.vectors = None
        self.metadata = []
    
    def index(self, texts: List[str], metadata: Optional[List[dict]] = None):
        """Index documents with their embeddings. Can be called repeatedly."""
        if not texts:
            return
        embeddings = np.asarray(
            self.model.encode(texts, show_progress_bar=True), dtype='float32'
        )
        if embeddings.shape[1] != self.dimension:
            raise ValueError(
                f"Model returned {embeddings.shape[1]}-d vectors, expected {self.dimension}"
            )
        
        # Normalize for cosine similarity; the floor avoids division by zero
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        embeddings = embeddings / np.maximum(norms, 1e-12)
        
        for i, text in enumerate(texts):
            # Copy so the caller's metadata dicts are not mutated
            meta = dict(metadata[i]) if metadata else {}
            meta['text'] = text
            self.metadata.append(meta)
        
        # Append to the existing matrix so a second index() call does not fail
        if self.vectors is None:
            self.vectors = embeddings
        else:
            self.vectors = np.vstack([self.vectors, embeddings])
    
    def search(self, query: str, top_k: int = 10) -> List[dict]:
        """Retrieve top-k similar documents."""
        if self.vectors is None:
            return []  # nothing indexed yet
        
        query_embedding = self.model.encode([query])[0]
        query_embedding = query_embedding / max(np.linalg.norm(query_embedding), 1e-12)
        
        # Cosine similarity = dot product for normalized vectors
        similarities = np.dot(self.vectors, query_embedding)
        
        # Get top-k indices, highest similarity first
        top_indices = np.argsort(similarities)[::-1][:top_k]
        
        results = []
        for idx in top_indices:
            results.append({
                'text': self.metadata[idx].get('text', ''),
                'score': float(similarities[idx]),
                'metadata': {k: v for k, v in self.metadata[idx].items() 
                           if k != 'text'}
            })
        
        return results

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
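One way to capture those details is a small record written next to the results. This is a minimal sketch using only the standard library; the function names, the package list, and the path `data/corpus.jsonl` are placeholders, not part of the course material.

```python
import json
import platform
import sys
from importlib import metadata

def package_version(name: str) -> str:
    """Installed version of a package, or a marker if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

def environment_record(packages, data_path: str) -> dict:
    """Collect the runtime facts worth noting beside an exercise result."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
        "data_path": data_path,  # placeholder path; point this at your own data
    }

record = environment_record(
    ["numpy", "sentence-transformers"], data_path="data/corpus.jsonl"
)
print(json.dumps(record, indent=2))
```

Observed output, model size, vector count, and backend (CPU or GPU) still need to be added by hand, since they depend on the exercise being run.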

EXERCISE

Compare retrieval quality (hit rate at k=10) between two embedding models on a benchmark dataset. Use trec_eval for standardized evaluation.
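As a starting point, hit rate at k can be computed directly in Python before moving to trec_eval. The sketch below assumes each model's output has been reduced to a ranked list of document ids per query; the function name, the ids, and the toy judgments are all made up for illustration.

```python
from typing import Dict, List, Set

def hit_rate_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Share of queries with at least one relevant document in the top k."""
    if not rankings:
        return 0.0
    hits = 0
    for query_id, ranked_doc_ids in rankings.items():
        if set(ranked_doc_ids[:k]) & relevant.get(query_id, set()):
            hits += 1
    return hits / len(rankings)

# Toy relevance judgments and rankings from two hypothetical models.
relevant = {"q1": {"d3"}, "q2": {"d7"}}
model_a = {"q1": ["d3", "d1"], "q2": ["d2", "d7"]}
model_b = {"q1": ["d1", "d2"], "q2": ["d7", "d4"]}

print(hit_rate_at_k(model_a, relevant, k=1))  # 0.5
print(hit_rate_at_k(model_b, relevant, k=1))  # 0.5
print(hit_rate_at_k(model_a, relevant, k=2))  # 1.0
print(hit_rate_at_k(model_b, relevant, k=2))  # 0.5
```

The two models tie at k=1 and separate at k=2, which is why the cutoff should be fixed before comparing models.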
