RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.coIndependently operated
RAG Systems: Part 2

08. Hybrid Search (Dense + Sparse)

Chapter 8 of 22 · 25 min
KEY INSIGHT

Hybrid search combines dense (semantic) and sparse (keyword) retrieval to capture complementary strengths, with optimal weighting determined empirically on your specific data and queries.

Dense retrieval captures semantic meaning but can miss exact keyword matches. Sparse retrieval (BM25) excels at exact matching but ignores semantic relationships. Hybrid search runs both retrievers and fuses their results, so each covers the other's blind spots.

The Complementary Strengths Problem

Dense embeddings are effective but imperfect. They struggle with:

  • Exact terminology matches ("ICD-10 code" vs "medical billing code")
  • Product names, proper nouns, and domain-specific jargon
  • Numerical precision ("within 3 business days" vs "within 5 business days")
  • Negation ("NOT covered" vs "covered")

Sparse retrieval (BM25) is based on term frequency statistics:

score(D, Q) = Σ IDF(term) × (term_frequency_in_D × (k1 + 1)) / 
                        (term_frequency_in_D + k1 × (1 - b + b × |D|/avgdl))

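Where the sum runs over the terms in query Q, |D| is the length of document D in tokens, avgdl is the average document length in the corpus, k1 controls term frequency saturation, and b controls document length normalization.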

Sparse retrieval on its own fails when queries use synonyms ("vehicle" vs "car" vs "automobile") or when semantic understanding is needed.
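
The BM25 formula and the synonym failure above can be sketched in plain Python. This is a minimal illustration, not the rank_bm25 implementation: the function name `bm25_score`, the toy corpus, the defaults k1=1.5 and b=0.75, and the smoothed IDF variant are all assumptions chosen for the example.

```python
import math
from collections import Counter

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a tokenized query with BM25."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_tokens)
    score = 0.0
    for term in set(query_tokens):
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        if df == 0 or tf[term] == 0:
            continue  # no lexical overlap contributes nothing
        # Assumption: one common smoothed IDF variant; the text does not fix one.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        length_norm = 1 - b + b * len(doc_tokens) / avgdl
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
    return score

# Hypothetical toy corpus, tokenized by whitespace.
corpus = [
    "the car is fast".split(),
    "the automobile is red".split(),
    "car car car car dealership".split(),
]
for doc in corpus:
    print(round(bm25_score(["car"], doc, corpus), 3), " ".join(doc))
```

The "automobile" document scores exactly zero for the query "car", which is the synonym gap that dense retrieval is meant to fill. The third document repeats "car" four times but scores well under four times the first document, which is the saturation effect that k1 controls.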

Implementing Hybrid Search

The standard hybrid search architecture:

Query
  ├── Embed Query → Dense Retrieval → Dense Scores
  └── BM25 Scoring → Sparse Scores
           ↓
    Combine Scores (RRF or weighted)
           ↓
       Final Ranking

A minimal in-memory implementation:

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
import numpy as np

class HybridRetriever:
    def __init__(self, documents, dense_model='sentence-transformers/all-MiniLM-L6-v2'):
        self.documents = documents
        self.contents = [doc['content'] for doc in documents]
        
        # Setup sparse retrieval (BM25); lowercase so "BM25" and "bm25" match
        tokenized_corpus = [doc.lower().split() for doc in self.contents]
        self.bm25 = BM25Okapi(tokenized_corpus)
        
        # Setup dense retrieval
        self.dense_model = SentenceTransformer(dense_model)
        self.dense_embeddings = self.dense_model.encode(self.contents)
    
    def retrieve(self, query, k=20, alpha=0.5):
        """
        Hybrid retrieval combining dense and sparse.
        
        Args:
            query: Query string
            k: Number of results to return
            alpha: Weight for dense vs sparse (0=all sparse, 1=all dense)
        """
        # Dense retrieval: cosine similarity lies in [-1, 1], so rescale it
        # to [0, 1] to put it on the same scale as the sparse scores
        query_embedding = self.dense_model.encode([query])
        dense_scores = self._cosine_similarity(query_embedding, self.dense_embeddings)
        dense_scores = self._normalize(dense_scores)
        
        # Sparse retrieval (tokenize the query the same way as the corpus)
        tokenized_query = query.lower().split()
        sparse_scores = self.bm25.get_scores(tokenized_query)
        sparse_scores = self._normalize(sparse_scores)
        
        # Combine scores
        combined_scores = alpha * dense_scores + (1 - alpha) * sparse_scores
        
        # Get top-k results
        top_indices = np.argsort(combined_scores)[::-1][:k]
        
        return [
            {
                'document': self.documents[i]['content'],
                'score': combined_scores[i],
                'dense_score': dense_scores[i],
                'sparse_score': sparse_scores[i],
                'metadata': self.documents[i].get('metadata', {})
            }
            for i in top_indices
        ]
    
    def _cosine_similarity(self, query_vec, doc_vecs):
        """Compute cosine similarity between query and documents."""
        similarities = np.dot(doc_vecs, query_vec.T).flatten()
        norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
        return similarities / norms
    
    def _normalize(self, scores):
        """Min-max normalize scores to [0, 1]."""
        if scores.max() == scores.min():
            return np.ones_like(scores) * 0.5
        return (scores - scores.min()) / (scores.max() - scores.min())
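
The architecture diagram lists RRF as an alternative to weighted score combination, while the class above implements only the weighted form. A minimal sketch of Reciprocal Rank Fusion follows, assuming each retriever returns an ordered list of document ids. The function name `rrf_fuse` and the sample ids are hypothetical; the default `rank_constant=60` mirrors the value used in the Elasticsearch example later in this chapter.

```python
def rrf_fuse(rankings, rank_constant=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document receives 1 / (rank_constant + rank) from every list it
    appears in, with ranks starting at 1. Raw scores are never compared,
    so no normalization step is needed.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)

# Hypothetical rankings from a dense and a sparse retriever.
dense_ranking = ["d3", "d1", "d2"]
sparse_ranking = ["d1", "d4", "d3"]
print(rrf_fuse([dense_ranking, sparse_ranking]))  # ['d1', 'd3', 'd4', 'd2']
```

Because RRF works on ranks only, it has no alpha to tune, which makes it a reasonable baseline when you have no labeled evaluation set.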

Weighting Strategies

Alpha controls dense vs. sparse contribution. The optimal value depends on your use case:

  • Alpha near 1.0 (0.7-0.9): Dense dominant. Best when users phrase queries differently from the precise technical language in the documents (synonyms, paraphrases) and semantic understanding matters more than exact wording.

  • Alpha near 0.5 (0.4-0.6): Balanced. Good default starting point. Many applications perform well in this range.

  • Alpha near 0.0 (0.1-0.3): Sparse dominant. Best when queries contain exact terminology, product codes, or proper nouns, or when the downstream LLM can bridge remaining semantic gaps from the retrieved context.
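
To make the effect of alpha concrete, here is a small sketch with made-up, already-normalized scores for three documents. The helper `blend`, the document labels, and the score values are all hypothetical; only the blending formula comes from the retriever above.

```python
import numpy as np

def blend(dense_scores, sparse_scores, alpha):
    """Weighted score fusion: alpha=1 is dense only, alpha=0 is sparse only."""
    return alpha * np.asarray(dense_scores) + (1 - alpha) * np.asarray(sparse_scores)

# Hypothetical normalized scores in [0, 1] for three documents.
doc_ids = ["exact-code-match", "paraphrase", "unrelated"]
dense = [0.40, 0.90, 0.10]
sparse = [1.00, 0.00, 0.05]

for alpha in (0.2, 0.5, 0.8):
    combined = blend(dense, sparse, alpha)
    best = doc_ids[int(np.argmax(combined))]
    print(f"alpha={alpha}: top result = {best}")
```

With these numbers the exact match wins at alpha 0.2 and 0.5, and the paraphrase overtakes it at 0.8. The crossover point depends entirely on the score distributions, which is why the value has to be measured rather than guessed.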

The correct approach is to tune alpha on your evaluation set:

def tune_alpha(documents, queries, relevant_labels,
               alpha_values=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0)):
    """Grid-search alpha by mean recall@50.

    relevant_labels[i] holds the doc_id values (matching each document's
    metadata['doc_id']) that are relevant to queries[i].
    """
    retriever = HybridRetriever(documents)  # build both indexes once
    results = {}
    
    for alpha in alpha_values:
        aggregate_recall = 0.0
        for query, relevant_docs in zip(queries, relevant_labels):
            retrieved = retriever.retrieve(query, k=50, alpha=alpha)
            retrieved_ids = [doc['metadata'].get('doc_id') for doc in retrieved]
            
            # Recall for this query: share of relevant docs found in the top 50
            # (max(..., 1) avoids division by zero for unlabeled queries)
            recall = len(set(retrieved_ids) & set(relevant_docs)) / max(len(relevant_docs), 1)
            aggregate_recall += recall
        
        results[alpha] = aggregate_recall / len(queries)
    
    return results
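
The dictionary returned by the grid search still has to be turned into a decision. A minimal sketch, assuming the result maps each alpha to its mean recall; the helper name `pick_alpha` and the sample numbers are hypothetical, not measured results.

```python
def pick_alpha(results):
    """Return (alpha, recall) for the best-scoring alpha in a grid-search result."""
    best = max(results, key=results.get)
    return best, results[best]

# Hypothetical grid-search output (alpha -> mean recall).
results = {0.0: 0.61, 0.3: 0.74, 0.5: 0.79, 0.7: 0.77, 1.0: 0.68}

alpha, recall = pick_alpha(results)
print(f"best alpha={alpha} with mean recall {recall:.2f}")
for a in sorted(results):
    print(f"  alpha={a:.1f}  recall={results[a]:.2f}")
```

Printing the whole curve alongside the winner is worthwhile: a flat curve around the maximum suggests the exact alpha matters little, while a sharp peak suggests a finer grid near that value.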

Elasticsearch and Weaviate Implementations

Production search engines and vector databases, including Elasticsearch and Weaviate, often provide native hybrid search. An Elasticsearch example:

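# Elasticsearch hybrid search (BM25 + kNN, fused with RRF)
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch(['http://localhost:9200'])

# Assumption: the index's "embedding" field was built with this same model.
_query_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

def embed_query(query):
    """Embed the query as a plain list of floats for the kNN clause."""
    return _query_model.encode(query).tolist()

def es_hybrid_search(query, index_name, k=20):
    """
    Elasticsearch native hybrid search using BM25 and kNN, fused with RRF.

    RRF is rank-based, so it takes no sparse/dense weights. The `rank`
    option and its parameter names have changed across 8.x releases
    (newer versions use `rank_window_size` and favor the `retriever`
    API), so check the documentation for your version.
    """
    response = es.search(
        index=index_name,
        query={
            "bool": {
                "should": [
                    {"match": {"content": query}}  # Sparse (BM25) component
                ]
            }
        },
        knn={
            "field": "embedding",
            "query_vector": embed_query(query),
            "k": 50,
            "num_candidates": 100
        },
        rank={
            "rrf": {
                "window_size": max(k, 50),  # must be at least `size`
                "rank_constant": 60
            }
        },
        size=k
    )
    
    return [
        {
            'document': hit['_source']['content'],
            'score': hit['_score'],
            'metadata': hit['_source'].get('metadata', {})
        }
        for hit in response['hits']['hits']
    ]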
EXERCISE

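Implement hybrid search on your document collection using both weighted score fusion and RRF. Compare precision at k=10 for dense-only, sparse-only, and hybrid. For the weighted variant, vary the alpha parameter from 0 to 1 in 0.2 increments and identify the optimal blend. RRF is rank-based and has no alpha, so treat it as a separate baseline.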
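
The exercise asks for precision at k=10, which the chapter's code does not define. A minimal sketch of the metric, assuming retrieved results and relevance labels are both expressed as document ids; the function name `precision_at_k` and the sample ids are hypothetical.

```python
def precision_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved ids that are relevant.

    Divides by k even when fewer than k results came back, so short
    result lists are penalized rather than rewarded.
    """
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k

# Hypothetical ranked output and labels for one query.
retrieved = ["a", "b", "c", "d"]
relevant = {"a", "c", "x"}
print(precision_at_k(retrieved, relevant, k=4))  # 0.5
```

Average this value over all evaluation queries for each configuration (dense-only, sparse-only, each alpha, RRF) to get comparable numbers.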
