RUNLOCALAI v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.co · Independently operated
HOW-TO · RAG

How to Implement Hybrid Search RAG (BM25 + Vector)

intermediate·30 min·By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

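A working basic RAG pipeline, Ollama running locally, and the Python packages rank-bm25, nltk, langchain, langchain-community, langchain-ollama, and chromadb installed.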

What this does

Hybrid search combines keyword-based BM25 retrieval with dense vector similarity. BM25 catches exact terms such as product names, error codes, and identifiers, while embeddings catch paraphrases and related concepts, so the combination improves recall on queries where exact terms matter as much as intent. This guide ranks chunks with both methods and merges the two rankings with reciprocal rank fusion (RRF), using LangChain for loading, chunking, and the vector store.
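Dense retrieval ranks chunks by how close their embedding vectors are to the query's vector. The minimal sketch below uses hypothetical 3-dimensional vectors (real embeddings have hundreds of dimensions) to show cosine similarity, and why, for unit-length vectors, ranking by squared L2 distance gives the same order: the squared distance equals 2 - 2 * cosine. That matters because Chroma collections default to L2 distance. The helper names are illustrative, not part of any library.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity: dot product divided by both vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def l2_squared(a, b):
    """Squared Euclidean distance: smaller means closer."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical toy embeddings, not output from a real model
query = normalize([0.9, 0.1, 0.0])
chunks = {
    "chunk_a": normalize([0.8, 0.2, 0.1]),
    "chunk_b": normalize([0.1, 0.9, 0.3]),
    "chunk_c": normalize([0.5, 0.5, 0.0]),
}

# Highest similarity first vs. smallest distance first
by_cosine = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)
by_l2 = sorted(chunks, key=lambda c: l2_squared(query, chunks[c]))
print(by_cosine)  # ['chunk_a', 'chunk_c', 'chunk_b']
print(by_l2)      # same order, because all vectors are unit length
```

If embeddings are not normalized, the two orderings can differ, so check which distance your vector store uses before comparing scores across tools.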

Steps

  1. Load and chunk the documents. BM25 and the vector store index the same chunks, so split once and reuse the texts.

    from langchain_community.document_loaders import TextLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    
    loader = TextLoader("context/docs.txt")
    docs = loader.load()
    # chunk_size and chunk_overlap are measured in characters by default
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_documents(docs)
    texts = [c.page_content for c in chunks]
    
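  2. Build a BM25 index. BM25Okapi scores each chunk from query-term frequency, inverse document frequency, and chunk length, so queries must be tokenized exactly like the corpus.

    from rank_bm25 import BM25Okapi
    import nltk
    from nltk.tokenize import word_tokenize
    
    # word_tokenize needs the punkt models; newer NLTK releases load punkt_tab
    nltk.download("punkt", quiet=True)
    nltk.download("punkt_tab", quiet=True)
    
    def tokenize(text):
        # Lowercase so "Retrieval" and "retrieval" match; reuse this for queries
        return [token.lower() for token in word_tokenize(text)]
    
    tokenized_corpus = [tokenize(text) for text in texts]
    bm25 = BM25Okapi(tokenized_corpus)
    
  3. Build a vector store. Use Ollama-backed embeddings and record each chunk's index so vector hits can be matched to BM25 hits.

    from langchain_ollama import OllamaEmbeddings
    from langchain_community.vectorstores import Chroma
    
    # Dedicated embedding model; pull it first with: ollama pull nomic-embed-text
    embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url="http://localhost:11434")
    # Keep each chunk's index as metadata so results can be fused by ID
    metadatas = [{"doc_id": i} for i in range(len(texts))]
    vector_db = Chroma.from_texts(texts, embeddings, metadatas=metadatas)
    
  4. Rank with BM25 and define reciprocal rank fusion (RRF). BM25 scores and vector distances sit on different scales, so RRF ignores raw scores and adds 1 / (k + rank) for each ranking a chunk appears in; k = 60 is the common default.

    def rrf_fusion(rankings, k=60):
        """Fuse ranked lists of chunk indices; rank starts at 1."""
        fused = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(fused.items(), key=lambda item: item[1], reverse=True)
    
    query = "retrieval pipeline"
    # Score every chunk once, then keep the indices of the five best
    scores = bm25.get_scores(tokenize(query))
    bm25_ranking = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)[:5]
    
  5. Query through the hybrid retriever. Send the same query to the vector store, map its hits back to chunk indices, and fuse both rankings.

    query_embedding = embeddings.embed_query(query)
    vector_results = vector_db.similarity_search_by_vector(query_embedding, k=5)
    vector_ranking = [doc.metadata["doc_id"] for doc in vector_results]
    # Print the fused top five, highest score first
    for doc_id, score in rrf_fusion([bm25_ranking, vector_ranking])[:5]:
        print(f"{score:.4f}  {texts[doc_id][:80]}")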
    

    Expected output: a merged list of relevant chunks from both retrieval methods.
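To see how reciprocal rank fusion rewards agreement between retrievers, here is a standalone worked example on two hand-made rankings. The document IDs and the `reciprocal_rank_fusion` name are hypothetical; the scoring is the standard RRF sum of 1 / (k + rank) with k = 60 and ranks starting at 1.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Sum 1 / (k + rank) over every ranking a document appears in (rank starts at 1)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical top-3 results from each retriever
bm25_ranking = ["a", "b", "c"]
vector_ranking = ["d", "b", "e"]

for doc_id, score in reciprocal_rank_fusion([bm25_ranking, vector_ranking]):
    print(doc_id, round(score, 4))
# b 0.0323
# a 0.0164
# d 0.0164
# c 0.0159
# e 0.0159
```

Document `b` is only second in each list, yet it outranks `a` and `d`, which each top just one list. A larger k flattens the difference between ranks; a smaller k gives the top positions more weight.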

Verification

python -c "
from rank_bm25 import BM25Okapi
corpus = ['apple fruit', 'banana yellow', 'cherry red']
bm25 = BM25Okapi([c.split() for c in corpus])
scores = bm25.get_scores(['fruit'])
print(max(scores) > 0)
# Expected: True
"

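The verification above only checks that a matching document scores above zero. To see where that number comes from, here is a minimal from-scratch sketch of the Okapi BM25 formula, written to follow the defaults of rank_bm25's BM25Okapi (k1 = 1.5, b = 0.75). It omits the library's adjustment for negative IDF values (terms that appear in more than half the documents), so scores can differ on such corpora. `bm25_scores` is a hypothetical helper, not a rank_bm25 function.

```python
import math
from collections import Counter

def bm25_scores(corpus_tokens, query_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for one query (sketch, no IDF floor)."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by document length via b
            norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * norm_tf
        scores.append(score)
    return scores

corpus = ["apple fruit", "banana yellow", "cherry red"]
scores = bm25_scores([c.split() for c in corpus], ["fruit"])
print(scores)  # roughly [0.511, 0.0, 0.0]
```

Here every document has average length and "fruit" appears once, so the length-normalized term factor is exactly 1 and the score is just the IDF, log(2.5 / 1.5).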
Common failures

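  • Missing NLTK data. Download punkt and punkt_tab before calling nltk.word_tokenize; otherwise NLTK raises a LookupError.
  • Mismatched tokenization. If queries are tokenized differently from the corpus (for example, lowercased in one place but not the other), query terms stop matching and BM25 scores become unreliable. Use one tokenizer for both.
  • Fusion returning empty results. RRF returns nothing only when both rankings are empty; confirm the document loaded, the chunk list is non-empty, and the vector store was populated before fusing.
  • Slow vector search. More overlap produces more chunks to embed and search; reduce overlap or use larger chunks to shrink the index, and for large corpora use a vector store with an approximate nearest neighbor (ANN) index.
  • Version mismatch. The installed package or runtime differs from the one shown in the command; check the installed version first and rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is in use; print the active Python interpreter, the ollama binary path, and the configuration before changing the guide steps.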
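For the version and environment checks, a small diagnostic can print which interpreter, packages, and ollama binary are actually active. This is a sketch using only the standard library (`importlib.metadata.version`, `shutil.which`); the package list is our assumption, based on the libraries this guide imports.

```python
import shutil
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for the libraries used in this guide
PACKAGES = [
    "rank-bm25", "nltk", "langchain", "langchain-text-splitters",
    "langchain-community", "langchain-ollama", "chromadb",
]

def environment_report(packages=PACKAGES):
    """Collect the interpreter, package versions, and ollama binary actually in use."""
    report = {"python": sys.executable, "ollama_binary": shutil.which("ollama")}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report

for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Run it with the same interpreter you use for the guide; a path under an unexpected virtual environment, or `ollama_binary: None`, usually explains a failure faster than changing the steps.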

Related guides

  • How to Build a Basic RAG Pipeline with LangChain
  • How to Add Reranking to Your RAG Pipeline