What this does

Hybrid search combines keyword-based BM25 retrieval with dense vector similarity. The combination covers both lexical matches and semantic meaning, improving recall on queries where exact terms matter as much as intent. This guide shows how to merge both retrieval modes into a single retriever using LangChain and rank fusion.

Steps

Load and chunk the documents. Both retrievers index the same chunks, and BM25 additionally needs each chunk tokenized.

import os
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"

from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader("context/docs.txt")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
texts = [c.page_content for c in chunks]
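
To see what chunk_size and chunk_overlap control, here is a simplified fixed-window splitter. It is only a sketch: split_with_overlap is a hypothetical helper, and RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraph breaks, newlines, and spaces, so its chunks will not match these windows exactly.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new window starts chunk_size - chunk_overlap characters after the last
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = "abcdefghij" * 3  # 30 characters of toy text
windows = split_with_overlap(sample, chunk_size=12, chunk_overlap=4)
```

With these toy values, each window repeats the last 4 characters of the previous one, which is what keeps a sentence that straddles a boundary retrievable from at least one chunk.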

Build a BM25 retriever. The BM25Okapi class scores each chunk by how often the query terms occur in it, weighted by inverse document frequency and normalized for chunk length.

from rank_bm25 import BM25Okapi
import nltk
nltk.download("punkt")
nltk.download("punkt_tab")

# Lowercase and tokenize with NLTK so punctuation does not stick to the tokens
tokenized_corpus = [nltk.word_tokenize(text.lower()) for text in texts]
bm25 = BM25Okapi(tokenized_corpus)
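
To make the scoring concrete, here is a from-scratch sketch of a textbook Okapi BM25 formula. It is an illustration, not rank_bm25's implementation: the function name bm25_scores_sketch, the defaults k1=1.5 and b=0.75, and the smoothed IDF are assumptions chosen for the example, so absolute scores may differ from what BM25Okapi returns.

```python
import math
from collections import Counter

def bm25_scores_sketch(tokenized_corpus, query_tokens, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    n_docs = len(tokenized_corpus)
    avg_len = sum(len(doc) for doc in tokenized_corpus) / n_docs
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for doc in tokenized_corpus for term in set(doc))
    scores = []
    for doc in tokenized_corpus:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            n = doc_freq[term]
            # Smoothed IDF (assumed variant, always positive): rarer terms weigh more
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            tf = term_freq[term]
            # Length normalization: documents longer than average are penalized via b
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

toy_corpus = [["apple", "fruit"], ["banana", "yellow"], ["cherry", "red"]]
sketch_scores = bm25_scores_sketch(toy_corpus, ["fruit"])
```

Only the first toy document contains the query term, so it is the only one with a non-zero score.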

Build a vector retriever. Use Ollama-backed embeddings.

from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

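# Pass the server URL explicitly rather than relying on the environment variable
embeddings = OllamaEmbeddings(model="llama3", base_url="http://localhost:11434")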
vector_db = Chroma.from_texts(texts, embeddings)
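
Conceptually, the vector retriever ranks chunks by how close their embeddings are to the query embedding. The sketch below shows that idea with cosine similarity on made-up 2-d vectors. The helper names are hypothetical, and the metric Chroma actually uses depends on how the collection is configured, so treat this as an illustration of the principle only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_by_cosine(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

toy_vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]  # made-up "embeddings"
nearest = top_k_by_cosine([1.0, 0.1], toy_vectors, k=2)
```

Here the query points almost along the first axis, so vector 0 ranks first and vector 1 second. A real store avoids this exhaustive scan by using an index.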

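Combine with reciprocal rank fusion (RRF). Each chunk's fused score is the sum of 1 / (k + rank) over every ranking it appears in, where rank is its 1-based position, so chunks ranked highly by both retrievers rise to the top.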

def rrf_fusion(results_list, k=60):
    fused = {}
    for results in results_list:
        for rank, doc in enumerate(results):
            doc_id = doc["doc_id"]
            fused[doc_id] = fused.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

# Score every chunk once against the query, then rank chunk indices by score
query_tokens = nltk.word_tokenize("retrieval pipeline".lower())
scores = bm25.get_scores(query_tokens)
ranked_ids = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
# bm25_scores holds the chunks in BM25 order, best match first
bm25_scores = [{"doc_id": i, "content": texts[i]} for i in ranked_ids]
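
As a sanity check of the fusion arithmetic, the sketch below fuses two toy rankings of plain ids. The function rrf_ids and the toy lists are hypothetical and exist only for this example; the formula is the same 1 / (k + rank) sum with rank starting at 1.

```python
def rrf_ids(rankings, k=60):
    """Fuse several ranked lists of ids with reciprocal rank fusion."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

keyword_ranking = ["a", "b", "c"]  # toy BM25 order, best first
vector_ranking = ["b", "d", "a"]   # toy vector order, best first
toy_fused = rrf_ids([keyword_ranking, vector_ranking])
toy_order = [doc_id for doc_id, _ in toy_fused]
# toy_order == ["b", "a", "d", "c"]
```

The id "b" wins with 1/62 + 1/61 because it sits near the top of both lists, while "c" and "d" each appear in only one list and fall to the bottom.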

Query through the hybrid retriever. Pass the query to both retrievers and fuse results.

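query_embedding = embeddings.embed_query("retrieval pipeline")
vector_results = vector_db.similarity_search_by_vector(query_embedding, k=5)
# Map each vector hit back to its index in texts so both rankings share a doc_id
vector_ranked = [{"doc_id": texts.index(d.page_content)} for d in vector_results]
# Fuse the top BM25 hits with the vector hits and print the merged ranking
fused = rrf_fusion([bm25_scores[:5], vector_ranked])
for doc_id, score in fused[:5]:
    print(round(score, 4), texts[doc_id][:80])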

Expected output: a merged list of relevant chunks from both retrieval methods.

Verification

python -c "
from rank_bm25 import BM25Okapi
corpus = ['apple fruit', 'banana yellow', 'cherry red']
bm25 = BM25Okapi([c.split() for c in corpus])
scores = bm25.get_scores(['fruit'])
print(max(scores) > 0)
# Expected: True
"

Common failures

Missing NLTK data. Download punkt and punkt_tab before tokenizing; without them, NLTK tokenizers such as word_tokenize raise a LookupError.
Mismatched tokenization. BM25 matches exact tokens, so tokenize the query the same way as the corpus (same tokenizer, same casing); otherwise scores become unreliable.
Fusion returning empty results. Verify both retrievers return at least one result before merging.
Slow vector search. Reduce the number of chunks (larger chunk size, less overlap) or use an approximate nearest neighbor (ANN) index for large corpora.
Version mismatch. The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
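
For the mismatched-tokenization failure above, one way to stay consistent is to route both the corpus and the query through a single tokenizer function. The sketch below uses a hypothetical regex-based tokenize helper in place of NLTK so that it runs without downloads; the sample strings are made up.

```python
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

chunk = "Retrieval pipelines, explained."
query = "retrieval PIPELINES"

# Shared tokenizer: casing and punctuation no longer block a match
shared_overlap = set(tokenize(query)) & set(tokenize(chunk))

# Naive whitespace split: "Retrieval" != "retrieval", "pipelines," != "PIPELINES"
naive_overlap = set(query.split()) & set(chunk.split())
```

With the shared tokenizer both query terms match the chunk; with the naive split none do, and BM25 would score the chunk as zero.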
