How to Implement Hybrid Search RAG (BM25 + Vector)
Prerequisites: a running RAG pipeline, with the BM25 and vector store libraries installed.
What this does
Hybrid search combines keyword-based BM25 retrieval with dense vector similarity. The combination covers both lexical matches and semantic meaning, improving recall on queries where exact terms matter as much as intent. This guide shows how to merge both retrieval modes into a single retriever using LangChain and rank fusion.
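The fusion step used later in this guide is reciprocal rank fusion (RRF). As a rough sketch of how it scores a chunk d, assuming 1-based ranks, a set R of rankings (here BM25 and vector search), and the constant k = 60 that the guide's rrf_fusion function uses as its default:

```latex
\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}

% Worked arithmetic with k = 60:
% ranked 1st by both retrievers:    1/61 + 1/61 = 2/61 \approx 0.0328
% ranked 3rd by both retrievers:    1/63 + 1/63 = 2/63 \approx 0.0317
% ranked 1st by one retriever only: 1/61        \approx 0.0164
```

Under these assumptions, a chunk that both retrievers rank moderately well outscores a chunk that only one retriever ranks first. That is the behavior hybrid search relies on.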
Steps
Load and split documents. BM25 relies on tokenization, so start from clean text chunks that both retrievers can share.
import os
# Local Ollama server address (default port 11434)
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Load the source file and split it into overlapping chunks
loader = TextLoader("context/docs.txt")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
texts = [c.page_content for c in chunks]
Build a BM25 retriever. The BM25Okapi class scores documents by term frequency, inverse document frequency, and document length.
from rank_bm25 import BM25Okapi
import nltk
from nltk.tokenize import word_tokenize
# Tokenizer data required by word_tokenize
nltk.download("punkt")
nltk.download("punkt_tab")
# Tokenize each chunk; queries must be tokenized the same way
tokenized_corpus = [word_tokenize(text) for text in texts]
bm25 = BM25Okapi(tokenized_corpus)
Build a vector retriever. Use Ollama-backed embeddings.
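from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
# Embed every chunk with the llama3 model served by Ollama and index it in Chroma
embeddings = OllamaEmbeddings(model="llama3")
vector_db = Chroma.from_texts(texts, embeddings)
Combine with reciprocal rank fusion (RRF). Each chunk's fused score is the sum of 1 / (k + rank) over the rankings in which it appears, with rank counted from 1.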
from nltk.tokenize import word_tokenize
def rrf_fusion(results_list, k=60):
    """Fuse ranked result lists; each result is a dict with a "doc_id" key."""
    fused = {}
    for results in results_list:
        for rank, doc in enumerate(results):
            doc_id = doc["doc_id"]
            # rank is 0-based, so rank + 1 is the 1-based position
            fused[doc_id] = fused.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)
# Build the BM25 ranking: score every chunk once, then sort chunks by that score
query_tokens = word_tokenize("retrieval pipeline")
scores = bm25.get_scores(query_tokens)
bm25_scores = [{"doc_id": i, "content": texts[i]} for i in range(len(texts))]
bm25_scores = sorted(bm25_scores, key=lambda d: scores[d["doc_id"]], reverse=True)
Query through the hybrid retriever. Pass the query to both retrievers and fuse the results.
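query_embedding = embeddings.embed_query("retrieval pipeline")
vector_results = vector_db.similarity_search_by_vector(query_embedding, k=5)
# Map vector hits back to chunk ids so both rankings share one doc_id space
# (texts.index returns the first match, so duplicate chunks collapse to one id)
vector_ranked = [{"doc_id": texts.index(d.page_content), "content": d.page_content} for d in vector_results]
# Fuse the top BM25 hits with the vector hits
fused = rrf_fusion([bm25_scores[:5], vector_ranked])
print("Retrieved", len(vector_results), "vector results and", len(fused), "fused results")
Expected output: the result counts, with fused holding a merged list of (doc_id, score) pairs from both retrieval methods, best first.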
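The fusion step can also be tried in isolation, without Ollama or Chroma running. The following is a minimal sketch under stated assumptions: the helper names rank_ids_by_score and rrf_fuse_ids, the demo BM25 scores, and the vector hit ids are all hypothetical stand-ins for the real retriever outputs, and chunks are identified by their index in texts.

```python
from typing import Dict, List, Sequence, Tuple


def rank_ids_by_score(scores: Sequence[float], top_n: int) -> List[int]:
    """Return the indices of the top_n highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_n]


def rrf_fuse_ids(rankings: List[List[int]], k: int = 60) -> List[Tuple[int, float]]:
    """Fuse ranked lists of chunk ids with reciprocal rank fusion."""
    fused: Dict[int, float] = {}
    for ranking in rankings:
        # Positions are 1-based, so the best hit contributes 1 / (k + 1)
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs: BM25 scores for four chunks, and vector hits as chunk ids
bm25_scores_demo = [0.0, 2.1, 0.4, 1.3]
vector_hit_ids = [3, 0, 1]

bm25_top = rank_ids_by_score(bm25_scores_demo, top_n=3)
fused = rrf_fuse_ids([bm25_top, vector_hit_ids])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

In this toy run, chunk 3 ends up first because it sits near the top of both rankings, while chunks that appear in only one ranking fall to the bottom. Working with plain chunk ids rather than dicts keeps the sketch independent of any retriever's result type.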
Verification
python -c "
from rank_bm25 import BM25Okapi
corpus = ['apple fruit', 'banana yellow', 'cherry red']
bm25 = BM25Okapi([c.split() for c in corpus])
scores = bm25.get_scores(['fruit'])
print(max(scores) > 0)
# Expected: True
"
Common failures
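- Missing NLTK data. Download punkt and punkt_tab before tokenizing with NLTK; otherwise word_tokenize raises a LookupError.
- Mismatched tokenization. If the query and the corpus are tokenized differently, or the two retrievers index different chunks, BM25 scores and fused ranks become unreliable; use one tokenizer and one set of chunks throughout.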
- Fusion returning empty results. Verify both retrievers return at least one result before merging.
- Slow vector search. Reduce the number of chunks (larger chunk size or less overlap) or use an approximate nearest neighbor (ANN) index for large corpora.
- Version mismatch. The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
- Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
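For the last two items, a small report of the active interpreter and installed package versions is often enough to spot the problem. This is a sketch only: the helper name environment_report is hypothetical, and the distribution names passed to it are assumptions that may differ from what is installed on your machine.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable


def environment_report(packages: Iterable[str]) -> Dict[str, str]:
    """Collect the interpreter path and installed versions of the given packages."""
    report = {
        "python": sys.executable,
        "python_version": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Distribution names below are assumptions; adjust them to match your install
report = environment_report(["rank-bm25", "nltk", "langchain-ollama", "chromadb"])
for key, value in report.items():
    print(f"{key}: {value}")
```

If the printed interpreter path is not the virtual environment you expected, fix that first before touching any of the guide steps.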