HOW-TO · RAG

How to Implement Hybrid Search (Keyword + Semantic)

Intermediate · 25 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

Vector store with hybrid search support, embeddings indexed

What this does

Hybrid search combines keyword-based retrieval (BM25) with semantic vector search in a single unified result set. Pure keyword search misses synonyms and paraphrases; pure semantic search misses exact terminology and acronyms. Hybrid search captures both signals and merges them with a configurable weighting scheme, which typically improves recall across diverse query types.

Steps

  1. Configure your schema to store both the raw text and the embedding vector in the same document record. Ensure the text field has BM25 indexing enabled.
{
  "fields": [
    {"name": "content", "type": "text"},
    {"name": "embedding", "type": "vector", "dimension": 384}
  ]
}
  2. Build the index for both keyword and vector fields. Most systems do this in parallel during ingestion.

  3. Query with both retrievers in a single request, specifying the relative weight of each signal. Field names vary by vector store, so adapt the request below to your store's API:

curl -X POST "http://localhost:6333/collections/articles/search" \
  -H "Content-Type: application/json" \
  -d '{
    "vector": {"name": "embedding", "vector": [0.1, ...]},
    "top": 20,
    "with_payload": true,
    "query": "content:search algorithm",
    "score_threshold": 0.3,
    "params": {"hnsw_ef": 128}
  }'
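  4. Tune the fusion weight. Start with a 50/50 split between semantic and keyword scores. Shift toward keyword (70% keyword, 30% semantic) for technical corpora where exact terminology dominates; shift toward semantic (70% semantic, 30% keyword) for conversational or conceptual queries.

  5. Put both signals on a common scale before combining them, since BM25 and vector similarity scores operate on different scales. Either normalize the raw scores (for example with min-max scaling) or use Reciprocal Rank Fusion (RRF), which sidesteps the problem by using only each document's rank in each result list and is generally a robust default:

RRF_score(d) = sum over result lists of 1 / (k + rank of d in that list)

Here k is a smoothing constant (60 is a common default), and a document that is missing from a list contributes nothing for that list.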
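
A minimal Python sketch of RRF, assuming each retriever returns an ordered list of document IDs with the best match first. The function name, the optional weights argument (a weighted variant of plain RRF), and the sample document IDs are illustrative and not part of any particular vector store's API; k = 60 is a commonly used default rather than a value prescribed by this guide.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, weights=None):
    """Fuse ranked lists of document IDs with (optionally weighted) RRF.

    ranked_lists: list of lists, each ordered best-first.
    k: smoothing constant; larger values flatten the gap between ranks.
    weights: optional per-list multipliers (hypothetical extension).
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("need exactly one weight per ranked list")

    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        # Ranks are 1-based; a document absent from a list adds nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)

    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up result lists standing in for the BM25 and vector retrievers.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
semantic_heavy = reciprocal_rank_fusion(
    [keyword_hits, semantic_hits], weights=[0.3, 0.7]
)

for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because only ranks enter the formula, the raw BM25 and cosine scores never need to be rescaled. Documents that appear near the top of both lists accumulate the highest fused score.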

Verification

Test with a mixed query containing both technical terms and paraphrased concepts:

python test_hybrid.py --query "vector embedding similarity measurement technique" \
  --weight-semantic 0.6 --weight-keyword 0.4

Expected output: The top 10 results should include both documents that match the exact query terms (measurement, technique) and documents that match on semantic similarity (vector, embedding, similarity) without necessarily sharing the exact wording. Run the same query in pure keyword mode and pure vector mode and compare hit rates; hybrid should match or beat each single-mode search.
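
The comparison can be sketched in Python as a hit rate over a small labeled query set. This is only an illustration: the contents of test_hybrid.py are not shown in this guide, so the function name, the query IDs, the document IDs, and the relevance labels below are all made-up placeholders to be replaced with your own retriever outputs.

```python
def hit_rate_at_k(results_by_query, relevant_by_query, k=10):
    """Fraction of queries with at least one relevant document in the top k."""
    if not results_by_query:
        return 0.0
    hits = 0
    for query, ranked_ids in results_by_query.items():
        relevant = relevant_by_query.get(query, set())
        if any(doc_id in relevant for doc_id in ranked_ids[:k]):
            hits += 1
    return hits / len(results_by_query)


# Hypothetical relevance labels: query ID -> set of relevant document IDs.
relevant = {"q1": {"d1"}, "q2": {"d7"}}

# Hypothetical ranked results per search mode (best match first).
runs = {
    "keyword": {"q1": ["d1", "d2"], "q2": ["d3", "d4"]},
    "vector": {"q1": ["d5", "d6"], "q2": ["d7", "d8"]},
    "hybrid": {"q1": ["d1", "d5"], "q2": ["d7", "d3"]},
}

report = {
    mode: hit_rate_at_k(results, relevant, k=10)
    for mode, results in runs.items()
}
for mode, rate in report.items():
    print(f"{mode}: hit rate@10 = {rate:.2f}")
```

In this toy data each single-mode search finds the relevant document for only one of the two queries, while the hybrid run finds both. Real corpora will not separate this cleanly, so use enough labeled queries for the difference to be meaningful.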

Common failures

  • Score scale mismatch: BM25 and cosine similarity scores are not comparable without normalization, causing one signal to dominate. Use RRF or min-max normalized scores before fusion.
  • Missing keyword index: If BM25 is not enabled on the text field, queries can silently fall back to pure vector search. Verify the index exists.
  • Incorrect weight tuning: Applying a single weight across all queries ignores query-type variance. Log query characteristics and adapt weights per query type when possible.
  • Embedding and keyword fields out of sync: If documents are updated without re-indexing both fields, results contain stale keyword or vector data.
  • Over-fusion of irrelevant results: High keyword scores can drown semantic relevance for rare but important terms. Apply a minimum semantic score threshold alongside the fusion.
  • Version mismatch: The installed package or runtime differs from the command shown. Check the version first and rerun the smallest verification command.
  • Local environment drift: Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.
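
The score-scale and over-fusion items above can be illustrated with a short Python sketch of min-max normalized, weighted fusion with a minimum semantic score. It assumes you can obtain raw BM25 and cosine scores per document from your store; the function names, parameter names, and sample scores are hypothetical, and the choice to apply the threshold to the raw cosine score is just one reasonable option.

```python
def min_max_normalize(scores):
    """Rescale {doc_id: score} to [0, 1]; equal scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(bm25_scores, cosine_scores,
                    weight_semantic=0.5, min_semantic=0.0):
    """Combine normalized keyword and semantic scores with one weight."""
    weight_keyword = 1.0 - weight_semantic
    bm25_norm = min_max_normalize(bm25_scores)
    cosine_norm = min_max_normalize(cosine_scores)

    fused = {}
    for doc_id in set(bm25_norm) | set(cosine_norm):
        # Threshold on the raw cosine score; a document missing from the
        # vector results counts as 0.0 and is dropped if min_semantic > 0.
        if cosine_scores.get(doc_id, 0.0) < min_semantic:
            continue
        fused[doc_id] = (
            weight_keyword * bm25_norm.get(doc_id, 0.0)
            + weight_semantic * cosine_norm.get(doc_id, 0.0)
        )
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Made-up raw scores on their native, incomparable scales.
bm25 = {"doc_a": 12.4, "doc_b": 7.1, "doc_c": 2.0}
cosine = {"doc_a": 0.41, "doc_b": 0.83, "doc_c": 0.12}

ranked = weighted_fusion(bm25, cosine, weight_semantic=0.6, min_semantic=0.3)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.3f}")
```

Without normalization, the BM25 values here would swamp the cosine values regardless of the chosen weights. Min-max scaling is sensitive to outliers in a single result list, which is one reason rank-based fusion is often preferred.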

Related guides

  • optimize-vector-search-query-performance
  • use-query-rewriting-better-recall