04. Multi-Strategy Retrieval
Production retrieval rarely relies on a single method. Different query types demand different strategies, and combining approaches improves recall.
Query classification determines which retrieval paths to activate. Definitional queries ("what is X") need precise definitions. Exploratory queries ("how does Y work") benefit from broad context. Factual queries ("when did Z happen") require exact matches.
Strategy routing assigns queries to retrieval methods:
- Dense retrieval handles semantic similarity and paraphrases
- Sparse retrieval handles exact terminology and technical terms
- Keyword matching handles product names, dates, identifiers
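from enum import Enum
from typing import Callable, List


class QueryType(Enum):
    SEMANTIC = "semantic"          # Conceptual, how/why questions
    FACTUAL = "factual"            # Specific data, dates, names
    DEFINITIONAL = "definitional"  # What is X, definitions
    PROCEDURAL = "procedural"      # How to, steps, tutorials


class MultiStrategyRetriever:
    def __init__(self, retrievers: dict, classifier: Callable[[str], QueryType]):
        self.retrievers = retrievers  # {strategy_name: retriever}
        self.classifier = classifier  # maps a query string to a QueryType

    def retrieve(self, query: str, top_k: int = 10) -> List[dict]:
        query_type = self.classifier(query)

        # Activate relevant strategies
        strategies = self._get_strategies_for_type(query_type)

        results = []
        for strategy in strategies:
            # Over-fetch so enough candidates remain after deduplication
            retrieved = self.retrievers[strategy].search(query, top_k * 2)
            results.extend(retrieved)

        # Deduplicate and return combined results
        return self._deduplicate_and_rank(results, top_k)

    def _get_strategies_for_type(self, query_type: QueryType) -> List[str]:
        strategy_map = {
            QueryType.SEMANTIC: ["dense", "sparse"],
            QueryType.FACTUAL: ["sparse", "dense"],
            QueryType.DEFINITIONAL: ["dense"],
            QueryType.PROCEDURAL: ["dense", "sparse", "keyword"],
        }
        # Fall back to dense retrieval for unrecognized query types
        return strategy_map.get(query_type, ["dense"])

    def _deduplicate_and_rank(self, results: List[dict], top_k: int) -> List[dict]:
        # Assumption: each result is a dict with "id" and "score" keys.
        # Keep the highest-scoring copy of each document, then sort by score.
        best = {}
        for result in results:
            doc_id = result["id"]
            if doc_id not in best or result["score"] > best[doc_id]["score"]:
                best[doc_id] = result
        return sorted(best.values(), key=lambda r: r["score"], reverse=True)[:top_k]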
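The list handed to `_deduplicate_and_rank` mixes results from retrievers whose scores are usually not on a common scale: a cosine similarity and a sparse term-weight score are not directly comparable. One option is to merge by rank position instead of raw score, for example with reciprocal rank fusion. The sketch below is an illustration under stated assumptions, not the chapter's implementation: the function name `fuse_by_rank`, the `id` key on each result, and the constant `k=60` are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List


def fuse_by_rank(ranked_lists: List[List[dict]], top_k: int = 10, k: int = 60) -> List[dict]:
    """Merge several ranked result lists with reciprocal rank fusion.

    Assumption: every result is a dict with a unique "id" key, and each
    input list is already sorted best-first by its own retriever.
    """
    fused_scores: Dict[str, float] = defaultdict(float)
    first_seen: Dict[str, dict] = {}

    for results in ranked_lists:
        for rank, result in enumerate(results, start=1):
            doc_id = result["id"]
            # A document earns 1 / (k + rank) from every list it appears in
            fused_scores[doc_id] += 1.0 / (k + rank)
            # Keep the first copy we meet so duplicates collapse to one entry
            first_seen.setdefault(doc_id, result)

    ordered_ids = sorted(fused_scores, key=fused_scores.get, reverse=True)
    return [first_seen[doc_id] for doc_id in ordered_ids[:top_k]]


# Toy inputs: two retrievers that partly agree
dense_hits = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
sparse_hits = [{"id": "b"}, {"id": "d"}, {"id": "a"}]
merged = fuse_by_rank([dense_hits, sparse_hits], top_k=3)
print([hit["id"] for hit in merged])  # ['b', 'a', 'd']
```

Documents returned by both retrievers ("a" and "b") rise above those returned by only one. To use this approach inside `retrieve`, the per-strategy result lists would need to be kept separate rather than extended into a single list.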
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Build a simple keyword-based classifier using regex patterns for 3 query types. Test on 50 queries and report precision per type.
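For the reporting step, precision for a type is the share of queries predicted as that type whose gold label really is that type. A minimal sketch of that bookkeeping follows, leaving the classifier itself to the exercise. It assumes predictions and gold labels are plain strings in parallel lists; the function name `precision_per_type` and the sample labels are illustrative, not part of the chapter.

```python
from collections import Counter
from typing import Dict, List


def precision_per_type(predicted: List[str], gold: List[str]) -> Dict[str, float]:
    """Return precision for every label that was predicted at least once."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    # How often each label was predicted, and how often that prediction was right
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for p, g in zip(predicted, gold) if p == g)

    return {
        label: correct_counts[label] / count
        for label, count in predicted_counts.items()
    }


# Toy data standing in for the 50 labelled queries
predicted = ["factual", "factual", "definitional", "procedural", "factual"]
gold = ["factual", "definitional", "definitional", "procedural", "factual"]
report = precision_per_type(predicted, gold)

for label, value in sorted(report.items()):
    print(f"{label}: {value:.2f}")
```

A type the classifier never predicts has undefined precision, so it is omitted from the report rather than shown as zero.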