HOW-TO · RAG
How to Build Query Expansion with Sub-Queries
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES
An existing RAG pipeline with a populated vector store, an LLM for sub-query generation (the examples use llama3.2 served by Ollama), Python 3.10+
What this does
Query expansion decomposes a single user question into multiple sub-queries, retrieves documents for each, then aggregates the results. This improves recall by covering different facets of a complex question.
Steps
- Define a sub-query generator. Prompt the LLM to break a question into standalone sub-questions.
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

# temperature=0 keeps the decomposition as repeatable as possible
llm = ChatOllama(model="llama3.2", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a query decomposition assistant. Break the user's question into 3-5 standalone sub-questions that cover different aspects. Return each on a new line."),
    ("human", "{question}"),
])

def generate_sub_queries(question: str) -> list[str]:
    # format_messages keeps the system/human roles; prompt.format would flatten them into one string
    response = llm.invoke(prompt.format_messages(question=question))
    # Drop blank lines and leading bullet characters from each returned line
    return [q.strip("- ").strip() for q in response.content.split("\n") if q.strip()]
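Models often ignore the "one per line" instruction and return numbered lists or a preamble sentence. The following is a minimal sketch of a stricter parser, assuming sub-questions end with a question mark; `parse_sub_queries` and its `max_queries` parameter are hypothetical names, not part of the guide's code or of LangChain.

```python
import re

# Hypothetical helper: normalizes raw LLM output into a clean list of sub-questions.
# Matches leading bullets ("-", "*", "•") or numbering ("1.", "2)") at the start of a line.
_PREFIX = re.compile(r"^\s*(?:[-*•]+|\d+[.)])\s*")

def parse_sub_queries(raw: str, max_queries: int = 5) -> list[str]:
    queries: list[str] = []
    for line in raw.splitlines():
        cleaned = _PREFIX.sub("", line).strip()
        # Assumption: real sub-questions end with "?"; this skips preambles
        # such as "Here are the sub-questions:" and exact repeats.
        if cleaned.endswith("?") and cleaned not in queries:
            queries.append(cleaned)
    return queries[:max_queries]

raw = """Here are the sub-questions:
1. What were the Q4 financial results?
2) Which products launched in Q4?
- How did the launches affect revenue?
"""
parsed = parse_sub_queries(raw)
print(parsed)
```

Inside `generate_sub_queries`, the return line could call `parse_sub_queries(response.content)` instead of the list comprehension if the model's formatting proves unreliable.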
- Retrieve documents for each sub-query. Run similarity search per sub-query.
from langchain_community.vectorstores import Chroma

vectorstore = Chroma(...)

def retrieve_for_sub_queries(sub_queries: list[str], k: int = 3) -> list:
    # Collect the top-k hits for every sub-query; duplicates are removed in the next step
    all_docs = []
    for sq in sub_queries:
        docs = vectorstore.similarity_search(sq, k=k)
        all_docs.extend(docs)
    return all_docs
- Deduplicate and rerank. Remove duplicates by content hash and optionally rerank by relevance.
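import hashlib

def deduplicate(docs):
    # Hash the full text: hashing only a prefix would drop distinct chunks that share an opening
    seen = set()
    unique = []
    for d in docs:
        h = hashlib.sha256(d.page_content.encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(d)
    return unique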

# Keep the original question in a variable; the final answer step reuses it
question = "What were the Q4 financial results and how did product launches affect revenue?"
sub_queries = generate_sub_queries(question)
docs = retrieve_for_sub_queries(sub_queries)
final_docs = deduplicate(docs)
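The step mentions optional reranking, but the code above only deduplicates. One common option, shown here as a sketch and not as the guide's own method, is reciprocal rank fusion (RRF): each document scores 1 / (c + rank) in every sub-query result list where it appears, and the scores are summed. The function name, the string document IDs, and the constant c = 60 (a commonly used default) are assumptions for illustration; in practice the IDs could be the content hashes from `deduplicate`.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], c: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        # rank starts at 1 for the best hit of each sub-query
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical per-sub-query results, best hit first
per_sub_query = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_d"],
    ["doc_b", "doc_a"],
]
fused = reciprocal_rank_fusion(per_sub_query)
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents returned by several sub-queries rise to the top, and because scores are keyed by ID the fused list is also free of duplicates. Using this requires keeping the per-sub-query lists separate instead of extending a single `all_docs` list.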
- Feed the aggregated context into the final answer. Use the deduplicated docs as context.
context = "\n\n".join(d.page_content for d in final_docs)
answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
print(answer.content)
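Joining every document in full can exceed the model's context window. The sketch below caps both the size of each document and the total context, using character counts as a rough stand-in for tokens (an assumption; a tokenizer would be more precise). `build_context` and its parameter names are hypothetical, and the default limits are illustrative only.

```python
def build_context(
    texts: list[str],
    max_chars_per_doc: int = 1000,
    max_total_chars: int = 4000,
) -> str:
    """Join document texts, truncating each one and stopping at a total budget."""
    sep = "\n\n"
    parts: list[str] = []
    used = 0
    for text in texts:
        snippet = text[:max_chars_per_doc]
        # The separator only costs characters from the second document onward
        cost = len(snippet) + (len(sep) if parts else 0)
        if used + cost > max_total_chars:
            break  # budget exhausted; later (lower-priority) docs are dropped
        parts.append(snippet)
        used += cost
    return sep.join(parts)

texts = ["a" * 50, "b" * 50, "c" * 50]
budgeted = build_context(texts, max_chars_per_doc=30, max_total_chars=70)
print(len(budgeted))  # 62
```

With the guide's variables, the call would be `build_context([d.page_content for d in final_docs])`. Because documents are consumed in order, this works best when the list is already sorted by relevance.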
Verification
python -c "
from your_module import generate_sub_queries
sq = generate_sub_queries('How does climate change affect agriculture and water supply?')
print(len(sq))
# Expected: 3-5
for s in sq:
    print(s)
"
Common failures
- Sub-queries are too similar. The generator produces near-identical questions. Raise the temperature (for example to 0.3) or add an explicit diversity instruction to the system prompt.
- Context window overflow. Retrieving k=5 for 4 sub-queries yields up to 20 documents before deduplication (potentially thousands of tokens). Reduce k or truncate each doc.
- Duplicate content dominates. Different sub-queries retrieve the same documents. Always deduplicate before building context.
- Version mismatch. The installed package or runtime differs from the command shown. Check the version first, then rerun the smallest verification command.
- Local environment drift. Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.
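For the first failure above, near-identical sub-queries can also be filtered after generation. This is a minimal sketch using the standard library's `difflib.SequenceMatcher` for character-level similarity; the helper name and the 0.9 threshold are assumptions chosen for illustration, and an embedding-based comparison would catch paraphrases that this approach misses.

```python
from difflib import SequenceMatcher

def drop_near_duplicates(queries: list[str], threshold: float = 0.9) -> list[str]:
    """Keep a query only if it is not too similar to any query already kept."""
    kept: list[str] = []
    for q in queries:
        norm = q.lower().strip()
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge
        if all(
            SequenceMatcher(None, norm, k.lower().strip()).ratio() < threshold
            for k in kept
        ):
            kept.append(q)
    return kept

queries = [
    "How does climate change affect agriculture?",
    "How does climate change affect agriculture",
    "How does climate change affect water supply?",
]
distinct = drop_near_duplicates(queries)
print(distinct)
```

Here the second query differs from the first only by its missing question mark and is dropped, while the third covers a different facet and is kept.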
Related guides
- How to Apply Metadata Filters to Reduce Search Space
- How to Build RetrievalQA Chain with Sources