What this does

Query expansion decomposes a single user question into multiple sub-queries, retrieves documents for each, then aggregates the results. This improves recall by covering different facets of a complex question.

Steps

Define a sub-query generator. Prompt the LLM to break a question into standalone sub-questions.

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOllama(model="llama3.2", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a query decomposition assistant. Break the user's question into 3-5 standalone sub-questions that cover different aspects. Return each on a new line."),
    ("human", "{question}")
])

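def generate_sub_queries(question: str) -> list[str]:
    # format_messages keeps the system and human roles separate;
    # prompt.format would flatten them into a single string.
    response = llm.invoke(prompt.format_messages(question=question))
    # Strip leading bullet markers, then drop lines that end up empty.
    lines = (q.strip().lstrip("-*• ").strip() for q in response.content.splitlines())
    return [q for q in lines if q]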
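
Models often number their output ("1.", "2)") or repeat a question with different casing, and a plain bullet strip does not catch either. The sketch below is one possible post-processing step. parse_sub_queries and its max_queries parameter are hypothetical names introduced here for illustration; they are not part of LangChain.

```python
import re

# Matches a leading bullet ("-", "*", "•") or a numeric marker ("1.", "2)").
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]+|\d+[.)])\s*")

def parse_sub_queries(text: str, max_queries: int = 5) -> list[str]:
    """Turn raw LLM output into a clean, de-duplicated list of sub-queries."""
    seen = set()
    queries = []
    for line in text.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        key = cleaned.lower()  # case-insensitive duplicate check
        if not cleaned or key in seen:
            continue
        seen.add(key)
        queries.append(cleaned)
    # Cap the list so one verbose response cannot multiply retrieval calls.
    return queries[:max_queries]

raw = "1. What were the Q4 results?\n- How did launches affect revenue?\n\n2) what were the q4 results?"
print(parse_sub_queries(raw))
```

If this fits your model's output, generate_sub_queries could return parse_sub_queries(response.content) instead of parsing inline.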

Retrieve documents for each sub-query. Run similarity search per sub-query.

from langchain_community.vectorstores import Chroma

vectorstore = Chroma(...)

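def retrieve_for_sub_queries(sub_queries: list[str], k: int = 3) -> list:
    # Collect the top-k hits for every sub-query into one flat list.
    # The same document can appear more than once; deduplicate afterwards.
    all_docs = []
    for sq in sub_queries:
        docs = vectorstore.similarity_search(sq, k=k)
        all_docs.extend(docs)
    return all_docs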

Deduplicate and rerank. Remove duplicates by content hash and optionally rerank by relevance.

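import hashlib

def deduplicate(docs):
    seen = set()
    unique = []
    for d in docs:
        # Hash the full content: hashing only a prefix would treat documents
        # that share a header or boilerplate opening as duplicates.
        h = hashlib.sha256(d.page_content.encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(d)
    return unique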

question = "What were the Q4 financial results and how did product launches affect revenue?"
sub_queries = generate_sub_queries(question)
docs = retrieve_for_sub_queries(sub_queries)
final_docs = deduplicate(docs)
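
The optional rerank step is not shown above. One common way to do it is reciprocal rank fusion (RRF), which scores each document by summing 1 / (c + rank) over every sub-query result list it appears in. The sketch below is an illustration, not this guide's own method. It assumes you keep one ranked list per sub-query rather than the flattened list that retrieve_for_sub_queries returns. Doc is a minimal stand-in for a LangChain Document, rrf_rerank is a hypothetical helper name, and c=60 is a commonly used default rather than a tuned value.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Doc:
    # Stand-in for a LangChain Document; only page_content is used here.
    page_content: str

def rrf_rerank(ranked_lists, c: int = 60):
    """Fuse several ranked lists into one, de-duplicating by content."""
    scores = defaultdict(float)
    first_seen = {}
    for docs in ranked_lists:
        for rank, doc in enumerate(docs, start=1):
            key = doc.page_content
            scores[key] += 1.0 / (c + rank)  # higher ranks contribute more
            first_seen.setdefault(key, doc)  # keep one copy per unique content
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [first_seen[key] for key in ordered]

a, b, c_doc = Doc("Q4 revenue summary"), Doc("Product launch timeline"), Doc("Hiring plan")
fused = rrf_rerank([[a, b], [b, c_doc]])
print([d.page_content for d in fused])
```

Documents retrieved by several sub-queries move to the front, and because the fusion keys on content it also removes duplicates.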

Feed the aggregated context into the final answer. Use the deduplicated documents as context.

context = "\n\n".join(d.page_content for d in final_docs)
answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
print(answer.content)

Verification

python -c "
from your_module import generate_sub_queries
sq = generate_sub_queries('How does climate change affect agriculture and water supply?')
print(len(sq))
# Expected: 3-5
for s in sq:
    print(s)
"

Common failures

Sub-queries are too similar. The generator produces near-identical questions. Increase temperature to 0.3 or add diversity instructions.
Context window overflow. Retrieving k=5 for 4 sub-queries yields up to 20 documents before deduplication, which can add up to thousands of tokens. Reduce k or truncate each document.
Duplicate content dominates. Different sub-queries retrieve the same documents. Always deduplicate before building context.
Version mismatch. The installed package or runtime differs from the one this guide assumes. Check the installed version first, then rerun the smallest verification command.
Local environment drift. Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.
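
For the context window overflow case, one option is to cap both the length of each document and the total context size before building the prompt. The sketch below uses character counts as a rough proxy for tokens. build_context, max_chars, and per_doc_chars are hypothetical names, and the default limits are placeholders to adjust for your model.

```python
def build_context(texts: list[str], max_chars: int = 4000, per_doc_chars: int = 1000) -> str:
    """Join document texts, truncating each and stopping at a total budget."""
    parts = []
    used = 0
    for text in texts:
        snippet = text[:per_doc_chars]  # truncate long documents
        sep = 2 if parts else 0         # length of the "\n\n" separator
        if used + sep + len(snippet) > max_chars:
            break                       # budget exhausted; drop the rest
        parts.append(snippet)
        used += sep + len(snippet)
    return "\n\n".join(parts)

demo = build_context(["a" * 10, "b" * 10, "c" * 10], max_chars=25, per_doc_chars=8)
print(len(demo))
```

In the pipeline above this would replace the plain join, for example build_context([d.page_content for d in final_docs]). Because documents are added in order, it helps to put the most relevant ones first.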

Related guides

How to Apply Metadata Filters to Reduce Search Space
How to Build RetrievalQA Chain with Sources