RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

© 2026 runlocalai.co · Independently operated
HOW-TO · RAG

How to Build Query Expansion with Sub-Queries

advanced · 25 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

RAG pipeline, LLM for sub-query generation, Python 3.10+

What this does

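Query expansion with sub-queries decomposes a single user question into several standalone sub-queries, retrieves documents for each one, then aggregates the results. This can improve recall on multi-part questions, because each sub-query targets one facet that a single search over the full question may under-represent.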
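The flow described above can be sketched without a model or a vector store. This is a minimal illustration, not the guide's implementation: `expand_and_retrieve`, `fake_generate`, and `fake_retrieve` are hypothetical names, and searching the original question alongside the sub-queries is an added assumption that the steps below do not include.

```python
from typing import Callable


def expand_and_retrieve(
    question: str,
    generate: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    max_docs: int = 8,
) -> list[str]:
    """Decompose, retrieve per query, and return unique passages in first-seen order."""
    # Assumption: also search the original question, so an empty
    # decomposition still returns whatever a single-query search would.
    queries = [question] + [q for q in generate(question) if q != question]
    seen: set[str] = set()
    passages: list[str] = []
    for query in queries:
        for passage in retrieve(query):
            if passage not in seen:
                seen.add(passage)
                passages.append(passage)
    return passages[:max_docs]


# Hypothetical stand-ins for the LLM and the vector store.
def fake_generate(question: str) -> list[str]:
    return ["q4 results", "product launches"]


def fake_retrieve(query: str) -> list[str]:
    index = {
        "q4 results": ["Q4 revenue summary", "Launch recap"],
        "product launches": ["Launch recap", "Roadmap notes"],
    }
    return index.get(query, [])


print(expand_and_retrieve("How did launches affect Q4?", fake_generate, fake_retrieve))
```

In a real pipeline the two callables would wrap the generator and retriever defined in the steps that follow.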

Steps

  • Define a sub-query generator. Prompt the LLM to break a question into standalone sub-questions.
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

# temperature=0 keeps the decomposition repeatable between runs.
llm = ChatOllama(model="llama3.2", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a query decomposition assistant. Break the user's question into 3-5 standalone sub-questions that cover different aspects. Return each on a new line."),
    ("human", "{question}")
])

def generate_sub_queries(question: str) -> list[str]:
    # format_messages keeps the system/human roles; format() would flatten them into one string.
    response = llm.invoke(prompt.format_messages(question=question))
    # One sub-question per line, with any leading "- " bullet removed.
    lines = (q.strip("- ").strip() for q in response.content.split("\n"))
    return [q for q in lines if q]
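Models do not always follow the "one per line" instruction literally; numbered lists and mixed bullet styles are common. The sketch below shows one way to normalize such output. `parse_sub_queries` and its `max_queries` cap are hypothetical additions, and the marker pattern is an assumption about typical output rather than an exhaustive grammar. A preamble line such as "Here are the sub-questions:" would still pass through and would need its own filter.

```python
import re

# Leading list markers: "1.", "2)", "-", "*", "•", each followed by whitespace.
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def parse_sub_queries(raw: str, max_queries: int = 5) -> list[str]:
    """Turn raw model output into a clean, order-preserving list of sub-queries."""
    queries: list[str] = []
    for line in raw.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        # Skip blank lines and exact repeats.
        if cleaned and cleaned not in queries:
            queries.append(cleaned)
    return queries[:max_queries]


raw_output = "1. What were the Q4 results?\n- Which products launched?\n\n2) How did revenue change?"
print(parse_sub_queries(raw_output))
```

Capping the list keeps a verbose model from multiplying the number of searches in the next step.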
  • Retrieve documents for each sub-query. Run a similarity search per sub-query.
from langchain_community.vectorstores import Chroma

# Configure with your own collection and embedding function.
vectorstore = Chroma(...)

def retrieve_for_sub_queries(sub_queries: list[str], k: int = 3) -> list:
    # Collect the top-k hits for every sub-query; duplicates are removed in the next step.
    all_docs = []
    for sq in sub_queries:
        docs = vectorstore.similarity_search(sq, k=k)
        all_docs.extend(docs)
    return all_docs
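The function above flattens every hit into one list, which loses track of which sub-query found which document. A possible variant keeps results grouped per sub-query, which helps with debugging and with any later rank-based merging. Everything here is illustrative: `retrieve_grouped`, `toy_search`, and `CORPUS` are hypothetical, and the keyword-overlap search only stands in for a vector store.

```python
from typing import Callable


def retrieve_grouped(
    sub_queries: list[str],
    search: Callable[[str, int], list[str]],
    k: int = 3,
) -> dict[str, list[str]]:
    """Run `search` once per distinct sub-query and keep the results grouped."""
    results: dict[str, list[str]] = {}
    for sq in sub_queries:
        if sq in results:  # identical sub-query: skip the redundant search
            continue
        results[sq] = search(sq, k)
    return results


# Hypothetical stand-in for a vector store: keyword overlap over a tiny corpus.
CORPUS = [
    "Q4 revenue rose on strong hardware sales",
    "The new product launch lifted Q4 revenue",
    "Water supply depends on seasonal rainfall",
]


def toy_search(query: str, k: int) -> list[str]:
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in CORPUS]
    # Highest overlap first; sorted() is stable, so ties keep corpus order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0][:k]


grouped = retrieve_grouped(["q4 revenue", "product launch", "q4 revenue"], toy_search, k=2)
for sq, docs in grouped.items():
    print(sq, "->", docs)
```

With the guide's store, `search` could be `lambda q, k: vectorstore.similarity_search(q, k=k)`, in which case the values would be document objects rather than strings.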
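  • Deduplicate and rerank. Remove duplicates by content hash and optionally rerank by relevance.
import hashlib

def deduplicate(docs):
    # Hash the full text: hashing only a prefix would wrongly merge distinct
    # chunks that share the same opening (for example a common header).
    seen = set()
    unique = []
    for d in docs:
        h = hashlib.sha256(d.page_content.encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(d)
    return unique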

question = "What were the Q4 financial results and how did product launches affect revenue?"
sub_queries = generate_sub_queries(question)
docs = retrieve_for_sub_queries(sub_queries)
final_docs = deduplicate(docs)
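This step mentions optional reranking, but the code above only deduplicates. One common option, swapped in here as an illustration rather than as the guide's method, is reciprocal rank fusion: each document earns 1 / (c + rank) from every sub-query list it appears in, so documents found by several sub-queries rise to the top. The function name, the string keys, and the constant c = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], c: int = 60) -> list[str]:
    """Fuse several ranked lists of document keys into one deduplicated ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, key in enumerate(ranking, start=1):
            # Earlier positions contribute more; c dampens the gap between ranks.
            scores[key] += 1.0 / (c + rank)
    return sorted(scores, key=lambda key: scores[key], reverse=True)


# One ranked list per sub-query (keys could be IDs or full page_content strings).
per_query = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_d"],
    ["doc_b", "doc_a"],
]
print(reciprocal_rank_fusion(per_query))
```

Because the scores are keyed by document, fusion also removes duplicates as a side effect, but it needs the per-sub-query lists rather than the flattened `all_docs`.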
  • Feed the aggregated context into the final answer. Use the deduplicated docs as context.
context = "\n\n".join(d.page_content for d in final_docs)
answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
print(answer.content)
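Joining every retrieved document can exceed the model's context window. A minimal sketch of a budgeted join follows, assuming character counts as a rough proxy for tokens; `fit_to_budget` and both default limits are hypothetical and should be tuned to the model in use.

```python
def fit_to_budget(
    passages: list[str],
    max_chars: int = 6000,
    per_passage_chars: int = 1500,
) -> str:
    """Join passages into one context string that never exceeds max_chars."""
    separator = "\n\n"
    parts: list[str] = []
    used = 0
    for passage in passages:
        clipped = passage[:per_passage_chars]  # truncate each passage first
        cost = len(clipped) + (len(separator) if parts else 0)
        if used + cost > max_chars:
            break  # stop adding once the overall budget would be exceeded
        parts.append(clipped)
        used += cost
    return separator.join(parts)


sample = ["a" * 10, "b" * 10, "c" * 10]
context_preview = fit_to_budget(sample, max_chars=25, per_passage_chars=8)
print(len(context_preview))
```

Passing `[d.page_content for d in final_docs]` would produce a drop-in replacement for the plain join, with the most relevant documents placed first so that they survive the cut.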

Verification

python -c "
from your_module import generate_sub_queries
sq = generate_sub_queries('How does climate change affect agriculture and water supply?')
print(len(sq))
# Expected: 3-5
for s in sq:
    print(s)
"

Common failures

  • Sub-queries are too similar. The generator produces near-identical questions. Increase temperature to 0.3 or add diversity instructions.
  • Context window overflow. Retrieving k=5 for 4 sub-queries yields up to 20 documents before deduplication, which can run to thousands of tokens. Reduce k or truncate each doc.
  • Duplicate content dominates. Different sub-queries retrieve the same documents. Always deduplicate before building context.
  • Version mismatch. The installed package or runtime differs from the one this guide targets. Check the installed version first, then rerun the smallest verification command.
  • Local environment drift. A different service, virtual environment, model, or path is being picked up. Print the active binary path and configuration before changing the guide steps.
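The first failure above, near-identical sub-queries, can be detected before any retrieval happens. The sketch below flags pairs whose word sets overlap heavily, using Jaccard similarity. The function names and the 0.8 threshold are assumptions, and word overlap is only a cheap stand-in for semantic similarity.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity, ignoring case and punctuation."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def near_duplicates(sub_queries: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return every pair of sub-queries whose similarity reaches the threshold."""
    pairs = []
    for i in range(len(sub_queries)):
        for j in range(i + 1, len(sub_queries)):
            if jaccard(sub_queries[i], sub_queries[j]) >= threshold:
                pairs.append((sub_queries[i], sub_queries[j]))
    return pairs


candidates = [
    "How does climate change affect agriculture?",
    "How does climate change affect agriculture today?",
    "How does drought affect water supply?",
]
print(near_duplicates(candidates))
```

If any pair is flagged, dropping one of the two or regenerating with a diversity instruction avoids spending retrieval calls on the same facet.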
