What this does

Short or vague queries often retrieve too few relevant documents because their wording does not match the wording used in the corpus. Query expansion uses an LLM to generate related sub-queries, synonyms, or rephrasings, runs retrieval for each one, and merges the results. This broadens the retrieval surface and surfaces documents that a single query would miss.

Steps

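Import the required modules and point the client at the local Ollama server.

import os
# http://localhost:11434 is Ollama's default address. If the server runs elsewhere and this
# variable is not picked up, pass base_url=... to ChatOllama and OllamaEmbeddings instead.
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"

from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.vectorstores import Chroma
# The splitter lives in the langchain-text-splitters package in current LangChain releases.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader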

Build the vector store from documents.

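# Load the source file and split it into chunks of at most 500 characters.
docs = TextLoader("context/technical_docs.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500).split_documents(docs)
# Embed each chunk with the local model and index the vectors in an in-memory Chroma store.
embeddings = OllamaEmbeddings(model="llama3")
db = Chroma.from_documents(chunks, embeddings)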

Define a query expansion prompt. Instruct the LLM to generate alternative phrasings.

from langchain_core.prompts import PromptTemplate

expansion_prompt = PromptTemplate.from_template(
    """Given the user query, generate 3 alternative phrasings that cover different aspects.
Return only the phrasings, one per line, with no numbering or commentary.
Original query: {query}
Alternative phrasings (one per line):"""
)
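The prompt asks for one phrasing per line, but models often add a preamble, numbering, or a repeat of the original query. The following is a minimal sketch of a parser that cleans such output; `parse_variants` and its `max_variants` parameter are hypothetical helpers introduced for illustration, not part of LangChain, and the sample text is made up.

```python
import re

# Hypothetical helper, not part of LangChain: cleans raw LLM output into query variants.
# Matches leading list markers such as "1.", "2)", "-", "*" or a bullet character.
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def parse_variants(raw: str, original: str, max_variants: int = 3) -> list[str]:
    """Return up to max_variants distinct rephrasings found in raw, one per line."""
    seen = {original.strip().lower()}
    variants = []
    for line in raw.splitlines():
        text = _LIST_MARKER.sub("", line).strip().strip('"')
        # Skip blank lines and preamble lines such as "Here are 3 alternatives:".
        if not text or text.endswith(":"):
            continue
        key = text.lower()
        # Skip repeats of the original query or of an earlier variant.
        if key in seen:
            continue
        seen.add(key)
        variants.append(text)
    return variants[:max_variants]


# Made-up model output used only to exercise the parser.
sample = """Here are 3 alternative phrasings:
1. What impact do indexes have on query performance?
2. Why are indexed lookups faster than full scans?
- How does indexing affect query speed?
3) When can an index slow down writes?"""

print(parse_variants(sample, "How does indexing affect query speed?"))
```

Capping the number of variants in the parser, rather than trusting the model to return exactly three, keeps the number of retrieval calls predictable.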

Generate expanded queries and retrieve. Run the LLM to produce variants, then retrieve for each.

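import re

llm = ChatOllama(model="llama3")
original = "How does indexing affect query speed?"

response = llm.invoke(expansion_prompt.format(query=original))
# Keep non-empty lines and strip list markers such as "1." or "-" that models often add.
variants = [
    re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
    for line in response.content.split("\n")
    if line.strip()
]
variants = [v for v in variants if v]
queries = [original] + variants

all_results = {}
for query in queries:
    # Use a new name so the `docs` list loaded earlier is not overwritten.
    hits = db.similarity_search(query, k=5)
    for doc in hits:
        # Keying by chunk text drops exact duplicates returned by several queries.
        all_results.setdefault(doc.page_content, doc)
merged = list(all_results.values())
print(f"Retrieved {len(merged)} unique chunks from {len(queries)} queries")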

Expected output: one line reporting how many unique chunks were retrieved across the original query and its variants. The chunks themselves are in the merged list.
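A merged list built with a dictionary is ordered by first appearance, not by relevance. One common way to order it is reciprocal rank fusion (RRF), which is swapped in here as an illustration and is not something the steps above use. The sketch below is plain Python over lists of chunk texts; the function name, the toy rankings, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of chunk texts into one list, best first.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically so the order is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Toy rankings standing in for the results of three queries (hypothetical data).
rankings = [
    ["chunk A", "chunk B", "chunk C"],  # original query
    ["chunk B", "chunk D"],             # variant 1
    ["chunk B", "chunk A"],             # variant 2
]
fused = reciprocal_rank_fusion(rankings)
print(fused)  # ['chunk B', 'chunk A', 'chunk D', 'chunk C']
```

With the vector store from this guide, each ranking could be built as `[d.page_content for d in db.similarity_search(q, k=5)]`. Chunks returned by several variants rise to the top, which is usually the behavior wanted from expansion.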

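Feed the expanded context to the LLM. Pass the whole merged set or a reranked subset of it: `merged` is ordered by first appearance, so its first five entries are normally just the original query's results, and slicing `merged[:5]` would discard everything the expansion added.

# Use every merged chunk so results found only by the variants reach the model.
context = "\n\n".join(d.page_content for d in merged)
answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {original}")
print(answer.content)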
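Merged results from several queries can exceed what the model's context window holds. A minimal sketch of a character budget follows; `build_context` and `max_chars` are hypothetical names, and character count is only a rough proxy for tokens, so the limit should be tuned to the model in use.

```python
def build_context(chunks, max_chars=4000, separator="\n\n"):
    """Join chunk texts in order, stopping before the total would exceed max_chars."""
    parts = []
    used = 0
    for text in chunks:
        # The separator only costs characters once there is something to separate.
        cost = len(text) + (len(separator) if parts else 0)
        if used + cost > max_chars:
            break
        parts.append(text)
        used += cost
    return separator.join(parts)


# Three 1500-character chunks: only the first two fit in a 4000-character budget.
chunks = ["a" * 1500, "b" * 1500, "c" * 1500]
context = build_context(chunks, max_chars=4000)
print(len(context))  # 3002
```

Because the function keeps chunks in the order given, it works best on a list that is already ordered by relevance.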

Verification

python -c "
from langchain_ollama import ChatOllama
import os
os.environ['OLLAMA_BASE_URL'] = 'http://localhost:11434'
llm = ChatOllama(model='llama3')
result = llm.invoke('Generate one synonym for the word retrieval')
print(len(result.content) > 0)
# Expected: True
"
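The command above only confirms that the model responds. To check that expansion actually improves recall, one option is to compare recall at k with and without expansion on a few hand-labelled queries. This is a sketch under stated assumptions: the chunk ids and relevance labels below are invented, and `recall_at_k` is a hypothetical helper.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant ids that appear among the first k retrieved ids."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical labels for one query: the chunks a human judged relevant.
relevant = {"c1", "c4", "c7"}
# Hypothetical retrieval results, best first.
baseline = ["c1", "c2", "c3", "c5", "c6"]        # original query only
expanded = ["c1", "c4", "c2", "c7", "c3", "c5"]  # merged across variants

print(f"baseline recall@5: {recall_at_k(baseline, relevant, k=5):.2f}")
print(f"expanded recall@5: {recall_at_k(expanded, relevant, k=5):.2f}")
```

Averaging this figure over a handful of representative queries gives a rough before-and-after comparison; a single query is too noisy to draw conclusions from.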

Common failures

Expansion generating irrelevant variants. Constrain the prompt to produce semantically related rephrasings rather than unrelated questions.
Too many variants causing latency. Limit to 3-5 expansions; excessive variants slow retrieval and increase context length.
Duplicate results inflating the merged set. Deduplicate by content hash before passing context to the LLM.
Duplicate content overwhelming the context window. Use a reranker after merging to select the most diverse set of documents.
Version mismatch. The installed package or runtime differs from the one the commands assume; check the installed version first, then rerun the smallest verification command.
Local environment drift. A different service, virtual environment, model, or path is in use; print the active binary path and configuration before changing the guide steps.
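Deduplicating by content hash, as suggested above, can be sketched with the standard library alone. Lowercasing and collapsing whitespace before hashing is an assumption made here so that trivially different copies collapse to one key; the helper names are hypothetical, and reworded near-duplicates are not caught by this approach.

```python
import hashlib


def content_key(text):
    """SHA-256 of the text after lowercasing and collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(texts):
    """Keep the first occurrence of each distinct normalized text, preserving order."""
    seen = set()
    unique = []
    for text in texts:
        key = content_key(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


texts = [
    "Indexes speed up reads.",
    "indexes  speed up\nreads.",  # same content, different case and spacing
    "Indexes slow down writes.",
]
print(dedupe(texts))  # ['Indexes speed up reads.', 'Indexes slow down writes.']
```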
