06. Query Rewriting
Query rewriting transforms user queries to better match document vocabulary and structure. Real users ask questions with informal language; documents use precise terminology. Query rewriting bridges this gap.
Why Query Rewriting Is Necessary
Users don't speak the language of your documents. A user might ask "Can I expense my client lunch?" while documents say "Business meal expenditures require pre-approval for amounts exceeding $50 per person." The vocabulary is mismatched even though the semantic content overlaps.
Embedding models handle some vocabulary mismatch through semantic generalization, but they have limits. A general-purpose embedding model will not necessarily place "profit" close to "net income" in the way a financial retrieval task requires, and domain jargon that is common in your documents may be rare or absent in the model's training data.
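To make the mismatch concrete, here is a minimal sketch that measures plain word overlap between the example query and the example policy sentence. The `tokens` and `jaccard` helpers are illustrative names rather than part of any library, and the tokenizer is deliberately crude.

```python
import re

def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer for illustration."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

query = "Can I expense my client lunch?"
document = ("Business meal expenditures require pre-approval "
            "for amounts exceeding $50 per person.")

shared = tokens(query) & tokens(document)
print(shared)                     # → set()
print(jaccard(query, document))   # → 0.0
```

The two texts share no words at all ("expense" and "expenditures" are different tokens), so a purely lexical retriever would have nothing to match on. An embedding model narrows this gap, but only as far as its training data allows.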
Query Rewriting with LLMs
The most effective approach uses an LLM to reformulate the query. The LLM understands both the user's intent and typical document phrasing:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
QUERY_REWRITE_TEMPLATE = """You are a query reformulation assistant for a RAG system.
The user asked: {query}
Your task is to reformulate this query to match how information is typically written in formal documents.
Consider:
1. Technical terminology the documents might use
2. Complete phrases instead of abbreviations or informal speech
3. Topic-complete queries (what concept is the user actually asking about?)
Generate 2-3 alternative reformulations that are semantically equivalent but phrased differently.
Output each reformulation on a new line.
"""
llm = ChatOpenAI(model="gpt-4o", temperature=0.3)
rewrite_prompt = PromptTemplate.from_template(QUERY_REWRITE_TEMPLATE)
rewrite_chain = rewrite_prompt | llm
def rewrite_query(query):
    response = rewrite_chain.invoke({"query": query})
    reformulations = [line.strip() for line in response.content.split('\n') if line.strip()]
    return reformulations
# Example usage
original = "Can I expense my client dinner?"
reformulations = rewrite_query(original)
# Output might include:
# - "Business meal expenditure policy and pre-approval requirements"
# - "Client dinner expense reimbursement limits"
# - "Business entertainment expense guidelines"
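In practice, LLM output often arrives with bullets, numbering, quotes, or near-duplicate lines, as the dashes in the sample output above suggest. The sketch below shows one way to normalize that text before sending it to a retriever. The function name `parse_reformulations` and the choice to keep the original query as the first entry are assumptions made for illustration, not part of LangChain.

```python
import re

# Matches a leading bullet ("-", "*", "•") or numbering ("1.", "2)") plus spaces.
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")

def parse_reformulations(raw_text, original_query=None):
    """Split raw LLM output into clean, de-duplicated query strings."""
    seen = set()
    cleaned = []
    # Optionally keep the user's own wording as the first candidate.
    candidates = ([original_query] if original_query else []) + raw_text.splitlines()
    for line in candidates:
        text = _LIST_MARKER.sub("", line).strip().strip('"').strip()
        key = text.lower()  # case-insensitive duplicate check
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned

raw = '''1. Business meal expenditure policy and pre-approval requirements
- "Client dinner expense reimbursement limits"

2) business meal expenditure policy and pre-approval requirements
'''
queries = parse_reformulations(raw, original_query="Can I expense my client dinner?")
for q in queries:
    print(q)
```

Keeping the original query in the list is a hedge against a bad rewrite: if the LLM drifts from the user's intent, the retriever still sees the user's own wording.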
Query Expansion vs. Query Rewriting
Query rewriting and query expansion are related but distinct:
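Query rewriting restates the same question in document-like language. It may produce a single query or a few semantically equivalent paraphrases, but every variant targets the same intent. The goal is better retrieval, not necessarily exploring multiple angles.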
Query expansion produces multiple queries exploring different aspects of the original question. The goal is to capture breadth when the user's question might map to multiple document sections.
Query rewriting is typically more precise; query expansion is typically higher recall. Use rewriting when the user's question is clear but vocabulary-mismatched. Use expansion when the user's question is broad or ambiguous.
def expand_query(query, llm):
    """Generate multiple queries exploring different aspects."""
    expansion_prompt = """Given this user query: {query}
Generate 4-5 different search queries that explore different aspects or interpretations of this question.
Each query should be a complete, self-contained question that someone might ask.
Focus on different angles, clarifications, or related concepts.
Output each query on a new line."""
    response = llm.invoke(expansion_prompt.format(query=query))
    queries = [line.strip() for line in response.content.split('\n') if line.strip()]
    return queries
# Example
original = "What are the benefits?"
expansion = expand_query(original, llm)  # pass the ChatOpenAI instance defined earlier
# Output might include:
# - "Standard employee benefits package"
# - "Health insurance coverage details"
# - "Retirement and 401k matching policy"
# - "PTO and leave policies"
# - "Professional development benefits"
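Each expanded query returns its own ranked list, so the lists have to be merged before anything is passed to the generator. The text above does not prescribe a merging method. The sketch below uses reciprocal rank fusion (RRF), a commonly used option, with the conventional constant k=60. The document IDs are made up for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60, top_n=5):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by several queries rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score (descending), breaking ties by ID for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n]

# Hypothetical retrieval results for three expanded queries.
results_per_query = [
    ["benefits_overview", "health_plan", "pto_policy"],
    ["health_plan", "dental_plan", "benefits_overview"],
    ["retirement_401k", "health_plan", "benefits_overview"],
]
fused = reciprocal_rank_fusion(results_per_query, top_n=3)
print(fused)  # → ['health_plan', 'benefits_overview', 'retirement_401k']
```

RRF uses only rank positions, so it works even when the similarity scores from different queries are not directly comparable.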
HyDE: Hypothetical Document Embeddings
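HyDE (Hypothetical Document Embeddings) takes a different approach: an LLM generates a hypothetical document that would answer the query, and the embedding of that document, rather than of the query itself, is used for retrieval. The generated text is not used directly as an answer; only its embedding drives the search.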
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Wrap the template in a PromptTemplate: a plain string cannot be piped into a chain.
hyde_prompt = PromptTemplate.from_template("""Generate a hypothetical passage that directly answers this question.
The passage should be written as if it were extracted from a relevant document.
Include specific details, terminology, and structure typical of formal documents.
Question: {query}
Hypothetical passage:""")
hyde_chain = hyde_prompt | ChatOpenAI(model="gpt-4o", temperature=0.5) | StrOutputParser()
def hyde_retrieve(query, vectorstore, embedder):
    # Generate hypothetical document
    hypothetical_doc = hyde_chain.invoke({"query": query})
    # Embed the hypothetical (not query) and retrieve
    doc_embedding = embedder.embed_documents([hypothetical_doc])[0]
    results = vectorstore.similarity_search_by_vector(doc_embedding, k=10)
    return results, hypothetical_doc
HyDE works because the generated document contains vocabulary and phrasing similar to real documents. The embedding model then retrieves actual documents that are vector-similar to this generated content.
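The effect can be illustrated without any model calls. The sketch below substitutes a toy bag-of-words vector for a real embedding model and a hand-written passage for the LLM output. Both are assumptions made purely for illustration, and a real embedding model would not score the raw query at exactly zero.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = "Can I expense my client lunch?"
# Hand-written stand-in for what an LLM might generate for this query.
hypothetical_doc = ("Business meal expenditures with clients are reimbursable. "
                    "Amounts exceeding the per person limit require pre-approval.")
real_doc = ("Business meal expenditures require pre-approval "
            "for amounts exceeding $50 per person.")

sim_query = cosine(embed(query), embed(real_doc))
sim_hyde = cosine(embed(hypothetical_doc), embed(real_doc))
print(f"query vs. document:        {sim_query:.2f}")
print(f"hypothetical vs. document: {sim_hyde:.2f}")
```

The hypothetical passage scores higher against the real document than the raw query does because it is written in the document's register. The trade-off is that a hypothetical passage containing wrong details can pull retrieval toward the wrong documents.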
Collect 10 real user queries from your system. Write query rewrite rules that transform each into document-like language. Run both original and rewritten queries through your retriever and measure top-3 precision improvement.
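For the measurement step of this exercise, a minimal sketch of precision@3 and the average improvement could look like the following. The function names and the structure of `runs` are assumptions, and the document IDs are placeholder data rather than real results.

```python
def precision_at_k(retrieved_ids, relevant_ids, k=3):
    """Fraction of the top-k slots filled by documents labeled relevant.

    Divides by k even if fewer than k results came back, so short
    result lists are penalized rather than flattered.
    """
    top_k = retrieved_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k

def mean_precision_improvement(runs, k=3):
    """Average (rewritten - original) precision@k over labeled queries."""
    deltas = [
        precision_at_k(r["rewritten"], r["relevant"], k)
        - precision_at_k(r["original"], r["relevant"], k)
        for r in runs
    ]
    return sum(deltas) / len(deltas)

# Placeholder data: one entry per user query, with hand-labeled relevant IDs.
runs = [
    {"relevant": {"d1", "d2"},
     "original": ["d9", "d1", "d8"],
     "rewritten": ["d1", "d2", "d8"]},
    {"relevant": {"d5"},
     "original": ["d5", "d6", "d7"],
     "rewritten": ["d5", "d4", "d3"]},
]
improvement = mean_precision_improvement(runs)
print(f"mean precision@3 improvement: {improvement:+.3f}")  # → +0.167
```

With only 10 queries the average is noisy, so it is worth inspecting the per-query deltas as well: a rewrite that helps most queries but badly hurts a few can still show a positive mean.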