Query Rewriting — RAG Systems: Part 2 (Chapter 6)

Query rewriting transforms user queries to better match document vocabulary and structure. Real users ask questions with informal language; documents use precise terminology. Query rewriting bridges this gap.

Why Query Rewriting Is Necessary

Users don't speak the language of your documents. A user might ask "Can I expense my client lunch?" while documents say "Business meal expenditures require pre-approval for amounts exceeding $50 per person." The vocabulary is mismatched even though the semantic content overlaps.

Embedding models handle some vocabulary mismatch through semantic generalization, but they have limits. A general-purpose embedding model will not necessarily place "profit" close to "net income" in the way a financial-report retrieval task requires. Domain jargon also tends to appear in your documents far more often than in the training data of general embedding models, so those terms are represented less reliably.

Query Rewriting with LLMs

A common and often effective approach uses an LLM to reformulate the query. The LLM can infer the user's intent and is familiar with how formal documents are typically phrased:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

QUERY_REWRITE_TEMPLATE = """You are a query reformulation assistant for a RAG system.
The user asked: {query}

Your task is to reformulate this query to match how information is typically written in formal documents.
Consider:
1. Technical terminology the documents might use
2. Complete phrases instead of abbreviations or informal speech
3. Topic-complete queries (what concept is the user actually asking about?)

Generate 2-3 alternative reformulations that are semantically equivalent but phrased differently.
Output each reformulation on a new line.
"""

llm = ChatOpenAI(model="gpt-4o", temperature=0.3)
rewrite_prompt = PromptTemplate.from_template(QUERY_REWRITE_TEMPLATE)
rewrite_chain = rewrite_prompt | llm

def rewrite_query(query):
    response = rewrite_chain.invoke({"query": query})
    reformulations = [line.strip() for line in response.content.split('\n') if line.strip()]
    return reformulations

# Example usage
original = "Can I expense my client dinner?"
reformulations = rewrite_query(original)
# Output might include:
# - "Business meal expenditure policy and pre-approval requirements"
# - "Client dinner expense reimbursement limits"
# - "Business entertainment expense guidelines"
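LLM output often arrives with bullets or numbering even when the prompt asks for plain lines, so the simple line split in rewrite_query can return strings such as "- Client dinner expense reimbursement limits". A minimal sketch of a more defensive parser follows. The helper name parse_reformulations and the max_items cap are hypothetical and are not part of LangChain.

```python
import re

# Hypothetical helper (not a LangChain API): cleans raw LLM text into queries.
# Matches a leading "-", "*", "1." or "1)" marker followed by whitespace.
_LIST_MARKER = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+")

def parse_reformulations(text, max_items=3):
    """Split LLM output into clean, de-duplicated reformulations."""
    seen = set()
    cleaned = []
    for line in text.splitlines():
        # Drop a leading list marker, then any surrounding double quotes.
        candidate = _LIST_MARKER.sub("", line).strip().strip('"').strip()
        key = candidate.lower()  # case-insensitive duplicate check
        if candidate and key not in seen:
            seen.add(key)
            cleaned.append(candidate)
    return cleaned[:max_items]
```

Inside rewrite_query, the list comprehension could then be replaced with parse_reformulations(response.content).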

Query Expansion vs. Query Rewriting

Query rewriting and query expansion are related but distinct:

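Query rewriting produces a reformulation of the same question that better matches document phrasing. It may yield a few near-equivalent phrasings, as in the code above, but they all target the same information need. The goal is better retrieval for that one need, not exploring multiple angles.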

Query expansion produces multiple queries exploring different aspects of the original question. The goal is to capture breadth when the user's question might map to multiple document sections.

Query rewriting is typically more precise; query expansion is typically higher recall. Use rewriting when the user's question is clear but vocabulary-mismatched. Use expansion when the user's question is broad or ambiguous.

def expand_query(query, llm):
    """Generate multiple queries exploring different aspects."""
    expansion_prompt = """Given this user query: {query}
    
    Generate 4-5 different search queries that explore different aspects or interpretations of this question.
    Each query should be a complete, self-contained question that someone might ask.
    Focus on different angles, clarifications, or related concepts.
    
    Output each query on a new line."""
    
    response = llm.invoke(expansion_prompt.format(query=query))
    queries = [line.strip() for line in response.content.split('\n') if line.strip()]
    return queries

# Example
original = "What are the benefits?"
expansion = expand_query(original, llm)  # llm: the ChatOpenAI instance defined earlier
# Output might include:
# - "Standard employee benefits package"
# - "Health insurance coverage details"
# - "Retirement and 401k matching policy"
# - "PTO and leave policies"
# - "Professional development benefits"
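Expansion yields one ranked result list per query, and those lists need to be merged before generation. The text above does not prescribe a merge strategy. The sketch below uses reciprocal rank fusion (RRF), one common option, over plain lists of document IDs. The function name, the made-up document IDs, and the constant k=60 (a conventional default) are assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60, top_n=None):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score (descending); break ties by ID for deterministic output.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_n is None else fused[:top_n]

# Illustrative result lists, one per expanded query (IDs are made up).
results_per_query = [
    ["health_plan", "pto_policy", "401k_match"],
    ["401k_match", "health_plan"],
    ["pto_policy", "health_plan", "tuition_aid"],
]
merged = reciprocal_rank_fusion(results_per_query)
```

Documents retrieved by several of the expanded queries accumulate score and rise to the top, while documents found by only one query are kept but ranked lower.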

HyDE: Hypothetical Document Embeddings

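HyDE (Hypothetical Document Embeddings) takes a different approach: it asks an LLM to generate a hypothetical document that would answer the query, then embeds that document instead of the query and uses the resulting vector for retrieval. The hypothetical document is only a search probe; it is never shown to the user or passed on as an answer.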

from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

# Wrap the template in a PromptTemplate: a plain string cannot be piped into a chain.
hyde_prompt = PromptTemplate.from_template("""Generate a hypothetical passage that directly answers this question.
The passage should be written as if it were extracted from a relevant document.
Include specific details, terminology, and structure typical of formal documents.

Question: {query}

Hypothetical passage:""")

hyde_chain = hyde_prompt | ChatOpenAI(model="gpt-4o", temperature=0.5) | StrOutputParser()

def hyde_retrieve(query, vectorstore, embedder):
    # Generate hypothetical document
    hypothetical_doc = hyde_chain.invoke({"query": query})
    
    # Embed the hypothetical (not query) and retrieve
    doc_embedding = embedder.embed_documents([hypothetical_doc])[0]
    results = vectorstore.similarity_search_by_vector(doc_embedding, k=10)
    
    return results, hypothetical_doc

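HyDE works because the generated document contains vocabulary and phrasing similar to real documents, so its embedding tends to sit closer to relevant documents than the embedding of a short, informal query would. The embedding model then retrieves actual documents that are near this generated content in vector space. The generated passage may contain invented details, which is acceptable because only its embedding is used.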
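The effect can be illustrated without an LLM or a real embedding model. The sketch below substitutes a toy bag-of-words vector for the embedding and a hard-coded passage for the LLM output. Both are stand-ins chosen only to keep the example self-contained, and the function names toy_embed and cosine are hypothetical.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

policy_doc = ("Business meal expenditures require pre-approval "
              "for amounts exceeding $50 per person.")
query = "Can I expense my client lunch?"
# Hard-coded stand-in for what an LLM might generate for this query.
hypothetical = ("Business meal expenditures for client lunches require "
                "pre-approval when amounts exceed the per person limit.")

# The raw query shares no words with the policy text, so it scores 0.0.
query_score = cosine(toy_embed(query), toy_embed(policy_doc))
# The hypothetical passage reuses the document's vocabulary and scores higher.
hyde_score = cosine(toy_embed(hypothetical), toy_embed(policy_doc))
```

A real embedding model would not score the raw query at zero, so the toy vectors exaggerate the gap, but the direction of the effect is the one HyDE relies on.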