RUNLOCALAI · v38
OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
RAG Systems: Part 2

06. Query Rewriting

Chapter 6 of 22 · 25 min
KEY INSIGHT

Query rewriting addresses vocabulary mismatch by transforming user queries into document-like language, which can improve retrieval precision, usually without sacrificing recall.

Query rewriting transforms user queries to better match document vocabulary and structure. Real users ask questions with informal language; documents use precise terminology. Query rewriting bridges this gap.

Why Query Rewriting Is Necessary

Users don't speak the language of your documents. A user might ask "Can I expense my client lunch?" while documents say "Business meal expenditures require pre-approval for amounts exceeding $50 per person." The vocabulary is mismatched even though the semantic content overlaps.

Embedding models handle some vocabulary mismatch through semantic generalization, but they have limits. A general-purpose embedding model doesn't necessarily place "profit" close to "net income" in the way a financial-reporting task needs. Domain jargon that is common in your documents may also be rare in the data the embedding model was trained on.
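
To make the mismatch concrete, here is a minimal sketch that measures plain token overlap between the example query and the policy sentence. The `tokens` and `jaccard` helpers and the `rewritten` string are illustrative assumptions, not part of any library, and token overlap is only a rough proxy for what a keyword retriever sees.

```python
import re

def tokens(text):
    """Lowercase word tokens; digits and punctuation are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of distinct tokens that two texts have in common."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

query = "Can I expense my client lunch?"
document = ("Business meal expenditures require pre-approval "
            "for amounts exceeding $50 per person.")
# Hypothetical rewrite, written by hand for this illustration.
rewritten = "Business meal expenditure pre-approval requirements"

overlap_original = jaccard(query, document)
overlap_rewritten = jaccard(rewritten, document)
print(f"original:  {overlap_original:.2f}")
print(f"rewritten: {overlap_rewritten:.2f}")
```

The original query shares no tokens with the policy sentence, while the hand-written rewrite shares four ("business", "meal", "pre", "approval"). A semantic embedding narrows this gap but, as noted above, does not always close it.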

Query Rewriting with LLMs

The most effective approach uses an LLM to reformulate the query. The LLM understands both the user's intent and typical document phrasing:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

QUERY_REWRITE_TEMPLATE = """You are a query reformulation assistant for a RAG system.
The user asked: {query}

Your task is to reformulate this query to match how information is typically written in formal documents.
Consider:
1. Technical terminology the documents might use
2. Complete phrases instead of abbreviations or informal speech
3. Topic-complete queries (what concept is the user actually asking about?)

Generate 2-3 alternative reformulations that are semantically equivalent but phrased differently.
Output each reformulation on a new line.
"""

llm = ChatOpenAI(model="gpt-4o", temperature=0.3)
rewrite_prompt = PromptTemplate.from_template(QUERY_REWRITE_TEMPLATE)
rewrite_chain = rewrite_prompt | llm

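def rewrite_query(query):
    """Return the LLM's reformulations of `query`, one string per output line."""
    response = rewrite_chain.invoke({"query": query})
    # The prompt asks for one reformulation per line; drop blank lines
    reformulations = [line.strip() for line in response.content.split('\n') if line.strip()]
    return reformulations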

# Example usage
original = "Can I expense my client dinner?"
reformulations = rewrite_query(original)
# Output might include:
# - "Business meal expenditure policy and pre-approval requirements"
# - "Client dinner expense reimbursement limits"
# - "Business entertainment expense guidelines"
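
Models often add bullets, numbering, or quotes to each line even when asked not to, and a simple split on newlines keeps those markers in the query text. The following is one possible clean-up step, sketched under that assumption; `parse_reformulations` and `max_items` are hypothetical names, not part of LangChain.

```python
import re

# Leading list markers an LLM may add: "- ", "* ", "• ", "1. ", "2) "
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")

def parse_reformulations(text, max_items=3):
    """Turn raw LLM output into a clean, de-duplicated list of queries."""
    cleaned = []
    seen = set()
    for line in text.splitlines():
        # Remove the list marker, surrounding whitespace, and wrapping quotes
        line = _LIST_MARKER.sub("", line).strip().strip('"')
        key = line.lower()
        if line and key not in seen:
            seen.add(key)
            cleaned.append(line)
    return cleaned[:max_items]

raw = """1. Business meal expenditure policy and pre-approval requirements
- "Client dinner expense reimbursement limits"

2) business meal expenditure policy and pre-approval requirements
"""
parsed = parse_reformulations(raw)
print(parsed)
```

Here the blank line and the case-insensitive duplicate are dropped, leaving two clean queries. Applying this to `response.content` inside `rewrite_query` would be one way to use it.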

Query Expansion vs. Query Rewriting

Query rewriting and query expansion are related but distinct:

Query rewriting produces one reformulation, or a few semantically equivalent ones as in the example above, that better matches document language. The goal is better retrieval for the same intent, not exploring multiple angles.

Query expansion produces multiple queries exploring different aspects of the original question. The goal is to capture breadth when the user's question might map to multiple document sections.

Query rewriting is typically more precise; query expansion is typically higher recall. Use rewriting when the user's question is clear but vocabulary-mismatched. Use expansion when the user's question is broad or ambiguous.

def expand_query(query, llm):
    """Generate multiple queries exploring different aspects."""
    expansion_prompt = """Given this user query: {query}
    
    Generate 4-5 different search queries that explore different aspects or interpretations of this question.
    Each query should be a complete, self-contained question that someone might ask.
    Focus on different angles, clarifications, or related concepts.
    
    Output each query on a new line."""
    
    response = llm.invoke(expansion_prompt.format(query=query))
    queries = [line.strip() for line in response.content.split('\n') if line.strip()]
    return queries

# Example
original = "What are the benefits?"
expansion = expand_query(original, llm)  # reuses the llm defined in the rewriting example
# Output might include:
# - "Standard employee benefits package"
# - "Health insurance coverage details"
# - "Retirement and 401k matching policy"
# - "PTO and leave policies"
# - "Professional development benefits"
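
Expansion only helps if the results of the individual queries are combined afterwards. The sketch below shows one simple way to do that: run each query, keep each document's best score, and return the top k. It assumes a hypothetical `search_fn` that returns `(doc_id, score)` pairs where higher scores mean more similar and scores are comparable across queries; `retrieve_multi`, `FAKE_INDEX`, and `fake_search` are illustrative names with canned data.

```python
def retrieve_multi(queries, search_fn, k=5):
    """Run every query, keep each document's best score, return the top k."""
    best = {}
    for q in queries:
        for doc_id, score in search_fn(q):
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Hypothetical stand-in for a real retriever: canned (doc_id, score) results.
FAKE_INDEX = {
    "Health insurance coverage details": [
        ("health-plan", 0.91), ("benefits-overview", 0.62)],
    "Retirement and 401k matching policy": [
        ("401k-policy", 0.88), ("benefits-overview", 0.70)],
}

def fake_search(query):
    return FAKE_INDEX.get(query, [])

merged = retrieve_multi(list(FAKE_INDEX), fake_search, k=3)
print(merged)
```

The document that both queries retrieve ("benefits-overview") appears once in the merged list, with the higher of its two scores. Other merge strategies exist; max-score is used here only because it is the simplest to read.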

HyDE: Hypothetical Document Embeddings

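HyDE (Hypothetical Document Embeddings) takes a different approach: it generates a hypothetical document that would answer the query, then embeds that passage and searches with its embedding instead of the query's. The hypothetical passage is used only for retrieval; it is never shown to the user as an answer.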

from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

# Wrap the template in a PromptTemplate so it can be piped into the model with |
hyde_prompt = PromptTemplate.from_template("""Generate a hypothetical passage that directly answers this question.
The passage should be written as if it were extracted from a relevant document.
Include specific details, terminology, and structure typical of formal documents.

Question: {query}

Hypothetical passage:""")

hyde_chain = hyde_prompt | ChatOpenAI(model="gpt-4o", temperature=0.5) | StrOutputParser()

def hyde_retrieve(query, vectorstore, embedder):
    # Generate hypothetical document
    hypothetical_doc = hyde_chain.invoke({"query": query})
    
    # Embed the hypothetical (not query) and retrieve
    doc_embedding = embedder.embed_documents([hypothetical_doc])[0]
    results = vectorstore.similarity_search_by_vector(doc_embedding, k=10)
    
    return results, hypothetical_doc

HyDE works because the generated document contains vocabulary and phrasing similar to real documents. The embedding model then retrieves actual documents that are vector-similar to this generated content.
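
The effect can be seen without an LLM or a vector store in a toy setting. The sketch below substitutes a bag-of-words count vector for a real embedding model and a hand-written passage for the LLM output, so every name in it (`embed`, `cosine`, `rank`, `corpus`, `hypothetical`) is an assumption made for illustration. A real embedding model would give the raw query some nonzero similarity, so this exaggerates the gap.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

corpus = {
    "meals": "Business meal expenditures require pre-approval "
             "for amounts exceeding $50 per person.",
    "travel": "Airfare must be booked through the corporate travel portal.",
}

query = "Can I expense my client lunch?"
# Hand-written stand-in for what the LLM might generate for this query.
hypothetical = ("Business meal expenditures for client meetings "
                "require pre-approval from a manager.")

def rank(text):
    """Score every corpus document against `text`, best match first."""
    vec = embed(text)
    scored = ((cosine(vec, embed(doc)), doc_id) for doc_id, doc in corpus.items())
    return sorted(scored, reverse=True)

query_ranking = rank(query)
hyde_ranking = rank(hypothetical)
print(query_ranking)
print(hyde_ranking)
```

In this toy, the raw query scores zero against both documents, while the hypothetical passage clearly ranks the meal policy first because it shares the policy's vocabulary.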

EXERCISE

Collect 10 real user queries from your system. Write query rewrite rules that transform each into document-like language. Run both original and rewritten queries through your retriever and measure top-3 precision improvement.
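
For the measurement step, a minimal sketch of top-3 precision might look like the following. It assumes you have hand-labeled a set of relevant document IDs for each query; the IDs and results shown are made-up placeholders, and `precision_at_k` and `mean_precision` are hypothetical helper names.

```python
def precision_at_k(retrieved_ids, relevant_ids, k=3):
    """Fraction of the top-k retrieved documents that are labeled relevant."""
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

def mean_precision(runs, k=3):
    """Average precision@k over (retrieved_ids, relevant_ids) pairs."""
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)

# Made-up results and labels for two queries, for illustration only.
original_runs = [
    (["d7", "d2", "d9"], {"d2"}),
    (["d4", "d5", "d6"], {"d1", "d4"}),
]
rewritten_runs = [
    (["d2", "d7", "d3"], {"d2"}),
    (["d1", "d4", "d6"], {"d1", "d4"}),
]

before = mean_precision(original_runs)
after = mean_precision(rewritten_runs)
print(f"precision@3: {before:.2f} -> {after:.2f}")
```

Note that this divides by k even when fewer than k documents are returned, which penalizes short result lists. With only 10 queries, treat any difference as a rough signal rather than a firm result.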
