RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Advanced RAG — Chunking, Retrieval, Re-ranking

13. Query Rewriting

Chapter 13 of 24 · 20 min
KEY INSIGHT

Raw user queries often fail to match how content was indexed; rewriting aligns the query language with the retrieval corpus.

### Why Queries Drift

A conversational query such as "how does consensus work in Raft" may surface only loosely related distributed-systems documentation, whereas a retrieval query such as "consensus algorithm Raft implementation details" bridges the vocabulary gap between the user's intent and the indexed content. Query rewriting uses an LLM to translate user queries into retrieval-friendly language before the search step.

### Implementation

Rewrite the raw query by asking an LLM to reformulate it into 2-3 cleaner retrieval queries. The rewrite should strip conversational filler, expand abbreviations, and optionally include domain-specific terminology.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def rewrite_query(raw_query: str, model: str = "gpt-4o-mini") -> list[str]:
    system_prompt = (
        "You are a query rewriting assistant. Given a user query, "
        "produce 2-3 alternative retrieval queries that are optimized "
        "for semantic search against a technical document corpus. "
        "Return only the queries, one per line. Be specific and include "
        "domain terminology."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": raw_query},
        ],
        temperature=0.0,  # minimizes run-to-run variation in the rewrites
        max_tokens=256,
    )
    # content can be None, so fall back to an empty string before splitting
    content = response.choices[0].message.content or ""
    return [q.strip() for q in content.splitlines() if q.strip()]


# Example usage
raw = "what happens when postgres runs out of connections"
queries = rewrite_query(raw)
# Illustrative output (actual rewrites vary by model):
# ['PostgreSQL connection pool exhaustion handling',
#  'PostgreSQL max_connections configuration error',
#  'database connection limit exceeded postgres solutions']
```

### HyDE: Hypothetical Document Embeddings

A more advanced approach is HyDE (Hypothetical Document Embeddings): an LLM generates a hypothetical answer document, and both the user query and the generated document are embedded for similarity search. This works because the generated document often uses vocabulary closer to the actual indexed content than the query does.

```python
import numpy as np
from openai import OpenAI

from embeddings import embed_texts  # your embedding function
from vector_db import vector_store  # placeholder: your vector store client

client = OpenAI()


def hyde_retrieval(query: str, top_k: int = 5) -> list[dict]:
    # Step 1: Generate a hypothetical answer (used for retrieval only, not final output)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Generate a concise hypothetical answer to the question. "
                        "Base it on general knowledge. Be factual and specific."},
            {"role": "user", "content": query},
        ],
        temperature=0.7,
        max_tokens=512,
    )
    hypothetical_doc = response.choices[0].message.content or ""

    # Step 2: Embed both the query and the hypothetical document
    # (np.asarray makes the arithmetic below work even if embed_texts returns lists)
    query_emb = np.asarray(embed_texts([query])[0], dtype=float)
    doc_emb = np.asarray(embed_texts([hypothetical_doc])[0], dtype=float)

    # Step 3: Average the two embeddings and search with the combined vector
    # (a very simple but effective trick)
    combined_emb = (query_emb + doc_emb) / 2
    results = vector_store.similarity_search(
        embedding=combined_emb,
        top_k=top_k,
    )
    return results
```

### Failure Modes

Rewrite outputs can drift from user intent if the system prompt is ambiguous. Always validate that rewrites stay semantically aligned: compute the cosine similarity between the original and rewritten query embeddings as a sanity check. HyDE can also hallucinate confident-sounding but fictitious content; embedding that text pulls retrieval toward documents that match the fabricated details instead of the user's actual question.
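The cosine-similarity sanity check described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names `cosine_similarity` and `filter_aligned_rewrites` are hypothetical, the `min_similarity=0.7` cutoff is an assumed value that would need tuning per embedding model, and toy 3-dimensional vectors stand in for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unaligned instead of dividing by zero
    return dot / (norm_a * norm_b)


def filter_aligned_rewrites(
    original_emb: Sequence[float],
    rewrites: Sequence[str],
    rewrite_embs: Sequence[Sequence[float]],
    min_similarity: float = 0.7,  # assumed cutoff; tune for your embedding model
) -> list[str]:
    """Keep only rewrites whose embedding stays close to the original query."""
    kept = []
    for text, emb in zip(rewrites, rewrite_embs):
        if cosine_similarity(original_emb, emb) >= min_similarity:
            kept.append(text)
    return kept


# Toy vectors stand in for real query embeddings.
original = [1.0, 0.0, 0.0]
rewrites = ["close rewrite", "drifted rewrite"]
rewrite_embs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
aligned = filter_aligned_rewrites(original, rewrites, rewrite_embs)
print(aligned)
```

In practice the embeddings would come from the same embedding function used for indexing, and a rewrite that falls below the cutoff could be dropped or replaced by the original query.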

EXERCISE

Implement rewrite_query and compare retrieval results with and without rewriting on a test set of 20 user queries against your corpus. Measure and report the mean average precision difference. (15 min)
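For the measurement step, mean average precision can be computed along the lines below. This is a sketch under stated assumptions: each query has a ranked list of retrieved document IDs and a hand-labeled set of relevant IDs, the function names are hypothetical, and the document IDs are made-up toy data, not results from any real corpus.

```python
def average_precision(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant doc appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    # Relevant docs that were never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data: two queries, retrieved without and with rewriting.
baseline_runs = [
    (["d2", "d1", "d4", "d3"], {"d1", "d3"}),
    (["d6", "d5"], {"d5"}),
]
rewritten_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d5"}),
]

baseline_map = mean_average_precision(baseline_runs)
rewritten_map = mean_average_precision(rewritten_runs)
print(f"MAP difference: {rewritten_map - baseline_map:+.3f}")
```

With 20 real queries, each `(ranked_ids, relevant_ids)` pair would come from one retrieval run; when `rewrite_query` returns several rewrites, their result lists need to be merged into a single ranking before scoring.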
