Advanced RAG — Chunking, Retrieval, Re-ranking

13. Query Rewriting

Chapter 13 of 24 · 20 min

KEY INSIGHT

User queries often fail to match how content was worded and indexed; rewriting aligns the query language with the retrieval corpus.

### Why Queries Drift

A user asking "how does consensus work in Raft" may get only loosely related hits from general distributed-systems documentation, because the question is phrased conversationally. A retrieval query such as "consensus algorithm Raft implementation details" bridges the vocabulary gap between the user's intent and the indexed content. Query rewriting uses an LLM to translate user queries into retrieval-friendly language before the search step.

### Implementation

Rewrite the raw query by asking an LLM to reformulate it into 2-3 cleaner retrieval queries. The rewrite should strip conversational filler, expand abbreviations, and optionally include domain-specific terminology.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def rewrite_query(raw_query: str, model: str = "gpt-4o-mini") -> list[str]:
    """Reformulate a raw user query into 2-3 retrieval-friendly queries."""
    system_prompt = (
        "You are a query rewriting assistant. Given a user query, "
        "produce 2-3 alternative retrieval queries that are optimized "
        "for semantic search against a technical document corpus. "
        "Return only the queries, one per line. Be specific and include "
        "domain terminology."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": raw_query},
        ],
        temperature=0.0,  # Low variance, so the same query yields stable rewrites
        max_tokens=256,
    )
    # content can be None (for example on a refusal), so fall back to an empty string
    content = response.choices[0].message.content or ""
    # One query per line; drop blank lines
    return [q.strip() for q in content.splitlines() if q.strip()]


# Example usage
raw = "what happens when postgres runs out of connections"
queries = rewrite_query(raw)
# Illustrative output (actual rewrites vary by model):
# ['PostgreSQL connection pool exhaustion handling',
#  'PostgreSQL max_connections configuration error',
#  'database connection limit exceeded postgres solutions']
```

### HyDE: Hypothetical Document Embeddings

A more advanced approach is HyDE (Hypothetical Document Embeddings): an LLM generates a hypothetical answer document, and retrieval searches with the embedding of that document rather than with the query embedding alone. The variant below embeds both the user query and the generated document and averages the two vectors. This works because the generated document tends to use vocabulary closer to the indexed content than the question does.

```python
import numpy as np
from openai import OpenAI

from embeddings import embed_texts  # your embedding function

client = OpenAI()


def hyde_retrieval(query: str, vector_store, top_k: int = 5) -> list[dict]:
    """Retrieve with the average of the query and hypothetical-answer embeddings.

    vector_store is your vector database client; similarity_search stands in
    for whatever query method it exposes.
    """
    # Step 1: Generate a hypothetical answer (used only for retrieval, not shown to the user)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Generate a concise hypothetical answer to the question. "
                    "Base it on general knowledge. Be factual and specific."
                ),
            },
            {"role": "user", "content": query},
        ],
        temperature=0.7,
        max_tokens=512,
    )
    # If the model returns no text, fall back to the query itself
    hypothetical_doc = response.choices[0].message.content or query

    # Step 2: Embed both the query and the hypothetical document.
    # Converting to arrays makes the arithmetic below element-wise
    # even if embed_texts returns plain Python lists.
    query_emb = np.asarray(embed_texts([query])[0], dtype=float)
    doc_emb = np.asarray(embed_texts([hypothetical_doc])[0], dtype=float)

    # Step 3: Average the two embeddings and search with the result
    # (a simple but effective trick that keeps the query's own signal)
    combined_emb = (query_emb + doc_emb) / 2
    results = vector_store.similarity_search(
        embedding=combined_emb,
        top_k=top_k,
    )
    return results
```

### Failure Modes

Rewrites can drift from user intent if the system prompt is ambiguous. Validate that rewrites stay semantically aligned: as a sanity check, compute the cosine similarity between the embeddings of the original and rewritten queries, and flag rewrites that score low. HyDE can hallucinate confident-sounding but fictitious content. Because the embedding encodes whatever the hypothetical document says, a wrong answer pulls retrieval toward passages that resemble the error rather than the user's question.
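
The cosine-similarity sanity check for rewrites could look roughly like the sketch below. It is an illustration under assumptions, not a prescribed implementation: `filter_drifted_rewrites`, the `embed` callable (any function that maps a list of strings to a list of vectors), and the `0.6` threshold are all hypothetical and should be tuned against your own embedding model.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_drifted_rewrites(
    original: str,
    rewrites: list[str],
    embed: Callable[[list[str]], Sequence[Vector]],
    min_similarity: float = 0.6,  # hypothetical threshold, tune per embedding model
) -> list[str]:
    """Keep only rewrites whose embedding stays close to the original query."""
    if not rewrites:
        return []
    # Embed the original and all rewrites in one call
    vectors = embed([original, *rewrites])
    original_vec, rewrite_vecs = vectors[0], vectors[1:]
    return [
        rewrite
        for rewrite, vec in zip(rewrites, rewrite_vecs)
        if cosine_similarity(original_vec, vec) >= min_similarity
    ]
```

If every rewrite is filtered out, falling back to the original query is a reasonable default. A threshold set too high defeats the purpose of rewriting, since useful rewrites are expected to differ somewhat from the original wording.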

EXERCISE

Implement rewrite_query and compare retrieval results with and without rewriting on a test set of 20 user queries against your corpus. Measure and report the mean average precision difference. (15 min)
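
For the measurement step, a minimal sketch of mean average precision is shown here. It assumes each test query has a hand-labeled set of relevant document IDs and that each ranked list contains no duplicate IDs; the function names and the ID format are illustrative, not part of the exercise.

```python
def average_precision(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """Average of precision@k over the ranks k where a relevant document appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    # Dividing by the number of relevant documents penalizes any that were never retrieved
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """Mean of average precision over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)
```

Compute `mean_average_precision` once over the baseline results and once over the rewritten-query results for the same 20 queries, then report the difference between the two values.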