RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Advanced RAG — Chunking, Retrieval, Re-ranking

14. Query Decomposition

Chapter 14 of 24 · 20 min
KEY INSIGHT

Complex multi-constraint queries retrieve poorly unless they are broken into sub-queries whose results are merged.

### The Multi-Constraint Problem

A query like "compare the latency, throughput, and fault tolerance of Kafka versus RabbitMQ for stream processing" asks for a multi-dimensional comparison. A single embedding query against a document corpus rarely returns all relevant passages across these three concerns. Query decomposition breaks the query into targeted sub-queries, retrieves each dimension separately, and merges the results by weighted score.

### Sub-Query Generation

Decomposition uses an LLM to identify the semantic "facets" of a complex query and produce one targeted sub-query per facet. Each sub-query retrieves independently.

```python
from openai import OpenAI

client = OpenAI()

# NOTE: `vector_store` is assumed to be defined elsewhere. It must expose
# similarity_search(query, top_k=...) returning dicts with "id" and "score".


def decompose_query(query: str) -> list[str]:
    system_prompt = (
        "Given a complex query, decompose it into 3-6 independent "
        "sub-queries that each retrieve a distinct piece of information. "
        "Each sub-query should be self-contained and retrieval-friendly. "
        "Return one sub-query per line."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
        temperature=0.0,
        max_tokens=256,
    )
    # content can be None (for example on a refusal), so fall back to ""
    content = response.choices[0].message.content or ""
    return [line.strip() for line in content.splitlines() if line.strip()]


def retrieve_decomposed(query: str, top_k_per: int = 5) -> list[dict]:
    sub_queries = decompose_query(query)
    if not sub_queries:
        return []
    weight = 1.0 / len(sub_queries)  # Equal weight across facets
    all_chunks = {}
    for sq in sub_queries:
        results = vector_store.similarity_search(sq, top_k=top_k_per)
        for i, chunk in enumerate(results):
            # Score by position within each sub-query result set
            chunk_id = chunk["id"]
            base_score = (top_k_per - i) / top_k_per
            score = base_score * weight
            if chunk_id in all_chunks:
                # Seen under another facet: accumulate score, record the facet
                all_chunks[chunk_id]["score"] += score
                all_chunks[chunk_id]["facets"].add(sq)
            else:
                all_chunks[chunk_id] = {
                    "chunk": chunk,
                    "score": score,
                    "facets": {sq},
                }
    # Sort by aggregated score, highest first
    return sorted(all_chunks.values(), key=lambda x: x["score"], reverse=True)
```

### Facet Coverage Verification

A key advantage of decomposition is accountability: you can verify which facets retrieved results and which did not.

```python
def report_facet_coverage(query: str, top_k_per: int = 5) -> dict:
    sub_queries = decompose_query(query)
    coverage = {}
    for sq in sub_queries:
        results = vector_store.similarity_search(sq, top_k=top_k_per)
        coverage[sq] = {
            "retrieved": len(results),
            "top_score": results[0]["score"] if results else 0.0,
        }
    # Facets whose sub-query returned no chunks at all
    missing_facets = [f for f, v in coverage.items() if v["retrieved"] == 0]
    if missing_facets:
        print(f"WARNING: No chunks retrieved for facets: {missing_facets}")
    return coverage
```

### Failure Modes

Decomposition can over-segment: when the LLM generates sub-queries that are too granular, some facets match nothing in the corpus and come back empty. It can also under-segment: when facets overlap heavily in the corpus, the sub-queries return largely the same chunks and retrieval becomes redundant.

Tune the number of sub-queries to the complexity of the query, and use the average overlap between retrieved chunk sets as a diagnostic.
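
The overlap diagnostic can be sketched as the mean pairwise Jaccard similarity between the chunk-id sets returned for each sub-query. This is a minimal sketch, not a prescribed method: the helper names `jaccard` and `mean_pairwise_overlap` and the example chunk ids are hypothetical, and it assumes you have already collected one set of chunk ids per sub-query.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as 0.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(facet_results: dict[str, set[str]]) -> float:
    """Average Jaccard similarity over all pairs of sub-query result sets."""
    pairs = list(combinations(facet_results.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical chunk ids per sub-query, for illustration only
facet_results = {
    "Kafka latency": {"c1", "c2", "c3"},
    "Kafka throughput": {"c2", "c3", "c4"},
    "RabbitMQ fault tolerance": {"c7", "c8"},
}
overlap = mean_pairwise_overlap(facet_results)
print(f"mean pairwise overlap: {overlap:.3f}")
```

A high value suggests the facets are retrieving largely the same chunks, so fewer sub-queries may be enough. Facets with empty result sets point the other way, toward sub-queries that are too granular.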

EXERCISE

Implement decomposition on 10 multi-constraint queries. Measure the Jaccard overlap of retrieved chunks between the original query and the decomposed approach. Report precision and recall delta. (15 min)
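
One possible way to set up the measurements for this exercise is sketched below. It assumes you have a hand-labeled set of relevant chunk ids for each query, which the chapter does not provide. The helper names `precision_recall` and `compare_runs` and all chunk ids are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of a retrieved id set against labeled relevant ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def compare_runs(baseline: set[str], decomposed: set[str], relevant: set[str]) -> dict:
    """Jaccard overlap between the two runs, plus decomposed-minus-baseline deltas."""
    union = baseline | decomposed
    p_base, r_base = precision_recall(baseline, relevant)
    p_dec, r_dec = precision_recall(decomposed, relevant)
    return {
        "jaccard": len(baseline & decomposed) / len(union) if union else 0.0,
        "precision_delta": p_dec - p_base,
        "recall_delta": r_dec - r_base,
    }


# Hypothetical chunk ids for a single query, for illustration only
baseline = {"c1", "c2", "c3", "c4"}      # retrieved with the original query
decomposed = {"c2", "c3", "c5", "c6"}    # retrieved with decomposition
relevant = {"c2", "c3", "c5", "c9"}      # hand-labeled ground truth
report = compare_runs(baseline, decomposed, relevant)
print(report)
```

Averaging each field of the report over the 10 queries gives the overlap and the precision and recall deltas the exercise asks for.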
