11. Agentic Retrieval
Agentic retrieval uses an LLM as an agent that reasons about queries, decomposes complex questions, and executes multi-step retrieval chains. Unlike static pipelines, agentic retrieval enables dynamic, self-correcting search strategies.
When Static Pipelines Fail
Static pipelines (query → retrieval → answer) have fixed logic. They can't:
- Recognize when initial retrieval failed to find relevant information
- Decompose multi-hop questions that require chaining multiple searches
- Adjust strategy mid-retrieval based on partial results
- Ask clarifying questions when queries are ambiguous
Agentic retrieval addresses these by putting the LLM in control of the retrieval process.
The ReAct Pattern
ReAct (Reasoning + Acting) interleaves reasoning traces with action executions:
Thought: I need to find information about X
Action: retrieve(query=X)
Observation: Retrieved 5 documents
Thought: Document 3 mentions Y, I need more details about Y
Action: retrieve(query=Y specifically)
Observation: Retrieved additional documents
Thought: Now I have enough information to answer the original question
Final Answer: ...
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.agents import AgentExecutor, create_openai_functions_agent
from langchain_core.tools import tool
@tool
def retrieve_documents(query: str, k: int = 5) -> str:
"""Retrieve relevant documents from the knowledge base.
Args:
query: Search query
k: Number of documents to retrieve (default 5)
Returns:
Retrieved document contents as a string
"""
results = vectorstore.similarity_search(query, k=k)
return "\n---\n".join([f"[Document {i+1}]:\n{doc.page_content}"
for i, doc in enumerate(results)])
@tool
def rewrite_query(query: str) -> str:
"""Rewrite query to better match document vocabulary."""
# Implementation from Chapter 6
...
class ReActRetriever:
def __init__(self, tools, llm):
self.tools = tools
self.llm = llm
prompt = PromptTemplate.from_template("""
You are a research assistant. Your goal is to answer user questions by
retrieving information from the knowledge base.
You have access to these tools:
{tools}
Question: {input}
Follow this format:
Thought: [what you're thinking about next]
Action: [tool name]
Action Input: [input to the tool]
Observation: [result from the tool]
... (repeat Thought/Action/Observation as needed)
Final Answer: [your final answer]""")
agent = create_openai_functions_agent(llm, self.tools, prompt)
self.executor = AgentExecutor(agent=agent, tools=self.tools, verbose=True)
def retrieve(self, query, max_steps=10):
"""
Agentic retrieval with self-correction.
Args:
query: User question
max_steps: Maximum retrieval steps before forcing answer
"""
try:
result = self.executor.invoke(
{"input": query},
{"max_iterations": max_steps}
)
return {
'answer': result['output'],
'steps': result.get('steps', []),
'retrieval_count': count_retrieval_calls(result)
}
except Exception as e:
return {
'answer': f"Error during retrieval: {str(e)}",
'steps': [],
'retrieval_count': 0
}
Multi-Hop Retrieval
Multi-hop questions require chaining multiple retrievals where each step depends on previous results:
def multi_hop_agent(query, vectorstore, llm):
"""
Multi-hop retrieval that chains queries based on intermediate results.
"""
# Parse the question to identify hops
hop_plan = llm.invoke(f"""Analyze this question and identify the retrieval hops needed.
Each hop should be answerable by retrieving a single document or set of documents.
Question: {query}
Breakdown:""")
# Parse planned hops
hops = parse_hops(hop_plan.content)
context = ""
hop_results = []
for i, hop in enumerate(hops):
# Substitute context from previous hops into current query
current_query = substitute_context(hop['query'], context)
# Execute retrieval
docs = vectorstore.similarity_search(current_query, k=5)
hopped_context = format_documents(docs)
context += f"\n\n[Hop {i+1}: {hop['topic']}]\n{hopped_context}"
hop_results.append({
'hop': i+1,
'query': current_query,
'results': docs
})
return {
'context': context,
'hop_results': hop_results
}
# Example multi-hop question
# "Who approved the contract with the vendor and what was the total value?"
# Hop 1: Find the vendor contract
# Hop 2: Identify who approved it, extract the approval
# Hop 3: Extract contract value
Self-Correction Loop
Agentic retrieval can detect and correct failures:
def self_correcting_retrieval(query, vectorstore, llm, max_attempts=3):
"""
Retrieval with automatic self-correction.
"""
attempt = 0
all_retrieved = []
while attempt < max_attempts:
attempt += 1
# Current state
current_query = query if attempt == 1 else modified_query
# Retrieve
results = vectorstore.similarity_search(current_query, k=10)
new_docs = [doc for doc in results if doc not in all_retrieved]
all_retrieved.extend(new_docs)
# Check if retrieval is sufficient
sufficiency_check = llm.invoke(f"""
Given these retrieved documents:
{format_documents(results)}
And the original question: {query}
Is this sufficient to answer the question? If not, what information is missing?
What new query would retrieve the missing information?
Answer format:
Sufficient: YES/NO
Missing information: [description or N/A]
New query: [query or N/A]
""")
if "Sufficient: YES" in sufficiency_check.content:
return {
'documents': all_retrieved,
'attempts': attempt,
'sufficient': True
}
# Parse new query for retry
modified_query = extract_new_query(sufficiency_check.content)
if not modified_query:
break
return {
'documents': all_retrieved,
'attempts': attempt,
'sufficient': False
}
Tool Use Efficiency
Agentic retrieval can be expensive because the agent may make many retrieval calls. Optimize by:
Caching retrieval results: Store embeddings and BM25 scores for common sub-queries.
Parallel retrieval: When multiple independent hops are identified, execute them concurrently.
from concurrent.futures import ThreadPoolExecutor
def parallel_retrieval(queries, vectorstore):
"""Execute multiple retrieval queries in parallel."""
with ThreadPoolExecutor(max_workers=4) as executor:
futures = {
executor.submit(vectorstore.similarity_search, q, k=5): q
for q in queries
}
results = {}
for future in futures:
query = futures[future]
try:
results[query] = future.result()
except Exception as e:
results[query] = []
log_error(f"Retrieval failed for {query}: {e}")
return results
Early stopping: If confidence is high after a few retrieval steps, stop early rather than continuing to max_steps.
Implement a simple ReAct agent that can handle a multi-hop question like "What was the total budget for the project approved by Dr. Smith, and what percentage of the IT budget does it represent?" Track the number of retrieval steps and evaluate whether final answers are accurate compared to explicit ground truth. This concludes Part 1 (Chapters 1-11). Part 2 (Chapters 12-22) continues with advanced agentic patterns, evaluation metrics, optimization strategies, and production deployment considerations.