HOW-TO · RAG
How to Debug Agent Reasoning and Tool Selection
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES
Agent with tool use, verbose logging enabled, Python 3.10+
What this does
Debugging agent reasoning means inspecting three things: the LLM's chain-of-thought, the reason it selected each tool, and how it interpreted the tool results. This helps fix incorrect tool choices, loops, and hallucinated arguments.
Steps
- Enable chain-of-thought logging from the LLM. Set temperature to 0 and log raw responses.
import json

def debug_llm_response(messages: list, response) -> dict:
    """Summarize one raw chat-completion response for the debug log."""
    return {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "finish_reason": response.choices[0].finish_reason,
        "tool_calls": [
            {
                "name": tc.function.name,
                # Raw JSON string exactly as the model produced it.
                "args": tc.function.arguments,
                "id": tc.id
            }
            # tool_calls is None when the model answered without calling a tool.
            for tc in (response.choices[0].message.tool_calls or [])
        ],
        "content": response.choices[0].message.content
    }
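The `args` value captured above is the raw JSON string the model produced, which makes it a convenient place to catch hallucinated arguments. A minimal sketch, assuming a hypothetical `TOOL_SCHEMAS` registry that maps each tool name to its required and optional parameter names; neither the registry nor `check_tool_args` comes from a library:

```python
import json

# Hypothetical registry: tool name -> required / optional parameter names.
TOOL_SCHEMAS = {
    "search_web": {"required": {"query"}, "optional": {"max_results"}},
    "calculate": {"required": {"expression"}, "optional": set()},
}


def check_tool_args(name: str, raw_args: str) -> list[str]:
    """Return a list of problems found in one tool call (empty means OK)."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return [f"unknown tool: {name}"]
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    supplied = set(args)
    problems = []
    missing = schema["required"] - supplied
    unexpected = supplied - schema["required"] - schema["optional"]
    if missing:
        problems.append(f"missing arguments: {sorted(missing)}")
    if unexpected:
        problems.append(f"unexpected arguments: {sorted(unexpected)}")
    return problems
```

Running this check on every entry of the `tool_calls` list and logging any non-empty result shows which turn introduced an invented or malformed argument.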
- Log the full message history at each turn. Capture what the agent sees.
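import logging

log = logging.getLogger("agent.debug")

def log_state(messages: list, turn: int):
    """Log a preview of every message the agent sees on this turn."""
    log.info(f"=== Turn {turn} ===")
    for i, msg in enumerate(messages):
        role = msg["role"]
        # Content is often None on tool-call messages; show "" instead of "None".
        content_preview = str(msg.get("content") or "")[:200]
        tool_calls = msg.get("tool_calls")
        log.info(f" [{i}] {role}: {content_preview}")
        if tool_calls:
            for tc in tool_calls:
                # Assumes SDK tool-call objects; for plain dicts use tc["function"]["name"].
                log.info(f" -> Tool: {tc.function.name}({tc.function.arguments})")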
- Create a decision audit record. Track every tool choice with context.
class DecisionAudit:
    def __init__(self):
        self.entries = []

    def record(self, turn: int, tool_name: str, args: dict, reason: str, result: dict):
        self.entries.append({
            "turn": turn,
            "tool": tool_name,
            "args": args,
            "reason": reason,
            "result_success": "error" not in result,
            "result_preview": str(result)[:100]
        })

    def replay(self):
        for e in self.entries:
            print(f"Turn {e['turn']}: {e['tool']} → {'OK' if e['result_success'] else 'FAIL'}")
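One use for the audit entries is spotting loops, where the agent repeats the same call with the same arguments. A minimal sketch, assuming entries shaped like those `DecisionAudit.record` appends (with `turn`, `tool`, and `args` keys); `find_repeated_calls` and its `threshold` parameter are hypothetical names used for illustration:

```python
import json
from collections import defaultdict


def find_repeated_calls(entries: list[dict], threshold: int = 2) -> dict:
    """Group audit entries by (tool, args); return groups seen at least `threshold` times."""
    turns_by_call = defaultdict(list)
    for entry in entries:
        # Serialize args with sorted keys so key order does not hide a repeat.
        key = (entry["tool"], json.dumps(entry["args"], sort_keys=True))
        turns_by_call[key].append(entry["turn"])
    return {key: turns for key, turns in turns_by_call.items() if len(turns) >= threshold}


# Example entries (made-up data for illustration).
entries = [
    {"turn": 0, "tool": "search_web", "args": {"query": "ollama"}},
    {"turn": 1, "tool": "calculate", "args": {"expression": "2+2"}},
    {"turn": 2, "tool": "search_web", "args": {"query": "ollama"}},
]
for (tool, args), turns in find_repeated_calls(entries).items():
    print(f"{tool}({args}) repeated on turns {turns}")
```

A repeat is not always a bug, since a retry after a transient error looks the same, so treat the output as a list of turns to inspect rather than a verdict.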
- Simulate with fixed inputs. Reproduce issues by replaying the same prompt.
def replay_agent(history: list, tool_calls_to_override: dict | None = None):
    """Replay an agent session with optional tool result overrides."""
    # Assumes `llm` is a LangChain-style chat model with tools bound and
    # `execute_tool` is the agent's own tool dispatcher.
    messages = list(history)
    for turn in range(5):
        response = llm.invoke(messages)
        if not response.tool_calls:
            return response.content
        # Keep the assistant turn in the history so each tool result has a matching call.
        messages.append(response)
        for tc in response.tool_calls:
            # LangChain tool calls are dicts with "name", "args" (already parsed), and "id".
            if tool_calls_to_override and tc["name"] in tool_calls_to_override:
                result = tool_calls_to_override[tc["name"]]
            else:
                result = execute_tool(tc["name"], tc["args"])
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": json.dumps(result)})
    return "Max turns"
- Compare tool selections between models. Run the same prompt on different models and diff the outputs.
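from langchain_ollama import ChatOllama

def compare_models(prompt: str, models: list[str]):
    """Run the same prompt on each model and collect its tool selections."""
    results = {}
    for model in models:
        # The model only emits tool calls if tools are bound; call
        # llm.bind_tools(...) with the agent's tool list before invoking.
        llm = ChatOllama(model=model, temperature=0)
        response = llm.invoke(prompt)
        results[model] = {
            # LangChain exposes tool calls as dicts with a "name" key.
            "tool_calls": [tc["name"] for tc in (response.tool_calls or [])],
            # Ollama typically reports its stop reason as "done_reason" in the metadata.
            "finish_reason": response.response_metadata.get("done_reason")
        }
    return results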
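The dictionary returned by `compare_models` still has to be diffed by hand. A minimal sketch of that last step, assuming results shaped like the return value above (model name mapped to a dict with a `tool_calls` list); `diff_tool_selections` is a hypothetical helper and the model names in the example are placeholders:

```python
def diff_tool_selections(results: dict, baseline: str) -> dict:
    """Compare each model's tool-call sequence against the baseline model's."""
    expected = results[baseline]["tool_calls"]
    diffs = {}
    for model, outcome in results.items():
        if model == baseline:
            continue
        actual = outcome["tool_calls"]
        if actual != expected:
            diffs[model] = {
                "missing": [t for t in expected if t not in actual],
                "extra": [t for t in actual if t not in expected],
                # True when the same tools were called, only in a different order.
                "order_differs": sorted(actual) == sorted(expected),
            }
    return diffs


# Placeholder results (made-up data for illustration).
results = {
    "model-a": {"tool_calls": ["search_web", "calculate"]},
    "model-b": {"tool_calls": ["calculate"]},
    "model-c": {"tool_calls": ["search_web", "calculate"]},
    "model-d": {"tool_calls": ["calculate", "search_web"]},
}
for model, diff in diff_tool_selections(results, baseline="model-a").items():
    print(model, diff)
```

Models that match the baseline are omitted from the output, so an empty dictionary means every model picked the same tools in the same order.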
Verification
python -c "
audit = {'entries': []}
for i in range(3):
    audit['entries'].append({'turn': i, 'tool': 'search_web' if i % 2 == 0 else 'calculate'})
print(len(audit['entries']))
# Expected: 3
"
Common failures
- Reasoning hidden by the model. Some models don't expose chain-of-thought. Use a model with thinking or reasoning output when available.
- Tool result truncation hides issues. The LLM may receive only the first 500 chars of a tool result, causing it to miss critical data. Log the full result separately.
- Non-deterministic behavior. With temperature above 0, the model can choose different tools on each run, so a failure may not reproduce. Set temperature to 0 during debugging sessions.
- Version mismatch. The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
- Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
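For the tool result truncation failure listed above, one option is to log the complete result while still sending the model a bounded version, and to record when truncation happened. A minimal sketch; the 500-character default mirrors the figure in the text, and `prepare_tool_result` is a hypothetical helper, not part of any agent framework:

```python
import json
import logging

log = logging.getLogger("agent.debug")


def prepare_tool_result(tool_name: str, result: dict, limit: int = 500) -> str:
    """Log the full result, then return the (possibly truncated) text for the LLM."""
    full_text = json.dumps(result)
    # The complete payload always goes to the debug log, whatever the model sees.
    log.debug("full result from %s (%d chars): %s", tool_name, len(full_text), full_text)
    if len(full_text) <= limit:
        return full_text
    # Flag the truncation so a missed field can be traced back to this turn.
    log.warning("%s result truncated from %d to %d chars", tool_name, len(full_text), limit)
    return full_text[:limit]
```

Comparing the warning lines against the turns where the agent went wrong is a quick way to confirm or rule out truncation as the cause.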
Related guides
- How to Implement Logging for Agent Debugging
- How to Test Agent Behavior with Unit Tests