HOW-TO · RAG
How to Test Agent Behavior with Unit Tests
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES
Agent code, pytest or similar framework, Python 3.10+
What this does
Unit tests for agents mock the LLM and tool responses to verify decision logic, tool selection, error handling, and termination conditions without making real API calls.
Steps
- Mock the LLM client. Replace real calls with controlled responses.
from unittest.mock import Mock, patch
import pytest
class MockLLM:
def __init__(self, responses: list[dict]):
self.responses = responses
self.call_count = 0
def invoke(self, messages):
response = self.responses[self.call_count]
self.call_count += 1
return response
@pytest.fixture
def mock_llm():
responses = [
Mock(tool_calls=[Mock(function=Mock(name="search_web", arguments='{"query":"test"}'), id="call_1")]),
Mock(content="Final answer.", tool_calls=None, finish_reason="stop")
]
return MockLLM(responses)
- Test tool selection logic. Verify the agent chooses the correct tool for a given input.
def test_agent_chooses_search_tool(mock_llm):
tools = [{"type": "function", "function": {"name": "search_web", "parameters": {...}}}]
agent = Agent(llm=mock_llm, tools=tools, tool_map={"search_web": lambda q: "result"})
result = agent.run("Search for Python tutorials")
assert "Python tutorials" in str(result)
assert mock_llm.call_count == 2
- Test error handling. Verify the agent recovers from tool errors.
def test_agent_handles_tool_error():
def failing_tool(**kwargs):
raise ValueError("API unavailable")
agent = Agent(llm=mock_llm, tools=tools, tool_map={"failing_tool": failing_tool})
agent.max_retries = 1
result = agent.run("Use failing tool")
assert "error" in result.lower() or "unavailable" in result.lower()
- Test termination conditions. Ensure the agent stops when expected.
def test_agent_stops_at_max_turns():
# LLM always returns tool_calls (never says stop)
always_tool = Mock(tool_calls=[Mock(function=Mock(name="noop", arguments="{}"), id="c1")])
mock = MockLLM([always_tool] * 10)
agent = Agent(llm=mock, tools=tools, tool_map={"noop": lambda: None}, max_turns=3)
result = agent.run("Loop test")
assert "max turns" in result.lower()
assert mock.call_count == 3
- Use pytest fixtures for test isolation.
@pytest.fixture
def mock_tool_registry():
registry = ToolRegistry()
registry.register("search", lambda q: ["result"], {})
registry.register("calculate", lambda e: "42", {})
return registry
@pytest.fixture
def agent(mock_llm, mock_tool_registry):
return Agent(llm=mock_llm, registry=mock_tool_registry)
- Test decision logic in isolation. Test the routing function independently.
def test_route_to_correct_tool():
router = IntentRouter()
assert router.route("search for data") == "search_web"
assert router.route("calculate 2+2") == "calculate"
assert router.route("unknown request") == "ask_clarification"
Verification
pytest test_agent.py -v 2>&1 | Select-String -Pattern "PASSED|FAILED"
# Expected: Several PASSED lines
Common failures
- Mocks diverge from real LLM responses. Mock responses may not match the actual message format. Serialize and save real responses as test fixtures.
- Test flakiness from state leakage. Tests that modify global state (e.g.,
set_debug) affect subsequent tests. Use@pytest.fixture(autouse=True)to reset state. - Tool side effects in tests. Tests that call real tools write to databases or send emails. Always mock external dependencies.
- Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
- Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
Related guides
- How to Debug Agent Reasoning and Tool Selection
- How to Build Custom Tools for Agents