RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
  1. >
  2. Home
  3. /Learn
  4. /How-to
  5. /How to Test Agent Behavior with Unit Tests
HOW-TO · RAG

How to Test Agent Behavior with Unit Tests

intermediate·20 min·By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

Agent code, pytest or similar framework, Python 3.10+

What this does

Unit tests for agents mock the LLM and tool responses to verify decision logic, tool selection, error handling, and termination conditions without making real API calls.

Steps

  • Mock the LLM client. Replace real calls with controlled responses.
from unittest.mock import Mock, patch
import pytest

class MockLLM:
    def __init__(self, responses: list[dict]):
        self.responses = responses
        self.call_count = 0

    def invoke(self, messages):
        response = self.responses[self.call_count]
        self.call_count += 1
        return response

@pytest.fixture
def mock_llm():
    responses = [
        Mock(tool_calls=[Mock(function=Mock(name="search_web", arguments='{"query":"test"}'), id="call_1")]),
        Mock(content="Final answer.", tool_calls=None, finish_reason="stop")
    ]
    return MockLLM(responses)
  • Test tool selection logic. Verify the agent chooses the correct tool for a given input.
def test_agent_chooses_search_tool(mock_llm):
    tools = [{"type": "function", "function": {"name": "search_web", "parameters": {...}}}]
    agent = Agent(llm=mock_llm, tools=tools, tool_map={"search_web": lambda q: "result"})

    result = agent.run("Search for Python tutorials")

    assert "Python tutorials" in str(result)
    assert mock_llm.call_count == 2
  • Test error handling. Verify the agent recovers from tool errors.
def test_agent_handles_tool_error():
    def failing_tool(**kwargs):
        raise ValueError("API unavailable")

    agent = Agent(llm=mock_llm, tools=tools, tool_map={"failing_tool": failing_tool})
    agent.max_retries = 1

    result = agent.run("Use failing tool")

    assert "error" in result.lower() or "unavailable" in result.lower()
  • Test termination conditions. Ensure the agent stops when expected.
def test_agent_stops_at_max_turns():
    # LLM always returns tool_calls (never says stop)
    always_tool = Mock(tool_calls=[Mock(function=Mock(name="noop", arguments="{}"), id="c1")])
    mock = MockLLM([always_tool] * 10)

    agent = Agent(llm=mock, tools=tools, tool_map={"noop": lambda: None}, max_turns=3)
    result = agent.run("Loop test")

    assert "max turns" in result.lower()
    assert mock.call_count == 3
  • Use pytest fixtures for test isolation.
@pytest.fixture
def mock_tool_registry():
    registry = ToolRegistry()
    registry.register("search", lambda q: ["result"], {})
    registry.register("calculate", lambda e: "42", {})
    return registry

@pytest.fixture
def agent(mock_llm, mock_tool_registry):
    return Agent(llm=mock_llm, registry=mock_tool_registry)
  • Test decision logic in isolation. Test the routing function independently.
def test_route_to_correct_tool():
    router = IntentRouter()
    assert router.route("search for data") == "search_web"
    assert router.route("calculate 2+2") == "calculate"
    assert router.route("unknown request") == "ask_clarification"

Verification

pytest test_agent.py -v 2>&1 | Select-String -Pattern "PASSED|FAILED"
# Expected: Several PASSED lines

Common failures

  • Mocks diverge from real LLM responses. Mock responses may not match the actual message format. Serialize and save real responses as test fixtures.
  • Test flakiness from state leakage. Tests that modify global state (e.g., set_debug) affect subsequent tests. Use @pytest.fixture(autouse=True) to reset state.
  • Tool side effects in tests. Tests that call real tools write to databases or send emails. Always mock external dependencies.
  • Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
  • Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.

Related guides

  • How to Debug Agent Reasoning and Tool Selection
  • How to Build Custom Tools for Agents
← All how-to guidesCourses →