What this does

Performance monitoring tracks latency, token usage, error rates, and tool call frequency per agent. The resulting metrics feed dashboards for capacity planning and bottleneck identification.

Steps

Install the Prometheus client library.

pip install prometheus-client

Define the metrics. Each signal gets its own Prometheus metric: counters for tool calls and tokens, a histogram for latency, and a gauge for active agents.

from prometheus_client import Counter, Histogram, Gauge, start_http_server

TOOL_CALL_COUNTER = Counter(
    "agent_tool_calls_total",
    "Total tool calls by tool name",
    ["agent", "tool_name", "status"]
)

LLM_LATENCY = Histogram(
    "agent_llm_latency_seconds",
    "LLM call latency in seconds",
    ["agent"],
    buckets=(0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0)
)

TOKEN_USAGE = Counter(
    "agent_token_usage_total",
    "Token usage by agent",
    ["agent", "token_type"]  # prompt, completion
)

ACTIVE_AGENTS = Gauge(
    "agent_active_count",
    "Number of currently active agents",
    ["agent_type"]
)
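The gauge is defined here but not updated in the later steps. One way to keep it accurate is the client's track_inprogress() helper, which increments on entry and decrements on exit. The following is a minimal sketch, not the guide's own implementation: run_agent and read_active are hypothetical helpers, and a private CollectorRegistry is used only so the snippet runs on its own without clashing with metrics already registered under the same name.

```python
from prometheus_client import CollectorRegistry, Gauge

# Private registry (an assumption for this sketch) so the metric name
# does not collide with one already registered in the default registry.
registry = CollectorRegistry()

ACTIVE_AGENTS = Gauge(
    "agent_active_count",
    "Number of currently active agents",
    ["agent_type"],
    registry=registry,
)


def run_agent(agent_type: str, work):
    """Hypothetical wrapper: the gauge is +1 while work() runs, then back down."""
    with ACTIVE_AGENTS.labels(agent_type=agent_type).track_inprogress():
        return work()


def read_active(agent_type: str):
    """Read the current gauge value straight from the registry."""
    return registry.get_sample_value(
        "agent_active_count", {"agent_type": agent_type}
    )


# Sample the gauge from inside the running agent, then again afterwards.
during = run_agent("planner", lambda: read_active("planner"))
after = read_active("planner")
print(during, after)
```

Because the decrement happens when the with block exits, the gauge also drops back when the agent raises.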

Instrument the agent loop. Record metrics around each operation.

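import time

from prometheus_client import Histogram

# Assumed extra metric: tool calls get their own histogram so they do not
# skew the LLM latency metric.
TOOL_LATENCY = Histogram(
    "agent_tool_latency_seconds",
    "Tool call latency in seconds",
    ["agent", "tool_name"]
)

def monitored_tool_call(agent_name: str, tool_name: str, args: dict):
    # execute_tool is the application's own tool dispatcher (not defined here).
    start = time.perf_counter()  # monotonic, unaffected by system clock changes
    try:
        result = execute_tool(tool_name, args)
        TOOL_CALL_COUNTER.labels(agent=agent_name, tool_name=tool_name, status="success").inc()
        return result
    except Exception:
        TOOL_CALL_COUNTER.labels(agent=agent_name, tool_name=tool_name, status="error").inc()
        raise
    finally:
        # Runs on both success and failure, so errors are timed too.
        TOOL_LATENCY.labels(agent=agent_name, tool_name=tool_name).observe(time.perf_counter() - start)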
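The token counter and the LLM latency histogram defined earlier can be fed in the same way. This is a rough sketch, assuming the LLM client reports token counts with each response; fake_llm, its response keys, and monitored_llm_call are hypothetical names, and the private CollectorRegistry exists only so the snippet runs on its own.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram

# Private registry (an assumption for this sketch) to avoid duplicate
# registration with metrics of the same name in the default registry.
registry = CollectorRegistry()

LLM_LATENCY = Histogram(
    "agent_llm_latency_seconds",
    "LLM call latency in seconds",
    ["agent"],
    buckets=(0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0),
    registry=registry,
)
TOKEN_USAGE = Counter(
    "agent_token_usage_total",
    "Token usage by agent",
    ["agent", "token_type"],
    registry=registry,
)


def fake_llm(prompt: str) -> dict:
    """Hypothetical stand-in for a real LLM client call."""
    return {
        "text": "ok",
        "prompt_tokens": len(prompt.split()),
        "completion_tokens": 1,
    }


def monitored_llm_call(agent_name: str, prompt: str, call_llm=fake_llm) -> dict:
    start = time.perf_counter()
    try:
        response = call_llm(prompt)
    finally:
        # Latency is recorded whether the call succeeds or raises.
        LLM_LATENCY.labels(agent=agent_name).observe(time.perf_counter() - start)
    # Token counts are only known once a response has come back.
    TOKEN_USAGE.labels(agent=agent_name, token_type="prompt").inc(
        response["prompt_tokens"]
    )
    TOKEN_USAGE.labels(agent=agent_name, token_type="completion").inc(
        response["completion_tokens"]
    )
    return response


monitored_llm_call("planner", "summarize the open tickets")
```

Counters accept an amount in inc(), so a whole response's tokens can be added in one call instead of looping.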

Start the Prometheus HTTP server. Expose metrics on a dedicated port.

from prometheus_client import start_http_server

start_http_server(8001)  # Metrics available at http://localhost:8001/metrics
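When several agent instances share a host, a hard-coded port is fragile. One possible approach is to read the port from the environment so each instance can be given its own. The sketch below is an assumption rather than part of the Prometheus client: metrics_port and the AGENT_METRICS_PORT variable are hypothetical names.

```python
import os


def metrics_port(default: int = 8001, env_var: str = "AGENT_METRICS_PORT") -> int:
    """Return the metrics port from the environment, or the default if unset."""
    raw = os.environ.get(env_var)
    if raw is None:
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"{env_var}={raw} is not a valid TCP port")
    return port


print(metrics_port())
```

The result would then be passed to the server start call, for example start_http_server(metrics_port()).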

Add business-level metrics. Track task completion rate and quality.

TASK_COMPLETION = Counter(
    "agent_tasks_total",
    "Tasks by completion status",
    ["agent", "status"]  # success, failure, timeout
)
TASK_DURATION = Histogram(
    "agent_task_duration_seconds",
    "End-to-end task duration",
    ["agent"]
)
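These two metrics still need to be recorded around each task. A context manager is one way to do that, sketched here under the assumption that timeouts surface as TimeoutError; track_task is a hypothetical helper, and the private CollectorRegistry is there only so the snippet runs on its own.

```python
import time
from contextlib import contextmanager

from prometheus_client import CollectorRegistry, Counter, Histogram

# Private registry (an assumption for this sketch) to avoid duplicate
# registration with metrics of the same name in the default registry.
registry = CollectorRegistry()

TASK_COMPLETION = Counter(
    "agent_tasks_total",
    "Tasks by completion status",
    ["agent", "status"],
    registry=registry,
)
TASK_DURATION = Histogram(
    "agent_task_duration_seconds",
    "End-to-end task duration",
    ["agent"],
    registry=registry,
)


@contextmanager
def track_task(agent_name: str):
    """Count the task under success, failure, or timeout and time it."""
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except TimeoutError:
        status = "timeout"
        raise
    except Exception:
        status = "failure"
        raise
    finally:
        TASK_COMPLETION.labels(agent=agent_name, status=status).inc()
        TASK_DURATION.labels(agent=agent_name).observe(time.perf_counter() - start)


with track_task("planner"):
    pass  # the agent's task would run here

try:
    with track_task("planner"):
        raise TimeoutError("tool did not respond")
except TimeoutError:
    pass  # the timeout is still counted before the exception propagates
```

The status label stays limited to three fixed values, which keeps the number of series per agent bounded.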

Query with PromQL. Example queries for monitoring.

# Top 5 most-called tools
topk(5, sum(rate(agent_tool_calls_total[5m])) by (tool_name))

# Error rate per agent
sum(rate(agent_tool_calls_total{status="error"}[5m])) by (agent)
/ sum(rate(agent_tool_calls_total[5m])) by (agent)

# P95 LLM latency
histogram_quantile(0.95, sum(rate(agent_llm_latency_seconds_bucket[5m])) by (le))

Verification

python -c "
from prometheus_client import Counter, start_http_server
import time, urllib.request
c = Counter('test_total', 'Test', ['label'])
c.labels(label='test').inc()
start_http_server(8002)  # serves /metrics from a background daemon thread
time.sleep(0.5)
body = urllib.request.urlopen('http://localhost:8002/metrics').read().decode()
print('test_total' in body)
# Expected: True
"
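If binding a port is inconvenient, for example in a unit test, the same check can be done in process. This sketch assumes a private CollectorRegistry and reads the sample back directly; the agent and tool names are made up for illustration.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# Private registry so the check does not depend on, or pollute, global state.
registry = CollectorRegistry()

calls = Counter(
    "agent_tool_calls_total",
    "Total tool calls by tool name",
    ["agent", "tool_name", "status"],
    registry=registry,
)
calls.labels(agent="planner", tool_name="search", status="success").inc()

# Text exposition format, i.e. the same payload the /metrics endpoint serves.
exposition = generate_latest(registry).decode("utf-8")

# Direct lookup of a single labelled sample; None would mean it is missing.
sample = registry.get_sample_value(
    "agent_tool_calls_total",
    {"agent": "planner", "tool_name": "search", "status": "success"},
)
print(sample)
```

get_sample_value returns None when no sample matches, so a typo in a label name shows up immediately.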

Common failures

Metric cardinality explosion. A label with unbounded values, such as session_id, creates a new time series for every value. Stick to labels with a small, fixed set of values, such as agent and tool_name.
Port conflicts. start_http_server raises an error if the port is already bound, so multiple agent instances on one host cannot share a Prometheus port. Give each instance its own port or push metrics to a Pushgateway.
Histogram bucket sizing wrong. The client's default buckets top out at 10s, and the buckets defined above at 30s, so slower calls all land in the +Inf bucket. Extend the buckets to 60s for LLM calls against slow local models.
Version mismatch. The installed package or runtime differs from the command shown. Check the version first, then rerun the smallest verification command.
Local environment drift. Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.
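For the cardinality problem, one possible guard is to map label values onto a fixed allowlist before they reach a metric. A minimal sketch, with KNOWN_TOOLS and bounded_tool_label as hypothetical names and the tool names invented for illustration:

```python
# Hypothetical allowlist: the only tool names allowed to appear as label values.
KNOWN_TOOLS = frozenset({"search", "calculator", "code_runner"})


def bounded_tool_label(tool_name: str) -> str:
    """Map any tool name onto a fixed label set so the series count stays bounded."""
    normalized = tool_name.strip().lower()
    return normalized if normalized in KNOWN_TOOLS else "other"


print(bounded_tool_label(" Search "))
print(bounded_tool_label("session-8f3a"))
```

Anything unexpected collapses into a single "other" series, so a bug that leaks IDs into the tool name cannot create unlimited series.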

Related guides

How to Set Up Agent Observability and Tracing
How to Implement Logging for Agent Debugging