RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
HOW-TO · RAG

How to Monitor Agent Performance Metrics

intermediate·20 min·By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

Agent system running, Prometheus or similar, Python 3.10+

What this does

Performance monitoring tracks latency, token usage, error rates, and tool call frequency per agent. The resulting metrics feed dashboards used for capacity planning and for finding bottlenecks.

Steps

  • Install Prometheus client library.
pip install prometheus-client
  • Define counters, histograms, and a gauge. Each signal gets its own Prometheus metric.
from prometheus_client import Counter, Histogram, Gauge, start_http_server

TOOL_CALL_COUNTER = Counter(
    "agent_tool_calls_total",
    "Total tool calls by tool name",
    ["agent", "tool_name", "status"]
)

LLM_LATENCY = Histogram(
    "agent_llm_latency_seconds",
    "LLM call latency in seconds",
    ["agent"],
    buckets=(0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0)
)

TOKEN_USAGE = Counter(
    "agent_token_usage_total",
    "Token usage by agent",
    ["agent", "token_type"]  # prompt, completion
)

ACTIVE_AGENTS = Gauge(
    "agent_active_count",
    "Number of currently active agents",
    ["agent_type"]
)
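The ACTIVE_AGENTS gauge is defined above but the remaining steps only exercise the counters and histograms. A minimal sketch of one way to drive it, assuming each agent run is a single function call: `run_agent` and `sample_task` are hypothetical names, and the private `CollectorRegistry` is only there so the snippet runs on its own without clashing with metrics registered elsewhere.

```python
from typing import Callable

from prometheus_client import CollectorRegistry, Gauge

# Private registry so this sketch does not collide with the default one.
registry = CollectorRegistry()

ACTIVE_AGENTS = Gauge(
    "agent_active_count",
    "Number of currently active agents",
    ["agent_type"],
    registry=registry,
)


def run_agent(agent_type: str, task: Callable[[], object]) -> object:
    """Run a task while counting it as an active agent (hypothetical helper)."""
    # track_inprogress() increments on entry and decrements on exit,
    # including when the task raises.
    with ACTIVE_AGENTS.labels(agent_type=agent_type).track_inprogress():
        return task()


def sample_task() -> float:
    # Read the gauge from inside the run to show it is currently 1.
    return registry.get_sample_value("agent_active_count", {"agent_type": "planner"})


inside = run_agent("planner", sample_task)
after = registry.get_sample_value("agent_active_count", {"agent_type": "planner"})
print(inside, after)  # 1.0 0.0
```

Using the context manager rather than paired inc() and dec() calls avoids a gauge that drifts upward when an agent crashes mid-run.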
  • Instrument the agent loop. Record metrics around each operation.
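import time

from prometheus_client import Histogram

# Tool latency gets its own histogram so it does not skew the LLM latency metric.
TOOL_LATENCY = Histogram(
    "agent_tool_latency_seconds",
    "Tool call latency in seconds",
    ["agent", "tool_name"],
    buckets=(0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0)
)

def monitored_tool_call(agent_name: str, tool_name: str, args: dict):
    # execute_tool is the agent's own tool dispatcher, defined elsewhere.
    start = time.perf_counter()  # monotonic clock, safe for measuring durations
    try:
        result = execute_tool(tool_name, args)
        TOOL_CALL_COUNTER.labels(agent=agent_name, tool_name=tool_name, status="success").inc()
        return result
    except Exception:
        TOOL_CALL_COUNTER.labels(agent=agent_name, tool_name=tool_name, status="error").inc()
        raise
    finally:
        TOOL_LATENCY.labels(agent=agent_name, tool_name=tool_name).observe(time.perf_counter() - start)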
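LLM_LATENCY and TOKEN_USAGE from the previous step can be fed by a similar wrapper around the model call itself. The following is a sketch under stated assumptions, not a fixed API: `call_llm` is a hypothetical callable that returns a dict with `prompt_tokens` and `completion_tokens` keys, and `fake_llm` is a stand-in so the snippet runs without a model server. Map the keys to whatever your client actually returns.

```python
import time
from typing import Callable

from prometheus_client import CollectorRegistry, Counter, Histogram

# Private registry so this sketch runs standalone.
registry = CollectorRegistry()

LLM_LATENCY = Histogram(
    "agent_llm_latency_seconds",
    "LLM call latency in seconds",
    ["agent"],
    buckets=(0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0),
    registry=registry,
)
TOKEN_USAGE = Counter(
    "agent_token_usage_total",
    "Token usage by agent",
    ["agent", "token_type"],
    registry=registry,
)


def monitored_llm_call(agent_name: str, call_llm: Callable[[str], dict], prompt: str) -> dict:
    """Time one LLM call and record its token counts.

    call_llm is hypothetical: any callable returning a dict with
    "prompt_tokens" and "completion_tokens" keys.
    """
    start = time.perf_counter()
    try:
        response = call_llm(prompt)
    finally:
        # Latency is recorded for failed calls too.
        LLM_LATENCY.labels(agent=agent_name).observe(time.perf_counter() - start)
    TOKEN_USAGE.labels(agent=agent_name, token_type="prompt").inc(
        response.get("prompt_tokens", 0)
    )
    TOKEN_USAGE.labels(agent=agent_name, token_type="completion").inc(
        response.get("completion_tokens", 0)
    )
    return response


def fake_llm(prompt: str) -> dict:
    # Stand-in model: counts whitespace-separated words as "tokens".
    return {"text": "ok", "prompt_tokens": len(prompt.split()), "completion_tokens": 1}


monitored_llm_call("planner", fake_llm, "summarize the last tool result")
```

Token counts are only recorded when the call succeeds, because a failed call has no response to read them from.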
  • Start the Prometheus HTTP server. Expose metrics on a dedicated port.
from prometheus_client import start_http_server

start_http_server(8001)  # Metrics available at http://localhost:8001/metrics
  • Add business-level metrics. Track task completion rate and end-to-end duration.
TASK_COMPLETION = Counter(
    "agent_tasks_total",
    "Tasks by completion status",
    ["agent", "status"]  # success, failure, timeout
)
TASK_DURATION = Histogram(
    "agent_task_duration_seconds",
    "End-to-end task duration",
    ["agent"]
)
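One way to record both task metrics from a single place is a context manager around the task body. This is a sketch: `track_task` is a hypothetical helper, the mapping of `TimeoutError` to the "timeout" status is an assumption about how your agent signals timeouts, and the explicit buckets are an assumption as well (the definition above uses the library defaults, which stop at 10 seconds).

```python
import time
from contextlib import contextmanager

from prometheus_client import CollectorRegistry, Counter, Histogram

# Private registry so this sketch runs standalone.
registry = CollectorRegistry()

TASK_COMPLETION = Counter(
    "agent_tasks_total",
    "Tasks by completion status",
    ["agent", "status"],  # success, failure, timeout
    registry=registry,
)
TASK_DURATION = Histogram(
    "agent_task_duration_seconds",
    "End-to-end task duration",
    ["agent"],
    buckets=(1, 5, 15, 30, 60, 120, 300),  # assumed range for multi-step tasks
    registry=registry,
)


@contextmanager
def track_task(agent_name: str):
    """Record status and duration for one task (hypothetical helper)."""
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except TimeoutError:
        status = "timeout"
        raise
    except Exception:
        status = "failure"
        raise
    finally:
        TASK_COMPLETION.labels(agent=agent_name, status=status).inc()
        TASK_DURATION.labels(agent=agent_name).observe(time.perf_counter() - start)


# One successful task and one failing task.
with track_task("researcher"):
    pass

try:
    with track_task("researcher"):
        raise ValueError("bad tool output")
except ValueError:
    pass
```

The exception is re-raised after the status is set, so the caller's own error handling is unaffected.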
  • Query with PromQL. Example queries for monitoring.
# Top 5 most-called tools
topk(5, sum(rate(agent_tool_calls_total[5m])) by (tool_name))

# Error rate per agent
sum(rate(agent_tool_calls_total{status="error"}[5m])) by (agent)
/ sum(rate(agent_tool_calls_total[5m])) by (agent)

# P95 LLM latency
histogram_quantile(0.95, sum(rate(agent_llm_latency_seconds_bucket[5m])) by (le))
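The same queries can be run from a script through the Prometheus HTTP API instead of the web UI. A minimal sketch, assuming a Prometheus server on its default address `http://localhost:9090` that already scrapes the agent; `build_query_url` and `run_instant_query` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: default Prometheus address

# The per-agent error rate query from the step above.
ERROR_RATE = (
    'sum(rate(agent_tool_calls_total{status="error"}[5m])) by (agent) '
    "/ sum(rate(agent_tool_calls_total[5m])) by (agent)"
)


def build_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def run_instant_query(promql: str, base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> list:
    """Run one instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(promql, base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]["result"]


# Usage, with a Prometheus server running:
#   for series in run_instant_query(ERROR_RATE):
#       print(series["metric"], series["value"])
```

Each returned series carries its labels under "metric" and a [timestamp, value] pair under "value", with the value encoded as a string.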

Verification

python -c "
from prometheus_client import Counter, start_http_server
import time, urllib.request
c = Counter('test_total', 'Test', ['label'])
c.labels(label='test').inc()
start_http_server(8002)  # serves from its own daemon thread
time.sleep(0.5)
body = urllib.request.urlopen('http://localhost:8002/metrics').read().decode()
print('test_total' in body)
# Expected: True
"

Common failures

  • Metric cardinality explosion. A label with unbounded values, such as session_id, creates a new time series for every value and grows memory without limit. Stick to labels with a small, fixed set of values, such as agent and tool_name.
  • Port conflicts. start_http_server raises OSError when the port is already bound, so a second agent instance on the same host fails at startup if it reuses the port. Use a unique port per instance, or push metrics to a Pushgateway.
  • Histogram buckets too small. The client library's default buckets top out at 10 s, and the LLM_LATENCY buckets in this guide stop at 30 s, so slower calls all land in the +Inf bucket and high quantiles become unreliable. Extend the buckets to 60 s for slow local models.
  • Version mismatch. The installed package or runtime differs from the one the command was written for. Check the version first, then rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is being picked up. Print the active binary path and configuration before changing the guide steps.
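The first two failures above can be guarded against in code. The sketch below is one possible approach under stated assumptions: `KNOWN_TOOLS`, `bounded_tool_label`, and `start_metrics_server` are hypothetical names, and the allowlist contents are placeholders for your own tool names.

```python
from prometheus_client import CollectorRegistry, Counter, start_http_server

# Private registry so this sketch runs standalone.
registry = CollectorRegistry()

TOOL_CALLS = Counter(
    "agent_tool_calls_total",
    "Total tool calls by tool name",
    ["agent", "tool_name", "status"],
    registry=registry,
)

# Hypothetical allowlist: replace with the tools your agents actually expose.
KNOWN_TOOLS = frozenset({"search", "calculator", "file_read"})


def bounded_tool_label(tool_name: str) -> str:
    """Collapse unknown tool names into one value to cap label cardinality."""
    return tool_name if tool_name in KNOWN_TOOLS else "other"


def start_metrics_server(base_port: int = 8001, attempts: int = 10) -> int:
    """Bind the first free port at or above base_port and return it."""
    for port in range(base_port, base_port + attempts):
        try:
            start_http_server(port, registry=registry)
            return port
        except OSError:
            # Port already in use (for example by another agent instance).
            continue
    raise RuntimeError(f"no free metrics port in {base_port}..{base_port + attempts - 1}")


TOOL_CALLS.labels(agent="planner", tool_name=bounded_tool_label("search"), status="success").inc()
# A dynamically generated name is folded into "other" instead of a new series.
TOOL_CALLS.labels(agent="planner", tool_name=bounded_tool_label("session-9f3a-tool"), status="error").inc()
```

With port fallback, the Prometheus scrape configuration has to list whichever port each instance ended up on, so log the return value of `start_metrics_server` at startup.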

Related guides

  • How to Set Up Agent Observability and Tracing
  • How to Implement Logging for Agent Debugging