RUNLOCALAI · v38

How to set up agent observability with OpenTelemetry

advanced · 25 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

OpenTelemetry SDK, AI agent codebase

What this does

Setting up agent observability with OpenTelemetry provides distributed tracing, metrics collection, and structured logging for AI agent workflows. The instrumentation captures each tool call, reasoning step, and external API request as spans in a trace, enabling root-cause analysis of slow or failing agent runs. The collected data flows to backends like Jaeger, Grafana, or an OTLP-compatible service for visualization and alerting.

Steps

Initialize the OpenTelemetry SDK once at application startup, in a dedicated tracing.py module:

  1. Import TracerProvider, Resource, BatchSpanProcessor, and the OTLP span exporter.
  2. Set the service name: resource = Resource(attributes={"service.name": "ai-agent"}).
  3. Register the provider with trace.set_tracer_provider(TracerProvider(resource=resource)), then add a BatchSpanProcessor whose exporter sends to http://localhost:4317. Port 4317 is the default for OTLP over gRPC; OTLP over HTTP normally uses port 4318.
  4. Create a tracer instance: tracer = trace.get_tracer(__name__).
  5. Instrument the agent's main loop by wrapping each step in a span: with tracer.start_as_current_span("agent.step") as span: span.set_attribute("step", step_count); result = agent.execute().
  6. Within each tool call, create a child span: with tracer.start_as_current_span(f"tool.{tool_name}") as tool_span: tool_span.set_attribute("args", str(args)); output = tool.run().
  7. Add error recording: call span.record_exception(e) in except blocks and set the span status to error so failed steps stand out in the backend.
  8. Enable metrics by creating a MeterProvider and registering a counter for token usage and a histogram for step latency.
  9. For log correlation, inject trace context into log records with LoggingInstrumentor().instrument(), which ships in the separate opentelemetry-instrumentation-logging package.

  • Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.

  • Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.

  • Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.

  • Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
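
One way to capture the run evidence described in the list above is a small standard-library helper like the one below. The function name run_evidence and the shape of the returned dictionary are assumptions made for illustration; the package names passed in are whatever the local setup actually uses.

```python
import json
import platform
import sys
from importlib import metadata


def run_evidence(packages: list) -> dict:
    """Collect interpreter, platform, and package versions for a reproducible record."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Package names here are examples; adjust them to the local installation.
    evidence = run_evidence(["opentelemetry-sdk", "opentelemetry-api"])
    print(json.dumps(evidence, indent=2))
```

Saving this output next to the exact command and the observed result gives a later reader enough to spot version mismatches or environment drift.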

Verification

Send a test task to the agent and check the observability backend UI: a trace should appear with a root span and nested child spans for each tool call. Verify that span attributes are populated by inspecting a tool call span and confirming that args, duration, and status are present. Check that intentional errors appear as exception events on the relevant spans. Run the agent 10 times and confirm that the token counter has grown with each run and that the step-latency histogram holds observations from all 10 runs, so the backend can show latency percentiles. Verify that log lines in the console include trace_id and span_id fields.

Common failures

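  • OTLP exporter cannot reach the backend - Confirm that the endpoint and port match the exporter protocol. Port 4317 is the default for OTLP over gRPC and will not answer a plain HTTP request, so curl against it is not a useful test. If the collector exposes OTLP over HTTP, try curl http://localhost:4318/v1/traces; any HTTP response, even an error status, shows the port is reachable. Then check firewall rules.
  • Spans not appearing - BatchSpanProcessor exports in the background, so ensure it flushes before the process exits by calling trace.get_tracer_provider().shutdown(). This matters most for short-lived scripts and for workers that are terminated rather than exiting normally.
  • High memory usage from span buffering - Reduce max_export_batch_size from its default of 512 to 128 in the processor configuration, and consider lowering max_queue_size (default 2048), which bounds how many spans are held in memory.
  • Missing traces for async agent loops - Use instrumentation that supports asyncio, and make sure the active span context is carried into work that runs outside the current task, such as thread pools or manually scheduled callbacks.
  • Duplicate instrumentation - Check that only one TracerProvider is initialized, for example by wrapping the setup in an if not hasattr(sys.modules[__name__], '_otel_initialized') guard.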
  • Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
  • Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.

Related guides

  • monitor-agent-token-usage-cost
  • deploy-ai-kubernetes-gpu-nodes
  • build-multi-agent-supervisor-workflow