
How to instrument an AI agent with the OpenTelemetry SDK for automatic trace collection

Advanced · 25 min · By Fredoline Eruo

Target environment

Ubuntu 24.04 · Ollama 0.4.x

Prerequisites

Python AI agent, OpenTelemetry SDK installed

What this does

This guide sets up automatic trace collection for an AI agent application using the OpenTelemetry Python SDK. Each agent decision step, tool invocation, and LLM API call becomes a span within a distributed trace, giving end-to-end visibility into agent workflows. The setup combines the opentelemetry-instrument auto-instrumentation command with manually created spans for custom agent logic.

Steps

  1. Install the OpenTelemetry distribution, then install instrumentation libraries for the packages detected in the environment:

    pip install opentelemetry-distro opentelemetry-exporter-otlp
    opentelemetry-bootstrap -a install
    

    Expected output: pip installation logs for the instrumentation libraries that match the packages detected in the environment (for example Flask or requests, if they are installed).

  2. Configure the OTLP exporter via environment variables:

    export OTEL_SERVICE_NAME="ai-agent-orchestrator"
    export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
    export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production"
    

    No output on success; verify with echo $OTEL_SERVICE_NAME.

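  3. Add a custom tracer provider and import it from the agent's entry point. This explicit setup is needed only when the agent runs without opentelemetry-instrument; under that wrapper a provider is typically already configured from the environment variables in step 2, and an attempt to set a second one is ignored with a warning. Create tracing.py with: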

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    # Identify this service in the tracing backend.
    resource = Resource.create({"service.name": "ai-agent-orchestrator"})
    provider = TracerProvider(resource=resource)

    # Export spans over gRPC to the local collector, in batches.
    exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
    provider.add_span_processor(BatchSpanProcessor(exporter))

    # Register the provider globally so trace.get_tracer() uses it.
    trace.set_tracer_provider(provider)

  4. Wrap the agent decision loop with manual spans. Import the tracer and create spans for each reasoning step:

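    from opentelemetry import trace

    tracer = trace.get_tracer(__name__)

    # step_id, agent, and context come from the surrounding decision loop.
    with tracer.start_as_current_span("agent-reasoning-step") as span:
        span.set_attribute("agent.step_id", step_id)
        result = agent.reason(context)
        span.set_attribute("agent.tool_selected", result.tool_name)

  5. Instrument LLM API calls made over HTTP. No code changes are needed: when the agent is launched through opentelemetry-instrument (step 6), the requests instrumentation installed in step 1 intercepts calls made with the requests library and creates a client span for each one. Clients built on a different HTTP library need the matching instrumentation package.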

  6. Run the agent with instrumentation enabled:

    opentelemetry-instrument python agent_main.py
    

    Expected output: the agent runs as usual. If the collector is configured to log received spans, its logs show spans with names like agent-reasoning-step alongside HTTP client spans for POST requests.

  7. Verify spans in the tracing backend. Open the Jaeger UI at http://localhost:16686 and search for ai-agent-orchestrator. Confirm traces show nested spans for the full decision loop.

Verification

curl -s "http://localhost:16686/api/traces?service=ai-agent-orchestrator" | jq '.data[0].spans | length'

Expected output: an integer greater than 0, confirming traces were received.
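The same check can be scripted in Python with only the standard library. This is a sketch that assumes the response has the shape the jq filter above relies on (a top-level data list whose items each carry a spans list); the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_traces_url(base_url, service):
    """Build the Jaeger query URL used by the curl command above."""
    query = urllib.parse.urlencode({"service": service})
    return f"{base_url.rstrip('/')}/api/traces?{query}"


def count_spans(payload):
    """Return the number of spans in each trace of a decoded response body."""
    traces = payload.get("data") or []
    return [len(t.get("spans", [])) for t in traces]


def fetch_span_counts(base_url, service, timeout=5):
    """Query Jaeger and return per-trace span counts (performs a network call)."""
    url = build_traces_url(base_url, service)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return count_spans(json.load(resp))


url = build_traces_url("http://localhost:16686", "ai-agent-orchestrator")
sample = {"data": [{"spans": [{}, {}, {}]}, {"spans": [{}]}]}
counts = count_spans(sample)
```

Calling fetch_span_counts("http://localhost:16686", "ai-agent-orchestrator") against a running Jaeger instance returns one count per trace; an empty list means no traces were found for that service.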

Common failures

  • OTLP endpoint unreachable: ensure the collector is running with docker ps | grep opentelemetry-collector. If it is absent, start it with docker run -p 4317:4317 otel/opentelemetry-collector-contrib.
  • No spans appearing: verify that OTEL_SERVICE_NAME is set and that the exporter protocol matches the endpoint (gRPC vs HTTP). To switch to HTTP, set OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf and point OTEL_EXPORTER_OTLP_ENDPOINT at the collector's HTTP port (4318 by default).
  • Agent startup fails with import errors: ensure the OpenTelemetry packages are installed in the same Python environment that runs the agent and that their versions are compatible with each other. Run pip list | grep opentelemetry to confirm all packages are installed.
  • Duplicate spans or missing parent-child linkage: ensure the tracer context is propagated correctly, and avoid creating multiple TracerProvider instances.
  • High latency overhead: if spans are being exported synchronously, switch from SimpleSpanProcessor to BatchSpanProcessor.
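The first two failures can be narrowed down with a short standard-library check. This is a sketch, not part of OpenTelemetry: the helper names are hypothetical, and the port table reflects the conventional OTLP defaults (4317 for gRPC, 4318 for HTTP), which a given collector configuration may override.

```python
import socket
from urllib.parse import urlparse

# Conventional OTLP receiver ports; a collector may be configured differently.
DEFAULT_OTLP_PORTS = {"grpc": 4317, "http/protobuf": 4318}


def otlp_host_port(endpoint):
    """Split an endpoint URL such as http://localhost:4317 into (host, port)."""
    parsed = urlparse(endpoint)
    fallback = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or fallback


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def protocol_port_hint(protocol, port):
    """Return a warning string when the port looks wrong for the protocol, else None."""
    expected = DEFAULT_OTLP_PORTS.get(protocol)
    if expected is not None and port != expected:
        return f"{protocol} usually listens on {expected}, but the endpoint uses {port}"
    return None


host, port = otlp_host_port("http://localhost:4317")
hint = protocol_port_hint("http/protobuf", port)
```

In practice the endpoint and protocol would be read from OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_PROTOCOL, and is_reachable(host, port) would be called before starting the agent. A successful TCP connection only shows that something is listening, not that it speaks OTLP.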
