How to instrument an AI agent with the OpenTelemetry SDK for automatic trace collection
Python AI agent, OpenTelemetry SDK installed
What this does
This guide sets up automatic trace collection for an AI agent application using the OpenTelemetry Python SDK. Each agent decision step, tool invocation, and LLM API call becomes a span within a distributed trace, enabling end-to-end visibility into agent workflows. The instrumentation uses the opentelemetry-instrument auto-instrumentation agent alongside manual span creation for custom agent logic.
Steps
1. Install the OpenTelemetry distribution and instrument the runtime:

       pip install opentelemetry-distro opentelemetry-exporter-otlp
       opentelemetry-bootstrap -a install

   Expected output: list of installed instrumentation libraries for Flask, requests, and other detected packages.

2. Configure the OTLP exporter via environment variables:

       export OTEL_SERVICE_NAME="ai-agent-orchestrator"
       export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
       export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production"

   No output on success; verify with `echo $OTEL_SERVICE_NAME`.

3. Add a custom tracer provider in the agent's entry point. Create `tracing.py` with:

       from opentelemetry import trace
       from opentelemetry.sdk.trace import TracerProvider
       from opentelemetry.sdk.resources import Resource
       from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
       from opentelemetry.sdk.trace.export import BatchSpanProcessor

       # Identify this service in the tracing backend.
       resource = Resource.create({"service.name": "ai-agent-orchestrator"})
       provider = TracerProvider(resource=resource)

       # Export spans in batches over gRPC to the local collector.
       exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
       provider.add_span_processor(BatchSpanProcessor(exporter))
       trace.set_tracer_provider(provider)

4. Wrap the agent decision loop with manual spans. Import the tracer and create spans for each reasoning step:

       from opentelemetry import trace

       tracer = trace.get_tracer(__name__)

       # step_id, agent, and context come from the surrounding agent loop.
       with tracer.start_as_current_span("agent-reasoning-step") as span:
           span.set_attribute("agent.step_id", step_id)
           result = agent.reason(context)
           span.set_attribute("agent.tool_selected", result.tool_name)

5. Instrument LLM API calls at the HTTP client level. The auto-instrumentation library intercepts `requests` calls automatically. Use `opentelemetry-instrument python agent_main.py` as the startup command.

6. Run the agent with instrumentation enabled:

       opentelemetry-instrument python agent_main.py

   Expected output: trace export confirmation in collector logs showing spans with names like `agent-reasoning-step` and `HTTP POST`.

7. Verify spans in the tracing backend. Open the Jaeger UI at `http://localhost:16686` and search for `ai-agent-orchestrator`. Confirm traces show nested spans for the full decision loop.
Verification
curl -s "http://localhost:16686/api/traces?service=ai-agent-orchestrator" | jq '.data[0].spans | length'
Expected output: an integer greater than 0, confirming traces were received.
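The same check can be scripted. Below is a minimal sketch using only the standard library, assuming the response has the `data[].spans` shape that the `jq` filter relies on; the helper names `span_counts` and `fetch_traces` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def span_counts(payload: dict) -> list[int]:
    """Return the number of spans in each trace of a Jaeger query response."""
    return [len(trace.get("spans", [])) for trace in payload.get("data") or []]


def fetch_traces(service: str, base_url: str = "http://localhost:16686") -> dict:
    """Query the same /api/traces endpoint that the curl command uses."""
    query = urllib.parse.urlencode({"service": service})
    with urllib.request.urlopen(f"{base_url}/api/traces?{query}", timeout=5) as resp:
        return json.load(resp)


# Offline example with a hand-written payload in the shape the jq filter expects.
sample = {"data": [{"spans": [{}, {}, {}]}, {"spans": [{}]}]}
print(span_counts(sample))  # [3, 1]
```

Against a live backend, `span_counts(fetch_traces("ai-agent-orchestrator"))` should return a non-empty list once traces have been received.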
Common failures
- OTLP endpoint unreachable: ensure the collector is running with `docker ps | grep otel-collector`. If absent, start it with `docker run -p 4317:4317 otel/opentelemetry-collector-contrib`.
- No spans appearing: verify `OTEL_SERVICE_NAME` is set and the exporter endpoint uses the correct protocol (gRPC vs HTTP). Switch to HTTP by setting `OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf`. OTLP over HTTP conventionally uses port 4318 rather than 4317, so update the endpoint and the published collector port to match.
- Agent startup fails with import errors: ensure the installed `opentelemetry-distro` release supports the Python version in use. Run `pip list | grep opentelemetry` to confirm all packages are installed.
- Duplicate spans or missing parent-child linkage: ensure the tracer context is propagated correctly; avoid creating multiple TracerProvider instances.
- High latency overhead: switch from `SimpleSpanProcessor` to `BatchSpanProcessor` if spans are being exported synchronously.
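For the unreachable-endpoint case, a quick TCP probe can help separate network problems from exporter configuration problems. This is a sketch under stated assumptions: the `endpoint_reachable` helper is hypothetical, it expects a URL with a scheme, and a successful connection only shows that something is listening on the port, not that it speaks OTLP.

```python
import os
import socket
from urllib.parse import urlparse


def endpoint_reachable(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parsed = urlparse(endpoint)
    host = parsed.hostname or "localhost"
    # Assumption: fall back to the gRPC port used in this guide when none is given.
    port = parsed.port or 4317
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")
    print(endpoint, "reachable:", endpoint_reachable(endpoint))
```

If the probe succeeds but no spans arrive, the protocol mismatch described in the second bullet is a more likely cause than the network.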