How to implement distributed tracing for multi-agent workflows with trace context propagation
OpenTelemetry SDK, multi-agent system
What this does
This guide implements distributed tracing across multiple AI agent services that collaborate on a single workflow. When one agent delegates a subtask to another, the trace context (trace ID, span ID, and trace flags) propagates via HTTP headers or message queue metadata. The result is a single end-to-end trace showing every agent's contribution, including inter-agent latency and error attribution.
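To make the propagated fields concrete, here is a minimal, standard-library sketch of how the W3C traceparent header encodes the trace ID, span ID, and trace flags. The names TraceParent and parse_traceparent are hypothetical helpers for illustration, not part of OpenTelemetry, and the sketch only handles version 00 of the header.

```python
import re
from typing import NamedTuple, Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str  # 32 hex characters, shared by every span in the trace
    span_id: str   # 16 hex characters, the caller's span (the new parent)
    sampled: bool  # lowest bit of the trace flags


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the value is not valid version-00."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid according to the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return TraceParent(trace_id, span_id, bool(int(flags, 16) & 0x01))
```

When the orchestrator calls a worker, the worker's spans keep the same trace_id and record the received span_id as their parent, which is what stitches the agents into one trace.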
Steps
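Install the propagation library for the transport protocol. The W3C Trace Context propagator used throughout this guide is included in opentelemetry-api and works for both HTTP headers and message queue metadata. Install the B3 propagator only if some services must interoperate with B3-based tracing:

    pip install opentelemetry-propagator-b3

Configure the global propagator in every agent service's startup: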

    from opentelemetry.propagate import set_global_textmap
    from opentelemetry.propagators.composite import CompositePropagator
    from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

    # CompositePropagator replaces the deprecated CompositeHTTPPropagator alias.
    set_global_textmap(CompositePropagator([TraceContextTextMapPropagator()]))

In the orchestrator agent, create a parent span for the workflow and inject context into outbound HTTP calls:

    import requests
    from opentelemetry import propagate, trace

    tracer = trace.get_tracer(__name__)

    # `tasks` is the orchestrator's list of subtasks, defined elsewhere.
    with tracer.start_as_current_span("multi-agent-workflow") as workflow_span:
        workflow_span.set_attribute("workflow.subtask_count", len(tasks))
        for subtask in tasks:
            headers = {}
            propagate.inject(headers)  # writes traceparent for the current span
            response = requests.post("http://worker-agent:8000/execute",
                                     json={"task": subtask}, headers=headers)

In the worker agent, extract the propagated context and create child spans:
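
    from fastapi import FastAPI, Request  # assumes a FastAPI service
    from opentelemetry import propagate, trace

    app = FastAPI()
    tracer = trace.get_tracer(__name__)

    @app.post("/execute")
    async def execute(request: Request):
        # Rebuild the orchestrator's context from the incoming headers.
        ctx = propagate.extract(dict(request.headers))
        with tracer.start_as_current_span("worker-execute", context=ctx) as span:
            body = await request.json()  # request.json() is a coroutine
            result = await process_task(body["task"])
            span.set_attribute("worker.result_size", len(str(result)))
            return {"result": result}

For message-queue propagation, inject context into message metadata fields. With Redis Pub/Sub: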
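
    import json
    from opentelemetry import propagate

    carrier = {}
    propagate.inject(carrier)  # adds traceparent for the current span
    # redis_client is a connected redis.Redis instance, created elsewhere.
    redis_client.publish("agent-tasks", json.dumps({
        "task": task_data,
        "trace_context": carrier,
    }))

On the consumer side, extract the context from the message's trace_context carrier with propagate.extract and pass it as context= when starting the consumer span. This restores the parent-child span relationship.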
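Deploy all services with identical OTLP exporter configuration:

    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

    # Port 4317 is the OTLP gRPC endpoint; insecure=True disables TLS.
    exporter = OTLPSpanExporter(endpoint="http://jaeger-collector:4317", insecure=True)

The exporter only ships spans once it is registered on the service's TracerProvider, typically through a BatchSpanProcessor. Keep the endpoint identical across services, but give each service its own service.name so its spans are attributable in the tracing UI.

Expected: spans from both orchestrator and worker appear under one trace ID in the tracing UI.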
Verification
    curl -s "http://jaeger:16686/api/traces?service=agent-orchestrator&limit=1" | jq '.data[0].spans | length'
Expected output: an integer >= 2, confirming both orchestrator and worker spans exist in a single trace.
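A span count alone does not show which services contributed. The sketch below performs the same check in Python and also lists the service names behind the spans. It assumes the response shape implied by the jq command above (a data list whose entries have a spans list), plus the processes map and per-span processID that the Jaeger query API returns. summarize_trace and fetch_latest_trace are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def summarize_trace(payload: dict) -> tuple[int, set[str]]:
    """Return (span count, service names) for the first trace in the response."""
    traces = payload.get("data") or []
    if not traces:
        return 0, set()
    first = traces[0]
    spans = first.get("spans", [])
    processes = first.get("processes", {})
    # Each span points at a process entry, which carries the service name.
    services = {
        processes.get(span.get("processID"), {}).get("serviceName", "unknown")
        for span in spans
    }
    return len(spans), services


def fetch_latest_trace(base_url: str, service: str) -> dict:
    """Fetch the most recent trace for a service from the Jaeger query API."""
    query = urllib.parse.urlencode({"service": service, "limit": 1})
    with urllib.request.urlopen(f"{base_url}/api/traces?{query}", timeout=10) as resp:
        return json.load(resp)
```

Calling summarize_trace(fetch_latest_trace("http://jaeger:16686", "agent-orchestrator")) should then report at least two spans and more than one service name when propagation is working.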
Common failures
- Spans appear as separate traces: the propagated context is not being injected by the caller or not being extracted by the worker. Verify the traceparent header is present on the worker's incoming request:

      print(dict(request.headers).get("traceparent"))

- Incomplete traces (missing worker spans): the worker's span exporter is not configured or the OTLP endpoint is unreachable from the worker container. Check worker logs for exporter errors.
- Mismatched propagation formats: the orchestrator uses W3C TraceContext but the consumer expects B3 format. Standardize on W3C across all services by setting OTEL_PROPAGATORS=tracecontext.
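When diagnosing the first and third failures, it can help to log which propagation format actually arrived on a request. The sketch below is a standard-library helper that inspects header names only. The function name detect_propagation_formats and the labels it returns ("w3c", "b3-single", "b3-multi") are hypothetical; the header names themselves come from the W3C Trace Context and B3 specifications.

```python
from typing import Mapping


def detect_propagation_formats(headers: Mapping[str, str]) -> list[str]:
    """List the trace propagation formats present in a set of request headers."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    names = {name.lower() for name in headers}
    found = []
    if "traceparent" in names:
        found.append("w3c")
    if "b3" in names:
        found.append("b3-single")
    if {"x-b3-traceid", "x-b3-spanid"} <= names:
        found.append("b3-multi")
    return found
```

An empty result on the worker means no context was injected upstream. A result containing only B3 labels while the worker is configured for tracecontext points to the format mismatch described above.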