HOW-TO · RAG

How to Set Up Agent Observability and Tracing

Intermediate · 25 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

Agent system deployed, OpenTelemetry SDK, Python 3.10+

What this does

Observability captures traces of agent execution — every LLM call, tool invocation, and decision point — enabling debugging, performance analysis, and behavior monitoring across the agent's lifecycle.

Steps

  • Install OpenTelemetry packages.
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-http opentelemetry-instrumentation-langchain
  • Initialize a tracer provider. Set up the global tracer with a service name.
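from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, BatchSpanProcessor

# "agent-service" is a placeholder; replace it with your own service name.
resource = Resource.create({"service.name": "agent-service"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)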
  • Create spans around agent operations. Each step becomes a traceable span.
# llm and execute_tool are assumed to be defined elsewhere in your agent code.
def agent_step(task: str) -> str:
    with tracer.start_as_current_span("agent_step") as span:
        span.set_attribute("task.description", task)

        with tracer.start_as_current_span("llm_call") as llm_span:
            response = llm.invoke(task)
            llm_span.set_attribute("llm.response_length", len(response.content))

        with tracer.start_as_current_span("tool_call") as tool_span:
            result = execute_tool(...)
            tool_span.set_attribute("tool.success", "error" not in result)

        span.set_attribute("step.complete", True)
        return response.content
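  • Record tool call details as span attributes and events. Attributes hold the arguments and outcome; events mark timestamped points inside the span.
import time

def traced_tool_call(tool_name: str, args: dict):
    # TOOL_MAP is assumed to map tool names to callables.
    with tracer.start_as_current_span(f"tool:{tool_name}") as span:
        span.set_attribute("tool.args", str(args))
        span.add_event("tool.started")
        start = time.perf_counter()
        try:
            result = TOOL_MAP[tool_name](**args)
            span.set_attribute("tool.success", True)
            span.add_event("tool.finished")
            return result
        except Exception as e:
            span.set_attribute("tool.success", False)
            span.set_attribute("tool.error", str(e))
            # By default start_as_current_span records the exception and
            # sets ERROR status when it propagates, so just re-raise.
            raise
        finally:
            # Record the duration on both success and failure.
            span.set_attribute("tool.duration_ms", (time.perf_counter() - start) * 1000)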
  • Export traces to a backend. Send spans to Jaeger, Zipkin, or an OTLP collector.
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
  • Add LangChain auto-instrumentation.
from opentelemetry.instrumentation.langchain import LangchainInstrumentor

LangchainInstrumentor().instrument()
# All LangChain chains are now automatically traced

Verification

python -c "
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
trace.set_tracer_provider(TracerProvider())
t = trace.get_tracer('test')
with t.start_as_current_span('test') as s:
    s.set_attribute('test', True)
print('Tracing works')
"
# Expected: Tracing works

Common failures

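  • Missing context propagation. Spans appear disconnected because the parent context is not carried into worker threads. asyncio tasks copy the current context when they are created, but threads started with threading or ThreadPoolExecutor typically begin with an empty one. Capture the context with opentelemetry.context.get_current() in the parent and pass it as context= when starting the child span (or attach() it inside the worker). Calling trace.get_tracer(__name__) does not propagate context by itself.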
  • Exporter not configured. Without an exporter, spans are created but never sent. Always configure at least ConsoleSpanExporter for debugging.
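  • Sampling rate too low. The SDK default sampler is ParentBased(ALWAYS_ON), which records every root trace, so missing traces usually mean a sampler was set in code or through the OTEL_TRACES_SAMPLER environment variable (for example, TraceIdRatioBased(0.1) keeps roughly 10% of traces). During development, pass sampler=ALWAYS_ON from opentelemetry.sdk.trace.sampling to TracerProvider.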
  • Version mismatch. The installed package or runtime differs from the command shown. Check the version first, then rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.

Related guides

  • How to Monitor Agent Performance Metrics
  • How to Implement Logging for Agent Debugging