HOW-TO · RAG
How to Set Up Agent Observability and Tracing
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES
Agent system deployed, OpenTelemetry SDK, Python 3.10+
What this does
Observability captures traces of agent execution — every LLM call, tool invocation, and decision point — enabling debugging, performance analysis, and behavior monitoring across the agent's lifecycle.
Steps
- Install OpenTelemetry packages.
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-http opentelemetry-instrumentation-langchain
- Initialize a tracer provider. Set up the global tracer with a service name.
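from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, BatchSpanProcessor
# "agent-service" is a placeholder; replace it with your own service name
resource = Resource.create({"service.name": "agent-service"})
provider = TracerProvider(resource=resource)
# Batch spans and print them to stdout for local debugging
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)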
- Create spans around agent operations. Each step becomes a traceable span.
# llm and execute_tool are assumed to be defined elsewhere in the agent code
def agent_step(task: str) -> str:
    with tracer.start_as_current_span("agent_step") as span:
        span.set_attribute("task.description", task)
        # Child span: the LLM call
        with tracer.start_as_current_span("llm_call") as llm_span:
            response = llm.invoke(task)
            llm_span.set_attribute("llm.response_length", len(response.content))
        # Child span: the tool invocation
        with tracer.start_as_current_span("tool_call") as tool_span:
            result = execute_tool(...)
            tool_span.set_attribute("tool.success", "error" not in result)
        span.set_attribute("step.complete", True)
        return response.content
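- Record tool call details on the span. Attributes capture arguments, duration, and outcome; span events add timestamped points for finer-grained timing.
import time

def traced_tool_call(tool_name: str, args: dict):
    with tracer.start_as_current_span(f"tool:{tool_name}") as span:
        span.set_attribute("tool.args", str(args))
        start = time.time()
        try:
            # TOOL_MAP maps tool names to callables and is assumed to be defined elsewhere
            result = TOOL_MAP[tool_name](**args)
            span.set_attribute("tool.duration_ms", (time.time() - start) * 1000)
            span.set_attribute("tool.success", True)
            return result
        except Exception as e:
            span.set_attribute("tool.success", False)
            span.set_attribute("tool.error", str(e))
            # start_as_current_span records the exception and sets ERROR status
            # when it propagates, so no explicit record_exception call is needed
            raise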
- Export traces to a backend. Send spans to Jaeger, Zipkin, or an OTLP collector.
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
- Add LangChain auto-instrumentation.
from opentelemetry.instrumentation.langchain import LangchainInstrumentor
LangchainInstrumentor().instrument()
# All LangChain chains are now automatically traced
Verification
python -c "
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
trace.set_tracer_provider(TracerProvider())
t = trace.get_tracer('test')
with t.start_as_current_span('test') as s:
    s.set_attribute('test', True)
print('Tracing works')
"
# Expected: Tracing works
Common failures
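- Missing context propagation. Spans appear disconnected because the parent context is not passed between threads or async tasks. Capture the context in the parent with opentelemetry.context.get_current() and pass it to the worker, for example through the context argument of start_as_current_span.
- Exporter not configured. Without an exporter, spans are created but never sent. Always configure at least ConsoleSpanExporter for debugging.
- Sampling rate too low. The SDK's default sampler (parentbased_always_on) keeps every trace, but a ratio-based sampler set in code or through the OTEL_TRACES_SAMPLER environment variable can drop most of them. Use the ALWAYS_ON sampler from opentelemetry.sdk.trace.sampling during development.
- Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.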
- Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
Related guides
- How to Monitor Agent Performance Metrics
- How to Implement Logging for Agent Debugging