20. Logging and Monitoring

Chapter 20 of 22 · 20 min

KEY INSIGHT

Production MCP servers require structured logging and metrics collection: observability enables debugging, performance tuning, and reliability assurance.

Structured logging captures queryable fields:

```python
import time

import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)
logger = structlog.get_logger()


# `mcp` (the server instance) and `execute_query` are assumed to be defined elsewhere.
@mcp.tool()
async def database_query(sql: str) -> dict:
    log = logger.bind(tool="database_query")
    # Log only a short prefix of the statement, not the full query text.
    log.info("tool_invocation", sql_type=sql.strip().upper()[:20])
    start = time.perf_counter()
    try:
        result = await execute_query(sql)
        elapsed = time.perf_counter() - start
        log.info(
            "tool_completed",
            rows=len(result),
            elapsed_ms=round(elapsed * 1000, 2),
        )
        return {"rows": result}
    except Exception as e:
        log.error("tool_failed", error=str(e))
        raise
```

Prometheus metrics provide performance visibility:

```python
import time

from prometheus_client import Counter, Gauge, Histogram

REQUEST_COUNT = Counter(
    "mcp_requests_total", "Total MCP requests", ["tool", "status"]
)
REQUEST_DURATION = Histogram(
    "mcp_request_duration_seconds", "Request duration", ["tool"]
)
ACTIVE_REQUESTS = Gauge(
    "mcp_active_requests", "Currently processing requests"
)


# `heavy_operation` is assumed to be defined elsewhere.
@mcp.tool()
async def monitored_tool(data: str) -> str:
    # In-flight requests are tracked by the gauge; the counter records only
    # final outcomes, so each request is counted exactly once.
    ACTIVE_REQUESTS.inc()
    start = time.perf_counter()
    try:
        result = await heavy_operation(data)
        REQUEST_COUNT.labels(tool="monitored_tool", status="success").inc()
        return result
    except Exception:
        REQUEST_COUNT.labels(tool="monitored_tool", status="error").inc()
        raise
    finally:
        # Record duration for successes and failures alike.
        elapsed = time.perf_counter() - start
        REQUEST_DURATION.labels(tool="monitored_tool").observe(elapsed)
        ACTIVE_REQUESTS.dec()
```

Distributed tracing correlates requests:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


# `compute` is assumed to be defined elsewhere.
@mcp.tool()
async def traced_operation(id: str) -> dict:
    with tracer.start_as_current_span(
        "mcp.tool",
        attributes={"tool.name": "traced_operation", "tool.param.id": id},
    ) as span:
        span.add_event("Processing started")
        result = await compute(id)
        span.set_attribute("result.rows", len(result))
        span.add_event("Processing completed")
        return result
```

Health endpoints for monitoring systems:

```python
import time

from starlette.responses import JSONResponse
from starlette.routing import Route

START_TIME = time.time()  # recorded once at process start


async def health_check(request):
    # The check_* helpers are assumed to be defined elsewhere and to
    # return a truthy value when the dependency is healthy.
    checks = {
        "database": await check_database_health(),
        "disk_space": check_disk_space(),
        "memory": check_memory_usage(),
    }
    healthy = all(checks.values())
    status_code = 200 if healthy else 503
    return JSONResponse(
        {
            "status": "healthy" if healthy else "unhealthy",
            "checks": checks,
            "uptime_seconds": time.time() - START_TIME,
        },
        status_code=status_code,  # 503 lets probes detect an unhealthy server
    )


# The "/health" path is an example; use whatever your monitoring system probes.
routes = [Route("/health", health_check)]
```

Log aggregation requires consistent formatting. Ship logs to a central system:

```python
import logging
import logging.handlers

# JsonFormatter and SyslogFormatter are assumed to be logging.Formatter
# subclasses defined elsewhere.

# Configure structured logging with JSON
logger = logging.getLogger("mcp")
logger.setLevel(logging.INFO)

# JSON file handler for local debugging
json_handler = logging.handlers.RotatingFileHandler(
    "/var/log/mcp/server.json", maxBytes=10_000_000, backupCount=5
)
json_handler.setFormatter(JsonFormatter())
logger.addHandler(json_handler)

# Syslog handler for centralized collection ("/dev/log" is the Linux syslog socket)
syslog_handler = logging.handlers.SysLogHandler(address="/dev/log")
syslog_handler.setFormatter(SyslogFormatter())
logger.addHandler(syslog_handler)
```

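The `JsonFormatter` used in the log aggregation snippet is not defined in this chapter. A minimal sketch of one possible implementation, using only the standard library, might look like the following. The field names (`timestamp`, `level`, `logger`, `event`) and the `format_example` helper are illustrative assumptions, not a required schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # record.created is a Unix timestamp; emit it as ISO 8601 in UTC.
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Include the traceback when the record carries exception info.
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-serializable values from raising.
        return json.dumps(payload, default=str)


def format_example() -> dict:
    """Hypothetical helper: format one record and parse it back."""
    record = logging.LogRecord(
        name="mcp",
        level=logging.INFO,
        pathname="example.py",
        lineno=0,
        msg="tool_completed rows=%d",
        args=(3,),
        exc_info=None,
    )
    return json.loads(JsonFormatter().format(record))
```

Keeping one JSON object per line lets a log shipper split records without having to parse multi-line output.
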
Knowledge transfer checkpoint

Connect Logging and Monitoring back to the local-AI decision you are learning to make. The practical question is not only whether the code or concept works, but whether it still works when the model, runtime, hardware budget, privacy requirement, and latency target are real constraints.

Before moving on, write down four things: the local runtime or deployment surface involved, the memory or throughput constraint that could change the design, the verification signal that proves the lesson worked, and the failure mode you would check first if the result looked wrong. That turns this chapter from background knowledge into an operator habit.

A good answer should be specific enough that another reader could repeat the decision on their own machine. Name the model or component when there is one, record the relevant context or token budget, and prefer a measurable check over a vague statement such as "it seems faster" or "the setup is fine."
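
One way to turn "it seems faster" into a measurable check is to record latency percentiles. The sketch below uses only the standard library; `latency_percentiles` is a hypothetical helper, and the run count and choice of p50/p95 are assumptions you would adapt to your own latency target.

```python
import statistics
import time
from typing import Callable


def latency_percentiles(fn: Callable[[], object], runs: int = 50) -> dict:
    """Call `fn` repeatedly and return p50/p95 latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns the 99 cut points p1..p99; "inclusive"
    # interpolates within the observed range instead of extrapolating.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"runs": runs, "p50_ms": cuts[49], "p95_ms": cuts[94]}
```

Recording the result alongside the hardware and model details gives another reader a number to reproduce rather than an impression.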

EXERCISE

Instrument an existing MCP server with structured logging, Prometheus metrics, and a health endpoint. Generate traffic and verify metrics appear in your monitoring system.
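
For the verification step, one option is to scrape the server's metrics output and check that the expected metric names are present. The sketch below parses the Prometheus text exposition format with the standard library only. The helper names and the `SAMPLE` text are illustrative assumptions; in practice you would pass in the body returned by your own metrics endpoint.

```python
def metric_names(exposition: str) -> set:
    """Collect the sample names found in Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like `name{label="v"} 1.0` or `name 1.0`.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(exposition: str, expected: list) -> list:
    """Return the expected metric names that do not appear in the output."""
    found = metric_names(exposition)
    return [name for name in expected if name not in found]


# Illustrative scrape output, not captured from a real server.
SAMPLE = """\
# HELP mcp_requests_total Total MCP requests
# TYPE mcp_requests_total counter
mcp_requests_total{tool="monitored_tool",status="success"} 3.0
# HELP mcp_active_requests Currently processing requests
# TYPE mcp_active_requests gauge
mcp_active_requests 0.0
"""
```

Histograms are exposed as several series with `_bucket`, `_sum`, and `_count` suffixes, so check for a name such as `mcp_request_duration_seconds_count` rather than the bare histogram name.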