What this does

This guide configures Prometheus alerting rules that detect AI service degradation across four dimensions: increased error rate, elevated latency, token cost anomalies, and model availability drops. Each alert is tuned for AI workloads — for example, latency alerts use percentile thresholds rather than averages, and error alerts distinguish between client errors (4xx) and server errors (5xx). All alerts route through Alertmanager to the appropriate notification channel.

Steps

Create an alert rules file at /etc/prometheus/rules/ai-degradation.yml:
```
groups:
  - name: ai-service-degradation
```

Add a rule for elevated error rate. This fires when more than 5% of requests fail over 5 minutes:

    - alert: AIHighErrorRate
      expr: |
        sum(rate(ai_requests_total{status="error"}[5m]))
        / sum(rate(ai_requests_total[5m])) > 0.05
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "AI service error rate above 5%"
        description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes."

Add latency degradation alerts using histogram quantiles:

    - alert: AIHighP99Latency
      expr: |
        histogram_quantile(0.99,
          rate(ai_request_duration_seconds_bucket[5m])
        ) > 30
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "AI p99 latency exceeds 30 seconds"

Add token cost anomaly detection:

    - alert: AITokenCostSpike
      expr: |
        rate(ai_inference_cost_cents_total[1h])
        / rate(ai_inference_cost_cents_total[1h] offset 24h) > 3
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "AI token cost 3x higher than same hour yesterday"

Add model unavailability detection:

    - alert: AIModelDown
      expr: up{job="ai-inference"} == 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "AI model endpoint is down"
        description: "Instance {{ $labels.instance }} has been down for 2 minutes."

Validate the rules file:
```
promtool check rules ai-degradation.yml
```
Expected output: SUCCESS: 4 rules found.
Load the rules into Prometheus by referencing them in prometheus.yml:
```
rule_files:
  - "/etc/prometheus/rules/ai-degradation.yml"
```
Reload Prometheus and verify alerts appear in the Alerts page at http://localhost:9090/alerts. Each rule should show as inactive (green).

Verification

curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[] | select(.name=="ai-service-degradation") | .rules | length'

Expected output: 4.

Common failures

Alert fires continuously after deploy — the for duration is too short. Increase to 10-15 minutes to avoid flapping from transient errors during rollout.
Latency alert never fires despite slow responses — the histogram buckets may not extend high enough. Add buckets at 60, 120, and 300 seconds.
Token cost comparison returns NaN — the 24h offset requires at least 25 hours of metric history. Disable the alert or use a shorter offset during initial deployment.
Alertmanager does not receive alerts — check the Alertmanager tab in Prometheus UI (Status > Runtime & Build Information) for alertmanagers discovered. Verify the URL in prometheus.yml under alerting > alertmanagers.

How to set up Prometheus alerting rules for AI service degradation

What this does

Steps

Verification

Common failures

Related guides