How to set up Prometheus alerting rules for AI service degradation
Prometheus configured, AI service metrics to monitor
What this does
This guide configures Prometheus alerting rules that detect AI service degradation across four dimensions: increased error rate, elevated latency, token cost anomalies, and model availability drops. Each alert is tuned for AI workloads — for example, latency alerts use percentile thresholds rather than averages, and error alerts distinguish between client errors (4xx) and server errors (5xx). All alerts route through Alertmanager to the appropriate notification channel.
Steps
Create an alert rules file at
/etc/prometheus/rules/ai-degradation.yml:groups: - name: ai-service-degradationAdd a rule for elevated error rate. This fires when more than 5% of requests fail over 5 minutes:
- alert: AIHighErrorRate expr: | sum(rate(ai_requests_total{status="error"}[5m])) / sum(rate(ai_requests_total[5m])) > 0.05 for: 5m labels: severity: critical annotations: summary: "AI service error rate above 5%" description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes."Add latency degradation alerts using histogram quantiles:
- alert: AIHighP99Latency expr: | histogram_quantile(0.99, rate(ai_request_duration_seconds_bucket[5m]) ) > 30 for: 10m labels: severity: warning annotations: summary: "AI p99 latency exceeds 30 seconds"Add token cost anomaly detection:
- alert: AITokenCostSpike expr: | rate(ai_inference_cost_cents_total[1h]) / rate(ai_inference_cost_cents_total[1h] offset 24h) > 3 for: 15m labels: severity: warning annotations: summary: "AI token cost 3x higher than same hour yesterday"Add model unavailability detection:
- alert: AIModelDown expr: up{job="ai-inference"} == 0 for: 2m labels: severity: critical annotations: summary: "AI model endpoint is down" description: "Instance {{ $labels.instance }} has been down for 2 minutes."Validate the rules file:
promtool check rules ai-degradation.ymlExpected output:
SUCCESS: 4 rules found.Load the rules into Prometheus by referencing them in
prometheus.yml:rule_files: - "/etc/prometheus/rules/ai-degradation.yml"Reload Prometheus and verify alerts appear in the Alerts page at
http://localhost:9090/alerts. Each rule should show as inactive (green).
Verification
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[] | select(.name=="ai-service-degradation") | .rules | length'
Expected output: 4.
Common failures
- Alert fires continuously after deploy — the
forduration is too short. Increase to 10-15 minutes to avoid flapping from transient errors during rollout. - Latency alert never fires despite slow responses — the histogram buckets may not extend high enough. Add buckets at 60, 120, and 300 seconds.
- Token cost comparison returns NaN — the 24h offset requires at least 25 hours of metric history. Disable the alert or use a shorter offset during initial deployment.
- Alertmanager does not receive alerts — check the Alertmanager tab in Prometheus UI (Status > Runtime & Build Information) for
alertmanagersdiscovered. Verify the URL inprometheus.ymlunderalerting > alertmanagers.