What this does

This guide implements predictive autoscaling for AI inference and training workloads by analyzing historical request patterns and pre-warming compute capacity before demand peaks. Unlike a reactive HPA, which scales only after a metric has crossed its threshold, predictive scaling relies on a Prometheus recording rule that forecasts the request rate by linear regression over the past 4 weeks of hourly traffic. A custom scaler or KEDA cron trigger then schedules capacity increases for known peak periods.

Steps

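Verify historical request data exists in Prometheus. The query is passed with --data-urlencode because curl otherwise treats the square brackets as a URL glob pattern:
```
curl -sG "http://prometheus:9090/api/v1/query" --data-urlencode 'query=ai_requests_total[4w]' | jq '.data.result[0].values | length'
```
Expected output: a number greater than 0, confirming that samples exist inside the 4-week window. A small count means the series is younger than 4 weeks, and the forecast will rest on correspondingly less history.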
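
The range selector in the query contains square brackets, which have to be percent-encoded whenever the request URL is built programmatically. A minimal Python sketch, assuming the same http://prometheus:9090 address as above; build_query_url is a hypothetical helper, not part of any Prometheus client library:

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Return an instant-query URL with the PromQL expression percent-encoded."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


url = build_query_url("http://prometheus:9090", "ai_requests_total[4w]")
print(url)
# http://prometheus:9090/api/v1/query?query=ai_requests_total%5B4w%5D
```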

Create a Prometheus recording rule that computes the forecast. In predictive-rules.yml:

```
groups:
  - name: predictive_scaling
    interval: 1h
    rules:
      - record: forecast:ai_request_rate_1h
        expr: |
          predict_linear(
            rate(ai_requests_total[1h])[4w:1h],
            3600
          )
```

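The predict_linear function fits a least-squares line to the hourly rate samples from the past 4 weeks and extrapolates it 3600 seconds (one hour) ahead. Because the fit is a straight line, it tracks the overall trend rather than daily or weekly cycles; the recurring peaks are handled by the cron trigger configured later.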
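
To make the regression concrete, here is a rough Python sketch of the same idea: an ordinary least-squares fit over (timestamp, value) pairs, extrapolated by a horizon in seconds. It is an illustration, not the Prometheus implementation. The function name is hypothetical, and the sketch assumes the prediction is made relative to the last sample's timestamp, whereas Prometheus uses the rule evaluation time.

```python
def predict_linear_sketch(samples, horizon_seconds):
    """Fit value = slope * t + intercept by least squares and extrapolate.

    samples: list of (unix_timestamp, value) pairs with distinct timestamps.
    Returns None when fewer than 2 samples are available.
    """
    if len(samples) < 2:
        return None  # a line cannot be fitted to a single point
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    # Assumption: extrapolate from the last sample's timestamp.
    last_t = samples[-1][0]
    return slope * (last_t + horizon_seconds) + intercept


# Made-up hourly samples: the rate grows by 2 requests/s every hour.
hourly = [(h * 3600, 40.0 + 2.0 * h) for h in range(24)]
forecast = predict_linear_sketch(hourly, 3600)
print(round(forecast, 3))
```

For perfectly linear input like this, the forecast is simply the next point on the line; real traffic is noisier, and the fit averages that noise out.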

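Load the recording rule into Prometheus and verify. The rule file must be listed under rule_files in the Prometheus configuration, and the reload endpoint only responds when Prometheus was started with --web.enable-lifecycle:

```
promtool check rules predictive-rules.yml
curl -X POST http://prometheus:9090/-/reload
curl -s "http://prometheus:9090/api/v1/query?query=forecast:ai_request_rate_1h" | jq '.data.result[0].value[1]'
```

Expected output: a numeric forecast value. Because the rule group uses interval: 1h, the first value may take up to an hour to appear after the reload.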

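Expose the forecast metric to Kubernetes custom metrics via the Prometheus adapter. This step is only needed when something other than KEDA reads the metric (for example a hand-written HPA); the KEDA prometheus trigger in the next step queries Prometheus directly. Add to adapter-config.yml:

```
rules:
  custom:
    - seriesQuery: 'forecast:ai_request_rate_1h'
      metricsQuery: avg(forecast:ai_request_rate_1h)
```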

Create a KEDA ScaledObject with both a cron trigger (for known peak hours) and a Prometheus trigger (for the forecast). Save it as predictive-scaler.yaml:

```
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ai-predictive-scaler
spec:
  scaleTargetRef:
    name: ai-inference
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        metricName: forecast_ai_request_rate_1h
        query: forecast:ai_request_rate_1h
        threshold: "50"
    - type: cron
      metadata:
        timezone: America/New_York
        start: 30 8 * * 1-5
        end: 30 17 * * 1-5
        desiredReplicas: "5"
```
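
The two triggers do not add up: the HPA that KEDA creates evaluates each one separately and acts on the largest proposal, bounded by minReplicaCount and maxReplicaCount. The Python sketch below illustrates that arithmetic with the values from this ScaledObject. It is a simplified model, not KEDA code: it assumes the Prometheus threshold is treated as a per-replica target and that the cron trigger contributes nothing outside its window. The function name is hypothetical.

```python
import math


def desired_replicas(forecast_rate, threshold, cron_active, cron_replicas,
                     min_replicas=1, max_replicas=10):
    """Combine the forecast and cron proposals the way an HPA would."""
    # Prometheus trigger: threshold acts as the target load per replica.
    from_forecast = math.ceil(forecast_rate / threshold)
    # Cron trigger: proposes its fixed count only while the window is open.
    from_cron = cron_replicas if cron_active else 0
    # The largest proposal wins, then the configured bounds are applied.
    proposal = max(from_forecast, from_cron)
    return max(min_replicas, min(max_replicas, proposal))


print(desired_replicas(120, 50, cron_active=True, cron_replicas=5))   # 5
print(desired_replicas(420, 50, cron_active=True, cron_replicas=5))   # 9
print(desired_replicas(900, 50, cron_active=False, cron_replicas=5))  # 10
```

In the first call the cron floor of 5 wins over the forecast's 3; in the second the forecast (420 / 50, rounded up) takes over; in the third the forecast's proposal of 18 is capped at maxReplicaCount.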

Apply the KEDA ScaledObject:

```
kubectl apply -f predictive-scaler.yaml
kubectl get scaledobject ai-predictive-scaler
```

Expected output: the READY column shows True, confirming that the scaler is active.

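Add a cooldown period to prevent rapid scale-down after peaks. Place the following under spec in the ScaledObject, then re-apply the manifest:

```
advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 600
        policies:
          - type: Percent
            value: 50
            periodSeconds: 60
```

With these values the HPA acts on a lower replica recommendation only after it has held for 600 seconds, and then removes at most 50% of the running replicas per 60-second period.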
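
As a rough illustration of what the Percent policy permits once the stabilization window has passed, the Python sketch below steps the replica count down one 60-second period at a time. It is a simplified model rather than the HPA algorithm itself: the rounding direction is an assumption, the stabilization window is not simulated, and scale_down_steps is a hypothetical name.

```python
import math


def scale_down_steps(current, target, percent=50):
    """Replica count after each policy period while scaling down to target."""
    steps = [current]
    while current > target:
        # Assumption: the per-period floor is rounded up, so no more than
        # `percent` of the running replicas are removed in one period.
        floor = math.ceil(current * (1 - percent / 100))
        nxt = max(target, floor)
        if nxt >= current:
            break  # the policy allows no further reduction
        current = nxt
        steps.append(current)
    return steps


print(scale_down_steps(10, 1))  # [10, 5, 3, 2, 1]
```

Going from 10 replicas to 1 therefore takes four periods under this model, on top of the 600-second wait.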

Monitor the predictive scaler during a known peak. At the scheduled cron start (8:30 AM America/New_York on weekdays), observe:
```
kubectl get hpa keda-hpa-ai-predictive-scaler -w
```
Expected: the HPA's replica count rises to at least 5 at the cron start time, before the traffic spike arrives.

Verification

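```
kubectl get scaledobject ai-predictive-scaler -o json | jq '.status.externalMetricNames'
```

Expected output: the list of external metric names being used by KEDA (e.g., ["prometheus-forecast_ai_request_rate_1h", "cron-...-..."]). The exact names depend on the KEDA version; some releases prefix each name with the trigger index, such as s0- and s1-.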

Common failures

predict_linear returns no data or NaN: the function needs at least 2 data points per series in the range vector. If the AI service was deployed less than 2 hours ago, the [4w:1h] subquery yields fewer than 2 hourly points, so there is nothing to fit. Check with count_over_time(ai_requests_total[4w]).
KEDA cron timezone mismatch: the cron expression is evaluated in the timezone given in the trigger metadata, not in the cluster or node timezone. Verify that the timezone value is the IANA name for the region where the peak occurs, or use UTC and adjust the start and end times accordingly.
HPA and KEDA conflict: ensure no separate HPA targets the same Deployment, because KEDA creates and manages its own HPA. If one already exists, delete it with kubectl delete hpa <name> before applying the ScaledObject.
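
For the timezone issue in particular, it can help to check which UTC instants fall inside the cron window. The Python sketch below mirrors the 8:30 to 17:30, Monday to Friday window from the ScaledObject. It is an independent illustration, not KEDA's scheduler: cron_window_active is a hypothetical helper, and treating the start as inclusive and the end as exclusive is an assumption.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo


def cron_window_active(utc_instant, tz_name="America/New_York"):
    """Return True if the instant falls in the weekday 08:30-17:30 window."""
    local = utc_instant.astimezone(ZoneInfo(tz_name))
    is_weekday = local.weekday() < 5  # Monday=0 ... Friday=4
    # Assumption: start is inclusive, end is exclusive.
    return is_weekday and time(8, 30) <= local.time() < time(17, 30)


winter_open = datetime(2024, 1, 15, 13, 30, tzinfo=timezone.utc)  # 08:30 EST
summer_open = datetime(2024, 7, 15, 12, 30, tzinfo=timezone.utc)  # 08:30 EDT
winter_early = datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc)  # 07:30 EST

print(cron_window_active(winter_open))   # True
print(cron_window_active(summer_open))   # True
print(cron_window_active(winter_early))  # False
```

The same local start time maps to 13:30 UTC in winter and 12:30 UTC in summer, which is why a schedule hard-coded in UTC drifts by an hour across daylight saving changes.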

How to implement predictive autoscaling for AI workloads using historical patterns
