13. Drift Detection

Chapter 13 of 24 · 15 min

KEY INSIGHT

Drift detection is the practice of identifying when your model's operating environment diverges from its training conditions. In local AI deployments, where models serve specific user populations over extended periods, drift is not theoretical; it is inevitable.

### Understanding Drift in Context

Drift occurs when the statistical properties of your input data, output predictions, or the underlying problem itself change over time. Unlike cloud deployments, where retraining pipelines can trigger automatically, local AI operators must build explicit observation into the serving stack.

Drift compounds silently. A model that performs adequately in month one may degrade to dangerous territory by month six without any external indication. Users adapt their queries, your data distribution shifts, or the real-world phenomenon you're predicting fundamentally changes.

### Detection Approaches

There are three primary drift detection approaches:

**Statistical tests** compare feature distributions between a reference period and current observations. The Kolmogorov-Smirnov test measures the maximum distance between two cumulative distribution functions and applies to continuous features. The Chi-squared test evaluates shifts in categorical features. Both are lightweight to compute and suitable for deployment on edge devices.

**Distance-based methods** calculate divergence between probability distributions. KL divergence, Jensen-Shannon distance, and Wasserstein distance each offer different sensitivity profiles: KL divergence is asymmetric and blows up when a bin is empty in the reference, Jensen-Shannon distance is symmetric and bounded, and Wasserstein distance is expressed in the units of the feature itself. These methods are similarly cheap to compute, but they return a raw distance rather than a p-value, so appropriate thresholds must be chosen empirically.

**Sequential methods** track performance metrics over time, treating drift detection as a change-point problem. The Page-Hinkley test and CUSUM (cumulative sum) detect statistically significant shifts in monitored statistics.

### Implementation Considerations

Local deployment constraints shape your drift detection architecture. You cannot stream unbounded serving data to a central server for batch analysis.
Instead, implement rolling window statistics computed on-device with lightweight reporting to a central dashboard.

```python
# Python: Rolling window drift detection using Wasserstein distance
from collections import deque

import numpy as np
from scipy.stats import wasserstein_distance


class RollingDriftDetector:
    def __init__(self, window_size: int = 1000, threshold: float = 0.15):
        self.reference_window = None
        # deque(maxlen=...) evicts the oldest sample automatically
        self.current_window = deque(maxlen=window_size)
        self.window_size = window_size
        # Distances are in feature units, so the threshold depends on feature scale
        self.threshold = threshold
        self.drift_detected = False

    def add_sample(self, features: np.ndarray, prediction: float):
        """Add a sample from current serving traffic."""
        # Flatten the features and append the prediction as one extra column
        flat = np.asarray(features, dtype=float).ravel()
        self.current_window.append(np.concatenate([flat, [prediction]]))

    def set_reference(self, reference_data):
        """Set reference distribution from training or last validation.

        Rows must use the same layout as add_sample: features, then prediction.
        """
        self.reference_window = np.asarray(reference_data, dtype=float)

    def check_drift(self) -> tuple[bool, float]:
        """Check if drift exceeds threshold. Returns (drifted, distance)."""
        if self.reference_window is None or len(self.reference_window) == 0:
            return False, 0.0  # No reference distribution set
        if len(self.current_window) < 100:
            return False, 0.0  # Insufficient data
        current = np.asarray(self.current_window)
        # wasserstein_distance compares two 1-D samples, so compare each
        # column separately and report the largest per-column distance.
        distance = max(
            wasserstein_distance(current[:, j], self.reference_window[:, j])
            for j in range(current.shape[1])
        )
        self.drift_detected = distance > self.threshold
        return self.drift_detected, float(distance)
```

### Practical Limitations

Drift detection without ground truth labels is inherently limited. You detect distribution changes, not performance degradation. A drifted model might still perform acceptably, and a stable-looking input distribution can mask a catastrophic collapse in performance. Pair statistical drift detection with user feedback mechanisms where possible.
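The sequential methods named under Detection Approaches can be sketched in a few lines. The following is a minimal Page-Hinkley detector for an upward shift in a monitored statistic, such as a per-request error indicator collected through user feedback. The class name, the `first_alarm` helper, and the default values for `delta`, `threshold`, and `min_samples` are illustrative assumptions, not tuned recommendations.

```python
from typing import Iterable, Optional


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    delta, threshold, and min_samples are assumed defaults for illustration.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 5.0,
                 min_samples: int = 30):
        self.delta = delta              # tolerated drift magnitude
        self.threshold = threshold      # alarm level (lambda)
        self.min_samples = min_samples  # warm-up before alarms are allowed
        self.n = 0
        self.mean = 0.0      # running mean of all values seen so far
        self.cum = 0.0       # cumulative deviation from the running mean
        self.cum_min = 0.0   # smallest cumulative deviation seen so far

    def update(self, value: float) -> bool:
        """Consume one observation; return True if a shift is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.n >= self.min_samples
                and self.cum - self.cum_min > self.threshold)


def first_alarm(stream: Iterable[float]) -> Optional[int]:
    """Return the index of the first alarm in the stream, or None."""
    detector = PageHinkley()
    for i, value in enumerate(stream):
        if detector.update(value):
            return i
    return None


if __name__ == "__main__":
    stable = [0.1] * 400
    shifted = [0.1] * 200 + [0.6] * 200  # synthetic jump at index 200
    print("stable stream alarm:", first_alarm(stable))
    print("shifted stream alarm:", first_alarm(shifted))
```

The detector keeps only four numbers of state, which suits on-device use. It needs a per-sample metric to monitor, so it is most useful where some feedback or proxy signal is available.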

EXERCISE

Implement a rolling drift detector using the KL divergence method. Store a reference distribution from your initial training data, then implement a scheduled check that logs drift measurements and warns when divergence exceeds your defined threshold. Validate by injecting synthetic drift (scaling features by a factor) and confirming detection.
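One possible starting point for the divergence calculation is sketched below. It covers only a single feature: fixed-edge binning plus KL divergence. The rolling window, scheduled check, and logging are left to the exercise. The function names, the bin edges, and the add-one smoothing are assumptions made for illustration.

```python
import math
from typing import Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Return smoothed bin probabilities for values over fixed bin edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Out-of-range values are clamped into the first or last bin
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    # Add-one smoothing keeps every bin nonzero so the log ratio stays finite
    total = len(values) + len(counts)
    return [(c + 1) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    edges = [k * 0.25 for k in range(9)]          # 8 bins covering [0, 2]
    reference = [i / 100 for i in range(100)]     # stand-in for training data
    drifted = [2.0 * v for v in reference]        # synthetic drift: scale by 2

    ref_hist = histogram(reference, edges)
    print("no drift:", kl_divergence(histogram(reference, edges), ref_hist))
    print("scaled  :", kl_divergence(histogram(drifted, edges), ref_hist))
```

The bin edges must be fixed from the reference data and reused for every check; recomputing them from the current window would hide exactly the shift you are trying to measure.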