RUNLOCALAIv38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.coIndependently operated
MLOps for Local AI

14. Data Drift

Chapter 14 of 24 · 20 min
KEY INSIGHT

Data drift occurs when the statistical distribution of your input features changes over time. Detecting it is the foundation of proactive model retraining: you catch distribution shifts before they cascade into degraded prediction quality.

### The Mechanics of Feature Drift

Every model makes an implicit assumption: the future will resemble the past. This assumption lives in the training data distribution. When users generate data that diverges from this distribution, predictions suffer.

Consider a local AI system screening support tickets. Over months, your product evolves. New features attract different user segments. Query language shifts as cultural references change. The distribution of ticket categories you trained on no longer matches reality, and your model increasingly sees out-of-distribution inputs it cannot reliably process.

### Detection Implementation

Memory-efficient feature drift detection for edge deployment requires careful resource management:

```python
# Python: Efficient feature drift detection for edge deployment
from collections import deque

import numpy as np
from scipy.stats import ks_2samp


class FeatureDriftDetector:
    """
    Monitors individual feature distributions for statistically significant drift.
    Designed for resource-constrained edge deployment.
    """

    def __init__(self, n_features: int, window_size: int = 500, alpha: float = 0.05):
        self.n_features = n_features
        self.window_size = window_size
        self.alpha = alpha  # Significance level for the per-feature KS test
        # Rolling buffers per feature (memory-efficient deque)
        self.buffers = [deque(maxlen=window_size) for _ in range(n_features)]
        self.reference_means = None
        self.reference_stds = None
        self.reference_samples = None  # Bounded baseline sample used by the KS test
        self.drift_counts = 0  # Number of assessments that flagged drift

    def capture_baseline(self, baseline_data: np.ndarray):
        """Capture statistical baseline from training or known-good data."""
        baseline_data = np.asarray(baseline_data, dtype=float)
        self.reference_means = np.mean(baseline_data, axis=0)
        self.reference_stds = np.std(baseline_data, axis=0)
        # Keep at most window_size rows as the reference sample
        self.reference_samples = baseline_data[: self.window_size].copy()
        # Pre-populate buffers with baseline for warm start
        for row in self.reference_samples:
            self.ingest(row)

    def ingest(self, features: np.ndarray):
        """Ingest a single observation's features."""
        features = np.asarray(features, dtype=float)
        if features.shape[0] != self.n_features:
            raise ValueError(f"Expected {self.n_features} features, got {features.shape[0]}")
        for feat_idx, value in enumerate(features):
            self.buffers[feat_idx].append(value)

    def assess(self) -> dict:
        """
        Assess drift across all features.
        A feature is flagged when the two-sample KS test against the stored
        reference sample rejects at level alpha, or when its mean has moved
        more than 3 reference standard deviations.
        Returns dict with drift status per feature and overall status.
        """
        if self.reference_means is None:
            return {"drifted": False, "error": "No baseline established"}
        results = {"drifted": False, "features": {}}
        for feat_idx in range(self.n_features):
            feature_data = np.array(self.buffers[feat_idx])
            # Two-sample KS test: current window vs. stored reference sample
            _, p_value = ks_2samp(feature_data, self.reference_samples[:, feat_idx])
            # Mean shift in units of the reference standard deviation
            drift_score = abs(np.mean(feature_data) - self.reference_means[feat_idx]) / (
                self.reference_stds[feat_idx] + 1e-8
            )
            drifted = bool(p_value < self.alpha or drift_score > 3.0)  # KS test or 3-sigma rule
            results["features"][feat_idx] = {
                "drifted": drifted,
                "score": float(drift_score),
                "p_value": float(p_value),
            }
            if drifted:
                results["drifted"] = True
        if results["drifted"]:
            self.drift_counts += 1
        return results
```

### Categorical Feature Drift

Numerical features yield to statistical tests. Categorical features require different treatment. Monitor category frequency distributions, alert on emerging categories that had zero frequency in training, and track categories that disappear from current traffic.

```python
# Python: Categorical distribution drift detection
from collections import Counter


def categorical_drift_score(
    current: Counter,
    reference: Counter,
    total_current: int,
    total_reference: int,
) -> dict:
    """
    Compute drift metrics for categorical features.
    Uses Total Variation Distance as primary metric.
    """
    if total_current <= 0 or total_reference <= 0:
        raise ValueError("total_current and total_reference must be positive")
    # Get union of all categories
    all_categories = set(current.keys()) | set(reference.keys())
    # Compute probability distributions
    current_probs = {cat: current.get(cat, 0) / total_current for cat in all_categories}
    ref_probs = {cat: reference.get(cat, 0) / total_reference for cat in all_categories}
    # Total Variation Distance
    tvd = 0.5 * sum(abs(current_probs[cat] - ref_probs[cat]) for cat in all_categories)
    # Flag novel categories (in current but not in reference)
    novel = set(current.keys()) - set(reference.keys())
    # Flag atrophied categories (in reference but not current)
    atrophied = set(reference.keys()) - set(current.keys())
    return {
        "tvd": tvd,
        "max_tvd": 1.0,  # TVD is bounded to the range [0, 1]
        "novel_categories": list(novel),
        "atrophied_categories": list(atrophied),
        "drifted": tvd > 0.1 or len(novel) > 0 or len(atrophied) > 0,
    }
```

### Operational Implications

Data drift detection without automated response is observation without action. Build alerting thresholds that trigger retraining workflows in your MLOps pipeline. Distinguish minor fluctuations (expected noise) from systematic shifts (requiring intervention).
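One way to separate expected noise from a systematic shift is to escalate only when drift shows up in several of the most recent assessments, rather than reacting to a single flagged check. The sketch below is a minimal illustration under that assumption, not a prescribed policy. `DriftAlertPolicy`, its parameters, and the `ok`/`warn`/`retrain` labels are hypothetical names, and the counts are placeholders to tune against your own traffic.

```python
from collections import deque


class DriftAlertPolicy:
    """Hypothetical policy: escalate only when drift persists across checks."""

    def __init__(self, history: int = 6, warn_after: int = 1, retrain_after: int = 4):
        if not (1 <= warn_after <= retrain_after <= history):
            raise ValueError("need 1 <= warn_after <= retrain_after <= history")
        self.recent = deque(maxlen=history)  # most recent drift flags, oldest dropped first
        self.warn_after = warn_after
        self.retrain_after = retrain_after

    def update(self, drifted: bool) -> str:
        """Record one assessment result and return 'ok', 'warn', or 'retrain'."""
        self.recent.append(bool(drifted))
        hits = sum(self.recent)  # flagged assessments inside the history window
        if hits >= self.retrain_after:
            return "retrain"
        if hits >= self.warn_after:
            return "warn"
        return "ok"


# Usage: feed the policy the overall "drifted" flag from each scheduled assessment.
policy = DriftAlertPolicy(history=6, warn_after=1, retrain_after=4)
flags = [False, True, False, True, True, True]
actions = [policy.update(flag) for flag in flags]
print(actions)
```

With these placeholder settings an isolated flag only produces a warning, and the retraining action fires once four of the last six assessments have flagged drift. How the `retrain` result is wired into a pipeline is left open here, since that depends on your tooling.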


EXERCISE

Collect a baseline dataset from your model's initial serving period. Store per-feature statistics. Build a scheduled task that samples current traffic, computes drift scores, and emits logs when scores exceed thresholds. Include both numerical and categorical feature handling.
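As a starting point for the exercise, the drift check itself can be sketched with the standard library alone. This is one possible shape under stated assumptions, not a reference solution. The function names, the baseline layout (`numerical` mapping to mean and standard deviation pairs, `categorical` mapping to `Counter` objects), the field names `ticket_length` and `category`, and both thresholds are hypothetical. Scheduling is left out: the function is assumed to be called periodically by whatever scheduler you already run.

```python
import logging
import statistics
from collections import Counter

logger = logging.getLogger("drift_check")


def mean_shift_score(current, ref_mean, ref_std):
    """Shift of the current mean, in units of the baseline standard deviation."""
    return abs(statistics.fmean(current) - ref_mean) / (ref_std + 1e-8)


def tvd(current: Counter, reference: Counter) -> float:
    """Total Variation Distance between two category frequency tables."""
    n_cur, n_ref = sum(current.values()), sum(reference.values())
    categories = set(current) | set(reference)
    return 0.5 * sum(abs(current[c] / n_cur - reference[c] / n_ref) for c in categories)


def run_drift_check(baseline: dict, sample: list,
                    num_threshold: float = 3.0, cat_threshold: float = 0.1) -> dict:
    """Score one traffic sample against stored baseline statistics.

    baseline: {"numerical": {name: (mean, std)}, "categorical": {name: Counter}}
    sample:   list of dict rows drawn from current traffic
    """
    if not sample:
        raise ValueError("sample must contain at least one row")
    scores = {}
    for name, (ref_mean, ref_std) in baseline["numerical"].items():
        scores[name] = mean_shift_score([row[name] for row in sample], ref_mean, ref_std)
        if scores[name] > num_threshold:
            logger.warning("numerical drift: %s score=%.2f", name, scores[name])
    for name, ref_counts in baseline["categorical"].items():
        scores[name] = tvd(Counter(row[name] for row in sample), ref_counts)
        if scores[name] > cat_threshold:
            logger.warning("categorical drift: %s tvd=%.3f", name, scores[name])
    return scores


# Usage with made-up baseline statistics and a tiny made-up traffic sample.
baseline = {
    "numerical": {"ticket_length": (120.0, 30.0)},
    "categorical": {"category": Counter({"billing": 50, "login": 50})},
}
sample = [
    {"ticket_length": 300.0, "category": "billing"},
    {"ticket_length": 320.0, "category": "billing"},
    {"ticket_length": 310.0, "category": "export"},
    {"ticket_length": 290.0, "category": "export"},
]
scores = run_drift_check(baseline, sample)
```

Storing only per-feature means, standard deviations, and category counts keeps the baseline small, which suits the exercise's request for per-feature statistics. A real sample would need to be far larger than four rows for the scores to be meaningful.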
