10. Model Validation Gates

Chapter 10 of 24 · 20 min

Validation gates are automated checks that decide whether a model advances to the next pipeline stage. Without gates, whatever the training run produces gets deployed, regardless of quality. Gates enforce minimum thresholds and catch degradation before it reaches production.

A validation gate has three parts: metric definitions, threshold values, and pass/fail logic. Gates typically sit between training and deployment, but can be placed between any two pipeline stages.

# validation_gate.py
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationResult:
    passed: bool
    metrics: dict
    message: Optional[str] = None

def validate_model(model_name: str, version: int, thresholds: dict) -> ValidationResult:
    """Run validation checks against model registry."""
    from mlflow.tracking import MlflowClient
    
    client = MlflowClient()
    mv = client.get_model_version(model_name, version)
    run = client.get_run(mv.run_id)
    
    # run.data.metrics is already a dict mapping each metric key to its latest value
    metrics = dict(run.data.metrics)
    failed_checks = []
    
    for metric, threshold in thresholds.items():
        if metric not in metrics:
            failed_checks.append(f"Missing metric: {metric}")
            continue
            
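        actual = metrics[metric]
        if metric in LOWER_IS_BETTER:
            # Threshold is a ceiling (e.g. latency): fail when the value exceeds it
            if actual > threshold:
                failed_checks.append(f"{metric}={actual:.4f} > {threshold}")
        elif actual < threshold:
            # Threshold is a floor (e.g. accuracy): fail when the value falls short
            failed_checks.append(f"{metric}={actual:.4f} < {threshold}")
    
    return ValidationResult(
        passed=len(failed_checks) == 0,
        metrics=metrics,
        message="; ".join(failed_checks) if failed_checks else "All checks passed"
    )

# Metrics where smaller values are better; their thresholds are upper bounds.
# validate_model reads this at call time, so define it before calling the function.
LOWER_IS_BETTER = {"latency_p99_ms"}

# Define thresholds
thresholds = {
    "accuracy": 0.92,
    "precision": 0.88,
    "recall": 0.85,
    "latency_p99_ms": 100,
}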

result = validate_model("spam-classifier", 3, thresholds)
print(f"Validation: {'PASSED' if result.passed else 'FAILED'}")
print(result.message)
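
Because a gate can sit between any two stages, it can help to see how a pipeline runner might consult one before moving on. The following is a minimal sketch with no MLflow dependency, not a prescribed design: run_pipeline, GateResult, and the train, deploy, and accuracy_gate functions are all hypothetical names introduced for illustration, and the 0.90 accuracy is a made-up value chosen so the gate blocks deployment.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

@dataclass
class GateResult:
    passed: bool
    message: str = ""

Stage = Callable[[Any], Any]        # takes the previous stage's output
Gate = Callable[[Any], GateResult]  # inspects a stage's output

def run_pipeline(
    steps: List[Tuple[str, Stage, Optional[Gate]]],
    artifact: Any = None,
) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first gate that fails.

    Returns the names of stages that ran and cleared their gate,
    plus a failure message (None if every gate passed).
    """
    cleared = []
    for name, stage, gate in steps:
        artifact = stage(artifact)
        if gate is not None:
            result = gate(artifact)
            if not result.passed:
                return cleared, f"{name}: {result.message}"
        cleared.append(name)
    return cleared, None

# Hypothetical stages: training returns made-up metrics, deploy is a stub.
def train(_: Any) -> dict:
    return {"accuracy": 0.90}

def deploy(metrics: dict) -> str:
    return "deployed"

def accuracy_gate(metrics: dict) -> GateResult:
    ok = metrics["accuracy"] >= 0.92
    return GateResult(ok, "" if ok else f"accuracy={metrics['accuracy']:.4f} < 0.92")

cleared, failure = run_pipeline([
    ("train", train, accuracy_gate),  # gate between training and deployment
    ("deploy", deploy, None),
])
print(cleared, failure)
```

Here the deploy stage never runs, because the gate attached to train fails first. In a CI job, the same idea is often expressed by exiting with a non-zero status when the failure message is not None.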

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
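
One way to capture those details is a small helper that collects them into a single record. This is a sketch using only the standard library; environment_record is a hypothetical helper, and the data path and observed output passed to it below are placeholders to replace with your own values.

```python
import json
import platform
from importlib import metadata

def environment_record(packages, data_path, observed_output):
    """Collect runtime details worth saving next to an exercise result."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "data_path": str(data_path),
        "observed_output": observed_output,
    }

# Placeholder arguments: substitute the path and output from your own run.
record = environment_record(
    packages=["mlflow"],
    data_path="<your data path>",
    observed_output="<paste the printed output here>",
)
print(json.dumps(record, indent=2))
```

Saving this JSON alongside your notes makes it easier to explain later why the same exercise produced a different result on another machine.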

EXERCISE

Implement a validation gate for one of your models. Define at least three metrics with thresholds. Run the gate against your current production model to confirm it passes (or identify failures). Adjust thresholds based on current model performance.
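
For the last step of the exercise, one possible approach is to derive each threshold from the current model's metrics with a small tolerance, so a candidate fails only if it is meaningfully worse. The sketch below assumes a 2% relative tolerance and invented baseline numbers; thresholds_from_baseline is a hypothetical helper, not part of any library. Metrics where smaller is better, such as latency, get a ceiling above the baseline rather than a floor below it.

```python
def thresholds_from_baseline(baseline, tolerance=0.02, lower_is_better=()):
    """Derive gate thresholds from a baseline model's metrics.

    Higher-is-better metrics get a floor slightly below the baseline;
    lower-is-better metrics get a ceiling slightly above it.
    """
    thresholds = {}
    for name, value in baseline.items():
        if name in lower_is_better:
            thresholds[name] = round(value * (1 + tolerance), 4)
        else:
            thresholds[name] = round(value * (1 - tolerance), 4)
    return thresholds

# Made-up baseline values standing in for your production model's metrics.
baseline = {"accuracy": 0.95, "recall": 0.90, "latency_p99_ms": 80.0}

derived = thresholds_from_baseline(
    baseline,
    tolerance=0.02,
    lower_is_better={"latency_p99_ms"},
)
print(derived)
```

A relative tolerance is only one choice. An absolute margin, or a margin based on the variance observed across several training runs, may suit metrics that are noisy or already close to 1.0.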