KEY INSIGHT
GitOps extends version control practices to operational resources, including deployed models. The git repository becomes the source of truth for which model artifact runs in each environment, and deployment state is derived automatically from committed changes. Every model promotion or rollback traces to a git commit.
### GitOps Principles for ML
GitOps for models means three things. First, the deployment is declared in code: a manifest states which model version serves which environment. Second, the declared state is the actual deployed state, with no manual overrides. Third, all changes flow through git, so the history records who changed what and when.
In practice, this means your model registry is not a standalone system—it's an extension of your git workflow. Promotions are commits. Rollbacks are commits. The git history is your audit log.
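To make "the git history is your audit log" concrete, the sketch below reads the commit history of a single manifest. It is a minimal illustration rather than a prescribed tool: `AuditEntry`, `parse_audit_log`, and `read_audit_log` are hypothetical names, and the only git features assumed are `git log` with the standard pretty-format placeholders `%H`, `%an`, `%aI`, and `%s`.

```python
import subprocess
from dataclasses import dataclass

# Tab-separated fields: commit hash, author name, ISO 8601 author date, subject
LOG_FORMAT = "%H%x09%an%x09%aI%x09%s"


@dataclass(frozen=True)
class AuditEntry:
    """One commit that touched a deployment manifest (hypothetical type)."""
    commit: str
    author: str
    date: str
    subject: str


def parse_audit_log(log_output: str) -> list[AuditEntry]:
    """Turn raw `git log` output (one commit per line) into audit entries."""
    entries = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        # Split on the first three tabs only, so a tab in the subject survives.
        commit, author, date, subject = line.split("\t", 3)
        entries.append(AuditEntry(commit, author, date, subject))
    return entries


def read_audit_log(manifest_path: str) -> list[AuditEntry]:
    """List every commit that touched one manifest, newest first."""
    result = subprocess.run(
        ["git", "log", f"--pretty=format:{LOG_FORMAT}", "--", manifest_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_audit_log(result.stdout)
```

Run inside the repository, `read_audit_log` with the path of a deployment manifest returns one entry per commit that changed that file, which answers "who changed what and when" without a separate audit system.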
### Implementation Pattern
```yaml
# YAML: GitOps model deployment manifest
# File: model-deployments.yaml (stored in git)
apiVersion: mlops/v1
kind: ModelDeployment
metadata:
  name: customer-segmentation
  namespace: production
spec:
  model:
    registry: s3://ml-models/company/
    name: customer-segmentation
    version: v2.4.1  # Reference to registered model version
  environment: production
  serving:
    replicas: 3
    resources:
      cpu: "2"
      memory: 4Gi
    inference_endpoint: /v1/predict/segment
  drift_config:
    monitor_interval_hours: 6
    alert_threshold: 0.12
    auto_rollback_threshold: 0.18
    reference_window_size: 1000
  promotion_criteria:
    shadow_test_passed: true
    evaluation_metrics:
      silhouette_score: ">0.45"
      inference_latency_p99: "<100ms"
    approval_required: true
    approvers:
      - ml-ops-lead
      - business-unit-LG
---
# A/B routing configuration
apiVersion: mlops/v1
kind: ModelRouting
metadata:
  name: customer-segmentation-ab
  namespace: production
spec:
  routes:
    - weight: 80
      model_version: v2.4.1
      name: primary
    - weight: 20
      model_version: v2.5.0-rc
      name: candidate
  traffic_splitting:
    method: header  # Route on the request header named below
    header_name: X-Deployment-Context
```
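The `evaluation_metrics` entries in the manifest are threshold expressions such as `">0.45"` and `"<100ms"`, so something has to evaluate them before a promotion is approved. The sketch below shows one possible interpretation. The helper names `meets_criterion` and `failed_criteria` are hypothetical, the grammar (a comparison operator, a number, an optional unit suffix) is an assumption inferred from the two examples, and the observed value is assumed to already be in the same unit as the threshold.

```python
import operator
import re

_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

# Operator, number, optional unit suffix (e.g. "ms"). Assumed grammar.
_CRITERION = re.compile(r"^\s*(>=|<=|>|<)\s*([0-9]*\.?[0-9]+)\s*([A-Za-z%]*)\s*$")


def meets_criterion(expression: str, observed: float) -> bool:
    """Check one observed value against an expression like '>0.45'."""
    match = _CRITERION.match(expression)
    if match is None:
        raise ValueError(f"Unsupported criterion: {expression!r}")
    op, threshold, _unit = match.groups()
    # The unit is parsed but not converted: observed must use the same unit.
    return _OPS[op](observed, float(threshold))


def failed_criteria(criteria: dict[str, str], observed: dict[str, float]) -> list[str]:
    """Return the names of metrics that are missing or fail their criterion."""
    failures = []
    for name, expression in criteria.items():
        if name not in observed or not meets_criterion(expression, observed[name]):
            failures.append(name)
    return failures
```

Treating a missing metric as a failure is a deliberate choice here: a promotion gate that silently passes when a measurement is absent defeats the purpose of declaring the criteria in git.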
```python
# Python: GitOps model promotion workflow
import re
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ModelPromotion:
    """Represents a model promotion through environments."""

    model_name: str
    from_version: str
    to_version: str
    from_environment: str
    to_environment: str
    commit_sha: str
    approvers: list[str]

    def execute(self) -> dict:
        """Execute promotion as git commit."""
        # Manifest for the target environment, following the
        # deployments/<environment>/<model>.yaml repository layout
        manifest_path = Path(
            f"deployments/{self.to_environment}/{self.model_name}.yaml"
        )

        # Read, modify, write. Match only a `version:` key at the start of a
        # line so that keys such as `model_version:` are left untouched.
        content = manifest_path.read_text()
        pattern = re.compile(
            rf"^([ \t]*version:[ \t]*){re.escape(self.from_version)}(?=\s|$)",
            flags=re.MULTILINE,
        )
        content, replaced = pattern.subn(
            lambda m: m.group(1) + self.to_version, content
        )
        if replaced == 0:
            raise ValueError(
                f"version {self.from_version} not found in {manifest_path}"
            )
        manifest_path.write_text(content)

        # Stage and commit
        subprocess.run(["git", "add", str(manifest_path)], check=True)
        commit_message = "\n".join([
            f"Promotion: {self.model_name} {self.from_version} -> {self.to_version}",
            "",
            f"From: {self.from_environment}",
            f"To: {self.to_environment}",
            f"Approved by: {', '.join(self.approvers)}",
            f"Promotion ID: {self.commit_sha}",
        ])
        result = subprocess.run(
            ["git", "commit", "-m", commit_message],
            capture_output=True,
            text=True,
            check=True,  # fail loudly if the commit is rejected
        )

        # In GitOps: once pushed, this commit triggers deployment reconciliation.
        # ArgoCD, Flux, or equivalent watches git and reconciles cluster state
        return {
            "commit": result.stdout,
            "manifest_updated": True,
            "reconciliation_triggered": True,
        }

    def rollback(self) -> dict:
        """Rollback to previous version as git revert."""
        result = subprocess.run(
            # --no-edit keeps git from opening an editor for the revert message
            ["git", "revert", "--no-edit", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
        return {"reverted": True, "commit": result.stdout}


# Promotion workflow with approval gates.
# verify_evaluation_criteria, create_promotion_proposal, and collect_approvals
# are assumed to come from your evaluation and approval tooling; they are not
# defined in this listing.
def promote_model_with_approval(promotion: ModelPromotion) -> str:
    """
    Execute promotion workflow with required approval gates.

    Returns the output of the promotion commit.
    """
    # 1. Verify evaluation passed
    evaluation_passed = verify_evaluation_criteria(
        promotion.model_name,
        promotion.to_version,
    )
    if not evaluation_passed:
        raise ValueError("Evaluation criteria not met")

    # 2. Create promotion proposal
    proposal = create_promotion_proposal(promotion)

    # 3. Collect required approvals
    required_approvers = promotion.approvers
    collected = collect_approvals(proposal, required_approvers)
    if not collected.all_received:
        raise PermissionError("Required approvals not collected")

    # 4. Execute promotion (commit to git)
    result = promotion.execute()
    return result["commit"]
```
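The `rollback` method needs something to trigger it, and the manifest's `drift_config` already declares two thresholds that could serve. The sketch below is one assumed reading of those fields: `DriftAction` and `decide_drift_action` are hypothetical names, the default values are the ones from the manifest, and treating a score equal to a threshold as crossing it is a choice the source does not specify.

```python
from enum import Enum


class DriftAction(Enum):
    NONE = "none"
    ALERT = "alert"
    ROLLBACK = "rollback"


def decide_drift_action(
    drift_score: float,
    alert_threshold: float = 0.12,
    auto_rollback_threshold: float = 0.18,
) -> DriftAction:
    """Map a drift score to an action using the manifest's two thresholds."""
    if alert_threshold > auto_rollback_threshold:
        raise ValueError("alert_threshold must not exceed auto_rollback_threshold")
    # Check the more severe threshold first.
    if drift_score >= auto_rollback_threshold:
        return DriftAction.ROLLBACK
    if drift_score >= alert_threshold:
        return DriftAction.ALERT
    return DriftAction.NONE
```

In a GitOps setup a `ROLLBACK` result would lead to a revert commit rather than a direct change to the serving cluster, so that the automated rollback also appears in the git history.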
### Repository Structure
Effective GitOps for ML requires deliberate repository structure:
```bash
# bash: GitOps model repository structure
mkdir -p model-repo/{models,deployments,configs,workflows,evaluate}

# models/: Versioned model artifacts (large files via git-lfs)
# ├── customer-segmentation/
# │   └── v2.4.1/
# │       ├── model artifacts
# │       └── metadata.json
# └── text-classifier/
#     └── v1.0.0/

# deployments/: Environment-specific declarations
# ├── production/
# │   ├── customer-segmentation.yaml
# │   └── text-classifier.yaml
# └── staging/
#     └── customer-segmentation.yaml

# configs/: Hyperparameter and pipeline configurations
# ├── hyperparameters/
# │   ├── customer-segmentation/base.yaml
# │   └── customer-segmentation/production.yaml
# └── evaluation/
#     └── metrics.yaml

# workflows/: Pipeline definitions
# ├── training.yaml
# └── evaluation.yaml

# evaluate/: Evaluation datasets and test cases
# ├── datasets/
# └── test_cases/
```
### Automated Reconciliation
GitOps requires a reconciliation operator that continuously compares the state declared in git with the actual cluster state. When it detects divergence, the operator corrects production to match git.
For local AI deployments, this might mean reconciliation between your git repository and model serving endpoints on edge devices. Implement your own lightweight reconciliation loop or adapt existing GitOps tools for ML-specific resources.
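A lightweight reconciliation loop can be reduced to a pure planning step plus a thin wrapper that applies the plan. The sketch below assumes both the declared state and the actual state can be summarized as a mapping from model name to version. `Action`, `plan_reconciliation`, and `reconcile_once` are hypothetical names, and the callables that read state and apply actions stand in for whatever your serving endpoints expose.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Action:
    """One step needed to make the actual state match git."""
    kind: str  # "deploy", "update", or "remove"
    model: str
    version: Optional[str]


def plan_reconciliation(desired: dict[str, str], actual: dict[str, str]) -> list[Action]:
    """Compare git-declared versions with deployed versions."""
    actions = []
    for model, version in sorted(desired.items()):
        if model not in actual:
            actions.append(Action("deploy", model, version))
        elif actual[model] != version:
            actions.append(Action("update", model, version))
    # Anything deployed but not declared in git is removed.
    for model in sorted(actual.keys() - desired.keys()):
        actions.append(Action("remove", model, None))
    return actions


def reconcile_once(
    read_desired: Callable[[], dict[str, str]],
    read_actual: Callable[[], dict[str, str]],
    apply_action: Callable[[Action], None],
) -> int:
    """Run one reconciliation pass and return the number of actions applied."""
    plan = plan_reconciliation(read_desired(), read_actual())
    for action in plan:
        apply_action(action)
    return len(plan)
```

Calling `reconcile_once` on a timer gives the continuous behavior described above. Keeping the planning step free of side effects makes it easy to test and to run in a dry-run mode before it is allowed to change an edge device.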
---