Model Versioning — MLOps for Local AI (Chapter 6)

Model versioning goes beyond the registry's version numbers. True versioning captures the complete lineage: which data, code, and configuration produced each model. This enables root cause analysis when models misbehave and ensures reproducibility when regulatory requirements demand it.

Git captures code version. DVC (Data Version Control) extends this to data. Together, they track the complete experiment context.

# Initialize DVC
dvc init

# Track a dataset
dvc add data/training-data.csv
git add data/training-data.csv.dvc .gitignore
git commit -m "Add training dataset v2.1"

# Create a reproducible training pipeline
dvc run -n train \
    -d data/training-data.csv \
    -d src/train.py \
    -o models/v1/model.pkl \
    python src/train.py --data data/training-data.csv

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.