RUNLOCALAI · v38


OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
MLOps for Local AI

06. Model Versioning

Chapter 6 of 24 · 20 min
KEY INSIGHT

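Every model artifact should be traceable to its inputs. Without this, you are deploying artifacts of unknown provenance, which is both a compliance risk and an operational one. MLflow records some run metadata automatically (for a script run from inside a Git repository, it typically tags the run with the commit as `mlflow.source.git.commit`), but you can enrich it with explicit version references:

```python
import git  # GitPython: pip install GitPython
import mlflow

repo = git.Repo(search_parent_directories=True)
commit_hash = repo.head.commit.hexsha
# active_branch raises TypeError on a detached HEAD (common in CI checkouts)
branch = "detached" if repo.head.is_detached else repo.active_branch.name

with mlflow.start_run():
    mlflow.log_param("git_commit", commit_hash)
    mlflow.log_param("git_branch", branch)
    # Also log the data version if the dataset is tracked with DVC
    mlflow.log_param("data_version", "v2.1")
    # ... training code ...
```

Comparing versions requires the same metrics to be logged for every version:

```python
from mlflow.tracking import MlflowClient


def compare_versions(client, model_name, versions):
    """Compare metrics across registered model versions."""
    results = []
    for version in versions:
        # The registry API takes the version number as a string
        run_id = client.get_model_version(model_name, str(version)).run_id
        run = client.get_run(run_id)
        # run.data.metrics already maps metric name -> latest value (a float)
        results.append({"version": version, "metrics": dict(run.data.metrics)})
    return results


client = MlflowClient()
comparison = compare_versions(client, "spam-classifier", [1, 2, 3])
```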
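`compare_versions` returns raw metric dictionaries, and a metric logged for only some versions cannot be compared fairly. The sketch below keeps only the metrics every version logged and reports each version's change relative to a chosen baseline. The helper names `shared_metrics` and `metric_deltas` are hypothetical, and the numbers are made up; the input simply mirrors the list-of-dicts shape that `compare_versions` produces, so it runs without an MLflow server.

```python
def shared_metrics(results):
    """Metric names logged by every version (the only ones safe to compare)."""
    name_sets = [set(r["metrics"]) for r in results]
    return set.intersection(*name_sets) if name_sets else set()


def metric_deltas(results, baseline_version):
    """Per version, the change in each shared metric versus the baseline."""
    names = sorted(shared_metrics(results))
    by_version = {r["version"]: r["metrics"] for r in results}
    base = by_version[baseline_version]
    return {
        # round() hides float noise such as 0.020000000000000018
        version: {n: round(metrics[n] - base[n], 6) for n in names}
        for version, metrics in by_version.items()
    }


# Made-up numbers in the same shape compare_versions() returns
results = [
    {"version": 1, "metrics": {"accuracy": 0.90, "f1": 0.85}},
    {"version": 2, "metrics": {"accuracy": 0.92, "f1": 0.88, "latency_ms": 12.0}},
]
print(sorted(shared_metrics(results)))
print(metric_deltas(results, baseline_version=1))
```

Here `latency_ms` is dropped because version 1 never logged it; logging the same metric names on every training run avoids silently losing comparisons like this.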

Model versioning goes beyond the registry's version numbers. True versioning captures the complete lineage: which data, code, and configuration produced each model. This enables root cause analysis when models misbehave and ensures reproducibility when regulatory requirements demand it.
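Root cause analysis then amounts to asking which of the three inputs changed between a good version and a bad one. A minimal sketch, assuming each registered version has a lineage record holding a code commit, a data version, and a config dict; the `Lineage` class, the `lineage_diff` helper, and the example commits and values are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Lineage:
    """Hypothetical record of what produced one registered model version."""
    model_version: int
    git_commit: str    # code
    data_version: str  # data, e.g. a DVC hash or tag
    config: dict = field(default_factory=dict)  # hyperparameters


def lineage_diff(old: Lineage, new: Lineage) -> dict:
    """Map each changed input (code, data, or a config key) to (old, new)."""
    changes = {}
    if old.git_commit != new.git_commit:
        changes["code"] = (old.git_commit, new.git_commit)
    if old.data_version != new.data_version:
        changes["data"] = (old.data_version, new.data_version)
    for key in sorted(set(old.config) | set(new.config)):
        before, after = old.config.get(key), new.config.get(key)
        if before != after:
            changes[f"config.{key}"] = (before, after)
    return changes


# Illustrative values only
v2 = Lineage(2, "a1b2c3d", "v2.0", {"lr": 0.01, "epochs": 10})
v3 = Lineage(3, "a1b2c3d", "v2.1", {"lr": 0.001, "epochs": 10})
print(lineage_diff(v2, v3))
# {'data': ('v2.0', 'v2.1'), 'config.lr': (0.01, 0.001)}
```

In this example only the dataset and the learning rate changed, so a regression in version 3 would be investigated there before the code.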

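Git versions code. DVC (Data Version Control) extends this to data: Git stores a small pointer file recording each dataset's hash, while the data itself lives in DVC's cache or remote storage. Together, they track the code and data behind every experiment.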
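The mechanism that lets DVC keep large files out of Git is content addressing: a file is stored under its own hash, and Git only versions a tiny pointer to that hash. The sketch below illustrates the idea with the standard library. The `track` function, the `.ptr` pointer format, and the cache layout are simplified stand-ins, not DVC's actual file formats (DVC does hash with MD5 and splits its cache into two-character prefix directories, but records more metadata).

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets are not loaded into memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and return a pointer file.

    The pointer, not the data, is what you would commit to Git.
    """
    digest = md5_of(data_file)
    cached = cache_dir / digest[:2] / digest[2:]
    if not cached.exists():  # identical content is stored only once
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(f"md5: {digest}\npath: {data_file.name}\n")
    return pointer
```

Editing the CSV and calling `track` again yields a new hash, a new cache entry, and a changed pointer file; committing that pointer is what ties a Git commit to an exact dataset version.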

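```shell
# Initialize DVC inside an existing Git repository and commit its config
dvc init
git add .dvc .dvcignore
git commit -m "Initialize DVC"

# Track a dataset: DVC writes a small .dvc pointer file for Git
# and lists the raw CSV in data/.gitignore
dvc add data/training-data.csv
git add data/training-data.csv.dvc data/.gitignore
git commit -m "Add training dataset v2.1"

# Create a reproducible training pipeline
# (DVC 3.x removed `dvc run`; define the stage, then run it with `dvc repro`)
# src/train.py is expected to write models/v1/model.pkl
dvc stage add -n train \
    -d data/training-data.csv \
    -d src/train.py \
    -o models/v1/model.pkl \
    python src/train.py --data data/training-data.csv
dvc repro

# Commit the stage definition and lock file (plus any .gitignore DVC lists)
git add dvc.yaml dvc.lock
git commit -m "Add training pipeline"
```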
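`dvc repro` can skip a stage whose inputs have not changed because `dvc.lock` records the stage's command together with hashes of its dependencies and outputs. A simplified sketch of that check, assuming a hypothetical JSON lock file instead of DVC's YAML `dvc.lock` and ignoring outputs: a stage is stale when its command or any dependency hash differs from what was recorded after the last successful run. The function names are illustrative.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def file_md5(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()


def fingerprint(cmd: str, deps: list[str]) -> dict:
    """The stage command plus a content hash for every dependency."""
    return {"cmd": cmd, "deps": {d: file_md5(Path(d)) for d in sorted(deps)}}


def is_stale(cmd: str, deps: list[str], lock_file: Path) -> bool:
    """True if the stage has never run or any recorded input has changed."""
    if not lock_file.exists():
        return True
    return json.loads(lock_file.read_text()) != fingerprint(cmd, deps)


def record_run(cmd: str, deps: list[str], lock_file: Path) -> None:
    """Call after the stage succeeds, as DVC updates dvc.lock."""
    lock_file.write_text(json.dumps(fingerprint(cmd, deps), indent=2, sort_keys=True))
```

A driver would call `is_stale(...)`, run the training command only when it returns True, then call `record_run(...)`. Real DVC also re-runs a stage when an output is missing or modified, which this sketch leaves out.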

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
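One way to keep these records consistent is to generate them. A minimal sketch using only the standard library; `checkpoint_record` is a hypothetical helper, the package names are examples, and the `observed` and `constraints` values are placeholders for whatever your own run actually produced:

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def checkpoint_record(packages, data_path, observed, constraints=None):
    """Collect the facts the checkpoint asks for into one JSON-serializable dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "data_path": str(data_path),
        "observed": observed,
        # e.g. model size, vector count, CPU/GPU backend, available memory
        "constraints": constraints or {},
    }


if __name__ == "__main__":
    record = checkpoint_record(
        ["mlflow", "dvc"],
        "data/training-data.csv",
        observed={"stdout": "<paste the output you saw>"},  # placeholder
        constraints={"backend": "cpu"},  # placeholder
    )
    print(json.dumps(record, indent=2))
```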

EXERCISE

Initialize a Git repository for an ML project and commit your training script. Run training and record the Git commit hash in MLflow. Modify the script, commit, and train again. Use the MLflow UI to compare both runs and verify that each one traces back to a distinct commit.
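In this exercise, the commit hash alone can mislead: if you train before committing your edit, the logged hash points at code that differs from what actually ran. A minimal sketch, assuming the `git` CLI is on PATH, that returns the commit together with a dirty flag (`git_state` and `_git` are hypothetical helper names):

```python
import subprocess


def _git(repo_dir, *args):
    """Run a git command in repo_dir and return its stdout."""
    return subprocess.run(
        ["git", "-C", str(repo_dir), *args],
        capture_output=True, text=True, check=True,
    ).stdout


def git_state(repo_dir="."):
    """Return {"commit": sha, "dirty": bool}, or None outside a usable Git checkout."""
    try:
        commit = _git(repo_dir, "rev-parse", "HEAD").strip()
        # --porcelain prints one line per modified or untracked file
        dirty = bool(_git(repo_dir, "status", "--porcelain").strip())
    except (FileNotFoundError, subprocess.CalledProcessError):
        # git not installed, not a repository, or no commits yet
        return None
    return {"commit": commit, "dirty": dirty}
```

Inside `mlflow.start_run()`, you could log `state["commit"]` with `mlflow.log_param` and `str(state["dirty"])` with `mlflow.set_tag`, so a run trained on uncommitted changes stands out when you compare runs in the UI.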
