RUNLOCALAIv38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo

© 2026 runlocalai.coIndependently operated

How to set up model versioning and rollback

intermediate·20 min·By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

Model artifacts storage, deployment pipeline

What this does

Setting up model versioning and rollback creates a safety net for AI model deployments. Each model version (quantized weights, LoRA adapters, or full fine-tunes) is stored as an immutable artifact with a unique version identifier. A deployment registry points to the active version for each environment. When a new model version is deployed and proves problematic—degraded output quality, increased latency, or errors—the rollback mechanism reverts the active pointer to the previous known-good version within seconds, without rebuilding containers or restarting infrastructure.

Steps

  1. Define the version naming convention. Use semantic versioning with a model identifier, for example llama3-instruct-v2.1.0-q4km. The format is {model_name}-v{major}.{minor}.{patch}-{quantization}.

  2. Store artifacts in a structured path: s3://models/{model_name}/v{major}.{minor}.{patch}/{filename}.gguf. For each new version, upload the model file and a metadata JSON:
     {"version": "2.1.0", "base_model": "llama3-instruct", "quantization": "q4km", "benchmark_score": 0.87, "created_at": "2026-05-29T10:00:00Z", "parent_version": "2.0.0"}

  3. Create a deployment registry: a simple JSON file or database table that maps environments to active versions.
     { "production": {"model": "llama3-instruct", "version": "2.0.0"}, "staging": {"model": "llama3-instruct", "version": "2.1.0"} }

  4. On the inference server, implement a model loader that reads the active version from the registry and downloads or loads the corresponding artifact. For a Kubernetes deployment, store the active version in a ConfigMap:
     kubectl create configmap model-registry --from-literal=active_version=v2.0.0
     The inference pod reads this on startup and pulls the correct model.

  5. Implement the deployment script:
     def deploy(model_name, version, env):
         download_model(model_name, version)
         validate_model(model_name, version)
         run_smoke_tests(model_name, version)
         update_registry(env, model_name, version)
     The validate_model step runs a set of benchmark prompts and compares metrics (perplexity, latency, accuracy on a test set) against the previous version. If any metric degrades by more than 5%, abort the deployment before update_registry runs.

  6. Implement rollback:
     def rollback(env):
         history = get_deployment_history(env)
         previous = history[-2]
         update_registry(env, previous.model_name, previous.version)
         reload_model_server()
     history[-2] is the previous known-good version only if get_deployment_history excludes entries already marked rolled_back; otherwise a second rollback can land on a version that was itself rolled back. Handle the case where the history holds fewer than two entries. The rollback must complete in under 30 seconds.

  7. Maintain a deployment history table:
     CREATE TABLE deployments (id SERIAL, env TEXT, model_name TEXT, version TEXT, deployed_at TIMESTAMP, rolled_back BOOLEAN DEFAULT FALSE, metrics JSONB)
     Store performance metrics with each deployment for trend analysis.
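
The naming convention, artifact path, and metadata JSON from the first steps can be sketched as a small helper. This is a minimal illustration, not a required implementation: the ModelVersion class, parse_identifier function, and the regular expression are hypothetical names introduced here, and the bucket name "models" is taken from the example path above.

```python
import json
import re
from dataclasses import dataclass

# Assumed pattern for identifiers such as "llama3-instruct-v2.1.0-q4km".
IDENTIFIER_RE = re.compile(
    r"^(?P<model>.+)-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)-(?P<quant>\w+)$"
)


@dataclass(frozen=True)
class ModelVersion:
    """One immutable model artifact (hypothetical helper for this guide)."""
    model_name: str
    major: int
    minor: int
    patch: int
    quantization: str

    @property
    def version(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"

    def identifier(self) -> str:
        # {model_name}-v{major}.{minor}.{patch}-{quantization}
        return f"{self.model_name}-v{self.version}-{self.quantization}"

    def artifact_uri(self, filename: str) -> str:
        # s3://models/{model_name}/v{major}.{minor}.{patch}/{filename}.gguf
        return f"s3://models/{self.model_name}/v{self.version}/{filename}.gguf"

    def metadata(self, benchmark_score: float, created_at: str, parent_version: str) -> str:
        # Same fields as the metadata JSON shown in the steps.
        return json.dumps({
            "version": self.version,
            "base_model": self.model_name,
            "quantization": self.quantization,
            "benchmark_score": benchmark_score,
            "created_at": created_at,
            "parent_version": parent_version,
        })


def parse_identifier(identifier: str) -> ModelVersion:
    """Split an identifier into its parts, rejecting anything off-convention."""
    match = IDENTIFIER_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a valid model identifier: {identifier!r}")
    return ModelVersion(
        model_name=match["model"],
        major=int(match["major"]),
        minor=int(match["minor"]),
        patch=int(match["patch"]),
        quantization=match["quant"],
    )
```

Rejecting malformed identifiers at parse time keeps off-convention names out of the registry, where they would be harder to clean up later.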

  • Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.

  • Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.

  • Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.

  • Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
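
One possible "smallest complete path" is an in-memory version of the registry and deployment history that exercises deploy and rollback end to end before any real storage or model server is involved. The sketch below makes several assumptions: the Registry and Deployment classes are hypothetical, the history is a Python list standing in for the deployments table, and the validate callback stands in for validate_model and run_smoke_tests.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Deployment:
    """One row of the deployment history (mirrors the deployments table)."""
    env: str
    model_name: str
    version: str
    deployed_at: datetime
    rolled_back: bool = False


@dataclass
class Registry:
    """In-memory stand-in for the registry file or table (hypothetical)."""
    active: dict = field(default_factory=dict)   # env -> {"model", "version"}
    history: list = field(default_factory=list)  # append-only Deployment log

    def deploy(self, env: str, model_name: str, version: str,
               validate: Callable[[str, str], bool] = lambda m, v: True) -> None:
        # Validate first so a failing candidate never becomes active.
        if not validate(model_name, version):
            raise RuntimeError(f"validation failed for {model_name} {version}")
        self.history.append(
            Deployment(env, model_name, version, datetime.now(timezone.utc))
        )
        self.active[env] = {"model": model_name, "version": version}

    def rollback(self, env: str) -> Deployment:
        # Ignore entries that were already rolled back, so repeated
        # rollbacks never land on a known-bad version.
        live = [d for d in self.history if d.env == env and not d.rolled_back]
        if len(live) < 2:
            raise RuntimeError(f"no previous version to roll back to in {env}")
        current, previous = live[-1], live[-2]
        current.rolled_back = True
        self.active[env] = {"model": previous.model_name, "version": previous.version}
        return previous


if __name__ == "__main__":
    registry = Registry()
    registry.deploy("production", "llama3-instruct", "2.0.0")
    registry.deploy("production", "llama3-instruct", "2.1.0")
    registry.rollback("production")
    print(registry.active["production"])
```

Filtering on rolled_back rather than indexing history[-2] directly is a design choice of this sketch: it keeps the history append-only while still skipping versions that have already been reverted.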

Verification

Deploy v2.0.0 and verify the inference server loads it: check the model info endpoint returns the correct version. Deploy v2.1.0 and verify the server switches to the new version. Run rollback to v2.0.0 and verify the server reverts within 30 seconds by checking the model info endpoint again. Test the validation gate: create a deliberately bad model that fails benchmark thresholds and verify deployment is aborted. Check the deployment history table for correct records of all deploy and rollback events.
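
The validation gate test depends on how "degrade by more than 5%" is computed. The following is one possible reading, written as a sketch: the function name degraded_metrics, the metric names, and the rule that perplexity and latency are lower-is-better while everything else is higher-is-better are all assumptions made for illustration.

```python
# Metrics where a smaller value is better (assumed names).
LOWER_IS_BETTER = frozenset({"perplexity", "latency_ms"})


def degraded_metrics(previous: dict, candidate: dict, tolerance: float = 0.05) -> list:
    """Return the metrics on which the candidate is worse than the previous
    version by more than `tolerance` (relative change). An empty list means
    the candidate passes the gate."""
    failures = []
    for name, old in previous.items():
        new = candidate.get(name)
        if new is None:
            # A missing metric cannot be compared, so treat it as a failure.
            failures.append(name)
            continue
        if old <= 0:
            raise ValueError(f"baseline for {name!r} must be positive")
        if name in LOWER_IS_BETTER:
            change = (new - old) / old   # increase is a regression
        else:
            change = (old - new) / old   # decrease is a regression
        if change > tolerance:
            failures.append(name)
    return failures


if __name__ == "__main__":
    baseline = {"perplexity": 6.0, "latency_ms": 100.0, "accuracy": 0.87}
    bad = {"perplexity": 7.0, "latency_ms": 101.0, "accuracy": 0.70}
    print(degraded_metrics(baseline, bad))
```

A deliberately bad candidate like the one in the demo should produce a non-empty list, which the deployment script can use as its abort condition.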

Common failures

  • Rollback references deleted artifact - Never delete old model artifacts from storage; only mark them as superseded.
  • ConfigMap update not picked up by running pods - The inference pod must re-read the ConfigMap on each health check or use a file watcher; for Kubernetes, trigger a rolling restart with kubectl rollout restart deployment/inference.
  • Concurrent deployments corrupt registry state - Use optimistic locking or a database transaction with SELECT ... FOR UPDATE when updating the active version.
  • Model download fails mid-deployment leaving server in broken state - Download to a staging path first (/models/staging/), then atomically move or symlink to the active path.
  • Performance metrics not comparable between quantizations - Normalize metrics by quantization level or compare only within the same quantization family.

  • Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
  • Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
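
The staging-then-swap fix for a failed mid-deployment download can be sketched as follows. This is an illustration under stated assumptions: activate_model is a hypothetical helper, the paths are placeholders, and the atomicity relies on POSIX rename semantics as found on the Ubuntu target (symlink creation on other platforms may need extra privileges).

```python
import os
from pathlib import Path


def activate_model(staged_file: Path, active_link: Path) -> None:
    """Point active_link at a fully downloaded staged_file.

    A temporary symlink is created next to the active path and then renamed
    over it with os.replace, so readers see either the old target or the new
    one, never a half-written file.
    """
    if not staged_file.is_file():
        raise FileNotFoundError(f"staged model missing: {staged_file}")
    tmp_link = active_link.with_name(active_link.name + ".tmp")
    # Clear a leftover temp link from an earlier interrupted run.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(staged_file.resolve(), tmp_link)
    os.replace(tmp_link, active_link)  # atomic rename on POSIX


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        staging = Path(root) / "staging"
        staging.mkdir()
        model = staging / "model-v2.1.0.gguf"
        model.write_bytes(b"placeholder weights")
        active = Path(root) / "active.gguf"
        activate_model(model, active)
        print(active.resolve().name)
```

Because old artifacts stay in place and only the link moves, rolling back is the same call with the previous version's file as the argument.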

Related guides

  • deploy-production-pd-hyperspace
  • implement-ab-testing-model-responses
  • setup-auto-scaling-llm-inference