RUNLOCALAI · v38
© 2026 runlocalai.co · Independently operated

How to set up CI/CD for AI model deployment

advanced · 35 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
Prerequisites

GitHub/GitLab repo, model artifacts, CI runner

What this does

A CI/CD pipeline for AI models automates quality gates, containerization, and deployment to reduce manual errors and accelerate release cycles. It tests model accuracy and latency, builds Docker images, integrates with a model registry, and provides automated rollback on failure.

Steps

Create a tests/ directory containing tests that evaluate model performance before deployment. Write a test that loads the candidate model from the registry, runs it against a held-out validation dataset, and asserts that accuracy is at least 0.85 and p95 latency stays under 200 ms. Store the validation samples as JSON files in tests/fixtures/ and use pytest as the test runner; pytest exits with code 1 when any test fails, which fails the CI job. Store baseline metrics in a model_baseline.json file in the repository root so later runs can be compared against them.
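
The quality gate described above could be sketched as the pytest module below. It is a minimal sketch under stated assumptions: predict is a hypothetical stand-in for the call into the candidate model, the sample layout (records with "input" and "label" keys) is assumed, and the samples are inlined rather than read from tests/fixtures/. Only the 0.85 and 200 ms thresholds come from the guide.

```python
"""Sketch of a tests/test_quality.py module; the model call is a hypothetical stub."""
import time

MIN_ACCURACY = 0.85          # threshold from the guide
MAX_P95_LATENCY_MS = 200.0   # threshold from the guide


def predict(text):
    """Hypothetical stand-in for the candidate model; replace with a real call."""
    return "positive" if "good" in text else "negative"


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def evaluate(samples, predict_fn):
    """Return (accuracy, p95 latency in ms) for predict_fn over samples."""
    correct = 0
    latencies_ms = []
    for sample in samples:
        start = time.perf_counter()
        output = predict_fn(sample["input"])
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        correct += output == sample["label"]
    return correct / len(samples), p95(latencies_ms)


# Inline stand-in for a fixture file under tests/fixtures/ (layout is assumed).
SAMPLES = [
    {"input": "good service", "label": "positive"},
    {"input": "bad service", "label": "negative"},
    {"input": "good value", "label": "positive"},
    {"input": "slow and buggy", "label": "negative"},
]


def test_quality_gate():
    accuracy, latency_ms = evaluate(SAMPLES, predict)
    assert accuracy >= MIN_ACCURACY
    assert latency_ms <= MAX_P95_LATENCY_MS
```

Running pytest tests/ against a module like this produces the non-zero exit code the pipeline relies on whenever either assertion fails. Latency measured on a shared CI runner can vary, so the 200 ms bound may need headroom in practice.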

In the repository root, create .github/workflows/deploy.yml defining three jobs: test-quality, build-image, and deploy. This path applies to GitHub Actions; GitLab reads its pipeline from .gitlab-ci.yml instead. The test-quality job runs pytest tests/ and publishes the results as artifacts. The build-image job runs only after test-quality completes successfully and executes docker build using a multi-stage Dockerfile that copies the model file and exposes port 8000. The deploy job triggers only on pushes to the main branch.
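
The dependency and branch rules for the three jobs can be modelled in a few lines of Python. This is not workflow syntax and does not replace deploy.yml; it is an illustrative sketch of the intended behaviour. The function plan_jobs and its parameters are hypothetical names, and the sketch assumes that deploy also waits for build-image, which the guide implies but does not state.

```python
def plan_jobs(ref, event, tests_passed, build_succeeded=True):
    """Return the jobs that should run, in order, for one pipeline trigger."""
    jobs = ["test-quality"]            # always runs first
    if not tests_passed:
        return jobs                    # quality gate failed: nothing else runs
    jobs.append("build-image")         # depends on test-quality succeeding
    on_main_push = event == "push" and ref == "refs/heads/main"
    if build_succeeded and on_main_push:
        jobs.append("deploy")          # only on pushes to the main branch
    return jobs


if __name__ == "__main__":
    print(plan_jobs("refs/heads/main", "push", tests_passed=True))
    print(plan_jobs("refs/heads/feature-x", "push", tests_passed=True))
    print(plan_jobs("refs/heads/main", "push", tests_passed=False))
```

Writing the rules out this way makes it easier to check the real workflow file against them, in particular that a feature-branch push never reaches deploy.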

Add a download_model step to the pipeline before docker build that uses the registry's CLI to fetch the model artifact tagged with the current commit SHA or a version number. Place the downloaded file in the Docker build context so the container image includes that specific model version. Tag the Docker image as registry.example.com/model:{git_sha} and push it to the container registry. After each deployment, write the deployed commit SHA and image tag to a deploy_state.json file in shared storage, keeping the previously deployed entry alongside the current one so that a rollback has a known target. On a later pipeline run, if the quality gate fails, execute a rollback step that reads deploy_state.json, pulls the previous image, and redeploys it. Configure the pipeline to halt and emit a failure status when more than 2 rollbacks occur within 1 hour.
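
The state file and rollback limit could be handled along the following lines. This is a sketch under assumptions, not the guide's implementation: the JSON layout (current, previous, rollbacks) and the function names are hypothetical, the file is assumed to keep the previous release next to the current one, and the actual image pull and redeploy are left to the caller. The limit of 2 rollbacks per hour is taken from the guide.

```python
import json
import time
from pathlib import Path

MAX_ROLLBACKS = 2        # limit from the guide
WINDOW_SECONDS = 3600    # 1 hour, from the guide


def load_state(path):
    """Read the state file, or return an empty state if it does not exist."""
    path = Path(path)
    if not path.exists():
        return {"current": None, "previous": None, "rollbacks": []}
    return json.loads(path.read_text())


def record_deploy(path, git_sha, image):
    """Store the new release and keep the one it replaced as `previous`."""
    state = load_state(path)
    state["previous"] = state["current"]
    state["current"] = {"git_sha": git_sha, "image": image}
    Path(path).write_text(json.dumps(state, indent=2))
    return state


def plan_rollback(path, now=None):
    """Return the image to redeploy, or raise if the rollback must not proceed."""
    now = time.time() if now is None else now
    state = load_state(path)
    # Keep only rollbacks that happened inside the last hour.
    recent = [t for t in state["rollbacks"] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_ROLLBACKS:
        raise RuntimeError("more than 2 rollbacks within 1 hour; halting")
    if state["previous"] is None:
        raise RuntimeError("no previous release recorded")
    state["rollbacks"] = recent + [now]
    state["current"], state["previous"] = state["previous"], None
    Path(path).write_text(json.dumps(state, indent=2))
    return state["current"]["image"]
```

A pipeline step would call record_deploy after a successful deployment and plan_rollback when the gate fails, treating the raised RuntimeError as the failure status that halts the pipeline.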

  • Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.

  • Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.

  • Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.

  • Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
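
Recording the run evidence from the checklist above can be scripted. The sketch below is one possible shape, assuming a hypothetical helper named capture_run and a JSON record whose field names are illustrative. It runs the Python interpreter's own version command as a placeholder; in the guide's environment the command would more likely be a runtime version check or the pipeline's smoke test.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def capture_run(command, model_name=None):
    """Run command (a list of arguments) and return a reproducibility record."""
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": command,
        "model": model_name,                      # optional, per the checklist
        "python": platform.python_version(),
        "platform": platform.platform(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


if __name__ == "__main__":
    # Placeholder command; swap in the command the guide step actually runs.
    record = capture_run([sys.executable, "--version"])
    print(json.dumps(record, indent=2))
```

Redirecting the printed JSON to a file next to the pipeline logs keeps the exact command, versions, and output together for later comparison.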

Verification

Push a commit to the main branch with git push origin main. Expected output:

  • The test-quality job passes all pytest tests and publishes artifacts that include test-results.xml.
  • The build-image job pushes a Docker image to the registry with a tag matching the commit SHA, and the build logs confirm the model file was copied.
  • The deploy job updates the deployment, and deploy_state.json reflects the new commit SHA.

To simulate a rollback, deliberately fail the test-quality stage and confirm that the pipeline reads the previous state and redeploys the prior image within 3 minutes.
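
The test-results.xml artifact can also be checked programmatically. The sketch below assumes the file is JUnit-style XML, as pytest writes when run with --junitxml=test-results.xml. The function names are hypothetical and the sample XML is hand-written for illustration, not captured output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the JUnit style; real files carry more attributes.
SAMPLE_XML = """<testsuites>
  <testsuite name="pytest" tests="2" failures="0" errors="0" skipped="0">
    <testcase classname="tests.test_quality" name="test_accuracy"/>
    <testcase classname="tests.test_quality" name="test_latency"/>
  </testsuite>
</testsuites>"""


def summarize_junit(xml_text):
    """Sum the tests, failures, and errors counters over every <testsuite>."""
    root = ET.fromstring(xml_text)
    totals = {"tests": 0, "failures": 0, "errors": 0}
    # iter() also yields the root, so a bare <testsuite> root is handled too.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals


def gate_passed(xml_text):
    """True when at least one test ran and none failed or errored."""
    totals = summarize_junit(xml_text)
    return totals["tests"] > 0 and totals["failures"] == 0 and totals["errors"] == 0


if __name__ == "__main__":
    print(summarize_junit(SAMPLE_XML), gate_passed(SAMPLE_XML))
```

Requiring at least one executed test guards against a misconfigured run that collects nothing and still reports success.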

Common failures

  • Quality test flakiness: Non-deterministic model outputs cause intermittent test failures. Fix this by setting a random seed with torch.manual_seed(42) and using a fixed validation split across runs.
  • Docker build context missing model file: The Docker build fails because the model artifact was never downloaded into the build context, typically because registry credentials are not available to the download step. Store the credentials as a secret in the CI runner environment and fetch the model before the build step.
  • Registry image tag collision: The same tag is pushed repeatedly, overwriting production images. Always append the commit SHA or timestamp to the image tag.
  • Rollback reads stale state: The rollback step reads deploy_state.json from the wrong namespace or bucket, deploying the wrong image. Validate the path exists and the file is not older than 1 hour.
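  • Runner timeout on large models: The CI job exceeds the runner timeout when downloading a multi-GB model artifact. Use a cache step to avoid repeated downloads. Compressing the model with gzip can also help, although binary weight files often compress poorly, so measure the gain before relying on it.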
  • Branch condition misconfiguration: Deployments trigger on feature branches instead of main only. Ensure the deploy job includes a branch filter matching refs/heads/main.
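
For the flakiness item above, a fixed validation split can be produced with a seeded generator. The sketch below uses only the standard library so that it stays dependency-free; fixed_split is a hypothetical helper, and framework-level randomness still needs its own seed, such as the torch.manual_seed(42) call the guide mentions. The split is stable only while the input order and the seed stay the same.

```python
import random


def fixed_split(records, holdout_fraction=0.2, seed=42):
    """Return (train, validation) with the same membership on every run."""
    rng = random.Random(seed)              # private generator; global state untouched
    indices = list(range(len(records)))
    rng.shuffle(indices)
    cut = int(len(records) * holdout_fraction)
    holdout = set(indices[:cut])
    # Preserve the original record order inside each part.
    train = [r for i, r in enumerate(records) if i not in holdout]
    validation = [r for i, r in enumerate(records) if i in holdout]
    return train, validation


if __name__ == "__main__":
    train, validation = fixed_split(list(range(10)))
    print(len(train), len(validation))
```

Committing the resulting validation records to tests/fixtures/ removes the dependence on the generator altogether, which is the more conservative option for a quality gate.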

Related guides

  • Build a cost tracking dashboard for AI usage — monitors infrastructure costs incurred by CI/CD runners building model images.
  • Set up model fallback chains (local to cloud) — deployment pipelines can trigger fallback model routing when new models roll out.