What this does

A CI/CD pipeline for AI models automates quality gates, containerization, and deployment to reduce manual errors and accelerate release cycles. It tests model accuracy and latency, builds Docker images, integrates with a model registry, and provides automated rollback on failure.

Steps

Create a tests/ directory with test files that evaluate model performance before deployment. Write a test that loads the candidate model from the registry, runs it against a held-out validation dataset, and asserts that accuracy is at least 0.85 and p95 latency stays under 200 ms. Store the test datasets as sample JSON files in tests/fixtures/ and use pytest as the test runner; pytest exits with code 1 when any test fails, which fails the job. Store baseline metrics in a model_baseline.json file in the repository root.

In the repository root, create .github/workflows/deploy.yml defining three jobs: test-quality, build-image, and deploy. The test-quality job runs pytest tests/ and publishes results as artifacts. The build-image job depends on test-quality completing successfully and executes docker build using a multi-stage Dockerfile that copies the model file and exposes port 8000. The deploy job triggers only on pushes to the main branch.

Add a download_model step to the pipeline before docker build. It uses the registry's CLI to fetch the model artifact tagged with the current commit SHA or a version number. Store the downloaded file in the Docker build context so the container image includes that specific model version. Tag the Docker image as registry.example.com/model:{git_sha} and push it to the container registry.

After each successful deployment, write the deployed commit SHA and image tag to a deploy_state.json file in shared storage. On the next pipeline run, if the quality gate fails, execute a rollback step that reads deploy_state.json, pulls the previous image recorded there, and redeploys it. Configure the pipeline to halt and emit a failure status when more than 2 rollbacks occur within 1 hour.
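
The state tracking and rollback limit can be sketched as below. This is one possible shape under stated assumptions: the JSON field names (commit_sha, image, rollbacks), the function names, and the choice to keep rollback timestamps in the same file are all hypothetical, and shared storage is assumed to be reachable as an ordinary file path. Pulling and redeploying the image is left to your deployment tooling.

```python
import json
import time
from pathlib import Path

ROLLBACK_WINDOW_SECONDS = 3600  # "within 1 hour"
MAX_ROLLBACKS_IN_WINDOW = 2     # halt when this count is exceeded


def image_tag(git_sha, repository="registry.example.com/model"):
    """Build the image reference for a commit."""
    return f"{repository}:{git_sha}"


def load_state(state_path):
    """Read the state file, or return an empty state if none exists yet."""
    path = Path(state_path)
    if not path.exists():
        return {"rollbacks": []}
    return json.loads(path.read_text())


def record_deployment(state_path, git_sha):
    """Write the deployed commit and image tag after a successful deploy."""
    state = load_state(state_path)
    state["commit_sha"] = git_sha
    state["image"] = image_tag(git_sha)
    Path(state_path).write_text(json.dumps(state, indent=2))
    return state


def plan_rollback(state_path, now=None):
    """Return the image to redeploy, or raise if the rollback limit is hit."""
    now = time.time() if now is None else now
    state = load_state(state_path)
    if "image" not in state:
        raise RuntimeError("no previous deployment recorded")
    # Keep only rollbacks inside the window, then count this attempt.
    recent = [t for t in state.get("rollbacks", [])
              if now - t < ROLLBACK_WINDOW_SECONDS]
    recent.append(now)
    if len(recent) > MAX_ROLLBACKS_IN_WINDOW:
        raise RuntimeError("rollback limit exceeded; halting pipeline")
    state["rollbacks"] = recent
    Path(state_path).write_text(json.dumps(state, indent=2))
    return state["image"]
```

Because a failed quality gate never reaches record_deployment(), the file still describes the last good deployment when plan_rollback() reads it. An uncaught RuntimeError makes the script exit non-zero, which is what fails the pipeline step.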

Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.
Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.
Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.
Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
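
Recording the run evidence can be scripted. The sketch below assumes the command under test is launched from Python and that a JSON file is an acceptable evidence format; the function name record_run and the output field names are hypothetical.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def record_run(command, evidence_path, model_name=None):
    """Run a command and save what is needed to reproduce the result."""
    result = subprocess.run(command, capture_output=True, text=True)
    evidence = {
        "command": command,
        "python_executable": sys.executable,        # the active binary
        "python_version": platform.python_version(),
        "model_name": model_name,
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(evidence_path, "w", encoding="utf-8") as handle:
        json.dump(evidence, handle, indent=2)
    return evidence
```

For example, record_run([sys.executable, "-m", "pytest", "tests/", "-q"], "run_evidence.json") would capture a local run of the quality tests, assuming pytest is installed in the active environment.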

Verification

Push a commit to the main branch with git push origin main. Expected output: the test-quality job passes all pytest tests and publishes artifacts that include test-results.xml; the build-image job pushes a Docker image to the registry with a tag matching the commit SHA, and the build logs confirm the model file was copied; the deploy job updates the deployment, and deploy_state.json reflects the new commit SHA. To simulate a rollback, deliberately fail the test-quality stage and confirm that the pipeline reads the previous state and redeploys the prior image within 3 minutes.
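
Part of this check can be automated once the artifacts are downloaded. The sketch below assumes test-results.xml is JUnit-style XML, as produced by pytest's --junitxml option, and that deploy_state.json has been loaded into a dict with commit_sha and image fields; those field names and both function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_path):
    """Total tests, failures, and errors across all <testsuite> elements."""
    root = ET.parse(xml_path).getroot()
    totals = {"tests": 0, "failures": 0, "errors": 0}
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals


def verify_run(xml_path, state, expected_sha):
    """Return a list of problems; an empty list means the run looks as expected."""
    problems = []
    totals = summarize_junit(xml_path)
    if totals["tests"] == 0:
        problems.append("no tests were collected")
    if totals["failures"] or totals["errors"]:
        problems.append(
            f"{totals['failures']} failures, {totals['errors']} errors"
        )
    if state.get("commit_sha") != expected_sha:
        problems.append("deploy_state.json does not reflect the pushed commit")
    if not str(state.get("image", "")).endswith(f":{expected_sha}"):
        problems.append("image tag does not match the commit SHA")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to see every mismatch from a single run.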

Common failures

Quality test flakiness: Non-deterministic model outputs cause intermittent test failures. Fix this by setting a random seed with torch.manual_seed(42) and using a fixed validation split across runs.
Docker build context missing model file: The Docker build fails because registry credentials were not available when the model was fetched, so the model file never reached the build context. Expose the credentials as a secret in the CI runner environment and fetch the model before the build step.
Registry image tag collision: The same tag is pushed repeatedly, overwriting production images. Always append the commit SHA or timestamp to the image tag.
Rollback reads stale state: The rollback step reads deploy_state.json from the wrong namespace or bucket, deploying the wrong image. Validate the path exists and the file is not older than 1 hour.
Runner timeout on large models: The CI job exceeds the runner timeout when downloading a multi-GB model artifact. Compress the model with gzip and use a cache step to avoid repeated downloads.
Branch condition misconfiguration: Deployments trigger on feature branches instead of main only. Ensure the deploy job includes a branch filter matching refs/heads/main.
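
The stale-state check from the list above can be sketched as a guard that runs before the rollback step. It assumes deploy_state.json is reachable as a local or mounted file path and that the file's modification time is a reasonable proxy for its age; the function name check_state_file is hypothetical.

```python
import os
import time

MAX_STATE_AGE_SECONDS = 3600  # "not older than 1 hour"


def check_state_file(path, now=None):
    """Raise if the state file is missing or older than the allowed age."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"state file not found: {path}")
    now = time.time() if now is None else now
    age = now - os.path.getmtime(path)
    if age > MAX_STATE_AGE_SECONDS:
        raise RuntimeError(f"state file is stale ({age:.0f}s old): {path}")
    return age
```

If the state lives in object storage instead, the same check would use the object's last-modified metadata from that storage provider's client rather than os.path.getmtime.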

Related guides

Build a cost tracking dashboard for AI usage — monitors infrastructure costs incurred by CI/CD runners building model images.
Set up model fallback chains (local to cloud) — deployment pipelines can trigger fallback model routing when new models roll out.