What this does

This guide creates a Kubernetes operator using kubebuilder that automates the lifecycle of AI model deployments. The operator watches a custom resource ModelDeployment and reconciles the cluster state: when a new model is requested, the operator downloads weights, creates a PVC, deploys an inference server, and configures a Service. When the model is deprecated, the operator drains traffic and decommissions resources. This pattern eliminates manual toil for MLOps teams managing dozens of models.

Steps

Scaffold the operator project:

mkdir model-operator && cd model-operator
kubebuilder init --domain ai.example.com --repo github.com/example/model-operator

Create the ModelDeployment API:

kubebuilder create api --group ai --version v1alpha1 --kind ModelDeployment --resource --controller

Define the ModelDeployment spec in api/v1alpha1/modeldeployment_types.go:

type ModelDeploymentSpec struct {
    ModelName    string `json:"modelName"`
    ModelSource  string `json:"modelSource"`  // HuggingFace repo or S3 URI
    GPUs         int    `json:"gpus,omitempty"`
    Replicas     int    `json:"replicas,omitempty"`
    AutoScaling  bool   `json:"autoScaling,omitempty"`
}
type ModelDeploymentStatus struct {
    Phase      string `json:"phase"` // Pending, Downloading, Deploying, Running, Failed
    Endpoint   string `json:"endpoint,omitempty"`
    ReadyPods  int    `json:"readyPods"`
}

Run code generation:
```
make generate && make manifests
```

Implement the controller reconciliation logic in internal/controller/modeldeployment_controller.go. The reconcile loop handles four phases:

func (r *ModelDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var md aiv1alpha1.ModelDeployment
    if err := r.Get(ctx, req.NamespacedName, &md); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    switch md.Status.Phase {
    case "":
        md.Status.Phase = "Downloading"
        r.Status().Update(ctx, &md)
    case "Downloading":
        if err := r.downloadWeights(ctx, &md); err != nil {
            return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
        }
        md.Status.Phase = "Deploying"
        r.Status().Update(ctx, &md)
    case "Deploying":
        if err := r.createDeployment(ctx, &md); err != nil {
            return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
        }
        md.Status.Phase = "Running"
        r.Status().Update(ctx, &md)
    case "Running":
        r.updateStatus(ctx, &md)
    }
    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}

Implement the createDeployment method to generate a Kubernetes Deployment with GPU affinity and the appropriate vLLM args. Use controller-runtime's CreateOrUpdate to handle idempotent resource management.

Build and push the operator image:

make docker-build docker-push IMG=registry.example.com/model-operator:v0.1.0

Deploy the operator to the cluster:

make deploy IMG=registry.example.com/model-operator:v0.1.0
kubectl get pods -n model-operator-system

Create a ModelDeployment custom resource to test:

apiVersion: ai.example.com/v1alpha1
kind: ModelDeployment
metadata:
  name: llama-3-8b
spec:
  modelName: "Llama-3-8B"
  modelSource: "huggingface://meta-llama/Meta-Llama-3-8B-Instruct"
  gpus: 1
  replicas: 1

Apply: kubectl apply -f model.yaml.

Watch the operator reconcile:
```
kubectl get modeldeployment llama-3-8b -w
```
Expected output: Phase transitions from Pending to Downloading to Deploying to Running.

Verification

kubectl get modeldeployment llama-3-8b -o json | jq '.status.phase'

Expected output: "Running".

Common failures

Controller crash-loops on API version mismatch — ensure the CRD YAML is regenerated after schema changes: make manifests && make install.
Model download fails silently — the weights downloader container needs network access to HuggingFace or S3. Check operator logs: kubectl logs -n model-operator-system deployment/controller-manager.
Phase stuck at "Deploying" — the generated Deployment may fail scheduling due to GPU unavailability. Check the Deployment status: kubectl describe deployment -l modeldeployment=llama-3-8b.

How to create a Kubernetes operator for managing AI model deployments

What this does

Steps

Verification

Common failures

Related guides