What this does

This guide sets resource quotas and limit ranges for Kubernetes namespaces running AI workloads to enforce fair sharing and prevent resource exhaustion. Resource Quotas cap the total CPU, memory, GPU, and persistent volume claims within a namespace. Limit Ranges set default resource requests/limits for containers that omit them, preventing unbounded consumption. Together they ensure AI inference, training, and agent workloads coexist without starving each other.

Steps

Create dedicated namespaces if not already present:

kubectl create namespace ai-inference
kubectl create namespace ai-training
kubectl label namespace ai-inference team=ml-platform
kubectl label namespace ai-training team=ml-platform
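
kubectl create namespace returns an error when the namespace already exists, and kubectl label refuses to change an existing label unless --overwrite is passed. A minimal sketch of a re-runnable variant follows; the helper name ensure_namespace is hypothetical and not part of kubectl:

```shell
# Creates the namespace if it is missing and sets the team label.
# Safe to re-run: apply is a no-op for an existing namespace, and
# --overwrite lets the label be set again without an error.
ensure_namespace() {
  ns=$1
  team=$2
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
  kubectl label namespace "$ns" "team=$team" --overwrite
}

# Usage (requires cluster access):
#   ensure_namespace ai-inference ml-platform
#   ensure_namespace ai-training ml-platform
```

The client-side dry run only renders the Namespace manifest; kubectl apply then creates it or leaves it unchanged.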

Define a Resource Quota for the inference namespace. Cap total resources including GPUs:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: inference-quota
  namespace: ai-inference
spec:
  hard:
    requests.cpu: "32"
    requests.memory: "128Gi"
    limits.cpu: "64"
    limits.memory: "256Gi"
    requests.nvidia.com/gpu: "4"
    persistentvolumeclaims: "10"
    requests.storage: "500Gi"
    count/deployments.apps: "20"
    count/services: "10"

Define a Resource Quota for the training namespace with higher GPU allocation:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: training-quota
  namespace: ai-training
spec:
  hard:
    requests.nvidia.com/gpu: "8"
    limits.cpu: "128"
    limits.memory: "512Gi"

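Apply a Limit Range to set defaults for containers that omit resource specifications. Once a quota constrains a compute resource, pods that omit it are rejected, so ai-training needs an equivalent LimitRange of its own. For ai-inference: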

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: ai-inference
spec:
  limits:
    - type: Container
      default:
        cpu: "2"
        memory: "8Gi"
      defaultRequest:
        cpu: "500m"
        memory: "2Gi"
      min:
        cpu: "100m"
        memory: "256Mi"
      max:
        cpu: "16"
        memory: "64Gi"

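Containers that omit resources receive a 500m CPU / 2Gi memory request and a 2 CPU / 8Gi memory limit. Containers that set their own values must stay between the min (100m CPU, 256Mi memory) and the max (16 CPU, 64Gi memory), or the pod is rejected at admission.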
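
The training quota constrains limits.cpu and limits.memory, so pods in ai-training that omit limits would be rejected unless a LimitRange supplies defaults. A LimitRange for that namespace might look like the sketch below. The helper name training_limits_manifest is hypothetical, and every numeric value is an illustrative assumption, not a recommendation from this guide:

```shell
# Prints a LimitRange manifest for ai-training to stdout.
# All CPU and memory values are assumptions; size them to the
# training jobs that actually run in the namespace.
training_limits_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: ai-training
spec:
  limits:
    - type: Container
      default:
        cpu: "8"
        memory: "32Gi"
      defaultRequest:
        cpu: "4"
        memory: "16Gi"
      max:
        cpu: "64"
        memory: "256Gi"
EOF
}

# Usage (requires cluster access):
#   training_limits_manifest | kubectl apply -f -
```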

Configure Object Count quotas to prevent runaway resource creation:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-count-quota
  namespace: ai-inference
spec:
  hard:
    count/configmaps: "50"
    count/secrets: "30"
    count/jobs.batch: "20"

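Apply all configurations, assuming each manifest above is saved under the matching file name, and verify:

kubectl apply -f inference-quota.yaml -f training-quota.yaml -f inference-limits.yaml -f object-count-quota.yaml
kubectl get resourcequota -n ai-inference
kubectl get resourcequota -n ai-training
kubectl get limitrange -n ai-inference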

Check current consumption against quotas:
```
kubectl describe resourcequota inference-quota -n ai-inference
```
Expected output: a table with Resource, Used, and Hard columns, e.g., a requests.nvidia.com/gpu row showing Used 2 and Hard 4.

Test enforcement by attempting to deploy a pod that exceeds the quota:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-quota
  namespace: ai-inference
spec:
  containers:
    - name: test
      image: nginx
      resources:
        limits:
          nvidia.com/gpu: "10"  # extended resources must be set under limits; the request defaults to the same value
EOF

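Expected output: a Forbidden error containing exceeded quota: inference-quota, requested: requests.nvidia.com/gpu=10, used: requests.nvidia.com/gpu=X, limited: requests.nvidia.com/gpu=4, where X is the namespace's current GPU usage. The pod is never created, so there is nothing to clean up.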
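
The rejection follows from simple arithmetic: a pod is admitted only if current usage plus the new request stays within the hard cap. The sketch below models that check for integer-counted resources such as GPUs. The function name quota_admits is hypothetical, and this is a simplified illustration, not the API server's implementation:

```shell
# Succeeds when used + requested stays within the hard cap.
# Integer counts only (GPUs, object counts); CPU and memory
# quantities with suffixes are out of scope for this sketch.
quota_admits() {
  used=$1
  requested=$2
  hard=$3
  [ $((used + requested)) -le "$hard" ]
}

quota_admits 0 10 4 || echo "rejected: 10 GPUs requested, cap is 4"
quota_admits 2 2 4 && echo "admitted: 2 used + 2 requested fits the cap of 4"
```

Even in an empty namespace (used = 0), a request for 10 GPUs exceeds the cap of 4, which is why the test pod fails regardless of X.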

Verification

kubectl get resourcequota inference-quota -n ai-inference -o json | jq '.status'

Expected output: JSON showing hard and used values for all tracked resources.

Common failures

GPU quota appears to have no effect: the quota counts only pods that request nvidia.com/gpu, and nodes advertise that resource only when the NVIDIA device plugin is installed. Verify: kubectl describe node <gpu-node> | grep nvidia.com/gpu. If the output is empty, install the device plugin.
LimitRange min/max too restrictive: pods whose containers request more than the max CPU or memory are rejected at admission. Review actual usage with kubectl top pods -n ai-inference (requires metrics-server), compare it with the largest requests in your manifests, and set max values accordingly.
Quota blocks autoscaling: quotas are enforced when pods are created, before any autoscaler can act. If the quota is fully consumed, the pods that HPA tries to add are rejected and the workload cannot scale up. Set quotas with headroom: used / hard <= 0.7 for all metrics.
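
The headroom rule can be checked mechanically. The sketch below reads lines of the form "resource used hard" and flags any resource whose ratio exceeds the threshold. The function name check_headroom and the input format are assumptions made for this example, and the quantity parser covers only the common suffixes (m, Ki/Mi/Gi/Ti, k/M/G/T):

```shell
# Reads "<resource> <used> <hard>" lines on stdin and prints the
# used/hard ratio, marking resources above the threshold (default 0.7).
check_headroom() {
  awk -v t="${1:-0.7}" '
    # Convert a Kubernetes quantity such as 500m, 128Gi or 4 to a plain number.
    function to_base(q,    n, s) {
      s = ""; n = q
      if (match(q, /[A-Za-z]+$/)) { s = substr(q, RSTART); n = substr(q, 1, RSTART - 1) }
      if (s == "m")  return n / 1000
      if (s == "Ki") return n * 1024
      if (s == "Mi") return n * 1024 ^ 2
      if (s == "Gi") return n * 1024 ^ 3
      if (s == "Ti") return n * 1024 ^ 4
      if (s == "k")  return n * 1000
      if (s == "M")  return n * 1000 ^ 2
      if (s == "G")  return n * 1000 ^ 3
      if (s == "T")  return n * 1000 ^ 4
      return n + 0
    }
    {
      hard = to_base($3)
      if (hard == 0) next            # a zero cap has no meaningful ratio
      ratio = to_base($2) / hard
      printf "%s %.2f %s\n", $1, ratio, (ratio > t ? "OVER" : "ok")
    }'
}

# Usage (requires cluster access and jq):
#   kubectl get resourcequota inference-quota -n ai-inference -o json \
#     | jq -r '.status as $s | ($s.hard | keys[]) as $k | "\($k) \($s.used[$k]) \($s.hard[$k])"' \
#     | check_headroom
```

Passing a different threshold as the first argument, for example check_headroom 0.8, loosens the check for namespaces that scale rarely.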
