How to configure resource quotas and limit ranges for AI namespaces
Prerequisites: a Kubernetes cluster with namespace admin access, plus kubectl and jq, which the commands in this guide use.
What this does
This guide sets resource quotas and limit ranges for Kubernetes namespaces running AI workloads, to enforce fair sharing and prevent resource exhaustion. A ResourceQuota caps the total CPU, memory, GPUs, storage, and object counts (such as persistent volume claims) that a namespace can consume. A LimitRange sets default resource requests and limits for containers that omit them and bounds each container between a minimum and a maximum, preventing unbounded consumption. Together they let AI inference, training, and agent workloads coexist without starving each other.
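As a rough illustration of the quota mechanism described above, the sketch below models a single namespace quota as "current usage plus the new request must not exceed the hard cap". The `NamespaceQuota` class and its `admit` method are hypothetical names made up for this example; the real check is done by the Kubernetes API server at admission time and handles many more cases.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceQuota:
    """Toy model of one ResourceQuota: hard caps plus running usage."""
    hard: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)

    def admit(self, requested: dict[str, int]) -> list[str]:
        """Return the resources that would exceed their cap (empty = admitted)."""
        exceeded = [
            name for name, amount in requested.items()
            if name in self.hard
            and self.used.get(name, 0) + amount > self.hard[name]
        ]
        if not exceeded:
            # Usage is only charged when the whole request fits.
            for name, amount in requested.items():
                if name in self.hard:
                    self.used[name] = self.used.get(name, 0) + amount
        return exceeded


# Hypothetical usage: a 4-GPU cap, as in the inference quota of this guide.
quota = NamespaceQuota(hard={"requests.nvidia.com/gpu": 4})
print(quota.admit({"requests.nvidia.com/gpu": 2}))  # fits: 0 + 2 <= 4
print(quota.admit({"requests.nvidia.com/gpu": 3}))  # rejected: 2 + 3 > 4
```

The point of the model is that a rejected request leaves usage unchanged, which is why a namespace at its cap stays blocked until existing pods are removed or the quota is raised.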
Steps
Create dedicated namespaces if not already present:
kubectl create namespace ai-inference
kubectl create namespace ai-training
kubectl label namespace ai-inference team=ml-platform
kubectl label namespace ai-training team=ml-platform

Define a Resource Quota for the inference namespace. Cap total resources including GPUs:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: inference-quota
  namespace: ai-inference
spec:
  hard:
    requests.cpu: "32"
    requests.memory: "128Gi"
    limits.cpu: "64"
    limits.memory: "256Gi"
    requests.nvidia.com/gpu: "4"
    persistentvolumeclaims: "10"
    requests.storage: "500Gi"
    count/deployments.apps: "20"
    count/services: "10"

Define a Resource Quota for the training namespace with higher GPU allocation:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: training-quota
  namespace: ai-training
spec:
  hard:
    requests.nvidia.com/gpu: "8"
    limits.cpu: "128"
    limits.memory: "512Gi"

Apply Limit Ranges to set default values for containers that omit resource specifications:
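apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: ai-inference
spec:
  limits:
  - type: Container
    default:
      cpu: "2"
      memory: "8Gi"
    defaultRequest:
      cpu: "500m"
      memory: "2Gi"
    min:
      cpu: "100m"
      memory: "256Mi"
    max:
      cpu: "16"
      memory: "64Gi"

With this in place, a container that omits its resources gets a request of 500m CPU and 2Gi memory and a limit of 2 CPU and 8Gi memory. Every container in the namespace, whether it sets its own values or not, must stay between 100m and 16 CPU and between 256Mi and 64Gi memory.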
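The defaulting and bounds checking of the LimitRange can be sketched as follows. This is a simplified model, not the admission controller itself: `apply_limit_range` is a hypothetical helper, CPU is expressed in millicores and memory in MiB to avoid quantity parsing, and the rule that an explicit limit without a request is copied to the request reflects documented Kubernetes behaviour as I understand it.

```python
# Values mirror the default-limits LimitRange: CPU in millicores, memory in MiB.
LIMIT_RANGE = {
    "default":        {"cpu": 2000,  "memory": 8192},
    "defaultRequest": {"cpu": 500,   "memory": 2048},
    "min":            {"cpu": 100,   "memory": 256},
    "max":            {"cpu": 16000, "memory": 65536},
}


def apply_limit_range(requests: dict, limits: dict, lr: dict = LIMIT_RANGE):
    """Return (requests, limits) after defaulting, or raise ValueError."""
    requests, limits = dict(requests), dict(limits)
    for res in lr["default"]:
        had_limit = res in limits
        if not had_limit:
            limits[res] = lr["default"][res]
        if res not in requests:
            # An explicit limit without a request is copied to the request;
            # otherwise the LimitRange defaultRequest is used.
            requests[res] = limits[res] if had_limit else lr["defaultRequest"][res]
        if requests[res] < lr["min"][res]:
            raise ValueError(f"{res} request {requests[res]} is below min {lr['min'][res]}")
        if limits[res] > lr["max"][res]:
            raise ValueError(f"{res} limit {limits[res]} is above max {lr['max'][res]}")
        if requests[res] > limits[res]:
            raise ValueError(f"{res} request {requests[res]} exceeds limit {limits[res]}")
    return requests, limits


# A container with no resources block receives the defaults.
print(apply_limit_range({}, {}))

# A container asking for 32 CPU is rejected because max is 16 CPU.
try:
    apply_limit_range({}, {"cpu": 32000})
except ValueError as err:
    print("rejected:", err)
```

One consequence visible in the model: a container that sets only a request larger than the default limit (for example 4 CPU against a default limit of 2 CPU) is rejected, so such containers need to set their limit explicitly.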
Configure Object Count quotas to prevent runaway resource creation:
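apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-count-quota
  namespace: ai-inference
spec:
  hard:
    count/configmaps: "50"
    count/secrets: "30"
    count/jobs.batch: "20"

Apply the inference-namespace configurations and verify. The file names are whatever you saved the manifests as; apply the training quota the same way: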
kubectl apply -f inference-quota.yaml -f inference-limits.yaml -f object-count-quota.yaml
kubectl get resourcequota -n ai-inference
kubectl get limitrange -n ai-inference

Check current consumption against quotas:
kubectl describe resourcequota inference-quota -n ai-inference

Expected output: a table with Used and Hard columns for each resource, e.g., requests.nvidia.com/gpu with Used 2 and Hard 4.

Test enforcement by attempting to deploy a pod that exceeds the quota:
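kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-quota
  namespace: ai-inference
spec:
  containers:
  - name: test
    image: nginx
    resources:
      limits:
        nvidia.com/gpu: "10"
EOF

Expected output: a Forbidden error ending in
exceeded quota: inference-quota, requested: requests.nvidia.com/gpu=10, used: requests.nvidia.com/gpu=X, limited: requests.nvidia.com/gpu=4
where X is the current usage. The GPU count is set under limits because an extended resource cannot be requested without a limit; the request defaults to the same value and is what the quota counts.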
Verification
kubectl get resourcequota -n ai-inference -o json | jq '.items[] | {name: .metadata.name, status: .status}'
Expected output: one JSON object per quota in the namespace, each showing hard and used values for its tracked resources.
Common failures
- GPU quota has no effect: the NVIDIA device plugin must be installed and the nvidia.com/gpu resource must be recognised. Verify with:
  kubectl describe node <gpu-node> | grep nvidia.com/gpu
  If the output is empty, install the device plugin.
- LimitRange min/max too restrictive: inference pods that need more than the max CPU or memory are rejected at admission and never scheduled. Check actual pod usage (requires metrics-server) with:
  kubectl top pods -n ai-inference
  and set the max values accordingly.
- Quota blocks autoscaling: resource quotas are enforced when pods are created, before any autoscaler can help. If the quota is fully consumed, the replicas that the HPA asks for are rejected and the workload cannot scale up. Set quotas with headroom:
  used / hard <= 0.7
  for all metrics.
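The headroom rule above can be checked mechanically against the status of a quota. The sketch below is one possible approach under stated assumptions: `parse_quantity` and `over_headroom` are hypothetical helpers, the parser covers only the common quantity suffixes (not exponent notation or the P/E range), and the sample numbers are invented for illustration.

```python
import json

# Common Kubernetes quantity suffixes; a deliberately incomplete subset.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(q: str) -> float:
    """Convert a quantity such as "500m", "64Gi" or "4" to a plain number."""
    if q.endswith("m"):  # milli-units, e.g. 500m CPU = 0.5 cores
        return float(q[:-1]) / 1000
    for suffix, factor in SUFFIXES.items():
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * factor
    return float(q)


def over_headroom(status: dict, threshold: float = 0.7) -> dict[str, float]:
    """Return {resource: used/hard} for every resource above the threshold."""
    flagged = {}
    for name, hard in status.get("hard", {}).items():
        hard_value = parse_quantity(hard)
        if hard_value == 0:
            continue  # a zero cap has no meaningful ratio
        used_value = parse_quantity(status.get("used", {}).get(name, "0"))
        ratio = used_value / hard_value
        if ratio > threshold:
            flagged[name] = round(ratio, 2)
    return flagged


# Sample shaped like the .status field of a ResourceQuota; the numbers are made up.
sample = json.loads("""{
  "hard": {"requests.cpu": "32", "requests.memory": "128Gi", "requests.nvidia.com/gpu": "4"},
  "used": {"requests.cpu": "8500m", "requests.memory": "96Gi", "requests.nvidia.com/gpu": "2"}
}""")
print(over_headroom(sample))
```

To run it against a live namespace, the output of `kubectl get resourcequota inference-quota -n ai-inference -o json` could be loaded with `json.load` and its `status` field passed to `over_headroom`.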