RUNLOCALAI v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

© 2026 runlocalai.co · Independently operated
HOW-TO · OPS

How to configure resource quotas and limit ranges for AI namespaces

Intermediate · 20 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

Kubernetes cluster with namespace admin access

What this does

This guide configures ResourceQuota and LimitRange objects for Kubernetes namespaces that run AI workloads, so that teams share the cluster fairly and no single workload exhausts it. A ResourceQuota caps namespace-wide totals: CPU, memory, and GPU requests and limits, plus storage and object counts such as PersistentVolumeClaims. A LimitRange sets default requests and limits for containers that omit them and bounds what any single container may ask for. Together they let inference, training, and agent workloads coexist without starving each other.

Steps

  1. Create dedicated namespaces if not already present:

    kubectl create namespace ai-inference
    kubectl create namespace ai-training
    kubectl label namespace ai-inference team=ml-platform
    kubectl label namespace ai-training team=ml-platform
    
  2. Define a Resource Quota for the inference namespace. Cap total resources including GPUs:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: inference-quota
      namespace: ai-inference
    spec:
      hard:
        requests.cpu: "32"
        requests.memory: "128Gi"
        limits.cpu: "64"
        limits.memory: "256Gi"
        requests.nvidia.com/gpu: "4"
        persistentvolumeclaims: "10"
        requests.storage: "500Gi"
        count/deployments.apps: "20"
        count/services: "10"
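
To sanity-check a quota like the one above, it can help to work out how many identical pods fit before any single cap is hit. The sketch below is a hypothetical helper (max_replicas is not part of kubectl), and the per-pod requests of 1 GPU, 4 CPU cores, and 16 Gi of memory are illustrative assumptions, not values from this guide. It handles whole-number units only.

```shell
# Hypothetical helper: each argument is HARD:PER_POD in the same whole-number unit.
# Prints the largest replica count that stays within every cap.
max_replicas() {
  min=""
  for pair in "$@"; do
    hard="${pair%%:*}"       # quota value, e.g. 32 CPU cores
    per_pod="${pair##*:}"    # what one pod requests, must be > 0
    fit=$(( hard / per_pod ))
    if [ -z "$min" ] || [ "$fit" -lt "$min" ]; then
      min="$fit"
    fi
  done
  echo "$min"
}

# GPU 4 vs 1 per pod, CPU 32 vs 4 per pod, memory 128 Gi vs 16 Gi per pod
max_replicas 4:1 32:4 128:16   # prints 4 (the GPU cap is the bottleneck)
```

With these assumed pod sizes the GPU cap binds first, so raising requests.cpu or requests.memory alone would not allow more replicas.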
    
  3. Define a Resource Quota for the training namespace with higher GPU allocation:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: training-quota
      namespace: ai-training
    spec:
      hard:
        requests.nvidia.com/gpu: "8"
        limits.cpu: "128"
        limits.memory: "512Gi"
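
Because the training quota tracks limits.cpu and limits.memory, a pod in ai-training that omits those limits is rejected unless something supplies defaults. One option is a LimitRange for that namespace, sketched below. The function name training_limitrange and every number in the manifest are placeholder assumptions to adjust for your jobs; the function only prints the manifest so it can be reviewed first.

```shell
# Hypothetical helper: print a LimitRange manifest for the training namespace.
# All CPU and memory values are placeholder assumptions, not values from the guide.
training_limitrange() {
  cat <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: ai-training
spec:
  limits:
    - type: Container
      default:
        cpu: "4"
        memory: "16Gi"
      defaultRequest:
        cpu: "1"
        memory: "4Gi"
EOF
}
```

After reviewing the output, it could be applied with:

    training_limitrange | kubectl apply -f -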
    
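  4. Apply a LimitRange to set default values for containers that omit resource specifications. Because the quota in step 2 tracks CPU and memory requests and limits, a pod that omits any of them is rejected unless a LimitRange supplies defaults: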

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-limits
      namespace: ai-inference
    spec:
      limits:
        - type: Container
          default:
            cpu: "2"
            memory: "8Gi"
          defaultRequest:
            cpu: "500m"
            memory: "2Gi"
          min:
            cpu: "100m"
            memory: "256Mi"
          max:
            cpu: "16"
            memory: "64Gi"
    

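    Containers that omit resource specifications receive a request of 500m CPU and 2Gi memory and a limit of 2 CPU and 8Gi memory. Any container, whether defaulted or explicit, must stay between 100m and 16 CPU and between 256Mi and 64Gi memory, or it is rejected.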
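
One way to confirm the defaults is to create a pod with no resources block and read back what admission filled in. This is a minimal sketch: probe_defaults and the pod name defaults-probe are hypothetical, and it assumes the LimitRange from this step is already applied and the namespace quota has room for one more small pod.

```shell
# Hypothetical helper: start a pod without a resources block, print the
# resources the LimitRange injected, then delete the pod again.
probe_defaults() {
  ns="${1:-ai-inference}"
  kubectl run defaults-probe --image=nginx --restart=Never -n "$ns"
  kubectl get pod defaults-probe -n "$ns" \
    -o jsonpath='{.spec.containers[0].resources}{"\n"}'
  kubectl delete pod defaults-probe -n "$ns" --wait=false
}
```

Running probe_defaults ai-inference should report the default limits (2 CPU, 8Gi) and default requests (500m, 2Gi) from the LimitRange above.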

  5. Configure Object Count quotas to prevent runaway resource creation:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: object-count-quota
      namespace: ai-inference
    spec:
      hard:
        count/configmaps: "50"
        count/secrets: "30"
        count/jobs.batch: "20"
    
  6. Apply all configurations and verify:

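    # File names assume each manifest from steps 2-5 was saved under the name shown.
    kubectl apply -f inference-quota.yaml -f inference-limits.yaml -f object-count-quota.yaml
    kubectl apply -f training-quota.yaml
    kubectl get resourcequota -n ai-inference
    kubectl get limitrange -n ai-inference
    kubectl get resourcequota -n ai-training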
    
  7. Check current consumption against quotas:

    kubectl describe resourcequota inference-quota -n ai-inference
    

    Expected output: a table with Resource, Used, and Hard columns, one row per tracked resource, e.g., requests.nvidia.com/gpu with Used 2 and Hard 4.

  8. Test enforcement by attempting to deploy a pod that exceeds the quota:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-quota
      namespace: ai-inference
    spec:
      containers:
        - name: test
          image: nginx
          resources:
            limits:
              # Extended resources such as GPUs must be set under limits; the request defaults to the same value.
              nvidia.com/gpu: "10"
    EOF
    

    Expected output: a Forbidden error containing exceeded quota: inference-quota, requested: requests.nvidia.com/gpu=10, used: requests.nvidia.com/gpu=X, limited: requests.nvidia.com/gpu=4, where X is the namespace's current GPU usage. The pod is not created.

Verification

kubectl get resourcequota inference-quota -n ai-inference -o json | jq '.status'

Expected output: JSON showing hard and used values for all tracked resources.

Common failures

  • GPU quota has no effect: the NVIDIA device plugin must be installed so that nodes advertise the nvidia.com/gpu resource. Verify with kubectl describe node <gpu-node> | grep nvidia.com/gpu. If the output is empty, install the device plugin.
  • LimitRange min/max too restrictive: pods that request more than the max CPU or memory are rejected at admission and never reach the scheduler. Check actual usage with kubectl top pods -n ai-inference (requires metrics-server), compare it with the requests declared in your manifests, and set max accordingly.
  • Quota blocks autoscaling: quota is enforced when each pod is created, so once the quota is fully consumed, replicas that the HPA tries to add are rejected. Size quotas with headroom, for example used / hard <= 0.7 for every tracked resource.
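
The headroom rule can be checked mechanically. The sketch below is a hypothetical helper, not a kubectl feature: check_headroom reads lines of the form "resource used hard" on standard input, converts Kubernetes quantities such as 500m or 128Gi to plain numbers, and flags any resource whose used / hard ratio exceeds a threshold (0.7 by default). Only the suffixes listed in the code are understood.

```shell
# Hypothetical helper: flag quota entries whose used/hard ratio exceeds a threshold.
# Input lines: "<resource> <used> <hard>". Optional first argument: threshold (default 0.7).
check_headroom() {
  awk -v max="${1:-0.7}" '
    # Convert a quantity like 500m, 4, or 128Gi to a plain number.
    # Suffixes not listed in BEGIN evaluate to 0.
    function num(q,   n, s) {
      n = q; s = q
      sub(/[A-Za-z]+$/, "", n)   # keep the numeric part
      sub(/^[0-9.]+/, "", s)     # keep the suffix
      return n * (s == "" ? 1 : mult[s])
    }
    BEGIN {
      mult["m"] = 0.001
      mult["k"] = 1e3;   mult["M"] = 1e6;     mult["G"] = 1e9;     mult["T"] = 1e12
      mult["Ki"] = 1024; mult["Mi"] = 1024^2; mult["Gi"] = 1024^3; mult["Ti"] = 1024^4
    }
    {
      hard = num($3)
      ratio = (hard > 0) ? num($2) / hard : 0
      printf "%s %.2f %s\n", $1, ratio, (ratio > max ? "OVER" : "ok")
    }'
}
```

Assuming jq is installed, the input could be produced from a live quota like this:

    kubectl get resourcequota inference-quota -n ai-inference -o json \
      | jq -r '.status.hard as $h | .status.used | to_entries[] | "\(.key) \(.value) \($h[.key])"' \
      | check_headroom

A namespace using 3 of 4 GPUs would be reported as requests.nvidia.com/gpu 0.75 OVER.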

Related guides

  • Network policies for AI service mesh isolation
  • Horizontal pod autoscaling for AI inference services
  • Deploy vLLM on Kubernetes with GPU node selection