RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Production Local AI Deployment

10. Services and Ingress

Chapter 10 of 24 · 20 min
KEY INSIGHT

Service and Ingress resources create stable network abstractions over transient pods, enabling service discovery without coupling to pod lifecycle details.

Kubernetes Services provide stable network endpoints for transient pods. The service abstraction decouples consumers from pod IP addresses that change during rescheduling, scaling, and rolling updates. Service types determine network exposure scope.

ClusterIP services, the default type, receive a virtual IP address that is reachable only from inside the cluster. They serve as the stable target for workloads that never require external access.

apiVersion: v1
kind: Service
metadata:
  name: inference-service
  namespace: ai-inference
spec:
  type: ClusterIP
  selector:
    app: inference-server
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
    - name: metrics
      port: 9090
      targetPort: 9090
      protocol: TCP

NodePort services open the same port on every node's IP address, and external traffic reaches the service at nodeIP:nodePort. The NodePort range defaults to 30000-32767. The pattern suits development and edge deployments that lack load balancer infrastructure.
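
A NodePort service for the same pods might look like the manifest below. This is a minimal sketch: the service name inference-service-dev and the port 30080 are illustrative assumptions, and the selector and target port are copied from the ClusterIP example.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-service-dev   # hypothetical name for a development-only service
  namespace: ai-inference
spec:
  type: NodePort
  selector:
    app: inference-server       # same pod label as the ClusterIP service
  ports:
    - name: http
      port: 80                  # port on the service's cluster IP
      targetPort: 8080          # container port on the pods
      nodePort: 30080           # assumed value; must fall inside the node port range
      protocol: TCP
```

Omitting nodePort lets Kubernetes pick a free port from the range, which avoids collisions when several services are exposed this way.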

LoadBalancer services integrate with cloud provider control planes to provision external load balancers. On-premises deployments require MetalLB or similar bare-metal load balancer implementations to provide LoadBalancer functionality.
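
As a rough sketch, a LoadBalancer service differs from the ClusterIP manifest only in its type. The name inference-service-lb is an assumption for illustration; no provider-specific annotations are shown because they vary by environment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-service-lb    # hypothetical name
  namespace: ai-inference
spec:
  type: LoadBalancer            # asks the cloud provider (or MetalLB) for an external address
  selector:
    app: inference-server
  ports:
    - name: http
      port: 80                  # port exposed on the external load balancer
      targetPort: 8080          # container port on the pods
      protocol: TCP
```

Until a load balancer implementation assigns an address, kubectl get svc reports the service's EXTERNAL-IP as pending, which is a quick way to spot a cluster with no provider behind it.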

Ingress resources manage HTTP/HTTPS routing at the application layer. An Ingress resource has no effect on its own: an ingress controller running in the cluster implements the routing rules and terminates TLS connections. The ingress pattern enables host-based and path-based routing to multiple backend services.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-ingress
  namespace: ai-inference
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - inference.example.com
      secretName: inference-tls-cert
  rules:
    - host: inference.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inference-service
                port:
                  number: 80
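
Path-based routing to more than one backend could be sketched as follows. The paths, the Ingress name inference-routing, and the second backend embedding-service are hypothetical and do not appear elsewhere in this chapter; TLS is left out to keep the fragment short.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-routing             # hypothetical name
  namespace: ai-inference
spec:
  ingressClassName: nginx
  rules:
    - host: inference.example.com
      http:
        paths:
          - path: /v1/chat            # assumed path for the main inference API
            pathType: Prefix
            backend:
              service:
                name: inference-service
                port:
                  number: 80
          - path: /v1/embeddings      # assumed path for a second workload
            pathType: Prefix
            backend:
              service:
                name: embedding-service   # hypothetical second service
                port:
                  number: 80
```

Both backends must be services in the same namespace as the Ingress, since an Ingress backend cannot reference a service in another namespace.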

Headless services with clusterIP: None return pod IPs directly through DNS. Consumer applications performing their own load balancing use headless services for direct pod discovery. The pattern supports inference clients implementing custom connection pooling.
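
A headless service for the inference pods might be declared like this. The name inference-headless is an assumption; the selector matches the earlier examples.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-headless      # hypothetical name
  namespace: ai-inference
spec:
  clusterIP: None               # no virtual IP; DNS returns the pod IPs instead
  selector:
    app: inference-server
  ports:
    - name: http
      port: 8080                # clients connect to pods directly, so this is the pod port
      targetPort: 8080
      protocol: TCP
```

With the default cluster domain, a lookup of inference-headless.ai-inference.svc.cluster.local returns one A record per ready pod, and the client chooses which address to use.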

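EndpointSlice objects track the pod IPs behind services at scale. The control plane manages EndpointSlices and groups endpoints into slices, so watchers receive small updates instead of one large object. By default each slice holds up to 100 endpoints, so a service backed by more than 100 pods is spread across multiple EndpointSlices.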
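
For orientation, a controller-generated EndpointSlice for inference-service has roughly the shape below. It is shown for reading, not for applying by hand; the object name suffix and the pod addresses are made-up placeholders.

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: inference-service-abc12              # placeholder; real names get a generated suffix
  namespace: ai-inference
  labels:
    kubernetes.io/service-name: inference-service   # links the slice to its service
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 8080                               # the pods' target port, not the service port
endpoints:
  - addresses:
      - "10.0.1.15"                          # placeholder pod IP
    conditions:
      ready: true
  - addresses:
      - "10.0.2.27"                          # placeholder pod IP
    conditions:
      ready: true
```

Listing the slices for a service with kubectl get endpointslices -l kubernetes.io/service-name=inference-service -n ai-inference shows how many slices the control plane created.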

EXERCISE

Configure service networking for an AI inference stack with multiple components. Create a ClusterIP service for the inference deployment, a NodePort service for external development access, and an Ingress resource with TLS termination routing to the internal service. Verify connectivity through each layer.

# Create ClusterIP service (named to match the Ingress backend)
kubectl expose deployment inference-server \
  --name=inference-service \
  --type=ClusterIP \
  --port=80 \
  --target-port=8080 \
  --namespace=ai-inference

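# Create NodePort for development access
# (kubectl expose has no --node-port flag, so pin the port with a patch)
kubectl expose deployment inference-server \
  --name=inference-svc-dev \
  --type=NodePort \
  --port=80 \
  --target-port=8080 \
  --namespace=ai-inference

kubectl patch service inference-svc-dev \
  --namespace=ai-inference \
  --type=json \
  --patch='[{"op": "replace", "path": "/spec/ports/0/nodePort", "value": 30080}]'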

# Create the TLS secret referenced by the Ingress, then apply the Ingress
kubectl create secret tls inference-tls-cert \
  --cert=inference.crt \
  --key=inference.key \
  --namespace=ai-inference

kubectl apply -f ingress.yaml

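# Verify all networking layers
kubectl get svc,ingress --namespace=ai-inference
kubectl describe ingress inference-ingress --namespace=ai-inference
# The Ingress rule matches on host, so send the expected hostname.
# Assumes the ingress controller is reachable on 127.0.0.1:443.
curl -k --resolve inference.example.com:443:127.0.0.1 https://inference.example.com/health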