20. Production Deployment
Deploying multi-agent systems to production requires infrastructure that supports stateful agents, manages deployment versioning, and provides operational guarantees that development environments lack.
Deployment Architecture
Production deployments separate concerns into three layers: the orchestration layer handles workflow management, the agent pool layer handles execution, and the infrastructure layer provides compute and networking. This separation lets each layer scale independently and keeps a failure in one layer from spreading to the others.
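# deployment/orchestrator.py
import hashlib
import time
from dataclasses import dataclass
from typing import Any


@dataclass
class DeploymentConfig:
    version: str
    agent_definitions: dict[str, dict]
    resource_limits: dict[str, dict]
    scaling_policy: dict[str, int]
    environment: str


class DeploymentManager:
    def __init__(self, container_registry: str, config_store: Any):
        self.container_registry = container_registry
        # config_store must provide async save_deployment() and get_deployment().
        self.config_store = config_store

    async def deploy(self, config: DeploymentConfig) -> str:
        deployment_id = self._generate_deployment_id(config.version)
        # Publish every agent image before the deployment is recorded.
        for agent_name, agent_def in config.agent_definitions.items():
            await self._deploy_agent_image(agent_name, agent_def, deployment_id)
        # Persist the full configuration so that rollback can rebuild it later.
        await self.config_store.save_deployment(
            deployment_id,
            {
                "version": config.version,
                "agents": config.agent_definitions,
                "limits": config.resource_limits,
                "scaling": config.scaling_policy,
                "environment": config.environment,
            },
        )
        # Shift traffic only after images and configuration are in place.
        await self._update_load_balancer(deployment_id, config)
        return deployment_id

    def _generate_deployment_id(self, version: str) -> str:
        # Hash the version with a timestamp so each redeploy gets a new ID.
        return hashlib.sha256(f"{version}_{time.time()}".encode()).hexdigest()[:16]

    async def rollback(self, target_deployment_id: str) -> str:
        # Records are saved under their deployment ID, so look the target up by ID.
        record = await self.config_store.get_deployment(target_deployment_id)
        # Rolling back is a fresh deployment of the previously saved configuration.
        return await self.deploy(DeploymentConfig(
            version=record["version"],
            agent_definitions=record["agents"],
            resource_limits=record["limits"],
            scaling_policy=record["scaling"],
            environment=record["environment"],
        ))

    async def _deploy_agent_image(
        self, agent_name: str, agent_def: dict, deployment_id: str
    ) -> None:
        # Platform-specific: build the image and push it to self.container_registry.
        raise NotImplementedError

    async def _update_load_balancer(
        self, deployment_id: str, config: DeploymentConfig
    ) -> None:
        # Platform-specific: point traffic at the new deployment.
        raise NotImplementedError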
Rollout Strategies
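Multi-agent deployments use canary releases and blue-green deployments to minimize risk. A canary release routes a small percentage of traffic to the new version and monitors its error rate before full promotion. A blue-green deployment runs the old and new versions side by side and switches all traffic at once, which keeps the previous version available for an immediate rollback.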
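The canary approach can be sketched in a few lines. The following is a minimal illustration, not a production router: the `CanaryState` class, its field names, and the 1% error threshold are all assumptions made for this example. It shows a deterministic traffic split and a simple promotion decision based on the canary's observed error rate.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CanaryState:
    """Hypothetical rollout state; names and defaults are illustrative."""
    stable_version: str
    canary_version: str
    canary_percent: int            # share of traffic sent to the canary, 0-100
    max_error_rate: float = 0.01   # assumed promotion threshold (1%)


def pick_version(state: CanaryState, request_id: str) -> str:
    """Map a request to a version using a stable hash of its ID.

    Hashing keeps the choice deterministic, so a retried request
    lands on the same version as the original attempt.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    if bucket < state.canary_percent:
        return state.canary_version
    return state.stable_version


def next_step(state: CanaryState, canary_errors: int, canary_requests: int) -> str:
    """Return 'hold', 'rollback', or 'promote' from observed canary traffic."""
    if canary_requests == 0:
        return "hold"  # no evidence yet, keep the current split
    error_rate = canary_errors / canary_requests
    if error_rate > state.max_error_rate:
        return "rollback"
    return "promote"
```

A real rollout would usually promote in several steps (for example 5%, 25%, 100%) and require a minimum number of requests before acting on the error rate; both are omitted here to keep the sketch short.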
Health Checks and Readiness
Agents expose health endpoints that orchestration systems poll. Readiness probes indicate whether an agent can accept traffic; liveness probes indicate whether an agent requires restart.
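The distinction between the two probes can be made concrete with a small, framework-independent sketch. The `AgentHealth` class, its field names, and the 30-second heartbeat timeout are assumptions for illustration; each method returns an HTTP-style status code and body that a web handler could serve from whatever endpoint paths the orchestration system is configured to poll.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentHealth:
    """Hypothetical health tracker for a single agent process."""
    heartbeat_timeout_s: float = 30.0   # assumed stall threshold
    dependencies_ready: bool = False    # e.g. model client and tools initialised
    last_heartbeat: float = field(default_factory=time.monotonic)

    def beat(self) -> None:
        """Called from the agent's main loop to show it is making progress."""
        self.last_heartbeat = time.monotonic()

    def liveness(self, now: Optional[float] = None) -> tuple[int, dict]:
        """200 while the main loop is alive; 503 signals that a restart is needed."""
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > self.heartbeat_timeout_s:
            return 503, {"status": "stalled"}
        return 200, {"status": "alive"}

    def readiness(self, now: Optional[float] = None) -> tuple[int, dict]:
        """200 only when the agent is alive and its dependencies are ready.

        A 503 here should take the agent out of rotation without restarting it.
        """
        code, _ = self.liveness(now)
        if code != 200:
            return 503, {"status": "not_ready", "reason": "stalled"}
        if not self.dependencies_ready:
            return 503, {"status": "not_ready", "reason": "dependencies"}
        return 200, {"status": "ready"}
```

Keeping the two checks separate matters during startup: an agent that is still loading its dependencies is alive but not ready, so it should receive no traffic yet and should not be restarted either.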
Secrets and Configuration Injection
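Production secrets are injected at runtime through secure channels, such as environment variables or mounted files populated by a secrets manager, rather than being built into images or committed to source control. Where possible, configuration changes propagate through the system without agent restarts.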
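One way to realize both ideas is sketched below, under stated assumptions: secrets arrive either as files in a mounted directory (the `/run/secrets` default is an assumption, not a requirement) or as environment variables, and configuration lives in a JSON file that is re-read when its modification time changes. The names `read_secret` and `ReloadableConfig` are hypothetical helpers introduced for this example.

```python
import json
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Read a secret from a mounted file, falling back to an environment variable.

    The secret is looked up at call time, so it never needs to be
    stored in the image or in the deployment configuration.
    """
    path = Path(secrets_dir) / name
    if path.is_file():
        return path.read_text().strip()
    value = os.environ.get(name.upper())
    if value is None:
        raise KeyError(f"secret {name!r} was not provided")
    return value


class ReloadableConfig:
    """Re-reads a JSON config file whenever its modification time changes."""

    def __init__(self, path: str):
        self._path = Path(path)
        self._mtime_ns: Optional[int] = None
        self._data: dict = {}

    def get(self) -> dict:
        """Return the current config, reloading from disk only if the file changed."""
        mtime_ns = self._path.stat().st_mtime_ns
        if mtime_ns != self._mtime_ns:
            self._data = json.loads(self._path.read_text())
            self._mtime_ns = mtime_ns
        return self._data
```

An agent that calls `get()` at the start of each task picks up configuration changes on its next task without a restart. Polling the modification time is the simplest option; a push-based mechanism would react faster but needs more infrastructure.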
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Implement a deployment manifest generator that produces Kubernetes-style deployment configurations from a simplified agent specification format, including resource limits, health checks, and scaling policies.