Shadow Deployment

Shadow deployment (also called dark launch or shadow mode) runs a candidate model in production alongside the current model. The candidate receives identical input traffic, but its predictions are never returned to users; they are only logged and compared against the production model's outputs. This enables realistic load testing and quality validation before any user-facing traffic is at risk. A shadowed model that processes 100% of production volume reveals real-world latency characteristics (cold starts, GPU memory pressure) and prediction quality on the live data distribution, where holdout test sets can underestimate error, by as much as 30-50% in some cases, because of covariate shift.

In other words, the shadow model receives the same inputs as production and generates outputs, but those outputs are logged for analysis rather than returned. Users never see them, so you can evaluate a new model against real traffic without exposing users to its mistakes.
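The mirroring described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the class name ShadowRouter, its records list, and the close method are hypothetical names chosen for this example, both models are assumed to be plain callables, and the log is kept in memory where a real system would write to durable storage.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class ShadowRouter:
    """Serve from the production model and mirror each input to a shadow model.

    Hypothetical helper for illustration; models are assumed to be callables.
    """

    def __init__(self, production: Callable[[Any], Any],
                 shadow: Callable[[Any], Any], max_workers: int = 4):
        self.production = production
        self.shadow = shadow
        self.records = []  # in-memory log; assumption made for this sketch
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def _run_shadow(self, request: Any, production_output: Any) -> None:
        # Runs off the request path, so shadow latency and shadow failures
        # never reach the caller.
        start = time.perf_counter()
        try:
            shadow_output, error = self.shadow(request), None
        except Exception as exc:
            shadow_output, error = None, repr(exc)
        self.records.append({
            "request": request,
            "production_output": production_output,
            "shadow_output": shadow_output,
            "shadow_latency_s": time.perf_counter() - start,
            "error": error,
        })

    def predict(self, request: Any) -> Any:
        production_output = self.production(request)
        # Mirror the same input to the shadow model in the background.
        self._pool.submit(self._run_shadow, request, production_output)
        # Only the production result is returned to the user.
        return production_output

    def close(self) -> None:
        """Wait for pending shadow calls to finish, then stop the pool."""
        self._pool.shutdown(wait=True)
```

Running the shadow call on a background pool is one way to keep the candidate's latency and exceptions off the user-facing path; mirroring at the proxy or service-mesh layer is another common choice and needs no application code.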

Shadow deployment workflow:

(1) Deploy the new model alongside production.
(2) Mirror all production traffic to the shadow model.
(3) The new model generates outputs, which are logged and not returned to users.
(4) Analyze: compare shadow outputs against production outputs. Is quality better, worse, or just different?
(5) Evaluate: compute metrics (accuracy, latency, toxicity) on the real-traffic distribution.
(6) Decide: after sufficient analysis (for example, one week of traffic), promote, iterate, or discard.

Cost: because every request is served twice, shadow deployment roughly doubles inference cost during evaluation. Budget accordingly.
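Steps (4) through (6) of the workflow can be sketched as a small report over the logged records. Everything here is an assumption made for illustration: the record fields (production_output, shadow_output, shadow_latency_s, and an optional label that arrives later as ground truth), the ShadowReport and evaluate_shadow names, the 0.2 second latency budget, and the decision rule itself. Real promotion criteria are usually richer and task-specific.

```python
import math
from dataclasses import dataclass


@dataclass
class ShadowReport:
    n: int
    disagreement_rate: float
    production_accuracy: float
    shadow_accuracy: float
    shadow_p95_latency_s: float
    decision: str


def evaluate_shadow(records, latency_budget_s=0.2, min_accuracy_gain=0.0):
    """Summarize shadow logs and suggest promote / iterate / discard.

    Hypothetical rule: promote when the shadow model is at least as accurate
    as production and within the latency budget, iterate when only latency
    fails, discard when accuracy is worse.
    """
    labeled = [r for r in records if r.get("label") is not None]
    if not records or not labeled:
        raise ValueError("need at least one record and one labeled record")

    n = len(records)
    # Step (4): how often do the two models differ on the same input?
    disagreement_rate = sum(
        r["shadow_output"] != r["production_output"] for r in records
    ) / n

    # Step (5): accuracy on the subset with ground truth, and p95 latency
    # (nearest-rank method) over all mirrored requests.
    production_accuracy = sum(
        r["production_output"] == r["label"] for r in labeled
    ) / len(labeled)
    shadow_accuracy = sum(
        r["shadow_output"] == r["label"] for r in labeled
    ) / len(labeled)
    latencies = sorted(r["shadow_latency_s"] for r in records)
    p95 = latencies[math.ceil(0.95 * n) - 1]

    # Step (6): apply the illustrative decision rule.
    gain = shadow_accuracy - production_accuracy
    if gain < min_accuracy_gain:
        decision = "discard"
    elif p95 > latency_budget_s:
        decision = "iterate"
    else:
        decision = "promote"

    return ShadowReport(n, disagreement_rate, production_accuracy,
                        shadow_accuracy, p95, decision)
```

Disagreement rate alone only shows that the models differ; it takes labels (or another quality signal such as human review) to tell whether the difference is an improvement, which is why the sketch reports both.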

Reviewed by Fredoline Eruo.

When it doesn't work

Practical example

Workflow example