09. Distillation at Scale
Chapter 9 of 18 · 15 min
EXERCISE
Compare training time and final student accuracy across three setups: online distillation (teacher runs every step), cached distillation (pre-computed teacher outputs), and mixed distillation (periodic teacher refresh). Identify the conditions under which each approach works best.
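As a starting point for the comparison, the sketch below counts how many teacher forward passes each setup would need. It is a minimal illustration, not a reference implementation: the function name `count_teacher_calls`, the `refresh_every` parameter, and the reading of "periodic teacher refresh" as "discard cached teacher outputs every few epochs" are all assumptions made for this example. It does not train anything or measure accuracy; it only makes the cost side of the trade-off concrete.

```python
def count_teacher_calls(mode: str, num_examples: int, num_epochs: int,
                        refresh_every: int = 2) -> int:
    """Count teacher forward passes for one hypothetical training run.

    mode: "online", "cached", or "mixed" (names assumed for this sketch).
    refresh_every: for "mixed", cached outputs are dropped every this many epochs.
    """
    if mode not in ("online", "cached", "mixed"):
        raise ValueError(f"unknown mode: {mode}")

    cache = {}   # example id -> stored teacher output
    calls = 0    # each increment stands in for one teacher forward pass

    for epoch in range(num_epochs):
        # Mixed: periodically throw away stale teacher outputs (assumed meaning).
        if mode == "mixed" and epoch % refresh_every == 0:
            cache.clear()

        for example_id in range(num_examples):
            # Online always queries the teacher; the others only on a cache miss.
            if mode == "online" or example_id not in cache:
                calls += 1
                cache[example_id] = ("teacher_output", example_id)  # placeholder
            # A real loop would update the student from cache[example_id] here.

    return calls


if __name__ == "__main__":
    for mode in ("online", "cached", "mixed"):
        print(mode, count_teacher_calls(mode, num_examples=100, num_epochs=4))
```

In this toy setting the cached run queries the teacher once per example, the online run once per example per epoch, and the mixed run falls in between depending on `refresh_every`. For the exercise itself, you would pair counts like these with measured wall-clock time and student accuracy from real runs.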