07. Teacher-Student Setup
Chapter 7 of 18 · 15 min
EXERCISE
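Design three student architectures that compress a BERT teacher to different degrees, for example by reducing the number of layers, the hidden size, or both. Implement a parameter counter and report each student's size as a fraction of the teacher's, using parameter count as a proxy for capacity. Then identify which architecture is most likely to offer the best accuracy-efficiency tradeoff, and justify your choice.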
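As a possible starting point, the sketch below counts parameters analytically for a BERT-style encoder, with no deep learning framework required. It assumes the standard BERT layout (word, position, and token-type embeddings; self-attention and feed-forward sublayers, each followed by LayerNorm; a pooler) and excludes any task-specific head. The names `EncoderConfig`, `count_params`, and `compression_report`, and the three student shapes, are hypothetical and chosen only for illustration. They are not recommended designs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container for the shape of a BERT-style encoder."""
    num_layers: int
    hidden_size: int
    intermediate_size: int
    num_heads: int
    vocab_size: int = 30522      # bert-base-uncased vocabulary size
    max_positions: int = 512
    type_vocab_size: int = 2


def count_params(cfg: EncoderConfig) -> int:
    """Count weights and biases, assuming the standard BERT encoder layout."""
    if cfg.hidden_size % cfg.num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    h, i = cfg.hidden_size, cfg.intermediate_size
    layer_norm = 2 * h  # scale and bias

    # Word, position, and token-type embeddings, plus one LayerNorm.
    embeddings = (cfg.vocab_size + cfg.max_positions + cfg.type_vocab_size) * h
    embeddings += layer_norm

    # Q, K, V, and output projections (weights + biases), plus LayerNorm.
    # The head count does not change this total; it only splits hidden_size.
    attention = 4 * (h * h + h) + layer_norm

    # Two dense layers (h -> i -> h) with biases, plus LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + layer_norm

    pooler = h * h + h
    return embeddings + cfg.num_layers * (attention + feed_forward) + pooler


def compression_report(teacher, students):
    """Map each student name to (parameter count, fraction of teacher size)."""
    teacher_params = count_params(teacher)
    return {
        name: (count_params(cfg), count_params(cfg) / teacher_params)
        for name, cfg in students.items()
    }


# Teacher: the BERT-base shape (12 layers, hidden size 768).
TEACHER = EncoderConfig(num_layers=12, hidden_size=768,
                        intermediate_size=3072, num_heads=12)

# Illustrative students only: shallower, narrower, and both.
STUDENTS = {
    "6L-768H": EncoderConfig(6, 768, 3072, 12),
    "4L-512H": EncoderConfig(4, 512, 2048, 8),
    "4L-312H": EncoderConfig(4, 312, 1200, 12),
}

if __name__ == "__main__":
    print(f"teacher: {count_params(TEACHER):,} parameters")
    for name, (params, ratio) in compression_report(TEACHER, STUDENTS).items():
        print(f"{name}: {params:,} parameters ({ratio:.1%} of teacher)")
```

With these defaults the teacher total is 109,482,240, in line with the roughly 110M parameters usually quoted for BERT-base. Note that the embedding table is independent of depth, so it takes up a growing share of the budget as the encoder shrinks. This is one reason parameter count alone is an imperfect proxy for capacity, or for inference cost, when comparing students.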