11. Combined Compression

Chapter 11 of 18 · 20 min
EXERCISE

Apply 50% unstructured pruning followed by int8 quantization to a small transformer model. Compare the accuracy of the combined model against pruning-only and quantization-only baselines.
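As a warm-up, the two operations and their order can be sketched on a single weight matrix before moving to a full model. The sketch below is a minimal NumPy illustration, not a solution to the exercise. It assumes magnitude-based pruning and symmetric per-tensor int8 quantization, neither of which the exercise prescribes. The helper names (`prune_unstructured`, `quantize_int8`, `dequantize`, `output_error`) are hypothetical, the weights and inputs are random rather than taken from a trained transformer, and relative output error stands in for task accuracy.

```python
import numpy as np


def prune_unstructured(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(round(sparsity * w.size))  # number of weights to remove
    if k == 0:
        return w.copy()
    flat = w.copy().ravel()
    # Indices of the k smallest-magnitude entries (order within them is arbitrary).
    drop = np.argpartition(np.abs(flat), k - 1)[:k]
    flat[drop] = 0.0
    return flat.reshape(w.shape)


def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization to int8 (assumed scheme, zero-point 0)."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to float32 weights."""
    return q.astype(np.float32) * scale


# Stand-in for one linear layer of a small transformer (random, untrained).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(64, 64)).astype(np.float32)
x = rng.normal(size=(8, 64)).astype(np.float32)

# The three variants the exercise asks you to compare.
w_pruned = prune_unstructured(w, sparsity=0.5)        # pruning only
w_quant = dequantize(*quantize_int8(w))               # quantization only
q_combined, scale_combined = quantize_int8(w_pruned)  # prune, then quantize
w_combined = dequantize(q_combined, scale_combined)


def output_error(w_hat: np.ndarray) -> float:
    """Relative error of the layer output against the uncompressed layer."""
    ref = x @ w.T
    return float(np.linalg.norm(x @ w_hat.T - ref) / np.linalg.norm(ref))


for name, w_hat in [
    ("pruning only", w_pruned),
    ("int8 only", w_quant),
    ("pruned + int8", w_combined),
]:
    print(f"{name:>14}: relative output error = {output_error(w_hat):.4f}")
```

Because the assumed quantizer is symmetric with a zero-point of 0, every pruned weight maps to the int8 code 0, so the sparsity pattern survives quantization. For the exercise itself, apply the same three-way comparison to all prunable layers of the model and replace `output_error` with accuracy on a held-out evaluation set.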