14. Hardware-Aware Compression

Chapter 14 of 18 · 25 min
EXERCISE

Profile inference latency for each compressed variant of a model on CPU, GPU, and, if available, NPU, and compare it against the uncompressed baseline on the same device. Report which compression technique yields the best speedup on each hardware type and explain why.
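
One way to structure the measurement is sketched below. It is a minimal timing harness under stated assumptions, not a prescribed solution: the names `profile_latency`, `speedups`, and `best_variant` are hypothetical helpers introduced for illustration, and the `run` callable stands in for whatever performs one forward pass of your model on the target device. The optional `sync` argument is there because GPU (and many NPU) runtimes execute work asynchronously, so the clock should only stop after the device has finished; with PyTorch on CUDA you could pass `torch.cuda.synchronize`, for example.

```python
import statistics
import time
from typing import Callable, Dict, Optional, Tuple


def profile_latency(run: Callable[[], object],
                    warmup: int = 10,
                    iters: int = 50,
                    sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median latency of run() in milliseconds.

    run  : hypothetical callable that performs one inference pass.
    sync : optional callable that blocks until device work is finished
           (assumption: needed for asynchronous backends such as GPUs).
    """
    # Warm-up passes absorb one-time costs (lazy init, caches, JIT).
    for _ in range(warmup):
        run()
    if sync is not None:
        sync()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        samples_ms.append((time.perf_counter() - start) * 1e3)

    # The median is less sensitive to scheduling outliers than the mean.
    return statistics.median(samples_ms)


def speedups(baseline_ms: float,
             variants_ms: Dict[str, float]) -> Dict[str, float]:
    """Speedup of each variant relative to the uncompressed baseline."""
    return {name: baseline_ms / ms for name, ms in variants_ms.items()}


def best_variant(baseline_ms: float,
                 variants_ms: Dict[str, float]) -> Tuple[str, float]:
    """Name and speedup of the fastest variant on one device."""
    ratios = speedups(baseline_ms, variants_ms)
    name = max(ratios, key=ratios.get)
    return name, ratios[name]
```

To use it, call `profile_latency` once per (device, variant) pair with the same input shape and batch size, then pass the baseline and variant timings for each device to `best_variant`. Keeping the comparison per device matters because a technique that helps on one backend may give no speedup on another if the hardware lacks kernels that exploit it.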