I delivered a talk in the Algebraic Geometry and Machine Learning Session aimed at understanding distillation and exploring the possibility of structured distillation algorithms, building on our NeurIPS paper “Should Under-parameterized Student Networks Copy or Average Teacher Weights?”