I delivered a contributed talk on distillation based on our NeurIPS paper “Should Under-parameterized Student Networks Copy or Average Teacher Weights?”