尹志 (2022-07-30 22:41):
#paper https://doi.org/10.48550/arXiv.2205.01529 Masked Generative Distillation, ECCV 2022. This is a knowledge distillation paper in which distillation is achieved by having the student generate (reconstruct) the teacher's features from a masked version of its own features. Knowledge distillation is a general-purpose technique that has been applied to many machine learning tasks; in vision, for example, classification, segmentation and detection. Conventional distillation algorithms improve the representational power of student features by making the student imitate the teacher's features. This paper argues that the student does not need to imitate the teacher's features at all; instead it generates them: random regions of the student's feature map are masked out, and the remaining partial student feature is used to generate the teacher's full feature. Through this process the student feature acquires stronger representational power.

The idea is interesting. A (perhaps imperfect) analogy: instead of learning by copying the teacher's every move, the teacher is rarely around to imitate directly, so the student has to guess, under supervision, what the teacher's features look like. After enough rounds of guessing, once the student gets it right every time, it has become very familiar with the teacher, which means its representational ability is strong. Using this approach, the authors run extensive experiments on image classification, object detection, semantic segmentation and instance segmentation, with different models on different datasets, and find consistent performance gains (mostly around 2-3 points; see the paper for exact numbers).
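To make the mechanism concrete, here is a minimal PyTorch-style sketch of the MGD idea as described above and in the abstract. This is not the authors' implementation: the generation block (3x3 conv, ReLU, 3x3 conv), the purely spatial masking, the mask ratio and the plain MSE loss are my assumptions for illustration; the official code linked from the paper has the exact design and hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MGDLoss(nn.Module):
    """Sketch of Masked Generative Distillation (MGD).

    The student feature map is randomly masked per spatial location, and a
    small generation block tries to reconstruct the teacher's full feature.
    Hyperparameter values (mask_ratio, loss_weight) are illustrative only.
    """

    def __init__(self, student_channels, teacher_channels,
                 mask_ratio=0.5, loss_weight=1.0):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.loss_weight = loss_weight
        # 1x1 conv to align channel dimensions when student and teacher differ
        self.align = (nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
                      if student_channels != teacher_channels else nn.Identity())
        # simple generation block: 3x3 conv -> ReLU -> 3x3 conv
        self.generation = nn.Sequential(
            nn.Conv2d(teacher_channels, teacher_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(teacher_channels, teacher_channels, kernel_size=3, padding=1),
        )

    def forward(self, feat_student, feat_teacher):
        # feat_student: (B, C_s, H, W); feat_teacher: (B, C_t, H, W)
        x = self.align(feat_student)
        b, _, h, w = x.shape
        # random spatial mask shared across channels: 1 keeps a location, 0 drops it
        keep = (torch.rand(b, 1, h, w, device=x.device) > self.mask_ratio).float()
        x = x * keep
        # reconstruct the teacher's full feature from the masked student feature
        x = self.generation(x)
        return self.loss_weight * F.mse_loss(x, feat_teacher)


if __name__ == "__main__":
    # toy usage with random tensors standing in for backbone features
    mgd = MGDLoss(student_channels=256, teacher_channels=512)
    fs = torch.randn(2, 256, 32, 32)
    ft = torch.randn(2, 512, 32, 32)
    print(mgd(fs, ft))
```

In practice this loss would be added to the student's original task loss at one or more feature levels, so the student is trained to solve its task while also being able to recover the teacher's features from partial information.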
Masked Generative Distillation
Abstract:
Knowledge distillation has been applied to various tasks successfully. The current distillation algorithm usually improves students' performance by imitating the output of the teacher. This paper shows that teachers can also improve students' representation power by guiding students' feature recovery. From this point of view, we propose Masked Generative Distillation (MGD), which is simple: we mask random pixels of the student's feature and force it to generate the teacher's full feature through a simple block. MGD is a truly general feature-based distillation method, which can be utilized on various tasks, including image classification, object detection, semantic segmentation and instance segmentation. We experiment on different models with extensive datasets and the results show that all the students achieve excellent improvements. Notably, we boost ResNet-18 from 69.90% to 71.69% ImageNet top-1 accuracy, RetinaNet with ResNet-50 backbone from 37.4 to 41.0 Boundingbox mAP, SOLO based on ResNet-50 from 33.1 to 36.2 Mask mAP and DeepLabV3 based on ResNet-18 from 73.20 to 76.02 mIoU. Our codes are available at this https URL.