Keywords
Initialization
Homogeneous
Computer science
Classifier (UML)
Function (biology)
Distillation
Similarity (geometry)
Machine learning
Artificial intelligence
Mathematics education
Mathematics
Image (mathematics)
Chromatography
Biology
Evolutionary biology
Combinatorics
Chemistry
Programming language
Authors
Quanzheng Xu, Liyu Liu, Bing Ji
Identifier
DOI:10.1016/j.ins.2022.05.117
Abstract
Knowledge distillation (KD) transfers knowledge from a heavy teacher network to a lightweight student network while keeping the student's performance close to that of the teacher. However, a large capacity gap between the teacher and the student is not conducive to KD, so a large teacher network is not necessarily the most suitable teacher to guide the student. Therefore, this study proposes a multiple homogeneous teacher-guided KD method. First, multiple networks with the same structure as the student are pretrained to act as a teacher group, which differs from the single large teacher network used in traditional KD and alleviates the capacity gap between teacher and student. Second, a confidence-adaptive initialization strategy is developed to initialize the student network, which then learns knowledge from the pretrained teacher group. Experiments are performed on CIFAR10, CIFAR100, and Tiny-ImageNet using three networks. The experimental results demonstrate that the proposed KD method outperforms existing advanced KD methods. Furthermore, a similarity loss function is introduced to optimize the parameters of the classifier in the student network. The experimental results indicate that this loss function improves performance on basic classification tasks without KD and works effectively within the proposed KD method.
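For readers unfamiliar with the mechanics, the sketch below illustrates how a student might be distilled from a group of teachers that share its architecture, using the standard KD loss. It is a minimal illustration assuming PyTorch; the temperature `T`, mixing weight `alpha`, and simple averaging of teacher outputs are illustrative assumptions, and the paper's confidence-adaptive initialization and similarity loss are not reproduced here.

```python
# Minimal sketch of distillation from a group of homogeneous teachers.
# Hypothetical helper; not the authors' exact formulation.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          T=4.0, alpha=0.7):
    """Cross-entropy on labels plus a KL term against the averaged
    soft targets of several same-architecture teachers."""
    # Soften and average the teacher predictions (one simple way to
    # aggregate a teacher group; the paper may weight teachers differently).
    soft_targets = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between the student's softened log-probabilities and
    # the aggregated teacher distribution, scaled by T^2 as in standard KD.
    kd_term = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        soft_targets,
        reduction="batchmean",
    ) * (T * T)

    # Ordinary supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1.0 - alpha) * ce_term
```

In this generic formulation the student sees both the hard labels and the ensemble's softened predictions; because every teacher has the student's own structure, the aggregated targets come from models with no capacity gap relative to the student, which is the situation the abstract describes.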