Topics
Distillation, Computer science, Pruning, Process (computing), Variable (mathematics), Fault (geology), Artificial intelligence, Machine learning, Process engineering, Chemistry, Mathematics, Chromatography, Engineering, Mathematical analysis, Seismology, Agronomy, Biology, Geology, Operating system
Authors
Ze Cui, Qi Yang, Zixiang Xiong, Rongyang Gu
Identifier
DOI:10.1088/1361-6501/ada6f3
Abstract
In recent years, deep learning models have been extensively researched and applied in fault diagnosis. However, they often require substantial storage resources, which complicates deployment on embedded devices. A prevalent solution is knowledge distillation (KD) between teacher and student models: through the distillation process, the student model acquires knowledge from the teacher model without introducing additional parameters, thereby improving its performance. Nevertheless, a more powerful teacher model does not always yield better distillation performance, because the teacher's much higher complexity relative to the student can make its outputs harder for the student to imitate. To address this issue, the Variable-Temperature Gradient TOP-K Knowledge Distillation (VTGTK-KD) method is proposed, which employs multiple pruned, medium-sized teacher models to enable a gradual distillation learning process. Because these models share the same architecture, they provide better conditions for knowledge transfer at the logit level. To further improve distillation performance, variable-temperature (VT) distillation is introduced to balance distillation speed and accuracy, and the Gradient TOP-K algorithm is used to filter out erroneous knowledge from the teacher network. Finally, classification experiments were conducted on two bearing datasets. The results demonstrate that the proposed VTGTK-KD method improves distillation performance and surpasses other advanced KD approaches.
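The abstract describes logit-level distillation with temperature scaling and a TOP-K filter applied to the teacher's outputs. The sketch below is a generic PyTorch illustration of that combination, not the paper's exact VTGTK-KD formulation: the function names, the fixed temperature, and the value of k are assumptions, and the paper's variable-temperature schedule and gradient-based TOP-K selection are not reproduced here.

```python
import torch
import torch.nn.functional as F

def topk_kd_loss(student_logits, teacher_logits, temperature=4.0, k=5):
    """Temperature-scaled KD loss restricted to the teacher's top-k classes.
    Generic sketch only; `temperature` and `k` are illustrative values."""
    # Keep only the k classes the teacher is most confident about.
    topk_idx = teacher_logits.topk(k, dim=-1).indices      # (batch, k)
    t_sel = teacher_logits.gather(-1, topk_idx)            # (batch, k)
    s_sel = student_logits.gather(-1, topk_idx)            # (batch, k)

    # Soften both distributions with the same temperature and compare them.
    soft_teacher = F.softmax(t_sel / temperature, dim=-1)
    log_soft_student = F.log_softmax(s_sel / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")

    # The usual T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kd * temperature ** 2

def total_loss(student_logits, teacher_logits, labels, alpha=0.5,
               temperature=4.0, k=5):
    """Combine hard-label cross-entropy with the soft distillation term."""
    ce = F.cross_entropy(student_logits, labels)
    kd = topk_kd_loss(student_logits, teacher_logits, temperature, k)
    return alpha * ce + (1.0 - alpha) * kd
```

In a setup like the one the abstract outlines, a loss of this form would be minimized against each pruned, same-architecture teacher in turn, so the student mimics progressively smaller teachers rather than the full model in one step.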