Computer science
Benchmark (surveying)
Machine learning
Artificial intelligence
Artificial neural network
Computation
Subnetwork
Distillation
Ranking (information retrieval)
Data mining
Feature extraction
Algorithm
Organic chemistry
Chemistry
Computer security
Geography
Geodesy
Authors
Zhihua Chen, Guhao Qiu, Ping Li, Lei Zhu, Xiaokang Yang, Bin Sheng
Identifier
DOI:10.1109/tpami.2023.3293885
Abstract
Recently, neural architecture search (NAS) has attracted great interest in academia and industry. It remains a challenging problem due to the huge search space and computational costs. Recent studies in NAS have mainly focused on weight sharing to train a SuperNet once. However, the corresponding branch of each subnetwork is not guaranteed to be fully trained. This may not only incur huge computation costs but also affect the architecture ranking in the retraining procedure. We propose a multi-teacher-guided NAS, which applies an adaptive ensemble and perturbation-aware knowledge distillation algorithm within a one-shot NAS framework. An optimization method that finds the optimal descent directions is used to obtain adaptive coefficients for the feature maps of the combined teacher model. In addition, we propose a specific knowledge distillation process for the optimal architectures and perturbed ones in each search step, so that better feature maps are learned for later distillation procedures. Comprehensive experiments verify that our approach is flexible and effective. We show improvements in accuracy and search efficiency on standard recognition datasets, as well as an improved correlation between the accuracy predicted by the search algorithm and the true accuracy on NAS benchmark datasets.
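The adaptive coefficients for the combined teacher can be illustrated with a minimal sketch: weight each teacher's feature map by a convex combination learned by gradient descent so that the ensemble best matches the student's feature map. This is a simplified least-squares stand-in, not the paper's exact optimal-descent-direction method; the function name and hyperparameters are hypothetical.

```python
import numpy as np

def adaptive_ensemble_coefficients(teacher_feats, student_feat, lr=0.5, steps=1000):
    """Learn simplex weights for teacher feature maps (hypothetical sketch).

    Minimizes 0.5 * ||sum_k w_k * T_k - s||^2 over the probability simplex,
    parameterizing w via softmax so the coefficients stay non-negative
    and sum to one.
    """
    T = np.stack(teacher_feats)          # (K, D) flattened teacher feature maps
    logits = np.zeros(len(teacher_feats))
    for _ in range(steps):
        w = np.exp(logits) / np.exp(logits).sum()   # softmax -> simplex
        resid = w @ T - student_feat                # ensemble - student residual
        grad_w = T @ resid                          # gradient w.r.t. weights
        # chain rule through the softmax parameterization
        grad_logits = w * (grad_w - w @ grad_w)
        logits -= lr * grad_logits
    return np.exp(logits) / np.exp(logits).sum()
```

With orthogonal teacher features, the learned weights simply recover the student's projection onto each teacher, which makes the behavior easy to sanity-check.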