Backdoor
Interpretability
Computer science
Deep neural network
Artificial neural network
Pruning
Artificial intelligence
Machine learning
Computer security
Biology
Agronomy
Authors
Wei Jiang,Xiangyu Wen,Jinyu Zhan,Xupeng Wang,Zi-Wei Song
Identifier
DOI:10.1109/tcad.2021.3111123
Abstract
As an emerging threat to deep neural networks (DNNs), backdoor attacks have received increasing attention due to the challenges posed by the lack of transparency inherent in DNNs. In this article, we develop an efficient algorithm based on the interpretability of DNNs to defend against backdoor attacks on DNN models. To extract critical neurons, we deploy sets of control gates following the neurons in each layer, so that the function of a DNN model can be interpreted through the semantic sensitivities of neurons to input samples. A backdoor identification approach, derived from the activation frequency distribution over critical neurons, is proposed to reveal the anomalies of particular neurons produced by backdoor attacks. Subsequently, a feasible and fine-grained pruning strategy is introduced to eliminate backdoors hidden in DNN models, without the need for retraining. Extensive experiments demonstrate that the proposed algorithm can identify and eliminate malicious backdoors efficiently in both single-target and multitarget scenarios, while the performance of the DNN model is retained to a large extent.
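The defense described in the abstract can be illustrated at a high level: measure how often each neuron activates on clean samples, flag neurons whose activation frequency is anomalous relative to the layer's distribution, and zero out those neurons' weights without retraining. The sketch below is not the authors' exact algorithm; the robust-z-score threshold, the synthetic activation data, and all function names are illustrative assumptions.

```python
# Illustrative sketch (not the paper's exact method): backdoor neurons tend
# to stay dormant on clean inputs and fire mainly on triggered inputs, so a
# neuron whose clean-data activation frequency is an outlier within its layer
# is a pruning candidate. The MAD-based outlier rule is an assumption.
import numpy as np


def activation_frequency(activations: np.ndarray) -> np.ndarray:
    """Fraction of clean samples on which each neuron is active (> 0).

    activations: (num_samples, num_neurons) post-ReLU outputs of one layer.
    """
    return (activations > 0).mean(axis=0)


def flag_anomalous_neurons(freq: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Flag neurons whose activation frequency deviates from the layer median
    by more than k robust standard deviations (median absolute deviation)."""
    med = np.median(freq)
    mad = np.median(np.abs(freq - med)) + 1e-12  # avoid division by zero
    robust_z = np.abs(freq - med) / (1.4826 * mad)
    return robust_z > k


def prune_neurons(weights: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Fine-grained pruning: zero the outgoing weights of flagged neurons.

    weights: (next_layer_dim, num_neurons); no retraining is performed.
    """
    pruned = weights.copy()
    pruned[:, mask] = 0.0
    return pruned


# Synthetic demo: 200 clean samples, 8 neurons; neuron 5 is almost dormant
# on clean data, mimicking a backdoor neuron.
rng = np.random.default_rng(0)
acts = rng.uniform(0.0, 1.0, size=(200, 8))  # ordinary neurons fire on ~all samples
acts[:, 5] = np.where(rng.uniform(size=200) < 0.02, 1.0, 0.0)  # rarely active

freq = activation_frequency(acts)
mask = flag_anomalous_neurons(freq)

W = rng.normal(size=(4, 8))
W_pruned = prune_neurons(W, mask)
```

In this toy setup only the dormant neuron is flagged, and its outgoing weights are zeroed while all other columns of the weight matrix are left untouched, mirroring the "eliminate backdoors without retraining" idea.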