Backdoor
Computer science
Word (group theory)
Adversarial system
Artificial intelligence
Black box
Sample (material)
Deep neural network
Transferability
Artificial neural network
Natural language processing
Speech recognition
Machine learning
Computer security
Mathematics
Chemistry
Geometry
Reuter
Chromatography
Authors
Kun Shao,Yu Zhang,Junan Yang,Xiaoshuai Li,Hui Liu
Identifier
DOI:10.1016/j.cose.2022.102730
Abstract
Deep neural networks (DNNs) have been proven to be vulnerable to adversarial attacks. However, adversarial perturbations are generated for specific input samples, and the perturbation crafted for one sample cannot be applied to others. In this paper, we propose a method to search for backdoors of natural language processing (NLP) models under the black-box condition, and we find that universal attack triggers exist in adversarial samples. The method consists of two steps. The first step extracts aggressive words from adversarial samples to form an adversarial knowledge base under the black-box condition. The second step generates universal attack triggers by minimizing the target prediction results over a batch of samples. When the generated trigger is added to any benign input, the prediction accuracy of the DNN model can be reduced to nearly zero. The experimental results show that our method achieves a high attack success rate with a short trigger (e.g., more than 90% using only a trigger of length 3 when attacking BiLSTM on SST-2). In addition, experiments show that our method has high transferability. Finally, to address the backdoor vulnerabilities of NLP models, we conducted two defense experiments, abnormal word detection and word frequency analysis, which improve the NLP model's ability to resist backdoor attacks.
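The two-step procedure described in the abstract (mine aggressive words into a knowledge base, then greedily assemble a short trigger that minimizes the model's target prediction over a batch) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `KNOWLEDGE_BASE`, `target_prob`, and `search_trigger` are hypothetical names, the word list and its impact weights are mock data, and a real attack would obtain scores by querying the black-box victim model.

```python
# Hedged sketch of the abstract's two-step attack, not the authors' code.
# `target_prob` is a toy stand-in for black-box queries to the victim
# model: it returns a mock mean target-class probability for a batch.

# Step 1 (assumed form): an "adversarial knowledge base" of aggressive
# words; here a hypothetical hand-picked list with mock impact weights,
# rather than words actually mined from adversarial examples.
KNOWLEDGE_BASE = {"worst": 0.3, "awful": 0.4, "terrible": 0.5,
                  "boring": 0.6, "bland": 0.7}

def target_prob(trigger, batch):
    """Mock black-box score: lower means the trigger is more damaging."""
    score = 1.0
    for word in trigger:
        score *= KNOWLEDGE_BASE.get(word, 1.0)
    return score

def search_trigger(batch, length=3):
    """Step 2: greedily pick trigger words that minimize the mean
    target-class probability over a batch of benign samples."""
    trigger = []
    for _ in range(length):
        best_word, best_score = None, float("inf")
        for word in KNOWLEDGE_BASE:
            if word in trigger:
                continue  # avoid repeating a word in the trigger
            score = target_prob(trigger + [word], batch)
            if score < best_score:
                best_word, best_score = word, score
        trigger.append(best_word)
    return trigger

benign_batch = ["a touching and well acted film", "one of the year's best"]
print(search_trigger(benign_batch))  # → ['worst', 'awful', 'terrible']
```

With the mock weights above, the greedy search simply selects the most damaging unused word at each position; against a real model, each candidate would cost one batch of black-box queries, so the knowledge base from step 1 serves to keep the candidate set small.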