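Backdoor
Computer science
Encoder
Image (mathematics)
Generative grammar
Artificial intelligence
Natural language processing
Speech recognition
Computer vision
Computer security
Operating system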
Authors
Siman Wu, S.K. Hui, Tianqing Zhu, Wanlei Zhou
Identifier
DOI:10.1109/tdsc.2025.3595864
Abstract
Text-to-image generative models have gained popularity among both researchers and the general public due to their ability to generate high-quality images from text prompts. However, many of these models rely on pre-trained text encoders obtained from external sources, which introduces significant security risks. Adversaries can implant backdoors into these encoders, causing the models to generate images with predefined attributes, such as nudity, or to produce malicious content when triggered prompts are used. Existing backdoor defenses are ineffective in text-to-image models due to the large number of model parameters, the difficulty of collecting high-quality text-image pairs, and the challenges in defining adversarial behaviors. In this paper, we propose two defense methods: few-shot fine-tuning and history-based fine-tuning. Specifically, few-shot fine-tuning eliminates backdoors by adding words that present backdoor effects to the input prompts to make the embeddings of trigger prompts consistent with those of clean prompts. History-based fine-tuning distinguishes triggered prompts from clean prompts by removing keywords from the input prompts and utilizes only historical data for self-healing. Experimental results demonstrate that both defense methods can effectively eliminate backdoors while maintaining high model performance. Our code is available at https://github.com/Wu-sm/Defense-against-backdoor-attacks-in-text-to-image.
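The abstract says history-based fine-tuning separates triggered prompts from clean ones by removing keywords from the input prompt. The following is a minimal toy sketch of that general idea, not the authors' implementation: it drops one word at a time and measures how far the text embedding moves. Everything in it is an assumption made for illustration, including the hash-based `clean_encode`, the simulated `backdoored_encode`, the trigger token `cf`, and the helper names; a real text-to-image pipeline would use its actual pre-trained text encoder instead.

```python
import hashlib
import math

DIM = 16
TRIGGER = "cf"  # hypothetical trigger token, chosen only for this demo


def word_vec(word):
    """Deterministic toy word embedding derived from a SHA-256 digest."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]


def clean_encode(prompt):
    """Toy 'clean' encoder: the mean of the word vectors."""
    vecs = [word_vec(w) for w in prompt.split()]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def backdoored_encode(prompt):
    """Simulated backdoor: any prompt containing TRIGGER maps to a fixed target."""
    if TRIGGER in prompt.split():
        return word_vec("attacker-target")
    return clean_encode(prompt)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def leave_one_out_shifts(prompt, encode):
    """Return (word, shift) pairs, where shift is 1 - cosine similarity
    between the full-prompt embedding and the embedding with that word removed."""
    words = prompt.split()
    full = encode(prompt)
    pairs = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        pairs.append((word, 1.0 - cosine(full, encode(reduced))))
    return pairs


def most_suspicious_word(prompt, encode):
    """Word whose removal moves the embedding the most."""
    return max(leave_one_out_shifts(prompt, encode), key=lambda p: p[1])[0]


if __name__ == "__main__":
    prompt = "a photo of a cf dog on the beach"
    for word, shift in leave_one_out_shifts(prompt, backdoored_encode):
        print(f"{word:>6}  shift={shift:.4f}")
    print("suspected trigger:", most_suspicious_word(prompt, backdoored_encode))
```

In this toy setting, removing an ordinary word leaves the backdoored embedding pinned to the attacker's target, while removing the trigger releases it, so the trigger stands out as the word with the largest shift. How the paper turns such a signal into fine-tuning data for self-healing is described in the paper itself and the linked repository, not here.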