Fast and Controllable Bias-Guided Jailbreak Attack on Large Language Models
Computer Science
Computer Networks
Authors
Zi Kang, Hui Xia, Rui Zhang, Xiaoxue Song, Le Li, Chunqiang Hu
Source
Journal: IEEE Internet of Things Journal [Institute of Electrical and Electronics Engineers]; Date: 2025-08-18; Volume (Issue): 12 (24); Pages: 51892-51901
Identifier
DOI:10.1109/jiot.2025.3599199
Abstract
Large language models (LLMs), with their powerful natural language processing capabilities, can provide more advanced intelligent services for edge devices. However, LLMs deployed at the edge are vulnerable to jailbreak attacks, which can cause the model to generate unsafe content. Meanwhile, current jailbreak attack schemes are inefficient at generating highly stealthy jailbreak prompts. To address this, we propose a Fast and Controllable Bias-Guided Jailbreak Attack (FCB) scheme. First, to improve attack efficiency, we optimize the bias of the model's output layer, directly adjusting the output layer's logits to guide the model toward generating low-energy jailbreak prompts and thereby accelerating the decoding process. Second, to enhance the stealthiness of the generated jailbreak prompts, we design token stop selection and bias normalization methods that constrain the perturbations during the iterative process, preventing the generation of jailbreak prompts that lack meaningful semantics. Finally, extensive experimental results demonstrate that FCB can generate highly stealthy jailbreak prompts within a short time. Specifically, compared with COLD Attack, the current state-of-the-art controllable attack generation scheme, FCB improves the attack success rate by up to 8%, reduces perplexity by up to 181.171, and shortens generation time by as much as 28 seconds.
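The abstract describes steering decoding by optimizing an additive bias on the output-layer logits toward low-energy outputs, with token stop selection and bias normalization constraining the perturbation. The paper's exact energy function and constraint definitions are not given here, so the following is only a toy sketch under loudly stated assumptions: fixed `base_logits` stand in for a frozen model's output layer; the energy is assumed to be the summed negative log-probability of hypothetical per-position `targets`; "bias normalization" is assumed to be a projection of each bias vector onto an L2 ball of radius `max_norm`; and "token stop selection" is assumed to mean freezing a position once its target becomes the greedy token. All function and parameter names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(z: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row: List[float]) -> int:
    return max(range(len(row)), key=lambda v: row[v])


def optimize_logit_bias(base_logits: Matrix, targets: List[int], steps: int = 200,
                        lr: float = 0.5, max_norm: float = 3.0) -> Tuple[Matrix, List[bool]]:
    """Hypothetical sketch: gradient descent on an additive bias b_t per position.

    Assumed energy: E(b) = sum_t -log softmax(z_t + b_t)[y_t],
    whose gradient w.r.t. b_t is softmax(z_t + b_t) - onehot(y_t).
    """
    T, V = len(base_logits), len(base_logits[0])
    bias = [[0.0] * V for _ in range(T)]
    frozen = [False] * T  # assumed stand-in for "token stop selection"
    for _ in range(steps):
        for t in range(T):
            if frozen[t]:
                continue  # stop perturbing positions that already emit their target
            p = softmax([z + b for z, b in zip(base_logits[t], bias[t])])
            for v in range(V):
                grad = p[v] - (1.0 if v == targets[t] else 0.0)
                bias[t][v] -= lr * grad
            # Assumed stand-in for "bias normalization": project b_t onto an L2 ball.
            norm = math.sqrt(sum(b * b for b in bias[t]))
            if norm > max_norm:
                bias[t] = [b * max_norm / norm for b in bias[t]]
            shifted = [z + b for z, b in zip(base_logits[t], bias[t])]
            if argmax(shifted) == targets[t]:
                frozen[t] = True
        if all(frozen):
            break  # early exit once every position is satisfied
    return bias, frozen


def greedy_decode(base_logits: Matrix, bias: Matrix) -> List[int]:
    """Pick the highest biased logit at each position."""
    return [argmax([z + b for z, b in zip(zs, bs)]) for zs, bs in zip(base_logits, bias)]


def perplexity(base_logits: Matrix, tokens: List[int]) -> float:
    """exp(mean NLL) of tokens under the unbiased logits, used as a fluency proxy."""
    nll = [-math.log(softmax(z)[x]) for z, x in zip(base_logits, tokens)]
    return math.exp(sum(nll) / len(nll))


if __name__ == "__main__":
    # Toy logits for a 3-token vocabulary over 3 positions (made-up numbers).
    base = [[2.0, 0.5, 0.0], [0.0, 1.0, 3.0], [1.0, 1.2, 0.8]]
    targets = [1, 1, 1]
    for radius in (3.0, 0.5):
        bias, frozen = optimize_logit_bias(base, targets, max_norm=radius)
        tokens = greedy_decode(base, bias)
        print(radius, tokens, frozen, round(perplexity(base, tokens), 3))
```

With a generous `max_norm` every toy position reaches its target within a few steps; with a tight radius such as 0.5, the bias cannot close the larger logit gaps, so those positions keep their original greedy tokens and the decoded sequence stays more probable under the unbiased logits. In this toy setting, that illustrates the kind of trade-off the abstract points to: bounding the perturbation keeps outputs closer to what the base model finds fluent, at the cost of fewer satisfied targets.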