Computer science
Programming language
Code (set theory)
Code generation
Software engineering
Computer security
Key (lock)
Set (abstract data type)
Identification
DOI:10.54254/2755-2721/2025.20425
Abstract
In recent years, the proliferation of code generation models based on large language models, such as GitHub Copilot and ChatGPT, has enabled automated source code generation that meets developers' needs and improves coding efficiency. However, recent studies have revealed security concerns in generated code, leaving it vulnerable to attacks. This research introduces a framework aimed at mitigating the risk that code generation models produce vulnerable code, with a focus on data leakage issues. A ranker is developed that uses VUDENC, a deep learning model for vulnerability detection, together with CodeQL and Bandit, two Python code analyzers, to evaluate and rank generated code against security metrics. By generating multiple code candidates and using the ranker to select the most secure option, the framework yields more secure code. It is evaluated on relevant scenarios from an aggregated SecurityEval and LLMSecEval dataset, where it shows advantages over the gpt-3.5-turbo model. Given its demonstrated effectiveness, the framework could extend beyond data leakage issues to mitigate a comprehensive range of vulnerabilities.
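The generate-and-rank scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scorer here is a hypothetical stand-in for the VUDENC, CodeQL, and Bandit analyses, and all function names are assumptions for the sketch.

```python
from typing import Callable, List

def rank_candidates(candidates: List[str],
                    scorers: List[Callable[[str], int]]) -> List[str]:
    """Sort candidate code snippets by total vulnerability findings,
    fewest findings first. Each scorer maps source code to a count
    of detected issues (stand-in for VUDENC/CodeQL/Bandit output)."""
    def total_findings(code: str) -> int:
        return sum(scorer(code) for scorer in scorers)
    return sorted(candidates, key=total_findings)

def select_most_secure(candidates: List[str],
                       scorers: List[Callable[[str], int]]) -> str:
    """Pick the highest-ranked (most secure) candidate."""
    return rank_candidates(candidates, scorers)[0]

# Toy scorer for illustration only: flags hard-coded credentials,
# a simple proxy for the data-leakage issues the framework targets.
def toy_secret_scorer(code: str) -> int:
    return code.count('password=')

cands = ['print("ok")', 'db_connect(password="hunter2")']
best = select_most_secure(cands, [toy_secret_scorer])  # candidate with no findings
```

In the framework itself, the model would produce several candidates per prompt and the ranker would aggregate real analyzer reports rather than a string match; the selection logic, however, follows this best-of-N pattern.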