Deep Learning Based Vulnerability Detection: Are We There Yet?

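Keywords: Computer science; False positive paradox; Machine learning; Surprise; Artificial intelligence; Vulnerability (computing); Software; Deep learning; Data mining; Computer security; Psychology; Social psychology; Programming language
Authors
Saikat Chakraborty,Rahul Krishna,Yangruibo Ding,Baishakhi Ray
Source
Journal: IEEE Transactions on Software Engineering [Institute of Electrical and Electronics Engineers]
Volume (issue): 48 (9): 3280-3296 Citations: 111
Identifier
DOI:10.1109/tse.2021.3087402
Abstract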

Automated detection of software vulnerabilities is a fundamental problem in software security. Existing program analysis techniques either suffer from high false positives or false negatives. Recent progress in Deep Learning (DL) has resulted in a surge of interest in applying DL for automated vulnerability detection. Several recent studies have demonstrated promising results achieving an accuracy of up to 95 percent at detecting vulnerabilities. In this paper, we ask, “how well do the state-of-the-art DL-based techniques perform in a real-world vulnerability prediction scenario?” To our surprise, we find that their performance drops by more than 50 percent. A systematic investigation of what causes such precipitous performance drop reveals that existing DL-based vulnerability prediction approaches suffer from challenges with the training data (e.g., data duplication, unrealistic distribution of vulnerable classes, etc.) and with the model choices (e.g., simple token-based models). As a result, these approaches often do not learn features related to the actual cause of the vulnerabilities. Instead, they learn unrelated artifacts from the dataset (e.g., specific variable/function names, etc.). Leveraging these empirical findings, we demonstrate how a more principled approach to data collection and model design, based on realistic settings of vulnerability prediction, can lead to better solutions. The resulting tools perform significantly better than the studied baseline—up to 33.57 percent boost in precision and 128.38 percent boost in recall compared to the best performing model in the literature. Overall, this paper elucidates existing DL-based vulnerability prediction systems’ potential issues and draws a roadmap for future DL-based vulnerability prediction research.
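
The abstract names data duplication as one of the training-data problems. The following is a minimal sketch of how one might check a dataset for it, assuming each sample is the source text of one function held as a string. The whitespace-only normalization and the helper names (`normalize`, `fingerprint`, `deduplicate`, `leaked_fraction`) are illustrative assumptions, not the procedure used in the paper.

```python
import hashlib
import re


def normalize(code: str) -> str:
    """Collapse whitespace runs so formatting-only variants compare equal."""
    return re.sub(r"\s+", " ", code).strip()


def fingerprint(code: str) -> str:
    """Stable hash of the normalized source text."""
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


def deduplicate(samples):
    """Keep the first occurrence of each distinct (normalized) sample."""
    seen = set()
    unique = []
    for code in samples:
        key = fingerprint(code)
        if key not in seen:
            seen.add(key)
            unique.append(code)
    return unique


def leaked_fraction(train, test):
    """Fraction of test samples whose normalized text also appears in train."""
    if not test:
        return 0.0
    train_keys = {fingerprint(code) for code in train}
    leaked = sum(1 for code in test if fingerprint(code) in train_keys)
    return leaked / len(test)
```

For example, with a training set of `"int f(int a) { return a + 1; }"` and `"void g() {}"`, and a test set holding a reformatted copy of the first function plus one unrelated function, `leaked_fraction` returns 0.5. A check this simple only catches copies that differ in whitespace; near-duplicates with renamed identifiers would need a stronger normalization.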