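Computer science
Clone (Java method)
Security token
Coding (set theory)
Source code
Scalability
Artificial intelligence
Machine learning
Programming language
Operating system
DNA
Genetics
Set (abstract data type)
Biology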
Authors
Siyue Feng,Wenqi Suo,Yueming Wu,Deqing Zou,Yang Liu,Hai Jin
Identifier
DOI:10.1145/3597503.3639114
Abstract
As software engineering advances and the demand for code rises, code clones have become more prevalent. This phenomenon poses risks such as vulnerability propagation, underscoring the growing importance of code clone detection techniques. While numerous code clone detection methods have been proposed, they often fall short in real-world code environments: they either struggle to identify code clones effectively or demand substantial time and computational resources to handle complex clones. This paper introduces Toma, a code clone detection method that uses tokens and machine learning. Specifically, we extract token type sequences and employ six similarity calculation methods to generate feature vectors. These vectors are then input into a trained machine learning model for classification. To evaluate the effectiveness and scalability of Toma, we conduct experiments on the widely used BigCloneBench dataset. Results show that our tool outperforms token-based code clone detectors and most tree-based clone detectors, demonstrating high effectiveness and significant time savings.
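The pipeline described in the abstract (token type sequences, then similarity scores, then a feature vector for a classifier) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not name the six similarity measures or the token type set, so the three measures below (multiset Jaccard, Dice, and a normalized Levenshtein ratio), the coarse token classes, and the helper names `token_types` and `feature_vector` are hypothetical stand-ins rather than Toma's actual implementation.

```python
import re
from collections import Counter

# Hypothetical keyword list and token classes; not taken from the paper.
KEYWORDS = {"int", "return", "if", "else", "for", "while",
            "public", "static", "void", "class", "new"}
LEXEME_RE = re.compile(r'"(?:\\.|[^"\\])*"|\d+(?:\.\d+)?|[A-Za-z_]\w*|\S')

def token_types(code):
    """Map source text to a sequence of coarse token types."""
    types = []
    for lexeme in LEXEME_RE.findall(code):
        if lexeme[0] == '"':
            types.append("STR")
        elif lexeme[0].isdigit():
            types.append("NUM")
        elif lexeme in KEYWORDS:
            types.append("KW")
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            types.append("ID")
        else:
            types.append("OP")
    return types

def jaccard(a, b):
    """Multiset Jaccard similarity of two type sequences."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 1.0

def dice(a, b):
    """Multiset Dice coefficient of two type sequences."""
    inter = sum((Counter(a) & Counter(b)).values())
    total = len(a) + len(b)
    return 2 * inter / total if total else 1.0

def levenshtein_ratio(a, b):
    """1 minus the edit distance divided by the longer sequence length."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    longest = max(len(a), len(b))
    return 1 - prev[-1] / longest if longest else 1.0

def feature_vector(code_a, code_b):
    """Similarity features for one code pair (three stand-in measures)."""
    a, b = token_types(code_a), token_types(code_b)
    return [jaccard(a, b), dice(a, b), levenshtein_ratio(a, b)]
```

Under these assumptions, two fragments that differ only in identifier names map to the same type sequence, so every similarity score is 1.0. In a full pipeline, vectors like this would be passed to a trained classifier that labels each pair as clone or non-clone.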