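Computer science
Programming language
Source code
Abstract syntax tree
Java
Program comprehension
Abstract syntax
Redundant code
Code generation
Graph
Coding (set theory)
Theoretical computer science
Artificial intelligence
Software
Semantics (computer science)
Software system
Parsing
Operating system
Key (lock)
Set (abstract data type)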
Authors
Dawei Yuan, Sen Fang, Tao Zhang, Zhou Xu, Xiapu Luo
Identifier
DOI:10.1109/tr.2022.3176922
Abstract
Code clone detection plays a critical role in the field of software engineering. To achieve this goal, developers are required to have rich development experience for finding the "functional" clone code. However, this is unfriendly to novice developers. Although many approaches have been proposed to automatically detect code clones, the results are not satisfactory. A major reason is that it is difficult to extract syntactic and semantic information from the source code. To resolve this problem, in this article, we develop a novel graph representation approach based on intermediate code to detect functional code clones. This graph representation is built from intermediate code compiled from the source code. By using it, we can easily utilize graph embedding techniques to extract syntactic and semantic features from the abstract syntax tree, control flow graph, and data flow graph (DFG) generated from the intermediate code. After that, we use the Softmax classifier to detect functional code clone pairs. We evaluate the performance of the proposed graph representation approach based on intermediate code for the code clone detection task on the BigCloneBench dataset. In order to improve performance, the embedded representation of intermediate code is initialized with pretrained vectors learned in advance from the collected LLVM IR dataset. The experimental results show that our proposed intermediate code-based graph approach performs better than existing functional code clone detection approaches. Especially for type-4 code clone detection, our approach outperforms the baseline approaches by an average of 33.49% in terms of F1 score.
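The final classification step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each code fragment has already been reduced to a fixed-length embedding vector. The feature construction (element-wise absolute difference), the single linear layer, and the names `classify_pair`, `weights`, and `bias` are all hypothetical choices made for illustration and are not taken from the paper. All numbers are toy values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pair(emb_a, emb_b, weights, bias):
    """Return [P(non-clone), P(clone)] for a pair of embeddings.

    Assumption: the pair is represented by the element-wise absolute
    difference of the two embeddings, fed to one linear layer and softmax.
    """
    features = [abs(a - b) for a, b in zip(emb_a, emb_b)]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy embeddings and parameters (hypothetical, not from the paper).
emb_a = [0.2, 0.9, 0.4]
emb_b = [0.2, 0.9, 0.4]
weights = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
bias = [0.0, 0.5]

probs = classify_pair(emb_a, emb_b, weights, bias)
is_clone = probs[1] > probs[0]
```

With these toy parameters, a larger distance between the two embeddings pushes probability toward the non-clone class. In a trained model the weights would be learned from labeled clone pairs rather than set by hand.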