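Computer science
Coding (set theory)
Code refactoring
Artificial intelligence
Source code
Similarity (geometry)
Binary code
Theoretical computer science
Software
Binary number
Programming language
Mathematics
Arithmetic
Image (mathematics)
Set (abstract data type)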
Source
Journal: Foundations of Software Engineering
Date: 2018-10-26
Pages: 141-151
Citations: 240
Identifiers
DOI:10.1145/3236024.3236068
Abstract
Measuring code similarity is fundamental to many software engineering tasks, such as code search, refactoring, and reuse. However, most existing techniques focus only on syntactic similarity, and measuring functional similarity remains a challenging problem. In this paper, we propose a novel approach that encodes a program's control flow and data flow into a semantic matrix in which each element is a high-dimensional sparse binary feature vector, and we design a new deep learning model that measures functional similarity based on this representation. By concatenating the hidden representations learned from the two fragments of a code pair, the model turns the detection of functionally similar code into binary classification, allowing it to learn patterns shared by functionally similar code even when the syntax differs greatly.
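The pipeline described in the abstract (semantic matrix, per-fragment hidden representation, concatenation, binary classification) can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the paper's method: the node types (`NODE_TYPES`), the feature vocabulary built from (source type, target type, edge kind) triples, the count-pooling function `pool` standing in for a learned encoder, and the tiny untrained two-layer network in `pair_probability` are all hypothetical. Each sparse binary vector is stored as the set of its active bit indices.

```python
import math
import random
from typing import FrozenSet, List, Tuple

# Hypothetical vocabulary: the abstract does not specify the real encoding.
NODE_TYPES = ["assign", "call", "branch", "return"]
EDGE_KINDS = ["control", "data"]
FEATURE_DIM = len(NODE_TYPES) * len(NODE_TYPES) * len(EDGE_KINDS)


def feature_index(src_type: str, dst_type: str, kind: str) -> int:
    """Map a (source type, target type, edge kind) triple to one feature bit."""
    s = NODE_TYPES.index(src_type)
    d = NODE_TYPES.index(dst_type)
    k = EDGE_KINDS.index(kind)
    return (s * len(NODE_TYPES) + d) * len(EDGE_KINDS) + k


def semantic_matrix(
    nodes: List[str], edges: List[Tuple[int, int, str]]
) -> List[List[FrozenSet[int]]]:
    """Build an n x n matrix whose cell (i, j) is a sparse binary vector,
    stored as the frozenset of indices whose bit is 1."""
    n = len(nodes)
    cells = [[set() for _ in range(n)] for _ in range(n)]
    for i, j, kind in edges:
        # Control-flow and data-flow edges between the same pair of nodes
        # set different bits in the same cell.
        cells[i][j].add(feature_index(nodes[i], nodes[j], kind))
    return [[frozenset(c) for c in row] for row in cells]


def pool(matrix: List[List[FrozenSet[int]]]) -> List[float]:
    """Hypothetical stand-in for a learned encoder: count how often
    each feature bit is set anywhere in the matrix."""
    h = [0.0] * FEATURE_DIM
    for row in matrix:
        for cell in row:
            for idx in cell:
                h[idx] += 1.0
    return h


def pair_probability(h1: List[float], h2: List[float],
                     hidden: int = 8, seed: int = 0) -> float:
    """Concatenate two representations and score the pair with a tiny
    two-layer network. The weights are random and untrained here; a real
    model would learn them from labeled similar/dissimilar pairs."""
    x = h1 + h2  # list concatenation: the classifier sees both fragments
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in x] for _ in range(hidden)]
    z = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]  # ReLU
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
    logit = sum(a * b for a, b in zip(w2, z))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid: P(functionally similar)
```

For example, a fragment with nodes `["assign", "branch", "assign", "return"]` and edges such as `(0, 1, "control")` and `(2, 3, "data")` yields a 4 x 4 matrix of bit sets; `pair_probability(pool(m1), pool(m2))` then returns a score in (0, 1). Storing cells as index sets keeps the high-dimensional vectors sparse, which is the property the abstract emphasizes.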