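Computer science
Coding (set theory)
Code snippet
Matching (statistics)
Cluster analysis
Algorithm
Data mining
Artificial intelligence
Information retrieval
Mathematics
Programming language
Set (abstract data type)
Statistics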
Authors
K. Liu, Jianxun Liu, Haize Hu
Abstract
Deep-learning-based code search models are mainly judged on accuracy alone, and the efficiency of code search is ignored. This article proposes a clustering-based code search model (C-DCS). C-DCS uses K-Means to divide the code vector base into K clusters and obtains the center vector of each cluster. When searching, C-DCS first matches the query vector against the K center vectors to find the best-matching center vector. It then matches the query vector one by one against the code vectors in the cluster corresponding to that center vector and returns the best-matching code snippet vector. To verify the efficiency of C-DCS in the code search task, an experimental analysis was conducted on a large dataset. The experimental results showed that C-DCS saves 92.2% of the search time compared to the baseline model while maintaining accuracy. In the experimental evaluation section, we optimized the K-Means algorithm to further improve the code search efficiency of C-DCS, raising the search-time saving to 93.8% relative to the baseline model. Hence, C-DCS greatly reduces code search time without affecting accuracy, improving the efficiency of software development.
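The two-stage search described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how vectors are matched, so cosine similarity is assumed here, the K-Means routine is a plain Lloyd's iteration written with NumPy, and the names `build_index` and `search` as well as the random vectors are hypothetical placeholders for a real code vector base.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()

    def assign(centers):
        # Nearest center per vector, by squared Euclidean distance.
        dists = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(iters):
        labels = assign(centers)
        for j in range(k):
            members = vectors[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, assign(centers)

def build_index(code_vectors, k):
    """Offline step: cluster the code vector base and keep the center vectors."""
    vectors = normalize(code_vectors)
    centers, labels = kmeans(vectors, k)
    members = [np.flatnonzero(labels == j) for j in range(k)]
    keep = [j for j in range(k) if len(members[j])]  # drop empty clusters
    return {
        "vectors": vectors,
        "centers": normalize(centers[keep]),
        "members": [members[j] for j in keep],
    }

def search(index, query):
    """Online step: pick the best center, then scan only that cluster."""
    q = normalize(query)
    best_cluster = int(np.argmax(index["centers"] @ q))
    ids = index["members"][best_cluster]
    scores = index["vectors"][ids] @ q
    return int(ids[np.argmax(scores)])

# Placeholder data standing in for embedded code snippets and a query (assumption).
rng = np.random.default_rng(42)
code_vectors = rng.normal(size=(2000, 16))
query = rng.normal(size=16)

index = build_index(code_vectors, k=8)
hit = search(index, query)
print("best-matching code vector index:", hit)
```

Under these assumptions, each query is compared against the K center vectors plus the members of a single cluster instead of all code vectors, which is where the time saving would come from. Because only one cluster is scanned, the result can differ from an exhaustive search when the true best match lies in a neighboring cluster.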