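Redundancy (engineering)
Computer science
Similarity (geometry)
Data mining
Queueing
Algorithm
Pattern recognition (psychology)
Similarity
Precision and recall
Artificial intelligence
Image (mathematics)
Programming language
Operating system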
Authors
Y. F. Long, H. L. Li, Wan Zhang, Peng Tian
Identifier
DOI: 10.1109/frse58934.2023.00032
Abstract
To address the misjudgments and missed detections that traditional similarity methods produce in duplicate data detection, this paper proposes MSRD, a multidimensional similarity redundancy detection algorithm. MSRD combines numerical similarity, literal similarity, and semantic similarity into a more complete similarity measure, and its detection process is built on the idea of a priority queue. The method is evaluated on four real and synthetic data sets and compared with three traditional redundancy detection tools. The results show that MSRD distinguishes the similarity between records more effectively, keeps detection efficiency comparable to that of the traditional algorithms, and achieves better precision and recall, so it detects similar duplicate data more accurately.
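The abstract does not give MSRD's formulas or its queue procedure, so the following is only a minimal illustrative sketch of the general idea: score record pairs with a weighted mix of numerical, literal, and semantic similarity, then use a priority queue to visit the most similar pairs first. Everything here is an assumption and not the authors' implementation: the function names, the `(text, number)` record layout, the weights, the threshold, and the use of token-set Jaccard overlap as a crude stand-in for a real semantic model.

```python
import heapq
from difflib import SequenceMatcher
from itertools import combinations


def numeric_sim(a, b):
    """Relative closeness of two numbers, clamped to [0, 1]."""
    if a == b:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))


def literal_sim(a, b):
    """Character-level similarity of two strings."""
    return SequenceMatcher(None, a, b).ratio()


def semantic_sim(a, b):
    """Stand-in for semantic similarity: Jaccard overlap of word sets.

    A real system would likely use embeddings or a thesaurus instead.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def record_sim(r1, r2, weights=(0.3, 0.3, 0.4)):
    """Weighted combination of the three measures (weights are hypothetical)."""
    w_num, w_lit, w_sem = weights
    return (w_num * numeric_sim(r1[1], r2[1])
            + w_lit * literal_sim(r1[0], r2[0])
            + w_sem * semantic_sim(r1[0], r2[0]))


def detect_duplicates(records, threshold=0.8):
    """Return (similarity, i, j) for pairs at or above the threshold,
    most similar first."""
    heap = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        # heapq is a min-heap, so push the negated score to pop the best first.
        heapq.heappush(heap, (-record_sim(r1, r2), i, j))
    found = []
    while heap and -heap[0][0] >= threshold:
        neg_sim, i, j = heapq.heappop(heap)
        found.append((-neg_sim, i, j))
    return found


if __name__ == "__main__":
    sample = [
        ("Apple iPhone 13", 799.0),
        ("Apple iPhone 13", 799.0),
        ("Dell XPS Laptop", 1200.0),
    ]
    for sim, i, j in detect_duplicates(sample):
        print(f"records {i} and {j} look like duplicates (score {sim:.2f})")
```

This sketch compares every pair, which is quadratic in the number of records. A practical detector would normally restrict candidates first (for example by sorting or blocking), and the paper's own priority-queue process may be organized quite differently.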