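Search engine indexing
Computer science
Nearest neighbor search
Data mining
Cluster analysis
Metric (unit)
Metric space
Database index
Similarity (geometry)
Index (typography)
Range query (database)
Information retrieval
Artificial intelligence
Search engine
Web search query
Mathematics
Web query classification
Mathematical analysis
Operations management
World Wide Web
Economy
Image (mathematics)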
Authors
Yao Tian, Tingyun Yan, Xi Zhao, Kai Huang, Xiaofang Zhou
Identifier
DOI: 10.1109/tkde.2022.3206441
Abstract
Indexing is an effective way to support efficient query processing in large databases. Recently, the concept of a learned index, which replaces or complements traditional index structures with machine learning models, has been actively explored to reduce storage and search costs. However, accurate and efficient similarity query processing in high-dimensional metric spaces remains an open challenge. In this paper, we propose a novel indexing approach called LIMS that uses data clustering, pivot-based data transformation techniques, and learned indexes to support efficient similarity query processing in metric spaces. In LIMS, the underlying data is partitioned into clusters such that each cluster follows a relatively uniform data distribution. Data redistribution is achieved by utilizing a small number of pivots for each cluster. Similar data are mapped into compact regions, and the mapped values are totally ordered. Machine learning models are developed to approximate the position of each data record on disk. Efficient algorithms are designed for processing range queries and nearest neighbor queries based on LIMS, and for index maintenance with dynamic updates. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes and state-of-the-art learned indexes.
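The abstract's core idea (map each record to a totally ordered value via its distance to a pivot, then let a model predict where that value sits in sorted storage) can be illustrated with a small sketch. This is not the LIMS structure itself: it assumes a single cluster, a single pivot, in-memory data, Euclidean distance, and a plain least-squares line as the "learned" model. The names `PivotLearnedIndex`, `dist`, `_lower_bound`, and the `eps` tolerance are hypothetical and chosen only for illustration.

```python
import bisect
import math


def dist(a, b):
    """Euclidean distance; any metric obeying the triangle inequality would do."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotLearnedIndex:
    """Hypothetical single-cluster, single-pivot sketch (not the LIMS structure)."""

    def __init__(self, points, pivot):
        # Assumes `points` is a non-empty list of equal-length tuples.
        self.pivot = pivot
        # Map every point to a 1-D key (distance to the pivot) and sort by it.
        keyed = sorted((dist(p, pivot), p) for p in points)
        self.keys = [k for k, _ in keyed]
        self.points = [p for _, p in keyed]
        n = len(self.keys)
        # "Learned" model: least-squares line position ~ a * key + b.
        mean_k = sum(self.keys) / n
        mean_i = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_i) for i, k in enumerate(self.keys))
        self.a = cov / var if var > 0 else 0.0
        self.b = mean_i - self.a * mean_k
        # Largest prediction error on the stored keys, used as a search radius.
        self.err = math.ceil(
            max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))
        )

    def _predict(self, key):
        return self.a * key + self.b

    def _lower_bound(self, key):
        """Index of the first stored key >= key, searched near the prediction."""
        n = len(self.keys)
        guess = int(self._predict(key))
        lo = min(max(0, guess - self.err - 1), n)
        hi = max(lo, min(n, guess + self.err + 2))
        # Fall back to a wider search if the window does not bracket the answer.
        if lo > 0 and self.keys[lo - 1] >= key:
            lo = 0
        if hi < n and self.keys[hi] < key:
            hi = n
        return bisect.bisect_left(self.keys, key, lo, hi)

    def range_query(self, q, r, eps=1e-9):
        """Return all points within distance r of q."""
        dq = dist(q, self.pivot)
        out = []
        # Triangle inequality: a match o satisfies |d(o, pivot) - d(q, pivot)| <= r,
        # so only keys in [dq - r, dq + r] need to be verified.
        for i in range(self._lower_bound(dq - r - eps), len(self.keys)):
            if self.keys[i] > dq + r + eps:
                break
            if dist(self.points[i], q) <= r:
                out.append(self.points[i])
        return out
```

For example, `PivotLearnedIndex(points, pivot=(0.0, 0.0)).range_query((0.5, 0.5), 0.1)` would scan only the records whose pivot distance falls in the candidate interval and then verify each with a true distance computation. With one pivot the candidate interval can still be wide; the abstract indicates LIMS uses several pivots per cluster and clusters with near-uniform distributions, which this sketch does not attempt to reproduce.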