局部敏感散列
计算机科学
散列函数
查询优化
哈希表
最近邻搜索
数据挖掘
计算机安全
作者
Yao Tian,Xi Zhao,Xiaofang Zhou
出处
期刊:IEEE Transactions on Knowledge and Data Engineering
[Institute of Electrical and Electronics Engineers]
日期:2023-01-01
卷期号:: 1-14
标识
DOI:10.1109/tkde.2023.3295831
摘要
Locality-sensitive hashing (LSH) is a promising family of methods for the high-dimensional approximate nearest neighbor (ANN) search problem due to its sub-linear query time and strong theoretical guarantee. Existing LSH methods either suffer from large index sizes and hash boundary problems, or incur a linear cost for high-quality candidate identification. This dilemma is addressed in a novel method called DB-LSH proposed in this paper. It organizes the projected spaces with multi-dimensional indexes instead of fixed-width hash buckets, which significantly reduces space costs. High-quality candidates can be generated efficiently by dynamically constructing query-based hypercubic buckets with the required widths through index-based window queries. A novel incremental search strategy called DBI-LSH is also developed to further boost the query performance, which incrementally accesses the next best point for higher accuracy and efficiency. Considering the intermediate query information of each query, DBA-LSH is designed to adaptively tune termination conditions without scarifying the success probability. Our theoretical analysis proves that DB-LSH has a smaller query cost than the existing work while DBA-LSH and DBI-LSH have lower expected query costs than DB-LSH. An extensive range of experiments on real-world data show the superiority of our approaches over the state-of-the-art methods in both efficiency and accuracy.
科研通智能强力驱动
Strongly Powered by AbleSci AI