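Locality-sensitive hashing
Nearest neighbor search
Computer science
Sublinear function
Hash function
Curse of dimensionality
Theoretical computer science
Cluster analysis
Time complexity
Similarity (geometry)
Hash table
Data mining
Algorithm
Mathematics
Artificial intelligence
Discrete mathematics
Computer security
Image (mathematics)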
Authors
Rajendra Shinde, Ashish Goel, Pankaj Gupta, Debojyoti Dutta
Identifier
DOI:10.1145/1807167.1807209
Abstract
Similarity search methods are widely used as kernels in various data mining and machine learning applications, including computational biology and web search/clustering. Nearest neighbor search (NNS) algorithms are often used to retrieve similar entries given a query. While efficient hashing techniques exist for exact query lookup, similarity search using exact nearest neighbors suffers from the "curse of dimensionality": for high-dimensional spaces, the best known solutions offer little improvement over brute-force search and are thus unsuitable for large-scale streaming applications. Fast solutions to the approximate NNS problem include Locality Sensitive Hashing (LSH) based techniques, which need storage polynomial in n with exponent greater than 1 and query time that is sublinear but still polynomial in n, where n is the size of the database. In this work we present a new technique for solving the approximate NNS problem in Euclidean space using a Ternary Content Addressable Memory (TCAM), which needs near-linear space and has O(1) query time. In fact, this method also works around the best known lower bounds in the cell probe model for the query time using a data structure near-linear in the size of the database.
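The abstract does not spell out how points are encoded for the TCAM, so the following is only a minimal software sketch of the general idea of ternary matching, not the authors' construction. It assumes a sign-of-projection code in which a stored point gets a wildcard `*` wherever its projection falls within a hypothetical `margin` of the threshold, so that nearby queries on either side of the boundary still match. The names `ternary_match`, `encode_stored`, `encode_query`, and `SimulatedTCAM` are illustrative and do not come from the paper.

```python
def ternary_match(stored, query):
    """True if every non-wildcard symbol of `stored` equals the query bit."""
    if len(stored) != len(query):
        return False
    return all(s == "*" or s == q for s, q in zip(stored, query))


def project(point, direction):
    """Inner product of a point with a projection direction."""
    return sum(p * d for p, d in zip(point, direction))


def encode_stored(point, directions, margin):
    """Ternary word for a database point (assumed encoding, see lead-in).

    A coordinate becomes '*' when the projection lies within `margin`
    of the zero threshold, otherwise '1' or '0' by sign.
    """
    symbols = []
    for d in directions:
        v = project(point, d)
        if abs(v) < margin:
            symbols.append("*")
        else:
            symbols.append("1" if v > 0 else "0")
    return "".join(symbols)


def encode_query(point, directions):
    """Binary word for a query point: one sign bit per direction."""
    return "".join("1" if project(point, d) > 0 else "0" for d in directions)


class SimulatedTCAM:
    """Software stand-in for a TCAM.

    Real hardware compares the query against all stored words in
    parallel; this loop only imitates the matching semantics and
    takes time linear in the number of entries.
    """

    def __init__(self):
        self.entries = []

    def add(self, word, payload):
        self.entries.append((word, payload))

    def lookup(self, query):
        # Return the payload of the first matching entry, if any.
        for word, payload in self.entries:
            if ternary_match(word, query):
                return payload
        return None


# Hand-picked axis-aligned directions keep the example deterministic;
# an LSH-style scheme would typically draw them at random.
directions = [[1.0, 0.0], [0.0, 1.0]]
tcam = SimulatedTCAM()
tcam.add(encode_stored((5.0, 0.1), directions, margin=0.5), "point A")
near = tcam.lookup(encode_query((4.0, -0.2), directions))
far = tcam.lookup(encode_query((-4.0, 0.2), directions))
```

Here the stored point lies close to the second threshold, so its word carries a wildcard in that position and the nearby query still matches even though it falls on the other side of the boundary; the distant query differs in a non-wildcard position and finds nothing.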