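Triangle inequality
Similarity (geometry)
Property (philosophy)
Euclidean distance
Computer science
Euclidean space
Pattern recognition (psychology)
Euclidean geometry
Vector space
Space (punctuation)
Mathematics
Artificial intelligence
Algorithm
Combinatorics
Pure mathematics
Image (mathematics)
Geometry
Philosophy
Operating system
Epistemology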
Identifier
DOI:10.1007/978-3-030-87334-9_2
Abstract
The Tanimoto similarity is widely used in chemo-informatics, biology, bio-informatics, text mining and information retrieval to determine neighborhoods of sufficiently similar objects, or the k most similar objects, represented by real-valued vectors. For metrics such as the Euclidean distance, the triangle inequality property is often used to efficiently identify vectors that may belong to the sought neighborhood of a given vector. However, neither the Tanimoto similarity nor the Tanimoto dissimilarity fulfills the triangle inequality property for real-valued vectors. Despite this, we show in this paper that the problem of looking for a neighborhood with respect to the Tanimoto similarity among real-valued vectors is equivalent to the problem of looking for a neighborhood among normalized forms of these vectors in the Euclidean space. Based on this result, we propose a method that uses the triangle inequality to losslessly identify promising candidates for members of Tanimoto similarity neighborhoods among real-valued vectors. The method requires pre-calculation and storage of the distances from the normalized forms of the real-valued vectors to a so-called reference vector. The normalized forms themselves do not need to be stored once these distances have been pre-calculated. We also propose two variants of a new combined method which, apart from the triangle inequality, also uses bounds on vector lengths to determine Tanimoto similarity neighborhoods. The usefulness of the new and related methods is illustrated with examples.
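
The two facts the abstract builds on can be illustrated with a minimal Python sketch. It assumes the standard definition of the Tanimoto similarity for real-valued vectors, T(u, v) = u·v / (|u|² + |v|² − u·v), and shows (a) a small case in which the Tanimoto dissimilarity 1 − T breaks the triangle inequality and (b) the generic way the triangle inequality prunes candidates in Euclidean space using pre-calculated distances to a reference vector. The function names, the sample points and the choice of reference vector are hypothetical, and the normalized form of vectors defined in the paper is not reproduced here; the pruning step is applied to plain Euclidean vectors only.

```python
import math


def tanimoto(u, v):
    """Tanimoto similarity of two real-valued vectors (assumed not both zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sum(a * a for a in u) + sum(b * b for b in v) - dot)


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# (a) The Tanimoto dissimilarity 1 - T can violate the triangle inequality.
u, v, w = (1.0,), (2.0,), (4.0,)
d_uv = 1 - tanimoto(u, v)  # 1 - 2/3
d_vw = 1 - tanimoto(v, w)  # 1 - 2/3
d_uw = 1 - tanimoto(u, w)  # 1 - 4/13
violates = d_uw > d_uv + d_vw


# (b) Triangle-inequality pruning in Euclidean space.
# Since |d(p, ref) - d(q, ref)| <= d(p, q), any point p with
# |d(p, ref) - d(q, ref)| > eps cannot lie within eps of the query q.
def candidates(dist_to_ref, query_dist_to_ref, eps):
    """Indices of points that the triangle inequality cannot rule out."""
    return [i for i, d in enumerate(dist_to_ref)
            if abs(d - query_dist_to_ref) <= eps]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]  # hypothetical data
ref = (0.0, 0.0)                                           # hypothetical reference vector
dist_to_ref = [euclidean(p, ref) for p in points]          # pre-calculated once

query, eps = (1.0, 0.5), 1.0
cand = candidates(dist_to_ref, euclidean(query, ref), eps)

# Only the surviving candidates need an exact distance computation.
neighbors = [i for i in cand if euclidean(points[i], query) <= eps]
brute_force = [i for i, p in enumerate(points) if euclidean(p, query) <= eps]
```

The pruning is lossless in the sense that `neighbors` always equals `brute_force`: the filter can only discard points that are provably farther than `eps` from the query, so it trades a cheap scalar comparison for skipped distance computations.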