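Cluster analysis
Similarity (geometry)
Computer science
Cluster (spacecraft)
Data mining
Centroid
Rand index
Artificial intelligence
Pattern recognition (psychology)
Daylight
Physics
Image (mathematics)
Optics
Programming language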
Source
Journal: Journal of Chemical Information and Computer Sciences
[American Chemical Society]
Date: 1999-06-29
Volume (issue): 39 (4): 747-750
Citations: 535
Abstract
One of the most commonly used clustering algorithms within the worldwide pharmaceutical industry is Jarvis-Patrick's (J-P) (Jarvis, R. A. IEEE Trans. Comput. 1973, C-22, 1025-1034). The implementation of J-P under Daylight software, using Daylight's fingerprints and the Tanimoto similarity index, can deal with sets of 100 k molecules in a matter of a few hours. However, the J-P clustering algorithm has several associated problems which make it difficult to cluster large data sets in a consistent and timely manner. The clusters produced are greatly dependent on the choice of the two parameters needed to run J-P clustering, such that this method tends to produce clusters which are either very large and heterogeneous or homogeneous but too small. In any case, J-P always requires time-consuming manual tuning. This paper describes an algorithm which will identify dense clusters where similarity within each cluster reflects the Tanimoto value used for the clustering, and, more importantly, where the cluster centroid will be at least similar, at the given Tanimoto value, to every other molecule within the cluster in a consistent and automated manner. The similarity term used throughout this paper reflects the overall similarity between two given molecules, as defined by Daylight's fingerprints and the Tanimoto similarity index.
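The property stated in the abstract, that every cluster member is at least as similar to its centroid as the chosen Tanimoto value, can be illustrated with a short Python sketch. This is not the paper's implementation: the abstract does not spell out the steps, so the procedure here (rank molecules by neighbor count, then let each unassigned molecule claim its unassigned neighbors) is an assumption chosen because it yields that property. Daylight fingerprints are stood in for by plain sets of on-bit indices, and the names `tanimoto` and `centroid_clusters` are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto index of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def centroid_clusters(fps: List[Set[int]], threshold: float) -> Dict[int, List[int]]:
    """Group fingerprints so each member is >= threshold similar to its centroid.

    Returns a mapping {centroid index: [member indices, centroid first]}.
    Assumed procedure for illustration, not taken from the paper.
    """
    n = len(fps)

    # Neighbor lists: all pairs whose similarity meets the threshold.
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= threshold:
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Visit the molecules with the most neighbors first (ties by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    assigned: Set[int] = set()
    clusters: Dict[int, List[int]] = {}
    for c in order:
        if c in assigned:
            continue
        # The centroid claims every neighbor not yet in another cluster.
        members = [c] + [m for m in neighbors[c] if m not in assigned]
        assigned.update(members)
        clusters[c] = members
    return clusters


if __name__ == "__main__":
    # Toy fingerprints, invented for this example.
    toy = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}, {10, 11, 12}, {10, 11, 13}]
    print(centroid_clusters(toy, threshold=0.5))
```

Only the similarity threshold has to be chosen in this sketch, whereas the abstract notes that J-P needs two parameters. The pairwise loop is quadratic in the number of molecules, so it is suitable for illustration only.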