Similarity (geometry)
Structural similarity
Computer science
Artificial intelligence
Information retrieval
Image (mathematics)
Authors
Rebekah Duke, Chih-Hsuan Yang, Baskar Ganapathysubramanian, Chad Risko
Identifier
DOI:10.1021/acs.jcim.5c00175
Abstract
The rapid adoption of big data, machine learning (ML), and generative artificial intelligence (AI) in chemical discovery has heightened the importance of quantifying molecular similarity. Molecular similarity, commonly assessed as the distance between molecular fingerprints, is integral to applications such as database curation, diversity analysis, and property prediction. AI tools frequently rely on these similarity measures to cluster molecules under the assumption that structurally similar molecules exhibit similar properties. However, this assumption is not universally valid, particularly for continuous properties like electronic structure properties. Despite the prevalence of fingerprint-based similarity measures, their evaluation has largely depended on biological activity data sets and qualitative metrics, limiting their relevance for nonbiological domains. To address this gap, we propose a framework to evaluate the correlation between molecular similarity measures and molecular properties. Our approach builds on the concept of neighborhood behavior and incorporates kernel density estimation (KDE) analysis to quantify how well similarity measures capture property relationships. Using a data set of over 350 million molecule pairs with electronic structure, redox, and optical properties, we systematically evaluate the correlation between several molecular fingerprint generators, distance functions, and these properties. Both the curated data set and the evaluation framework are publicly available.
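As a minimal, non-authoritative sketch of the ideas the abstract names (fingerprint distance, molecule pairs, and KDE over the pair distribution), the Python below makes several simplifying choices. It represents each fingerprint as a set of on-bit indices and uses the Tanimoto (Jaccard) distance as one example distance function. It then builds the (structural distance, absolute property difference) table for every pair and evaluates a two-dimensional Gaussian KDE over that table. The function names, the choice of Tanimoto distance, the product-Gaussian kernel with fixed bandwidths, and the toy data are all assumptions for illustration. This is not the authors' published framework or scoring metric. Under neighborhood behavior, density should concentrate at small property differences when structural distance is small.

```python
import math
from itertools import combinations


def tanimoto_distance(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto distance 1 - |A & B| / |A | B| for fingerprints given as
    sets of on-bit indices (one example distance function; an assumption)."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def pair_table(fps: list[set[int]], props: list[float]) -> list[tuple[float, float]]:
    """Hypothetical helper: (structural distance, |property difference|)
    for every unordered molecule pair."""
    return [
        (tanimoto_distance(fps[i], fps[j]), abs(props[i] - props[j]))
        for i, j in combinations(range(len(fps)), 2)
    ]


def gaussian_kde_2d(points: list[tuple[float, float]], x: float, y: float,
                    bw_x: float, bw_y: float) -> float:
    """Hypothetical helper: product-Gaussian KDE density at (x, y)
    using fixed bandwidths bw_x and bw_y."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bw_x * bw_y)
    total = 0.0
    for px, py in points:
        zx = (x - px) / bw_x
        zy = (y - py) / bw_y
        total += math.exp(-0.5 * (zx * zx + zy * zy))
    return norm * total


if __name__ == "__main__":
    # Toy fingerprints and made-up property values (hypothetical data).
    fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
    props = [2.10, 2.15, 3.40]
    pairs = pair_table(fps, props)
    for d, dp in pairs:
        print(f"distance={d:.2f}  property_diff={dp:.2f}")
    # Compare density at a small vs. a large property difference for close pairs.
    print(gaussian_kde_2d(pairs, 0.4, 0.05, 0.1, 0.1))
    print(gaussian_kde_2d(pairs, 0.4, 1.0, 0.1, 0.1))
```

The pair table grows quadratically with the number of molecules (n(n-1)/2 pairs), which is why pair data sets can reach hundreds of millions of entries; the pure-Python KDE here is only practical for small toy inputs.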