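Benchmark (surveying)
Chemical space
Set (abstract data type)
Diversity (politics)
Molecule
Combinatorial chemistry
Computer science
Chemistry
Drug discovery
Organic chemistry
Geography
Sociology
Biochemistry
Cartography
Anthropology
Programming language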
Authors
Alexander Neumann, Raphael Klein
Identifier
DOI:10.26434/chemrxiv-2025-vzjw3
Abstract
Sources of commercially available compounds have grown continuously for several years, culminating in combinatorial Chemical Spaces of billions to trillions of molecules. Assessing how well a compound collection supplies relevant chemistry requires a benchmark set of pharmaceutically relevant structures that allows an unbiased comparison. To build one, the ChEMBL database was mined for molecules with reported biological activity, and systematic filtering and processing produced three benchmark sets, each roughly an order of magnitude smaller than the previous one: Set L ('large-sized', 379k), Set M ('medium-sized', 25k), and Set S ('small-sized', 3k). Set S, tailored for broad coverage of the physicochemical and topological landscape, was then used to analyze the chemical diversity of commercial combinatorial Chemical Spaces and enumerated compound libraries. Three search methods were applied: FTrees (pharmacophore features), SpaceLight (molecular fingerprints), and SpaceMACS (maximum common substructure). Across all three, eXplore and REAL Space consistently performed best. In general, each Chemical Space returned more compounds with higher similarity to the respective query molecule than the enumerated libraries did, and each also contributed unique scaffolds under every search method.
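The comparison above rests on counting how many compounds a collection returns that are similar to each benchmark query. The following is a minimal sketch of that idea, assuming binary fingerprints stored as sets of on-bit indices, Tanimoto similarity, and a hypothetical similarity cutoff. It is not SpaceLight's algorithm, and the function names, toy collections, and threshold are illustrative assumptions, not taken from the study.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical representation: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    if not a and not b:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def count_similar(query: Fingerprint,
                  collection: Dict[str, Fingerprint],
                  threshold: float) -> int:
    """Number of collection members with similarity >= threshold to the query."""
    return sum(1 for fp in collection.values() if tanimoto(query, fp) >= threshold)


def compare_collections(queries: Iterable[Fingerprint],
                        collections: Dict[str, Dict[str, Fingerprint]],
                        threshold: float) -> Dict[str, int]:
    """Total hits at or above the threshold, summed over all queries, per collection."""
    queries = list(queries)  # allow reuse across collections
    return {
        name: sum(count_similar(q, coll, threshold) for q in queries)
        for name, coll in collections.items()
    }


if __name__ == "__main__":
    query = frozenset({1, 2, 3, 4})
    space = {"a": frozenset({1, 2, 3, 4}), "b": frozenset({1, 2, 3, 5}), "c": frozenset({7, 8})}
    library = {"x": frozenset({1, 2, 9}), "y": frozenset({10})}
    print(compare_collections([query], {"space": space, "library": library}, 0.5))
```

With these toy fingerprints the "space" collection yields two hits (similarities 1.0 and 0.6) and the "library" none (its best match scores 0.4), so the script prints `{'space': 2, 'library': 0}`. Real tools search combinatorial spaces far larger than anything that can be enumerated this way, so the sketch only illustrates the scoring, not the search.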