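Space (punctuation)
Quality (philosophy)
Set (abstract data type)
Data set
Computer science
Substrate (aquarium)
Artificial intelligence
Pattern recognition (psychology)
Physics
Biology
Ecology
Quantum mechanics
Operating system
Programming language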
Authors
Stephanie Felten, Cyndi Qixin He, Marion H. Emmert
Identifier
DOI:10.1021/acs.joc.4c02574
Abstract
We report a general C-H aminoalkylation of 5-membered heterocycles through a combined machine learning/experimental workflow. Our work describes previously unknown C-H functionalization reactivity and creates a predictive machine learning (ML) model through iterative refinement over 6 rounds of active learning. The initial model established with 1,3-azoles predicts the reactivities of N-aryl indazoles, 1,2,4-triazolopyrazines, 1,2,3-thiadiazoles, and 1,3,4-oxadiazoles, while other substrate classes (e.g., pyrazoles and 1,2,4-triazoles) are not predicted well. The final model includes the reactivities of additional heterocyclic scaffolds in the training data, which results in high predictive accuracy across all of the tested cores. The high prediction performance is shown both within the training set via cross-validation (CV R² = 0.81) and when predicting unseen substrates of diverse molecular weight and structure (Test R² = 0.95). The concept of feature engineering is discussed, and we benchmark mechanistically related DFT-based features that are more time-intensive and laborious in comparison with molecular descriptors and fingerprints. Importantly, this work establishes novel reactivity for heterocycles for which C-H functionalization methods are underdeveloped. Since such heterocycles are key motifs in drug discovery and development, we expect this work to be of significant use to the synthetic and synthesis-oriented ML communities.