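Noise (video)
Data set
Predictive power
Reactivity (psychology)
Set (abstract data type)
Scarcity
Synthetic data
Machine learning
Computer science
Artificial intelligence
Chemistry
Data mining
Image (mathematics)
Medicine
Alternative medicine
Pathology
Philosophy
Epistemology
Microeconomics
Economics
Programming language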
Authors
Julian A. Hueffel, Quentin P. Bindschaedler, Francesco Sala, Franziska Schoenebeck
Abstract
Data scarcity is a key obstacle to capitalizing on the predictive power of AI in problems related to bond-making and -breaking at the molecular level. Data augmentation, the generation of artificial data points from real ones to expand existing data sets, is a widely used strategy in fields such as image recognition, speech recognition, and health data evaluation. However, it is currently unknown whether this strategy applies to reactivity problems at the molecular level, where predictive models are exquisitely sensitive to steric, electronic, and structural nuances. Here, we systematically evaluated the power of data augmentation for a diverse set of reactivity questions, ranging from the prediction of activation barriers to the stereoselectivities of catalytic transformations. We demonstrate that introducing Gaussian noise to existing data points, which takes under a second for a full data set, can dramatically enhance predictive performance. It can enable model training in low-data regimes where no meaningful model could otherwise be built, and it achieves accuracy comparable to that of models built on full data sets while needing only a fraction of the data. The approach substantially lowers the number of necessary experiments (by 20-50%), conserving time, energy, and resources, while advancing the integration of machine learning into molecular reactivity challenges.
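The abstract does not say exactly how the Gaussian noise is applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes (1) noise is added to the numeric descriptor features only, while the target values (for example, an activation barrier or a selectivity) are copied unchanged, and (2) the noise standard deviation for each feature is a fixed fraction of that feature's spread across the data set. The function name `augment_with_gaussian_noise` and the parameters `n_copies`, `noise_scale`, and `seed` are hypothetical choices made for illustration.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def augment_with_gaussian_noise(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_copies: int = 5,
    noise_scale: float = 0.05,
    seed: int = 0,
) -> Tuple[List[List[float]], List[float]]:
    """Return the original points plus n_copies noisy replicas of each one.

    Hypothetical sketch. Assumptions (not from the source): noise is added to
    the features only, with per-feature standard deviation equal to
    noise_scale * (population std of that feature); targets are copied as-is.
    """
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of points")
    if n_copies < 0:
        raise ValueError("n_copies must be non-negative")

    rng = random.Random(seed)  # seeded so the augmented set is reproducible
    n_features = len(X[0])

    # Per-feature noise level; a constant column has std 0 and stays noise-free.
    sigmas = [
        statistics.pstdev([row[j] for row in X]) * noise_scale
        for j in range(n_features)
    ]

    # Start from unmodified copies of the real data points.
    X_aug = [list(row) for row in X]
    y_aug = list(y)

    # Append n_copies perturbed replicas, each keeping its original target.
    for _ in range(n_copies):
        for row, target in zip(X, y):
            X_aug.append([x + rng.gauss(0.0, s) for x, s in zip(row, sigmas)])
            y_aug.append(target)

    return X_aug, y_aug


# Usage with a toy two-descriptor data set (values are made up).
X = [[1.0, 0.2], [2.0, 0.4], [3.0, 0.1]]
y = [10.5, 12.1, 9.8]
X_aug, y_aug = augment_with_gaussian_noise(X, y, n_copies=4)
print(len(X_aug), len(y_aug))  # → 15 15
```

In a typical workflow this kind of augmentation would be applied only to the training split, so that test-set performance is still measured on unperturbed, real data points. Generating the replicas is a single pass over the data, which is consistent with the abstract's remark that augmentation takes under a second for a full data set.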