Randomized SMILES strings improve the quality of molecular generative models

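Computer science; Quality (philosophy); Generative grammar; Randomized controlled trial; Information retrieval; Data mining; Data science; Artificial intelligence; Medicine; Surgery; Philosophy; Epistemology
Authors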
Josep Arús‐Pous,Simon Johansson,Oleksii Prykhodko,Esben Jannik Bjerrum,Christian Tyrchan,Jean‐Louis Reymond,Hongming Chen,Ola Engkvist
Source
Journal: Journal of Cheminformatics [BioMed Central]
Volume (issue): 11 (1) Citations: 297
Identifier
DOI:10.1186/s13321-019-0393-0
Abstract

Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks, new metrics were developed that define how well a model has generalized the training set. The generated chemical space is evaluated with respect to its uniformity, closedness and completeness. Results show that models that use LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, are able to generalize to larger chemical spaces than the other approaches and represent the target chemical space more accurately. Specifically, a model trained with randomized SMILES was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES. Additionally, models were trained on molecules obtained from ChEMBL, and they illustrate again that training with randomized SMILES leads to models with a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least double the number of unique molecules, with the same distribution of properties, compared to one trained with canonical SMILES.
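The abstract names uniformity, closedness and completeness as the properties used to judge a generated chemical space, but does not define them here. As a rough illustration only, the sketch below computes simple set-based proxies for the three ideas from a sample of generated molecules and a known target set. The function name `coverage_metrics`, the entropy-based uniformity proxy, and the toy identifiers are all assumptions made for this example; they are not the metrics defined in the paper.

```python
import math
from collections import Counter


def coverage_metrics(sampled, target):
    """Illustrative proxies (assumed, not the paper's definitions).

    sampled: list of canonical molecule identifiers, repeats allowed
    target:  set of identifiers that make up the target chemical space
    """
    counts = Counter(sampled)
    unique = set(counts)
    inside = unique & target

    # Completeness proxy: share of the target space that was sampled at all.
    completeness = len(inside) / len(target) if target else 0.0

    # Closedness proxy: share of unique sampled molecules that fall
    # inside the target space.
    closedness = len(inside) / len(unique) if unique else 0.0

    # Uniformity proxy: normalized Shannon entropy of the sampling
    # frequencies over the in-target molecules (1.0 means perfectly even).
    total = sum(counts[m] for m in inside)
    if len(inside) > 1:
        entropy = -sum(
            (counts[m] / total) * math.log(counts[m] / total) for m in inside
        )
        uniformity = entropy / math.log(len(inside))
    else:
        uniformity = 0.0

    return {
        "completeness": completeness,
        "closedness": closedness,
        "uniformity": uniformity,
    }


if __name__ == "__main__":
    # Toy identifiers standing in for canonical SMILES strings.
    target_space = {"A", "B", "C", "D"}
    sample = ["A", "A", "B", "B", "X"]
    print(coverage_metrics(sample, target_space))
```

In practice the identifiers would be canonical SMILES, so that different randomized strings of the same molecule are counted as one structure before the comparison.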
