NRC-VABS: Normalized Reparameterized Conditional Variational Autoencoder with applied beam search in latent space for drug molecule design

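Autoencoder; Computer science; Artificial intelligence; Representation (politics); Benchmark (surveying); Beam search; Posterior probability; Latent variable; Machine learning; Algorithm; Pattern recognition (psychology); Mathematical optimization; Mathematics; Deep learning; Search algorithm; Bayesian probability; Geodesy; Politics; Political science; Law; Geography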

Authors

Arun Singh Bhadwal,Kamal Kumar,Neeraj Kumar

Source

Journal: Expert Systems With Applications [Elsevier]
Date: 2024-04-01 Volume/Issue: 240: 122396-122396

Identifier

DOI: 10.1016/j.eswa.2023.122396

Abstract

Designing an optimal drug molecule structure with desired properties is a challenging problem. Most existing solutions and representations reported in the literature are complex and time-consuming, owing to large datasets, long training periods, and long learning dependencies. Deep generative models can support chemical modelling by generating molecules without explicit, complex molecular rules. However, deep learning models (LSTM-based VAEs) suffer from posterior collapse, large dataset vocabularies, and sub-optimal mechanisms for searching the latent space. Motivated by this, we propose the Normalized Reparameterized Conditional Variational Autoencoder with applied beam search in latent space (NRC-VABS). With a normalized vocabulary, a conditionally augmented dataset, and a revised (reparameterized) loss function, the resulting model addresses posterior collapse and constructs a continuous and consistent latent space that beam search exploits during generation. The properties of desirable molecules are specified through a condition vector, which is used both during training and during generation of drug molecules. Beam search is applied to an improved, normalized SMILES representation: samples are created with beam search and filtered according to their condition to identify the optimal molecules with the desired properties. Normalization also improves the information content and reduces the complexity of the latent space. A tunable parameter (D) is used to address the diversity of the generated molecules. NRC-VABS is evaluated with several metrics, including validity, uniqueness, novelty, accuracy, and Fréchet ChemNet Distance, on benchmark datasets such as GDB13, MOSES, and a subset of 250k ZINC molecules, and its performance is compared with state-of-the-art peer techniques. NRC-VABS generates molecules with validity ranging from 92% to 84% and accuracy from 89% to 97% at varied levels of diversity (D=1, D=2, and D=3). An application of the proposal is demonstrated in terms of interpolation and of controlling the other two of three properties while varying one property at a time. Generating only target molecules with the desired properties while maintaining diversity increases the share of novel molecules and greatly reduces time complexity, since only novel and desired molecules can be generated.
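
The abstract names validity, uniqueness, and novelty without defining them. As a rough illustration only, the sketch below computes these three ratios using definitions commonly adopted in molecule-generation benchmarks; it is not the paper's evaluation code. The function names are hypothetical, and `balanced_parentheses` is a toy stand-in for a real SMILES validity check, which would normally come from a cheminformatics toolkit.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty for generated SMILES strings.

    Assumed definitions (not taken from the paper):
      validity   = valid strings / all generated strings
      uniqueness = distinct valid strings / valid strings
      novelty    = distinct valid strings absent from training / distinct valid
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def balanced_parentheses(smiles: str) -> bool:
    """Toy validity check (hypothetical): non-empty with balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and bool(smiles)


metrics = generation_metrics(
    generated=["CCO", "CCO", "CC(=O)O", "CC(C", "c1ccccc1"],
    training_set=["CCO"],
    is_valid=balanced_parentheses,
)
print(metrics)
```

In this toy run, four of the five strings pass the stand-in check, three of those four are distinct, and two of the three distinct strings are absent from the training set.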
