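Autoencoder
Computer science
Artificial intelligence
Representation (politics)
Benchmark (surveying)
Beam search
Posterior probability
Latent variable
Machine learning
Algorithm
Pattern recognition (psychology)
Mathematical optimization
Mathematics
Deep learning
Search algorithm
Bayesian probability
Geodesy
Politics
Political science
Law
Geography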
Authors
Arun Singh Bhadwal,Kamal Kumar,Neeraj Kumar
Identifier
DOI:10.1016/j.eswa.2023.122396
Abstract
Designing an optimal drug molecule structure with desired properties is a challenging problem. Most existing solutions and representations reported in the literature are complex and time consuming, owing to large datasets, long training periods, and long learning dependencies. Deep generative models enable chemical modelling that generates molecules without explicit, complex molecular rules. However, deep learning models (LSTM-based VAEs) suffer from posterior collapse, large dataset vocabularies, and sub-optimal latent-space search mechanisms. Motivated by this, we propose a recently researched idea: the Normalised Reparameterized conditional Variational Autoencoder with beam search applied in latent space (NRC-VABS). The resulting model, with a normalized vocabulary, a conditionally augmented dataset, and a revised/reparameterized loss function, addresses posterior collapse and constructs a continuous and consistent latent space that beam search exploits during the generation stages. The conditions/properties of desirable molecules are specified through a condition vector, which is used during training as well as during generation of drug molecules. Beam search is applied to an improved, normalized SMILES representation. The idea is to create samples with beam search, filter them according to their condition, and identify the optimal molecules with the desired properties. Normalization also improves the information content and reduces complexity in the latent space. To address the diversity of the generated molecules, a tunable parameter (D) is also used. Performance evaluation metrics such as validity, uniqueness, novelty, accuracy, and Fréchet ChemNet Distance are used to evaluate NRC-VABS on benchmark datasets such as GDB13, MOSES, and a subset of 250k ZINC molecules. The performance of NRC-VABS is compared with state-of-the-art peer techniques.
NRC-VABS generates molecules with validity ranging from 92% to 84% and accuracy from 89% to 97% at varied levels of diversity (D=1, D=2 and D=3). An application of the proposal is presented in terms of interpolation and of controlling the other properties (2 of 3) by varying one property (1 of 3) at a time. Generating only target molecules with desired properties while maintaining diversity improves the novelty of the generated molecules and greatly reduces time complexity, as only novel and desired molecules are generated.
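As a rough illustration of three of the metrics named in the abstract, the sketch below computes validity, uniqueness, and novelty using their commonly used definitions (valid fraction of generated strings, unique fraction of valid strings, and fraction of unique valid strings absent from the training set). The abstract does not state the exact definitions the authors use, so these are assumptions. The names `generation_metrics` and `toy_is_valid` are hypothetical, and `toy_is_valid` is only a placeholder that checks parenthesis balance; a real evaluation would use a chemistry toolkit to parse and canonicalize SMILES before comparing strings.

```python
from typing import Callable, Iterable


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> dict[str, float]:
    """Return validity, uniqueness, and novelty as fractions in [0, 1].

    Assumed definitions (not taken from the paper):
      validity   = valid / generated
      uniqueness = distinct valid / valid
      novelty    = distinct valid not in training set / distinct valid
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles: str) -> bool:
    """Placeholder validity check: non-empty with balanced parentheses.

    This is NOT a chemistry parser; it only stands in for one here.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


# Four generated strings: one duplicate, one with an unclosed branch.
samples = ["CCO", "CCO", "CC(C)O", "CC(C"]
train = ["CCO"]
metrics = generation_metrics(samples, train, toy_is_valid)
print(metrics)
```

Passing the validity check in as a function keeps the metric arithmetic separate from the chemistry, so the placeholder can be swapped for a real SMILES parser without changing the rest of the sketch.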