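Computer science
Property (philosophy)
Notation
Syntax
Representation (politics)
Set (abstract data type)
Mechanism (biology)
Artificial intelligence
Table (database)
Natural language processing
Theoretical computer science
Process (computing)
Data mining
Programming language
Mathematics
Arithmetic
Philosophy
Epistemology
Politics
Political science
Law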
Authors
Shuangjia Zheng, Xin Yan, Yuedong Yang, Jun Xu
Identifier
DOI:10.26434/chemrxiv.7295903.v2
Abstract
Recognizing substructures and their relations embedded in a molecular structure representation is a key process for structure-activity or structure-property relationship (SAR/SPR) studies. A molecular structure can be represented explicitly either as a connection table (CT) or as a linear notation such as SMILES, a language describing the connectivity of atoms in the molecular structure. Conventional SAR/SPR approaches rely on partitioning the CT into a set of predefined substructures that serve as structural descriptors. In this work, we propose a new method for identifying SAR/SPR through syntax analysis of a linear notation (for example, SMILES) with a self-attention mechanism, an interpretable deep learning architecture. The method has been evaluated by predicting chemical properties, toxicity, and bioactivity from experimental data sets. Our results demonstrate that the method yields superior performance compared with state-of-the-art methods. Moreover, the method produces chemically interpretable results, which a chemist can use to design and synthesize compounds with improved activity or properties.
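
To make the idea of attending over SMILES syntax concrete, the following is a minimal, illustrative sketch and not the authors' model. It assumes a simplified SMILES token pattern (`SMILES_TOKEN`), random untrained embeddings, and scaled dot-product self-attention with no learned query/key/value projections; the function names `tokenize` and `self_attention_weights` are hypothetical. Because nothing is trained, the resulting weights carry no chemical meaning; the sketch only shows the shape of the computation, where each token receives a probability distribution over all tokens in the string.

```python
import math
import random
import re

# Simplified SMILES token pattern (an assumption for illustration): bracket
# atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[()=#\-+/\\.:@*]"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; reject characters the pattern misses."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(tokens, dim=8, seed=0):
    """Return an n x n matrix of attention weights over the tokens.

    Embeddings are random and untrained (assumption), and queries and keys
    are the embeddings themselves, so this shows only the mechanics.
    """
    rng = random.Random(seed)
    embedding = {
        tok: [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for tok in sorted(set(tokens))
    }
    vectors = [embedding[tok] for tok in tokens]
    scale = math.sqrt(dim)
    weights = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in vectors
        ]
        weights.append(softmax(scores))
    return weights


if __name__ == "__main__":
    aspirin = tokenize("CC(=O)Oc1ccccc1C(=O)O")
    attention = self_attention_weights(aspirin)
    # Show which token the first atom attends to most strongly.
    first_row = attention[0]
    top = max(range(len(first_row)), key=first_row.__getitem__)
    print(aspirin[0], "->", aspirin[top], round(first_row[top], 3))
```

In a trained model of the kind the abstract describes, rows of such a weight matrix are what would be inspected to see which SMILES tokens, and hence which substructures, the prediction relies on.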