Keywords
Transformer
Fragment (logic)
Generative grammar
Computer science
Lexical analysis
Tree (set theory)
Natural language processing
Artificial intelligence
Programming language
Mathematics
Engineering
Electrical engineering
Mathematical analysis
Voltage
Authors
Tensei Inukai, Aoi Yamato, Manato Akiyama, Yasubumi Sakakibara
Identifier
DOI:10.26434/chemrxiv-2024-77vhr-v3
Abstract
Molecular generation models, especially chemical language models (CLMs) that use SMILES, a string representation of compounds, face limitations in handling large and complex compounds while maintaining structural accuracy. To address these challenges, we propose FRATTVAE, a Transformer-based variational autoencoder that treats molecules as tree structures with fragments as nodes. By employing several innovative deep learning techniques, including ECFP (Extended-Connectivity Fingerprint) based token embeddings and the Transformer's self-attention mechanism, FRATTVAE efficiently handles large-scale compounds, improving both computational speed and generation accuracy. Evaluations across benchmark datasets, ranging from small molecules to natural compounds, demonstrate that FRATTVAE consistently outperforms existing models, achieving superior reconstruction accuracy and generation quality. Additionally, in molecular optimization tasks, FRATTVAE generated stable, high-quality molecules with desired properties while avoiding structural alerts. These results highlight FRATTVAE as a robust and versatile solution for molecular generation and optimization, well suited to a variety of applications in cheminformatics and drug discovery.
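The abstract's core idea is representing a molecule as a tree whose nodes are chemical fragments, which can then be serialized into a token sequence for a Transformer. The following is a minimal illustrative sketch of that data structure, not the actual FRATTVAE implementation; the fragment SMILES strings and the pre-order serialization scheme are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FragmentNode:
    # Hypothetical node: stores one fragment's SMILES string
    # plus its child fragments in the tree decomposition.
    smiles: str
    children: List["FragmentNode"] = field(default_factory=list)

def count_fragments(node: FragmentNode) -> int:
    """Total number of fragment nodes in the tree."""
    return 1 + sum(count_fragments(c) for c in node.children)

def flatten_preorder(node: FragmentNode) -> List[str]:
    """Serialize the tree to a flat token sequence (pre-order),
    the kind of sequence a Transformer encoder could consume."""
    tokens = [node.smiles]
    for child in node.children:
        tokens.extend(flatten_preorder(child))
    return tokens

# Toy three-fragment tree: a benzene ring with two substituent fragments
# (illustrative decomposition, not FRATTVAE's fragmentation algorithm).
root = FragmentNode("c1ccccc1", [FragmentNode("C(=O)O"),
                                 FragmentNode("OC(C)=O")])
print(count_fragments(root))   # 3
print(flatten_preorder(root))  # ['c1ccccc1', 'C(=O)O', 'OC(C)=O']
```

In the actual model, each fragment token would further be embedded via its ECFP bit vector rather than its raw SMILES string.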