Token-Mol 1.0: Tokenized drug design with large language model

Subjects: Drugs, Computer Science, Medicine, Pharmacology
Authors
Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang‐Yu Hsieh, Tingjun Hou
Source
Journal: Cornell University - arXiv; Cited by: 2
Identifier
DOI: 10.48550/arxiv.2407.07930
Abstract

Significant interest has recently arisen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery lack the ability to comprehend three-dimensional (3D) structures, which limits their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduce Token-Mol, a token-only 3D drug design model. The model encodes all molecular information, including 2D and 3D structures as well as molecular property data, into tokens. This transforms classification and regression tasks in drug discovery into probabilistic prediction problems and enables learning through a unified paradigm. Token-Mol is built on the transformer decoder architecture and trained using random causal masking. Additionally, we propose the Gaussian cross-entropy (GCE) loss function to address the challenges of regression tasks, significantly enhancing the capacity of LLMs to learn continuous numerical values. Through a combination of fine-tuning and reinforcement learning (RL), Token-Mol achieves performance comparable to or surpassing existing task-specific methods across various downstream tasks, including pocket-based molecular generation, conformation generation, and molecular property prediction. Compared to existing molecular pre-trained models, Token-Mol handles a wider range of the downstream tasks essential for drug design. Notably, our approach improves regression task accuracy by approximately 30% compared to similar token-only methods. Token-Mol overcomes the precision limitations of token-only models and has the potential to integrate seamlessly with general models such as ChatGPT, paving the way for a universal artificial intelligence drug design model that enables experts to carry out rapid, high-quality drug design.
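The abstract does not give the GCE formula. The sketch below shows one plausible reading under stated assumptions: a continuous value is emitted as a numeric token on a fixed grid, and the usual one-hot training target is replaced by a Gaussian-weighted distribution over neighboring numeric tokens, so predicting a token one bin away costs less than predicting one far away. The grid (`BIN_VALUES`), the width `sigma`, and every function name here are hypothetical illustrations, not Token-Mol's actual tokenizer or loss.

```python
import math
from typing import List

# Hypothetical numeric vocabulary: token i stands for the value BIN_VALUES[i].
# A 0.1-step grid over [0, 1] is an illustrative assumption, not Token-Mol's.
BIN_VALUES: List[float] = [round(0.1 * i, 1) for i in range(11)]


def value_to_token_index(value: float, bin_values: List[float]) -> int:
    """Map a continuous value to the index of the nearest numeric token."""
    return min(range(len(bin_values)), key=lambda i: abs(bin_values[i] - value))


def gaussian_soft_targets(value: float, bin_values: List[float], sigma: float) -> List[float]:
    """Gaussian-weighted target distribution over numeric tokens, centered on `value`."""
    weights = [math.exp(-((b - value) ** 2) / (2.0 * sigma ** 2)) for b in bin_values]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed: fall back to one-hot at the nearest bin
        weights = [0.0] * len(bin_values)
        weights[value_to_token_index(value, bin_values)] = 1.0
        total = 1.0
    return [w / total for w in weights]


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over the numeric-token logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def gaussian_cross_entropy(logits: List[float], value: float,
                           bin_values: List[float], sigma: float = 0.1) -> float:
    """Cross-entropy between the predicted token distribution and the Gaussian soft target."""
    targets = gaussian_soft_targets(value, bin_values, sigma)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


def peaked_logits(peak: int, n: int = len(BIN_VALUES)) -> List[float]:
    """Toy model output that strongly favors the token at index `peak`."""
    return [5.0 if i == peak else 0.0 for i in range(n)]


true_value = 0.5
near = gaussian_cross_entropy(peaked_logits(6), true_value, BIN_VALUES)  # model predicts 0.6
far = gaussian_cross_entropy(peaked_logits(9), true_value, BIN_VALUES)   # model predicts 0.9
print(near < far)  # True: the nearby wrong token is penalized less
```

With a very small `sigma` the soft target collapses to one-hot and the loss reduces to ordinary cross-entropy, which scores the 0.6 and 0.9 predictions identically because it treats numeric tokens as unordered labels; that is one way to read the regression challenge the abstract says GCE addresses.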