Token-Mol 1.0: Tokenized drug design with large language model

Subject areas: Drug, Computer science, Medicine, Pharmacology
Authors
Jike Wang,Rui Qin,Mingyang Wang,Meijing Fang,Yangyang Zhang,Yuchen Zhu,Qun Su,Qiaolin Gou,Chao Shen,Odin Zhang,Zhenxing Wu,Dejun Jiang,Xujun Zhang,Huifeng Zhao,Xiaozhe Wan,Zhourui Wu,Liwei Liu,Yu Kang,Chang‐Yu Hsieh,Tingjun Hou
Source
Journal: Cornell University - arXiv; Citations: 2
Identifier
DOI:10.48550/arxiv.2407.07930
Abstract

Significant interest has recently arisen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery cannot comprehend three-dimensional (3D) structures, which limits their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduced Token-Mol, a token-only 3D drug design model. The model encodes all molecular information, including 2D and 3D structures as well as molecular property data, into tokens. This transforms classification and regression tasks in drug discovery into probabilistic prediction problems, enabling learning through a unified paradigm. Token-Mol is built on the transformer decoder architecture and trained using random causal masking techniques. Additionally, we proposed the Gaussian cross-entropy (GCE) loss function to overcome the challenges of regression tasks, significantly enhancing the capacity of LLMs to learn continuous numerical values. Through a combination of fine-tuning and reinforcement learning (RL), Token-Mol achieves performance comparable to or surpassing that of existing task-specific methods across various downstream tasks, including pocket-based molecular generation, conformation generation, and molecular property prediction. Compared to existing molecular pre-trained models, Token-Mol handles a wider range of the downstream tasks essential for drug design. Notably, our approach improves regression task accuracy by approximately 30% compared to similar token-only methods. Token-Mol overcomes the precision limitations of token-only models and has the potential to integrate seamlessly with general models such as ChatGPT, paving the way for a universal artificial intelligence drug design model that facilitates rapid, high-quality drug design by experts.
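The abstract does not define the Gaussian cross-entropy (GCE) loss, so the following is only a minimal sketch of one plausible reading, not Token-Mol's implementation. It assumes a continuous property value is mapped to one of `n_bins` discrete value tokens, and that GCE replaces the one-hot target of ordinary cross-entropy with a Gaussian-smoothed target centered on the correct token. Under that assumption, predicting a numerically close token costs less than predicting a distant one. The binning scheme, `sigma`, and every function name below are hypothetical.

```python
import math
from typing import List

# HYPOTHETICAL tokenization: a continuous value in [lo, hi] becomes one of
# n_bins discrete "value tokens". Token-Mol's actual scheme may differ.
def value_to_token(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Return the index of the bin token containing `value` (clamped to [lo, hi])."""
    value = min(max(value, lo), hi)
    width = (hi - lo) / n_bins
    # value == hi would land one past the last bin, so clamp the index.
    return min(int((value - lo) / width), n_bins - 1)


def gaussian_targets(target_idx: int, n_bins: int, sigma: float) -> List[float]:
    """Soft target distribution: a Gaussian over bin indices centered on target_idx."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    weights = [math.exp(-((i - target_idx) ** 2) / (2 * sigma**2)) for i in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def gaussian_cross_entropy(logits: List[float], target_idx: int, sigma: float) -> float:
    """ASSUMED form of GCE: cross-entropy against a Gaussian soft target."""
    q = gaussian_targets(target_idx, len(logits), sigma)
    log_p = log_softmax(logits)
    return -sum(qi * lpi for qi, lpi in zip(q, log_p))


if __name__ == "__main__":
    n_bins = 10
    target = value_to_token(5.3, lo=0.0, hi=10.0, n_bins=n_bins)  # bin 5
    near = [0.0] * n_bins
    near[target - 1] = 5.0  # model is confident in the neighboring bin
    far = [0.0] * n_bins
    far[0] = 5.0  # model is confident in a distant bin
    print("near miss:", gaussian_cross_entropy(near, target, sigma=1.0))
    print("far miss: ", gaussian_cross_entropy(far, target, sigma=1.0))
```

With a one-hot target, both predictions in the demo would receive the same loss, because plain cross-entropy only looks at the probability assigned to the correct token. The Gaussian target is what lets the loss reflect numerical distance between value tokens.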
