MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations

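Computer science; Natural language processing; Language model; Artificial intelligence; Unified Medical Language System; Programming language
Authors
Xiaoqiang Tang, Andrew Tran, Jeffrey Too Chuan Tan, Mark Gerstein
Source
Journal: Bioinformatics [Oxford University Press]
Volume and issue: 40 (Supplement_1): i357-i368
Identifier
DOI: 10.1093/bioinformatics/btae260
Abstract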

Motivation: The current paradigm of deep learning models for the joint representation of molecules and text primarily relies on 1D or 2D molecular formats, neglecting significant 3D structural information that offers valuable physical insight. This narrow focus inhibits the models’ versatility and adaptability across a wide range of modalities. Conversely, the limited research focusing on explicit 3D representation tends to overlook textual data within the biomedical domain.

Results: We present a unified pre-trained language model, MolLM, that concurrently captures 2D and 3D molecular information alongside biomedical text. MolLM consists of a text Transformer encoder and a molecular Transformer encoder, designed to encode both 2D and 3D molecular structures. To support MolLM’s self-supervised pre-training, we constructed 160K molecule-text pairings. Employing contrastive learning as a supervisory signal, MolLM demonstrates robust molecular representation capabilities across four downstream tasks: cross-modal molecule and text matching, property prediction, captioning, and text-prompted molecular editing. Through ablation, we demonstrate that the inclusion of explicit 3D representations improves performance in these downstream tasks.

Availability and implementation: Our code, data, pre-trained model weights, and examples of using our model are all available at https://github.com/gersteinlab/MolLM. In particular, we provide Jupyter Notebooks offering step-by-step guidance on how to use MolLM to extract embeddings for both molecules and text.
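
As an illustration of the contrastive supervisory signal mentioned in the abstract, the sketch below shows a generic symmetric InfoNCE-style loss over a batch of paired molecule and text embeddings, plus the nearest-neighbour lookup that cross-modal matching typically relies on. It is not taken from the MolLM code base: the function names, the temperature value, and the use of plain NumPy arrays in place of encoder outputs are all assumptions made for this example, and the actual MolLM objective may differ.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(mol_emb, text_emb, temperature=0.1):
    """Generic symmetric InfoNCE loss (hypothetical helper, not MolLM's API).

    Row i of mol_emb and row i of text_emb are assumed to describe the same
    molecule; every other row in the batch acts as a negative.
    """
    mol = l2_normalize(mol_emb)
    txt = l2_normalize(text_emb)
    logits = mol @ txt.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    loss_mol_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_mol = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_mol_to_text + loss_text_to_mol)

def retrieve_text(mol_emb, text_emb):
    """For each molecule, return the index of the most similar text embedding."""
    sims = l2_normalize(mol_emb) @ l2_normalize(text_emb).T
    return sims.argmax(axis=1)

if __name__ == "__main__":
    # Random vectors stand in for encoder outputs (an assumption for the demo).
    rng = np.random.default_rng(0)
    mol = rng.normal(size=(8, 16))
    text = mol + 0.01 * rng.normal(size=(8, 16))   # well-aligned pairs
    print("aligned loss: ", symmetric_contrastive_loss(mol, text))
    print("shuffled loss:", symmetric_contrastive_loss(mol, np.roll(text, 1, axis=0)))
    print("retrieved text indices:", retrieve_text(mol, text))
```

The loss is low when each molecule embedding sits closest to its own text embedding and rises when the pairing is broken, which is the property a contrastive pre-training objective of this kind exploits.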
