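Computer science
Natural language processing
Artificial intelligence
Construct (Python library)
Principle of compositionality
Generalization
Vocabulary
Information retrieval
Linguistics
Programming language
Mathematics
Mathematical analysis
Philosophy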
Authors
Shengchao Liu,Weili Nie,Chengpeng Wang,Jiarui Lu,Zhuoran Qiao,Ling Liu,Jian Tang,Chaowei Xiao,Anima Anandkumar
Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Citations: 4
Identifier
DOI:10.48550/arxiv.2212.10789
Abstract
There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.
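The abstract names a contrastive learning strategy over structure-text pairs and a zero-shot retrieval task, without giving the objective. The sketch below shows one common form of such an objective, a symmetric InfoNCE-style loss over cosine similarities, plus nearest-neighbour retrieval. It is an illustration under stated assumptions, not the MoleculeSTM implementation: the function names, the temperature value, and the toy 2-dimensional embeddings are all hypothetical, and real embeddings would come from trained molecule and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(struct_embs, text_embs, temperature=0.1):
    """InfoNCE-style loss: pair i (structure i, text i) is the positive,
    every other pairing in the batch is a negative.
    The temperature of 0.1 is an assumed value, not taken from the paper."""
    logits = [[cosine(s, t) / temperature for t in text_embs]
              for s in struct_embs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-structure view
    return 0.5 * (_diagonal_cross_entropy(logits)
                  + _diagonal_cross_entropy(logits_t))

def retrieve(query_emb, candidate_embs):
    """Zero-shot retrieval: index of the candidate most similar to the query."""
    scores = [cosine(query_emb, c) for c in candidate_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy embeddings (hypothetical): two structures and their two descriptions.
structures = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[0.9, 0.1], [0.1, 0.9]]
texts_shuffled = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = symmetric_contrastive_loss(structures, texts_aligned)
loss_shuffled = symmetric_contrastive_loss(structures, texts_shuffled)
best_text_for_first_structure = retrieve(structures[0], texts_aligned)
```

With these toy inputs the loss is lower when each structure is paired with its own description than when the descriptions are swapped, which is the behaviour a contrastive objective is meant to reward. Retrieval then reduces to ranking candidates by similarity in the shared embedding space.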