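Computer science
Representation (politics)
Protein design
Construct (Python library)
Natural language processing
Artificial intelligence
Information retrieval
Protein structure
Programming language
Political science
Nuclear magnetic resonance
Politics
Physics
Law
Authors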
Shengchao Liu, Yutao Zhu, Jiarui Lu, Xu Zhao, Weili Nie, Anthony Gitter, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar
Source
Journal: Cornell University - arXiv
Date: 2023-02-09
Citations: 23
Identifier
DOI: 10.48550/arxiv.2302.04611
Abstract
Current AI-assisted protein design mainly utilizes protein sequence and structure information. Meanwhile, there exists a tremendous amount of human-curated knowledge in text form describing proteins' high-level functionalities. Yet whether incorporating such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representations of the two modalities; a facilitator that generates the protein representation from the text modality; and a decoder that creates protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441K text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) the best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.
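The abstract does not say how ProteinCLAP aligns the two modalities. As a rough illustration only, the sketch below assumes a symmetric contrastive (InfoNCE-style) objective over paired text and protein embeddings, a common choice for this kind of alignment. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper or its code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_logits(text_embs, protein_embs, temperature):
    """Cosine similarity of every text/protein pair, divided by the temperature."""
    texts = [l2_normalize(v) for v in text_embs]
    proteins = [l2_normalize(v) for v in protein_embs]
    return [
        [sum(a * b for a, b in zip(t, p)) / temperature for p in proteins]
        for t in texts
    ]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i (the true pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_embs, protein_embs, temperature=0.1):
    """Assumed alignment objective: average of text-to-protein and protein-to-text losses."""
    logits = similarity_logits(text_embs, protein_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Toy data: row i of each list is meant to describe the same protein.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_protein_embs = [[2.0, 0.0], [0.0, 3.0]]
shuffled_protein_embs = [[0.0, 3.0], [2.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(text_embs, aligned_protein_embs)
loss_shuffled = symmetric_contrastive_loss(text_embs, shuffled_protein_embs)
```

In this toy setup the loss is small when each text embedding points in the same direction as its paired protein embedding and large when the pairs are shuffled, which is the behavior an alignment step of this kind would be trained to produce. The facilitator and decoder steps are not sketched here because the abstract gives no detail about their form.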