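Sequence (biology)
Protein sequencing
Sequence space
Peptide sequence
Computational biology
Protein design
Protein structure prediction
Protein structure
Protein family
Computer science
Algorithm
Biology
Mathematics
Genetics
Biochemistry
Gene
Banach space
Pure mathematics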
Authors
Sidney Lisanza, Jake Merle Gershon, S. Tipps, Lucas Arnoldt, Samuel J. Hendel, Jeremiah Nelson Sims, Xinting Li, David Baker
Identifier
DOI:10.1101/2023.05.08.539766
Abstract
Protein denoising diffusion probabilistic models (DDPMs) show great promise for the de novo generation of protein backbones but are limited by their inability to guide the generation of proteins with sequence-specific attributes and functional properties. To overcome this limitation, we develop ProteinGenerator, a sequence-space diffusion model based on RoseTTAFold that simultaneously generates protein sequences and structures. Beginning from random amino acid sequences, our model generates sequence-structure pairs by iterative denoising, guided by any desired sequence and structural protein attributes. To explore the versatility of this approach, we designed proteins enriched for specific amino acids, with internal sequence repeats, with masked bioactive peptides, with state-dependent structures, and with key sequence features of specific protein families. ProteinGenerator readily generates sequence-structure pairs that satisfy the input conditioning criteria (sequence and/or structural), and experimental validation showed that the designs were monomeric by size-exclusion chromatography (SEC), had the desired secondary-structure content by circular dichroism (CD), and were thermostable up to 95 °C. By enabling the simultaneous optimization of sequence and structure, ProteinGenerator allows the design of functional proteins with specific sequence and structural attributes and paves the way for optimizing protein function by active learning on sequence-activity datasets.
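The iterative-denoising idea described in the abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not ProteinGenerator's method: it diffuses only per-position amino-acid logits (there is no structure track), replaces the RoseTTAFold-based network with a hypothetical `toy_denoiser` that merely shrinks logits, and assumes a linear noise schedule. The helpers `guide_composition` (a constant bias toward one residue, mimicking "enriched for specific amino acids") and `clamp_motif` (fixing known residues at each step, mimicking a masked peptide motif) are likewise hypothetical names chosen for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def toy_denoiser(x):
    """Hypothetical stand-in for a trained denoising network.
    It only shrinks logits toward zero; a real model would predict
    a clean sequence (and structure) from the noisy input."""
    return [[0.5 * v for v in row] for row in x]


def guide_composition(x0_hat, target_aa, weight):
    """Add a constant bias to one residue's logit at every position
    (a crude stand-in for 'enrich for amino acid X' conditioning)."""
    k = AMINO_ACIDS.index(target_aa)
    return [[v + (weight if j == k else 0.0) for j, v in enumerate(row)]
            for row in x0_hat]


def clamp_motif(x, motif):
    """Overwrite fixed positions with one-hot logits (inpainting-style)."""
    for pos, aa in motif.items():
        x[pos] = [1.0 if a == aa else 0.0 for a in AMINO_ACIDS]
    return x


def sample(length, steps=10, target_aa="W", weight=0.0, motif=None, seed=0):
    rng = random.Random(seed)
    motif = motif or {}
    # Start from random logits over the 20 amino acids at every position.
    x = [[rng.gauss(0.0, 1.0) for _ in AMINO_ACIDS] for _ in range(length)]
    for t in range(steps, 0, -1):
        s_next = (t - 1) / steps  # noise level after this step (linear schedule)
        x0_hat = guide_composition(toy_denoiser(x), target_aa, weight)
        # Interpolate toward the prediction and re-noise to the next level;
        # at the last step s_next == 0, so x becomes the guided prediction.
        x = [[(1 - s_next) * v + s_next * rng.gauss(0.0, 1.0) for v in row]
             for row in x0_hat]
        x = clamp_motif(x, motif)
    # Decode: pick the highest-scoring residue at each position.
    return "".join(AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)]
                   for row in x)


print(sample(12, weight=100.0, motif={0: "M"}, seed=2))
```

With a large composition weight every unconstrained position decodes to the target residue while the clamped motif survives; in a real model the guidance term would compete with the learned denoiser rather than dominate a trivial one.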