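Persistent homology
Computer science
Sequence space
Protein design
Invariant (physics)
Topological data analysis
Protein engineering
Sequence (biology)
Fitness landscape
Protein sequencing
Benchmark (surveying)
Artificial intelligence
Topology (electrical circuits)
Mathematics
Protein structure
Algorithm
Theoretical computer science
Discrete mathematics
Biology
Peptide sequence
Combinatorics
Genetics
Geography
Sociology
Banach space
Demography
Gene
Enzyme
Biochemistry
Mathematical physics
Geodesy
Population
Identifier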
DOI:10.1101/2022.12.18.520933
Abstract
While protein engineering, which iteratively optimizes protein fitness by screening an enormous mutational space, is constrained by experimental capacity, various machine learning models have substantially expedited it. Three-dimensional protein structures promise further advantages, but their intricate geometric complexity hinders their application in deep mutational screening. Persistent homology, an established algebraic topology tool for reducing protein structural complexity, fails to capture the homotopic shape evolution during the filtration of given data. This work introduces a Topology-offered protein Fitness (TopFit) framework to complement protein sequence and structure embeddings. Equipped with an ensemble regression strategy, TopFit integrates persistent spectral theory, a new topological Laplacian, and two auxiliary sequence embeddings to capture mutation-induced topological invariants, shape evolution, and sequence disparity in the protein fitness landscape. The performance of TopFit is assessed on 34 benchmark datasets with 128,634 variants, covering a wide variety of protein structure acquisition modalities and training set sizes.
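The contrast the abstract draws between persistent homology and a spectral (Laplacian) view can be illustrated with a minimal sketch. It assumes a toy point cloud (for example, alpha-carbon coordinates) and uses only the 0-dimensional, non-persistent graph Laplacian at each radius of a Vietoris-Rips distance filtration, which is a simplified special case and not TopFit's actual featurization. The helper names `rips_laplacian`, `symmetric_eigenvalues`, and `spectral_features` are hypothetical. The number of zero eigenvalues reproduces the Betti-0 count that persistent homology reports, while the nonzero spectrum keeps changing as edges are added even after Betti-0 has stopped changing.

```python
import math
from itertools import combinations

# Minimal sketch (hypothetical helper names, not TopFit's code): the 0-dimensional,
# non-persistent combinatorial Laplacian of a Vietoris-Rips 1-skeleton, evaluated
# at each radius of a distance filtration.


def rips_laplacian(points, radius):
    """Graph Laplacian L0 = D - A, with an edge wherever two points lie within `radius`."""
    n = len(points)
    lap = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= radius:
            lap[i][j] = lap[j][i] = -1.0
            lap[i][i] += 1.0
            lap[j][j] += 1.0
    return lap


def symmetric_eigenvalues(mat, tol=1e-20, max_sweeps=100):
    """Sorted eigenvalues of a real symmetric matrix via the cyclic Jacobi method."""
    a = [row[:] for row in mat]
    n = len(a)
    for _ in range(max_sweeps):
        # Stop once the squared off-diagonal norm is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectral_features(points, radii, zero_tol=1e-8):
    """Per radius: (betti_0, smallest nonzero eigenvalue, or 0.0 if there is none).

    The zero eigenvalues (harmonic part) count connected components, i.e. the
    Betti-0 number; the nonzero spectrum also changes when edges are added
    inside an already connected component.
    """
    features = []
    for r in radii:
        eigs = symmetric_eigenvalues(rips_laplacian(points, r))
        betti0 = sum(1 for lam in eigs if abs(lam) < zero_tol)
        nonzero = [lam for lam in eigs if lam >= zero_tol]
        features.append((betti0, min(nonzero) if nonzero else 0.0))
    return features


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    for r, (b0, lam) in zip([0.5, 1.0, 1.5], spectral_features(square, [0.5, 1.0, 1.5])):
        print(f"r={r}: betti_0={b0}, smallest nonzero eigenvalue={lam:.3f}")
```

For the unit square, the graph at r = 1.0 is a 4-cycle (Laplacian eigenvalues 0, 2, 2, 4) and at r = 1.5 it is the complete graph on four vertices (eigenvalues 0, 4, 4, 4). Betti-0 stays at 1 across both radii, so a Betti-0 barcode sees no change, while the smallest nonzero eigenvalue rises from 2 to 4. This is the kind of shape evolution that a purely homological summary misses.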