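Non-canonical
Matching (statistics)
Amino acid
Flow (mathematics)
Computational biology
Computer science
Chemistry
Mathematics
Biochemistry
Biology
Cell biology
Statistics
Geometry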
Authors
Jin Sub Lee, Philip M. Kim
Identifier
DOI:10.1101/2025.07.31.667780
Abstract
The canonical vocabulary of twenty amino acids limits the chemical space available to proteins and peptides. Expanding this vocabulary to include hundreds of non-canonical amino acids allows the engineering of proteins with novel function and activity, and is of great interest for the discovery of novel drugs such as macrocyclic peptides. Here we present NCFlow, a flow-based generative model capable of incorporating an arbitrary non-canonical amino acid into a given protein. To supplement sparse training data in the Protein Data Bank, NCFlow is pretrained on millions of small-molecule structures and a large set of protein-ligand complexes before fine-tuning on native non-canonicals found within proteins in the Protein Data Bank. We show that NCFlow outperforms AlphaFold3-based methods in the structure prediction of unseen non-canonical amino acids. We present a peptide design pipeline akin to in silico deep mutational scanning, and propose a novel scoring strategy that combines deep learning-based and molecular dynamics-based alchemical binding free energy calculations to identify improved peptide variants. We apply the method to four protein-peptide complex test cases, and observe that incorporating non-canonicals can significantly improve binding affinity by up to -7.0 kcal/mol. Thus, NCFlow can be easily integrated into existing protein design platforms to improve the properties of designed proteins beyond what is possible with standard amino acids.