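UniProt
Annotation
Computational biology
Repertoire
Complementarity (molecular biology)
Biology
Function (biology)
Protein database
Computer science
Bioinformatics
Genetics
Gene
Biochemistry
Physics
Acoustics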
Authors
Hannelore Longin,George Bouras,Susanna R. Grigson,Robert A. Edwards,Hanne Hendrix,Rob Lavigne,Vera van Noort
Identifier
DOI:10.1101/2025.07.17.665397
Abstract
Phages, the viruses of bacteria, harbor an incredibly diverse repertoire of proteins capable of manipulating their bacterial hosts, inspiring many medical and biotechnological applications. To date, however, only a limited subset of that repertoire can be exploited, because the functions of these proteins are difficult to elucidate. In this study, we investigated several structure-informed approaches to annotating hypothetical proteins from Pseudomonas-infecting phages. We curated a representative dataset of over 10,000 proteins derived from NCBI, predicted their structures with ColabFold, and assessed structural similarity via FoldSeek against the PDB, AlphaFold, and Phold databases. We evaluated multiple annotation strategies, including sequence-based (Pharokka) and structure-based (FoldSeek, Phold) methods. Our results show that up to 43% of truly unannotated proteins can be functionally annotated when structure-informed approaches are combined with UniProt-derived annotations. We highlight the complementarity of the different databases and the importance of annotation quality filtering. This work provides a valuable resource of predicted structures and annotations and offers insights into optimizing structure-based annotation pipelines for viral proteins, paving the way for deeper exploration of phage biology and its applications.