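Proteogenomics
Computational biology
Open reading frames (ORFs)
Biology
Genome
Proteome
Annotation
Ribosome profiling
Human Proteome Project
Computer science
Translation (biology)
Proteomics
Genomics
Genetics
Open reading frame
Gene
Peptide sequence
Messenger RNA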
Authors
Eduardo Vieira de Souza,Angie L. Bookout,Christopher A. Barnes,Brendan Miller,Pablo Machado,Luiz Augusto Basso,Cristiano Valim Bizarro,Alan Saghatelian
Identifier
DOI:10.1101/2023.09.27.559809
Abstract
There has been a dramatic increase in the identification of non-canonical translation and a significant expansion of the protein-coding genome and proteome. Among the strategies used to identify novel small ORFs (smORFs), ribosome profiling (Ribo-Seq) is the gold standard for the annotation of novel coding sequences because it reports directly on smORF translation. In Ribo-Seq, ribosome-protected footprints (RPFs) that map to multiple sites in the genome are computationally removed, since they cannot be unambiguously assigned to a specific genomic location, or to a specific transcript in the case of multiple isoforms. Furthermore, RPFs necessarily yield short reads (25-34 nucleotides), which increases the chance of ambiguous and multi-mapping alignments, so smORFs that reside in these regions cannot be identified by Ribo-Seq. Here, we show that adding proteogenomics to create a Ribosome Profiling and Proteogenomics Pipeline (RP3) bypasses this limitation and identifies a group of microprotein-encoding smORFs that are missed by current Ribo-Seq pipelines. Moreover, we show that the microproteins identified by RP3 differ in sequence composition from those identified by Ribo-Seq-only pipelines, which can affect proteomics identification. In aggregate, the development of RP3 maximizes the detection and confidence of protein-encoding smORFs and microproteins.
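As a rough illustration of the multi-mapping limitation described in the abstract, the toy sketch below separates short reads into uniquely mapping and multi-mapping sets. It is not the RP3 implementation: exact string matching stands in for a real read aligner, and the genome, the reads, and the function names `count_occurrences` and `split_reads` are all hypothetical.

```python
def count_occurrences(genome: str, read: str) -> int:
    """Count exact (possibly overlapping) matches of `read` in `genome`."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


def split_reads(genome: str, reads: list[str]) -> tuple[list[str], list[str]]:
    """Separate reads that match one locus from reads that match several.

    Assumption: exact matching is a stand-in for alignment; real pipelines
    use an aligner and its multi-mapping flags instead.
    """
    unique, multi = [], []
    for read in reads:
        hits = count_occurrences(genome, read)
        if hits == 1:
            unique.append(read)
        elif hits > 1:
            multi.append(read)
    return unique, multi


# Hypothetical toy genome: one 30-nt segment duplicated, one segment unique.
repeat_region = "ATGGCCAAGCTGACCGGTTCAGGATCCTAA"
unique_region = "ATGCATTTGGACCGTACGATCGGCTAGCTA"
genome = "TTTT" + repeat_region + "CCCCGGGGAAAA" + repeat_region + "GTGTGT" + unique_region

# Two 28-nt footprint-like reads, within the 25-34 nt range given in the abstract.
repeat_read = repeat_region[:28]
unique_read = unique_region[:28]

unique, multi = split_reads(genome, [repeat_read, unique_read])
# A Ribo-Seq-only pipeline would keep `unique` and discard `multi`, so a smORF
# lying entirely inside the duplicated segment would receive no read support.
```

In this toy case the read drawn from the duplicated segment is the one that gets discarded, which is the kind of region where, according to the abstract, proteomics evidence is used to recover smORFs.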