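Protein isoform
Ensembl
Alternative splicing
Database
Gene isoform
RNA-seq
Computational biology
RNA
RNA splicing
Proteomics
Proteogenomics
Gene
Biology
Genetics
Transcriptome
Genomics
Genome
Gene expression
Computer science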
Authors
Aidan P. Tay,Joshua J. Hamey,Gabriella E. Martyn,Laurence O.W. Wilson,Marc R. Wilkins
Identifier
DOI:10.1021/acs.jproteome.1c00968
Abstract
Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
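The fractions quoted in the abstract are easier to compare as percentages. The following is a minimal sketch that only redoes that arithmetic; the counts are taken from the abstract, while the dictionary labels and the helper name `percent_missed` are assumptions made for illustration. It also assumes each denominator is the total identified with the long-read approach, which is how the abstract reads.

```python
# Counts quoted in the abstract, read as:
# (missed by the short-read approach, total identified with long reads).
# The labels are descriptive names chosen for this sketch, not terms from the paper.
counts = {
    "Ensembl transcripts": (5949, 14326),
    "alternatively spliced genes": (486, 2981),
    "peptides": (826, 35976),
    "proteins": (262, 3215),
    "protein isoforms": (574, 1212),
}


def percent_missed(missed: int, total: int) -> float:
    """Return the share of long-read identifications missed, in percent."""
    return 100.0 * missed / total


percentages = {
    label: percent_missed(missed, total)
    for label, (missed, total) in counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct:.1f}% missed by the short-read approach")
```

On this reading, the gap is largest at the transcript and protein-isoform level and much smaller at the peptide level, which matches the abstract's wording of "a large proportion" of transcripts versus "some" peptides.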