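Transcriptome
Long noncoding RNA
Computational biology
Biology
Single-cell analysis
Computer science
Small RNA
Cell
RNA (ribonucleic acid)
Genetics
Gene
Gene expression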
Authors
Raza‐Ur Rahman, Iftikhar Ahmad, Zixiu Li, Robert P. Sparks, Amel Ben Saad, Alan C. Mullen
Identifier
DOI: 10.1101/2022.10.31.514182
Abstract
Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of gene expression in individual cell types, but scRNA-seq studies have focused primarily on the expression of protein-coding genes. Long noncoding RNAs (lncRNAs) are more diverse than protein-coding genes, yet they remain underexplored, in part because they are under-represented in the reference annotations applied to scRNA-seq. Simply merging annotations that contain protein-coding and lncRNA genes is not sufficient, because adding lncRNA genes that overlap protein-coding genes in the sense or antisense orientation changes how reads are counted for both protein-coding and lncRNA genes. Here, we introduce Singletrome, a Singularity image that integrates protein-coding and lncRNA gene transfer format (GTF) annotations to generate enhanced annotations that account for the sense and antisense overlap of annotated genes, maps scRNA-seq data, and produces files for downstream analysis and visualization. With Singletrome, we observed an increase in the number of reads mapped to exons, detected thousands of lncRNAs not included in GENCODE, and observed a decrease in uniquely mapped reads, indicating improved mapping specificity. Moreover, we were able to cluster cell types based solely on lncRNA expression, and lncRNAs alone were able to predict cell types and human disease pathology through machine learning. This comprehensive annotation will allow lncRNA expression to be mapped across the cell types of the human body, facilitating the development of an atlas of human lncRNAs in health and disease, with the ability to integrate new lncRNA annotations as they become available.
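To make the sense/antisense overlap problem concrete, the toy sketch below parses minimal GTF `gene` records and classifies each lncRNA gene by whether it overlaps a protein-coding gene on the same strand (sense) or on the opposite strand (antisense). This is not Singletrome's implementation. The `Gene` class, the helper functions, and the example coordinates are all hypothetical. The code assumes GTF's 1-based, inclusive coordinates and a `gene_id "..."` attribute in column 9.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (GTF coordinates: 1-based, inclusive)."""
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def parse_gtf_gene(line: str) -> Gene:
    """Parse one tab-separated GTF line, reading columns 1, 4, 5, 7 and gene_id from column 9."""
    fields = line.rstrip("\n").split("\t")
    attrs = fields[8]
    gene_id = attrs.split('gene_id "', 1)[1].split('"', 1)[0]
    return Gene(gene_id, fields[0], int(fields[3]), int(fields[4]), fields[6])


def overlap_type(a: Gene, b: Gene) -> Optional[str]:
    """Return "sense" or "antisense" if the genes overlap by at least one base, else None."""
    if a.chrom != b.chrom or a.end < b.start or b.end < a.start:
        return None
    return "sense" if a.strand == b.strand else "antisense"


def classify_overlaps(
    lncrnas: List[Gene], coding: List[Gene]
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each lncRNA gene_id to the protein-coding genes it overlaps, with orientation.

    Brute-force O(n*m) comparison for illustration; a real pipeline would use an
    interval index instead.
    """
    result: Dict[str, List[Tuple[str, str]]] = {}
    for lnc in lncrnas:
        hits = []
        for pc in coding:
            kind = overlap_type(lnc, pc)
            if kind is not None:
                hits.append((pc.gene_id, kind))
        result[lnc.gene_id] = hits
    return result


if __name__ == "__main__":
    # Hypothetical records, not taken from any real annotation.
    pc = parse_gtf_gene('chr1\tHAVANA\tgene\t100\t500\t.\t+\t.\tgene_id "PC1";')
    lncs = [
        Gene("LNC1", "chr1", 400, 800, "+"),  # same strand -> sense overlap
        Gene("LNC2", "chr1", 50, 150, "-"),   # opposite strand -> antisense overlap
        Gene("LNC3", "chr2", 100, 500, "+"),  # different chromosome -> no overlap
    ]
    print(classify_overlaps(lncs, [pc]))
```

A read falling in the shared region of `PC1` and `LNC1` becomes ambiguous once both genes are in the annotation, which is why naively concatenating GTF files can change counts for protein-coding genes as well as lncRNAs.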