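Biology
Transcriptome
RNA-Seq
Computational biology
RNA
Deep sequencing
Gene
Genetics
Population
DNA sequencing
Gene expression profiling
Gene expression
Bioinformatics
Genome
Sociology
Demography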
Authors
Shanrong Zhao, Ye Zhan, Robert V. Stanton
Source
Journal: RNA
[Cold Spring Harbor Laboratory Press]
Date: 2020-04-13
Volume (issue): pages: 26 (8): 903-909
Citations: 274
Identifiers
DOI: 10.1261/rna.074922.120
Abstract
In recent years, RNA sequencing (RNA-seq) has emerged as a powerful technology for transcriptome profiling. For a given gene, the number of mapped reads depends not only on its expression level and gene length but also on the sequencing depth. To normalize for these dependencies, RPKM (reads per kilobase of transcript per million reads mapped) and TPM (transcripts per million) are used to measure gene or transcript expression levels. A common misconception is that RPKM and TPM values are already normalized and thus should be comparable across samples or RNA-seq projects. However, RPKM and TPM represent the relative abundance of a transcript among a population of sequenced transcripts and therefore depend on the composition of the RNA population in a sample. Quite often, it is reasonable to assume that total RNA concentrations and distributions are very close across the compared samples. Nevertheless, the sequenced RNA repertoires may differ significantly under different experimental conditions and/or across sequencing protocols; in such cases, the proportions of gene expression are not directly comparable. In this review, we illustrate typical scenarios in which RPKM and TPM are unintentionally misused, and we hope to raise scientists' awareness of this issue when comparing them across samples or different sequencing protocols.
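The compositional dependence described above can be illustrated with a minimal sketch. It uses the textbook definitions of RPKM and TPM on hypothetical toy data (the gene names `geneA`/`geneB`/`geneC`, their lengths and read counts are invented for illustration). For simplicity it assumes that the summed counts equal the total number of mapped reads. In sample2 only geneC changes, and geneA and geneB keep exactly the same read counts. Because TPM always sums to one million per sample, their TPM values still drop fivefold.

```python
from typing import Dict

Counts = Dict[str, float]


def rpkm(counts: Counts, lengths_bp: Counts) -> Counts:
    """Reads per kilobase of transcript per million mapped reads."""
    # Assumption: the sum of `counts` stands in for the total number of mapped reads.
    millions_mapped = sum(counts.values()) / 1e6
    return {g: counts[g] / (lengths_bp[g] / 1e3) / millions_mapped for g in counts}


def tpm(counts: Counts, lengths_bp: Counts) -> Counts:
    """Transcripts per million: normalize by length first, then rescale to sum to 1e6."""
    per_kb = {g: counts[g] / (lengths_bp[g] / 1e3) for g in counts}
    scale = sum(per_kb.values()) / 1e6
    return {g: v / scale for g, v in per_kb.items()}


# Hypothetical toy data: three genes, lengths in base pairs.
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
sample1 = {"geneA": 100, "geneB": 200, "geneC": 50}
# Only geneC changes in sample2; geneA and geneB keep identical read counts.
sample2 = {"geneA": 100, "geneB": 200, "geneC": 650}

rpkm1, rpkm2 = rpkm(sample1, lengths), rpkm(sample2, lengths)
tpm1, tpm2 = tpm(sample1, lengths), tpm(sample2, lengths)
for gene in lengths:
    print(f"{gene}: RPKM {rpkm1[gene]:,.0f} -> {rpkm2[gene]:,.0f} | "
          f"TPM {tpm1[gene]:,.0f} -> {tpm2[gene]:,.0f}")
# geneA: RPKM 285,714 -> 105,263 | TPM 333,333 -> 66,667
# geneB: RPKM 285,714 -> 105,263 | TPM 333,333 -> 66,667
# geneC: RPKM 285,714 -> 1,368,421 | TPM 333,333 -> 866,667
```

The ratio of geneA to geneB is preserved within each sample, but their per-sample values are not comparable across samples once the composition shifts. In this sketch TPM is simply RPKM rescaled so that each sample sums to one million.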