Beyond Read-Counts: Ribo-seq Data Analysis to Understand the Functions of the Transcriptome

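Biology, Computational biology, RNA sequence, Transcriptome, Gene, Genetics, Gene expression
Authors
Lorenzo Calviello,Uwe Ohler
Source
Journal: Trends in Genetics [Elsevier BV]
Volume (issue): 33 (10): 728-744 Citations: 113
Identifier
DOI:10.1016/j.tig.2017.08.003
Abstract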

By mapping the positions of millions of translating ribosomes in the cell, ribosome profiling (Ribo-seq) has established its role as a powerful tool to study gene expression. Several laboratories have introduced modifications to the experimental protocol and expanded the repertoire of biochemical methods to study translation transcriptome-wide. However, the diversity of protocols highlights a need for standardization. At the same time, different computational analysis strategies have used Ribo-seq data to identify the set of translated sequences with high confidence. In this review we present an overview of such methodologies, outlining their assumptions, data requirements, and availability. At the interface between RNA and proteins, Ribo-seq can complement data from multiple omics approaches, zooming in on the central role of translation in the molecular cell.

Ribo-seq has become an established protocol to identify translated transcript regions via deep sequencing, closing the gap between RNA sequencing and proteomics. Recently developed Ribo-seq data analysis strategies use different features as hallmarks of translation. Specifically, the ability to monitor the positions of translating ribosomes with single-nucleotide precision has driven the development of computational tools that rely on ‘subcodon resolution’. Knowing the concrete assumptions and precise goals of different approaches is crucial. In addition to addressing translation-focused questions, from defining open reading frames to identifying alternative translation initiation sites and estimating differential translation rates, Ribo-seq data show great promise for integrative efforts combining additional omics approaches.

Glossary
Classifier: a machine-learning approach whose objective is to assign datapoints to different classes (two in the case of binary classifiers). In supervised learning, the classifier is trained on known examples, while unsupervised classification methods are used in the absence of known (or labeled) data.
Coding sequence (CDS): a sequence that is translated using one (or more) of the three possible reading frames.
Hidden Markov model (HMM): a probabilistic method in which a signal (e.g., a coverage track or a nucleotide sequence) is emitted from a finite succession of unknown (hidden) states. The hidden states can represent different biological concepts (e.g., 5′-UTRs, ORFs, etc. in genomic sequence classification); transitions between them specify possible sequences of the states, and can be defined and trained on available data (e.g., read coverage or nucleotide sequences in annotated genomic regions). Once the model is trained, it can be used to parse a new signal and label it with the optimal sequence of states.
Long non-coding RNAs (lncRNAs): long transcripts (>200 nt) that do not exhibit clear coding potential.
Multitaper method: a signal processing method that aims to provide reliable estimates of the spectrum of frequencies present in a signal. In the multitaper method, multiple filters are applied as windows over the same signal, and coefficients for all frequency components are retrieved from each filtered sample (using the Fourier transform). Different types of filters have been proposed; specifically, the use of the so-called Slepian sequences enables the application of a statistical test to each frequency component.
Non-negative least squares (NNLS): a modified version of ordinary least squares in which the regression coefficients cannot be negative.
Nonsense-mediated decay (NMD): an mRNA surveillance pathway that degrades aberrant transcripts, thus preventing the production of non-functional proteins. One of the proposed mechanisms for NMD involves the recognition of a premature termination codon (PTC), aided by the action of proteins that are part of the exon junction complex (EJC).
Open reading frame (ORF): a section of a transcript that contains a start and a stop codon in frame. In eukaryotes, most mRNA transcripts contain one main ORF that is translated into a polypeptide.
Puromycin-associated nascent chain proteomics (PUNCH-P): a technique that isolates nascent protein chains. Ribosome–nascent chain complexes are first isolated, and biotinylated puromycin is incorporated into the complexes. Streptavidin pulldown allows the nascent protein chains to be extracted, and these can be analyzed by LC-MS/MS.
Quantitative proteomics: proteomics techniques aimed at quantifying protein expression. Label-free quantification methods can be used, but techniques such as SILAC that label amino acids can represent superior alternatives for protein quantification.
Random forest: a classification algorithm that combines the classification output of multiple classifiers, called decision trees. Each tree splits the data into different groups (‘leaves’) and assigns a label to each datapoint in each leaf. Each tree is applied to a subset of the data and features to avoid overfitting. Usually used as a supervised learning method, random forests can also be used for unsupervised learning and for regression tasks.
Regression: a method that aims to quantify the relationship between a target variable and one (or more) features. To this end, regression approaches fit a function of the predictors that minimizes the distance to the target variable (e.g., by using the least squares method). The regression coefficient quantifies the relationship between the target variable and the predictor.
Shotgun proteomics: a set of techniques that enable the identification and quantification of protein expression from a mixture of digested peptides, using peptide isolation (usually with liquid chromatography, LC) and tandem mass spectrometry (MS/MS). When they are eluted in the LC step, peptides are ionized, and ions are selected in the first MS step according to their mass-to-charge (m/z) ratio. Ions are then fragmented, and in the second MS step fragment ions are again isolated according to their m/z ratio and quantified. Using a reference protein database, m/z values can be mapped to expected values matching peptides from known proteins.
Spectral coherence: a measure of correlation between two frequency spectra. Signals exhibiting a similar set of frequency components will have high coherence.
pSILAC: a variant of SILAC in which labeled amino acids are added to the cell culture for short periods of time, thus allowing the kinetics of de novo protein synthesis to be monitored.
Support vector machine (SVM): a binary classification algorithm. SVMs are supervised learning methods and therefore need to be trained on known examples. In the training stage, SVMs aim to define a separating line maximizing the distance between the two sets of data. When a linear separation of the two sets is not effective, SVMs can use different kernel functions to compute the distance between datapoints in a higher-dimensional space in which a linear separation between the samples is possible. This strategy (the ‘kernel trick’) enables non-linear classification, and has contributed to the popularity of SVMs in the machine-learning community.
Untranslated region (UTR): the section of a coding mature mRNA that does not code for protein. The 5′-UTR is located upstream of the start codon, while the 3′-UTR is downstream of the stop codon.
Upstream ORF (uORF): a small (usually <100 aa) ORF whose start codon is located in the 5′-UTR upstream of the main ORF of a transcript. Many uORFs have been shown to regulate the translation of the main ORF. It is generally assumed that uORFs do not encode stable polypeptides.
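
As a rough illustration of what ‘subcodon resolution’ means for spectral analysis, the sketch below measures how much of a per-nucleotide count profile's power sits at a period of 3 nt. It is a minimal sketch under stated assumptions, not any published tool's implementation: it uses a plain discrete Fourier transform (a single periodogram) rather than the multitaper method, the count profiles are simulated toy data, and the names `periodogram`, `periodicity_score`, and `simulate_counts` are hypothetical.

```python
import cmath
import random


def periodogram(signal):
    """Plain DFT power at frequencies k/n for k = 1..n//2, after removing the mean."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    powers = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        powers[k] = abs(coeff) ** 2 / n
    return powers


def periodicity_score(counts):
    """Fraction of spectral power found at a period of exactly 3 nt (0.0 to 1.0)."""
    usable = len(counts) - len(counts) % 3  # keep whole codons so 1/3 is an exact bin
    if usable < 3:
        return 0.0
    powers = periodogram(counts[:usable])
    total = sum(powers.values())
    if total == 0:
        return 0.0  # flat profile: no periodic signal at all
    return powers[usable // 3] / total


def simulate_counts(n_codons, frame_weights, rng):
    """Toy per-nucleotide counts (assumption: fixed frame preference, random depth)."""
    counts = []
    for _ in range(n_codons):
        depth = rng.randint(5, 15)  # simulated reads over this codon
        for weight in frame_weights:
            counts.append(sum(1 for _ in range(depth) if rng.random() < weight))
    return counts


rng = random.Random(0)
translated = simulate_counts(60, (0.8, 0.1, 0.1), rng)    # strong frame preference
background = simulate_counts(60, (1 / 3, 1 / 3, 1 / 3), rng)  # no frame preference

print("translated-like score:", round(periodicity_score(translated), 3))
print("background-like score:", round(periodicity_score(background), 3))
```

A profile with a consistent frame preference concentrates most of its power at the 3-nt period, while a profile without one spreads its power across frequencies. A multitaper estimate with Slepian sequences, as described in the glossary, would additionally allow a statistical test on that frequency component, which this single periodogram does not provide.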