Decision letter: Many lncRNAs, 5’UTRs, and pseudogenes are translated and some are likely to express functional proteins

Pseudogene; Computational biology; Biology; Genetics; Computer science; Gene; Genome
Identifier
DOI:10.7554/elife.08890.028
Abstract

Using a new bioinformatic method to analyze ribosome profiling data, we show that 40% of lncRNAs and pseudogene RNAs expressed in human cells are translated. In addition, ~35% of mRNA coding genes are translated upstream of the primary protein-coding region (uORFs) and 4% are translated downstream (dORFs). Translated lncRNAs preferentially localize in the cytoplasm, whereas untranslated lncRNAs preferentially localize in the nucleus. The translation efficiency of cytoplasmic lncRNAs is nearly comparable to that of mRNAs, suggesting that cytoplasmic lncRNAs are engaged by the ribosome and translated. While most peptides generated from lncRNAs may be highly unstable byproducts without function, ~9% of the peptides are conserved in ORFs in mouse transcripts, as are 74% of pseudogene peptides, 24% of uORF peptides and 32% of dORF peptides. Analyses of synonymous and nonsynonymous substitution rates of these conserved peptides show that some are under stabilizing selection, suggesting potential functional importance. https://doi.org/10.7554/eLife.08890.001

eLife digest

Our genes encode the instructions needed to make proteins. When a gene is switched on, its DNA is used as a template to make molecules of messenger ribonucleic acid (RNA). These RNAs are then "translated" into proteins by large cell machines called ribosomes. Within the messenger RNA, a long region called an "open reading frame" is the section that encodes the protein. The human genome also contains a vast amount of DNA that is not part of any gene. Cells can produce molecules of RNA from this DNA (so-called "non-coding RNAs"), but these RNAs are not thought to code for proteins because they lack long open reading frames.
Non-coding RNAs can also be made from sections of DNA called "pseudogenes", which have lost their ability to code for proteins over the course of evolution. Furthermore, messenger RNAs also contain short open reading frames in the "untranslated" regions that flank the protein-coding region. The extent to which cells translate non-coding RNAs to produce small proteins (or peptides) is not known. "Ribosome profiling" is a powerful method to determine which RNAs are translated, but it is not always possible to distinguish between the RNAs that are genuinely translated and those that just happen to be bound to ribosomes. Ji et al. overcome these limitations by developing a new computational method to analyse data from ribosome profiling. The experiments show that thousands of non-coding RNAs in the human genome are, in fact, translated. This is many more than anticipated and represents approximately 40% of the lncRNAs and pseudogene RNAs, and 35% of untranslated regions in messenger RNAs. Ji et al. also found that a small group of all the lncRNA peptides in the human genome appear to have changed little over the course of evolution, which strongly suggests that they have specific roles in cells. The next challenge is to find out what roles the peptides encoded by these lncRNAs play in cells. https://doi.org/10.7554/eLife.08890.002

Introduction

In the central dogma, mRNAs are translated into proteins that carry out biological functions. On a genomic scale, translated regions are identified as open reading frames (ORFs) that are longer (typically >100 amino acids) than expected by chance, given sequence composition. In addition to mRNAs, mammalian cells contain other RNA transcripts generated by RNA polymerase II that are polyadenylated, spliced, and capped, but may not code for protein.
One category consists of thousands of long RNAs that lack long open reading frames and have been considered to be non-coding (Guttman et al., 2009; 2010; Trapnell et al., 2010; Cabili et al., 2011). A few lncRNAs play key regulatory roles in various biological processes via functional RNA domains that regulate chromatin modifications, DNA transcription, mRNA stability, and translation (Rinn and Chang, 2012; Batista and Chang, 2013; Ulitsky and Bartel, 2013). However, the biological functions of most lncRNAs remain unknown. The human genome also encodes thousands of pseudogenes, which are homologous to protein-coding genes but have lost their coding ability and/or are not expressed (Vanin, 1985). Pseudogenes can function as competing endogenous RNAs (ceRNAs) regulating other RNA transcripts by competing for microRNAs (Salmena et al., 2011). Some pseudogenes are differentially expressed in human cancers (Kalyana-Sundaram et al., 2012; Han et al., 2014), but it is unknown if the RNAs expressed from pseudogenes are translated or have biological functions. By definition, noncoding RNAs should not be translated into protein, but this can be difficult to ascertain using informatics alone because they contain short open reading frames that could be potentially translated. Even if a peptide is expressed from a putative non-coding RNA, it is difficult to determine whether the peptide has a biological function or is a mere by-product of an RNA that performs the biological function. However, there are a few examples of lncRNAs that are in fact translated into short peptides with biological roles (Galindo et al., 2007; Kondo et al., 2010; Magny et al., 2013; Pauli et al., 2014). In addition, a number of mammalian mRNAs contain so-called 5' untranslated regions (5'UTRs) with one or more ORFs upstream of their canonical protein-coding regions (uORFs). 
Due to the scanning mechanism for translational initiation, in which ribosomes scan in a 5' to 3' direction from the mRNA cap to find an initiation codon (Sonenberg and Hinnebusch, 2009), uORFs have the potential to regulate translation of the primary protein-coding ORF (Calvo et al., 2009; Barbosa et al., 2013). For example, translation of the uORFs in the yeast GCN4 gene strongly inhibits translation of Gcn4 under normal conditions (Hinnebusch, 2005). However, during amino acid starvation, ribosomes reinitiate translation at the canonical AUG codon, thereby permitting increased synthesis of Gcn4 (Hinnebusch, 2005). In human cells, bioinformatic analyses and limited functional testing indicate that uORFs can inhibit protein production, but genome-wide functional analysis has yet to be performed (Calvo et al., 2009; Barbosa et al., 2013).

Ribosome profiling, the sequencing of ribosome-associated RNAs, represents a powerful assay for assessing translation in vivo in an unbiased manner on a genome-wide scale (Ingolia et al., 2009; Ingolia, 2014). In particular, ribosome profiling in mammalian cells reveals many reads derived from lncRNAs and 5'UTRs, and lncRNAs and 5'UTRs can be co-purified with 80S ribosomes, indicating that these transcripts are translated (Ingolia et al., 2011; 2014). However, unlike canonical protein-coding genes translated from mRNAs, many lncRNAs do not have a predominant ORF based on the ribosome release or disengagement scores (Chew et al., 2013; Guttman et al., 2013). Moreover, due to a variety of limitations, previous analyses typically did not explicitly identify in-frame translated ORFs, and they identified only several hundred translated regions that do not correspond to canonical protein-coding regions. Importantly, ribosome profiling reads do not necessarily represent active translation, due to potential artifacts from non-ribosomal entities and scanning ribosomes (Guttman et al., 2013; Ingolia et al., 2014).
Systematic examination of translation requires a computational method to identify bona fide translated ORFs in an unbiased fashion. Here we develop a method, RibORF, that analyzes ribosome profiling data and identifies translated ORFs by combining alignment of ribosomal A-sites, 3-nt periodicity, and uniformity across codons. RibORF can effectively distinguish in-frame ORFs from overlapping off-frame ORFs, and it can distinguish reads arising from RNAs that are not associated with ribosomes. Using RibORF, we identify thousands of translated ORFs in lncRNAs, pseudogenes, and mRNA regions upstream (5'UTRs) and downstream (3'UTRs) of protein-coding sequences. Our results suggest that cytoplasmic noncoding RNAs are translated, and that some of these translated products are likely to be biologically meaningful based on their evolutionary conservation.

Results

Ribosome profiling experiment reveals in vivo translation at single nucleotide resolution

We performed ribosome profiling (Figure 1A) in two isogenic human cancer cell models: a Src-inducible mammary epithelial model and a Ras-dependent fibroblast model (Hirsch et al., 2010). Cells were treated either with cycloheximide, which inhibits translational elongation of ribosomes throughout the mRNA coding region, or harringtonine, which traps the ribosome at the site of translational initiation. After removing reads aligned to rRNAs and multiple genomic locations, we generated 44.0 and 21.2 million unique mappable reads upon cycloheximide treatment for breast epithelial and fibroblast cell transformation models, respectively. For harringtonine treatment, we obtained 5.9 and 9.0 million unique mappable reads for breast epithelial and fibroblast cells, respectively.

Figure 1 (with 1 supplement). Ribosome profiling reveals in vivo translation with single nucleotide resolution. (A) Ribosome profiling experiment.
(B) Read distribution (reads/million mappable reads; RPM) around start and stop codons of canonical protein coding genes. (C) Fractions of reads in 1st, 2nd and 3rd nucleotides of codons in the indicated types of ORFs. (D) Read distribution in the protein-coding gene CPSF2. The RPM value was calculated for every 20-nt region along the transcript. (E) Distribution of reads across human genome. (F) Read distribution of the snoRNA gene SNORA49 in cells treated with cycloheximide (Chx) or harringtonine (Harr). (G) Distribution of PME values in the indicated types of ORFs. https://doi.org/10.7554/eLife.08890.003

The length of ribosome-protected fragments (RPFs) ranges primarily between 24–31 nts (Figure 1—figure supplement 1A). Notably, RPFs with different lengths have variable distances between the 5' end and the ribosome A-site, as defined by canonical ORFs in protein-coding genes (Figure 1—figure supplement 1B). We used these offset distances in known protein-coding genes to account for the read length distribution and thereby align RPFs to specific A-site nucleotides throughout the entire dataset. Most expressed protein-coding ORFs show a clear 3-nt periodicity corresponding to codon triplets (Figure 1B,C). The 1st nucleotides of codons in an ORF contain about 65% of reads, while the 2nd and 3rd have 24% and 11%, respectively (Figure 1B,C). In addition, reads in protein-coding genes are uniformly distributed across codons in an ORF (Figure 1D). Of all ribosome profiling reads, 73% map to canonical ORFs of mRNAs, 2% and 4% map to 5'UTRs and 3'UTRs of mRNAs, respectively, and 9% map to lncRNAs and pseudogenes, suggesting pervasive non-canonical translation (Figure 1E).

Removing sequence reads that are not derived from translated RNA

Consistent with previous reports (Ingolia et al., 2011; Guttman et al., 2013), some ribosome profiling reads map to short noncoding RNAs, including small nucleolar RNA (snoRNA).
As snoRNAs are located in the nucleus, they should not be accessible to the translation machinery located in the cytoplasm. Indeed, the sequence reads in the snoRNAs map to a very narrow region and are comparable in the cycloheximide- and harringtonine-treated samples (Figure 1F), indicating they do not represent translated regions of these RNAs. To exclude reads that do not represent active translation, we developed a Percentage of Maximum Entropy (PME) approach to measure the uniformity of read distribution across codons in a candidate ORF (see Experimental Procedures). A PME value of 1 represents uniform read distribution, indicative of real translation, while smaller values indicate skewed distribution, with a minimum value of 0 indicating reads at a single location, as expected for reads not derived from translated RNA. As expected, candidate ORFs from short noncoding RNAs show drastically lower PME values than canonical protein coding ORFs (Figure 1G). Low PME values indicate RNAs that are not translated, but rather are protected in non-ribosomal protein complexes (Ji et al., 2015).

RibORF identifies a large number of translated ORFs in lncRNAs, pseudogenes, and UTRs of mRNAs

Based on the 3-nt periodicity (Figure 1C) and uniformity of read distribution across codons (Figure 1G) of translated regions, we developed a Support Vector Machine classifier, RibORF, to identify translated ORFs from ribosome profiling data. The model was trained by using canonical protein-coding ORFs as positive examples and off-frame ORFs from protein-coding regions and candidate ORFs from short noncoding RNAs as negative examples. The classifier using both features performed almost perfectly to separate positive and negative examples in a testing set (Area Under the ROC Curve [AUC] = 0.996), with 3-nt periodicity making a greater contribution (Figure 2A).
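The two classifier features described here, 3-nt periodicity and read uniformity, can be illustrated with a minimal Python sketch. It is not the RibORF implementation: the function names are hypothetical, the input is assumed to be a list of A-site read counts per nucleotide starting at the first nucleotide of the start codon, and the PME formula (Shannon entropy of the per-codon read fractions divided by the maximum possible entropy) is an assumption chosen only to reproduce the stated behaviour of 1 for a uniform distribution and 0 for reads at a single location.

```python
import math


def frame_fractions(asite_counts):
    """Fraction of reads on the 1st, 2nd and 3rd nucleotide of codons.

    asite_counts: per-nucleotide A-site read counts, index 0 being the
    first nucleotide of the start codon (assumed input layout).
    """
    totals = [0, 0, 0]
    for position, count in enumerate(asite_counts):
        totals[position % 3] += count
    n_reads = sum(totals)
    if n_reads == 0:
        return [0.0, 0.0, 0.0]
    return [t / n_reads for t in totals]


def percentage_of_maximum_entropy(asite_counts):
    """Assumed PME: entropy of per-codon read fractions / log(number of codons)."""
    codon_counts = [
        sum(asite_counts[i:i + 3]) for i in range(0, len(asite_counts), 3)
    ]
    n_reads = sum(codon_counts)
    n_codons = len(codon_counts)
    if n_reads == 0 or n_codons < 2:
        return 0.0
    entropy = -sum(
        (c / n_reads) * math.log(c / n_reads) for c in codon_counts if c > 0
    )
    return entropy / math.log(n_codons)


# Toy ORF of four codons with reads spread evenly and mostly in frame 1.
translated_like = [13, 5, 2] * 4
# Toy snoRNA-like profile: all reads pile up at one position.
pileup_like = [0, 0, 0, 20, 0, 0, 0, 0, 0]

print(frame_fractions(translated_like))
print(percentage_of_maximum_entropy(translated_like))
print(percentage_of_maximum_entropy(pileup_like))
```

In a pipeline of the kind described, values like these would be computed for every candidate ORF and passed to a classifier; the training itself is omitted here.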
The algorithm performed well for genes expressed at various levels, with AUC values greater than 0.993 for ORFs with RPKM > 1 (Figure 2—figure supplement 1A). In addition, the predicted translation probabilities are well correlated in the two cancer models (R = 0.97), indicating the algorithm can be robustly applied to various cell types (Figure 2—figure supplement 1B).

Figure 2 (with 2 supplements). RibORF identifies translating ORFs. (A) Receiver-operating characteristic (ROC) curves to measure algorithm performance using different training parameters. (B) Types of translated ORFs identified in this study, with ORF number:gene number shown in parentheses. (C) Distribution of reads upon cycloheximide treatment around start codon of predicted positive and negative lncRNA ORFs. Examples of (D) a translated lncRNA, (E) an mRNA with a uORF, and (F) an mRNA with a dORF; the 3' most exon is shown. Enlarged figures show 3-nt periodicity can be observed for each codon in Figure 2D–F. https://doi.org/10.7554/eLife.08890.005

We applied the classifier to predict translated ORFs within lncRNAs, pseudogenes, and mRNAs. Candidate ORFs showed a mixed population of 3-nt periodicity and PME values (Figure 1C,G). Using a stringent cutoff for the probability of prediction (0.7, with a false positive rate of 0.67% and a false negative rate of 2.5%; Figure 2—figure supplement 1C), we identified canonical ORFs in 10,946 protein-coding genes, and truncated or extended variants in 544 genes (Figure 2B). The canonical ORFs in almost all expressed transcripts were identified. In addition, we identified so-called uORFs in the 5'UTRs of 3842 protein-coding genes, and uORFs overlapping with coding regions (overlapping uORFs) in 1054 genes (Figure 2B). We also identified ORFs located in 3'UTRs of 550 genes, which we term downstream ORFs (dORFs; Figure 2B).
In general, translated uORFs and dORFs are expressed from the same transcript as the relevant canonical ORF, although in some cases these may arise from truncated transcripts. Lastly, we identified 1204 ORFs in 510 lncRNAs and 278 ORFs in 161 pseudogenes (Figure 2B). As expected, the predicted translated ORFs show clear 3-nt periodicity and high PME values, while the negative ones do not (Figure 2C and Figure 2—figure supplement 1C–D). Examples of lncRNA ORFs, uORFs and dORFs are shown in Figure 2D–F, and a full list is presented in Supplementary file 1. For the well expressed ORFs, we observe 3-nt periodicity for individual codons (Figure 2D–F). Uniform 3-nt periodicity over an extended distance is diagnostic of bona fide translation. In this regard, all 7 tested RNAs encoding non-canonical translated ORFs are associated with 80S monosomes and/or polysomes (Figure 2—figure supplement 2). Thus, we will refer to the products of translated ORFs as 'peptides', even though direct biochemical evidence is lacking. In this regard, the peptides represent initial translation products whose stability in vivo is unknown. We suspect that many non-functional peptides will be degraded rapidly and hence will be difficult to detect biochemically.

Nuclear/cytoplasmic localization is a major determinant of translation efficiency

We did not detect translation for 679 lncRNAs in breast epithelial cells even though RNA-seq analysis indicates that they are expressed at comparable levels to the 510 translated lncRNAs (p > 0.05; Figure 3A). We hypothesized that the distinction between these two classes is that the untranslated lncRNAs would be preferentially localized in the nucleus and not accessible to the translation machinery, whereas the translated lncRNAs would be preferentially localized in the cytoplasm. To test this hypothesis, we examined the cytosolic and nuclear distribution (C:N ratio) of lncRNAs, using RNA-seq data from multiple cell lines (Djebali et al., 2012; ENCODE, 2012).
Indeed, untranslated lncRNAs are less likely to localize to the cytoplasm (lower C:N ratio) than translated ones (p < 10^-70; Figure 3B). Similar results are observed for lncRNAs in a variety of cell lines (Figure 3—figure supplement 1A–D). Compared to canonical protein coding mRNAs, translated lncRNAs show slightly lower C:N ratios (p < 10^-46; Figure 3B). Translated pseudogene RNAs are also more likely to be localized in the cytoplasm as compared with untranslated pseudogene RNAs (Figure 3—figure supplement 1E–G).

Figure 3 (with 1 supplement). RNA subcellular localization is a major determinant of translation efficiency. (A) RNA expression levels of lncRNAs with or without translated ORFs and canonical mRNAs in MCF10A-ER-Src cells. (B) Relative subcellular location of translated and untranslated lncRNAs and canonical mRNAs. (C) Translation efficiency of translated lncRNAs and canonical mRNAs. (D) Distribution of translation efficiency of canonical mRNAs, calculated as averaged translation efficiency values in breast epithelial and fibroblast cells. (E) Relative subcellular locations of mRNAs grouped based on translation efficiency. https://doi.org/10.7554/eLife.08890.008

Translation efficiency of a given RNA is defined as the ratio of translated RNA (from ribosome profiling) to overall RNA (from RNA-seq). In accord with the reduced C:N ratio of translated lncRNAs as compared to mRNAs, lncRNAs also show lower translation efficiency (p < 10^-12; Figure 3C). However, when corrected for the reduced levels of lncRNAs in the cytoplasm, it appears that the translation efficiencies of cytoplasmic lncRNAs and mRNAs are nearly comparable, albeit slightly reduced for lncRNAs. Interestingly, the translation efficiencies of mRNAs vary hundreds of fold (Ingolia et al., 2009) (Figure 3D), and these differences are strongly correlated with localization in the cytosol (Figure 3E and Figure 3—figure supplement 1H–I).
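The translation efficiency ratio defined above can be written as a short Python sketch. The function name and the use of RPKM-style densities are assumptions. The optional `cytoplasmic_fraction` argument is one hypothetical way to express a correction for the share of a transcript found in the cytoplasm; the text does not specify how the authors performed that correction.

```python
def translation_efficiency(ribo_density, rna_density, cytoplasmic_fraction=None):
    """Ratio of ribosome-profiling density to RNA-seq density for one RNA.

    ribo_density, rna_density: normalized read densities (e.g. RPKM; assumed).
    cytoplasmic_fraction: optional value in (0, 1]; if given, the RNA level is
        scaled down to its cytoplasmic share before taking the ratio
        (hypothetical correction, for illustration only).
    Returns None when the denominator is not positive.
    """
    denominator = rna_density
    if cytoplasmic_fraction is not None:
        if not 0 < cytoplasmic_fraction <= 1:
            raise ValueError("cytoplasmic_fraction must be in (0, 1]")
        denominator = rna_density * cytoplasmic_fraction
    if denominator <= 0:
        return None
    return ribo_density / denominator


# Toy numbers: the same transcript looks four times more efficiently
# translated once only its cytoplasmic quarter is counted.
print(translation_efficiency(50.0, 100.0))
print(translation_efficiency(50.0, 100.0, cytoplasmic_fraction=0.25))
```

The point of the toy example is only that a mostly nuclear RNA can have a low overall ratio while its cytoplasmic copies are translated about as well as an mRNA.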
The strong relationship between nucleo-cytoplasmic location and translatability of lncRNAs provides strong independent evidence that our classifier effectively identifies translated RNAs. In addition, translation efficiency is strongly correlated with degree of cytoplasmic location, indicating that accessibility of an RNA to the translation machinery is a major determinant of how well it is translated.

Features of lncRNA peptides

Over 40% (491 out of 1189) of expressed lncRNAs encode peptides longer than 10 aa, and 8% (98 lncRNAs) encode peptides longer than 100 aa (Figure 4A). The median length of all peptides translated from lncRNAs (43 aa; Figure 4B) is considerably longer than that of peptides generated from uORFs (17 aa). Translation of many lncRNAs yields multiple peptides from non-overlapping ORFs, and the median length of the longest peptide translated by a given lncRNA is 62 aa (Figure 4C). Translated lncRNAs use AUG start codons more often than uORFs (Figure 4—figure supplement 1A,B).

Figure 4 (with 6 supplements). Features and conservation of lncRNA peptides. (A) Fraction of expressed lncRNAs that encode peptides longer than a certain length. (B) Peptide length encoded by lncRNAs. (C) Length of the longest peptide in a given lncRNA. (D) Length of conserved lncRNA peptides. (E) LncRNA LOC284023 encodes two peptides, the upstream one being conserved in the mouse lncRNA Chd3os. (F) Ka and Ks values of types of conserved lncRNA peptides with Z-Test p-values shown. (G) Ka/Ks ratios of types of conserved lncRNA peptides. https://doi.org/10.7554/eLife.08890.010

For mRNAs, the longest candidate ORFs are virtually always translated into functional proteins, but this is not the case for lncRNAs. The median length of the longest candidate ORF in a given lncRNA is 79 aa, but the longest candidate ORF is translated for only 56% of the lncRNAs (Figure 4—figure supplement 1C,D).
For the remaining 38% of the lncRNAs, the translated ORF was located upstream of the longest ORF. This preferential translation of ORFs located closer to the 5' ends of the lncRNAs likely reflects the strong preference of translation to initiate at the first AUG codon. The fact that the longest candidate ORF and/or its 5' proximal location is not necessarily the portion of the lncRNA that is translated indicates the value of the RibORF algorithm.

Conservation of human lncRNA peptides in mouse

To address the functional significance of peptides translated from lncRNAs, we used four approaches to study their evolutionary conservation. First, we used PhastCons scores based on 44-vertebrate Multiz alignment (Siepel et al., 2005) to measure conservation of ORF nucleotide sequence among species (Figure 4—figure supplement 2). Second, we used the PhyloCSF score to study the protein-coding potential of ORF sequences based on 29-mammal genome alignment (Lin et al., 2011) (Figure 4—figure supplement 3). Third, we checked the conservation of human peptides in mouse transcripts at the amino acid level and defined them to be conserved if two homologous ORFs encode peptides with a BLASTP alignment E-value < 10^-4 (False Discovery Rate < 0.0005 for all types and lengths of ORFs; Figure 4—figure supplements 4 and 5, and Supplementary file 2). Fourth, for lncRNA peptides conserved between human and mouse, we computed the ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates of the homologous nucleotide sequences. The Ka/Ks ratio is a commonly used parameter to infer the direction and magnitude of natural selection on peptide sequences (Hurst, 2002). A ratio smaller than 1 indicates a significant number of nucleotide sequence changes that do not result in protein sequence changes, indicating that the protein is under stabilizing (negative) selection and likely to be functional.
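The Ka/Ks ratio described above can be made concrete with a small Python sketch. The text does not say which estimator the authors used, so this example assumes a simple Nei-Gojobori-style calculation in which the numbers of synonymous and nonsynonymous sites and differences have already been counted from a codon alignment, and each proportion is corrected for multiple hits with the Jukes-Cantor formula. All function names and the toy counts are hypothetical.

```python
import math


def jukes_cantor_distance(proportion_different):
    """Jukes-Cantor correction: d = -(3/4) * ln(1 - (4/3) * p)."""
    if not 0 <= proportion_different < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * proportion_different / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-counted sites and differences.

    The ratio is None when Ks is zero, since it is then undefined.
    """
    ka = jukes_cantor_distance(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor_distance(syn_diffs / syn_sites)
    ratio = ka / ks if ks > 0 else None
    return ka, ks, ratio


# Toy counts: few amino-acid-changing differences relative to silent ones,
# which gives a ratio well below 1.
ka, ks, ratio = ka_ks(nonsyn_diffs=10, nonsyn_sites=400, syn_diffs=30, syn_sites=120)
print(ka, ks, ratio)
```

A ratio far below 1, as in this toy case, is the pattern the text interprets as stabilizing selection; counting the sites and differences from real alignments is the harder step and is not shown.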
For these analyses, we excluded the 30 lncRNAs that encode peptides conserved in mouse protein-coding genes and are likely to be pseudogenes mis-annotated by GENCODE (Supplementary file 2). For each translated ORF, we compared its conservation level (PhastCons and PhyloCSF scores) to untranslated segments that are matched for length and transcript location. Interestingly, at the nucleotide level, translated ORF sequences tend to be more conserved and have higher coding potential than the untranslated sequences (p < 10^-4; Figure 4—figure supplements 2A and 3A). The pattern is consistent for translated ORFs with different lengths, suggesting that some peptides might be functional. Most lncRNA peptides (92%) do not contain protein domains annotated by Pfam (Punta et al., 2012) (Figure 4—figure supplement 2C). ORF nucleotide sequences encoding short peptides (<100 aa) containing protein domains are more conserved (p < 10^-3; Figure 4—figure supplement 2D).

93 translated lncRNAs (19% of the total) have homologous lncRNA genes in mouse. From those conserved lncRNA genes, 41 (44%) express conserved peptides, with a median length of 69 aa (Figure 4D, Figure 4—figure supplement 4A, and Supplementary file 2). As expected, these conserved peptides have higher coding potential than non-conserved ones (Figure 4—figure supplement 3A). For example, the human lncRNA LOC284023 expresses a 97 aa peptide encoded by the 5' end, and a 37 aa peptide encoded downstream (Figure 4E). The 97 aa peptide is conserved in the mouse homologous transcript Chd3os, while the 37 aa peptide is not. Interestingly, human lncRNA peptides conserved with mouse peptides encoded by lncRNAs have Ka/Ks ratios significantly lower than 1 (Figure 4F,G). The low Ka/Ks ratios were not due to our BLASTP E-value cutoff (Figure 4—figure supplement 6). 20 such lncRNAs express peptides with Ka/Ks values smaller than 0.5, and 12 have values < 0.3.
Consistently, peptides with lower Ka/Ks values have higher coding potential based on PhyloCSF scores (Figure 4—figure supplement 3A), suggesting that they are evolutionarily stabilized and are probably functionally important.

Features and conservation of pseudogene peptides

The human genome contains 13,708 annotated pseudogenes that are derived from ancestral protein-coding genes but are generally not expressed as RNAs and are believed to have lost their protein-coding capability. However, out of 426 expressed pseudogenes (~3% of those annotated), 155 (36%) are translated into peptides longer than 10 aa. In addition, 81 expressed pseudogenes (19%) generate peptides longer than 100 aa (Figure 5A), and most (~80%) of these contain at least one protein domain (Figure 4—figure supplement 2C). The median length of pseudogene peptides is 70 aa (Figure 5B), and the median length of the longest peptide translated by a pseudogene is 102 aa (Figure 5C), which is 30 aa longer than lncRNA peptides.

Figure 5. Features and conservation of pseudogene peptides. (A) Fraction of expressed pseudogenes that encode peptides longer than a certain length. (B) Peptide length encoded by pseudogenes. (C) Length of the longest peptide in a given pseudogene. (D) Length of conserved pseudogene peptides. (E) Peptide in a human pseudogene FAM86C2P is conserved in the mouse protein coding gene Fam86. FAM86C2P also has a homologous human protein coding gene FAM86A. (F) Conserved human pseudogene peptides, grouped based on their homologous ORF types in mouse genome. (G) Ka and Ks values of types of conserved pseudogene peptides with Z-Test p-values shown. (H) Ka/Ks ratios of types of conserved pseudogene peptides.
https://doi.org/10.7554/eLife.08890.017

Nucleotide sequences of translated ORFs in pseudogenes are significantly more conserved and have higher coding potential than untranslated sequences of matching sizes and relative positions, and the pattern is consistent for translated ORFs of various sizes (p < 10^-22; Figure 4—figure supplements 2B and 3B). 114 pseudogene peptides (74% of those translated) are conserved in mouse, with a median length of 92 aa (Figure 5D, Figure 4—figure supplements 3B and 4B, and Supplementary file 2) that is ~25% the length of the corresponding canonical proteins. For example, the mouse protein-coding gene Fam86 has a homologous protein-coding gene FAM86A in human, and also has a homologous pseudogene FAM86C2P, which is annotated as a long noncoding RNA. We found that FAM86C2P is translated into a peptide of 131 aa, while the mouse Fam86 protein is 336 aa (Figure 5E). Several internal coding exons in Fam86 were lost in FAM86C2P during evolution. 69% of conserved human pseudogene peptides are homologous to canonical ORFs in mouse mRNAs (Figure 5F). As a class, these conserved peptides show a Ka/Ks ratio significantly lower than 1 (Figure 5G,H), with 50 pseudogenes expressing peptides with Ka/Ks values lower than 0.3. This suggests that, although some human pseudogenes are translated into shorter peptides than their mouse homologs, the peptide sequences are evolutionarily constrained, and hence may play functional roles. In addition, 15% of conserved pseudogene peptides are homologous to mouse pseudogenes, and these peptides have Ka/Ks ratios even lower than those homologous to mouse canonical ORFs, including 19 with Ka/Ks ratios < 0.3 (Figure 5F–H). Thus, pseudogenes with longer evolutionary histories are more likely to encode functional peptides.
In contrast, the remaining 16% of conserved pseudogene peptides are homologous to non-canonical ORFs in mouse mRNAs, and these peptides have Ka/Ks ratios close to 1, suggesting they are nonfunctional (Figure 5F–H).

Translation of uORFs and dORFs and the relationship to protein-coding sequences

The median lengths of uORFs (17 aa) and overlapping uORFs (37 aa) are shorter than those of lncRNA and pseudogene peptides (Figure 6A). In general, the translation efficiency of uORFs is similar to that of canonical protein-coding sequences (Figure 6B), and this effect is typical for individual genes. However, in accord with previous results linking uORFs to decreased protein levels (Calvo et al., 2009; Barbosa et al., 2013), the translational efficiency of mRNA coding regions is slightly lower for genes containing uORFs (p < 10^-34; Figure 6C), even though RNA levels of uORF-containing genes are somewhat higher than those of genes lacking uORFs (p < 10^-200; Figure 6D). However, the relatively high translational efficiency of protein-coding regions in genes containing uORFs suggests that scanning ribosomes often skip the uORF to allow efficient initiation at th
