Abstract
What are spliceosomes? Spliceosomes are huge, multimegadalton ribonucleoprotein (RNP) complexes found in eukaryotic nuclei. They assemble on RNA polymerase II transcripts from which they excise RNA sequences called introns and splice together the flanking sequences called exons. This so-called pre-messenger RNA (pre-mRNA) splicing is an essential step in eukaryotic mRNA synthesis. Every human cell contains ∼100,000 spliceosomes, which are responsible for removing over 200,000 different intron sequences. Human cells contain two types of spliceosome: the major spliceosome responsible for removing 99.5% of introns and the minor spliceosome, which removes the remaining 0.5%.

What are spliceosomes made of? Spliceosomes contain both proteins and RNAs. Yeasts have ∼100 spliceosomal proteins, whereas over 300 different proteins associate with human spliceosomes (Figure 1A). Many of these proteins have specific RNA recognition activities, while others are NTPases that function to drive the overall process forward and ensure its fidelity. Numerous other proteins bind stably to small nuclear RNAs (snRNAs) to form small nuclear RNPs (snRNPs, pronounced ‘snurps’). Major spliceosomes are assembled from U1, U2, U4, U6, and U5 snRNPs (which are named according to the U snRNA(s) they contain); minor spliceosomes are assembled from U11, U12, U4atac, U5, and U6atac snRNPs (Figure 1B).

How did the various spliceosomal parts get their names? The U snRNAs were originally discovered as abundant small uridine-rich RNA molecules present in mammalian nuclei, and they were initially numbered in order of their apparent abundance. U1, U2, U4, U5, U6, U11, and U12 were later found to be spliceosome components. U7 snRNA is required for histone mRNA 3′-end processing; the other abundant U snRNAs (U3, U8, U9 and U10) are all involved in ribosome biogenesis.
U4atac and U6atac are much less abundant than other spliceosomal snRNAs, so they were only discovered and named when it was realized that there must be other snRNAs that recognize the minor intron class. The first and last two DNA nucleotides of minor introns are most often AT and AC, respectively (Figure 1B), hence the names U4atac and U6atac.

Many spliceosomal proteins have PRP names, e.g. Prp2, Prp5, Prp8, etc. (Figure 1A). In yeast, mutations in these genes lead to ‘pre-mRNA processing’ defects. Confusingly, orthologous genes can have different PRP names in Saccharomyces cerevisiae and Schizosaccharomyces pombe because the original mutational screens were performed around the same time and a unified naming system has yet to be devised. Other core spliceosomal proteins include CWC (complexed with CDC5), CWF (complexed with CDC five), SPF (sensitivity to Pichia farinosa killer toxin), and SYF (synthetic lethal with cdc forty). The nineteen complex (NTC) is a large protein-only subcomplex named after its most abundant component, Prp19, while another small protein-only complex known as NTR (nineteen complex related) contains factors involved in spliceosome disassembly.

Some major spliceosomal proteins were first discovered in vertebrates. The seven Sm proteins, which form a ring encircling a specific binding site in almost all spliceosomal snRNAs, were named after the patient (Smith) whose autoimmune antibodies react with them. A similar set of proteins (Lsm, for ‘like Sm’) was later found to encircle U6 and U6atac snRNAs, the only two spliceosomal snRNAs lacking a consensus Sm-binding site. Two additional large classes of metazoan splicing factors are the hnRNP proteins, so-called because they are found associated with heterogeneous nuclear RNA (hnRNA), and the SR proteins, named for a carboxy-terminal domain rich in arginine-serine (RS) dipeptides.

How does the spliceosome do its job?
Spliceosomes must excise non-coding introns from precursor transcripts and stitch the flanking exons back together to create mature spliced mRNAs. To do so, the splicing machinery assembles in a stepwise manner on the ends of introns, with U1 snRNP recognizing the beginning of an intron (5′ splice site, the donor site) and U2 snRNP recognizing a feature (the branch site) at the other end in the vicinity of the 3′ splice site (acceptor site). After numerous structural rearrangements that involve both the addition of new components and the ejection of many others, splicing occurs in two chemical steps: firstly, cleavage at the 5′ splice site coupled to formation of a lariat structure in which the first nucleotide of the intron is linked via a 2′–5′ phosphodiester bond to the branch site adenosine; and secondly, ligation of the two exons, coupled to cleavage at the 3′ splice site (Figure 1C). The spliceosome then disassembles from the excised intron, which is subsequently debranched and degraded.

How do spliceosomes affect gene expression? Because the vast majority of protein-coding genes in humans contain introns (typically 9 or 10, but some have more than 100!), splicing is an essential step in gene expression. High-throughput sequencing has now revealed that ∼95% of human genes are also subject to alternative splicing, which allows for the synthesis of many different mRNAs from a single DNA gene. By encoding alternative protein isoforms or harboring different regulatory sequences in their untranslated regions, alternatively spliced mRNAs greatly enhance biological complexity. The act of splicing itself also has important consequences for gene expression beyond intron removal. By stably depositing on exons proteins that accompany mRNPs to the cytoplasm (e.g. the exon junction complex, EJC), splicing can affect the subcellular localization, translation efficiency and decay kinetics of the mRNA.
In particular, mRNA decay driven by EJC location relative to the stop codon is a crucial mediator of cellular protein abundance.

Are spliceosomes associated with any diseases? Many human diseases are caused by either mis-splicing of a single gene or mis-regulation of the entire spliceosome. Around 35% of human genetic disorders are caused by a mutation that alters the splicing of a single gene. Such mutations can add or remove a single splice site (e.g., α- or β-thalassemia) or shift the balance of alternative splicing by affecting the inclusion or exclusion of a cassette exon (e.g., frontotemporal dementia driven by tau mis-splicing). Some mis-splicing events generate an mRNA isoform that is subject to rapid degradation. Single point mutations that affect splicing can thereby result in large changes to both protein structure and protein abundance. Other diseases are caused by mutations in the spliceosomal proteins themselves, thereby affecting splicing of many transcripts. For example, mutations in several core spliceosomal proteins (e.g., Prp8, Prp3, Prp31, and Brr2) have been shown to cause autosomal dominant retinitis pigmentosa. Mutations in splicing factor 3B subunit 1 (SF3B1) and U2 auxiliary factor 35 (U2AF35) are frequently associated with chronic lymphocytic leukaemia and myelodysplasia. Other cancers are associated with mis-regulation of splicing factor levels. The spliceosome has therefore recently emerged as a target for the development of novel anti-cancer therapies.

What remains to be explored? Because of its highly dynamic and complex nature, an atomic level structure of the spliceosome remains an elusive goal. Nonetheless, much progress has recently been made by crystallizing subsets of spliceosomal components, including U1 and U4 snRNPs and the central core protein Prp8.
Other major questions concern the exact molecular mechanisms by which spliceosomes achieve high splicing accuracy while still allowing enough flexibility in splice site choice to permit alternative splicing. To answer these questions, new tools such as single-molecule microscopy, bioinformatics, and high-throughput methods for determining protein–protein, protein–RNA and RNA–RNA interaction dynamics are increasingly being developed and applied.
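
As a purely illustrative aside, the net outcome of the two chemical steps described above (excision of the intron and ligation of the flanking exons) can be mimicked on a plain sequence string. The sketch below is a toy model in Python that assumes the intron boundaries are already known. The function name `splice`, the example sequence, and the 0-based half-open coordinate convention are all hypothetical choices made for illustration; nothing here models splice site recognition, snRNP assembly, or lariat chemistry.

```python
from typing import List, Tuple


def splice(pre_mrna: str, introns: List[Tuple[int, int]]) -> Tuple[str, List[str]]:
    """Toy model of splicing: remove introns and join the flanking exons.

    `introns` holds 0-based, half-open (start, end) intervals on `pre_mrna`.
    Returns the joined exons and the list of excised intron sequences.
    This is an illustrative assumption, not a model of real spliceosome behavior.
    """
    exons = []
    excised = []
    cursor = 0  # position just after the last excised intron
    for start, end in sorted(introns):
        # Reject empty, out-of-range, or overlapping intervals.
        if start >= end or start < cursor or end > len(pre_mrna):
            raise ValueError(f"invalid intron interval: ({start}, {end})")
        exons.append(pre_mrna[cursor:start])   # exon upstream of this intron
        excised.append(pre_mrna[start:end])    # the intron itself
        cursor = end
    exons.append(pre_mrna[cursor:])            # final exon
    return "".join(exons), excised


# Hypothetical example: exons in upper case, intron in lower case.
exon1 = "AUGGCC"
intron = "guaaguacuaacuuuucag"
exon2 = "GAAUAA"
pre_mrna = exon1 + intron + exon2

# Compute the intron interval from the pieces rather than hard-coding it.
intron_span = (len(exon1), len(exon1) + len(intron))
mature, excised = splice(pre_mrna, [intron_span])

print(mature)   # joined exons
print(excised)  # excised intron(s)
```

Printing `mature` shows the two exons joined with no intron sequence between them, and `excised` holds the removed intron, which in a cell would be released as a lariat before being debranched and degraded.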