Quantitative Proteomics of the Cancer Cell Line Encyclopedia

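Keywords: Biology, Proteomics, Encyclopedia, Computational biology, Cancer, Quantitative proteomics, Bioinformatics, Genetics, Library science, Gene, Computer science
Authors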
David P. Nusinow,John Szpyt,Mahmoud Ghandi,Christopher M. Rose,E. Robert McDonald,Marian Kalocsay,Judit Jané‐Valbuena,Ellen Gelfand,Devin K. Schweppe,Mark P. Jedrychowski,Javad Golji,Dale Porter,Tomáš Rejtar,Yan Wang,Gregory V. Kryukov,Frank Stegmeier,Brian K. Erickson,Levi A. Garraway,William R. Sellers,Steven P. Gygi
Source
Journal: Cell [Cell Press]
Volume (issue): 180 (2): 387-402.e16 Citations: 817
Identifier
DOI:10.1016/j.cell.2019.12.023
Abstract

• Quantified the proteomes of 375 cell lines from diverse lineages in the CCLE
• Correlated expression of proteins across many pathways
• Downregulation of multiple protein complexes in microsatellite instability
• Protein complexes associated with sensitivity to gene knockdown and mutation

Proteins are essential agents of biological processes. To date, large-scale profiling of cell line collections including the Cancer Cell Line Encyclopedia (CCLE) has focused primarily on genetic information whereas deep interrogation of the proteome has remained out of reach. Here, we expand the CCLE through quantitative profiling of thousands of proteins by mass spectrometry across 375 cell lines from diverse lineages to reveal information undiscovered by DNA and RNA methods. We observe unexpected correlations within and between pathways that are largely absent from RNA. An analysis of microsatellite instable (MSI) cell lines reveals the dysregulation of specific protein complexes associated with surveillance of mutation and translation. These and other protein complexes were associated with sensitivity to knockdown of several different genes. These data in conjunction with the wider CCLE are a broad resource to explore cellular behavior and facilitate cancer research.

Proteins are the executors of the function encoded by a cell's genome. Although commonly used as a proxy for protein expression, on average RNA expression data predict protein expression poorly (Gygi et al., 1999; Liu et al., 2016). Unfortunately, generation of high-quality proteomics data has lagged behind RNA expression profiling. Recently, proteomic studies of several cancers have rediscovered many of the same subtypes found by gene expression, as well as new disease categorizations, highlighting the gains from studying the proteome (Mertins et al., 2016; Pozniak et al., 2016; Zhang et al., 2014; Zhang et al., 2016; Vasaikar et al., 2019).

The post-transcriptional mechanisms underlying the differences between protein and RNA expression are well enumerated. However, despite significant mechanistic understanding, there is less clarity about the global organization of gene and protein expression and where they differ. Correlated expression patterns in gene expression data are organized in large part around chromosomal location, driven by mechanisms such as transcription factor activity and chromosomal topology as set up by cellular and tissue identity (Caron et al., 2001; Dixon et al., 2016; Furlong and Levine, 2018; Hnisz et al., 2017). These patterns are reduced or absent in protein expression data (Grabowski et al., 2018; Kustatscher et al., 2017), leading to a model where post-transcriptional events buffer gene expression changes to create a new pattern of protein abundance. The degree to which this occurs is unclear and likely dependent on individual genes and the biological phenomena at play (Jovanovic et al., 2015; Liu et al., 2016). In contrast to RNA expression, protein expression is organized by protein interactions and subcellular localization (Dephoure et al., 2014; Gonçalves et al., 2017; Kustatscher et al., 2017; Lapek et al., 2017; Pozniak et al., 2016; Roumeliotis et al., 2017). Although these findings have appeared consistently, the extent to which they contribute to the organization of the proteome and whether other organizing principles are at work remain unknown.

Cancer cell lines are important model systems to study normal and aberrant cellular processes. The Cancer Cell Line Encyclopedia (CCLE) is an effort to generate large-scale profiling datasets across nearly 1,000 cell lines from diverse tissue lineages.
Its original release included gene expression, DNA copy numbers, and hybrid capture sequencing (Barretina et al., 2012). Recently, histone profiling, RNA-seq, DNA methylation, microRNA (miRNA) profiling, whole-genome sequencing, and metabolite profiling were added (Ghandi et al., 2019; Li et al., 2019). Associated drug and short hairpin RNA (shRNA) sensitivity screens increased the richness of data attached to the CCLE (Basu et al., 2013; Meyers et al., 2017; Tsherniak et al., 2017). With its latest release, the CCLE includes targeted protein quantification by reverse-phase protein arrays, but deep proteome profiling is absent (Ghandi et al., 2019). Although cell lines are popular models (Frejno et al., 2017; Gholami et al., 2013), no large-scale proteomics study of human samples across a diverse population as in the CCLE has been performed.

Cancer arises from mutation, but the character of that mutation differs between cancers (Lawrence et al., 2013). A subset of cancers, such as microsatellite instable (MSI) colorectal cancers, possess orders of magnitude more mutations than other tumors (Campbell et al., 2017; Lawrence et al., 2013; Cancer Genome Atlas Network, 2012). How a cancer proteome adapts to the negative selective effects of an extremely high mutation burden is unknown. Additionally, these tumors have increased levels of neoantigens, making them attractive for immune-oncology therapies (Baretti and Le, 2018). MSI is the dominant form of hypermutation present in the CCLE, and although the MSI proteome has been studied in colorectal cell lines and tumors (Halvey et al., 2013; Liu and Zhang, 2016), it has not been explored across tissue lineages.

Here, we have profiled 375 cell lines in the CCLE by mass spectrometry. All of the data are available at https://gygi.med.harvard.edu/publications/ccle and https://depmap.org.
We found that the primary variation in protein expression appears to be organized around biological pathways, and there were unexpected correlations between members of entirely different pathways. We leverage the data to better understand the effects of MSI on the proteome, finding substantial buffering of transcriptional effects. Exploring the relationship between genetics and protein complex amounts uncovered associations between protein complexes and sensitivity to gene knockdown and mutation. The addition of quantitative proteomics to the CCLE presents opportunities to understand the proteome in conjunction with the many other datasets present in the CCLE to improve our understanding of cancer and basic cellular biology.

We selected 375 cell lines from the CCLE for quantitative protein expression profiling (Tables S1 and S2). The cell lines were distributed among 22 lineages, dominated by solid organs (Figure 1A). The experiment used sample multiplexing by TMT10-plex reagents and the best available instrumentation. These technologies enabled good depth of coverage with a high degree of overlap between samples and uncompromised quantitation. At a 1% protein-level FDR, over 12,000 proteins were quantified among all samples and over 9,000 in a majority of samples (Figures 1B and S1C). Representation of categories was as expected, including good coverage of abundant proteins like the ribosome and incomplete coverage of lower abundance ones like transcription factors (Figure S1D).

The first two batches of nine samples were each prepared in biological triplicate, where the latter two replicates were grown one year after the first (Table S3). In all cases, triplicates clustered together, with the latter two replicates clustering more tightly (Figure 1C). The average correlation between replicate samples was 0.8 and between different cell lines was −0.05 (p < 2e−16). There was a median CV of 60% between biological replicate protein measurements within a cell line.
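
The replicate statistics quoted above (mean correlation between samples and median CV across biological replicates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the proteins-by-samples input layout, the use of linear-scale intensities for the CV, and the toy numbers are all hypothetical.

```python
import numpy as np

def median_replicate_cv(intensities):
    """Median coefficient of variation (%) across proteins.

    intensities: array of shape (n_proteins, n_replicates) on a linear
    scale (hypothetical layout, not the authors' file format).
    """
    x = np.asarray(intensities, dtype=float)
    mean = x.mean(axis=1)
    sd = x.std(axis=1, ddof=1)  # sample standard deviation across replicates
    return float(np.median(100.0 * sd / mean))

def mean_pairwise_correlation(profiles):
    """Mean Pearson correlation over all distinct pairs of samples (columns)."""
    x = np.asarray(profiles, dtype=float)
    r = np.corrcoef(x, rowvar=False)         # treat columns as samples
    upper = r[np.triu_indices_from(r, k=1)]  # each pair once, diagonal excluded
    return float(upper.mean())

# Toy data for illustration only: 3 proteins x 3 replicates.
toy = np.array([[10.0, 10.0, 10.0],
                [ 9.0, 10.0, 11.0],
                [ 8.0, 10.0, 12.0]])
toy_cv = median_replicate_cv(toy)

# Toy data: 4 proteins x 3 samples; sample 2 is anti-correlated with 0 and 1.
toy_profiles = np.array([[1.0, 2.0, 4.0],
                         [2.0, 4.0, 3.0],
                         [3.0, 6.0, 2.0],
                         [4.0, 8.0, 1.0]])
toy_r = mean_pairwise_correlation(toy_profiles)
```

In practice the same two functions would be applied separately to replicate groups and to pairs of different cell lines, which is how a within-line versus between-line comparison like the one reported above could be set up.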

There was visible, though incomplete, clustering by tissue lineage (Figures 1C and 2A). Protein expression among samples was highly variable but generally consistent with previous data. For example, ERBB2 (also known as HER2) is upregulated in a breast-derived line in the replicate dataset (Figure 1D). In the complete dataset, the pattern is complex, but ERBB2 is upregulated in several breast lines and is largely predicted by ERBB2 copy number (Figure 1E). Among the non-breast lines with the highest levels were many with already-reported high expression amounts (Ise et al., 2011; Kim et al., 2008; Mimura et al., 2005; Scott et al., 1993).

Figure S1. Numbers of Proteins Quantified Per Ten-Plex and Their Overlap, Related to Figure 1
(A) Summary statistics for the dataset.
(B) Numbers of proteins quantified in each ten-plex. The median level is denoted by the dashed line.
(C) Numbers of proteins quantified across each number of ten-plexes. Over 5,000 proteins were quantified in all 42 ten-plexes, and only 151 were quantified in a single ten-plex. The results shown in this plot are summarized in Figure 1B.
(D) Coverage of different sets of proteins. Bars are colored by fraction of proteins quantified in different numbers of samples. For example, nearly all mitochondrial proteins were quantified in all samples while about half of all transcription factors were quantified in at least one sample. No olfactory receptors were quantified.

Figure 2. Correlation between Protein and RNA Expression
(A) Hierarchical clustering using proteins quantified in all samples (left) and their corresponding RNA-seq expression (middle).
(B) Correlation between samples for protein expression (y axis) and RNA-seq (x axis). In all cases the most highly correlated RNA-seq sample to any given protein sample was the same cell line. Clusters of similarity for lymphoid lines and skin lines are highlighted in (A) and (B) with orange and purple asterisks, respectively.
(C) Per-gene Pearson correlation between protein and RNA expression for all proteins quantified. Mean correlation is 0.48 (dashed line). The locations of several cancer-related genes are shown.
(D) Examples of the RNA and protein expression for both low (left) and high (right) correlating genes. See also Table S4.

Hierarchical clustering had some coherency on the basis of tissue lineage (Figure 2A, left). We quantified this by using Gini purity, a measure of clustering specificity. Our clustering had a mean Gini purity of 0.46, where 1.0 would be perfect clustering by lineage. The RNA data showed similarly complex clustering (Figure 2A, center). In both cases, skin and hematopoietic-lymphoid lineages clustered more tightly with themselves than other lineages (Figures 2A and 2B, purple and orange asterisks, respectively), which differed substantially from the clusters recently reported from reverse phase protein array (RPPA) data (Li et al., 2017). Although the protein data had slightly less lineage coherency than RNA (mean Gini purity of 0.6 in the RNA), both showed incomplete clustering. Examining the correlation of protein and RNA expression by sample, in all cases the protein data were most highly correlated with the corresponding RNA data from the same cell line, providing additional confidence in our results (Figure 2B).

The diversity and depth of these data allowed us to calculate the RNA/protein correlation of individual genes. Here, the correlations between RNA and protein expression varied widely, averaging at just under 0.5, in line with previous studies (Figures 2C and 2D; Table S4) (Edfors et al., 2016; Roumeliotis et al., 2017; Zhang et al., 2014). For some genes, RNA expression was a good proxy for protein amounts (e.g., EGFR), whereas for others it provided little information (e.g., BRAF) (Figure 2D). We examined the consistency of high or low correlations between RNA and protein amounts by using Gene Set Enrichment Analysis (GSEA) (Subramanian et al., 2005) (Table S4). Hundreds of pathways and gene ontology (GO) categories had higher or lower than expected RNA/protein correlations. Those with the consistently highest correlations were epithelial mesenchymal transition and various cell-surface-protein-related pathways associated with the epithelia. Among those with the consistently lowest correlations were gene sets with notable protein complexes. Several dozen transcription factors showed sets of targets with high RNA/protein correlation and none with consistently low correlation.

The results of hierarchical clustering in Figure 2A were reflected in the principal component analysis (PCA) projection of the same data. There, the hematopoietic and lymphoid lineages segregated from the solid organ lineages, making up a large part of the PC1 projection (Figure 3A). By themselves, the hematopoietic and lymphoid lines separated by PCA (Figure S2A). Thus, these cell lines are significantly different from both the solid-organ-derived lineages and from each other. Following previous work (Barretina et al., 2012), we therefore removed the hematopoietic and lymphoid lines from further analyses.

Figure S2. Features Associated with Pathway Co-expression of the Proteome, Related to Figures 3 and 4
(A) PCA of the hematopoietic and lymphoid lineages alone shows them mostly segregating from each other based on the first two principal components.
(B) Heatmap of large protein complexes found to be enriched in the protein PC1. Axes and annotations are the same as in Figure 3C.
(C) Scatterplot of solid organ lineage cell lines comparing PC1 projection (x axis, same as in Figure 3B) with total mutational burden of uncommon alleles. The linear regression along with 95% confidence interval is plotted.
(D) The top mutations predicting the PC1 projection of a cell line were selected by elastic net. The top 12 excluding TP53 (which has mutations in most cell lines) are plotted. Each subpanel is a histogram where cell lines are binned along their PC1 projections. If a cell line has an uncommon allele in the gene of interest it is plotted as a stacked bar in that bin. The bar is colored by tissue of origin. For example, VHL mutations are predominantly in kidney lines (orange) and cluster in the negative PC1 projections. Dashed black lines denote the 0 point along PC1, and the dashed blue lines are the gene-specific median.
(E) Expression levels of RNA-seq (left) and protein (right) for various transcription factor targets. Transcription factors targeting these proteins are annotated on the bottom according to the legend. Samples are arranged along the PCA PC1 projection, similarly to Figures 3 and 4 but performing the PCA on the RNA-seq data instead of the protein data, resulting in a different ordering of samples. Correlation between RNA and protein is annotated on the bottom of the protein panel.
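
The per-gene Pearson correlation between RNA and protein expression described for Figure 2C can be illustrated with a short sketch. It assumes two matrices with genes as rows and cell lines as columns in matching order, with NaN marking unquantified values. The function name, the `min_samples` cutoff, and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np

def per_gene_correlation(protein, rna, min_samples=3):
    """Pearson r between protein and RNA for each gene (row).

    protein, rna: arrays of shape (n_genes, n_cell_lines) with matching
    row and column order; NaN marks a missing measurement.
    min_samples is a hypothetical cutoff, not a value from the paper.
    """
    protein = np.asarray(protein, dtype=float)
    rna = np.asarray(rna, dtype=float)
    out = np.full(protein.shape[0], np.nan)
    for i, (p, r) in enumerate(zip(protein, rna)):
        ok = ~np.isnan(p) & ~np.isnan(r)        # cell lines with both values
        if ok.sum() < min_samples:
            continue                            # too few points: leave NaN
        p_ok, r_ok = p[ok], r[ok]
        if p_ok.std() == 0 or r_ok.std() == 0:
            continue                            # correlation undefined
        out[i] = np.corrcoef(p_ok, r_ok)[0, 1]
    return out

# Toy matrices for illustration only: 3 genes x 4 cell lines.
protein = np.array([[1.0, 2.0, 3.0, 4.0],
                    [1.0, 2.0, 3.0, np.nan],
                    [4.0, 3.0, 2.0, 1.0]])
rna = np.array([[2.0, 4.0, 6.0, 8.0],
                [3.0, 2.0, 1.0, 5.0],
                [1.0, 2.0, np.nan, np.nan]])
gene_r = per_gene_correlation(protein, rna)
mean_r = float(np.nanmean(gene_r))  # summary analogous to a mean per-gene correlation
```

Restricting each gene to cell lines where both measurements exist is one simple way to handle incomplete proteome coverage; other choices (imputation, stricter cutoffs) would change the resulting distribution.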