AraENCODE: A comprehensive epigenomic database of Arabidopsis thaliana

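Epigenetics, Arabidopsis thaliana, Epigenome, Chromatin, Biology, Computational biology, Genome, Histone, Genomics, Genetics, DNA methylation, Gene, Gene expression, Mutant
Authors
Zhenji Wang, Minghao Liu, Fuming Lai, Qiangqiang Fu, Liang Xie, Yaping Fang, Qiangwei Zhou, Guoliang Li
Source
Journal: Molecular Plant [Elsevier BV]
Volume (issue): 16 (7): 1113-1116. Cited by: 2
Identifier
DOI: 10.1016/j.molp.2023.06.005
Abstract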

Arabidopsis thaliana is an important model organism in plant biology and genetics. The genome of Arabidopsis ecotype Columbia-0 has been sequenced and completely annotated (Cheng et al., 2017), which facilitates plant genomics research. Over the past two decades, advances in gene regulation studies have elucidated a spectrum of epigenetic molecular phenomena, including DNA methylation, histone modification, chromatin accessibility, and chromatin interaction, which collectively form an additional layer of information built on the DNA sequence (Law and Jacobsen, 2010). The epigenomic landscape of Arabidopsis has been characterized as more high-throughput analyses were developed (Zhao et al., 2022). In-depth studies of gene expression regulation rely heavily on such epigenomic information. However, using this knowledge is challenging for groups that lack bioinformatics analysts or adequate computing resources.
Here, we developed a comprehensive epigenomic database for Arabidopsis (AraENCODE, http://glab.hzau.edu.cn/AraENCODE/), which comprises a total of 4511 sample accessions from the Sequence Read Archive, Gene Expression Omnibus, Genome Sequence Archive, and other open-access databases, including histone modification, chromatin accessibility, DNA methylation, transcriptome, and chromatin interaction datasets from different tissues in wild type or mutants (Figure 1A). The sources and distribution of the datasets are displayed in detail in the "data statistics" module of the website (http://glab.hzau.edu.cn/AraENCODE/pages/datasets.html) (Supplemental Figure 1). We downloaded raw data from various library types, encompassing chromatin immunoprecipitation sequencing (ChIP-seq), assay for transposase-accessible chromatin sequencing (ATAC-seq), DNase I hypersensitive sites sequencing (DNase-seq), micrococcal nuclease digestion with deep sequencing (MNase-seq), formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq), chromosome conformation capture methods (Hi-C, HiChIP), bisulfite sequencing (BS-seq), RNA sequencing (RNA-seq), and non-coding RNA-seq (Supplemental Table 1), and subsequently reprocessed them using a standardized pipeline tailored to each data type (Figure 1B). Quality control metrics for each data type are provided in the "quality control" module. To facilitate comparisons between datasets, all datasets were aligned to the TAIR10 genome and visualized using the WashU Epigenome Browser (Li et al., 2022), which helps users compare different tracks flexibly.
Such a database provides a comprehensive view of the relationships between gene expression and the epigenetic landscape across various tissues and genotypes in Arabidopsis. Furthermore, it has the potential to facilitate our understanding of epigenetic mechanisms in higher plants. By function, AraENCODE is divided into the following parts: "quick search," "WashU browser," "histone modification," "3D genome," "open chromatin," "chromatin state," "DNA methylation," "transcriptome," "WT/mutant," "datasets," "download," "analysis pipeline," and "tutorial." In AraENCODE (Figure 1C), users can retrieve epigenomic information by querying a gene ID or a genome region. On the homepage, users can query a gene ID and fetch all levels of epigenomic information related to this gene simultaneously (Supplemental Figure 2A). The quick search results are intended to give researchers a concise overview of the epigenomic information pertaining to the gene of interest. The quick search page contains a WashU browser window with several tracks: the seven types of histone modifications and DNA methylation of the Arabidopsis genome, as well as chromatin states and single-nucleotide polymorphism (SNP) information (Supplemental Figure 2B); specific protein-mediated chromatin loops, characterized using 3C-based methods (e.g., Hi-C and HiChIP), that bring regulatory elements (e.g., enhancers) or SNPs physically close to their target genes, together with histone modification and chromatin accessibility information (Supplemental Figure 2C); and profiles of cross-tissue differential expression and differential methylation levels (Supplemental Figures 2D and 2E). For more detail, users can navigate to the corresponding page for in-depth exploration.
Histones are often covalently modified to affect various chromatin-dependent processes, including gene transcription. Within the histone modification page, users can search for genes or chromosome regions and retrieve information on the various histone modifications and their genome-wide distribution in different samples (Figure 1D). AraENCODE also provides information on chromatin accessibility, which is important for establishing and maintaining cell identity and can be characterized by ATAC-seq, FAIRE-seq, DNase-seq, and MNase-seq. This information can be viewed in a table or visualized through the WashU Epigenome Browser. As a validation, we examined the abundance of histone modifications at well-documented genes in our database, such as the flowering time regulatory gene AT5G10140 (also known as FLC or RSB6). Previous studies have indicated that overexpression of COLDAIR (a long non-coding RNA) in 35S:COLDAIR correlates with down-regulation of the repressive histone mark H3K27me3 and up-regulation of the active histone mark H3K4me3 at the FLC gene, which up-regulates FLC expression and ultimately affects flowering (Liu et al., 2020). The abundance of these histone modifications and the gene expression levels in our database are highly consistent with previous reports (Supplemental Figure 3).
DNA methylation (5-methylcytosine), a stable epigenetic mark, underpins the landscape of histone modifications in Arabidopsis (Zhao et al., 2022).
Three sequence contexts of DNA methylation (CG, CHG, and CHH) are catalyzed by different methyltransferases and occur interdependently (Law and Jacobsen, 2010). On the "DNA methylation" page, users can quickly browse methylation levels across tissues (Figure 1F). The gene search module enables users to retrieve DNA methylation levels in the three sequence contexts (CG, CHG, and CHH) in gene body or promoter regions (Supplemental Figures 4A–4D) and browse them at single-base resolution across all samples (Supplemental Figure 4E) (Zhou et al., 2021). As an example, we investigated the expression and methylation levels of two genes, AT3G50770 (CML4) and AT5G43260, in three mutants with different degrees of demethylation, met1, ddcc (drm1 drm2 cmt2 cmt3), and mddcc (met1 drm1 drm2 cmt2 cmt3), as well as in the wild type (Columbia-0). We found that the expression of AT3G50770 is up-regulated in the mutants (Supplemental Figure 5A), especially in the mddcc mutant, in which DNA methylation in all contexts is eliminated. Meanwhile, the expression of AT5G43260 does not change significantly (Supplemental Figure 5B). These results are consistent with previous reports (Naydenov et al., 2015).
Chromatin states, which reflect genome activity and transcriptional regulation in eukaryotes (Roudier et al., 2011), are mainly determined by histone modification and DNA methylation. To train the model and achieve precise classification of chromatin states (Zhao et al., 2022), we used ChromHMM with six types of modifications, RNA polymerase II occupancy, and methylation across diverse varieties. The Arabidopsis genome can be segmented into 12 chromatin states with distinct percentages of genome coverage. On the chromatin states page, users can select different varieties and search for a gene or region to obtain chromatin state information at a resolution of 200 base pairs (Figure 1H).
Previous studies have shown that gene expression can be regulated by the 3D structure of chromatin (Zhang et al., 2019). In AraENCODE, we collected Hi-C, capture Hi-C, and HiChIP datasets to build the "3D genome" module, which shows chromatin interactions in different samples.
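As a rough illustration of what the 200-base-pair chromatin-state resolution mentioned above means for a region query, the following Python sketch maps a queried region onto fixed-width bins and tallies how many base pairs fall in each state. The function names, state labels, and toy data are hypothetical. This is not the AraENCODE or ChromHMM implementation; it only assumes that states are stored as one label per consecutive 200-bp bin.

```python
from typing import Dict, List

BIN_SIZE = 200  # bp; the resolution stated in the text


def bin_index(position: int, bin_size: int = BIN_SIZE) -> int:
    """Map a 1-based genomic coordinate to its 0-based bin index."""
    if position < 1:
        raise ValueError("genomic positions are 1-based")
    return (position - 1) // bin_size


def states_in_region(states: List[str], start: int, end: int,
                     bin_size: int = BIN_SIZE) -> Dict[str, int]:
    """Count base pairs per state within [start, end] (1-based, inclusive)."""
    if start > end:
        raise ValueError("start must not exceed end")
    covered: Dict[str, int] = {}
    for idx in range(bin_index(start, bin_size), bin_index(end, bin_size) + 1):
        bin_start = idx * bin_size + 1
        bin_end = (idx + 1) * bin_size
        # Only the part of the bin inside the queried region is counted.
        overlap = min(end, bin_end) - max(start, bin_start) + 1
        covered[states[idx]] = covered.get(states[idx], 0) + overlap
    return covered


# Hypothetical per-bin labels (toy data, not taken from AraENCODE).
toy_states = ["S1", "S1", "S5", "S12"]
print(states_in_region(toy_states, 150, 450))
```

With bins of this width, a query that starts or ends inside a bin still reports that bin's state, which is why the reported state boundaries cannot be finer than 200 bp.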
Interaction matrices from Hi-C and capture Hi-C are visualized using the HiGlass browser, enabling the examination of global variations among different samples. Hi-C and capture Hi-C data are also used to reconstruct 3D structures, detect chromatin loops, and perform compartment analyses. These results can be viewed in the WashU browser. HiChIP was developed to detect and quantify chromatin contacts anchored at genomic regions associated with specific DNA-binding proteins or histone modifications, similar to ChIA-PET (Fullwood et al., 2009; Mumbach et al., 2016). The HiChIP workflow includes the following steps: cell lysis and permeabilization, in situ restriction enzyme digestion, biotin labeling and in situ proximity ligation, ChIP (chromatin shearing, IP, and wash), and library preparation. By integrating in situ Hi-C and ChIP, HiChIP can detect long-range chromatin contacts mediated by or associated with specific protein factors at kilobase-scale resolution with significantly reduced sequencing costs compared with in situ Hi-C (Fullwood et al., 2009; Rao et al., 2014). Using HiChIP in Arabidopsis, Huang and colleagues showed that H3K27me3 is a key regulator of global and local facultative heterochromatin topology and is tightly linked to 3D organization (Huang et al., 2021). To detect significant chromatin interactions, HiChIP datasets in AraENCODE were reprocessed using ChIA-PET Tool (v.3), a computational package designed for processing sequence data derived from ChIA-PET or HiChIP experiments (Sun et al., 2019). Interaction networks, constructed from chromatin loops, reveal gene–gene or gene–enhancer interactions (an example involving gene AT5G10140, also known as FLC or RSB6, is shown). In the "3D genome" module, users can query a gene or region, especially phenotype-associated genome-wide association study (GWAS) SNPs, to find its target loci and visualize these interactions using the built-in browser (Figure 1E). An interaction network is also provided for users to examine the loops connecting genes and regulatory elements (Figure 1E).
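The SNP-to-target query described above can be pictured as an interval lookup over loop anchors. The Python sketch below is a minimal illustration under stated assumptions: the `Loop` record, the `partner_anchors` function, and the toy coordinates are all hypothetical, loops are assumed to be intra-chromosomal, and the default cutoff of 10 simply mirrors the PET count ≥ 10 threshold the authors quote for their FLC analysis. It does not reproduce the database's actual query logic.

```python
from typing import List, NamedTuple, Tuple


class Loop(NamedTuple):
    """One chromatin loop: two anchors on the same chromosome plus its support."""
    chrom: str
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    pet_count: int


def partner_anchors(loops: List[Loop], chrom: str, pos: int,
                    min_pet: int = 10) -> List[Tuple[int, int]]:
    """Return the opposite anchor of every well-supported loop covering pos."""
    partners = []
    for loop in loops:
        if loop.chrom != chrom or loop.pet_count < min_pet:
            continue  # wrong chromosome or too few supporting read pairs
        if loop.a_start <= pos <= loop.a_end:
            partners.append((loop.b_start, loop.b_end))
        elif loop.b_start <= pos <= loop.b_end:
            partners.append((loop.a_start, loop.a_end))
    return partners


# Toy loops with made-up coordinates (not real AraENCODE data).
toy_loops = [
    Loop("chr1", 1000, 2000, 8000, 9000, 12),
    Loop("chr1", 1500, 2500, 20000, 21000, 4),   # below the PET cutoff
    Loop("chr1", 30000, 31000, 1800, 2200, 25),
]
print(partner_anchors(toy_loops, "chr1", 1900))
```

A linear scan is enough for a handful of loops; for genome-wide queries an interval tree or a sorted index would be the more usual choice.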
We investigated the SNPs associated with FLC expression in AraGWAS (1241 SNPs in total, 623 within the FLC gene body and 618 outside it) (Supplemental Table 2). Of the SNPs outside the FLC gene body, 91% (563/618) are located within anchors (PET count ≥ 10) that interact with FLC (Togninalli et al., 2019), including many SNPs with high scores in AraGWAS (e.g., chr5:3 181 599 G>C and chr5:3 181 811 C>A). The SNPs overlapping the interaction regions were concentrated in two regions (chr5:3 168 751–3 174 117 and chr5:3 180 175–3 189 848) involving the genes AT5G10120, AT5G10130, and AT5G10150; detailed interaction networks are provided in Supplemental Figure 6 for a more comprehensive view. These findings suggest a potential mechanism by which these extragenic SNPs contribute to the regulation of FLC expression. To our knowledge, this is the most comprehensive resource for Arabidopsis 3D genome data to date.
AraENCODE also collected high-throughput transcriptomic data from various tissues and includes expression levels of specific genes as well as miRNAs (Figure 1H). Although RNA-seq databases dedicated to Arabidopsis already exist, such as ARS (Zhang et al., 2020), we believe that incorporating RNA-seq data into our database gives users a more convenient means of integrating transcriptomic and epigenomic information for further analysis.
As Arabidopsis high-throughput data accumulate, many excellent web-based resources have been developed. For example, the AraGWAS Catalog (Togninalli et al., 2019) focuses on genotype–phenotype associations, ARS (Zhang et al., 2020) collects high-throughput transcriptome data, and ChIP-Hub focuses on regulome data of plants (Fu et al., 2022). However, to our knowledge, AraENCODE is the first comprehensive, user-friendly, and sustainably maintained database for the epigenome and 3D genome in Arabidopsis.
AraENCODE provides a comprehensive map of the Arabidopsis epigenetic landscape, facilitating rapid access to the epigenomic information, chromatin states, and 3D interactions associated with GWAS SNPs and their target genes, which could further improve our knowledge of gene regulation mechanisms. In the future, with the rapid generation of high-throughput data, we will regularly integrate additional resources and valuable analytical tools for Arabidopsis epigenomics and 3D genomics. The source code of the data analysis pipeline is available from the GitHub repository (https://github.com/versarchey/AraENCODE-pipeline). This work was supported by the Fundamental Research Funds for the Central Universities (2662021PY005 and 2662023PY002 to G.L.).