T2T genome, pan‐genome analysis, and heat stress response genes in Rhododendron species

Genome, Gene, Biology, Heat stress, Genetics, Genome size, Animal science
Authors
Xiaojing Wang,Ping Zhou,Xiaoyu Hu,Yun Bai,Chenhao Zhang,Yanhong Fu,Ruirui Huang,Suzhen Niu,Xiaoming Song
Source
Journal: iMeta [Wiley]
Volume (issue): 4 (2)
Identifier
DOI:10.1002/imt2.70010
Abstract

This study reports the first high-quality telomere-to-telomere (T2T) Rhododendron liliiflorum genome, in which 11 of the 13 chromosomes are gap-free. In total, 24 telomeres and all 13 centromeres were detected in this genome, which reached the gold level, the highest quality category. In addition, three other Rhododendron species were sequenced and assembled to the chromosome level. Based on 15 Rhododendron genomes, we conducted a pan-genome analysis of the genus Rhododendron. Combining genome and whole-transcriptome sequencing, we identified several key genes and miRNAs related to heat stress, which were further verified by transgenic experiments. Our findings provide rich resources for comparative and functional genomics studies of Rhododendron species.
Rhododendron, a member of the Ericaceae, is one of the largest genera of woody plants. There are approximately 1000 Rhododendron species worldwide, and China is an important distribution center [1]. They underwent evolutionary radiations in the Himalaya-Hengduan Mountains, one of the world's biodiversity hotspots [2]. Rhododendron species are highly prized in horticulture for their ornamental value. Global climate change is raising temperatures, and heat stress can affect the growth and development of plants [3, 4]. Rhododendron plants, however, are typically adapted to cooler climates. Multi-omics analysis and molecular techniques can be used to explore the heat stress response mechanism, which is of great significance for breeding heat-tolerant varieties and expanding the Rhododendron cultivation range. Although several genomic studies have been conducted separately on Rhododendron species, high-quality telomere-to-telomere (T2T) genomes and large-scale pan-genome analyses of Rhododendron are still lacking, which limits our understanding of genetic diversity and hampers gene mining [5-14]. The T2T genome can provide more complete and comprehensive genomic information for a species [15].
Therefore, this study aimed to resolve the first high-quality T2T Rhododendron genome. Fifteen Rhododendron genomes were then used for pan-genome analysis, which identified numerous structural variations (SVs) and provides rich resources for mining important functional genes and for molecular breeding of Rhododendron.
Here, we performed de novo genome sequencing of four Rhododendron plants (Rhododendron liliiflorum, Rhododendron decorum, Rhododendron platypodum, and Rhododendron concinnum) using PacBio HiFi, Oxford Nanopore Technology (ONT), Illumina, and Hi-C technologies (Figure 1A, Tables S1–S6). The genome sizes estimated by k-mer analysis were 759.08, 581.05, 593.47, and 1356.22 Mb for R. liliiflorum, R. decorum, R. platypodum, and R. concinnum, respectively, and were further verified by flow cytometry (Table S1, Figure 1B). The R. concinnum genome is almost twice as large as those of the other three species. We therefore analyzed the chromosome karyotype using flow cytometry and discovered, for the first time, that R. concinnum is a tetraploid with a karyotype of 2n = 4x = 52, distinctly different from the other three diploid species (2n = 2x = 26) (Figure 1C, Table S1). The assembled genome sizes were 793.25, 649.87, 652.27, and 1321.11 Mb for the four species (Table S1). The chromosome anchoring rate achieved with Hi-C exceeded 97.90% in all four species (Figure 1D, Table S5). We obtained four high-quality assembled genomes with scaffold N50 values over 48.68 Mb (Table S6). Core Eukaryotic Genes Mapping Approach (CEGMA) values ranged from 95.63% to 99.56%, Benchmarking Universal Single-Copy Orthologs (BUSCO) values ranged from 96.65% to 97.34%, and read mapping rates exceeded 99.40% (Table S7).
Most importantly, we obtained a high-quality T2T R. liliiflorum genome consisting of 13 chromosomes, with 24 telomeres and 13 centromeres detected (Figure S1A, Tables S8–S11). Eleven of the chromosomes are gap-free from telomere to telomere, and the other two chromosomes have only one gap.
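For readers unfamiliar with the N50 statistic quoted above, the following minimal Python sketch shows one common way to compute it: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not the authors' code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # half of the assembly is now covered
            return length


# Hypothetical lengths in Mb, chosen only for illustration (not data from the study).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

The same function applies to contigs or to scaffolds; only the input list changes.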
The contig N50 of the R. liliiflorum genome was over 58.56 Mb, larger than that of most previous Rhododendron genomes [1, 6, 7, 12]. Genome completeness assessed by BUSCO was 96.65%, and the genome consensus quality value (QV) was 43.71 (Table S1). The genome LTR assembly index (LAI) was 21.15 (Figure S1B), indicating that it reached the gold level, the highest quality category (LAI ≥ 20) [16]. Repetitive sequences accounted for over 49.10% of the sequence in all four genomes, and most repetitive sequences were long terminal repeats (LTRs) (Figure 1E, Table S11). A total of 41,406, 41,084, 40,556, and 83,203 genes were predicted in the four genomes (Table S12). Over 97.15% of BUSCO genes were detected, indicating highly complete gene prediction (Table S13). Over 92.16% of genes were annotated using the NR, eggNOG, GO, KEGG, TrEMBL, KOG, Swiss-Prot, and Pfam databases (Table S14). A total of 2355, 4862, 2852, and 9511 noncoding RNAs were detected in the four species (Table S15).
The genus Rhododendron, renowned for its diverse floral displays, has drawn significant scientific attention, and several genomes have been decoded since the first R. delavayi genome was released [13]. Reported Rhododendron genomes include R. griersonianum [11], R. henanense [9], R. irroratum [6], R. kiyosumense [10], R. ripense [10], R. vialii [8], R. nivale [5], and R. williamsianum [12]. These genomes are laying the groundwork for pan-genome studies [17-20]. Based on the four high-quality genomes reported here, along with 11 previously published genomes, we conducted a pan-genome analysis of the genus Rhododendron (Figure 1F, Table S16). The T2T-level R. liliiflorum genome was selected as the reference. The resulting super-pangenome expanded the T2T-level R. liliiflorum genome by 394.57 Mb and 14,424 genes. The 15 species contain 45,731 gene families, comprising 5734 core gene families, 37,027 dispensable gene families, and 2970 private gene families (Figures 1G and S2A, Table S17).
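The text does not spell out the thresholds behind the core, dispensable, and private categories. The sketch below assumes the usual convention (core: present in all species; private: present in exactly one; dispensable: everything in between) and also checks that the three reported counts add up to the reported total. The function name, family IDs, and species labels are hypothetical.

```python
def classify_family(n_present, n_species=15):
    """Label a gene family by the number of species in which it occurs.

    Assumed convention, not confirmed by the source:
    core = all species, private = exactly one species, dispensable = the rest.
    """
    if not 1 <= n_present <= n_species:
        raise ValueError("n_present must be between 1 and n_species")
    if n_present == n_species:
        return "core"
    if n_present == 1:
        return "private"
    return "dispensable"


# Hypothetical presence table: family ID -> set of species carrying the family.
toy_presence = {
    "fam_A": {"sp1", "sp2", "sp3"},
    "fam_B": {"sp1", "sp3"},
    "fam_C": {"sp2"},
}
toy_labels = {
    family: classify_family(len(species), n_species=3)
    for family, species in toy_presence.items()
}
print(toy_labels)

# Consistency check on the counts reported in the text.
reported = {"core": 5734, "dispensable": 37027, "private": 2970}
print(sum(reported.values()) == 45731)
```

Some pan-genome studies add a "soft-core" class for families missing from only a few genomes; that variant would need one extra threshold in `classify_family`.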
An UpSet plot was used to show how gene families are shared among, or unique to, the 15 species. We then constructed a presence/absence distribution map of gene families based on clustering analysis (Figure 1H). Among the 2970 private gene families, R. irroratum had the most species-specific genes (1705) (Table S17, Figure S2B). Functional enrichment analysis indicated that the "Sesquiterpenoid and triterpenoid biosynthesis" and "Linoleic acid metabolism" pathways were significantly enriched (Figure S3). A total of 121,185 core genes were identified, with R. ovatum having the highest number (9847) (Figure S2B). Functional enrichment analysis indicated that pathways related to flower color and fragrance, such as limonene and pinene degradation, were significantly enriched (Figure S4).
We performed a comprehensive identification of variations, including single nucleotide polymorphisms (SNPs), insertions and deletions (InDels), and SVs, in Rhododendron based on the pan-genome analysis, using the T2T genome as the reference (Figures 1I–L and S5). The tetraploid R. concinnum had the highest numbers of SNPs (1,876,446) and InDels (447,281) (Figure 1I, Tables S18–S19). Functional enrichment analysis showed that genes containing SNPs and InDels were significantly enriched in the "Carbon metabolism" and "Biosynthesis of amino acids" pathways. R. concinnum also had the highest number of SVs (7694) (Table S20). We further subdivided SVs into duplications (DUP), translocations (TRANS), and inversions (INV), and found that duplications outnumbered the other two types in most Rhododendron species (Figures S6–S7). Genes with SVs showed a distinct enrichment pattern compared with those carrying SNPs or InDels, concentrating in the RNA polymerase and mRNA surveillance pathways.
We identified 70,759 LTRs across the genomes of the 15 Rhododendron species, and R. griersonianum had the highest number (7323) (Table S21).
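The source does not state how LTR insertion times were estimated. A widely used approach, sketched here purely as an assumption, compares the two LTRs of one element: they are identical at insertion, so their divergence K gives an age of T = K / (2r), where r is the substitution rate per site per year. The rate constant, function names, and toy sequences below are placeholders and are not taken from the study.

```python
import math


def p_distance(ltr_a, ltr_b):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if not ltr_a or len(ltr_a) != len(ltr_b):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(ltr_a.upper(), ltr_b.upper()))
    return mismatches / len(ltr_a)


def jc69_distance(p):
    """Jukes-Cantor correction of an observed proportion of differences."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time_mya(ltr_a, ltr_b, rate):
    """Estimate insertion time in million years as K / (2 * rate)."""
    k = jc69_distance(p_distance(ltr_a, ltr_b))
    return k / (2.0 * rate) / 1e6


# Placeholder rate in substitutions per site per year; an assumption, not from the study.
ASSUMED_RATE = 1.3e-8

# Toy 100-bp LTR pair differing at a single site (hypothetical sequences).
five_prime = "ACGT" * 25
three_prime = "TCGT" + "ACGT" * 24
print(insertion_time_mya(five_prime, three_prime, ASSUMED_RATE))
```

Because the estimate scales inversely with the rate, any absolute date depends directly on the value chosen for r.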
We found that most Rhododendron species experienced only one burst of LTR insertion during the last 1 million years, whereas R. delavayi, R. molle, and R. williamsianum experienced two bursts. The two events in R. williamsianum occurred at 1.53 and 2.94 million years ago (Mya), earlier than in the other Rhododendron species. We clustered the LTRs of the 15 species to obtain the shared LTRs within each cluster. The results showed that 2622 LTRs could be clustered, with R. platypodum having the highest number (531). R. liliiflorum had the highest number of species-specific LTRs (109), while none were found in R. williamsianum. The clustering diagram showed that R. williamsianum had the highest proportion of LTRs shared with other species (Figure 1M). Furthermore, the density of LTRs was greater in the middle of the chromosomes than at the two ends (Figure 1N).
Collinearity analysis showed that the 15 Rhododendron genomes generally exhibit good collinearity (Figure 1O). The number of collinear blocks ranged from 336 (R. henanense vs. R. delavayi) to 692 (R. irroratum vs. R. prattii). Some genomic transpositions were also detected, such as in the terminal regions of chromosome 7 in R. ovatum compared with R. simsii and R. henanense.
To explore heat-resistance genes and regulatory mechanisms in Rhododendron, we conducted whole-transcriptome sequencing under control conditions (CK) and after 3 days (H3) and 6 days (H6) of heat treatment (Figure 2A, Table S22). A total of 50,648 mRNAs, 17,476 lncRNAs, 448 miRNAs, and 6299 circRNAs were identified (Figure 2B). Furthermore, 632 mRNAs, 21 lncRNAs, and 6 miRNAs were differentially expressed and shared among the CK, H3, and H6 treatments (Figure 2C). We selected two representative pairs of miRNAs and their target genes for functional verification because of their response to heat treatment.
Expression of the target genes was significantly upregulated at 3 and 6 h after heat treatment, while expression of the small RNAs was significantly downregulated. We further investigated the effects of miR177 on RdbHLH153 (Rhdel02G0118700) expression and of miR49 on RdMYB1R1 (Rhdel08G0208700) expression. Firefly luciferase was fused to the C-terminus of RdbHLH153 and of RdMYB1R1, and miR49 and miR177 were separately inserted into SK vectors (Figure 2D). The results showed that the target sites of miR177 in RdbHLH153 and of miR49 in RdMYB1R1 were slightly altered (Figure 2E). Areas within single Nicotiana benthamiana leaves were infiltrated with mixtures of RdbHLH153 or RdMYB1R1 and the empty SK vector, or with mixtures of RdbHLH153 and miR177 (or RdMYB1R1 and miR49). The mixtures containing the empty SK vector induced luciferase signals, whereas overexpressed R. delavayi miR177 and miR49 abolished the signals produced by RdbHLH153 and RdMYB1R1 (Figure 2E–G).
To further investigate the roles of RdbHLH153 and RdMYB1R1 in heat stress, transgenic Arabidopsis lines overexpressing RdbHLH153 and RdMYB1R1 were generated using the floral-dip method (Table S23). After 36 h of heat treatment, the transgenic plants grew significantly better than the wild type (WT) (Figure 2H). When the seedlings were returned to normal conditions for 5 days after heat treatment, the transgenic plants recovered, whereas all leaves of the WT plants turned yellowish. This result indicated that RdbHLH153 and RdMYB1R1 play important roles in enhancing heat tolerance. DAB and NBT staining revealed that the contents of H2O2 and O2− in the RdbHLH153/RdMYB1R1-OE lines were significantly lower than in WT plants (Figure 2I).
Through genome sequencing of nine Rhododendron species, researchers have revealed the molecular mechanisms underlying the formation of flower color diversity [1]. In addition, several Rhododendron genomes have been reported, such as R. griersonianum [11], R. henanense [9], R. irroratum [6], R. kiyosumense [10], R. ripense [10], R. vialii [8], R. nivale [5], and R. williamsianum [12]. These genomes and related databases are laying the groundwork for a more comprehensive understanding of comparative and functional genomics [17-20]. Although several Rhododendron genomes have been sequenced, none has reached the T2T level, and no large-scale pan-genome analysis of Rhododendron has used a T2T genome as the reference. Here, the R. liliiflorum genome was deciphered at the T2T level. Compared with previously released genomes, we obtained a higher-quality and more complete Rhododendron genome. Its contig N50 was over 58.56 Mb, larger than that of most previous Rhododendron genomes [1, 6, 7, 12]. Furthermore, large-scale pan-genome analysis of 15 Rhododendron genomes identified numerous SVs, which can be used to understand the genetic diversity behind the various morphotypes.
In conclusion, we report the first high-quality T2T R. liliiflorum genome. Genome sequencing of three other Rhododendron species was completed, and R. concinnum was found to be tetraploid. Pan-genome analysis of 15 Rhododendron genomes detected structural variations, and several key heat-stress-related genes were identified.
Methods (section titles): Genome sequencing, genome size estimation, and chromosome karyotype analysis; Data quality control and de novo genome assembly; Hi-C data processing and assisted genome assembly; Telomere-to-Telomere (T2T) genome analysis; Genomic repetitive sequence annotation; Gene prediction, evaluation, and functional annotation; Noncoding RNA prediction; RNA extraction; Graph-based genome construction; Core and noncore gene family analysis; Variant analysis; Long-terminal repeats (LTR) insertion time analysis; Genome collinearity and visualization; Plant materials and treatment; Library construction and whole transcriptome sequencing; Analysis of mRNAs and noncoding RNAs; Dual-luciferase transient expression system; Generation of transgenic plant materials and heat stress treatment; Detection of reactive oxygen species (ROS).
Author contributions: Xiaojing Wang: Conceptualization; data curation; investigation; funding acquisition; supervision; writing—original draft; project administration; writing—review and editing; resources; validation; visualization; formal analysis. Ping Zhou: Validation; data curation; writing—original draft; formal analysis; visualization; writing—review and editing. Xiaoyu Hu: Data curation; validation; formal analysis; visualization; writing—original draft; writing—review and editing. Yun Bai: Data curation; formal analysis; visualization; validation; writing—original draft. Chenhao Zhang: Data curation; formal analysis; visualization; validation; writing—original draft; methodology. Yanhong Fu: Formal analysis; visualization; validation; writing—original draft. Ruirui Huang: Writing—review and editing; supervision; data curation. Suzhen Niu: Investigation; supervision; writing—original draft; writing—review and editing; resources; validation; data curation.
Xiaoming Song: Conceptualization; data curation; formal analysis; visualization; writing—original draft; writing—review and editing; project administration; supervision; investigation; validation; funding acquisition; resources; methodology; software.
This work was supported by the National Natural Science Foundation of China Project (32260097, 32172583), the National Key Research and Development Program of China (2023YFF1002000), the Tangshan Science and Technology Plan Project (24130219C), and the Natural Science Foundation for Distinguished Young Scholar of Hebei Province (C2022209010). The genome sequencing was conducted by the BioMarker Corporation. We apologize for not being able to cite additional work owing to space limitations.
The authors declare no conflict of interest.
No animals or humans were involved in this study.
All the sequencing data have been deposited in NCBI under submission number SUB15033098 and BioProject accession number PRJNA1215314 (https://www.ncbi.nlm.nih.gov/sra/PRJNA1215314). The data and scripts used are available on GitHub (https://github.com/songxm-ncst/Rhododendron). All the genomic annotation datasets have also been curated in the download interface of the TEGR database (http://www.tegr.com.cn) under each species' Latin name. Supplementary materials (methods, figures, tables, graphical abstract, slides, videos, Chinese translated version and update materials) may be found in the online DOI or iMeta Science http://www.imeta.science/.
Figure S1: The assessment of Telomere-to-Telomere (T2T) genome of R. liliiflorum. Figure S2: The gene family analysis of 15 species. Figure S3: The KEGG functional enrichment analysis on all species-specific genes in R. liliiflorum. Figure S4: The KEGG functional enrichment analysis on all core cluster genes of 15 Rhododendron genomes. Figure S5: The comparative genome visualization map shows the homology and rearrangement between each Rhododendron species and the reference T2T genome of R. liliiflorum.
Figure S6: The length distribution of three structural variation (SV) types, including duplication (DUP), inversion (INV), and translocation (TRANS) in the Rhododendron genome. Figure S7: The length of duplication (DUP), translocation (TRANS), and inversion (INV) type of structural variations (SVs) in each Rhododendron species. Table S1: Statistics of Rhododendron genome sequencing, assembly and annotation. Table S2: Statistics of sequencing data obtained by Illumina HiSeq platform for genome survey of four Rhododendron species. Table S3: Statistics of sequencing data of four Rhododendron species by PacBio HiFi platform. Table S4: Statistics of sequencing data of Rhododendron liliiflorum by ONT platform. Table S5: The assembled length and cluster number of each chromosome of Rhododendron liliiflorum genome by Hi-C. Table S6: Statistics of sequencing data of four Rhododendron species by Hi-C platform. Table S7: CEGMA and BUSCO assessment of assembled genome by PacBio HiFi platform. Table S8: The assembled chromosome length and gap information of Rhododendron liliiflorum T2T genome. Table S9: The telomere position of assembled Rhododendron liliiflorum T2T genome. Table S10: The centromere position of assembled Rhododendron liliiflorum T2T genome. Table S11: The statistics of the repeat sequence classification in four Rhododendron genomes. Table S12: The statistics of predicted gene number in the four assembled Rhododendron genomes. Table S13: BUSCO assessment of genes in the four Rhododendron genomes. Table S14: Statistics of gene functional annotations in the four Rhododendron genomes. Table S15: The ncRNA number of the four Rhododendron genomes. Table S16: Statistics of gene family in the 15 Rhododendron genomes by pan-genome analysis. Table S17: Statistics of core, dispensable, and private number in the 15 Rhododendron genomes by pan-genome analysis. Table S18: Statistics of SNP number in the 15 Rhododendron genomes.
Table S19: Statistics of INDEL number in the 15 Rhododendron genomes. Table S20: Statistics of SV number in the 15 Rhododendron genomes. Table S21: Statistics of LTR cluster in the 15 Rhododendron genomes. Table S22: Whole transcriptome sequencing data assessment of Rhododendron species under heat treatment. Table S23: Transgenic and dual luciferase assay primers used in this study.
