The gap-free potato genome assembly reveals large tandem gene clusters of agronomical importance in highly repeated genomic regions

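Keywords: Biology; Genome; Sequence assembly; Tandem; Gene; Computational biology; Genetics; Genome editing; Biotechnology; Evolutionary biology; Gene expression; Composite material; Transcriptome; Materials science
Authors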
Xiaohui Yang, Lingkui Zhang, Xiao Guo, Jianfei Xu, Kang Zhang, Yinqing Yang, Yang Yu, Yinqiao Jian, Daofeng Dong, Sanwen Huang, Cheng Feng, Guangcun Li
Source
Journal: Molecular Plant [Elsevier BV]
Volume (issue): 16 (2): 314-317 Citations: 70
Identifier
DOI:10.1016/j.molp.2022.12.010
Abstract

Potato is a vital food security crop, ranked as the world's third most important food crop after rice and wheat. In 2011, the first genome assembly of the doubled monoploid potato DM1-3 516 R44 (DM) was released (Potato Genome Sequencing Consortium, 2011). It has been one of the most widely used reference genomes of the last decade and a valuable resource for the plant genomics and potato genetics communities (Leisner et al., 2018; Yang et al., 2020; Zheng et al., 2020). The latest version of the DM genome assembly (v6.1) (Pham et al., 2020) has served as a reference and quality control in studies of diploid and tetraploid potatoes (Zhou et al., 2020; Bao et al., 2022; Hoopes et al., 2022; Sun et al., 2022; Tang et al., 2022). However, 161 gaps remain in DM6.1 (v6.1), and its centromere and telomere structures are incomplete. Given the importance of the DM genome in potato genomics, genetics, and breeding studies, generating a complete genome assembly of DM is of great importance.
In this study, a telomere-to-telomere gap-free genome of DM (DM8.1) (Figure 1A) was assembled by combining Oxford Nanopore Technologies (ONT) ultra-long read sequencing (119.81× coverage) and Hi-C sequencing (130.57×) (Supplemental Table 1), assisted by multiple gap-closing strategies and high-fidelity (HiFi) reads from circular consensus sequencing. After initial genome assembly, polishing, and decontamination, we obtained 179 contigs with a summed size of 773.36 Mb and a contig N50 of 59.72 Mb. Hi-C reads then anchored 37 of the 179 contigs onto 12 chromosomes (Supplemental Figure 1; Supplemental Table 2), accounting for 95.53% (738.82 Mb) of the total assembly; we named this anchored assembly preDM8. Of the 142 unanchored contigs (34.53 Mb), over 98% are short sequences (<1 Mb), and all could be aligned to the chromosomes with high similarity, indicating that they are repetitive or redundant sequences. preDM8 is more contiguous than DM6.1 and the potato pan-genome assemblies (Tang et al., 2022) (Supplemental Figure 2). However, 25 gaps remained in preDM8. Three methods were adopted to close these gaps (Supplemental Figure 3A; Supplemental Table 3). First, we aligned the ONT reads to preDM8, then collected and assembled the reads that mapped to the regions flanking the gaps, which closed 14 gaps. Second, using the syntenic homologous fragments between preDM8 and DM6.1, we closed three gaps with contiguous DM6.1 sequences that spanned them. Third, we performed target sequence amplification experiments (Supplemental Figure 3B) and HiFi sequencing, which closed the remaining eight gaps (Supplemental Figures 3C and 4).
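The assembly statistics quoted above (contig N50, anchored fraction, gap tally) follow from simple arithmetic. The sketch below is illustrative only: the function names `n50` and `percent` and the toy contig lengths are hypothetical and not taken from the study, and only the Mb values and gap counts come from the text.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which the running total, taken over
    contigs sorted from longest to shortest, first reaches half of the
    summed assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Hypothetical toy contig lengths (arbitrary units), not the DM contigs.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print("toy N50:", n50(toy_contigs))

# Figures quoted in the text: 738.82 Mb anchored out of 773.36 Mb in total.
print("anchored %:", percent(738.82, 773.36))

# The three gap-closing methods closed 14, 3, and 8 gaps respectively.
print("gaps closed:", 14 + 3 + 8)
```

With the values from the text, the anchored percentage reproduces the reported 95.53%, and the three gap-closing steps account for all 25 gaps in preDM8.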
Finally, we generated the gap-free genome assembly of DM and named it DM8.1 (Figure 1A; Supplemental Table 4). To verify its quality, we examined the reliability of the DM8.1 sequences that correspond to the 161 gaps in DM6.1. We randomly selected 50 of the 161 gaps and designed 100 pairs of primers (Supplemental Table 5) from the sequences on both sides of these closed gaps for PCR amplification (Supplemental Figure 5) and Sanger sequencing. Both the 5′ and 3′ boundary sequences of these gaps were successfully obtained, indicating the high accuracy of DM8.1. In addition, the DM8.1 genome achieved a BUSCO value of 98.70%; an extremely high mapping rate (>99.90%) for both Illumina short reads and ONT long reads; a high consensus quality value (35.85) in Merqury analysis; and improved long terminal repeat (LTR)-retrotransposon completeness (DM8.1: LAI = 12.92, LTR length = 388.58 Mb; DM6.1: LAI = 12.75, LTR length = 375.91 Mb), further supporting the high quality of DM8.1 (Supplemental Tables 6 and 7). A total of 40 155 protein-coding genes were predicted in DM8.1 (Supplemental Figure 6), of which 33 972 (84.60%) were functionally annotated and 24 362 were expressed according to 10 mRNA sequencing datasets. Further analysis identified 1117 genes in DM8.1 that had been mis-annotated in DM6.1, with one gene incorrectly annotated as two. These errors were revealed by individual mRNA sequencing read pairs that covered and linked the two mis-annotated neighboring genes, suggesting that both came from a transcript of a single gene (Supplemental Figure 7). A total of 956 349 transposable elements (TEs) were also predicted, accounting for 60.31% (465.81 Mb) of the DM8.1 genome (Supplemental Figure 8; Supplemental Table 8). Additionally, 4676 small RNAs were predicted in DM8.1 (Supplemental Figure 9).
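The read-pair evidence described above for spotting one gene annotated as two can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the `min_pairs` threshold, the single-chromosome coordinate model, and the toy data are all hypothetical, and gene models are assumed not to overlap. A real analysis would also need to consider strand, mate orientation, and splicing.

```python
from bisect import bisect_right
from collections import Counter


def gene_index_at(starts, genes, pos):
    """Return the index of the gene containing pos, or None if intergenic."""
    i = bisect_right(starts, pos) - 1
    if i >= 0 and genes[i][1] <= pos <= genes[i][2]:
        return i
    return None


def linked_neighbor_genes(genes, mate_positions, min_pairs=3):
    """Find adjacent gene models linked by at least min_pairs read pairs.

    genes: (gene_id, start, end) tuples on one chromosome, non-overlapping.
    mate_positions: (pos1, pos2) tuples, one mapped position per mate.
    """
    genes = sorted(genes, key=lambda g: g[1])
    starts = [g[1] for g in genes]
    links = Counter()
    for pos1, pos2 in mate_positions:
        i = gene_index_at(starts, genes, pos1)
        j = gene_index_at(starts, genes, pos2)
        if i is None or j is None:
            continue
        # Count only pairs whose mates fall in two neighboring gene models.
        if abs(i - j) == 1:
            links[(min(i, j), max(i, j))] += 1
    return [
        (genes[i][0], genes[j][0], n)
        for (i, j), n in sorted(links.items())
        if n >= min_pairs
    ]


# Hypothetical toy annotation and read-pair positions.
toy_genes = [("geneA", 100, 500), ("geneB", 650, 1200), ("geneC", 5000, 6000)]
toy_pairs = [(450, 700), (480, 660), (300, 900), (1100, 5100), (200, 400)]
print(linked_neighbor_genes(toy_genes, toy_pairs))
```

In the toy data, three read pairs bridge geneA and geneB, so that pair is flagged as a candidate split gene, while the single pair bridging geneB and geneC falls below the threshold.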
All telomere regions were detected in DM8.1 using the seven-base telomeric repeat and the sub-telomeric repeats CL14 and CL34, and all centromere regions were identified using CENH3 (Figure 1A). Sequence composition analysis showed that the centromere regions contained more Gypsy-type LTRs (49.25%), while the telomere regions harbored more unknown TEs (Supplemental Figure 8). Additionally, the sequences filling the 25 gaps showed TE contents similar to those of the centromere regions (Supplemental Figure 8).
The complete genome assembly of DM8.1 facilitated the identification of large tandem gene clusters of functional importance. A total of 181 genes were identified in the newly assembled sequences corresponding to the 161 gap regions in DM6.1. Among these 181 genes, three large clusters (>15 copies) of tandemly duplicated genes were found, comprising 21 patatin genes (Figure 1B), 31 terpene synthase genes, and 18 cupin genes (Supplemental Figure 10). The 21 patatin genes showed much higher expression levels in tubers than in other potato organs (Figure 1C). Intriguingly, patatin was found to be under absolute dosage selection, as it has expanded continuously during the evolution, domestication, and breeding improvement of potato (Figures 1D–1E). Within the family Solanaceae, patatin was greatly expanded only in potato and slightly expanded in wolfberry (seven copies), whereas other species kept three or fewer copies, and the gene was completely lost in Physalis and tobacco (Figure 1E). Additionally, Etuberosum, a sister group of potato, has four and five copies of patatin in its two assembled genomes (Figure 1D). This indicates that the expansion of patatin gene copies is associated with the speciation of potato and may play an important role in the formation of enlarged tubers. Furthermore, in the reported pan-genomes of tomato and potato (Tang et al., 2022; Zhou et al., 2022), the patatin locus maintained only one or two gene copies in the tomato population but expanded continuously and significantly in the potato population, from diploid wild potato and diploid S. candolleanum to diploid potato landraces, with the average copy number growing from 5.9 and 7 to 14.6, respectively (Figure 1D), clearly indicating the expansion of patatin during potato domestication. Moreover, these expanded patatin genes were under strong positive selection (Ka/Ks > 1), especially in the domesticated potato genomes (Supplemental Figure 11), indicating functional differentiation of patatin after gene copy expansion, which may be associated with the development, production, and quality improvement of potato tubers. Together, these findings suggest that it is possible to breed potato cultivars with higher yield and quality by manipulating the absolute dosage of patatin, i.e., its gene copy number or expression level.
There have been continuous efforts to improve the DM reference genome, which is important for both scientific research and potato breeding programs. In this study, we generated the gap-free telomere-to-telomere genome assembly DM8.1, which could serve as an important resource for future genomics and gene function studies in potato.
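The large tandem clusters reported above, such as the 21 patatin copies, are runs of same-family genes lying next to each other on a chromosome. The sketch below shows one simple way such runs could be found. The text does not state the criteria used in the study, so the adjacency rule (`max_spacers` intervening genes), the function name, the family labels, and the toy gene order are all assumptions. Only the ">15 copies" cutoff, expressed here as `min_copies=16`, comes from the text.

```python
def tandem_clusters(ordered_genes, min_copies=16, max_spacers=1):
    """Return (family, [gene_ids]) for each large tandem cluster.

    ordered_genes: (gene_id, family) tuples in chromosomal order; family is
    None for genes without a family label. Two same-family genes belong to
    one cluster when at most max_spacers other genes lie between them.
    """
    clusters = []
    open_runs = {}  # family -> (member ids of current run, index of last member)
    for idx, (gene_id, family) in enumerate(ordered_genes):
        if family is None:
            continue
        members, last_idx = open_runs.get(family, ([], None))
        if last_idx is not None and idx - last_idx - 1 <= max_spacers:
            members.append(gene_id)  # extend the current run
        else:
            if len(members) >= min_copies:
                clusters.append((family, members))  # close a large run
            members = [gene_id]  # start a new run
        open_runs[family] = (members, idx)
    # Close any runs still open at the end of the chromosome.
    for family, (members, _) in open_runs.items():
        if len(members) >= min_copies:
            clusters.append((family, members))
    return clusters


# Hypothetical toy gene order: 21 patatin-like copies interrupted by one
# unrelated gene, then five unlabelled genes and one distant patatin copy.
toy_order = (
    [(f"pat{i}", "patatin") for i in range(10)]
    + [("x1", "other")]
    + [(f"pat{i}", "patatin") for i in range(10, 21)]
    + [(f"y{i}", None) for i in range(5)]
    + [("pat_far", "patatin")]
)
for family, members in tandem_clusters(toy_order):
    print(family, len(members))
```

In the toy order, the single intervening gene does not break the run, so one patatin cluster of 21 copies is reported, and the distant lone copy is excluded.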
This work was supported by the National Natural Science Foundation of China (32072119 and 31801421); the Breeding Program of Shandong Province, China (2020LZGC003); the National Agriculture Science and Technology Major Program, China (NK20220904); the China Agricultural Research System (CARS-9); the Central Public-interest Scientific Institution Basal Research Fund (Y2022PT23); and the Innovation Program of Chinese Academy of Agricultural Sciences (CAAS-ASTIP-IVFCAAS).
References
Bao, Z., Li, C., Li, G., Wang, P., Peng, Z., Cheng, L., Li, H., Zhang, Z., Li, Y., Huang, W., et al. (2022). Genome architecture and tetrasomic inheritance of autotetraploid potato. Mol. Plant 15: 1211-1226.
Hoopes, G., Meng, X., Hamilton, J.P., Achakkagari, S.R., de Alves Freitas Guesdes, F., Bolger, M.E., Coombs, J.J., Esselink, D., Kaiser, N.R., Kodde, L., et al. (2022). Phased, chromosome-scale genome assemblies of tetraploid potato reveal a complex genome, transcriptome, and predicted proteome landscape underpinning genetic diversity. Mol. Plant 15: 520-536.
Leisner, C.P., Hamilton, J.P., Crisovan, E., Manrique-Carpintero, N.C., Marand, A.P., Newton, L., Pham, G.M., Jiang, J., Douches, D.S., Jansky, S.H., et al. (2018). Genome sequence of M6, a diploid inbred clone of the high-glycoalkaloid-producing tuber-bearing potato species Solanum chacoense, reveals residual heterozygosity. Plant J. 94: 562-570.
Pham, G.M., Hamilton, J.P., Wood, J.C., Burke, J.T., Zhao, H., Vaillancourt, B., Ou, S., Jiang, J., and Buell, C.R. (2020). Construction of a chromosome-scale long-read reference genome assembly for potato. GigaScience 9: giaa100-giaa111.
Potato Genome Sequencing Consortium (2011). Genome sequence and analysis of the tuber crop potato. Nature 475: 189-195.
Sun, H., Jiao, W.B., Krause, K., Campoy, J.A., Goel, M., Folz-Donahue, K., Kukat, C., Huettel, B., and Schneeberger, K. (2022). Chromosome-scale and haplotype-resolved genome assembly of a tetraploid potato cultivar. Nat. Genet. 54: 342-348.
Tang, D., Jia, Y., Zhang, J., Li, H., Cheng, L., Wang, P., Bao, Z., Liu, Z., Feng, S., Zhu, X., et al. (2022). Genome evolution and diversity of wild and cultivated potatoes. Nature 606: 535-541.
Yang, X., Yang, Y., Ling, J., Guan, J., Guo, X., Dong, D., Jin, L., Huang, S., Liu, J., and Li, G. (2020). A high-throughput BAC end analysis protocol (BAC-anchor) for profiling genome assembly and physical mapping. Plant Biotechnol. J. 18: 364-372.
Zheng, J., Yang, Y., Guo, X., Jin, L., Xiong, X., Yang, X., and Li, G. (2020). Exogenous SA initiated defense response and multi-signaling pathway in tetraploid potato SD20. Horticultural Plant Journal 6: 99-110.
Zhou, Q., Tang, D., Huang, W., Yang, Z., Zhang, Y., Hamilton, J.P., Visser, R.G.F., Bachem, C.W.B., Robin Buell, C., Zhang, Z., et al. (2020). Haplotype-resolved genome analyses of a heterozygous diploid potato. Nat. Genet. 52: 1018-1023.
Zhou, Y., Zhang, Z., Bao, Z., Li, H., Lyu, Y., Zan, Y., Wu, Y., Cheng, L., Fang, Y., Wu, K., et al. (2022). Graph pangenome captures missing heritability and empowers tomato breeding. Nature 606: 527-534.