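Biology
Shotgun sequencing
Brassica oleracea
Arabidopsis
Genome
Genetics
Gene
Annotation
Computational biology
Shotgun
Genome project
Whole genome sequencing
Plant
Mutant
Authors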
Mulu Ayele,Brian J. Haas,Nikhil Kumar,Hank Wu,Yongli Xiao,Susan Van Aken,Teresa R. Utterback,Jennifer R. Wortman,Owen White,Christopher D. Town
Source
Journal: Genome Research
[Cold Spring Harbor Laboratory Press]
Date: 2005-04-01
Volume (Issue): 15 (4): 487-495
Citations: 74
Abstract
Through comparative studies of the model organism Arabidopsis thaliana and its close relative Brassica oleracea, we have identified conserved regions that represent potentially functional sequences overlooked by previous Arabidopsis genome annotation methods. A total of 454,274 whole genome shotgun sequences covering 283 Mb (0.44×) of the estimated 650 Mb Brassica genome were searched against the Arabidopsis genome, and conserved Arabidopsis genome sequences (CAGSs) were identified. Of these 229,735 conserved regions, 167,357 fell within or intersected existing gene models, while 60,378 were located in previously unannotated regions. After removal of sequences matching known proteins, CAGSs that were close to one another were chained together as potentially comprising portions of the same functional unit. This resulted in 27,347 chains of which 15,686 were sufficiently distant from existing gene annotations to be considered a novel conserved unit. Of 192 conserved regions examined, 58 were found to be expressed in our cDNA populations. Rapid amplification of cDNA ends (RACE) was used to obtain potentially full-length transcripts from these 58 regions. The resulting sequences led to the creation of 21 gene models at 17 new Arabidopsis loci and the addition of splice variants or updates to another 19 gene structures. In addition, CAGSs overlapping already annotated genes in Arabidopsis can provide guidance for manual improvement of existing gene models. Published genome-wide expression data based on whole genome tiling arrays and massively parallel signature sequencing were overlaid on the Brassica–Arabidopsis conserved sequences, and 1399 regions of intersection were identified. Collectively our results and these data sets suggest that several thousand new Arabidopsis genes remain to be identified and annotated.
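The chaining step mentioned in the abstract (grouping CAGSs that lie close to one another) can be illustrated with a minimal sketch. The abstract does not state the distance threshold or the data representation the authors used, so the `max_gap` parameter, the `(start, end)` tuple format, and the function names `chain_regions` and `chain_span` below are all assumptions made for illustration, not the published method.

```python
from typing import List, Tuple

# Assumed representation: (start, end) coordinates on a single chromosome.
Interval = Tuple[int, int]


def chain_regions(regions: List[Interval], max_gap: int) -> List[List[Interval]]:
    """Group intervals into chains.

    An interval joins the current chain when its start lies within
    `max_gap` bases of the furthest end seen so far in that chain.
    `max_gap` is a hypothetical threshold; the abstract gives no value.
    """
    chains: List[List[Interval]] = []
    chain_end = 0  # furthest end coordinate of the current chain
    for start, end in sorted(regions):
        if chains and start - chain_end <= max_gap:
            chains[-1].append((start, end))
            chain_end = max(chain_end, end)
        else:
            chains.append([(start, end)])
            chain_end = end
    return chains


def chain_span(chain: List[Interval]) -> Interval:
    """Return the overall (start, end) covered by one chain."""
    return chain[0][0], max(end for _, end in chain)


if __name__ == "__main__":
    # Made-up coordinates, purely for demonstration.
    demo = [(5150, 5400), (100, 200), (250, 300), (5000, 5100)]
    for chain in chain_regions(demo, max_gap=500):
        print(chain_span(chain), chain)
```

Sorting first lets a single pass build the chains. A later filter, not shown here, would be needed to discard chains that fall too close to existing gene annotations.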