A Vision of How Low-Coverage Sequence Data Should Contribute to Genetic Evaluation in the Future.

Keywords: Sequence (biology), Computational biology, Computer science, Biology, Genetics
Authors
R. M. Thallman, J. E. Borgert, Bailey N. Engle, J. W. Keele, W. M. Snelling, Cedric Gondro, L. A. Kuehn
Source
Journal: PubMed
Identifier
DOI: 10.1093/jas/skaf294
Abstract

Low-coverage sequencing refers to sequencing DNA of individuals to a low depth of coverage (e.g., 0.5X) and imputing that sequence to genomic sequence based on reference haplotypes from individuals sequenced to high depth of coverage (e.g., ≥ 10X). It has been proposed as an alternative to genotyping by SNP arrays. At least one commercial product based on it is available for agricultural species. Concerns limiting adoption in its current form are: 1) the cost of storing the huge volume of data it generates and 2) whether that additional data will result in improved accuracy of genetic evaluation. This work envisions future implementation of low-coverage sequencing to reduce storage costs and enhance genetic evaluations by leveraging the additional information in the full sequence of the pangenome to account for more genetic variation. We propose addressing the storage issue by representing genomic sequence of an individual in a pair of haplotype arrays with each element pointing to an enumerated haplotype of the sequence within one of approximately 50,000 defined genome segments. Assuming 60 million genomic variants, the infrastructure required to translate the identifier of any enumerated haplotype into its genomic sequence would require less than 10 gigabytes of binary storage. Each haplotype array element would require 2 bytes, so the marginal binary storage required to represent the genomic sequence of an individual would be about 200 kilobytes (KB), similar to the genotypes from a SNP array with 200,000 markers. This assumes no pedigree and no ambiguity of the imputation, though the latter is unrealistic. Strategies to minimize, and when necessary, to manage and efficiently represent ambiguity are proposed. The genomic sequence of an individual could be stored in about 1 KB (binary) if both parents have unambiguous sequence stored as described above. 
The proposed system for representing the pangenome includes algorithms for read mapping and imputation intended to leverage all known genetic variation in the target population. It is also designed to use sequencing reads generated for imputing genomic sequence of new individuals to identify unrecognized mutations, crossovers, and structural variants, thus continuously improving the genome representation, especially if widespread use of low-coverage sequencing in livestock industries is realized. This could make improved genetic merit and management of livestock feasible without computational burden.
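The storage figures quoted in the abstract can be checked with a short calculation. The sketch below is only an illustration under stated assumptions: the segment count (50,000), the element size (2 bytes), and the pair of haplotype arrays come from the abstract, while the function names, the toy lookup table, and the use of Python's `array` module are hypothetical and are not the authors' implementation.

```python
from array import array

# Figures taken from the abstract.
N_SEGMENTS = 50_000            # approximate number of defined genome segments
BYTES_PER_ELEMENT = 2          # size of one haplotype array element
HAPLOTYPES_PER_INDIVIDUAL = 2  # one array per parental haplotype


def marginal_storage_bytes(n_segments=N_SEGMENTS,
                           bytes_per_element=BYTES_PER_ELEMENT):
    """Bytes needed for one individual's pair of haplotype arrays."""
    return HAPLOTYPES_PER_INDIVIDUAL * n_segments * bytes_per_element


def new_haplotype_array(n_segments=N_SEGMENTS):
    """Hypothetical container: one unsigned 16-bit identifier per segment.

    A 2-byte element can index up to 65,536 enumerated haplotypes
    within each segment.
    """
    return array("H", [0]) * n_segments


def decode(hap_array, table):
    """Translate enumerated haplotype identifiers into sequence.

    `table[segment][hap_id]` is a hypothetical lookup structure standing in
    for the shared infrastructure described in the abstract.
    """
    return "".join(table[seg][hap_id] for seg, hap_id in enumerate(hap_array))


if __name__ == "__main__":
    print(marginal_storage_bytes())  # 2 * 50,000 * 2 bytes

    # Toy example with two segments (invented data, for illustration only).
    toy_table = [["ACGT", "ACGA"], ["GG", "GT", "TT"]]
    toy_haplotype = array("H", [1, 2])
    print(decode(toy_haplotype, toy_table))
```

Under these assumptions the per-individual cost is 2 × 50,000 × 2 = 200,000 bytes, which matches the roughly 200 KB stated in the abstract. The sketch ignores imputation ambiguity and pedigree-based compression, both of which the abstract treats separately.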