SNooPer: a machine learning-based method for somatic variant identification from low-pass next-generation sequencing

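Keywords: Biology, DNA sequencing, Deep sequencing, Exome sequencing, Exome, Computational biology, Massively parallel sequencing, Concordance, Genome, Genetics, Computer science, Mutation, Gene
Authors
Jean-François Spinella, Pamela Mehanna, Ramón Vidal, Virginie Saillour, Pauline Cassart, Chantal Richer, Manon Ouimet, Jasmine Healy, Daniel Sinnett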
Source
Journal: BMC Genomics [BioMed Central]
Volume (issue): 17 (1)  Citations: 65
Identifier
DOI: 10.1186/s12864-016-3281-2
Abstract

Next-generation sequencing (NGS) allows unbiased, in-depth interrogation of cancer genomes. Many somatic variant callers have been developed, yet accurate ascertainment of somatic variants remains a considerable challenge, as evidenced by the varying mutation call rates and low concordance among callers. Statistical model-based algorithms that are currently available perform well under ideal scenarios, such as high sequencing depth, homogeneous tumor samples, and high somatic variant allele frequency (VAF), but show limited performance with sub-optimal data such as low-pass whole-exome/genome sequencing data. While the goal of any cancer sequencing project is to identify a relevant, and limited, set of somatic variants for further sequence/functional validation, the inherently complex nature of cancer genomes, combined with technical issues directly related to sequencing and alignment, can affect the specificity and/or sensitivity of most callers. For these reasons, we developed SNooPer, a versatile machine learning approach that uses Random Forest classification models to accurately call somatic variants in low-depth sequencing data. To train a data-specific model, SNooPer uses a subset of variant positions from the sequencing output for which the class (true variation or sequencing error) is known. Here, using a real dataset of 40 childhood acute lymphoblastic leukemia patients, we show that the SNooPer algorithm is not affected by low coverage or low VAFs, and can be used to reduce overall sequencing costs while maintaining high specificity and sensitivity in somatic variant calling. When compared to three benchmarked somatic callers, SNooPer demonstrated the best overall performance.
The flexibility of SNooPer's random forest protects against technical bias and systematic errors, and is appealing in that it does not rely on user-defined parameters. The code and user guide can be downloaded at https://sourceforge.net/projects/snooper/.
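
The training idea described in the abstract (take a subset of candidate positions whose class is known, true variation or sequencing error, and fit a tree ensemble on them) can be illustrated with a minimal sketch. This is not SNooPer's code and does not reflect its actual features or parameters. The feature names `alt_fraction` and `mean_base_quality`, every data value, and all helper functions below are hypothetical. To stay within the Python standard library, single-split decision stumps stand in for the full decision trees of a real Random Forest, so the block is best read as a simplified bagged ensemble with random feature selection and majority voting.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold and polarity on one feature with the fewest training errors."""
    values = sorted({row[feature] for row in rows})
    # -inf lets the stump fall back to a constant prediction if nothing splits well.
    thresholds = [float("-inf")] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in thresholds:
        for below, above in ((0, 1), (1, 0)):
            errors = sum(
                (below if row[feature] <= threshold else above) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return feature, threshold, below, above


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample using one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))           # random feature per stump
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote over all stumps: 1 = true variant, 0 = sequencing error."""
    votes = Counter(
        (below if row[feature] <= threshold else above)
        for feature, threshold, below, above in forest
    )
    return votes.most_common(1)[0][0]


def sensitivity_specificity(truth, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labelled training positions: (alt_fraction, mean_base_quality).
error_rows = [(0.01 + 0.01 * i, 10 + i) for i in range(10)]    # class 0
variant_rows = [(0.30 + 0.02 * i, 30 + i) for i in range(10)]  # class 1
train_rows = error_rows + variant_rows
train_labels = [0] * len(error_rows) + [1] * len(variant_rows)

forest = fit_forest(train_rows, train_labels)

# Hypothetical held-out positions with known class.
test_rows = [(0.02, 12), (0.05, 15), (0.40, 35), (0.45, 38)]
test_truth = [0, 0, 1, 1]
predicted = [predict(forest, row) for row in test_rows]

sens, spec = sensitivity_specificity(test_truth, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The toy classes are cleanly separated on both features, so the ensemble should classify every held-out row correctly. Real low-pass data would not separate this easily, which is where a full Random Forest trained on many features becomes useful.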
