Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning

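Keywords: Genome, Homo sapiens, Caenorhabditis elegans, Computational biology, Context (archaeology), Biology, Computer science, Drosophila melanogaster, Genomics, Genetics, Gene, Anthropology, Sociology, Paleontology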
Authors
Leandro A. Bugnon, Cristian Yones, Diego H. Milone, Georgina Stegmayer
Source
Journal: Briefings in Bioinformatics [Oxford University Press]
Volume (issue): 22 (3); Times cited: 18
Identifier
DOI: 10.1093/bib/bbaa184
Abstract

The genome-wide discovery of microRNAs (miRNAs) involves identifying, among all possible sequences in a complete genome, those with the highest chance of being a novel miRNA precursor (pre-miRNA). Known pre-miRNAs are usually very few compared with the millions of candidates that must be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs using only the sequenced genome. The task is infeasible without the help of computational methods such as deep learning. However, it is still very difficult to find an accurate predictor with a low false-positive rate in this genome-wide context. Although many tools are available, they have not been tested under realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data.

In this work, we review six recent methods that tackle this problem with machine learning. We compare the models on five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster and Homo sapiens. The models were designed for the pre-miRNA prediction task, where the class of interest (the known pre-miRNAs) is significantly underrepresented with respect to a very large number of unlabeled samples. For the smaller genomes and lower imbalances, all methods performed similarly. However, for larger datasets such as the H. sapiens genome, deep learning approaches using raw sequence information reached the best scores, achieving low numbers of false positives.

The source code to reproduce these results is available at http://sourceforge.net/projects/sourcesinc/files/gwmirna. Additionally, the datasets are freely available at https://sourceforge.net/projects/sourcesinc/files/mirdata.
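The point about high class imbalance and false positives can be made concrete with a little arithmetic. When candidate (negative or unlabeled) sequences outnumber known pre-miRNAs by orders of magnitude, even a small false-positive rate can yield more false positives than true positives. The Python sketch below is an illustration only: the function name `expected_precision` and all counts are hypothetical and are not taken from the paper or its datasets.

```python
def expected_precision(n_pos, n_neg, recall, fpr):
    """Expected precision given class counts, recall (TPR) and false-positive rate.

    precision = TP / (TP + FP), where TP = recall * n_pos and FP = fpr * n_neg.
    Hypothetical helper for illustration; not part of the reviewed methods.
    """
    tp = recall * n_pos
    fp = fpr * n_neg
    if tp + fp == 0:
        return 0.0  # no predictions at all, so precision is defined here as 0
    return tp / (tp + fp)


# Hypothetical scenario (NOT from the paper): 2,000 positives among 1,000,000 candidates.
n_pos, n_neg = 2_000, 998_000
for fpr in (0.01, 0.001, 0.0001):
    p = expected_precision(n_pos, n_neg, recall=0.9, fpr=fpr)
    print(f"FPR={fpr:<7} false positives={fpr * n_neg:>8.0f} precision={p:.3f}")
```

With these made-up counts and a fixed recall of 0.9, an FPR of 0.01 gives roughly 10,000 false positives against 1,800 true positives, so precision falls to about 0.15; only at much lower FPRs does precision approach a usable level. This is why, in a genome-wide setting, the absolute number of false positives matters more than a seemingly small rate.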