Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning

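Keywords: Genome; Homo sapiens; Caenorhabditis elegans; Computational biology; Context (archaeology); Biology; Computer science; Drosophila melanogaster; Genomics; Genetics; Gene; Anthropology; Sociology; Paleontology
Authors
Leandro A. Bugnon, Cristian Yones, Diego H. Milone, Georgina Stegmayer
Source
Journal: Briefings in Bioinformatics [Oxford University Press]
Volume (issue): 22 (3) Citations: 18
Identifier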
DOI:10.1093/bib/bbaa184
Abstract

The genome-wide discovery of microRNAs (miRNAs) involves identifying the sequences with the highest chance of being a novel miRNA precursor (pre-miRNA) among all the possible sequences in a complete genome. The known pre-miRNAs are usually just a few in comparison to the millions of candidates that have to be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs from the sequenced genome alone. The task is unfeasible without the help of computational methods, such as deep learning. However, it is still very difficult to find an accurate predictor with a low false positive rate in this genome-wide context. Although many tools are available, they have not been tested in realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data. In this work, we review six recent methods for tackling this problem with machine learning. We compare the models on five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster and Homo sapiens. The models have been designed for the pre-miRNA prediction task, where the class of interest (the known pre-miRNAs) is significantly underrepresented with respect to a very large number of unlabeled samples. For the smaller genomes and smaller imbalances, all methods perform similarly. However, for larger datasets such as the H. sapiens genome, deep learning approaches using raw information from the sequences reached the best scores, achieving low numbers of false positives. The source code to reproduce these results is available at http://sourceforge.net/projects/sourcesinc/files/gwmirna. Additionally, the datasets are freely available at https://sourceforge.net/projects/sourcesinc/files/mirdata.
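To make concrete why the abstract stresses a low false positive rate under high class imbalance, the following is a minimal sketch of how precision changes as the number of negative candidates grows while the classifier itself stays the same. The function name and all numbers (sensitivity, false positive rate, class sizes) are hypothetical and chosen for illustration only; they are not taken from the paper or its datasets.

```python
def precision_under_imbalance(sensitivity, false_positive_rate,
                              n_positives, n_negatives):
    """Expected precision of a classifier with fixed per-class error rates.

    All arguments are illustrative assumptions, not values from the paper.
    """
    true_positives = sensitivity * n_positives
    false_positives = false_positive_rate * n_negatives
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        return 0.0
    return true_positives / predicted_positives


if __name__ == "__main__":
    # Hypothetical setting: 100 known pre-miRNAs, 90% sensitivity,
    # 1% false positive rate, and a growing pool of negative candidates.
    for n_negatives in (1_000, 100_000, 10_000_000):
        p = precision_under_imbalance(0.9, 0.01, 100, n_negatives)
        print(f"negatives={n_negatives:>12,d}  precision={p:.5f}")
```

With the per-class error rates held fixed, precision falls as the negative pool grows, which is why a false positive rate that looks small on a balanced benchmark can still produce far more false candidates than true pre-miRNAs at genome scale.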
