Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning

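Genome; Homo sapiens; Caenorhabditis elegans; Computational biology; Context (archaeology); Biology; Computer science; Drosophila melanogaster; Genomics; Genetics; Gene; Anthropology; Sociology; Paleontology
Authors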
Leandro A. Bugnon,Cristian Yones,Diego H. Milone,Georgina Stegmayer
Source
Journal: Briefings in Bioinformatics [Oxford University Press]
Volume (issue): 22 (3) Citations: 18
Identifier
DOI:10.1093/bib/bbaa184
Abstract

The genome-wide discovery of microRNAs (miRNAs) involves identifying sequences having the highest chance of being a novel miRNA precursor (pre-miRNA), within all the possible sequences in a complete genome. The known pre-miRNAs are usually just a few in comparison to the millions of candidates that have to be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs only from the sequenced genome. The task is unfeasible without the help of computational methods, such as deep learning. However, it is still very difficult to find an accurate predictor, with a low false positive rate in this genome-wide context. Although there are many available tools, these have not been tested in realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data.

In this work, we review six recent methods for tackling this problem with machine learning. We compare the models in five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster, Homo sapiens. The models have been designed for the pre-miRNAs prediction task, where there is a class of interest that is significantly underrepresented (the known pre-miRNAs) with respect to a very large number of unlabeled samples. It was found that for the smaller genomes and smaller imbalances, all methods perform in a similar way. However, for larger datasets such as the H. sapiens genome, it was found that deep learning approaches using raw information from the sequences reached the best scores, achieving low numbers of false positives.

The source code to reproduce these results is in: http://sourceforge.net/projects/sourcesinc/files/gwmirna

Additionally, the datasets are freely available in: https://sourceforge.net/projects/sourcesinc/files/mirdata.
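The abstract's emphasis on a low false positive rate under high class imbalance can be illustrated with a small calculation. The sketch below is not taken from the paper or its source code. The function name `precision_at_imbalance` and all counts and rates are hypothetical, chosen only to show how precision falls as the number of negative candidates grows while recall and false positive rate stay fixed.

```python
def precision_at_imbalance(n_pos, n_neg, recall, fpr):
    """Precision implied by class counts, recall (TPR) and false positive rate.

    Hypothetical helper for illustration only; not part of the reviewed tools.
    """
    true_pos = recall * n_pos    # known pre-miRNAs correctly flagged
    false_pos = fpr * n_neg      # negative candidates wrongly flagged
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged > 0 else 0.0


if __name__ == "__main__":
    # Assumed numbers: 1,000 positives, a fixed 90% recall and 1% FPR,
    # and an increasingly large pool of negative candidates.
    for n_neg in (10_000, 1_000_000, 100_000_000):
        p = precision_at_imbalance(n_pos=1_000, n_neg=n_neg, recall=0.9, fpr=0.01)
        print(f"negatives={n_neg:>11,d}  precision={p:.4f}")
```

With these assumed values, a classifier whose per-sample error rates never change still flags mostly false positives once negatives outnumber positives by several orders of magnitude, which is one way to see why genome-wide evaluation is harder than evaluation on balanced benchmarks.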