CGPS: A machine learning-based approach integrating multiple gene set analysis tools for better prioritization of biologically relevant pathways

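Ensemble learning, Classification, Set (abstract data type), Machine learning, Computational biology, Prioritization, Artificial intelligence, Computer science, Data mining, Bioinformatics, Biology, Engineering, Programming language, Management science
Authors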
Ai Chen,Lei Kong
Source
Journal: Journal of Genetics and Genomics [Elsevier BV]
Volume (issue): 45 (9): 489-504 Citations: 88
Identifier
DOI:10.1016/j.jgg.2018.08.002
Abstract

Gene set enrichment (GSE) analyses play an important role in the interpretation of large-scale transcriptome datasets. Multiple GSE tools can be integrated into a single method as obtaining optimal results is challenging due to the plethora of GSE tools and their discrepant performances. Several existing ensemble methods lead to different scores in sorting pathways as integrated results; furthermore, it is difficult for users to choose a single ensemble score to obtain optimal final results. Here, we develop an ensemble method using a machine learning approach called Combined Gene set analysis incorporating Prioritization and Sensitivity (CGPS) that integrates the results provided by nine prominent GSE tools into a single ensemble score (R score) to sort pathways as integrated results. Moreover, to the best of our knowledge, CGPS is the first GSE ensemble method built based on a priori knowledge of pathways and phenotypes. Compared with 10 widely used individual methods and five types of ensemble scores from two ensemble methods, we demonstrate that sorting pathways based on the R score can better prioritize relevant pathways, as established by an evaluation of 120 simulated datasets and 45 real datasets. Additionally, CGPS is applied to expression data involving the drug panobinostat, which is an anticancer treatment against multiple myeloma. The results identify cell processes associated with cancer, such as the p53 signaling pathway (hsa04115); by contrast, according to two ensemble methods (EnrichmentBrowser and EGSEA), this pathway has a rank higher than 20, which may cause users to miss the pathway in their analyses. We show that this method, which is based on a priori knowledge, can capture valuable biological information from numerous types of gene set collections, such as KEGG pathways, GO terms, Reactome, and BioCarta. CGPS is publicly available as a standalone source code at ftp://ftp.cbi.pku.edu.cn/pub/CGPS_download/cgps-1.0.0.tar.gz.
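The general idea of merging several GSE tools into one ordering can be illustrated with a minimal sketch. This is not the CGPS R score: the abstract does not specify how the machine learning model computes it, so the sketch substitutes a plain mean-rank aggregation. The function name `ensemble_rank`, the tool names, and the p-values are all hypothetical toy values chosen for illustration.

```python
from statistics import mean


def ensemble_rank(pvalues_by_tool):
    """Order pathways by their mean rank across tools (best first).

    Generic mean-rank aggregation, used here only as a stand-in;
    it is NOT the R score described in the abstract.
    """
    ranks = {}
    for pvals in pvalues_by_tool.values():
        # Within one tool, rank 1 goes to the smallest p-value.
        ordered = sorted(pvals, key=pvals.get)
        for position, pathway in enumerate(ordered, start=1):
            ranks.setdefault(pathway, []).append(position)
    scores = {pathway: mean(r) for pathway, r in ranks.items()}
    return sorted(scores, key=scores.get)


# Toy input: made-up p-values from three hypothetical tools.
toy = {
    "tool_a": {"hsa04115": 0.001, "hsa04110": 0.02, "hsa00010": 0.6},
    "tool_b": {"hsa04115": 0.03, "hsa04110": 0.01, "hsa00010": 0.9},
    "tool_c": {"hsa04115": 0.002, "hsa04110": 0.2, "hsa00010": 0.5},
}
ranking = ensemble_rank(toy)
print(ranking)
```

In this toy input the tools disagree on the top pathway, and the aggregated ordering settles the disagreement by averaging positions. A learned score such as the one the abstract describes would presumably weight tools differently, which this sketch does not attempt.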