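Overfitting
Lasso (programming language)
Gene regulatory network
Feature selection
Computational biology
Biological network
Computer science
Gene
Gene selection
Selection (genetic algorithm)
Interaction network
Data mining
Machine learning
Biology
Artificial neural network
Genetics
Gene expression
Microarray analysis techniques
World Wide Web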
Authors
Heewon Park, Atsushi Niida, Satoru Miyano, Seiya Imoto
Identifier
DOI:10.1089/cmb.2014.0197
Abstract
Gene networks and graphs are crucial tools for understanding cancer as a heterogeneous system, since cancer is a disease that involves not individual genes but combinations of genes associated with oncogenic processes. A goal of genomic data analysis via gene networks is to identify both the relevant gene networks and the individual genes within the selected networks. Existing methods, however, perform only network selection, so every gene in a selected network is included in the model. This leads to overfitting when uncovering driver genes, and the results are not biologically interpretable. To achieve both "groupwise sparsity" and "within-group sparsity" when identifying driver genes based on biological knowledge (i.e., predefined overlapping groups of features), we propose a sparse overlapping group lasso via duplicated predictors in an extended space. The proposed method effectively identifies driver genes and their interactions using known biological pathway information. Monte Carlo simulations and an analysis of data from The Cancer Genome Atlas (TCGA) project indicate that the proposed method is effective, in terms of both feature selection and prediction accuracy, for fitting a regression model constructed with duplicated predictors in overlapping groups. In the TCGA data analysis, we uncover potential cancer driver genes via expression modules and gene networks constructed from multi-omics data, and we find strong evidence that the uncovered genes are cancer driver genes. The proposed method is a useful tool for identifying cancer driver genes and for integrative multi-omics analysis.
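The idea of "duplicated predictors in extended space" can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the solver or the penalty weights, so the proximal-gradient loop, the squared-error loss, the sqrt(group size) weights, and all names below (`expand_design`, `sgl_prox`, `fit_sparse_overlap_group_lasso`, `lam1`, `lam2`) are assumptions made for illustration. Each feature is copied once per group it belongs to, which makes the groups disjoint in the extended design; a sparse group lasso penalty (an L1 term for within-group sparsity plus a group L2 term for groupwise sparsity) is then applied there, and the copies are summed back to one coefficient per original feature.

```python
import numpy as np

def expand_design(X, groups):
    """Duplicate columns so that overlapping groups become disjoint.

    groups: list of lists of column indices of X (groups may overlap).
    Returns the extended matrix, the original column index of each
    extended column, and one slice per group in the extended space.
    """
    cols = np.array([j for g in groups for j in g])
    bounds = np.cumsum([0] + [len(g) for g in groups])
    slices = [slice(bounds[k], bounds[k + 1]) for k in range(len(groups))]
    return X[:, cols], cols, slices

def sgl_prox(v, slices, t, lam1, lam2):
    """Proximal operator of t * (lam1 * L1 + lam2 * sum_g sqrt(p_g) * L2_g)."""
    out = np.zeros_like(v)
    for s in slices:
        # Within-group sparsity: element-wise soft-thresholding.
        u = np.sign(v[s]) * np.maximum(np.abs(v[s]) - t * lam1, 0.0)
        # Groupwise sparsity: shrink the whole group, or zero it out.
        norm = np.linalg.norm(u)
        thresh = t * lam2 * np.sqrt(s.stop - s.start)  # assumed weight
        if norm > thresh:
            out[s] = (1.0 - thresh / norm) * u
    return out

def fit_sparse_overlap_group_lasso(X, y, groups, lam1=0.1, lam2=0.1, n_iter=500):
    """Proximal gradient descent on the extended (duplicated) design."""
    n = X.shape[0]
    Z, cols, slices = expand_design(X, groups)
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # of the loss (1 / 2n) * ||y - Z @ gamma||^2.
    step = n / np.linalg.norm(Z, 2) ** 2
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = Z.T @ (Z @ gamma - y) / n
        gamma = sgl_prox(gamma - step * grad, slices, step, lam1, lam2)
    # Map back: a feature's coefficient is the sum of its duplicates.
    beta = np.zeros(X.shape[1])
    np.add.at(beta, cols, gamma)
    return beta, gamma
```

In this sketch a gene that appears in two pathways, for example feature 2 in `groups = [[0, 1, 2], [2, 3]]`, receives two latent coefficients, one per pathway, so a pathway can be dropped as a whole while the gene can still enter the model through another pathway. Features that belong to no group are never selected, since they have no column in the extended design.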