Identification of potential biomarkers on microarray data using distributed gene selection approach

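Keywords: Discriminant; Computer science; Feature selection; Naive Bayes classifier; Support vector machine; Data mining; Mutual information; Redundancy (engineering); Decision tree; Filter (signal processing); Pattern recognition (psychology); Bayes' theorem; Artificial intelligence; Gene selection; Machine learning; Microarray analysis techniques; Gene; Bayesian probability; Gene expression; Operating system; Biochemistry; Chemistry; Computer vision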

Authors

Alok Kumar Shukla, Diwakar Tripathi

Source

Journal: Mathematical biosciences [Elsevier]
Date: 2019-09-01 Volume/issue: 315: 108230-108230 Citations: 28

Links

nih.gov, doi.org

Identifier

DOI: 10.1016/j.mbs.2019.108230

Abstract

In recent years, several feature selection (FS) methods have been introduced to identify biomarkers from gene expression datasets. These methods have gained extensive attention for solving the cancer classification problem, but they have some limitations. First, the majority of FS approaches increase the computational cost because of their centralized data structure. Second, a gene that is ranked as irrelevant, but that could perform well in terms of classification accuracy when combined with a suitable subset of genes, is left out of the selection. To resolve these problems, we introduce a novel two-stage FS approach that combines Spearman's Correlation (SC) with distributed filter FS methods and can select highly discriminative genes for distinguishing samples in high-dimensional datasets. In the distributed FS stage, the data are distributed by features (vertical distribution), and a merging procedure then updates the feature subset along with the improved classification accuracy. Moreover, the method is used to quantify the gene-gene and gene-class relations and simultaneously detect subsets of essential genes. The proposed method is verified on six gene datasets with four well-known classifiers, namely support vector machine, naïve Bayes, k-nearest neighbor, and decision tree. Its performance is compared with traditional filter techniques such as Relief-F, information gain, minimum redundancy maximum relevance, joint mutual information, Chi-square, and t-test. The experimental results demonstrate that the proposed method significantly improves computational time and classification accuracy in comparison to the standard algorithms applied to the non-partitioned dataset.
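The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes a round-robin vertical partition of the genes, uses the absolute Spearman correlation between a gene and the class label as the per-partition filter score, and merges the partitions with a simple gene-gene redundancy check. The paper's merging step is driven by classification accuracy, which this sketch does not reproduce. All names (`select_genes`, `n_parts`, `top_k`, `redundancy`) and the toy data are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def select_genes(X, y, n_parts=2, top_k=1, redundancy=0.9):
    """X: samples (rows) by genes (columns); y: numeric class labels."""
    n_genes = len(X[0])
    cols = [[row[g] for row in X] for g in range(n_genes)]

    def relevance(g):
        return abs(spearman(cols[g], y))  # gene-class relation

    # Vertical distribution: each partition holds a disjoint set of genes.
    parts = [list(range(p, n_genes, n_parts)) for p in range(n_parts)]

    # Filter step, run independently per partition (assumed scoring rule).
    candidates = []
    for part in parts:
        candidates.extend(sorted(part, key=relevance, reverse=True)[:top_k])

    # Merge step (assumed): keep a candidate only if it is not highly
    # correlated with a gene that has already been kept.
    selected = []
    for g in sorted(candidates, key=relevance, reverse=True):
        if all(abs(spearman(cols[g], cols[s])) < redundancy for s in selected):
            selected.append(g)
    return selected


# Toy data (made up): gene 2 duplicates gene 0, gene 1 is uninformative,
# gene 3 separates the two classes.
X_toy = [
    [1, 3, 2, 5],
    [2, 1, 4, 5],
    [3, 2, 6, 5],
    [4, 3, 8, 9],
    [5, 1, 10, 9],
    [6, 2, 12, 9],
]
y_toy = [0, 0, 0, 1, 1, 1]

print(select_genes(X_toy, y_toy, n_parts=2, top_k=1))
```

Because each partition is scored independently, the per-partition loop is the part that could run on separate workers. In practice a library routine such as `scipy.stats.spearmanr` would replace the hand-written rank correlation.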
