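Discriminant
Computer science
Feature selection
Naive Bayes classifier
Support vector machine
Data mining
Mutual information
Redundancy (engineering)
Decision tree
Filter (signal processing)
Pattern recognition (psychology)
Bayes' theorem
Artificial intelligence
Gene selection
Machine learning
Microarray analysis techniques
Gene
Bayesian probability
Gene expression
Operating system
Biochemistry
Chemistry
Computer vision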
Authors
Alok Kumar Shukla, Diwakar Tripathi
Identifier
DOI:10.1016/j.mbs.2019.108230
Abstract
In recent times, several feature selection (FS) methods have been introduced to identify biomarkers from gene expression datasets. These methods have gained extensive attention for solving the cancer classification problem, but they have some limitations. First, the majority of FS approaches increase the computational cost because of their centralized data structure. Second, a gene that is ranked as irrelevant on its own, but that could perform well in terms of classification accuracy when combined with a suitable subset of genes, is left out of the selection. To resolve these problems, we introduce a novel two-stage FS approach that combines Spearman's Correlation (SC) with distributed filter FS methods and can select highly discriminative genes for distinguishing samples in high-dimensional datasets. In the distributed FS stage, the data are distributed by features (vertical distribution), and a merging procedure then updates the feature subset while improving classification accuracy. Moreover, the approach is used to quantify the gene-gene and gene-class relations and simultaneously detect subsets of essential genes. The proposed method is verified on six gene datasets with the help of four well-known classifiers, namely support vector machine, naïve Bayes, k-nearest neighbor, and decision tree. Its performance is compared with traditional filter techniques such as Relief-F, information gain, minimum redundancy maximum relevance, joint mutual information, Chi-square, and t-test. The experimental results demonstrate that the proposed method significantly improves computational time and classification accuracy in comparison to standard algorithms applied to the non-partitioned dataset.
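The general idea of combining Spearman's correlation with a vertically distributed filter can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`average_ranks`, `spearman`, `select_in_partition`, `distributed_select`), the parameters `top_m` and `redundancy_threshold`, and the synthetic data are all hypothetical. The merge step here is a plain union of the per-partition subsets, whereas the abstract describes a merge guided by classification accuracy, whose details are not given.

```python
import numpy as np


def average_ranks(v):
    """Rank a 1-D array from 1..n, giving tied values their average rank."""
    v = np.asarray(v)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return (rank_sums / counts)[inverse]


def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of ranks.

    Assumes neither input is constant (a constant column would give NaN).
    """
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]


def select_in_partition(X, y, cols, top_m, redundancy_threshold):
    """Filter one vertical partition (a subset of feature columns).

    Genes are ordered by gene-class relevance |rho(gene, y)|; a gene is
    skipped when its gene-gene |rho| with an already kept gene reaches
    the redundancy threshold.
    """
    relevance = {j: abs(spearman(X[:, j], y)) for j in cols}
    ranked = sorted(cols, key=lambda j: -relevance[j])
    kept = []
    for j in ranked:
        if all(abs(spearman(X[:, j], X[:, k])) < redundancy_threshold
               for k in kept):
            kept.append(j)
        if len(kept) == top_m:
            break
    return kept


def distributed_select(X, y, n_partitions=4, top_m=2,
                       redundancy_threshold=0.9):
    """Split the features into contiguous blocks, filter each block
    independently, and merge the results by union (a simplification)."""
    blocks = np.array_split(np.arange(X.shape[1]), n_partitions)
    merged = []
    for block in blocks:
        cols = [int(j) for j in block]
        merged.extend(
            select_in_partition(X, y, cols, top_m, redundancy_threshold))
    return sorted(merged)


# Synthetic stand-in for a gene expression matrix: 60 samples, 40 genes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 40))
X[:, 0] = y + 0.1 * rng.normal(size=60)  # informative gene
X[:, 1] = 2.0 * X[:, 0] + 1.0            # monotone copy, hence redundant
print(distributed_select(X, y))
```

Because each block is filtered independently, the per-block work could run in parallel, which is where a distributed scheme would be expected to save time over ranking all genes at once. Redundancy in this sketch is only checked within a block, so correlated genes placed in different blocks can both survive the union.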