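Random forest
Feature selection
Support vector machine
Artificial intelligence
Statistical classification
Computer science
Machine learning
Prostate cancer
Feature (linguistics)
Selection (genetic algorithm)
Cancer
Pattern recognition (psychology)
Biology
Genetics
Linguistics
Philosophy
Authors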
P. Swathypriyadharsini, P R Rupashini, K. Premalatha
Identifier
DOI:10.1088/2057-1976/adcf2b
Abstract
Microarray technology has taken biotechnological research to the next level in recent years. It provides the expression levels of the various genes involved in a particular disease. Prostate cancer has become a life-threatening disease, and the genes that cause it are identified through classification methods. Gene expression data, however, are high-dimensional with small sample sizes, which poses challenges for existing classification algorithms. Feature selection techniques are applied to address this dimensionality problem. This paper analyzes feature selection methods for classifying prostate cancer gene expression data and identifies the significant genes that cause the disease. Three types of feature selection methods (filters, wrappers, and embedded selectors) are applied before classification to select the top-ranked genes. The selected top-ranked genes are then passed to the classification algorithms SVM, k-NN, Random Forest, and Artificial Neural Network. With feature selection included, classification accuracy is significantly boosted even with fewer genes. The Random Forest classification algorithm outperforms the other classification methods. The significant genes identified as having a major influence on prostate cancer include KLK3, GFI1, CXCR2, and TNFRSF10C.
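The select-then-classify workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a filter-type selector (ranking genes by a Welch t-statistic) followed by a k-NN classifier, evaluated with leave-one-out cross-validation. The data are synthetic, and the function names, gene indices, the number of selected genes, and the neighbour count are all hypothetical choices made for illustration. Only the Python standard library is used.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Absolute Welch t-statistic between two lists of expression values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    denom = math.sqrt(va / len(a) + vb / len(b))
    return abs(statistics.mean(a) - statistics.mean(b)) / denom if denom > 0 else 0.0

def top_genes(X, y, k):
    """Filter step: rank genes by |t| between the two classes, keep the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append((welch_t(a, b), g))
    return sorted(g for _, g in sorted(scores, reverse=True)[:k])

def knn_predict(X_train, y_train, x, genes, n_neighbors=3):
    """Classify x by majority vote of its nearest neighbours on the selected genes."""
    dists = sorted(
        (sum((row[g] - x[g]) ** 2 for g in genes), label)
        for row, label in zip(X_train, y_train)
    )
    votes = [label for _, label in dists[:n_neighbors]]
    return max(set(votes), key=votes.count)

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy; genes are re-selected inside every fold to avoid leakage."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(X_tr, y_tr, k)
        hits += knn_predict(X_tr, y_tr, X[i], genes) == y[i]
    return hits / len(X)

# Synthetic stand-in for a microarray (assumption): 40 samples, 205 genes,
# of which 5 hypothetical genes are shifted in class 1.
random.seed(0)
N_PER_CLASS, N_GENES, INFORMATIVE = 20, 205, [3, 50, 77, 120, 200]
X, y = [], []
for label in (0, 1):
    for _ in range(N_PER_CLASS):
        row = [random.uniform(-1, 1) for _ in range(N_GENES)]
        for g in INFORMATIVE:
            row[g] += 5 * label  # shift informative genes in class 1
        X.append(row)
        y.append(label)

selected = top_genes(X, y, k=5)
accuracy = loo_accuracy(X, y, k=5)
print("selected genes:", selected)
print("leave-one-out accuracy:", accuracy)
```

Re-selecting the genes inside each cross-validation fold keeps the held-out sample from influencing the ranking, which matters when samples are few and genes are many. The toy data are far more cleanly separable than real expression data, so the numbers printed here say nothing about the accuracies reported in the paper.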