计算机科学
机器学习
特征选择
癌症
一般化
集成学习
特征(语言学)
人工智能
选择(遗传算法)
癌症存活率
计算智能
理论(学习稳定性)
癌症分期
数据挖掘
医学
数学
内科学
数学分析
哲学
语言学
标识
DOI:10.1007/s40747-022-00791-w
摘要
Abstract Cancer survival prediction is one of the three major tasks of cancer prognosis. To improve the accuracy of cancer survival prediction, in this paper, we propose a priori knowledge- and stability-based feature selection (PKSFS) method and develop a novel two-stage heterogeneous stacked ensemble learning model (BQAXR) to predict the survival status of cancer patients. Specifically, PKSFS first obtains the optimal feature subsets from the high-dimensional cancer datasets to guide the subsequent model construction. Then, BQAXR seeks to generate five high-quality heterogeneous learners, among which the shortcomings of the learners are overcome by using improved methods, and integrate them in two stages through the stacked generalization strategy based on optimal feature subsets. To verify the merits of PKSFS and BQAXR, this paper collected the real survival datasets of gastric cancer and skin cancer from the Surveillance, Epidemiology, and End Results (SEER) database of the National Cancer Institute, and conducted extensive numerical experiments from different perspectives based on these two datasets. The accuracy and AUC of the proposed method are 0.8209 and 0.8203 in the gastric cancer dataset, and 0.8336 and 0.8214 in the skin cancer dataset. The results show that PKSFS has marked advantages over popular feature selection methods in processing high-dimensional datasets. By taking full advantage of heterogeneous high-quality learners, BQAXR is not only superior to mainstream machine learning methods, but also outperforms improved machine learning methods, which indicates can effectively improve the accuracy of cancer survival prediction and provide a reference for doctors to make medical decisions.
科研通智能强力驱动
Strongly Powered by AbleSci AI