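Feature selection
Random forest
Machine learning
Artificial intelligence
Support vector machine
Computer science
Cluster analysis
Logistic regression
Boosting (machine learning)
Gradient boosting
Breast cancer
Lasso (programming language)
Elastic net regularization
Feature (linguistics)
Cancer
Medicine
Internal medicine
Linguistics
World Wide Web
Philosophy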
Authors
Erio Yoshino, Budi Juarto, Felix Indra Kurniadi
Identifier
DOI:10.1109/isemantic59612.2023.10295363
Abstract
Breast cancer is one of the major causes of fatalities in women across the globe, necessitating early diagnosis and detection for successful treatment. Scientists have pinpointed numerous risk factors for breast cancer such as obesity and sex hormones. Machine learning prediction systems have employed diverse data types, like genetic profiles, clinical data, and X-ray images, to foresee the risk of breast cancer. To eliminate any unnecessary or superfluous attributes from high-dimensional datasets, algorithms for feature selection have been applied. This research proposes a composite machine learning model integrating five models: Gradient Boosting (GB), Random Forest (RF), Logistic Regression with Lasso Regularization, Logistic Regression with Ridge Regression, and Support Vector Machine (SVM). Feature selection methodologies employed are based on the K-Means clustering oriented feature importance measure. The Coimbra breast cancer dataset, composed of 116 data points and 10 characteristics, is used for training and assessing the models on their predictive abilities. According to the study's outcomes, the Support Vector Machine model demonstrates the greatest effectiveness for the dataset when used with the proposed feature selection, whereas Gradient Boosting and Random Forest are efficacious models without feature selection.
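The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The abstract does not specify how the K-Means based feature importance is computed, so the `kmeans_feature_importance` helper below (which scores each feature by how far apart the cluster centroids sit along it) is a hypothetical stand-in. The data is synthetic, generated only to match the stated 116 rows; nine predictor columns are used on the assumption that the tenth characteristic is the class label. The `top_k` cutoff, the hyperparameters, and the 5-fold cross-validation are likewise assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with 116 rows and 9 predictors; NOT the Coimbra data.
X, y = make_classification(
    n_samples=116, n_features=9, n_informative=4, n_redundant=2, random_state=0
)


def kmeans_feature_importance(X, n_clusters=2, random_state=0):
    """Hypothetical score: variance of the K-Means centroids along each
    standardized feature. Features on which the clusters separate more
    strongly receive a higher score."""
    X_std = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    km.fit(X_std)
    return km.cluster_centers_.var(axis=0)


importance = kmeans_feature_importance(X)
top_k = 5  # assumed cutoff; the abstract does not state how many features are kept
selected = np.argsort(importance)[::-1][:top_k]

# The five model families named in the abstract, with assumed settings.
models = {
    "GB": GradientBoostingClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "LR-Lasso": LogisticRegression(penalty="l1", solver="liblinear"),
    "LR-Ridge": LogisticRegression(penalty="l2", solver="liblinear"),
    "SVM": SVC(kernel="rbf"),
}

results = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    # Mean 5-fold accuracy with all features and with the selected subset.
    acc_all = cross_val_score(pipe, X, y, cv=5).mean()
    acc_sel = cross_val_score(pipe, X[:, selected], y, cv=5).mean()
    results[name] = (acc_all, acc_sel)
    print(f"{name:9s} all features: {acc_all:.3f}  selected: {acc_sel:.3f}")
```

Because the data here is synthetic, the printed accuracies say nothing about the paper's findings; the sketch only shows the shape of a with-versus-without feature selection comparison. In this sketch the selection step runs once on the full feature matrix, which is label-free but still sees every row; a stricter evaluation would refit it inside each cross-validation fold.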