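Feature selection
Feature (linguistics)
Artificial intelligence
Computer science
Ensemble learning
Machine learning
Matthews correlation coefficient
Ranking (information retrieval)
Stacking
Algorithm
Data mining
Pattern recognition (psychology)
Support vector machine
Chemistry
Philosophy
Linguistics
Organic chemistry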
Authors
Jiahao Yu,Yongman Zhao,Rongshun Pan,Xue Zhou,Zikai Wei
Source
Journal: ACS Omega
[American Chemical Society]
Date: 2023-01-13
Volume (issue): 8 (3): 3078-3090
Citations: 10
Identifier
DOI: 10.1021/acsomega.2c06324
Abstract
The critical temperature (Tc) of superconductors has long been a subject of interest. This study proposes a method that combines two-layer feature selection (TL) with an Optuna-Stacking ensemble learning model to predict Tc from physicochemical components. Most machine-learning models require a large amount of prior knowledge to construct the Tc-related feature vectors manually, so those vectors may contain redundant or invalid features that adversely affect the analysis and prediction of Tc. The TL model combines the advantages of filter and wrapper feature selection. In the first layer, features are ranked by importance using SHapley Additive exPlanations (SHAP) in combination with CatBoost, and then the maximal information coefficient (MIC) and the distance correlation coefficient (DCC) are applied for initial feature selection based on that ranking. The second layer uses a cross-validation-based genetic algorithm (cv-GA) to eliminate the remaining redundant or invalid features. The selected features are fed into the Stacking ensemble learning model to predict Tc, and the multidimensional hyperparameters of the meta-model are optimized with Optuna, an improved Bayesian hyperparameter optimization framework based on the Tree-structured Parzen Estimator (TPE) and a pruning strategy. The model shows clear advantages and generality in prediction performance and feature reduction rate, and it also proves suitable for predicting the Tc of high-temperature superconductors. It provides an efficient and cost-effective method for data-driven superconductor research.
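The two-layer selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the first layer here ranks features by distance correlation only (the SHAP/CatBoost ranking and MIC are omitted), the second layer is a small genetic algorithm whose fitness is the cross-validated error of an ordinary least-squares model, and the Stacking and Optuna stages are not shown. All function names, parameter values, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D arrays (0 when either is constant)."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a, b = np.abs(x - x.T), np.abs(y - y.T)          # pairwise distance matrices
    # Double-centre: subtract column means and row means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max((A * B).mean(), 0.0) / denom))

def filter_layer(X, y, keep):
    """Layer 1 (filter): keep the `keep` features most dependent on y."""
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:keep]

def cv_mse(X, y, mask, folds=5):
    """Cross-validated MSE of least squares on the features selected by `mask`."""
    if not mask.any():
        return np.inf
    Xs = np.column_stack([np.ones(len(y)), X[:, mask]])
    idx = np.arange(len(y))
    errors = []
    for k in range(folds):
        test = idx % folds == k
        coef, *_ = np.linalg.lstsq(Xs[~test], y[~test], rcond=None)
        errors.append(np.mean((Xs[test] @ coef - y[test]) ** 2))
    return float(np.mean(errors))

def ga_layer(X, y, generations=20, pop_size=20, mutation=0.1, penalty=0.01, seed=0):
    """Layer 2 (wrapper): genetic search over boolean feature masks."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5

    def fitness(m):
        # Lower is better: CV error plus a small cost per selected feature.
        return cv_mse(X, y, m) + penalty * m.sum()

    for _ in range(generations):
        order = np.argsort([fitness(m) for m in pop])
        parents = pop[order[: pop_size // 2]]         # keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(n) < 0.5, p1, p2)   # uniform crossover
            child ^= rng.random(n) < mutation               # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    return pop[np.argmin([fitness(m) for m in pop])]

# Synthetic data (hypothetical): only features 0 and 1 influence the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

kept = filter_layer(X, y, keep=5)                     # layer 1
mask = ga_layer(X[:, kept], y)                        # layer 2
selected = np.sort(kept[mask])
print("selected feature indices:", selected)
```

The per-feature penalty in the fitness function is one simple way to push the search toward smaller subsets; the paper's cv-GA may define fitness differently.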