Machine Learning Potential Model Based on Ensemble Bispectrum Feature Selection and Its Applicability Analysis

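Keywords: Bispectrum; Hyperparameter; Computer science; Artificial intelligence; Feature selection; Feature (linguistics); Computational complexity theory; Machine learning; Ensemble learning; Dimensionality reduction; Pattern recognition (psychology); Algorithm; Linguistics; Telecommunications; Philosophy; Spectral density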

Authors

Jiawei Jiang, Li-Chun Xu, Fenglian Li, Jian-Li Shao

Source

Journal: Metals [MDPI AG]
Date: 2023-01-13 Volume (issue): 13 (1): 169-169 Citations: 6

Links

mdpi.com doaj.org doi.org

Identifier

DOI: 10.3390/met13010169

Abstract

With the continuous improvement of machine learning methods, building interatomic machine learning potentials (MLPs) from quantum mechanical datasets has become an effective way to improve the accuracy of classical molecular dynamics simulation. The Spectral Neighbor Analysis Potential (SNAP) is one of the most commonly used machine learning potentials. It uses the bispectrum to encode the local environment of each atom in the lattice. The hyperparameter jmax controls the complexity and precision of the mapping between the local environment and the bispectrum descriptor. As jmax increases, the description becomes more accurate, but the number of parameters in the bispectrum descriptor increases dramatically, which raises the computational complexity. To reduce the computational complexity without losing accuracy, this paper proposes a two-level ensemble feature selection (EFS) method for the bispectrum descriptor, combining the perturbation method with a feature selector ensemble strategy. With the proposed method, a feature subset is selected from the original bispectrum descriptor dataset and used to build a dimension-reduced MLP. As an application and validation of the method, data for the elements Fe, Ni, Cu, Li, Mo, Si, and Ge are used to train SNAP-based linear regression models that predict atomic energies and forces, and these models are used to evaluate the performance of the feature subsets. The experimental results show that the EFS method reduces training complexity more effectively on qSNAP features than on SNAP features. Compared with existing methods, when the feature subset size is 0.7 times that of the original feature set, the proposed EFS method based on the SSWRP ensemble strategy achieves the best stability, with an average stability of 0.94 across all datasets. The training complexity of the linear regression model is reduced by about half, and the prediction complexity is reduced by about 30%.
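
To make the growth of the descriptor with jmax concrete, the following is a minimal Python sketch, not the authors' code. It assumes the enumeration convention commonly used in SNAP implementations (triples with j2 <= j1 <= j <= 2*jmax that satisfy the triangle condition, written in doubled-integer units), and it assumes that a qSNAP-style model adds every unique pairwise product of the linear components. The function names are hypothetical. The last printed column shows the subset size at the 0.7 ratio mentioned in the abstract.

```python
def num_bispectrum_components(twojmax: int) -> int:
    """Count bispectrum components for a given 2*jmax.

    Assumption: indices are doubled integers, and a triple (j1, j2, j)
    is counted when j2 <= j1 <= j <= twojmax and
    j1 - j2 <= j <= j1 + j2 with j stepping by 2.
    """
    count = 0
    for j1 in range(twojmax + 1):
        for j2 in range(j1 + 1):
            for j in range(j1 - j2, min(twojmax, j1 + j2) + 1, 2):
                if j >= j1:
                    count += 1
    return count


def num_quadratic_features(n_linear: int) -> int:
    """Linear terms plus all unique pairwise products (assumed qSNAP-style)."""
    return n_linear + n_linear * (n_linear + 1) // 2


if __name__ == "__main__":
    # Columns: 2*jmax, linear features, quadratic features, 0.7-ratio subset
    for twojmax in (2, 4, 6, 8):
        k = num_bispectrum_components(twojmax)
        print(twojmax, k, num_quadratic_features(k), round(0.7 * k))
```

Under these assumptions the quadratic feature count grows roughly with the square of the linear count, which is consistent with the abstract's observation that feature selection pays off more on qSNAP features than on SNAP features.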
