Abstract
The Sodium Adsorption Ratio (SAR) is a widely used variable in water quality research, particularly in agricultural and environmental studies. In many cases, the key variables required to calculate SAR, namely Na⁺, Mg²⁺, and Ca²⁺, are not available. Consequently, the ability to estimate SAR from a limited number of other water quality variables becomes critically important. This study implemented Multilayer Perceptron Neural Network (MLPNN), Support Vector Regression (SVR), and K-Nearest Neighbors (KNN) models at level-0 for prediction, along with the Boruta model for variable selection. A stacked ensemble learning model at level-1 was then used to enhance the prediction accuracy. The discharge and water quality dataset from the Zarrin-Gol River in northern Iran was used to implement the modeling procedure. The variable selection results from the Boruta model revealed that a limited number of water quality variables can effectively predict SAR even without the principal variables. Further investigation of the input combinations for the level-0 models demonstrated that the MLPNN, KNN, and SVR models yielded optimal predictions with 4, 3, and 1 input variables, respectively. Among the level-0 models, the MLPNN model exhibited the highest accuracy, with RMSE = 0.54, MBE = 0.26, MAE = 0.44, R = 0.84, IA = 0.67, and KGE = 0.79. Implementing the stacked ensemble learning model at level-1 significantly improved the SAR prediction compared with the level-0 models. The Ensemble-NN model yielded the best performance in estimating SAR within the range of recorded data, with RMSE = 0.53, MBE = 0.29, MAE = 0.41, R = 0.87, IA = 0.70, and KGE = 0.82. Residual analysis further confirmed the superior predictive capability of the level-1 models compared with the level-0 models. The generalized logistic probability distribution function was used to estimate the extreme values. The Ensemble-KNN model best predicted the extreme values, with RMSE = 0.69, MBE = −0.61, MAE = 0.61, R = 0.61, IA = 0.26, and KGE = 0.37. The findings underscore the substantial advances achieved by stacked ensemble methods in modeling SAR across several aspects, including the total data, the extreme values, and the model residuals.
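For reference, SAR is conventionally computed as Na⁺ / sqrt((Ca²⁺ + Mg²⁺) / 2), with all three concentrations expressed in meq/L. The following is a minimal Python sketch of that standard definition together with four of the metrics reported above; it is not the authors' code. The function names, the sign convention for MBE (predicted minus observed), and the 2009 formulation of KGE are assumptions, since the abstract does not state which variants were used.

```python
import math
from statistics import mean, pstdev


def sar(na_meq, ca_meq, mg_meq):
    """Sodium Adsorption Ratio from ion concentrations in meq/L."""
    return na_meq / math.sqrt((ca_meq + mg_meq) / 2.0)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(mean((p - o) ** 2 for o, p in zip(obs, pred)))


def mbe(obs, pred):
    """Mean bias error; assumed convention: predicted minus observed."""
    return mean(p - o for o, p in zip(obs, pred))


def mae(obs, pred):
    """Mean absolute error."""
    return mean(abs(p - o) for o, p in zip(obs, pred))


def kge(obs, pred):
    """Kling-Gupta efficiency (assumed 2009 form).

    Combines the Pearson correlation r, the variability ratio alpha,
    and the bias ratio beta. Undefined when the observations are
    constant or have zero mean (division by zero).
    """
    mo, mp = mean(obs), mean(pred)
    so, sp = pstdev(obs), pstdev(pred)
    cov = mean((o - mo) * (p - mp) for o, p in zip(obs, pred))
    r = cov / (so * sp)
    alpha = sp / so
    beta = mp / mo
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(sar(10.0, 4.0, 4.0))
    observed = [1.0, 2.0, 3.0]
    predicted = [2.0, 3.0, 4.0]
    print(rmse(observed, predicted), mbe(observed, predicted),
          mae(observed, predicted), kge(observed, predicted))
```

Under the assumed convention, a positive MBE indicates overestimation on average and a negative MBE indicates underestimation. A KGE of 1 corresponds to perfect agreement in correlation, variability, and mean.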