Groundwater Fluoride Prediction for Sustainable Water Management: A Comparative Evaluation of Machine Learning Approaches Enhanced by Satellite Embeddings
Groundwater fluoride contamination poses a significant threat to sustainable water resources and public health, yet conventional water quality analysis is time-consuming and costly, which makes large-scale, sustained monitoring difficult. Machine learning methods offer a promising, cost-effective alternative for assessing the spatial distribution of fluoride. This study developed and compared Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN, implemented as a multilayer perceptron) models for predicting groundwater fluoride contamination in the Datong Basin, using satellite embeddings from AlphaEarth Foundations as additional inputs. Data from 391 groundwater sampling points were used, with the dataset partitioned into training (80%) and testing (20%) sets. Features were selected by their ANOVA F-values, which identified surface elevation, pollution, population, evaporation, vertical distance to the rivers, distance to the Sanggan River, and nine bands of the satellite embeddings as the most relevant input variables. Model performance was evaluated using the confusion matrix and the area under the receiver operating characteristic curve (ROC-AUC). The SVM model achieved the highest ROC-AUC (0.82), outperforming the RF (0.80) and ANN (0.77) models. Adding the satellite embeddings markedly improved all three models, reducing prediction errors by 13.8% to 23.3%. The embedding-enhanced SVM model proved to be a robust and reliable tool for predicting groundwater fluoride contamination, highlighting its potential for sustainable groundwater management.
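The workflow summarized above (80/20 split, ANOVA F-value feature selection, and an RF/SVM/ANN comparison scored by ROC-AUC and a confusion matrix) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic placeholders generated with `make_classification` (391 samples, 30 hypothetical candidate features, a binary contaminated/not-contaminated label), `k=15` simply mirrors the count of six environmental variables plus nine embedding bands, and all hyperparameters are illustrative guesses.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, k=15, seed=0):
    """Fit RF, SVM, and MLP on an 80/20 split; return ROC-AUC and confusion matrix per model."""
    # Stratified 80/20 split keeps the class ratio similar in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Hyperparameters are illustrative assumptions, not values from the study.
    models = {
        "RF": RandomForestClassifier(n_estimators=300, random_state=seed),
        "SVM": SVC(kernel="rbf", probability=True, random_state=seed),
        "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed),
    }
    results = {}
    for name, clf in models.items():
        # ANOVA F-value selection sits inside the pipeline, so it is fitted
        # on the training split only and never sees the test data.
        pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), clf)
        pipe.fit(X_train, y_train)
        scores = pipe.predict_proba(X_test)[:, 1]  # probability of the positive class
        results[name] = {
            "roc_auc": roc_auc_score(y_test, scores),
            "confusion": confusion_matrix(y_test, pipe.predict(X_test)),
        }
    return results


# Synthetic stand-in for the 391 sampling points (hypothetical data).
X, y = make_classification(
    n_samples=391, n_features=30, n_informative=10, random_state=0
)
results = compare_models(X, y)
for name, r in results.items():
    print(name, round(r["roc_auc"], 3))
```

Placing `SelectKBest` inside the pipeline is a deliberate choice: computing F-values on the full dataset before splitting would leak test information into feature selection and inflate the reported ROC-AUC.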