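Nanofiltration
Reverse osmosis
Membrane
Permeation
Computer science
Chemistry
Artificial intelligence
Biochemical engineering
Machine learning
Reliability (semiconductor)
Experimental data
Process engineering
Engineering
Mathematics
Statistics
Physics
Quantum mechanics
Biochemistry
Power (physics)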
Authors
Nohyeong Jeong, Tai-Heng Chung, Tiezheng Tong
Identifier
DOI: 10.1021/acs.est.1c04041
Abstract
Predictive models for micropollutant removal by membrane separation are highly desirable for the design and selection of appropriate membranes. While machine learning (ML) models have been applied for such purposes, their reliability might be compromised by data leakage due to inappropriate data splitting. More importantly, whether ML models can truly understand the mechanisms of membrane separation has not been revealed. In this study, we evaluate the capability of the XGBoost model to predict micropollutant removal efficiencies of reverse osmosis and nanofiltration membranes. Our results demonstrate that data leakage leads to falsely high prediction accuracy. Using a model interpretation method based on cooperative game theory, we test the knowledge of XGBoost on the mechanisms of membrane separation by quantifying the contributions of input variables to the model predictions. We reveal that XGBoost possesses an adequate understanding of size exclusion, but its knowledge of electrostatic interactions and adsorption is limited. Our findings suggest that future work should focus more on avoiding data leakage and evaluating the mechanistic knowledge of ML models. In addition, high-quality data from more diverse experimental conditions, as well as more informative variables, are needed to improve the accuracy of ML models for predicting membrane performance.
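The data-leakage point in the abstract can be illustrated with a small group-aware split. The sketch below is not the authors' procedure: it assumes that leakage arises when measurements sharing a grouping key (here a hypothetical `compound` field) end up in both the training and test sets, and it keeps each group entirely on one side. The function name `group_train_test_split`, the record fields, and the toy records are all made up for illustration.

```python
import random
from collections import defaultdict


def group_train_test_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    Hypothetical helper: the choice of grouping key is an assumption,
    not something specified in the abstract.
    """
    # Bucket the records by their group value.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle the group ids (not the individual records) reproducibly.
    group_ids = sorted(groups)
    rng = random.Random(seed)
    rng.shuffle(group_ids)

    # Hold out whole groups for testing; always hold out at least one.
    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


# Toy placeholder records (not real data): 5 compounds x 3 membranes.
records = [
    {"compound": f"compound_{c}", "membrane": f"membrane_{m}"}
    for c in "ABCDE"
    for m in range(3)
]

train, test = group_train_test_split(records, group_key="compound")
train_compounds = {r["compound"] for r in train}
test_compounds = {r["compound"] for r in test}
print(len(train), len(test), train_compounds & test_compounds)
```

A plain random split over individual records would typically place the same compound on both sides, which is one way a model can appear more accurate than it is on genuinely unseen cases. Splitting by group gives a stricter estimate.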