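Chemical space
Computer science
Biochemical engineering
Environmental remediation
Diffusion
Biological system
Molecular dynamics
Sampling (signal processing)
Machine learning
Active learning (machine learning)
Environmental science
Chemical toxicity
Chemical kinetics
Property (philosophy)
Importance sampling
Artificial intelligence
Molecular descriptor
Applicability domain
Chemistry
Contamination
Computational model
Statistical learning
System dynamics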
Authors
Archana Jagadisan, Hakim Boukhalfa, Mohamed Mehana
Identifier
DOI:10.1021/acs.est.5c08559
Abstract
Per- and polyfluoroalkyl substances (PFAS) are a class of more than 14 000 synthetic compounds with exceptional environmental persistence. Used extensively in industrial and consumer applications, PFAS resist degradation and accumulate in environmental media and living organisms, posing serious health risks, including cancer, liver damage, and immune dysfunction. This persistence and toxicity create an urgent need for environmental fate assessment. Predicting PFAS environmental transport remains challenging because reliable diffusion coefficient data, which are critical for modeling contaminant mobility and designing remediation strategies, are lacking. Experimental measurements are time-consuming and expensive, while fully computational approaches are infeasible given the scale of the chemical space. We developed an integrated machine learning and molecular dynamics (MD) framework that uses active learning to predict diffusion coefficients across the PFAS chemical space. Starting with measured diffusion coefficients, we train models using chemical graph-based representations and physicochemical descriptors. The approach iteratively identifies the molecules with the highest prediction uncertainty, performs targeted MD simulations, and retrains the models to explore chemical space efficiently while minimizing computational cost. The framework achieved significant performance improvements, reducing mean relative error by 88% and increasing R² from 0.095 to 0.907. Uncertainty-based sampling consistently outperformed random selection at optimal batch sizes of 50-100 compounds. This data-efficient approach enables transport property prediction across thousands of PFAS molecules, supporting environmental risk assessment and remediation planning.
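
The iterative loop described in the abstract (train, rank unlabeled molecules by prediction uncertainty, label the most uncertain batch, retrain) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model or the uncertainty estimator, so the sketch substitutes a bootstrap ensemble of ridge regressors and uses the ensemble standard deviation as the uncertainty. The feature matrices stand in for the graph-based representations and physicochemical descriptors, and `oracle` is a hypothetical callable standing in for a targeted MD simulation that returns a diffusion coefficient. All function names and default parameters are assumptions made for illustration.

```python
import numpy as np


def fit_ensemble(X, y, n_models=20, ridge=1e-3, rng=None):
    """Fit ridge regressors on bootstrap resamples (an assumed stand-in model).

    Returns an array of shape (n_models, n_features + 1); the last column
    of each row is the bias term.
    """
    rng = np.random.default_rng() if rng is None else rng
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    d = Xb.shape[1]
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(Xb), size=len(Xb))  # bootstrap resample
        A, b = Xb[idx], y[idx]
        weights.append(np.linalg.solve(A.T @ A + ridge * np.eye(d), A.T @ b))
    return np.array(weights)


def predict_with_uncertainty(weights, X):
    """Return the ensemble mean and standard deviation for each row of X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    preds = Xb @ weights.T  # shape: (n_samples, n_models)
    return preds.mean(axis=1), preds.std(axis=1)


def select_most_uncertain(std, batch_size):
    """Indices of the batch_size entries with the largest uncertainty."""
    return np.argsort(std)[::-1][:batch_size]


def active_learning(X_lab, y_lab, X_pool, oracle, batch_size=50, n_rounds=3, seed=0):
    """Uncertainty-based active learning over a pool of unlabeled candidates."""
    rng = np.random.default_rng(seed)
    X_lab, y_lab, X_pool = X_lab.copy(), y_lab.copy(), X_pool.copy()
    for _ in range(n_rounds):
        if len(X_pool) == 0:
            break
        weights = fit_ensemble(X_lab, y_lab, rng=rng)
        _, std = predict_with_uncertainty(weights, X_pool)
        pick = select_most_uncertain(std, min(batch_size, len(X_pool)))
        y_new = oracle(X_pool[pick])  # hypothetical stand-in for targeted MD runs
        X_lab = np.vstack([X_lab, X_pool[pick]])
        y_lab = np.concatenate([y_lab, y_new])
        X_pool = np.delete(X_pool, pick, axis=0)  # remove newly labeled candidates
    weights = fit_ensemble(X_lab, y_lab, rng=rng)  # final retraining
    return weights, X_lab, y_lab, X_pool


if __name__ == "__main__":
    # Synthetic data only; no real PFAS measurements are used here.
    rng = np.random.default_rng(1)
    true_w = np.array([0.5, -1.0, 2.0])
    oracle = lambda X: X @ true_w + 0.01 * rng.normal(size=len(X))
    X_seed = rng.normal(size=(10, 3))
    X_pool = rng.normal(size=(500, 3))
    weights, X_lab, y_lab, X_rest = active_learning(X_seed, oracle(X_seed), X_pool, oracle)
    print(X_lab.shape, X_rest.shape)
```

Replacing `select_most_uncertain` with a random draw from the pool gives the random-selection baseline that the abstract compares against; the `batch_size` default of 50 simply mirrors the lower end of the 50-100 range reported there.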