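Lipotoxicity
Set (abstract data type)
Transfer of learning
Nuclear receptor
Machine learning
Identification (biology)
Data set
Artificial neural network
Deep learning
Reliability (semiconductor)
Chemistry
Multi-task learning
Test apparatus
Convolutional neural network
Training set
Computer science
Computational model
Biological system
In vivo
Reference data
Labeled data
Metabolomics
Knowledge transfer
Chemical space
Computational biology
Neuroscience
Artificial intelligence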
Authors
Rouyi Wang,Shujun Yi,Guoqiang Shan,Lingyan Zhu
Identifier
DOI:10.1021/acs.est.5c07895
Abstract
Per- and polyfluoroalkyl substances (PFAS) can induce hepatic lipotoxicity by activating nuclear receptors (NRs). Here, we first developed machine-learning models to predict the activities of PFAS toward five NRs related to hepatic lipotoxicity, using five conventional algorithms and three commonly used data sets: a general chemical data set (A-data set, including 6388-10199 compounds), a broad PFAS data set based on the OECD definition (B-data set, including 369-772 compounds), and a strictly defined PFAS data set (C-data set, including 184-198 compounds). Unexpectedly, the models trained on the broad chemical spaces (A- and B-data sets) showed weak identification of active PFAS, which might be due to distributional shifts. The C-data set-trained models exhibited the best identification performance, but with weaker discrimination than the A-data set-trained models. Therefore, a transfer-learning multitask deep neural network (TL-MT-DNN) was implemented to transfer knowledge from the A-data set to the C-data set, which greatly improved prediction performance, with an average AUC of 0.886 and F1 of 0.665. When this model was applied to 3716 PFAS from the PFASSTRUCTv5 database, 391 compounds were predicted to activate all five NRs. The model's prediction reliability was validated by in vitro cell-based assays and in vivo animal experiments. This study provides a modeling strategy that improves PFAS activity prediction by overcoming the distributional shift inherent in models trained on broad chemical spaces, and highlights its potential for practical application in risk screening.
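The abstract reports model quality as an average AUC and an F1 score across the five NR endpoints. As a minimal sketch of how such figures could be computed, the Python below implements a rank-based ROC AUC and a binary F1, then averages both over tasks. Everything in it is an assumption made for illustration rather than the authors' pipeline: the function names, the task names `NR_1` and `NR_2`, the toy labels and scores, the 0.5 decision threshold, and the use of an unweighted mean across tasks.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the active class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random active outranks a random inactive."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_metrics(labels_by_task, scores_by_task, threshold=0.5):
    """Unweighted mean AUC and F1 over tasks (assumed averaging scheme)."""
    aucs, f1s = [], []
    for task, labels in labels_by_task.items():
        scores = scores_by_task[task]
        preds = [1 if s >= threshold else 0 for s in scores]
        aucs.append(roc_auc(labels, scores))
        f1s.append(f1_score(labels, preds))
    return sum(aucs) / len(aucs), sum(f1s) / len(f1s)


# Toy data for two hypothetical endpoints; not taken from the study.
labels_by_task = {
    "NR_1": [1, 0, 1, 0, 0],
    "NR_2": [0, 1, 1, 0],
}
scores_by_task = {
    "NR_1": [0.9, 0.2, 0.4, 0.6, 0.1],
    "NR_2": [0.3, 0.8, 0.7, 0.1],
}

mean_auc, mean_f1 = average_metrics(labels_by_task, scores_by_task)
print(f"mean AUC = {mean_auc:.3f}, mean F1 = {mean_f1:.3f}")
```

AUC depends only on how the scores rank actives against inactives, while F1 depends on the chosen threshold. This is one reason a model can discriminate well overall yet still identify few active compounds, which is the contrast the abstract draws between the A- and C-data set-trained models.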