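Workflow
Context (archaeology)
Computer science
Prioritization
Interpretation (philosophy)
Chemical space
Risk analysis (engineering)
Biochemical engineering
Annotation
Data science
Chemical toxicity
Risk assessment
Bayesian network
Risk management
Data integration
Chemical process
Key (lock)
Pruning
Environmental science
Chemical industry
Metric (unit)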
Authors
Fei Cheng, Qianhui Li, Lan He, Huizhen Li, Bryan W. Brooks, Zhiqiang Yu, Jing You
Identifier
DOI:10.1021/acs.est.6c01342
Abstract
Effective management of chemical mixtures presents a continuing challenge due to the growing diversity and inadequate characterization of contaminants of emerging concern (CECs). While recent advances in nontarget analysis enable the generation of extensive chemical inventories, key bottlenecks have shifted to postidentification interpretation within heterogeneous data. Here, we present an agent-based workflow that integrates large language models (LLMs) with functional categories, potential sources, and toxicology information to support risk prioritization. The practical technical components and evaluation benchmarks for LLMs were established, showing that optimized prompts and the best-performing model (GPT-4-Turbo) among the seven candidates enhanced user alignment with context perfectly. Integrating real-world data through retrieval-augmented generation enabled us to retrieve 100% truthful content, and further fine-tuning nearly doubled response consistency, substantially reducing hallucination. The workflow was validated using two mixture scenarios to assess the applicability across matrices and chemical contexts. The agent enabled complete functional and source annotation of chemicals by querying the NORMAN Network and achieved ∼85% accuracy for substances absent from existing databases by emulating NORMAN-aligned logic. This capability allowed mixture-level interpretation of chemical inventory, revealing dominant categories and industrial sources, such as lubricants in shale gas flowback produced water and semiconductor-related industrial intermediates, which contributed to elevated risks in the studied scenarios.