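Computer science
Bottleneck
Scalability
Key (lock)
Ensemble learning
Machine learning
Artificial intelligence
Big data
Range (aeronautics)
Construct (Python library)
Execution model
Benchmarking
Precision and recall
Data modeling
Experimental data
Multiscale modeling
Bridging (networking)
Carbon nitride
Deep learning
Volume (thermodynamics)
Distributed computing
Authors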
Dianyuan Li,Xichen Sun,Shaohua Sun,Runzhou Wang,Miaomiao Zhang,Meng Xiao,Yue Wang,Yuezhou Zhang
Identifier
DOI:10.1002/advs.202524215
Abstract
The overwhelming volume of unstructured scientific literature presents a fundamental bottleneck to materials discovery, where critical data on synthesis and properties remain locked in text. Here, a closed-loop framework that integrates automated knowledge extraction with interpretable machine learning and targeted experimental validation is presented. This approach is centered on a novel data extraction pipeline, which combines a prompt-engineered large language model with a model ensemble strategy, systematically optimized to interpret complex materials science narratives. When deployed to construct a database for defect-engineered carbon nitride photocatalysts, the system achieved 90% accuracy and recall for key parameters. Analysis of the high-fidelity dataset enabled reliable machine learning models to identify specific surface area (170 m² g⁻¹) and bandgap (≈2.31 eV) as dominant performance parameters. Crucially, SHapley Additive exPlanations analysis elucidated a non-monotonic relationship for bandgap, identifying an optimal range of 2.2–2.4 eV and quantifying the fundamental trade-off between light absorption and charge recombination. These data-driven insights guided the synthesis of representative materials, with experimental hydrogen evolution rates deviating by less than 5% from predictions. This work establishes a scalable and transferable paradigm, transforming fragmented literature into actionable intelligence and offering a powerful strategy for accelerating the development of functional materials.
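The abstract does not say how the model ensemble combines its outputs or how the reported scores were computed. As a minimal sketch only, assuming a simple majority vote across several extraction runs and a field-level comparison against a hand-labeled reference, the idea could look like the following. The function names, field names, and all numeric values in the sample records are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(candidates, min_agreement=2):
    """Return the most common non-None value if enough runs agree, else None.

    Assumption: the ensemble is a plain vote; the paper may use another rule.
    """
    votes = Counter(v for v in candidates if v is not None)
    if not votes:
        return None
    value, count = votes.most_common(1)[0]
    return value if count >= min_agreement else None


def precision_recall(predicted, gold):
    """Field-level precision and recall for one record.

    predicted, gold: dicts mapping field name -> value (None means absent).
    """
    true_pos = sum(
        1 for key, value in predicted.items()
        if value is not None and gold.get(key) == value
    )
    n_predicted = sum(1 for value in predicted.values() if value is not None)
    n_gold = sum(1 for value in gold.values() if value is not None)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


# Hypothetical outputs of three extraction runs on the same article.
runs = [
    {"bandgap_eV": 2.31, "surface_area_m2_g": 170.0, "h2_rate": None},
    {"bandgap_eV": 2.31, "surface_area_m2_g": 168.0, "h2_rate": 1200.0},
    {"bandgap_eV": 2.40, "surface_area_m2_g": 170.0, "h2_rate": None},
]

# Merge the runs field by field; a value survives only if two runs agree.
merged = {field: majority_vote([run[field] for run in runs]) for field in runs[0]}

# Hypothetical hand-labeled reference for the same article.
gold = {"bandgap_eV": 2.31, "surface_area_m2_g": 170.0, "h2_rate": 1200.0}

precision, recall = precision_recall(merged, gold)
print(merged)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy record the vote keeps the two fields on which at least two runs agree and drops the field that only one run reported, which raises precision at the cost of recall. That is one way to picture the trade-off an ensemble threshold controls.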