Pipeline (software)
Curie temperature
Task (project management)
Question answering
Extraction (chemistry)
Computer science
Marie Curie
Curie
Natural language processing
Physics
Engineering
Chemistry
Programming language
Condensed matter physics
Chromatography
Systems engineering
Ferromagnetism
Business
Economic policy
European Union
Authors
Aigerim Zhumabayeva, Nikhil Ranjan, Martin Takáč, Stefano Sanvito, Huseyin Ucar
Identifier
DOI:10.1021/acs.jpcc.4c01974
Abstract
In this study, we develop and release two Bidirectional Encoder Representations from Transformers (BERT) models, which we refer to as MagBERT and MagMatBERT, trained primarily on roughly 144 K peer-reviewed publications within the magnetic materials domain. These transformer models are then used for chemical named entity recognition (CNER) and question answering (QA) tasks in a data extraction workflow. We demonstrate this approach by developing a magnetics data set of a well-known magnetic property, the Curie temperature T_C. We evaluate the efficacy of these models, along with models that were not trained on the magnetics corpus, on the CNER and QA tasks. Our results indicate that initial training on the magnetics corpus yields an enhancement in these tasks. Additionally, the quality of each data set is assessed by comparing it with a manually developed ground-truth T_C data set, as well as by employing a random forest (RF) model and measuring its predictive ability with T_C as the target quantity. Our analyses demonstrate that the models pretrained on the magnetics corpus, i.e., MagBERT and MagMatBERT, are more effective than the ones without such pretraining.
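To make the extraction workflow concrete, below is a minimal sketch of how a fine-tuned BERT model could be chained for CNER and extractive QA with the Hugging Face transformers library. The checkpoint paths, the MATERIAL entity label, and the question template are illustrative assumptions, not the released MagBERT/MagMatBERT identifiers or the authors' exact pipeline.

```python
# Sketch of a CNER + QA extraction step, assuming two hypothetical fine-tuned
# checkpoints. Substitute the actual released model paths before running.
from transformers import pipeline

NER_CHECKPOINT = "path/to/magbert-cner"  # placeholder: token-classification head
QA_CHECKPOINT = "path/to/magbert-qa"     # placeholder: extractive QA head

# aggregation_strategy="simple" merges word-piece tokens into whole entities.
ner = pipeline("token-classification", model=NER_CHECKPOINT,
               aggregation_strategy="simple")
qa = pipeline("question-answering", model=QA_CHECKPOINT)

text = ("The Curie temperature of Nd2Fe14B was measured to be 588 K, "
        "while Fe3O4 orders ferrimagnetically below 858 K.")

# Step 1: CNER tags material mentions in the sentence.
# "MATERIAL" is an assumed label name; the released models may use another scheme.
materials = [e["word"] for e in ner(text) if e["entity_group"] == "MATERIAL"]

# Step 2: extractive QA pulls the Curie temperature span for each material.
records = []
for material in materials:
    answer = qa(question=f"What is the Curie temperature of {material}?",
                context=text)
    records.append({"material": material,
                    "T_C": answer["answer"],
                    "score": answer["score"]})

print(records)
```

In the workflow the abstract describes, records of this kind would then be compared against the manually curated ground-truth T_C data set and used to train a random forest regressor with T_C as the target; the sketch above covers only the extraction step.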