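Computer science
Relation extraction
Information extraction
Task (project management)
Information retrieval
JSON file
Natural language processing
Simple (philosophy)
Knowledge extraction
Artificial intelligence
World Wide Web
Epistemology
Philosophy
Economics
Management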
Authors
John Dagdelen, Alexander Dunn, Sang‐Hoon Lee, Nicholas Walker, Andrew Rosen, Gerbrand Ceder, Kristin A. Persson, Anubhav Jain
Identifier
DOI: 10.1038/s41467-024-45563-x
Abstract
Extracting structured knowledge from scientific text remains a challenging task for machine learning models. Here, we present a simple approach to joint named entity recognition and relation extraction and demonstrate how pretrained large language models (GPT-3, Llama-2) can be fine-tuned to extract useful records of complex scientific knowledge. We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.
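The abstract describes fine-tuning a language model to emit extracted records as a list of JSON objects. The sketch below illustrates only the data-handling side of that idea: serializing an annotated passage into a prompt/completion training line, and parsing a model completion back into typed records. It assumes a hypothetical dopant-host schema (the `DopingRecord` class and its `host`/`dopants` fields are illustrative names, not the paper's schema), and the prompt/completion JSONL layout is likewise an assumption. No model call is made.

```python
import json
from dataclasses import dataclass, field


# Hypothetical record for the dopant-host linking task; field names are
# illustrative assumptions, not the schema used in the paper.
@dataclass
class DopingRecord:
    host: str
    dopants: list[str] = field(default_factory=list)


def make_training_example(passage: str, records: list[DopingRecord]) -> str:
    """Serialize one annotated passage as a JSONL line with prompt/completion
    fields (an assumed fine-tuning file layout). The completion is itself a
    JSON list of objects, matching the output format the abstract describes."""
    completion = json.dumps([{"host": r.host, "dopants": r.dopants} for r in records])
    return json.dumps({"prompt": passage, "completion": completion})


def parse_completion(text: str) -> list[DopingRecord]:
    """Parse a completion expected to be a JSON list of objects.
    Malformed JSON, non-list output, and entries without a string host are
    skipped rather than raising, since generated text may be imperfect."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    records = []
    for item in data:
        if not isinstance(item, dict) or not isinstance(item.get("host"), str):
            continue
        raw = item.get("dopants", [])
        if not isinstance(raw, list):
            raw = []
        # Keep only string dopant names.
        dopants = [d for d in raw if isinstance(d, str)]
        records.append(DopingRecord(host=item["host"], dopants=dopants))
    return records


if __name__ == "__main__":
    # Illustrative sentence, not taken from the paper's data.
    line = make_training_example(
        "ZnO films were doped with Al and Ga.",
        [DopingRecord(host="ZnO", dopants=["Al", "Ga"])],
    )
    pair = json.loads(line)
    print(parse_completion(pair["completion"]))
```

Parsing defensively reflects that a generative model's output is not guaranteed to be valid JSON; a real pipeline would likely also log or re-query failed completions rather than silently dropping them.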