Distillation
Computer science
Adaptation (eye)
Artificial intelligence
Shot
Machine learning
Pattern recognition (psychology)
Natural language processing
Chemistry
Chromatography
Psychology
Neuroscience
Organic chemistry
Authors
Yue Zhang,Dionisios G. Vlachos,Dongxia Liu,Hui Fang
Identifier
DOI:10.1021/acs.jcim.5c00248
Abstract
Named entity recognition (NER) has been widely used in chemical text mining for the automatic identification and extraction of chemical entities. However, existing chemical NER systems primarily target scenarios with abundant training data, requiring significant human annotation effort. This poses challenges for applications in chemical fields such as catalysis, where many advancements have traditionally relied on trial-and-error investigations and incremental adjustment of variables. This hinders progress in catalysis science and technology in addressing emerging energy and environmental crises. In this work, we propose a few-shot NER model that can quickly adapt to extract new types of chemical entities from only a limited number of annotated examples. Our model employs a metric-learning approach to transfer entity similarity knowledge from high-resource chemical domains (with abundant annotations) to enable effective entity recognition in low-resource specialized domains (with limited annotations). We validate the effectiveness of our model on a few-shot chemical NER benchmark built from six existing chemical NER data sets. Experiments show that the proposed few-shot NER model achieves reasonable performance with only 5 examples per entity type and improves consistently as the number of examples increases. Furthermore, we demonstrate how the proposed model can be trained with data annotated by a large language model (LLM), opening a new pathway for rapid adaptation of NER systems. Our approach leverages the broad chemical knowledge of large language models while distilling that knowledge into a lightweight model suitable for efficient, in-house use.
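The abstract describes a metric-learning approach that recognizes new entity types by comparing unseen tokens against a handful of labeled examples per type. The paper's actual encoder, distance function, and training procedure are not given here, so the sketch below only illustrates the general nearest-prototype flavor of metric-learning few-shot classification; the entity labels ("CATALYST", "SOLVENT"), the 32-dimensional random embeddings, and the 5-shot support set are invented placeholders, not the authors' implementation.

```python
# Minimal sketch of prototype-based (metric-learning) few-shot token classification.
# Placeholder embeddings stand in for whatever text encoder the model actually uses.
import numpy as np

rng = np.random.default_rng(0)

def build_prototypes(support_embeddings, support_labels):
    """Average the support-token embeddings of each entity type into one prototype."""
    prototypes = {}
    for label in set(support_labels):
        vecs = [e for e, l in zip(support_embeddings, support_labels) if l == label]
        prototypes[label] = np.mean(vecs, axis=0)
    return prototypes

def classify(query_embedding, prototypes):
    """Assign a query token to the entity type whose prototype is nearest (squared Euclidean)."""
    distances = {label: float(np.sum((query_embedding - proto) ** 2))
                 for label, proto in prototypes.items()}
    return min(distances, key=distances.get)

# Toy 5-shot setup: 5 support tokens per (hypothetical) entity type, 32-d embeddings.
dim = 32
support_labels = ["CATALYST"] * 5 + ["SOLVENT"] * 5 + ["O"] * 5
support_embeddings = [rng.normal(size=dim) for _ in support_labels]

prototypes = build_prototypes(support_embeddings, support_labels)
query = rng.normal(size=dim)          # embedding of an unseen token from a new domain
print(classify(query, prototypes))    # predicted entity type
```

Under this view, adapting to a new entity type only requires embedding a few annotated (or LLM-annotated) examples and adding their prototype, rather than retraining a classifier head, which is what makes the few-shot and LLM-distillation setting described above practical.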