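Computer science
Artificial intelligence
Information extraction
Machine learning
Benchmark (surveying)
Key (lock)
Trace (psycholinguistics)
Natural language processing
Raw data
Data science
Knowledge extraction
Transformation (genetics)
Information retrieval
Feature extraction
Data extraction
Data modeling
Deep learning
Machine code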
Authors
Yufan Chen,Yuxuan Zhang,Haifan Zhou,Ching Ting Leung,Hanyu Gao
Identifier
DOI:10.1146/annurev-chembioeng-100724-080433
Abstract
Rich information in the chemical literature presents unprecedented opportunities for accelerating discovery and optimization in chemistry through data-driven approaches. Nevertheless, converting raw information in the literature into structured databases relies primarily on manual curation, which is time-consuming and costly. In this review, we comprehensively examine recent advances in automatic chemical information extraction from the literature, focusing on image and text modalities. We trace the evolution from early rule-based and machine learning approaches to state-of-the-art methods leveraging large language models (LLMs) and vision language models. We discuss core tasks such as optical chemical structure recognition, reaction diagram parsing, named entity recognition, and experimental procedure extraction, highlighting representative methods, benchmark data sets, and practical challenges such as multimodal integration and data annotation. By systematically comparing these approaches, we identify key trends and persistent limitations and outline promising future directions toward robust, scalable, and automated chemical information extraction frameworks. This review aims to provide a practical guide for researchers seeking to harness machine learning and LLM technologies to accelerate the digital transformation of chemical science.