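Yarrowia
Synthetic biology
Artificial intelligence
Pipeline (software)
Machine learning
Biomanufacturing
Computer science
Workflow
Information extraction
Generative grammar
Biochemical engineering
Yeast
Biotechnology
Biology
Computational biology
Engineering
Biochemistry
Database
Programming language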
Authors
Zhengyang Xiao, Wenyu Li, Hannah Moon, Garrett W. Roell, Yixin Chen, Yinjie Tang
Identifier
DOI:10.1021/acssynbio.3c00310
Abstract
Mining knowledge from synthetic biology journal articles for machine learning (ML) applications is labor-intensive. Natural language processing (NLP) tools such as GPT-4 can accelerate the extraction of published information on microbial performance under complex strain-engineering and bioreactor conditions. As a proof of concept, we used prompt engineering to build a GPT-4 workflow pipeline. The pipeline extracts knowledge from 176 publications on two oleaginous yeasts, Yarrowia lipolytica and Rhodosporidium toruloides. With human intervention, it yielded a total of 2037 data instances. The resulting structured data sets, combined with feature selection, enabled ML approaches (e.g., a random forest model) to predict Yarrowia fermentation titers with decent accuracy (R² of 0.86 on unseen test data). Through transfer learning, the trained model could also assess the production potential of the engineered nonconventional yeast R. toruloides, for which fewer reports have been published. This work demonstrates the potential of generative artificial intelligence to streamline information extraction from research articles, thereby facilitating fermentation prediction and biomanufacturing development.
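The abstract does not describe the pipeline's data schema or evaluation code. The Python below is a minimal sketch that assumes the language model is prompted to return a JSON array of records. The field names `strain`, `substrate`, and `titer_g_per_l` and the helper `parse_instances` are hypothetical. The sketch shows two things:
- how such output could be filtered into structured data instances;
- how the reported test-set metric, R² = 1 − SS_res / SS_tot, is computed.

It does not call GPT-4 and does not reproduce the authors' random forest model.

```python
import json
from statistics import fmean

# Hypothetical schema: these field names are illustrative only and are
# not taken from the original pipeline.
REQUIRED_FIELDS = ("strain", "substrate", "titer_g_per_l")


def parse_instances(llm_output: str) -> list[dict]:
    """Parse a JSON array returned by a language model and keep only
    records that contain every required field and a numeric titer."""
    records = json.loads(llm_output)
    kept = []
    for rec in records:
        if not all(field in rec for field in REQUIRED_FIELDS):
            # Skip incomplete records; in practice these would go to human review.
            continue
        try:
            rec["titer_g_per_l"] = float(rec["titer_g_per_l"])
        except (TypeError, ValueError):
            continue  # non-numeric titer, also left for manual checking
        kept.append(rec)
    return kept


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

For example, `parse_instances('[{"strain": "A", "substrate": "glucose", "titer_g_per_l": "12.5"}, {"strain": "B", "substrate": "glycerol"}]')` keeps only the first record, with its titer converted to `12.5`. A perfect predictor gives `r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 1.0`. Always predicting the mean gives `0.0`, so an R² of 0.86 means the model explains most of the variance in the held-out titers.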