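Computer science
Artificial intelligence
Segmentation
Deep learning
Market segmentation
Pattern recognition (psychology)
Noise (video)
Quality (philosophy)
Machine learning
Identification (biology)
Software
Image (mathematics)
Philosophy
Business
Epistemology
Biology
Marketing
Programming language
Botany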
Authors
Joshua Staker, Kyle Marshall, Robert Abel, Carolyn M. McQuaw
Identifier
DOI: 10.1021/acs.jcim.8b00669
Abstract
Chemical structure extraction from documents remains a hard problem because of both false-positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well in general but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications affecting the performance of current approaches include the diversity of visual styles used by different software packages to render structures, the frequent use of ad hoc annotations, and other challenges related to image quality, including resolution and noise. We present end-to-end deep learning solutions both for segmenting molecular structures from documents and for predicting chemical structures from the segmented images. This deep-learning-based approach does not require any handcrafted features, is learned directly from data, and is robust to variations in image quality and style. Using the deep learning approach described herein, we show that it is possible to perform well on both segmentation and prediction for low-resolution images containing moderately sized molecules found in journal articles and patents.