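Computer science
Artificial intelligence
Benchmark (surveying)
Context (archaeology)
Pixel
Computer vision
Pattern recognition (psychology)
Image (mathematics)
Artificial neural network
Geography
Geodesy
Archaeology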
Authors
Jiaxin Zhang,Lingyu Liang,Kai Ding,Fengjun Guo,Lianwen Jin
Source
Journal: IEEE Transactions on Artificial Intelligence
[Institute of Electrical and Electronics Engineers]
Date: 2023-10-02
Volume (issue): 5 (5): 2319-2330
Citations: 5
Identifier
DOI:10.1109/tai.2023.3321257
Abstract
Camera-captured document images usually suffer from various appearance degradations, which hamper the clarity of content and preclude subsequent analysis and recognition systems. Most existing methods are tailored for one or relatively few degradations, making them feasible only in limited scenarios. However, in real-world applications, degradations are more diverse, and different degradations may arise simultaneously in a single image. To remedy this limitation, we aimed to achieve appearance enhancement for camera-captured document images in the wild, where degradations exhibit more diversity and may coexist simultaneously within the same image. To realize this, we propose a new end-to-end neural network called GCDRNet, which consists of two cascaded subnets, GC-Net and DR-Net. The GC-Net is used for global context modeling, and the DR-Net is used for detail restoration through a multi-scale and multi-loss training strategy. To train and validate GCDRNet in real-world scenarios, we constructed a new benchmark called RealDAE, which contains 600 real-world degraded document images that are carefully annotated with pixel-wise alignment. To the best of our knowledge, RealDAE is the first dataset that targets multiple degradations in the wild. Extensive experiments validated the superiority and advancement of our GCDRNet and RealDAE compared to existing methods and datasets, respectively. In addition, experiments also demonstrated that image appearance enhancement as a pre-processing procedure can effectively improve the performance of downstream tasks, such as text detection and recognition.
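The abstract mentions a multi-scale training strategy for DR-Net but does not give the loss terms, scales, or weights. As a rough illustration of the general idea only, the sketch below sums an L1 loss over predictions at successively halved resolutions, comparing each with a correspondingly downsampled target. The function names (`downsample2x`, `l1_loss`, `multi_scale_l1`), the choice of L1, the 2x2 average pooling, and the equal weighting are all assumptions for illustration, not the authors' method.

```python
from typing import List

# Grayscale image as rows of floats in [0, 1] (hypothetical representation).
Image = List[List[float]]


def downsample2x(img: Image) -> Image:
    """Halve the resolution by averaging each 2x2 block.

    Assumes the height and width are both even.
    """
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
            for c in range(0, len(img[0]), 2)
        ]
        for r in range(0, len(img), 2)
    ]


def l1_loss(pred: Image, target: Image) -> float:
    """Mean absolute difference between two images of the same size."""
    total = sum(
        abs(p - t)
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return total / (len(pred) * len(pred[0]))


def multi_scale_l1(preds: List[Image], target: Image) -> float:
    """Sum L1 losses over scales (equal weights, an assumption).

    preds[0] is at full resolution, preds[1] at half resolution, and so on.
    The target is downsampled once per additional scale to match.
    """
    loss = 0.0
    scaled_target = target
    for i, pred in enumerate(preds):
        if i > 0:
            scaled_target = downsample2x(scaled_target)
        loss += l1_loss(pred, scaled_target)
    return loss
```

A real training setup would compute this on tensors in a deep learning framework and would typically combine several loss types, which is what "multi-loss" suggests; this sketch covers only the multi-scale part.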