Computer science
Decoding methods
Artificial intelligence
Encoding (memory)
Transformer
Convolutional neural network
Pattern recognition (psychology)
Feature (linguistics)
Parameterized complexity
Feature extraction
Algorithm
Voltage
Linguistics
Quantum mechanics
Physics
Philosophy
Authors
Mingming Yang, Zongfang Li
Identifier
DOI:10.1016/j.inffus.2022.12.011
Abstract
Degraded document binarization has received considerable attention due to its vital influence on subsequent document analysis tasks. In this study, we propose a novel Degraded Document Binarization model built on the vision transFormer framework, termed D2BFormer. Thanks to its end-to-end trainable design, the D2BFormer model is able to autonomously optimize the parameterized configuration of the entire learning pipeline without incurring an intensity-to-binary value conversion phase, resulting in improved binarization quality. In addition, we propose a novel dual-branched encoding feature fusion module, which combines architectural components from the vision transformer framework and deep convolutional neural networks. The resulting encoding module can extract features from an input document that are sensitive to both global and local characteristics. Meanwhile, the proposed encoding feature extraction module can operate internally at a much lower spatial resolution than that of the raw input document, leading to reduced computational complexity. Furthermore, we propose a novel progressively merged decoding feature fusion module through carefully introduced skip connections both inside and outside the decoding network. The resulting decoding module progressively combines counterpart features derived from the corresponding layers of the encoding network with comparable spatial resolutions and up-sampled features generated from previous layers in the decoding network. Finally, experiments conducted on ten public datasets demonstrate that the proposed D2BFormer model achieves promising performance on four evaluation metrics.
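The progressive decoder described in the abstract (merging encoder features of matching spatial resolution with up-sampled features from the previous decoder layer) can be sketched as follows. This is a minimal illustrative skeleton under assumed shapes, not the authors' implementation: fusion is plain channel concatenation here, whereas the actual D2BFormer presumably uses learned fusion layers.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x spatial upsampling for a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def progressive_decode(encoder_feats):
    """Progressively merge decoder features with the encoder features
    ("skips") of matching spatial resolution, deepest level first.
    Fusion is channel concatenation in this sketch; the real model
    would apply learned convolution/attention blocks after each merge."""
    feats = list(encoder_feats)   # ordered shallow -> deep
    x = feats.pop()               # start from the deepest encoder feature
    while feats:
        x = upsample2x(x)         # restore one level of spatial resolution
        skip = feats.pop()        # encoder feature with the same H, W
        x = np.concatenate([x, skip], axis=0)  # merge along channels
    return x

# Toy three-level pyramid: 8x8, 4x4, 2x2 maps with 4, 8, 16 channels.
enc = [np.zeros((4, 8, 8)), np.zeros((8, 4, 4)), np.zeros((16, 2, 2))]
out = progressive_decode(enc)
print(out.shape)  # (28, 8, 8): 16 channels upsampled twice, plus 8, plus 4
```

Each merge step doubles the spatial resolution and accumulates channels, so the final feature map returns to the resolution of the shallowest encoder level, ready for a final binarization head.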