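Paragraph
Computer science
Artificial intelligence
Feature (linguistics)
Encoder
Segmentation
Line (geometry)
Focus (optics)
Speech recognition
Pattern recognition (psychology)
Feature extraction
Natural language processing
Image segmentation
Process (computing)
Coding (set theory)
Linguistics
Philosophy
Geometry
Mathematics
World Wide Web
Operating system
Physics
Set (abstract data type)
Optics
Programming language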
Authors
Denis Coquenet, Clément Chatelain, Thierry Paquet
Identifier
DOI:10.1109/tpami.2022.3144899
Abstract
Unconstrained handwritten text recognition remains challenging for computer vision systems. Paragraph text recognition is traditionally achieved with two models: the first for line segmentation and the second for text line recognition. We propose a unified end-to-end model using hybrid attention to tackle this task. The model is designed to iteratively process a paragraph image line by line. It can be split into three modules. An encoder generates feature maps from the whole paragraph image. Then, an attention module recurrently generates a vertical weighted mask that focuses on the features of the current text line. In this way, it performs a kind of implicit line segmentation. For the features of each text line, a decoder module recognizes the associated character sequence, leading to the recognition of the whole paragraph. We achieve state-of-the-art character error rates at paragraph level on three popular datasets: 1.91% for RIMES, 4.45% for IAM and 3.59% for READ 2016. Our code and trained model weights are available at https://github.com/FactoDeepLearning/VerticalAttentionOCR.
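
The idea of a vertical weighted mask can be illustrated with a minimal sketch. This is not the authors' implementation (which is in the repository linked above); it only assumes that a feature map of height H, width W and C channels is collapsed along the vertical axis by a softmax-normalized weight per row, yielding one sequence of W feature vectors for the current line. The function names and the hand-set `scores` are hypothetical; in the described model the scores would come from a learned recurrent attention module.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vertical_attention(features, scores):
    """Collapse a feature map [H][W][C] into line features [W][C].

    `scores` holds one raw attention score per row (length H).
    Assumption: the scores are given here; a real model would learn them.
    """
    weights = softmax(scores)
    height = len(features)
    width = len(features[0])
    channels = len(features[0][0])
    line = [[0.0] * channels for _ in range(width)]
    for h in range(height):
        for w in range(width):
            for c in range(channels):
                # Weighted sum over the vertical axis only
                line[w][c] += weights[h] * features[h][w][c]
    return weights, line


# Toy feature map: 3 rows, 2 columns, 1 channel
features = [
    [[1.0], [2.0]],
    [[10.0], [20.0]],
    [[100.0], [200.0]],
]
scores = [0.0, 5.0, 0.0]  # hypothetical scores favoring the middle row
weights, line_features = vertical_attention(features, scores)
```

Because the middle row receives most of the weight, `line_features` is dominated by that row's features, which is the sense in which such a mask acts as an implicit line segmentation. Repeating the step with a new set of scores would select the next line.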