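Computer science
Optical character recognition
Transformer
Word error rate
Artificial intelligence
Speech recognition
Handwriting recognition
Handwriting
Text recognition
Natural language processing
Pattern recognition (psychology)
Error detection and correction
Word processing
Image processing
Error analysis
Document processing
Word recognition
Character recognition
Optics (focus)
Intelligent word recognition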
Authors
Kartika Candra Kirana,Ira Kumalasari,Gulpi Qorik Oktagalu,Afwatul Maqbullah,Agung Faradiz Shobari,Bagus Hidayat
Identifier
DOI:10.1109/iceeie66203.2025.11252079
Abstract
This study compares the performance of Tesseract, EasyOCR, and Transformer-based OCR in recognizing crossed-out text in Indonesian and English. Crossed-out text was chosen to assess how well OCR methods handle text in non-standard formats. As a preliminary study, it provides findings relevant to optimizing OCR technology for crossed-out multilingual handwriting. Testing was conducted on a modified IAM Handwriting dataset and the UM-PTI-Handwriting dataset, comparing Transformer-based OCR (Tr-OCR, Donut), Tesseract, and EasyOCR on processing time, Character Error Rate (CER), and Word Error Rate (WER). On the WER and CER metrics, Tr-OCR achieves the lowest error rates: 0.63 WER and 0.63 CER on the modified IAM Handwriting dataset, and 1.92 WER and 3.05 CER on the UM-PTI-Handwriting dataset. In terms of processing time, Tesseract is the fastest and Donut is the slowest. The study concludes that Tr-OCR excels at recognizing English handwritten text. However, all compared OCR methods are less accurate on Indonesian handwritten text than on English handwritten text.
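The abstract does not state how CER and WER were computed. A common definition, assumed here, is the Levenshtein edit distance (substitutions, insertions, and deletions, each costing 1) between the OCR output and the ground-truth transcription, divided by the length of the reference: counted in characters for CER and in whitespace-separated words for WER. The minimal sketch below implements that definition; the function names `edit_distance`, `cer`, and `wer` are illustrative and not taken from the paper or from any particular library.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / number of reference words (assumed definition)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))        # 3 edits / 6 chars = 0.5
    print(wer("a b", "x y z w"))           # 4 edits / 2 words = 2.0
```

Because insertions are counted, this form of CER and WER is not bounded above by 1: an OCR output that adds spurious characters or words (as can happen when strike-through marks are read as text) can score above 1.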