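Character (mathematics)
Computer science
Set (abstract data type)
Artificial intelligence
Task (project management)
Benchmark (surveying)
Chinese characters
Pattern recognition (psychology)
Character encoding
Font
Document processing
Natural language processing
Database
Mathematics
Geometry
Management
Geodesy
Economics
Programming language
Geography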
Authors
Yue Xu, Fei Yin, Da-Han Wang, Xu-Yao Zhang, Zhaoxiang Zhang, Cheng-Lin Liu
Source
Venue: International Conference on Document Analysis and Recognition
Date: 2019-09-01
Pages: 793-798
Citations: 23
Identifier
DOI: 10.1109/icdar.2019.00132
Abstract
This paper introduces the Chinese Ancient Handwritten Characters Database (CASIA-AHCDB) for character recognition research. The database was built by annotating 11,937 pages of ancient Chinese handwritten documents. It consists of more than 2.2 million annotated handwritten character samples in 10,350 categories. According to the source of these documents, the database is divided into two datasets of different styles: Complete Library in Four Sections (AHCDB-style1) and Ancient Buddhist Scriptures (AHCDB-style2). Each dataset can be divided into three parts based on its applications. The first part, called the basic category set, contains samples of the common categories in the two datasets and is suitable for the basic character recognition task. The second part, called the enhanced category set, is mainly used for the open-set character recognition task, which builds on basic character recognition. The third part, called the reserved category set, can be used in many pattern recognition tasks in the future. Owing to its large category set, varied writing styles, and imbalanced number of samples per category, CASIA-AHCDB can also be used for various classification and learning tasks such as transfer learning and few-shot learning. We performed basic character recognition experiments on the basic category set and report the results as a benchmark. More techniques can be evaluated on this challenging database in the future.