The implications of handwritten text recognition for accessing the past at scale

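Keywords: Computer science; Originality; Metadata; Blueprint; Data science; Narrative; Scholarship; Transcription (linguistics); Artificial intelligence; World Wide Web; Sociology; Political science; Qualitative research; Mechanical engineering; Social science; Linguistics; Philosophy; Law; Engineering
Authors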
Joseph Nockels,Paul Gooding,Melissa Terras
Source
Journal: Journal of Documentation [Emerald Publishing Limited]
Volume (issue): 80 (7): 148-167; Citations: 1
Identifier
DOI:10.1108/jd-09-2023-0183
Abstract

Purpose: This paper focuses on image-to-text manuscript processing through Handwritten Text Recognition (HTR), a Machine Learning (ML) approach enabled by Artificial Intelligence (AI). With HTR now achieving high levels of accuracy, we consider its potential impact on our near-future information environment and knowledge of the past.

Design/methodology/approach: In undertaking a more constructivist analysis, we identified gaps in the current literature through a Grounded Theory Method (GTM). This guided an iterative process of concept mapping through writing sprints in workshop settings. We identified, explored and confirmed themes through group discussion and a further interrogation of relevant literature, until reaching saturation.

Findings: Catalogued as part of our GTM, 120 published texts underpin this paper. We found that HTR facilitates accurate transcription and dataset cleaning, while facilitating access to a variety of historical material. HTR contributes to a virtuous cycle of dataset production and can inform the development of online cataloguing. However, current limitations include dependency on digitisation pipelines, potential archival history omission and entrenchment of bias. We also cite near-future HTR considerations. These include encouraging open access, integrating advanced AI processes and metadata extraction; legal and moral issues surrounding copyright and data ethics; crediting individuals’ transcription contributions and HTR’s environmental costs.

Originality/value: Our research produces a set of best practice recommendations for researchers, data providers and memory institutions, surrounding HTR use. This forms an initial, though not comprehensive, blueprint for directing future HTR research. In pursuing this, the narrative that HTR’s speed and efficiency will simply transform scholarship in archives is deconstructed.
