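Pronunciation
Proxy (statistics)
Word error rate
Natural language processing
Word (group theory)
Computer science
Psychology
Statistics
Econometrics
Linguistics
Mathematics
Philosophy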
Source
Journal: Journal of Second Language Pronunciation
[John Benjamins Publishing Company]
Date: 2025-07-07
Identifier
DOI: 10.1075/jslp.25012.won
Abstract
This study examines the validity of word error rate (WER) as a proxy for pronunciation quality in EFL contexts. Human ratings of comprehensibility and accentedness were compared with WER and automated pronunciation scores from six ASR systems (Kaldi, wav2vec 2.0, HuBERT, Whisper Base, Whisper Large-v3, and Microsoft Azure) using 190 read-aloud recordings by Korean elementary learners. For pronunciation scoring, Azure's phoneme-level accuracy scores correlated moderately with human judgments, while Kaldi's goodness of pronunciation (GOP) scores showed no meaningful association. Analysis of WER revealed a critical trade-off between ASR accuracy and perceptual sensitivity: high-performing systems such as Whisper Large-v3 and Azure produced near-zero WERs that correlated only weakly with human ratings. In contrast, mid-performing systems such as Whisper Base and HuBERT showed stronger correlations, indicating that moderate WER values may better reflect pronunciation variation. These results underscore the limitations of WER in advanced ASR systems and the need for perceptually grounded, interpretable metrics.
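For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance between the reference text and the ASR transcript (substitutions + deletions + insertions), divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring pipeline used in the study; the function name word_error_rate, the lowercase-and-split tokenization, and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length.

    Tokenization here (lowercase, whitespace split) is an illustrative
    assumption; real evaluations usually also normalize punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from transcript
            insertion = curr[j - 1] + 1   # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical read-aloud prompt and two hypothetical transcripts.
prompt = "the cat sat on the mat"
print(word_error_rate(prompt, "the cat sat on the mat"))  # perfect transcript
print(word_error_rate(prompt, "the cat sit on mat"))      # one substitution, one deletion
```

Under this definition, a recognizer that transcribes nearly every learner recording perfectly yields WER values clustered at zero, which leaves little variance to correlate with human ratings; this is consistent with the trade-off the abstract reports.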