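Computer science
End-to-end principle
Speech recognition
Deep learning
Set (abstract data type)
Noise (video)
Key (lock)
Reverberation
Speech processing
Speech enhancement
Artificial intelligence
Testbed
Noise reduction
Engineering
Electrical engineering
Image (mathematics)
Programming language
Computer security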
Authors
Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng
Source
Journal: Cornell University - arXiv
Date: 2014-01-01
Citations: 1444
Identifier
DOI:10.48550/arxiv.1412.5567
Abstract
We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.