Keywords
Transformer, Computer science, Security token, Keyword spotting, Embedding, Convolutional neural network, Artificial intelligence, Spotting, Machine translation, Speech recognition, Pattern recognition (psychology), Voltage, Computer network, Engineering, Electrical engineering
Authors
Kevin Ding, Martin Zong, Jiakui Li, Baoxiang Li
Identifier
DOI: 10.1109/icassp43922.2022.9747295
Abstract
The Transformer has recently achieved impressive success in a number of domains, including machine translation, image recognition, and speech recognition. Most previous work on keyword spotting (KWS) is built on convolutional or recurrent neural networks. In this paper, we explore a family of Transformer architectures for keyword spotting, optimizing the trade-off between accuracy and efficiency in a high-speed regime. We also study the effectiveness of key components of vision Transformers when applied to KWS, including patch embedding, position encoding, the attention mechanism, and the class token, and summarize principles for their use. Building on these findings, we propose LeTR: a lightweight and highly efficient Transformer for KWS. We consider different efficiency measures on different edge devices so as to best reflect a wide range of application scenarios. Experimental results on two common benchmarks demonstrate that LeTR achieves state-of-the-art results over competing methods with respect to the speed/accuracy trade-off.
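To make the abstract's component list concrete, below is a minimal sketch of how the vision-Transformer ingredients it names (patch embedding, position encoding, self-attention, and a class token) can be wired together for keyword spotting. This is an illustrative toy model, not the authors' LeTR: the class name ToyKwsTransformer and all sizes (40 mel bins, 100 frames, 10x10 patches, d_model=64, 12 keywords) are assumptions for the example, not values from the paper.

```python
# Illustrative only -- NOT the LeTR architecture from the paper.
import torch
import torch.nn as nn

class ToyKwsTransformer(nn.Module):
    def __init__(self, n_mels=40, n_frames=100, patch=(10, 10),
                 d_model=64, n_heads=4, n_layers=2, n_keywords=12):
        super().__init__()
        # Patch embedding: a strided conv cuts the (1, n_mels, n_frames)
        # log-mel spectrogram into non-overlapping patches and projects
        # each patch to a d_model-dimensional token.
        self.patch_embed = nn.Conv2d(1, d_model, kernel_size=patch, stride=patch)
        n_patches = (n_mels // patch[0]) * (n_frames // patch[1])
        # Learnable class token and absolute position encoding.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_keywords)

    def forward(self, spec):                 # spec: (B, 1, n_mels, n_frames)
        x = self.patch_embed(spec)           # (B, d_model, H', W')
        x = x.flatten(2).transpose(1, 2)     # (B, n_patches, d_model)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)                  # self-attention over all tokens
        return self.head(x[:, 0])            # classify from the class token

model = ToyKwsTransformer()
logits = model(torch.randn(2, 1, 40, 100))   # two dummy spectrograms
print(logits.shape)                          # torch.Size([2, 12])
```

Reading the prediction off the class token, rather than pooling over all patch tokens, mirrors the ViT-style design the abstract refers to; which of these choices LeTR actually adopts is among the design questions the paper studies.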