Gesture
Computer science
Robotics
Rendering (computer graphics)
Gesture recognition
Speech recognition
Artificial intelligence
Human-computer interaction
Spoken language
Communication system
Telecommunications
Authors
Sheuli Paul,Michael Sintek,Veton Këpuska,Marius Silaghi,Liam Robertson
Identifiers
DOI:10.1109/icmla55696.2022.00127
Abstract
Understanding intent is an essential step in maintaining effective communication, a capability needed in communications for assembly, patrolling, and surveillance. This paper presents a fused, interactive multimodal system for human-robot communication used in assembly applications. Communication is multimodal: having multiple communication modes such as gestures, text, symbols, graphics, images, and speech increases the chance of effective communication. Intent is the main component we aim to model, specifically in human-machine dialogues. For this, we extract intents from spoken dialogues and fuse each intent with any detected matching gesture used in interaction with the robot. The main components of the presented system are: (1) a speech recognition system using Kaldi, (2) a deep-learning-based Dual Intent and Entity Transformer (DIET) classifier for intent and entity extraction, (3) a hand gesture recognition system, and (4) a dynamic fusion model for speech- and gesture-based communication. These are evaluated in a contextual assembly scenario using a simulated interactive robot.
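The abstract describes fusing a spoken intent (from the DIET classifier) with a matching detected gesture. A minimal sketch of such late fusion is shown below; the label names, confidence thresholds, gesture-to-intent mapping, and fusion rule are all illustrative assumptions, not the paper's actual models (which use Kaldi for speech recognition and a deep-learning DIET classifier).

```python
# Hypothetical late-fusion sketch: combine a speech-derived intent with a
# detected hand gesture. All labels and the GESTURE_SUPPORTS table are
# assumptions for illustration, not taken from the paper.
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntentResult:
    label: str         # e.g. "pick_up", extracted from the spoken utterance
    confidence: float  # classifier confidence in [0, 1]

@dataclass
class GestureResult:
    label: str         # e.g. "point_left", from the hand-gesture recognizer
    confidence: float

# Assumed mapping from gestures to the intents they can reinforce.
GESTURE_SUPPORTS = {
    "point_left": {"pick_up", "move"},
    "open_palm": {"stop"},
}

def fuse(intent: IntentResult, gesture: Optional[GestureResult],
         threshold: float = 0.5) -> Optional[str]:
    """Return a fused command, or None when no modality is confident enough."""
    speech_ok = intent.confidence >= threshold
    gesture_ok = gesture is not None and gesture.confidence >= threshold
    if (speech_ok and gesture_ok
            and intent.label in GESTURE_SUPPORTS.get(gesture.label, set())):
        return intent.label   # both modalities agree: strongest evidence
    if speech_ok:
        return intent.label   # fall back to speech alone
    if gesture_ok:
        # Gesture alone: usable only if it maps to a single unambiguous intent.
        supported = GESTURE_SUPPORTS.get(gesture.label, set())
        return next(iter(supported)) if len(supported) == 1 else None
    return None

print(fuse(IntentResult("pick_up", 0.9), GestureResult("point_left", 0.8)))
```

In this sketch, a confident gesture that agrees with the spoken intent confirms it, while speech remains the primary channel when the gesture is missing or ambiguous; the paper's dynamic fusion model presumably learns this combination rather than hard-coding it.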