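Utterance
Computer science
Dialogue
Transformer
Popularity
Natural language processing
Artificial intelligence
Psychology
Communication
Quantum mechanics
Social psychology
Physics
Voltage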
Authors
Gopendra Vikram Singh, Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya
Identifier
DOI: 10.1109/taslp.2022.3224287
Abstract
In the natural language processing community, open-domain conversational agents, also known as chatbots, are gaining popularity. One challenge is getting them to communicate in an emotionally intelligent manner. Current neural response generation methods rely solely on end-to-end learning from large-scale conversation data to generate dialogues. To address this, we introduce the large-scale multi-Emotion and Intent guided Multimodal Dialogue (EmoInt-MD) dataset, which comprises 32k dialogues drawn from movies of different genres and is labelled with 32 emotions and 15 empathetic intents. We propose a novel multi-task multimodal contextual Transformer framework that simultaneously identifies the emotions and intents in a given utterance, using audio and visual features in addition to the textual information. Experimental analysis shows that the proposed framework outperforms several unimodal and multimodal baselines on the EmoInt-MD dataset. The dataset, along with implementations of our baselines and proposed framework, will be made publicly available for research purposes.
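The abstract names the architecture (a multi-task multimodal contextual Transformer) but does not describe its internals. As a minimal sketch of the multi-task idea only, assuming per-utterance text, audio and visual feature vectors have already been extracted, the code below fuses them by plain concatenation and feeds the result to two softmax heads: one over the 32 emotion classes and one over the 15 intent classes. Training would minimise a weighted sum of the two cross-entropy losses. The concatenation fusion, the weight `alpha`, and the names `init_head` and `multitask_loss` are hypothetical; the contextual Transformer encoder and dialogue context are deliberately omitted, so this is not the authors' method.

```python
import math
import random

NUM_EMOTIONS = 32  # number of emotion labels stated in the abstract
NUM_INTENTS = 15   # number of empathetic intent labels stated in the abstract


def linear(weights, bias, x):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def init_head(out_dim, in_dim, rng):
    """Hypothetical classification head with small random weights."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def multitask_loss(text_vec, audio_vec, visual_vec, emotion_label, intent_label,
                   emotion_head, intent_head, alpha=0.5):
    """Joint emotion + intent loss for one utterance (illustrative sketch)."""
    # Assumed early fusion: concatenate the three modality vectors.
    fused = text_vec + audio_vec + visual_vec
    emo_logp = log_softmax(linear(*emotion_head, fused))
    int_logp = log_softmax(linear(*intent_head, fused))
    # Weighted sum of the two cross-entropy terms; alpha is an assumed hyperparameter.
    loss = -(alpha * emo_logp[emotion_label] + (1 - alpha) * int_logp[intent_label])
    return loss, emo_logp, int_logp


if __name__ == "__main__":
    rng = random.Random(0)
    in_dim = 8 + 4 + 4  # toy text, audio and visual feature sizes
    emo_head = init_head(NUM_EMOTIONS, in_dim, rng)
    int_head = init_head(NUM_INTENTS, in_dim, rng)
    loss, emo_logp, int_logp = multitask_loss(
        [0.1] * 8, [0.2] * 4, [0.3] * 4, 3, 7, emo_head, int_head)
    pred_emotion = max(range(NUM_EMOTIONS), key=lambda i: emo_logp[i])
    pred_intent = max(range(NUM_INTENTS), key=lambda i: int_logp[i])
    print(f"loss={loss:.4f} emotion={pred_emotion} intent={pred_intent}")
```

Sharing one fused representation between both heads is what makes the setup multi-task: gradients from the emotion and intent losses both update whatever encoder produces `fused`. With untrained (all-zero) heads, both distributions are uniform and the loss equals `alpha * log(32) + (1 - alpha) * log(15)`.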