Computer science
Artificial intelligence
Perception
Field (mathematical analysis)
Fusion
Proportion (ratio)
Machine learning
Human-computer interaction
Psychology
Mathematical analysis
Linguistics
Philosophy
Physics
Mathematics
Quantum mechanics
Neuroscience
Authors
Qiya Song,Renwei Dian,Bin Sun,Jie Xie,Shutao Li
Identifiers
DOI:10.1145/3581783.3612847
Abstract
Understanding and elucidating human behavior across diverse scenarios is a pivotal research challenge in the pursuit of seamless human-computer interaction. However, previous research on multi-participant dialogues has mostly relied on proprietary datasets, which are neither standardized nor openly accessible. To propel advancements in this domain, the MultiMediate'23 Challenge presents two sub-challenges, eye contact detection and next speaker prediction, aiming to foster a comprehensive understanding of multi-participant behavior. To tackle these challenges, we propose a multi-scale conformer fusion network (MSCFN) for enhancing the perception of multi-participant group behaviors. The conformer block combines the strengths of transformers and convolutional networks to establish both global and local contextual relationships between sequences. The output features from all conformer blocks are then concatenated to fuse multi-scale representations. Our proposed method was evaluated on the officially provided dataset, and it achieves the best and second-best performance on the next speaker prediction and gaze detection tasks of MultiMediate'23, respectively.
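The abstract describes two architectural ideas: a conformer-style block that pairs attention (global context) with convolution (local context), and a fusion step that concatenates the outputs of every block to form a multi-scale representation. The sketch below illustrates that fusion pattern only; it is a toy numpy stand-in, not the authors' MSCFN, and all names (`conformer_block`, `mscfn_features`) and parameter choices are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (T, d). Single-head dot-product self-attention for global context.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def local_conv(x, k=3):
    # Crude depthwise 1D "convolution" (sliding mean, same padding)
    # standing in for the conformer's local convolution module.
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([xp[i:i + k].mean(axis=0) for i in range(x.shape[0])])

def conformer_block(x):
    # Simplified block: attention then convolution, each with a residual.
    x = x + self_attention(x)
    x = x + local_conv(x)
    return x

def mscfn_features(x, n_blocks=3):
    # Run stacked blocks and concatenate every block's output along the
    # feature axis, so the final representation mixes all scales.
    outs = []
    for _ in range(n_blocks):
        x = conformer_block(x)
        outs.append(x)
    return np.concatenate(outs, axis=-1)

feats = mscfn_features(np.random.randn(10, 16))
print(feats.shape)  # (10, 48): 3 blocks x 16 features, 10 time steps
```

The key point is the last function: rather than using only the final block's output, features from every depth are kept and concatenated, which is what gives the fused representation its multi-scale character.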