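Computer science
Inference
Semantics (computer science)
Feature (linguistics)
Feature extraction
Sign (mathematics)
Dynamic Bayesian network
Face (sociological concept)
Pattern recognition (psychology)
Artificial intelligence
Linguistics
Bayesian network
Mathematics
Philosophy
Programming language
Sociology
Social science
Mathematical analysis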
Authors
Junseok Ahn, Youngjoon Jang, Joon Son Chung
Identifier
DOI: 10.1109/icassp48485.2024.10445841
Abstract
The objective of this work is the effective extraction of spatial and dynamic features for Continuous Sign Language Recognition (CSLR). To accomplish this, we utilise a two-pathway SlowFast network, in which the two pathways operate at distinct temporal resolutions to separately capture spatial (hand shapes, facial expressions) and dynamic (movements) information. In addition, we introduce two distinct feature fusion methods, carefully designed for the characteristics of CSLR: (1) Bi-directional Feature Fusion (BFF), which facilitates the transfer of dynamic semantics into spatial semantics and vice versa; and (2) Pathway Feature Enhancement (PFE), which enriches dynamic and spatial representations through auxiliary subnetworks without adding inference time. As a result, our model strengthens spatial and dynamic representations in parallel. We demonstrate that the proposed framework outperforms the current state of the art on popular CSLR datasets, including PHOENIX14, PHOENIX14-T, and CSL-Daily.
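The two-pathway idea and the notion of exchanging features in both directions can be illustrated with a minimal sketch. This is not the authors' BFF module, whose details the abstract does not give. It assumes per-frame feature vectors of equal width in both pathways, a hypothetical temporal `stride` between them, average pooling for the fast-to-slow direction, nearest-neighbour repetition for the slow-to-fast direction, and fusion by simple addition. All function names are illustrative.

```python
from typing import List, Tuple

Vector = List[float]
Sequence = List[Vector]  # one feature vector per time step


def subsample(frames: Sequence, stride: int) -> Sequence:
    """Slow pathway input: keep every `stride`-th frame (low temporal resolution)."""
    return frames[::stride]


def pool_fast_to_slow(fast: Sequence, stride: int) -> Sequence:
    """Average each window of `stride` fast steps into one slow step."""
    pooled = []
    for start in range(0, len(fast), stride):
        window = fast[start:start + stride]
        pooled.append([sum(col) / len(window) for col in zip(*window)])
    return pooled


def repeat_slow_to_fast(slow: Sequence, stride: int, length: int) -> Sequence:
    """Repeat each slow step `stride` times, trimmed to `length` fast steps."""
    repeated = [vec for vec in slow for _ in range(stride)]
    return repeated[:length]


def add(a: Sequence, b: Sequence) -> Sequence:
    """Element-wise sum of two sequences with matching shape."""
    return [[x + y for x, y in zip(u, v)] for u, v in zip(a, b)]


def bidirectional_fuse(slow: Sequence, fast: Sequence,
                       stride: int) -> Tuple[Sequence, Sequence]:
    """Assumed fusion rule: each pathway receives the other's resampled features."""
    fused_slow = add(slow, pool_fast_to_slow(fast, stride))
    fused_fast = add(fast, repeat_slow_to_fast(slow, stride, len(fast)))
    return fused_slow, fused_fast


# Toy input: 8 frames, each with a single feature equal to its frame index.
frames = [[float(t)] for t in range(8)]
fast_features = frames                 # high temporal resolution
slow_features = subsample(frames, 4)   # low temporal resolution
fused_slow, fused_fast = bidirectional_fuse(slow_features, fast_features, 4)
```

In a real network the two pathways would be convolutional backbones with different channel widths, and the resampling would typically be learned rather than fixed. The sketch only shows how features at two temporal resolutions can be aligned so that information flows both ways.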