Granularity
Computer science
Interpretability
Block (permutation group theory)
Aggregate (composite)
Feature (linguistics)
Perception
Motion (physics)
Embedding
Encoding
Artificial intelligence
Semantics (computer science)
Encoding (memory)
Pattern recognition (psychology)
Human-computer interaction
Computer vision
Mathematics
Operating system
Philosophy
Gene
Composite material
Neuroscience
Biology
Chemistry
Materials science
Programming language
Biochemistry
Linguistics
Geometry
Authors
Mingyue Cao, Rui Yan, Xiangbo Shu, J. Zhang, Jinpeng Wang, Guo-Sen Xie
Identifier
DOI:10.1145/3581783.3612435
Abstract
Panoramic activity recognition requires jointly identifying multi-granularity human behaviors, including individual actions, group activities, and global activities, in multi-person videos. Previous methods encode these behaviors hierarchically through multiple stages, which disturbs the inherent co-occurrence across multi-granularity behaviors in the same scene. To this end, we propose a novel Multi-granularity Unified Perception (MUP) framework that perceives behaviors at different granularities universally, exploring their co-occurring motion patterns via the same parameters in an end-to-end fashion. Specifically, the proposed framework stacks three Unified Motion Encoding (UME) blocks to model behaviors at multiple granularities with shared parameters. Each UME block mines intra-relevant and cross-relevant semantics synchronously from input feature sequences via Intra-granularity Motion Embedding (IME) and Cross-granularity Motion Prototyping (CMP). In particular, IME models the interactions among visual features within each granularity based on the attention mechanism, while CMP aggregates features across granularities (i.e., person to group) via several learnable prototypes. Extensive experiments demonstrate that MUP outperforms state-of-the-art methods on JRDB-PAR and offers satisfactory interpretability.
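The abstract does not give implementation details, so the following is a minimal, hypothetical PyTorch sketch of one UME block under stated assumptions: IME is rendered as multi-head self-attention within one granularity, and CMP as cross-attention from learnable prototypes to the finer-granularity features. All module names, shapes, and hyperparameters (dim, heads, num_prototypes) are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of one UME block (IME + CMP), NOT the authors' code.
import torch
import torch.nn as nn

class UMEBlock(nn.Module):
    """One Unified Motion Encoding block: IME followed by CMP.

    The same block (same parameters) can be applied to person-, group-,
    and global-level feature sequences, matching the abstract's
    shared-parameter, unified-perception claim.
    """
    def __init__(self, dim=256, heads=4, num_prototypes=8):
        super().__init__()
        # IME: attention among visual features within a single granularity.
        self.ime = nn.MultiheadAttention(dim, heads, batch_first=True)
        # CMP: learnable prototypes that aggregate features across
        # granularities (e.g., person -> group); realized here as
        # cross-attention, which is an assumption.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
        self.cmp = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, feats):
        # feats: (batch, num_entities, dim) features at one granularity.
        # Intra-granularity Motion Embedding: self-attention over entities.
        intra, _ = self.ime(feats, feats, feats)
        feats = self.norm1(feats + intra)
        # Cross-granularity Motion Prototyping: prototypes attend to the
        # finer-granularity features to form coarser-level summaries.
        q = self.prototypes.unsqueeze(0).expand(feats.size(0), -1, -1)
        coarse, _ = self.cmp(q, feats, feats)
        return feats, self.norm2(coarse)

# Usage: person features -> refined person features + group-level summaries.
block = UMEBlock()
person_feats = torch.randn(2, 12, 256)   # 2 clips, 12 persons, 256-d features
refined, group_feats = block(person_feats)
print(refined.shape, group_feats.shape)  # (2, 12, 256) and (2, 8, 256)
```

In this reading, stacking three such blocks lets one parameter set process all granularities end to end, rather than dedicating a separate stage to each behavior level as in prior hierarchical pipelines.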