Keywords
Localization
Macro
Artificial intelligence
Computer science
Expression (computer science)
Consistency (knowledge bases)
Pattern recognition (psychology)
Machine learning
Natural language processing
Programming language
Authors
Wang-Wang Yu, Kai-Fu Yang, Hongmei Yan, Yongjie Li
Identifier
DOI: 10.1109/tpami.2025.3564951
Abstract
Most micro- and macro-expression spotting methods for untrimmed videos suffer from the burden of video-wise collection and frame-wise annotation. Weakly supervised expression spotting (WES) based on video-level labels can potentially mitigate the complexity of frame-level annotation while achieving fine-grained frame-level spotting. However, we argue that existing weakly supervised methods are based on multiple instance learning (MIL) involving inter-modality, inter-sample, and inter-task gaps, where the inter-sample gap stems primarily from differences in sample distribution and duration. Therefore, we propose a novel and simple WES framework, MC-WES, using multi-consistency collaborative mechanisms that combine modal-level saliency, video-level distribution, label-level duration, and segment-level feature consistency strategies to achieve fine-grained frame-level spotting with only video-level labels, alleviating the above gaps and incorporating prior knowledge. The modal-level saliency consistency strategy focuses on capturing key correlations between raw images and optical flow. The video-level distribution consistency strategy exploits differences in the sparsity of the temporal distribution. The label-level duration consistency strategy exploits differences in the duration of facial muscle movements. The segment-level feature consistency strategy enforces that features under the same label remain similar. Experimental results on three challenging datasets, CAS(ME)$^{2}$, CAS(ME)$^{3}$, and SAMM-LV, demonstrate that MC-WES is comparable to state-of-the-art fully supervised methods.
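The four consistency strategies described above can be read as four auxiliary loss terms combined with the base MIL objective. The following is a minimal illustrative sketch of such a combination; all function names, weights, and the specific distance measures (mean-squared agreement between modalities, deviation from a sparsity target, deviation from a typical duration, within-label feature variance) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def saliency_consistency(rgb_scores, flow_scores):
    """Modal-level: align frame saliency from raw images and optical flow.

    Illustrative choice: mean-squared disagreement between the two modalities.
    """
    return float(np.mean((np.asarray(rgb_scores) - np.asarray(flow_scores)) ** 2))


def distribution_consistency(frame_scores, sparsity_target=0.1):
    """Video-level: expressions occupy a sparse fraction of the timeline.

    Penalize deviation of the mean activation from an assumed sparsity target.
    """
    return abs(float(np.mean(frame_scores)) - sparsity_target)


def duration_consistency(pred_durations, typical_duration):
    """Label-level: micro- and macro-expressions differ in typical duration.

    Penalize predicted segment lengths that deviate from the label's
    typical duration (in frames).
    """
    return float(np.mean(np.abs(np.asarray(pred_durations) - typical_duration)))


def feature_consistency(features, labels):
    """Segment-level: features under the same label should remain similar.

    Illustrative choice: sum of within-label feature variances.
    """
    features, labels = np.asarray(features), np.asarray(labels)
    total = 0.0
    for lab in np.unique(labels):
        total += float(np.mean(np.var(features[labels == lab], axis=0)))
    return total


def multi_consistency_loss(rgb_scores, flow_scores, pred_durations,
                           features, labels,
                           weights=(1.0, 1.0, 1.0, 1.0),
                           sparsity_target=0.1, typical_duration=15):
    """Weighted sum of the four consistency terms (weights are hypothetical)."""
    w1, w2, w3, w4 = weights
    return (w1 * saliency_consistency(rgb_scores, flow_scores)
            + w2 * distribution_consistency(rgb_scores, sparsity_target)
            + w3 * duration_consistency(pred_durations, typical_duration)
            + w4 * feature_consistency(features, labels))
```

In a real training loop these terms would be differentiable and added to the MIL video-classification loss; the sketch only shows how the four consistency signals could be scored and combined.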