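Computer science
Discriminative
Confusion
Graph
Artificial intelligence
Convolutional neural network
Quality assessment
Bridging (networking)
Action recognition
Machine learning
Pattern recognition (psychology)
Natural language processing
Information retrieval
Theoretical computer science
Evaluation methods
Class (philosophy)
Engineering
Computer network
Reliability engineering
Psychology
Psychoanalysis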
Authors
Kanglei Zhou, Yue Ma, Hubert P. H. Shum, Xinhong Liang
Identifier
DOI:10.1109/tcsvt.2023.3281413
Abstract
Action quality assessment (AQA) automatically evaluates how well humans perform actions in a given video, a technique widely used in fields such as rehabilitation medicine, athletic competitions, and specific skills assessment. However, existing works that uniformly divide the video sequence into small clips of equal length suffer from intra-clip confusion and inter-clip incoherence, hindering the further development of AQA. To address this issue, we propose a hierarchical graph convolutional network (GCN). First, semantic information confusion is corrected through clip refinement, generating the ‘shot’ as the basic action unit. We then construct a scene graph by combining several consecutive shots into meaningful scenes to capture local dynamics. These scenes can be viewed as different procedures of a given action, providing valuable assessment cues. The video-level representation is finally extracted via sequential action aggregation among scenes to regress the predicted score distribution, enhancing discriminative features and improving assessment performance. Experiments on the AQA-7, MTL-AQA, and JIGSAWS datasets demonstrate the superiority of the proposed hierarchical GCN over state-of-the-art methods.
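The clip, shot, scene, and video hierarchy described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the chain and fully connected graph layouts, the standard symmetric-normalized graph convolution, the mean pooling, the softmax over score bins, and every name below (`hierarchical_forward`, `shots_per_scene`, the weight matrices) are assumptions chosen only to show how the three levels could fit together.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(x, adj, weight):
    """One graph convolution followed by ReLU."""
    return np.maximum(normalized_adjacency(adj) @ x @ weight, 0.0)

def chain_adjacency(n):
    """Adjacency of a path graph linking temporally adjacent nodes."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj

def hierarchical_forward(clips, shots_per_scene, w_shot, w_scene, w_video, w_out):
    """Hypothetical three-level pass: clips -> shots -> scenes -> score distribution."""
    n_clips = clips.shape[0]
    # Level 1 (assumed): refine clips into "shots" by mixing neighbouring clips.
    shots = gcn_layer(clips, chain_adjacency(n_clips), w_shot)
    # Level 2 (assumed): group consecutive shots into scenes, one small graph each.
    scenes = []
    for start in range(0, n_clips, shots_per_scene):
        group = shots[start:start + shots_per_scene]
        adj = np.ones((len(group), len(group))) - np.eye(len(group))
        scenes.append(gcn_layer(group, adj, w_scene).mean(axis=0))
    scenes = np.stack(scenes)
    # Level 3 (assumed): aggregate scenes in temporal order, then pool to video level.
    video = gcn_layer(scenes, chain_adjacency(len(scenes)), w_video).mean(axis=0)
    # Map the video feature to a distribution over discrete score bins.
    logits = video @ w_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Toy usage with random features: 12 clips, 16-dim features, 5 score bins.
rng = np.random.default_rng(0)
dim, bins = 16, 5
clips = rng.normal(size=(12, dim))
dist = hierarchical_forward(
    clips, 4,
    0.1 * rng.normal(size=(dim, dim)),
    0.1 * rng.normal(size=(dim, dim)),
    0.1 * rng.normal(size=(dim, dim)),
    0.1 * rng.normal(size=(dim, bins)),
)
```

With untrained random weights the output carries no meaning; the sketch only shows the data flow, in which each level shortens the sequence (12 clips, 12 shots, 3 scenes, 1 video vector) before the final distribution over score bins is produced.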