Computer science
Artificial intelligence
Quality (philosophy)
Video quality
Recurrent neural network
Frame (networking)
Variation (astronomy)
Feature learning
Machine learning
Representation (politics)
Deep learning
Pattern recognition (psychology)
Artificial neural network
Telecommunications
Metric (unit)
Philosophy
Operations management
Physics
Epistemology
Politics
Astrophysics
Political science
Law
Economics
Authors
Yuming Fang, Zhaoqian Li, Jiebin Yan, Xiangjie Sui, Hantao Liu
Identifier
DOI: 10.1109/TIP.2023.3272480
Abstract
Video quality assessment (VQA) has received remarkable attention recently. Most popular VQA models employ recurrent neural networks (RNNs) to capture the temporal quality variation of videos. However, each long-term video sequence is commonly labeled with a single quality score, from which RNNs may not be able to learn long-term quality variation well. A natural question then arises: what is the real role of RNNs in learning the visual quality of videos? Do they learn spatio-temporal representations as expected, or do they merely aggregate spatial features redundantly? In this study, we conduct a comprehensive investigation by training a family of VQA models with carefully designed frame sampling strategies and spatio-temporal fusion methods. Our extensive experiments on four publicly available in-the-wild video quality datasets lead to two main findings. First, the plausible spatio-temporal modeling module (i.e., RNNs) does not facilitate quality-aware spatio-temporal feature learning. Second, sparsely sampled video frames can achieve performance competitive with using all video frames as input. In other words, spatial features play a vital role in capturing video quality variation for VQA. To the best of our knowledge, this is the first work to explore the issue of spatio-temporal modeling in VQA.
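The abstract contrasts RNN-based temporal fusion against plain aggregation of per-frame spatial features, under sparse frame sampling. The following is a minimal PyTorch sketch of that comparison, not the authors' implementation: the toy backbone, feature dimension, sample count, and all module names are illustrative assumptions.

# Illustrative sketch only (not the paper's code): contrasts the two
# aggregation schemes the abstract compares -- a GRU over per-frame
# features versus simple average pooling -- with sparse frame sampling.
import torch
import torch.nn as nn


def sample_frames_sparsely(video: torch.Tensor, num_samples: int = 8) -> torch.Tensor:
    """Uniformly pick `num_samples` frames from a (T, C, H, W) video tensor."""
    t = video.shape[0]
    idx = torch.linspace(0, t - 1, steps=num_samples).long()
    return video[idx]


class SpatialBackbone(nn.Module):
    """Stand-in per-frame feature extractor (a real model would use a deep CNN)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # global spatial pooling per frame
            nn.Flatten(),
            nn.Linear(16, feat_dim),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # (T, 3, H, W) -> (T, feat_dim)
        return self.net(frames)


class VQAModel(nn.Module):
    """Predicts a single quality score per video; `fusion` selects the aggregator."""
    def __init__(self, feat_dim: int = 128, fusion: str = "mean"):
        super().__init__()
        self.backbone = SpatialBackbone(feat_dim)
        self.fusion = fusion
        if fusion == "gru":
            self.gru = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(video)                 # (T, D) per-frame features
        if self.fusion == "gru":
            out, _ = self.gru(feats.unsqueeze(0))    # temporal modeling: (1, T, D)
            pooled = out.squeeze(0).mean(dim=0)      # average the GRU states
        else:
            pooled = feats.mean(dim=0)               # plain spatial-feature average
        return self.head(pooled)                     # scalar quality score


video = torch.rand(120, 3, 64, 64)                   # 120-frame toy clip
clip = sample_frames_sparsely(video, num_samples=8)  # sparse sampling
for fusion in ("mean", "gru"):
    score = VQAModel(fusion=fusion)(clip)
    print(fusion, score.item())

Under the paper's findings, one would expect the "mean" variant, fed only sparsely sampled frames, to match the "gru" variant closely, since the reported quality cues are carried mainly by the spatial features.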