Computer science
Subjective video quality
Video quality
Artificial intelligence
Convolutional neural network
Transformer
Image quality
Quality assessment
Computer vision
Architecture
Pattern recognition
Real-time computing
Reliability engineering
Image
Engineering
Evaluation method
Metric
Identifier
DOI:10.1145/3474085.3475368
Abstract
No-reference video quality assessment (NR-VQA) has not widely benefited from deep learning, mainly because of the complexity, diversity, and particularity of modelling spatial and temporal characteristics in quality assessment scenarios. Image quality assessment (IQA) performed on video frames plays a key role in NR-VQA. A perceptual hierarchical network (PHIQNet) with an integrated attention module is first proposed that appropriately simulates the visual mechanisms of contrast sensitivity and selective attention in IQA. Perceptual quality features of video frames derived from PHIQNet are then fed into a long short-term convolutional Transformer (LSCT) architecture to predict the perceived video quality. In LSCT, a CNN formulates quality features of the frames within short-term units, and the resulting unit-level representations are fed into a Transformer to capture long-range dependencies and attention allocation over temporal units. This architecture is in line with the intrinsic properties of VQA. Experimental results on publicly available video quality databases demonstrate that the LSCT architecture based on PHIQNet significantly outperforms state-of-the-art video quality models.
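To make the two-stage LSCT idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes per-frame quality features (as a PHIQNet-like IQA backbone would produce) are grouped into short-term units of consecutive frames, a 1D CNN summarizes each unit, and a Transformer encoder attends across units before regressing a scalar quality score. All names (LSCTSketch, unit_len) and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSCTSketch(nn.Module):
    """Hypothetical sketch of a long short-term convolutional Transformer.

    Assumes per-frame quality features of dimension feat_dim (e.g. from a
    PHIQNet-like IQA backbone), grouped into short-term units of unit_len
    consecutive frames. All dimensions are illustrative, not the paper's.
    """

    def __init__(self, feat_dim=256, unit_len=8, d_model=128,
                 n_heads=4, n_layers=2):
        super().__init__()
        self.unit_len = unit_len
        # Short-term stage: 1D convolution over the frames within a unit,
        # pooled to a single vector per unit.
        self.short_term = nn.Sequential(
            nn.Conv1d(feat_dim, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Long-term stage: Transformer encoder attending across units.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.long_term = nn.TransformerEncoder(encoder_layer, n_layers)
        self.head = nn.Linear(d_model, 1)  # scalar quality score

    def forward(self, frame_feats):
        # frame_feats: (batch, n_frames, feat_dim); for simplicity,
        # n_frames is assumed divisible by unit_len.
        b, t, c = frame_feats.shape
        n_units = t // self.unit_len
        # Fold units into the batch dimension and run the short-term CNN.
        x = frame_feats.view(b * n_units, self.unit_len, c).transpose(1, 2)
        x = self.short_term(x).squeeze(-1)   # (b * n_units, d_model)
        x = x.view(b, n_units, -1)           # (b, n_units, d_model)
        x = self.long_term(x)                # attention over temporal units
        # Average unit representations and regress the perceived quality.
        return self.head(x.mean(dim=1)).squeeze(-1)  # (b,)

feats = torch.randn(2, 64, 256)   # 2 videos, 64 frames, 256-dim features
print(LSCTSketch()(feats).shape)  # torch.Size([2])
```

The split mirrors the abstract's reasoning: convolutions are well suited to short-range spatio-temporal structure within a unit, while self-attention captures long-range dependencies and quality-attention allocation across the whole sequence.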