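Computer science
Artificial intelligence
Frequency domain
Computer vision
Time-frequency analysis
Frame rate
Feature extraction
Transformer
Pattern recognition (psychology)
Engineering
Voltage
Electrical engineering
Filter (signal processing)
Authors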
Zhongwei Qiu,Huan Yang,Jianlong Fu,Daochang Liu,Chang Xu,Dongmei Fu
Identifier
DOI:10.1109/tpami.2023.3312166
Abstract
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames under known degradation processes. Despite significant progress, it remains a major challenge to effectively extract and transmit high-quality textures from highly degraded low-quality sequences affected by blur, additive noise, and compression artifacts. This work proposes a novel degradation-robust Frequency-Transformer (FTVSR++) for handling low-quality videos that carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits fine-grained self-attention on each frequency band so that real visual texture can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture the global and local frequency relations, which can handle different complicated degradation processes in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and discover that a "divided attention" scheme, which conducts joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely used VSR datasets show that FTVSR++ outperforms state-of-the-art methods on different low-quality videos with clear visual margins.
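The first step in the abstract (patches turned into spectral maps with one frequency band per channel) can be illustrated with a minimal NumPy sketch. The abstract does not name the transform, so the block-wise 2D DCT used here is an assumption, as are the 8x8 patch size and the helper names `dct_matrix` and `frame_to_spectral_maps`; this is not the authors' implementation.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of size n x n (assumed transform)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # rescale the DC row so the basis is orthonormal
    return mat


def frame_to_spectral_maps(frame: np.ndarray, patch: int = 8) -> np.ndarray:
    """Split an (H, W) frame into non-overlapping patches, apply a 2D DCT to
    each patch, and return an array of shape (patch*patch, H//patch, W//patch)
    in which each channel holds one frequency band across all patches."""
    h, w = frame.shape
    if h % patch or w % patch:
        raise ValueError("frame size must be a multiple of the patch size")
    d = dct_matrix(patch)
    # (H, W) -> (H/p, W/p, p, p): one p x p block per spatial location
    blocks = frame.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)
    # 2D DCT of every block: D @ X @ D^T, broadcast over the leading axes
    coeffs = d @ blocks @ d.T
    # Flatten the p x p coefficients into channels and move them to the front
    return coeffs.reshape(h // patch, w // patch, patch * patch).transpose(2, 0, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64))
    maps = frame_to_spectral_maps(frame)
    print(maps.shape)  # 64 frequency-band channels, each an 8 x 8 map
```

In this layout, channel 0 carries the per-patch DC component and higher channels carry finer detail, so an attention module could in principle weight each band separately, which is the property the abstract relies on to separate texture from artifacts.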