Computer science
Transformer
Artificial intelligence
Convolutional neural network
Pattern recognition (psychology)
Machine learning
Computer engineering
Engineering
Voltage
Electrical engineering
Authors
Armin Mehri,Parichehr Behjati,Darío Carpio,Ángel D. Sappa
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume: 11, pp. 121457-121469
Citations: 4
Identifier
DOI: 10.1109/access.2023.3328229
Abstract
Recent breakthroughs in single image super-resolution have investigated the potential of deep Convolutional Neural Networks (CNNs) to improve performance. However, CNN-based models suffer from limited receptive fields and an inability to adapt to the input content. Recently, Transformer-based models were presented that demonstrated major performance gains in Natural Language Processing and vision tasks while mitigating these drawbacks of CNNs. Nevertheless, the Transformer's computational complexity grows quadratically for high-resolution images, and flattening the image into a 1D sequence discards its original structure, which makes it problematic to capture local context information and to adapt such models for real-time applications. In this paper, we present SRFormer, an efficient yet powerful Transformer-based architecture, built on several key designs in its Transformer blocks and Transformer layers that preserve the original 2D structure of the image while capturing both local and global dependencies without raising computational demands or memory consumption. We also present a Gated Multi-Layer Perceptron (MLP) Feature Fusion module that aggregates the features of different Transformer stages by focusing on inter-spatial relationships while adding only minor computational cost to the network. We have conducted extensive experiments on several super-resolution benchmark datasets to evaluate our approach. SRFormer demonstrates superior performance compared to state-of-the-art Transformer- and convolution-based methods, with an improvement margin of 0.1 to 0.53 dB. Furthermore, with almost the same model size, it outperforms SwinIR by 0.47% while requiring half of SwinIR's inference time. The code will be available on GitHub.
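The abstract describes a Gated MLP Feature Fusion module that aggregates features from different Transformer stages while focusing on inter-spatial relationships, but it gives no layer-level details. Below is a minimal PyTorch sketch of one plausible gated fusion of this kind; the module name GatedMLPFusion, the 1x1-convolution layout, and the sigmoid spatial gate are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class GatedMLPFusion(nn.Module):
    """Hypothetical sketch of a gated MLP feature-fusion module.

    Concatenates feature maps from several Transformer stages and
    fuses them through a gated branch that weights spatial locations.
    Channel sizes and the gating form are assumptions for illustration.
    """

    def __init__(self, channels: int, num_stages: int):
        super().__init__()
        fused = channels * num_stages
        # Value branch: mixes the concatenated stage features.
        self.value = nn.Conv2d(fused, channels, kernel_size=1)
        # Gate branch: per-pixel weights, i.e. an inter-spatial focus.
        self.gate = nn.Sequential(
            nn.Conv2d(fused, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Lightweight point-wise MLP applied after gating.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels * 2, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(channels * 2, channels, kernel_size=1),
        )

    def forward(self, stage_feats):
        # stage_feats: list of (B, C, H, W) tensors from different stages.
        x = torch.cat(stage_feats, dim=1)
        fused = self.value(x) * self.gate(x)  # gated aggregation
        return self.mlp(fused) + fused        # residual point-wise MLP


if __name__ == "__main__":
    feats = [torch.randn(1, 64, 48, 48) for _ in range(3)]
    out = GatedMLPFusion(channels=64, num_stages=3)(feats)
    print(out.shape)  # torch.Size([1, 64, 48, 48])
```

Keeping the fusion to 1x1 convolutions and an element-wise gate is one way to add only minor computational cost, in line with the abstract's claim; the paper's exact design may differ.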