Pin-Chi Pan*† Tzu-Hao Hsu Wen-Li Wei* Jen-Chun Lin*
* Institute of Information Science, Academia Sinica, Taiwan
† Department of Electrical Engineering, National Chung Cheng University
§ Graduate Institute of Electrical Engineering, National Taiwan University
[Paper] [Code]
【This Paper has been accepted by ICIP 2023】


Fig. 2. The architecture of the proposed GLA-Net for single image super-resolution.
Deep-net models based on self-attention, such as Swin Transformer, have achieved great success in single image super-resolution (SISR). While self-attention excels at modeling global information, it is less effective at capturing high frequencies (e.g., edges), which primarily carry the local information that is crucial for SISR. To address this, we propose a global-local awareness network (GLA-Net) that captures both global and local information to learn comprehensive features containing low- and high-frequency information. First, we design a GLA layer that combines a high-frequency-oriented Inception module with a low-frequency-oriented Swin Transformer module to process local and global information simultaneously. Second, we introduce dense connections between GLA blocks, each composed of several GLA layers, to strengthen feature propagation and alleviate the vanishing-gradient problem. By coupling these core designs, GLA-Net achieves state-of-the-art performance on SISR.
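The two design ideas in the abstract, parallel local/global branches inside a GLA layer and dense connections between GLA blocks, can be illustrated with a dependency-free toy on 1-D signals. This is only a sketch under loud assumptions and not the authors' implementation: `local_branch` (a 3-tap difference filter) stands in for the Inception module, `global_branch` (a global mean) stands in for the Swin Transformer module, and both the residual sum used to fuse the branches and the element-wise mean used to fuse earlier block outputs are hypothetical choices that the page does not specify.

```python
from typing import List, Sequence

Signal = List[float]


def local_branch(x: Sequence[float]) -> Signal:
    """High-frequency stand-in: a 3-tap filter that responds only near edges."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[max(i - 1, 0)]       # replicate padding at the borders
        right = x[min(i + 1, n - 1)]
        out.append(x[i] - 0.5 * (left + right))
    return out


def global_branch(x: Sequence[float]) -> Signal:
    """Low-frequency stand-in: every position receives a summary of the whole input."""
    mean = sum(x) / len(x)
    return [mean] * len(x)


def gla_layer(x: Sequence[float]) -> Signal:
    """Run both branches on the same input and fuse them (assumed: residual sum)."""
    low = global_branch(x)
    high = local_branch(x)
    return [xi + lo + hi for xi, lo, hi in zip(x, low, high)]


def gla_block(x: Sequence[float], num_layers: int = 2) -> Signal:
    """A block is a stack of several layers (the layer count here is arbitrary)."""
    out = list(x)
    for _ in range(num_layers):
        out = gla_layer(out)
    return out


def dense_blocks(x: Sequence[float], num_blocks: int = 3) -> List[Signal]:
    """Dense connectivity: each block sees the outputs of all earlier blocks."""
    features = [list(x)]
    for _ in range(num_blocks):
        # Assumed fusion: element-wise mean of every earlier feature.
        fused = [sum(vals) / len(vals) for vals in zip(*features)]
        features.append(gla_block(fused))
    return features


step_edge = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(local_branch(step_edge))   # non-zero only around the edge
print(global_branch(step_edge))  # identical value at every position
```

On the step edge, the local branch is non-zero only at the two samples next to the discontinuity, while the global branch returns the same value everywhere. This mirrors the high-frequency versus low-frequency roles that the abstract assigns to the two modules.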


【Quantitative Comparison】

Table 1. Quantitative comparison (average PSNR/SSIM at scales ×2 and ×4) with state-of-the-art methods on benchmark datasets (Set5, Set14, BSD100, Urban100, and Manga109). Bold denotes the best performance and underline denotes the second best.
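For readers unfamiliar with the PSNR metric named in the caption, the standard definition is 10·log10(MAX² / MSE), which can be sketched as below. The function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration. The page does not state the evaluation protocol; super-resolution benchmarks often compute PSNR on the luminance channel after cropping a border, and that step is not shown here.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and have the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by exactly 1 intensity level, so MSE = 1.
print(round(psnr([10, 20, 30, 40], [11, 21, 31, 41]), 2))
```

Higher values indicate a reconstruction closer to the ground truth; identical images yield infinity because the mean squared error is zero.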

【Visual Comparison】

Fig. 1. Visual comparison with state-of-the-art SwinIR [7] at 4× super-resolution. Our GLA-Net is more effective at recovering local details, such as edges, than SwinIR.

【Qualitative Comparison - ×2】

Fig. 4. Qualitative comparison of SwinIR [7] and our GLA-Net on the Set14 and Urban100 datasets at 2× super-resolution. Best viewed zoomed in.

【Qualitative Comparison - ×4】

Fig. 4. Qualitative comparison of SwinIR [7] and our GLA-Net on the Urban100 and BSD100 datasets at 4× super-resolution. Best viewed zoomed in.

【Ablation Study】

Table 2. Ablation study removing different modules of GLA-Net on the Urban100 dataset at 2× super-resolution.

【More Results】


[1] Wilman W. W. Zou and Pong C. Yuen, “Very low resolution face recognition problem,” IEEE Transactions on Image Processing, vol. 21, no. 1, pp. 327–340, 2012.
[2] Wenzhe Shi, Jose Caballero, Christian Ledig, Xiahai Zhuang, Wenjia Bai, Kanwal Bhatia, Antonio Marvao, Tim Dawes, Declan O’Regan, and Daniel Rueckert, “Cardiac image super-resolution with global correspondence using multi-atlas patchmatch,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2013.
[3] Mehdi S. M. Sajjadi, Bernhard Schölkopf, and Michael Hirsch, “EnhanceNet: Single image super-resolution through automated texture synthesis,” in IEEE International Conference on Computer Vision (ICCV), 2017.
[4] Yulun Zhang, Kunpeng Li, Kai Li, Bineng Zhong, and Yun Fu, “Residual non-local attention networks for image restoration,” in International Conference on Learning Representations (ICLR), 2019.
[5] Yiqun Mei, Yuchen Fan, Yuqian Zhou, Lichao Huang, Thomas S. Huang, and Honghui Shi, “Image super-resolution with cross-scale non-local attention and exhaustive self-exemplars mining,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[6] Yiqun Mei, Yuchen Fan, and Yuqian Zhou, “Image super-resolution with non-local sparse attention,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[7] Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte, “SwinIR: Image restoration using swin transformer,” in IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021.
[8] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[9] Namuk Park and Songkuk Kim, “How do vision transformers work?,” in International Conference on Learning Representations (ICLR), 2022.
[10] Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, and Shuicheng Yan, “Inception transformer,” in Advances in Neural Information Processing Systems (NeurIPS), 2022.
[11] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna, “Rethinking the inception architecture for computer vision,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[12] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger, “Densely connected convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[13] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems (NeurIPS), 2017.
[14] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[15] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations (ICLR), 2015.
[16] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer, “Automatic differentiation in PyTorch,” in NeurIPS Workshop on Autodiff, 2017.
[17] Eirikur Agustsson and Radu Timofte, “NTIRE 2017 challenge on single image super-resolution: Dataset and study,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017.
[18] Marco Bevilacqua, Aline Roumy, Christine Guillemot, and Marie-line Alberi Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” in Proceedings of the British Machine Vision Conference (BMVC), 2012.
[19] Roman Zeyde, Michael Elad, and Matan Protter, “On single image scale-up using sparse-representations,” in International Conference on Curves and Surfaces (ICCS), 2010.
[20] David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in IEEE International Conference on Computer Vision (ICCV), 2001.
[21] Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja, “Single image super-resolution from transformed self-exemplars,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[22] Yusuke Matsui, Kota Ito, Yuji Aramaki, Azuma Fujimoto, Toru Ogawa, Toshihiko Yamasaki, and Kiyoharu Aizawa, “Sketch-based manga retrieval using Manga109 dataset,” Multimedia Tools and Applications, vol. 76, pp. 21811–21838, 2017.
[23] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu, “Image super-resolution using very deep residual channel attention networks,” in European Conference on Computer Vision (ECCV), 2018.
[24] Tao Dai, Jianrui Cai, Yongbing Zhang, Shu-Tao Xia, and Lei Zhang, “Second-order attention network for single image super-resolution,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[25] Shangchen Zhou, Jiawei Zhang, Wangmeng Zuo, and Chen Change Loy, “Cross-scale internal graph neural network for image super-resolution,” in Advances in Neural Information Processing Systems (NeurIPS), 2020.
[26] Ben Niu, Weilei Wen, Wenqi Ren, Xiangde Zhang, Lianping Yang, Shuzhen Wang, Kaihao Zhang, Xiaochun Cao, and Haifeng Shen, “Single image super-resolution via a holistic attention network,” in European Conference on Computer Vision (ECCV), 2020.

Contact Information

Pin-Chi Pan, Wen-Li Wei, Jen-Chun Lin


    author = {Pan, Pin-Chi and Hsu, Tzu-Hao and Wei, Wen-Li and Lin, Jen-Chun},
    booktitle = {2023 IEEE International Conference on Image Processing (ICIP)},
    month = {October},
    year = {2023}