Pansharpening via Multiscale Embedding and Dual Attention Transformers

Bibliographic Details
Main Authors: Wensheng Fan, Fan Liu, Jingzhi Li
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/10365163/
Description
Summary: Pansharpening is a fundamental image processing task for many remote sensing applications: it generates a high-resolution multispectral image by fusing a low-resolution multispectral image with a high-resolution panchromatic image. Recently, vision transformers have been introduced into pansharpening to exploit global contextual information. However, modeling of both long-range and local dependencies, as well as multiscale feature learning, is essential to the task. Learning and exploiting these different kinds of information is challenging and limits the performance and efficiency of existing pansharpening methods. To address this, we propose a pansharpening network based on multiscale embedding and dual attention transformers (MDPNet). Specifically, a multiscale embedding block embeds multiscale information of the images into vectors, so the transformers only need to process one multispectral embedding sequence and one panchromatic embedding sequence to use multiscale information efficiently. Furthermore, an additive hybrid attention transformer fuses the embedding sequences in an additive injection manner. Finally, a channel self-attention transformer exploits channel correlations for high-quality detail generation. Experiments on the QuickBird and WorldView-3 datasets demonstrate that the proposed MDPNet outperforms state-of-the-art methods both visually and quantitatively while keeping running time low. Ablation studies further verify the effectiveness of the proposed multiscale embedding and transformers for pansharpening.
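The channel self-attention idea mentioned in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration under our own assumptions, not the authors' MDPNet code: attention is computed across the channel dimension of a feature map so that channel correlations reweight the features used for detail generation. The module name, shapes, projection layers, and residual connection below are assumptions for illustration only.

import torch
import torch.nn as nn

class ChannelSelfAttention(nn.Module):
    """Hypothetical channel-wise self-attention: attends over channels, not pixels."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions produce per-channel query/key/value maps.
        self.q = nn.Conv2d(channels, channels, kernel_size=1)
        self.k = nn.Conv2d(channels, channels, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.q(x).flatten(2)                  # (B, C, H*W)
        k = self.k(x).flatten(2)                  # (B, C, H*W)
        v = self.v(x).flatten(2)                  # (B, C, H*W)
        # Channel-by-channel affinity matrix of shape (B, C, C).
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = (attn @ v).reshape(b, c, h, w)      # channel-reweighted features
        return out + x                            # residual connection (assumed)

# Usage sketch: refine fused multispectral/panchromatic features channel-wise.
feats = torch.randn(1, 32, 64, 64)
refined = ChannelSelfAttention(32)(feats)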
ISSN: 1939-1404
2151-1535