SuperCoT-X: Masked Hyperspectral Image Modeling With Diverse Superpixel-Level Contrastive Tokenizer



Bibliographic Details
Main Authors: Miaomiao Liang, Weigang Wu, Huifang Shen, Lingjuan Yu, Xiangchun Yu, Licheng Jiao
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/11072320/
Description
Summary: Hyperspectral images (HSIs) exhibit complex contextual relationships, including variations within locally homogeneous regions and spectral similarities among different classes. Contrastive masked patch embedding prediction excels at capturing rich, high-level visual context from neighborhoods. However, balancing representation certainty against intraclass diversity remains a challenge: pursuing neighbor diversity through token-level contrast can disrupt the certainty of intraclass representations. To address this issue, we propose a superpixel-level contrastive tokenizer (SuperCoT) for masked HSI modeling. It performs mask prediction with superpixel-calibrated targets, enhancing representation certainty in homogeneous regions. In addition, to mitigate the loss of contextual semantics caused by excessively consistent representations within clusters in SuperCoT, we introduce an intracluster diversity regularization into the superpixel-level denoising contrastive loss (SuperCoT-D). Furthermore, to reduce the computational burden of aggregating superpixel tokens at every iteration in SuperCoT-D, we propose an alternative, SuperCoT-M, which maintains a momentum-updated prototypical dictionary indexed by superpixel labels and implicitly improves the diversity of intracluster representations. Comprehensive experiments on five HSI datasets demonstrate that the proposed methods achieve favorable results and are competitive with other self-supervised approaches.
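The summary describes the methods only at a high level, so the following is a rough, hypothetical NumPy sketch rather than the authors' implementation. It assumes that "superpixel-calibrated targets" means replacing each token's target embedding with the mean embedding of its superpixel, and that the SuperCoT-M dictionary is an exponential moving average of per-superpixel mean embeddings. The function names and the momentum parameter m are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def superpixel_calibrated_targets(tokens, labels):
    """Hypothetical: replace each token's target by the mean embedding of its superpixel.

    tokens: (N, D) float array of token embeddings (assumed to come from a target encoder).
    labels: (N,) int array giving the superpixel label of each token.
    """
    num_sp = int(labels.max()) + 1
    sums = np.zeros((num_sp, tokens.shape[1]))
    np.add.at(sums, labels, tokens)                    # accumulate embeddings per superpixel
    counts = np.bincount(labels, minlength=num_sp)[:, None]
    means = sums / np.maximum(counts, 1)               # guard against empty labels
    return means[labels]                               # give each token its superpixel mean


def momentum_update(prototypes, tokens, labels, m=0.99):
    """Hypothetical EMA update of a prototype dictionary indexed by superpixel label."""
    updated = prototypes.copy()
    for k in np.unique(labels):
        batch_mean = tokens[labels == k].mean(axis=0)  # current mean embedding for label k
        updated[k] = m * prototypes[k] + (1.0 - m) * batch_mean
    return updated


tokens = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 1])
print(superpixel_calibrated_targets(tokens, labels))
print(momentum_update(np.zeros((2, 2)), tokens, labels, m=0.5))
```

Under these assumptions, the dictionary variant avoids re-aggregating all superpixel tokens at every step: it folds each batch's per-superpixel means into stored prototypes.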
ISSN: 1939-1404, 2151-1535