Frequency and Texture Aware Multi-Domain Feature Fusion for Remote Sensing Scene Classification
Main Authors:
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10815958/
Summary: Remote Sensing (RS) scene classification, a pivotal task in Earth observation, involves categorizing satellite or aerial imagery into distinct land-use and land-cover classes. Major challenges in this task include high intraclass variability and low interclass separability. Historically, state-of-the-art methods in this field have struggled to achieve satisfactory results without a significant trade-off in computational efficiency. These methods often require substantial computational resources to process the complex data characteristics of RS imagery, leading to inefficiencies that limit their practical application in real-time settings or on resource-constrained platforms. To address these complexities, the Efficient Spectral Inception Former (ESIF) architecture is proposed, which introduces a novel paradigm for RS scene classification by integrating multi-domain feature fusion across the spatial, texture, and spectral (frequency) domains. The proposed approach leverages the strengths of convolutional neural networks (CNNs) for spatial information extraction, a novel texture feature alignment block (TFAB) for nuanced texture differentiation, an efficient spectro-former block (ESFB) that uses spectral analysis for enhanced pattern recognition, a cross-domain fusion block (CDFB), and finally an inception transformer block (iFB) that balances high- and low-frequency information. Furthermore, we construct a new remote sensing scene dataset named ISL-RS50, which is significantly more challenging than existing ones. The proposed method yields the best results when trained from scratch on all seven tested datasets: ISL-RS50 (60%), Optimal-31 (86.55%), UC-Merced (94.52%), RSSCN7 (94.1%), SIRI-WHU (95%), WHU-RS19 (94.52%), AID (93.5%). Finally, ESIF demonstrates an optimal accuracy-efficiency trade-off, supporting its suitability for deployment in real-world applications.
ISSN: 2169-3536
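As a rough illustration of the multi-domain fusion idea summarized in the abstract above, the sketch below combines a small spatial CNN branch with a frequency-domain branch built on the 2-D FFT and fuses the two feature maps for classification. This is a minimal, hypothetical sketch based only on the abstract: the module names (`SpatialBranch`, `SpectralBranch`, `SimpleFusionClassifier`), channel widths, and the concatenation-based fusion are assumptions rather than the authors' ESIF implementation, and the texture (TFAB) and inception transformer (iFB) components are omitted.

```python
# Minimal sketch (not the authors' code): a spatial CNN branch plus a
# frequency-domain branch based on the 2-D FFT, fused by concatenation.
# All names, channel sizes, and the fusion strategy are assumptions.
import torch
import torch.nn as nn


class SpatialBranch(nn.Module):
    """Small CNN stem extracting spatial features (assumed design)."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class SpectralBranch(nn.Module):
    """Frequency-domain branch: log-magnitude FFT -> conv (assumed design)."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # The log-magnitude spectrum keeps a 2-D layout, so conv layers apply.
        spec = torch.fft.fft2(x, norm="ortho")
        mag = torch.log1p(torch.abs(spec))
        return self.proj(mag)


class SimpleFusionClassifier(nn.Module):
    """Concatenate branch features, pool, and classify (assumed fusion)."""
    def __init__(self, num_classes=50, width=64):
        super().__init__()
        self.spatial = SpatialBranch(out_ch=width)
        self.spectral = SpectralBranch(out_ch=width)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(2 * width, num_classes),
        )

    def forward(self, x):
        fused = torch.cat([self.spatial(x), self.spectral(x)], dim=1)
        return self.head(fused)


if __name__ == "__main__":
    model = SimpleFusionClassifier(num_classes=50)
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 50])
```

In the actual ESIF design, the texture feature alignment block (TFAB), cross-domain fusion block (CDFB), and inception transformer block (iFB) described in the abstract would take the place of the simple concatenation and linear head used in this sketch.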