Frequency and Texture Aware Multi-Domain Feature Fusion for Remote Sensing Scene Classification

Remote Sensing (RS) scene classification, a pivotal task in Earth observation, involves categorizing satellite or aerial imagery into distinct land-use and land-cover classes. Major challenges in this task include high intraclass variability and low interclass distinctions. Historically, state-of-th...

Bibliographic Details
Main Authors: Russo Ashraf, Kang-Hyun Jo
Format: Article
Language: English
Published: IEEE 2025-01-01
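Series: IEEE Access
Subjects: Convolutional neural networks (CNNs); frequency analysis; large kernel attention (LKA); multi domain; remote sensing (RS); scene classification
Online Access: https://ieeexplore.ieee.org/document/10815958/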
author Russo Ashraf
Kang-Hyun Jo
collection DOAJ
description Remote Sensing (RS) scene classification, a pivotal task in Earth observation, involves categorizing satellite or aerial imagery into distinct land-use and land-cover classes. Major challenges in this task include high intraclass variability and low interclass distinctions. Historically, state-of-the-art methods in this field have struggled to achieve satisfactory results without a significant trade-off in computational efficiency. These methods often require substantial computational resources to process the complex data characteristics of RS imagery, leading to inefficiencies that limit their practical application in real-time or on resource-constrained platforms. To address these challenges, the Efficient Spectral Inception Former (ESIF) architecture is proposed, which introduces a novel paradigm to RS scene classification by integrating multi-domain feature fusion of the spatial, texture, and spectral (frequency) domains. The proposed approach leverages the strengths of convolutional neural networks (CNNs) for spatial information extraction, a novel texture feature alignment block (TFAB) for nuanced texture differentiation, an efficient spectro-former block (ESFB) that uses spectral analysis for enhanced pattern recognition, a cross-domain fusion block (CDFB), and finally, an inception transformer block (iFB) that balances high- and low-frequency information. Furthermore, we construct a new remote sensing scene dataset named ISL-RS50, which is significantly more challenging than the existing ones. The proposed method yields the best results when trained from scratch on all seven tested datasets: ISL-RS50 (60%), Optimal-31 (86.55%), UC-Merced (94.52%), RSSCN7 (94.1%), SIRI-WHU (95%), WHU-RS19 (94.52%), AID (93.5%). Finally, ESIF exemplifies an optimal accuracy-efficiency trade-off, supporting its suitability for deployment in real-world applications.
format Article
id doaj-art-3d8fde853fbd4c9daaddb72c49f77919
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-3d8fde853fbd4c9daaddb72c49f77919
2025-01-29T00:01:00Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
13
16380
16393
10.1109/ACCESS.2024.3522899
10815958
Frequency and Texture Aware Multi-Domain Feature Fusion for Remote Sensing Scene Classification
Russo Ashraf (0) https://orcid.org/0000-0001-5954-8116
Kang-Hyun Jo (1) https://orcid.org/0000-0002-4937-7082
(0) Department of Electrical, Electronic, and Computer Engineering, University of Ulsan, Ulsan, South Korea
(1) Department of Electrical, Electronic, and Computer Engineering, University of Ulsan, Ulsan, South Korea
https://ieeexplore.ieee.org/document/10815958/
Convolutional neural networks (CNNs)
frequency analysis
large kernel attention (LKA)
multi domain
remote sensing (RS)
scene classification
title Frequency and Texture Aware Multi-Domain Feature Fusion for Remote Sensing Scene Classification
topic Convolutional neural networks (CNNs)
frequency analysis
large kernel attention (LKA)
multi domain
remote sensing (RS)
scene classification
url https://ieeexplore.ieee.org/document/10815958/