Frequency and Texture Aware Multi-Domain Feature Fusion for Remote Sensing Scene Classification
Main Authors: | Russo Ashraf, Kang-Hyun Jo |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Convolutional neural networks (CNNs); frequency analysis; large kernel attention (LKA); multi domain; remote sensing (RS); scene classification |
Online Access: | https://ieeexplore.ieee.org/document/10815958/ |
_version_ | 1832583218773098496 |
---|---|
author | Russo Ashraf; Kang-Hyun Jo |
collection | DOAJ |
description | Remote Sensing (RS) scene classification, a pivotal task in Earth observation, involves categorizing satellite or aerial imagery into distinct land-use and land-cover classes. Major challenges in this task include high intraclass variability and low interclass distinctions. Historically, state-of-the-art methods in this field have struggled to achieve satisfactory results without a significant trade-off in computational efficiency. These methods often require substantial computational resources to process the complex data characteristics of RS imagery, leading to inefficiencies that limit their practical application in real-time or on resource-constrained platforms. To address these complexities, the Efficient Spectral Inception Former (ESIF) architecture is proposed, which introduces a novel paradigm for RS scene classification by integrating multi-domain feature fusion across the spatial, texture, and spectral (frequency) domains. The proposed approach leverages the strengths of convolutional neural networks (CNNs) for spatial information extraction, a novel texture feature alignment block (TFAB) for nuanced texture differentiation, an efficient spectro-former block (ESFB) that uses spectral analysis for enhanced pattern recognition, a cross-domain fusion block (CDFB), and finally an inception transformer block (iFB) that balances high- and low-frequency information. Furthermore, we construct a new remote sensing scene dataset named ISL-RS50, which is significantly more challenging than existing ones. The proposed method yields the best results when trained from scratch on all seven tested datasets: ISL-RS50 (60%), Optimal-31 (86.55%), UC-Merced (94.52%), RSSCN7 (94.1%), SIRI-WHU (95%), WHU-RS19 (94.52%), AID (93.5%). Finally, ESIF exemplifies an optimal accuracy-efficiency trade-off, supporting its suitability for deployment in real-world applications. (An illustrative code sketch of the multi-domain fusion idea follows this record.) |
format | Article |
id | doaj-art-3d8fde853fbd4c9daaddb72c49f77919 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-3d8fde853fbd4c9daaddb72c49f77919; record updated 2025-01-29T00:01:00Z. IEEE Access, vol. 13, pp. 16380-16393, 2025-01-01. DOI: 10.1109/ACCESS.2024.3522899; IEEE Xplore document 10815958. Russo Ashraf (ORCID: 0000-0001-5954-8116) and Kang-Hyun Jo (ORCID: 0000-0002-4937-7082), Department of Electrical, Electronic, and Computer Engineering, University of Ulsan, Ulsan, South Korea. |
title | Frequency and Texture Aware Multi-Domain Feature Fusion for Remote Sensing Scene Classification |
topic | Convolutional neural networks (CNNs); frequency analysis; large kernel attention (LKA); multi domain; remote sensing (RS); scene classification |
url | https://ieeexplore.ieee.org/document/10815958/ |
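
The abstract in the record above describes ESIF as a fusion of spatial (CNN), texture, and frequency-domain features. As a rough, non-authoritative illustration of that general idea only, the sketch below fuses a plain convolutional branch with a branch operating on the 2-D FFT log-magnitude of the input, then classifies the pooled joint feature. This is not the authors' published ESIF implementation: the TFAB, ESFB, CDFB, and iFB modules are not reproduced here, and every module name, channel width, and hyperparameter is an assumption made for this example (PyTorch).

```python
# Illustrative sketch only: a minimal two-branch "spatial + frequency" fusion
# classifier. It is NOT the ESIF architecture from the paper; all names,
# shapes, and design choices below are assumptions for illustration.
import torch
import torch.nn as nn


class SpatialBranch(nn.Module):
    """Plain convolutional stack standing in for the CNN spatial-feature extractor."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, stride=2, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class SpectralBranch(nn.Module):
    """Frequency-domain branch: 2-D FFT log-magnitude of the image, then convolutions."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, stride=2, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
        )

    def forward(self, x):
        spec = torch.fft.fft2(x, norm="ortho")     # complex spectrum per channel
        mag = torch.log1p(torch.abs(spec))         # log-magnitude keeps dynamic range manageable
        return self.net(mag)


class FusionClassifier(nn.Module):
    """Concatenate both domains, mix with a 1x1 conv, pool, and classify."""
    def __init__(self, num_classes=50):
        super().__init__()
        self.spatial = SpatialBranch()
        self.spectral = SpectralBranch()
        self.fuse = nn.Sequential(nn.Conv2d(128, 128, 1), nn.BatchNorm2d(128), nn.ReLU())
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        f = torch.cat([self.spatial(x), self.spectral(x)], dim=1)
        f = self.fuse(f)
        f = f.mean(dim=(2, 3))                     # global average pooling
        return self.head(f)


if __name__ == "__main__":
    model = FusionClassifier(num_classes=50)       # e.g. a 50-class setting such as ISL-RS50
    logits = model(torch.randn(2, 3, 224, 224))    # dummy batch of RGB scene images
    print(logits.shape)                            # torch.Size([2, 50])
```

The frequency branch here simply treats the spectrum as another image; the paper's spectro-former and inception transformer blocks are considerably more involved, so this sketch should be read only as a minimal example of combining spatial and spectral features before a shared classifier head.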