Frequency and Texture Aware Multi-Domain Feature Fusion for Remote Sensing Scene Classification

Remote Sensing (RS) scene classification, a pivotal task in Earth observation, involves categorizing satellite or aerial imagery into distinct land-use and land-cover classes. Major challenges in this task include high intraclass variability and low interclass distinctions. Historically, state-of-th...

Bibliographic Details
Main Authors:	Russo Ashraf, Kang-Hyun Jo
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Convolutional neural networks (CNNs); frequency analysis; large kernel attention (LKA); multi domain; remote sensing (RS); scene classification
Online Access:	https://ieeexplore.ieee.org/document/10815958/

author	Russo Ashraf; Kang-Hyun Jo
collection	DOAJ
description	Remote Sensing (RS) scene classification, a pivotal task in Earth observation, involves categorizing satellite or aerial imagery into distinct land-use and land-cover classes. Major challenges in this task include high intraclass variability and low interclass distinctions. Historically, state-of-the-art methods in this field have struggled to achieve satisfactory results without a significant trade-off in computational efficiency. These methods often require substantial computational resources to process the complex data characteristics of RS imagery, leading to inefficiencies that limit their practical application in real-time or on resource-constrained platforms. Delving into these complexities, the Efficient Spectral Inception Former (ESIF) architecture is proposed, which introduces a novel paradigm to RS scene classification by integrating multi-domain feature fusion of the spatial, texture, and spectral (frequency) domains. The proposed approach leverages the strengths of convolutional neural networks (CNNs) for spatial information extraction, a novel texture feature alignment block (TFAB) for nuanced texture differentiation, an efficient spectro-former block (ESFB) that uses spectral analysis for enhanced pattern recognition, a cross-domain fusion block (CDFB), and finally an inception transformer block (iFB) that balances high- and low-frequency information. Furthermore, we construct a new remote scene dataset named ISL-RS50, which is significantly more challenging than the existing ones. The proposed method yields the best results when trained from scratch on all seven tested datasets: ISL-RS50 (60%), Optimal-31 (86.55%), UC-Merced (94.52%), RSSCN7 (94.1%), SIRI-WHU (95%), WHU-RS19 (94.52%), AID (93.5%). Finally, ESIF exemplifies an optimal accuracy-efficiency trade-off, supporting its suitability for deployment in real-world applications.
format	Article
id	doaj-art-3d8fde853fbd4c9daaddb72c49f77919
institution	Kabale University
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-3d8fde853fbd4c9daaddb72c49f77919; 2025-01-29T00:01:00Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 16380-16393; DOI 10.1109/ACCESS.2024.3522899; document 10815958; Russo Ashraf (https://orcid.org/0000-0001-5954-8116) and Kang-Hyun Jo (https://orcid.org/0000-0002-4937-7082), both of the Department of Electrical, Electronic, and Computer Engineering, University of Ulsan, Ulsan, South Korea
title	Frequency and Texture Aware Multi-Domain Feature Fusion for Remote Sensing Scene Classification
topic	Convolutional neural networks (CNNs); frequency analysis; large kernel attention (LKA); multi domain; remote sensing (RS); scene classification
url	https://ieeexplore.ieee.org/document/10815958/


Similar Items