UAV-FAENet: Frequency-Aware and Attention-Enhanced Network for Remote Sensing Semantic Segmentation of UAV Imagery
In recent years, with the rapid development of unmanned aerial vehicle (UAV) technology, low-altitude remote sensing imagery has been increasingly applied in urban planning, environmental monitoring, and disaster management. However, due to large variations in object scales and complex boundaries of...
| Main Authors: | Dongbo Zhou, Huan Wang, Shiyan Pang, Di Chen, Huang Yao, Jie Yu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
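| Subjects: | Attention mechanism; high-low frequency enhancement; remote sensing; semantic segmentation; unmanned aerial vehicle (UAV) images |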
| Online Access: | https://ieeexplore.ieee.org/document/11098848/ |
| _version_ | 1849340553040232448 |
|---|---|
| author | Dongbo Zhou; Huan Wang; Shiyan Pang; Di Chen; Huang Yao; Jie Yu |
| author_facet | Dongbo Zhou; Huan Wang; Shiyan Pang; Di Chen; Huang Yao; Jie Yu |
| author_sort | Dongbo Zhou |
| collection | DOAJ |
| description | In recent years, with the rapid development of unmanned aerial vehicle (UAV) technology, low-altitude remote sensing imagery has been increasingly applied in urban planning, environmental monitoring, and disaster management. However, due to large variations in object scales and complex boundaries of ground objects, existing methods still face significant challenges in handling small targets and objects with intricate edges. To address these issues, this article proposes a lightweight and efficient semantic segmentation network for UAV remote sensing images, named UAV-FAENet, aiming to improve the detection accuracy of small targets and the quality of edge segmentation. UAV-FAENet adopts an encoder-decoder architecture, integrating a high-low frequency adaptive enhancement module (HLAEM) and a decoder based on a state space modeling mechanism (CSRA-VSS decoder). Specifically, HLAEM decomposes features into low-frequency and high-frequency components in the frequency domain, enhances them separately, and dynamically fuses them through a channel attention mechanism to effectively preserve edge details. The CSRA-VSS decoder introduces a state space model to capture long-range dependencies in feature sequences and employs a cross-scale attention mechanism to enhance feature responses in small target regions. The proposed method achieves mIoU scores of 68.84%, 80.11%, and 70.26% on the UAVid, UDD6, and DroneDeploy UAV datasets, respectively. Our method outperforms existing mainstream approaches and demonstrates strong potential in improving small object recognition accuracy and edge detail recovery, providing a feasible solution for high-precision semantic segmentation in complex urban scenes. |
| format | Article |
| id | doaj-art-aef66cbf84984d24a25977188dce25cd |
| institution | Kabale University |
| issn | 1939-1404, 2151-1535 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| spelling | doaj-art-aef66cbf84984d24a25977188dce25cd; 2025-08-20T03:43:52Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; ISSN 1939-1404, 2151-1535; 2025-01-01; vol. 18, pp. 19853-19868; DOI 10.1109/JSTARS.2025.3593557; document 11098848; UAV-FAENet: Frequency-Aware and Attention-Enhanced Network for Remote Sensing Semantic Segmentation of UAV Imagery; Dongbo Zhou (https://orcid.org/0000-0001-8682-6281), Huan Wang (https://orcid.org/0009-0001-9779-6433), Shiyan Pang (https://orcid.org/0000-0002-1713-3972), Di Chen (https://orcid.org/0009-0000-3863-7566), Huang Yao, Jie Yu (https://orcid.org/0000-0002-8519-1760); affiliations: authors 1 to 5, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan, China; author 6, State Key Laboratory of Information Engineering in Surveying, Wuhan University, Wuhan, China; https://ieeexplore.ieee.org/document/11098848/; keywords: Attention mechanism; high-low frequency enhancement; remote sensing; semantic segmentation; unmanned aerial vehicle (UAV) images |
| spellingShingle | Dongbo Zhou; Huan Wang; Shiyan Pang; Di Chen; Huang Yao; Jie Yu; UAV-FAENet: Frequency-Aware and Attention-Enhanced Network for Remote Sensing Semantic Segmentation of UAV Imagery; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; Attention mechanism; high-low frequency enhancement; remote sensing; semantic segmentation; unmanned aerial vehicle (UAV) images |
| title | UAV-FAENet: Frequency-Aware and Attention-Enhanced Network for Remote Sensing Semantic Segmentation of UAV Imagery |
| title_sort | uav faenet frequency aware and attention enhanced network for remote sensing semantic segmentation of uav imagery |
| topic | Attention mechanism; high-low frequency enhancement; remote sensing; semantic segmentation; unmanned aerial vehicle (UAV) images |
| url | https://ieeexplore.ieee.org/document/11098848/ |
| work_keys_str_mv | AT dongbozhou uavfaenetfrequencyawareandattentionenhancednetworkforremotesensingsemanticsegmentationofuavimagery AT huanwang uavfaenetfrequencyawareandattentionenhancednetworkforremotesensingsemanticsegmentationofuavimagery AT shiyanpang uavfaenetfrequencyawareandattentionenhancednetworkforremotesensingsemanticsegmentationofuavimagery AT dichen uavfaenetfrequencyawareandattentionenhancednetworkforremotesensingsemanticsegmentationofuavimagery AT huangyao uavfaenetfrequencyawareandattentionenhancednetworkforremotesensingsemanticsegmentationofuavimagery AT jieyu uavfaenetfrequencyawareandattentionenhancednetworkforremotesensingsemanticsegmentationofuavimagery |
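
The description field says that HLAEM splits features into low- and high-frequency components in the frequency domain, enhances each, and fuses them with channel attention. The sketch below illustrates that general idea on a single tiny feature channel in pure Python. It is not the authors' implementation: the square cutoff `radius`, the scalar gains `low_gain` and `high_gain`, and the parameter-free softmax used in place of a learned channel attention are all assumptions made for illustration, and every function name is hypothetical.

```python
import cmath
import math


def dft2(x, inverse=False):
    """Naive 2-D DFT of a square n x n grid (adequate for a tiny demo map)."""
    n = len(x)
    sign = 1 if inverse else -1
    w = [[cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n)]
         for k in range(n)]
    # Transform along each row first, then along each column.
    rows = [[sum(w[k][m] * x[r][m] for m in range(n)) for k in range(n)]
            for r in range(n)]
    out = [[sum(w[k][m] * rows[m][c] for m in range(n)) for c in range(n)]
           for k in range(n)]
    scale = n * n if inverse else 1
    return [[v / scale for v in row] for row in out]


def split_frequencies(channel, radius):
    """Split one channel into low and high bands that sum back to the input."""
    n = len(channel)
    spec = dft2(channel)
    # Indices k and n - k carry the same frequency magnitude, so wrap around.
    low_spec = [[spec[u][v]
                 if min(u, n - u) <= radius and min(v, n - v) <= radius else 0
                 for v in range(n)] for u in range(n)]
    low = [[z.real for z in row] for row in dft2(low_spec, inverse=True)]
    high = [[channel[i][j] - low[i][j] for j in range(n)] for i in range(n)]
    return low, high


def mean_abs(grid):
    """Global average pooling of magnitudes: one descriptor per band."""
    n = len(grid)
    return sum(abs(v) for row in grid for v in row) / (n * n)


def fuse(channel, radius=1, low_gain=1.0, high_gain=1.5):
    """Enhance both bands, then blend them with softmax weights.

    Assumption: the softmax over pooled descriptors stands in for a learned
    channel attention; the gains are illustrative constants, not trained.
    """
    n = len(channel)
    low, high = split_frequencies(channel, radius)
    low = [[low_gain * v for v in row] for row in low]
    high = [[high_gain * v for v in row] for row in high]
    d_low, d_high = mean_abs(low), mean_abs(high)
    peak = max(d_low, d_high)  # subtract the max for numerical stability
    e_low, e_high = math.exp(d_low - peak), math.exp(d_high - peak)
    a_low = e_low / (e_low + e_high)
    a_high = 1.0 - a_low
    fused = [[a_low * low[i][j] + a_high * high[i][j] for j in range(n)]
             for i in range(n)]
    return fused, (a_low, a_high)


# Demo: a 4 x 4 map with a vertical edge between columns 1 and 2.
step_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
low, high = split_frequencies(step_edge, radius=0)
fused, weights = fuse(step_edge)
```

With `radius=0` only the zero-frequency term is kept, so the low band is the channel mean and the high band carries the edge. A multi-channel feature map would call `fuse` once per channel; a real module would use an FFT library and trainable weights rather than the naive transform and fixed gains shown here.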