NavBLIP: a visual-language model for enhancing unmanned aerial vehicles navigation and object detection
Introduction: In recent years, Unmanned Aerial Vehicles (UAVs) have increasingly been deployed in various applications such as autonomous navigation, surveillance, and object detection. Traditional methods for UAV navigation and object detection have often relied on either handcrafted features or unim...
Main Authors: | Ye Li, Li Yang, Meifang Yang, Fei Yan, Tonghua Liu, Chensi Guo, Rufeng Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Neurorobotics |
Subjects: | UAV navigation; object detection; multimodal learning; transfer learning; computational efficiency |
Online Access: | https://www.frontiersin.org/articles/10.3389/fnbot.2024.1513354/full |
_version_ | 1832589823916900352 |
---|---|
author | Ye Li; Li Yang; Meifang Yang; Fei Yan; Tonghua Liu; Chensi Guo; Rufeng Chen |
collection | DOAJ |
description | Introduction: In recent years, Unmanned Aerial Vehicles (UAVs) have increasingly been deployed in various applications such as autonomous navigation, surveillance, and object detection. Traditional methods for UAV navigation and object detection have often relied on either handcrafted features or unimodal deep learning approaches. While these methods have seen some success, they frequently encounter limitations in dynamic environments, where robustness and computational efficiency become critical for real-time performance. Additionally, these methods often fail to effectively integrate multimodal inputs, which restricts their adaptability and generalization capabilities when facing complex and diverse scenarios. Methods: To address these challenges, we introduce NavBLIP, a novel visual-language model specifically designed to enhance UAV navigation and object detection by utilizing multimodal data. NavBLIP incorporates transfer learning techniques along with a Nuisance-Invariant Multimodal Feature Extraction (NIMFE) module. The NIMFE module plays a key role in disentangling relevant features from intricate visual and environmental inputs, allowing UAVs to swiftly adapt to new environments and improve object detection accuracy. Furthermore, NavBLIP employs a multimodal control strategy that dynamically selects context-specific features to optimize real-time performance, ensuring efficiency in high-stakes operations. Results and discussion: Extensive experiments on benchmark datasets such as RefCOCO, CC12M, and OpenImages reveal that NavBLIP outperforms existing state-of-the-art models in terms of accuracy, recall, and computational efficiency. Additionally, our ablation study emphasizes the significance of the NIMFE and transfer learning components in boosting the model's performance, underscoring NavBLIP's potential for real-time UAV applications where adaptability and computational efficiency are paramount. |
format | Article |
id | doaj-art-2504df3baf3343dab6ff0795b162f30b |
institution | Kabale University |
issn | 1662-5218 |
language | English |
publishDate | 2025-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Neurorobotics |
spelling | doaj-art-2504df3baf3343dab6ff0795b162f30b; 2025-01-24T07:13:46Z; eng; Frontiers Media S.A.; Frontiers in Neurorobotics; 1662-5218; 2025-01-01; 18; 10.3389/fnbot.2024.1513354; 1513354; NavBLIP: a visual-language model for enhancing unmanned aerial vehicles navigation and object detection; Ye Li [0], Li Yang [1], Meifang Yang [2], Fei Yan [3], Tonghua Liu [4], Chensi Guo [5], Rufeng Chen [6]; [0-5] Department of Electrical Engineering, Baotou Iron and Steel Vocational Technical College, Baotou, China; [6] Baotou Iron and Steel (Group) Co., Ltd., Baotou, China; https://www.frontiersin.org/articles/10.3389/fnbot.2024.1513354/full; UAV navigation; object detection; multimodal learning; transfer learning; computational efficiency |
title | NavBLIP: a visual-language model for enhancing unmanned aerial vehicles navigation and object detection |
topic | UAV navigation; object detection; multimodal learning; transfer learning; computational efficiency |
url | https://www.frontiersin.org/articles/10.3389/fnbot.2024.1513354/full |