CN2VF-Net: A Hybrid Convolutional Neural Network and Vision Transformer Framework for Multi-Scale Fire Detection in Complex Environments
| Main Authors: | Naveed Ahmad, Mariam Akbar, Eman H. Alkhammash, Mona M. Jamjoom |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Fire |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2571-6255/8/6/211 |
| author | Naveed Ahmad, Mariam Akbar, Eman H. Alkhammash, Mona M. Jamjoom |
|---|---|
| collection | DOAJ |
| description | Fire detection remains a challenging task due to varying fire scales, occlusions, and complex environmental conditions. This paper proposes the CN2VF-Net model, a novel hybrid architecture that combines vision Transformers (ViTs) and convolutional neural networks (CNNs) to address these challenges. By leveraging the global context understanding of ViTs and the local feature extraction capabilities of CNNs, the model learns a multi-scale attention mechanism that dynamically focuses on fire regions at different scales, thereby improving accuracy and robustness. Evaluation on the D-Fire dataset demonstrates that the proposed model achieves a mean average precision at an IoU threshold of 0.5 (mAP50) of 76.1%, an F1-score of 81.5%, a recall of 82.8%, a precision of 83.3%, and a mean IoU (mIoU50–95) of 77.1%. These results exceed those of existing methods by 1.6% in precision, 0.3% in recall, and 3.4% in F1-score. Furthermore, visualizations such as Grad-CAM heatmaps and prediction overlays provide insight into the model’s decision-making process, validating its capability to detect and segment fire regions. These findings underscore the effectiveness of the proposed hybrid architecture and its applicability in real-world fire detection and monitoring systems. With its superior performance and interpretability, the CN2VF-Net architecture sets a new benchmark in fire detection and segmentation, offering a reliable approach to protecting life, property, and the environment. |
| format | Article |
| id | doaj-art-4ea856e51ef44d2686c66a9f8a670dec |
| institution | Kabale University |
| issn | 2571-6255 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Fire |
| volume / issue / article | 8 / 6 / 211 |
| doi | 10.3390/fire8060211 |
| affiliations | Naveed Ahmad, Mariam Akbar: Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan; Eman H. Alkhammash: Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia; Mona M. Jamjoom: Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia |
| title | CN2VF-Net: A Hybrid Convolutional Neural Network and Vision Transformer Framework for Multi-Scale Fire Detection in Complex Environments |
| topic | convolutional neural networks (CNNs); vision Transformers (ViTs); D-Fire; Grad-CAM; multi-scale; attention mechanism |
| url | https://www.mdpi.com/2571-6255/8/6/211 |
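
The abstract reports detection quality as precision, recall, F1-score, and mAP50, where a predicted box typically counts as correct only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The Python sketch below illustrates how precision, recall, and F1 are commonly derived from box matches at that threshold. It is not the authors' evaluation code: the corner box format `(x1, y1, x2, y2)`, the greedy highest-confidence-first matching, and the helper names `iou`, `match_at_threshold`, and `prf1` are hypothetical choices made for illustration. It also does not compute mAP50, which additionally integrates precision over the precision-recall curve and averages over classes.

```python
from typing import Sequence, Tuple

# Hypothetical box format (assumption): (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_at_threshold(
    preds: Sequence[Tuple[Box, float]], gts: Sequence[Box], thr: float = 0.5
) -> Tuple[int, int, int]:
    """Greedy matching (an assumed scheme): predictions are visited from highest
    to lowest confidence, and each takes the unmatched ground truth with the
    highest IoU >= thr. Returns (true positives, false positives, false negatives)."""
    matched = [False] * len(gts)
    tp = fp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            if not matched[j]:
                v = iou(box, gt)
                if v >= best_iou:
                    best_j, best_iou = j, v
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
    fn = matched.count(False)  # ground truths no prediction claimed
    return tp, fp, fn


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: two ground-truth boxes, one good prediction and one spurious one.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
tp, fp, fn = match_at_threshold(preds, gts)
print(tp, fp, fn)        # 1 1 1
print(prf1(tp, fp, fn))  # (0.5, 0.5, 0.5)

# F1 implied by the aggregate precision and recall quoted in the abstract.
p, r = 0.833, 0.828
print(f"{2 * p * r / (p + r):.3f}")  # 0.830
```

As a rough consistency check, the harmonic mean of the reported aggregate precision (83.3%) and recall (82.8%) is about 83.0%, somewhat higher than the reported F1-score of 81.5%. The record does not state how the F1-score was averaged (for example per class or per image), so the difference may reflect a different averaging scheme rather than an error.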