CN2VF-Net: A Hybrid Convolutional Neural Network and Vision Transformer Framework for Multi-Scale Fire Detection in Complex Environments

Fire detection remains a challenging task due to varying fire scales, occlusions, and complex environmental conditions. This paper proposes the CN2VF-Net model, a novel hybrid architecture that combines vision Transformers (ViTs) and convolutional neural networks (CNNs), effectively addressing these...

Bibliographic Details
Main Authors: Naveed Ahmad, Mariam Akbar, Eman H. Alkhammash, Mona M. Jamjoom
Format: Article
Language: English
Published: MDPI AG 2025-05-01
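Series: Fire
Subjects: convolutional neural networks (CNNs); vision Transformers (ViTs); D-Fire; Grad-CAM; multi-scale; attention mechanism
Online Access: https://www.mdpi.com/2571-6255/8/6/211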
_version_ 1849432636784640000
author Naveed Ahmad
Mariam Akbar
Eman H. Alkhammash
Mona M. Jamjoom
author_facet Naveed Ahmad
Mariam Akbar
Eman H. Alkhammash
Mona M. Jamjoom
author_sort Naveed Ahmad
collection DOAJ
description Fire detection remains a challenging task due to varying fire scales, occlusions, and complex environmental conditions. This paper proposes the CN2VF-Net model, a novel hybrid architecture that combines vision Transformers (ViTs) and convolutional neural networks (CNNs), effectively addressing these challenges. By leveraging the global context understanding of ViTs and the local feature extraction capabilities of CNNs, the model learns a multi-scale attention mechanism that dynamically focuses on fire regions at different scales, thereby improving accuracy and robustness. The evaluation on the D-Fire dataset demonstrates that the proposed model achieves a mean average precision at an IoU threshold of 0.5 (mAP50) of 76.1%, an F1-score of 81.5%, a recall of 82.8%, a precision of 83.3%, and a mean IoU (mIoU50–95) of 77.1%. These results outperform existing methods by 1.6% in precision, 0.3% in recall, and 3.4% in F1-score. Furthermore, visualizations such as Grad-CAM heatmaps and prediction overlays provide insight into the model’s decision-making process, validating its capability to effectively detect and segment fire regions. These findings underscore the effectiveness of the proposed hybrid architecture and its applicability in real-world fire detection and monitoring systems. With its superior performance and interpretability, the CN2VF-Net architecture sets a new benchmark in fire detection and segmentation, offering a reliable approach to protecting life, property, and the environment.
format Article
id doaj-art-4ea856e51ef44d2686c66a9f8a670dec
institution Kabale University
issn 2571-6255
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series Fire
spelling doaj-art-4ea856e51ef44d2686c66a9f8a670dec
2025-08-20T03:27:18Z
eng
MDPI AG
Fire
2571-6255
2025-05-01
vol. 8, no. 6, art. 211
10.3390/fire8060211
CN2VF-Net: A Hybrid Convolutional Neural Network and Vision Transformer Framework for Multi-Scale Fire Detection in Complex Environments
Naveed Ahmad (0): Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
Mariam Akbar (1): Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
Eman H. Alkhammash (2): Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
Mona M. Jamjoom (3): Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
https://www.mdpi.com/2571-6255/8/6/211
convolutional neural networks (CNNs)
vision Transformers (ViTs)
D-Fire
Grad-CAM
multi-scale
attention mechanism
title CN2VF-Net: A Hybrid Convolutional Neural Network and Vision Transformer Framework for Multi-Scale Fire Detection in Complex Environments
title_sort cn2vf net a hybrid convolutional neural network and vision transformer framework for multi scale fire detection in complex environments
topic convolutional neural networks (CNNs)
vision Transformers (ViTs)
D-Fire
Grad-CAM
multi-scale
attention mechanism
url https://www.mdpi.com/2571-6255/8/6/211