FireCLIP: Enhancing Forest Fire Detection with Multimodal Prompt Tuning and Vision-Language Understanding

Bibliographic Details
Main Authors: Shanjunxia Wu, Yuming Qiao, Sen He, Jiahao Zhou, Zhi Wang, Xin Li, Fei Wang
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Fire
Online Access: https://www.mdpi.com/2571-6255/8/6/237
Description
Summary: Forest fires are a global threat to human life and ecosystems. This study compiles smoke alarm images collected over one year from five high-definition surveillance cameras in Foshan City, Guangdong, China, to create a smoke-based early warning dataset. The dataset presents two key challenges: (1) high false positive rates caused by pseudo-smoke interference from non-fire sources such as cooking smoke and industrial emissions, and (2) significant regional data imbalances, driven by varying intensities of human activity and terrain features, which impair the generalizability of traditional pre-train–fine-tune strategies. To address these challenges, we explore the use of vision-language models to distinguish true alarms from false alarms. Our method also incorporates a prompt tuning strategy that improves performance by at least 12.45% in zero-shot learning tasks and also improves performance in few-shot learning tasks, demonstrating better regional generalization than the baselines.
ISSN: 2571-6255