FireCLIP: Enhancing Forest Fire Detection with Multimodal Prompt Tuning and Vision-Language Understanding
| Main Authors: | , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Fire |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2571-6255/8/6/237 |
| Summary: | Forest fires are a global environmental threat to human life and ecosystems. This study compiles smoke alarm images from five high-definition surveillance cameras in Foshan City, Guangdong, China, collected over one year, to create a smoke-based early warning dataset. The dataset presents two key challenges: (1) high false positive rates caused by pseudo-smoke interference, including non-fire sources such as cooking smoke and industrial emissions, and (2) significant regional data imbalances, influenced by varying human activity intensities and terrain features, which impair the generalizability of traditional pre-train–fine-tune strategies. To address these challenges, we explore the use of vision-language models to distinguish true alarms from false alarms. Our method also incorporates a prompt tuning strategy that improves performance by at least 12.45% in zero-shot learning tasks and also improves performance in few-shot learning tasks, demonstrating better regional generalization than the baselines. |
| ISSN: | 2571-6255 |