CLIP-BCA-Gated: A Dynamic Multimodal Framework for Real-Time Humanitarian Crisis Classification with Bi-Cross-Attention and Adaptive Gating
During humanitarian crises, social media generates over 30 million multimodal tweets daily, but 20% textual noise, 40% cross-modal misalignment, and severe class imbalance (4.1% rare classes) hinder effective classification. This study presents CLIP-BCA-Gated, a dynamic multimodal framework that int...
| Main Authors: | Shanshan Li, Qingjie Liu, Zhian Pan, Xucheng Wu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-08-01 |
| Series: | Applied Sciences |
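| Subjects: | humanitarian crisis classification; contrastive learning; bidirectional cross-attention; adaptive gating; multimodal fusion; cross-modal alignment |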
| Online Access: | https://www.mdpi.com/2076-3417/15/15/8758 |
| _version_ | 1849405883062157312 |
|---|---|
| author | Shanshan Li; Qingjie Liu; Zhian Pan; Xucheng Wu |
| author_facet | Shanshan Li; Qingjie Liu; Zhian Pan; Xucheng Wu |
| author_sort | Shanshan Li |
| collection | DOAJ |
| description | During humanitarian crises, social media generates over 30 million multimodal tweets daily, but 20% textual noise, 40% cross-modal misalignment, and severe class imbalance (4.1% rare classes) hinder effective classification. This study presents CLIP-BCA-Gated, a dynamic multimodal framework that integrates bidirectional cross-attention (Bi-Cross-Attention) and adaptive gating within the CLIP architecture to address these challenges. The Bi-Cross-Attention module enables fine-grained cross-modal semantic alignment, while the adaptive gating mechanism dynamically weights modalities to suppress noise. Hierarchical learning rate scheduling and multidimensional data augmentation further optimize feature fusion for real-time multiclass classification. On the CrisisMMD benchmark, CLIP-BCA-Gated achieves 91.77% classification accuracy (1.55% higher than baseline CLIP and 2.33% over state-of-the-art ALIGN), with exceptional recall for critical categories: infrastructure damage (93.42%) and rescue efforts (92.15%). The model processes tweets at 0.083 s per instance, meeting real-time deployment requirements for emergency response systems. Ablation studies show Bi-Cross-Attention contributes 2.54% accuracy improvement, and adaptive gating contributes 1.12%. This work demonstrates that dynamic multimodal fusion enhances resilience to noisy social media data, directly supporting SDG 11 through scalable real-time disaster information triage. The framework’s noise-robust design and sub-second inference make it a practical solution for humanitarian organizations requiring rapid crisis categorization. |
| format | Article |
| id | doaj-art-2abcfac230e54a2ea444e8e13a9b1ec0 |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-2abcfac230e54a2ea444e8e13a9b1ec0; 2025-08-20T03:36:34Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-08-01; 15(15), 8758; 10.3390/app15158758; Shanshan Li, Qingjie Liu, Zhian Pan, Xucheng Wu (all: School of Computer Science and Engineering, Institute of Disaster Prevention, Beijing 101601, China); https://www.mdpi.com/2076-3417/15/15/8758; humanitarian crisis classification; contrastive learning; bidirectional cross-attention; adaptive gating; multimodal fusion; cross-modal alignment |
| title | CLIP-BCA-Gated: A Dynamic Multimodal Framework for Real-Time Humanitarian Crisis Classification with Bi-Cross-Attention and Adaptive Gating |
| topic | humanitarian crisis classification; contrastive learning; bidirectional cross-attention; adaptive gating; multimodal fusion; cross-modal alignment |
| url | https://www.mdpi.com/2076-3417/15/15/8758 |
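The description in this record names two mechanisms, bidirectional cross-attention and adaptive gating, without giving their equations. The following is a minimal NumPy sketch of the general idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the use of plain scaled dot-product attention without learned query/key/value projections, mean pooling, the sigmoid gate over concatenated features, and the random arrays standing in for CLIP text-token and image-patch embeddings.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cross_attention(queries, keys_values):
    """Scaled dot-product attention in which `queries` attend over `keys_values`.

    Hypothetical simplification: no learned projections are applied.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values


def bi_cross_attention_gated(text_tokens, image_patches, gate_w, gate_b):
    """Fuse two modalities with attention in both directions and a sigmoid gate."""
    # Direction 1: each text token gathers context from the image patches.
    text_ctx = cross_attention(text_tokens, image_patches)
    # Direction 2: each image patch gathers context from the text tokens.
    image_ctx = cross_attention(image_patches, text_tokens)

    # Mean-pool each sequence to a single vector of size d (assumed pooling).
    t = text_ctx.mean(axis=0)
    v = image_ctx.mean(axis=0)

    # Per-dimension gate in (0, 1): values near 1 favor the text-side vector,
    # values near 0 favor the image-side vector.
    gate = sigmoid(np.concatenate([t, v]) @ gate_w + gate_b)
    fused = gate * t + (1.0 - gate) * v
    return fused, gate


# Usage with random stand-ins for encoder outputs (5 text tokens, 7 image patches).
rng = np.random.default_rng(0)
d = 8
text_tokens = rng.normal(size=(5, d))
image_patches = rng.normal(size=(7, d))
gate_w = rng.normal(scale=0.1, size=(2 * d, d))
gate_b = np.zeros(d)

fused, gate = bi_cross_attention_gated(text_tokens, image_patches, gate_w, gate_b)
print(fused.shape, float(gate.min()), float(gate.max()))
```

In this sketch the gate is what lets a noisy modality be down-weighted per feature dimension; with all-zero gate parameters it reduces to a plain average of the two pooled vectors. A trained model would learn `gate_w` and `gate_b` jointly with the encoders and feed `fused` to a classification head.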