CLIP-BCA-Gated: A Dynamic Multimodal Framework for Real-Time Humanitarian Crisis Classification with Bi-Cross-Attention and Adaptive Gating

During humanitarian crises, social media generates over 30 million multimodal tweets daily, but 20% textual noise, 40% cross-modal misalignment, and severe class imbalance (4.1% rare classes) hinder effective classification. This study presents CLIP-BCA-Gated, a dynamic multimodal framework that int...

Bibliographic Details
Main Authors: Shanshan Li, Qingjie Liu, Zhian Pan, Xucheng Wu
Format: Article
Language: English
Published: MDPI AG 2025-08-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/15/8758
author Shanshan Li
Qingjie Liu
Zhian Pan
Xucheng Wu
collection DOAJ
description During humanitarian crises, social media generates over 30 million multimodal tweets daily, but 20% textual noise, 40% cross-modal misalignment, and severe class imbalance (4.1% rare classes) hinder effective classification. This study presents CLIP-BCA-Gated, a dynamic multimodal framework that integrates bidirectional cross-attention (Bi-Cross-Attention) and adaptive gating within the CLIP architecture to address these challenges. The Bi-Cross-Attention module enables fine-grained cross-modal semantic alignment, while the adaptive gating mechanism dynamically weights modalities to suppress noise. Hierarchical learning rate scheduling and multidimensional data augmentation further optimize feature fusion for real-time multiclass classification. On the CrisisMMD benchmark, CLIP-BCA-Gated achieves 91.77% classification accuracy (1.55% higher than baseline CLIP and 2.33% over state-of-the-art ALIGN), with exceptional recall for critical categories: infrastructure damage (93.42%) and rescue efforts (92.15%). The model processes tweets at 0.083 s per instance, meeting real-time deployment requirements for emergency response systems. Ablation studies show Bi-Cross-Attention contributes 2.54% accuracy improvement, and adaptive gating contributes 1.12%. This work demonstrates that dynamic multimodal fusion enhances resilience to noisy social media data, directly supporting SDG 11 through scalable real-time disaster information triage. The framework’s noise-robust design and sub-second inference make it a practical solution for humanitarian organizations requiring rapid crisis categorization.
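The description above names two mechanisms: bidirectional cross-attention between the text and image streams, and an adaptive gate that weights the two modalities. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. It assumes scaled dot-product attention without learned projections, mean pooling, and a single scalar sigmoid gate; the record does not specify any of these details, and every function and parameter name here (`cross_attend`, `gated_fusion`, `gate_w`, `gate_b`) is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, context):
    """Each query token attends over the other modality's tokens.

    Assumption: the context tokens serve as both keys and values,
    with no learned projection matrices.
    """
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in context])
        attended.append([
            sum(w * tok[d] for w, tok in zip(weights, context))
            for d in range(len(q))
        ])
    return attended


def mean_pool(tokens):
    """Average token vectors into one vector per modality."""
    return [sum(t[d] for t in tokens) / len(tokens)
            for d in range(len(tokens[0]))]


def gated_fusion(text_vec, image_vec, gate_w, gate_b):
    """Scalar gate g in (0, 1): fused = g * text + (1 - g) * image.

    Assumption: the gate is a sigmoid over the concatenated features.
    """
    logit = dot(gate_w, text_vec + image_vec) + gate_b
    g = 1.0 / (1.0 + math.exp(-logit))
    fused = [g * t + (1.0 - g) * i for t, i in zip(text_vec, image_vec)]
    return fused, g


def fuse(text_tokens, image_tokens, gate_w, gate_b):
    # Direction 1: text tokens query the image tokens.
    text_ctx = mean_pool(cross_attend(text_tokens, image_tokens))
    # Direction 2: image tokens query the text tokens.
    image_ctx = mean_pool(cross_attend(image_tokens, text_tokens))
    return gated_fusion(text_ctx, image_ctx, gate_w, gate_b)


# Toy inputs: 2 text tokens and 3 image tokens, each of dimension 2.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
fused, gate = fuse(text_tokens, image_tokens, [0.0] * 4, 0.0)
print(fused, gate)
```

In this sketch a gate value near 1 leans on the text-side representation and a value near 0 leans on the image side, which is one simple way a model could down-weight a noisy modality. In a trained model the gate parameters would be learned rather than set by hand.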
format Article
id doaj-art-2abcfac230e54a2ea444e8e13a9b1ec0
institution Kabale University
issn 2076-3417
language English
publishDate 2025-08-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-2abcfac230e54a2ea444e8e13a9b1ec0; 2025-08-20T03:36:34Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-08-01; vol. 15, no. 15, art. 8758; 10.3390/app15158758; CLIP-BCA-Gated: A Dynamic Multimodal Framework for Real-Time Humanitarian Crisis Classification with Bi-Cross-Attention and Adaptive Gating; Shanshan Li, Qingjie Liu, Zhian Pan, Xucheng Wu (all: School of Computer Science and Engineering, Institute of Disaster Prevention, Beijing 101601, China); https://www.mdpi.com/2076-3417/15/15/8758; humanitarian crisis classification, contrastive learning, bidirectional cross-attention, adaptive gating, multimodal fusion, cross-modal alignment
title CLIP-BCA-Gated: A Dynamic Multimodal Framework for Real-Time Humanitarian Crisis Classification with Bi-Cross-Attention and Adaptive Gating
topic humanitarian crisis classification
contrastive learning
bidirectional cross-attention
adaptive gating
multimodal fusion
cross-modal alignment
url https://www.mdpi.com/2076-3417/15/15/8758