Multihead Average Pseudo-Margin Learning for Disaster Tweet Classification

Bibliographic Details
Main Authors: Iustin Sîrbu, Robert-Adrian Popovici, Traian Rebedea, Ștefan Trăușan-Matu
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Information
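Subjects: semi-supervised learning; disaster tweet classification; co-training; machine learning; multimodal learning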
Online Access: https://www.mdpi.com/2078-2489/16/6/434
_version_ 1850168367266136064
author Iustin Sîrbu
Robert-Adrian Popovici
Traian Rebedea
Ștefan Trăușan-Matu
collection DOAJ
description During natural disasters, social media platforms such as X (formerly Twitter) become a valuable source of real-time information, with eyewitnesses and affected individuals posting messages about the resulting damage and the victims. Although this information can be used to streamline the intervention process of local authorities and to achieve a better distribution of available resources, manually annotating these messages is often infeasible due to time and cost constraints. To address this challenge, we explore the use of semi-supervised learning, a technique that leverages both labeled and unlabeled data, to enhance neural models for disaster tweet classification. Specifically, we investigate state-of-the-art semi-supervised learning models and focus on co-training, an approach that has been less explored in recent years. Moreover, we propose a novel hybrid co-training architecture, Multihead Average Pseudo-Margin, which obtains state-of-the-art results on several classification tasks. Our approach extends the advantages of the voting mechanism from Multihead Co-Training by using the Average Pseudo-Margin (APM) score to improve the quality of the pseudo-labels and self-adaptive confidence thresholds to improve imbalanced classification. Our method achieves up to 7.98% accuracy improvement in low-data scenarios and 2.84% improvement when using the entire labeled dataset, reaching 89.55% accuracy on the Humanitarian task and 91.23% on the Informative task. These results demonstrate the potential of our approach in addressing the critical need for automated disaster tweet classification. We have made our code publicly available for future research.
format Article
id doaj-art-4a0c6519caa24e37abd300522f2a2ead
institution OA Journals
issn 2078-2489
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series Information
spelling doaj-art-4a0c6519caa24e37abd300522f2a2ead
2025-08-20T02:20:58Z
eng
MDPI AG
Information
2078-2489
2025-05-01
16
6
434
10.3390/info16060434
Multihead Average Pseudo-Margin Learning for Disaster Tweet Classification
Iustin Sîrbu, Robert-Adrian Popovici, Traian Rebedea, Ștefan Trăușan-Matu
Faculty of Automatic Control and Computer Science, National University of Science and Technology POLITEHNICA Bucharest, 060042 Bucharest, Romania (all four authors)
https://www.mdpi.com/2078-2489/16/6/434
title Multihead Average Pseudo-Margin Learning for Disaster Tweet Classification
topic semi-supervised learning
disaster tweet classification
co-training
machine learning
multimodal learning
url https://www.mdpi.com/2078-2489/16/6/434