Multilingual hope speech detection from tweets using transfer learning models

Bibliographic Details
Main Authors: Muhammad Ahmad, Iqra Ameer, Wareesa Sharif, Sardar Usman, Muhammad Muzamil, Ameer Hamza, Muhammad Jalal, Ildar Batyrshin, Grigori Sidorov
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-88687-w
Description
Summary: Social media has become a powerful tool for public discourse, shaping opinions and the emotional landscape of communities. Its extensive use has led to a massive influx of online content. This content includes not only posts in which negativity is amplified through hateful speech but also a significant number of posts that provide support and encouragement, commonly known as hope speech. In recent years, researchers have focused on the automatic detection of hope speech in languages such as Russian, English, Hindi, Spanish, and Bengali. However, to the best of our knowledge, detection of hope speech in Urdu and English, particularly using translation-based techniques, remains unexplored. To contribute to this area, we created a multilingual dataset in English and Urdu, applied a translation-based approach to handle multilingual challenges, and benchmarked the dataset with several state-of-the-art machine learning, deep learning, and transfer learning methods. Our observations indicate that a rigorous process for annotator selection, along with detailed annotation guidelines, significantly improved the quality of the dataset. In extensive experiments, our proposed methodology, based on the BERT transformer model, achieved benchmark performance, surpassing traditional machine learning models with accuracies of 87% for English and 79% for Urdu. These results show improvements of 8.75% in English and 1.87% in Urdu over the baseline models (SVM: 80% for English and 78% for Urdu).
ISSN: 2045-2322
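
The improvement figures quoted in the summary appear to be relative gains over the SVM baseline rather than percentage-point differences. The following is a minimal sketch of that arithmetic, assuming the rounded accuracies given in the summary; the function name `relative_improvement` is illustrative and does not come from the article.

```python
def relative_improvement(model_acc: float, baseline_acc: float) -> float:
    """Return the relative gain of a model over a baseline, in percent."""
    return (model_acc - baseline_acc) / baseline_acc * 100.0


# Rounded accuracies (in percent) as quoted in the summary:
# (BERT-based model, SVM baseline). Unrounded scores are not available here.
reported = {"English": (87.0, 80.0), "Urdu": (79.0, 78.0)}

for language, (bert_acc, svm_acc) in reported.items():
    gain = relative_improvement(bert_acc, svm_acc)
    print(f"{language}: {gain:.2f}% relative improvement over SVM")
```

With these rounded inputs, the English figure reproduces the reported 8.75%, while the Urdu figure comes out near 1.28%; the reported 1.87% presumably rests on unrounded scores that this record does not show.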