Multi-domain Urdu fake news detection using pre-trained ensemble model

Bibliographic Details
Main Authors: Sheetal Harris, Hassan Jalil Hadi, Naveed Ahmad, Mohammed Ali Alshara
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-91054-4
Description
Summary: Fake News (FN) dissemination on websites and online platforms influences human behaviour, socio-political domains, and the sovereignty of a country. The outpouring of biased news and propaganda on online portals can be addressed by restricting online propaganda through an automated mechanism. Verifying the authenticity of news and information on online platforms in regional languages such as Urdu, which have limited resources and datasets, is challenging. Furthermore, limited research in resource-constrained languages has created a language bias in Artificial Intelligence (AI) research, which this study addresses. Natural Language Processing (NLP) techniques have been used for Fake News Detection (FND) in English news and for various language-related tasks. Previous studies used Machine Learning (ML), Deep Learning (DL), and individual Pre-trained Language Models (PLMs) for Urdu FND, and an ML-based ensemble model showed better performance than pre-trained models. We propose a methodology for Urdu FND that applies stacked ensemble learning to three PLMs (ELECTRA, mBERT, and XLM-RoBERTa) after appropriate fine-tuning and hyperparameter optimization. To overcome the limitations of each pre-trained transformer model, each is fine-tuned individually on a publicly available Urdu dataset. The proposed stacking approach outperforms each individual pre-trained model. An accuracy of 0.914, a Matthews Correlation Coefficient (MCC) of 0.898, and an F1-score of 0.904 validate the efficacy of the proposed ensemble model.
ISSN: 2045-2322
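
Illustrative sketch: The summary describes stacking the predictions of several fine-tuned PLMs and reporting accuracy, MCC, and F1-score. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes binary labels (1 = fake, 0 = real), assumes a logistic-regression meta-learner (the summary does not name the meta-learner), and uses made-up probabilities in place of real ELECTRA, mBERT, and XLM-RoBERTa outputs. All function names are hypothetical.

```python
import math


def train_meta_learner(features, labels, lr=0.5, epochs=1000):
    """Fit a logistic-regression meta-learner with batch gradient descent.

    Each row of `features` holds one probability per base model
    (an assumed stand-in for fine-tuned PLM outputs).
    """
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict(weights, bias, features):
    """Return hard 0/1 labels from the stacked (meta-level) model."""
    return [
        1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0
        for x in features
    ]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, F1-score, and MCC from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Made-up P(fake) values from three hypothetical base models per article.
    base_probs = [
        [0.90, 0.80, 0.95],
        [0.80, 0.70, 0.60],
        [0.20, 0.10, 0.30],
        [0.10, 0.30, 0.20],
    ]
    labels = [1, 1, 0, 0]
    weights, bias = train_meta_learner(base_probs, labels)
    print(binary_metrics(labels, predict(weights, bias, base_probs)))
```

In practice the meta-learner would be trained on held-out predictions from the base models rather than on the same rows it is scored on; the toy data above only shows the data flow.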