Pre-Trained Language Model Ensemble for Arabic Fake News Detection

Bibliographic Details
Main Authors: Lama Al-Zahrani, Maha Al-Yahya
Format: Article
Language: English
Published: MDPI AG 2024-09-01
Series: Mathematics
Subjects: fake news detection; Arabic; learning ensemble; LLM; AraBERT; MARBERT
Online Access: https://www.mdpi.com/2227-7390/12/18/2941
author Lama Al-Zahrani
Maha Al-Yahya
collection DOAJ
description Fake news detection (FND) remains challenging because fake news comes from vast and varied sources, especially social media platforms. Although academia and industry have made numerous attempts to develop fake news detection systems, research on Arabic content remains limited. This study investigates transformer-based language models for Arabic FND. Although transformer-based models perform well on many natural language processing tasks, they often struggle with tasks involving complex linguistic patterns and cultural contexts, which leads to unreliable performance and misclassification. To address these challenges, we investigated ensembles of transformer-based models. We experimented with five Arabic transformer models: AraBERT, MARBERT, AraELECTRA, AraGPT2, and ARBERT. We evaluated several ensemble approaches, including a weighted-average ensemble, hard voting, and soft voting, to determine which most effectively improves prediction accuracy. The results show that ensembling significantly improves on baseline model performance: on the Arabic Multisource Fake News Detection (AMFND) dataset, the weighted-average ensemble reached an F1 score of 94%. Changing the number of models in the ensemble had only a slight effect on performance. These findings contribute to the advancement of fake news detection in Arabic and offer insights for both academia and industry.
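The three ensemble strategies named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say how the ensemble weights were chosen, and the probabilities, weights, class order (0 = real, 1 = fake), and function names below are all hypothetical values picked for illustration.

```python
from collections import Counter
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def hard_vote(probs: Sequence[Sequence[float]]) -> int:
    """Each model votes for its most probable class; the majority class wins.

    On a tie, Counter.most_common returns the class that was counted first.
    """
    votes = [argmax(p) for p in probs]
    return Counter(votes).most_common(1)[0][0]


def weighted_average(probs: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Per-class weighted mean of the models' probabilities."""
    total = sum(weights)
    n_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / total
        for c in range(n_classes)
    ]


def soft_vote(probs: Sequence[Sequence[float]]) -> int:
    """Soft voting: average the probabilities with equal weights, then argmax."""
    return argmax(weighted_average(probs, [1.0] * len(probs)))


def weighted_vote(probs: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> int:
    """Weighted-average ensemble: argmax of the weighted mean probabilities."""
    return argmax(weighted_average(probs, weights))


# Hypothetical [P(real), P(fake)] outputs of three models for one article.
example_probs = [
    [0.60, 0.40],  # model A leans "real"
    [0.55, 0.45],  # model B leans "real"
    [0.05, 0.95],  # model C is confident the article is "fake"
]
# Hypothetical weights, e.g. proportional to each model's validation score.
example_weights = [0.6, 0.3, 0.1]

if __name__ == "__main__":
    print("hard vote:       ", hard_vote(example_probs))
    print("soft vote:       ", soft_vote(example_probs))
    print("weighted average:", weighted_vote(example_probs, example_weights))
```

With these made-up numbers the strategies disagree: hard voting follows the two-model majority, soft voting is swayed by the one highly confident model, and the weighted average depends on how much weight that model receives. This is why the choice of strategy can matter even when the member models stay the same.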
format Article
id doaj-art-1c9e8fb7a76247dca5b5d36138af3fab
institution OA Journals
issn 2227-7390
language English
publishDate 2024-09-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj-art-1c9e8fb7a76247dca5b5d36138af3fab
2025-08-20T01:55:38Z
eng
MDPI AG
Mathematics 2227-7390
2024-09-01, 12(18), 2941
10.3390/math12182941
Pre-Trained Language Model Ensemble for Arabic Fake News Detection
Lama Al-Zahrani (0)
Maha Al-Yahya (1)
(0) Information Technology Department, College of Computer and Information Sciences, King Saud University, P.O. Box 145111, Riyadh 4545, Saudi Arabia
(1) Information Technology Department, College of Computer and Information Sciences, King Saud University, P.O. Box 145111, Riyadh 4545, Saudi Arabia
https://www.mdpi.com/2227-7390/12/18/2941
title Pre-Trained Language Model Ensemble for Arabic Fake News Detection
topic fake news detection
Arabic
learning ensemble
LLM
AraBERT
MARBERT
url https://www.mdpi.com/2227-7390/12/18/2941