Arabic Fake News Dataset Development: Humans and AI-Generated Contributions

Bibliographic Details
Main Authors: Hanen Himdi, Nuha Zamzami, Fatma Najar, Mada Alrehaili, Nizar Bouguila
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10945848/
Description
Summary: The widespread use of social media platforms has accelerated the spread of fake news online, including fake reviews, rumors, and propaganda. Although these forms differ in their objectives, they share the aim of causing harm. This study presents an Arabic fake news detection framework to counter the phenomenon. The framework introduces the first Arabic fake news dataset compiled under strict guidelines to include fake articles composed by both humans and the generative pre-trained transformer (GPT). First, we performed human-based experiments to evaluate how well humans distinguish real news articles from fake ones. Our findings reveal that humans identified roughly half of the fake articles, whether written by humans or generated by GPT, which raises concerns about their ability to detect fake news. The concern is heightened because GPT can generate fake news that closely resembles human-written content. To address this, we performed the same task using deep learning (DL) and transformer-based methods with different word embeddings. Across all models, the transformer-based model ARBERT outperformed the DL models, reaching an accuracy of 78% in classifying real news and fake news generated by humans and GPT. The findings point to effective techniques for addressing this issue.
ISSN: 2169-3536