Ensemble Transformer–Based Detection of Fake and AI–Generated News

The proliferation of fake online and AI–generated news content poses a significant threat to information integrity. This work leverages advanced natural language processing, machine learning, and deep learning algorithms to effectively detect fake and AI–generated content. The utilized dataset, comb...

Bibliographic Details
Main Authors:	Md. Ishraquzzaman, Mohammed Ashraful Islam Chowdhury, Shahreen Rahman, Riasat Khan
Format:	Article
Language:	English
Published:	Wiley 2025-01-01
Series:	Applied Computational Intelligence and Soft Computing
Online Access:	http://dx.doi.org/10.1155/acis/3268456

author	Md. Ishraquzzaman, Mohammed Ashraful Islam Chowdhury, Shahreen Rahman, Riasat Khan
collection	DOAJ
description	The proliferation of fake and AI–generated online news content poses a significant threat to information integrity. This work applies natural language processing, machine learning, and deep learning techniques to detect fake and AI–generated news. The dataset, assembled by combining multiple open-source datasets, comprises 43,000 real, 31,000 fake, and 80,000 AI–generated news articles and is augmented using an ensemble large language model (LLM). Three open-source LLMs (GPT-2, GPT-NEO, and Distil-GPT-2) were combined into an ensemble LLM to generate new news titles, and the best outputs were selected by majority voting for further dataset expansion. Preprocessing involved data cleaning, lowercasing, stop word removal, tokenization, and lemmatization. Six machine learning and five natural language processing models were applied to this dataset. The two top-performing natural language processing models (RoBERTa and DeBERTa) were combined into an ensemble transformer model. Among the machine learning models, random forest performed best, with an accuracy of 92.49% and an F1 score of 92.60%. Among the natural language processing models, the ensemble transformer model performed best, with an accuracy of 96.65% and an F1 score of 96.66%. The proposed ensemble model was optimized with model pruning (reducing parameters from 265M to 210M and improving training time by 25%) and dynamic quantization (reducing model size by 50% while maintaining 95.68% accuracy), which improve scalability and efficiency while reducing computational overhead. A DistilBERT student model, trained with a balanced combination of feature- and logit-based distillation from a RoBERTa-base teacher network, achieved 96.17% accuracy. Attention maps were visualized for the different news categories to improve the interpretability of the transformer–based ensemble detection models. Finally, a website was developed that lets users identify whether news content is fake, real, or AI–generated. The dataset, including the AI–generated news articles, and the implementation scripts are available at https://github.com/ishraqisheree99/Combined-News-Dataset.git.
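
The abstract does not say how the RoBERTa and DeBERTa outputs are merged, nor how the distillation loss is weighted. The sketch below is therefore only an illustration under stated assumptions: it averages the two models' class probabilities (soft voting) and shows a standard logit-based distillation loss. The feature-based part of the distillation is omitted. The label order, the function names, and the `weight_a`, `alpha`, and `temperature` parameters are all hypothetical and are not taken from the authors' code.

```python
import math
from typing import List, Sequence

# Hypothetical label order; the record does not specify one.
LABELS = ("real", "fake", "ai_generated")


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Convert raw logits to probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(logits_a: Sequence[float], logits_b: Sequence[float],
              weight_a: float = 0.5) -> List[float]:
    """Weighted average of two models' class probabilities (assumed scheme)."""
    probs_a = softmax(logits_a)
    probs_b = softmax(logits_b)
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]


def predict(logits_a: Sequence[float], logits_b: Sequence[float]) -> str:
    """Return the label with the highest averaged probability."""
    probs = soft_vote(logits_a, logits_b)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      true_index: int,
                      alpha: float = 0.5,
                      temperature: float = 2.0) -> float:
    """Logit-based distillation for one example (feature term not included).

    loss = alpha * CE(student, label)
         + (1 - alpha) * T^2 * KL(teacher_T || student_T)
    """
    hard = -math.log(softmax(student_logits)[true_index])
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    # Made-up logits standing in for RoBERTa and DeBERTa outputs on one article.
    print(predict([0.0, 3.0, 0.0], [0.0, 0.0, 1.0]))
    print(distillation_loss([1.0, 2.0, 3.0], [0.5, 2.5, 3.5], true_index=2))
```

In practice the logits would come from fine-tuned sequence-classification heads; equal weighting is just the simplest choice and may differ from what the authors used.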
format	Article
id	doaj-art-7a1d22e4b47d4eb5b70ae60e7519c6b2
institution	DOAJ
issn	1687-9732
language	English
publishDate	2025-01-01
publisher	Wiley
record_format	Article
series	Applied Computational Intelligence and Soft Computing
spelling	doaj-art-7a1d22e4b47d4eb5b70ae60e7519c6b2; 2025-08-20T03:12:54Z; eng; Wiley; Applied Computational Intelligence and Soft Computing; 1687-9732; 2025-01-01; 2025; 10.1155/acis/3268456; Ensemble Transformer–Based Detection of Fake and AI–Generated News; Md. Ishraquzzaman, Mohammed Ashraful Islam Chowdhury, Shahreen Rahman, Riasat Khan (each: Electrical and Computer Engineering); http://dx.doi.org/10.1155/acis/3268456
title	Ensemble Transformer–Based Detection of Fake and AI–Generated News
url	http://dx.doi.org/10.1155/acis/3268456


Similar Items