Identifying Disinformation on the Extended Impacts of COVID-19: Methodological Investigation Using a Fuzzy Ranking Ensemble of Natural Language Processing Models

Background: During the COVID-19 pandemic, the continuous spread of misinformation on the internet posed an ongoing threat to public trust and understanding of epidemic prevention policies. Although the pandemic is now under control, information regarding the risks of long-term...

Bibliographic Details
Main Authors:	Jian-An Chen, Wu-Chun Chung, Che-Lun Hung, Chun-Ying Wu
Format:	Article
Language:	English
Published:	JMIR Publications 2025-05-01
Series:	Journal of Medical Internet Research
Online Access:	https://www.jmir.org/2025/1/e73601

_version_	1849328756989100032
author	Jian-An Chen, Wu-Chun Chung, Che-Lun Hung, Chun-Ying Wu
author_sort	Jian-An Chen
collection	DOAJ
description	Background: During the COVID-19 pandemic, the continuous spread of misinformation on the internet posed an ongoing threat to public trust and understanding of epidemic prevention policies. Although the pandemic is now under control, information regarding the risks of long-term COVID-19 effects and reinfection still needs to be integrated into COVID-19 policies.
Objective: This study aims to develop a robust and generalizable deep learning framework for detecting misinformation related to the prolonged impacts of COVID-19 by integrating pretrained language models (PLMs) with an innovative fuzzy rank-based ensemble approach.
Methods: A comprehensive dataset comprising 566 genuine and 2361 fake samples was curated from reliable open sources and processed using advanced techniques. The dataset was randomly split using the scikit-learn package to facilitate both training and evaluation. Deep learning models were trained for 20 epochs, on a Tesla T4 for hierarchical attention networks (HANs) and on an RTX A5000 for the other models. To enhance performance, we implemented an ensemble learning strategy that incorporated a reparameterized Gompertz function, which assigned fuzzy ranks based on each model’s prediction confidence for each test case. This method effectively fused outputs from state-of-the-art PLMs such as robustly optimized bidirectional encoder representations from transformers pretraining approach (RoBERTa), decoding-enhanced bidirectional encoder representations from transformers with disentangled attention (DeBERTa), and XLNet.
Results: After training on the dataset, various classification methods were evaluated on the test set, including the fuzzy rank-based method and state-of-the-art large language models. Experimental results reveal that language models, particularly XLNet, outperform traditional approaches that combine term frequency–inverse document frequency features with a support vector machine or that use deep models such as HAN. The evaluation metrics (accuracy, precision, recall, F1-score, and area under the curve [AUC]) indicated a clear performance advantage for models that had a larger number of parameters. However, this study also highlights that model architecture, training procedures, and optimization techniques are critical determinants of classification effectiveness. XLNet’s permutation language modeling approach enhances bidirectional context understanding, allowing it to surpass even larger models in the bidirectional encoder representations from transformers (BERT) series despite having relatively fewer parameters. Notably, the fuzzy rank-based ensemble method, which combines multiple language models, achieved impressive results on the test set, with an accuracy of 93.52%, a precision of 94.65%, an F1-score of 96.03%, and an AUC of 97.15%.
Conclusions: The fusion of ensemble learning with PLMs and the Gompertz function, employing fuzzy rank-based methodology, introduces a novel prediction approach with prospects for enhancing accuracy and reliability. Additionally, the experimental results imply that training solely on textual content can yield high prediction accuracy, thereby providing valuable insights into the optimization of fake news detection systems. These findings not only aid in detecting misinformation but also have broader implications for the application of advanced deep learning techniques in public health policy and communication.
format	Article
id	doaj-art-d2c423bdcafc47979cdd9c727c3fb03d
institution	Kabale University
issn	1438-8871
language	English
publishDate	2025-05-01
publisher	JMIR Publications
record_format	Article
series	Journal of Medical Internet Research
spelling	doaj-art-d2c423bdcafc47979cdd9c727c3fb03d; 2025-08-20T03:47:28Z; eng; JMIR Publications; Journal of Medical Internet Research; 1438-8871; 2025-05-01; 27; e73601; 10.2196/73601; Identifying Disinformation on the Extended Impacts of COVID-19: Methodological Investigation Using a Fuzzy Ranking Ensemble of Natural Language Processing Models; Jian-An Chen (https://orcid.org/0009-0003-8289-7404); Wu-Chun Chung (https://orcid.org/0009-0001-1358-6579); Che-Lun Hung (https://orcid.org/0000-0002-8906-9367); Chun-Ying Wu (https://orcid.org/0000-0001-5053-1801); https://www.jmir.org/2025/1/e73601
title	Identifying Disinformation on the Extended Impacts of COVID-19: Methodological Investigation Using a Fuzzy Ranking Ensemble of Natural Language Processing Models
url	https://www.jmir.org/2025/1/e73601

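The fuzzy rank-based fusion named in the abstract can be illustrated with a short sketch. This record does not give the authors' exact reparameterization, so the Gompertz form used here, 1 - exp(-exp(-2p)), and the scoring rule (rank sum multiplied by the complement of the mean confidence, lowest score wins) are assumptions borrowed from a commonly used fuzzy-rank ensemble formulation. The function names and the example probabilities are hypothetical.

```python
import numpy as np


def gompertz_fuzzy_rank(probs):
    """Map confidences in [0, 1] to fuzzy ranks (assumed form).

    Higher confidence yields a lower, that is better, rank.
    """
    return 1.0 - np.exp(-np.exp(-2.0 * probs))


def fuzzy_rank_ensemble(model_probs):
    """Fuse per-model class probabilities into one label per sample.

    model_probs has shape (n_models, n_samples, n_classes).
    """
    model_probs = np.asarray(model_probs, dtype=float)
    # Sum of fuzzy ranks across models, per sample and class.
    rank_sum = gompertz_fuzzy_rank(model_probs).sum(axis=0)
    # Complement of the mean confidence across models.
    conf_complement = 1.0 - model_probs.mean(axis=0)
    # The class with the smallest combined score is predicted.
    return (rank_sum * conf_complement).argmin(axis=1)


# Hypothetical outputs of three models on two samples (columns: class 0, class 1).
example_probs = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.4, 0.6], [0.6, 0.4]],
]
print(fuzzy_rank_ensemble(example_probs))
```

In this example the third model disagrees with the other two on both samples, yet the fused prediction follows the two more confident models. With more than two classes, published variants often rank only each model's top-k classes and penalize the rest; that step is omitted here.
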
Similar Items