Machine Learning Classifiers and Data Synthesis Techniques to Tackle with Highly Imbalanced COVID-19 Data

Bibliographic Details
Main Authors: Avaz Naghipour, Mohammad Reza Abbaszadeh Bavil Soflaei, Mostafa Ghader-Zefrehei
Format: Article
Language: English
Published: Ferdowsi University of Mashhad 2024-12-01
Series: Computer and Knowledge Engineering
Subjects: COVID-19 detection, machine learning, CTGAN, TVAE, class imbalance
Online Access: https://cke.um.ac.ir/article_45898_b3c8e1d9ecf92ea8a3734a1aab782226.pdf
collection DOAJ
description The COVID-19 pandemic has highlighted the urgent need for rapid and accurate diagnostic methods. In this study, we evaluate three machine learning models, Random Forest (RF), Logistic Regression (LR) and Decision Tree (DT), for detecting COVID-19. The models are trained on a preprocessed, imbalanced dataset with 5086 negative and 558 positive cases. We apply two data synthesis algorithms, Conditional Tabular Generative Adversarial Network (CTGAN) and Tabular Variational Autoencoder (TVAE), to address the class imbalance in the dataset. The classifiers are trained on both the original and the balanced datasets and evaluated for comparison. RF obtains the highest accuracy, 98.83%, on the CTGAN-balanced dataset. These results support the potential of coupling data synthesis with traditional machine learning for the diagnosis of COVID-19. We hope that this research will contribute to current artificial intelligence (AI) efforts to combat the pandemic.
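note The class counts in the description can be turned into a short worked calculation. The sketch below is illustrative only and is not the authors' code. It assumes that "balanced" means exact 1:1 parity reached by adding synthetic minority-class rows, and that the counts describe the whole dataset before any train/test split; the abstract states neither. The class name `ClassCounts` and its properties are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassCounts:
    """Class counts for a binary dataset (hypothetical helper for illustration)."""

    negative: int
    positive: int

    @property
    def total(self) -> int:
        return self.negative + self.positive

    @property
    def imbalance_ratio(self) -> float:
        # Majority-to-minority ratio.
        return max(self.negative, self.positive) / min(self.negative, self.positive)

    @property
    def synthetic_needed(self) -> int:
        # ASSUMPTION: balancing means exact 1:1 parity, reached by adding
        # synthetic minority rows only. The abstract does not state the target.
        return abs(self.negative - self.positive)

    @property
    def majority_baseline_accuracy(self) -> float:
        # Accuracy of a trivial classifier that always predicts the majority class.
        return max(self.negative, self.positive) / self.total


# Counts reported in the abstract: 5086 negative and 558 positive cases.
counts = ClassCounts(negative=5086, positive=558)

print(f"total cases: {counts.total}")
print(f"imbalance ratio: {counts.imbalance_ratio:.2f} to 1")
print(f"synthetic positives for parity: {counts.synthetic_needed}")
print(f"majority-class baseline accuracy: {counts.majority_baseline_accuracy:.2%}")
```

Under these assumptions the dataset holds 5644 cases at a ratio of about 9.11 to 1, parity would need 4528 synthetic positive rows, and a classifier that always predicts "negative" would already score about 90.1% accuracy on the original data. That baseline is why accuracy figures on imbalanced data are usually read alongside per-class metrics such as recall.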
id doaj-art-238c27b8ff784cfb9e6c7729b116b697
institution Kabale University
issn 2538-5453
2717-4123
affiliation Avaz Naghipour: Department of Computer Engineering, University College of Nabi Akram, Tabriz, Iran
Mohammad Reza Abbaszadeh Bavil Soflaei: Department of Computer Engineering, University College of Nabi Akram, Tabriz, Iran
Mostafa Ghader-Zefrehei: Department of Animal Science, Faculty of Agriculture, University of Yasouj, Yasouj, Iran
citation Computer and Knowledge Engineering, vol. 7, no. 2, pp. 51-64, 2024-12-01
doi 10.22067/cke.2024.88940.1121
topic covid-19 detection
machine learning
ctgan
tvae
class imbalance