Machine Learning Classifiers and Data Synthesis Techniques to Tackle with Highly Imbalanced COVID-19 Data
The COVID-19 pandemic has highlighted the urgent need for rapid and accurate diagnostic methods. In this study, we evaluate three machine learning models, Random Forest (RF), Logistic Regression (LR) and Decision Tree (DT), for detecting COVID-19, each trained on preprocessed, imbalanced data comprising 5086 n...
Main Authors: | Avaz Naghipour, Mohammad Reza Abbaszadeh Bavil Soflaei, Mostafa Ghader-Zefrehei |
---|---|
Format: | Article |
Language: | English |
Published: | Ferdowsi University of Mashhad, 2024-12-01 |
Series: | Computer and Knowledge Engineering |
Subjects: | COVID-19 detection; machine learning; CTGAN; TVAE; class imbalance |
Online Access: | https://cke.um.ac.ir/article_45898_b3c8e1d9ecf92ea8a3734a1aab782226.pdf |
_version_ | 1832595452853223424 |
---|---|
author | Avaz Naghipour; Mohammad Reza Abbaszadeh Bavil Soflaei; Mostafa Ghader-Zefrehei |
collection | DOAJ |
description | The COVID-19 pandemic has highlighted the urgent need for rapid and accurate diagnostic methods. In this study, we evaluate three machine learning models, Random Forest (RF), Logistic Regression (LR) and Decision Tree (DT), for detecting COVID-19, each trained on preprocessed, imbalanced data comprising 5086 negative and 558 positive cases. We demonstrate the capability of two data synthesis algorithms, Conditional Tabular Generative Adversarial Network (CTGAN) and Tabular Variational Autoencoder (TVAE), to address the class imbalance inherent in the dataset. The classifiers trained on the original and on the balanced datasets were evaluated for comparison. RF obtains the highest accuracy, 98.83%, on the CTGAN-balanced dataset. These results support the potential of coupling data synthesis with traditional machine learning for the diagnosis of COVID-19. We hope that this research will contribute to current artificial intelligence (AI) efforts to combat the pandemic. |
format | Article |
id | doaj-art-238c27b8ff784cfb9e6c7729b116b697 |
institution | Kabale University |
issn | 2538-5453 2717-4123 |
language | English |
publishDate | 2024-12-01 |
publisher | Ferdowsi University of Mashhad |
record_format | Article |
series | Computer and Knowledge Engineering |
spelling | doaj-art-238c27b8ff784cfb9e6c7729b116b697; 2025-01-19T04:04:23Z; eng; Ferdowsi University of Mashhad; Computer and Knowledge Engineering; ISSN 2538-5453, 2717-4123; 2024-12-01; 725164; DOI 10.22067/cke.2024.88940.1121; 45898; Machine Learning Classifiers and Data Synthesis Techniques to Tackle with Highly Imbalanced COVID-19 Data; Avaz Naghipour (Department of Computer Engineering, University College of Nabi Akram, Tabriz, Iran); Mohammad Reza Abbaszadeh Bavil Soflaei (Department of Computer Engineering, University College of Nabi Akram, Tabriz, Iran); Mostafa Ghader-Zefrehei (Department of Animal Science, Faculty of Agriculture, University of Yasouj, Yasouj, Iran); https://cke.um.ac.ir/article_45898_b3c8e1d9ecf92ea8a3734a1aab782226.pdf; covid-19 detection; machine learning; ctgan; tvae; class imbalance |
title | Machine Learning Classifiers and Data Synthesis Techniques to Tackle with Highly Imbalanced COVID-19 Data |
topic | COVID-19 detection; machine learning; CTGAN; TVAE; class imbalance |
url | https://cke.um.ac.ir/article_45898_b3c8e1d9ecf92ea8a3734a1aab782226.pdf |
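
The class counts quoted in the description (5086 negative, 558 positive) can be turned into a few concrete figures that show why balancing matters. The following is a minimal sketch, not the authors' code: `imbalance_summary` is a hypothetical helper, and the 1:1 balancing target (adding only synthetic positive rows) is an assumption, since the abstract does not state how the balanced datasets were constructed.

```python
# Class counts as reported in the record's description field.
NEGATIVE = 5086
POSITIVE = 558


def imbalance_summary(n_negative: int, n_positive: int) -> dict:
    """Summarise a binary class distribution (hypothetical helper, not from the paper)."""
    total = n_negative + n_positive
    majority = max(n_negative, n_positive)
    minority = min(n_negative, n_positive)
    return {
        "total": total,
        "minority_share": minority / total,
        "imbalance_ratio": majority / minority,
        # Accuracy of a trivial model that always predicts the majority class.
        "majority_baseline_accuracy": majority / total,
        # Assumption: balance to 1:1 by adding synthetic minority rows only.
        "synthetic_rows_needed": majority - minority,
    }


summary = imbalance_summary(NEGATIVE, POSITIVE)
for key, value in summary.items():
    if isinstance(value, float):
        print(f"{key}: {value:.4f}")
    else:
        print(f"{key}: {value}")
```

Under these assumptions, a model that always predicts "negative" would already score roughly 90% accuracy on the original data, so accuracy figures on the imbalanced set are best read alongside per-class metrics such as recall on the positive class.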