Unsupervised machine learning clustering approach for hospitalized COVID-19 pneumonia patients

Bibliographic Details
Main Authors: Nuttinan Nalinthasnai, Ratchainant Thammasudjarit, Tanapat Tassaneyasin, Dararat Eksombatchai, Somnuek Sungkanuparph, Viboon Boonsarngsuk, Yuda Sutherasan, Detajin Junhasavasdikul, Pongdhep Theerawit, Tananchai Petnak
Format: Article
Language: English
Published: BMC 2025-02-01
Series: BMC Pulmonary Medicine
Online Access: https://doi.org/10.1186/s12890-025-03536-w
Collection: DOAJ
Description:
Background: Identification of distinct clinical phenotypes of a disease can guide personalized treatment. This study aimed to classify subgroups of patients hospitalized with COVID-19 pneumonia using an unsupervised machine learning approach.
Methods: We included patients hospitalized with COVID-19 pneumonia from July to September 2021. K-means clustering, an unsupervised machine learning method, was used to identify clinical phenotypes from clinical and laboratory variables collected within 24 hours of admission. Variables were normalized before clustering so that each contributed equally to the analysis. The optimal number of clusters was determined using the elbow method and silhouette scores. Cox proportional hazards models were used to compare the risk of intubation and 90-day mortality across the identified clusters.
Results: Three clinically distinct clusters were identified among 538 patients hospitalized with COVID-19 pneumonia. Cluster 1 (N = 27) consisted predominantly of males and showed significantly elevated serum liver enzyme and lactate dehydrogenase (LDH) levels. Cluster 2 (N = 370) was characterized by lower chest X-ray scores and higher serum albumin levels. Cluster 3 (N = 141) was characterized by older age, diabetes mellitus, higher chest X-ray scores, more severely abnormal vital signs, higher creatinine levels, lower hemoglobin levels, lower lymphocyte counts, and higher C-reactive protein, D-dimer, and LDH levels. Compared with cluster 2, cluster 3 was associated with a significantly higher risk of 90-day mortality (hazard ratio [HR], 6.24; 95% CI, 2.42–16.09) and intubation (HR, 5.26; 95% CI, 2.37–11.72). In contrast, cluster 1 had a 100% survival rate and a non-significant increase in intubation risk compared with cluster 2 (HR, 1.40; 95% CI, 0.18–11.04).
Conclusions: We identified three distinct clinical phenotypes of patients with COVID-19 pneumonia, with cluster 3 associated with an increased risk of respiratory failure and mortality. These findings may guide tailored clinical management strategies.
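The Methods name three steps: normalize the admission variables, run K-means, and choose the number of clusters from the elbow of the inertia curve together with silhouette scores. The sketch below illustrates those steps in dependency-free Python under stated assumptions. The abstract does not specify the normalization (z-scoring is assumed here), the K-means initialization (random selection of k rows is assumed, not k-means++), or the distance (Euclidean is assumed). The helper names `standardize`, `kmeans`, and `silhouette` are hypothetical, and the toy data are invented, not study data. A real analysis would more typically use scikit-learn's `StandardScaler`, `KMeans`, and `silhouette_score`.

```python
import math
import random


def standardize(rows):
    """Z-score each column (population SD) so every variable contributes on one scale."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # A constant column has SD 0; dividing by 1.0 instead maps it to all zeros.
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm with random initial centroids; returns labels, centroids, inertia."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia (within-cluster sum of squares) is the y-axis of an elbow plot.
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def silhouette(points, labels):
    """Mean silhouette s = (b - a) / max(a, b); needs at least two non-empty clusters."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.sqrt(sq_dist(p, q)))
        own = by_cluster.get(labels[i], [])
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data (NOT study data): [age, CRP] rows for three loose groups.
    rng = random.Random(42)
    rows = ([[rng.gauss(45, 5), rng.gauss(10, 3)] for _ in range(30)]
            + [[rng.gauss(70, 5), rng.gauss(80, 10)] for _ in range(30)]
            + [[rng.gauss(40, 5), rng.gauss(150, 10)] for _ in range(30)])
    X = standardize(rows)
    for k in range(2, 7):
        labels, _, inertia = kmeans(X, k)
        print(f"k={k}  inertia={inertia:.1f}  silhouette={silhouette(X, labels):.3f}")
```

In the printout, the elbow is the k after which inertia stops falling steeply, while a higher mean silhouette indicates tighter, better-separated clusters; the two criteria can disagree, so they are usually read together. Because Lloyd's algorithm can stop at a local optimum, results from several seeds are commonly compared.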
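The Methods compare clusters with Cox proportional hazards models, and the Results report hazard ratios relative to cluster 2. The following is a minimal, unadjusted sketch of that kind of comparison, assuming dummy coding with cluster 2 as the reference, the Breslow approximation for tied event times, Newton-Raphson maximization of the partial likelihood, and Wald 95% confidence intervals, exp(β ± 1.96·SE). The abstract does not state the software, covariate adjustment, or tie handling, so none of these choices should be read as the authors' method. The function names are hypothetical and the cohort is invented. Maintained implementations include R's `survival::coxph` and the Python package lifelines (`CoxPHFitter`).

```python
import math


def score_and_information(times, events, X, beta):
    """Score vector and observed information of the Cox partial log-likelihood.

    Ties use the Breslow approximation: every event at time t shares the risk
    set {j : times[j] >= t}.
    """
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients contribute only through the risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(sum(b * x for b, x in zip(beta, X[j]))) for j in risk]
        s0 = sum(w)
        mean = [sum(wj * X[j][a] for wj, j in zip(w, risk)) / s0 for a in range(p)]
        for a in range(p):
            grad[a] += X[i][a] - mean[a]
            for c in range(p):
                s2 = sum(wj * X[j][a] * X[j][c] for wj, j in zip(w, risk)) / s0
                info[a][c] += s2 - mean[a] * mean[c]
    return grad, info


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def cox_fit(times, events, X, n_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns beta and (HR, 95% lower, 95% upper) per covariate."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad, info = score_and_information(times, events, X, beta)
        step = solve(info, grad)  # Newton step: I(beta)^-1 U(beta)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(times, events, X, beta)
    results = []
    for a, b in enumerate(beta):
        unit = [1.0 if c == a else 0.0 for c in range(len(beta))]
        se = math.sqrt(solve(info, unit)[a])  # Wald SE from the inverse information
        results.append((math.exp(b), math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)))
    return beta, results


if __name__ == "__main__":
    # Hypothetical toy cohort (NOT study data): days to event, 1 = event, 0 = censored.
    times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    events = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0]
    cluster = [3, 2, 3, 1, 1, 3, 2, 2, 1, 3, 2, 1]
    # Dummy-code clusters 1 and 3; cluster 2 is the reference (all-zero row).
    X = [[float(c == 1), float(c == 3)] for c in cluster]
    _, res = cox_fit(times, events, X)
    for name, (hr, lo, hi) in zip(["cluster 1", "cluster 3"], res):
        print(f"{name} vs cluster 2: HR {hr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because Wald intervals are symmetric on the log scale, the HR is the geometric mean of its CI limits; for example, √(2.42 × 16.09) ≈ 6.24 recovers the reported mortality HR for cluster 3. One caveat follows from the Results: cluster 1 had no deaths, so for 90-day mortality its coefficient has no finite maximum-likelihood estimate (the partial likelihood keeps increasing as that coefficient decreases), and Newton iterations like those above would not converge. That is consistent with the abstract reporting an intubation HR, but no mortality HR, for cluster 1.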
Institution: Kabale University
ISSN: 1471-2466
Affiliations:
Nuttinan Nalinthasnai: Division of Pulmonary and Pulmonary Critical Care Medicine, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University
Ratchainant Thammasudjarit: Department of Computer Science, Faculty of Science, Srinakharinwirot University
Tanapat Tassaneyasin: Division of Pulmonary and Pulmonary Critical Care Medicine, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University
Dararat Eksombatchai: Division of Pulmonary and Pulmonary Critical Care Medicine, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University
Somnuek Sungkanuparph: Faculty of Medicine Ramathibodi Hospital, Chakri Naruebodindra Medical Institute, Mahidol University
Viboon Boonsarngsuk: Division of Pulmonary and Pulmonary Critical Care Medicine, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University
Yuda Sutherasan: Division of Pulmonary and Pulmonary Critical Care Medicine, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University
Detajin Junhasavasdikul: Division of Pulmonary and Pulmonary Critical Care Medicine, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University
Pongdhep Theerawit: Division of Critical Care Medicine, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University
Tananchai Petnak: Division of Pulmonary and Pulmonary Critical Care Medicine, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University
Keywords: COVID-19; Pneumonia; Clustering analysis; Machine learning; Mortality