An efficient approach on risk factor prediction related to cardiovascular disease around Kumbakonam, Tamil Nadu, India, using unsupervised machine learning techniques

Abstract: People today suffer from a variety of diseases owing to environmental conditions and living habits. Cardiovascular diseases (CVDs), that is, heart-related diseases, are the leading cause of mortality among all diseases. In early days, the lack of technological advanceme...

Bibliographic Details
Main Authors: K Kannan, A Menaga
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Reports
Subjects:
Online Access: https://doi.org/10.1038/s41598-025-89403-4
collection DOAJ
description Abstract: People today suffer from a variety of diseases owing to environmental conditions and living habits. Cardiovascular diseases (CVDs), that is, heart-related diseases, are the leading cause of mortality among all diseases. In earlier days, the lack of technological advancement cost many lives: delayed diagnosis led to delayed treatment. Predicting disease in advance is therefore necessary so that the required treatment can be provided in time. This paper addresses risk factor prediction with unsupervised learning methods and identifies the predominant parameters behind the risk factors using principal component analysis (PCA). We collected patient data of size 130 × 12 from four laboratories in and around Kumbakonam, Tamil Nadu, India. Several clustering techniques, namely k-means clustering, partitioning around medoids (PAM), hierarchical clustering, and fuzzy clustering, were applied to the patient data, and the results show that the data separate into two clusters: “patients with risk” and “patients with no risk”. The optimal number of clusters was determined using the elbow and silhouette methods. The clustering was evaluated using the Hopkins statistic, Dunn’s index, and average silhouette widths. The computed agglomerative coefficients indicate a strong cluster structure in the dataset. Cluster stability was tested using bootstrap cluster analysis, which showed that the clusters are highly stable. Feature selection with PCA indicates that, of the 12 parameters, Total Cholesterol is the most highly correlated factor and plays an important role in identifying risk among patients.
institution OA Journals
issn 2045-2322
spelling K Kannan and A Menaga (SASTRA Deemed to be University). An efficient approach on risk factor prediction related to cardiovascular disease around Kumbakonam, Tamil Nadu, India, using unsupervised machine learning techniques. Scientific Reports, vol. 15, no. 1, pp. 1-18, 2025-02-01. Nature Portfolio. ISSN 2045-2322. https://doi.org/10.1038/s41598-025-89403-4
topic K-means clustering
PAM clustering
Hierarchical clustering
Fuzzy clustering
Principal component analysis
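
The abstract describes choosing the number of clusters with the silhouette method after k-means clustering. The following is a minimal sketch of that idea in Python, offered only as an illustration and not as the authors' code. It runs on synthetic two-dimensional points rather than the 130 × 12 patient dataset, and the function names (`kmeans`, `average_silhouette`), the random seed, and the candidate values of k are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, rng, iters=100):
    """Lloyd's algorithm on a list of tuples; returns one label per point."""
    centers = rng.sample(points, k)  # random initial centers (assumption)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def average_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    total = 0.0
    for i, p in enumerate(points):
        sums, counts = {}, {}
        for j, q in enumerate(points):
            if i == j:
                continue
            sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        others = [sums[c] / counts[c] for c in sums if c != own]
        if own not in counts or not others:
            continue  # singleton cluster or single cluster: s(i) = 0
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(others)              # mean distance to nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in data: two well-separated groups of 30 points each.
rng = random.Random(42)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)]

# Score each candidate k by its average silhouette width; larger is better.
scores = {}
for k in (2, 3, 4):
    scores[k] = average_silhouette(points, kmeans(points, k, rng))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On data with two clearly separated groups, the average silhouette width is expected to peak at k = 2, which mirrors the two-cluster outcome ("patients with risk" and "patients with no risk") reported in the abstract. A single random initialization is used here for brevity; repeated restarts would be a reasonable refinement.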