Nonparametric Bootstrap Likelihood Estimation to Investigate the Chance Set-Up on Clustering Results

Clustering algorithms are widely used in the knowledge discovery domain, but concerns and questions about the validity of the results must be considered. The datasets commonly used for clustering tasks are often large and scale-free, making conventional statistical techniques inadequate for analyzin...

Bibliographic Details
Main Authors:	Ammar Elnour, Wencheng Yang, Yan Li
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Open Journal of the Computer Society
Subjects:	Clustering; validity testing; randomness; bootstrap likelihood; significance testing; statistical machine learning
Online Access:	https://ieeexplore.ieee.org/document/10902121/

_version_	1849392894425694208
author	Ammar Elnour, Wencheng Yang, Yan Li
author_facet	Ammar Elnour, Wencheng Yang, Yan Li
author_sort	Ammar Elnour
collection	DOAJ
description	Clustering algorithms are widely used in the knowledge discovery domain, but concerns and questions about the validity of the results must be considered. The datasets commonly used for clustering tasks are often large and scale-free, making conventional statistical techniques inadequate for analyzing result uncertainty. This issue applies to most outcomes obtained from other knowledge discovery techniques, such as machine learning and statistical learning. Traditional statistical methods assume data follows standard distributions, whereas resampling and bootstrapping methods offer more accurate and reliable alternatives. This article introduces a method that employs bootstrap likelihood estimation to infer the uncertainty of generated clustering structures. We first calculated the clustering error in the original dataset and then utilized the proposed method to estimate its nonparametric bootstrapped likelihood. By comparing these two values, we can establish a nonparametric significance testing framework that directly determines the validity of the result. To evaluate the effectiveness of our method, we conducted experiments using synthetic and real datasets. The results demonstrate that our method can successfully validate clustering results.
format	Article
id	doaj-art-a3a63c2465b542b09cad7fcfaab7c71f
institution	Kabale University
issn	2644-1268
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Open Journal of the Computer Society
spelling	doaj-art-a3a63c2465b542b09cad7fcfaab7c71f | 2025-08-20T03:40:40Z | eng | IEEE | IEEE Open Journal of the Computer Society | 2644-1268 | 2025-01-01 | 6 | 438 | 448 | 10.1109/OJCS.2025.3545261 | 10902121 | Nonparametric Bootstrap Likelihood Estimation to Investigate the Chance Set-Up on Clustering Results | Ammar Elnour | 0 | https://orcid.org/0009-0004-0944-2558 | Wencheng Yang | 1 | https://orcid.org/0000-0001-7800-2215 | Yan Li | 2 | https://orcid.org/0000-0002-4694-4926 | School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia | School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia | School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia | https://ieeexplore.ieee.org/document/10902121/ | Clustering | validity testing | randomness | bootstrap likelihood | significance testing | statistical machine learning
title	Nonparametric Bootstrap Likelihood Estimation to Investigate the Chance Set-Up on Clustering Results
topic	Clustering; validity testing; randomness; bootstrap likelihood; significance testing; statistical machine learning
url	https://ieeexplore.ieee.org/document/10902121/
work_keys_str_mv	AT ammarelnour nonparametricbootstraplikelihoodestimationtoinvestigatethechancesetuponclusteringresults AT wenchengyang nonparametricbootstraplikelihoodestimationtoinvestigatethechancesetuponclusteringresults AT yanli nonparametricbootstraplikelihoodestimationtoinvestigatethechancesetuponclusteringresults
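
Illustrative sketch (not part of the catalog record): the abstract describes computing a clustering error on the original dataset and comparing it against a value estimated by nonparametric bootstrapping. This record does not give the authors' bootstrap likelihood procedure, so the Python below substitutes a plain bootstrap of a k-means error and should be read only as a generic example of the resampling idea. The function names, the use of k-means, the mean squared distance as the "clustering error", and the empirical proportion as the comparison are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_error(points, k, rng, iters=25):
    """Run a basic Lloyd's k-means and return the mean squared distance
    from each point to its nearest centroid (assumed error measure)."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, members in enumerate(groups):
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)


def bootstrap_errors(points, k, n_boot=200, seed=0):
    """Resample the data with replacement n_boot times and record the
    clustering error of each resample (plain nonparametric bootstrap)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_boot):
        resample = [rng.choice(points) for _ in points]
        errors.append(kmeans_error(resample, k, rng))
    return errors


def empirical_proportion(observed, samples):
    """Fraction of bootstrap errors at or below the observed error."""
    return sum(e <= observed for e in samples) / len(samples)


if __name__ == "__main__":
    gen = random.Random(1)
    # Synthetic data: two hypothetical, well-separated 2-D groups.
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(60)]
    data += [(gen.gauss(6, 1), gen.gauss(6, 1)) for _ in range(60)]

    observed = kmeans_error(data, 2, random.Random(0))
    boots = bootstrap_errors(data, 2, n_boot=100, seed=0)
    print("observed error:", observed)
    print("proportion of bootstrap errors <= observed:",
          empirical_proportion(observed, boots))
```

The sketch uses only the standard library so that it runs without extra dependencies. How the observed error and the bootstrap distribution are turned into a significance decision is left open here, because the record does not state the rule the article uses.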

Similar Items