Nonparametric Bootstrap Likelihood Estimation to Investigate the Chance Set-Up on Clustering Results

Bibliographic Details
Main Authors: Ammar Elnour, Wencheng Yang, Yan Li
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Journal of the Computer Society
Online Access: https://ieeexplore.ieee.org/document/10902121/
collection DOAJ
description Clustering algorithms are widely used in knowledge discovery, but the validity of their results is often open to question. The datasets used for clustering are frequently large and scale-free, which makes conventional statistical techniques inadequate for analyzing the uncertainty of the results. The same issue affects most outcomes of other knowledge discovery techniques, such as machine learning and statistical learning. Traditional statistical methods assume that the data follow standard distributions, whereas resampling and bootstrapping methods offer more accurate and reliable alternatives. This article introduces a method that uses bootstrap likelihood estimation to infer the uncertainty of generated clustering structures. We first calculate the clustering error on the original dataset and then use the proposed method to estimate its nonparametric bootstrapped likelihood. Comparing these two values yields a nonparametric significance testing framework that directly determines the validity of the result. To evaluate the method, we conducted experiments on synthetic and real datasets. The results demonstrate that the method can successfully validate clustering results.
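The comparison the description outlines (an observed clustering error set against a resampled reference) can be illustrated with a generic bootstrap significance test. The sketch below is not the article's algorithm, which this record does not spell out. It assumes a plain k-means error (within-cluster sum of squares), a chance model that resamples each coordinate independently with replacement, and hypothetical helper names (`clustering_error`, `null_resample`, `bootstrap_p_value`, `two_blobs`) chosen only for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(group) for col in zip(*group))


def clustering_error(points, k, rng, iters=25):
    """Within-cluster sum of squares after a plain Lloyd's k-means run."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [centroid(g) if g else centers[j] for j, g in enumerate(groups)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def null_resample(points, rng):
    """Assumed chance model: draw each coordinate independently, with
    replacement, from its observed values, which breaks joint structure."""
    columns = list(zip(*points))
    return [tuple(rng.choice(col) for col in columns) for _ in points]


def bootstrap_p_value(points, k, n_boot=99, seed=0):
    """Observed error and the share of chance replicates that match or beat it."""
    rng = random.Random(seed)
    observed = clustering_error(points, k, rng)
    null_errors = [clustering_error(null_resample(points, rng), k, rng)
                   for _ in range(n_boot)]
    hits = sum(err <= observed for err in null_errors)
    # The +1 terms count the observed value as one replicate.
    return observed, (hits + 1) / (n_boot + 1)


def two_blobs(n_per_blob=40, seed=1):
    """Hypothetical synthetic data: two Gaussian blobs at (0, 0) and (5, 5)."""
    rng = random.Random(seed)
    return [(rng.gauss(c, 0.5), rng.gauss(c, 0.5))
            for c in (0.0, 5.0) for _ in range(n_per_blob)]


if __name__ == "__main__":
    error, p = bootstrap_p_value(two_blobs(), k=2)
    print(f"observed error = {error:.2f}, bootstrap p-value = {p:.3f}")
```

A small p-value here means that few chance replicates cluster as tightly as the observed data, so the structure is unlikely to be a product of chance under the assumed model. The choice of chance model drives the conclusion, and a different resampling scheme could reasonably be substituted.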
id doaj-art-a3a63c2465b542b09cad7fcfaab7c71f
institution Kabale University
issn 2644-1268
spelling doaj-art-a3a63c2465b542b09cad7fcfaab7c71f
2025-08-20T03:40:40Z
eng
IEEE
IEEE Open Journal of the Computer Society
2644-1268
2025-01-01
6
438
448
10.1109/OJCS.2025.3545261
10902121
Ammar Elnour 0 https://orcid.org/0009-0004-0944-2558
Wencheng Yang 1 https://orcid.org/0000-0001-7800-2215
Yan Li 2 https://orcid.org/0000-0002-4694-4926
School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia
School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia
School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia
topic Clustering
validity testing
randomness
bootstrap likelihood
significance testing
statistical machine learning