Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk

This paper introduces novel nonparametric supervised learning techniques for classifying massive datasets, addressing key limitations of existing methods in the Big and Streaming Data framework. We propose an offline kernel-based classifier enhanced by Batch Principal Component Analysis (PCA) for dimens...
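As a rough illustration of the offline idea described above (batch PCA followed by a kernel-based classifier), here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the synthetic data, and the function names `pca_fit`, `pca_transform`, and `kernel_classify` are all assumptions made for this example.

```python
import numpy as np

def pca_fit(X, n_components):
    """Batch PCA via SVD of the centred data; returns (mean, components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project rows of X onto the retained principal directions."""
    return (X - mean) @ components.T

def kernel_classify(Z_train, y_train, Z_new, bandwidth):
    """Assign each new point to the class with the largest kernel-weighted vote."""
    classes = np.unique(y_train)
    # Squared distances between every new point and every training point.
    d2 = ((Z_new[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2 / bandwidth**2)  # Gaussian kernel weights (assumed kernel)
    votes = np.stack([w[:, y_train == c].sum(axis=1) for c in classes], axis=1)
    return classes[votes.argmax(axis=1)]

# Synthetic two-class data in 10 dimensions (illustrative only, not the paper's data).
rng = np.random.default_rng(0)
n, d = 200, 10
X0 = rng.normal(size=(n, d))
X1 = rng.normal(size=(n, d))
X1[:, 0] += 4.0  # the classes differ only along the first coordinate
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

idx = rng.permutation(2 * n)
train, test = idx[:300], idx[300:]

# Reduce to 2 dimensions before the kernel step, then classify the held-out points.
mean, comps = pca_fit(X[train], n_components=2)
Z_train = pca_transform(X[train], mean, comps)
Z_test = pca_transform(X[test], mean, comps)
pred = kernel_classify(Z_train, y[train], Z_test, bandwidth=1.0)
error_rate = np.mean(pred != y[test])
print(f"misclassification rate: {error_rate:.3f}")
```

Reducing the dimension first keeps the kernel weights informative, since distances in the full 10-dimensional space are dominated by the coordinates that carry no class signal.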

Bibliographic Details
Main Authors: Mohamed Chaouch, Omama M. Al-Hamed
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11091306/
_version_ 1850075402840571904
author Mohamed Chaouch
Omama M. Al-Hamed
author_facet Mohamed Chaouch
Omama M. Al-Hamed
author_sort Mohamed Chaouch
collection DOAJ
description This paper introduces novel nonparametric supervised learning techniques for classifying massive datasets, addressing key limitations of existing methods in the Big and Streaming Data framework. We propose an offline kernel-based classifier enhanced by Batch Principal Component Analysis (PCA) for dimensionality reduction to mitigate the “curse of dimensionality”. Additionally, an online classifier is developed for streaming data, combining online PCA with a kernel-based recursive classifier using a stochastic approximation algorithm. Application to fetal well-being monitoring demonstrates that the online classifier achieves a competitive median misclassification rate (11.92%), comparable to the offline classifier (11.54%) and Random Forest (11.31%), while requiring only 1/15th of the offline classifier’s computation time. Receiver Operating Characteristic (ROC) analysis shows superior Area Under the Curve (AUC) for the offline classifier but at a significant computational cost. A second study on a larger credit scoring database confirms these findings, showing that the online classifier achieves an F1-score of 96.40% and an accuracy of 93.08%, closely matching the performance of neural networks (96.46%, 93.22%) and boosting (96.51%, 93.31%). Notably, the online classifier accomplishes this with a CPU time of only 0.87 seconds per classification, over 600 times faster than neural networks, demonstrating its effectiveness for high-frequency, real-time financial decision-making.
format Article
id doaj-art-009712611eb940eba391f4a2b405182f
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-009712611eb940eba391f4a2b405182f
2025-08-20T02:46:19Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
13
131716
131732
10.1109/ACCESS.2025.3591883
11091306
Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk
Mohamed Chaouch
0
https://orcid.org/0000-0003-4962-8205
Omama M. Al-Hamed
1
Department of Mathematics and Statistics, College of Arts and Sciences, Statistics Program, Qatar University, Doha, Qatar
Department of Mathematics and Statistics, College of Arts and Sciences, Statistics Program, Qatar University, Doha, Qatar
This paper introduces novel nonparametric supervised learning techniques for classifying massive datasets, addressing key limitations of existing methods in the Big and Streaming Data framework. We propose an offline kernel-based classifier enhanced by Batch Principal Component Analysis (PCA) for dimensionality reduction to mitigate the “curse of dimensionality”. Additionally, an online classifier is developed for streaming data, combining online PCA with a kernel-based recursive classifier using a stochastic approximation algorithm. Application to fetal well-being monitoring demonstrates that the online classifier achieves a competitive median misclassification rate (11.92%), comparable to the offline classifier (11.54%) and Random Forest (11.31%), while requiring only 1/15th of the offline classifier’s computation time. Receiver Operating Characteristic (ROC) analysis shows superior Area Under the Curve (AUC) for the offline classifier but at a significant computational cost. A second study on a larger credit scoring database confirms these findings, showing that the online classifier achieves an F1-score of 96.40% and an accuracy of 93.08%, closely matching the performance of neural networks (96.46%, 93.22%) and boosting (96.51%, 93.31%). Notably, the online classifier accomplishes this with a CPU time of only 0.87 seconds per classification, over 600 times faster than neural networks, demonstrating its effectiveness for high-frequency, real-time financial decision-making.
https://ieeexplore.ieee.org/document/11091306/
Big data applications
classification algorithms
dimensionality reduction
kernel methods
machine learning
nonparametric statistics
spellingShingle Mohamed Chaouch
Omama M. Al-Hamed
Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk
IEEE Access
Big data applications
classification algorithms
dimensionality reduction
kernel methods
machine learning
nonparametric statistics
title Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk
title_full Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk
title_fullStr Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk
title_full_unstemmed Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk
title_short Scalable Nonparametric Supervised Learning for Streaming and Massive Data: Applications in Healthcare Monitoring and Credit Risk
title_sort scalable nonparametric supervised learning for streaming and massive data applications in healthcare monitoring and credit risk
topic Big data applications
classification algorithms
dimensionality reduction
kernel methods
machine learning
nonparametric statistics
url https://ieeexplore.ieee.org/document/11091306/
work_keys_str_mv AT mohamedchaouch scalablenonparametricsupervisedlearningforstreamingandmassivedataapplicationsinhealthcaremonitoringandcreditrisk
AT omamamalhamed scalablenonparametricsupervisedlearningforstreamingandmassivedataapplicationsinhealthcaremonitoringandcreditrisk
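
The description field above mentions a kernel-based recursive classifier for streaming data. The sketch below shows one way a recursive kernel estimate can be updated one observation at a time without storing the stream. It is an illustration under stated assumptions, not the paper's algorithm: the class name `RecursiveKernelClassifier`, the Gaussian kernel, the bandwidth schedule `h0 * n**(-decay)`, the fixed query points, and the synthetic stream are all hypothetical, and the online PCA step is omitted for brevity.

```python
import numpy as np

class RecursiveKernelClassifier:
    """Running kernel-weighted class scores at a fixed set of query points.

    Each arriving pair (x, y) adds K((q - x) / h_n) / h_n**d to the score of
    class y at every query point q, so past observations are never revisited.
    """

    def __init__(self, queries, n_classes, h0=1.0, decay=0.2):
        self.queries = np.asarray(queries, dtype=float)
        self.scores = np.zeros((len(self.queries), n_classes))
        self.h0, self.decay, self.n = h0, decay, 0

    def update(self, x, y):
        self.n += 1
        h = self.h0 * self.n ** (-self.decay)  # bandwidth shrinks as n grows (assumed schedule)
        d = self.queries.shape[1]
        d2 = ((self.queries - x) ** 2).sum(axis=1)
        self.scores[:, y] += np.exp(-0.5 * d2 / h**2) / h**d

    def predict(self):
        """Return, for each query point, the class with the largest running score."""
        return self.scores.argmax(axis=1)

# Synthetic two-class stream in 2 dimensions (illustrative only).
rng = np.random.default_rng(1)
centres = np.array([[-2.0, 0.0], [2.0, 0.0]])
queries = np.array([[-2.0, 0.0], [2.0, 0.0], [-1.5, 0.5], [1.5, -0.5]])

clf = RecursiveKernelClassifier(queries, n_classes=2)
for _ in range(2000):
    y = int(rng.integers(2))
    x = centres[y] + rng.normal(size=2)
    clf.update(x, y)  # constant work per observation: one pass over the query points

labels = clf.predict()
print(labels)
```

Each update costs time proportional to the number of query points rather than to the number of observations seen so far, which is the property that makes a recursive estimator attractive for streaming data.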