Online Asynchronous Learning over Streaming Nominal Data
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Big Data and Cognitive Computing |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2504-2289/9/7/177 |
| Summary: | Online learning has become increasingly prevalent in real-world applications, where data streams often comprise heterogeneous feature types (both nominal and numerical) and labels may not arrive synchronously with features. However, most existing online learning methods assume homogeneous data types and synchronous arrival of features and labels. In practice, data streams are typically heterogeneous and exhibit asynchronous label feedback, so these methods fall short. To address these challenges, we propose a novel algorithm, *Online Asynchronous Learning over Streaming Nominal Data* (OALN), which maps heterogeneous data into a continuous latent space and uses a model pool together with a hint mechanism to manage asynchronous labels. Specifically, OALN rests on three core principles: (1) It uses a Gaussian mixture copula in the latent space to preserve class structure and numerical relationships, addressing the encoding and relational-learning challenges posed by mixed feature types. (2) It performs adaptive imputation through conditional covariance matrices to handle randomly missing values and feature drift, while incrementally updating the copula parameters to track changes in the feature space. (3) It incorporates a model pool and a hint mechanism to process asynchronous label feedback efficiently. Evaluated on twelve real-world datasets under missing rates of 10% and 50%, OALN achieves average cumulative error rates of 23.31% and 28.28% and average AUC scores of 0.7895 and 0.7433, respectively, the best results among the compared algorithms. Both theoretical analyses and extensive empirical studies confirm the effectiveness of the proposed method. |
| ISSN: | 2504-2289 |
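The summary's second principle, imputing missing entries through conditional covariance matrices in a latent Gaussian space, rests on the standard Gaussian conditional-mean formula E[z_m | z_o] = mu_m + Sigma_mo Sigma_oo^-1 (z_o - mu_o). The Python sketch below is a simplified illustration, not the OALN method. It swaps the paper's Gaussian mixture copula for a single Gaussian copula over numeric columns only. It also uses a rank-based empirical CDF whose offset is an arbitrary choice, and it assumes the latent mean `mu` and covariance `sigma` are already estimated (the incremental parameter update described in the summary is omitted). All function names (`latent_score`, `from_latent`, `solve`, `impute_conditional_mean`) are hypothetical.

```python
# Hypothetical sketch: a single Gaussian copula with conditional-mean imputation.
# This is NOT the paper's Gaussian mixture copula; mu and sigma are assumed given.
from statistics import NormalDist

_PHI = NormalDist()  # standard normal N(0, 1)


def latent_score(history, value):
    """Gaussian-copula marginal transform z = Phi^-1(F_hat(value)).

    F_hat is a rank-based empirical CDF of past values, offset (an assumed
    choice) so its argument stays strictly inside (0, 1) for unseen extremes.
    """
    count = sum(1 for h in history if h <= value)
    return _PHI.inv_cdf((count + 0.5) / (len(history) + 1))


def from_latent(history, z):
    """Map a latent score back to data space via an empirical quantile."""
    ordered = sorted(history)
    idx = min(int(_PHI.cdf(z) * len(ordered)), len(ordered) - 1)
    return ordered[idx]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def impute_conditional_mean(z, mu, sigma):
    """Fill None entries of latent vector z with E[z_m | z_o] under N(mu, sigma).

    E[z_m | z_o] = mu_m + Sigma_mo Sigma_oo^-1 (z_o - mu_o)
    """
    obs = [i for i, v in enumerate(z) if v is not None]
    mis = [i for i, v in enumerate(z) if v is None]
    filled = list(z)
    if not mis:
        return filled
    if not obs:  # nothing observed: fall back to the marginal means
        for j in mis:
            filled[j] = mu[j]
        return filled
    sigma_oo = [[sigma[i][k] for k in obs] for i in obs]
    w = solve(sigma_oo, [z[i] - mu[i] for i in obs])  # Sigma_oo^-1 (z_o - mu_o)
    for j in mis:
        filled[j] = mu[j] + sum(sigma[j][k] * wk for k, wk in zip(obs, w))
    return filled
```

For example, with `mu = [0, 0]`, `sigma = [[1, 0.8], [0.8, 1]]`, and `z = [1.0, None]`, `impute_conditional_mean` returns `[1.0, 0.8]`: the missing coordinate is pulled toward the observed one in proportion to their covariance. In a streaming setting, raw values would pass through `latent_score` before imputation and through `from_latent` afterward.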