Online Asynchronous Learning over Streaming Nominal Data

Bibliographic Details
Main Authors: Hongrui Li, Shengda Zhuo, Lin Li, Jiale Chen, Tianbo Wang, Jun Tang, Shaorui Liu, Shuqiang Huang
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Big Data and Cognitive Computing
Subjects:
Online Access: https://www.mdpi.com/2504-2289/9/7/177
Description
Summary: Online learning has become increasingly prevalent in real-world applications, where data streams often comprise heterogeneous feature types, both nominal and numerical, and labels may not arrive synchronously with features. However, most existing online learning methods assume homogeneous data types and the synchronous arrival of features and labels. In practice, data streams are typically heterogeneous and exhibit asynchronous label feedback, rendering these methods insufficient. To address these challenges, we propose a novel algorithm, termed Online Asynchronous Learning over Streaming Nominal Data (OALN), which maps heterogeneous data into a continuous latent space and leverages a model pool alongside a hint mechanism to effectively manage asynchronous labels. Specifically, OALN is grounded in three core principles: (1) It uses a Gaussian mixture copula in the latent space to preserve class structure and numerical relationships, addressing the encoding and relational-learning challenges posed by mixed feature types. (2) It performs adaptive imputation through conditional covariance matrices to handle random missing values and feature drift, incrementally updating the copula parameters to accommodate dynamic changes in the feature space. (3) It incorporates a model pool and a hint mechanism to efficiently process asynchronous label feedback. We evaluate OALN on twelve real-world datasets: the average cumulative error rates are 23.31% and 28.28% under missing rates of 10% and 50%, respectively, and the average AUC scores are 0.7895 and 0.7433, the best results among the compared algorithms. Both theoretical analyses and extensive empirical studies confirm the effectiveness of the proposed method.
ISSN:2504-2289
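
Principle (2) in the summary amounts to conditional-Gaussian imputation of missing coordinates in the latent space. The snippet below is a minimal sketch of that single step, assuming one Gaussian with a known mean and covariance rather than the paper's Gaussian mixture copula with incremental parameter updates; the function name impute_missing and the toy numbers are illustrative assumptions, not taken from the article.

```python
# Minimal sketch (not the authors' code): fill missing latent coordinates with
# the conditional mean under a single Gaussian, using the conditional
# covariance partition the abstract refers to.
import numpy as np

def impute_missing(z, mu, sigma):
    """Replace NaN entries of z with E[z_miss | z_obs] under N(mu, sigma)."""
    miss = np.isnan(z)
    obs = ~miss
    if not miss.any():
        return z.copy()
    # Partition the covariance matrix by missing / observed indices.
    s_mo = sigma[np.ix_(miss, obs)]   # Cov(z_miss, z_obs)
    s_oo = sigma[np.ix_(obs, obs)]    # Cov(z_obs, z_obs)
    # Conditional mean: mu_miss + S_mo S_oo^{-1} (z_obs - mu_obs)
    cond_mean = mu[miss] + s_mo @ np.linalg.solve(s_oo, z[obs] - mu[obs])
    out = z.copy()
    out[miss] = cond_mean
    return out

# Toy usage: 3-dimensional latent vector with the second coordinate missing.
mu = np.zeros(3)
sigma = np.array([[1.0, 0.6, 0.2],
                  [0.6, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])
z = np.array([0.8, np.nan, -0.3])
print(impute_missing(z, mu, sigma))
```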