Semi-Supervised Attribute Selection Algorithms for Partially Labeled Multiset-Valued Data

In machine learning, when the labeled portion of data needs to be processed, a semi-supervised learning algorithm is used. A dataset with missing attribute values or labels is referred to as an incomplete information system. Addressing incomplete information within a system poses a significant chall...

Full description

Saved in:
Bibliographic Details
Main Authors: Yuanzi He, Jiali He, Haotian Liu, Zhaowen Li
Format: Article
Language:English
Published: MDPI AG 2025-04-01
Series:Mathematics
Subjects:
Online Access:https://www.mdpi.com/2227-7390/13/8/1318
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:In machine learning, when the labeled portion of data needs to be processed, a semi-supervised learning algorithm is used. A dataset with missing attribute values or labels is referred to as an incomplete information system. Addressing incomplete information within a system poses a significant challenge, which can be effectively tackled through the application of rough set theory (<i>R</i>-theory). However, <i>R</i>-theory has its limits: It fails to consider the frequency of an attribute value and then cannot the distribution of attribute values appropriately. If we consider partially labeled data and replace a missing attribute value with the multiset of all possible attribute values under the same attribute, this results in the emergence of partially labeled multiset-valued data. In a semi-supervised learning algorithm, in order to save time and costs, a large number of redundant features need to be deleted. This study proposes semi-supervised attribute selection algorithms for partially labeled multiset-valued data. Initially, a partially labeled multiset-valued decision information system (p-MSVDIS) is partitioned into two distinct systems: a labeled multiset-valued decision information system (l-MSVDIS) and an unlabeled multiset-valued decision information system (u-MSVDIS). Subsequently, using the indistinguishable relation, distinguishable relation, and dependence function, two types of attribute subset importance in a p-MSVDIS are defined: the weighted sum of l-MSVDIS and u-MSVDIS determined by the missing rate of labels, which can be considered an uncertainty measurement (UM) of a p-MSVDIS. Next, two adaptive semi-supervised attribute selection algorithms for a p-MSVDIS are introduced, which leverage the degrees of importance, allowing for automatic adaptation to diverse missing rates. Finally, experiments and statistical analyses are conducted on 11 datasets. The outcome indicates that the proposed algorithms demonstrate advantages over certain algorithms.
ISSN:2227-7390