Unsupervised Attribute Reduction Algorithms for Multiset-Valued Data Based on Uncertainty Measurement

Bibliographic Details
Main Authors: Xiaoyan Guo, Yichun Peng, Yu Li, Hai Lin
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/13/11/1718
Description
Summary: Missing data introduce uncertainty in data mining, but existing set-valued approaches ignore frequency information. We propose unsupervised attribute reduction algorithms for multiset-valued data to address this gap. First, we define a multiset-valued information system (MSVIS) and establish a θ-tolerance relation to form the information granules. Then, θ-information entropy and θ-information amount are introduced as uncertainty measures (UMs). Finally, these two UMs are used to design two unsupervised attribute reduction algorithms in an MSVIS. The experimental results demonstrate the superiority of the proposed algorithms, which reduce attribute subsets by 50% on average while improving clustering accuracy and outlier detection performance. Parameter analysis further validates the robustness of the framework under varying missing rates.
ISSN: 2227-7390