Cost-sensitive regression learning on small dataset through intra-cluster product favoured feature selection

Bibliographic Details
Main Authors: Fangfang Xu, Huan Zhao, Weihua Zhou, Yun Zhou
Format: Article
Language: English
Published: Taylor & Francis Group 2022-12-01
Series: Connection Science
Online Access: http://dx.doi.org/10.1080/09540091.2021.1970719
Description
Summary: Many regression and forecasting tasks are cost-sensitive regression learning problems in which over-prediction and under-prediction incur asymmetric costs. However, classic methods such as clustering and feature selection struggle with small datasets. A key challenge is that traditional algorithms (e.g. the Boruta algorithm) cannot statistically validate the importance of features when too little data is available. By leveraging feature information within each cluster (a group of items with similar attributes), we propose an intra-cluster product favoured (ICPF) feature selection algorithm that builds on a traditional filtering method (specifically the Boruta algorithm in our study). The experimental results show that the ICPF algorithm significantly reduces the dimensionality of the selected feature set and improves the performance of cost-sensitive regression learning. The misprediction cost decreased by 33.5% (linear-linear cost function) and 32.4% (quadratic-quadratic cost function) after adopting the ICPF algorithm. In addition, the advantage of the ICPF algorithm is robust across other regression models, such as random forest and XGBoost.
ISSN: 0954-0091, 1360-0494
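
The summary refers to linear-linear and quadratic-quadratic cost functions with asymmetric costs for over-prediction and under-prediction. As a rough illustration of what such functions generally look like, here is a minimal Python sketch. The function names, the weights `c_over` and `c_under`, and the sample values are assumptions chosen for illustration; the record does not state the parameterisation used in the article.

```python
def linlin_cost(y_true, y_pred, c_over=1.0, c_under=2.0):
    """Mean linear-linear cost: error is charged linearly, with a
    different (assumed) unit price for over- and under-prediction."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = p - t  # positive means over-prediction
        total += c_over * err if err > 0 else c_under * (-err)
    return total / len(y_true)


def quadquad_cost(y_true, y_pred, c_over=1.0, c_under=2.0):
    """Mean quadratic-quadratic cost: squared error, weighted
    differently (assumed weights) on each side of the target."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = p - t
        weight = c_over if err > 0 else c_under
        total += weight * err * err
    return total / len(y_true)


# Hypothetical data: one over-prediction and one under-prediction of size 2.
y_true = [10.0, 10.0]
y_pred = [12.0, 8.0]
print(linlin_cost(y_true, y_pred))    # 3.0
print(quadquad_cost(y_true, y_pred))  # 6.0
```

With these assumed weights, under-predicting by two units costs twice as much as over-predicting by the same amount, which is the kind of asymmetry the summary describes. Setting `c_over` equal to `c_under` reduces the two functions to scaled mean absolute error and mean squared error.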