Influence-Balanced XGBoost: Improving XGBoost for Imbalanced Data Using Influence Functions

Bibliographic Details
Main Authors: Akiyoshi Sutou, Jinfang Wang
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Imbalanced data; XGBoost; influence-balanced loss; over-specialization
Online Access: https://ieeexplore.ieee.org/document/10807295/
collection DOAJ
description Decision tree boosting algorithms, such as XGBoost, have demonstrated superior predictive performance on tabular data for supervised learning compared to neural networks. However, recent studies on loss functions for imbalanced data have primarily focused on deep learning. The goal of this study is to improve the XGBoost algorithm for better performance on unbalanced data. To this end, Influence-balanced loss (IBL), originally introduced in deep learning, was applied to enhance the performance of the XGBoost algorithm. As a side effect, the proposed method was also found to perform well on datasets prone to over-specialization. Furthermore, we conducted a comparison between the proposed method and conventional techniques using 38 publicly available datasets. Our method outperforms other methods in terms of F1-score and Matthews correlation coefficient.
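The description reports results in terms of F1-score and Matthews correlation coefficient (MCC). As a reminder of how these two metrics are computed from a binary confusion matrix, and why they are preferred over accuracy on imbalanced data, here is a minimal sketch. The function names and the confusion-matrix counts are hypothetical illustrations, not taken from the article.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the ratio is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced test set: 10 positives, 990 negatives.
# Classifier A finds some positives; classifier B always predicts "negative".
a = dict(tp=6, tn=988, fp=2, fn=4)
b = dict(tp=0, tn=990, fp=0, fn=10)

for name, c in (("A", a), ("B", b)):
    print(
        name,
        f"accuracy={accuracy(**c):.3f}",
        f"F1={f1_score(c['tp'], c['fp'], c['fn']):.3f}",
        f"MCC={mcc(**c):.3f}",
    )
```

In this made-up example both classifiers reach at least 99% accuracy, but the always-negative classifier B scores zero on both F1 and MCC, which is why these metrics are commonly used when classes are imbalanced.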
id doaj-art-2e172cf48fb143c793df065951ffc790
issn 2169-3536
doi 10.1109/ACCESS.2024.3520159
volume 12
pages 193473-193486
author_detail Akiyoshi Sutou [0] https://orcid.org/0009-0006-9095-4060
Jinfang Wang [1]
affiliation [0] Graduate School of Data Science, Yokohama City University, Yokohama, Japan
[1] School of International Liberal Studies, Waseda University, Tokyo, Japan
topic Imbalanced data
XGBoost
influence-balanced loss
over-specialization