Influence-Balanced XGBoost: Improving XGBoost for Imbalanced Data Using Influence Functions
Decision tree boosting algorithms, such as XGBoost, have demonstrated superior predictive performance on tabular data for supervised learning compared to neural networks. However, recent studies on loss functions for imbalanced data have primarily focused on deep learning. The goal of this study is...
| Main Authors: | Akiyoshi Sutou, Jinfang Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Imbalanced data, XGBoost, influence-balanced loss, over-specialization |
| Online Access: | https://ieeexplore.ieee.org/document/10807295/ |
| _version_ | 1850104692574519296 |
|---|---|
| author | Akiyoshi Sutou; Jinfang Wang |
| author_facet | Akiyoshi Sutou; Jinfang Wang |
| author_sort | Akiyoshi Sutou |
| collection | DOAJ |
| description | Decision tree boosting algorithms, such as XGBoost, have demonstrated superior predictive performance on tabular data for supervised learning compared to neural networks. However, recent studies on loss functions for imbalanced data have primarily focused on deep learning. The goal of this study is to improve the XGBoost algorithm for better performance on unbalanced data. To this end, Influence-balanced loss (IBL), originally introduced in deep learning, was applied to enhance the performance of the XGBoost algorithm. As a side effect, the proposed method was also found to perform well on datasets prone to over-specialization. Furthermore, we conducted a comparison between the proposed method and conventional techniques using 38 publicly available datasets. Our method outperforms other methods in terms of F1-score and Matthews correlation coefficient. |
| format | Article |
| id | doaj-art-2e172cf48fb143c793df065951ffc790 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-2e172cf48fb143c793df065951ffc790; 2025-08-20T02:39:16Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 193473; 193486; 10.1109/ACCESS.2024.3520159; 10807295; Influence-Balanced XGBoost: Improving XGBoost for Imbalanced Data Using Influence Functions; Akiyoshi Sutou (0) https://orcid.org/0009-0006-9095-4060; Jinfang Wang (1); (0) Graduate School of Data Science, Yokohama City University, Yokohama, Japan; (1) School of International Liberal Studies, Waseda University, Tokyo, Japan; Decision tree boosting algorithms, such as XGBoost, have demonstrated superior predictive performance on tabular data for supervised learning compared to neural networks. However, recent studies on loss functions for imbalanced data have primarily focused on deep learning. The goal of this study is to improve the XGBoost algorithm for better performance on unbalanced data. To this end, Influence-balanced loss (IBL), originally introduced in deep learning, was applied to enhance the performance of the XGBoost algorithm. As a side effect, the proposed method was also found to perform well on datasets prone to over-specialization. Furthermore, we conducted a comparison between the proposed method and conventional techniques using 38 publicly available datasets. Our method outperforms other methods in terms of F1-score and Matthews correlation coefficient.; https://ieeexplore.ieee.org/document/10807295/; Imbalanced data; XGBoost; influence-balanced loss; over-specialization |
| title | Influence-Balanced XGBoost: Improving XGBoost for Imbalanced Data Using Influence Functions |
| topic | Imbalanced data; XGBoost; influence-balanced loss; over-specialization |
| url | https://ieeexplore.ieee.org/document/10807295/ |