A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud

Abstract Healthcare insurance fraud imposes a significant financial burden on healthcare systems worldwide, with annual losses reaching billions of dollars. This study aims to improve fraud detection accuracy using machine learning techniques. Our approach consists of three key stages: data preproce...

Bibliographic Details
Main Authors: Zeyu Wang, Xiaofang Chen, Yiwei Wu, Linke Jiang, Shiming Lin, Gang Qiu
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Subjects: Healthcare insurance fraud; Machine learning; Model ensemble; Model interpretability
Online Access: https://doi.org/10.1038/s41598-024-82062-x
author Zeyu Wang
Xiaofang Chen
Yiwei Wu
Linke Jiang
Shiming Lin
Gang Qiu
collection DOAJ
description Abstract Healthcare insurance fraud imposes a significant financial burden on healthcare systems worldwide, with annual losses reaching billions of dollars. This study aims to improve fraud detection accuracy using machine learning techniques. Our approach consists of three key stages: data preprocessing, model training and integration, and result analysis with feature interpretation. Initially, we examined the dataset’s characteristics and employed embedded and permutation methods to test the performance and runtime of single models under different feature sets, selecting the minimal number of features that could still achieve high performance. We then applied ensemble techniques, including Voting, Weighted, and Stacking methods, to combine different models and compare their performances. Feature interpretation was achieved through partial dependence plots (PDP), SHAP, and LIME, allowing us to understand each feature’s impact on the predictions. Finally, we benchmarked our approach against existing studies to evaluate its advantages and limitations. The findings demonstrate improved fraud detection accuracy and offer insights into the interpretability of machine learning models in this context.
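The abstract names Voting and Weighted ensembling among the techniques compared. As a minimal sketch of how such combination rules can work (not the authors' implementation), the Python below combines per-model outputs for a single claim. The function names, probabilities, weights, and the 0.5 threshold are all hypothetical values chosen for illustration; none come from the article.

```python
from collections import Counter
from typing import Sequence, Tuple


def hard_vote(labels: Sequence[int]) -> int:
    """Majority vote over per-model class labels.

    Ties are broken in favour of the smaller label (an arbitrary choice).
    """
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


def weighted_soft_vote(
    probs: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[float, int]:
    """Weighted average of per-model fraud probabilities, then a threshold."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(probs, weights)) / total
    return score, int(score >= threshold)


# Hypothetical fraud probabilities from three base models for one claim.
model_probs = [0.9, 0.4, 0.3]
model_labels = [int(p >= 0.5) for p in model_probs]  # one model says fraud
model_weights = [3.0, 1.0, 1.0]  # assumed: the first model is trusted most

vote_label = hard_vote(model_labels)
weighted_score, weighted_label = weighted_soft_vote(model_probs, model_weights)

print("hard vote label:", vote_label)
print("weighted score and label:", round(weighted_score, 2), weighted_label)
```

In this made-up case the two rules disagree: the majority vote follows the two models that predict no fraud, while the weighted average follows the heavily weighted first model. Stacking, the third technique named in the abstract, would instead train a second-level model on the base models' outputs and is not shown here.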
format Article
id doaj-art-e4de126d13d440208255d98e9792c29d
institution OA Journals
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-e4de126d13d440208255d98e9792c29d
2025-08-20T01:51:31Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-01-01
Volume 15, Issue 1, Pages 1-22
10.1038/s41598-024-82062-x
A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud
Zeyu Wang (School of Informatics, Xiamen University)
Xiaofang Chen (Xiang’an Hospital, Xiamen University)
Yiwei Wu (School of Informatics, Xiamen University)
Linke Jiang (School of Informatics, Xiamen University)
Shiming Lin (School of Informatics, Xiamen University)
Gang Qiu (School of Information Engineering, Changji University)
https://doi.org/10.1038/s41598-024-82062-x
Healthcare insurance fraud
Machine learning
Model ensemble
Model interpretability
title A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud
topic Healthcare insurance fraud
Machine learning
Model ensemble
Model interpretability
url https://doi.org/10.1038/s41598-024-82062-x