A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud
Abstract Healthcare insurance fraud imposes a significant financial burden on healthcare systems worldwide, with annual losses reaching billions of dollars. This study aims to improve fraud detection accuracy using machine learning techniques. Our approach consists of three key stages: data preproce...
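The abstract names Voting and Weighted ensembles among the combination methods. The sketch below illustrates the general idea only and is not the authors' implementation: the function names (`hard_vote`, `weighted_soft_vote`), the toy predictions, the weights, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote over label predictions; one inner sequence per model."""
    voted = []
    for labels in zip(*predictions):  # labels = every model's vote for one claim
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


def weighted_soft_vote(
    probabilities: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 0.5,  # assumed decision threshold
) -> List[int]:
    """Weighted average of per-model fraud probabilities, then thresholding."""
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    voted = []
    for probs in zip(*probabilities):  # probs = every model's score for one claim
        average = sum(w * p for w, p in zip(weights, probs)) / total
        voted.append(1 if average >= threshold else 0)
    return voted


# Hypothetical outputs of three models on four claims (1 = flagged as fraud).
labels = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
scores = [[0.9, 0.2, 0.6, 0.1], [0.6, 0.7, 0.3, 0.2], [0.2, 0.8, 0.7, 0.3]]

print(hard_vote(labels))                      # [1, 1, 1, 0]
print(weighted_soft_vote(scores, [3, 1, 1]))  # [1, 0, 1, 0]
```

With these toy numbers the second claim is flagged under an unweighted majority but not under the weighted average, because the first model carries three times the weight of the others. Stacking, the third method the abstract mentions, would instead train a second-level model on the base models' outputs and is not shown here.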
| Main Authors: | Zeyu Wang, Xiaofang Chen, Yiwei Wu, Linke Jiang, Shiming Lin, Gang Qiu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-01-01 |
| Series: | Scientific Reports |
| Subjects: | Healthcare insurance fraud; Machine learning; Model ensemble; Model interpretability |
| Online Access: | https://doi.org/10.1038/s41598-024-82062-x |
| _version_ | 1850273333630730240 |
|---|---|
| author | Zeyu Wang; Xiaofang Chen; Yiwei Wu; Linke Jiang; Shiming Lin; Gang Qiu |
| author_facet | Zeyu Wang; Xiaofang Chen; Yiwei Wu; Linke Jiang; Shiming Lin; Gang Qiu |
| author_sort | Zeyu Wang |
| collection | DOAJ |
| description | Abstract Healthcare insurance fraud imposes a significant financial burden on healthcare systems worldwide, with annual losses reaching billions of dollars. This study aims to improve fraud detection accuracy using machine learning techniques. Our approach consists of three key stages: data preprocessing, model training and integration, and result analysis with feature interpretation. Initially, we examined the dataset’s characteristics and employed embedded and permutation methods to test the performance and runtime of single models under different feature sets, selecting the minimal number of features that could still achieve high performance. We then applied ensemble techniques, including Voting, Weighted, and Stacking methods, to combine different models and compare their performances. Feature interpretation was achieved through partial dependence plots (PDP), SHAP, and LIME, allowing us to understand each feature’s impact on the predictions. Finally, we benchmarked our approach against existing studies to evaluate its advantages and limitations. The findings demonstrate improved fraud detection accuracy and offer insights into the interpretability of machine learning models in this context. |
| format | Article |
| id | doaj-art-e4de126d13d440208255d98e9792c29d |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-e4de126d13d440208255d98e9792c29d; 2025-08-20T01:51:31Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 151122; 10.1038/s41598-024-82062-x; A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud; Zeyu Wang 0; Xiaofang Chen 1; Yiwei Wu 2; Linke Jiang 3; Shiming Lin 4; Gang Qiu 5; School of Informatics, Xiamen University; Xiang’an Hospital, Xiamen University; School of Informatics, Xiamen University; School of Informatics, Xiamen University; School of Informatics, Xiamen University; School of Information Engineering, Changji University; Abstract Healthcare insurance fraud imposes a significant financial burden on healthcare systems worldwide, with annual losses reaching billions of dollars. This study aims to improve fraud detection accuracy using machine learning techniques. Our approach consists of three key stages: data preprocessing, model training and integration, and result analysis with feature interpretation. Initially, we examined the dataset’s characteristics and employed embedded and permutation methods to test the performance and runtime of single models under different feature sets, selecting the minimal number of features that could still achieve high performance. We then applied ensemble techniques, including Voting, Weighted, and Stacking methods, to combine different models and compare their performances. Feature interpretation was achieved through partial dependence plots (PDP), SHAP, and LIME, allowing us to understand each feature’s impact on the predictions. Finally, we benchmarked our approach against existing studies to evaluate its advantages and limitations. The findings demonstrate improved fraud detection accuracy and offer insights into the interpretability of machine learning models in this context.; https://doi.org/10.1038/s41598-024-82062-x; Healthcare insurance fraud; Machine learning; Model ensemble; Model interpretability |
| spellingShingle | Zeyu Wang; Xiaofang Chen; Yiwei Wu; Linke Jiang; Shiming Lin; Gang Qiu; A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud; Scientific Reports; Healthcare insurance fraud; Machine learning; Model ensemble; Model interpretability |
| title | A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud |
| title_full | A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud |
| title_fullStr | A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud |
| title_full_unstemmed | A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud |
| title_short | A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud |
| title_sort | robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud |
| topic | Healthcare insurance fraud; Machine learning; Model ensemble; Model interpretability |
| url | https://doi.org/10.1038/s41598-024-82062-x |