Enhancing Medicare Fraud Detection With a CNN-Transformer-XGBoost Framework and Explainable AI

Bibliographic Details
Main Authors: Mohammad Balayet Hossain Sakil, Md Amit Hasan, Md Shahin Alam Mozumder, Md Rokibul Hasan, Shafiul Ajam Opee, M. F. Mridha, Zeyar Aung
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10971341/
Description
Summary: Healthcare fraud is a critical challenge, contributing significantly to rising healthcare costs and financial losses. This article proposes a hybrid architecture for healthcare fraud detection that combines deep learning-based feature representation with gradient boosting classification and explainable AI techniques. The framework integrates convolutional neural networks (CNNs), transformers, and XGBoost to capture intricate patterns in claims data while maintaining interpretability through Shapley additive explanations. The proposed model was evaluated on two datasets: the Medicare Provider Fraud dataset and the Healthcare Providers dataset. On the Medicare dataset, the framework achieved an F1-score of 0.95 on the training set and 0.92 on the test set, with AUC-ROC values of 0.98 and 0.97, respectively, outperforming state-of-the-art models such as LightGBM and CatBoost. On the Healthcare Providers dataset, the framework attained a test F1-score of 0.92 and an AUC-ROC of 0.96, consistently surpassing traditional models such as Support Vector Machines and Random Forest. Key contributions include the integration of domain-specific features, such as provider-patient interaction graphs and temporal patterns, and the use of explainability techniques to enhance trustworthiness. Furthermore, the framework demonstrated computational efficiency, with a training time of 150 seconds on the primary dataset, making it suitable for real-world deployment.
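The summary reports results as F1-score and AUC-ROC. As a reading aid, the sketch below shows how these two metrics are conventionally computed for a binary fraud classifier. It is not taken from the article: the function names, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions, and the article's own evaluation code may differ.

```python
# Illustrative sketch only: names, data, and threshold are assumptions,
# not the article's evaluation code.

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_roc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half). Equivalent to the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = fraudulent provider, 0 = legitimate provider.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.6, 0.1, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("F1:", f1_score(y_true, y_pred))
print("AUC-ROC:", auc_roc(y_true, scores))
```

F1 depends on the chosen decision threshold, while AUC-ROC summarizes ranking quality across all thresholds, which is why the two are usually reported together on imbalanced fraud data.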
ISSN:2169-3536