An AutoEncoder enhanced light gradient boosting machine method for credit card fraud detection
Online financial transactions bring convenience to people’s lives, but also present vulnerabilities for criminals to embezzle users’ accounts and trick users into credit card fraud. Although machine learning methods have been adopted to detect anomalous transactions, it’s hard for a single machine l...
| Main Authors: | Lianhong Ding, Luqi Liu, Yangchuan Wang, Peng Shi, Jianye Yu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | PeerJ Inc., 2024-10-01 |
| Series: | PeerJ Computer Science |
| Subjects: | Anomaly detection; AutoEncoder; BCR; Credit card fraud; LightGBM; MCC |
| Online Access: | https://peerj.com/articles/cs-2323.pdf |
| _version_ | 1850181166828617728 |
|---|---|
| author | Lianhong Ding Luqi Liu Yangchuan Wang Peng Shi Jianye Yu |
| author_facet | Lianhong Ding Luqi Liu Yangchuan Wang Peng Shi Jianye Yu |
| author_sort | Lianhong Ding |
| collection | DOAJ |
| description | Online financial transactions bring convenience to people’s lives, but they also expose vulnerabilities that criminals exploit to take over users’ accounts and trick users into credit card fraud. Although machine learning methods have been adopted to detect anomalous transactions, it is hard for a single machine learning method to achieve satisfactory results as the scale and dimensionality of financial datasets increase. In addition, in anomaly detection on financial data there is a pronounced imbalance between normal and abnormal records. In this situation, experimental results cannot be evaluated objectively using only traditional metrics such as precision, recall, and accuracy. This paper proposes an AutoEncoder enhanced LightGBM method for credit card fraud detection. The method inherits the advantages of each component: it uses an AutoEncoder for feature reconstruction on the dataset and integrates the LightGBM algorithm, which improves on GBDT (Gradient Boosting Decision Tree), to detect abnormal data more accurately and efficiently. Besides the traditional evaluation metrics, F-measure, area under the curve (AUC), Matthews correlation coefficient (MCC), and balanced classification rate (BCR) are also adopted. Two financial datasets were used to validate the performance and robustness of the proposed model. Results on the credit card fraud dataset, which contains 31 features, indicate that our model significantly outperforms other models with a recall of 94.85%, a 10.70% improvement over the best-performing competing model, whose recall is only 86%. Our model’s BCR score of 97% is also significantly better than that of other models; the best-performing competing model scores 92%, a 5% improvement by our model. Various sampling methods and model combinations were considered in this study. The SMOTE algorithm combined with the proposed model produced the best results, with an AUC of 96.83% and an F-measure of 80.27%. The Santander bank transaction record dataset is a large, high-dimensional dataset containing 200 features. Experimental results on this dataset show that, compared with other models, our model significantly improved recall and F-measure, raising recall to 94.14% and the F-measure by 11.51% over the second-best-performing model. Overall, these findings demonstrate the robustness and superiority of our model in detecting fraudulent transactions and highlight the effectiveness of the SMOTE algorithm in combination with the proposed model. |
| format | Article |
| id | doaj-art-d6fe3f80f3a443abac94cd307663d7d5 |
| institution | OA Journals |
| issn | 2376-5992 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | PeerJ Inc. |
| record_format | Article |
| series | PeerJ Computer Science |
| spelling | doaj-art-d6fe3f80f3a443abac94cd307663d7d5; 2025-08-20T02:17:57Z; eng; PeerJ Inc.; PeerJ Computer Science; 2376-5992; 2024-10-01; 10; e2323; 10.7717/peerj-cs.2323; An AutoEncoder enhanced light gradient boosting machine method for credit card fraud detection; Lianhong Ding (0), Luqi Liu (1), Yangchuan Wang (2), Peng Shi (3), Jianye Yu (4); Beijing Wuzi University, Beijing, China; Beijing Wuzi University, Beijing, China; Beijing Wuzi University, Beijing, China; University of Science and Technology Beijing, Beijing, China; Beijing Wuzi University, Beijing, China; https://peerj.com/articles/cs-2323.pdf; Anomaly detection; AutoEncoder; BCR; Credit card fraud; LightGBM; MCC |
| spellingShingle | Lianhong Ding Luqi Liu Yangchuan Wang Peng Shi Jianye Yu An AutoEncoder enhanced light gradient boosting machine method for credit card fraud detection PeerJ Computer Science Anomaly detection AutoEncoder BCR Credit card fraud LightGBM MCC |
| title | An AutoEncoder enhanced light gradient boosting machine method for credit card fraud detection |
| title_full | An AutoEncoder enhanced light gradient boosting machine method for credit card fraud detection |
| title_fullStr | An AutoEncoder enhanced light gradient boosting machine method for credit card fraud detection |
| title_full_unstemmed | An AutoEncoder enhanced light gradient boosting machine method for credit card fraud detection |
| title_short | An AutoEncoder enhanced light gradient boosting machine method for credit card fraud detection |
| title_sort | autoencoder enhanced light gradient boosting machine method for credit card fraud detection |
| topic | Anomaly detection AutoEncoder BCR Credit card fraud LightGBM MCC |
| url | https://peerj.com/articles/cs-2323.pdf |
| work_keys_str_mv | AT lianhongding anautoencoderenhancedlightgradientboostingmachinemethodforcreditcardfrauddetection AT luqiliu anautoencoderenhancedlightgradientboostingmachinemethodforcreditcardfrauddetection AT yangchuanwang anautoencoderenhancedlightgradientboostingmachinemethodforcreditcardfrauddetection AT pengshi anautoencoderenhancedlightgradientboostingmachinemethodforcreditcardfrauddetection AT jianyeyu anautoencoderenhancedlightgradientboostingmachinemethodforcreditcardfrauddetection AT lianhongding autoencoderenhancedlightgradientboostingmachinemethodforcreditcardfrauddetection AT luqiliu autoencoderenhancedlightgradientboostingmachinemethodforcreditcardfrauddetection AT yangchuanwang autoencoderenhancedlightgradientboostingmachinemethodforcreditcardfrauddetection AT pengshi autoencoderenhancedlightgradientboostingmachinemethodforcreditcardfrauddetection AT jianyeyu autoencoderenhancedlightgradientboostingmachinemethodforcreditcardfrauddetection |
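The description in this record argues that accuracy, precision, and recall alone cannot fairly judge a detector on heavily imbalanced data, and names F-measure, BCR, and MCC as additional metrics. The sketch below is not taken from the article. It illustrates that argument with hypothetical confusion-matrix counts, and it assumes that BCR means the average of the true positive rate and the true negative rate, which the record itself does not define. The function name `imbalance_metrics` is also made up for this example.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Compute metrics from confusion-matrix counts (fraud = positive class).

    BCR is ASSUMED here to be (TPR + TNR) / 2; the record does not define it.
    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
    bcr = (recall + specificity) / 2
    # Matthews correlation coefficient: ranges over [-1, 1], 0 = no better than chance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "bcr": bcr,
        "mcc": mcc,
    }


# Hypothetical data: 10,000 transactions, of which 100 are fraudulent.
# A model that labels everything "normal" never catches fraud...
always_normal = imbalance_metrics(tp=0, fp=0, tn=9900, fn=100)
# ...while this hypothetical detector catches 90 of the 100 frauds.
detector = imbalance_metrics(tp=90, fp=50, tn=9850, fn=10)

for name, scores in (("always_normal", always_normal), ("detector", detector)):
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

With these made-up counts, the trivial model still reaches 99% accuracy, while its BCR is 0.5 and its MCC is 0, which is the kind of gap that motivates reporting the extra metrics.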