An AutoEncoder enhanced light gradient boosting machine method for credit card fraud detection

Bibliographic Details
Main Authors: Lianhong Ding, Luqi Liu, Yangchuan Wang, Peng Shi, Jianye Yu
Format: Article
Language: English
Published: PeerJ Inc., 2024-10-01
Series: PeerJ Computer Science
Subjects: Anomaly detection; AutoEncoder; BCR; Credit card fraud; LightGBM; MCC
Online Access: https://peerj.com/articles/cs-2323.pdf

Abstract

Online financial transactions bring convenience to people’s lives, but they also create vulnerabilities that criminals exploit to embezzle funds from users’ accounts and to trick users into credit card fraud. Although machine learning methods have been adopted to detect anomalous transactions, it is difficult for a single machine learning method to achieve satisfactory results as financial datasets grow in scale and dimensionality. In addition, financial anomaly detection data exhibit an obvious imbalance between normal and abnormal records. In this situation, the experimental results cannot be evaluated objectively with traditional metrics such as precision, recall, and accuracy alone. This paper proposes an AutoEncoder-enhanced LightGBM method for credit card fraud detection. The method inherits the advantages of each component: an AutoEncoder performs feature reconstruction on the dataset, and LightGBM, an improved implementation of the GBDT (Gradient Boosting Decision Tree), detects abnormal data more accurately and efficiently. Besides the traditional evaluation metrics, the F-measure, area under the curve (AUC), Matthews correlation coefficient (MCC), and balanced classification rate (BCR) are also adopted. Two financial datasets were used to validate the performance and robustness of the proposed model.

Results obtained on the credit card fraud dataset, which contains 31 features, indicate that our model significantly outperforms the other models, with a recall of 94.85%, a 10.70% improvement over the best-performing comparison model, whose recall was only 86%. Our model’s BCR score is also significantly better than those of the other models: 97%, compared with 92% for the best-performing comparison model, an improvement of 5 percentage points. Various sampling methods and model combinations were considered in this study.
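
The abstract’s argument that accuracy alone cannot objectively evaluate a fraud detector can be checked with the standard confusion-matrix definitions of the adopted metrics. The Python sketch below is illustrative only and is not the authors’ evaluation code. It assumes the common definitions BCR = (TPR + TNR) / 2 and MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), reports undefined ratios as 0, and computes AUC as the probability that a randomly chosen fraud receives a higher score than a randomly chosen normal transaction. The function names `confusion_metrics` and `auc_score` and the toy counts are hypothetical.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute imbalance-aware metrics from confusion-matrix counts.

    Positive class = fraud. Ratios with a zero denominator are reported
    as 0.0 (a common convention, assumed here).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # true positive rate (TPR)
    specificity = ratio(tn, tn + fp)     # true negative rate (TNR)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f_measure = ratio(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient: uses all four counts at once.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    # Balanced classification rate: mean of the two per-class recalls.
    bcr = (recall + specificity) / 2
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "mcc": mcc, "bcr": bcr}


def auc_score(labels, scores) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data: 10,000 transactions, 20 frauds, and a classifier
    # that predicts "normal" for every transaction.
    trivial = confusion_metrics(tp=0, fp=0, tn=9980, fn=20)
    print(trivial["accuracy"], trivial["recall"], trivial["mcc"], trivial["bcr"])
    # prints 0.998 0.0 0.0 0.5
    print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The trivial classifier reaches 99.8% accuracy while its recall and MCC are 0 and its BCR is 0.5 (chance level), which illustrates why imbalance-aware metrics such as MCC and BCR are reported alongside accuracy.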

Of the sampling methods and model combinations considered, the SMOTE algorithm combined with the proposed model produced the best results, with an AUC of 96.83% and an F-measure of 80.27%. The Santander bank transaction record dataset is a large, high-dimensional dataset containing 200 features. Experimental results on this dataset show that our model significantly improved recall and F-measure compared with other models, raising recall to 94.14% and improving the F-measure score by 11.51% over the second-best-performing model. Overall, these findings demonstrate the robustness and superiority of our model in detecting fraudulent transactions and highlight the effectiveness of the SMOTE algorithm in combination with the proposed model.
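
SMOTE (Synthetic Minority Over-sampling Technique) balances a training set by creating new minority-class points on the line segments between existing minority samples and their nearest minority neighbours. The pure-Python sketch below illustrates only that interpolation step under assumed settings (Euclidean distance, base samples drawn at random, k = 5 by default, a fixed seed); it is not the configuration used in the paper, and in practice a maintained implementation such as the `SMOTE` class in imbalanced-learn would normally be used. The function name `smote` and the sample data are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points from a list of minority-class feature vectors.

    Each synthetic point lies between a randomly chosen minority sample x and
    one of its k nearest minority neighbours nn:
        x_new = x + lam * (nn - x),  lam ~ Uniform[0, 1)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist

    def neighbours(i):
        # Indices of the k nearest *other* minority samples (Euclidean distance).
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(minority[i], minority[j]))
        return others[:k]

    neighbour_lists = [neighbours(i) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))            # random base sample
        x = minority[i]
        nn = minority[rng.choice(neighbour_lists[i])]
        lam = rng.random()                           # interpolation factor
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D fraud samples; three synthetic points are generated.
    frauds = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    new_points = smote(frauds, n_synthetic=3, k=2)
    print(len(new_points))  # prints 3
```

As a general practice (not a detail stated in the abstract), only the training split is oversampled, so that synthetic points do not leak into the data used for evaluation.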

Volume: 10
Article Number: e2323
DOI: 10.7717/peerj-cs.2323
ISSN: 2376-5992
Affiliations: Lianhong Ding, Luqi Liu, Yangchuan Wang, and Jianye Yu, Beijing Wuzi University, Beijing, China; Peng Shi, University of Science and Technology Beijing, Beijing, China