Statistical Data-Generative Machine Learning-Based Credit Card Fraud Detection Systems

This study addresses the challenges of data imbalance and missing values in credit card transaction datasets by employing mode-based imputation and various machine learning models. We analyzed two distinct datasets: one consisting of European cardholders and the other from American Express, applying...

Bibliographic Details
Main Authors: Xiaomei Feng, Song-Kyoo Kim
Format: Article
Language: English
Published: MDPI AG 2025-07-01
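Series: Mathematics
Subjects: credit card fraud; statistical data generation; machine learning; credit prediction; predictive modeling
Online Access: https://www.mdpi.com/2227-7390/13/15/2446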
collection DOAJ
description This study addresses the challenges of data imbalance and missing values in credit card transaction datasets by employing mode-based imputation and various machine learning models. We analyzed two distinct datasets: one consisting of European cardholders and the other from American Express, applying multiple machine learning algorithms, including Artificial Neural Networks, Convolutional Neural Networks, and Gradient Boosted Decision Trees, among others. Notably, the Gradient Boosted Decision Tree demonstrated superior predictive performance, with accuracy increasing by 4.53%, reaching 96.92% on the European cardholders dataset. Mode imputation significantly improved data quality, enabling stable and reliable analysis of merged datasets with up to 50% missing values. Hypothesis testing confirmed that the performance difference between the merged dataset and the original datasets was statistically significant. This study highlights the importance of robust data handling techniques in developing effective fraud detection systems, setting the stage for future research on combining different datasets and improving predictive accuracy in the financial sector.
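The mode-based imputation named in the description can be illustrated with a minimal sketch. This is not the authors' code: the function name `mode_impute`, the toy table, and the use of `None` as the missing-value marker are all assumptions made for illustration. Each missing entry is replaced by the most frequent observed value in its column.

```python
from collections import Counter

def mode_impute(rows, missing=None):
    """Fill missing entries in each column with that column's most frequent observed value.

    Hypothetical helper for illustration only; assumes every row has the same length
    and that missing entries are marked with the `missing` sentinel (None by default).
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not missing]
        # most_common(1) returns the first-seen value on ties;
        # a column with no observed values is left as missing.
        modes.append(Counter(observed).most_common(1)[0][0] if observed else missing)
    return [
        [modes[j] if row[j] is missing else row[j] for j in range(n_cols)]
        for row in rows
    ]

# Toy stand-in for a merged table whose sources do not share all columns,
# so that stacking them leaves gaps (None).
merged = [
    [1, "a", None],
    [1, None, None],
    [None, "b", 0],
    [0, "a", 0],
]
filled = mode_impute(merged)
```

Here `filled` has no missing entries: the gaps in the first, second, and third columns are replaced by `1`, `"a"`, and `0` respectively, the most frequent observed value in each column.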
format Article
id doaj-art-8915171605f147d7906c1237efa2bf42
institution Kabale University
issn 2227-7390
language English
publishDate 2025-07-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj-art-8915171605f147d7906c1237efa2bf42; 2025-08-20T03:36:34Z; eng; MDPI AG; Mathematics; ISSN 2227-7390; 2025-07-01; vol. 13, no. 15, art. 2446; doi:10.3390/math13152446; Statistical Data-Generative Machine Learning-Based Credit Card Fraud Detection Systems; Xiaomei Feng (0), Song-Kyoo Kim (1); (0) Faculty of Applied Sciences, Macao Polytechnic University, R. de Luis Gonzaga Gomes, Macao SAR, China; (1) Faculty of Applied Sciences, Macao Polytechnic University, R. de Luis Gonzaga Gomes, Macao SAR, China; https://www.mdpi.com/2227-7390/13/15/2446; credit card fraud; statistical data generation; machine learning; credit prediction; predictive modeling
title Statistical Data-Generative Machine Learning-Based Credit Card Fraud Detection Systems
topic credit card fraud
statistical data generation
machine learning
credit prediction
predictive modeling
url https://www.mdpi.com/2227-7390/13/15/2446