CLASS IMBALANCE PROBLEM IN ANTI-FRAUD PROBLEM: METRICS, SAMPLING AND CONVOLUTIONAL NEURAL NETWORKS

Bibliographic Details
Main Author: Ruslan Ch. Bobonazarov
Format: Article
Language:English
Published: Joint Stock Company "Experimental Scientific and Production Association SPELS" 2025-05-01
Series:Безопасность информационных технологий
Online Access:https://bit.spels.ru/index.php/bit/article/view/1778
Description
Summary: According to statistics from the Central Bank of the Russian Federation, the volume of transactions conducted without the consent of clients of financial institutions increased by 11.48% in 2023 compared with the previous year. Research aimed at improving existing methods of counteracting fraud (anti-fraud) and discovering new ones is therefore relevant. Counteracting fraud is greatly complicated by class imbalance: illegitimate transactions make up an extremely small share of the total flow of operations, only thousandths of a percent. In addition, banks cannot publicly share information about fraud and financial transactions, because it is confidential data protected under Federal Law No. 152 "On Personal Data." Researchers working with publicly available datasets evaluate models in different ways, and some of these approaches are not effective under severe class imbalance. This study proposes using the Precision-Recall Curve (PR curve) and the Precision-Recall Area Under the Curve (PR AUC) for model comparison, and demonstrates why these metrics, unlike most others, are relevant. The study also highlights common mistakes made in scientific research, such as resampling before the dataset is split, and shows that random dataset splitting that ignores timestamps can significantly distort the final results. The concluding part of the study proposes a fraud detection approach based on convolutional neural networks, which achieved a PR AUC of 0.91, significantly surpassing the results of all traditional approaches.
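The evaluation points raised in the summary can be illustrated with a small, standard-library Python sketch. It is not the article's code or data: the synthetic transactions, the scores, and the helper names (`time_ordered_split`, `oversample_minority`, `average_precision`, `roc_auc`) are all assumptions made for illustration. Average precision is used here as a step-wise approximation of PR AUC, and random duplication stands in for whatever resampling method a study might use.

```python
import random

def time_ordered_split(rows, train_fraction=0.8):
    """Train on the earliest transactions and test on the latest ones."""
    rows = sorted(rows, key=lambda r: r["timestamp"])
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def oversample_minority(train_rows, seed=0):
    """Duplicate fraud rows at random until classes balance (training set only)."""
    rng = random.Random(seed)
    fraud = [r for r in train_rows if r["label"] == 1]
    legit = [r for r in train_rows if r["label"] == 0]
    extra = [rng.choice(fraud) for _ in range(len(legit) - len(fraud))]
    return legit + fraud + extra

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled fraud case
    return total / sum(y_true)

def roc_auc(y_true, scores):
    """Probability that a random fraud case outscores a random legitimate one."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical stream: 10,000 transactions, one fraud case per 1,000.
rows = [{"timestamp": t, "label": int(t % 1000 == 500)} for t in range(10_000)]

# Split by time first, then resample the training part only, so that
# duplicated fraud rows can never leak into the test set.
train, test = time_ordered_split(rows)
balanced = oversample_minority(train)

# Hypothetical scores: every fraud case outranks about 99% of legitimate
# transactions, yet roughly 99 legitimate ones still score higher.
y_true = [0] * 9990 + [1] * 10
scores = [i / 9990 for i in range(9990)] + [0.99005] * 10

roc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"ROC AUC = {roc:.3f}, average precision = {ap:.3f}")
```

On this synthetic data the ROC AUC comes out close to 0.99 while average precision stays near 0.05: the handful of fraud cases is buried under about a hundred higher-scoring legitimate transactions, which ROC AUC barely registers and the precision-recall view exposes.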
ISSN:2074-7128
2074-7136