Multi-Relational Graph Representation Learning for Financial Statement Fraud Detection

Financial statement fraud refers to malicious manipulations of financial data in listed companies’ annual statements. Traditional machine learning approaches focus on individual companies, overlooking the interactive relationships among companies that are crucial for identifying fraud patterns.

Bibliographic Details
Main Authors:	Chenxu Wang, Mengqin Wang, Xiaoguang Wang, Luyue Zhang, Yi Long
Format:	Article
Language:	English
Published:	Tsinghua University Press 2024-09-01
Series:	Big Data Mining and Analytics
Subjects:	financial statement fraud; class imbalance; graph neural networks (GNN); multi-relational graphs
Online Access:	https://www.sciopen.com/article/10.26599/BDMA.2024.9020013

collection	DOAJ
description	Financial statement fraud refers to malicious manipulations of financial data in listed companies’ annual statements. Traditional machine learning approaches focus on individual companies, overlooking the interactive relationships among companies that are crucial for identifying fraud patterns. Moreover, fraud detection is a typical imbalanced binary classification task with normal samples outnumbering fraud ones. In this paper, we propose a multi-relational graph convolutional network, named FraudGCN, for detecting financial statement fraud. A multi-relational graph is constructed to integrate industrial, supply chain, and accounting-sharing relationships, effectively encapsulating the multidimensional and complex interactions among companies. We then develop a multi-relational graph convolutional network to aggregate information within each relationship and employ an attention mechanism to fuse information across multiple relationships. The attention mechanism enables the model to distinguish the importance of different relationships, thereby aggregating more useful information from key relationships. To alleviate the class imbalance problem, we present a diffusion-based under-sampling strategy that strategically selects key nodes globally for model training. We also employ focal loss to assign greater weights to harder-to-classify minority samples. We build a real-world dataset from the annual financial statement of listed companies in China. The experimental results show that FraudGCN achieves an improvement of 3.15% in Macro-recall, 3.36% in Macro-F1, and 3.86% in GMean compared to the second-best method. The dataset and codes are publicly available at: https://github.com/XNetLab/MRG-for-Finance.
id	doaj-art-4abe6e7fdae4473ca58e85035de22299
institution	Kabale University
issn	2096-0654
spelling	doaj-art-4abe6e7fdae4473ca58e85035de22299; 2025-02-02T06:29:08Z; eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654; 2024-09-01; vol. 7, no. 3, pp. 920-941; DOI 10.26599/BDMA.2024.9020013
	Chenxu Wang: School of Software Engineering, and also with MoE Key Lab of Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an 710049, China
	Mengqin Wang: School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
	Xiaoguang Wang: School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
	Luyue Zhang: School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
	Yi Long: Shenzhen Finance Institute, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518026, China
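
The description in this record states that the authors employ focal loss to assign greater weights to harder-to-classify minority samples. As a minimal sketch of what a binary focal loss computes, assuming the standard formulation FL = -alpha_t * (1 - p_t)^gamma * log(p_t), the pure-Python function below may help. The name `binary_focal_loss` and the default `gamma` and `alpha` values are illustrative assumptions; they are not taken from the paper or its repository.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.75, eps=1e-12):
    """Mean binary focal loss over a batch.

    probs  -- predicted probabilities of the positive (fraud) class
    labels -- ground-truth labels, 1 for fraud and 0 for normal
    gamma  -- focusing parameter; larger values down-weight easy samples more
    alpha  -- weight of the positive class (illustrative value, an assumption)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # Clamp to avoid log(0) on fully confident wrong predictions.
        p_t = min(max(p_t, eps), 1.0)
        # (1 - p_t)^gamma shrinks the loss of well-classified samples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # A confidently correct fraud prediction versus a badly missed one.
    print(binary_focal_loss([0.9], [1]))
    print(binary_focal_loss([0.1], [1]))
```

With `gamma=0` and `alpha=0.5` the function reduces to half of the ordinary cross-entropy, which is a convenient sanity check before tuning either parameter.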
