Attribution-based interpretable classification neural network with global and local perspectives

Bibliographic Details
Main Authors: Zihao Shi, Zuqiang Meng, Haiming Tuo, Chaohong Tan
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-06218-z
Description
Summary: Neural networks are challenging to apply in domains requiring high reliability due to their black-box nature, and researchers are increasingly focusing on interpreting neural networks. In pursuit of performance, most methods sacrifice interpretability by explaining the model only after training; such post-hoc explanations are typically local and provide limited detail. To obtain both strong interpretability and classification performance, we propose an attribution-based interpretable classification model for tabular data that maps the intermediate output to an interpretable data representation space and automatically selects the corresponding feature values for classification and interpretation. It assigns an importance value to each input feature of an instance, achieving local interpretability while also reflecting the global importance of input features. Furthermore, we propose different training methods. In searching for the best way to train the model, we discover a trade-off between classification performance and interpretability. Experimental results on eight open-source datasets show that our method is comparable to competitive black-box neural networks in classification accuracy. On two metrics for attribution methods, Reverse Precision and Generality, our model outperforms two popular post-hoc interpretability methods.
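To illustrate the local-versus-global distinction the abstract draws, the sketch below aggregates hypothetical per-instance attribution scores into a global feature ranking by averaging their absolute values. The attribution matrix and the aggregation rule are illustrative assumptions, not the paper's actual method, which derives attributions from the model itself.

```python
import numpy as np

# Hypothetical per-instance attributions for 4 samples and 3 features.
# Each row is a local explanation: the importance of each input feature
# for one instance's prediction.
local_attr = np.array([
    [0.8, -0.1,  0.3],
    [0.6,  0.2, -0.4],
    [0.9, -0.3,  0.1],
    [0.7,  0.1,  0.2],
])

def global_importance(attributions: np.ndarray) -> np.ndarray:
    """Aggregate local attributions into a global score by
    averaging their absolute values over all instances."""
    return np.abs(attributions).mean(axis=0)

g = global_importance(local_attr)
ranking = np.argsort(g)[::-1]  # features ordered most to least important
print(g)        # per-feature global importance scores
print(ranking)
```

Averaging absolute values (rather than signed ones) keeps features important, even when their effect direction varies across instances, which is one common convention for turning local attributions into a global view.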
ISSN: 2045-2322