An Empirical Comparison of Interpretable Models to Post-Hoc Explanations
Recently, some effort went into explaining intransparent and black-box models, such as deep neural networks or random forests. So-called model-agnostic methods typically approximate the prediction of the intransparent black-box model with an interpretable model, without considering any specifics of...
Saved in:
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2023-05-01
|
| Series: | AI |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2673-2688/4/2/23 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Recently, some effort went into explaining intransparent and black-box models, such as deep neural networks or random forests. So-called model-agnostic methods typically approximate the prediction of the intransparent black-box model with an interpretable model, without considering any specifics of the black-box model itself. It is a valid question whether direct learning of interpretable white-box models should not be preferred over post-hoc approximations of intransparent and black-box models. In this paper, we report the results of an empirical study, which compares post-hoc explanations and interpretable models on several datasets for rule-based and feature-based interpretable models. The results seem to underline that often directly learned interpretable models approximate the black-box models at least as well as their post-hoc surrogates, even though the former do not have direct access to the black-box model. |
|---|---|
| ISSN: | 2673-2688 |