An Empirical Comparison of Interpretable Models to Post-Hoc Explanations


Bibliographic Details
Main Authors: Parisa Mahya, Johannes Fürnkranz
Format: Article
Language: English
Published: MDPI AG 2023-05-01
Series: AI
Online Access: https://www.mdpi.com/2673-2688/4/2/23
Description
Summary: Recently, considerable effort has gone into explaining intransparent, black-box models such as deep neural networks or random forests. So-called model-agnostic methods typically approximate the predictions of the intransparent black-box model with an interpretable surrogate model, without considering any specifics of the black-box model itself. This raises the valid question of whether directly learning interpretable white-box models should be preferred over post-hoc approximations of intransparent, black-box models. In this paper, we report the results of an empirical study that compares post-hoc explanations and interpretable models on several datasets, for both rule-based and feature-based interpretable models. The results suggest that directly learned interpretable models often approximate the black-box models at least as well as their post-hoc surrogates, even though the former do not have direct access to the black-box model.
ISSN: 2673-2688