Pre Hoc and Co Hoc Explainability: Frameworks for Integrating Interpretability into Machine Learning Training for Enhanced Transparency and Performance
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/13/7544 |
| Summary: | Post hoc explanations for black-box machine learning models have been criticized for potentially inaccurate surrogate models and computational burden at prediction time. We propose pre hoc and co hoc explainability frameworks that integrate interpretability directly into the training process through an inherently interpretable white-box model. Pre hoc uses the white-box model to regularize the black-box model, while co hoc jointly optimizes both models with a shared loss function. We extend these frameworks to generate instance-specific explanations using Jensen–Shannon divergence as a regularization term. Our two-phase approach first trains models for fidelity, then generates local explanations through neighborhood-based fine-tuning. Experiments on credit risk scoring and movie recommendation datasets demonstrate superior global and local fidelity compared to LIME, without compromising accuracy. The co hoc framework additionally enhances white-box model accuracy by up to 3%, making it valuable for regulated domains requiring interpretable models. Our approaches provide more faithful and consistent explanations at a lower computational cost than existing methods, offering a promising direction for making machine learning models more transparent and trustworthy while maintaining high prediction accuracy. |
| ISSN: | 2076-3417 |
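
The Summary describes co hoc training as jointly optimizing a black-box and a white-box model with a shared loss, and mentions Jensen-Shannon divergence as a regularization term. The record does not give the exact objective, so the following is only a minimal sketch under stated assumptions: a binary classifier, one example at a time, and a hypothetical weighting parameter `lam`. The function names (`js_divergence`, `co_hoc_loss`) are illustrative and are not taken from the article.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and at most ln 2 with natural logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def cross_entropy(y_true, p_pos, eps=1e-12):
    """Binary cross-entropy for one example with label y_true in {0, 1}."""
    return -(y_true * math.log(p_pos + eps) + (1 - y_true) * math.log(1 - p_pos + eps))

def co_hoc_loss(y_true, p_black, p_white, lam=1.0):
    """Assumed shared loss: both models' accuracy terms plus a JS fidelity term.

    p_black and p_white are each model's predicted probability of the positive
    class; lam is a hypothetical weight on the fidelity term.
    """
    fidelity = js_divergence([p_black, 1 - p_black], [p_white, 1 - p_white])
    return cross_entropy(y_true, p_black) + cross_entropy(y_true, p_white) + lam * fidelity
```

In this sketch the fidelity term is zero when the two models output the same distribution and grows as they disagree, so minimizing the sum pushes both models toward the labels and toward each other. A pre hoc variant, as the Summary describes it, would presumably hold the white-box predictions fixed and update only the black-box model against the same kind of penalty.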