Confidence evaluation for feature selection in expanded feature space based on density of states

Bibliographic Details
Main Authors: Koki Obinata, Yasuhiko Igarashi, Kenji Nagata, Keitaro Sodeyama, Masato Okada
Format: Article
Language: English
Published: AIP Publishing LLC 2025-03-01
Series: APL Machine Learning
Online Access: http://dx.doi.org/10.1063/5.0245626
Description
Summary: In materials informatics, feature selection and model selection are used to extract knowledge, and evaluating the confidence of the selection result is crucial for ensuring that the extracted knowledge is reliable. In this study, we propose a novel method to quantitatively evaluate the significance of low-dimensional models obtained from an expanded feature space, using the density of states (DoS) for the evaluation metrics. This method allows us to compare the performance of the selected model with that of other models and to evaluate its significance. We further propose a method for evaluating feature importance based on marginal posterior probabilities within the Bayesian model averaging (BMA) framework, which considers all models. We demonstrate the effectiveness of the proposed methods by applying them to a crystal structure dataset. The DoS analysis reveals a large number of models whose performance is comparable to that of the best model reported in previous work. This suggests that knowledge extracted from a single selected model can be unreliable and that considering other models is crucial. We also show that the BMA-based feature importance evaluation provides valuable insights into the importance of features, highlighting both primary features and functional forms. In addition, we demonstrate that the LASSO + L0 method commonly used in existing research degrades the search space for model selection. Our findings suggest that the proposed methods provide valuable tools for assessing the significance of low-dimensional models and extracting knowledge from large-scale data in materials informatics.
ISSN: 2770-9019
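
Illustration: The two ideas named in the summary, a density of states over candidate models and BMA-based marginal probabilities for features, can be sketched on toy data. The following is a minimal sketch and not the authors' implementation. It assumes ordinary least squares fits, the Bayesian information criterion (BIC) as the evaluation metric, a uniform prior over models, and exhaustive enumeration of small feature subsets. The synthetic data and the function names `enumerate_models` and `inclusion_probabilities` are hypothetical; the article may use a different metric or a different way of estimating the DoS.

```python
import itertools
import numpy as np


def enumerate_models(X, y, max_size):
    """Fit OLS on every feature subset of size 1..max_size and return (subsets, BIC values)."""
    n, p = X.shape
    subsets, bics = [], []
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept column plus the selected features.
            A = np.column_stack([np.ones(n), X[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            # Gaussian-likelihood BIC up to an additive constant (assumed metric).
            bics.append(n * np.log(rss / n) + (k + 1) * np.log(n))
            subsets.append(subset)
    return subsets, np.array(bics)


def inclusion_probabilities(subsets, bics, n_features):
    """Approximate BMA weights from BIC (uniform model prior) and sum them per feature."""
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    inclusion = np.zeros(n_features)
    for subset, w in zip(subsets, weights):
        inclusion[list(subset)] += w
    return weights, inclusion


# Hypothetical toy data: only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.3 * rng.normal(size=60)

subsets, bics = enumerate_models(X, y, max_size=3)

# Density of states: how many models fall into each range of the metric.
dos_counts, dos_edges = np.histogram(bics, bins=20)

# Models whose metric lies within a small margin of the best one.
n_near_best = int(np.sum(bics - bics.min() < 2.0))

weights, inclusion = inclusion_probabilities(subsets, bics, X.shape[1])
best_subset = subsets[int(np.argmin(bics))]
```

In this sketch, `dos_counts` shows how crowded the region near the best metric value is, `n_near_best` counts the competing models there, and `inclusion` gives a per-feature importance that accounts for all enumerated models rather than only the single best one. Exhaustive enumeration is only feasible for small feature sets; a large expanded feature space would need a sampling-based estimate instead.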