Confidence evaluation for feature selection in expanded feature space based on density of states
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | AIP Publishing LLC, 2025-03-01 |
| Series: | APL Machine Learning |
| Online Access: | http://dx.doi.org/10.1063/5.0245626 |
| Summary: | In materials informatics, feature selection and model selection are used in the knowledge extraction process, and evaluating the confidence of the selection result is crucial for ensuring the reliability of the extracted knowledge. In this study, we propose a novel method to quantitatively evaluate the significance of low-dimensional models obtained from an expanded feature space using the density of states (DoS) for the evaluation metrics. This method allows us to compare the performance of the selected model with that of other models and evaluate its significance. We further propose an evaluation method for feature importance based on marginal posterior probabilities using the Bayesian model averaging (BMA) framework, which considers all models. We demonstrate the effectiveness of our proposed methods by applying them to a crystal structure dataset. The DoS analysis reveals a large number of models whose performance is comparable to that of the best model reported in previous research. This suggests that knowledge extracted from a single selected model can be unreliable and that considering other models is crucial. We also show that the BMA-based feature importance evaluation provides valuable insights into the importance of features, highlighting both primary features and functional forms. In addition, we demonstrate that the LASSO + L0 method commonly used in existing research degrades the search space for model selection. Our findings suggest that our proposed methods provide valuable tools for assessing the significance of low-dimensional models and extracting knowledge from large-scale data in materials informatics. |
|---|---|
| ISSN: | 2770-9019 |
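
As a rough illustration of the two ideas in the summary, the sketch below enumerates every low-dimensional model over a small synthetic feature set, histograms their scores to obtain a density of states, and computes BMA-style marginal inclusion probabilities for each feature. It is not the authors' implementation: the synthetic data, the ordinary least squares fit, the BIC score, the uniform model prior, and all function names (`enumerate_models`, `density_of_states`, `marginal_inclusion`) are assumptions made for this example.

```python
import itertools
import numpy as np


def enumerate_models(X, y, max_size=2):
    """Fit OLS on every feature subset up to max_size; return subsets and BIC scores."""
    n, p = X.shape
    subsets, bics = [], []
    for k in range(1, max_size + 1):
        for idx in itertools.combinations(range(p), k):
            A = np.column_stack([np.ones(n), X[:, list(idx)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            # BIC for a Gaussian linear model (assumed metric, lower is better).
            bics.append(n * np.log(rss / n) + (k + 1) * np.log(n))
            subsets.append(idx)
    return subsets, np.array(bics)


def density_of_states(scores, bins=10):
    """Histogram of model scores: how many models fall at each performance level."""
    return np.histogram(scores, bins=bins)


def marginal_inclusion(subsets, bics, p):
    """Posterior probability that each feature is in the model, using
    exp(-BIC/2) as an approximate model evidence and a uniform model prior."""
    w = np.exp(-0.5 * (bics - bics.min()))
    w /= w.sum()
    probs = np.zeros(p)
    for idx, wi in zip(subsets, w):
        probs[list(idx)] += wi
    return probs


# Synthetic data (hypothetical): only features 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 0.1 * rng.normal(size=200)

subsets, bics = enumerate_models(X, y, max_size=2)
counts, edges = density_of_states(bics)
probs = marginal_inclusion(subsets, bics, p=X.shape[1])
best = subsets[int(np.argmin(bics))]
```

Here `counts` shows how crowded the region near the best score is, which is the question the DoS analysis asks of a selected model, while `probs` ranks features using all enumerated models instead of only `best`.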