KAS: A Kernel Alignment-Based Sampling Probability Approach for Medical Data Valuation
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11072492/ |
| Summary: | Data valuation, critical for quantifying individual data points’ contributions to model performance, faces challenges in noisy, high-dimensional medical datasets due to inherent feature correlations and measurement uncertainties. Existing Shapley value-based methods struggle with an accuracy-efficiency tradeoff and fail to address medical data characteristics like noise and missing values. This paper introduces Kernel-Aligned Sampling (KAS), a dual-mechanism framework built on: 1) a kernel-aligned sampling probability strategy that aligns valuation distributions with feature-space similarity to stabilize Shapley approximations and resolve the speed-accuracy dilemma; and 2) a prior-guided data refinement mechanism that leverages domain knowledge to implicitly decompose data into signal and noise components, so that noisy or incomplete data are less likely to be misattributed as low-value. Experiments on 10 datasets against 4 baselines show that KAS outperforms the baseline methods by 5.7%-8.8% in high-value data ablation tests and improves noisy data detection accuracy by an average of 4.3% over the second-best method. |
| ISSN: | 2169-3536 |