KAS: A Kernel Alignment-Based Sampling Probability Approach for Medical Data Valuation

Bibliographic Details
Main Authors:	Mingming Jiang, Qinghe Liu, Hongli Xu, Wenting Liao, Rilige Wu, Longji Li, Yawei Zhao, Kunlun He
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Data valuation; medical data analysis; Shapley value; kernel alignment; implicit decomposition
Online Access:	https://ieeexplore.ieee.org/document/11072492/

Description
Summary:	Data valuation, critical for quantifying individual data points’ contributions to model performance, faces challenges in noisy, high-dimensional medical datasets due to inherent feature correlations and measurement uncertainties. Existing Shapley value-based methods struggle with an accuracy-efficiency tradeoff and fail to address medical data characteristics such as noise and missing values. This paper introduces Kernel-Aligned Sampling (KAS), a dual-mechanism framework built on: 1) a kernel-aligned sampling probability strategy that aligns valuation distributions with feature-space similarity to stabilize Shapley approximations and resolve the speed-accuracy dilemma; and 2) a prior-guided data refinement mechanism that leverages domain knowledge to implicitly decompose data into signal and noise components, so that noisy or incomplete data are not misattributed as low-value. Experiments on 10 datasets against 4 baselines show that KAS outperforms the baseline methods by 5.7%-8.8% in high-value data ablation tests and improves noisy data detection accuracy by an average of 4.3% over the second-best method.
ISSN:	2169-3536
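
The summary names two standard ingredients, Shapley value approximation by sampling and kernel alignment, without giving formulas. The sketch below illustrates each one separately on toy data and is not the authors' KAS algorithm: the record does not say how KAS turns alignment into sampling probabilities, so that step is left out. The helper names (`rbf_kernel`, `kernel_alignment`, `utility`, `mc_shapley`), the 1-nearest-neighbour utility, and the Gaussian toy data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def kernel_alignment(K1, K2):
    """Uncentered alignment <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    return float(np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2)))

def utility(idx, X, y, X_val, y_val):
    """Validation accuracy of a 1-NN classifier trained on rows idx (assumed utility)."""
    if len(idx) == 0:
        return 0.0  # convention: the empty training set has zero utility
    d2 = ((X_val[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2)
    pred = y[idx][np.argmin(d2, axis=1)]
    return float(np.mean(pred == y_val))

def mc_shapley(X, y, X_val, y_val, n_perm=200, seed=0):
    """Plain (uniform) permutation-sampling estimate of data Shapley values."""
    rng = np.random.default_rng(seed)
    n = len(X)
    values = np.zeros(n)
    for _ in range(n_perm):
        perm = rng.permutation(n)
        prev = 0.0  # utility of the empty prefix
        for k, i in enumerate(perm):
            cur = utility(perm[: k + 1], X, y, X_val, y_val)
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return values / n_perm

# Toy data: two Gaussian blobs, 12 training and 20 validation points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (6, 2)), rng.normal(2, 1, (6, 2))])
y = np.array([0] * 6 + [1] * 6)
X_val = np.vstack([rng.normal(-2, 1, (10, 2)), rng.normal(2, 1, (10, 2))])
y_val = np.array([0] * 10 + [1] * 10)

values = mc_shapley(X, y, X_val, y_val, n_perm=50)

# Alignment between the feature kernel and the ideal label kernel y y^T.
K = rbf_kernel(X, gamma=0.5)
y_pm = 2.0 * y - 1.0
A = kernel_alignment(K, np.outer(y_pm, y_pm))
print(values.round(3), round(A, 3))
```

Because each permutation's marginal contributions telescope, the estimated values sum to the utility of the full training set. A method such as KAS would replace the uniform permutation sampling with a non-uniform scheme; the specifics are in the full article.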

Similar Items