KAS: A Kernel Alignment-Based Sampling Probability Approach for Medical Data Valuation

Bibliographic Details
Main Authors: Mingming Jiang, Qinghe Liu, Hongli Xu, Wenting Liao, Rilige Wu, Longji Li, Yawei Zhao, Kunlun He
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11072492/
Description
Summary: Data valuation quantifies the contribution of individual data points to model performance, but it is difficult in noisy, high-dimensional medical datasets because of inherent feature correlations and measurement uncertainties. Existing Shapley value-based methods face an accuracy-efficiency tradeoff and do not account for characteristics of medical data such as noise and missing values. This paper introduces Kernel-Aligned Sampling (KAS), a dual-mechanism framework built on: 1) a kernel-aligned sampling probability strategy that aligns valuation distributions with feature-space similarity to stabilize Shapley approximations and resolve the speed-accuracy dilemma; and 2) a prior-guided data refinement mechanism that leverages domain knowledge to implicitly decompose data into signal and noise components, mitigating the misattribution of noisy or incomplete data as low-value. Experiments on 10 datasets against 4 baselines show that KAS outperforms the baseline methods by 5.7% to 8.8% in high-value data ablation tests and improves noisy data detection accuracy by an average of 4.3% over the second-best method.
ISSN: 2169-3536
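
Note: The Shapley approximations mentioned in the summary are commonly estimated by sampling permutations of the training set and averaging each point's marginal contribution to a utility score. The sketch below illustrates only that generic uniform-sampling baseline on toy data. It is not the KAS method: per the summary, KAS replaces uniform sampling with a kernel-aligned sampling probability, which is not reproduced here. The names `monte_carlo_shapley`, `utility`, `TRAIN`, and `VALID`, and the 1-nearest-neighbour utility, are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy data: (feature, label). The last training point is
# deliberately given a label that disagrees with its neighbours.
TRAIN: List[Tuple[float, int]] = [(0.0, 0), (0.2, 0), (1.0, 1), (1.2, 1), (0.12, 1)]
VALID: List[Tuple[float, int]] = [(0.05, 0), (0.15, 0), (1.1, 1), (0.9, 1)]


def utility(subset: Sequence[int]) -> float:
    """Validation accuracy of a 1-nearest-neighbour rule fit on TRAIN[subset].

    An empty subset is assigned a utility of 0.0 (an illustrative choice).
    """
    if not subset:
        return 0.0
    correct = 0
    for x, y in VALID:
        # Pick the training point in the subset closest to the validation point.
        nearest = min(subset, key=lambda i: abs(TRAIN[i][0] - x))
        correct += int(TRAIN[nearest][1] == y)
    return correct / len(VALID)


def monte_carlo_shapley(
    n_points: int,
    utility_fn: Callable[[Sequence[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate per-point Shapley values by uniform permutation sampling."""
    rng = random.Random(seed)
    totals = [0.0] * n_points
    order = list(range(n_points))
    for _ in range(n_permutations):
        rng.shuffle(order)  # uniform random permutation of the training indices
        subset: List[int] = []
        previous = utility_fn(subset)
        for i in order:
            subset.append(i)
            current = utility_fn(subset)
            totals[i] += current - previous  # marginal contribution of point i
            previous = current
    return [t / n_permutations for t in totals]


if __name__ == "__main__":
    values = monte_carlo_shapley(len(TRAIN), utility)
    for (x, y), v in zip(TRAIN, values):
        print(f"x={x:.2f} label={y} estimated value={v:+.3f}")
```

Because each sampled permutation telescopes from the empty set to the full set, the estimates always sum to the utility of the full training set minus the utility of the empty set, whatever the number of permutations. The variance of the individual estimates is what a non-uniform sampling scheme such as the one described in the summary would aim to reduce.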