KAS: A Kernel Alignment-Based Sampling Probability Approach for Medical Data Valuation

Data valuation, critical for quantifying individual data points’ contributions to model performance, faces challenges in noisy, high-dimensional medical datasets due to inherent feature correlations and measurement uncertainties. Existing Shapley value-based methods struggle with an accur...

Bibliographic Details
Main Authors: Mingming Jiang, Qinghe Liu, Hongli Xu, Wenting Liao, Rilige Wu, Longji Li, Yawei Zhao, Kunlun He
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Data valuation; medical data analysis; Shapley value; kernel alignment; implicit decomposition
Online Access: https://ieeexplore.ieee.org/document/11072492/
author Mingming Jiang
Qinghe Liu
Hongli Xu
Wenting Liao
Rilige Wu
Longji Li
Yawei Zhao
Kunlun He
collection DOAJ
description Data valuation, critical for quantifying individual data points’ contributions to model performance, faces challenges in noisy, high-dimensional medical datasets due to inherent feature correlations and measurement uncertainties. Existing Shapley value-based methods struggle with an accuracy-efficiency tradeoff and fail to address medical data characteristics such as noise and missing values. This paper introduces Kernel-Aligned Sampling (KAS), a dual-mechanism framework built on two components: 1) a kernel-aligned sampling probability strategy that aligns valuation distributions with feature-space similarity to stabilize Shapley approximations and resolve the speed-accuracy dilemma; 2) a prior-guided data refinement mechanism that leverages domain knowledge to implicitly decompose data into signal and noise components, mitigating the misattribution of noisy or incomplete data as low-value. Experiments on 10 datasets against 4 baselines demonstrate that KAS outperforms the baseline methods by 5.7%-8.8% in high-value data ablation tests and improves noisy data detection accuracy by an average of 4.3% over the second-best method.
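The abstract names two building blocks without giving formulas: Shapley approximation by sampling, and kernel alignment over feature-space similarity. The sketch below illustrates generic versions of both under stated assumptions; it is not the KAS algorithm. It shows plain uniform permutation-sampling Shapley estimation (the kind of baseline estimator the abstract says is hard to make both fast and accurate) and the standard uncentered kernel alignment score between an RBF feature kernel and a same-label kernel. The nearest-centroid utility, the toy data, and every function name are hypothetical choices made for illustration; how KAS turns alignment into sampling probabilities is not described in this record.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def kernel_alignment(K1, K2):
    """Uncentered alignment: <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    return float(np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2)))

def utility(idx, X_tr, y_tr, X_val, y_val):
    """Hypothetical utility: validation accuracy of a nearest-centroid
    classifier fit on the training subset idx (empty subset scores 0)."""
    if len(idx) == 0:
        return 0.0
    Xs, ys = X_tr[idx], y_tr[idx]
    classes = np.unique(ys)
    centroids = np.stack([Xs[ys == c].mean(axis=0) for c in classes])
    d = ((X_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[np.argmin(d, axis=1)]
    return float(np.mean(pred == y_val))

def mc_shapley(X_tr, y_tr, X_val, y_val, n_perm=30, seed=0):
    """Uniform permutation-sampling estimate of per-point Shapley values.
    This is the generic estimator, not the kernel-aligned variant."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    values = np.zeros(n)
    for _ in range(n_perm):
        perm = rng.permutation(n)
        prev = 0.0  # utility of the empty coalition
        for k, i in enumerate(perm):
            cur = utility(perm[: k + 1], X_tr, y_tr, X_val, y_val)
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return values / n_perm

# Toy data (an assumption): two Gaussian blobs, 10 training points per class.
rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(-1, 1, (10, 2)), rng.normal(1, 1, (10, 2))])
y_tr = np.array([0] * 10 + [1] * 10)
X_val = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(1, 1, (20, 2))])
y_val = np.array([0] * 20 + [1] * 20)

values = mc_shapley(X_tr, y_tr, X_val, y_val)
K_feat = rbf_kernel(X_tr, gamma=0.5)
K_label = (y_tr[:, None] == y_tr[None, :]).astype(float)
alignment = kernel_alignment(K_feat, K_label)
```

Because the marginal contributions along each permutation telescope, the estimated values sum to the utility of the full training set, which is a convenient sanity check on any sampling-based Shapley estimator.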
format Article
id doaj-art-637e232ee0d14e9789570a7c8f50efb2
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Record ID: doaj-art-637e232ee0d14e9789570a7c8f50efb2
Timestamp: 2025-08-20T03:31:30Z
Language: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Publication date: 2025-01-01
Volume: 13
Pages: 130110-130124
DOI: 10.1109/ACCESS.2025.3587136
Document number: 11072492
Title: KAS: A Kernel Alignment-Based Sampling Probability Approach for Medical Data Valuation
Authors and affiliations:
0. Mingming Jiang (https://orcid.org/0009-0003-8335-214X), Medical Innovation Research Department, PLA General Hospital, Haidian, Beijing, China
1. Qinghe Liu, Medical Innovation Research Department, PLA General Hospital, Haidian, Beijing, China
2. Hongli Xu, Medical Innovation Research Department, PLA General Hospital, Haidian, Beijing, China
3. Wenting Liao (https://orcid.org/0009-0006-0582-0405), Electrical Engineering and Information Science, University of Science and Technology of China, Hefei, China
4. Rilige Wu (https://orcid.org/0000-0002-5293-0782), Medical Innovation Research Department, PLA General Hospital, Haidian, Beijing, China
5. Longji Li (https://orcid.org/0009-0009-2232-3162), Department of Chemistry, Faculty of Mathematical and Physical Sciences, University College London, London, U.K.
6. Yawei Zhao (https://orcid.org/0000-0001-8352-0092), Medical Innovation Research Department, PLA General Hospital, Haidian, Beijing, China
7. Kunlun He (https://orcid.org/0000-0002-3335-5700), Medical Innovation Research Department, PLA General Hospital, Haidian, Beijing, China
URL: https://ieeexplore.ieee.org/document/11072492/
Keywords: Data valuation; medical data analysis; Shapley value; kernel alignment; implicit decomposition
title KAS: A Kernel Alignment-Based Sampling Probability Approach for Medical Data Valuation
topic Data valuation
medical data analysis
Shapley value
kernel alignment
implicit decomposition
url https://ieeexplore.ieee.org/document/11072492/