KAS: A Kernel Alignment-Based Sampling Probability Approach for Medical Data Valuation
| Main Authors: | Mingming Jiang, Qinghe Liu, Hongli Xu, Wenting Liao, Rilige Wu, Longji Li, Yawei Zhao, Kunlun He |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Data valuation; medical data analysis; Shapley value; kernel alignment; implicit decomposition |
| Online Access: | https://ieeexplore.ieee.org/document/11072492/ |

| Field | Value |
|---|---|
| author | Mingming Jiang, Qinghe Liu, Hongli Xu, Wenting Liao, Rilige Wu, Longji Li, Yawei Zhao, Kunlun He |
| collection | DOAJ |
| description | Data valuation, which quantifies how much individual data points contribute to model performance, is difficult for noisy, high-dimensional medical datasets because of inherent feature correlations and measurement uncertainty. Existing Shapley value-based methods face an accuracy-efficiency tradeoff and do not account for characteristics of medical data such as noise and missing values. This paper introduces Kernel-Aligned Sampling (KAS), a dual-mechanism framework comprising: 1) a kernel-aligned sampling probability strategy that aligns valuation distributions with feature-space similarity to stabilize Shapley approximations and resolve the accuracy-efficiency tradeoff; and 2) a prior-guided data refinement mechanism that uses domain knowledge to implicitly decompose data into signal and noise components, reducing the misattribution of noisy or incomplete data as low-value. Experiments on 10 datasets against 4 baselines show that KAS outperforms the baseline methods by 5.7%-8.8% in high-value data ablation tests and improves noisy-data detection accuracy by an average of 4.3% over the second-best method. |
| id | doaj-art-637e232ee0d14e9789570a7c8f50efb2 |
| institution | Kabale University |
| issn | 2169-3536 |
| doi | 10.1109/ACCESS.2025.3587136 |
| volume | 13 |
| pages | 130110-130124 |
| affiliations | Mingming Jiang (ORCID 0009-0003-8335-214X), Qinghe Liu, Hongli Xu, Rilige Wu (ORCID 0000-0002-5293-0782), Yawei Zhao (ORCID 0000-0001-8352-0092), Kunlun He (ORCID 0000-0002-3335-5700): Medical Innovation Research Department, PLA General Hospital, Haidian, Beijing, China; Wenting Liao (ORCID 0009-0006-0582-0405): Electrical Engineering and Information Science, University of Science and Technology of China, Hefei, China; Longji Li (ORCID 0009-0009-2232-3162): Department of Chemistry, Faculty of Mathematical and Physical Sciences, University College London, London, U.K. |
| topic | Data valuation medical data analysis Shapley value kernel alignment implicit decomposition |
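
The abstract refers to Shapley-value approximation and kernel alignment but does not show either computation. As generic background only, the Python sketch below implements the standard Monte Carlo permutation estimator of data Shapley values and the empirical kernel alignment score <K1, K2>_F / (||K1||_F ||K2||_F). It is not the authors' KAS method: the record does not specify the kernel-aligned sampling probabilities or the prior-guided refinement step, so permutations here are drawn uniformly. The 1-nearest-neighbour utility, the toy data, and every function name (`monte_carlo_shapley`, `one_nn_accuracy`, `rbf_gram`, `kernel_alignment`) are hypothetical illustrations.

```python
# Generic background sketch, NOT the KAS algorithm; names and toy data are hypothetical.
import math
import random
from typing import Callable, List, Sequence, Tuple

# A labelled example: (feature vector, class label).
Point = Tuple[Tuple[float, ...], int]
Utility = Callable[[Sequence[Point], Sequence[Point]], float]


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def one_nn_accuracy(train: Sequence[Point], test: Sequence[Point]) -> float:
    """Utility v(S): test accuracy of a 1-nearest-neighbour classifier fit on S (0 if S is empty)."""
    if not train:
        return 0.0
    correct = 0
    for x, y in test:
        nearest = min(train, key=lambda p: sq_dist(p[0], x))
        if nearest[1] == y:
            correct += 1
    return correct / len(test)


def monte_carlo_shapley(train: Sequence[Point], test: Sequence[Point],
                        utility: Utility, num_perms: int = 200, seed: int = 0) -> List[float]:
    """Average each point's marginal contribution over uniformly random permutations."""
    rng = random.Random(seed)
    order = list(range(len(train)))
    totals = [0.0] * len(train)
    for _ in range(num_perms):
        rng.shuffle(order)  # uniform sampling; a KAS-style scheme would change this step
        subset: List[Point] = []
        prev = utility(subset, test)
        for i in order:
            subset.append(train[i])
            curr = utility(subset, test)
            totals[i] += curr - prev  # marginal gain of adding point i
            prev = curr
    return [t / num_perms for t in totals]


def rbf_gram(xs: Sequence[Sequence[float]], gamma: float = 1.0) -> List[List[float]]:
    """Gram matrix of the RBF kernel exp(-gamma * ||a - b||^2)."""
    return [[math.exp(-gamma * sq_dist(a, b)) for b in xs] for a in xs]


def kernel_alignment(k1: Sequence[Sequence[float]], k2: Sequence[Sequence[float]]) -> float:
    """Empirical alignment <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    inner = sum(a * b for r1, r2 in zip(k1, k2) for a, b in zip(r1, r2))
    norm1 = math.sqrt(sum(a * a for r in k1 for a in r))
    norm2 = math.sqrt(sum(b * b for r in k2 for b in r))
    return inner / (norm1 * norm2)


if __name__ == "__main__":
    # Toy data: the third training point lies in the class-0 region but is labelled 1 (simulated noise).
    train = [((0.0,), 0), ((1.0,), 1), ((0.2,), 1)]
    test = [((0.0,), 0), ((0.05,), 0), ((0.3,), 0), ((0.9,), 1), ((1.0,), 1)]
    values = monte_carlo_shapley(train, test, one_nn_accuracy, num_perms=2000)
    print([round(v, 2) for v in values])

    # Kernel-target alignment between an RBF Gram matrix and the label kernel y y^T (labels as +/-1).
    xs = [x for x, _ in train]
    ys = [1.0 if y == 1 else -1.0 for _, y in train]
    label_kernel = [[a * b for b in ys] for a in ys]
    print(round(kernel_alignment(rbf_gram(xs), label_kernel), 3))
```

For this toy set the exact Shapley values are 0.5, 0.2 and 0.1, so with a few thousand permutations the mislabelled third point should receive the lowest estimate, which is the kind of signal noisy-data detection relies on. The estimates always sum to v(D) - v(∅) (here 0.8), because each permutation's marginal gains telescope. A sampling strategy like the one the abstract describes would replace the uniform `rng.shuffle` step; the record gives no further details.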