Solving Truthfulness-Privacy Trade-Off in Mixed Data Outsourcing by Using Data Balancing and Attribute Correlation-Aware Differential Privacy

Bibliographic Details
Main Authors: Abdul Majeed, Seong Oun Hwang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Personal data, differential privacy, data truthfulness, attribute correlations, data balancing
Online Access: https://ieeexplore.ieee.org/document/10858716/
author Abdul Majeed
Seong Oun Hwang
collection DOAJ
description In the modern era, data of diverse types (medical, financial, etc.) are outsourced from data owner environments to the public domain for data mining and knowledge discovery. However, such data often contain sensitive information about individuals, and outsourcing them without sufficient protection may endanger privacy. Anonymization methods are widely used in data outsourcing to protect privacy; however, it is very hard to anonymize poor-quality datasets while maintaining a balance among privacy, utility, and truthfulness (i.e., ensuring that the values in the anonymized data are consistent with the real data). To address these problems, we propose and implement a data balancing and attribute correlation-aware differential privacy (DP) method for mixed data outsourcing that achieves three crucial objectives: privacy, truthfulness, and utility. Our method first identifies quality-related issues in the data and resolves them automatically by adding as few good-quality synthetic records as possible. We propose a data partitioning method that exploits correlations between attributes to create blocks of data, which lessens the amount of noise added by the DP model. To preserve higher truthfulness while guaranteeing privacy, the categorical attributes are treated as one unit and perturbed with the exponential mechanism. The numerical attributes are perturbed using the Laplace mechanism with a relatively higher ε. Jointly applying these mechanisms to data blocks effectively resolves the truthfulness-privacy trade-off while keeping data usability extremely high. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method in real scenarios.
The experimental results and analysis show significantly better performance on four evaluation metrics than recent state-of-the-art (SOTA) DP-based methods. Furthermore, our method is more efficient than its counterparts.
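The abstract names two standard DP mechanisms: the exponential mechanism for the categorical attributes treated as one unit, and the Laplace mechanism for the numerical attributes. The Python sketch below illustrates only those two textbook mechanisms on a single record. It is not the authors' implementation and omits their data balancing and correlation-aware partitioning steps. Everything else is a hypothetical assumption, not taken from this record: per-record perturbation in which any two records count as neighbours, a public candidate list of categorical tuples, an agreement-count utility, known numeric bounds, and all function names, attributes, and ε values.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_numeric(value: float, lo: float, hi: float, eps: float,
                    rng: random.Random) -> float:
    """Laplace mechanism for one numeric attribute of one record.

    Assumes per-record perturbation, so the sensitivity is the attribute
    range (hi - lo). Clipping to [lo, hi] is post-processing: it does not
    weaken the eps guarantee and keeps values inside the real domain.
    """
    noisy = value + laplace_noise((hi - lo) / eps, rng)
    return min(max(noisy, lo), hi)


def agreement(a: Sequence[str], b: Sequence[str]) -> int:
    """Hypothetical utility: number of positions where two tuples agree."""
    return sum(x == y for x, y in zip(a, b))


def perturb_categorical_unit(true_tuple: Tuple[str, ...],
                             candidates: List[Tuple[str, ...]],
                             eps: float, rng: random.Random) -> Tuple[str, ...]:
    """Exponential mechanism over whole categorical tuples.

    The candidate list is assumed to be a public, data-independent domain.
    The utility lies in [0, k], so its sensitivity is k when any two records
    are neighbours. The output is always one of the candidate tuples.
    """
    k = len(true_tuple)
    scores = [agreement(true_tuple, c) for c in candidates]
    top = max(scores)
    # Shifting by the top score avoids overflow; probabilities are unchanged.
    weights = [math.exp(eps * (s - top) / (2 * k)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def perturb_record(record: Record, cat_attrs: List[str],
                   num_bounds: Dict[str, Tuple[float, float]],
                   candidates: List[Tuple[str, ...]],
                   eps_cat: float, eps_num: float,
                   rng: random.Random) -> Record:
    """Perturb one record. By sequential composition the total budget is
    eps_cat + len(num_bounds) * eps_num."""
    out: Record = dict(record)
    true_tuple = tuple(str(record[a]) for a in cat_attrs)
    chosen = perturb_categorical_unit(true_tuple, candidates, eps_cat, rng)
    for attr, val in zip(cat_attrs, chosen):
        out[attr] = val
    for attr, (lo, hi) in num_bounds.items():
        out[attr] = perturb_numeric(float(record[attr]), lo, hi, eps_num, rng)
    return out


# Usage with made-up attributes and budgets (illustrative only).
rng = random.Random(7)
candidates = [("F", "Married"), ("F", "Single"), ("M", "Married"), ("M", "Single")]
record = {"sex": "F", "marital": "Single", "age": 34.0}
released = perturb_record(record, ["sex", "marital"], {"age": (17.0, 90.0)},
                          candidates, eps_cat=1.0, eps_num=2.0, rng=rng)
print(released)
```

Because the exponential mechanism only ever returns a tuple from the candidate list, every released categorical combination exists in the domain, which is one way to read the abstract's notion of truthfulness. Clipping plays the same role for the numeric values.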
format Article
id doaj-art-67bd947109654a9eb90d8c62d0e855e1
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
volume 13
pages 23171-23194
doi 10.1109/ACCESS.2025.3537109
ieee_document_number 10858716
author_affiliation Abdul Majeed (ORCID https://orcid.org/0000-0002-3030-5054): Department of Computer Engineering, Gachon University, Seongnam, South Korea
Seong Oun Hwang (ORCID https://orcid.org/0000-0003-4240-6255): Department of Computer Engineering, Gachon University, Seongnam, South Korea
title Solving Truthfulness-Privacy Trade-Off in Mixed Data Outsourcing by Using Data Balancing and Attribute Correlation-Aware Differential Privacy
topic Personal data
differential privacy
data truthfulness
attribute correlations
data balancing
url https://ieeexplore.ieee.org/document/10858716/