Solving Truthfulness-Privacy Trade-Off in Mixed Data Outsourcing by Using Data Balancing and Attribute Correlation-Aware Differential Privacy

Bibliographic Details
Main Authors: Abdul Majeed, Seong Oun Hwang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/10858716/
Description
Summary: In the modern era, data of diverse types (medical, financial, etc.) are outsourced from data owner environments to public domains for data mining and knowledge discovery. However, such data often contain sensitive information about individuals, and outsourcing them without sufficient protection may endanger privacy. Anonymization methods are widely used in data outsourcing to protect privacy; however, when a dataset is of poor quality, it is very hard to anonymize it while maintaining an equilibrium between privacy, utility, and truthfulness (i.e., ensuring that the values in the anonymized data are consistent with the real data). To address these problems, we propose and implement a data balancing and attribute correlation-aware differential privacy (DP) method for mixed data outsourcing that accomplishes the three crucial objectives of privacy, truthfulness, and utility. Our method first identifies quality-related issues in the data and resolves them automatically by adding the fewest possible good-quality synthetic records. We propose a data partitioning method that exploits correlations between attributes to create blocks of data, reducing the amount of noise added by the DP model. To preserve higher truthfulness while guaranteeing privacy, the categorical attributes are treated as one unit, and the exponential mechanism is applied to them. The numerical attributes are transformed using the Laplace mechanism with a relatively higher $\epsilon$. Applying these mechanisms jointly to the data blocks effectively resolves the truthfulness-privacy trade-off while keeping data usability extremely high. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method in real scenarios.
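To make the two DP mechanisms named in the summary concrete, here is a minimal Python sketch of the textbook Laplace and exponential mechanisms, plus a hypothetical helper, `perturb_categorical_unit`, that treats a record's categorical attributes as one unit. It is not the authors' implementation. The utility function (number of attributes that agree with the true record), the sensitivity values, and all function names are assumptions made for illustration. Because the exponential mechanism only ever returns a tuple drawn from the attributes' real domains, the released categorical values stay valid combinations. This is one plausible reading of how the method preserves truthfulness.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def laplace_mechanism(value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Return value + Laplace(0, sensitivity / epsilon) noise."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rate = epsilon / sensitivity  # 1 / scale: a higher epsilon gives less noise
    # The difference of two i.i.d. Exponential(rate) draws is Laplace(0, 1 / rate).
    return value + rng.expovariate(rate) - rng.expovariate(rate)


def exponential_mechanism(candidates: Sequence[T], utility: Callable[[T], float],
                          epsilon: float, sensitivity: float,
                          rng: random.Random) -> T:
    """Sample a candidate with probability proportional to exp(eps * u / (2 * sensitivity))."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)
    # Shift by the maximum score before exponentiating to avoid overflow.
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def perturb_categorical_unit(true_values: Tuple[str, ...],
                             domains: List[Tuple[str, ...]],
                             epsilon: float, rng: random.Random) -> Tuple[str, ...]:
    """Hypothetical helper: treat all categorical attributes of one record as a
    single unit and release a tuple drawn from the product of their domains."""
    candidates = list(itertools.product(*domains))

    def matches(candidate: Tuple[str, ...]) -> float:
        # Assumed utility: how many attributes agree with the true record.
        return float(sum(a == b for a, b in zip(candidate, true_values)))

    # The assumed utility lies in [0, k] for k attributes, so k bounds how much
    # it can change when the record changes; used here as the sensitivity.
    return exponential_mechanism(candidates, matches, epsilon,
                                 float(len(true_values)), rng)


if __name__ == "__main__":
    rng = random.Random(7)
    domains = [("Female", "Male"), ("Married", "Single", "Divorced")]
    print(perturb_categorical_unit(("Female", "Married"), domains, 2.0, rng))
    print(laplace_mechanism(42.0, sensitivity=1.0, epsilon=5.0, rng=rng))
```

Enumerating the Cartesian product of the domains is only practical for a handful of low-cardinality attributes. The summary does not say how the paper bounds the candidate set or splits $\epsilon$ between the categorical and numerical parts.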
The experimental results and analysis show that our method performs significantly better on four evaluation metrics than recent state-of-the-art (SOTA) DP-based methods. Our method is also more efficient than these counterparts.
ISSN: 2169-3536
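The summary says the method partitions the data into blocks by exploiting correlations between attributes, but it does not describe the procedure. The following is only a rough sketch under assumed choices. It uses vertical partitioning of numerical attributes, the absolute Pearson correlation, and a hypothetical `threshold` parameter. Attributes linked by a chain of strongly correlated pairs are placed in the same block. The names `pearson` and `correlation_blocks` are illustrative. The paper may use a different correlation measure, handle categorical attributes, or partition rows instead.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def correlation_blocks(columns: Dict[str, Sequence[float]],
                       threshold: float) -> List[List[str]]:
    """Hypothetical grouping: attributes whose |r| >= threshold (directly or
    through a chain of such pairs) end up in the same block."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)

    blocks: Dict[str, List[str]] = {}
    for name in names:
        blocks.setdefault(find(name), []).append(name)
    return list(blocks.values())


if __name__ == "__main__":
    data = {
        "age": [25, 32, 47, 51, 62],
        "income": [30, 38, 55, 60, 71],
        "hours": [40, 20, 45, 30, 35],
    }
    print(correlation_blocks(data, threshold=0.7))
```

This sketch covers only the grouping step. How the paper calibrates or reduces DP noise per block is not stated in the summary, so no noise calibration is shown here.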