An accuracy-privacy optimization framework considering user’s privacy requirements for data stream mining

Bibliographic Details
Main Authors: Waruni Hewage, R. Sinha, M. Asif Naeem
Format: Article
Language: English
Published: SpringerOpen 2025-06-01
Series: Journal of Big Data
Online Access: https://doi.org/10.1186/s40537-025-01147-0
Description
Summary: Data stream mining is a critical process that organizations use to derive insights from real-time data. However, preserving the privacy of sensitive information while maintaining high accuracy remains a persistent challenge. Privacy-preserving data mining techniques modify data to increase privacy, a process that invariably decreases the accuracy of data mining algorithms. Although different techniques have been proposed to preserve privacy, well-formulated frameworks for optimizing the trade-off between accuracy and privacy are lacking. This paper introduces a novel Accuracy-Privacy Optimization Framework (APOF) that allows users to define privacy requirements and predicts the achievable accuracy levels, enabling fine-tuning of this balance. Logistic cumulative noise addition, which has experimentally shown better performance, is used as the data perturbation method, with Hoeffding trees as the classifier. Additionally, a data fitting module based on kernel regression is integrated; it predicts accuracy levels from user-defined privacy thresholds. Experimental results show that the proposed framework achieves an optimal privacy level above 97% while minimising the accuracy loss across various datasets. By addressing critical gaps in privacy-preserving data mining, this study offers significant contributions to real-world applications, facilitating secure and efficient data utilization in dynamic environments.
ISSN: 2196-1115
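
Illustrative note: the summary says a kernel regression module predicts accuracy from a user-defined privacy threshold, but this record does not say which estimator, kernel, or bandwidth the authors use. The sketch below assumes a Nadaraya-Watson estimator with a Gaussian kernel, which is one common form of kernel regression and may differ from the paper's method. The function names, the bandwidth, and the (privacy, accuracy) pairs are hypothetical placeholders, not results from the paper.

```python
import math


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x_query, xs, ys, bandwidth):
    """Kernel-weighted average of ys, weighted by closeness of xs to x_query."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x_query - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; use a larger bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical observations: privacy level reached vs. classifier accuracy.
# These numbers are made up for illustration only.
privacy_levels = [0.80, 0.85, 0.90, 0.95, 0.99]
accuracies = [0.92, 0.90, 0.87, 0.83, 0.76]

# Estimate the accuracy to expect at a user-requested privacy level of 0.97.
predicted = nadaraya_watson(0.97, privacy_levels, accuracies, bandwidth=0.03)
print(f"predicted accuracy at privacy 0.97: {predicted:.3f}")
```

Because the estimate is a weighted average of observed accuracies, it always stays within the range of the observed values. The bandwidth controls how strongly nearby privacy levels dominate the prediction.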