An accuracy-privacy optimization framework considering user’s privacy requirements for data stream mining
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2025-06-01 |
| Series: | Journal of Big Data |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s40537-025-01147-0 |
| Summary: | Data stream mining is a critical process that organizations use to derive insights from real-time data. Preserving the privacy of sensitive information while maintaining high accuracy, however, remains a persistent challenge. Privacy-preserving data mining techniques modify data to increase privacy, a process that invariably decreases the accuracy of data mining algorithms. Although various techniques have been proposed to preserve privacy, well-formulated frameworks for optimizing the trade-off between accuracy and privacy are lacking. This paper introduces a novel Accuracy-Privacy Optimization Framework (APOF) that allows users to define privacy requirements and predicts achievable accuracy levels, enabling fine-tuning of this balance. Logistic cumulative noise addition, which showed better performance in experiments, was used as the data perturbation method, with Hoeffding trees as the classifier. Additionally, the framework integrates a data fitting module based on kernel regression, a unique approach that predicts accuracy levels from user-defined privacy thresholds. Experimental results show that the proposed framework achieves an optimal privacy level above 97% while minimizing accuracy loss across various datasets. By addressing critical gaps in privacy-preserving data mining, this study makes significant contributions to real-world applications, facilitating secure and efficient data utilization in dynamic environments. |
| ISSN: | 2196-1115 |
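
The summary describes a data fitting module that uses kernel regression to predict achievable accuracy from a user-defined privacy threshold. The sketch below illustrates that general idea with the Nadaraya-Watson estimator and a Gaussian kernel. Both choices are assumptions: the record does not say which kernel regression variant, kernel, or bandwidth the authors used. The function names and the (privacy, accuracy) observations in the usage section are hypothetical and synthetic, invented only for illustration; they are not results from the paper.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(
    privacy_levels: Sequence[float],
    accuracies: Sequence[float],
    query: float,
    bandwidth: float,
) -> float:
    """Predict accuracy at a requested privacy level (hypothetical helper).

    Nadaraya-Watson kernel regression: the prediction is a weighted average
    of the observed accuracies, where each observation is weighted by how
    close its privacy level is to `query`, scaled by `bandwidth`.
    """
    if not privacy_levels or len(privacy_levels) != len(accuracies):
        raise ValueError("need non-empty, equal-length observation lists")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((query - p) / bandwidth) for p in privacy_levels]
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed: the query is far outside the data.
        raise ValueError("query too far from observations for this bandwidth")
    return sum(w * a for w, a in zip(weights, accuracies)) / total


if __name__ == "__main__":
    # Synthetic (privacy, accuracy) pairs, made up for illustration only;
    # in a framework like the one described, such pairs would come from
    # runs of the classifier on data perturbed at different noise levels.
    privacy = [0.90, 0.93, 0.95, 0.97, 0.99]
    accuracy = [0.90, 0.87, 0.85, 0.83, 0.81]
    predicted = nadaraya_watson(privacy, accuracy, query=0.96, bandwidth=0.02)
    print(f"predicted accuracy at privacy 0.96: {predicted:.3f}")
```

Because the prediction is a convex combination of observed accuracies, it always stays within their observed range, which keeps estimates conservative but means the sketch cannot extrapolate beyond measured privacy levels. The bandwidth here is fixed by hand; choosing it by cross-validation would be a common alternative, again an assumption rather than something the record states.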