Local differential privacy-based frequent sequence mining

Frequent sequence mining (FSM) is a fundamental component for analyzing sequential data in the big data era. However, collecting and analyzing sequence data incurs serious privacy issues for users. Local differential privacy (LDP) is a prevalent privacy paradigm for data statistics and analysis. Due...

Full description

Saved in:
Bibliographic Details
Main Authors: Teng Wang, Zhi Hu
Format: Article
Language:English
Published: Springer 2022-06-01
Series:Journal of King Saud University: Computer and Information Sciences
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S1319157822001410
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1849324825549471744
author Teng Wang
Zhi Hu
author_facet Teng Wang
Zhi Hu
author_sort Teng Wang
collection DOAJ
description Frequent sequence mining (FSM) is a fundamental component for analyzing sequential data in the big data era. However, collecting and analyzing sequence data incurs serious privacy issues for users. Local differential privacy (LDP) is a prevalent privacy paradigm for data statistics and analysis. Due to the high-dimensionality of sequence data, directly applying LDP on FSM faces severe challenges such as heavy perturbation, waste of privacy budget, poor accuracy, and high computational costs. This paper proposed PrivFSM and PrivFSM∗ which are two LDP-compliant FSM mechanisms with high data utility and low computational cost. Specifically, PrivFSM iteratively constructs sequences into a prefix tree under LDP to mine frequent sequences from huge domain. Besides, PrivFSM leverages a threshold-based pruning strategy to prune fake nodes, which reduces perturbations and computation costs. As an augmented version of PrivFSM, PrivFSM∗ no longer splits privacy budget, but partitions user set to avoid heavy noise. PrivFSM∗ maximizes the utilization of users’ data by applying the whole data on every iteration with a complete privacy budget. We conduct theoretical analysis on error-bound and extensive experiments on both real-world and synthetic datasets. The experimental results demonstrate the high data utility of our proposed algorithms when compared with other state-of-the-art mechanisms.
format Article
id doaj-art-ef7ad1cdda734e9fa22cec0912896306
institution Kabale University
issn 1319-1578
language English
publishDate 2022-06-01
publisher Springer
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj-art-ef7ad1cdda734e9fa22cec09128963062025-08-20T03:48:35ZengSpringerJournal of King Saud University: Computer and Information Sciences1319-15782022-06-013463591360110.1016/j.jksuci.2022.04.013Local differential privacy-based frequent sequence miningTeng Wang0Zhi Hu1School of Cyberspace Security, Xi’an University of Posts and Telecommunications, Shaanxi, China; Corresponding author.School of Information Science and Technology, Northwest University, Shaanxi, ChinaFrequent sequence mining (FSM) is a fundamental component for analyzing sequential data in the big data era. However, collecting and analyzing sequence data incurs serious privacy issues for users. Local differential privacy (LDP) is a prevalent privacy paradigm for data statistics and analysis. Due to the high-dimensionality of sequence data, directly applying LDP on FSM faces severe challenges such as heavy perturbation, waste of privacy budget, poor accuracy, and high computational costs. This paper proposed PrivFSM and PrivFSM∗ which are two LDP-compliant FSM mechanisms with high data utility and low computational cost. Specifically, PrivFSM iteratively constructs sequences into a prefix tree under LDP to mine frequent sequences from huge domain. Besides, PrivFSM leverages a threshold-based pruning strategy to prune fake nodes, which reduces perturbations and computation costs. As an augmented version of PrivFSM, PrivFSM∗ no longer splits privacy budget, but partitions user set to avoid heavy noise. PrivFSM∗ maximizes the utilization of users’ data by applying the whole data on every iteration with a complete privacy budget. We conduct theoretical analysis on error-bound and extensive experiments on both real-world and synthetic datasets. The experimental results demonstrate the high data utility of our proposed algorithms when compared with other state-of-the-art mechanisms.http://www.sciencedirect.com/science/article/pii/S1319157822001410Frequent sequence miningLocal differential privacyPrefix tree constructionData utility
spellingShingle Teng Wang
Zhi Hu
Local differential privacy-based frequent sequence mining
Journal of King Saud University: Computer and Information Sciences
Frequent sequence mining
Local differential privacy
Prefix tree construction
Data utility
title Local differential privacy-based frequent sequence mining
title_full Local differential privacy-based frequent sequence mining
title_fullStr Local differential privacy-based frequent sequence mining
title_full_unstemmed Local differential privacy-based frequent sequence mining
title_short Local differential privacy-based frequent sequence mining
title_sort local differential privacy based frequent sequence mining
topic Frequent sequence mining
Local differential privacy
Prefix tree construction
Data utility
url http://www.sciencedirect.com/science/article/pii/S1319157822001410
work_keys_str_mv AT tengwang localdifferentialprivacybasedfrequentsequencemining
AT zhihu localdifferentialprivacybasedfrequentsequencemining