Local differential privacy-based frequent sequence mining
Frequent sequence mining (FSM) is a fundamental component for analyzing sequential data in the big data era. However, collecting and analyzing sequence data incurs serious privacy issues for users. Local differential privacy (LDP) is a prevalent privacy paradigm for data statistics and analysis. Due...
Saved in:
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Springer
2022-06-01
|
| Series: | Journal of King Saud University: Computer and Information Sciences |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157822001410 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1849324825549471744 |
|---|---|
| author | Teng Wang Zhi Hu |
| author_facet | Teng Wang Zhi Hu |
| author_sort | Teng Wang |
| collection | DOAJ |
| description | Frequent sequence mining (FSM) is a fundamental component for analyzing sequential data in the big data era. However, collecting and analyzing sequence data incurs serious privacy issues for users. Local differential privacy (LDP) is a prevalent privacy paradigm for data statistics and analysis. Due to the high-dimensionality of sequence data, directly applying LDP on FSM faces severe challenges such as heavy perturbation, waste of privacy budget, poor accuracy, and high computational costs. This paper proposed PrivFSM and PrivFSM∗ which are two LDP-compliant FSM mechanisms with high data utility and low computational cost. Specifically, PrivFSM iteratively constructs sequences into a prefix tree under LDP to mine frequent sequences from huge domain. Besides, PrivFSM leverages a threshold-based pruning strategy to prune fake nodes, which reduces perturbations and computation costs. As an augmented version of PrivFSM, PrivFSM∗ no longer splits privacy budget, but partitions user set to avoid heavy noise. PrivFSM∗ maximizes the utilization of users’ data by applying the whole data on every iteration with a complete privacy budget. We conduct theoretical analysis on error-bound and extensive experiments on both real-world and synthetic datasets. The experimental results demonstrate the high data utility of our proposed algorithms when compared with other state-of-the-art mechanisms. |
| format | Article |
| id | doaj-art-ef7ad1cdda734e9fa22cec0912896306 |
| institution | Kabale University |
| issn | 1319-1578 |
| language | English |
| publishDate | 2022-06-01 |
| publisher | Springer |
| record_format | Article |
| series | Journal of King Saud University: Computer and Information Sciences |
| spelling | doaj-art-ef7ad1cdda734e9fa22cec09128963062025-08-20T03:48:35ZengSpringerJournal of King Saud University: Computer and Information Sciences1319-15782022-06-013463591360110.1016/j.jksuci.2022.04.013Local differential privacy-based frequent sequence miningTeng Wang0Zhi Hu1School of Cyberspace Security, Xi’an University of Posts and Telecommunications, Shaanxi, China; Corresponding author.School of Information Science and Technology, Northwest University, Shaanxi, ChinaFrequent sequence mining (FSM) is a fundamental component for analyzing sequential data in the big data era. However, collecting and analyzing sequence data incurs serious privacy issues for users. Local differential privacy (LDP) is a prevalent privacy paradigm for data statistics and analysis. Due to the high-dimensionality of sequence data, directly applying LDP on FSM faces severe challenges such as heavy perturbation, waste of privacy budget, poor accuracy, and high computational costs. This paper proposed PrivFSM and PrivFSM∗ which are two LDP-compliant FSM mechanisms with high data utility and low computational cost. Specifically, PrivFSM iteratively constructs sequences into a prefix tree under LDP to mine frequent sequences from huge domain. Besides, PrivFSM leverages a threshold-based pruning strategy to prune fake nodes, which reduces perturbations and computation costs. As an augmented version of PrivFSM, PrivFSM∗ no longer splits privacy budget, but partitions user set to avoid heavy noise. PrivFSM∗ maximizes the utilization of users’ data by applying the whole data on every iteration with a complete privacy budget. We conduct theoretical analysis on error-bound and extensive experiments on both real-world and synthetic datasets. The experimental results demonstrate the high data utility of our proposed algorithms when compared with other state-of-the-art mechanisms.http://www.sciencedirect.com/science/article/pii/S1319157822001410Frequent sequence miningLocal differential privacyPrefix tree constructionData utility |
| spellingShingle | Teng Wang Zhi Hu Local differential privacy-based frequent sequence mining Journal of King Saud University: Computer and Information Sciences Frequent sequence mining Local differential privacy Prefix tree construction Data utility |
| title | Local differential privacy-based frequent sequence mining |
| title_full | Local differential privacy-based frequent sequence mining |
| title_fullStr | Local differential privacy-based frequent sequence mining |
| title_full_unstemmed | Local differential privacy-based frequent sequence mining |
| title_short | Local differential privacy-based frequent sequence mining |
| title_sort | local differential privacy based frequent sequence mining |
| topic | Frequent sequence mining Local differential privacy Prefix tree construction Data utility |
| url | http://www.sciencedirect.com/science/article/pii/S1319157822001410 |
| work_keys_str_mv | AT tengwang localdifferentialprivacybasedfrequentsequencemining AT zhihu localdifferentialprivacybasedfrequentsequencemining |