Crowdsourced data leaking user's privacy while using anonymization technique

Bibliographic Details
Main Authors: Naadiya Mirbahar Mirbahar, Kamlesh Kumar, Asif Ali Laghari, Mansoor Ahmed Khuhro
Format: Article
Language: English
Published: Mehran University of Engineering and Technology 2025-04-01
Series: Mehran University Research Journal of Engineering and Technology
Online Access: https://murjet.muet.edu.pk/index.php/home/article/view/292
Description
Summary: Because of the considerable value embedded in large-scale educational data, numerous research institutes have collected large volumes of student behavioral data. To make full use of that value, the collected data may be shared with third parties, such as data experts around the world. However, sharing may pose privacy risks to data owners, even though data collectors usually anonymize the data before crowdsourcing it. To demonstrate that anonymization alone is insufficient to protect user privacy, we conducted an experimental study using offline and online behavioral traces collected through campus cards and smartphones. Our study demonstrates that a student can be identified with high probability from anonymized behavior and payment traces. The results show that only ten features are sufficient to uniquely identify an individual: Transmission Control Protocol (TCP) synchronization attempts, content length, downlink traffic, last acknowledgement packet delay, uplink traffic, cell ID, base station ID, offline payment time (day, hour), online payment time (day, hour, minute), and point of sale ID (POS_ID). Five standard supervised learning classifiers were used to predict user identity: Extra Tree, Bagging, Decision Tree, K-Nearest Neighbor (KNN), and Random Forest. The evaluation showed accuracies of 99.99%, 99.95%, 99.02%, 98.84%, and 99.56%, respectively.
ISSN: 0254-7821, 2413-7219
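
The re-identification idea in the summary can be illustrated with a minimal sketch. This is not the authors' pipeline or data: the traces are synthetic, only three of the listed feature types are imitated (point of sale ID, cell ID, payment hour), and the function names, the distance measure, and the 1-nearest-neighbour rule are all assumptions chosen to keep the example dependency-free. It shows how an adversary who holds some labelled traces could link other "anonymized" traces back to the same person when behavior is habitual.

```python
import random


def make_traces(n_students=50, per_student=20, seed=0):
    """Generate synthetic (features, pseudonym) rows; purely illustrative data."""
    rng = random.Random(seed)
    rows = []
    for pseudonym in range(n_students):
        # Each hypothetical student has a habitual terminal, cell and hour.
        pos_id = rng.randrange(30)
        cell_id = rng.randrange(40)
        usual_hour = rng.randrange(8, 21)
        for _ in range(per_student):
            hour = usual_hour + rng.choice([-1, 0, 0, 1])
            rows.append(((pos_id, cell_id, hour), pseudonym))
    return rows


def distance(a, b):
    """Mismatch on the two categorical IDs plus a scaled gap between hours."""
    return (a[0] != b[0]) + (a[1] != b[1]) + abs(a[2] - b[2]) / 24


def predict(train, features):
    """1-nearest-neighbour: return the label of the closest training row."""
    return min(train, key=lambda row: distance(row[0], features))[1]


def reidentification_accuracy(rows, test_fraction=0.25, seed=1):
    """Hold out part of the rows and count how many are linked correctly."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * test_fraction)
    test, train = shuffled[:cut], shuffled[cut:]
    hits = sum(predict(train, feats) == label for feats, label in test)
    return hits / len(test)


rows = make_traces()
accuracy = reidentification_accuracy(rows)
print(f"re-identification accuracy on synthetic traces: {accuracy:.2%}")
```

Because the synthetic habits are nearly unique per student, even this simple classifier links most held-out rows correctly. The figure it prints says nothing about the accuracies reported in the article, which come from real campus card and smartphone traces.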