A Method of Complex Disclosure Risk Assessment for Microdata

Bibliographic Details
Main Author: Andrzej Młodak
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Journal of Probability and Statistics
Online Access:http://dx.doi.org/10.1155/jpas/1876232
Collection: DOAJ
ISSN: 1687-9538
Author affiliation: Centre for Small Area Estimation

Description

This article proposes a method for constructing an aggregate measure of disclosure risk, that is, the risk that a user or an intruder can derive an individual’s confidential information from a given data set. The measure has components for both categorical and continuous variables, so that threats to data confidentiality are identified as fully as possible. Its construction relies on the frequency approach. For a continuous variable, the relevant frequency is the number of observed values of that variable that fall within a neighborhood of the considered value, where the neighborhood is determined by an arbitrarily chosen precision level for reidentification. The Shapley value and the solidarity value, two alternative solutions from cooperative game theory whose properties make them effective tools for this purpose, are used to assess the contribution of particular variables to total individual and global risk, based on the idea of minimum unsafe combinations. To some extent, the proposal builds on the Special Uniques Detection Algorithm (SUDA) and may serve as its extension toward computing an overall risk that takes both categorical and continuous variables into account. The complex measure can reflect the actual level of disclosure risk better than commonly used tools, which treat categorical and continuous quasi-identifiers separately; moreover, measures for continuous quasi-identifiers are few and rather difficult to interpret. The solution presented in the article aims to overcome these problems. A simulation study and an assessment of disclosure risk for microdata from the Adult Person Survey within the Balance of Human Capital project in Poland confirm the utility of the proposed measures.
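The frequency approach for mixed categorical and continuous quasi-identifiers can be illustrated with a minimal sketch. The abstract does not give the article's exact neighborhood definition or risk formula, so several choices here are illustrative assumptions: the relative-precision rule in the hypothetical `in_neighborhood` (a value matches if it lies within ±precision·|x| of the record's own value), the 1/f individual risk, and the mean as a global aggregate. The names `Record`, `match_counts`, and `individual_risks` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    categorical: tuple[str, ...]   # categorical quasi-identifiers, e.g. (sex, region)
    continuous: tuple[float, ...]  # continuous quasi-identifiers, e.g. (income,)


def in_neighborhood(x: float, y: float, precision: float) -> bool:
    # Hypothetical rule: y is in the neighborhood of x if it lies within +/- precision * |x|.
    return abs(y - x) <= precision * abs(x)


def match_counts(records: list[Record], precision: float) -> list[int]:
    """For each record, count the records (itself included) that share its categorical
    values exactly and whose continuous values all fall in the neighborhood of its own."""
    counts = []
    for r in records:
        counts.append(sum(
            1 for s in records
            if s.categorical == r.categorical
            and all(in_neighborhood(x, y, precision)
                    for x, y in zip(r.continuous, s.continuous))
        ))
    return counts


def individual_risks(counts: list[int]) -> list[float]:
    # Hypothetical frequency-based risk: 1 / (size of the matching group).
    return [1.0 / c for c in counts]


records = [
    Record(("F", "North"), (2500.0,)),
    Record(("F", "North"), (2550.0,)),
    Record(("M", "North"), (2520.0,)),
    Record(("F", "South"), (9800.0,)),
]
counts = match_counts(records, precision=0.05)
risks = individual_risks(counts)
global_risk = sum(risks) / len(risks)  # hypothetical aggregate: mean individual risk
print(counts, risks, global_risk)
```

With a 5% precision level the first two records fall into each other's neighborhood and share a risk of 0.5, while the other two are unique (risk 1.0). Lowering `precision` to 0 reduces the continuous part to exact matching, which in this toy data makes every record unique.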
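To show how the Shapley and solidarity values can apportion a total risk among variables, the sketch below treats quasi-identifiers as players in a cooperative game. Its characteristic function, the hypothetical `uniqueness_share` (the share of sample-unique records on a subset of variables), is only a stand-in: the article's own risk function, which also covers continuous variables and minimum unsafe combinations, is not reproduced. The Shapley value weights each marginal contribution v(S ∪ {i}) − v(S); the solidarity value instead weights, for each coalition containing i, the average marginal contribution of that coalition's members.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import factorial

# Toy microdata with three categorical quasi-identifiers: sex, age group, region (hypothetical).
rows = [
    ("F", "30-39", "North"),
    ("F", "30-39", "North"),
    ("M", "30-39", "North"),
    ("M", "40-49", "South"),
    ("F", "40-49", "South"),
    ("M", "40-49", "North"),
]
N_VARS = 3


def uniqueness_share(coalition: tuple[int, ...]) -> Fraction:
    """Hypothetical characteristic function v(S): the share of records that are
    sample-unique on the variables in S. v(empty set) = 0 whenever len(rows) > 1."""
    keys = [tuple(row[j] for j in coalition) for row in rows]
    freq = Counter(keys)
    return Fraction(sum(1 for key in keys if freq[key] == 1), len(rows))


# Tabulate v(S) for every coalition S of variables, including the empty one.
v = {
    frozenset(s): uniqueness_share(s)
    for k in range(N_VARS + 1)
    for s in combinations(range(N_VARS), k)
}


def shapley(v: dict, n: int) -> list[Fraction]:
    # phi_i = sum over S not containing i of s!(n-s-1)!/n! * (v(S + {i}) - v(S))
    phi = [Fraction(0)] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for s in range(n):
            weight = Fraction(factorial(s) * factorial(n - s - 1), factorial(n))
            for coalition in combinations(others, s):
                S = frozenset(coalition)
                phi[i] += weight * (v[S | {i}] - v[S])
    return phi


def solidarity(v: dict, n: int) -> list[Fraction]:
    # psi_i = sum over S containing i of (n-s)!(s-1)!/n! * A(S),
    # where A(S) is the average marginal contribution of the members of S.
    psi = [Fraction(0)] * n
    for s in range(1, n + 1):
        weight = Fraction(factorial(n - s) * factorial(s - 1), factorial(n))
        for coalition in combinations(range(n), s):
            S = frozenset(coalition)
            avg = sum((v[S] - v[S - {k}] for k in S), Fraction(0)) / s
            for i in S:
                psi[i] += weight * avg
    return psi


phi = shapley(v, N_VARS)
psi = solidarity(v, N_VARS)
print("Shapley:", phi)
print("Solidarity:", psi)
```

On this toy data both allocations sum to the total v(N) = 2/3. The Shapley value credits sex with 5/18 and the other two variables with 7/36 each, while the solidarity value spreads the risk more evenly (13/54 versus 23/108 each). Exact `Fraction` arithmetic keeps that comparison free of rounding noise.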
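SUDA is built around minimal sample uniques (MSUs): smallest sets of variables on which a record is unique in the sample, such that no proper subset is also unique. This sketch assumes that, for categorical data, such sets are one reasonable reading of the abstract's "minimum unsafe combinations". The brute-force search below is exponential in the number of variables and is not SUDA itself, which prunes the search and additionally scores records by the number and size of their MSUs. The function name `minimal_sample_uniques` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def minimal_sample_uniques(rows: list[tuple[str, ...]]) -> dict[int, list[tuple[int, ...]]]:
    """Return, for each record index, the variable subsets that are MSUs for it."""
    n_vars = len(rows[0])
    msus: dict[int, list[tuple[int, ...]]] = {i: [] for i in range(len(rows))}
    unique_on: dict[tuple[int, tuple[int, ...]], bool] = {}
    for k in range(1, n_vars + 1):
        for subset in combinations(range(n_vars), k):
            freq = Counter(tuple(row[j] for j in subset) for row in rows)
            for i, row in enumerate(rows):
                is_unique = freq[tuple(row[j] for j in subset)] == 1
                unique_on[(i, subset)] = is_unique
                # Uniqueness carries over to supersets, so a subset is minimal
                # exactly when none of its (k-1)-element subsets is unique.
                if is_unique and not any(
                    unique_on.get((i, sub), False) for sub in combinations(subset, k - 1)
                ):
                    msus[i].append(subset)
    return msus


# Toy data: (sex, age group, region), hypothetical.
rows = [
    ("F", "30-39", "North"),
    ("F", "30-39", "North"),
    ("M", "30-39", "North"),
    ("M", "40-49", "South"),
    ("F", "40-49", "South"),
    ("M", "40-49", "North"),
]
msus = minimal_sample_uniques(rows)
print(msus)
# {0: [], 1: [], 2: [(0, 1)], 3: [(0, 2)], 4: [(0, 1), (0, 2)], 5: [(1, 2)]}
```

Record 4 has two MSUs of size 2 and is therefore exposed through two different pairs of variables, while records 0 and 1 are never unique, even on all three variables together.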