Personal Data Recognition Using a Deep Learning Model

Protecting personally identifiable information is a crucial issue today because individuals leave traces of their activities on social media and other digital platforms, which attackers can exploit for identity theft and fraud. Consequently, there is a need to develop effective methods for...

Bibliographic Details
Main Author: Nikita Babak
Format: Article
Language: Russian
Published: The Fund for Promotion of Internet media, IT education, human development «League Internet Media» 2024-03-01
Series: Современные информационные технологии и IT-образование
Subjects: cybersecurity; data protection; deep learning; large language models; natural language processing; personal information; transformers
Online Access: http://sitito.cs.msu.ru/index.php/SITITO/article/view/1119
author Nikita Babak
collection DOAJ
description Protecting personally identifiable information is a crucial issue today because individuals leave traces of their activities on social media and other digital platforms, which attackers can exploit for identity theft and fraud. Consequently, there is a need to develop effective methods for personal data protection. However, recognizing the personal data to be protected is a significant challenge, because personal data attributes such as names and phone numbers are diverse and can appear in various formats, from tables to unstructured text. A range of techniques is used for personal data recognition, with rule-based algorithms being the most widely used approach. These algorithms identify personal data using predefined rules, such as regular expressions and dictionaries. However, such algorithms may lack the flexibility required to handle complex cases. An alternative is to use deep learning models, which are trained on large datasets and can adapt to diverse forms of data. In this paper, deep learning models with different neural network architectures were implemented and compared against rule-based algorithms. The feasibility of using a large language model for personal data recognition was also explored. The research resulted in a personal data recognition method that combines an artificial intelligence language model with rule-based algorithms and can identify personal data in both structured and unstructured information. The paper underscores the importance of personal data protection and highlights the potential of artificial intelligence models for addressing it.
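The rule-based approach mentioned in the description (regular expressions plus dictionaries) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the patterns, the tiny first-name dictionary, and the function name `find_personal_data` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical rules for illustration only; not the rules used in the paper.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
}

# Hypothetical dictionary of first names (a real one would be far larger).
FIRST_NAMES = {"nikita", "anna", "ivan"}


def find_personal_data(text):
    """Return (label, matched_text) pairs found by regex and dictionary rules."""
    found = []
    # Regular-expression rules: each pattern labels every span it matches.
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group()))
    # Dictionary rule: capitalized alphabetic tokens that appear in the name list.
    for token in re.findall(r"[^\W\d_]+", text):
        if token[0].isupper() and token.lower() in FIRST_NAMES:
            found.append(("NAME", token))
    return found


if __name__ == "__main__":
    sample = "Contact Ivan at ivan.petrov@example.com or +7 (495) 123-45-67."
    for label, value in find_personal_data(sample):
        print(label, value)
```

Rules of this kind are easy to audit, but they only catch the formats they were written for, which is the inflexibility the description refers to.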
format Article
id doaj-art-dcc843ac6289426ab863f4f263aa7311
institution DOAJ
issn 2411-1473
spelling doaj-art-dcc843ac6289426ab863f4f263aa7311; 2025-08-20T03:08:27Z; rus; The Fund for Promotion of Internet media, IT education, human development «League Internet Media»; Современные информационные технологии и IT-образование; ISSN 2411-1473; 2024-03-01; vol. 20, no. 1, pp. 13-26; DOI 10.25559/SITITO.020.202401.13-26; Personal Data Recognition Using a Deep Learning Model; Nikita Babak (https://orcid.org/0000-0001-7129-1018), National Research University "Moscow Power Engineering Institute"; Sberbank of Russia, Moscow, Russia; http://sitito.cs.msu.ru/index.php/SITITO/article/view/1119; cybersecurity, data protection, deep learning, large language models, natural language processing, personal information, transformers
title Personal Data Recognition Using a Deep Learning Model
topic cybersecurity
data protection
deep learning
large language models
natural language processing
personal information
transformers