Research on multi-granularity password analysis based on LLM

Password-based authentication has been widely used as the primary authentication mechanism. However, occasional large-scale password leaks have highlighted the vulnerability of passwords to risks such as guessing or theft. In recent years, research on password analysis using natural language processin...

Bibliographic Details
Main Authors:	Meng HONG, Weidong QIU, Yangde WANG
Format:	Article
Language:	English
Published:	POSTS&TELECOM PRESS Co., LTD 2024-02-01
Series:	网络与信息安全学报
Subjects:	large language model; password analysis; natural language processing; word segmentation
Online Access:	http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2024008

collection	DOAJ
description	Password-based authentication is widely used as the primary authentication mechanism. However, occasional large-scale password leaks have highlighted how vulnerable passwords are to guessing and theft. In recent years, research on password analysis using natural language processing techniques has progressed by treating passwords as a special form of natural language. Nevertheless, few studies have investigated how the granularity of password text segmentation affects password analysis with large language models. A multi-granularity password analysis framework based on a large language model was proposed; it follows the pre-training paradigm and autonomously learns prior knowledge of the password distribution from large unlabelled datasets. The framework comprised three modules: the synchronization network, the backbone network, and the tail network. The synchronization network module implemented char-level, template-level, and chunk-level password segmentation, extracting knowledge of character distribution, structure, word chunk composition, and other password features. The backbone network module constructed a generic password model to learn the rules governing password composition. The tail network module generated candidate passwords for guessing and analyzing target databases. Experimental evaluations were conducted on eight password databases, including Tianya and Twitter, to analyze and summarize the effectiveness of the proposed framework under different language environments and word segmentation granularities. The results indicate that in Chinese user scenarios, the frameworks based on char-level and chunk-level segmentation perform comparably, and both significantly outperform the framework based on template-level segmentation. In English user scenarios, the framework based on chunk-level segmentation delivers the best password analysis performance.
id	doaj-art-91be6021ba544df5aa0761a63e2632fc
institution	Kabale University
issn	2096-109X

Research on multi-granularity password analysis based on LLM
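
The abstract names three segmentation granularities (char-level, template-level, and chunk-level) without defining them in this record. The following is a minimal sketch of what such segmentations might look like, not the authors' implementation. It rests on two assumptions: the template is taken to be a PCFG-style sequence of character-class runs such as L8 D3 S1, and a chunk is taken to be a maximal run of characters of the same class, standing in for whatever learned word segmentation the paper actually uses. All function names are hypothetical.

```python
from itertools import groupby
from typing import List


def char_class(ch: str) -> str:
    """Map a character to a coarse class: L (letter), D (digit), S (other)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def char_level(password: str) -> List[str]:
    """Char-level: every character is its own token."""
    return list(password)


def template_level(password: str) -> List[str]:
    """Template-level (ASSUMPTION: PCFG-style): one token per run of a
    character class, written as class letter plus run length."""
    return [
        f"{cls}{len(list(run))}"
        for cls, run in groupby(password, key=char_class)
    ]


def chunk_level(password: str) -> List[str]:
    """Chunk-level (ASSUMPTION: class-run stand-in): one token per maximal
    run of same-class characters, keeping the original text."""
    return ["".join(run) for _, run in groupby(password, key=char_class)]


if __name__ == "__main__":
    pw = "password123!"
    print(char_level(pw))
    print(template_level(pw))  # ['L8', 'D3', 'S1']
    print(chunk_level(pw))     # ['password', '123', '!']
```

Under these assumptions, the template view discards the characters themselves and keeps only structure, while the chunk view keeps the text but groups it into larger units than single characters. That is one way to read the abstract's distinction between knowledge of "structure" and of "word chunk composition".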

Similar Items