Research on multi-granularity password analysis based on LLM

Password-based authentication has been widely used as the primary authentication mechanism. However, occasional large-scale password leaks have highlighted the vulnerability of passwords to risks such as guessing or theft. In recent years, research on password analysis using natural language processin...

Bibliographic Details
Main Authors: Meng HONG, Weidong QIU, Yangde WANG
Format: Article
Language: English
Published: POSTS&TELECOM PRESS Co., LTD 2024-02-01
Series: 网络与信息安全学报 (Chinese Journal of Network and Information Security)
Subjects: large language model; password analysis; natural language processing; word segmentation
Online Access: http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2024008
author Meng HONG
Weidong QIU
Yangde WANG
collection DOAJ
description Password-based authentication has been widely used as the primary authentication mechanism. However, occasional large-scale password leaks have highlighted the vulnerability of passwords to risks such as guessing and theft. In recent years, research that treats passwords as a special form of natural language and analyzes them with natural language processing techniques has progressed. Nevertheless, few studies have investigated how the granularity of password text segmentation affects the effectiveness of password analysis with large language models. A multi-granularity password analysis framework based on a large language model was proposed; it follows the pre-training paradigm and autonomously learns prior knowledge of the password distribution from large unlabelled datasets. The framework comprised three modules: the synchronization network, the backbone network, and the tail network. The synchronization network module implemented char-level, template-level, and chunk-level password segmentation, extracting knowledge of character distribution, structure, word-chunk composition, and other password features. The backbone network module constructed a generic password model to learn the rules governing password composition. The tail network module generated candidate passwords for guessing and analyzing target databases. Experimental evaluations were conducted on eight password databases, including Tianya and Twitter, to analyze and summarize the effectiveness of the proposed framework under different language environments and segmentation granularities. The results indicate that in Chinese user scenarios, the framework performs comparably with char-level and chunk-level segmentation, and both significantly outperform template-level segmentation. In English user scenarios, chunk-level segmentation yields the best password analysis performance.
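The three segmentation granularities named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not define the granularities precisely, so the sketch assumes that template-level means runs of letters, digits, and symbols (as in PCFG-style password models) and that chunk-level means greedy longest-match against a chunk vocabulary. The function names and the tiny CHUNK_VOCAB set are hypothetical; a real system would presumably learn its chunk vocabulary from data.

```python
# Hypothetical chunk vocabulary, for illustration only (assumption, not from the paper).
CHUNK_VOCAB = {"love", "pass", "word", "123", "2024", "qwe"}


def char_level(password):
    """Char-level: every character is its own token."""
    return list(password)


def _char_class(ch):
    """Map a character to L (letter), D (digit), or S (symbol)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def template_level(password):
    """Template-level (assumed PCFG-style): one token per run of the same class."""
    runs = []
    for ch in password:
        cls = _char_class(ch)
        if runs and runs[-1][0] == cls:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([cls, 1])  # start a new run
    return [f"{cls}{length}" for cls, length in runs]


def chunk_level(password, vocab=CHUNK_VOCAB):
    """Chunk-level (assumed): greedy longest match, single characters as fallback."""
    max_len = max((len(chunk) for chunk in vocab), default=1)
    tokens = []
    i = 0
    while i < len(password):
        # Try the longest candidate first; a length-1 piece always matches.
        for n in range(min(max_len, len(password) - i), 0, -1):
            piece = password[i:i + n]
            if n == 1 or piece.lower() in vocab:
                tokens.append(piece)
                i += n
                break
    return tokens


if __name__ == "__main__":
    for segment in (char_level, template_level, chunk_level):
        print(segment.__name__, segment("love123!"))
```

Under these assumptions, the same password yields a long sequence of small tokens at char level, a short structural signature at template level, and a few word-like units at chunk level, which is the trade-off the description evaluates.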
format Article
id doaj-art-91be6021ba544df5aa0761a63e2632fc
institution Kabale University
issn 2096-109X
language English
publishDate 2024-02-01
publisher POSTS&TELECOM PRESS Co., LTD
record_format Article
series 网络与信息安全学报 (Chinese Journal of Network and Information Security)
title Research on multi-granularity password analysis based on LLM
topic large language model
password analysis
natural language processing
word segmentation
url http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2024008