Cross-Domain Feature Enhancement-Based Password Guessing Method for Small Samples

Bibliographic Details
Main Authors: Cheng Liu, Junrong Li, Xiheng Liu, Bo Li, Mengsu Hou, Wei Yu, Yujun Li, Wenjun Liu
Format: Article
Language: English
Published: MDPI AG, 2025-07-01
Series: Entropy
Online Access: https://www.mdpi.com/1099-4300/27/7/752
Abstract: Password guessing is a crucial component of evaluating account protection systems and of intrusion detection, but its advancement is hindered by its reliance on password data. Traditional password guessing models need large training samples, while privacy protection regulations restrict access to password data. Consequently, security researchers often have only a very small password set from which to build a guessing model. This paper introduces a small-sample password guessing method based on cross-domain feature enhancement. The method analyzes a password set with a probabilistic context-free grammar (PCFG) to build a list of password structure probabilities and a dictionary of password fragment probabilities, from which it generates a structure vector for the set. Using these structure vectors, it computes the cosine similarity between the small-sample password set $B$ from the target domain and each publicly leaked password set $A_i$, and identifies the set $A_{\max}$ with the highest similarity. $A_{\max}$ then serves as the training set, and the features of the small-sample set are enhanced by modifying the structure vector of this training set. Finally, the enhanced training set is used for PCFG password generation.
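The selection and enhancement steps described above can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes PCFG base structures of the usual letter/digit/symbol form (for example `L8D3S1`), builds the structure vector from structure probabilities only (the paper also uses fragment probabilities), and models "feature enhancement" as a linear blend of the two structure distributions with an assumed weight. All function names are illustrative.

```python
import math
import re
from collections import Counter

# PCFG-style segmentation: maximal runs of letters (L), digits (D), other symbols (S).
TOKEN = re.compile(r"(?P<L>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)")


def base_structure(password: str) -> str:
    """Return the base structure, e.g. 'password123!' -> 'L8D3S1'."""
    return "".join(f"{m.lastgroup}{len(m.group())}" for m in TOKEN.finditer(password))


def structure_vector(passwords: list[str]) -> dict[str, float]:
    """Structure probabilities of a password set, as a sparse vector keyed by structure."""
    counts = Counter(base_structure(p) for p in passwords)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; a structure missing on one side counts as 0."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar_set(target, candidates):
    """Score each leaked set A_i against the small sample B; return (name of A_max, all scores)."""
    vb = structure_vector(target)
    scores = {name: cosine_similarity(vb, structure_vector(pw)) for name, pw in candidates.items()}
    return max(scores, key=scores.get), scores


def enhance(train_vec, target_vec, weight=0.3):
    """Hypothetical enhancement: pull A_max's structure distribution toward B's.

    `weight` is an assumed parameter. A convex combination of two probability
    distributions is itself a probability distribution, so no renormalization is needed.
    """
    keys = set(train_vec) | set(target_vec)
    return {k: (1 - weight) * train_vec.get(k, 0.0) + weight * target_vec.get(k, 0.0) for k in keys}


if __name__ == "__main__":
    B = ["alice2020", "bob1999!", "carol88"]  # toy small sample
    leaked = {
        "A1": ["password123", "qwerty", "letmein1"],
        "A2": ["david2019", "eve1987!", "frank12", "qwerty"],
    }
    best, scores = most_similar_set(B, leaked)
    print(best, scores)  # A2 shares three of B's structures; A1 shares none
    enhanced = enhance(structure_vector(leaked[best]), structure_vector(B))
    print(sorted(enhanced.items()))
```

Keeping the vectors as sparse dictionaries avoids fixing a global structure vocabulary in advance; structures absent from one set simply contribute zero to the dot product.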
The paper uses hit rate as the evaluation metric. Experiment I shows that the similarity between $B$ and $A_i$ can be measured reliably once $B$ contains more than 150 passwords. Experiment II confirms the hypothesis that the more similar $A_i$ is to $B$, the higher the hit rate achieved by training on $A_i$ when tested on the test set of $B$, with improvements of up to 32% over training on $B$ alone. Experiment III shows that after the features of $A_{\max}$ are enhanced, the hit rate on the small-sample password set increases by up to 10.52% compared with the results before enhancement. The method thus offers a viable solution for small-sample password guessing without requiring prior knowledge.
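PCFG guess generation and the hit-rate metric can be sketched the same way. This is a hypothetical illustration under stated assumptions: it enumerates guesses from a toy PCFG, scoring each guess as its structure probability times the probabilities of the fragments that fill its slots, and defines hit rate as the fraction of distinct test passwords found among the guesses. The fragment-dictionary layout, the guess budget, and the distinct-password counting rule are assumptions; the abstract does not specify them.

```python
import itertools
import re

SEGMENT = re.compile(r"([LDS])(\d+)")  # parses structures such as 'L5D4'


def generate_guesses(structure_probs, fragment_probs, budget):
    """Rank every structure/fragment expansion by probability and keep the top `budget`.

    structure_probs: {'L5D4': 0.6, ...}
    fragment_probs:  {('L', 5): {'alice': 0.7, ...}, ('D', 4): {...}, ...}  (assumed layout)
    """
    candidates = []
    for structure, p_struct in structure_probs.items():
        slots = [(kind, int(length)) for kind, length in SEGMENT.findall(structure)]
        choices = [fragment_probs.get(slot, {}) for slot in slots]
        if not slots or not all(choices):
            continue  # unparseable structure, or a slot no known fragment can fill
        for combo in itertools.product(*(c.items() for c in choices)):
            prob = p_struct
            for _, p in combo:
                prob *= p
            candidates.append((prob, "".join(frag for frag, _ in combo)))
    candidates.sort(key=lambda t: (-t[0], t[1]))  # most probable first, ties alphabetical
    return [guess for _, guess in candidates[:budget]]


def hit_rate(guesses, test_set):
    """Fraction of distinct test passwords that appear among the guesses."""
    targets = set(test_set)
    return len(targets & set(guesses)) / len(targets) if targets else 0.0


if __name__ == "__main__":
    structures = {"L5D4": 0.6, "L5D2": 0.4}
    fragments = {
        ("L", 5): {"alice": 0.7, "frank": 0.3},
        ("D", 4): {"2020": 0.5, "1999": 0.5},
        ("D", 2): {"88": 1.0},
    }
    guesses = generate_guesses(structures, fragments, budget=3)
    print(guesses)  # ['alice88', 'alice1999', 'alice2020']
    print(hit_rate(guesses, ["alice2020", "bob1999!", "frank88"]))
```

In an evaluation like the one described, `structure_probs` and `fragment_probs` would presumably be learned from the (enhanced) training set and `test_set` would be the held-out part of $B$.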
Affiliations:
Cheng Liu, Mengsu Hou, Yujun Li: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Junrong Li, Xiheng Liu, Bo Li, Wenjun Liu: School of Computer and Software Engineering, Xihua University, Chengdu 610039, China
Wei Yu: No. 30 Institute of CETC, Chengdu 610041, China
Citation: Entropy 2025, 27(7), 752
DOI: 10.3390/e27070752
ISSN: 1099-4300
Keywords: password guessing; small samples; similarity computation; probabilistic context-free grammar