Cross-Domain Feature Enhancement-Based Password Guessing Method for Small Samples
As a crucial component of account protection system evaluation and intrusion detection, the advancement of password guessing technology encounters challenges due to its reliance on password data. In password guessing research, there is a conflict between the traditional models’ need for large traini...
| Main Authors: | Cheng Liu, Junrong Li, Xiheng Liu, Bo Li, Mengsu Hou, Wei Yu, Yujun Li, Wenjun Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Entropy |
| Subjects: | password guessing; small samples; similarity computation; probabilistic context-free grammar |
| Online Access: | https://www.mdpi.com/1099-4300/27/7/752 |
| _version_ | 1849406891727257600 |
|---|---|
| author | Cheng Liu, Junrong Li, Xiheng Liu, Bo Li, Mengsu Hou, Wei Yu, Yujun Li, Wenjun Liu |
| author_sort | Cheng Liu |
| collection | DOAJ |
| description | As a crucial component of account protection system evaluation and intrusion detection, the advancement of password guessing technology encounters challenges due to its reliance on password data. In password guessing research, there is a conflict between the traditional models’ need for large training samples and the limitations on accessing password data imposed by privacy protection regulations. Consequently, security researchers often struggle with the issue of having a very limited password set from which to guess. This paper introduces a small-sample password guessing technique that enhances cross-domain features. It analyzes the password set using probabilistic context-free grammar (PCFG) to create a list of password structure probabilities and a dictionary of password fragment probabilities, which are then used to generate a password set structure vector. The method calculates the cosine similarity between the small-sample password set $B$ from the target area and publicly leaked password sets $A_i$ using the structure vector, identifying the set $A_{max}$ with the highest similarity. This set is then utilized as a training set, where the features of the small-sample password set are enhanced by modifying the structure vectors of the training set. The enhanced training set is subsequently employed for PCFG password generation. The paper uses hit rate as the evaluation metric, and Experiment I reveals that the similarity between $B$ and $A_i$ can be reliably measured when the size of $B$ exceeds 150. Experiment II confirms the hypothesis that a higher similarity between $A_i$ and $B$ leads to a greater hit rate of $A_i$ on the test set of $B$, with potential improvements of up to 32% compared to training with $B$ alone. Experiment III demonstrates that after enhancing the features of $A_{max}$, the hit rate for the small-sample password set can increase by as much as 10.52% compared to previous results. This method offers a viable solution for small-sample password guessing without requiring prior knowledge. |
| format | Article |
| id | doaj-art-45013a48c0804db9ac22161b913aebaf |
| institution | Kabale University |
| issn | 1099-4300 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Entropy |
| spelling | doaj-art-45013a48c0804db9ac22161b913aebaf; 2025-08-20T03:36:14Z; eng; MDPI AG; Entropy; ISSN 1099-4300; 2025-07-01; vol. 27, no. 7, art. 752; DOI 10.3390/e27070752; Cross-Domain Feature Enhancement-Based Password Guessing Method for Small Samples; Cheng Liu (School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China); Junrong Li (School of Computer and Software Engineering, Xihua University, Chengdu 610039, China); Xiheng Liu (School of Computer and Software Engineering, Xihua University, Chengdu 610039, China); Bo Li (School of Computer and Software Engineering, Xihua University, Chengdu 610039, China); Mengsu Hou (School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China); Wei Yu (No. 30 Institute of CETC, Chengdu 610041, China); Yujun Li (School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China); Wenjun Liu (School of Computer and Software Engineering, Xihua University, Chengdu 610039, China); https://www.mdpi.com/1099-4300/27/7/752; keywords: password guessing; small samples; similarity computation; probabilistic context-free grammar |
| title | Cross-Domain Feature Enhancement-Based Password Guessing Method for Small Samples |
| title_sort | cross domain feature enhancement based password guessing method for small samples |
| topic | password guessing; small samples; similarity computation; probabilistic context-free grammar |
| url | https://www.mdpi.com/1099-4300/27/7/752 |
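
The similarity step summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the structure vector holds only the probabilities of PCFG base structures (the fragment-probability dictionary mentioned in the abstract is left out), it classifies characters with ASCII rules, and the names `structure`, `structure_vector`, `cosine_similarity` and `most_similar`, as well as the toy password sets, are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: L = ASCII letters, D = digits, S = any other character.
SEGMENT = re.compile(r"(?P<L>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)")


def structure(password):
    """Map a password to a PCFG-style base structure, e.g. 'abc123!' -> 'L3D3S1'."""
    return "".join(f"{m.lastgroup}{len(m.group())}" for m in SEGMENT.finditer(password))


def structure_vector(passwords):
    """Return a sparse vector {structure: relative frequency} for a password set."""
    counts = Counter(structure(p) for p in passwords)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target, candidates):
    """Pick the candidate set whose structure vector is closest to the target's."""
    target_vec = structure_vector(target)
    scores = {
        name: cosine_similarity(target_vec, structure_vector(passwords))
        for name, passwords in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Made-up toy data; real password sets would be far larger.
    small_set_b = ["abc123", "hello2020", "qwe456", "pass1!"]
    leaked_sets = {
        "A1": ["dog789", "cat111", "sunny2019"],
        "A2": ["123456", "111111", "!!!abc"],
    }
    print(most_similar(small_set_b, leaked_sets))
```

In this sketch the selected set plays the role of $A_{max}$; the feature-enhancement step that modifies the training set's structure vectors is not shown because the record does not specify how it is done.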