Automatic vowels selection and ranking in Russian enciphered texts

This work was developed while teaching cryptanalysis to students. The course includes the study of the statistics of (encrypted Russian) texts. The purpose of the training is to learn how to exploit the redundancy of a text and to decrypt a cryptogram without the key. One of the most comfort...

Bibliographic Details
Main Author: Yuri I. Petrenko
Format: Article
Language: English
Published: Plekhanov Russian University of Economics 2018-03-01
Series: Открытое образование (Москва)
Online Access: https://openedu.rea.ru/jour/article/view/495
author Yuri I. Petrenko
collection DOAJ
description This work was developed while teaching cryptanalysis to students. The course includes the study of the statistics of (encrypted Russian) texts. The purpose of the training is to learn how to exploit the redundancy of a text and to decrypt a cryptogram without the key. Simple substitution and similar encryption systems, which are presented in most cryptography courses, are among the most convenient for learning. This paper presents a method for automatically separating vowels and consonants in Russian texts, which exposes some of the redundancy of the cipher text. In addition, the method greatly facilitates the decryption of some other symmetric ciphers that can be reduced to simple substitution.
The aim of this work is to develop and implement a method for the automatic selection of vowels in Russian texts enciphered by simple substitution and similar encryption systems.
According to Shannon's theory, unambiguous decoding of a text requires that the redundancy of the text exceed the entropy of the key. After the separation of vowels and consonants, the redundancy of the text increases by up to one bit per symbol, which makes it possible to break shorter encrypted texts. Moreover, the separation of vowels and consonants greatly simplifies the cryptanalysis of some ciphers. For instance, cryptanalysis of the best-known encryption method, simple substitution, requires selecting one of N! possible keys (where N is the number of letters in the alphabet). For the Russian language this is 33!, or nearly 2^123 options. After the separation of vowels and consonants, the selection is made from 10!*23!, or nearly 2^96 options. The number of combinations is reduced by a factor of about one hundred million, which makes the cryptanalysis much easier.
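The key-space figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original record; the variable names are illustrative, and the split of 10 vowels and 23 consonants is taken from the abstract.

```python
import math

# Alphabet sizes as stated in the abstract: 33 letters, 10 of them vowels.
N = 33
VOWELS = 10
CONSONANTS = N - VOWELS

# Number of simple-substitution keys before and after the vowel/consonant split.
full_keys = math.factorial(N)
split_keys = math.factorial(VOWELS) * math.factorial(CONSONANTS)

# Key entropy in bits (log2 of the number of equally likely keys).
full_bits = math.log2(full_keys)
split_bits = math.log2(split_keys)

# Reduction factor; this equals the binomial coefficient C(33, 10).
reduction = full_keys // split_keys

print(f"33!      is about 2^{full_bits:.1f}")
print(f"10!*23!  is about 2^{split_bits:.1f}")
print(f"reduction factor: {reduction:,}")
```

The reduction factor is the number of ways to choose which 10 of the 33 cipher symbols are vowels, which is why it matches the "one hundred million" order of magnitude given in the abstract.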
The program that implements this method first builds a matrix of bigram probabilities for the text. From this matrix it calculates the Markov criterion, defined as the difference between the conditional probabilities of vowel-consonant and vowel-vowel bigrams. For an alphabet of N characters, the program finds, by exhaustive search, the combination of a given number k of “vowels” that maximizes the Markov criterion. The order in which new “vowels” appear for k = 1, 2, 3... reflects their decreasing “strength” and can be used to separate vowels from consonants. In texts of sufficient length, an approximate ranking of the vowel set is possible. A more accurate ranking is obtained when the increments of the Markov criterion are used as the measure of “symbol strength”. The algorithm can be greatly accelerated by using some techniques from the steepest descent method. Tests showed that the Markov criterion is independent of the text’s author and is unimodal for long texts. Using this criterion, the algorithm can separate vowels from consonants in short texts (up to 100 characters) and rank vowels in texts as short as 250-500 letters. The Markov criterion statistics of the letters “ь” and “ъ” were found to be similar to those of the standard vowels, so the method cannot separate these two letters from the standard vowels. The test results showed that the Markov criterion method can be used for cryptanalysis of short Russian texts as well as texts in other consonant languages.
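One possible reading of the criterion described above can be sketched as follows. This is an illustration under stated assumptions, not the author's program: the function names are hypothetical, non-letter characters are simply dropped (so letters across word boundaries count as adjacent), and the criterion is computed as P(consonant follows | vowel) minus P(vowel follows | vowel) from raw bigram counts.

```python
from collections import Counter
from itertools import combinations


def bigram_counts(text):
    """Count adjacent letter pairs; non-letters are dropped (an assumption)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(zip(letters, letters[1:])), sorted(set(letters))


def markov_criterion(counts, vowels):
    """P(consonant follows | vowel) - P(vowel follows | vowel) for a candidate set."""
    vc = vv = 0
    for (first, second), n in counts.items():
        if first in vowels:
            if second in vowels:
                vv += n  # candidate "vowel" followed by another candidate "vowel"
            else:
                vc += n  # candidate "vowel" followed by a "consonant"
    total = vc + vv
    # A candidate set that never starts a bigram gets the worst possible score.
    return (vc - vv) / total if total else float("-inf")


def best_vowel_set(text, k):
    """Exhaustive search over all k-symbol subsets of the observed alphabet."""
    counts, alphabet = bigram_counts(text)
    return max(combinations(alphabet, k),
               key=lambda cand: markov_criterion(counts, set(cand)))
```

On very small inputs many candidate sets tie, so the result is only meaningful for texts of roughly the lengths the abstract mentions. Exhaustive search over 10-symbol subsets of a 33-letter alphabet means about 9.3*10^7 candidates, which is consistent with the abstract's remark that steepest-descent techniques are used to speed the search up.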
format Article
id doaj-art-1e361b5ce7d54135938d37af0fda7b31
institution DOAJ
issn 1818-4243
2079-5939
language English
publishDate 2018-03-01
publisher Plekhanov Russian University of Economics
record_format Article
series Открытое образование (Москва)
spelling doaj-art-1e361b5ce7d54135938d37af0fda7b31; 2025-08-20T03:00:35Z; eng; Plekhanov Russian University of Economics; Открытое образование (Москва); ISSN 1818-4243, 2079-5939; 2018-03-01; vol. 22, no. 1, pp. 59-69; DOI 10.21686/1818-4243-2018-1-59-69; 396; Automatic vowels selection and ranking in Russian enciphered texts; Yuri I. Petrenko (Moscow Aviation Institute (National Research University)); https://openedu.rea.ru/jour/article/view/495; cryptanalysis; separation of vowels and consonants; markov criterion; ranking of vowels; consonant language
title Automatic vowels selection and ranking in Russian enciphered texts
topic cryptanalysis
separation of vowels and consonants
markov criterion
ranking of vowels
consonant language
url https://openedu.rea.ru/jour/article/view/495