
Of Ciphers and Neurons

Detecting the Type of Ciphers Using Artificial Neural Networks

Nils Kopal

University of Siegen, Germany

nils.kopal@uni-siegen.de

Abstract

There are many (historical) unsolved ciphertexts for which we do not know the type of cipher that was used to encrypt them. A first step every cryptanalyst takes is to try to identify the cipher type using different (statistical) methods. This can be difficult, since a multitude of cipher types exists. To help cryptanalysts, we developed a first version of an artificial neural network that is currently able to differentiate between five classical ciphers: simple monoalphabetic substitution, Vigenère, Playfair, Hill, and transposition. The network is based on Google's TensorFlow library as well as Keras. This paper presents the current progress in the research of using such networks for detecting the cipher type. We tried to classify all ciphers of a new MysteryTwister C3 challenge called "Cipher ID" created by Stamp in 2019. The network is able to classify about 90% of the ciphertexts of the challenge correctly. Furthermore, the paper presents the current state of the art of cipher type detection. Finally, we present a method which shows that one can save about 54% computation time when classifying the cipher types of Stamp's challenge with our artificial neural network instead of trying different solvers on all ciphertext messages.

1 Introduction

Artificial neural networks (ANNs) have experienced a renaissance over the past years, supported by the development of easy-to-use software libraries, e.g. TensorFlow and Keras, as well as a wide range of new, powerful hardware (especially graphics processors and application-specific integrated circuits). ANNs have found use in a broad set of different applications and research fields. Their main purpose is the fast filtering, classifying, and processing of (mostly) non-linear data, e.g. in image processing, speech recognition, and language translation. Besides that, scientists have also been able to "teach" ANNs to play games or to create paintings in the style of famous artists.

Inspired by the vast growth of ANNs, cryptologists also started to use them for different cryptographic and cryptanalytic problems. Examples are the learning of complex cryptographic algorithms, e.g. the Enigma machine, or the detection of the type of cipher used for encrypting a specific ciphertext.

In late 2019, Stamp published a challenge on the MysteryTwister C3 (MTC3) website called "Cipher ID". The goal of the challenge is to assign the correct cipher type to each ciphertext in a set of 500 ciphertexts, where 5 different types of ciphers were used to encrypt them with random keys. Each cipher type was used exactly 100 times, and the ciphertexts were then shuffled. While the author's intention was to motivate people to start research in the field of machine learning and cipher type detection, all previous solvers completed the challenge by breaking the ciphertexts using solvers for the 5 different cipher types. Thus, after revealing the plaintext of each cipher, the participants knew which type of encryption algorithm had been used.

We started to work on the cipher type detection problem in 2019 with the intention to detect the ciphers' types solely using ANNs. TensorFlow (Abadi et al., 2016) and Keras (Chollet, 2015) were used. TensorFlow is a free and open-source data flow and math library developed by Google, written in Python, C++, and CUDA, and publicly released in 2015. Keras is a free and open-source library for developing ANNs, created by Chollet and also written in Python. In 2017, Google's TensorFlow team decided to support Keras in the TensorFlow core library. While we were working on the cipher type detection problem, Stamp's challenge was published. We then adapted our code and tools to the requirements of the challenge. Therefore, in this paper, we present our current progress in implementing a cipher type detection ANN with the help of the aforementioned libraries, especially for the MTC3 challenge. At the time of writing, we are able to classify the cipher types of the aforementioned challenge with a success rate of about 90%. Despite this relatively good detection rate, it is still not good enough to solve the challenge on its own. Therefore, we also propose a first idea for a detection (and solving) method for ciphertexts with unknown cipher types.

Proceedings of the 3rd International Conference on Historical Cryptology, HistoCrypt 2020

The contributions and goals of this paper are:

1. The first public ANN classifier for classical ciphers developed with TensorFlow and Keras.

2. A presentation of the basics of ANNs for the audience of HistoCrypt, who come from different research areas, e.g. history and linguistics (but are mostly not computer scientists).

3. Example Python code which can be used to directly implement our methods in TensorFlow and Keras.

4. An overview of the existing work in the field of ANNs and the cryptanalysis of classical/historical ciphers and cipher type detection.

5. A presentation of a first idea for a method which does both cipher type detection and solving of classical ciphers.

The rest of this paper is structured as follows: Section 2 presents the related work in the field of machine learning and cryptanalysis with a focus on ANNs. Section 3 shows the foundation on which we created our methods: first, we discuss ANNs in general; second, we briefly present TensorFlow as well as Keras. After that, Section 4 presents our cipher type detection approach based on the aforementioned libraries. Then, Section 5 discusses our first ideas for a combined cipher type detection and solving method. Finally, Section 6 briefly concludes the paper and gives an overview of planned future work with regard to ANNs and cryptology.

2 Related Work

In this section, we present different papers and articles which deal with ANNs and cryptology. The usage of ANNs in these papers ranges from the emulation of ciphers over the detection of the cipher type to the recovery of cryptographic keys. There are also papers in which the authors used other techniques to detect the cipher type.

1. Ibrahem (Khalel Ibrahem Al-Ubaidy, 2004) presents two ideas: first, to determine the key from a given plaintext-ciphertext pair, which he calls the "cryptanalysis approach"; second, the emulation of an unknown cipher, which he calls the "emulation approach". He used an ANN with two hidden layers in his approach. For training his model he used the Levenberg-Marquardt (LM) algorithm. He successfully trained networks on the Vigenère cipher as well as on two different stream ciphers (GEFFE and THRESHOLD, which are both based on linear feedback shift registers).

2. Chandra et al. (2007) present their method of cipher type identification. They created different ANNs which are able to distinguish between different modern ciphers, e.g. RC6 and Serpent. Their ANN architecture is comparably small, consisting of only 2 hidden layers, where each layer has at most 25 neurons. They used different techniques to map the ciphertext to 400 "input patterns", which they fed to their network.

3. Sivagurunathan et al. (2010) created an ANN with one hidden layer to distinguish between the Vigenère cipher, the Hill cipher, and the Playfair cipher. While their network was able to detect Playfair ciphers with an accuracy of 100%, the detection rate of Vigenère and Hill was between 69% and 85%, depending on their test scenarios.

4. The BION classifiers from BION's gadget website¹ are browser-based classifiers, integrated in two well-working cipher type detection methods built in JavaScript. The first one works with random decision forests and the second one is based on a multitude of ANNs. The basic idea of the second classifier (ANN-based) is that the different networks (different layers, activation functions, etc.) each have a "vote" for the cipher type. In the end, the votes are shown, and the correct cipher type probably has the most votes. The classifiers are able to detect the cipher types defined by the American Cryptogram Association (ACA).

¹see https://bionsgadgets.appspot.com/

5. Nuhn and Knight's (2014) extensive work on cipher type detection used a support vector machine based on the libSVM toolkit (Chang and Lin, 2011). In their work, they used 58 different features to successfully classify 50 out of the 56 cipher types specified by the American Cryptogram Association (ACA).

6. Greydanus (2017) used recurrent neural networks (RNNs) to learn the Enigma. An RNN has connections going from successive hidden-layer neurons back to neurons in preceding layers. He showed that an RNN with a 3000-unit long short-term memory cell can learn the decryption function of an Enigma machine with three rotors, of a Vigenère cipher, and of a Vigenère Autokey cipher. Furthermore, he created an RNN which was able to recover keys (of length one to six) of the Vigenère and Vigenère Autokey ciphers.

7. Focardi and Luccio (2018) present their method of breaking Caesar and Vigenère ciphers with the help of neural networks. They used fairly simple neural networks having only one hidden layer. They were able to recover substitution keys with a success rate of about 93%, where at most 2 mappings in the keys were wrong.

8. Abd and Al-Janabi (2019) developed three different classifiers based on neural networks. Their work is the most closely related to ours. Their idea is to create three classifiers, each a single ANN, on different levels (1, 2, and 3), where each level increases the detection accuracy. The first level differentiates between natural language, substitution ciphers, transposition ciphers, and combined ciphers. Their second level then differentiates between monoalphabetic, polyalphabetic, and polygraphic ciphers. Their last level differentiates between Playfair and different Hill ciphers. They state that their success rate is about 99.6%.

[Figure 1: A single neuron of an ANN with inputs, outputs, bias, and activation function]

3 Foundation

In this section, we describe the foundation used for our detection method. First, we discuss ANNs in general. Then, we give an introduction to TensorFlow and Keras and show some example Python code building an ANN.

3.1 Artificial Neural Network

Artificial neural networks (ANNs) are computing models (organized as graphs) that are in principle inspired by the human brain. The book "Make Your Own Neural Network" by Rashid (2016) gives a good introduction to ANNs. Different neurons are connected via input and output connections, which carry signals and have different weights assigned to them. A neuron itself contains an activation function a, which fires the neuron's outputs based on the neuron's input values. For example, all the values of the input connections are combined with their respective weight values. Then, all resulting values are summed and a bias value b is added to the result. After that, the activation function is computed on the combined result. Figure 1 depicts an example of one neuron with different input connections, a bias input connection, an activation function a, and output connections. Usually, the value of the bias input connection is set to 1.
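To make this computation concrete, the following minimal sketch (our own illustration, not code from the paper) implements a single neuron's forward pass, using ReLU as an example activation function:

```python
def neuron(inputs, weights, bias):
    """Combine each input with its weight, add the bias value,
    then apply the activation function a (here: ReLU)."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, s)  # ReLU: a(x) = max(0, x)

# Two inputs: 0.5*1.0 + (-0.25)*2.0 + bias 0.1 = 0.1
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))
```

In a full network, many such neurons run in parallel per layer, and learning consists of adjusting the weights and bias values.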

A common practice in ANNs is to organize neurons in so-called layers. The input data is given to an input layer consisting of n different neurons. The input layer is then connected to one or more hidden layers. Finally, the last hidden layer is connected to an output layer. Each neuron of the previous layer is connected to each neuron of the following layer. Figure 2 depicts an example of an ANN with only a single hidden layer.

[Figure 2: An ANN with input, hidden, and output layers]

In general, when working with ANNs having several hidden layers, researchers use the term deep learning (Wartala, 2018).

The learning, in general, is performed by adapting the weights of the connections between the neurons. There exist different methods of learning, e.g. supervised and unsupervised learning. Here, we focus on supervised learning, which is well suited for classification tasks. The input data is given as a so-called feature vector x from the input space X, and the output is a label y from the output space Y. A label, in general, clusters a set of similar input values, i.e. each of the input values of the same cluster is mapped to the same label. The goal is to find a function f : X → Y that maps each element of the input space correctly to the labels of the output space.

As a basic idea, the ANN's connection weights are initialized with random values. Then, a set of data (inputs and desired labels) is fed to the network. While doing so, the actual output labels and the desired labels are compared using a loss function. Using backpropagation, the error is propagated in reverse order through the network, and the weight values of each neuron in each layer are changed accordingly.

Different parameters and attributes of the ANN and the learning process influence the success rate of the learning, e.g. the quality and quantity of the input data and labels, the number of hidden layers of the ANN, the number of neurons in each layer, the types of activation functions used in the neurons, the loss function used, and the number of times the input data is fed to the network.

Usually, the input data and their respective labels are divided into two different sets: training data and test data. For the actual learning, the training data is used. Then, to measure the quality of the ANN, the test data is used. In the best case, after training, the ANN is able to classify the test data correctly. In the worst case, the ANN has only learned the training data (perfectly), but fails to classify the test data. In this case, researchers speak of overfitting.

3.2 TensorFlow and Keras

TensorFlow (Abadi et al., 2016) is a software library developed by Google and first released in 2015. Its name is based on the term "tensor", which describes a mathematical function that maps a specific number of input vectors to output vectors, and on the term "flow", i.e. the idea of different tensors flowing as data streams through a dataflow graph. Keras (Chollet, 2015) is an open-source deep learning Python library and has, since 2017, also been included in TensorFlow.

Working with TensorFlow and Keras (with ANNs) in general consists of the following five steps:

1. Loading and preparing training and test data

2. Creating a model

3. Training the model

4. Testing and optimizing the model

5. Persisting the model

In the following, we describe the above steps involved in the creation, training, and usage of a Keras model. TensorFlow models work on multidimensional Python numpy arrays.

Step 1) First, the data has to be loaded and then split into a test and a training data set. In the following example, we split a data set of 5,000 samples and their corresponding labels (each label corresponds to one output class) into two disjoint sets of training and test data and labels:

# data is a set of data
# labels is a set of labels
# here, we split both into
# two different sets
train_data = data[0:4500]
train_labels = labels[0:4500]
test_data = data[4500:5000]
test_labels = labels[4500:5000]

Step 2) The second step is the creation of a Keras model. TensorFlow and Keras offer different methods of creating a model. The easiest method is to use the sequential model, which creates a multi-layered ANN. An example of creating a simple ANN with an input layer, a single hidden layer, and an output layer is the following:

# create model:
m = keras.Sequential()
# create and add input layer:
m.add(Flatten(input_shape=(100,)))
# create and add hidden layer:
m.add(Dense(100,
            activation='relu',
            use_bias=True))
# create and add output layer:
m.add(Dense(5,
            activation='softmax'))
m.compile(optimizer='adam',
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

The first call creates a sequential Keras model. With the add function, layers are added to the model. We add an input layer with 100 neurons (or features), a hidden layer with 100 neurons, and an output layer with 5 neurons. Each neuron of the next layer is automatically connected to each neuron of the previous layer, as shown in Figure 2. In this example, we classify data with 100 features into 5 different output classes. Some remarks on the parameters: the activation function of the hidden layer is set to the rectified linear unit ('relu'), which is defined as y = max(0, x). The activation function of the output layer is set to 'softmax', which is also known as the normalized exponential function. It maps an input vector to a probability distribution consisting, in our case, of 5 different probabilities. Each probability corresponds to one of the five classes into which we classify the input vectors. The last call is the actual creation of the model using the compile function. Different loss functions, optimizers, and metrics can be used. In our example, we use the 'sparse_categorical_crossentropy' loss function and accuracy as the metric. The Adam optimizer is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments (for details on Adam, see Kingma and Ba (2014)).

Step 3) The next step is to train the newly created model using the prepared training data and labels:

m.fit(train_data, train_labels,
      epochs=20,
      batch_size=32)

Calling the fit function starts the training. In our case, we use the training data and training labels to train the model. The number of epochs defines how many times the model should be trained on the data set; the data is given to the model in a different order each time. The batch size is the number of samples which are fed to the ANN in a single training step.

Step 4) After training, the test data is used for testing the accuracy of the model:

# predict the test data
prediction = m.predict(test_data)
# we count the correct predictions
correct = 0.0
# do the counting
for i in range(0, len(prediction)):
    if test_labels[i] == np.argmax(prediction[i]):
        correct = correct + 1
print('Correct: ', 100.0 * correct / len(prediction))

First, we call the predict function on the model to predict the labels of the test data. After that, to check how accurate the prediction of the trained model is, we count how many times the prediction equals the correct label and calculate the correctness as a percentage. In the end, we output the value to the console.

Step 5) In the last step, we persist the model by storing it in the hierarchical data format (.h5):

# save the model to the hard drive
m.save("mymodel.h5")
# delete the model
del m
# load model from hard drive
m = load_model("mymodel.h5")

After persisting the model, it can be deleted from memory and later be loaded from the hard drive using the load_model function.


4 Our Cipher Type Detection Approach

In this section, we present our cipher type detection approach. First, we give a short overview of the MysteryTwister C3 challenge created by Stamp. Then, we discuss the cipher ID problem as a classification problem. After that, we present our cipher detection ANN in detail (input/hidden/output layers, features, training and test data).

4.1 The MTC3 Cipher ID Challenge

MysteryTwister C3 (MTC3) is an online platform for publishing cryptographic riddles (= challenges). In 2019, Stamp published a cipher type detection challenge² on MTC3, named "Cipher ID – Part 1". The detection of the cipher type of an unsolved ciphertext is a difficult problem, since a multitude of different (classical as well as modern) ciphers exists. E.g. in the DECODE database (Megyesi et al., 2019), there is a huge collection of (historical) ciphertexts of which we do not know the (exact) type of cipher. Without knowing the type, breaking such texts is impossible. Thus, a first cryptanalysis step is always to determine the cipher type. Different metrics, like text frequency analysis and the index of coincidence, are helpful tools and indicators of the type of cipher.
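As an illustration of one such metric, the following sketch (our own code, not taken from the paper) computes the unigram index of coincidence of a ciphertext:

```python
from collections import Counter

def index_of_coincidence(text):
    """IC = sum over letters of f*(f-1), divided by N*(N-1),
    where f is a letter's frequency and N the number of letters."""
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

# English plaintext tends toward ~0.066; a flat letter distribution
# (e.g. produced by a polyalphabetic cipher) toward 1/26 ~ 0.038.
print(index_of_coincidence("ABCABC"))
```

Such values alone rarely pin down the cipher type, which is why they are combined as features for a classifier later in this paper.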

The MTC3 challenge is based on the aforementioned problem of often not knowing the cipher types of historical encrypted texts. The term "Cipher ID" refers to the type of algorithm used, i.e. its "identifier". In the challenge, the participants have to identify the different ciphers that were used for the encryption of a given dataset of 500 ciphertexts, each 100 characters long. The goal is to determine the type of cipher used to encrypt each message. The following ciphers were used exactly 100 times each: simple monoalphabetic substitution cipher, Vigenère cipher, columnar transposition cipher, Playfair cipher, and the Hill cipher. The English plaintexts were randomly taken from the Brown University Standard Corpus³. The set of provided ciphertexts is shuffled.

4.2 Cipher ID as a Classification Problem

The general idea is to treat the detection of the cipher type as a classification problem. Each type of cipher is regarded as a disjoint class; hence, there is a monoalphabetic substitution class, a Vigenère class, etc. Figure 3 depicts the general idea. In the figure, two feature dimensions (A and B) are shown. Based on a cipher's characteristics, features have a stronger or weaker influence on the output. Examples of features are the frequency of the letter 'A' or the index of coincidence. The colored dots (red, green, and blue) represent different ciphertexts. The dots are surrounded by a line showing the class (or cipher) each ciphertext belongs to.

[Figure 3: Ciphertexts (dots) in a multidimensional feature space, classified into three cipher classes (red, green, blue)]

²https://www.mysterytwisterc3.org/en/challenges/level-2/cipher-id-part-1
³Brown University Standard Corpus of Present-Day American English, available for download at http://www.cs.toronto.edu/~gpenn/csc401/a1res.html

With Stamp's challenge, we have 5 different classes, one for each cipher type. The ciphertexts' features are given as input vectors to an ANN, which then classifies each text into one of the aforementioned classes. As output, the ANN returns the ID of the detected cipher.
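As a small illustration (our own sketch; the order of the output neurons and the label names are assumptions, not taken from the paper), mapping the network's output vector back to a cipher ID amounts to taking the argmax over the five class probabilities:

```python
# assumed order of the five output neurons
CIPHER_TYPES = ["substitution", "vigenere", "transposition",
                "playfair", "hill"]

def detected_cipher(softmax_output):
    """Return the label of the output neuron with the highest
    probability (the argmax of the softmax vector)."""
    best = max(range(len(softmax_output)),
               key=lambda i: softmax_output[i])
    return CIPHER_TYPES[best]

print(detected_cipher([0.05, 0.10, 0.70, 0.10, 0.05]))
```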

4.3 A Cipher ID Detection ANN

In the following, we discuss the development of a cipher ID detection ANN based on the steps introduced in Section 3.2. Since it is a trivial step, we omit the persisting step (Step 5).

Step 1: Loading/preparing training/test data
To train an ANN, a sufficient amount of training and test data is needed. In the case of the cipher ID detection ANN, ciphertexts of the types which should be detected are needed. Therefore, we first implemented all 5 ciphers in Python. We also created a Python script which extracts random texts from a local copy of the Gutenberg library. Using this script, we can create an arbitrary amount of different (English) plaintexts of a specific length. After extracting a sufficient amount of plaintexts of length 100 each, we encrypted these with the ciphers, always using randomly generated keys. We created different sets of ciphertext files with different amounts of ciphertexts for each cipher (1,000; 5,000; 50,000; 100,000; and 250,000). Thus, the total amount of ciphertexts provided to the ANN is five times each of those numbers.

Since the ANN is not able to work on text directly, the data has to be transformed into a numerical representation. Our first idea was to directly give each letter as a number to the network, thus having a feature vector of 100 float values. As this led to poor performance of our network, we began experimenting with various other features, i.e. statistical values of the ciphertext. The next step shows our features and the overall ANN.

Step 2: Creating a model We experimented with different features as input values as well as with different numbers of hidden layers, widths of hidden layers, activation functions, optimizers, etc. We now present the final ANN setup which performed best in our tests.

We use the following features:

• 1 neuron: index of coincidence (unigrams)

• 1 neuron: index of coincidence (bigrams)

• 26 neurons: text frequency distribution of unigrams

• 676 neurons: text frequency distribution of bigrams
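The four feature groups can be sketched as follows (our own illustrative code; the paper does not specify its exact normalizations, so the ones used here are assumptions):

```python
from collections import Counter
import string

def feature_vector(text):
    """Build the 704-value input vector: unigram IC, bigram IC,
    26 unigram frequencies, and 676 bigram frequencies."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    n = len(letters)
    bigrams = [a + b for a, b in zip(letters, letters[1:])]
    uni, bi = Counter(letters), Counter(bigrams)
    m = len(bigrams)
    ic_uni = sum(f * (f - 1) for f in uni.values()) / (n * (n - 1))
    ic_bi = sum(f * (f - 1) for f in bi.values()) / (m * (m - 1))
    uni_freq = [uni[c] / n for c in string.ascii_uppercase]
    bi_freq = [bi[a + b] / m
               for a in string.ascii_uppercase
               for b in string.ascii_uppercase]
    return [ic_uni, ic_bi] + uni_freq + bi_freq

vec = feature_vector("DEFENDTHEEASTWALLOFTHECASTLE")
print(len(vec))  # 2 + 26 + 676 = 704
```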

Thus, the ANN has an input layer consisting of a total of 704 input neurons. After that, we create 5 hidden layers, where each layer has a total of

(2/3) · inputSize + outputSize = (2/3) · 704 + 5 ≈ 474    (1)

neurons. Since we have 5 classes of cipher types, the output layer consists of five output neurons, one for each cipher type. In Python, we created the network with the following code:

created the network with the following code:

# sizes of layers
inputSize = 704
outputSize = 5
hiddenSize = 2 * (inputSize / 3) + outputSize

# create ANN model with Keras
model = keras.Sequential()
# create input layer
model.add(keras.layers.Flatten(
    input_shape=(inputSize,)))
# create five hidden layers
for i in range(0, 5):
    model.add(keras.layers.Dense(
        int(hiddenSize),
        activation='relu',
        use_bias=True))
# create output layer
model.add(keras.layers.Dense(
    outputSize,
    activation='softmax'))

The hidden layers' activation function is 'relu' and the output layer's activation function is 'softmax' (see Section 3.2).

Step 3: Training the model We trained different configurations of our model with different amounts of ciphertexts. We used different sizes of training data sets and obtained the following results (output of our test program) with our best model:

Training data: 4,500 ciphertexts
Test data: 500 ciphertexts
- Simple Substitution: 87%
- Vigenere: 75%
- Columnar Transposition: 100%
- Playfair: 80%
- Hill: 32%
Total correct: 74%

Training data: 24,500 ciphertexts
Test data: 500 ciphertexts
- Simple Substitution: 88%
- Vigenere: 54%
- Columnar Transposition: 100%
- Playfair: 93%
- Hill: 64%
Total correct: 79%

Training data: 249,500 ciphertexts
Test data: 500 ciphertexts
- Simple Substitution: 97%
- Vigenere: 63%
- Columnar Transposition: 100%
- Playfair: 99%
- Hill: 70%
Total correct: 86%

Training data: 499,500 ciphertexts
Test data: 500 ciphertexts
- Simple Substitution: 99%
- Vigenere: 63%
- Columnar Transposition: 100%
- Playfair: 97%
- Hill: 67%
Total correct: 87%

Training data: 1,249,500 ciphertexts
Test data: 500 ciphertexts
- Simple Substitution: 100%
- Vigenere: 69%
- Columnar Transposition: 100%
- Playfair: 99%
- Hill: 78%
Total correct: 90%

The first two training runs were done in a few minutes. The third run already took about an hour on an AMD FX8350 with 8 cores. The last two runs took several hours. Since there is a problem with the CUDA support of TensorFlow under the newest Nvidia driver on Microsoft Windows, we could only work with the CPU and not with the GPU, making the test runs quite slow.

During our tests, we saw that by increasing the size of our training data, we could also increase the quality of our detection ANN. Nevertheless, the detection rates of the Vigenère cipher and the Hill cipher are still too low (between 60% and 80%). In our first experiment, ciphertexts encrypted with the Hill cipher were correctly detected in only 32% of the cases, and Vigenère in only 75%. We assume that it is difficult for our ANN to differentiate between those two ciphers, since their statistical values (text frequencies, index of coincidence) are similar.

Step 4: Testing and optimizing the model For optimizing our model (with respect to detection performance), we tested additional features provided to the ANN. Those features are:

• Text frequency distribution of trigrams

• Contains double letters

• Contains letter J

• Chi square

• Pattern repetitions

• Entropy

• Auto correlation

The text frequencies of trigrams had no noticeable influence on the detection rate, but made the training phase much slower, since 26³ = 17,576 additional input neurons were needed. Also, an equivalent number of neurons in the hidden layers was needed. Thus, we removed the trigrams from our experiment.

The "Contains double letters" feature also had no influence. We additionally realized that double letters are already captured by the bigram frequencies. Thus, we removed this feature as well. The same applies to the "Contains letter J" feature. The idea here was that the Playfair cipher merges I and J (I = J); thus, there is no J in the ciphertext.

The chi square feature also had no influence on the detection rate.

With pattern repetitions, we aimed at giving the network an "idea" of the repetitive character of Vigenère ciphertexts. Unfortunately, it did not help to increase the detection rate.

Entropy and auto correlation of the ciphertext were also given as features, but again, no influence on the detection rate was observed.

Finally, we kept only the index of coincidence of unigrams and bigrams as well as the letter frequencies of unigrams and bigrams.

5 Cipher Type Detection and Solving Method for Stamp's Challenge

To actually solve Stamp's challenge, this method brings together the following parts:

• Cipher type detection ANN

• Monoalphabetic substitution solver

• Vigenère solver

• Transposition solver

• Playfair solver

Proceedings of the 3rd International Conference on Historical Cryptology, HistoCrypt 2020

The method consists of the cipher type detection ANN and of solvers for each cipher except the Hill cipher. The basic idea is the following: first, the set of ciphertexts is classified by the cipher detection ANN, so that each ciphertext is assigned a cipher ID. Since we know that only about 90% of the cipher types are classified correctly, we have to check each classification for correctness in order to reach an overall classification correctness of 100%. Thus, each ciphertext is first tested using its corresponding solver, except the ciphertexts marked as Hill cipher. The Hill cipher, especially in the case of a 4x4 matrix in the ciphertext-only setting, is hard to solve.
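To illustrate why a ciphertext-only brute force on a 4x4 Hill key is impractical: the key must be an invertible 4x4 matrix modulo 26, and since Z26 ≅ Z2 × Z13 (Chinese remainder theorem), the number of such keys is |GL₄(Z2)| · |GL₄(Z13)|. The following order-of-magnitude check is our illustration, not a calculation from the paper:

```python
def gl_order(n: int, p: int) -> int:
    """Number of invertible n x n matrices over the prime field GF(p)."""
    order = 1
    for i in range(n):
        order *= p ** n - p ** i
    return order

# Z26 is isomorphic to Z2 x Z13, so the two counts multiply.
hill_keys_4x4 = gl_order(4, 2) * gl_order(4, 13)
print(f"{hill_keys_4x4:.2e}")  # on the order of 10**22 candidate keys
```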

After that, all ciphertexts that could be successfully solved are marked as “correctly classified”. The remaining ciphertexts, which could not be solved using their assigned cipher type, are then tested using the three other solvers. In the end, there should only be a set of 100 ciphertexts (in the case of Stamp’s challenge) which cannot be solved with the four solvers. These 100 remaining ciphertexts must then be the ones encrypted by the Hill cipher. Since no Hill cipher solver is available that performs much better than brute force in the ciphertext-only case, solving them is very time consuming or nearly impractical.
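The classify-then-verify loop described above can be sketched as follows. The `classify` callable and the solver functions are hypothetical stand-ins (the paper uses the ANN and the CrypTool 2 solvers); each solver is assumed to return a plaintext on success and None on failure or timeout.

```python
def classify_and_solve(ciphertexts, classify, solvers):
    """classify: text -> cipher ID; solvers: dict mapping cipher ID -> solver.
    Returns (solved, hill_candidates). Texts that no solver cracks are
    presumed to be Hill-encrypted, for which no solver exists."""
    solved, unsolved = {}, []
    # First pass: try only the solver matching the ANN's classification.
    for text in ciphertexts:
        cipher_id = classify(text)
        solver = solvers.get(cipher_id)  # None for texts classified as Hill
        plaintext = solver(text) if solver else None
        if plaintext is not None:
            solved[text] = (cipher_id, plaintext)
        else:
            unsolved.append(text)
    # Second pass: try every remaining solver on the leftovers.
    hill_candidates = []
    for text in unsolved:
        for cipher_id, solver in solvers.items():
            plaintext = solver(text)
            if plaintext is not None:
                solved[text] = (cipher_id, plaintext)
                break
        else:
            hill_candidates.append(text)
    return solved, hill_candidates
```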

Execution time for classification with additional help of solvers Let S be the time a single solver needs to test a given ciphertext, and assume this time is the same for all solvers. After S time has elapsed, the solver has either produced a correct result or we stop it, since we assume it is the wrong solver for the specific ciphertext. In the case that we do not use the cipher detection ANN, we would need a total of 4 · 500 · S = 2,000 · S time to test each ciphertext with the 4 different solvers. If exactly 100 unsolved ciphertexts remain after executing all solvers, these are most probably texts encrypted using the Hill cipher. In that case, we have solved Stamp’s challenge.

Now, let us assume that testing a ciphertext using the ANN takes only a fraction of S, i.e. the classification time for a single ciphertext is T where T ≪ S. In the real world this is true, since testing all 500 ciphertexts using our ANN takes less than a second. Generally, applying (testing) an ANN is much faster than training it. Since we know that the classification is only about 90% correct, we have to test each ciphertext using the classified cipher type, except those classified as Hill cipher-encrypted. Let us assume that about 100 texts are classified as Hill cipher, so about 400 ciphertexts remain to be analyzed. Since 90% of those 400 texts are already classified correctly, 10% of them (40 texts) remain unsolved. These 40 texts plus the 100 Hill-classified texts now have to be analyzed using all 4 solvers (this can be further optimized by only testing the remaining 10% with the three unused solvers). This leads to the following total amount of time needed for classification:

500 · T + 400 · S + 40 · 3 · S + 100 · 4 · S ≈ 920 · S

where the term 500 · T is so small that it can be left out of the calculation, since T ≪ S. Thus, we have a total execution time saving of about 100% − 100% · (920 · S)/(2,000 · S) = 54% for the classification of the ciphertexts of Stamp’s challenge.
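The saving can be checked numerically. Plugging in the numbers from the text, with T approximated as zero and S set to the one-minute solver budget used as an example:

```python
S = 1.0  # assumed solver time per ciphertext, in minutes
T = 0.0  # ANN classification time, negligible (T << S)

baseline = 4 * 500 * S                                   # all 4 solvers on all 500 texts
with_ann = 500 * T + 400 * S + 40 * 3 * S + 100 * 4 * S  # ANN-guided schedule

saving = 1 - with_ann / baseline
print(baseline, with_ann, round(saving * 100))  # 2000.0 920.0 54
```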

If we assume that a solver needs about one minute to successfully solve a ciphertext, using all solvers for testing would take about 2,000 minutes (about 33 h). Using the ANN to reduce the number of needed solver runs, this drops to 920 minutes (about 15 h). Clearly, in the case of the ANN, the time for training the network also has to be considered, which can take several hours. Nevertheless, this time is only needed once, since the resulting ANN can be reused for further classification tasks. The solvers could also be executed in parallel, which further reduces the overall elapsed time.

6 Conclusion

This paper shows the current progress of our work in the area of artificial neural networks (ANNs) used to detect the cipher types of ciphertexts encrypted with five different classical ciphers: simple monoalphabetic substitution, columnar transposition, Vigenère, Hill, and Playfair. For the creation and training of an ANN consisting of five hidden layers, we used Google’s TensorFlow library and Keras. The goal of our initial research was to solve Stamp’s challenge (see Section 4.1), which required determining the cipher types of 500 ciphertexts encrypted using the aforementioned five classical ciphers. The network was able to detect about 90% of the ciphers correctly. Detection rates for Playfair and Hill were too low to solve the challenge completely. Besides the creation of the ANN, we also proposed a method (see Section 5) for solving the challenge using the ANN as well as different solvers, e.g. from CrypTool 2 (Kopal et al., 2014). Examples of how the solvers of CrypTool 2 can be used are shown in (Kopal, 2018). With the method described in Section 5, about 54% of the execution time could be saved for solving Stamp’s challenge. Another part of this paper is a survey of the related work with respect to ANNs and the cryptanalysis of classical ciphers (see Section 2) and an introduction into the topic for the HistoCrypt audience (see Section 3).

In future work, we want to extend our network (e.g. by using different ANN architectures) and our method (e.g. by finding better features) in order to detect additional and more difficult cipher types. We also want to use these methods in the DECRYPT research project (Megyesi et al., 2020) to further identify the unknown types of several ciphers currently stored in the DECODE database.

Acknowledgments

This work has been supported by the Swedish Research Council, grant 2018-06074, DECRYPT – Decryption of historical manuscripts.

References

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283.

Ahmed J Abd and Sufyan Al-Janabi. 2019. Classification and Identification of Classical Cipher Type Using Artificial Neural Networks. Journal of Engineering and Applied Sciences, 14(11):3549–3556.

B Chandra, P Paul Varghese, Pramod K Saxena, and Shri Kant. 2007. Neural Networks for Identification of Crypto Systems. In IICAI, pages 402–411.

Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27.

François Chollet. 2015. Keras: Deep Learning Library for Theano and TensorFlow. URL: https://keras.io/, 7(8):T1.

Riccardo Focardi and Flaminia L Luccio. 2018. Neural Cryptanalysis of Classical Ciphers. In ICTCS, pages 104–115.

Sam Greydanus. 2017. Learning the Enigma with Recurrent Neural Networks. arXiv preprint arXiv:1708.07576.

Mahmood Khalel Ibrahem Al-Ubaidy. 2004. Black-box Attack Using Neuro-identifier. Cryptologia, 28(4):358–372.

Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.

Nils Kopal, Olga Kieselmann, Arno Wacker, and Bernhard Esslinger. 2014. CrypTool 2.0. Datenschutz und Datensicherheit – DuD, 38(10):701–708.

Nils Kopal. 2018. Solving Classical Ciphers with CrypTool 2. In Proceedings of the 1st International Conference on Historical Cryptology, HistoCrypt 2018, number 149, pages 29–38. Linköping University Electronic Press.

Beáta Megyesi, Nils Blomqvist, and Eva Pettersson. 2019. The DECODE Database: Collection of Historical Ciphers and Keys. In The 2nd International Conference on Historical Cryptology, HistoCrypt 2019, June 23–26 2019, Mons, Belgium, pages 69–78.

Beáta Megyesi, Bernhard Esslinger, Alicia Fornés, Nils Kopal, Benedek Láng, George Lasry, Karl de Leeuw, Eva Pettersson, Arno Wacker, and Michelle Waldispühl. 2020. Decryption of Historical Manuscripts: The DECRYPT Project. Cryptologia, pages 1–15.

Malte Nuhn and Kevin Knight. 2014. Cipher Type Detection. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1769–1773.

Tariq Rashid. 2016. Make Your Own Neural Network. CreateSpace Independent Publishing Platform.

G Sivagurunathan, V Rajendran, and T Purusothaman. 2010. Classification of Substitution Ciphers Using Neural Networks. International Journal of Computer Science and Network Security, 10(3):274–279.

Ramon Wartala. 2018. Praxiseinstieg Deep Learning: Mit Python, Caffe, TensorFlow und Spark eigene Deep-Learning-Anwendungen erstellen. O’Reilly.
