Automated Ciphertext-Only Cryptanalysis
of the Bifid Cipher
António Machiavelo    Rogério Reis
Technical Report Series: DCC-2006-1
Departamento de Ciência de Computadores – Faculdade de Ciências
&
Laboratório de Inteligência Artificial e Ciência de Computadores
Universidade do Porto
Rua do Campo Alegre, 823 4150 Porto, Portugal
Tel: +351+2+6078830 – Fax: +351+2+6003654
http://www.ncc.up.pt/fcup/DCC/Pubs/treports.html
António Machiavelo
ajmachia@fc.up.pt
Centro de Matemática da Universidade do Porto
Rogério Reis
rvr@ncc.up.pt
DCC & LIACC, Universidade do Porto
February, 2006
Abstract
In this paper we describe a fully automated ciphertext-only attack on the Bifid cipher,
assuming that the language of the plaintext is known. We have implemented this attack in
Python. An easily computable statistical function is used to find the period of the cipher,
and the key-table is then generated in a fairly efficient way. The search is directed so
that the space of possible solutions is strongly narrowed. The result is a feasible attack
on a Bifid cryptogram, provided that it is long enough for accurate statistical analysis.
1 Introduction
The Bifid cipher [Ame05, Kah67] was invented by Félix-Marie Delastelle (1840-1902) and,
although it was never used in any “serious” application, it became one of the most popular
ciphers among “amateur” cryptologists.
The key consists of a square table, henceforth called the key-table, composed of the
characters of the alphabet, normally a 5 × 5 square with the characters i and j identified
(also called a Polybius key), and a small integer $\ell$, the block size or period, normally
greater than 6. Take, for instance, $\ell = 7$ and the following key-table:
      0 1 2 3 4
  0   u d v g r
  1   q t p z a
  2   h b w f e
  3   x c i k o
  4   n l s y m
The text is divided into blocks of size $\ell$, padded if necessary with some nulls at the
end, and the coordinates of each letter (row, then column) are written underneath it.
Taking, for example, the text

“Err and err and err again but less and less and less”,

one would obtain

  errande rrander ragainb utlessa ndlessa ndlessx
  2001402 0014020 0101342 0142441 4042441 4042443
  4444014 4440144 4434201 0114224 0114224 0114220
The ciphertext is now obtained by recoding each block with the same table: the row
coordinates followed by the column coordinates are read horizontally, from left to right,
and taken in consecutive pairs. For the first block this gives:

  2001402 4444014  ->  20 01 40 24 44 40 14  ->  hdnemna

In our example the resulting cryptogram is:

  hdnemna uavrmdm ddoeysd dsmqtse nsmqtse nsmxtsh
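The encryption step can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' program: the names `KEY_TABLE`, `prepare` and `bifid_encrypt` are illustrative, and the use of `x` as the null is an assumption taken from the padded last block of the example.

```python
KEY_TABLE = ["udvgr", "qtpza", "hbwfe", "xciko", "nlsym"]  # rows 0..4 of the example key-table


def prepare(text, period, null="x"):
    """Keep letters only, identify j with i, and pad the last block with nulls."""
    letters = ["i" if ch == "j" else ch for ch in text.lower() if "a" <= ch <= "z"]
    while len(letters) % period:
        letters.append(null)
    return "".join(letters)


def bifid_encrypt(plaintext, table, period):
    """Encrypt block by block: row coordinates, then column coordinates, re-read in pairs."""
    position = {ch: (r, c) for r, row in enumerate(table) for c, ch in enumerate(row)}
    cipher = []
    for start in range(0, len(plaintext), period):
        block = plaintext[start:start + period]
        stream = [position[ch][0] for ch in block] + [position[ch][1] for ch in block]
        cipher.extend(table[stream[k]][stream[k + 1]] for k in range(0, len(stream), 2))
    return "".join(cipher)


first_block = bifid_encrypt(prepare("Err and e", 7), KEY_TABLE, 7)
print(first_block)  # hdnemna
```

Calling `bifid_encrypt(prepare(text, 7), KEY_TABLE, 7)` on the full example sentence encrypts the remaining blocks in the same way.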
The parity of the period affects some aspects of the cipher, and in fact most authors
[Ame05, Gai39] credit ciphers with an odd period with additional cryptographic strength,
although others present arguments against this [Bow60].
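In general, let $\Sigma$ be the alphabet used, with $|\Sigma| = n^2$ for some
$n \in \mathbb{N}$. Let the key of a specific Bifid cipher be:

\[
\begin{array}{c|cccc}
      & 0 & 1 & \cdots & n-1 \\ \hline
0     & \sigma_{0,0}   & \sigma_{0,1}   & \cdots & \sigma_{0,n-1} \\
1     & \sigma_{1,0}   & \sigma_{1,1}   & \cdots & \sigma_{1,n-1} \\
\vdots & \vdots        & \vdots         & \ddots & \vdots \\
n-1   & \sigma_{n-1,0} & \sigma_{n-1,1} & \cdots & \sigma_{n-1,n-1}
\end{array}
\]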
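When the block size is odd, encryption is done according to the following scheme,

\[
\begin{array}{ccccccc}
\sigma_0 & \sigma_1 & \sigma_2 & \cdots & \sigma_{\ell-3} & \sigma_{\ell-2} & \sigma_{\ell-1} \\
x_0 & x_1 & x_2 & \cdots & x_{\ell-3} & x_{\ell-2} & x_{\ell-1} \\
y_0 & y_1 & y_2 & \cdots & y_{\ell-3} & y_{\ell-2} & y_{\ell-1}
\end{array}
\;\longmapsto\;
\begin{array}{cccccccc}
\tau_0 & \tau_1 & \cdots & \tau_{\frac{\ell-3}{2}} & \tau_{\frac{\ell-1}{2}} & \tau_{\frac{\ell+1}{2}} & \cdots & \tau_{\ell-1} \\
x_0 & x_2 & \cdots & x_{\ell-3} & x_{\ell-1} & y_1 & \cdots & y_{\ell-2} \\
x_1 & x_3 & \cdots & x_{\ell-2} & y_0 & y_2 & \cdots & y_{\ell-1}
\end{array}
\]

while for an even block size,

\[
\begin{array}{ccccccc}
\sigma_0 & \sigma_1 & \sigma_2 & \cdots & \sigma_{\ell-3} & \sigma_{\ell-2} & \sigma_{\ell-1} \\
x_0 & x_1 & x_2 & \cdots & x_{\ell-3} & x_{\ell-2} & x_{\ell-1} \\
y_0 & y_1 & y_2 & \cdots & y_{\ell-3} & y_{\ell-2} & y_{\ell-1}
\end{array}
\;\longmapsto\;
\begin{array}{cccccccc}
\tau_0 & \tau_1 & \cdots & \tau_{\frac{\ell-2}{2}} & \tau_{\frac{\ell}{2}} & \tau_{\frac{\ell+2}{2}} & \cdots & \tau_{\ell-1} \\
x_0 & x_2 & \cdots & x_{\ell-2} & y_0 & y_2 & \cdots & y_{\ell-2} \\
x_1 & x_3 & \cdots & x_{\ell-1} & y_1 & y_3 & \cdots & y_{\ell-1}
\end{array}
\]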
For a fixed period, the number of different Bifid ciphers is actually smaller than the
number of distinct key-tables, because if one applies the very same permutation to both
rows and columns, the resulting key-table yields the same cipher. These are the only
key-tables that yield the same cipher: in order to get the same cipher from a table in
which two rows were swapped, the columns with the same indexes must also be swapped,
because all the diagonal elements must be preserved. This is made obvious by the
observation that a block made by repeating a single character is encrypted to itself if
and only if that character is on the diagonal of the key-table. This shows that, although
the number of different tables is $(n^2)!$, the number of corresponding Bifid ciphers is
only $(n^2)!/n!$.
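The equivalence of key-tables can be checked experimentally on a small case. The sketch below uses a toy $3 \times 3$ table and an arbitrary test string, both chosen purely for illustration; `bifid_encrypt` and `permute_rows_and_columns` are hypothetical helper names, not part of the authors' program.

```python
from itertools import permutations
from math import factorial


def bifid_encrypt(plaintext, table, period):
    """Bifid encryption with a square key-table given as a list of equal-length strings."""
    position = {ch: (r, c) for r, row in enumerate(table) for c, ch in enumerate(row)}
    cipher = []
    for start in range(0, len(plaintext), period):
        block = plaintext[start:start + period]
        stream = [position[ch][0] for ch in block] + [position[ch][1] for ch in block]
        cipher.extend(table[stream[k]][stream[k + 1]] for k in range(0, len(stream), 2))
    return "".join(cipher)


def permute_rows_and_columns(table, perm):
    """Apply the same permutation to the row indexes and to the column indexes."""
    n = len(table)
    return ["".join(table[perm[r]][perm[c]] for c in range(n)) for r in range(n)]


table = ["abc", "def", "ghi"]      # toy key-table, n = 3
text = "badfacehigdebeach"         # arbitrary text over the toy alphabet
reference = bifid_encrypt(text, table, 5)

# Every simultaneous row/column permutation reproduces the reference ciphertext.
equivalent = [perm for perm in permutations(range(3))
              if bifid_encrypt(text, permute_rows_and_columns(table, perm), 5) == reference]

# Swapping two rows without swapping the matching columns changes the cipher.
rows_only = bifid_encrypt(text, [table[1], table[0], table[2]], 5)

# Number of distinct ciphers for the usual 5 x 5 table: (n^2)! / n!
distinct_ciphers = factorial(25) // factorial(5)
```

Here `equivalent` contains all $3! = 6$ permutations, while `rows_only` differs from `reference`.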
2 How to get the period
In this section we will see that the block size, $\ell$, can easily be determined by
computing the frequency of pairs of equal letters at a given distance $d$, i.e. the number
of occurrences of the pattern $\alpha\,\Sigma^{d-1}\alpha$, for all $\alpha$, and graphing
the results as a function of $d$. The shape obtained will be approximately a sinusoid, with
the “right” period. That this is so will now be shown in detail for the case in which the
period is odd. A similar argument applies to the even case.
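Counting equal letters at distance $d$ takes only a few lines. The following is a sketch of that statistic alone; the function names are illustrative, and normalising by the number of compared positions is an assumption made so that values for different $d$ are comparable.

```python
def equal_pair_counts(ciphertext, max_distance):
    """Return {d: number of positions i with ciphertext[i] == ciphertext[i + d]}."""
    counts = {}
    for d in range(1, max_distance + 1):
        counts[d] = sum(1 for a, b in zip(ciphertext, ciphertext[d:]) if a == b)
    return counts


def equal_pair_frequencies(ciphertext, max_distance):
    """Same statistic, divided by the number of compared positions for each d."""
    counts = equal_pair_counts(ciphertext, max_distance)
    return {d: counts[d] / (len(ciphertext) - d)
            for d in counts if len(ciphertext) > d}


toy_counts = equal_pair_counts("abcabcabc", 6)
```

Plotting the values returned by `equal_pair_frequencies` against $d$ for a real cryptogram gives the graph discussed in this section.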
2.1 Distribution of frequencies of homogeneous non-connected digraphs
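For $\alpha \in \Sigma$, let $\mathrm{row}(\alpha)$ and $\mathrm{col}(\alpha)$ denote the
row and column, respectively, of the key-table to which $\alpha$ belongs. Let $p(\alpha)$
represent the probability of an occurrence of $\alpha$ in the text, and $p(\alpha\beta)$
the corresponding probability for the digraph $\alpha\beta$. By $\alpha^t$, the transpose
of $\alpha$, we denote the key-table entry that satisfies
$\mathrm{row}(\alpha^t) = \mathrm{col}(\alpha)$ and
$\mathrm{col}(\alpha^t) = \mathrm{row}(\alpha)$. The following quantities play an important
role in what follows:

\begin{align}
\rho_\alpha &= \sum_{\beta \in \mathrm{row}(\alpha)} p(\beta), \tag{1} \\
\kappa_\alpha &= \sum_{\beta \in \mathrm{col}(\alpha)} p(\beta), \tag{2} \\
B_\alpha &= \sum_{\substack{\beta \in \mathrm{row}(\alpha) \\ \gamma \in \mathrm{row}(\alpha^t)}} p(\beta\gamma), \tag{3} \\
C_\alpha &= \rho_\alpha \kappa_\alpha, \tag{4} \\
A_\alpha &= \sum_{\substack{\beta \in \mathrm{col}(\alpha) \\ \gamma \in \mathrm{col}(\alpha^t)}} p(\beta\gamma). \tag{5}
\end{align}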
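For a known key-table, the quantities (1) to (5) can be evaluated directly from monograph and digraph probabilities. The sketch below follows the five definitions literally; the function name `bifid_quantities` and the uniform toy statistics are assumptions made only for illustration.

```python
from fractions import Fraction


def bifid_quantities(table, mono, digram):
    """Return {letter: {'rho', 'kappa', 'B', 'C', 'A'}} following equations (1)-(5)."""
    n = len(table)
    rows = [set(row) for row in table]
    cols = [{table[r][c] for r in range(n)} for c in range(n)]
    result = {}
    for r in range(n):
        for c in range(n):
            # The letter at (r, c) has its transpose at (c, r):
            # row(alpha^t) is row c and col(alpha^t) is column r.
            rho = sum(mono[b] for b in rows[r])
            kappa = sum(mono[b] for b in cols[c])
            b_val = sum(digram.get(b + g, 0) for b in rows[r] for g in rows[c])
            a_val = sum(digram.get(b + g, 0) for b in cols[c] for g in cols[r])
            result[table[r][c]] = {"rho": rho, "kappa": kappa,
                                   "B": b_val, "C": rho * kappa, "A": a_val}
    return result


# Toy data: a 2 x 2 table with uniform letter and digraph probabilities.
toy_table = ["ab", "cd"]
toy_mono = {ch: Fraction(1, 4) for ch in "abcd"}
toy_digram = {x + y: Fraction(1, 16) for x in "abcd" for y in "abcd"}
toy_quantities = bifid_quantities(toy_table, toy_mono, toy_digram)
```

With real data, `mono` and `digram` would come from statistics of the plaintext language.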
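We start by determining the probability, $P(\tau_i = \alpha)$, that the $i$-th letter of a
ciphertext block is equal to a particular letter $\alpha \in \Sigma$. This probability
depends on whether the letter $\tau_i$ occurs in the section before the central position of
the block, in that central position, or in the section after it. The symbols $B$, $C$ and
$A$ given above stand for “before”, “central” and “after”. Let us consider the three cases,
as displayed in the following figure (indexes are counted from 0, as in the encryption
scheme of Section 1):

\[
\begin{array}{ccc|c|ccc}
       & B        &        & C                        &        & A                           &        \\ \hline
\cdots & \tau_i   & \cdots & \tau_{\frac{\ell-1}{2}}  & \cdots & \tau_{\frac{\ell+1}{2}+j}   & \cdots \\
\cdots & x_{2i}   & \cdots & x_{\ell-1}               & \cdots & y_{2j+1}                    & \cdots \\
\cdots & x_{2i+1} & \cdots & y_0                      & \cdots & y_{2j+2}                    & \cdots
\end{array}
\qquad \left(0 \le i, j \le \tfrac{\ell-3}{2}\right)
\]

In the first case:
\begin{equation}
P(\tau_i = \alpha) = P(\sigma_{2i} \in \mathrm{row}(\alpha) \wedge \sigma_{2i+1} \in \mathrm{row}(\alpha^t)) = B_\alpha. \tag{6}
\end{equation}
In the second case:
\begin{equation}
P(\tau_{\frac{\ell-1}{2}} = \alpha) = P(\sigma_{\ell-1} \in \mathrm{row}(\alpha) \wedge \sigma_0 \in \mathrm{col}(\alpha)) \simeq \rho_\alpha \kappa_\alpha = C_\alpha. \tag{7}
\end{equation}
Finally, in the last case:
\begin{equation}
P(\tau_{\frac{\ell+1}{2}+j} = \alpha) = P(\sigma_{2j+1} \in \mathrm{col}(\alpha^t) \wedge \sigma_{2j+2} \in \mathrm{col}(\alpha)) = A_\alpha. \tag{8}
\end{equation}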
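The probability, $P_d$, that two letters at distance $d$ are the same can now be expressed
in terms of the quantities $B_\alpha$, $C_\alpha$, $A_\alpha$ as:

\[
P_d = \sum_\alpha (B_\alpha^2 + A_\alpha^2)\, P^{(HH)}_d
    + \sum_\alpha C_\alpha^2\, P^{(MM)}_d
    + \sum_\alpha C_\alpha (B_\alpha + A_\alpha)\, P^{(HM)}_d
    + \sum_\alpha B_\alpha A_\alpha\, P^{(HH')}_d,
\]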
where:

\begin{align*}
P^{(HH)}_d &= P(\text{the letters are both in the same ``half'' ($B$ or $A$) of the blocks they belong to}), \\
P^{(MM)}_d &= P(\text{both letters are in the middle position of their block}), \\
P^{(HM)}_d &= P(\text{one letter is in the middle position of its block while the other is not}), \\
P^{(HH')}_d &= P(\text{one of the letters is in the $B$ half and the other in the $A$ half}).
\end{align*}
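It is not too hard to see that these probabilities depend only on $d$ modulo $\ell$, and
that one has, for $1 \le d \le \ell$:

\begin{align*}
P^{(HH)}_d &= \left| 1 - \frac{2d}{\ell} \right| - \frac{1}{\ell}, \\
P^{(MM)}_d &= \begin{cases} 0, & d < \ell \\ \frac{1}{\ell}, & d = \ell, \end{cases} \\
P^{(HM)}_d &= \begin{cases} \frac{2}{\ell}, & d < \ell \\ 0, & d = \ell, \end{cases} \\
P^{(HH')}_d &= \begin{cases} 1 - \frac{1}{\ell} - \left| 1 - \frac{2d}{\ell} \right|, & d < \ell \\ 0, & d = \ell. \end{cases}
\end{align*}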
For example, the first equality may be established as follows. For a homogeneous
non-connected digraph of distance $d$, let $j$ be the position of its leftmost character
within its block of the ciphertext. We have to consider two separate cases:
$d \le \frac{\ell-1}{2}$ and $d > \frac{\ell-1}{2}$. In the first case, illustrated at the
top of Figure 1, for the digraph to be entirely contained in a $B$ or $A$ “half” of a
block, one must have

\[
\left( 0 \le j < \tfrac{\ell-1}{2} \;\wedge\; j + d < \tfrac{\ell-1}{2} \right)
\;\vee\;
\left( \tfrac{\ell-1}{2} < j < \ell \;\wedge\; j + d < \ell \right).
\]

The number of values of $j$ that satisfy the above conditions is $\ell - 2d - 1$.
In the second case, illustrated at the bottom of Figure 1, one is led to the conditions

\[
\left( 0 \le j < \tfrac{\ell-1}{2} \;\wedge\; \ell \le j + d < \ell + \tfrac{\ell-1}{2} \right)
\;\vee\;
\left( \tfrac{\ell-1}{2} < j < \ell \;\wedge\; \ell + \tfrac{\ell-1}{2} < d + j < 2\ell \right),
\]

and the number of values of $j$ in this case is $-\ell + 2d - 1$.
Hence the result above for $P^{(HH)}_d$.

[Figure 1: Illustration for the (HH) case. Top: $d \le \frac{\ell-1}{2}$; bottom: $d > \frac{\ell-1}{2}$.]
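The four position probabilities can be checked by brute force for any odd period: enumerate every starting position $j$ in a block and classify where the letters at $j$ and $j + d$ fall. The sketch below does this with exact fractions; the function names are illustrative.

```python
from fractions import Fraction


def zone(position, period):
    """Classify a 0-based position inside a block of odd length as 'B', 'M' or 'A'."""
    middle = (period - 1) // 2
    if position < middle:
        return "B"
    return "M" if position == middle else "A"


def enumerated(d, period):
    """Tally, over all starting positions j, how the pair (j, j + d) falls."""
    tally = {"HH": 0, "MM": 0, "HM": 0, "HH'": 0}
    for j in range(period):
        first, second = zone(j, period), zone((j + d) % period, period)
        if first == second == "M":
            tally["MM"] += 1
        elif "M" in (first, second):
            tally["HM"] += 1
        elif first == second:
            tally["HH"] += 1
        else:
            tally["HH'"] += 1
    return {name: Fraction(count, period) for name, count in tally.items()}


def closed_form(d, period):
    """The closed-form expressions given in the text, for 1 <= d <= period."""
    gap = abs(1 - Fraction(2 * d, period))
    inverse = Fraction(1, period)
    if d < period:
        return {"HH": gap - inverse, "MM": Fraction(0),
                "HM": 2 * inverse, "HH'": 1 - inverse - gap}
    return {"HH": gap - inverse, "MM": inverse, "HM": Fraction(0), "HH'": Fraction(0)}


agree = all(enumerated(d, period) == closed_form(d, period)
            for period in (3, 5, 7, 9, 11, 13)
            for d in range(1, period + 1))
```

The four values always sum to 1, since every pair of positions falls into exactly one of the four classes.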
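Setting

\begin{align*}
r &= \sum_\alpha (B_\alpha^2 + A_\alpha^2), &
s &= \sum_\alpha C_\alpha^2, \\
u &= \sum_\alpha C_\alpha (B_\alpha + A_\alpha), &
v &= \sum_\alpha B_\alpha A_\alpha,
\end{align*}

we get from the above that

\begin{equation}
P_d =
\begin{cases}
r + \frac{1}{\ell}(2u - v - r) - \frac{2d}{\ell}(r - v), & d \le \frac{\ell-1}{2} \\[4pt]
2v - r + \frac{1}{\ell}(2u - v - r) + \frac{2d}{\ell}(r - v), & \frac{\ell-1}{2} < d < \ell \\[4pt]
r + \frac{1}{\ell}(s - r), & d = \ell.
\end{cases}
\tag{9}
\end{equation}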
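Notice that
$r - v = \sum_\alpha \left( B_\alpha^2 + A_\alpha^2 - B_\alpha A_\alpha \right)
= \sum_\alpha \left( (B_\alpha - \tfrac{1}{2} A_\alpha)^2 + \tfrac{3}{4} A_\alpha^2 \right) \ge 0$,
and that $P_i = P_{\ell-i}$ for $i = 1, 2, \ldots, \frac{\ell-1}{2}$. So one concludes that:

\begin{equation}
P_1 \ge P_2 \ge \cdots \ge P_{\frac{\ell-1}{2}} = P_{\frac{\ell+1}{2}} \le \cdots \le P_{\ell-2} \le P_{\ell-1} = P_1. \tag{10}
\end{equation}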
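The symmetry and ordering in (10) can be confirmed numerically from the piecewise expression (9). In the sketch below the values of $r$, $s$, $u$ and $v$ are made up for illustration (chosen only so that $r \ge v$), and `p_d` is a hypothetical helper name.

```python
from fractions import Fraction


def p_d(d, period, r, s, u, v):
    """Equation (9) for an odd period and 1 <= d <= period."""
    if d == period:
        return r + Fraction(s - r, period)
    base = Fraction(2 * u - v - r, period)
    slope = Fraction(2 * d, period) * (r - v)
    if d <= (period - 1) // 2:
        return r + base - slope
    return 2 * v - r + base + slope


# Hypothetical sums, not taken from any real cryptogram.
r, s, u, v = Fraction(1, 10), Fraction(1, 12), Fraction(1, 11), Fraction(1, 25)
period = 9
half = (period - 1) // 2
values = {d: p_d(d, period, r, s, u, v) for d in range(1, period + 1)}

symmetric = all(values[i] == values[period - i] for i in range(1, half + 1))
decreasing = all(values[i] >= values[i + 1] for i in range(1, half))
```

Both `symmetric` and `decreasing` hold for any choice with $r \ge v$, which is exactly what (10) states.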
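On the other hand,

\begin{align*}
\ell (P_\ell - P_1) &= \bigl(\ell r + (s - r)\bigr) - \bigl(\ell r + (2u - v - r) - 2(r - v)\bigr) \\
&= 2r + s - 2u - v \\
&= \sum_\alpha \left( 2B_\alpha^2 + 2A_\alpha^2 + C_\alpha^2 - 2C_\alpha B_\alpha - 2C_\alpha A_\alpha - B_\alpha A_\alpha \right).
\end{align*}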
The quadratic form $2x^2 + 2y^2 + z^2 - 2zx - 2zy - yx$ is not positive definite and thus
can take on negative values. In fact, for some cryptograms we have observed a negative sum
on the right-hand side of the last equality. However, looking at $P_\ell - P_2$ one obtains
a sum of quadratic expressions that are positive definite. Therefore $P_\ell > P_2$.
All this proves the following:

Theorem 1 The function
\[
f : d \longmapsto P_d
\]
evaluated over a cryptogram obtained from a Bifid cipher of period $\ell$ is approximately
a periodic function with the same period.

This is well illustrated by the example in Figure 2.

[Figure 2: Analysis for a cipher of period 9. Horizontal axis: distance $d$, from 1 to 50.]
2.2 Distribution of the standard deviation for non-connected digraphs
For some keys the graph of the distribution function $f : d \mapsto P_d$ is especially
flat, making the above method very hard to apply (see Figure 3). However, if we take the
function $f' : d \mapsto \mathrm{std}_d$, where $\mathrm{std}_d$ is the standard deviation
of the frequencies of the non-connected digraphs of distance $d$, its graph will reveal the
half period as its maximum value (see Figures 4 and 5). Pairs of letters in the cryptogram
that are not at distances equal to, or around, half of the period (for the even and odd
period cases respectively, see Figure 6) come from non-contiguous letters in the text, and
this tends to flatten the statistical signature of the original language. For those special
distances, however, the digraph statistical peculiarities of the original language cause a
slight increase of the standard deviation.
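A possible way to compute this statistic is sketched below. It is an assumption of this sketch that the standard deviation is the population standard deviation taken over all $|\Sigma|^2$ possible digraphs, including those that never occur; the text does not fix this detail. The function names and the 25-letter default alphabet (without j) are illustrative.

```python
from collections import Counter
from statistics import pstdev


def digraph_stddev(ciphertext, d, alphabet):
    """Standard deviation of the relative frequencies of digraphs at distance d."""
    total = len(ciphertext) - d
    if total <= 0:
        raise ValueError("distance must be smaller than the ciphertext length")
    pairs = Counter(zip(ciphertext, ciphertext[d:]))
    frequencies = [pairs[(a, b)] / total for a in alphabet for b in alphabet]
    return pstdev(frequencies)


def stddev_profile(ciphertext, max_distance, alphabet="abcdefghiklmnopqrstuvwxyz"):
    """Return {d: std_d} for d = 1 .. max_distance."""
    return {d: digraph_stddev(ciphertext, d, alphabet)
            for d in range(1, max_distance + 1)}


toy_value = digraph_stddev("abab", 2, "ab")
```

The distance at which `stddev_profile` peaks is then read off as the candidate half period.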
[Figure 3: Example of a cryptogram that defeats the method of Section 2.1 (period 8).]
[Figure 4: The example presented in Figure 3, solved by the method of Section 2.2.]
3 How to find the key
All previously known attacks on the Bifid cipher rely on the knowledge of some cribs
(pieces of known plaintext). Because we intend to design a ciphertext-only attack, we
essentially rely on the knowledge of statistical properties of the original language,
although we also use some results of Bowers [Bow60].
In the cryptanalysis of the Bifid cipher we used three different kinds of clues, which are
described below.
[Figure 5: Standard deviation test for a cryptogram of period 7. Horizontal axis: distance $d$, from 1 to 50.]
[Figure 6: Interference of coordinates for odd and even periods.]
3.1 Row and column differential probabilities
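Observe that:

\begin{align*}
\sum_{\beta \in \mathrm{row}(\alpha)} B_\beta
&= \sum_{\beta \in \mathrm{row}(\alpha)} \sum_{\substack{\gamma \in \mathrm{row}(\beta) \\ \delta \in \mathrm{row}(\beta^t)}} p(\gamma\delta)
 = \sum_{\beta \in \mathrm{row}(\alpha)} \sum_{\substack{\gamma \in \mathrm{row}(\alpha) \\ \delta \in \mathrm{row}(\beta^t)}} p(\gamma\delta) \\
&= \sum_{\gamma \in \mathrm{row}(\alpha)} \sum_{\beta \in \mathrm{row}(\alpha)} \sum_{\delta \in \mathrm{row}(\beta^t)} p(\gamma\delta)
 = \sum_{\substack{\gamma \in \mathrm{row}(\alpha) \\ \delta \in \Sigma}} p(\gamma\delta)
 = \sum_{\beta \in \mathrm{row}(\alpha)} p(\beta).
\end{align*}

This, and an entirely analogous observation for
$\sum_{\beta \in \mathrm{col}(\alpha)} A_\beta$, leads to:

Theorem 2
\begin{equation}
\sum_{\beta \in \mathrm{row}(\alpha)} \bigl( B_\beta - p(\beta) \bigr) = 0, \tag{11}
\end{equation}
and
\begin{equation}
\sum_{\beta \in \mathrm{col}(\alpha)} \bigl( A_\beta - p(\beta) \bigr) = 0. \tag{12}
\end{equation}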
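Equality (11) can be verified numerically for an arbitrary digraph distribution. The sketch below draws random digraph probabilities over a toy $3 \times 3$ alphabet and takes $p(\beta)$ to be the marginal probability of the first letter of a digraph; the seed, the table and the name `theorem2_row_sums` are all illustrative assumptions.

```python
import random


def theorem2_row_sums(table, digram):
    """For each row, return (sum of B over the row, sum of p over the row)."""
    n = len(table)
    rows = [set(row) for row in table]
    # Monograph probability taken as the marginal of the first letter of a digraph.
    mono = {}
    for (first, _second), p in digram.items():
        mono[first] = mono.get(first, 0.0) + p

    def b_value(r, c):
        # B for the letter at (r, c): first letter in row r, second in row c = row(alpha^t).
        return sum(digram[(g, d)] for g in rows[r] for d in rows[c])

    return [(sum(b_value(r, c) for c in range(n)), sum(mono[ch] for ch in rows[r]))
            for r in range(n)]


rng = random.Random(2006)
table = ["abc", "def", "ghi"]
letters = "".join(table)
weights = {(g, d): rng.random() for g in letters for d in letters}
total = sum(weights.values())
digram = {pair: w / total for pair, w in weights.items()}
sums = theorem2_row_sums(table, digram)
```

Each pair in `sums` agrees up to floating-point rounding, whatever the key-table, because summing $B_\beta$ over a row lets the second letter of the digraph range over the whole alphabet.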
Using the first of these equalities, which states that the probability deviations
$B_\beta - p(\beta)$ of the entries of each row must sum to approximately zero, we can
generate all the acceptable key-tables, although no information is available on the order
of the entries in each row. To speed up this process, and to avoid generating each
key-table more than once, we direct the choice of the entries for each row by means of a
sort of generalized Dirichlet principle: the value of at least one summand must be at least
the value of the sum divided by the number of summands. To choose the first element of each
row, we are free to choose one with maximal absolute deviation from the set of unused
letters (after all, it has to belong to some row). The search for the other letters is
narrowed by considering only the ones selected by the principle just stated. The equality
(12) is used only at a later stage of the algorithm.
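One way to organise such a directed search is sketched below. It is an interpretation of the description above, not the authors' code: each row is opened with the unused letter of maximal absolute deviation, the remaining letters are tried in increasing order of deviation, and a branch is cut as soon as the pigeonhole bound shows that the row sum can no longer come back within the tolerance. The names `completions` and `row_compositions`, and the toy deviations, are hypothetical.

```python
def completions(candidates, start, k, low, high):
    """Yield k letters from candidates[start:] whose deviations sum into [low, high].

    candidates is a list of (deviation, letter) pairs sorted by increasing deviation.
    """
    if k == 0:
        if low <= 0 <= high:
            yield ()
        return
    for index in range(start, len(candidates) - k + 1):
        deviation, letter = candidates[index]
        # Pigeonhole pruning: the k remaining summands are all >= deviation,
        # so their sum is at least k * deviation.
        if k * deviation > high:
            break
        for rest in completions(candidates, index + 1, k - 1,
                                low - deviation, high - deviation):
            yield (letter,) + rest


def row_compositions(deviation, n, epsilon):
    """Yield every split of the letters into rows of n with |row sum| <= epsilon."""
    def fill(unused):
        if not unused:
            yield []
            return
        # The letter with maximal absolute deviation must sit in some row,
        # so it opens the next one; this also avoids generating a split twice.
        first = max(sorted(unused), key=lambda ch: abs(deviation[ch]))
        others = sorted((deviation[ch], ch) for ch in unused if ch != first)
        for rest in completions(others, 0, n - 1,
                                -epsilon - deviation[first], epsilon - deviation[first]):
            row = (first,) + rest
            for tail in fill(unused - set(row)):
                yield [row] + tail
    yield from fill(frozenset(deviation))


toy = {"a": 0.3, "b": -0.3, "c": 0.1, "d": -0.1}  # made-up deviations B - p
tables = list(row_compositions(toy, 2, 0.01))
```

For the toy deviations the only acceptable split pairs `a` with `b` and `c` with `d`.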
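3.2 Clues on row and column membership
The following scheme, based on an observation that goes back at least to Bowers [Bow60],

\[
\begin{array}{ccccccc}
\cdots & \sigma_{2i} & \sigma_{2i+1} & \sigma_{2i+2} & \sigma_{2i+3} & \sigma_{2i+4} & \cdots \\
\cdots & x_{2i} & x_{2i+1} & x_{2i+2} & x_{2i+3} & x_{2i+4} & \cdots \\
\cdots & y_{2i} & y_{2i+1} & y_{2i+2} & y_{2i+3} & y_{2i+4} & \cdots
\end{array}
\]
\[
\downarrow
\]
\[
\begin{array}{ccccccccc}
\cdots & \tau_i & \tau_{i+1} & \tau_{i+2} & \cdots & \tau_{\frac{\ell-1}{2}+i} & \tau_{\frac{\ell+1}{2}+i} & \tau_{\frac{\ell+1}{2}+i+1} & \cdots \\
\cdots & x_{2i} & x_{2i+2} & x_{2i+4} & \cdots & \ast & y_{2i+1} & y_{2i+3} & \cdots \\
\cdots & x_{2i+1} & x_{2i+3} & \ast & \cdots & y_{2i} & y_{2i+2} & y_{2i+4} & \cdots
\end{array}
\]

makes it clear that, for an odd period greater than 5, if a five-letter group appears
repeated in the plaintext at an odd position relative to the period (counting positions
from 1, that is, at an even index $2i$ in the 0-based notation of the scheme), this will
produce the following repetition in the ciphertext: ABW?XCD and ABY?ZCD, where W and Y
belong to the same row, while X and Z belong to the same column of the key-table. This can
be used to shorten the row generation phase described above, as well as to validate its
output. Information about row membership is used directly in the generation, while
information on column membership can be used as a negative filter.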
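A simple scan for such repetitions is sketched below. It assumes the ciphertext is aligned on block boundaries and the odd period is already known; the function name `membership_clues` and the decision to count each pair of matching occurrences once are illustrative choices, not the authors' procedure.

```python
from collections import Counter, defaultdict
from itertools import combinations


def membership_clues(ciphertext, period):
    """Count same-row and same-column pairs suggested by AB W .. X CD repetitions."""
    half = (period - 1) // 2
    occurrences = defaultdict(list)
    for start in range(0, len(ciphertext) - period + 1, period):
        block = ciphertext[start:start + period]
        for i in range(half - 2):
            # A, B come from the row coordinates, C, D from the column coordinates.
            context = (block[i], block[i + 1], block[half + i + 1], block[half + i + 2])
            # W shares its row coordinate, X its column coordinate, with the group.
            occurrences[context].append((block[i + 2], block[half + i]))
    same_row, same_column = Counter(), Counter()
    for found in occurrences.values():
        for (w, x), (y, z) in combinations(found, 2):
            if w != y:
                same_row[frozenset((w, y))] += 1
            if x != z:
                same_column[frozenset((x, z))] += 1
    return same_row, same_column
```

The pairs returned may include coincidental matches, so the counts are best treated as evidence to be filtered rather than as certain facts.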
3.3 Clues on the diagonal and transposed pairs
In a ciphertext generated from a random text one should expect

\begin{equation}
B_\alpha \simeq \rho_\alpha \rho_{\alpha^t}, \quad \forall \alpha \in \Sigma. \tag{13}
\end{equation}

Therefore, $\max\{B_\alpha : \alpha \in \Sigma\}$ should be attained at the diagonal letter
of the row with maximum $\rho$ value. Moreover, the values $B_\alpha$ ($\alpha \in \Sigma$)
should be distributed in pairs, since
$B_\alpha \simeq \rho_\alpha \rho_{\alpha^t} \simeq B_{\alpha^t}$. Actual experiments show
that in a ciphertext generated from a real text, among the top five values of the sequence
$B_\alpha$ ($\alpha \in \Sigma$) one tends to observe a letter of the diagonal and a pair
of transposed letters. Entirely similar remarks hold for the sequence $A_\alpha$
($\alpha \in \Sigma$).
In order to crack the Bifid cipher, besides the composition of all rows and columns, one
needs to know the composition of the diagonal. Together with the period, this determines
the cipher. To obtain the diagonal member of a given row, the following observation can be
used. From the ciphertext one readily obtains the value $B_\alpha$ for all the row members.
From the original language statistics, one computes the right-hand side of (3) using the
given row for both variables. This value identifies the letter $\alpha$ in the row that has
the right $B_\alpha$ value and thus belongs to the diagonal. After obtaining a candidate
for the key-table in this way, one can then use (3) once more to further verify its
soundness. In the end, one tries to decrypt the cryptogram using the candidate key, and
validates the resulting text by running it through a Friedman test [Bau97, MvOV96].
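The selection of the diagonal member of a row can be sketched as follows. The function name `diagonal_candidate`, the argument layout and the toy numbers are assumptions for illustration; the observed $B$ values would come from the ciphertext and the digraph probabilities from language statistics.

```python
def diagonal_candidate(row, observed_b, digram):
    """Pick the letter of `row` whose observed B value best matches the diagonal prediction.

    For a diagonal letter row(alpha) = row(alpha^t), so the right-hand side of (3)
    is the sum of p(beta gamma) with both beta and gamma taken from the given row.
    """
    expected = sum(digram.get(first + second, 0.0) for first in row for second in row)
    best = min(row, key=lambda letter: abs(observed_b[letter] - expected))
    return best, expected


# Toy numbers, made up for illustration only.
toy_digram = {"aa": 0.10, "ab": 0.05, "ba": 0.05, "bb": 0.10}
toy_observed = {"a": 0.29, "b": 0.12}
letter, prediction = diagonal_candidate("ab", toy_observed, toy_digram)
```

In practice the difference between the best and the second-best letter would be compared against an error bound before accepting the candidate.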
4 The program
As mentioned before, this method was successfully implemented as a computer program. The
program has approximately 60K characters of well commented Python [pyt], and can obtain a
solution for a cryptogram in less than 20 seconds in the majority of cases, although it can
take a couple of hours for some others. Because most of the tests and properties on which
this method relies are of a statistical nature, some limits for error tolerance must be
tuned and provided as parameters to the algorithm. The structure of the program is best
understood with the help of the scheme in Figure 7.

[Figure 7: Scheme of the program structure. Blocks A, B, C, D, E and F, with the error tolerances $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$.]

Given a cryptogram and the respective period, a first block, A, computes statistics for the
frequencies of each character in the positions of the cipher blocks corresponding to $B$,
$C$ and $A$ (see Figure 1). Because this depends only on the value of the period and on the
cryptogram itself, it is computed only once. Statistics for the original language of the
text must also be available, for both monograph and digraph occurrences.
The method described in Section 3.2 depends only on the data already computed in block A.
A new program block, B, uses this information to infer the largest possible number of pairs
that must belong to the same row (or to the same column). However, because some pattern
repetitions can be caused by “noise” introduced by the cipher and not by pattern
repetitions in the text, some of the pairs can be erroneous, making the whole set of
restrictions unsound. The transitive closure of the constraint pairs is evaluated, and the
largest sound set of restrictions is collected, disregarding all those with fewer
occurrences than the first one that gives rise to a contradiction. This method does not
ensure that all the restrictions kept are valid, although no counterexample was found in
the numerous test runs of the program. This module contributes heavily to the reduction of
the search space in block C, but alone it is not enough, not even close, to allow the rest
of the search to be done by brute force. It is easy to see that although 16 well chosen
pairs determine the row composition, it is possible to have 41 pairs and still be unable to
infer such composition. Even with texts of more than 100K characters, for some keys the
number of pairs inferred does not exceed 5 (!!). Once again this shows that the method
described in Section 3.2 is not capable of solving the problem by itself.
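The filtering of constraint pairs might be organised as in the sketch below. Two points are assumptions of this sketch rather than statements of the text: a “contradiction” is taken to mean that the transitive closure produces a class with more than $n$ letters (more than fit in one row), and pairs seen exactly as often as the first contradictory pair are dropped together with it. The names `classes_of` and `sound_pairs` are hypothetical.

```python
def classes_of(pairs):
    """Transitive closure of a list of pairs, returned as a list of disjoint sets."""
    groups = []
    for a, b in pairs:
        touching = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    return groups


def sound_pairs(pair_counts, n):
    """Keep the most frequent same-row pairs that do not overfill a row of n letters."""
    ordered = sorted(pair_counts.items(), key=lambda item: item[1], reverse=True)
    accepted = []
    for pair, count in ordered:
        if any(len(group) > n for group in classes_of(accepted + [pair])):
            # First contradiction: discard this pair and everything seen as often or less.
            return [p for p, c in ordered if c > count]
        accepted.append(pair)
    return accepted


toy_counts = {("a", "b"): 5, ("c", "d"): 4, ("b", "c"): 2, ("e", "f"): 1}
kept = sound_pairs(toy_counts, 2)
```

In the toy data the pair `("b", "c")` would merge four letters into one row of size 2, so it and the less frequent `("e", "f")` are discarded.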
The next program block, C, is responsible for generating the row composition, i.e. which
elements are inscribed in each row, regardless of the order of the entries inside each row
and of the order of the rows. Using the restrictions produced by block B, the program tries
to fill the rest of the table taking into consideration what was described in Section 3.1.
The equality (11) must be treated as an approximation, because it is evaluated using two
different sets of data: the one pertaining to the plaintext (which we do not know) is
replaced by the language statistics; the other can be obtained directly from the
ciphertext. To evaluate the validity of the approximation, an error tolerance value
$\epsilon_1$ is thus needed. It is not possible to establish the “right” value of
$\epsilon_1$ for all cryptograms. The same value can be too loose for one problem, giving
rise to a search space larger than what is feasible to visit, and too tight for another,
leaving the search space empty or, even worse, devoid of valid solutions. The solution to
this problem was to assume a rather conservative value for $\epsilon_1$, one that normally
yields a valid solution but does not take too much time to compute; in the event of a set
of solutions without any valid one, the process is repeated with a larger value. Experience
has shown that successive increments of 10% give good results. Each candidate solution is
generated by a recursive call of the main method in this program block, and then goes
through all the other stages of generation and validation in the other blocks. If at any
stage a validation concludes that the key-table (or some other preliminary structure)
generated is not acceptable, the process stops for that instance and backtracks to the last
point where another choice is available. This use of recursion and depth-first search keeps
the amount of memory used relatively modest, so that memory is not the bottleneck of the
whole process.
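The tolerance schedule can be expressed as a small driver loop. This is only a sketch of the retry policy described above: `search` stands for whatever routine generates and validates candidates for a given tolerance, and the names `search_with_relaxation`, `growth` and `max_rounds` (including the cap on the number of rounds) are assumptions introduced here.

```python
def search_with_relaxation(search, epsilon, growth=1.10, max_rounds=20):
    """Run `search(epsilon)`; while it yields nothing, retry with epsilon increased by 10%."""
    for _ in range(max_rounds):
        solutions = list(search(epsilon))
        if solutions:
            return epsilon, solutions
        epsilon *= growth
    return epsilon, []


def toy_search(epsilon):
    """Stand-in for the real search: succeeds only once the tolerance is loose enough."""
    return ["candidate key"] if epsilon >= 0.0125 else []


final_epsilon, found = search_with_relaxation(toy_search, 0.01)
```

Because `search` is consumed lazily through `list`, it can equally be a generator that backtracks internally, as in a depth-first search.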
After the row composition is established (in program block C), it is time to find the order
inside each row, i.e. the composition of the columns. This is accomplished in block D,
using exactly the same method as in the previous block, and with the same value for the
error tolerance $\epsilon_1$. If no solutions are found, the program backtracks to block C
with a higher value for $\epsilon_1$.
As mentioned above, the key-table is completely determined only after the composition of
all rows and columns has been established and the elements of the diagonal have been
identified. This is the role of block E. Using the observations made in Section 3.3, these
letters are identified in each row as being the ones that satisfy equation (13) for
$\alpha = \alpha^t$. For the same reasons explained before, an error bound $\epsilon_2$ is
needed to evaluate acceptance. If no solution is provided by this block, the whole program
returns to block C, relaxing $\epsilon_2$.
Finally, block F evaluates each key-table. Using the key-table in question, an attempt is
made to decrypt the cryptogram, and the Friedman test is applied to the resulting text. All
candidates whose coefficient differs from the known Friedman coefficient of the original
language by less than an error $\epsilon_3$ are accepted. The program does not try to
adjust the value of this limit $\epsilon_3$. If some adjustment is necessary, it must be
done “manually”, and the whole process restarted. If no solutions are produced at the end
of this block, the program returns to block C, alternating some relaxation of the limits
$\epsilon_1$ and $\epsilon_2$. The solutions that pass this final test, as well as a
segment of the tentative decryption, are printed in the output. At this point, human text
recognition is the ultimate test. Amazingly, it is not uncommon to get “wrong” translations
of the cryptogram that pass the Friedman test with better results than the original text!
If the human does not find a suitable key in the set proposed by the program, which we
found to be quite uncommon, the whole process should be restarted with all error limits
($\epsilon_1$, $\epsilon_2$ and $\epsilon_3$) increased.
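The final acceptance test can be sketched as follows, assuming that the Friedman coefficient meant here is the index of coincidence of the tentative decryption. The function names are illustrative, and the reference value for the language must be supplied by the caller.

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two distinct positions of `text` hold the same letter."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def passes_friedman(text, language_coefficient, epsilon):
    """Accept a tentative decryption whose coefficient is within epsilon of the language's."""
    return abs(index_of_coincidence(text) - language_coefficient) < epsilon


toy_ic = index_of_coincidence("aabb")
```

As the text notes, a wrong key can occasionally score better than the right one, so this check filters candidates without settling the question.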
5 Concluding remarks
The method presented here, and its implementation as a computer program, makes it possible
to crack any sufficiently large cryptogram in an acceptable time, provided that the
language of origin is known. Because it relies essentially on heavy statistical analysis,
it is not a system for solving small cryptographic puzzles such as those for which the
Bifid cipher is currently used, that is, for recreational purposes. “Typographical” ciphers
such as this one simply do not have good security properties when used for large messages.
If modified to better resist this kind of attack, they become too cumbersome to be used
“by hand”.
As far as this attack is concerned, there is no difference in cryptographic resistance
between ciphers of even and odd period. Some relations apply to the occurrence statistics
in part $C$ of the ciphertext block, which exists only in odd-period ciphers, but because
of their lesser significance no advantage could be obtained from them.
Although some cryptograms can be harder to break than others, because of the “noise”
induced by the cipher, which forces a relaxation of the error bounds in the program, no
key-table should be considered weaker than the others. Those characteristics belong to the
coupling of message and cipher, and not to the cipher alone. On the other hand, key-tables
that show a good distribution of the maximum differential probabilities in both rows and
columns seem to be stronger with respect to Bowers’ attack. This results in a much smaller
set of a priori information that can be deduced about the composition of rows and columns.
This can slow down the process of cracking, but not to the point of preventing it.
References
[Ame05] American Cryptogram Association. The ACA and You: A handbook for the members of the American Cryptogram Association, 2005.
[Bau97] F. L. Bauer. Decrypted Secrets. Methods and Maxims of Cryptology. Springer, 1997.
[Bow60] William Maxwell Bowers. Practical Cryptanalysis (Volume II). The American Cryptogram Association, 1960.
[Gai39] Helen Fouché Gaines. Cryptanalysis. A study of ciphers and their resolution. Dover Publications, 1939.
[Kah67] David Kahn. The Codebreakers. The Story of Secret Writing. Scribner, 1967.
[MvOV96] A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996.
[pyt] Python "official" web page. http://www.python.org.