sEncrypt: An Encryption Algorithm Inspired From
Biological Processes
Oliver Bonham-Carter1, Abhishek Parakh1,2, Dhundy Bastola1
1School of Interdisciplinary Informatics
2Nebraska University Center for Information Assurance
University of Nebraska at Omaha
Omaha, NE, 68182, USA
Email: {obonhamcarter, aparakh, dkbastola}@unomaha.edu
Abstract—We present a new conceptual methodology for realiz-
ing encryption involving trap-door functions built from biological
processes. Many standard encryption methods such as RSA
security, for example, utilize functions that are easy to compute in
one direction but the reverse is a computationally hard problem
without a key. In biology, trap-door-like functions can be
created from natural phenomena such as the process of creating
protein sequences. A fragment of DNA can be transformed to
protein easily; however, given a protein sequence, it is very hard
to convert the protein information back to DNA. In essence,
protein creation is a lossy function and if we keep certain
side-information secret, then a trap-door like function can be
constructed from this mechanism that is ideal for encryption.
We propose sEncrypt (sequence Encrypt), a model inspired
by the central dogma of biology to encode, encrypt, decrypt
and decode plain text using publicly-available sequence data
from bioinformatics research. We evaluate the entropy of the
cipher text to show randomness of characters and show by
autocorrelation tests that the encrypted text of our method
contains no repetition which could form potential weaknesses.
These tests and results show that the sEncrypt framework
constitutes a good encryption framework for use in information
exchange.
Index Terms—DNA Encryption, DNA Decryption, Coding,
Latin Squares, sEncrypt.
I. INTRODUCTION AND RELATED WORK
In many modern information security technologies such as
encryption, key exchange, password protection and similar
kinds of security, the protocols which provide the actual
security are likely built out of functions similar to trap-door
functions [8]. When computing in one direction across one-
way functions, the task is trivial, but computing in the reverse
direction generally cannot be performed in feasible time [19].
However, trap-door one-way functions are easy to invert when
using a key.
To secure information, there are many different kinds of
algorithms available which rely on trap-door functions or other
functions for which the inverse is very difficult to find without
a key. For instance, the RSA algorithm [31] is based on the
presumed difficulty of factoring large integers (the factoring
problem).
A different kind of encryption, the Advanced Encryption
Standard (AES), originally called Rijndael, is a cryptographic
algorithm using symmetric block ciphers for protecting elec-
tronic data [7]. This forms a part of the Federal Information
Processing Standard. Serpent is a similar encryption system
using symmetric key block ciphers [1]. Twofish also uses a
symmetric key block cipher, but with a block size of 128 bits
and key sizes up to 256 bits [33].
The AES, Serpent and Twofish algorithms employ the
substitution-permutation network method to confuse and dif-
fuse output bits based on the input bits of the plain text. This
network forms a series of operations which are hard to invert
due to the near impossibility of constructing the input informa-
tion from the output bits of the substitution-permutation net-
work without the key. Additionally, these described methods
also satisfy Shannon’s confusion and diffusion properties [34]
which imply that the inter-character associations have been
removed when constructing the cipher text.
Other algorithms with low power consumption have been
developed. Notable among them is the RC series of algorithms,
RC4 being the most popular. Quasigroup based encryption
algorithms have been explored in [2], [3], [11]. At the same
time, quantum cryptography [5], [23] has shown promise for
perfectly secure communication; however, it remains far from
practical use because of limitations in hardware design.
“Encryption less” secure storage of data by dividing it into
partitions has been explored in [24].
DNA watermarking, a system to identify fraudulent se-
quences or the unauthorized use of genetically modified or-
ganisms, has recently gained attention. In [12], DNA-Crypt, a
method to secretly mark sequences, is proposed using concepts
from both encryption and steganography. By their method,
the authors propose that information be conveniently binary-
encrypted using algorithms such as AES, RSA or Blowfish.
This encrypted information may eventually be placed into the
DNA of a living organism for long term storage. Since DNA
is likely to undergo mutations, the information must first be
protected by correction codes such as the Hamming-code or
WDH-code [35] to ensure that the information is unaltered.
Finally, the encrypted information is converted to DNA form
and is placed into organismal DNA where it cannot be found,
except by those that placed it there.
2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications
978-0-7695-5022-0/13 $26.00 © 2013 IEEE
DOI 10.1109/TrustCom.2013.43
Encryption is another area where biological processes have
contributed. The literature contains many methods which
involve wet-lab techniques. In [18], a wet-lab method of
encryption is presented employing primers (i.e., small and
unique strands of DNA used for locating specific regions
of DNA in a solution and here used as keys) to find a
specific region of DNA corresponding to a binary-encoded
plain text. Also employing primers, Gehani et al. [10] pro-
posed DNA-driven cryptography methods based on one-time
pads (encryption by modular addition) of nearly unlimited
size using DNA as a data structure. Encryption techniques
based on number conversion, DNA digital coding and PCR
(polymerase chain reaction) amplification are being explored
by [27]. Although wet-lab methods have been well-received
by the community, they may be much slower than computer-
algorithmic approaches such as the ones proposed in [13] and
[32].
In this paper we propose sEncrypt (sequence Encryption), a
new conceptual technique for encryption of plain text that uses
the process of DNA to protein translation as its foundation.
The proposed technique leverages two observations: first, the
existence of publicly available databases containing DNA
sequences of millions of organisms and second, the many-to-one
mapping between DNA codons and their corresponding amino
acids.
The mounting availability of publicly available DNA sequence
data allows us to use terabytes of DNA data to form
a part of our encryption key. In other words, if one were
to randomly choose one of these sequences (belonging to a
particular organism), to encrypt the message (plaintext) to
be sent, then the identity of this organism could serve as a
part of the secret key that would need to be conveyed to the
recipient. Upon receiving the encrypted data and the secret key
(organism ID), the recipient could go to one of these public
repositories and download the corresponding DNA sequence
to decrypt the data. The existence of terabytes of DNA
data belonging to millions of different organisms makes it
difficult to determine the DNA of which organism was used
for encryption.
The second observation, many-to-one mapping, enables us
to create a trap-door like function in which some amount of
side information is needed to invert the function. This
many-to-one mapping arises from the fact that the 64 possible
combinations of three DNA bases encode only 20 amino acids.
Amino acids form the building blocks of proteins. Further, for
different organisms, this mapping differs in frequency of use.
In other words, the frequency with which a given DNA codon
maps to a given amino acid differs between organisms. This
information forms the second part of the secret key.
We show that the proposed encryption algorithm performs
well to increase the entropy of the input sequence (indicating
a highly random looking output sequence). This is one of the
properties a good cipher text should have. Further, we perform
auto-correlation tests between the outputs of different input
sequences to determine if the proposed technique introduces
any similar looking structure.
A. Background on DNA to Protein Translation
DNA (Deoxyribonucleic acid) is comprised of a sequence of
“bases” (molecules) called adenine (A), guanine (G), cytosine
(C) and thymine (T), sometimes also referred to as nucleotides.
Therefore, a strand of DNA can be represented as a string
of characters A, G, T and C. Further, DNA exists in the
form of a double-stranded helical structure and each strand runs
anti-parallel to the other based on certain pairing rules of
molecules.
DNA is first converted into mRNA by a process called
transcription and then mRNA is translated into amino acids
that form proteins. There are 20 amino acids. However, the
translation of DNA to an amino acid involves forming groups
of three bases called codons that correspond to one amino-
acid. Since there are four possible bases (A, G, T and C) and
a codon consists of three bases, a total of 64 possible codons
can be constructed. However, only 20 amino acids exist. As
a result, multiple codons translate to a single amino acid while
some codons act as stop and start signals for the translation
process.
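The many-to-one codon mapping described above can be made concrete with a short sketch. It assumes the standard genetic code (the paper does not state which translation table was used), written over DNA bases in the conventional T, C, A, G order. The names `CODON_TABLE` and `translate` are illustrative and not part of sEncrypt.

```python
from collections import Counter
from itertools import product

# Standard genetic code (an assumption), indexed by first, second and third
# base in T, C, A, G order. '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon (length must be a multiple of 3)."""
    if len(dna) % 3:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Number of codons that map to each amino acid (or to stop).
degeneracy = Counter(CODON_TABLE.values())

print(len(CODON_TABLE), len(degeneracy) - 1)  # codons, amino acids (stop excluded)
print(degeneracy["L"])                        # codons encoding leucine
print(translate("TGCGGTTTTTTG"))              # the CtDNA example of Table I
```

Under this assumption the CtDNA of Table I translates to the CtProtein listed there (C, G, F, L).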
B. Lossy Biological Functions
In essence the DNA to protein mapping is a many-to-one
function. Therefore, obtaining the original sequence of DNA
from the protein is not trivial because of its lossy nature. As
we will see later, a function similar to a trap-door function
may be constructed from this system to obtain the original
sequence of DNA using a key.
In the set of 20 protein amino acids, most have between
two and six different codons from DNA that encode them.
For instance, leucine, a protein amino acid, can be encoded
by six different DNA codons. By this redundancy, it becomes
increasingly hard to determine the exact DNA sequence when
starting at the protein sequence and working backward. Basic
problems cannot always be answered, such as, whether the
same codon is used each time to encode a particular protein
amino acid, or whether there is no such logic. As the protein
sequence gets longer, the number of candidates grows
exponentially, since a sequence of $n$ leucines has $6^n$ possible
DNA formations. This assumes a uniform probability of mapping
a codon to its corresponding protein amino acid. In [22],
codon-use frequency tables have been created which show that
the translation process varies by organism.
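To give a sense of scale for the counting argument above, the following sketch multiplies the number of codons available to each amino acid in a protein sequence. The degeneracy values are those of the standard genetic code, which is an assumption here, and `count_back_translations` is an illustrative name.

```python
from math import prod

# Codons per amino acid in the standard genetic code (an assumption).
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def count_back_translations(protein):
    """Number of distinct DNA sequences that translate to `protein`."""
    return prod(DEGENERACY[aa] for aa in protein)


print(count_back_translations("L" * 10))  # 6**10 candidates for ten leucines
print(count_back_translations("CGFL"))    # 2 * 4 * 2 * 6 for the Table I example
```

Even the four-residue CtProtein of Table I has 96 candidate DNA sequences, and ten leucines alone have more than 60 million.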
II. THE PROPOSED ALGORITHM
Although each step is discussed in detail below, we provide
Figure 1 to summarize the encryption and decryption steps.
In addition, we provide Table I to help the
reader keep track of the information as it transitions between
the steps.
A. Phase One
1) Latin Squares: Latin squares are extensively discussed
in [20]. A Latin square is an n by n matrix filled with n unique
symbols such that no symbol occurs more than once in any
row or column. The Latin square has elements on the top row
and left-side column which, at their intersection points, yield a
specific element. Although any permutation of the Latin square
using the four bases of DNA {A, C, G, T} can be used as long
as the same square is used for both encryption and decryption
operations, we designed our Latin square for this study using
a rotational ordering as shown in Table II. The notation of
the Latin square is the following: each cell of a Latin square
is written using a coordinate system of three members: {row,
column, symbol}. The notation begins at the top, left-most
cell and finishes at the bottom, right-most cell, covering the
rows similarly in-between. Our own Latin square of Table II is
therefore written: {(1,1,a), (1,2,c), (1,3,g), (1,4,t), ..., (4,1,c),
(4,2,g), (4,3,t), (4,4,a)}.
Fig. 1. The flow chart of the encryption and decryption phases.
Phase | Stage     | Description                                                                                          | Sequence
One   | PT        | Plain text written in a language (here, English)                                                     | A, B, C
      | PtDNA     | Binary PT encoded into DNA                                                                           | CAA CCA AGC AAT
      | keyDNA    | The sequence of DNA which is used by the Latin square to encrypt PtDNA                               | AGC TTT TCA TTC
      | CtDNA     | Cipher text in DNA form having completed the Latin square of phase one                               | TGC GGT TTT TTG
Two   | CtProtein | Amino acids, a translated version of CtDNA                                                           | [‘C’, ‘G’, ‘F’, ‘L’]
      | CtFinal   | Text version of the encoded amino acids and triplets                                                 | [(‘0001011’,‘1’), ...,]
      | CT        | CtFinal in binary format containing the protein encoding followed by the corresponding triplet codes | [00010110010..., 011011110101...]
TABLE I
A SUMMARY OF STEPS. THE PLAIN TEXT MOVES THROUGH PHASES ONE AND TWO BEFORE BECOMING THE CIPHER TEXT.
    A  C  G  T
A   a  c  g  t
C   t  a  c  g
G   g  t  a  c
T   c  g  t  a
TABLE II
THE QUASIGROUP TABLE USED.
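The rotational construction and the triple notation can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the rotation rule (each row is the previous row shifted right by one place) is read off Table II.

```python
def rotational_square(symbols):
    """Build a square whose row r is `symbols` cyclically shifted right by r."""
    n = len(symbols)
    return [[symbols[(c - r) % n] for c in range(n)] for r in range(n)]


def is_latin(square):
    """True if no symbol repeats within any row or any column."""
    n = len(square)
    rows_ok = all(len(set(row)) == n for row in square)
    cols_ok = all(len({square[r][c] for r in range(n)}) == n for c in range(n))
    return rows_ok and cols_ok


def as_triples(square):
    """List the cells as (row, column, symbol) with 1-based coordinates."""
    return [(r + 1, c + 1, sym)
            for r, row in enumerate(square)
            for c, sym in enumerate(row)]


square = rotational_square("acgt")
for row in square:
    print(" ".join(row))
print(is_latin(square))
print(as_triples(square)[:4], "...", as_triples(square)[-4:])
```

The rows printed by this sketch match Table II, and the first and last triples match the notation given in the text.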
2) Sequence Encoding: In keeping with the biological
concepts of this study, we mapped the English language plain
text into a DNA form as done in Pedersen et al. [25]. For this
effort, we imply that the data is manipulated as DNA, using
modeled biological functions and processes. To create this
unambiguous sequence of DNA from the plain text, the plain
text was first encoded in its binary form. Since the English-
language PT characters and their punctuation can be encoded
by a length-8 binary word, the concatenation of these words
creates a single long binary sequence. Moving in pairs down
this binary sequence, we assign DNA bases to the encountered
pairs as shown in Figure 2. In this way, we obtained the plain
text in a DNA form (PtDNA) for the entire English language
sequence.
At this point, we have only converted the PT into a PtDNA
by an encoding procedure. It should be stressed here that
encoding is the action of mapping the alphabet of the message
to another via an arbitrary, but logical, one-to-one function.
The logic behind this mapping is not hidden as it is in
encryption.
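The encoding step can be sketched in a few lines. Two assumptions are made here: characters are encoded as 8-bit ASCII, and the pair-to-base mapping (00 to A, 01 to C, 10 to G, 11 to T) is inferred from the PT and PtDNA rows of Table I, since Figure 2 is not reproduced in this text. The function names are illustrative.

```python
# Pair-to-base mapping inferred from the Table I example (an assumption).
PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_PAIR = {base: pair for pair, base in PAIR_TO_BASE.items()}


def text_to_dna(text):
    """Encode ASCII text as DNA: 8 bits per character, 2 bits per base."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("ascii"))
    return "".join(PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_text(dna):
    """Invert text_to_dna; the DNA length must be a multiple of 4."""
    bits = "".join(BASE_TO_PAIR[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")


pt_dna = text_to_dna("ABC")
print(pt_dna)               # PtDNA for the plain text of Table I
print(dna_to_text(pt_dna))  # round trip back to the plain text
```

Because the mapping is one-to-one and public, this step is an encoding only; it hides nothing on its own.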
Fig. 2. The process to convert PT into PtDNA. The plain text is encoded
to binary words of length eight. This binary sequence is read a pair at a time
and each pair is encoded by the corresponding DNA base.
3) Choosing the Encryption Key - part 1: The KeyDNA
is a fragment of biologically relevant (or even synthetically
created) DNA whose length is equal to that of the PtDNA.
This fragment can be obtained from Genbank [6] or similar
databases such as Ensembl [9] or SwissProt [15]. The same
strand of keyDNA must be used by both the encryption and
the decryption stages. We suggest using the biological DNA
of some organism having an established genome in a public
database. If a key from non-biological or synthetic DNA is
desired, then this sequence must be transferred to the receiving
party, which creates concerns of how the exchange will be
made. It is therefore easier to construct the key from a genome
from a database which both the sender and the receiver can
access.
Once a DNA sequence is chosen, we can extract the
encryption key from it using one of two techniques. The
first method is the inclusive base-to-base method, where the key
begins at some randomly chosen base in the genome and includes
all bases for a distance of the necessary length. The other key
selection method is the periodicity of the nth base method,
where an arbitrary base is selected and each nth base after
this location is appended to create the key of the desired length.
Since selecting the nth base in the natural DNA to create a
KeyDNA sequence likely removes base structure, the resulting
key may not resemble actual DNA. If it is desirable that the
key retain some biological structure, then we recommend the
inclusive base-to-base method instead to create the KeyDNA
as the chances of including some basic structure are better
than employing the periodicity of the nth base method.
For this study, our KeyDNA was arbitrarily created from
the genome of Escherichia coli (LOCUS: NC_008253) which
is publicly available from Genbank [6]. Although we could
have chosen any point for the start, we chose the first base
(position 1) of the genome, for simplicity, using the inclusive
base-to-base method to create a key sequence of length 12,
the length of our PtDNA.
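The two key-selection methods might be sketched as below. The function names and the 0-based `start` index are illustrative, and the sketch assumes the starting base itself is included in the periodic key, which the text leaves open. The toy genome consists of the 12-base keyDNA from Table I followed by made-up padding bases; it is not real sequence data beyond those first 12 bases.

```python
def inclusive_key(genome, start, length):
    """Inclusive base-to-base: take `length` consecutive bases from `start`."""
    key = genome[start:start + length]
    if len(key) < length:
        raise ValueError("genome too short for the requested key")
    return key


def periodic_key(genome, start, n, length):
    """Periodicity of the nth base: take every nth base beginning at `start`.

    Assumption: the base at `start` is the first base of the key.
    """
    key = genome[start::n][:length]
    if len(key) < length:
        raise ValueError("genome too short for the requested key")
    return key


# First 12 bases are the keyDNA of Table I; the remainder is invented padding.
toy_genome = "AGCTTTTCATTC" + "GGATCCGTAACG"

print(inclusive_key(toy_genome, 0, 12))
print(periodic_key(toy_genome, 0, 2, 12))
```

In both cases the sender and receiver would need to agree on the organism, the starting position and, for the periodic method, the step n.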
4) Encryption: After the KeyDNA has been created, we
aligned both strings (the PtDNA and the keyDNA) to achieve
a pairing of their bases at each position (i.e., PtDNA_i and
KeyDNA_i, for 0 ≤ i < the length of the PtDNA). Taken
together, each position gives a base from the KeyDNA and
one from the PtDNA, as described in Figure 3. When using
the Latin square, the keyDNA and PtDNA sequence data must
each be assigned to either the rows or the columns. In our
study, we arbitrarily chose to use the rows for the key data and
the columns for the PtDNA data. We note that the Latin square
on the receiving end must apply the same sequence data to
the same rows and columns for the decryption to work.
To encrypt the data using the Latin square, we located the
keyDNA base-character in its left-most column. We then found
the intersection of the column containing the PtDNA base-
character (found in the top row). This intersection between
the KeyDNA (row) and the PtDNA (column) is the cipher
text base-character as illustrated in Figure 4.
Fig. 3. The application of the Latin square to the PtDNA and the KeyDNA.
This step serves to encrypt the information by recombination as a function of
the two input sequences.
Fig. 4. Encryption using the Latin square. Here the KeyDNA character is
‘A’ from the left column and the plain text character is ‘C’ of the top row.
At the intersection of these two characters is the cipher text character ‘t’.
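The phase-one lookup can be sketched as follows, using Table II exactly as printed. One caveat is worth stating plainly: with the square as printed, the CtDNA of Table I (and the ‘t’ of Figure 4) is reproduced when the key base selects the column and the PtDNA base selects the row, so the sketch exposes the orientation as a hypothetical `key_on_rows` flag rather than fixing it. Either orientation works as long as both parties use the same one.

```python
BASES = "ACGT"
# Table II as printed: row label -> symbols under column labels A, C, G, T.
SQUARE = {"A": "acgt", "C": "tacg", "G": "gtac", "T": "cgta"}


def lookup(row, col):
    """Return the cell at the intersection of a row label and a column label."""
    return SQUARE[row][BASES.index(col)].upper()


def encrypt(pt_dna, key_dna, key_on_rows=True):
    """Combine PtDNA and KeyDNA position by position through the Latin square."""
    if len(pt_dna) != len(key_dna):
        raise ValueError("PtDNA and KeyDNA must have the same length")
    if key_on_rows:
        return "".join(lookup(k, p) for p, k in zip(pt_dna, key_dna))
    return "".join(lookup(p, k) for p, k in zip(pt_dna, key_dna))


pt_dna, key_dna = "CAACCAAGCAAT", "AGCTTTTCATTC"    # from Table I
print(encrypt(pt_dna, key_dna, key_on_rows=False))  # matches the CtDNA of Table I
print(encrypt(pt_dna, key_dna, key_on_rows=True))   # key on rows, as in the text
```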
B. Phase Two: Translation of DNA to Protein Code
According to the central dogma of biology, DNA is encoded
into RNA to be translated into protein code. In the literature,
it is well known that there is much redundancy in the triplets
which encode a particular protein amino acid. In addition, it
is also understood that organisms have varying habits of how
they encode proteins from triplets. In this phase, sEncrypt
converts CtDNA to protein amino acids (CtProtein) to further
conceal the original PT using natural biological processes.
CtProtein is then converted to binary using efficient Huff-
man codes to make CtFinal (explained below). We use Huff-
man codes for efficient transmission of encrypted messages.
1) Huffman Codes from Triplet Frequencies: Each listed
organism appears to have a unique preference of triplet usage
to build its proteins during translation. From these preferences,
frequency distributions have been derived [22] which we
applied to constructing Huffman codes [14]. Because these are
prefix codes, they can be read without the need of a delimiter
separating them.
Fig. 5. The transformation of DNA code to Huffman binary codes. Here
we coded these triplets and their amino acids according to the codon usage
frequencies of Bacillus phage PBS2. We note that the final product of phase
two contains two levels of code: the amino acid from the translation and its
exact RNA triplet. Since there are redundant triplets encoding the same amino
acid, coding all the triplets together makes lengthy codes. To maintain shorter
codes, we made sets of triplet codes, corresponding to each unique protein
amino acid.
To translate our CtDNA, we arbitrarily chose an organism,
Bacillus phage PBS2 from which to use these frequencies to
create codes. A casual glance will show that the triplet UUA for
leucine (L) is the only triplet used to create this amino acid
by this organism. However, in a related organism, Bacillus
phage SP82, UUC has the highest frequency of use for this
protein amino acid. In using these frequencies to make codes,
the task of determining the original DNA from our protein
sequence is further complicated without the knowledge of the
exact organism for its frequencies.
Each amino acid was given a Huffman code to record the
exact sequence of protein amino acids. For each amino acid, the
sum of its associated triplet frequencies was used to create
its Huffman code. For example, according to Bacillus phage
PBS2, proline is created by the following triplets and their
associated frequencies (i.e., probabilities): {P(‘CCA’) = 0.353,
P(‘CCT’) = 0.118, P(‘CCC’) = P(‘CCG’) = 0.0}. We ranked
each amino acid $c$ by the sum of the frequencies of its $m$
associated triplets $c_1, \ldots, c_m$:
$\mathrm{Rank}(c) = \sum_{j=1}^{m} \mathrm{freq}(c_j)$.
Therefore, $\mathrm{Rank}(P) = 0.353 + 0.118 + 0 + 0 = 0.471$
for proline. Each protein amino acid was treated similarly.
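A minimal sketch of how such ranks could feed a Huffman construction is given below. Only the proline frequencies come from the text; the ranks assigned to the other amino acids are hypothetical placeholders, and `huffman_codes` is a generic textbook construction, not necessarily the authors' implementation.

```python
import heapq
from itertools import count


def rank(triplet_freqs):
    """Rank of an amino acid: the sum of its triplet frequencies."""
    return sum(triplet_freqs.values())


def huffman_codes(weights):
    """Return {symbol: bitstring} for a dict of non-negative weights."""
    if len(weights) == 1:
        return {next(iter(weights)): "0"}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]


proline = {"CCA": 0.353, "CCT": 0.118, "CCC": 0.0, "CCG": 0.0}  # from the text
# Ranks other than proline's are hypothetical, for illustration only.
ranks = {"P": rank(proline), "L": 0.9, "G": 0.6, "F": 0.3, "C": 0.2}

codes = huffman_codes(ranks)
for aa in sorted(codes, key=lambda a: -ranks[a]):
    print(aa, round(ranks[aa], 3), codes[aa])
```

Amino acids with higher ranks receive codes that are no longer than those of lower-ranked ones, which is what keeps the encoded CtProtein compact.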
2) Encoding Triplet Codes By Protein Amino Acid Codes:
During translation, we split up CtDNA into 3-mers. Each
triplet group of the sequence was converted to RNA and then
translated to its protein amino acid by a biological codon table.
To make a new binary sequence, each protein amino acid was
Huffman encoded according to codes prepared by the codon
use frequencies of Bacillus phage PBS2. Simply having the
knowledge of a protein amino acid is insufficient information
to return to the original sequence of RNA or DNA. The exact
triplet data must be used and so we kept a record of these
triplets. All triplets, corresponding to each protein amino acid,
were encoded as a set according to frequency data. Each triplet
code must be written with the knowledge of its own protein
amino acid to avoid confusion with the same arbitrary code
which is associated with a different amino acid.
The CtFinal contains two binary sequences, one for the
protein amino acids and another for their triplets. To decode
the triplets, the same codon-usage table must be used to
reconstruct the protein and triplet codes. Since each protein is
encoded using a prefix code, the string can be read in absence
of code delimiters. This is also the case for the triplet codes
but they can only be prefix-free codes once their corresponding
protein code (their code set) is known. Therefore, to decode
this string, the triplets are decoded as a function of the protein
codes, which are read first. We include a summary of phase 2
in Figure 5.
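The two-level scheme described above (one stream of amino acid codes, one stream of triplet codes that are prefix-free only within their amino acid's set) can be sketched as follows. All code tables here are hypothetical and hand-picked for illustration; the paper derives its tables from the codon usage of Bacillus phage PBS2. DNA letters are used in place of RNA for brevity.

```python
# Hypothetical prefix codes, for illustration only.
AA_CODES = {"C": "00", "G": "01", "F": "10", "L": "11"}
TRIPLET_CODES = {
    "C": {"TGC": "0", "TGT": "1"},
    "G": {"GGT": "00", "GGC": "01", "GGA": "10", "GGG": "11"},
    "F": {"TTT": "0", "TTC": "1"},
    "L": {"TTG": "00", "TTA": "01", "CTT": "100",
          "CTC": "101", "CTA": "110", "CTG": "111"},
}
CODON_TO_AA = {codon: aa for aa, codes in TRIPLET_CODES.items() for codon in codes}


def read_code(bits, pos, table):
    """Read one prefix code from bits[pos:]; return (symbol, next position)."""
    inverse = {code: sym for sym, code in table.items()}
    for end in range(pos + 1, len(bits) + 1):
        if bits[pos:end] in inverse:
            return inverse[bits[pos:end]], end
    raise ValueError("no code matches at position %d" % pos)


def encode(ct_dna):
    """Return (protein bit stream, triplet bit stream) for a CtDNA string."""
    protein_bits, triplet_bits = [], []
    for i in range(0, len(ct_dna), 3):
        codon = ct_dna[i:i + 3]
        aa = CODON_TO_AA[codon]
        protein_bits.append(AA_CODES[aa])
        triplet_bits.append(TRIPLET_CODES[aa][codon])
    return "".join(protein_bits), "".join(triplet_bits)


def decode(protein_bits, triplet_bits):
    """Read each amino acid first, then its triplet from that amino acid's set."""
    codons, p, t = [], 0, 0
    while p < len(protein_bits):
        aa, p = read_code(protein_bits, p, AA_CODES)
        codon, t = read_code(triplet_bits, t, TRIPLET_CODES[aa])
        codons.append(codon)
    return "".join(codons)


streams = encode("TGCGGTTTTTTG")
print(streams)
print(decode(*streams))
```

The same short triplet code (for example "0") appears in several sets, which is why the amino acid stream has to be read first.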
Since CtProtein will likely be sent over a computer network,
it would be convenient to have the data in a file-format. If we
were to save the file as a simple text file, then the size of the
file would soon become large but would be reduced in size
when in a binary format. To prepare the binary, the binary
sequences of CtFinal were split into length-8 words which
were written to a binary file (CT).
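The size difference arises because a text file spends one byte on each "0" or "1" character, whereas a binary file packs eight of them into a byte. A small sketch of the packing step follows; zero-padding the final word and returning the pad length are assumptions, since the text does not say how a trailing partial word is handled.

```python
def pack_bits(bits):
    """Pack a '0'/'1' string into bytes, zero-padding the final word.

    Returns (data, pad) where pad is the number of padding bits added.
    """
    pad = (-len(bits)) % 8
    padded = bits + "0" * pad
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, pad


def unpack_bits(data, pad):
    """Invert pack_bits, dropping the padding bits."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return bits[:len(bits) - pad] if pad else bits


bits = "00010110010"   # leading bits of the CT example in Table I
data, pad = pack_bits(bits)
print(len(bits), "characters as text ->", len(data), "bytes as binary")
print(unpack_bits(data, pad) == bits)
```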
C. Decryption
When the cipher text CtFinal has been decoded, the work
involving Huffman codes of phase two is undone. Here, we
return to the CtDNA which is the encrypted sequence from
phase one. To decrypt this sequence and obtain plain text
in DNA (PtDNA), we apply the Latin square in the reverse
direction using the KeyDNA. The KeyDNA and the CtDNA
sequences are aligned to locate the base pairings by position in
the sequences. For each position, the KeyDNA base is found
in the left-most column. The CtDNA base is then found along
this row and the PtDNA character is the entry at the top of this
column. Figure 6 describes how this method is performed
using a KeyDNA and CtDNA base. This concludes the encryption
and decryption steps of phases one and two of the sEncrypt
framework.
Fig. 6. Decryption using the Latin square. Here the KeyDNA character is
‘A’ from the left column and the cipher text character is ‘t’. At the top of the
column is the plain text character ‘C’.
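The reverse lookup can be sketched in the same spirit, again using Table II as printed. As with encryption, the Table I example (and the ‘C’ of Figure 6) is recovered when the key base selects the column of the printed square, so the orientation is left as a hypothetical `key_on_rows` flag that must match the one used for encryption.

```python
BASES = "ACGT"
# Table II as printed: row label -> symbols under column labels A, C, G, T.
SQUARE = {"A": "acgt", "C": "tacg", "G": "gtac", "T": "cgta"}


def decrypt(ct_dna, key_dna, key_on_rows=True):
    """Recover PtDNA from CtDNA and KeyDNA by reverse Latin square lookup."""
    if len(ct_dna) != len(key_dna):
        raise ValueError("CtDNA and KeyDNA must have the same length")
    out = []
    for c, k in zip(ct_dna, key_dna):
        sym = c.lower()
        if key_on_rows:
            # Key picks the row; the column holding the symbol is the plain base.
            out.append(BASES[SQUARE[k].index(sym)])
        else:
            # Key picks the column; the row holding the symbol is the plain base.
            col = BASES.index(k)
            out.append(next(r for r in BASES if SQUARE[r][col] == sym))
    return "".join(out)


print(decrypt("TGCGGTTTTTTG", "AGCTTTTCATTC", key_on_rows=False))  # PtDNA of Table I
```

Because every symbol occurs exactly once in each row and each column of a Latin square, the lookup is unambiguous in either orientation.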
III. RESULTS AND DISCUSSION
When encrypting data, the resulting cipher text must be
made to look as random as possible to defeat any statistical
tests which work to break the CT code. Below we discuss
how we measured the randomness of the CT, compared to the
English language PT.
A. Entropy
Shannon’s entropy [34] is a measure of the predictability
of information by the frequency of content occurrence. We
measured the unpredictability of characters in the PT and CT
text sequences using normalized entropy, bounded by zero and
one for low or high randomness of sequence data, respectively.
For a set of probabilities $P$ such that $p_i \in P$ for $1 \le i \le m$
and $\sum_{i=1}^{m} p_i = 1$, Shannon’s entropy is defined as
$h(P) = -\sum_{i=1}^{m} p_i \log_2 p_i$. The upper bound of entropy
(i.e., complete unpredictability of the $m$ characters) is reached
when the $m$ frequencies are identical (i.e., $p_1 = p_2 = \ldots = p_m$).
Mathematically, this upper bound can be written $h_{max} = \log_2(m)$.
We define normalized entropy, $h_{norm} = h(P)/h_{max}$, which
we used to compare the frequencies of character occurrence in
each PT and the corresponding CT. This measurement was
applied in the same style by Minosse et al. [21].
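The normalized entropy defined above is straightforward to compute. The sketch below assumes that $m$ is the number of distinct symbols actually observed in the sequence (the text does not say whether $m$ is the observed or the full alphabet size) and returns 0 for sequences with fewer than two distinct symbols, where $h_{max} = 0$.

```python
from collections import Counter
from math import log2


def normalized_entropy(seq):
    """Shannon entropy of symbol frequencies divided by log2(m).

    Assumption: m is the number of distinct symbols observed in `seq`.
    """
    counts = Counter(seq)
    m = len(counts)
    if m < 2:
        return 0.0  # h_max is zero; treat the sequence as fully predictable
    total = len(seq)
    h = -sum((n / total) * log2(n / total) for n in counts.values())
    return h / log2(m)


print(normalized_entropy("ABAB"))            # equal frequencies reach the upper bound
print(round(normalized_entropy("AAAB"), 4))  # skewed frequencies score lower
```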
To test the randomness of sEncrypt’s CT data, we chose PT
data which was made up of about 500 to 3000 characters of
the following kinds of arbitrary text: a data table, a fragment
of biological gene code, a sample of legal text (an end-
user agreement), a piece of poetry (Alfred Tennyson), a news
article, a piece of prose (Conan Doyle’s, The Red Headed
League), a paragraph of random words and a technical abstract
(one of the papers in the references).
Figure 7 illustrates the entropy scores for each PT and
its corresponding CT. We note that the normalized entropy
for the PT of each text was between 0.65 and 1 (the upper
bound). Although genetic code for making protein is highly
structured [4], [28], [30], [36], the structure of our sample was
not apparent. As we expected, the CT of each of our samples of
text obtained maximum entropy scores after being processed
by sEncrypt. We recall that entropy scores approaching the upper
bound imply that the individual frequencies of elemental
occurrence are similar. These similar elemental frequencies,
which result from the Huffman encoding process, are important
for thwarting statistical attacks which exploit their differences.
Fig. 7. Raising entropy in CT from PT forms. We note an increase in entropy
from sEncrypt. The Gene code was the only information type that already
had high entropy when in PT form. There was an entropy increase for all other
forms of information we tested.
B. Autocorrelation Evaluation
It is undesirable to have large fragments of repetition in a
cipher text file. To detect repetitive sequence data in the cipher
text we performed an autocorrelation test. Here, we utilized
the dottup tool, available from Emboss [29] which displays
a word match dot-plot of two sequences on each axis. The
diagonal line indicates that the sequence characters were the
same at each position. Any deviation would be expressed as
a mark away from the diagonal line. In Figure 8 we show the
output of the poetry text which has no repetition except for a
few random marks. We obtained similar results from the other
tests.
Fig. 8. Autocorrelation of the poetry CT sequence was tested by dottup from
Emboss. We note some tiny regions where the cipher text has repetition but
these are too small to be significant.
C. General File Reductions
When we converted a sequence of English plain text to
one of DNA via binary, the resulting sequence length was
dramatically increased. This was because each character of
the original text had to be first converted to a length-8 binary
word from which pairs corresponding to DNA bases were
created. There were four DNA bases for each character of
English language text (e.g., the three characters of Table I
yield a PtDNA of length 12). In phase two, the Huffman encoding
was efficiently constructed according to an arbitrarily chosen
organism and additional bulk was added to the sequence
information of the CtFinal file size. When the sequence data
from this file was split into length-8 words to be saved to a
file in binary, we noted a significant drop in size as noted in
Figure 9. Interestingly, we note that the genetic data had
the largest CT file size as a binary file, which was likely
connected to its high entropy value from Figure 7.
While sequences of high entropy are generally challenging to
compress [16], [17], [26], our method is able to significantly
reduce the size of the file’s binary version.
Fig. 9. There is a significant reduction in the file size containing the CT
when saved in binary.
IV. CONCLUSION AND FUTURE WORK
We have proposed an encryption algorithm that utilizes a
structure similar to the biological process of converting DNA
to proteins. The key is formed from two pieces of information:
the identity of the organism whose DNA was chosen in phase
1 and the identity of the organism whose DNA codon to
amino acid mapping table was used in phase 2. As a result,
we make use of the already present and accessible terabytes
of public DNA information. The security arises from
the fact that given millions of organisms in the databases,
without the correct identities of the right organisms, the
reverse mapping is very difficult. Further, we note that the
actual encryption sequences arise from nature and not from
a computer algorithm. Therefore, without the correct DNA
sequence and amino acid mapping, cryptanalysis of the cipher
text is very nearly impossible.
In the future, we intend to further analyze sEncrypt’s
algorithmic complexity, strength of encryption and determine
the amount of computational power and time necessary to
break the codes. In addition, we will study the subtle changes
in informational content between the stages of biologic data
(PtDNA, CtDNA, and CtProtein) of different organisms. We
also plan to compare our algorithm to some of the standard
encryption algorithms such as those mentioned in the intro-
duction.
REFERENCES
[1] R. Anderson, E. Biham, and L. Knudsen. Serpent and smartcards. In
Smart Card Research and Applications, pages 246–253. Springer, 2000.
[2] M. Battey and A. Parakh. An efficient quasigroup block cipher. Wireless
Personal Communications, pages 1–14, 2012.
[3] M. Battey and A. Parakh. Efficient quasigroup block cipher for sensor
networks. In Computer Communications and Networks (ICCCN), 2012
21st International Conference on, pages 1–5, 2012.
[4] L. Beese, V. Derbyshire, T. Steitz, et al. Structure of DNA polymerase
i klenow fragment bound to duplex dna. Science (New York, NY),
260(5106):352, 1993.
[5] C. H. Bennett and G. Brassard. Quantum Cryptography: Public Key
Distribution and Coin Tossing. In Proceedings of the IEEE International
Conference on Computers, Systems and Signal Processing, pages 175–
179, New York, 1984. IEEE Press.
[6] D. A. Benson et al. Genbank. Nucleic acids research,
39(Database issue):D32–D37, Jan. 2011.
[7] J. Daemen and V. Rijmen. The Design of Rijndael. Springer-Verlag
New York, Inc., Secaucus, NJ, USA, 2002.
[8] W. Diffie and M. Hellman. New directions in cryptography. Information
Theory, IEEE Transactions on, 22(6):644–654, 1976.
[9] P. Flicek, M. Amode, D. Barrell, K. Beal, S. Brent, D. Carvalho-Silva,
P. Clapham, G. Coates, S. Fairley, S. Fitzgerald, et al. Ensembl 2012.
Nucleic acids research, 40(D1):D84–D90, 2012.
[10] A. Gehani, T. LaBean, and J. Reif. Dna-based cryptography. Aspects of
Molecular Computing, pages 34–50, 2004.
[11] D. Gligoroski. Candidate one-way functions and one-way permutations
based on quasigroup string transformations. arXiv preprint cs/0510018,
2005.
[12] D. Heider and A. Barnekow. Dna-based watermarks using the dna-crypt
algorithm. BMC bioinformatics, 8(1):176, 2007.
[13] D. Heider and A. Barnekow. Dna-based watermarks using the dna-crypt
algorithm. BMC bioinformatics, 8(1):176, 2007.
[14] D. Huffman. A Method for the Construction of Minimum-Redundancy
Codes. Proceedings of the IRE, 40(9):1098–1101, Sept. 1952.
[15] E. Jain, A. Bairoch, S. Duvaud, I. Phan, N. Redaschi, B. Suzek,
M. Martin, P. McGarvey, and E. Gasteiger. Infrastructure for the life
sciences: design and implementation of the UniProt website. BMC
Bioinformatics, 10(1):136+, 2009.
[16] Z. Ji, J. Zhou, Z. Zhu, and S. Chen. Self-configuration single particle
optimizer for DNA sequence compression. Soft Computing-A Fusion of
Foundations, Methodologies and Applications, pages 1–8, 2012.
[17] S. Kuruppu, B. Beresford-Smith, T. Conway, and J. Zobel. Iterative dic-
tionary construction for compression of large DNA data sets. IEEE/ACM
Transactions on Computational Biology and Bioinformatics (TCBB),
9(1):137–149, 2012.
[18] A. Leier, C. Richter, W. Banzhaf, and H. Rauhe. Cryptography with
DNA binary strands. Biosystems, 57(1):13–22, 2000.
[19] L. Levin. The tale of one-way functions. Problems of Information
Transmission, 39(1):92–103, 2003.
[20] H. B. Mann. The construction of orthogonal latin squares. The Annals
of Mathematical Statistics, 13(4):418–423, 1942.
[21] C. Minosse, S. Calcaterra, I. Abbate, M. Selleri, M. S. Zaniratti, and
M. R. Capobianchi. Possible compartmentalization of hepatitis c viral
replication in the genital tract of hiv-1-coinfected women. J Infect Dis.,
194:1529–1536, 2006.
[22] Y. Nakamura, T. Gojobori, and T. Ikemura. Codon usage tabulated from
the international DNA sequence databases: status for the year 2000.
Nucleic acids research, 28:292, 2000.
[23] A. Parakh. A probabilistic quantum key transfer protocol. Security and
Communication Networks, 2013.
[24] A. Parakh and S. Kak. Online data storage using implicit security.
Information Sciences, 179(19):3323–3331, 2009.
[25] J. Pedersen, D. Bastola, K. Dick, R. Gandhi, and W. Mahoney. Blast
your way through malware malware analysis assisted by bioinformatics
tools. International Conference on Security and Management 2012.
[26] A. Pinho, D. Pratas, and S. Garcia. Green: a tool for efficient
compression of genome resequencing data. Nucleic Acids Research,
40(4):e27–e27, 2012.
[27] D. Prabhu and M. Adimoolam. Bi-serial DNA encryption algorithm
(bdea). arXiv preprint arXiv:1101.2577, 2011.
[28] J. Rädler, I. Koltover, T. Salditt, and C. Safinya. Structure of DNA -
cationic liposome complexes: DNA intercalation in multilamellar membranes
in distinct interhelical packing regimes. Science, 275(5301):810–814, 1997.
[29] P. Rice, I. Longden, A. Bleasby, et al. Emboss: the european molecular
biology open software suite. Trends in genetics, 16(6):276–277, 2000.
[30] T. Richmond and C. Davey. The structure of DNA in the nucleosome
core. Nature, 423(6936):145–150, 2003.
[31] R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital
signatures and public-key cryptosystems. Commun. ACM, 21(2):120–
126, Feb. 1978.
[32] B. Roy, G. Rakshit, P. Singha, A. Majumder, and D. Datta. An improved
symmetric key cryptography with DNA based strong cipher. In Devices
and Communications (ICDeCom), 2011 International Conference on,
pages 1–5. IEEE, 2011.
[33] B. Schneier et al. The twofish encryption algorithm: a 128-bit block
cipher. New York: J. Wiley, 1999.
[34] C. Shannon, W. Weaver, R. Blahut, and B. Hajek. The mathematical
theory of communication, volume 117. University of Illinois press
Urbana, 1949.
[35] A. S. Tanenbaum. Computer networks, ch. 5, 1996.
[36] Q. Xia and X. Qiu. Sequence and structure dependent DNA-DNA
interactions. Biophysical Journal, 102:636, 2012.