A Simultaneous Approach for Compression and
Encryption Techniques Using Deoxyribonucleic Acid
Dilovan Asaad Zebari
School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Johor, Malaysia
damzebari2@live.utm.my

Habibollah Haron
Department of Computer Science, Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia
habib@utm.my

Diyar Qader Zeebaree
School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Johor, Malaysia
dqszeebaree2@live.utm.my

Azlan Mohd Zain
School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Johor, Malaysia
azlanmz@utm.my
Abstract— Data compression is the established discipline of representing content in a compact form. It has become a necessity in the field of communication as well as in many scientific studies. Data transmission must be sufficiently secure so that information can pass through a channel without loss or tampering. Encryption is the process of scrambling data so that only the intended receiver can read it, and it therefore provides a means of securing information. The two techniques are thus both essential steps for the secure transmission of large amounts of data. In the usual setting, the data is first compressed, then encrypted, and finally transmitted. However, this sequential approach is time-consuming and computationally costly. In this paper, a simultaneous compression and encryption technique based on DNA is proposed for different kinds of secret data. In a simultaneous technique, both operations are carried out in a single step, which reduces the time for the whole task. The present work consists of two phases. The first phase encodes the plaintext with 6 bits per character instead of 8, so that each character is represented by three DNA nucleotides, while each image pixel is encoded by four DNA nucleotides; this phase compresses the plaintext by 25% of its original size. In the second phase, compression and encryption are performed at the same time: both types of data are compressed to half their size and the generated symmetric key is encrypted. The technique is therefore more secure against intruders. Experimental results show a better performance of the proposed scheme compared with standard compression techniques.
Index Terms—Security, Compression, Encryption, DNA,
Standard Compression Techniques.
I. INTRODUCTION
Due to the expansion of data transmission over the Internet and its limited bandwidth, the time taken by data to reach its destination has also increased [1]. Data compression is one of the most widespread applications in computer technology and proves useful in this scenario, as it reduces resource usage such as data storage space or transmission capacity [2]. The underlying idea of data compression is to convert a sequence of characters into another representation that carries the same information but whose length is as small as possible [3]. Compression is feasible because most real-world data is highly redundant; the amount of data can be reduced by removing unnecessary and misleading information and storing and transmitting only the required bits [4]. Data compression has become a necessity when handling data that occupies huge amounts of memory [5]. The obvious benefits include reduced storage requirements, lower bandwidth, fewer encoded bits, less time required for transmission, effective channel utilization, and so on [6]. The algorithms and techniques used depend strongly on the kind of data, for example whether the data is static or dynamic, and on the content, which can be any mix of text, images, numeric data, or unrestricted binary data [2]. Different space-time trade-offs, such as compression ratio, the time taken for encoding and decoding, and the amount of storage space required, can be determined for each compression method [1]. Likewise, decompression is the inverse of compression, returning the data to its original form, as shown in Figure 1.
Figure 1: Compression and Decompression Process
In theory, compression and cryptography are two opposing techniques. Cryptography provides a secure way to exchange sensitive data, and the security and reliability of data can be improved by exploiting the properties of DNA [12]. In contrast, a compression technique tries to reduce the size of transmitted or stored data by finding and removing duplicated patterns. Nevertheless, data compression and cryptographic systems are strongly related and mutually beneficial, and they are well suited to being used together. The aims are to produce a smaller amount of data, to guarantee the quality of the data during reconstruction, to accelerate data transmission, to reduce bandwidth requirements, and to guarantee its integrity [22]. Using a data compression technique together with an encryption technique, in the right order, makes sense for three reasons. First, compressing data before encryption reduces the redundancies that cryptanalysts can exploit to recover the original data. Second, the encryption procedure works faster after compression. Third, when encrypted data are transmitted over a computer network, the transmission capacity is better utilized. Data must be compressed before encryption: in the opposite order, the result of the cryptographic operation would be unintelligible data with no pattern or redundancy, leading to very poor or no compression at all [7].
Recently, Deoxyribonucleic Acid (DNA) has taken on a significant role in enhancing applications through its combination
with computer science. In this paper, we propose a hybrid compression and encryption algorithm, based on DNA, for two kinds of data: plaintext and images.
A. Data Compression Techniques
Essentially, data compression is a set of methods used to decrease the size of different kinds of data, such as audio, image, video, and text, in order to reduce both the memory required and the transmission time over the network. In general, lossless and lossy compression are considered the two main types of compression techniques [8]. Lossless compression converts the original data into a more concise compressed form without any loss of information [9]. Huffman coding, Shannon-Fano coding, Run Length Encoding (RLE), arithmetic coding, and the Lempel-Ziv family, namely LZ77 and LZ78, are examples of lossless algorithms. These algorithms are typically used for text but can also be applied to programs, images, and sound. The major benefit of this type is that quality is preserved, although the size reduction is smaller than with lossy compression [8]. Lossy compression algorithms, on the other hand, remove unnecessary data permanently, so the original data cannot be completely regenerated. Several techniques are used for lossy compression, such as DCT, DWT, Rectangle Segmentation, transform coding, and Sparse Matrix Storage (RSSMS). This type of compression is used for sound, image, and video; although it does not preserve quality, it can achieve a large reduction in size [10].
B. DNA Background
DNA is the molecule that carries the genetic material of all living organisms. It is the information carrier of all life forms and is considered the genetic blueprint of living creatures. DNA consists of two chains that are twisted around each other to form a double-stranded helix, with four different nucleotides on the inside [11]. A single strand of a DNA molecule consists of four chemical bases, named nucleotides, which represent its building blocks. Each nucleotide consists of a phosphate group, a sugar, and a nitrogenous base. The nitrogenous base determines the type of nucleotide, which is either a purine or a pyrimidine. Guanine (G) and adenine (A) are the purine bases, whereas cytosine (C) and thymine (T) are the pyrimidine bases; G pairs with C and T pairs with A. Any DNA sequence is represented by a combination of these four bases, which encode the genetic information. Each group of three nucleotides is called a codon; there are 64 codons, since there are 4^3 three-letter combinations. Finally, amino acids are produced by codon translation, and the structure and function of a protein are dictated by the arrangement of its amino acids [29].
II. LITERATURE REVIEW
Surveying previous works, the literature on lossless data compression and on hybrid compression-encryption can be mapped as follows. Large amounts of data must be stored and handled in such a way that an efficient algorithm usually succeeds in compressing them, and many compression techniques have been proposed for compressing large data [13]. Entropy-based and dictionary-based coding are the two main families in the literature. Several entropy-based techniques have been proposed, for example Huffman coding [2], modified Huffman coding [5], RLE [15], improved RLE [17], Shannon-Fano coding [14], and arithmetic coding, which exploits the statistical distribution of the input data and implements variable-length encoding [18]. LZ77 and LZ78 are dictionary-based data compression techniques; in this family, compression is achieved by substituting repeated occurrences of data with references to entries in a dictionary [19]. Using only one technique of either type is called single compression. Such techniques may work well on some specific bit-streams but cannot be considered suitable for all streams. It has therefore been suggested that combining these two families in the right order can collect their benefits and enlarge the range of suitable inputs. In [20], Lempel-Ziv-Welch (LZW), an algorithm from the LZ78 family, was combined with RLE, and the results showed that this combination obtained better results than single RLE or single LZW.
Because of the growing need to transfer data safely and quickly, research on data security via both cryptography and compression algorithms has begun to take shape. Based on the order of processing, the combination of the two algorithms can be divided into three categories: a cryptography algorithm followed by a compression algorithm, the reverse order, and both algorithms applied in a single process [22]. The first category was addressed in [21], which focused on mobile communication security: in the first step the Elliptic Curve technique is used for encryption, and in the second step Shannon-Fano coding is used for compression. The approach showed low efficiency on several points, which led to it being neglected. Several algorithms have been proposed based on the second category; [23] introduced an optimized technique for text data in which Huffman compression is applied at the first stage and combined with a recently improved symmetric cryptographic technique.
In [24], the data is compressed by half at the first step, and at the second step Shannon's notion of diffusion is realized by producing different ciphertext characters for a single plaintext character across its different occurrences in the plaintext. In [25], the Arnold transform, a key matrix, and a chaos system are used to introduce a hybrid technique for images. In order to compress and encrypt, the image is decomposed into four bands; first the duplicated data is eliminated to achieve compression, then the chaos system with the Arnold transform is applied for encryption. In normal cases the first two categories, called sequential methods, are used, but they considerably increase the time complexity. Thus, several simultaneous methods have been proposed in which compression and encryption are performed at the same stage, which is far better than sequential methods [1]. A simultaneous technique based on the Cosine Number Transform is introduced in [26]; it uses a key in the generation of the measurement matrix based on a chaotic map, is resistant to statistical analysis, and provides effective compression. In [1], a hybrid compression-encryption algorithm is introduced with respect to compression ratio and time, in which compression and encryption are performed by arithmetic coding and XOR, respectively. Here, a simultaneous technique for both data compression and encryption based on DNA is proposed.
III. PROPOSED SCHEME
The main purpose of the proposed scheme is to ensure both security and capacity for secret data. The scheme has been applied to two kinds of secret data: text and images. It achieves compression and encryption based on DNA and consists of two phases. In the first phase, each character of the secret text and each image pixel is represented by DNA nucleotides. In the second phase, each DNA nucleotide is represented by a single binary bit while a secure symmetric key is generated. The simultaneous compression of the proposed scheme is explained in more detail in the following sections.
A. First Phase
As mentioned above, the compression consists of two phases. The first phase encodes the secret data into DNA by mapping each character of the secret text and each pixel of the image to DNA nucleotides. Each DNA nucleotide represents two binary bits, as shown in Table I. Because each text character and each image pixel is represented by eight binary bits, each of them corresponds to four DNA nucleotides. Some previous works [27] encoded each text character by only three DNA nucleotides. The drawback of that work is that the secret text may contain only the 26 capital letters, 26 small letters, the digits 0 to 9, the space, and the dot, and cannot contain any other punctuation marks such as (+, ?, $, etc.). Therefore, that algorithm cannot encode a secret text containing punctuation marks. Because punctuation marks are important in secret text, in this phase we improve that work by extending it with further punctuation marks, so that the secret text can contain the 26 capital and small letters, the digits 0 to 9, the space, the dot, and 26 different punctuation marks. Based on the 64 different DNA codons, we generate a permutation that maps each text character to three DNA nucleotides.
Table I: DNA Encoding Rule

Decimal   DNA   Binary
0         A     00
1         C     01
2         G     10
3         T     11
This algorithm generates 64 numbers, each consisting of a combination of three decimal digits in the range 0 to 3; the total number of combinations (4^3 = 64) equals the number of DNA codons. Consequently, each DNA codon can represent one generated combination of decimal digits. Each digit of the generated sequence of decimal numbers represents a DNA nucleotide, or two binary bits, according to Table I. In this phase we can therefore describe a one-to-one map between these 64 numbers and the symbol set (26 capital and small letters, the digits 0 to 9, the space, the dot, and 26 different punctuation marks). This map is known to both sender and receiver and must be kept secret. For example, in the first generation the algorithm produces the combinations of decimal digits (000, 213, ..., 333); these combinations can be encoded to DNA nucleotides and binary bits based on Table I, where 0 = A = 00, 1 = C = 01, 2 = G = 10, and 3 = T = 11. In our improved method, the capital and small form of the same letter are represented by the same combination of decimal digits; for example, A or a is represented by 000, and H or h by 013. This causes no problem during encoding, but during decoding the algorithm always decodes to the small letter and capital letters are lost. To overcome this, we require that each proper name be written in double brackets and that every sentence after a dot start with a capital letter. In this way, our algorithm can encode each letter of the secret text by 6 bits instead of 8 bits. Furthermore, it can encode a secret text containing 26 different punctuation marks, instead of being limited to text without any punctuation marks. For black-and-white (grayscale) images there are 256 possible pixel values, from 0 to 255, and each pixel can be represented by 8 binary bits. Every two bits represent one DNA nucleotide according to Table I, so each image pixel, with its four pairs of bits, is represented by four DNA nucleotides; for example, pixel 0 is equal to AAAA, pixel 182 to GTCG, pixel 255 to TTTT, and so on. Because the 52 capital and small letters, the digits 0 to 9, the space, the dot, and the 26 punctuation marks together amount to 90 symbols, the proposed scheme denotes a capital letter and its small letter by the same DNA codon. Thus, each character of the plaintext can be converted into 3 DNA bases, but an image pixel cannot be converted into 3 DNA bases, because an image contains 256 different pixel values. As a result, in this phase the secret text is compressed by 25% by representing each letter with 6 bits, whereas the image is converted into DNA form, as shown in Figure 2.
Figure 2: Block Diagram of First Phase
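As an illustration of the first phase, the following minimal Python sketch (our own illustration, not the authors' code; the 64-symbol alphabet and the fixed codon order are assumptions standing in for the secret permutation) maps each text character to a three-nucleotide codon and each 8-bit grayscale pixel to four nucleotides using the rule of Table I.

```python
# A minimal sketch of the first phase, assuming a hypothetical 64-symbol
# alphabet (26 letters, 10 digits, space, dot, 26 punctuation marks) and a
# fixed codon order standing in for the secret permutation shared by
# sender and receiver.
import itertools
import string

BASES = "ACGT"                       # A=00, C=01, G=10, T=11 (Table I)

# The 64 possible codons: AAA, AAC, ..., TTT (4^3 combinations).
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]

# Hypothetical 64-symbol alphabet; capital and small letters share a codon.
ALPHABET = list(string.ascii_lowercase) + list(string.digits) + \
           [" ", "."] + list("!\"#$%&'()*+,-/:;<=>?@[]^_{}")[:26]
CHAR_TO_CODON = dict(zip(ALPHABET, CODONS))   # the secret 1-1 map

def encode_text(plaintext: str) -> str:
    """Map each character to a 3-nucleotide codon (6 bits instead of 8)."""
    return "".join(CHAR_TO_CODON[ch.lower()] for ch in plaintext)

def encode_pixel(value: int) -> str:
    """Map an 8-bit grayscale pixel to 4 nucleotides (2 bits per base)."""
    bits = format(value, "08b")
    return "".join(BASES[int(bits[i:i + 2], 2)] for i in range(0, 8, 2))

print(encode_text("Hi."))                    # three codons, 9 nucleotides
print(encode_pixel(182))                     # 182 = 10110110 -> GTCG
print(encode_pixel(0), encode_pixel(255))    # AAAA, TTTT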
B. Second Phase
In this phase, we propose a method that further compresses the text and compresses the image to half its size; this phase performs data compression and encryption at the same time. The form of the secret data obtained from the first phase, a DNA sequence such as ACAGTAC, is used as input. Normally, each DNA nucleotide is represented by two binary bits; our method represents each DNA nucleotide by only one binary bit while generating a symmetric key. In the first step, the DNA sequence is divided into two-nucleotide segments; if a single nucleotide remains at the end after segmentation, an A (whose value is 0) is appended. With four possible nucleotides, two nucleotides give 4^2 = 16 different possible segments, and the frequency of each segment is calculated. In the second step, a 4 * 4 matrix is generated, as shown in Figure 3. The four segments with the highest frequencies are placed in the (00, 01, 10, 11) fields, the next four in the (20, 21, 30, 31) fields, and the following four in the (02, 03, 12, 13) fields in order to increase the number of 0's and 1's, while the four segments with the lowest frequencies are placed in the (22, 23, 33, 32) fields. Security is also gained in this step, because it is hard for attackers to know which segment is placed in which field. After that, the DNA sequence is converted, based on the generated matrix, into a sequence of digits in the range 0 to 3; this matrix is known to both sender and receiver and is kept secret. Each digit of the obtained sequence corresponds to two binary bits. In the third step, we represent each digit of this new form by only one bit: all 0's and 1's are kept as they are, whereas each 2 is converted to 0 and each 3 to 1, and their positions are saved in a file as a symmetric key. Therefore, in this phase the size of the secret data is reduced to half, because each DNA nucleotide is represented by only one binary bit. As a result, after applying these two phases the secret text is compressed by 62.5% and the image by 50%.
r/c   0          1          2          3
0     high 1     high 2     high 5     high 6
1     high 3     high 4     high 7     high 8
2     high 9     high 10    high 13    high 14
3     high 11    high 12    high 15    high 16

Figure 3: Generated 4 * 4 Matrix ("high n" denotes the segment with the n-th highest frequency)
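To make the second phase concrete, the sketch below (a minimal Python illustration under our own assumptions, not the authors' implementation; the names second_phase and FIELD_GROUPS are ours) follows the steps described above: pair the nucleotides, rank the possible segments by frequency, place them into the 4 * 4 matrix fields in the order given in the text, and emit one bit per digit while recording the positions of 2's and 3's as the symmetric key.

```python
# A minimal sketch of the second phase, assuming the DNA string produced by
# the first phase. Field placement follows the frequency ordering described
# in the text; which segment lands on which field is kept secret with the
# matrix.
from collections import Counter

# Field coordinates grouped as in the text: the most frequent segments get
# fields made only of 0/1 digits, so fewer positions must be stored in the key.
FIELD_GROUPS = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],   # four highest frequencies
    [(2, 0), (2, 1), (3, 0), (3, 1)],   # second four
    [(0, 2), (0, 3), (1, 2), (1, 3)],   # third four
    [(2, 2), (2, 3), (3, 3), (3, 2)],   # four lowest frequencies
]

def second_phase(dna: str):
    if len(dna) % 2:                    # pad a lone trailing nucleotide
        dna += "A"
    segments = [dna[i:i + 2] for i in range(0, len(dna), 2)]

    # Rank the observed two-nucleotide segments by frequency.
    freq = Counter(segments)
    ranked = sorted(freq, key=freq.get, reverse=True)
    fields = [f for group in FIELD_GROUPS for f in group]
    seg_to_field = dict(zip(ranked, fields))     # the secret 4*4 matrix

    # Each segment becomes two digits in 0..3, then each digit one bit;
    # the positions of 2's and 3's form the symmetric key.
    digits = [d for seg in segments for d in seg_to_field[seg]]
    bits, key_positions = [], []
    for pos, d in enumerate(digits):
        if d >= 2:
            key_positions.append(pos)
        bits.append(d & 1)              # 0->0, 1->1, 2->0, 3->1
    return bits, key_positions, seg_to_field

bits, key, matrix = second_phase("ACAGTACA")
print(bits, key)
```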
To provide further security, the symmetric key is encrypted. The generated symmetric key is a file, say n, which holds the positions of the 2's and 3's as mentioned above. The basic idea of our method is to keep the first value of n and to replace every later value by its difference from the previous one. If n = R1, R2, R3, ..., Rn, then to produce the new value of n, R1 is selected and left unchanged; the new value of R2 is R2 - R1, R3 becomes R3 - R2, and so on up to Rn = Rn - Rn-1. Each resulting number from 1 to 9 is then stored as four binary bits, but any number outside the range 1 to 9 is placed between two commas, and the comma is given the value 0, since the value 0 never occurs in n. For example, if n = {1, 2, 5, 7, 17, 30, 31}, then after applying this step the newly produced file of n is equal to {1132,10,131}. The general steps of this phase are illustrated in Figure 4.
Figure 4: General Diagram of Second Phase (DNA data of the first phase -> convert into two-nucleotide segments -> calculate the frequency of each segment -> generate a 4*4 matrix -> convert the DNA form into serial numbers in the range 0 to 3 -> keep all 0's and 1's, convert each 2 to 0 and each 3 to 1 -> binary data; generate a symmetric key -> generate a new form of the key -> further change the key form)
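A minimal sketch of the key transformation described above (our own illustration; the later digit-packing step with commas is omitted here): the first recorded position is kept and every later position is replaced by its difference from the previous one, which shrinks the values that have to be stored in the key.

```python
# Difference-encode the symmetric key (the file of 2/3 positions):
# keep the first position, then store the gap to the previous position.
def difference_encode(positions):
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]

print(difference_encode([1, 2, 5, 7, 17, 30, 31]))   # [1, 1, 3, 2, 10, 13, 1]
```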
IV. DATASETS AND PERFORMANCE EVALUATION
In this section our attention is focused on evaluating the performance of the proposed scheme on text and image datasets. Several corpora consist of collections of text files designed specifically as datasets for testing lossless compression methods. Seven files from different corpora have been selected and tested. The Calgary corpus contains 18 files totalling about 3.2 million bytes; four files have been selected from this corpus, namely paper1, paper2, book1, and book2, whose categories are technical papers and fiction and non-fiction books. The Canterbury corpus consists of 11 files of different types totalling about 2.8 million bytes; three files have been selected from this corpus, namely alice29, asyoulik, and plrabn12, whose categories are English text, Shakespeare, and poetry. The images lena, baboon, and peppers, at a resolution of 128x128 and in JPG format, have been used as image datasets to evaluate the performance of the proposed method.
Based on the mentioned datasets, some commonly used data compression performance indices, including compression ratio, bits per character, bits per pixel, and compressed file size, are calculated and discussed. For plaintext, the performance of the proposed data compression scheme is compared, through compression ratio, bits per character, and compressed file size, with basic data compression techniques such as LZFG, Run Length Encoding (RLE), Huffman encoding (HUFF), Shannon-Fano (SF), and Arithmetic Coding (AC). For images, the performance of the proposed scheme is compared, through bits per pixel, with HUFF and AC. The measurements used to evaluate the performance of the proposed scheme are reviewed below.
Compression Ratio (CR) is the ratio between the compressed data size and the corresponding original data size; it indicates how much the data is compressed by the scheme. It is denoted CR and calculated by formula (1) [28]. The file size after processing depends on CR: output files that are compressed more have a smaller CR, and when CR exceeds one the compressed file is larger than the initial file [17].

CR = compressed size / uncompressed size (1)

Bits Per Character (BPC) is the number of bits needed to represent each character. A lower BPC is better than a higher one, since it makes the file smaller and smaller files need less storage space.

BPC = number of bits in compressed text / number of characters in uncompressed text (2)

The compression factor (CF) and saving data (SD) for text data have also been calculated, using equations (3) and (4) respectively.

CF = uncompressed size / compressed size (3)

SD = (1 - compression ratio) * 100 (4)

Bits Per Pixel (BPP) is the number of bits stored per pixel of an image; it is the ratio of the number of bits in the compressed image to the total number of pixels in the initial image [34]. A large BPP means more memory is required to store the image (because of the many colours it contains), so BPP should be small.

BPP = number of bits in compressed image / total number of pixels in the image (5)

File Size (FS) is the size of the compressed data file.
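The following small Python sketch (our own helper functions, not part of the paper) computes the measures of equations (1) to (5); the example call reproduces the paper1 values reported in Tables II and III.

```python
# Evaluation measures of equations (1)-(5); sizes are assumed to be given
# in bits and character/pixel counts as plain integers.
def compression_ratio(compressed_bits, original_bits):
    return compressed_bits / original_bits                  # eq. (1)

def bits_per_character(compressed_bits, num_characters):
    return compressed_bits / num_characters                 # eq. (2)

def compression_factor(compressed_bits, original_bits):
    return original_bits / compressed_bits                  # eq. (3)

def saving_data(compressed_bits, original_bits):
    return (1 - compression_ratio(compressed_bits, original_bits)) * 100   # eq. (4)

def bits_per_pixel(compressed_bits, num_pixels):
    return compressed_bits / num_pixels                     # eq. (5)

# Example with the paper1 figures reported in Table II.
print(compression_ratio(154214, 415288))    # ~0.3713
print(bits_per_character(154214, 51911))    # ~2.97
```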
V. RESULTS AND DISCUSSION
After simulating the proposed simultaneous data compression-encryption scheme, the following results were obtained.
A. Plaintext Datasets
The results of applying the proposed data compression scheme to seven standard text files are presented in Table II. The table shows the size of the original text files in KB and their number of bits before applying the proposed scheme, together with the number of bits after applying the proposed scheme. As observed from Table II, the size of every file and its number of bits decreased after compression, which means that the number of characters in each file was reduced. For example, the number of bits in paper1 was 415,288, corresponding to 51,911 characters, whereas after compression we obtained 154,214 bits, or about 19,276 characters; hence 32,635 characters were removed from the original file.
Table II: Original and Compressed Plaintext Data

Dataset   File size (KB)   Original no. of bits   Compressed no. of bits   CR       SP (1 - CR)   CF
bib       109              839848                 317558                   0.3781   0.6219        2.6435
book1     751              3317427                1248734                  0.3764   0.6236        2.6566
book2     597              4761776                1780264                  0.3738   0.6262        2.6747
news      369              2936400                1095624                  0.3731   0.6269        2.6801
paper1    52               415288                 154214                   0.3713   0.6287        2.6929
paper2    81               643744                 242090                   0.376    0.624         2.6591
progc     39               304992                 110572                   0.3625   0.6375        2.7583
In previous works, the efficiency of a compression technique is evaluated with respect to several important parameters. Table III reports several experiments that were conducted and evaluated; it shows the compression results, based on CR, BPC, and FS, for the proposed scheme and for standard compression algorithms such as RLE, HUFF, SF, and AC. The results of the standard compression algorithms were obtained from [17, 19, 33] and then compared with the proposed scheme. According to the results of [17, 19, 33], the HUFF, SF, and AC techniques achieved very similar results on the standard text files, whereas RLE performed far worse. For some of the text files, RLE has a CR greater than 1, which means that this technique expands the original file instead of compressing it. The CR of the standard algorithms mostly lies in the range 0.57 to 1.95, whereas the CR of the proposed scheme lies in the range 0.20 to 0.36. The CR achieved by the proposed scheme on the tested text files is the best ratio, meaning that the text size is reduced more than with the standard compression algorithms. The BPC of the standard algorithms lies in the range 4.55 to 5.09, except for RLE, whose range is 7.98 to 8.17, which means more space is required to store the text file. The BPC of the proposed scheme lies in the range 1.62 to 2.94. The results in the table thus show that the proposed scheme obtains a lower BPC, so it needs fewer bits per character than the standard compression algorithms. The table also shows that the compressed file size in bytes or KB is smaller than with the standard compression algorithms. As illustrated in Table III, the results show that the proposed scheme performs better than the standard compression algorithms mentioned above.
Table III: Comparison between Proposed and Standard Algorithms for Plaintext Data (BPC)

Dataset   LZFG   RLE    HUFF   SF     AC     Proposed method
bib       2.90   8.16   5.26   5.56   5.23   3.02
book1     3.62   8.17   4.57   4.83   4.55   3.01
book2     3.05   8.17   4.83   5.08   4.78   2.99
news      3.44   7.98   5.24   5.41   5.19   2.98
paper1    3.03   8.12   5.09   5.34   4.98   2.97
paper2    3.16   8.14   4.68   4.94   4.63   3.00
progc     2.89   8.10   5.33   5.47   5.23   2.90
B. Image Datasets
For the proposed method, the BPP of lena, baboon, and peppers was calculated and compared with AC and HUFF. As illustrated in Table IV, the BPP of AC lies in the range 5.85 to 6.51 and that of HUFF in the range 5.84 to 6.49, which means more space is required to store the images. The BPP of the proposed method lies in the range 3.71 to 3.75; our method therefore needs fewer bits per pixel than the standard compression algorithms and achieves better results.
Table IV: Comparison between Proposed and Standard Algorithms for Image Data (BPP)

Dataset   AC     HUFF   Proposed method
lena      5.85   5.84   3.71
baboon    6.51   6.49   3.75
peppers   6.2    6.18   3.73
VI. CONCLUSION
Previous research has demonstrated that simultaneous techniques are far better than sequential techniques. An efficient simultaneous technique for image and plaintext data based on DNA has been proposed here. For performance evaluation, various files of different sizes were given as input to the proposed technique. Compression ratio, bits per character, bits per pixel, and compressed file size were used to evaluate its performance, which was then compared with different standard compression techniques. The results showed that the proposed scheme can compress plaintext by about 62.5% and images by 50%. Security is further increased by encrypting the generated symmetric key: when the compressed data is transmitted, attackers would have to decrypt the key using the same procedures. In the future, this work can be extended to audio as well as video.
REFERENCES
[1] Raju, Jini. "A study of joint lossless compression and encryption
scheme." In Circuit, Power and Computing Technologies
(ICCPCT), 2017 International Conference, Kollam, India, pp. 1-6.
IEEE, 2017.
[2] Shoba, D. Jasmine, and S. Sivakumar. "A Study on Data
Compression Using Huffman Coding Algorithms." International
Journal of Computer Science Trends and Technology (IJCST),
ISSN: 2347-8578, Vol. 5, Issue No. 1, pp. 58-63, 2017.
[3] Sethi, G., Sweta S., Vinutha, K., and Chakravorty, C. "Data
Compression Techniques." International Journal of Computer
Science and Information Technologies (IJCSIT), Vol. 5, Issue No.
4, pp. 5584-5586, 2014.
[4] Sharma, N., Kaur, J., and Kaur, N. "A review on various Lossless
text data compression techniques." Research Cell: An International
Journal of Engineering Sciences, ISSN: 2229-6913, Vol. 12, Issue
No. 2, pp. 58-63, 2014.
[5] Gautam, R., Murali, S. "An Optimized Huffman's Coding by the method of Grouping." arXiv preprint arXiv:1607.08433, Jul 2016.
[6] Mahmood, A., Latif, T., and Hasan, K. M. A. "An efficient 6-bit
encoding scheme for printable characters by table look up." In
Electrical, Computer and Communication Engineering (ECCE),
International Conference, Cox’s Bazar, Bangladesh, IEEE, pp.
468-472, Feb 2017.
[7] Sandoval, M. M., & Feregrino-Uribe, C. “A hardware architecture
for elliptic curve cryptography and lossless data compression”. In
Electronics, Communications and Computers. CONIELECOMP
2005. Proceedings. 15th International Conference, IEEE, pp. 113-
118, 2005.
[8] Ramya, K. A., Pushpa, M. “A Survey on Lossless and Lossy Data
Compression Methods”. International Journal of Computer
Science and Engineering Communications, Vol. 4, Issue No. 1, pp.
1277-1280, 2016.
[9] Fitriya, L. A., Purboyo, T. W., & Prasasti, A. L. “A Review of Data Compression Techniques”. International Journal of Applied Engineering Research, ISSN 0973-4562, Vol. 12, Issue No. 19, pp. 8956-8963, 2017.
[10] Hosseini, M. “A Survey of Data Compression Algorithms and
their Applications.” Conference: Applications of Advanced
Algorithms, At Simon Fraser University, Jan 2012.
[11] Al-Wattar, A. H., Mahmod, R., Zukarnain, Z. A., & Udzir, N. I.
“A New DNA-Based Approach of Generating Key-dependent
ShiftRows Transformation.” arXiv preprint arXiv:1502.03544,
2015.
[12] Zebari, D. A., Haron, H., Zeebaree, S. R., & Zeebaree, D. Q.
(2018, October). Multi-Level of DNA Encryption Technique
Based on DNA Arithmetic and Biological Operations. In 2018
International Conference on Advanced Science and Engineering
(ICOASE) (pp. 312-317). IEEE.
[13] Patel, H., Itwala, U., Rana, R., and Dangarwala, K. “Survey of
Lossless Data Compression Algorithms.” In International Journal
of Engineering Research and Technology, ESRSA Publications,
Vol. 4, Issue No. 4, pp. 926-929, April 2015.
[14] Lamorahan, C., Pinontoan, B., & Nainggolan, N. “Data
Compression Using Shannon-Fano Algorithm.” de CARTESIAN,
Vol. 2, Issue No. 2, pp. 10-17, September 2013.
[15] Kaur, K., Saxena, J., and Singh, S. “Image Compression Using
Run Length Encoding (RLE).” International Journal on Recent and
Innovation Trends in Computing and Communication (IJRITCC),
ISSN: 2321-8169, Vol. 5, Issue No. 5, pp. 1280 – 1285, May 2017.
[16]
Sarika, S., Srilali, S. “Improved Run Length Encoding Scheme for
Efficient Compression Data Rate.” Int. Journal of Engineering
Research and Applications, ISSN: 2248-9622, Vol. 3, Issue No. 6,
pp.2017-2020, Nov-Dec 2013.
[17] Sailunaz, K., Alam, M. R., Huda, M. N. “Data Compression
Considering Text Files.” International Journal of Computer
Applications, Vol. 90, Issue No. 11, pp. 27-32, March 2014.
[18] Gomathymeenakshi, M., Sruti, S., Karthikeyan, B., Nayana, M.
“An Efficient Arithmetic Coding Data Compression with
Steganography.” International Conference on Emerging Trends in
Computing, Communication and Nanotechnology (ICECCN),
IEEE, pp. 342-345, 2013.
[19] Senthil, S., and Robert, L. "Text Compression Algorithms: A
Comparative Study." Journal on Communication Technology, Vol:
2, Issue No. 4, pp. 444-451, December 2011.
[20] Li, T., Zhao, T., Nho, M., & Zhou, X. “A novel RLE & LZW for
bit-stream compression.” In Solid-State and Integrated Circuit
Technology (ICSICT), 2016 13th IEEE International Conference,
IEEE, pp. 1600-1602, October 2016.
[21] Chavan, R. R., and Sabnees, M. “Secured Mobile Messaging”. Int.
Conf. Comput. Electron. Elect. Technol., ICCEET, IEEE, pp.
1036-1043, 2012.
[22] Setyaningsih, E., & Wardoyo, R. “Review of Image Compression
and Encryption Techniques.” International Journal of Advanced
Computer Science and Applications (IJACSA), Vol. 8, Issue No.
2, pp. 83-94, 2017.
[23] Sangwan, N. “Combining Huffman text compression with new
double encryption algorithm.” In Emerging Trends in
Communication, Control, Signal Processing & Computing
Applications (C2SPCA), 2013 International Conference, IEEE, pp.
1-6, October 2013.
[24] Singh, R., Panchbhaiya, I., Pandey, A., & Goudar, R. H. “Hybrid
Encryption Scheme (HES): An Approach for Transmitting Secure
Data over Internet.” International Conference on Intelligent
Computing, Communication & Convergence, Procedia Computer
Science, 48, pp. 51-57, 2015.
[25] Ilakkiya, A. Pushparani, M. “An Image Compression – Encryption
Hybrid Algorithm Using DWT and Chaos System.” International
Journal on Recent and Innovation Trends in Computing and
Communication (IJRITCC), ISSN: 2321-8169, Vol. 4, Issue No. 4,
pp. 638-643, April 2016.
[26] Ponuma, R., Aarthi, V., & Amutha, R. “Cosine Number Transform
based hybrid image compression-encryption.” In Wireless
Communications, Signal Processing and Networking (WiSPNET),
International Conference, IEEE, pp. 172-176, March 2016.
[27] Ibrahim, F. E., Abdalkader, H. M., & Moussa, M. I. “Enhancing the
Security of Data Hiding Using Double DNA Sequences.” In
Industry Academia Collaboration Conference (IAC), pp. 6-8,
2015.
[28] Avudaiappan, T., Ilam Parithi, T., Balasubramanian, R., Sujatha,
K. “Performance Analysis on Lossless Image Compression
Techniques for General Images”. International Journal of Pure and
Applied Mathematics, ISSN: 1314-3395, Vol. 117, Issue No. 10,
pp. 1-5, 2017.
[29] Zebari, D. A., Haron, H., & Zeebaree, S. R. “Security Issues in
DNA Based on Data Hiding: A Review.” International Journal of
Applied Engineering Research, ISSN: 0973-4562, Vol. 12, Issue
No. 24, pp. 15363-15377, December 2017.