All content in this area was uploaded by Khaled Elleithy on May 17, 2015
Highly Efficient Novel Text Steganography Algorithms
Ammar Odeh, Khaled Elleithy, Miad Faezipour
Department of Computer Science and Engineering
University of Bridgeport
Bridgeport, CT
aodeh@my.bridgeport.edu, elleithy@bridgeport.edu,
mfaezipo@bridgeport.edu
Eman Abdelfattah
School of Engineering & Computing Sciences
Texas A&M University
Corpus Christi, TX
Eman.Abdelfattah@tamucc.edu
Abstract—This paper investigates eight novel Steganography
algorithms that employ a text file as the carrier. The proposed
algorithms hide secret data in the text file by manipulating the
font format or by inserting special symbols.
Furthermore, the suggested algorithms can be applied to both
Unicode and ASCII code languages, regardless of the text file
format. In addition, a merging capability among the techniques
is introduced, which offers users alternatives based on the
system requirements. The proposed algorithms achieve a high
degree of optimization in Steganography attributes such as
hidden ratio, robustness, and transparency.
Index Terms— HTML, Kashida, Multipoint Letter, Left
Remark, Right Remark, Steganography, Zero Width.
I. INTRODUCTION
Different strategies are used to protect sensitive data
during transmission over insecure channels. Some
algorithms change plain text into cipher text,
which is called Cryptography [1]. Other algorithms
protect secret information by hiding its existence [2].
Steganography is a security mechanism that hides data
inside a carrier file such as an image, sound, video, or
text file [3]. Secret information can be inserted inside a
carrier file using different strategies, each of which has
advantages and drawbacks.
The first Steganography technique is injection, where the
secret data is injected inside the carrier file. Injection
increases the carrier file size and sometimes changes the
carrier file format [4]. The second technique is substitution,
where some data in the carrier file is replaced by the
sensitive information [5]. The substitution method searches
for the bits that have the lowest effect on the carrier file and
applies the exchange operation to them. The main advantage
of this approach is that the Stego object has the same size as
the carrier file, which makes it less likely to arouse an
attacker’s suspicion. In all Steganography approaches,
the main idea is to hide the data inside the carrier file and
then to place the Stego file in some transport medium. After
the data is transmitted, a Stego-analyzer might analyze the
data if there is any suspicion about the carrier file [6].
II. TECHNICAL SOLUTION AND ALGORITHMS DETAILS
This section presents eight new text Steganography
algorithms: (1) Multipoint, (2) Diacritics, (3) ZKA, (4)
KVA, (5) ZWC, (6) Remarks, (7) HTML code, and (8) MS
Word symbols. The proposed algorithms enhance the hidden
capacity ratio, transparency, and robustness of the system.
The diversity of the algorithms enables users to pass their
sensitive data in an authenticated and secure manner. Each
algorithm uses a different strategy to hide the data inside
the text file. Some of the algorithms employ the text’s font
format to embed the data. Others use a symbol insertion
technique to pass sensitive data over public channels. The
presented algorithms can be classified into two categories:
(1) Unicode languages and (2) multiple languages.
1. Multipoint Algorithm (A1)
The Multipoint algorithm applies to Unicode languages such
as Arabic and Persian. These languages have multipoint
letters, which carry more than one point, unlike the
single-point English letters (i, j). Shifting the points
vertically or horizontally hides two bits of information in
each such character. The proposed algorithm therefore offers
a high hidden capacity ratio: each multipoint letter hides
2 bits, whereas other algorithms hide only one bit per
letter. The algorithm enhances robustness by converting the
Stego-object to an image or a PDF file in order to avoid the
retyping problem. In addition, the algorithm can be applied
to other languages such as Pashto and Urdu [7].
2. Diacritics Algorithm (A2)
The Diacritics algorithm hides data inside Arabic
text [8]. It uses Arabic diacritics, a feature of the
language in which small marks represent short vowels.
Diacritics are optional in Arabic text and are not commonly
written, although most Arabic letters need them to fix the
correct pronunciation of a word. The proposed algorithm
hides two secret bits in each diacritic, which offers a high
hidden ratio compared to other algorithms. The main drawback
of the Diacritics algorithm is that the use of diacritics is
uncommon. Thus, hackers might be more inclined to analyze
the file.
Table 1 shows a sample cover object before and after
inserting the secret data.
TABLE 1. DIACRITICS ALGORITHM SIMULATION OUTPUT
3. Zero Width Character and Kashida Algorithm (ZKA) (A3)
The Zero Width character and Kashida algorithm applies to
Unicode languages such as Arabic, where special characters
are used to justify sentences. A Kashida and a Zero Width
character together hide two bits between connected letters.
In order to avoid arousing an attacker’s suspicion, a
randomization algorithm is applied within ZKA, so that each
message uses a different strategy to conceal its secret
data. The proposed algorithm can be extended to other
Unicode languages. Two weaknesses of ZKA are (1) the
retyping problem and (2) the clear format problem. These
deficiencies reduce the robustness of the algorithm [9].
Table 2 illustrates the changes in the carrier text for the
given secret data when the ZKA algorithm is applied.
TABLE 2. ZKA SIMULATION OUTPUT
Cover
Object
ﺖﻧﺎآ ﻲﺘﻟا ﺔﻴﻧﺎﻣوﺮﻟا ﺔﻳرﻮﻃاﺮﺒﻣﻻاو ﺮﺼﻣ ءاﺬﻏ ﺔَﻠﺳ ﻲﻟﺎﻤﺸﻟا ﺮﺼﻣ ﻞﺣﺎﺳ نﺎآ
مﻼﺳﻹا ﻞﺒﻗ ﺮﺼﻣ ﻞﺘﺤﺗ . ﻰﻓ ﻞﻴﻨﻟا ﻩﺎﻴﻣ ﻰﻠﻋ نﻮﻳﺮﺼﻤﻟا ﺪﻤﺘﻋا ،ﻲﻟﺎﻌﻟا ﺪﺴﻟا ءﺎﻨﺑ ﻞﺒﻘﻓ
ﺔﻋارز ﻲﻓ رﺎﻄﻣﻷا ﻩﺎﻴﻣ ﻰﻠﻋ اوﺪﻤﺘﻋا ﺎﻤآ ﺎﺘﻟﺪﻟاو يداﻮﻟا ﻲﻓ ﺔﻴﻔﻴﺼﻟا تﺎﻋارﺰﻟا
ﺢﻤﻘﻟا
Stego
Object
ﻲﺘـﻟا ﺔـﻴـﻧﺎـﻣوﺮﻟا ﺔﻳرﻮـﻃاﺮﺒﻣﺎـﻟاو ﺮـﺼـﻣ ءاﺬﻏ ﺔَـﻠـﺳ ﻲﻟﺎـﻤﺸـﻟا ﺮﺼـﻣ ﻞﺣﺎﺳ نﺎـآ
مﺎـﻠـﺳﻹا ﻞـﺒـﻗ ﺮـﺼـﻣ ﻞـﺘـﺤـﺗ ﺖـﻧﺎآ . نﻮـﻳﺮﺼﻤـﻟا ﺪﻤﺘﻋا ،ﻲـﻟﺎـﻌﻟا ﺪـﺴـﻟا ءﺎﻨـﺑ ﻞﺒﻘـﻓ
اوﺪـﻤـﺘـﻋا ﺎـﻤآ ﺎـﺘﻟﺪﻟاو يداﻮـﻟا ﻲـﻓ ﺔـﻴـﻔـﻴـﺼﻟا تﺎﻋارﺰﻟا ﻰـﻓ ﻞﻴـﻨـﻟا ﻩﺎـﻴـﻣ ﻰـﻠـﻋ
ﻩﺎﻴـﻣ ﻰـﻠـﻋﺢـﻤـﻘﻟا ﺔﻋارز ﻲﻓ رﺎـﻄﻣﻷا
Secret
Data
100100100000111010110001011110110001111111001110111
110011111010000010010111000110000000101101000111111
1000000101001
4. Kashida Variation Algorithm (KVA) (A4)
The Kashida variation algorithm (KVA) inserts Kashida
according to the space position in order to hide two
bits between words. KVA is a simple application that
typically uses justified text in any word editor. The
proposed KVA introduces four different Kashida hiding
scenarios. A specific scenario is applied in each fragment
to embed the secret data inside the carrier file.
Furthermore, the message blocks are aggregated to
reassemble the message that contains the hidden information.
The benefits of the variation process within KVA are the
creation of an extra dimension of complexity, enhanced
robustness, and improved transparency [10]. Table 3
illustrates the effect on the cover file after embedding the
secret data. As demonstrated in Table 3, the semantics and
the syntax of the carrier file do not change.
TABLE 3. KVA SIMULATION OUTPUT
Cover
Object ﺖﻧﺎآ ﻲﺘﻟا ﺔﻴﻧﺎﻣوﺮﻟا ﺔﻳرﻮﻃاﺮﺒﻣﻻاو ﺮﺼﻣ ءاﺬﻏ ﺔَﻠﺳ ﻲﻟﺎﻤﺸﻟا ﺮﺼﻣ ﻞﺣﺎﺳ نﺎآ
مﻼﺳﻹا ﻞﺒﻗ ﺮﺼﻣ ﻞﺘﺤﺗ . ﻞﺒﻘﻓ ﻰﻓ ﻞﻴﻨﻟا ﻩﺎﻴﻣ ﻰﻠﻋ نﻮﻳﺮﺼﻤﻟا ﺪﻤﺘﻋا ،ﻲﻟﺎﻌﻟا ﺪﺴﻟا ءﺎﻨﺑ
ﺔﻋارز ﻲﻓ رﺎﻄﻣﻷا ﻩﺎﻴﻣ ﻰﻠﻋ اوﺪﻤﺘﻋا ﺎﻤآ ﺎﺘﻟﺪﻟاو يداﻮﻟا ﻲﻓ ﺔﻴﻔﻴﺼﻟا تﺎﻋارﺰﻟا
ﺢﻤﻘﻟا
Stego
Object ﻲﺘـﻟا ﺔـﻴـﻧﺎـﻣوﺮﻟا ﺔﻳرﻮـﻃاﺮﺒﻣﺎـﻟاو ﺮـﺼـﻣ ءاﺬﻏ ﺔَـﻠـﺳ ﻲﻟﺎـﻤﺸـﻟا ﺮﺼـﻣ ﻞﺣﺎﺳ نﺎـآ
ﻞـﺒـﻗ ﺮـﺼـﻣ ﻞـﺘـﺤـﺗ ﺖـﻧﺎآمﺎـﻠـﺳﻹا . نﻮـﻳﺮﺼﻤـﻟا ﺪﻤﺘﻋا ،ﻲـﻟﺎـﻌﻟا ﺪـﺴـﻟا ءﺎﻨـﺑ ﻞﺒﻘـﻓ
اوﺪـﻤـﺘـﻋا ﺎـﻤآ ﺎـﺘﻟﺪﻟاو يداﻮـﻟا ﻲـﻓ ﺔـﻴـﻔـﻴـﺼﻟا تﺎﻋارﺰﻟا ﻰـﻓ ﻞﻴـﻨـﻟا ﻩﺎـﻴـﻣ ﻰـﻠـﻋ
ﺢـﻤـﻘﻟا ﺔﻋارز ﻲﻓ رﺎـﻄﻣﻷا ﻩﺎﻴـﻣ ﻰـﻠـﻋ
Secret
Data 100100100000111010110001011110110001111111001110111
110011111010000010010111000110000000101101000111111
1000000101001
5. Zero Width Character Algorithm (ZWC) (A5)
The Zero Width Character algorithm merges a space and a Zero
Width Character to hide two bits between words inside any
document [11]. The main idea of ZWC is to introduce a
variation in the gap between two words in order to hide
data. An advantage of ZWC is that it can be applied to any
language. A disadvantage of ZWC is the extra space it adds,
which most word editors can detect. In addition, retyping
the text destroys the hidden data, which weakens the
robustness of the algorithm. Table 4 exemplifies the Stego
Object and the secret data positions.
TABLE 4. ZWC ALGORITHM OUTPUT
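The idea of carrying two bits in each gap between words can be sketched in Python. This is only an illustrative sketch, not the authors' implementation: the choice of U+200C as the zero-width character, the mapping of bit pairs to gap patterns, and the names `zwc_embed` and `zwc_extract` are all assumptions, and the cover text is assumed to contain single spaces and no zero-width characters of its own.

```python
import re

# Assumption: U+200C (zero-width non-joiner) stands in for the zero-width character.
ZW = "\u200c"
SPACE = " "

# Assumed mapping from a bit pair to the pattern placed between two words.
GAP_FOR_BITS = {
    "00": SPACE,
    "01": SPACE + ZW,
    "10": ZW + SPACE,
    "11": ZW + SPACE + ZW,
}
BITS_FOR_GAP = {gap: bits for bits, gap in GAP_FOR_BITS.items()}
GAP_PATTERN = re.compile(f"{ZW}?{SPACE}{ZW}?")


def zwc_embed(cover, bits):
    """Hide a bit string (even length) in the gaps between words of `cover`."""
    if len(bits) % 2:
        raise ValueError("bit string must have an even length")
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    words = cover.split(SPACE)
    if len(pairs) > len(words) - 1:
        raise ValueError("cover text has too few word gaps")
    pieces = [words[0]]
    for index, word in enumerate(words[1:]):
        # Gaps beyond the payload stay as ordinary spaces.
        gap = GAP_FOR_BITS[pairs[index]] if index < len(pairs) else SPACE
        pieces.append(gap + word)
    return "".join(pieces)


def zwc_extract(stego, n_bits):
    """Read back the first `n_bits` hidden bits from the word gaps."""
    gaps = GAP_PATTERN.findall(stego)
    return "".join(BITS_FOR_GAP[gap] for gap in gaps)[:n_bits]


stego = zwc_embed("the quick brown fox jumps", "011011")
print(zwc_extract(stego, 6))
```

Because an untouched space also decodes as "00", the receiver in this sketch must know the payload length in advance; how the original algorithm signals the end of the message is not stated here.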
6. Remarks Steganography (A6)
The Remarks Steganography algorithm uses a text file as a
carrier to hide the data [12]. The main goal of the
algorithm is to hide data inside a Word file without any
change in the file format. The algorithm uses the
Right-to-Left Remark symbol (U+200F) and the Left-to-Right
Remark symbol (U+200E), neither of which is visible in the
displayed text, to hide the bits inside the carrier file.
This method does not change the format of the file and can
be applied to different languages, regardless of Unicode or
ASCII coding. It is easily applied to Microsoft Office Word
files to hide secret data.
7. Novel Steganography over HTML Code (A7)
In the HTML Steganography technique [13],
Cryptography and Steganography techniques are used to
pass secure information. Webpages are used as the carrier
for secret data while they are published over the Internet.
Authenticated users can access the hidden data. The
proposed algorithm consists of three main steps, where the
first and third steps represent inverse operations.
Table 1 contents (Diacritics algorithm):
Cover
Object ﱠﻞ َﺘ ْﺤ َﺗ ْﺖ َﻧ ﺎ َآ ﻲِﺘﱠﻟا َﺔ ﱠﻴ ِﻧ ﺎ ﻣ و ﱡﺮ ﻟ ا ﺔﻳرﻮﻃاﺮﺒﻣﻻاو ٍﺮ ْﺼ ِﻣ ُء ا َﺬ ِﻏ ﺔَﻠﺳ ِﻲ ِﻟ ﺎ َﻤ ﱠﺸ ﻟ ا ِﺮ ْﺼ ِﻣ ُﻞ ِﺣ ﺎ َﺳ َن ﺎ َآ
ِم ﻼ ْﺳ ِﺈ ْﻟ ا َﻞ ْ ﺒ َﻗ ًﺮ ْﺼ ِﻣ . ﻰﻓ ِﻞ ْﻴ ﱠﻨ ﻟ ا ِﻩ ﺎ َﻴ ِﻣ ﻰَﻠَﻋ َن ﻮ ﱡﻳ ِﺮ ْ ﺼ ِﻤ ْﻟ ا َﺪ َﻤ َﺘ ْﻋ ِا ، ﻲِﻟﺎَﻌْﻟا ﱢﺪ ﱠﺴ ﻟ ا ُء ﺎ َﻨ ِﺑ َﻞ ْﺒ َﻘ َﻓ
ِﺢ ْﻤ َﻘ ْ ﻟ ا ِﺔ َﻋ ا ر ِز ﻲِﻓ ِرﺎَﻄْﻣَﺄْﻟا ِﻩ ﺎ َﻴ ِﻣ ﻰَﻠَﻋ ا و ُﺪ َﻤ َﺘ ْﻋ ِا ﺎَﻤَآ ﺎﺘْﻟﱢﺪﻟاَو يِداَﻮْﻟا ﻲِﻓ ُﺔ ﱠﻴ ِﻔ ْ ﻴ ﱠﺼ ﻟ ا ُتﺎَﻋارﱢﺰﻟا
Stego
Object َﻤ ﱠﺸ ﻟ ا ِﺮ ْ ﺼ ِﻣ ُﻞ ِﺣ ﺎ َﺳ َن ﺎ َآ ﱠﻞ َﺘ ْﺤ َﺗ ْﺖ َﻧ ﺎ َآ ﻲِﺘﱠﻟا َﺔ ﱠﻴ ِﻧ ﺎ ﻣ و ﱡﺮ ﻟ ا ﺔﻳرﻮﻃاﺮﺒﻣﻻاو ٍﺮ ْﺼ ِﻣ ُء ا َﺬ ِﻏ ﺔَﻠﺳ ِﻲ ِﻟ ﺎ
ِم ﻼ ْﺳ ِﺈ ْﻟ ا َﻞ ْ ﺒ َﻗ ًﺮ ْﺼ ِﻣ . ﻰﻓ ِﻞ ْﻴ ﱠﻨ ﻟ ا ِﻩ ﺎ َﻴ ِﻣ ﻰَﻠَﻋ َن ﻮ ﱡﻳ ِﺮ ْ ﺼ ِﻤ ْﻟ ا َﺪ َﻤ َﺘ ْﻋ ِا ، ﻲِﻟﺎَﻌْﻟا ﱢﺪ ﱠﺴ ﻟ ا ُء ﺎ َﻨ ِﺑ َﻞ ْﺒ َﻘ َﻓ
ﺎﺘْﻟﱢﺪﻟاَو يِداَﻮْﻟا ﻲِﻓ ُﺔ ﱠﻴ ِﻔ ْ ﻴ ﱠﺼ ﻟ ا ُتﺎَﻋارﱢﺰﻟا ِﺢ ْﻤ َﻘ ْ ﻟ ا ِﺔ َﻋ ا ر ِز ﻲِﻓ ِرﺎَﻄْﻣَﺄْﻟا ِﻩ ﺎ َﻴ ِﻣ ﻰَﻠَﻋ ا و ُﺪ َﻤ َﺘ ْﻋ ِا ﺎَﻤَآ
Secret
Bits 1001001000001110101100010111101100011111110011101111
1001111101000001001011100011000000010110100011111110
00000101001
Table 4 contents (ZWC algorithm):
Cover
Object ﺖﻧﺎآ ﻲﺘﻟا ﺔﻴﻧﺎﻣوﺮﻟا ﺔﻳرﻮﻃاﺮﺒﻣﻻاو ﺮﺼﻣ ءاﺬﻏ ﺔَﻠﺳ ﻲﻟﺎﻤﺸﻟا ﺮﺼﻣ ﻞﺣﺎﺳ نﺎآ
مﻼﺳﻹا ﻞﺒﻗ ﺮﺼﻣ ﻞﺘﺤﺗ . ﻰﻓ ﻞﻴﻨﻟا ﻩﺎﻴﻣ ﻰﻠﻋ نﻮﻳﺮﺼﻤﻟا ﺪﻤﺘﻋا ،ﻲﻟﺎﻌﻟا ﺪﺴﻟا ءﺎﻨﺑ ﻞﺒﻘﻓ
ﺔﻋارز ﻲﻓ رﺎﻄﻣﻷا ﻩﺎﻴﻣ ﻰﻠﻋ اوﺪﻤﺘﻋا ﺎﻤآ ﺎﺘﻟﺪﻟاو يداﻮﻟا ﻲﻓ ﺔﻴﻔﻴﺼﻟا تﺎﻋارﺰﻟا
ﺢﻤﻘﻟا.
Stego
Object ﻞﺣﺎﺳ نﺎـآ ﻲﺘـﻟا ﺔـﻴـﻧﺎـﻣوﺮﻟا ﺔﻳرﻮـﻃاﺮﺒﻣﺎـﻟاو ﺮـﺼـﻣ ءاﺬﻏ ﺔَـﻠـﺳ ﻲﻟﺎـﻤﺸـﻟا ﺮﺼـﻣ
مﺎـﻠـﺳﻹا ﻞـﺒـﻗ ﺮـﺼـﻣ ﻞـﺘـﺤـﺗ ﺖـﻧﺎآ . نﻮـﻳﺮﺼﻤـﻟا ﺪﻤﺘﻋا ،ﻲـﻟﺎـﻌﻟا ﺪـﺴـﻟا ءﺎﻨـﺑ ﻞﺒﻘـﻓ
اوﺪـﻤـﺘـﻋا ﺎـﻤآ ﺎـﺘﻟﺪﻟاو يداﻮـﻟا ﻲـﻓ ﺔـﻴـﻔـﻴـﺼﻟا تﺎﻋارﺰﻟا ﻰـﻓ ﻞﻴـﻨـﻟا ﻩﺎـﻴـﻣ ﻰـﻠـﻋ
ﻓ رﺎـﻄﻣﻷا ﻩﺎﻴـﻣ ﻰـﻠـﻋﺢـﻤـﻘﻟا ﺔﻋارز ﻲ
Secret Bits Positions
The Conceal operation consists of six stages:
1. The Statistical Stage consists of an array of 26
elements that count the characters' frequency. The
frequency array can be increased or decreased based
on a webpage’s language. Our experiments are based
on the English language.
2. The Character Representation Stage assigns bit codes
based on the usage frequency of each character. After
the frequency array has been generated, the two least
frequent characters are each represented by one bit,
with the less frequent character represented as ‘0’. If
two characters have the same frequency, the character
order specifies which one is zero. For example, if the
letter ‘X’ appears 10 times and the letter ‘Z’ appears 6
times, then ‘Z’ is represented as ‘0’ and ‘X’ as ‘1’.
If both letters have the same number of occurrences,
then ‘X’ is represented as ‘0’ and ‘Z’ as ‘1’.
Similarly, the next four characters are represented
by two bits.
3. The Embedding Stage appends secret bits after the
character representation until the combined code is 8
bits long. In other words, if the first character
representation is ‘0’ and the secret bits are “0111011”,
the code will be “00111011”.
4. The Encryption Stage consists of three simple binary
operations. First, the binary representation is
complemented. Then, an “exclusive OR” (XOR) is
performed with the binary key. The XOR output is then
shifted left by one bit and fed back as the XOR input.
In the numerical example below, the bit shifted out on
the left re-enters on the right, which makes the shift
circular. The binary key depends on the webpage
index, where each page has a rear index. The XOR and
shift operations are repeated twice, as shown in Figure 1.
FIGURE 1. ENCRYPTION GATES
A numerical example is presented below, where the input
is the character C (01000011) and the key is 10001100:
Step 1: Binary representation of C => 01000011
Step 2: 1’s complement of C => 10111100
Step 3: 10001100 XOR 10111100 => 00110000
Step 4: Circular shift left of 00110000 => 01100000
Step 5: 10001100 XOR 01100000 => 11101100
Step 6: Circular shift left of 11101100 => 11011001
5. The Decoding Stage converts the binary code to its
extended ASCII character. Here, the numerical form is
changed into text. In the example shown above,
“11011001” is decoded to (Ù).
6. The Insertion Stage places the text produced by the
decoding stage into the webpage code as a comment.
Comments do not appear in the webpage’s output view.
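The first three Conceal stages can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, ties are assumed to be broken alphabetically (consistent with the ‘X’ and ‘Z’ example), and extending the pattern beyond two-bit codes (eight characters with three bits, the rest with four bits) is an assumption that the text only implies with "Similarly".

```python
from collections import Counter
import string


def letter_frequencies(page_text):
    """Statistical stage: count the 26 English letters, ignoring case."""
    counts = Counter(ch for ch in page_text.upper() if ch in string.ascii_uppercase)
    return {letter: counts.get(letter, 0) for letter in string.ascii_uppercase}


def assign_codes(freqs):
    """Character representation stage.

    Letters are ranked from least to most frequent (ties in alphabetical
    order). The first 2 letters get 1-bit codes, the next 4 get 2-bit codes,
    and (assumption) the next 8 get 3-bit codes, and so on.
    """
    ranked = sorted(freqs, key=lambda letter: (freqs[letter], letter))
    codes = {}
    width, start = 1, 0
    while start < len(ranked):
        group = ranked[start:start + 2 ** width]
        for index, letter in enumerate(group):
            codes[letter] = format(index, f"0{width}b")
        start += 2 ** width
        width += 1
    return codes


def embed_byte(code, secret_bits):
    """Embedding stage: pad a character code with secret bits up to 8 bits."""
    payload_length = 8 - len(code)
    if len(secret_bits) < payload_length:
        raise ValueError("not enough secret bits to fill one byte")
    return code + secret_bits[:payload_length]


freqs = {letter: 100 for letter in string.ascii_uppercase}
freqs["X"], freqs["Z"] = 10, 6
codes = assign_codes(freqs)
print(codes["Z"], codes["X"])          # 'Z' is rarer, so it gets '0'
print(embed_byte("0", "0111011"))      # the example from stage 3
```

Sorting on the pair (frequency, letter) is one simple way to reproduce the tie rule; the paper does not say how the receiver determines the length of the character code inside each byte, so that step is left out of the sketch.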
To reveal the secret message, the following stages are
applied:
1. The Statistical Stage relies on 26 characters, as in the
statistical stage of the Conceal operation.
2. The Reading Comments Stage reads the comments from the
webpage code.
3. The Encoding Stage converts each comment into its
binary representation.
4. The Decryption Stage inverts the binary process of the
encryption stage in the Conceal operation.
5. The Reveal Character Code Stage uses the frequency
array created in the statistical stage and compares it
with the binary output of the decryption stage. The
embedded information is obtained by removing the
character representation.
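The encryption stage and its inverse can be checked against the numerical example with a short Python sketch. The function names are hypothetical, and the left shift is taken to be circular on 8 bits because that is what Step 6 of the example (11101100 to 11011001) requires.

```python
def rotate_left_8(value):
    """Circular left shift of an 8-bit value by one position."""
    return ((value << 1) | (value >> 7)) & 0xFF


def rotate_right_8(value):
    """Circular right shift of an 8-bit value by one position."""
    return ((value >> 1) | (value << 7)) & 0xFF


def encrypt_byte(plain, key):
    """Complement, then apply (XOR with key, rotate left) twice."""
    value = ~plain & 0xFF
    for _ in range(2):
        value = rotate_left_8(value ^ key)
    return value


def decrypt_byte(cipher, key):
    """Undo encrypt_byte: (rotate right, XOR with key) twice, then complement."""
    value = cipher
    for _ in range(2):
        value = rotate_right_8(value) ^ key
    return ~value & 0xFF


key = 0b10001100
cipher = encrypt_byte(ord("C"), key)
print(format(cipher, "08b"), chr(cipher))   # 11011001 Ù
print(chr(decrypt_byte(cipher, key)))       # C
```

The decryption applies the inverse of each operation in reverse order, which is what allows the Reveal operation to recover the byte produced by the embedding stage.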
8. Steganography in Text by Using MS Word Symbols (A8)
The MS Word Symbols algorithm hides data inside a
Word file without altering carrier file properties such as
content and format [14]. The algorithm employs
non-printable symbols to hide four bits between letters,
which improves the hidden capacity ratio compared to other
algorithms. Moreover, neither the Word file format nor the
letter shapes are modified. Thus, the suggested algorithm
avoids raising hackers’ suspicions.
Table 5 lists some of the hidden codes. For example, if all
four symbols of the table are inserted after a letter, then
the hidden code is 0000. Each insertion position therefore
hides four bits of secret data. The four symbols are the
Right remark (U+200E), Left remark (U+200F), Zero width
joiner (U+200D), and Zero width non-joiner (U+200C).
Different combinations of these symbols represent the
hidden bits, for a total of 16 different codes.
Three inputs are used in the hiding process: the Stego key,
the carrier file, and the hidden data. The main purpose of
the Stego key is to change the bit representation of the
symbols. For example, ‘1’ may represent a symbol’s absence
and ‘0’ its presence. In the next step, a symbol table is
created depending on the Stego key.
TABLE 5. SAMPLE OF HIDDEN BITS BY USING WORD SYMBOLS
(X = symbol inserted, - = symbol omitted)
Right Remark  Left Remark  ZWJ  ZWNJ  Hidden code
X             X            X    X     0000
X             X            X    -     0001
X             -            X    -     0101
X             X            -    -     0011
X             -            -    -     0111
-             -            -    -     1111
-             -            X    -     1101
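The presence and absence coding of Table 5 can be sketched in Python. This is an illustrative sketch under stated assumptions, not the published implementation: it fixes the convention that an inserted symbol encodes ‘0’ and an omitted one encodes ‘1’, it ignores the Stego key (which would permute this convention), it places one four-bit code after each character of the cover, and the function names are hypothetical.

```python
# Symbol order follows the columns of Table 5:
# Right remark, Left remark, Zero width joiner, Zero width non-joiner.
SYMBOLS = ["\u200e", "\u200f", "\u200d", "\u200c"]


def word_symbols_embed(cover, bits):
    """Hide 4 bits after each cover character; an inserted symbol means '0'."""
    if len(bits) % 4:
        raise ValueError("bit string length must be a multiple of 4")
    codes = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    if len(codes) > len(cover):
        raise ValueError("cover text is too short for the payload")
    pieces = []
    for index, char in enumerate(cover):
        pieces.append(char)
        if index < len(codes):
            pieces.extend(
                symbol for symbol, bit in zip(SYMBOLS, codes[index]) if bit == "0"
            )
    return "".join(pieces)


def word_symbols_extract(stego, n_bits):
    """Recover n_bits by checking which symbols follow each visible character."""
    codes = []
    position = 0
    while position < len(stego) and 4 * len(codes) < n_bits:
        position += 1  # skip the visible character
        present = set()
        while position < len(stego) and stego[position] in SYMBOLS:
            present.add(stego[position])
            position += 1
        codes.append("".join("0" if s in present else "1" for s in SYMBOLS))
    return "".join(codes)[:n_bits]


stego = word_symbols_embed("hello", "00000101")
print(len(stego) - len("hello"))            # number of inserted symbols
print(word_symbols_extract(stego, 8))
```

Under this convention the code 1111 inserts nothing, so the receiver needs the payload length (or an equivalent end marker) to know where the hidden data stops. The cover is assumed to contain none of the four symbols beforehand.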
III. PERFORMANCE EVALUATION
Table 7 shows the proposed algorithms and their
classifications. The Multipoint, Diacritics, ZKA, and KVA
algorithms (A1, A2, A3, and A4) are language dependent.
They can hide secret data inside text in Unicode languages
such as Arabic, Persian, and Urdu. The Multipoint and
Diacritics algorithms hide data by substituting one letter
with a modified-shape letter. This approach works well
because there is no standard position or distance between
a letter and its points, or between a letter and its
diacritics. The main advantage of the A1 and A2 algorithms
is that the file size remains stable. Moreover, A1 and A2
can be combined with A3, A4, A5, A6, A7, and A8 in order to
improve the hidden ratio capacity. A3 and A4 hide the data
inside a carrier object by inserting non-printed symbols.
The main drawback of A3 and A4 is that the carrier file
size increases as the secret message is embedded.
A5, A6, and A8 hide the data by embedding non-printed
symbols. These algorithms provide a high hidden ratio
capacity, and they can be combined with A1 and A2 to
improve it further. One disadvantage of A5, A6, and A8 is
that the carrier file size increases through the insertion
of non-printed symbols. The HTML Code algorithm (A7) hides
the encrypted secret data as comments inside the HTML
webpage. The number of hidden bits that can be inserted is
limited and depends on the webpage’s original language. The
hidden capacity of A7 can be enhanced by combining it with
A1 and/or A2. Table 8 shows the results of combining one
algorithm with another and the effect on the hidden ratio
capacity and the file size. Combining one algorithm with
another increases the hidden ratio capacity at the risk of
increasing the file size. The only case in which the file
size does not increase is when A1 and A2 are combined,
because neither A1 nor A2 increases the carrier file size.
These algorithms only change the positions of the points or
the diacritics.
FIGURE 2. HIDDEN RATIO [BIT/KBYTE]
Figure 2 compares the hidden ratio of the eight proposed
algorithms with that of the Line Shift and Word Shift
algorithms introduced in the literature. Equation 1, stated
below, represents the hidden ratio capacity (HR). Based on
the simulation results, A8 achieves the highest hidden
ratio and A6 the second highest. A7 shows a constant hidden
ratio. The average hidden ratios are A1 = 104 bits per
KByte (b/KB); A2 = 144 b/KB; A3 = 153 b/KB; A4 = 355 b/KB;
A5 = 153 b/KB; A6 = 626 b/KB; A7 = 127 b/KB; and
A8 = 799 b/KB.
$$HR = \frac{\text{number of hidden bits}}{\text{file size in KByte}} \qquad (1)$$
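As a small illustration of the metric, the following Python sketch computes a hidden ratio in bits per KByte and a relative file size change. It assumes that the hidden ratio is the number of hidden bits divided by the file size in KByte, as the unit of Figure 2 suggests, and that 1 KByte is 1024 bytes; the function names and sample numbers are hypothetical.

```python
def hidden_ratio(hidden_bits, file_size_bytes):
    """Hidden bits per KByte of file (assumes 1 KByte = 1024 bytes)."""
    if file_size_bytes <= 0:
        raise ValueError("file size must be positive")
    return hidden_bits / (file_size_bytes / 1024)


def size_change_percent(cover_size_bytes, stego_size_bytes):
    """Relative growth of the Stego object over the cover file, in percent."""
    if cover_size_bytes <= 0:
        raise ValueError("cover size must be positive")
    return 100 * (stego_size_bytes - cover_size_bytes) / cover_size_bytes


# Hypothetical numbers: 1600 hidden bits in a 4096-byte file.
print(hidden_ratio(1600, 4096))            # 400.0
print(size_change_percent(1000, 1020))     # 2.0
```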
FIGURE 3. FILE SIZE EFFECT AFTER INSERTING SECRET DATA
Figure 3 compares the file size of the Stego objects
produced by the eight proposed algorithms with those
produced by the Line Shift and Word Shift algorithms.
Based on the simulation results, there is no change in the
carrier file size when A1, A2, or A7 is applied. On the
other hand, the highest file size increase appears with A8,
and A6 shows the second highest.
[Figure 2 chart: hidden ratio, axis 0 to 1000, for A1 to A8, Line Shift, and Word Shift; series: Aljazeera.net, BBC.COM, CNN, Alhayat]
[Figure 3 chart: file size effect, axis 0 to 1.6, for A1 to A8, Line Shift, and Word Shift; series: Aljazeera.net, BBC.COM, CNN, Alhayat]
IV. MERGING ALGORITHMS
In this section, we discuss the possibility of merging more
than one algorithm. The proposed algorithms can be merged
to further improve the hidden ratio of the carrier file.
For example, A1 can be combined with any of A2, A3, A4,
A5, A6, A7, or A8. Figure 4 illustrates the hidden ratio
when A1 is combined with other algorithms. Figure 5 shows
the change in the carrier file size once other algorithms
have been merged with A1. The merging process is done in
sequence. For example, the user first selects the A1
algorithm, and the carrier file is updated based on the
secret data. The user then takes the output file of the A1
algorithm and applies the second selected algorithm to it.
FIGURE 4. HIDDEN RATIO MERGED WITH MULTIPOINT
FIGURE 5. FILE SIZE CHANGE MERGED WITH MULTIPOINT
Based on the simulation results, the best outcome
depends on the individual user requirements. If the file
size is not allowed to change, then the user will select
A1, A2, or A7. Otherwise, if the carrier file size is
allowed to increase and large amounts of secret data are
being embedded in the carrier file, then A8 is the best
choice. One distinctive merging scenario is (A1, A2, A4,
A7): Multipoint, Diacritics, KVA, and HTML Code. This
scenario provides a high hidden ratio while the size of the
carrier file increases only minimally, as shown in Table 6.
The results show that the merging scenario is very
consistent among different websites in terms of hidden
ratio capacity and file size change.
TABLE 6. MERGED SIMULATION RESULTS OF A1, A2, A4, A7
Webpage Hidden Ratio b/KB File size Change
Aljazeera 723 2%
CNN 709 1%
BBC 717 2%
Alhayat 771 2%
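The selection rule described in this section can be expressed as a short Python sketch. The average hidden ratios are the values reported in Section III and the file size effects are those listed in Table 7; the function name and the idea of picking the single algorithm with the highest average ratio are illustrative assumptions, not a procedure defined by the paper.

```python
# Average hidden ratio in b/KB, as reported in Section III.
AVERAGE_HIDDEN_RATIO = {
    "A1": 104, "A2": 144, "A3": 153, "A4": 355,
    "A5": 153, "A6": 626, "A7": 127, "A8": 799,
}

# Whether the algorithm increases the carrier file size (Table 7).
INCREASES_FILE_SIZE = {
    "A1": False, "A2": False, "A3": True, "A4": True,
    "A5": True, "A6": True, "A7": False, "A8": True,
}


def candidate_algorithms(size_may_grow):
    """Algorithms that respect the file size constraint."""
    return [
        name for name in AVERAGE_HIDDEN_RATIO
        if size_may_grow or not INCREASES_FILE_SIZE[name]
    ]


def pick_algorithm(size_may_grow):
    """Candidate with the highest average hidden ratio."""
    return max(candidate_algorithms(size_may_grow), key=AVERAGE_HIDDEN_RATIO.get)


print(candidate_algorithms(False))   # ['A1', 'A2', 'A7']
print(pick_algorithm(True))          # A8
```

A fuller version would also consider language dependence and merged scenarios such as (A1, A2, A4, A7); those are omitted here to keep the sketch small.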
V. CONCLUSION
In this paper, we have discussed and compared the
performance of eight new text Steganography algorithms:
Multipoint, ZKA, Diacritics, KVA, ZWC, Remarks, HTML
code, and MS Word Symbols. These algorithms provide
additional and necessary protection of confidential data.
The KVA, ZKA, Multipoint, and Diacritics algorithms apply
to specific Unicode-based languages such as Arabic,
Persian, Pashto, and Urdu. Hence, these particular
algorithms could potentially benefit a population of
approximately two billion people. The ZWC, Remarks, HTML
code, and MS Word Symbols algorithms, in contrast, are
language independent and can be used to pass sensitive
information by inserting specific symbols.
We have presented and discussed the possibility and
effect of merging several algorithms. The choice
between different merging scenarios depends on the user’s
constraints. The scenarios we provided work well whether or
not a constraint is placed on the file. Furthermore, we
presented tradeoff scenarios between the hidden ratio and
the increase of the file size.
The presented algorithms can be used to establish safe
communication, privacy enhancement, secure data sharing,
and additional protection of copyrighted products. For
example, our algorithms can be applied to multimedia and
publishing products. By using any of the eight presented
algorithms, or a merge of them, users will have a highly
secured data system in place to deter hackers.
[Figure 4 chart: Hidden Ratio / Merge with Multipoint, axis 0 to 1200, for A1 and A2 through A1 and A8; series: Aljazeera.net, BBC.COM, CNN, Alhayat]
[Figure 5 chart: File size change / Merge with Multipoint, axis 0% to 200%, for A1 and A2 through A1 and A8; series: Aljazeera.net, BBC.COM, CNN, Alhayat]
REFERENCES
[1] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone,
Handbook of applied cryptography: CRC press, 2010.
[2] N. F. Johnson and S. Jajodia, "Exploring steganography:
Seeing the unseen," Computer, vol. 31, pp. 26-34, 1998.
[3] V. Potdar and E. Chang, "Visibly Invisible: Ciphertext as a
Steganographic Carrier," in Proceedings of the 4th
International Network Conference (INC2004), 2004, pp. 385-391.
[4] E. Martiri, A. Baxhaku, and E. Barolli, "Steganographic
Algorithm Injection in Image Information Systems Used in
Healthcare Organizations," in 2011 Third International
Conference on Intelligent Networking and Collaborative
Systems (INCoS), 2011, pp. 408-411.
[5] M. Zamani, A. Manaf, and R. Ahmad, "Knots of Substitution
Techniques of Audio Steganography," in The 2009
International Conference on Telecom Technology and
Applications, Singapore, 2009, pp. 415-419.
[6] I. Avcibas, N. Memon, and B. Sankur, "Steganalysis using
image quality metrics," IEEE Transactions on Image
Processing, vol. 12, pp. 221-229, 2003.
[7] A. Odeh, A. Alzubi, Q. B. Hani, and K. Elleithy,
"Steganography by multipoint Arabic letters," in Systems,
Applications and Technology Conference (LISAT), 2012 IEEE
Long Island, Farmingdale State College - State University of
New York, 2012, pp. 1-7.
[8] A. Odeh and K. Elleithy, "Steganography in Arabic Text
Using Full Diacritics Text," presented at the 25th International
Conference on Computers and Their Applications in Industry
and Engineering (CAINE-2012), New Orleans, Louisiana,
USA, 2012.
[9] A. Odeh and K. Elleithy, "Steganography in Arabic Text
Using Zero Width and Kashidha Letters," International
Journal of Computer Science & Information Technology
(IJCSIT), vol. 4, 2012.
[10] A. Odeh, K. Elleithy, and M. Faezipour, "Steganography in
Arabic text using Kashida variation algorithm (KVA)," in
Systems, Applications and Technology Conference (LISAT),
2013 IEEE Long Island, 2013, pp. 1-6.
[11] A. Odeh and K. Elleithy, "Steganography in Text by Merge
ZWC and Space Character," in 28th International Conference
on Computers and Their Applications, CATA-2013, Honolulu,
Hawaii, USA, 2013, pp. 1-7.
[12] A. Odeh, K. Elleithy, and M. Faezipour, "Text Steganography
Using Language Remarks," in Proceedings of the American
Society of Engineering Education, pp. 1-7, 2013.
[13] A. Odeh, K. Elleithy, M. Faezipour, and E. Abdelfattah,
"Novel Steganography over HTML Code," in New Trends in
Networking, Computing, Informatics, Systems Sciences, and
Engineering, T. Sobh and K. Elleithy, Eds.: Springer, 2014.
[14] A. Odeh, K. Elleithy, and M. Faezipour, "Steganography in
text by using MS word symbols," in 2014 Zone 1 Conference
of the American Society for Engineering Education (ASEE
Zone 1), CT, USA, 2014, pp. 1-5.
TABLE 7. COMPARISON BETWEEN EIGHT DIFFERENT ALGORITHMS
Algorithm       Applicable Language              General Category  Technique   File Size  Can be combined with
A1: Multipoint  Unicode (Arabic, Persian, Urdu)  Substitution      Linguistic  No effect  A2, A3, A4, A5, A6
A2: Diacritics  Unicode (Arabic, Persian, Urdu)  Substitution      Linguistic  No effect  A1, A3, A4, A5, A6
A3: ZKA         Unicode (Arabic, Persian, Urdu)  Injection         Linguistic  Increase   A1, A2
A4: KVA         Unicode (Arabic, Persian, Urdu)  Injection         Linguistic  Increase   A2
A5: ZWC         Language independent             Injection         Format      Increase   A1, A2
A6: Remarks     Language independent             Injection         Format      Increase   A1, A2
A7: HTML Code   Language independent             Injection         Format      No effect  A1, A2
A8: MS Word     Language independent             Injection         Format      Increase   A1, A2
TABLE 8. OUTPUT OF THE COMBINING PROCESS BETWEEN PROPOSED ALGORITHMS
Algorithm            Applicable Language              General Category        Technique          Hidden Capacity  File Size
A1 (A3, A4, A5, A6)  Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
A2 (A3, A4, A5, A6)  Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
(A1, A2)             Unicode (Arabic, Persian, Urdu)  Injection               Linguistic         No effect        No change
A3 (A1, A2)          Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
A4 (A2)              Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
A5 (A1, A2)          Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
A6 (A1, A2)          Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
A7 (A1, A2)          Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  No effect        Increase
A8 (A1, A2)          Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase