Conference Paper

Highly Efficient Novel Text Steganography Algorithms

Ammar Odeh, Khaled Elleithy, Miad Faezipour
Department of Computer Science and Engineering
University of Bridgeport
Bridgeport, CT
aodeh@my.bridgeport.edu, elleithy@bridgeport.edu,
mfaezipo@bridgeport.edu
Eman Abdelfattah
School of Engineering & Computing Sciences
Texas A&M University
Corpus Christi, TX
Eman.Abdelfattah@tamucc.edu
Abstract—This paper investigates eight novel Steganography
algorithms that employ a text file as the carrier file. The proposed
model hides secret data in the text file by manipulating the font
format or inserting special symbols in the text file.
Furthermore, the suggested algorithms can be applied to both
Unicode and ASCII code languages, regardless of the text file
format. In addition, a merging capability among the techniques
is introduced, which allows alternatives for users based on the
system requirements. The proposed algorithms achieve a high
degree of optimized Steganography attributes such as hidden
ratio, robustness, and transparency.
Index Terms: HTML, Kashida, Multipoint Letter, Left
Remark, Right Remark, Steganography, Zero Width.
I. INTRODUCTION
Different strategies are used to protect sensitive data
during transmission over insecure channels. Some
algorithms convert plain text into cipher text, which is
called Cryptography [1]. Other algorithms protect secret
information by hiding its existence [2]. Steganography is a
security mechanism that hides data inside a carrier file
such as an image, sound, video, or text file [3]. Secret
information can be inserted into a carrier file using
different strategies, each of which has advantages and
drawbacks.
The first steganography technique is injection, where the
secret data is injected into the carrier file. Injection
increases the carrier file size and sometimes changes the
carrier file format [4]. The second technique is
substitution, where selected bits of the carrier file are
replaced by the secret data [5]. The substitution method
searches for the bits that have the lowest effect on the
carrier file and applies the exchange operation to them. The
main advantage of this approach is that the Stego object has
the same size as the carrier file, which makes it less
likely to arouse an attacker’s suspicion. In all
Steganography approaches, the main idea is to hide the data
inside the carrier file and then to send the Stego file over
some transport medium. After the data is transmitted, a
steganalyst might analyze it if there is any suspicion about
the carrier file [6].
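The substitution idea can be illustrated with a minimal sketch. It assumes the "bits with the lowest effect" are the least significant bits of the carrier bytes, which is the usual choice for image or audio samples; the function names are hypothetical, and this is not one of the eight text algorithms proposed in this paper.

```python
def embed_lsb(carrier: bytes, bits: str) -> bytes:
    """Overwrite the least significant bit of successive carrier bytes."""
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only 0s and 1s")
    if len(bits) > len(carrier):
        raise ValueError("carrier is too small for the secret bits")
    stego = bytearray(carrier)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | int(bit)  # clear the LSB, then set it
    return bytes(stego)


def extract_lsb(stego: bytes, n_bits: int) -> str:
    """Read back the least significant bit of the first n_bits bytes."""
    return "".join(str(byte & 1) for byte in stego[:n_bits])
```

The Stego object has exactly the same length as the carrier, which is the size-preserving property attributed to substitution above.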
II. TECHNICAL SOLUTION AND ALGORITHMS DETAILS
This section presents eight new text Steganography
algorithms: (1) Multipoint, (2) Diacritics, (3) ZKA, (4)
KVA, (5) ZWC, (6) Remarks, (7) HTML code, and (8) MS
Word symbols. The proposed algorithms enhance the hidden
capacity ratio, transparency, and robustness of the system.
The diversity of the algorithms lets users pass their
sensitive data in an authenticated and secure manner. Each
algorithm uses a different strategy to hide the data inside
the text file. Some of these algorithms use the text’s font
format to embed the data. Others use symbol insertion to
pass sensitive data over public channels. The presented
algorithms can be classified into two categories: (1)
Unicode languages and (2) multiple languages.
1. Multipoint Algorithm (A1)
The Multipoint algorithm is applied to Unicode-based
languages such as Arabic and Persian. These languages have
multipoint letters, whose shapes contain more than one
point, unlike English letters such as i and j, which carry
a single point. Shifting the points vertically or
horizontally hides two bits of information in each
multipoint letter, whereas other algorithms hide only one
bit per letter, so the proposed algorithm offers a higher
hidden capacity ratio. The algorithm enhances robustness by
converting the Stego object to an image or a PDF file, which
avoids the retyping problem. In addition, the algorithm can
be applied to other languages such as Pashto and Urdu [7].
2. Diacritics Algorithm (A2)
The Diacritics algorithm hides data inside Arabic
text [8]. It uses Arabic diacritics, a feature of the
language in which small marks represent short vowels.
Diacritics are optional in Arabic text and are not commonly
written, although most Arabic letters need them to fix the
pronunciation of a word. The proposed algorithm hides two
secret bits in each diacritic, which offers a high hidden
ratio compared to other algorithms. The main drawback of the
Diacritics algorithm is that fully diacritized text is
uncommon, so an attacker may be more inclined to analyze the
file. Table 1 shows a sample cover object before and after
inserting the secret data.
TABLE 1. DIACRITICS ALGORITHM SIMULATION OUTPUT
3. Zero Width Character and Kashida Algorithm (ZKA) (A3)
The ZKA algorithm targets Unicode text (Arabic) and uses
two special characters: the Zero Width character and the
Kashida, which is normally used to justify sentences.
Together they hide two bits between connected letters. To
avoid arousing an attacker’s suspicion, a randomization
algorithm is applied within ZKA, so that each message uses
a different strategy to conceal its secret data. The
proposed algorithm can be extended to other Unicode
languages. ZKA has two weaknesses: (1) the retyping
problem, and (2) the clear-formatting problem. Both
reduce the robustness of the algorithm [9]. Table 2
illustrates the changes in the carrier text and the secret
data when applying the ZKA algorithm.
TABLE 2. ZKA SIMULATION OUTPUT
Cover
Object
ﺖﻧﺎآ ﻲﺘﻟا ﺔﻴﻧﺎﻣوﺮﻟا ﺔﻳرﻮﻃاﺮﺒﻣﻻاو ﺮﺼﻣ ءاﺬﻏ ﺔَﻠﺳ ﻲﻟﺎﻤﺸﻟا ﺮﺼﻣ ﻞﺣﺎﺳ نﺎآ
مﻼﺳﻹا ﻞﺒﻗ ﺮﺼﻣ ﻞﺘﺤﺗ . ﻰﻓ ﻞﻴﻨﻟا ﻩﺎﻴﻣ ﻰﻠﻋ نﻮﻳﺮﺼﻤﻟا ﺪﻤﺘﻋا ،ﻲﻟﺎﻌﻟا ﺪﺴﻟا ءﺎﻨﺑ ﻞﺒﻘﻓ
ﺔﻋارز ﻲﻓ رﺎﻄﻣﻷا ﻩﺎﻴﻣ ﻰﻠﻋ اوﺪﻤﺘﻋا ﺎﻤآ ﺎﺘﻟﺪﻟاو يداﻮﻟا ﻲﻓ ﺔﻴﻔﻴﺼﻟا تﺎﻋارﺰﻟا
ﺢﻤﻘﻟا
Stego
Object
ﻲﺘـﻟا ﺔـﻴـﻧﺎـﻣوﺮﻟا ﺔﻳرﻮـﻃاﺮﺒﻣﺎـﻟاو ﺮـﺼـﻣ ءاﺬﻏ ﺔَـﻠـﺳ ﻲﻟﺎـﻤﺸـﻟا ﺮﺼـﻣ ﻞﺣﺎﺳ نﺎـآ
مﺎـﻠـﺳﻹا ﻞـﺒـﻗ ﺮـﺼـﻣ ﻞـﺘـﺤـﺗ ﺖـﻧﺎآ . نﻮـﻳﺮﺼﻤـﻟا ﺪﻤﺘﻋا ،ﻲـﻟﺎـﻌﻟا ﺪـﺴـﻟا ءﺎﻨـﺑ ﻞﺒﻘـﻓ
اوﺪـﻤـﺘـﻋا ﺎـﻤآ ﺎـﺘﻟﺪﻟاو يداﻮـﻟا ﻲـﻓ ﺔـﻴـﻔـﻴـﺼﻟا تﺎﻋارﺰﻟا ﻰـﻓ ﻞﻴـﻨـﻟا ﻩﺎـﻴـﻣ ﻰـﻠـﻋ
ﻩﺎﻴـﻣ ﻰـﻠـﻋﺢـﻤـﻘﻟا ﺔﻋارز ﻲﻓ رﺎـﻄﻣﻷا
Secret
Data
100100100000111010110001011110110001111111001110111
110011111010000010010111000110000000101101000111111
1000000101001
4. Kashida Variation Algorithm (KVA) (A4)
The Kashida variation algorithm (KVA) inserts Kashida
characters according to the position of the space in order
to hide two bits between words. KVA is simple to apply and
typically works on justified text in any word processor.
The proposed KVA introduces four different Kashida hiding
scenarios. A specific scenario is applied to each fragment
to embed the secret data inside the carrier file. The
message blocks are then aggregated to reassemble the
message that contains the hidden information. The variation
process within KVA adds an extra dimension of complexity,
enhances robustness, and improves transparency [10]. Table 3
illustrates the effect on the cover file after embedding
the secret data. As demonstrated in Table 3, the semantics
and the syntax of the carrier file do not change.
TABLE 3. KVA SIMULATION OUTPUT
Cover
Object ﺖﻧﺎآ ﻲﺘﻟا ﺔﻴﻧﺎﻣوﺮﻟا ﺔﻳرﻮﻃاﺮﺒﻣﻻاو ﺮﺼﻣ ءاﺬﻏ ﺔَﻠﺳ ﻲﻟﺎﻤﺸﻟا ﺮﺼﻣ ﻞﺣﺎﺳ نﺎآ
مﻼﺳﻹا ﻞﺒﻗ ﺮﺼﻣ ﻞﺘﺤﺗ . ﻞﺒﻘﻓ ﻰﻓ ﻞﻴﻨﻟا ﻩﺎﻴﻣ ﻰﻠﻋ نﻮﻳﺮﺼﻤﻟا ﺪﻤﺘﻋا ،ﻲﻟﺎﻌﻟا ﺪﺴﻟا ءﺎﻨﺑ
ﺔﻋارز ﻲﻓ رﺎﻄﻣﻷا ﻩﺎﻴﻣ ﻰﻠﻋ اوﺪﻤﺘﻋا ﺎﻤآ ﺎﺘﻟﺪﻟاو يداﻮﻟا ﻲﻓ ﺔﻴﻔﻴﺼﻟا تﺎﻋارﺰﻟا
ﺢﻤﻘﻟا
Stego
Object ﻲﺘـﻟا ﺔـﻴـﻧﺎـﻣوﺮﻟا ﺔﻳرﻮـﻃاﺮﺒﻣﺎـﻟاو ﺮـﺼـﻣ ءاﺬﻏ ﺔَـﻠـﺳ ﻲﻟﺎـﻤﺸـﻟا ﺮﺼـﻣ ﻞﺣﺎﺳ نﺎـآ
ﻞـﺒـﻗ ﺮـﺼـﻣ ﻞـﺘـﺤـﺗ ﺖـﻧﺎآمﺎـﻠـﺳﻹا . نﻮـﻳﺮﺼﻤـﻟا ﺪﻤﺘﻋا ،ﻲـﻟﺎـﻌﻟا ﺪـﺴـﻟا ءﺎﻨـﺑ ﻞﺒﻘـﻓ
اوﺪـﻤـﺘـﻋا ﺎـﻤآ ﺎـﺘﻟﺪﻟاو يداﻮـﻟا ﻲـﻓ ﺔـﻴـﻔـﻴـﺼﻟا تﺎﻋارﺰﻟا ﻰـﻓ ﻞﻴـﻨـﻟا ﻩﺎـﻴـﻣ ﻰـﻠـﻋ
ﺢـﻤـﻘﻟا ﺔﻋارز ﻲﻓ رﺎـﻄﻣﻷا ﻩﺎﻴـﻣ ﻰـﻠـﻋ
Secret
Data 100100100000111010110001011110110001111111001110111
110011111010000010010111000110000000101101000111111
1000000101001
5. Zero Width Character Algorithm (ZWC) (A5)
The Zero Width Character algorithm merges a space and a Zero
Width Character to hide two bits between words inside any
document [11]. The main idea of ZWC is to introduce a
variation between two letters in order to hide data. An
advantage of ZWC is that it can be applied to any
language. A disadvantage of ZWC is the addition of an extra
space, which most word processors can detect. In addition,
retyping the text destroys the hidden data. Table 4
exemplifies the Stego object and the positions of the
secret data.
TABLE 4. ZWC ALGORITHM OUTPUT
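A minimal sketch of hiding two bits in each gap between words follows. The exact combinations of space and zero-width character are not given in the text, so the mapping in `GAPS`, the choice of U+200C as the zero-width character, and all function names are assumptions made for illustration only.

```python
import re

ZWC = "\u200c"  # assumed zero-width character (ZERO WIDTH NON-JOINER)

# Assumed mapping from a bit pair to the gap placed between two words.
GAPS = {"00": " ", "01": " " + ZWC, "10": ZWC + " ", "11": ZWC + " " + ZWC}
PAIRS = {gap: pair for pair, gap in GAPS.items()}
GAP_RE = re.compile(f"{ZWC}? {ZWC}?")


def embed_zwc(cover: str, bits: str) -> str:
    """Hide two bits in each word gap; gaps that are not needed stay plain."""
    words = cover.split()  # normalizes the cover to single spaces
    if len(bits) % 2 or set(bits) - {"0", "1"}:
        raise ValueError("bits must be an even-length string of 0s and 1s")
    if len(bits) // 2 > len(words) - 1:
        raise ValueError("cover text has too few word gaps")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        pair = bits[2 * i:2 * i + 2]
        out.append(GAPS[pair] if pair else " ")
        out.append(word)
    return "".join(out)


def extract_zwc(stego: str, n_bits: int) -> str:
    """Decode every gap and keep the first n_bits bits."""
    pairs = (PAIRS[m.group(0)] for m in GAP_RE.finditer(stego))
    return "".join(pairs)[:n_bits]
```

In this sketch the receiver must know the number of hidden bits, because an unused gap is indistinguishable from the pair 00. Retyping the visible text discards every zero-width character, which is the robustness weakness noted above.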
6. Remarks Steganography (A6)
The Remarks Steganography algorithm uses a text file as a
carrier to hide the data inside it [12]. The main goal of the
algorithm is to hide data inside a Word file without any
changes in the file format. The algorithm uses two
invisible symbols, the Right-to-Left Remark (U+200F) and
the Left-to-Right Remark (U+200E), to hide the bits inside
the carrier file. This method does not change the format of
the file and can be applied to different languages,
regardless of Unicode or ASCII coding. It is easily applied
to Microsoft Office Word files to hide secret data.
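The use of the two directional symbols can be sketched as follows. The text does not state which symbol stands for which bit or where the symbols are placed, so this sketch assumes one symbol after each cover character, with U+200E for 0 and U+200F for 1; the function names are hypothetical.

```python
LRM = "\u200e"  # Left-to-Right symbol, assumed to encode bit 0
RLM = "\u200f"  # Right-to-Left symbol, assumed to encode bit 1


def embed_remarks(cover: str, bits: str) -> str:
    """Insert one invisible directional symbol per bit after cover characters."""
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only 0s and 1s")
    if len(bits) > len(cover):
        raise ValueError("cover text is too short for the secret bits")
    out = []
    for i, ch in enumerate(cover):
        out.append(ch)
        if i < len(bits):
            out.append(LRM if bits[i] == "0" else RLM)
    return "".join(out)


def extract_remarks(stego: str) -> str:
    """Read the hidden bits back from the directional symbols, in order."""
    return "".join("0" if ch == LRM else "1" for ch in stego if ch in (LRM, RLM))
```

Each hidden bit adds one character to the file, so the carrier grows with the length of the secret message.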
Content of Table 1 (Diacritics algorithm):
Cover
Object ﱠﻞ َﺘ ْﺤ َﺗ ْﺖ َﻧ ﺎ َآ ﻲِﺘﱠﻟا َﺔ ﱠﻴ ِﻧ ﺎ ﻣ و ﱡﺮ ﻟ ا ﺔﻳرﻮﻃاﺮﺒﻣﻻاو ٍﺮ ْﺼ ِﻣ ُء ا َﺬ ِﻏ ﺔَﻠﺳ ِﻲ ِﻟ ﺎ َﻤ ﱠﺸ ﻟ ا ِﺮ ْﺼ ِﻣ ُﻞ ِﺣ ﺎ َﺳ َن ﺎ َآ
ِم ﻼ ْﺳ ِﺈ ْﻟ ا َﻞ ْ ﺒ َﻗ ًﺮ ْﺼ ِﻣ . ﻰﻓ ِﻞ ْﻴ ﱠﻨ ﻟ ا ِﻩ ﺎ َﻴ ِﻣ ﻰَﻠَ َن ﻮ ﱡﻳ ِﺮ ْ ﺼ ِﻤ ْﻟ ا َﺪ َﻤ َﺘ ْﻋ ِا ، ﻲِﻟﺎَﻌْﻟا ﱢﺪ ﱠﺴ ﻟ ا ُء ﺎ َﻨ ِﺑ َﻞ ْﺒ َﻘ َﻓ
ِﺢ ْﻤ َﻘ ْ ﻟ ا ِﺔ َﻋ ا ر ِز ﻲِ ِرﺎَﻄْﻣَﺄْﻟا ِﻩ ﺎ َﻴ ِﻣ ﻰَﻠَ ا و ُﺪ َﻤ َﺘ ْﻋ ِا ﺎَﻤَآ ﺎﺘْﻟﱢﺪﻟاَو يِداَﻮْﻟا ﻲِ ُﺔ ﱠﻴ ِﻔ ْ ﻴ ﱠﺼ ﻟ ا ُتﺎَﻋارﱢﺰﻟا
Stego
Object َﻤ ﱠﺸ ﻟ ا ِﺮ ْ ﺼ ِﻣ ُﻞ ِﺣ ﺎ َﺳ َن ﺎ َآ ﱠﻞ َﺘ ْﺤ َﺗ ْﺖ َﻧ ﺎ َآ ﻲِﺘﱠﻟا َﺔ ﱠﻴ ِﻧ ﺎ ﻣ و ﱡﺮ ﻟ ا ﺔﻳرﻮﻃاﺮﺒﻣﻻاو ٍﺮ ْﺼ ِﻣ ُء ا َﺬ ِﻏ ﺔَﻠﺳ ِﻲ ِﻟ ﺎ
ِم ﻼ ْﺳ ِﺈ ْﻟ ا َﻞ ْ ﺒ َﻗ ًﺮ ْﺼ ِﻣ . ﻰﻓ ِﻞ ْﻴ ﱠﻨ ﻟ ا ِﻩ ﺎ َﻴ ِﻣ ﻰَﻠَ َن ﻮ ﱡﻳ ِﺮ ْ ﺼ ِﻤ ْﻟ ا َﺪ َﻤ َﺘ ْﻋ ِا ، ﻲِﻟﺎَﻌْﻟا ﱢﺪ ﱠﺴ ﻟ ا ُء ﺎ َﻨ ِﺑ َﻞ ْﺒ َﻘ َﻓ
ﺎﺘْﻟﱢﺪﻟاَو يِداَﻮْﻟا ﻲِ ُﺔ ﱠﻴ ِﻔ ْ ﻴ ﱠﺼ ﻟ ا ُتﺎَﻋارﱢﺰﻟا ِﺢ ْﻤ َﻘ ْ ﻟ ا ِﺔ َﻋ ا ر ِز ﻲِ ِرﺎَﻄْﻣَﺄْﻟا ِﻩ ﺎ َﻴ ِﻣ ﻰَﻠَ ا و ُﺪ َﻤ َﺘ ْﻋ ِا ﺎَﻤَآ
Secret
Bits 1001001000001110101100010111101100011111110011101111
1001111101000001001011100011000000010110100011111110
00000101001
Content of Table 4 (ZWC algorithm):
Cover
Object ﺖﻧﺎآ ﻲﺘﻟا ﺔﻴﻧﺎﻣوﺮﻟا ﺔﻳرﻮﻃاﺮﺒﻣﻻاو ﺮﺼﻣ ءاﺬﻏ ﺔَﻠﺳ ﻲﻟﺎﻤﺸﻟا ﺮﺼﻣ ﻞﺣﺎﺳ نﺎآ
مﻼﺳﻹا ﻞﺒﻗ ﺮﺼﻣ ﻞﺘﺤﺗ . ﻰﻓ ﻞﻴﻨﻟا ﻩﺎﻴﻣ ﻰﻠﻋ نﻮﻳﺮﺼﻤﻟا ﺪﻤﺘﻋا ،ﻲﻟﺎﻌﻟا ﺪﺴﻟا ءﺎﻨﺑ ﻞﺒﻘﻓ
ﺔﻋارز ﻲﻓ رﺎﻄﻣﻷا ﻩﺎﻴﻣ ﻰﻠﻋ اوﺪﻤﺘﻋا ﺎﻤآ ﺘﻟﺪﻟاو يداﻮﻟا ﻲﻓ ﺔﻴﻔﻴﺼﻟا تﺎﻋارﺰﻟا
ﺢﻤﻘﻟا.
Stego
Object ﻞﺣﺎﺳ نﺎـآ ﻲﺘـﻟا ﺔـﻴـﻧﺎـﻣوﺮﻟا ﺔﻳرﻮـﻃاﺮﺒﻣﺎـﻟاو ﺮـﺼـﻣ ءاﺬﻏ ﺔَـﻠـﺳ ﻲﻟﺎـﻤﺸـﻟا ﺮﺼـﻣ
مﺎـﻠـﺳﻹا ﻞـﺒـﻗ ﺮـﺼـﻣ ﻞـﺘـﺤـﺗ ﺖـﻧﺎآ . نﻮـﻳﺮﺼﻤـﻟا ﺪﻤﺘﻋا ،ﻲـﻟﺎـﻌﻟا ﺪـﺴـﻟا ءﺎﻨـﺑ ﻞﺒﻘـﻓ
اوﺪـﻤـﺘـﻋا ﺎـﻤآ ﺎـﺘﻟﺪﻟاو يداﻮـﻟا ﻲـﻓ ﺔـﻴـﻔـﻴـﺼﻟا تﺎﻋارﺰﻟا ﻰـﻓ ﻞﻴـﻨـﻟا ﻩﺎـﻴـﻣ ﻰـﻠـﻋ
رﺎـﻄﻣﻷا ﻩﺎﻴـﻣ ﻰـﻠـﻋﺢـﻤـﻘﻟا ﺔﻋارز
Secret
Bits
Positions
7. Novel Steganography over HTML Code (A7)
In the HTML Steganography technique [13],
Cryptography and Steganography techniques are used to
pass secure information. Webpages are used as the carrier
for secret data while they are published over the Internet.
Authenticated users can access the hidden data. The
proposed algorithm consists of three main steps, where the
first and third steps represent inverse operations.
The Conceal operation consists of six stages:
1. The Statistical Stage consists of an array of 26
elements that count the characters' frequency. The
frequency array can be increased or decreased based
on a webpage’s language. Our experiments are based
on the English language.
2. The Character Representation Stage assigns bit codes
based on the usage frequency of each character. For
example, after the frequency array has been generated,
the two least frequent characters can each be
represented by one bit. The less frequent character is
assigned ‘0’; if two characters have the same
frequency, the character order specifies which one is
‘0’. For example, if the letter ‘X’ appears 10 times
and the letter ‘Z’ appears 6 times, then ‘Z’ is
represented as ‘0’ and ‘X’ as ‘1’. If both letters
occur equally often, then ‘X’ is represented as ‘0’
and ‘Z’ as ‘1’. Similarly, the next four characters
can be represented by two bits.
3. The Embedding Stage appends secret bits to the
character representation until the code is 8 bits
long. For example, if the character representation
is ‘0’ and the secret bits are “0111011”, the code
will be “00111011”.
4. The Encryption Stage consists of three simple binary
operations. First, the binary representation is
complemented. Then, an “exclusive OR” (XOR) is
performed with the binary key. Finally, the XOR output
is shifted left by one bit, with the leftmost bit
wrapping around to the right as in Step 6 below, and
fed back as the XOR input. The binary key depends on
the webpage index, where each page has a rear index.
The XOR and shift operations are performed twice, as
shown in Figure 1.
FIGURE 1. ENCRYPTION GATES
A numerical example is presented below, where the input
is (C) 01000011 and the key is 10001100:
Step 1: Binary representation of C => 01000011
Step 2: 1’s complement of C => 10111100
Step 3: 10001100 XOR 10111100 => 00110000
Step 4: Shift left 00110000 => 01100000
Step 5: 10001100 XOR 01100000 => 11101100
Step 6: Shift left 11101100 => 11011001
5. The Decoding Stage converts the binary code to its
extended ASCII character, so that the numerical form
becomes text. In the example shown above, “11011001” is
decoded to (Ù).
6. The Insertion Stage places the text produced by the
decoding stage into the webpage code as a comment.
Comments do not appear in the webpage’s output view.
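The encryption stage of the Conceal operation can be checked with a short sketch that reproduces the numerical example for the input C and the key 10001100. Step 6 of the example wraps the leftmost bit around, so the sketch assumes that "shift left" means an 8-bit rotation. The function names are hypothetical, and `decrypt_byte` is simply the arithmetic inverse derived here, not a procedure spelled out in the text.

```python
def rotl8(x: int) -> int:
    """Rotate an 8-bit value left by one bit."""
    return ((x << 1) | (x >> 7)) & 0xFF


def rotr8(x: int) -> int:
    """Rotate an 8-bit value right by one bit."""
    return ((x >> 1) | (x << 7)) & 0xFF


def encrypt_byte(code: int, key: int) -> int:
    """Complement, then apply XOR-with-key followed by rotate-left, twice."""
    x = ~code & 0xFF
    for _ in range(2):
        x = rotl8(x ^ key)
    return x


def decrypt_byte(cipher: int, key: int) -> int:
    """Undo encrypt_byte: rotate right and XOR twice, then complement."""
    x = cipher
    for _ in range(2):
        x = rotr8(x) ^ key
    return ~x & 0xFF


cipher = encrypt_byte(0b01000011, 0b10001100)
print(format(cipher, "08b"), chr(cipher))
```

With the values of the example, the intermediate results match Steps 2 to 6, and the final byte 11011001 corresponds to the character Ù that the decoding stage produces.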
To reveal the secret message, the following stages are
applied:
1. The Statistical Stage relies on 26 characters, as in
the statistical stage of the Conceal operation.
2. The Reading Comments Stage reads the comments from
the webpage.
3. The Encoding Stage converts each comment into its
binary representation.
4. The Decryption Stage reverses the binary process used
in the encryption stage of the Conceal operation.
5. The Reveal Character Code Stage uses the frequency
array created in the statistical stage and compares it
with the binary output of the decryption stage. The
embedded information is obtained by removing the
character representation.
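The statistical and character representation stages, which both the Conceal and the Reveal operations rely on, can be sketched as follows. The text specifies one-bit codes for the two least frequent letters and two-bit codes for the next four; continuing with eight three-bit codes and then four-bit codes for the remaining letters is an assumption of this sketch, as is the function name.

```python
import string
from collections import Counter


def character_codes(page_text: str) -> dict[str, str]:
    """Assign short bit codes to the 26 letters, rarest letters first."""
    letters = string.ascii_uppercase
    counts = Counter(ch for ch in page_text.upper() if ch in letters)
    # Lower frequency first; equal frequencies fall back to letter order.
    ranked = sorted(letters, key=lambda ch: (counts[ch], ch))
    codes = {}
    width, start = 1, 0
    while start < len(ranked):
        group = ranked[start:start + 2 ** width]
        for i, ch in enumerate(group):
            codes[ch] = format(i, f"0{width}b")
        start += 2 ** width
        width += 1  # assumed continuation: 2, 4, 8, then 16 codes per width
    return codes
```

For a page in which ‘Z’ appears 6 times, ‘X’ appears 10 times, and every other letter appears more often, the sketch assigns ‘0’ to ‘Z’ and ‘1’ to ‘X’, in line with the example given for the character representation stage.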
8. Steganography in Text by Using MS Word Symbols (A8)
The MS Word Symbols algorithm hides data inside a
Word file without altering carrier file properties such as
file size, content, and format [14]. The algorithm employs
non-printing symbols to hide four bits between letters.
This improves the hidden capacity ratio compared to other
algorithms. Moreover, neither the Word file format nor the
letter shapes are modified. Thus, the suggested algorithm
avoids raising an attacker’s suspicion.
Table 5 lists some of the hidden codes. For example, if
we insert all four symbols of the table after a letter,
then the hidden code is 0000. Each insertion hides four
bits of secret data. The four symbols are Right remark
(U+200E), Left remark (U+200F), Zero width joiner
(U+200D), and Zero width non-joiner (U+200C). The presence
or absence of each symbol gives a total of 16 different
codes.
Three inputs are used in the hiding process: the Stego key,
the carrier file, and the hidden data. The main purpose of
the Stego key is to change the symbols’ bit representation.
For example, ‘1’ represents a symbol’s absence, and ‘0’
represents its presence. In the next step, a symbols’ table
is created depending on the outcome of the Stego key.
TABLE 5. SAMPLE OF HIDDEN BITS BY USING WORD SYMBOLS
(X = symbol inserted, - = symbol not inserted)
Right Remark  Left Remark  ZWJ  ZWNJ  Hidden code
X             X            X    X     0000
X             X            X    -     0001
X             -            X    -     0101
X             X            -    -     0011
X             -            -    -     0111
-             -            -    -     1111
-             -            X    -     1101
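The coding of Table 5 can be sketched as below, assuming the bit order follows the column order (Right remark, Left remark, ZWJ, ZWNJ) and that an inserted symbol encodes 0 while a missing symbol encodes 1. The Stego key, which may change this representation, is left out, and the function names are hypothetical.

```python
import re

# Column order of Table 5: Right remark, Left remark, ZWJ, ZWNJ.
SYMBOLS = ("\u200e", "\u200f", "\u200d", "\u200c")
_SYM = "".join(SYMBOLS)
_UNIT = re.compile(f"[^{_SYM}]([{_SYM}]*)")  # a visible character plus its symbols


def nibble_to_symbols(nibble: str) -> str:
    """A 0 bit inserts the symbol of that column; a 1 bit leaves it out."""
    return "".join(sym for sym, bit in zip(SYMBOLS, nibble) if bit == "0")


def embed_symbols(cover: str, bits: str) -> str:
    """Hide four bits after each cover character until the bits run out."""
    if len(bits) % 4 or set(bits) - {"0", "1"}:
        raise ValueError("bits must be a multiple of four 0s and 1s")
    if len(bits) // 4 > len(cover):
        raise ValueError("cover text is too short for the secret bits")
    out = []
    for i, ch in enumerate(cover):
        out.append(ch)
        out.append(nibble_to_symbols(bits[4 * i:4 * i + 4]))
    return "".join(out)


def extract_symbols(stego: str, n_bits: int) -> str:
    """Rebuild the four-bit code that follows each visible character."""
    nibbles = []
    for match in _UNIT.finditer(stego):
        present = match.group(1)
        nibbles.append("".join("0" if sym in present else "1" for sym in SYMBOLS))
    return "".join(nibbles)[:n_bits]
```

Under this convention a letter followed by no symbol decodes as 1111, so the receiver needs the length of the hidden message in order to ignore unused letters.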
III. PERFORMANCE EVALUATION
Table 7 shows the proposed algorithms and their
classifications. The Multipoint, Diacritics, ZKA, and KVA
algorithms (A1, A2, A3, and A4) are language dependent.
They can hide secret data inside text in Unicode languages
such as Arabic, Persian, and Urdu. The Multipoint and
Diacritics algorithms hide data by substituting a letter
with a modified shape of that letter. This approach works
well because there is no standard position or distance
between a letter and its points, and likewise no standard
position between a letter and its diacritics. The main
advantage of the A1 and A2 algorithms is that the file size
is unchanged. Moreover, A1 and A2 can be combined with A3,
A4, A5, A6, A7, and A8 in order to improve the hidden ratio
capacity. A3 and A4 hide the data inside a carrier object by
inserting non-printing symbols. The main drawback of A3
and A4 is that the carrier file size increases as the
secret message is embedded.
A5, A6, and A8 hide the data by embedding non-printing
symbols and provide a high hidden ratio capacity. Moreover,
these algorithms can be combined with A1 and A2 in order to
improve the hidden ratio capacity. One disadvantage of
A5, A6, and A8 is that the carrier file size increases through
the insertion of non-printing symbols. The HTML Code
algorithm (A7) hides the encrypted secret data as comments
inside the HTML webpage. Only a limited number of hidden
bits can be inserted, depending on the webpage’s original
language. The hidden capacity of A7 can be enhanced by
combining it with A1 and/or A2. Table 8 shows the results
of combining one algorithm with another and the effect on
the hidden ratio capacity and the file size. Notably,
combining one algorithm with another increases the hidden
ratio capacity at the risk of increasing the file size. The
only case in which the file size does not increase is when
A1 and A2 are combined, because neither A1 nor A2 increases
the carrier file size: both algorithms only change the
positions of the points or the diacritics.
FIGURE 2. HIDDEN RATIO [BIT/KBYTE]
Figure 2 compares the hidden ratio of the eight
proposed algorithms with that of the Line Shift and Word
Shift algorithms introduced in the literature. Equation 1,
stated below, defines the hidden ratio capacity (HR). Based
on the simulation results, A8 achieves the highest hidden
ratio and A6 the second highest. A7 shows a constant hidden
ratio. The average hidden ratios, in bits per Kbyte (b/KB),
are A1 = 104 b/KB; A2 = 144 b/KB; A3 = 153 b/KB; A4 = 355
b/KB; A5 = 153 b/KB; A6 = 626 b/KB; A7 = 127 b/KB; and
A8 = 799 b/KB.
$$\mathrm{HR} = \frac{\text{number of hidden bits}}{\text{carrier file size in KB}} \qquad (1)$$
FIGURE 3. FILE SIZE EFFECT AFTER INSERTING SECRET DATA
Figure 3 compares the file sizes of the Stego objects
produced by the eight proposed algorithms with those of the
Line Shift and Word Shift algorithms. Based on the
simulation results, there is no change in the carrier file
size when A1, A2, or A7 is applied. On the other hand, A8
shows the highest file size increase, and A6 the second
highest.
[Figure 2: hidden ratio in b/KB (axis 0 to 1000) for A1 to A8, Line Shift, and Word Shift; series: Aljazeera.net, BBC.COM, CNN, Alhayat]
[Figure 3: file size (axis 0 to 1.6) for A1 to A8, Line Shift, and Word Shift; series: Aljazeera.net, BBC.COM, CNN, Alhayat]
IV. MERGING ALGORITHMS
In this section, we discuss the possibility of merging more
than one algorithm. The proposed algorithms can be merged
to further improve the hidden ratio of the carrier file.
For example, A1 can be combined with any of A2, A3, A4,
A5, A6, A7, or A8. Figure 4 illustrates the hidden ratio
obtained when A1 is combined with the other algorithms.
Figure 5 shows the change in the carrier file size once
another algorithm has been merged with A1. The merging
process is performed in sequence. For example, the user
first selects the A1 algorithm, which updates the carrier
file based on the secret data. The user then takes the
output file of the A1 algorithm and applies the second
selected algorithm to it.
FIGURE 4. HIDDEN RATIO MERGED WITH MULTIPOINT
FIGURE 5. FILE SIZE CHANGE MERGED WITH MULTIPOINT
Based on the simulation results, the best choice depends
on the individual user requirements. If the file size is
not allowed to change, then the user should select A1, A2,
or A7. Otherwise, if the carrier file size is allowed to
increase and large amounts of secret data are being
embedded in the carrier file, then A8 is the best choice.
One distinctive merging scenario is (A1, A2, A4, A7):
Multipoint, Diacritics, KVA, and HTML Code. This scenario
provides a high hidden ratio while the size of the carrier
file increases only minimally, as shown in Table 6. The
results show that this merging scenario is consistent
across different websites in terms of hidden ratio capacity
and file size change.
TABLE 6. MERGED SIMULATION RESULTS OF A1, A2, A4, A7
Webpage    Hidden Ratio (b/KB)  File Size Change
Aljazeera  723                  2%
CNN        709                  1%
BBC        717                  2%
Alhayat    771                  2%
V. CONCLUSION
In this paper, we have discussed and compared the
performance of eight new text Steganography algorithms:
Multipoint, ZKA, Diacritics, KVA, ZWC, Remarks, HTML
code, and MS Word Symbols. These algorithms provide
additional and necessary protection of confidential
data. The KVA, ZKA, Multipoint, and Diacritics algorithms
apply to specific Unicode-based languages such as Arabic,
Persian, Pashto, and Urdu. Hence, these particular
algorithms could potentially benefit a population of
approximately two billion people. In contrast, the ZWC,
Remarks, HTML code, and MS Word Symbols algorithms are
language independent and can be used to pass sensitive
information by inserting specific symbols.
We have presented and discussed the possibility and
effect of merging several algorithms. The choice
between different merging scenarios depends on the user’s
constraints. The scenarios we provided work well both with
and without a constraint placed on the file. Furthermore,
we presented tradeoff scenarios between the hidden ratio
and the increase in file size.
The presented algorithms can be used to establish safe
communication, enhance privacy, share data securely,
and add protection for copyrighted products. For
example, our algorithms can be applied to multimedia and
publishing products. By using any of the eight presented
algorithms, alone or merged, users will have a highly
secure data system in place to deter attackers.
[Figure 4: hidden ratio (axis 0 to 1200) when A1 (Multipoint) is merged with each of A2 to A8; series: Aljazeera.net, BBC.COM, CNN, Alhayat]
[Figure 5: file size change (axis 0% to 200%) when A1 (Multipoint) is merged with each of A2 to A8; series: Aljazeera.net, BBC.COM, CNN, Alhayat]
REFERENCES
[1] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone,
Handbook of Applied Cryptography: CRC Press, 2010.
[2] N. F. Johnson and S. Jajodia, "Exploring steganography:
Seeing the unseen," Computer, vol. 31, pp. 26-34, 1998.
[3] V. Potdar and E. Chang, "Visibly Invisible: Ciphertext as a
Steganographic Carrier," in Proceedings of the 4th
International Network Conference (INC2004), 2004, pp. 385-
391.
[4] E. Martiri, A. Baxhaku, and E. Barolli, "Steganographic
Algorithm Injection in Image Information Systems Used in
Healthcare Organizations," in 2011 Third International
Conference on Intelligent Networking and Collaborative
Systems (INCoS), 2011, pp. 408-411.
[5] M. Zamani, A. Manaf, and R. Ahmad, "Knots of Substitution
Techniques of Audio Steganography," in The 2009
International Conference on Telecom Technology and
Applications, Singapore, 2009, pp. 415-419.
[6] I. Avcibas, N. Memon, and B. Sankur, "Steganalysis using
image quality metrics," IEEE Transactions on Image
Processing, vol. 12, pp. 221-229, 2003.
[7] A. Odeh, A. Alzubi, Q. B. Hani, and K. Elleithy,
"Steganography by multipoint Arabic letters," in Systems,
Applications and Technology Conference (LISAT), 2012 IEEE
Long Island, Farmingdale State College - State University of
New York, 2012, pp. 1-7.
[8] A. Odeh and K. Elleithy, "Steganography in Arabic Text
Using Full Diacritics Text," presented at the 25th International
Conference on Computers and Their Applications in Industry
and Engineering (CAINE-2012), New Orleans, Louisiana,
USA, 2012.
[9] A. Odeh and K. Elleithy, "Steganography in Arabic Text
Using Zero Width and Kashidha Letters," International
Journal of Computer Science & Information Technology
(IJCSIT), vol. 4, 2012.
[10] A. Odeh, K. Elleithy, and M. Faezipour, "Steganography in
Arabic text using Kashida variation algorithm (KVA)," in
Systems, Applications and Technology Conference (LISAT),
2013 IEEE Long Island, 2013, pp. 1-6.
[11] A. Odeh and K. Elleithy, "Steganography in Text by Merge
ZWC and Space Character," in 28th International Conference
on Computers and Their Applications, CATA-2013, Honolulu,
Hawaii, USA, 2013, pp. 1-7.
[12] A. Odeh, K. Elleithy, and M. Faezipour, "Text Steganography
Using Language Remarks," in Proceedings of the American
Society of Engineering Education, pp. 1-7, 2013.
[13] A. Odeh, K. Elleithy, M. Faezipour, and E. Abdelfattah,
"Novel Steganography over HTML Code," in New Trends in
Networking, Computing, Informatics, Systems Sciences, and
Engineering, T. Sobh and K. Elleithy, Eds.: Springer, 2014.
[14] A. Odeh, K. Elleithy, and M. Faezipour, "Steganography in
text by using MS word symbols," in 2014 Zone 1 Conference
of the American Society for Engineering Education (ASEE
Zone 1), CT, USA, 2014, pp. 1-5.
TABLE 7. COMPARISON BETWEEN EIGHT DIFFERENT ALGORITHMS
Algorithm       Applicable Language              General Category  Technique   File Size  Can Be Combined With
A1: Multipoint  Unicode (Arabic, Persian, Urdu)  Substitution      Linguistic  No effect  A2, A3, A4, A5, A6
A2: Diacritics  Unicode (Arabic, Persian, Urdu)  Substitution      Linguistic  No effect  A1, A3, A4, A5, A6
A3: ZKA         Unicode (Arabic, Persian, Urdu)  Injection         Linguistic  Increase   A1, A2
A4: KVA         Unicode (Arabic, Persian, Urdu)  Injection         Linguistic  Increase   A2
A5: ZWC         Language independent             Injection         Format      Increase   A1, A2
A6: Remarks     Language independent             Injection         Format      Increase   A1, A2
A7: HTML Code   Language independent             Injection         Format      No effect  A1, A2
A8: MS Word     Language independent             Injection         Format      Increase   A1, A2
TABLE 8. OUTPUT OF THE COMBINING PROCESS BETWEEN PROPOSED ALGORITHMS
Algorithm            Applicable Language              General Category        Technique          Hidden Capacity  File Size
A1 (A3, A4, A5, A6)  Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
A2 (A3, A4, A5, A6)  Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
(A1, A2)             Unicode (Arabic, Persian, Urdu)  Injection               Linguistic         No effect        No change
A3 (A1, A2)          Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
A4 (A2)              Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
A5 (A1, A2)          Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
A6 (A1, A2)          Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
A7 (A1, A2)          Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  No effect        Increase
A8 (A1, A2)          Unicode (Arabic, Persian, Urdu)  Substitution/Injection  Linguistic/Format  Increase         Increase
... During the last two decades, many text hiding algorithms have been introduced in terms of text steganography and text watermarking for covert communication [1,6,8,[9][10][11][12][13][14]20,31,36,39,51,91], copyright protection [3][4][5]7,18,[20][21][22][23][24][25][26][27][28][29]44,[49][50][51][52][53][54][55][56][57][58][59][60][61][62][63][64][65][66][67][68][72][73][74][75][87][88][89][90][91][92][98][99][100][101][102][103][104][105][106][107][108][109], copy control and authentication [31,57,60,74,78,[93][94][95][96][97][98]. ...
... During the last two decades, many text hiding algorithms have been introduced in terms of text steganography and text watermarking for covert communication [1,6,[8][9][10][11][12][13][14]20,31,36,39,51,91], copyright protection [3][4][5]7,18,[20][21][22][23][24][25][26][27][28][29]44,[49][50][51][52][53][54][55][56][57][58][59][60][61][62][63][64][65][66][67][68][72][73][74][75][87][88][89][90][91][92][98][99][100][101][102][103][104][105][106][107][108][109], copy control and authentication [31,57,60,74,78,[93][94][95][96][97][98]. ...
... Therefore, it affords a total of 1,114,112 possible symbols/characters in various formats such as numbers, letters, emoticons, and a vast number of current characters in different languages, i.e., the UTF-8 presents one byte for any ASCII character, which have the same code values in both ASCII and UTF-8, and up to four bytes for other symbols [1][2][3][4][5][6][7]. In the Unicode, there are special zero-width characters (ZWC) which are employed to provide specific entities such as Zero Width Joiner (ZWJ), e.g., ZWJ joins two supportable characters together in particular languages, POP directional, and Zero Width Non-Joiner (ZWNJ), etc. Practically, the ZWC characters do not have traces, widths or written symbol in digital texts [1][2][3][4][5][6][7][8]11,15,18,[25][26][27][28]33,34,[41][42][43][50][51][52][53][54][55][56][57][58][59][60][61][62][63][64][65][66][67][68][86][87][88][89][90][91][92][93][94][95][96][97][98][99][100]. Recently, many text hiding techniques that utilize social media, email, SMS, as communication channels have been introduced [1,6,8,11,20,36,37]. ...
Article
Full-text available
Abstract: Modern text hiding is an intelligent programming technique which embeds a secret message/watermark into a cover text message/file in a hidden way to protect confidential information. Recently, text hiding in the form of watermarking and steganography has found broad applications in, for instance, covert communication, copyright protection, content authentication, etc. In contrast to text hiding, text steganalysis is the process and science of identifying whether a given carrier text file/message has hidden information in it, and, if possible, extracting/detecting the embedded hidden information. This paper presents an overview of state of the art of the text hiding area, and provides a comparative analysis of recent techniques, especially those focused on marking structural characteristics of digital text message/file to hide secret bits. Also, we discuss different types of attacks and their effects to highlight the pros and cons of the recently introduced approaches. Finally, we recommend some directions and guidelines for future works.
... Steganography can be arranged in many ways. Such as, according to carrier file type can be classified as text, image, audio, movie, video, or protocol file used to embed secret data [4,5,6]. According to the kind of the key that used to hide information: pure Steganography, secret key steganography, and publickey steganography. ...
... According to the kind of key used to hide information, steganography is classified as pure steganography, secret-key steganography, or public-key steganography. Also, based on the embedding method, three techniques are used to protect information in a cover object: insertion-based, substitution-based, and generation-based [7,5]. ...
Article
As computer use expands in both social and commercial settings, secure communication over channels has become a critical issue. Information hiding is one strategy for obtaining a secure communication medium and protecting data during transmission. Text documents contain far less redundant information than images and audio; therefore, text steganography is the most challenging. This paper aims to improve "text steganography based on Unicode of characters in multilingual" by designing a new font with special properties for the purpose of hiding data. The method is based on assigning the same glyph to multiple codes; the Set of High-Frequency Letters (SHFL) in the English language was chosen for the embedding process. The hiding method replaces the code of an English symbol with another code that has exactly the same glyph. Two bits are hidden at once, using glyph1 to hide 00 and glyph2, glyph3, and glyph4 to hide 01, 10, and 11, respectively. The improvement increases the steganographic capacity and transparency and improves the security and robustness of the text stego file.
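The two-bits-per-letter substitution described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not list the SHFL letters or the alternative codes, so the letter set `"etaoin"` and the Private Use Area code points starting at U+E000 are hypothetical placeholders for codes that a custom font would render with identical glyphs.

```python
# Hypothetical high-frequency letter set; the abstract does not enumerate SHFL.
SHFL = "etaoin"

# For each letter, four codes: the original (hides 00) plus three
# hypothetical Private Use Area codes (hide 01, 10, 11). A custom font is
# assumed to draw all four with the same glyph.
VARIANTS = {
    letter: [letter] + [chr(0xE000 + 3 * i + k) for k in range(3)]
    for i, letter in enumerate(SHFL)
}
# Reverse lookup: code -> (original letter, 2-bit index).
DECODE = {
    code: (letter, idx)
    for letter, codes in VARIANTS.items()
    for idx, code in enumerate(codes)
}


def embed(cover: str, bits: str) -> str:
    """Hide an even-length bit string, two bits per SHFL letter."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    out, pos = [], 0
    for ch in cover:
        if ch in VARIANTS and pos < len(bits):
            out.append(VARIANTS[ch][int(bits[pos:pos + 2], 2)])
            pos += 2
        else:
            out.append(ch)
    if pos < len(bits):
        raise ValueError("cover text has too few SHFL letters")
    return "".join(out)


def extract(stego: str, n_bits: int) -> str:
    """Read back n_bits from the first SHFL positions of the stego text."""
    pairs = []
    for ch in stego:
        if len(pairs) * 2 >= n_bits:
            break
        if ch in DECODE:
            pairs.append(format(DECODE[ch][1], "02b"))
    return "".join(pairs)


def restore(stego: str) -> str:
    """Map every substituted code back to its original letter."""
    return "".join(DECODE[ch][0] if ch in DECODE else ch for ch in stego)
```

In this sketch the receiver must know the payload length in advance, because an unmodified SHFL letter is indistinguishable from one that hides 00.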
Article
This paper presents an approach to robust watermark extraction from images containing text. Data extraction is based on a previously developed approach to robust watermark embedding into text data, characterized by invariance to the conversion of text data into an image format. A comparative analysis of existing approaches to steganographic data embedding into text data is carried out, and their advantages and disadvantages are determined. The choice of the group of steganographic embedding methods based on text formatting is justified. An approach based on interline space shifting is selected as the embedding algorithm. The block diagram and the description of the developed algorithm for embedding data into text data are given. An experimental estimation of the embedding capacity and perceptual invisibility of the developed embedding approach was carried out. An approach to extracting embedded information from images containing a robust watermark, based on the existing limitations, has been developed. The Radon transform is chosen as the basic extraction procedure, since it allows the values of the interline spacing to be extracted. An approach based on Gaussian mixture model separation was chosen to isolate the bit values. The limits of the retrieval of embedded data have been experimentally established, and the robustness of the developed embedding approach to various transformations has been estimated. The following robustness parameters of the developed approach are defined: rotation of an image containing embedded data by any angle; scaling of an image with a scaling factor not exceeding 1.5; conversion to any bitmap format; and application of a median filter with a convolution kernel limit of not more than 9, a Gaussian blur filter with a blurring limit not exceeding 8, and an average filter with a convolution kernel limit of not more than 5. © 2018 St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences. All rights reserved.
Chapter
Document fraud has evolved to become a significant threat to individuals and organizations. Data leakage from hard copy documents is a common type of fraud. This chapter proposes a methodology for analyzing printed and photocopied versions of confidential documents to identify the source of a leak. The methodology incorporates a novel font pixel manipulation algorithm that embeds data in the pixels of certain characters of confidential documents in a manner that is imperceptible to the human eye. The embedded data is extracted from a leaked printed or photocopied document to identify the specific document that served as the source. The embedded data is robust in that it can withstand errors introduced by printing, scanning and photocopying documents. Experimental results demonstrate the efficiency, robustness and security of the methodology.
Chapter
In today’s world, steganography is achieved through various techniques, such as LSB replacement and DCT, in various forms. With these techniques, embedding the data succeeded, but at the cost of degrading the quality of the file in which the secret message was embedded. Earlier techniques applied only to a particular subdomain, such as only file, only audio, or only video embedding. We propose a new technique in which files of any format, such as pdf, txt, and doc, are embedded in a video without degrading the quality of the video. The video is divided into a number of frames, the secret file is likewise divided into a number of frames, and each frame is then embedded behind another. Previously reported steganography achieved a success rate near 90–95%, but we have achieved a success rate near 99%, as the pixels and properties of the video are unchanged, whereas earlier they were disrupted or degraded.
Conference Paper
The massive amount of data transferred over the internet raises different challenges, such as channel types, transmission time, and data security. In this paper, we present a novel secure algorithm to hide data inside document files, where four symbols are used to embed the data inside the carrier file. The main process depends on a key to produce a symbol table and match the data to be hidden with the representative symbols. This method can be extended to any language and does not change the file format. In addition, the capacity ratio of the presented algorithm is high compared to other algorithms.
Conference Paper
Different security strategies have been developed to protect the transfer of information between users. This has become especially important after the tremendous growth of internet use. Encryption techniques convert readable data into a ciphered form. Other techniques hide the message in another file, and some powerful techniques combine hiding and encryption concepts. In this paper, a new security algorithm is presented that uses Steganography over HTML pages. Hiding the information inside HTML page code comments and employing encryption can reduce the possibility of discovering the hidden data. The proposed algorithm applies some statistical concepts to create a frequency array that records the occurrence frequency of each character. The encryption step depends on two simple logical operations that change the data form to increase the complexity of the hiding process. The last step is to embed the encrypted data as comments inside the HTML page. This new algorithm comes with many advantages, such as generality, applicability to different spoken languages, and extensibility to other Web programming pages such as XML and ASP.
Conference Paper
Security methodologies are taken into consideration for many applications in which sensitive data transferred over a network must be protected from any intermediate attacker. Privacy of data can be granted using encryption, by changing transmitted data into cipher form. Apart from encryption, hiding data represents another technique for transferring data without being noticed by an attacker. This technique is called Steganography. In this paper, we discuss the main concepts of Steganography and the carrier media used for this goal. Employing text as a mask for other text is the most difficult method that can be used to hide data. We discuss some algorithms that use Arabic text. We then describe our dotted space methodology to enhance data hiding.
Conference Paper
Secure communication is essential for data confidentiality and integrity, especially with the massive growth of the internet and mobile communication. Steganography is the art of data hiding by embedding the data in different objects such as text, images, audio, and video. In this paper, we propose a new algorithm for data hiding using Text Steganography in the Arabic language. Our algorithm uses the Zero Width Character from Unicode (U+200B) and the space character to pass bits before and after a space. The main advantage of our algorithm is that the file format is not changed, which decreases the ability of steganalysis to observe hidden data. Moreover, the ZWC algorithm can be applied to any language (ASCII, Unicode).
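One plausible reading of "pass bits before and after space" is that each inter-word space carries two bits: a U+200B placed before the space encodes the first bit and a U+200B placed after it encodes the second. The sketch below illustrates that reading only; the function names `embed` and `extract` are hypothetical, and the cover text is assumed to use single spaces between words and to contain no U+200B of its own.

```python
ZWC = "\u200b"  # ZERO WIDTH SPACE, the character named in the abstract


def embed(cover: str, bits: str) -> str:
    """Hide up to two bits per space: ZWC before = bit 1, ZWC after = bit 2."""
    out, pos = [], 0
    for ch in cover:
        if ch == " " and pos < len(bits):
            # Pad an odd-length tail with a zero bit.
            pair = bits[pos:pos + 2].ljust(2, "0")
            before = ZWC if pair[0] == "1" else ""
            after = ZWC if pair[1] == "1" else ""
            out.append(before + " " + after)
            pos += 2
        else:
            out.append(ch)
    if pos < len(bits):
        raise ValueError("cover text has too few spaces for the payload")
    return "".join(out)


def extract(stego: str, n_bits: int) -> str:
    """Recover n_bits by checking for a ZWC on each side of every space."""
    bits = []
    for i, ch in enumerate(stego):
        if ch != " ":
            continue
        bits.append("1" if i > 0 and stego[i - 1] == ZWC else "0")
        bits.append("1" if i + 1 < len(stego) and stego[i + 1] == ZWC else "0")
    return "".join(bits)[:n_bits]
```

Under these assumptions the capacity is two bits per space, and removing every U+200B from the stego text returns the original cover text.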
Conference Paper
The need for secure communications has significantly increased with the explosive growth of the internet and mobile communications. The usage of text documents has doubled several times over the past years, especially with mobile devices. In this paper, we propose a new steganography algorithm for a Unicode language (Arabic). The algorithm employs Arabic language characteristics that represent extension letters. The Kashida letter is an optional property of any Arabic text and is not commonly used. Many algorithms have tried to employ this property to hide data in Arabic text. In our method, we use this property to hide data and reduce the probability of suspicion. The proposed algorithm first introduces four scenarios for adding Kashida letters. Then, random concepts are employed to select one of the four scenarios for each round. Message segmentation principles are also applied, enabling the sender to select more than one strategy for each block of the message. At the other end, the recipient can recognize which algorithm was applied and can then decrypt the message content and aggregate it. The Kashida variation algorithm can be extended to other similar Unicode languages to improve robustness and capacity.
Article
The need for secure communication methods has significantly increased with the explosive growth of the internet and mobile communications. The usage of text documents has doubled several times over the past years, especially with mobile devices. In this paper, we propose a new steganography algorithm for Arabic text. The algorithm employs letters that can be joined with other letters: the extension letter, Kashida, and the Zero width character. The extension letter, Kashida, does not change the word meaning when joined to other letters. Likewise, the Zero width character (Ctrl+Shift+1) does not change the meaning. The proposed algorithm, Zero Width and Kashidha Letters (ZKS), mitigates the possibility of being discovered by steganalysis by using a parallel connection and a permutation function.
Conference Paper
With the rapid growth of networking mechanisms, where large amounts of data can be transferred between users over different media, the necessity of secure systems to maintain data privacy increases significantly. Different techniques have been introduced to encrypt data during the transfer process to avoid any kind of attack. One of these techniques is to hide the data inside another file, which is called Steganography. In steganography, data is hidden inside a carrier file that anyone can see, but the hidden data inside it cannot be discovered. To this end, good algorithms can avoid arousing the suspicion of any attacker by applying some criteria before sending the data. In this paper, we present an algorithm to hide data using a text file as a carrier. Left-Right Remarks, which are Unicode symbols, are used to hide the data inside the text file. Moreover, our algorithm can be applied to textual data of different sizes.
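The abstract does not say how the left and right marks are mapped to bits or where they are placed. As a minimal sketch, assume that the marks are U+200E (LEFT-TO-RIGHT MARK, encoding 0) and U+200F (RIGHT-TO-LEFT MARK, encoding 1) and that one mark is inserted after each cover character until the payload is exhausted. The mapping, the placement, and the function names are all assumptions made for illustration.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, assumed here to encode bit 0
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, assumed here to encode bit 1


def embed(cover: str, bits: str) -> str:
    """Insert one invisible mark per payload bit after successive characters."""
    if len(bits) > len(cover):
        raise ValueError("cover text is too short for the payload")
    out = []
    for i, ch in enumerate(cover):
        out.append(ch)
        if i < len(bits):
            out.append(RLM if bits[i] == "1" else LRM)
    return "".join(out)


def extract(stego: str) -> str:
    """Read the marks in order and translate them back to bits."""
    return "".join("1" if ch == RLM else "0" for ch in stego if ch in (LRM, RLM))


def strip_marks(stego: str) -> str:
    """Remove all marks to recover the visible cover text."""
    return "".join(ch for ch in stego if ch not in (LRM, RLM))
```

This placement gives one bit per cover character, so the stego text grows by one code point per hidden bit while the visible characters stay the same.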
Conference Paper
The need for secure communications has significantly increased with the explosive growth of the internet and mobile communications. The usage of text documents has doubled several times over the past years, especially with mobile devices. In this paper, we propose a new Steganography algorithm for Arabic text. The algorithm employs Arabic language characteristics that are represented as small vowel letters. Arabic Diacritics are an optional property of any text and are not commonly used. Many algorithms have tried to employ this property to hide data in Arabic text. In our method, we use this property to hide data and reduce the probability of suspicion. Our approach uses a performance metric that involves the file size before and after adding Diacritics and the ability to hide data without being suspicious.