An Email based high capacity text steganography scheme using combinatorial compression

Rajeev Kumar1, Satish Chand2, and Samayveer Singh3
Division of Computer Engineering,
Netaji Subhas Institute of Technology, New Delhi, India
rajeevgargnsit@gmail.com1, schand20@gmail.com2, and samayveersingh@gmail.com3
Abstract—In this paper, we propose an email-based, high-capacity
text steganography method using combinatorial compression. The
method uses the forward-mail platform to hide the secret data in
email addresses. We use the combination of the BWT, MTF, and LZW
coding algorithms to increase the hiding capacity, since this
combination has been shown to increase the compression ratio. To
further increase the capacity, the number of characters in each
email ID is also used to represent secret data bits. Furthermore,
the method adds random characters just before the ‘@’ symbol of
the email IDs to increase randomness. Experimental results show
that our method outperforms several important existing methods
in terms of hiding capacity.
Keywords— text steganography; BWT; MTF; LZW
I. INTRODUCTION
Steganography is the discipline of concealing secret
messages reliably, in such a way that no one other than the
communicating parties is aware that the hidden messages exist.
Another way to send secret data securely is cryptography, which
encrypts or encodes the secret data rather than making it
invisible. Thus, the advantage of steganography over
cryptography alone is that the hidden messages do not attract
attention. Therefore, steganography can be said to protect the
content of messages as well as the communicating parties. The
information to be concealed and communicated covertly is called
the payload or secret message. The carrier with the concealed
secret data is known as the stego media or covert message. The
cover media can be image, text, video, or audio. A steganography
method which uses text as cover media is known as a text
steganography method. Text steganography is one of the toughest
areas of data hiding, since the difference between the original
and the modified texts is easily detectable. A good
steganographic method must possess three properties: hiding
capacity, security, and robustness. Hiding capacity is the
amount of secret data that can be concealed in the media.
Security is the difficulty an eavesdropper faces in discovering
the hidden information. Robustness is the resistance of the
hidden data to possible alteration [1].
In this paper, we propose an email-based data hiding
method which uses combinatorial compression to increase the
hiding capacity. Data compression algorithms are classified
into two categories: lossless and lossy. Lossless compression
allows the original data to be recovered exactly after
decompression, whereas lossy compression loses some information
during compression, so exact recovery is not possible. In our
proposed method, we use lossless compression techniques because
we are dealing with textual information: if the text is
disturbed even slightly, the meaning of the entire sentence can
change. Here, we use the Burrows-Wheeler Transform (BWT),
Move-to-Front (MTF) encoding, and the LZW algorithm in
combination to achieve a better compression ratio. The secret
data is embedded into the email IDs of a forward-mail platform.
The cover text is chosen from the text base after some
processing. While arranging the stego cover as a forward-mail
platform, a previously arranged email address list is used for
selecting the email addresses. This email address list serves as
a global stego key that is shared between the sender and the
receiver beforehand. To check the quality of our proposed
method, we evaluate it on the capacity metric. The rest of the
paper is organized as follows. Section 2 summarizes the related
works. Section 3 introduces the proposed work, and Section 4
analyzes the experimental results. Finally, Section 5 provides
the conclusion.
II. RELATED WORKS
In this section, some of the important text steganography
schemes for a variety of languages, such as English, Persian,
Chinese, and Arabic, are discussed.
Early in text steganography, Wayner [3,4] discussed an
important method using mimic functions. It applies the inverse
of Huffman coding to a randomly distributed input bit stream.
It is directed toward security, i.e., resilience against
statistical attacks. Another text steganography method, given by
Maher [5] and popularly known as TEXTO, transforms uuencoded or
PGP ASCII-armored data into English sentences by converting the
secret data into English words. Extending the work of [6],
another important method is the synonym-based approach [7-9].
Unlike [6], this method uses legitimate words and sentences
with appropriate precision. Thus, a visual attack has little
significance against these types of methods.
Sun et al. [10] introduced a method using the left and right
components of Chinese characters, hence known as the L-R
scheme. It selects characters with left and right components as
candidates to conceal the secret data. If the secret data bit is
“1”, the scheme modifies the candidate by adjusting the space
between the left and right components; otherwise, it leaves the
candidate unchanged. To improve the hiding capacity of the L-R
scheme, Wang et al. [11] extended this method by
978-1-4799-4236-7/14/$31.00 © 2014 IEEE
incorporating the up-down structure of Chinese characters
as an extra candidate set. Further, a reversible function is
incorporated to recover the original cover text after the
hidden secret data has been extracted. Wang and Chang [12]
discussed a text steganography method which hides the secret
information in emotional icons (also known as emoticons) in
chat rooms over the Internet. The sender and the receiver
collaboratively build a table that is used at the time of
communication. The emoticons are categorized into multiple
classes according to their meaning (such as smile, laugh, and
cry), so each emoticon falls into at least one class. The
secret bits are represented by the order number of an
emoticon. Grothoff et al. [13] discussed a text steganography
method which uses errors to hide the secret data, namely the
errors that commonly occur in machine translation (MT). The
secret data is hidden by applying a substitution procedure to
the translated text, using the translation variations of
multiple MT systems. In 2008, Por et al. [14] discussed a text
steganography method called WhiteSteg, which hides secret data
in inter-word and inter-paragraph spacing. Normal space and tab
characters are mainly used to hide the secret data bits. The
method hides a significant amount of data in the spacing but is
weak from the security point of view, because the unusual
ordering of normal and tab spaces is easily seen with the
show/hide tool, which indicates a normal space by a dot sign
and a tab by an arrow sign. Por et al. [15] introduced a data
hiding method based on space character manipulation called
UniSpaCh. UniSpaCh conceals secret data in Microsoft Word
documents using Unicode space characters, which protects
against the show/hide attack. The method uses eight Unicode
characters which are not made visible by the show/hide option
of Word, and permutations of these characters are used to hide
the secret data. Therefore, without making any significant
alteration to the cover text spaces, UniSpaCh provides
sufficient hiding capacity. To improve security further,
existing cryptography schemes are applied to the secret data
before embedding.
Another important and popular method was introduced by
Satir and Isik [16], using the forward-mail platform to hide
the secret data. The method mainly addresses the hiding
capacity and security issues of data hiding techniques. The LZW
compression scheme is used to increase the hiding capacity, and
the method tries to enhance the dual pattern repetition before
applying LZW to the secret data. It hides the secret data in
the email addresses listed in the carbon copy (Cc) field. In
our proposed method, we extend the work of [16] to further
increase the hiding capacity and security. Our method uses the
combination of the Burrows-Wheeler Transform (BWT),
Move-to-Front (MTF) encoding, and the LZW algorithm to increase
the compression ratio; as discussed in [17], using these three
in combination, as in this paper, increases the compression
ratio. In addition, we use the number of characters in the
email IDs to represent secret data bits, so that the hiding
capacity is further increased. The next section discusses our
proposed method.
III. THE PROPOSED METHOD
The goal of the proposed scheme is to hide more secret
data and improve security so that the communication cost is
reduced. The proposed scheme hides the secret data in text
media. The method is divided into two phases, embedding and
extraction, which are explained as follows.
3.1. Embedding phase
Let,
S: secret message
T: A text base of cover texts in which the secret message
will be embedded.
K1: set of email addresses having four characters before
‘@’ symbol. It is shared between the sender and the receiver.
A: set of the second parts of email addresses such as
outlook.com, gmail.com, etc.
Step1. Construct the difference matrix D in order to select
the most suitable text from T. D is calculated iteratively as
the difference between the index of the currently matched
symbol in T and the index of the last matched symbol of S; for
the first symbol of S, the last matched index is taken as 0.
Step2. Calculate the vectors R and E using the following
equations:
$R = D \bmod 26$ (1)
$E = \lfloor D / 26 \rfloor$ (2)
Step3. Estimate the maximum dual pattern repetition in R
and store it in a column matrix P. Now, select the largest
entry of P and denote it as Pmax. The corresponding rows of the
R and E vectors are also selected and stored in R* and E*. The
text from T corresponding to Pmax is chosen as the most
suitable text and stored in T*.
Step4. Apply the BWT to rearrange the elements of R* into
runs of similar elements.
Step5. Apply MTF encoding to the output of Step 4 to further
increase the correlation among the elements of R*.
Step6. Apply the LZW compression technique to compress
R* as follows:
Construct the initial LZW dictionary using the integers
between 1 and 26.
Update the LZW dictionary for every symbol or symbol string
encountered. Each such symbol or symbol string is encoded
using its corresponding index in the dictionary.
Step7. Represent each element of the compressed R* in binary
form and concatenate them in order to obtain the bit stream.
Step8. Partition the bit stream into groups of 14 bits. In
each group, the first 9 bits are called G1, the next 3 bits are
called G2, and the remaining 2 bits are called G3. The quotient
and remainder of the decimal value of G1 with respect to 26 are
denoted x and y, respectively, and the decimal value of G2 is
denoted z.
2014 5th International Conference- Confluence The Next Generation Information Technology Summit (Confluence) 337
Step9. Choose email addresses from K1 by applying the
Latin square (shown in Fig. 1) to x and y. Select the
extensions of the email addresses from A using z.
Step10. Alter the chosen email addresses to incorporate
random elements according to the G3 bits as follows:
If the G3 bits are 00, leave the address unchanged.
If the G3 bits are 01, append one random symbol before the
‘@’ symbol.
Else if the G3 bits are 10, append two random symbols before
the ‘@’ symbol.
Else if the G3 bits are 11, append three random symbols
before the ‘@’ symbol.
Step11. Modify the resultant email addresses (after Step 10)
using E* in order to complete the construction of the K2 set.
Step12. Construct the stego cover using T* as the cover text
and the K2 set as the email addresses.
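The arithmetic in Steps 1, 2, and 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, indices into the cover text are taken as 1-based, and each secret symbol is searched for strictly after the previous match with no wrap-around.

```python
from typing import List, Tuple


def difference_vector(secret: str, cover: str) -> List[int]:
    """Step 1 (assumed reading): gaps between successive matches in the cover."""
    diffs = []
    last = 0  # 1-based index of the previous match; 0 before the first symbol
    for symbol in secret:
        # A 0-based search from `last` looks strictly after the previous match.
        pos = cover.find(symbol, last)
        if pos == -1:
            raise ValueError(f"symbol {symbol!r} not found after index {last}")
        current = pos + 1  # convert to a 1-based index
        diffs.append(current - last)
        last = current
    return diffs


def split_mod26(diffs: List[int]) -> Tuple[List[int], List[int]]:
    """Step 2: R = D mod 26 and E = floor(D / 26), element by element."""
    return [d % 26 for d in diffs], [d // 26 for d in diffs]


def split_group(group: str) -> Tuple[int, int, int, str]:
    """Step 8: split a 14-bit group into x, y (from G1), z (from G2) and G3."""
    if len(group) != 14 or set(group) - {"0", "1"}:
        raise ValueError("expected a string of exactly 14 bits")
    g1, g2, g3 = group[:9], group[9:12], group[12:]
    x, y = divmod(int(g1, 2), 26)  # quotient and remainder with respect to 26
    z = int(g2, 2)
    return x, y, z, g3


# Hypothetical toy inputs, chosen only to exercise the functions.
d = difference_vector("cab", "abcabc")       # [3, 1, 1]
r, e = split_mod26(d)                        # ([3, 1, 1], [0, 0, 0])
x, y, z, g3 = split_group("10000000110110")  # (9, 23, 5, "10")
print(d, r, e, x, y, z, g3)
```

Because G1 has 9 bits, its value is at most 511, so x ranges over 0 to 19 and y over 0 to 25; G3 is kept as a bit string because Step 10 branches on its two bits directly.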
3.2. Extraction phase
Step1. Get the stego cover. Extract the numeric elements
before the ‘@’ symbol of each address in K2 to construct the
vector E. If an address has no numeric element, the
corresponding element of E is 0.
Step2. Count the number of characters before the numeric
values: 4 characters give the secret data bits 00, 5 give 01,
6 give 10, and otherwise the secret data bits are 11. Store
these bits in G3.
Step3. Extract the first two elements of each address in K2
to obtain x and y by applying the Latin square of Fig. 1. Also
extract the email address extension to obtain z. Now, calculate
G1 and G2, which together form 12 bits of each group, using the
following equations:
$G_1 = (26x + y)_2$ (3)
$G_2 = (z)_2$ (4)
where $(\cdot)_2$ denotes the binary representation.
Step4. Concatenate G1, G2, and G3, in that order, to obtain
the compressed bit stream.
Step5. Decompress the bit stream using LZW decoding:
Construct the initial LZW dictionary using the integers
between 1 and 26.
Update the LZW dictionary for every symbol or symbol string
encountered. Each received index is decoded into its
corresponding symbol or symbol string in the dictionary.
Step6. Apply MTF decoding to obtain R* (as produced by Step 4
of the embedding phase).
Step7. Apply the inverse BWT to obtain the original R* (as
produced by Step 3 of the embedding phase).
Step8. Recover the original difference D using R* and E as
follows:
$D = R + 26E$ (5)
Step9. Using the elements of D, extract the elements of S
from T* in the stego cover.
Thus, the secret data is obtained at the receiver side. Note,
however, that the receiver must have A and K1 beforehand in
order to extract the secret data.
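The lossless round trip that the extraction phase relies on (LZW decoding, then MTF decoding, then the inverse BWT) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: all function names are hypothetical, symbols are numbered 0 to 25 instead of 1 to 26, and the BWT row index is assumed to be available to the receiver, since the text does not say how it is conveyed.

```python
from typing import List, Tuple


def bwt_encode(seq: List[int]) -> Tuple[List[int], int]:
    """Last column of the sorted rotations, plus the row of the original."""
    n = len(seq)
    if n == 0:
        return [], 0
    starts = sorted(range(n), key=lambda i: seq[i:] + seq[:i])
    return [seq[(i - 1) % n] for i in starts], starts.index(0)


def bwt_decode(last: List[int], index: int) -> List[int]:
    """Invert the BWT; a stable sort of the last column links each row to the next."""
    order = sorted(range(len(last)), key=lambda i: last[i])
    out, row = [], index
    for _ in range(len(last)):
        row = order[row]
        out.append(last[row])
    return out


def mtf_encode(seq: List[int], size: int = 26) -> List[int]:
    table, out = list(range(size)), []
    for s in seq:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))  # move the symbol to the front
    return out


def mtf_decode(codes: List[int], size: int = 26) -> List[int]:
    table, out = list(range(size)), []
    for i in codes:
        s = table.pop(i)
        out.append(s)
        table.insert(0, s)
    return out


def lzw_encode(seq: List[int], size: int = 26) -> List[int]:
    table = {(s,): s for s in range(size)}  # initial single-symbol dictionary
    out, current = [], ()
    for s in seq:
        if current + (s,) in table:
            current += (s,)
        else:
            out.append(table[current])
            table[current + (s,)] = len(table)
            current = (s,)
    if current:
        out.append(table[current])
    return out


def lzw_decode(codes: List[int], size: int = 26) -> List[int]:
    if not codes:
        return []
    table = {s: (s,) for s in range(size)}
    previous = table[codes[0]]
    out = list(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):  # code refers to the entry being built
            entry = previous + (previous[0],)
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.extend(entry)
        table[len(table)] = previous + (entry[0],)
        previous = entry
    return out


r_star = [2, 1, 14, 1, 14, 1]  # hypothetical R* values in 0..25
last, idx = bwt_encode(r_star)
codes = lzw_encode(mtf_encode(last))
restored = bwt_decode(mtf_decode(lzw_decode(codes)), idx)
print(restored == r_star)
```

The decoders are applied in the reverse order of the encoders, which mirrors Steps 5 to 7 of the extraction phase undoing Steps 4 to 6 of the embedding phase.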
IV. EXPERIMENTAL RESULTS
To analyze the performance of text steganography methods,
hiding capacity is a prime parameter. In this section, we
calculate this parameter for our proposed scheme and compare it
with that of recently developed data hiding schemes. Bit rate,
or hiding capacity, is defined as the size of the hidden
message relative to the size of the cover [16]. In this case,
we can formulate the bit rate as follows:
$\text{Bit rate} = \dfrac{\text{bits of secret message}}{\text{bits of stego cover}}$ (6)
The novelty of our scheme lies in the combination of
compression methods, namely BWT + MTF + LZW coding, and in the
way the number of elements of the email addresses is used to
represent secret data bits. We also add randomness to the
selected email addresses while choosing the number of elements
in the email addresses for hiding the secret data bits, which
further improves the security of the algorithm. Our proposed
scheme is implemented in MATLAB running on an Intel® Core 2 Duo
2.20 GHz CPU with 3 GB of RAM. The secret message used in our
experiment is “behind using a cover text
……………………….. the intended recipient.” and the cover text is
“in the research area of text
steganography,……………………………………at least 16 bits.” (both
abbreviated for illustrative purposes). Here, ‘…’ indicates the
continuation of the text. The complete cover text and secret
data can be found in [16]. The secret message has 200
characters, counting spaces and excluding quotation marks. The
cover text T* has 847 characters, excluding spaces and
quotation marks. The stego cover consists of the chosen cover
text and the chosen and modified email addresses (K2). The
Latin square employed is shown in Fig. 1. According to Eq. (6),
the capacity has been computed as 7.03% for this example. The
performance of our method improves because the combination of
BWT and MTF increases the correlation among the data [17]. As a
result, the LZW compression algorithm gives a better
compression ratio, and hence the hiding capacity is increased.
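As a small worked illustration of Eq. (6), the ratio can be evaluated in Python as follows. The function names, the 8-bits-per-character accounting, and the toy strings are assumptions made only for this sketch; they are not the experiment's data and do not reproduce the 7.03% figure.

```python
def text_bits(text: str, bits_per_char: int = 8) -> int:
    """Size of a text in bits, assuming a fixed number of bits per character."""
    return len(text) * bits_per_char


def hiding_capacity_percent(secret_bits: int, stego_cover_bits: int) -> float:
    """Eq. (6) expressed as a percentage."""
    if secret_bits < 0 or stego_cover_bits <= 0:
        raise ValueError("bit counts must be non-negative and the cover non-empty")
    return 100.0 * secret_bits / stego_cover_bits


# Hypothetical stand-ins: a 12-character secret and a 240-character stego cover.
secret = "meet at noon"
stego_cover = "x" * 240
capacity = hiding_capacity_percent(text_bits(secret), text_bits(stego_cover))
print(f"{capacity:.2f}%")  # 5.00%
```

In the scheme described here, the stego cover counted in the denominator includes both the cover text and the email addresses, so shortening either one raises the computed capacity.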
V. CONCLUSIONS
In this paper, we have proposed an email-based text
steganography method. The method uses the combination of
BWT + MTF + LZW coding to achieve higher capacity. It also uses
the number of elements of the email addresses to represent the
bits of the secret data. To increase the security of the
proposed method, random elements are added to the email
addresses to enhance randomness. In this method, the
forward-mail platform is used to hide the secret data. Our
scheme performs better than existing state-of-the-art schemes
such as [16] in terms of hiding capacity.
Table I: Comparison of hiding capacity
Method                         Capacity (%)   Explanation
Mimic functions [4]            1.27           Computed using given secret message at http://www.spamimc.com
Winstein [6]                   0.5            Based on the referred paper
Sun et al.’s L-R scheme [10]   2.17           Computed using the given sample in Wang et al. (2009a)
Wang et al. [11]               3.53           Computed using the given sample in Wang et al. (2009a)
Listega [9]                    3.87           Based on the referred papers
Satir and Isik [16]            6.92           Based on the given example of the same article
Proposed Method                7.03           Calculated by employing the same example of [16]
Rows 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
1 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
2 B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
3 C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
4 D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
5 E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
6 F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
7 G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
8 H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
9 I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
10 J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
11 K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
12 L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
13 M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
14 N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
15 O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
16 P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
17 Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
18 R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
19 S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
20 T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
21 U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
22 V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
23 W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
24 X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
25 Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
26 Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
Fig. 1: Latin Square
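The square in Fig. 1 is cyclic: each row is the previous row shifted left by one letter. It can therefore be generated rather than stored, as in the Python sketch below. The helper names are hypothetical, and exactly how x and y index into the square is not spelled out in the text, so the two lookup helpers only illustrate reading the table by its printed row labels (1 to 26) and column labels (0 to 25).

```python
import string
from typing import List


def latin_square(n: int = 26) -> List[List[str]]:
    """Cyclic Latin square: entry (row, col) is letter (row + col) mod n."""
    letters = string.ascii_uppercase[:n]
    return [[letters[(row + col) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square: List[List[str]]) -> bool:
    """Every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(r) == n and set(r) == symbols for r in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


def lookup(square: List[List[str]], row_label: int, col_label: int) -> str:
    """Read a cell using the labels of Fig. 1 (rows 1..26, columns 0..25)."""
    return square[row_label - 1][col_label]


def find_column(square: List[List[str]], row_label: int, letter: str) -> int:
    """Inverse lookup: the column label at which a letter sits in a given row."""
    return square[row_label - 1].index(letter)


square = latin_square()
print(is_latin_square(square))       # True
print(lookup(square, 2, 25))         # A
print(find_column(square, 2, "A"))   # 25
```

Because every letter occurs exactly once per row, a lookup in a known row is invertible, which is the property the receiver needs in order to recover indices from letters.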
References
[1] A. Gutub and M. Fattani, “A Novel Arabic Text Steganography Method
Using Letter Points and Extensions,” WASET International Conference
on Computer, Information and Systems Science and Engineering
(ICCISSE), Vienna, Austria, pp: 28-31, May 25-27, 2007.
[2] J. Y. Liang, C. S. Chen, C. H. Huang, and L. Liu, “Lossless
compression of medical images using Hilbert space-filling curves,”
Computerized Medical Imaging and Graphics, vol. 32(3), pp. 174-182,
2008.
[3] P. Wayner, “Mimic Functions,” Cryptologia, vol. 16(3), pp. 193-214,
1992.
[4] P. Wayner, “Disappearing Cryptography,” AP Professional, Chestnut
Hill, MA, 1996.
[5] K. Maher, TEXTO. 1995.
ftp://ftp.funet.fi/pub/crypt/steganography/texto.tar.gz.
[6] K. Winstein, “Lexical steganography through adaptive modulation of the
word choice hash”, Secondary education at the Illinois Mathematics and
Science Academy, January 1999.
[7] H. Nakagawa, K. Sanpei, T. Matsumoto, T. Kashiwagi, S. Kawaguchi,
K. Makino and I. Murase, “Meaning Preserving Information Hiding
- Japanese Text Case,” IPSJ Journal, vol. 42, no. 9, pp. 2339-2350,
2001. (In Japanese)
[8] B. Murphy and C. Vogel, “The syntax of concealment: reliable methods
for plain text information hiding,” In Proceedings of the SPIE
Conference on Security, Steganography, and Watermarking of
Multimedia Contents, San Jose, CA, vol. 65(05), 2007.
[9] A. Desoky, “Listega: List-Based Steganography Methodology,”
International Journal of Information Security, Springer-Verlag, vol. 8,
pp. 247-261, April 2009.
[10] X. Sun, G. Luo, and H. Huang, “Component-based digital
watermarking of Chinese texts,” Proceedings of the 3rd International
Conference on Information Security, Shanghai, China, 2004.
[11] Z. H. Wang, C. C. Chang, C. C. Lin and M. C. Li, “A reversible
information hiding scheme using left-right and up-down Chinese
character representation,” Journal of Systems and Software, vol.82, no.8,
pp.1362-1369, 2009.
[12] Z.H. Wang, C.C. Chang, T.D. Kieu, and M.C. Li, “Emoticon-based text
steganography in chat,” In: Proceedings of 2009 Asia-Pacific
Conference on Computational Intelligence and Industrial Applications
vol. 2, Wuhan, China, pp. 457–460, 2009.
[13] R. Stutsman, C. Grothoff, M. Atallah, and K. Grothoff, “Lost in just the
translation,” in Proc. ACM Symp. Applied Computing, pp. 338-345,
2006.
[14] L-Y. Por, T.F. Ang, B. Delina, “Whitesteg: a new scheme in information
hiding using text steganography,” WSEAS Transactions on Computers,
vol. 7, pp. 735-745, 2008.
[15] L-Y. Por, K-S. Wong, and K-O. Chee, “UniSpaCh: A Text-based Data
Hiding Method Using Unicode Space Characters,” Journal of Systems
and Software, vol. 85, no. 5, pp. 1075-1082, 2012.
[16] E. Satir, and H. Isik, “A compression-based text steganography
method,” Journal of Systems and Software, vol. 85(10), pp. 2385-2394,
October, 2012.
[17] L. Bin, N. Guiqiang, L. Jianxin, and Z. Xue, “BWT-based Data
Preprocessing for LZW,” International Conference on Multimedia and
Signal Processing (CMSP), 2011.
2014 5th International Conference- Confluence The Next Generation Information Technology Summit (Confluence) 339
... Several works have been proposed in the field of text steganography [8,[13][14][15][16]. Ekodeck and Ndoundam [13] proposed different approaches of PDF file based steganography, essentially based on the Chinese Remainder Theorem. ...
... Here, after a cover PDF document has been released from unnecessary characters of ASCII code A0, a secret message is hidden in it using one of the proposed approaches, making it invisible to common PDF readers, and the file is then transmitted through a non-secure communication channel. Rajeev et al. [15] proposed an email-based steganography method using a combination of compression. The method uses the email forwarding platform to hide secret data in email addresses and the combination of BWT, MTF and LZW compression algorithms to increase the embedding capacity. ...
... But in the example proposed, there is no compression. In other words, the size of the [7] 3.87 % Rajeev et al [15] 7.03 % Rajeev et al [14] 7.21 % Aruna et al [16] 13.43% compressed text is much greater than the size of the secret. To show this, we will give three different implementations of LZW algorithm applied to the secret message. ...
Article
Text steganography is a mechanism of hiding secret text message inside another text as a covering message. In this paper, we propose a text steganographic scheme based on color coding. This includes two different methods: the first based on permutation, and the second based on numeration systems. Given a secret message and a cover text, the proposed schemes embed the secret message in the cover text by making it colored. The stego-text is then send to the receiver by mail. After experiments, the results obtained show that our models perform a better hiding process in terms of hiding capacity as compared to the scheme of Aruna Malik et al. on which our idea is based. La stéganographie textuelle est un mécanisme permettant de cacher un message secret dans un texte. Dans ce papier, nous proposons un schéma stéganographique basé sur le codage de couleurs. Ce schéma comprend deux méthodes différentes : la première, basée sur les permutations et la seconde, basée sur les systèmes de numération. Etant donné un message secret et un texte de couverture, les méthodes proposées cachent le message secret dans le texte de couverture en le coloriant. Le stégo-texte est ensuite envoyé au destinataire par mail. Après expérimentations, les résultats obtenus montrent que nos modèles proposent un meilleure capacité d'embarquement par rapport au schéma d'Aruna Malik et al. sur laquelle repose notre idée.
... However, it aroused suspicion as it abruptly changed the LZW code value. Kumar et al. [10] discussed a high-capacity email based text steganography method that uses combinational compression. This method utilizes a forwarding email platform to hide the secret data in email addresses. ...
... The hiding capacity is a primary decisive parameter for the performance analysis of a text steganography algorithm. As in [8,10], we define the hiding capacity or bit rate as the size of the hidden data relative to the size of the stego cover. The hiding capacity can be calculated as formulated in Eq. (1): ...
... Table 5 compares the capacities of the proposed method and existing techniques. The proposed scheme achieved 12.02% capacity in comparison with that in [8] which produced only 6.92% for the same secret message and cover text given in Figs. 3 and 4. The proposed scheme also performed better than the method in [10] devised by Rajeev et al. for the same cover text and the secret message given in Figs. 3 and 4. By using Eq. (1), the capacity was calculated as 12.02% for this example. ...
Chapter
Text steganography is regarded as the most challenging carrier to hide secret data with because it is not enough unnecessary information compared with other carrier files. This study aimed to deal with the capacity (how much data can be hidden in the cover carrier) and security (the inability of disclosing the data by an unauthorized party) issues of text steganography. Generally, the data hiding capacity of text steganography is limited, and imperceptibility is very poor. Therefore, a new scheme was suggested to improve the two-letter word technique by using the Lempel-Ziv-Welch algorithm. This scheme can hide 4 bits in each position of a two-letter word in the cover text by inserting a nonprinting Unicode symbol. Each two-letter word can have four different locations in the text. Some experiments were conducted on the proposed method (enhancement) and our previous method (two-letter word) to compare their performance applying twelve secret message samples in terms of capacity and Jaro-Winkler. On the other hand, the performance of our proposed method was compared with other related studies in terms of capacity. The results show that the proposed method not only had a high embedding capacity but also reduced the growing size ratio between the original cover and stego cover. In addition, the security of the proposed approach was improved through improving imperceptibility and by using stego-key. Enhancement of Two-Letter Word Steganography Technique Using Lempel-Ziv-Welch Algorithm and Two-Letter Word Technique Abstract. Text steganography is regarded as the most challenging carrier to hide secret data with because it is not enough unnecessary information compared with other carrier files. This study aimed to deal with the capacity (how much data can be hidden in the cover carrier) and security (the inability of disclosing the data by an unauthorized party) issues of text steganography. 
Generally, the data hiding capacity of text steganography is limited, and imperceptibility is very poor. Therefore, a new scheme was suggested to improve the two-letter word technique by using the Lempel-Ziv-Welch algorithm. This scheme can hide 4 bits in each position of a two-letter word in the cover text by inserting a nonprinting Unicode symbol. Each two-letter word can have four different locations in the text. Some experiments were conducted on the proposed method (enhancement) and our previous method (two-letter word) to compare their performance applying twelve secret message samples in terms of capacity and Jaro-Winkler. On the other hand, the performance of our proposed method was compared with other related studies in terms of capacity. The results show that the proposed method not only had a high embedding capacity but also reduced the growing size ratio between the original cover and stego cover. In addition, the security of the proposed approach was improved through improving imperceptibility and by using stego-key.
... Kumar et al., [36] developed a text steganography algorithm that boosted hiding volume with a compression ratio using a combination of Burrows Wheeler Transform (BMT)+Move to Front (MTF) encoding +LWZ coding. This technique is implemented based on email sender address that use to increase unpredictability, this technique places a number of random letters before the email address sign (@). ...
... Arabic text using text steganography and cryptography [42] Complex algorithm 9 Secret sharing message system [47] 10 Coverless steganography Single Bit Rules [48] It has effective algorithm 11 Binary Mapping [34] Easy to detect changes letter 12 HTML Web page steganography [33] 13 Font color MS excel [12] Easy to detect changes letter 14 Character pair text [49] Easy to detect changes letter 15 AITSteg Via social media [50] 16 Multilayer Partially Homomorphic [51] 18 English text using number oriented [52] Fast embedding process 19 Huffman Compression [53] 20 Content-based Feature extraction [54] 21 Alphabet Pairing Text [55] 22 Compression ratio in Email [36] Complex algorithm 23 Encryption with Cover Text and Reordering [37] 24 Back-end interface web page [56] Time consuming process : High performance : Lack performance : No performance Figure 3 shows the number of achievements in development feature-based method in last decade. Based on Fig. 3, the most of high performance that achieve is capacity performance with 16 techniques, the second is security performance that achieved by nine techniques, the third is robustness with three techniques, and the last is other performance (effective algorithm and fast embedding process). ...
Article
Steganography is part of information hiding as the knowledge in science system that covers confidential messages via text, image, audio and video. Many researchers’ effort implemented steganography in the text domain using the feature-based method concerning uniqueness letters to embed that conceal the hidden message. This paper intends to review the achievement performance that is used in feature-based method. This paper aim to concern specifically on performance of robustness, security and capacity in implementation feature-based method of text steganography. Therefore, this paper reviews the implementation some performance that achieve by previous researcher in developing feature-based method of text steganography.
... In [17], the creator displayed text-based steganography based on LZW compression technique with email-ids and email messages as targeted carrier positions of confidential information. In [18], message steganography displayed by joining the transition to-front, tunnels wheeler-change and LZW compression techniques to acquire improved compression capacity of covert content while the carrier medium is alike [17]. In [19], an augmentation of message steganography [8] is exhibited so as to improve implanting limit by the utilization of the Huffman calculation. ...
... In [19], an augmentation of message steganography [8] is exhibited so as to improve implanting limit by the utilization of the Huffman calculation. In [17][18][19] plans spread content picked is not common content that using the arrangement of email-ids as a carrier medium may make defenselessness which might be perceptible as unnatural by the human spectator and can be tempered. ...
Article
In this paper, we propose a PDF-based text steganography approach that addresses the hiding capacity and security of the secret information and improves the imperceptibility of the stego-cover file. In the proposed approach, the secret information is transformed into a compact, encrypted form of imperceptible coding, translated into bits, and then embedded into targeted locations of the PDF file by applying a new cross-reference coding technique in incremental updates of the PDF file, with low computational complexity. The proposed extraction process authenticates the received stego-cover file so that only the intended file is accepted for extraction; otherwise the fake file is discarded by the recipient. Time complexity has been improved significantly by implementing a novel method of PDF steganography for embedding the secret data. Experimental results demonstrate that the proposed method provides efficient algorithms with improved security of the hidden information.
... It is also computation heavy. Kumar et al. [20] proposed a slightly modified version of the above method that combines three compression techniques, namely the Burrows-Wheeler Transform (BWT), Move-to-Front encoding (MTF), and LZW, and achieved a hiding capacity of 7.03%. ...
... In an active attack, the adversary intends to destroy the covert message without knowing for sure that one exists, and employs methods such as rewriting the text, removing font size changes, color changes, white space manipulation, shifting positions of diacritics, etc. Normally, most text-based steganography techniques fail when an active attack occurs. Other email-based approaches [1,3,11,20,34,35] also fail under such attacks, where the email body cover text is changed slightly, some noise is added to it, or some words are replaced with synonyms. Also, all the above methods use the cover text and secret text to generate the email addresses, so any small change in the cover text ruins the whole secret message. ...
Article
Text steganography is inherently difficult due to minimal redundant information space to hide secret payload. The same fact limits the hiding capacity and security too. In this study, a novel technique has been proposed using a randomized indexed word dictionary, and a list of email addresses to increase the hiding capacity and security. A forward email platform has been used as the cover, and email addresses in the carbon copy (CC) field contain secret data that are encoded using a randomized index-based word dictionary. The email username list and indexed word dictionary are both pre-shared between the communicating parties. But during every new communication, a random bitstream (temporary stego-key) is generated from the system time and communicated separately using public-key cryptography. This temporary stego-key is used to randomize the index values of the words in the dictionary. Most of the existing state-of-the-art techniques provide a hiding capacity of 6–10%. The proposed scheme achieves a capacity of 12.17% using some common secret text and email body text (cover text) as used in all other studies. The proposed technique provides higher hiding capacity and security by randomizing the word indexes every time using temporary stego-key. It is also free from statistical attacks, OCR based attacks, and does not depend on the use of any particular text processor.
... In [65], they conducted a steganography study on e-mail texts using the combinatorial compression method. They used a combination of the Burrows-Wheeler transform (BWT), move-to-front (MTF), and LZW coding algorithms to increase the capacity of the information hidden in their proposed method. ...
Article
With the effect of digitalization, the transfer of all text documents over the Internet rather than human transmission has increased, and this situation has revealed the idea that text documents can be used as a carrier that can safely store information. Realizing that methods such as word-line shifting, usage of spaces, replacement of the word with its synonym are fragile against steganalysis, led to new searches and it was determined that deep learning models were more resistant to detecting the presence of hidden words. In this study, the text generation based on the information that is wanted to be hidden without a carrier text, both at word and character level, was performed. Arithmetic coding, perfect tree and Huffman coding methods were used as secret information embedding methods in text generation based on word level. In this part of the study, bidirectional LSTM architecture with attention mechanism was created as language model. In text generation based on character level, a new secret information embedding algorithm is created by combining the LZW compression algorithm with the Char Index (LZW-Char Index Encoding) method. The character-level model is created as a result of using the encoder–decoder architecture together with bidirectional LSTM and Bahdanau attention. The proposed method was evaluated from the perspectives of information embedding efficiency, information imperceptibility and hidden information capacity. As a result of the experiments, it was determined that the method exceeded the state-of-the-art performance and was more resistant to steganalysis.
... As in [13,46] we define bit rate or hiding capacity as the size of the hidden message relative to the size of the cover. It can be formulated as follows: $$\text{Capacity} = \frac{\text{bits of secret message}}{\text{bits of stego cover}}$$ During their study, Khosravi et al. chose 200 different texts with different words and lines in order to measure the capacity ratio of their scheme, and found that approximately 69% of the lines are 9S-3AS host lines. ...
Preprint
In this paper, we proposed a method that exploits justified texts of PDF files to embed secret data. The method is inspired by the work of Khosravi et al. on "A new method for pdf steganography in justified texts", where they proposed a stegosystem that embeds 4 bits of secret message in particular lines called host lines. We have improved it by presenting a combinatorics-based method able not only to embed 7 bits of secret message in these host lines but also b bits (b<14) in any line containing at least n spaces and m added spaces for (m, n > 0), thus allowing a sender to conceal a bigger message with very few changes made to the cover text.
Article
This paper presents a new steganography approach suitable for Arabic texts. It can be classified under steganography feature coding methods. The approach hides secret information bits within the letters benefiting from their inherited points. To note the specific letters holding secret bits, the scheme considers the two features, the existence of the points in the letters and the redundant Arabic extension character. We use the pointed letters with extension to hold the secret bit 'one' and the un-pointed letters with extension to hold 'zero'. This steganography technique is found attractive to other languages having similar texts to Arabic such as Persian and Urdu.
Article
This paper proposes a text-based data hiding method to insert external information into Microsoft Word document. First, the drawback of low embedding efficiency in the existing text-based data hiding methods is addressed, and a simple attack, DASH, is proposed to reveal the information inserted by the existing text-based data hiding methods. Then, a new data hiding method, UniSpaCh, is proposed to counter DASH. The characteristics of Unicode space characters with respect to embedding efficiency and DASH are analyzed, and the selected Unicode space characters are inserted into inter-sentence, inter-word, end-of-line and inter-paragraph spacings to encode external information while improving embedding efficiency and imperceptivity of the embedded information. UniSpaCh is also reversible where the embedded information can be removed to completely reconstruct the original Microsoft Word document. Experiments were carried out to verify the performance of UniSpaCh as well as comparing it to the existing space-manipulating data hiding methods. Results suggest that UniSpaCh offers higher embedding efficiency while exhibiting higher imperceptivity of white space manipulation when compared to the existing methods considered. In the best case scenario, UniSpaCh produces output document of size almost 9 times smaller than that of the existing method.
Article
Many plain text information hiding techniques demand deep semantic processing, and so suffer in reliability. In contrast, syntactic processing is a more mature and reliable technology. Assuming a perfect parser, this paper evaluates a set of automated and reversible syntactic transforms that can hide information in plain text without changing the meaning or style of a document. A large representative collection of newspaper text is fed through a prototype system. In contrast to previous work, the output is subjected to human testing to verify that the text has not been significantly compromised by the information hiding procedure, yielding a success rate of 96% and bandwidth of 0.3 bits per sentence.
Article
Sending encrypted messages frequently will draw the attention of third parties, i.e. crackers and hackers, perhaps causing attempts to break and reveal the original messages. In this digital world, steganography is introduced to hide the existence of the communication by concealing a secret message inside another unsuspicious message. The hidden message may be plaintext, or any data that can be represented as a stream of bits. Steganography is often used together with cryptography and offers an acceptable amount of privacy and security over the communication channel. This paper presents an overview of text steganography and a brief history of steganography along with various existing techniques of text steganography. Highlighted are some of the problems inherent in text steganography as well as issues with existing solutions. A new approach, named WhiteSteg, is proposed for information hiding, using inter-word spacing and inter-paragraph spacing as a hybrid method to reduce the visible detection of the embedded messages. WhiteSteg offers dynamically generated cover text with six options of maximum capacity according to the length of the secret message. Besides, the advantage of exploiting whitespace in information hiding is discussed. This paper also analyzes the significant drawbacks of each existing method and how WhiteSteg could be recommended as a solution.
Article
Cryptology is the practice of hiding digital information by means of various obfuscatory and steganographic techniques. The application of said techniques facilitates message confidentiality and sender/receiver identity authentication, and helps to ensure the integrity and security of computer passwords, ATM card information, digital signatures, DVD and HDDVD content, and electronic commerce. Cryptography is also central to digital rights management (DRM), a group of techniques for technologically controlling the use of copyrighted material that is being widely implemented and deployed at the behest of corporations that own and create revenue from the hundreds of thousands of mini-transactions that take place daily on programs like iTunes. This new edition of our best-selling book on cryptography and information hiding delineates a number of different methods to hide information in all types of digital media files. These methods include encryption, compression, data embedding and watermarking, data mimicry, and scrambling. During the last 5 years, the continued advancement and exponential increase of computer processing power have enhanced the efficacy and scope of electronic espionage and content appropriation. Therefore, this edition has amended and expanded outdated sections in accordance with new dangers, and includes 5 completely new chapters that introduce newer more sophisticated and refined cryptographic algorithms and techniques (such as fingerprinting, synchronization, and quantization) capable of withstanding the evolved forms of attack. Each chapter is divided into sections, first providing an introduction and high-level summary for those who wish to understand the concepts without wading through technical explanations, and then presenting concrete examples and greater detail for those who want to write their own programs. 
This combination of practicality and theory allows programmers and system designers to not only implement tried and true encryption procedures, but also consider probable future developments in their designs, thus fulfilling the need for preemptive caution that is becoming ever more explicit as the transference of digital media escalates. * Includes 5 completely new chapters that delineate the most current and sophisticated cryptographic algorithms, allowing readers to protect their information against even the most evolved electronic attacks. * Conceptual tutelage in conjunction with detailed mathematical directives allows the reader to not only understand encryption procedures, but also to write programs which anticipate future security developments in their design. * Grants the reader access to online source code which can be used to directly implement proven cryptographic procedures such as data mimicry and reversible grammar generation into their own work.
Article
In this study, a novel approach is proposed to improve the capacity and security of text steganography. For this purpose, a text steganography method that employs data compression has been proposed. Because textual data is used in steganography, the employed data compression algorithm has to be lossless. Accordingly, the LZW data compression algorithm has been chosen due to its frequent use in the literature and its significant compression ratio. The proposed method constructs and uses stego keys and employs combinatorics-based coding in order to increase security. Secret information is hidden in a text chosen from a previously constructed text base that consists of naturally generated texts. Email has been chosen as the communication channel between the two parties, so the stego cover has been arranged as a forward mail platform. With the proposed scheme, capacity reaches 7.042% for a secret message containing 300 characters (or 300·8 bits). Finally, the proposed scheme is compared with other contemporary methods in the literature. Experimental results show that the proposed scheme provides a significant increase in capacity.
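The capacity figure quoted in this abstract can be checked with a short calculation. The sketch below assumes capacity is defined as secret-message bits divided by stego-cover bits, expressed as a percentage; the abstract itself does not state the definition, and the function names are hypothetical. Under that assumption, a 300-character secret at 7.042% implies a stego cover of roughly 34,081 bits.

```python
def hiding_capacity(secret_bits: float, cover_bits: float) -> float:
    """Capacity in percent, assuming capacity = secret bits / stego-cover bits."""
    if cover_bits <= 0:
        raise ValueError("cover_bits must be positive")
    return 100.0 * secret_bits / cover_bits


def implied_cover_bits(secret_bits: float, capacity_percent: float) -> float:
    """Invert the same (assumed) definition to get the stego-cover size in bits."""
    if capacity_percent <= 0:
        raise ValueError("capacity_percent must be positive")
    return 100.0 * secret_bits / capacity_percent


# 300 characters at 8 bits each, as stated in the abstract.
secret_bits = 300 * 8
# Cover size implied by the reported 7.042% (illustrative, not a reported value).
cover_bits = implied_cover_bits(secret_bits, 7.042)
```

The implied cover size is a back-of-the-envelope consequence of the assumed formula, not a number reported by the authors.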
Article
In this paper we propose a BWT-based LZW algorithm for reducing the compressed size and the compression time. BWT and MTF can expose potential redundancies in a given input and then significantly improve the compression ratio of LZW. In order to avoid the poor matching speed of LZW on long runs of the same character, we propose a variant of RLE named RLE-N. RLE-N does not affect the compression ratio, but it helps LZW reduce the execution time noticeably. The experimental results show that our algorithm performs well on normal files.
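The BWT + MTF + LZW chain described here can be illustrated with a minimal sketch. It is a textbook-style version under stated assumptions, not the authors' implementation: the BWT uses naive sorted rotations, the input is assumed to contain only characters with code points below 256 and no `\x03` (used here as a hypothetical end-of-string marker), and the RLE-N step is omitted. All function names are illustrative.

```python
from typing import List

EOS = "\x03"  # hypothetical end-of-string marker; must not occur in the input


def bwt(s: str) -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if EOS in s:
        raise ValueError("input must not contain the end-of-string marker")
    s += EOS
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(t: str) -> str:
    """Rebuild the sorted rotation table column by column, then pick the original."""
    table = [""] * len(t)
    for _ in range(len(t)):
        table = sorted(t[i] + table[i] for i in range(len(t)))
    return next(row for row in table if row.endswith(EOS))[:-1]


def mtf_encode(s: str) -> List[int]:
    """Move-to-front: emit each symbol's current index, then move it to the front."""
    alphabet = [chr(i) for i in range(256)]
    out = []
    for ch in s:
        idx = alphabet.index(ch)
        out.append(idx)
        alphabet.insert(0, alphabet.pop(idx))
    return out


def mtf_decode(codes: List[int]) -> str:
    alphabet = [chr(i) for i in range(256)]
    out = []
    for idx in codes:
        ch = alphabet.pop(idx)
        out.append(ch)
        alphabet.insert(0, ch)
    return "".join(out)


def lzw_encode(symbols: List[int]) -> List[int]:
    """LZW over byte-valued symbols with an unbounded dictionary."""
    dictionary = {(i,): i for i in range(256)}
    w, out = (), []
    for sym in symbols:
        wc = w + (sym,)
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = len(dictionary)
            w = (sym,)
    if w:
        out.append(dictionary[w])
    return out


def lzw_decode(codes: List[int]) -> List[int]:
    if not codes:
        return []
    dictionary = {i: (i,) for i in range(256)}
    w = dictionary[codes[0]]
    out = list(w)
    for code in codes[1:]:
        # A code equal to the next free index refers to the entry being built.
        entry = dictionary[code] if code in dictionary else w + (w[0],)
        out.extend(entry)
        dictionary[len(dictionary)] = w + (entry[0],)
        w = entry
    return out


def compress(text: str) -> List[int]:
    return lzw_encode(mtf_encode(bwt(text)))


def decompress(codes: List[int]) -> str:
    return inverse_bwt(mtf_decode(lzw_decode(codes)))
```

For example, `decompress(compress("banana bandana"))` returns the original string. The sorted-rotation BWT is quadratic in memory, so this form is only suitable for short inputs such as the few-hundred-character secret messages discussed in the surrounding abstracts.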
Article
According to the types of the host media, digital watermarking may be classified mainly as image watermarking, video watermarking, audio watermarking, and text watermarking. The principles of the first three watermarking research fields are similar in that they make use of the redundant information of their host media and the characteristics of the human visual system or human auditory system. Unfortunately, text has no redundant information, so text watermarking techniques are totally different from them, and it is very difficult for a text watermarking algorithm to satisfy the requirements of transparency and robustness. In this paper, a novel text watermarking algorithm based on the idea of the mathematical expression is presented. Since watermarking signals are embedded into some Chinese characters that can be divided into left and right components, this algorithm is totally based on the content. Therefore, it breaks through the difficulties of text watermarking. Experiments also show that the component-based text watermarking technique is relatively robust and transparent. It will play an important role in protecting the security of Chinese documents over the Internet.
Article
A mimic function changes a file A so it assumes the statistical properties of another file B. That is, if p(t, A) is the probability of some substring t occurring in A, then a mimic function f, recodes A so that p(t, f(A)) approximates p(t, B) for all strings t of length less than some n. This paper describes the algorithm for computing mimic functions and compares the algorithm with its functional inverse, Huffman coding. The paper also provides a description of more robust and more general mimic functions which can be defined using context-free grammars and van Wijngaarden grammars.