A New Approach to Persian/Arabic Text Steganography

M. Hassan Shirali-Shahreza
Computer Eng. Department
Yazd University
Yazd, IRAN
hshirali@yazduni.ac.ir
Mohammad Shirali-Shahreza
Computer Science Department
Sharif University of Technology
Tehran, IRAN
shirali@cs.sharif.edu
Abstract
Conveying information secretly and establishing
hidden communication has long been of interest.
Text documents have been in wide use for a very long
time, so many methods of hiding information in text
(text steganography) have appeared over the years. In
this paper we introduce a new approach to
steganography in Persian and Arabic texts. Persian and
Arabic letters contain a large number of points, and in
this approach we hide information in the text by
vertically displacing those points. The approach can be
categorized as a feature coding method and can also be
used for Persian/Arabic watermarking. Our method
has been implemented in the Java programming
language.
Keywords: Information Security, Text
Steganography, Text Watermarking, Information
Hiding, Feature Coding, Persian/Arabic Text, Image
Processing & Pattern Recognition.
1. Introduction
With the development of computers and the expansion
of their use in different areas of life and work,
information security has gained special significance.
One of the concerns in information security is the
hidden exchange of information. For this purpose,
various methods including cryptography,
steganography and coding have been used.
Steganography is one of the methods that has
attracted more attention in recent years.
In steganography, the main objective is to hide
information in a cover medium so that outsiders
cannot discover the information it contains. This is
the major distinction between steganography and other
methods of hidden exchange of information. For
example, in cryptography, people become aware of the
existence of information by observing the coded
information, although they are unable to comprehend
it. In steganography, however, nobody is aware that
the information exists in the cover at all [1].
Most steganography work has been carried out on
pictures [2, 3], video clips [4, 5], and music and sounds
[6].
Text steganography is the most difficult kind of
steganography [7]; this is due largely to the relative
lack of redundant information in a text file as
compared with a picture or a sound file [8].
The structure of a text document is identical to
what we observe, whereas in other types of documents,
such as pictures, the structure of the document differs
from what we observe. In such documents we can
therefore hide information by changing the structure of
the document without making a noticeable change in
the output.
Today, the security of information is considerably
improved by combining steganography with the other
methods mentioned. In addition to the hidden exchange
of information, steganography is used in other areas
such as copyright protection and preventing forgery of
electronic documents [9].
Unlike other media such as pictures, sounds and
video clips, text documents have been in common use
since ancient times. Even after the invention of the
printing press, most books and documents have
contained only text. This remains true today, and text
is still preferred over other media because it occupies
less memory, communicates more information and
costs less to print, among other advantages.
Proceedings of the 5th IEEE/ACIS International Conference on Computer and Information Science and 1st IEEE/ACIS
International Workshop on Component-Based Software Engineering, Software Architecture and Reuse (ICIS-COMSAR’06)
0-7695-2613-6/06 $20.00 © 2006 IEEE
As the use of text and hidden communication goes
back to antiquity, steganography in text has a long
history. For example, in order to prevent disclosure of
government documents by the press, Margaret
Thatcher, former British Prime Minister, used to place
a certain number of white-spaces in the documents
given to each cabinet minister so that she could
identify the owner of the document [10].
Today, computer systems have made it easier to
hide information in text, and the range of applications
has grown. Among the most important is hiding
information in electronic texts and documents. Hiding
information in the text of web pages is another
example.
Different methods for hiding information in text
are reviewed in Section 2.
This paper offers a new method for hiding
information in Persian and Arabic texts. Because
languages differ, no single method can be used for
hiding information in texts of every language. This is
discussed in Section 3.
2. Previous Works
Relatively little work has been done on hiding
information in text. The ten methods reported so far
are listed below.
2.1. Steganography of Information in Random
Character and Word Sequences [11]
By generating a random sequence of characters or
words, specific information can be hidden in the
sequence.
Because the sequence of characters or words is
random, it is meaningless and attracts too much
attention. This method is arguably a kind of encryption
rather than steganography.
2.2. Steganography of Information in Specific
Characters in Words [10]
In this method, specific characters from certain
words are chosen to carry the information. In the
simplest form, for example, the first word of each
paragraph is chosen so that placing the first characters
of these words side by side yields the hidden
information. Classical Iranian poets also used this
technique.
This method requires considerable mental effort and
takes a lot of time. It also requires a specially written
text, so not every text can be used.
2.3. Creating Spam Texts [11]
Another feature of HTML documents is that tags and
their members are case-insensitive. For example, the
three tags <BR>, <Br> and <br> are equally valid and
equivalent. As a result, information can be hidden in
HTML documents by changing the case of the letters in
document tags. The hidden information is recovered by
comparing these words with the same words in normal
case, using an appropriate function.
However, in WML all tags must be written in
lowercase letters, so this method cannot be employed
there.
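The tag-case idea described above can be illustrated with a short Java sketch. It is only an illustration under stated assumptions, not code from any cited work: each letter of a tag name is assumed to carry one bit (uppercase for one, lowercase for zero), tag names are found with a simple regular expression rather than a real HTML parser, and the class and method names (TagCaseSketch, embed, extract) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagCaseSketch {
    // Matches "<" or "</" followed by the letters of a tag name.
    private static final Pattern TAG_NAME = Pattern.compile("(</?)([A-Za-z]+)");

    /** Writes one bit per tag-name letter: uppercase = 1, lowercase = 0. */
    static String embed(String html, boolean[] bits) {
        Matcher m = TAG_NAME.matcher(html);
        StringBuffer out = new StringBuffer();
        int next = 0;
        while (m.find()) {
            StringBuilder name = new StringBuilder();
            for (char c : m.group(2).toCharArray()) {
                // Letters beyond the payload are written in lowercase.
                boolean one = next < bits.length && bits[next];
                next++;
                name.append(one ? Character.toUpperCase(c) : Character.toLowerCase(c));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + name));
        }
        m.appendTail(out);
        if (next < bits.length) {
            throw new IllegalArgumentException("not enough tag letters for the payload");
        }
        return out.toString();
    }

    /** Reads back the first bitCount bits from the case of the tag-name letters. */
    static boolean[] extract(String html, int bitCount) {
        boolean[] bits = new boolean[bitCount];
        Matcher m = TAG_NAME.matcher(html);
        int next = 0;
        while (next < bitCount && m.find()) {
            for (char c : m.group(2).toCharArray()) {
                if (next == bitCount) {
                    break;
                }
                bits[next++] = Character.isUpperCase(c);
            }
        }
        if (next < bitCount) {
            throw new IllegalArgumentException("document holds fewer bits than requested");
        }
        return bits;
    }
}
```

In this sketch the receiver must already know how many bits were hidden. A WML document would defeat the scheme, since its tags must stay lowercase.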
2.4. Line Shifting [12, 13]
In this method, the lines of the text are vertically
shifted to some degree (for example, each line shifts
1/300 inch up or down) and information is hidden by
creating a unique shape of the text. This method is
suitable for printed texts.
However, the distances can be measured with special
distance-measuring instruments, and the spacing can
then be changed to destroy the hidden information.
Also, if the text is retyped or processed by a character
recognition (OCR) program, the hidden information is
destroyed.
2.5. Word Shifting [12, 14]
In this method, information is hidden in the text by
shifting words horizontally and changing the distance
between words. This method is acceptable for texts
where the distance between words varies. It is less
likely to be noticed, because changing the distance
between words to fill a line is quite common.
However, someone who knows the algorithm that
sets the distances can compare the text with the
algorithm and extract the hidden information from the
differences. The text image can also be studied closely
to identify the changed distances. Although this is very
time consuming, there is a high probability of finding
the information hidden in the text.
As with the method described in Section 2.4,
retyping the text or using OCR programs destroys the
hidden information.
2.6. Syntactic Methods [11]
By placing some punctuation signs such as full stop
(.) and comma (,) in proper places, one can hide
information in a text file.
This method requires identifying proper places for
punctuation signs. The amount of information that can
be hidden with this method is very small.
2.7. Semantic Methods [11, 15]
In this method, certain words are replaced by their
synonyms, thereby hiding information in the text. A
major advantage of this method is that the information
survives retyping or the use of OCR programs (unlike
the methods in Sections 2.4 and 2.5).
However, this method may alter the meaning of the
text.
2.8. Feature Coding [16]
In this method, some of the features of the text are
altered. For example, the end part of some characters
such as h, d or b is elongated or shortened a little,
thereby hiding information in the text. In this method,
a large volume of information can be hidden in the text
without making the reader aware of its existence.
If the characters are restored to a fixed shape, the
information is lost. Retyping the text or using an OCR
program (as in Sections 2.4 and 2.5) also destroys the
hidden information.
2.9. Abbreviation [8]
Another method for hiding information is the use of
abbreviations.
In this method, very little information can be hidden
in the text. For example, only a few bits can be hidden
in a file of several kilobytes.
2.10. Open Spaces [8, 17]
In this method, information is hidden by adding
extra white-spaces to the text. These white-spaces can
be placed at the end of each line, at the end of each
paragraph or between the words. This method can be
applied to any arbitrary text and does not attract the
attention of the reader.
However, the volume of information that can be
hidden with this method is very small. Also, some text
editor programs automatically delete extra white-spaces
and thus destroy the hidden information.
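As an illustration of the end-of-line variant described above, here is a minimal Java sketch. It assumes one bit per line (a single trailing space means one, no trailing space means zero) and that the receiver knows the number of bits. These conventions and the names TrailingSpaceSketch, embed and extract are assumptions made for the example, not details from the cited works.

```java
final class TrailingSpaceSketch {
    /** Hides one bit per line: a trailing space encodes 1, none encodes 0. */
    static String embed(String text, boolean[] bits) {
        String[] lines = text.split("\n", -1);
        if (bits.length > lines.length) {
            throw new IllegalArgumentException("cover text has too few lines");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            // Remove existing trailing blanks so extraction is unambiguous.
            out.append(lines[i].replaceAll("[ \\t]+$", ""));
            if (i < bits.length && bits[i]) {
                out.append(' ');
            }
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    /** Reads the first bitCount bits back from the line endings. */
    static boolean[] extract(String text, int bitCount) {
        String[] lines = text.split("\n", -1);
        if (bitCount > lines.length) {
            throw new IllegalArgumentException("text has fewer lines than requested bits");
        }
        boolean[] bits = new boolean[bitCount];
        for (int i = 0; i < bitCount; i++) {
            bits[i] = lines[i].endsWith(" ");
        }
        return bits;
    }
}
```

The sketch also shows why the capacity is so low: a line of any length carries a single bit, and an editor that trims trailing white-space erases every bit at once.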
3. Suggested Algorithm
One characteristic of the Persian language is the
abundance of points in its letters. Although English
also has pointed letters, there is a huge difference
between the two languages in this respect. In English,
only two letters, lowercase "i" and lowercase "j", have
a point, while in Persian 18 of the 32 letters of the
alphabet have points. Of these 18, 3 letters have 2
points each, 5 letters have 3 points each and the other
10 letters have 1 point each (Table 1) [18].
Persian has 4 more letters than Arabic, 3 of which
have points. Therefore, in Arabic, 15 of the 28 letters
of the alphabet have points. In general, the number of
points in any given Persian or Arabic text is
considerable. In this paper, this characteristic of
Persian and Arabic is used for steganography.
For this purpose, the information to be hidden is
first compressed. We then look for the first pointed
letter in the cover text. On finding it, we read the first
bit of the compressed information, which is either zero
or one. If the bit is zero, the character remains
unchanged. If the bit is one, we shift the point of the
character slightly upward (Figure 1).
Table 1. Persian Alphabet (columns: letters without point, letters with one point, letters with two points, letters with three points)
Figure 1. Vertical displacement of the points for the Persian letter NOON
This procedure is repeated for the subsequent
pointed characters in the text and the subsequent bits
of information until all the information is hidden. To
divert the attention of readers, after all the information
has been hidden, the points of the remaining characters
are also shifted at random. Before this is done, the size
of the hidden information is itself hidden at the
beginning of the text.
For characters with two or three points, all the
points are shifted together, because shifting only one
of a character's points would attract attention.
To extract the information, the program determines
the value of the bit hidden in each pointed character
from the position of its points. Placing the extracted
bits side by side gives the compressed information,
which is then decompressed to recover the original
data.
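The embedding and extraction loops described above can be sketched in Java as follows. This is an illustrative sketch, not the authors' program. Compression, the hidden size field and the random shifting of leftover letters are omitted. The raised form of each pointed letter is modelled by a hypothetical code point in the Unicode Private Use Area starting at U+E000, standing in for the modified glyphs that Section 5 places in the font file. The set of 18 pointed letters uses the Persian code points, and the names PointShiftSketch, POINTED and SHIFTED_BASE are assumptions for the example.

```java
final class PointShiftSketch {
    // The 18 pointed Persian letters (beh, peh, teh, theh, jeem, tcheh, khah,
    // thal, zain, jeh, sheen, dad, zah, ghain, feh, qaf, noon, Farsi yeh).
    static final String POINTED =
        "\u0628\u067E\u062A\u062B\u062C\u0686\u062E\u0630\u0632"
      + "\u0698\u0634\u0636\u0638\u063A\u0641\u0642\u0646\u06CC";

    // Hypothetical start of the raised-point glyphs in the Private Use Area.
    static final char SHIFTED_BASE = '\uE000';

    /** Replaces a pointed letter by its raised form when the next bit is 1. */
    static String embed(String cover, boolean[] bits) {
        StringBuilder out = new StringBuilder(cover.length());
        int next = 0;
        for (int i = 0; i < cover.length(); i++) {
            char c = cover.charAt(i);
            int idx = POINTED.indexOf(c);
            if (idx >= 0 && next < bits.length) {
                if (bits[next]) {
                    c = (char) (SHIFTED_BASE + idx);
                }
                next++;
            }
            out.append(c);
        }
        if (next < bits.length) {
            throw new IllegalArgumentException("cover text has too few pointed letters");
        }
        return out.toString();
    }

    /** Reads bitCount bits: raised form = 1, ordinary pointed letter = 0. */
    static boolean[] extract(String stego, int bitCount) {
        boolean[] bits = new boolean[bitCount];
        int next = 0;
        for (int i = 0; i < stego.length() && next < bitCount; i++) {
            char c = stego.charAt(i);
            if (c >= SHIFTED_BASE && c < SHIFTED_BASE + POINTED.length()) {
                bits[next++] = true;
            } else if (POINTED.indexOf(c) >= 0) {
                bits[next++] = false;
            }
        }
        if (next < bitCount) {
            throw new IllegalArgumentException("text holds fewer bits than requested");
        }
        return bits;
    }
}
```

Because a whole letter is swapped for its raised form, letters with two or three points move all their points together, as the method requires. Here bitCount is passed in directly, whereas the method above stores the size in the text itself.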
This method can be categorized as a feature coding
method (described in Section 2.8) developed for the
Persian and Arabic languages.
4. Advantages and Disadvantages
4.1. Advantages
1. A large volume of information can be hidden in
the text, because a large number of letters in Persian
and Arabic have points.
2. Due to the lack of a strong OCR program for
Persian and Arabic, printed text cannot easily be
converted into plain text, so destroying the hidden
information is difficult.
3. The text containing hidden information is not
tied to the computer: the hidden information can also
be extracted from printed text. To recover the
information from printed text, the text is scanned and
then passed to the extraction program.
4. The hidden information is resistant to
enlargement or reduction of the text. These changes do
not destroy it.
4.2. Disadvantages
1. The information is lost if the text is retyped.
2. The output text has a fixed format because only
one font is used.
3. Due to the lack of a good OCR program for
Persian and Arabic, using this method on texts that are
printed and then scanned is difficult.
5. Experimental Results
In this project, files and information were hidden in
text documents using the described algorithm. For this
purpose, several files, including text, picture and
executable files, were selected. The files were then
compressed to reduce their size, and each compressed
file was hidden in the text by our steganography
program.
The steganography program is written in Java. To
hide information in the text, we first modify the font:
in unused positions of the font file we define a new
form of each pointed letter in which the point is placed
slightly higher. The program then reads the incoming
data bit by bit and the cover text letter by letter,
looking for pointed letters. If the incoming bit is zero,
the pointed letter remains unchanged. If the bit is one,
the pointed letter is replaced by the form whose point
is slightly higher. The resulting output file is then
converted into a PDF file by Adobe Acrobat. The
document is now ready and can be printed.
To extract data from the text, we used an extractor
program, also written in Java. We first converted the
PDF file to a formatted text file using Acrobat Reader.
The program then applied the extraction algorithm,
reversing the hiding steps, and extracted the
compressed data from the text. Finally, we
decompressed the data file to obtain the original file.
Comparing the output files with the input files, we
observed that they were identical.
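The compress, hide, extract, decompress and compare cycle above can be checked with a small Java sketch of the payload handling. The paper does not name its compression algorithm, so DEFLATE from java.util.zip is used here purely as an assumption. The most-significant-bit-first ordering and the names PayloadBitsSketch, toBits and toBytes are likewise illustrative choices.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class PayloadBitsSketch {
    /** Compresses the payload with DEFLATE (an assumed choice of algorithm). */
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses compress(); malformed input is reported as an unchecked error. */
    static byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Expands bytes into bits, most significant bit first. */
    static boolean[] toBits(byte[] bytes) {
        boolean[] bits = new boolean[bytes.length * 8];
        for (int i = 0; i < bits.length; i++) {
            bits[i] = ((bytes[i / 8] >> (7 - i % 8)) & 1) == 1;
        }
        return bits;
    }

    /** Packs bits back into bytes; trailing bits short of a full byte are ignored. */
    static byte[] toBytes(boolean[] bits) {
        byte[] bytes = new byte[bits.length / 8];
        for (int i = 0; i < bytes.length * 8; i++) {
            if (bits[i]) {
                bytes[i / 8] |= (byte) (1 << (7 - i % 8));
            }
        }
        return bytes;
    }
}
```

The bit array from toBits(compress(file)) is what an embedding routine would consume one pointed letter at a time. Comparing decompress(toBytes(extractedBits)) with the original bytes corresponds to the file comparison reported above.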
After this experiment, we studied the data-hiding
capacity of a number of texts selected from some
high-circulation newspapers of Iran. In steganography
the hidden data must not attract attention. Therefore,
when newspapers or magazines are used as cover
media, it is better to use internal pages rather than the
first page or cover pages. In this project, we examined
the sport pages of some Iranian newspapers to compute
the capacity of an article for hidden data.
The internet addresses of these newspapers and the
capacity of each text for hiding data are shown in
Table 2. All the articles were selected on 20 August
2005.
Table 2. The capacity of an article in sport pages of some Iranian newspapers for hiding data

Newspaper        Website address      Text size    Text capacity  Capacity ratio
                                      (kilobytes)  (bits)         (bits/kilobyte)
Farhange Ashti   ashtidaily.com       13.3         1278           96
Hamshahri        hamshahri.net        6.82         820            120
Iran             iraninstitute.org    6.64         694            105
JameJam          jamejamdaily.net     3.84         434            113
Javan            javandaily.com       8.03         922            115
Jomhouri Eslami  jomhourieslami.com   3.52         441            125
Keyhan           kayhannews.ir        2.92         310            106
Khorasan         khorasannews.com     5.40         628            116
Quds             qudsdaily.net        9.98         1137           114
Shargh           sharghnewspaper.com  20.4         2409           118
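The figures in Table 2 can be related to each other with a small Java sketch. It assumes that the capacity of a text is one bit per pointed letter and that the capacity ratio is the capacity divided by the text size in kilobytes, rounded to the nearest integer, which is consistent with every row of the table. The letter set uses the Persian code points and ignores positional letter forms, and the names CapacitySketch, capacityBits and capacityRatio are hypothetical.

```java
final class CapacitySketch {
    // The 18 pointed Persian letters, written as Unicode escapes.
    static final String POINTED =
        "\u0628\u067E\u062A\u062B\u062C\u0686\u062E\u0630\u0632"
      + "\u0698\u0634\u0636\u0638\u063A\u0641\u0642\u0646\u06CC";

    /** Counts pointed letters; each one can carry a single hidden bit. */
    static int capacityBits(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (POINTED.indexOf(text.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    /** Capacity ratio in bits per kilobyte, rounded to the nearest integer. */
    static long capacityRatio(int capacityBits, double sizeKilobytes) {
        return Math.round(capacityBits / sizeKilobytes);
    }
}
```

For example, capacityRatio(1278, 13.3) gives 96 and capacityRatio(2409, 20.4) gives 118, matching the first and last rows of the table.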
6. Conclusion
In this paper we introduced a new approach to
steganography in Persian and Arabic texts. The method
is based on the presence of points in the majority of the
letters of the Persian and Arabic alphabets. On this
basis, information is hidden in the text by changing the
position of the points. The method can be used for the
hidden exchange of information through text
documents and for text watermarking.
In addition to establishing secret communication,
this method can be used to prevent illegal duplication
and distribution of texts, especially electronic texts
[19, 20].
Besides electronic texts, the method can be applied
to hard copy documents. To this end, we print the
document after hiding data in it. To extract data from
the hard copy document, we scan it and recover the
embedded data by computer.
Considering the similarity of the Urdu script (Urdu
is the official language of Pakistan) to Persian and
Arabic, this method can be used in Urdu as well.
In addition to vertical displacement, the points can
be displaced horizontally as well, so that two bits of
information can be hidden in each letter. By combining
this method with other methods such as line shifting
and word shifting, the volume of hidden information
can be increased.
By employing font editing software, the program
could generate the necessary fonts dynamically, so that
the output text is not restricted to a single uniform form
and conforms to the form of the input text.
7. References
[1] J.C. Judge, "Steganography: Past, Present, Future", SANS
white paper, November 30, 2001,
http://www.sans.org/rr/papers/index.php?id=552,
last visited: 1 May 2006.
[2] R. Chandramouli, and N. Memon, "Analysis of LSB
based image steganography techniques", Proceedings of the
International Conference on Image Processing, vol. 3, 7-10
Oct. 2001, pp. 1019 - 1022.
[3] M. Shirali Shahreza, “An Improved Method for
Steganography on Mobile Phone”, WSEAS Transactions on
Systems, vol. 4, Issue 7, July 2005, pp. 955-957.
[4] G. Doërr and J.L. Dugelay, "A Guide Tour of Video
Watermarking", Signal Processing: Image Communication,
vol. 18, Issue 4, 2003, pp. 263-282.
[5] G. Doërr and J.L. Dugelay, "Security Pitfalls of Frame-
by-Frame Approaches to Video Watermarking", IEEE
Transactions on Signal Processing, Supplement on Secure
Media, vol. 52, Issue 10, 2004, pp. 2955-2964.
[6] K. Gopalan, "Audio steganography using bit
modification", Proceedings of the IEEE International
Conference on Acoustics, Speech, and Signal Processing,
(ICASSP '03), vol. 2, 6-10 April 2003, pp. 421-424.
[7] J.T. Brassil, S. Low, N.F. Maxemchuk, and L.
O’Gorman, "Electronic Marking and Identification
Techniques to Discourage Document Copying", IEEE
Journal on Selected Areas in Communications, vol. 13,
Issue. 8, October 1995, pp. 1495-1504.
[8] W. Bender, D. Gruhl, N. Morimoto, and A. Lu,
"Techniques for data hiding", IBM Systems Journal, vol. 35,
Issues 3&4, 1996, pp. 313-336.
[9] N. F. Maxemchuk and S. Low, “Marking Text
Documents”, Proceedings of the IEEE International
Conference on Image Processing, Santa Barbara, CA, USA,
Oct. 26-29, 1997, pp. 13-16.
[10] T. Moerland, "Steganography and Steganalysis", May
15, 2003, www.liacs.nl/home/tmoerlan/privtech.pdf,
last visited: 1 May 2006.
[11] K. Bennett, "Linguistic Steganography: Survey,
Analysis, and Robustness Concerns for Hiding Information
in Text", Purdue University, CERIAS Tech. Report 2004-13.
[12] S.H. Low, N.F. Maxemchuk, J.T. Brassil, and L.
O'Gorman, "Document marking and identification using both
line and word shifting", Proceedings of the Fourteenth
Annual Joint Conference of the IEEE Computer and
Communications Societies (INFOCOM '95), 2-6 April 1995,
vol.2, pp. 853 - 860.
[13] A.M. Alattar and O.M. Alattar, "Watermarking
electronic text documents containing justified paragraphs and
irregular line spacing ", Proceedings of SPIE -- Volume
5306, Security, Steganography, and Watermarking of
Multimedia Contents VI, June 2004, pp. 685-695.
[14] Y. Kim, K. Moon, and I. Oh, "A Text Watermarking
Algorithm based on Word Classification and Inter-word
Space Statistics", Proceedings of the Seventh International
Conference on Document Analysis and Recognition
(ICDAR’03), 2003, pp. 775–779
[15] M. Niimi, S. Minewaki, H. Noda, and E. Kawaguchi, "A
Framework of Text-based Steganography Using SD-Form
Semantics Model", Pacific Rim Workshop on Digital
Steganography 2003, Kyushu Institute of Technology,
Kitakyushu, Japan, July 3-4, 2003.
[16] K. Rabah, "Steganography-The Art of Hiding
Data", Information Technology Journal, vol. 3, Issue 3, pp.
245-269, 2004.
[17] D. Huang, and H. Yan, "Interword Distance Changes
Represented by Sine Waves for Watermarking Text Images",
IEEE Transactions on Circuits and Systems for Video
Technology, vol. 11, no. 12, December 2001, pp. 1237-1245
[18] M. H. Shirali-Shahreza, and S. Shirali-Shahreza, "A
Robust Page Segmentation Method for Persian/Arabic
Document", WSEAS Transactions on Computers, vol. 4,
Issue 11, Nov. 2005, pp. 1692-1698.
[19] J.T. Brassil, S. Low, and N. F. Maxemchuk, "Copyright
Protection for the Electronic Distribution of Text
Documents", Proceedings of the IEEE, vol. 87, Issue. 7, July
1999, pp. 1181-1196.
[20] J. T. Brassil, S . Low, N. F. Maxemchuk, and L.
O’Gorman, "Marking Text Features of Document Images to
Deter Illicit Dissemination", Proceedings of the 12th IAPR
International Conference on Pattern Recognition, 1994, vol.
2, 9-13 Oct. 1994, pp. 315-319.
Proceedings of the 5th IEEE/ACIS International Conference on Computer and Information Science and 1st IEEE/ACIS
International Workshop on Component-Based Software Engineering, Software Architecture and Reuse (ICIS-COMSAR’06)
0-7695-2613-6/06 $20.00 © 2006 IEEE
... Figure. 1 Steganography Types Secondly, Random and Statistical generation to avoid comparison with a known plaintext, steganographers often resort to generating their own cover texts. Lastly, Linguistic methods specifically consider the linguistic properties of generated and modified text, in this method a pre-selected synonyms of words are used [3][4][5]. ...
... Shirali-Shahreza, M.H. and M. Shirali-Shahreza [5] deal with the issue of text steganography, their model focuses on the letters that have points on them (example English Language had two letters i,j. while Arabic language has 15 pointed letters out of its 28 alphabet letters). ...
... Gutub,A. and M. Fattani. A in [6], "That Benefiting from Shirali-Shahreza [5] proposes a new method to hide information in any letters (Unicode system) instead of pointed ones only. This model uses the pointed letters with extension after the letters to hold secret bit "one" and the un-pointed Letters with extension to hold secret bit "zero. ...
Article
Full-text available
A lot of techniques are used to protect and hide information from any unauthorized users such as Steganography and Cryptography. Steganography hides a message inside another message without any suspicion, and Cryptography scrambles a message to conceal its contents. This paper uses a new text steganography that is applicable to work with different languages, the approach, based on the Pseudorandom Number Generation (PRNG), embeds the secret message into a generated Random Cover-text. The output (Stego-Text) is compressed to reduce the size. At the receiver side the reverse of these operations must be carried out to get back the original message. Two secret keys (Hiding Key & Extraction Key) for authentication are used at both ends in order to achieve a high level of security. The model has been applied to different message languages and both encrypted and unencrypted messages. The experimental results show the model"s capacity and the similarity test values..
... For letters with more than one dot, the displacement of all dots occurs simultaneously, and the secret text is hidden by converting it to the binary system and then moving the dots bit by bit to hide it. If the hidden bit has the value (1), the dots of the chosen letter are moved, but if the hidden bit has the value (0), the letter remains unchanged, as shown in table (11). In the case of the sequence of bits (01), a slight horizontal space will be added between the dots of the same letter. ...
... Also, if the sequence of bits is (10), the letter dots will move vertically with a slight space. In the event of the bit sequence (11), the dots will be moved in both directions (horizontal and vertical). The percentage of this movement is about (1/300), and it is so small that it is nearly invisible to the naked eye. ...
... The percentage of this movement is about (1/300), and it is so small that it is nearly invisible to the naked eye. Still, it can be determined using a specific program designed to extract hidden text from the text cover [10], [11], as shown in table (11). 2.Extended letters (Kashida): Persian and Arabic languages differ from English in that their letters are connected rather than separated in printing, with one letter joining the next, with the exception of seven letters that cannot be joined to the next, which are ‫‪).The‬ا،د،ر،و،ذ،ز،ژ(‬ extension can be found in the Persian and Arabic languages as a kind of embellishment and adjustment to equalize the length of all the lines of the text where the extension is made between any two letters except for the end of the previous seven letters. ...
Article
Full-text available
Steganography is one of the oldest methods for securely sending and transferring secret information between two people without raising suspicion. Recently, the use of Artificial Intelligence (AI) has become simpler and more widely used. Since the emergence of natural language processing (NLP), building language models using deep learning has become more Furthermore, because of the importance of concealing secret information in delivered messages, Artificial Intelligence theories along with Natural Language Processing algorithms were employed to conceal hidden information within the text cover. The Arabic language was used because of its large number of words, vocabulary, and linguistic meanings, and its most significant feature is Arabic poetry. This study discovered a new way to hide secret data inside newly formulated Arabic poetry based on previous Arabic poetic texts and a database of a number of Arab poets from the ancient and modern eras using Artificial Intelligence and Long-Short Term Memory (LSTM) theories to increase storage capacity by 45 percent. The linguistic accuracy and volume of secret data hidden within the formulated poetry were increased using a Baudot Code algorithm, where the secret data is hidden at the level of letters rather than words, and the linguistic accuracy and volume of secret data hidden within the formulated poetry were increased to eliminate the negatives found in previous studies.
... Notably, the study by [60] is a pioneering research work in the field of ATS. It suggests moving the placement of points up slightly if the hidden bit value is one; otherwise, the position should remain the same as before. ...
... Despite the efforts to improve the capability for concealing the secret bit, the robustness issue raised in [60] remains unresolved. The possible position is in a consistent format and may raise suspicion, while incorporating the retyping procedure can erase all hidden data as they rely on the file format. ...
Article
Full-text available
Despite the rapidly growing studies on Arabic text steganography (ATS) noted recently; systematic, in-depth, and critical reviews are in scarcity due to high overlap or low segregation level among the existing review articles linked to this research area. As such, the objective of this paper is to present an extensive systematic literature review (SLR) on the techniques and algorithms used to analyse ATS. Data were retrieved from three primary databases, namely Science direct Journal, IEEE Explore Digital Library, and Scopus Journal. As a result, 214 publications were identified since the past 5 years regarding methods of analysing ATS. A comprehensive SLR was executed to detect a range of unique characteristics from the algorithms, which led to the discovery of a new structure of ATS categories. Essentially, a hybrid method for ATS was identified with other sub-disciplines, especially cryptography, which leads to a new branch in enhancing security for ATS. Other relevant findings included key performance and evaluation criteria used to measure the performance of the algorithms (i.e., capacity, invisibility, robustness, & security). 87 % of the reviewed articles are the capacity measurement performance. Therefore, it disclosed a huge potential for the other two criteria (i.e. invisibility, robustness and security) to set a benchmark for future research endeavour.
... It is also often difficult to detect visually [1]. In the literature, there are format-based steganography studies using techniques such as character spacing [11], word wrapping [12], and character encoding [13][14][15][16][17][18][19][20]. In the line and word wrapping method, which is one of the methods used in the literature, the word or line is shifted up and down in order to create unique spaces for information hiding. ...
... The KLD and JSD distance metrics used are given in Eqs. (12) and (13), respectively. The ''C'' refers to the overall statistical distribution of training text here, and ''S'' refers to the overall statistical distribution of the stego text [64,87]. ...
Article
With digitalization, text documents are increasingly transferred over the Internet rather than by hand, which has prompted the idea that text documents can serve as carriers that safely store information. The realization that methods such as word and line shifting, use of spaces, and synonym replacement are fragile against steganalysis led to new approaches, and deep learning models were found to be more resistant to detection of the hidden words. In this study, text is generated without a carrier text, at both word and character level, based on the information to be hidden. Arithmetic coding, perfect tree, and Huffman coding were used as secret information embedding methods in word-level text generation. In this part of the study, a bidirectional LSTM architecture with an attention mechanism was used as the language model. For character-level text generation, a new secret information embedding algorithm is created by combining the LZW compression algorithm with the Char Index method (LZW-Char Index Encoding). The character-level model uses an encoder–decoder architecture together with bidirectional LSTM and Bahdanau attention. The proposed method was evaluated in terms of information embedding efficiency, information imperceptibility, and hidden information capacity. The experiments showed that the method exceeded state-of-the-art performance and was more resistant to steganalysis.
... Numerous text steganography techniques already exist. In [8], the authors proposed a novel text steganography technique for Persian and Arabic texts. The authors use vertical displacement of the points in Persian and Arabic phrases to conceal secret information. ...
Article
In today’s world of computers, everyone communicates personal information over the web, so the security of personal information is a main concern from a research point of view. Steganography can be used to secure personal information. Storing and forwarding embedded personal information, specifically in public places, is gaining more attention day by day. In this research work, the Integer Wavelet Transform technique along with JPEG (Joint Photographic Experts Group) compression is proposed to overcome some of the issues associated with steganography techniques. Video cover files and JPEG compression improve concealing capacity because of their intrinsic properties. The Integer Wavelet Transform is used to improve the imperceptibility and robustness of the proposed technique. The imperceptibility of the proposed work is analyzed through evaluation parameters such as PSNR (Peak Signal-to-Noise Ratio), MSE (Mean Square Error), SSIM (Structural Similarity Index Measure), and CC (Correlation Coefficient). Robustness is validated through some image processing attacks. Complexity is calculated in terms of concealing and retrieval time along with the amount of secret information hidden.
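Two of the imperceptibility measures named above, MSE and PSNR, have simple closed forms. The sketch below computes them for flat lists of 8-bit pixel values. The function names and the peak value of 255 are assumptions for illustration, and this is not the evaluation code of the cited work.

```python
import math


def mse(cover, stego):
    """Mean square error between two equally sized pixel sequences."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)


def psnr(cover, stego, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(cover, stego)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


# Hypothetical pixel rows: the stego row differs slightly from the cover row.
cover_row = [120, 121, 119, 200]
stego_row = [120, 120, 119, 201]
print("MSE :", mse(cover_row, stego_row))
print("PSNR:", psnr(cover_row, stego_row))
```

A higher PSNR (equivalently, a lower MSE) indicates that the stego frame is closer to the cover frame.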
... A small change in punctuation marks can also be made, which will not be readable to a normal reader (Bender et al., 1996). Storage and access of text files is easier compared to other digital mediums (Shahreza et al., 2006). Text steganography is classified into three major types namely format-based methods, linguistic methods, and random and statistical generation. ...
Book
Unleashing the Art of Digital Forensics is intended to describe and explain the steps taken during a forensic examination, with the intent of making the reader aware of the constraints and considerations that apply during a forensic examination in law enforcement and in the private sector. Key Features: • Discusses the recent advancements in Digital Forensics and Cybersecurity • Reviews detailed applications of Digital Forensics for real-life problems • Addresses the challenges related to implementation of Digital Forensics and Anti-Forensic approaches • Includes case studies that will be helpful for researchers • Offers both quantitative and qualitative research articles, conceptual papers, review papers, etc. • Identifies the future scope of research in the field of Digital Forensics and Cybersecurity. This book is aimed primarily at and will be beneficial to graduates, postgraduates, and researchers in Digital Forensics and Cybersecurity.
... The authors used these pointed letters to hide secret data, shifting the points upwards to hide a secret bit 1 and keeping them at the normal distance to hide bit 0 (Shirali-Shahreza, 2006). A drawback of this method is that the hidden information can be lost when the documents are processed with optical character recognition (OCR) software or retyped. ...
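The bit-to-point mapping described in this snippet can be sketched at the logical level as follows. The real method changes how the points are drawn, which needs font or image manipulation. Here each letter only carries a `raised` flag. The letter set `POINTED` and the function names are illustrative assumptions, and the set is not an exhaustive list of pointed Persian/Arabic letters.

```python
# Illustrative (not exhaustive) set of pointed Persian/Arabic letters.
POINTED = set("بتثجخذزشضظغفقنپچژ")


def embed(cover, bits):
    """Return (char, raised) pairs; raised=True models a point shifted up (bit 1)."""
    remaining = iter(bits)
    marks = []
    for ch in cover:
        raised = False
        if ch in POINTED:
            bit = next(remaining, None)  # None once all bits are used up
            raised = bit == 1
        marks.append((ch, raised))
    if next(remaining, None) is not None:
        raise ValueError("cover text has too few pointed letters")
    return marks


def extract(marks, n_bits):
    """Read the flags of the pointed letters back as bits."""
    bits = [1 if raised else 0 for ch, raised in marks if ch in POINTED]
    return bits[:n_bits]


cover = "نقش جهان"  # five pointed letters under the set above
secret = [1, 0, 1]
print(extract(embed(cover, secret), len(secret)))
```

In this model the capacity equals the number of pointed letters in the cover text. The drawback noted above also follows directly: retyping or OCR keeps only the characters and discards the `raised` flags.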
Article
Recently, information security has become a very important topic for researchers as well as military and government officials. For secure communication, it is necessary to develop novel ways to hide information. For this purpose, steganography is usually used to send secret information to its destination using different techniques. In this article, our main focus is on text-based steganography. Hidden information in text files is difficult to discover, as text data has low redundancy in comparison to other steganographic media. Hence, our proposed algorithm uses Arabic text to hide secret information through a combination of the Unicode zero-width character, zero-width joiner, and pseudo-space. The experimental results show that hidden data capacity per word is significantly increased in comparison to recently proposed algorithms. The major advantage of our proposed algorithm over previous research is the high visual similarity between cover and stego-text, which can reduce the attention of intruders.
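The abstract does not spell out the embedding rules, so the following is only a simplified sketch of the general idea of hiding bits with invisible Unicode characters. It is not the proposed algorithm. It hides one bit per word by appending U+200B (zero width space) for bit 0 or U+200C (zero width non-joiner, used as the pseudo-space in Persian typing) for bit 1 at the end of a word. The mapping, the function names, and the assumption that the cover text contains neither character beforehand are all illustrative.

```python
ZW0 = "\u200b"  # ZERO WIDTH SPACE      -> bit 0 (assumed mapping)
ZW1 = "\u200c"  # ZERO WIDTH NON-JOINER -> bit 1 (assumed mapping)


def hide(cover, bits):
    """Append one invisible character to each of the first len(bits) words."""
    words = cover.split(" ")
    if len(bits) > len(words):
        raise ValueError("cover text has too few words")
    out = []
    for i, word in enumerate(words):
        if i < len(bits):
            word += ZW1 if bits[i] else ZW0
        out.append(word)
    return " ".join(out)


def reveal(stego):
    """Collect the invisible characters in order and map them back to bits."""
    return [1 if ch == ZW1 else 0 for ch in stego if ch in (ZW0, ZW1)]


def strip_marks(stego):
    """Remove the invisible characters to recover the cover text."""
    return stego.replace(ZW0, "").replace(ZW1, "")


cover = "السلام عليكم ورحمة الله"
stego = hide(cover, [1, 0, 1])
print(reveal(stego), strip_marks(stego) == cover)
```

Because the added characters have no width, the stego text should normally render like the cover text. Schemes that place several such characters per word, as the abstract suggests, raise the capacity per word accordingly.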
Preprint
In this paper we introduce and implement a novel method for cloud data security. In this method, five basic arithmetic operations (addition, subtraction, multiplication, division, and modulo) are performed on the ASCII digits of every cover text character, and the newly generated values are used to encrypt the plain text in parallel. Each newly generated value and two plain text characters in ASCII code are encrypted in parallel from the beginning and from the end of the plain text. In our proposed approach, one cover text character hides at most ten plain text characters. Since only one cover text character is used, less memory is required to store the cover text in cloud storage. Because only basic arithmetic operations are used, execution time is reduced, and the parallel approach enhances overall system performance.
Article
Linguistic steganalysis is an important topic in the field of information security and signal processing. In recent years, linguistic steganalysis methods have mainly relied on deep learning techniques and achieved great success, but they suffer from the following major disadvantages. From the perspective of model structure, current methods extract only coarse features of the text, without focusing on fine-grained representations. In terms of application, most studies focus on a single hidden scene and ignore mixed hidden scenes, which are more complex and realistic. These weaknesses limit the performance and application of linguistic steganalysis in practice. In this letter, we propose a novel linguistic steganalysis method to overcome these weaknesses. The proposed method extracts a distinctive text representation that fuses hierarchical features and performs excellently in sophisticated conditions. First, we adapt gated graph neural networks as the coarse graph updater to update node representations at the graph level. Then we design a fine graph updater based on the graph attention mechanism to focus on the highlighted nodes at the node level. Moreover, we extract the most notable feature at the dimension level of each node with the graph channel attention module. Finally, a readout function is designed to fuse the hierarchical features and perform the classification. The experimental results show that our method achieves the best results compared with previous methods in both single and mixed hidden scenes, which demonstrates the effectiveness of the proposed method.
Article
In this paper I introduce an improved method for hiding data in images (steganography). This method is used for secure data transfer from a computer to mobile phones. In this method, a message is hidden in an image on a PC using a password. The user can download this image from the computer to his mobile phone. A decoder program written in Java and running on the phone extracts the hidden information. The decoder program was installed on a Nokia 6600 mobile phone and tested by posting students' grades over it. Key-Words - Steganography, Watermarking, Mobile phone, Wireless communication, PNG image format, J2ME, Security
Conference Paper
There have been many techniques for hiding messages in images in such a manner that the alterations made to the image are perceptually indiscernible. However, the question of whether they result in images that are statistically indistinguishable from untampered images has not been adequately explored. We look at some specific image-based steganography techniques and show that an observer can indeed distinguish between images carrying a hidden message and images that do not carry a message. We derive a closed-form expression for the probability of detection and false alarm in terms of the number of bits that are hidden. This leads us to the notion of steganographic capacity, that is, how many bits can we hide in a message without causing statistically significant modifications? Our results provide an upper bound on this capacity. Our ongoing work relates to adaptive steganographic techniques that take explicit steps to foil the detection mechanisms. In this case we hope to show that the number of bits that can be embedded increases significantly.
Article
Data hiding, a form of steganography, embeds data into digital media for the purpose of identification, annotation, and copyright. Several constraints affect this process: the quantity of data to be hidden, the need for invariance of these data under conditions where a "host" signal is subject to distortions, e.g., lossy compression, and the degree to which the data must be immune to interception, modification, or removal by a third party. We explore both traditional and novel techniques for addressing the data-hiding process and evaluate these techniques in light of three applications: copyright protection, tamper-proofing, and augmentation data embedding.
Article
Given the sheer volume of data stored and transmitted electronically in the world today, it is no surprise that countless methods of protecting such data have evolved. One lesser-known but rapidly growing method is steganography, the art and science of hiding information so that it does not even appear to exist. In an ideal world we would all be able to openly send encrypted email or files to each other with no fear of reprisals. However, there are often cases when this is not possible, either because you are working for a company that does not allow encrypted emails or perhaps because the local government does not approve of encrypted communication. This is one of the cases where steganography can help hide the encrypted messages, images, keys, secret data, etc. This paper discusses the purpose of steganography and explains how steganography is related to cryptography, as well as what it can and cannot be used for. It also gives a brief history of steganography. In addition, some of the tools and software used in steganography are demonstrated, including some discussion of the most popular algorithms involved in these tools. This paper further explains the advantages and disadvantages, as well as the strengths and weaknesses, of using steganography.
Conference Paper
Optical Character Recognition (OCR) software is widely used in office automation systems. One of the first steps in recognizing documents is to segment the input image. Various methods have been proposed for the English language. For the Persian/Arabic language, however, no complete method has been found yet. In this paper we present a new page segmentation method for Persian/Arabic printed texts. This method is inspired by the effect of ink spreading on paper. One of the most important characteristics of this method is its insensitivity to rotation.
Conference Paper
Continues a study of document marking to deter illicit dissemination. An experiment reveals that the distortion on the photocopy of a document is very different in the vertical and horizontal directions. This leads to a strategy that marks a text line both vertically, using line shifting, and horizontally, using word shifting. A marked line is always accompanied by two unmarked control lines, one above and one below. They are used to measure distortions in the vertical and horizontal directions in order to decide whether a line shift or a word shift should be detected. Line shifts are detected using a centroid method that bases its decision on the relative distance of line centroids. Word shifts are detected using a correlation method that treats a profile as a waveform and decides whether it originated from a waveform whose middle block has been shifted left or right. The maximum likelihood detectors for both methods are given.
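The centroid idea for line-shift detection can be illustrated with a much simplified sketch. It compares the distance from the marked line's centroid to the centroids of the control lines above and below it, using a horizontal projection profile (amount of ink per pixel row). The fixed tolerance `tol`, the function names, and the input layout are assumptions. The abstract's maximum likelihood detector is not reproduced here.

```python
def centroid(profile, rows):
    """Ink-weighted mean row index of one text line.

    profile[y] is the amount of ink in pixel row y (y grows downward);
    rows is the range of pixel rows belonging to the line, assumed to
    contain at least some ink.
    """
    total = sum(profile[y] for y in rows)
    return sum(y * profile[y] for y in rows) / total


def detect_line_shift(profile, upper, middle, lower, tol=0.5):
    """Classify the middle line as shifted 'up', 'down', or 'none'.

    upper and lower are the unmarked control lines; tol is a hypothetical
    threshold (in pixel rows) standing in for a proper statistical test.
    """
    c_up, c_mid, c_low = (centroid(profile, r) for r in (upper, middle, lower))
    diff = (c_mid - c_up) - (c_low - c_mid)
    if diff < -tol:
        return "up"    # middle line sits closer to the line above
    if diff > tol:
        return "down"  # middle line sits closer to the line below
    return "none"


# Toy profile: three lines with all their ink in a single row each.
profile = [0] * 30
for y in (2, 11, 22):  # the middle line is one row higher than the even spacing
    profile[y] = 10
print(detect_line_shift(profile, range(0, 10), range(10, 20), range(20, 30)))
```

Measuring against the two neighbouring control lines rather than against absolute page positions means that a uniform vertical offset of the whole page does not change the decision.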
Article
Steganography (a rough translation of the Greek term is "secret writing") has been used in various forms for 2500 years. It has found use in military, diplomatic, personal, and intellectual property applications. Briefly stated, steganography is the term applied to any number of processes that hide a message within an object, where the hidden message will not be apparent to an observer. This paper explores steganography from its earliest instances through potential future applications.
Article
Digital watermarking was introduced at the end of the 20th century to provide a means of enforcing copyright protection once the use and distribution of digital multimedia data had exploded. This technology was first intensively investigated for still images, and recent efforts have been made to exhibit unifying characteristics. On the other hand, the situation is rather different in the context of video watermarking, where unrelated articles are scattered throughout the literature. The purpose of this paper is consequently to give an in-depth overview of video watermarking and to point out that it is not simply an extension of still-image watermarking. New applications have to be considered, specific challenges have to be taken up, and video-driven approaches have to be investigated.
Conference Paper
With the development of m-commerce as one of the new branches of e-commerce, m-banking has emerged as one of its main divisions. As m-banking was very well received, it has begun to offer various services based on different systems, such as the short messaging service (SMS). However, in spite of its advantages, m-banking also faces some challenges. One of these challenges is the security of the system. This paper presents a method for increasing the security of the information requested by users by means of steganography. In this method, instead of being sent directly, the information is hidden in a picture using a password and placed on a site. Then the address of the picture is sent to the user. After receiving the address of the picture through SMS, the user downloads the picture with a special program. If the password is entered correctly, the user can view the information extracted from the picture. This project is written in J2ME (Java 2 Micro Edition) and has been implemented on Nokia mobile phones, models N71 and 6680.
Article
In this paper, we propose a new method for watermarking electronic text documents that contain justified paragraphs and irregular line spacing. The proposed method uses a spread-spectrum technique to combat the effects of irregular word or line spacing. It also uses a BCH (Bose-Chaudhuri-Hocquenghem) error coding technique to protect the payload from the noise resulting from the printing and scanning process. Watermark embedding in a justified paragraph is achieved by slightly increasing or decreasing the spaces between words according to the value of the corresponding watermark bit. Similarly, watermark embedding in a text document with variable line spacing is achieved by slightly increasing or decreasing the distance between any two adjacent lines according to the value of the watermark bit. Detecting the watermark is achieved by measuring the spaces between the words or the lines and correlating them with the spreading sequence. In this paper, we present an implementation of the proposed algorithm and discuss its simulation results.