2013 ASEE Northeast Section Conference Norwich University
Reviewed Paper March 14-16, 2013
Text Steganography Using Language Remarks
Ammar Odeh¹, Khaled Elleithy, Miad Faezipour
Abstract: With the rapid growth of networking mechanisms, where large amounts of data can be transferred
between users over different media, the need for secure systems that maintain data privacy increases significantly.
Different techniques have been introduced to protect data during the transfer process against any kind of attack.
One of these techniques, called Steganography, hides the data inside another file. In Steganography, data
is hidden inside a carrier file that anyone can see, while the hidden data inside it cannot be discovered. To this end,
a good algorithm avoids arousing an attacker's suspicion by applying certain criteria before the data is sent. In
this paper, we present an algorithm to hide data using a text file as a carrier. The Left-to-Right and Right-to-Left
Remarks, which are Unicode symbols, are used to hide the data inside the text file. Moreover, our algorithm can be
applied to textual data of different sizes.
Keywords: Steganography, Carrier file, Text Steganography, Information Hiding, Steganalysis.
I. INTRODUCTION
A. Background
Steganography is a security mechanism used to hide data inside a carrier file such as an image, sound, video, or text
file [1]. The main idea is to hide data inside the carrier file and then place the resulting Stego file on some transport
medium. If the carrier file arouses any suspicion, Steganalysis is applied to it. Certain file properties, such as the
file size and file format, serve as basic rules that help the analyst discover the hidden data.
As shown in Figure 1, Steganography is mainly classified into four categories depending on the type of
the carrier file, i.e. image, audio, video, or text. Moreover, Text Steganography can be classified into different
categories depending on the file application.
Most Steganography algorithms are applied to images, which contain a huge amount of data. The Least
Significant Bit (LSB) replacement algorithm is one such Steganography algorithm [2]. Other, more complex algorithms
have also been introduced for images. However, the main problems are [3]:
1. File size: Image files are already relatively large compared to other files.
2. Image distortion: Replacing some bits may distort the image, which enables
Steganalysis to recover the hidden data [4].
3. Deterministic changes: The same deterministic algorithm produces the same distribution of bits over the
image and therefore the same style of hidden image area. In other words, if we replace white
pixels with red ones, all white pixels are converted to red, so the original file can be easily
extracted.
Audio carrier files also have weak points. Any audio signal can be converted to and processed in the
frequency domain, and by computing the lower and upper control limits we can deduce whether the file contains
hidden data. Video carrier files have the disadvantage of combining the weaknesses of sound and image
files [5][6].
Text files are the smallest carrier files, in terms of size, that can be used to transfer data from sender to
receiver [7]. Moreover, the huge amount of textual data on the internet enables us to
hide data across different websites and to update those websites with a new style of hidden information
embedded within the files. On the other hand, text files are the most difficult Steganography carrier files because,
unlike other carrier files, they do not have redundant patterns [3].
¹ University of Bridgeport, Bridgeport CT 06604, aodeh@bridgeport.edu
B. Main Contributions and Paper Organization
A promising text Steganography algorithm is presented in this paper. The main idea is to use the Right-to-Left
Remark (U+200F) and the Left-to-Right Remark (U+200E), which the Unicode standard names RIGHT-TO-LEFT MARK and
LEFT-TO-RIGHT MARK, to hide secret data. We also suggest
optimization techniques that offer the highest performance with respect to the “Magic Triangle” concept for Steganography,
that is, the ability to achieve transparency, robustness, and hiding capacity.
The rest of this paper is organized as follows. In Section II we discuss previous text Steganography techniques.
Our proposed Remarks algorithm is presented in Section III, together with its discussion and analysis.
Simulation results are provided in Section IV. Finally, concluding remarks are
offered in Section V.
Figure 1. Steganography Carrier Media classification
II. RELATED WORKS
Text Steganography can be classified into three categories depending on how the information is hidden:
linguistic, format, and random. Linguistic methods are further classified into two categories:
syntactic and semantic methods [8]. These methods are developed by creating a dictionary of
synonyms and representing each word by a bit. The authors in [8] presented a synonym algorithm to hide
data in the Bahasa Melayu language, where the hiding algorithm was divided into two phases. The first phase converted
the hidden message into binary code using ASCII codes. Then, a synonyms file was created; the sender and
recipient must have the same word list to embed and extract the message. If the sender wants to insert a zero in the
text, no word replacement is needed. Otherwise, the word is replaced with its synonym from the synonyms file. The same
strategy is repeated until the end of the secret message is reached. The recipient can recover the message by
inverting the strategy and checking whether a replacement occurred, in which case the secret bit is 1.
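The replacement strategy described above can be illustrated with a minimal Python sketch. The word list `SYNONYMS`, the function names, and the English example words are hypothetical and are not taken from [8]; the sketch also assumes the cover text originally contains only the left-hand word of each pair.

```python
# Hypothetical shared word list (assumption): original word -> synonym.
SYNONYMS = {"big": "large", "quick": "fast", "begin": "start"}
REVERSE = {syn: word for word, syn in SYNONYMS.items()}

def embed_synonyms(words, bits):
    """Bit 0 keeps a replaceable word, bit 1 swaps it for its synonym."""
    out, k = [], 0
    for w in words:
        if w in SYNONYMS and k < len(bits):
            out.append(SYNONYMS[w] if bits[k] == "1" else w)
            k += 1
        else:
            out.append(w)
    if k < len(bits):
        raise ValueError("carrier has too few replaceable words")
    return out

def extract_synonyms(words, n_bits):
    """An original word reads as 0, a synonym reads as 1."""
    bits = []
    for w in words:
        if len(bits) == n_bits:
            break
        if w in SYNONYMS:
            bits.append("0")
        elif w in REVERSE:
            bits.append("1")
    return "".join(bits)

cover = "the big dog made a quick begin".split()
stego = embed_synonyms(cover, "101")
```

Both sides need the same word list, which mirrors the requirement stated in the text.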
Another similar technique was presented in [9]. The algorithm had three inputs (the natural language text,
the secret message, and the key) and one output, the Stego-object. Lexical substitution sets and
variant forms of the same word are created so that, after the first scan, the system recognizes each word and the set
to which it belongs. A lexical analyzer for the Chinese language was then used to embed the correct word in the
carrier file, taking the context into consideration.
[Figure 1 content: Steganography carrier → Image, Video, Audio, Text; Text → HTML, Document, TXT]
In [10], the authors introduced two linguistic methods using the Telugu language (spoken in the state of
Andhra and other states of India). The first method exploited a characteristic of Telugu by classifying characters into
two groups, where the first group would pass 0 and the other group would pass 1. The other method used Telugu
punctuation marks by distributing them into four groups, with each group used to pass two bits.
Another Text Steganography algorithm was introduced in [11], where space characters were added after words
to encode two bits. Depending on the number of letters in a word and the number of space characters after that
word, one of the values in the set {00, 01, 10, 11} would be passed. The authors in [3] also introduced a space-based
method in which single spaces pass 0 and double spaces pass 1. Both methods have the
problem that a word processor highlights the additional spaces.
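As an illustration of the single-space/double-space idea attributed to [3] above, the following Python sketch encodes one bit per gap between words. The function names are hypothetical, and the sketch assumes the cover words themselves contain no spaces.

```python
import re

def embed_spaces(words, bits):
    """Join words with one space for bit 0 and two spaces for bit 1."""
    if len(bits) > len(words) - 1:
        raise ValueError("not enough gaps between words")
    out = [words[0]]
    for k, word in enumerate(words[1:]):
        gap = "  " if k < len(bits) and bits[k] == "1" else " "
        out.append(gap + word)
    return "".join(out)

def extract_spaces(text, n_bits):
    """Read the first n_bits gaps: a double space is 1, a single space is 0."""
    gaps = re.findall(r" +", text)
    return "".join("1" if len(g) == 2 else "0" for g in gaps[:n_bits])

stego = embed_spaces(["a", "b", "c", "d"], "101")
```

The doubled spaces remain visible to anyone who displays whitespace, which is the weakness noted in the text.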
In [11] a new method was introduced to hide data inside Telugu text by horizontally shifting inherent vowel signs.
The main advantage of this method is that a huge amount of data can be hidden inside the text file. Another algorithm,
introduced in [12], combined three languages: Chinese, Arabic, and English. First, the
authors created two tables, the first storing Arabic diacritics and the other storing English letters. After
translating the Chinese text into English sentences, each English letter corresponded to two Arabic diacritics. Then,
an Arabic text containing the selected diacritics was created.
III. PROPOSED ALGORITHM
In this work, the idea is to hide data inside a Word file without any change in the file format. Steganalysis will try
to analyze the file content and formatting, and any change in the file format can expose the hidden data. In
our algorithm, we use the Right-to-Left Remark (U+200F) symbol and the Left-to-Right Remark (U+200E)
symbol to hide bits inside the message. Our method does not change the format of the file and can also be
applied to different languages regardless of the Unicode or ASCII coding. Moreover, the
method is easy to apply using the Microsoft Office Word application.
To avoid the retyping attack that an attacker may employ, we convert our file to PDF, which prevents anyone
from editing it.
The scenarios for hiding the data are as follows:
1. (00) add nothing
2. (01) add Right-to-Left Remark (U+200F)
3. (10) add Left-to-Right Remark (U+200E)
4. (11) add Left-to-Right Remark (U+200E) and Right-to-Left Remark (U+200F)
By applying one of these four cases, we can hide data without any changes in the file information.
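The four scenarios can be written down as a small lookup table. This is a sketch only: the names `LRM`, `RLM`, `BITS_TO_MARKS`, and `MARKS_TO_BITS` are illustrative and do not appear in the paper.

```python
# Illustrative names (assumptions), following the four scenarios listed above.
LRM = "\u200e"  # U+200E, the Left-to-Right Remark of the text
RLM = "\u200f"  # U+200F, the Right-to-Left Remark of the text

# Two hidden bits -> invisible symbols to add
BITS_TO_MARKS = {
    "00": "",         # scenario 1: add nothing
    "01": RLM,        # scenario 2
    "10": LRM,        # scenario 3
    "11": LRM + RLM,  # scenario 4
}

# Inverse table, usable on the receiving side
MARKS_TO_BITS = {marks: bits for bits, marks in BITS_TO_MARKS.items()}
```

Because the four symbol combinations are distinct, the table is invertible and each combination carries exactly two bits.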
A. Algorithm I: Hiding Data
Input: Carrier file, hidden bits file
Output: Stego file (carrier with embedded U+200E and U+200F symbols)
Step 1: Choose any DOC file as the carrier and start from its first letter
Step 2: Repeat while !(EOF) // repeat until the end of the hidden bits file
Step 3: Embed hidden data in the selected file
Step 3a. Move to the next letter of the carrier file
Step 3b. Take the next two hidden bits
    If 00 then add neither U+200F nor U+200E
    Else if 01 then add U+200F
    Else if 10 then add U+200E
    Else add U+200E and U+200F
Step 4: Go to Step 2
Step 5: Save the file as PDF, then send it to the other side.
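A minimal Python sketch of the hiding steps is given below, under several assumptions that the paper does not state: the carrier is handled as a plain string rather than a DOC file, one bit pair is attached after each carrier character, the carrier itself contains no U+200E or U+200F, and the PDF export step is omitted. The function names are hypothetical.

```python
LRM = "\u200e"  # U+200E
RLM = "\u200f"  # U+200F
BITS_TO_MARKS = {"00": "", "01": RLM, "10": LRM, "11": LRM + RLM}

def text_to_bits(message: str) -> str:
    """Concatenate the 8-bit ASCII code of every character."""
    return "".join(format(b, "08b") for b in message.encode("ascii"))

def embed(carrier: str, bits: str) -> str:
    """Attach the symbols for one bit pair after each carrier character."""
    if len(bits) % 2:
        raise ValueError("bit string must have even length")
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    if len(pairs) > len(carrier):
        raise ValueError("carrier is too short for the hidden bits")
    out = []
    for i, ch in enumerate(carrier):
        out.append(ch)
        if i < len(pairs):  # remaining carrier characters stay untouched
            out.append(BITS_TO_MARKS[pairs[i]])
    return "".join(out)

stego = embed("abcd", "00011011")
```

In this sketch the visible characters of `stego` are identical to the carrier; only zero-width symbols are added.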
B. Algorithm II: Data Extraction
Input: Stego file
Output: Secret data, original file
Step 1: Open the PDF message
Step 2: Repeat while !(EOF) // repeat until the end of the Stego file
Step 3: Extract hidden data from the Stego file
Step 3a. Separate each letter
Step 3b. Inspect the symbols that follow the letter
    If there is nothing then 00
    Else if only U+200F then 01
    Else if only U+200E then 10
    Else 11
Step 4: Go to Step 2
Step 5: Read the hidden data.
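The extraction steps can be sketched in Python as follows. The sketch assumes that the Stego text is available as a string that begins with a visible character, that one bit pair follows each visible character, and that the receiver knows the number of hidden bits `n_bits` (otherwise untouched carrier characters would decode as extra 00 pairs). None of these names or conventions come from the paper.

```python
LRM = "\u200e"  # U+200E
RLM = "\u200f"  # U+200F

def extract(stego: str, n_bits: int) -> str:
    """Read one bit pair from the symbols that follow each visible character."""
    pairs = []
    i = 0
    while i < len(stego) and 2 * len(pairs) < n_bits:
        i += 1  # step over the visible carrier character
        j = i
        while j < len(stego) and stego[j] in (LRM, RLM):
            j += 1
        marks = stego[i:j]
        # First bit: U+200E present; second bit: U+200F present.
        pairs.append(("1" if LRM in marks else "0") + ("1" if RLM in marks else "0"))
        i = j
    return "".join(pairs)[:n_bits]

def bits_to_text(bits: str) -> str:
    """Turn groups of eight bits back into ASCII characters."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("ascii")

recovered = extract("ab\u200fc\u200ed\u200e\u200f", 8)
```

Testing for the presence of each symbol, rather than their order, keeps the 11 case readable whichever of the two symbols was written first.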
C. Algorithm: Optimization
Our algorithm has some main advantages, which are listed below. Other advantages are provided in Section
IV. The main advantages are:
a) The file format is not affected by embedding the Stego data.
b) The algorithm can be applied to any language.
However, the file size depends on the hidden data and may increase dramatically. Therefore, we suggest the
following solution to the file size issue.
Before we embed data, we collect statistical information about the percentage of ones and zeros and apply the
following strategy:







(1)
The best case is when all hidden data are zeros or all are ones. In this case, the file size does not change
at all. The worst case is when half of the hidden data are zeros and the other half are ones. Therefore, the
best way to optimize our work is to find the longest run of zeros or ones in the bit string. The file size can
then be optimized by considering the relationship in Equation 1.
IV. SIMULATION RESULTS
Our simulation results are divided into two parts. The first concerns which optimization step is
employed, as shown in Table I. We created secret messages, converted them into ASCII, and computed the
number of ones and zeros in each message. Based on Equation 1, the table gives the decision as to which
step is the most optimized one to proceed with.
Table I. Optimization algorithm decision
Message          Number of Bits   Applied algorithm
Steganography    104
How are you      88
See You          56
At 10            40
See You At 10    104
From our simulation results we conclude that the best way to optimize the embedded message with respect to the
file size is to separate the message word by word (where the space binary code is 00100000) and apply Equation
(1) to each word. For example, if our secret message is “See You At 10” and we apply Scenario 1 to all of it, the file size
increases, which may violate one of the important Steganography concepts: transparency. In
contrast, we can split the message into parts and apply the best scenario to each part. For instance, the message
“See You At 10” could be divided into two parts, where “See You” uses Scenario 4 and “At 10” uses
Scenario 1. This scenario-switching strategy saves as much storage space as possible and thus
improves transparency.
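The statistics that feed this decision can be gathered with a few lines of Python. The sketch below only counts bits, ones, and zeros for a whole message and for each of its words; it does not implement the decision rule of Equation 1, and the function name `bit_stats` is hypothetical.

```python
def bit_stats(message: str):
    """Return (total bits, ones, zeros) of the 8-bit ASCII encoding."""
    bits = "".join(format(b, "08b") for b in message.encode("ascii"))
    ones = bits.count("1")
    return len(bits), ones, len(bits) - ones

message = "See You At 10"
print(message, bit_stats(message))

# Word-by-word statistics; each separating space is the code 00100000.
for word in message.split(" "):
    print(word, bit_stats(word))
```

The bit totals computed this way match the Number of Bits column of Table I (for example, 13 characters give 104 bits).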
In Table II, we analyze the capacity of a few websites to hide bits and compute the capacity ratio for each (see
Equation 2). In our experiments, we assume that hidden bits are inserted between any two words, which makes
extraction easier: the receiver finds each space in the file and then finds the Remarks.
Table II. The capacity of articles in web pages for hiding data
#   Website Article            Text Size (Kilo Byte)   Capacity Ratio
1   www.nydailynews.com        8.8                     674
2   www.aljazeera.com          18.7                    637
3   www.englisharticles.info   15.9                    610
4   www.latimes.com            14.8                    586
Capacity Ratio = (Number of hidden bits / carrier file size) × 100% (2)
It is interesting to note that the average word length in an English file is reported as 9.2 letters [13]. The capacity
ratio can be calculated from Equation 2. In addition, the proposed Remarks algorithm can be applied regardless of
the language being used.
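Equation 2 translates directly into a one-line function. The sketch below is only a worked instance of the formula; the function name is hypothetical, and because the paper does not state the units used for the entries of Table II, the example values are illustrative rather than taken from the table.

```python
def capacity_ratio(hidden_bits: float, carrier_size: float) -> float:
    """Equation 2: (number of hidden bits / carrier file size) * 100."""
    if carrier_size <= 0:
        raise ValueError("carrier size must be positive")
    return hidden_bits / carrier_size * 100

# Illustrative values only: 50 hidden bits in a carrier of size 200.
ratio = capacity_ratio(50, 200)
```

Both arguments must be expressed in units chosen by the user of the formula, since the result depends on that choice.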
In summary, the Remarks algorithm has the following advantages:
1. Language independent: The Remarks algorithm can be applied in any language. This feature enables users to
hide data in different file encodings (Unicode, ASCII). Other algorithms, in contrast, depend on language
characteristics, which limits their flexibility.
2. Improved transparency: The algorithm improves transparency because the Stego file looks the same
as the original file.
3. File format: Our method does not depend on any special format. This allows the use of carrier text in
different formats such as HTML pages, Microsoft Word documents, or even plain text.
4. Algorithm optimization: Our method suggests optimization steps to reduce the change in file size.
5. Hiding capacity: The Remarks algorithm enables users to hide a huge amount of data between two letters. Any two
users can agree on a suitable place to insert bits. In our simulations, we used the space
between two words to hide one word, although the whole message can also be hidden in one space.
V. CONCLUSION
Different algorithms have been presented to hide data inside text files. Some of these methods were designed for
specific languages [8][9], while others can be applied regardless of the language. In this paper, we
presented a promising algorithm that can be used to hide data inside text files of any language by using the
Right-to-Left and Left-to-Right Remarks. In our method, we pass two bits per embedding position. Moreover, we
suggest optimization techniques that can be used to minimize the file size and insert a huge amount of data.
REFERENCES
[1] V. Potdar, E. Chang, "Visibly Invisible: Ciphertext as a Steganographic Carrier," Proceedings of the 4th
International Network Conference (INC2004), pp. 385-391, Plymouth, U.K., July 6-9, 2004.
[2] T. Morkel, J.H.P. Eloff, M.S. Olivier, "An Overview of Image Steganography," in H.S. Venter, J.H.P.
Eloff, L. Labuschagne and M.M. Eloff (eds), Proceedings of the Fifth Annual Information Security South
Africa Conference (ISSA2005), Sandton, South Africa, June/July 2005 (published electronically).
[3] W. Bender, D. Gruhl, N. Morimoto, A. Lu, "Techniques for Data Hiding," IBM Systems Journal, Vol. 35, pp.
313-336, 1996.
[4] N. Johnson, S. Katzenbeisser, "A Survey of Steganographic Techniques," Chapter 3 in S. Katzenbeisser
(ed.), F. A. P. Petitcolas (ed.), Information Hiding Techniques for Steganography and Digital
Watermarking, Artech House Books, 2000.
[5] P. Jayaram, H. Ranganatha, H. Anupama, "Information Hiding Using Audio Steganography - A Survey,"
International Journal of Multimedia & Its Applications (IJMA), Vol. 3, pp. 86-96, Aug. 2011.
[6] A. Al-Othmani, A. Abdul, A. Zeki, "A Survey on Steganography Techniques in Real Time Audio Signals and
Evaluation," IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 1, pp. 30-37, Jan. 2012.
[7] H. Singh, P. Singh, K. Saroha, "A Survey on Text Based Steganography," Proceedings of the 3rd National
Conference, pp. 3-9, INDIACom, 2009.
[8] R. Prasad, K. Alla, "A New Approach to Telugu Text Steganography," Proceedings of the IEEE Wireless
Technology and Applications Conference (ISWTA), pp. 60-65, 2011.
[9] L. Yuling, S. Xingming, G. Can, W. Hong, "An Efficient Linguistic Steganography for Chinese Text,"
Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 2094-2097, 2007.
[10] S. Bhattacharyya, I. Banerjee, G. Sanyal, "A Novel Approach of Secure Text Based Steganography Model
using Word Mapping Method (WMM)," International Journal of Computer and Information Engineering,
Vol. 4, No. 2, pp. 96-103, 2010.
[11] S. Tech, S. Pothalaiah, K. Babu, "A New Approach to Telugu Text Steganography by Shifting Inherent
Vowel," International Journal of Engineering Science and Technology, Vol. 2, No. 12, pp. 7203-7214, 2010.
[12] A. Shakir, G. Xuemai, J. Min, "Chinese Language Steganography using the Arabic Diacritics as a Covered
Media," International Journal of Computer Applications, Vol. 11, No. 1, pp. 43-46, Dec. 2010.
[13] R.D. Smith, "Distinct Word Length Frequencies: Distributions and Symbol Entropies," Journal of
Glottometrics 23, pp. 7-22, 2012.
Ammar Odeh is a PhD student at the University of Bridgeport. He earned the M.S. degree in Computer Science
from the King Abdullah II School for Information Technology (KASIT) at the University of Jordan in Dec. 2005
and the B.Sc. in Computer Science from the Hashemite University. He has worked as a Lab Supervisor at
Philadelphia University (Jordan), as a Lecturer at Philadelphia University for the ICDL courses, and as technical
support for online examinations for two years. He served as a Lecturer in the IT (ACS, CIS, CS) Department of
Philadelphia University in Jordan, and also worked at the Ministry of Higher Education (Oman, Sur College of
Applied Science) for two years. Ammar joined the University of Bridgeport as a PhD student of Computer Science
and Engineering in August 2011. His areas of concentration are reverse software engineering, computer security, and
wireless networks. Specifically, he is working on the enhancement of computer security for data transmission over
wireless networks. He is also actively involved in the academic community, outreach activities, and student recruiting
and advising.
Dr. Khaled Elleithy is the Associate Dean for Graduate Studies in the School of Engineering at the University of
Bridgeport. His research interests are in the areas of network security, mobile communications, and formal
approaches for design and verification. He has published more than two hundred research papers in international
journals and conferences in his areas of expertise. Dr. Elleithy is the co-chair of the International Joint Conferences
on Computer, Information, and Systems Sciences, and Engineering (CISSE). CISSE is the first
Engineering/Computing and Systems Research E-Conference in the world to be completely conducted online in
real-time via the internet and has run successfully for six years. Dr. Elleithy is the editor or co-editor of 12
books published by Springer on advances in Innovations and Advanced Techniques in Systems, Computing
Sciences and Software.
Dr. Miad Faezipour is an Assistant Professor in the Computer Science and Engineering program at the University
of Bridgeport and has been the director of the D-BEST Lab since July 2011. Prior to joining UB, she was a
Post-Doctoral Research Associate at the University of Texas at Dallas, collaborating with the Center for Integrated
Circuits and Systems and the Quality of Life Technology laboratories. She received the B.Sc. in Electrical
Engineering from the University of Tehran, Tehran, Iran and the M.Sc. and Ph.D. in Electrical Engineering from the
University of Texas at Dallas. Her research interests lie in the broad area of biomedical signal processing and
behavior analysis techniques, high-speed packet processing architectures, and digital/embedded systems. Dr.
Faezipour is a member of IEEE and IEEE Women in Engineering.
... The authors of [54] used right-to-left and left-to-right remark to conceal information. This embedded method hides the secret data/message without changing the file's information. ...
Article
Full-text available
Protecting sensitive information transmitted via public channels is a significant issue faced by governments, militaries, organizations, and individuals. Steganography protects the secret information by concealing it in a transferred object such as video, audio, image, text, network, or DNA. As text uses low bandwidth, it is commonly used by Internet users in their daily activities, resulting a vast amount of text messages sent daily as social media posts and documents. Accordingly, text is the ideal object to be used in steganography, since hiding a secret message in a text makes it difficult for the attacker to detect the hidden message among the massive text content on the Internet. Language’s characteristics are utilized in text steganography. Despite the richness of the Arabic language in linguistic characteristics, only a few studies have been conducted in Arabic text steganography. To draw further attention to Arabic text steganography prospects, this paper reviews the classifications of these methods from its inception. For analysis, this paper presents a comprehensive study based on the key evaluation criteria (i.e., capacity, invisibility, robustness, and security). It opens new areas for further research based on the trends in this field.
... The steganography impelementationis could be applied in private communication, security system protection and any confidential data that is commonly used by government, military, industry and etc [5]. From that history, steganography is introduced as information hiding field that hides the confidential information to avoid the message from the third party [6]. Moreover, one characteristic of steganography is securing the important message in every medium to hide in the text [7]. ...
Article
The implementation of steganography in text domain is one the crutial issue that can hide an essential message to avoid the intruder. It is caused every personal information mostly in medium of text, and the steganography itself is expectedly as the solution to protect the information that is able to hide the hidden message that is unrecognized by human or machine vision. This paper concerns about one of the categories in steganography on medium of text called text steganography that specifically focus on feature-based method. This paper reviews some of previous research effort in last decade to discover the performance of technique in the development the feature-based on text steganography method. Then, ths paper also concern to discover some related performance that influences the technique and several issues in the development the feature-based on text steganography method.
... However, easy to attack that can make the removed hidden message also existed in several technique likes retyping in Arabic based using reversed fatah technique (Memon et al., 2008) and vertical displacement of the point (Odeh et al., 2012 ), specific matra technique (Changder et al., 2009). Secondly, problem in implementation the algorithm in English based using machine translation has often error encode algorithm (Stutsman et al., 2006), in FSM technique in Hindi based (Changder et al., 2010), Right-to-Left Remark and Left-to-Right remark (Odeh et al., 2013 ). Some performance of technique also dependable ASCII in English based such as technique in remark joiner (Odeh et al., 2014 ), dependable with vowel and consonant word in technique SSCE (Bhattacharyya et al., 2011) and in Arabic based technique using letter point is dependable with extension character (Gutub and Fattani, 2007 ). ...
Conference Paper
Full-text available
This papers presents several steganography method on text domain based on the perspective of researchers effort in last decade. It has been analyzed the categories of method steganography in medium of text; text steganography and linguistic steganography. The following aim on this paper is identifying the typical these two methods in order to recognize the comparison technique used in previous study. Especially, the explication techniques of text steganography which consist of word-rule based and feature-based technique is critical concern in this paper. Finally, the advantage characteristic and drawback on these techniques in generally also presented in this paper.
... Remarks Steganography Algorithm uses a text file as a carrier to hide the data inside it [12]. The main goal of the algorithm is to hide data inside a word file without any changes in the file format. ...
Conference Paper
Full-text available
This paper investigates eight novel Steganography algorithms employing text file as a carrier file. The proposed model hides secret data in the text file by manipulating the font format or inserting special symbols in the text file. Furthermore, the suggested algorithms can be applied to both Unicode and ASCII code languages, regardless of the text file format. In addition, a merging capability among the techniques is introduced, which allows alternatives for users based on the system requirements. The proposed algorithms achieve a high degree of optimized Steganography attributes such as hidden ratio, robustness, and transparency.
... Text Steganography is classified into different categories. One of the most popular text Steganography methods is semantic Steganography [7]. This technique makes use of synonyms in the same language or similar languages such as American English and British English. ...
Conference Paper
Full-text available
Different security strategies have been developed to protect the transfer of information between users. This has become especially important after the tremendous growth of internet use. Encryption techniques convert readable data into a ciphered form. Other techniques hide the message in another file, and some powerful techniques combine hiding and encryption concepts. In this paper, a new security algorithm is presented by using Steganography over HTML pages. Hiding the information inside Html page code comments and employing encryption, can enhance the possibility to discover the hidden data. The proposed algorithm applies some statistical concepts to create a frequency array to determine the occurrence frequency of each character. The encryption step depends on two simple logical operations to change the data form to increase the complexity of the hiding process. The last step is to embed the encrypted data as comments inside the HTML page. This new algorithm comes with many advantages, such as generality, applicability to different spoken languages, and can be extended to other Web programming pages such as XML, ASP.
... Text Steganography is classified into different categories. One of the most popular text Steganography methods is semantic Steganography [7]. This technique makes use of synonyms in the same language or similar languages such as American English and British English. ...
Chapter
Full-text available
Different security strategies have been developed to protect the transfer of information between users. This has become especially important after the tremendous growth of internet use. Encryption techniques convert readable data into a ciphered form. Other techniques hide the message in another file, and some powerful techniques combine hiding and encryption concepts. In this paper, a new security algorithm is presented by using Steganography over HTML pages. Hiding the information inside Html page code comments and employing encryption, can enhance the possibility to discover the hidden data. The proposed algorithm applies some statistical concepts to create a frequency array to determine the occurrence frequency of each character. The encryption step depends on two simple logical operations to change the data form to increase the complexity of the hiding process. The last step is to embed the encrypted data as comments inside the HTML page. This new algorithm comes with many advantages, such as generality, applicability to different spoken languages, and can be extended to other Web programming pages such as XML, ASP.
Article
Full-text available
Today’s large demand of internet applications requires data to be transmitted in a secure manner. Datatransmission in public communication system is not secure because of interception and impropermanipulation by eavesdropper. So the attractive solution for this problem is Steganography, which is the artand science of writing hidden messages in such a way that no one, apart from the sender and intendrecipient, suspects the existence of the message, a form of security through obscurity. Audio steganographyis the scheme of hiding the existence of secret information by concealing it into another medium such asaudio file. In this paper we mainly discuss different types of audio steganographic methods, advantages anddisadvantages.
Article
Full-text available
In a modern era of Information Technology, illicit copying and illegal distribution accompany the adoption of widespread electronic distribution of copyrighted material. This is the main reason why people think about how to protect their work and how to prevent such unlawful activities. For this purpose various methods including cryptography, steganography, coding and so on have been used. Steganography is the best-suited technique that allow user to hide a message in another message (cover media). Most of steganography research uses cover media as pictures, video clips and sounds. However, text steganography is not normally preferred due to the difficulty in finding redundant bits in text document. To embed information inside a document its characteristics should be altered. These characteristics can be either the text format or characteristics of the character. But the problem is that if slight change has been done to the document then it will become visible to the third party or attacker. The key to this problem is that to alter the document in such a way that it is simply not visible to the human eye yet it is possible to decode it with computer. For this purpose various methods of text-based steganography have been purposed like line shifting, word shifting, feature coding, white space manipulation etc. In this paper, we present an overview of the steganography, with a particular focus on text-based steganography in details.
Article
Steganography has proven to be one of the practical ways of securing data. It is a new kind of secret communication used mainly to hide secret data inside other innocent digital mediums. Most existing steganographic techniques use digital multimedia files as cover mediums to hide secret data. Audio files and signals make appropriate mediums for steganography due to their high data transmission rate and high level of redundancy. Hiding data in real-time communication audio signals is not a simple task. Steganography requirements as well as real-time communication requirements must be met in order to construct a useful data hiding application. In this paper we survey the general principles of hiding secret information using audio technology, and provide an overview of current functions and techniques. These techniques are evaluated against both steganography and real-time communication requirements.
Article
The goal of steganography is to avoid drawing suspicion to the transmission of a hidden message. If suspicion is raised then this goal is defeated. The success of steganography, to a certain extent, depends on the secrecy of the cover medium. Once the steganographic carrier is disclosed, the security depends on the robustness of the algorithm used. Hence, to maintain secrecy, we have to either make the cover medium more robust against steganalysis or discover new and better cover mediums. We consider the latter approach much more effective, since old techniques become prone to steganalysis. In this paper, we present one such cover medium. We propose to use ciphertext as a steganographic carrier.
Article
The distribution of frequency counts of distinct words by length in a language's vocabulary is analyzed using two methods. The first looks at the empirical distributions of several languages and derives a distribution that reasonably explains the number of distinct words as a function of length. From it we derive the frequency count, mean word length, and variance of word length based on the marginal probability of letters and spaces. The second, based on information theory, demonstrates that the conditional entropies can also be used to estimate the frequency of distinct words of a given length in a language. In addition, it is shown how these techniques can be applied to estimate higher-order entropies using vocabulary word length.
Conference Paper
Linguistic steganography, as a method of text steganography, is becoming an active research topic. To investigate linguistic steganography for Chinese text, a Chinese linguistic steganography algorithm is presented that utilizes existing Chinese information processing techniques. The algorithm is based on the substitution of synonyms and variant forms of the same word. Furthermore, in order to decrease the interaction between the surrounding words and the substituted word, the contextual window of the sentence is taken into account by using the disambiguation function of Chinese lexical analysis. Experimental results show that the algorithm achieves good imperceptibility, a degree of information-carrying capacity, and resistance to steganalysis.
Conference Paper
Steganography is the art of hiding the fact that communication is taking place, by hiding information in other information. Many different carrier file formats can be used, but digital images are the most popular because of their frequency on the Internet. For hiding secret information in images, there exists a large variety of steganographic techniques; some are more complex than others, and all of them have their respective strong and weak points. Different applications have different requirements of the steganography technique used. For example, some applications may require absolute invisibility of the secret information, while others require a larger secret message to be hidden. This paper intends to give an overview of image steganography, its uses and techniques. It also attempts to identify the requirements of a good steganographic algorithm and briefly reflects on which steganographic techniques are more suitable for which applications.
Article
Data hiding, a form of steganography, embeds data into digital media for the purpose of identification, annotation, and copyright. Several constraints affect this process: the quantity of data to be hidden, the need for invariance of these data under conditions where a "host" signal is subject to distortions, e.g., lossy compression, and the degree to which the data must be immune to interception, modification, or removal by a third party. We explore both traditional and novel techniques for addressing the data-hiding process and evaluate these techniques in light of three applications: copyright protection, tamper-proofing, and augmentation data embedding.
Conference Paper
The tremendous growth in the use of the Internet has led to continuous improvements in the area of security. The improved security techniques are used to preserve intellectual property. There are many kinds of security mechanisms. New steganographic methods are proposed to embed secret information into text cover media, exploring the use of languages other than English. In this paper the authors propose two new methods using linguistic steganography. The authors have developed security mechanisms based on Telugu consonant character modifiers and Telugu punctuation marks. These markers can be efficiently used as information carriers to hide information in a cover text. The proposed methods work by using the linguistic properties of the Telugu language. The first method selects the embedding position of the secret information in the cover text by using Telugu Ottulu. Based on a two-level classification of Ottulu, each is assigned a bit 0 or a bit 1. These symbols embed the secret information in the third character of the Telugu cover text data, mapping a single bit of the data to a Telugu character in the specified manner. The second method uses a classification of Telugu punctuation marks. These punctuation marks are classified into four levels, which are encoded as 00, 01, 10 and 11 respectively. Secret information is embedded in a Telugu text: whenever the application sees a certain punctuation mark, it codes this mark as secret information. At the receiving side, the corresponding inverse methods are applied to recover the original secret message. As a further improvement, the information can be encrypted using various cryptographic methods, and the resulting ciphertext can then be embedded in the cover text.
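The four-level punctuation encoding described above can be illustrated with a minimal sketch. The abstract does not list the Telugu punctuation classes, so the table below uses four ASCII marks purely as hypothetical stand-ins, and the function names `embed` and `extract` are assumptions made for illustration rather than the authors' implementation. The sketch also assumes the cover fragments contain none of the four marks themselves.

```python
# Hypothetical four-class table: the actual Telugu punctuation
# classification is not given in the abstract, so ASCII marks stand in.
MARK_TO_BITS = {".": "00", ",": "01", ";": "10", "!": "11"}
BITS_TO_MARK = {bits: mark for mark, bits in MARK_TO_BITS.items()}


def embed(fragments, bits):
    """Join cover fragments, ending each with the mark encoding the next two bits."""
    if len(bits) % 2:
        raise ValueError("bit string must have even length")
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    if len(pairs) > len(fragments):
        raise ValueError("cover text has too few fragments for the payload")
    out = []
    for i, fragment in enumerate(fragments):
        # Fragments beyond the payload get no mark, so they carry no bits.
        mark = BITS_TO_MARK[pairs[i]] if i < len(pairs) else ""
        out.append(fragment + mark)
    return " ".join(out)


def extract(stego_text):
    """Read the marks in order and map each one back to its two-bit code."""
    return "".join(MARK_TO_BITS[ch] for ch in stego_text if ch in MARK_TO_BITS)


stego = embed(["one", "two", "three"], "0111")
recovered = extract(stego)
```

Each mark carries two bits, so the capacity of this sketch is two bits per punctuated fragment. A real scheme would also need the chosen marks to read naturally in the cover text, which this sketch does not attempt.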
Article
Over the last two decades, due to the hostility of the internet environment, concerns about the confidentiality of information have increased at a phenomenal rate. Preventing unauthorized information access has therefore been a prime consideration behind the growing use of steganography techniques for applications like copyright protection, feature tagging and secret communication. As a result, steganography has become an interesting and challenging field of research striving to achieve greater immunity of hidden data against signal processing operations on host cover media such as image, audio, or text. A good steganography technique should offer immunity of hidden data against lossy compression, scaling, interception, modification, or removal, and ensure that embedded data remains inviolate and recoverable. In this work the authors propose a new text-based steganographic model along with a novel steganography method for hiding the information. The proposed approach works by selecting the embedding positions of the secret information in the cover text using a mathematical function and mapping each two bits of the secret information to those selected positions in a specified manner. As a further improvement of the security level, the information is first encrypted through the genetic crossover operator and then embedded into an innocuous cover text to form the stego text with minimum degradation. At the receiving end, the corresponding inverse processes are run to get back the original secret message. The proposed system is also capable of checking the authenticity of the secret message through the integer wavelet transform.