Text hiding is an intelligent programming technique that embeds a secret message (SM) or watermark (ω) into a cover text file or message (CM/CT) imperceptibly in order to protect confidential information. Recently, text hiding in the form of watermarking and steganography has found broad application in, for instance, covert communication, copyright protection, and content authentication. It is also widely regarded as an attractive complement to conventional cryptographic algorithms in multimedia security, because it conceals information inside the cover being protected. In general, information hiding (or data hiding) can be divided into two categories: watermarking and steganography. While watermarking prioritizes the robustness of the embedded watermark/signature at the expense of embedding capacity, steganography tries to embed as much secret information as feasible into a cover medium. In contrast to text hiding, text steganalysis is the process and science of identifying whether a given carrier text file/message contains a hidden message (HM) and, if possible, detecting or extracting the embedded information. In practice, steganalysis evaluates the efficiency of information hiding algorithms: a robust watermarking/steganography algorithm should be invisible (or irremovable) not only to the human visual system (HVS) but also to intelligent data processing attacks. Because digital text is one of the most widely used digital media on the Internet, a significant part of Web sites, social media, articles, eBooks, and so on consists only of plain text. Thus, copyright protection of plain text remains an open issue that must be addressed to provide proof of ownership and to determine the integrity rate.
During the last decade, digital watermarking and steganography techniques have been used as alternatives to prevent tampering, distortion, and media forgery attacks, and to support both copyright protection and authentication.
To date, text hiding and steganalysis have drawn less attention than data hiding in other media such as image, video, and audio. This dissertation focuses on this relatively neglected research area and has three main objectives.
1) We discuss the various types of text hiding algorithms and their limitations in digital text documents and messages, and we define the common evaluation criteria. We theoretically analyze the efficiency of existing text hiding methods with respect to these criteria. We then conduct a set of experiments on real examples to evaluate the efficiency and limitations of existing techniques and to investigate the performance of structural-based text hiding techniques. Our findings confirm that structural-based text hiding approaches are more efficient than other state-of-the-art methods. Accordingly, we outline guidelines and directions for improving the efficiency of structural-based techniques in digital texts in future work.
2) We propose a novel text steganography technique called AITSteg, which enables end-to-end secure conversation via SMS or social media between smartphone users. To meet this requirement, we investigate the trade-off between the invisibility, embedding capacity, and distortion robustness criteria by considering suitable embeddable locations for hiding the SM in the CM using Unicode Zero Width characters (ZWC). We then evaluate the proposed technique against the evaluation criteria by applying it to real CM examples. The experiments confirm that AITSteg can prevent different attacks, including man-in-the-middle attacks, message disclosure, and manipulation by readers. We also compare the experimental results with existing approaches to show the superiority of the proposed technique. To the best of our knowledge, this is the first technique that provides end-to-end hidden transmission of an SM under cover of a text message using symmetric keys via social media.
3) We present an intelligent watermarking technique called ANiTW, which uses an instance-based learning algorithm to hide an invisible watermark (ω) in Latin text-based cover information (CT) such that the ω can be extracted even if a malicious user manipulates a portion of the watermarked information. We evaluate ANiTW against the evaluation criteria by applying it to 16 social media applications (SMAs) and to real CT examples. The experiments demonstrate that ANiTW can identify the integrity rate and ownership of watermarked information on social media when its originality is in doubt. To the best of our knowledge, this is the first intelligent text watermarking technique that provides an invisible signature for forensic identification of spurious information on social media by evaluating the manipulation rate of watermarked information, whereas other existing approaches only consider robust or fragile marking of a signature in the CT.
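
As a minimal illustration of the zero-width-character embedding mentioned in objective 2, the following Python sketch hides a secret message behind a cover message. It is not the AITSteg algorithm: the bit-to-character mapping, the choice to append the hidden characters at the end of the cover, and the function names embed, extract, and strip_hidden are assumptions made here for illustration only, and no key or encryption step is included.

```python
# Illustrative sketch of generic zero-width-character (ZWC) text hiding.
# This is NOT the AITSteg algorithm; mapping and placement are assumptions.

ZW_ZERO = "\u200c"  # ZERO WIDTH NON-JOINER, assumed here to encode bit 0
ZW_ONE = "\u200d"   # ZERO WIDTH JOINER, assumed here to encode bit 1


def embed(cover: str, secret: str) -> str:
    """Append the secret's UTF-8 bits to the cover as invisible characters."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    hidden = "".join(ZW_ONE if bit == "1" else ZW_ZERO for bit in bits)
    return cover + hidden


def extract(stego: str) -> str:
    """Collect the ZWC bits found in the text and decode them as UTF-8."""
    bits = "".join(
        "1" if ch == ZW_ONE else "0"
        for ch in stego
        if ch in (ZW_ZERO, ZW_ONE)
    )
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8")


def strip_hidden(stego: str) -> str:
    """Return the visible text with the two ZWC code points removed."""
    return stego.replace(ZW_ZERO, "").replace(ZW_ONE, "")


if __name__ == "__main__":
    stego = embed("See you at noon.", "key42")
    print(strip_hidden(stego))   # visible cover text
    print(extract(stego))        # recovered secret
    print(len(stego))            # cover length plus 8 hidden chars per byte
```

The sketch assumes the cover itself contains neither of the two code points; covers that use them legitimately (for example, some scripts and emoji sequences) would need a different mapping. It also shows why the trade-off named above matters: each secret byte costs eight hidden characters, and any channel that strips zero-width characters destroys the message.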