Has F5 Really Been Broken?
Johann A. Briffa* and Hans Georg Schaathun* and Ainuddin Wahid Abdul Wahab*+
*Department of Computing, University of Surrey, Guildford GU2 7XH, England
+Faculty of Computer Science and Information Technology, University of Malaya, Malaysia
Keywords: Steganalysis, F5.
The publicly-available F5 software (F5Software) implementa-
tion takes a possibly compressed cover image, decompresses it
if necessary, and embeds the hidden message during a second
compression process. This procedure introduces a risk that the
stego image goes through ‘double compression’. While this is
not a problem from the embedding and extraction point of view,
any steganalysis process trained on such a scheme will poten-
tially detect artifacts caused either by the embedding process or
the second compression process. In this paper we review pub-
lished steganalysis techniques on F5. By re-implementing an
isolated F5 embedding algorithm excluding the decompression
and recompression process (F5Py), we show that published ste-
ganalysis techniques are unable to defeat F5 when its ideal op-
erational condition is not violated. In other words, published
techniques most likely detected the compression artifacts rather
than the embedding process when the message size does not exceed
the optimum F5 capacity. This is an important fact that has been
ignored before. Furthermore, we look for the optimum embedding
rate for F5, in order for it to take advantage of matrix encoding
for better embedding efficiency. We find that the low embedding
rates considered for F5 in previous works are actually relatively
high for it. This is also important, since a bigger message size
might degrade F5 to F4. In addition, we verify that, as expected,
steganalysis performance depends on the message size.
1 Introduction
There is an ongoing battle between steganographers and steganalysts.
Steganographers aim to develop ever better techniques to hide a
secret message in a media file, such as an image, such that the
enemy will not even suspect that there is a secret there. The
steganalyst is the enemy, who tries to devise techniques to
reliably detect the hidden secrets. There has been a lot of
activity in this field in the last 15 years.
Current research tends to be rather ad hoc. Evaluations of
new methods are generally based on simulations, and the conditions
and parameters of these simulations are not always clear. When
they are clear, they may be controversial.
This leads to several problems in the evaluations. For instance,
simulations must be based on specific implementations, which are
not always correct implementations of the algorithms specified in
the literature. Many authors assume that relatively long messages
would be embedded, and this gives the best detection results on
their part. But what message length was the stego-system designed
for?

Figure 1. JPEG steganography embedding algorithm between
two stages in JPEG encoding
Most papers in the literature have focused on defeating a
preceding paper. Often, too little time is spent analysing the
operating conditions under which a particular technique is useful.
This is our purpose in this paper. We discuss the celebrated
F5 algorithm by Andreas Westfeld [12], and analyse the op-
erating conditions for which it can be expected to work. We
demonstrate that some previous claims of breaking F5 are ex-
aggerated, and that the respective steganalysis works due to ar-
tifacts in the F5 software, not the embedding algorithm.
2 JPEG Steganography
One of the most popular media for steganography is JPEG images.
This makes sense because image coding and processing are
relatively uncomplicated and require much less specialist
knowledge than video or audio. JPEG is also one of the most
popular image formats, and is remarkable in that it has been very
well known for 15–20 years across multiple architectures.
A range of stego-algorithms modulate information by mod-
ifying the JPEG coefficients, which are obtained by a block-
wise DCT transform of an image and subsequent quantisation.
Figure 1 shows where embedding takes place in the process.
The subsequent steps of the JPEG compression are lossless,
so that the stego-decoder will see the exact same JPEG coeffi-
cients as the encoder produced.
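This property can be illustrated with a toy sketch (not the F5 algorithm itself, and with function names of our own invention): the modulation operates on the quantized coefficients, and because everything downstream is lossless, the extractor recovers exactly the bits the embedder placed. To sidestep the question of coefficients shrinking to zero, this toy version increments magnitudes on a mismatch rather than decrementing them as F4/F5 do.

```python
# Toy illustration of JPEG-domain embedding (NOT the F5 algorithm):
# modulation acts on quantized DCT coefficients; since entropy coding is
# lossless, a round trip recovers the message exactly.

def embed_lsb(coeffs, bits):
    """Force the LSBs of successive non-zero AC coefficients to the message
    bits. `coeffs` is a flat list of quantized coefficients (index 0 = DC)."""
    out = list(coeffs)
    it = iter(bits)
    for i in range(1, len(out)):            # skip the DC coefficient
        if out[i] == 0:
            continue                        # zeros conventionally carry nothing
        try:
            bit = next(it)
        except StopIteration:
            break
        if out[i] % 2 != bit:
            # Adjust parity by moving away from zero, so no coefficient is
            # ever zeroed (F4/F5 instead decrement and must handle shrinkage).
            out[i] += 1 if out[i] > 0 else -1
    return out

def extract_lsb(coeffs, nbits):
    bits = []
    for i in range(1, len(coeffs)):
        if coeffs[i] == 0:
            continue
        bits.append(coeffs[i] % 2)
        if len(bits) == nbits:
            break
    return bits

block = [12, -3, 5, 0, 1, -2, 0, 4]         # one toy "block" of coefficients
msg = [1, 0, 1, 1, 0]
stego = embed_lsb(block, msg)
assert extract_lsb(stego, len(msg)) == msg  # lossless round trip
```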
2.1 The F5 algorithm
F5 was introduced in [12], with an accompanying Java
implementation [1] (F5Software). It sports two key features, which
should be treated separately. We briefly explain the purpose
and result of these features. For further details we refer to the
original paper.
Firstly, it uses a modulation technique, named F4, which
maintains the symmetry as well as other statistical properties
of the histogram of the JPEG coefficients. It effectively pro-
duces a histogram which looks like that of a JPEG image at
lower quality. This modulation effectively prevents the suc-
cessful attacks on its predecessors Jsteg and Outguess.
Secondly, it uses matrix encoding to minimise the distor-
tion. This uses ideas from source coding and coding for con-
strained memories to represent the message as a codeword
which can be embedded with a minimum of distortion.
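A minimal sketch of the F4 modulation rule described in Westfeld's paper (function names are ours; the permutation, matrix encoding, DC handling and header are omitted): the steganographic value of a coefficient is its LSB for positive values and the inverted LSB for negative values; a mismatch is corrected by decrementing the magnitude; and a coefficient that shrinks to zero forces the bit to be re-embedded in the next usable coefficient.

```python
def f4_value(c):
    """Steganographic value of a non-zero coefficient under F4:
    LSB of |c| for positive coefficients, inverted LSB for negative ones."""
    lsb = abs(c) % 2
    return lsb if c > 0 else 1 - lsb

def f4_embed(coeffs, bits):
    out = list(coeffs)
    j = 0                                      # index into the message
    for i in range(len(out)):
        if j == len(bits):
            break
        if out[i] == 0:
            continue                           # zeros carry no information
        if f4_value(out[i]) == bits[j]:
            j += 1                             # coefficient already matches
        else:
            out[i] -= 1 if out[i] > 0 else -1  # decrement |c|, flipping the value
            if out[i] == 0:
                continue                       # shrinkage: re-embed this bit
            j += 1
    if j < len(bits):
        raise ValueError("message does not fit")
    return out

def f4_extract(coeffs, nbits):
    return [f4_value(c) for c in coeffs if c != 0][:nbits]
```

For example, embedding [0, 1, 1] into [3, -1, 2, 0, -4, 1] shrinks the -1 to zero, so its bit is re-embedded in the following coefficient.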
2.2 Attacks on F5
Soon after being introduced to the steganography community,
F5 became a popular target for steganalysis. This is due
to its capability to withstand visual and statistical attacks while
providing a large steganographic capacity. F5’s advantages are
obtained by the implementation of matrix encoding and per-
mutative straddling. Matrix encoding helps to improve the effi-
ciency of embedding while the permutative straddling helps to
uniformly spread the changes over the stego image.
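Permutative straddling can be sketched as deriving a key-dependent pseudo-random visiting order over the usable coefficients, so that both embedder and extractor agree on the order and the changes are spread uniformly over the image instead of clustering at the start. A stdlib-only illustration (the key string is an invented example):

```python
import random

def embedding_order(num_coeffs, key):
    """Permutative straddling, sketched: embedder and extractor derive the
    same pseudo-random visiting order from a shared key."""
    order = list(range(num_coeffs))
    random.Random(key).shuffle(order)   # deterministic for a given key
    return order

order = embedding_order(10, key="shared-secret")
assert sorted(order) == list(range(10))                    # a true permutation
assert order == embedding_order(10, key="shared-secret")   # reproducible
# A different key yields a different order with overwhelming probability.
```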
The most recent algorithm claimed to defeat F5 software
was our conditional probability approach [11]. With a total
of 54 features extracted for each image, this steganalysis tech-
nique can defeat F5 software, particularly with longer message
sizes (618 bytes, 1848 bytes and 4096 bytes).
The 324-feature Markov-process approach, which inspired
our conditional probability steganalysis technique, was also
shown to defeat F5 software in [11]. With message sizes measured
in bpc (bits per non-zero AC coefficient), its highest accuracy is
96.8% for a message size of 0.4 bpc.
Steganalysis methods based on statistical moments were also
claimed to defeat F5 in [3]. Generating 78 features for
classification, this technique achieved an accuracy of up to 91.1%
for a message size of 0.4 bpc.
In [9], Pevný and Fridrich have combined their approach [6] with
the Markov process approach. The resulting detection accuracy
(99.92% for a message size of 100% of the embedding capacity)
shows that the combination is better at attacking F5 than their
individual feature sets.
For our experiments we use the conditional probability and
Markov process approaches, as well as steganalysis based on
wavelet decomposition [7].
3 Double Compression Issue
Westfeld’s F5 implementation, as well as many other popular
stego-programs for JPEG, are not able to work directly on a
given JPEG input file. An input file in JPEG format would
be decompressed (using a standard API function to load the
image), and represented internally in the spatial domain. Then
the image is recompressed, and the F5 embedding performed
during compression.
The recompression will not necessarily use the same quantisation
matrices as the original compression. In fact, the program is
unable to capture the original matrices, so if the same
quantisation matrices are to be used, they have to be the standard
matrices and the user has to manually supply the correct quality
factor.
According to Pevný and Fridrich [5], double compression occurs
when a JPEG image, originally compressed with a primary
quantization matrix, is decompressed and compressed again with a
different secondary quantization matrix. In their paper, they used
a primary quantization matrix estimation technique [4] to resolve
the double compression issue; the estimated quantization matrix
was then used for the second compression process. A similar
approach, without the need for estimation, was used in [8].
Chen et al. [3] implemented another approach for this dou-
ble compression issue. They fix the same quality factor (80)
for image preparation (F5 without an embedded message) and
the F5 embedding process. This assumes that the differences
between stego image and cover image are only caused by data
embedding and not by the JPEG compression quality.
In both solutions, double compression still occurs, and the risk
of its effects remains. Indeed, F5 with no message actually embeds
32 bits, namely the status block indicating a message of length 0.
It seems both approaches assume that double compression is an
inherent part of F5. In the
following section we discuss how our implementation avoids
the double-compression altogether.
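The effect double compression has on coefficient statistics can be illustrated numerically. In this stdlib-only toy model of a single DCT mode, values quantized with step q1, dequantized, and requantized with a different step q2 produce a histogram with periodic empty and over-populated bins: precisely the kind of implementation artifact a classifier trained on F5Software output may latch onto instead of the embedding itself.

```python
from collections import Counter

def requantize(values, q1, q2):
    """Simulate double JPEG compression for one DCT mode: quantize with
    step q1, 'decompress' (multiply back), then quantize again with q2."""
    once = [round(v / q1) * q1 for v in values]   # first compression + decode
    return [round(v / q2) for v in once]          # second compression

vals = list(range(-100, 101))                     # a flat spread of raw values
single = Counter(round(v / 3) for v in vals)      # single compression, step 3
double = Counter(requantize(vals, q1=5, q2=3))    # double compression, 5 then 3

# After double quantization some bins are empty and others over-populated --
# a periodic artifact that a classifier can pick up.
assert 0 in {double[b] for b in range(-20, 21)}   # gaps appear
assert max(double.values()) > max(single.values())  # and peaks
```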
4 Optimal use of F5
The double-compression is an artifact of Westfeld’s implemen-
tation of F5 (F5Software), and not of the F5 algorithm as de-
scribed in Westfeld's paper [12]. In order to make a fair
evaluation of F5, we reimplemented it in Python¹. Our
implementation (F5Py) reads JPEG files and performs the Huffman
decoding to obtain the JPEG coefficient matrix, on which the F5
embedding operates, as well as the quantisation matrices. It never
reverses the lossy compression steps.
We have followed Westfeld's version 12 beta, which means that the
algorithm ignores, in addition to all zero coefficients, every
other coefficient of value ±1. This feature is not documented in
[12], but our simulations show that it gives a slight improvement
over using all ±1 coefficients.
Another issue which requires some consideration is the embedding
capacity of F5. It is not uncommon in the literature to estimate
the capacity of F5 by the number of non-zero AC coefficients, and
to work at embedding rates of 0.25–0.8 bpc (bits per non-zero
coefficient). Firstly, this over-estimates the total capacity,
both because F5 creates new zeros which are then ignored, and
because the software ignores half of all the ±1 coefficients.
¹We plan to make this code publicly available when the paper is published.
Table 1. F5 capacity in our test images.

            Range                 Mean      Variance
Capacity    (1579, 8719)          3710.77   1643133
Changes     (517, 2621)           1100.10   98747
BPC         (0.00335, 0.01849)    0.00816   7.74 × 10⁻⁶
Length      (430416, 471429)      —         —
More importantly, F5 is not designed to operate close to
the maximum capacity. The main idea in F5 is matrix cod-
ing, which means that we use more coefficients, but change
fewer, per embedded bit. By changing fewer coefficients, we
reduce the distortion and the steganographic modification be-
comes harder to detect. Operating close to capacity, matrix
coding is not possible and F5 degenerates to F4.
The codes used in F5 are Hamming codes of co-dimension
k = 1, 2, ..., 7, giving block lengths of n = 2^k − 1. The case
k = 1 is degenerate and gives no coding at all. At k = 1, we can
typically embed 1.4 bits per change. At k = 4 we can almost always
embed more than 2 bits per change (2.1–2.5 typically). At k = 7,
we can typically embed 3.5–4 bits per change.

In order to reduce the distortion by 50%, it is necessary to use a
matrix code with k ≥ 5, which gives a rate of less than 1/6. There
is no doubt that many applications may require embedding rates of
0.25 bpc and higher, but this is not the problem F5 set out to
solve.
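The matrix encoding step itself can be sketched with syndrome coding: to embed k message bits into n = 2^k − 1 carrier values, compute the Hamming syndrome of their steganographic bits, XOR it with the message, and flip at most the single position that result indexes. A minimal sketch on plain carrier bits (function names are ours; F5 additionally applies this to the shuffled non-zero coefficients and must handle shrinkage):

```python
def syndrome(bits_n, k):
    """Hamming syndrome: XOR of the (1-based) indices of the 1-positions."""
    s = 0
    for i, b in enumerate(bits_n, start=1):
        if b:
            s ^= i
    return s

def matrix_embed(carrier, message_k, k):
    """Embed k message bits into n = 2**k - 1 carrier bits by flipping
    at most one carrier bit (syndrome coding, as used in F5)."""
    n = 2 ** k - 1
    assert len(carrier) == n and len(message_k) == k
    m = int("".join(map(str, message_k)), 2)
    flip = syndrome(carrier, k) ^ m       # position to flip; 0 = no change
    out = list(carrier)
    if flip:
        out[flip - 1] ^= 1
    return out

def matrix_extract(carrier, k):
    return [int(b) for b in format(syndrome(carrier, k), f"0{k}b")]

cover = [1, 0, 1, 1, 0, 0, 1]             # n = 7 carrier bits (k = 3)
msg = [1, 0, 1]
stego = matrix_embed(cover, msg, k=3)
assert matrix_extract(stego, 3) == msg
assert sum(a != b for a, b in zip(cover, stego)) <= 1   # at most one change
```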
To give an impression of the capacity of F5, we have run
a simulation at k= 7 over all the 1745 images in our test
database. The result is shown in Table 1. We generated a long
random message which we attempted to embed in every image.
Then we recorded the number of bits successfully embedded,
including the 32-bit status block which is embedded without
matrix coding.
In our simulations, the rate which can be achieved with k = 7 is
around 0.01 bpc. We did not run any simulations with k = 5, but we
expect it would give a rate approximately three times higher,
corresponding to the ratio of the code rates. This means that
0.05 bpc, considered by many authors to be an extremely low rate,
should actually be considered a relatively high rate for F5.
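These figures follow from simple arithmetic on the (1, n, k) codes. Ignoring shrinkage, the code rate is k/(2^k − 1) message bits per coefficient used, and the theoretical embedding efficiency is k(n + 1)/n bits per change (a change is needed for n of the n + 1 equally likely syndromes). The efficiencies quoted above are roughly half the theoretical ones because shrinkage forces re-embedding; the sketch below computes only the theoretical values:

```python
def matrix_code_figures(k):
    """Block length, code rate, and theoretical embedding efficiency of the
    (1, n, k) matrix code, ignoring shrinkage."""
    n = 2 ** k - 1
    rate = k / n                      # message bits per coefficient used
    efficiency = k * (n + 1) / n      # bits embedded per coefficient changed
    return n, rate, efficiency

for k in range(1, 8):
    n, rate, eff = matrix_code_figures(k)
    print(f"k={k}  n={n:3d}  rate={rate:.3f}  efficiency={eff:.2f}")
```

For k = 5 the rate is 5/31 ≈ 0.16 (below the 1/6 quoted above) and for k = 7 it is 7/127 ≈ 0.055; the measured bpc figures in Table 1 are lower still, because the created zeros and skipped ±1 coefficients reduce the number of usable coefficients.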
5 Experimental Methodology
A significant difficulty in any attempt to compare steganalysis
techniques arises from the many variables involved in setting
up an experiment to determine the classification accuracy of
such a scheme. A non-exhaustive list includes:

- Image source: sensor type; sensitivity setting; lens used;
  aperture; shutter speed; mounted/handheld; temperature and other
  environmental conditions
- Image subject: illumination; detail; etc.
- Image processing: conversion from raw sensor data; compression
  for storage; conversion to grayscale; cropping; possible
  re-compression
- Steganography: embedding rate/message size; strength or other
  parameters of the embedding scheme; randomization of message;
  randomization of embedding location
- Classifier used (this is not necessarily determined by the
  algorithm)
- Classifier training: selection of cover images; selection of
  stego images (should their covers be part of the training set?);
  proportion of cover to stego images; proportion of stego
  algorithms/settings
- Classifier tuning: parameter tuning for balancing false
  positive/negative errors
- Classifier testing: selection of images; statistical accuracy
  of results
This is further complicated by the lack of a comprehen-
sive guide to the approach that should be used. While the gen-
eral approach is accepted, the many details of a best-practice
methodology are scattered in the literature. This makes it very
difficult for new researchers, resulting in even more publi-
cations using divergent approaches. The lack of consistency
makes it practically impossible to make a meaningful compar-
ison between techniques. Sometimes, missing details in published
work compromise the reproduction of experimental results.
In our experiments we use a combination of public im-
age databases and our own captured images, ensuring a suf-
ficient number of training and testing images. Source images
are greyscale, mostly stored in JPEG format, and have not
been processed other than to remove color information and
store in compressed format. For each source image, we decompress,
crop to the central 640 × 480 region, and store in JPEG-compressed
format. The cropping technique ensures that our source images have
the same size, so we can compare the embedded message size in a
meaningful way.
After image preparation, we create stego-images corre-
sponding to each cover image, using both Westfeld’s F5 soft-
ware implementation (F5Software) and our implementation
that avoids double-compression (F5Py). In each case, we embed
messages of sizes ranging from 1 byte to 500 bytes, creating five
different sets of stego images for each implementation.
For each of the cover and stego images, we extract the
features for every steganalysis technique we are analysing,
in preparation for the subsequent classification process. The
freely available LibSVM [2] is then used as the classifier. The
soft margin and γ parameters are determined using the parameter
selection tool distributed with the LibSVM package. Classifier
training is based on a random selection of 2/3 of the cover and
stego image pairs. The remaining 1/3 of the image pairs are used
for classifier testing. The complete process is shown
schematically in Figure 2.
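The pair-preserving split matters: selecting 2/3 of the cover/stego pairs, rather than 2/3 of individual images, guarantees that no stego image is tested while its own cover was trained on, one of the methodological pitfalls listed above. A stdlib-only sketch of that selection (names are our own):

```python
import random

def split_pairs(pair_ids, train_frac=2 / 3, seed=0):
    """Split cover/stego *pairs* into training and testing sets, keeping
    each cover together with its stego counterpart."""
    ids = list(pair_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

pairs = [f"img{i:04d}" for i in range(1745)]   # one id per cover/stego pair
train_ids, test_ids = split_pairs(pairs)
assert len(train_ids) + len(test_ids) == 1745
assert not set(train_ids) & set(test_ids)      # disjoint by construction
assert abs(len(train_ids) - 2 * 1745 / 3) < 1  # roughly two thirds
```

Feature extraction and the SVM training itself then operate on the two id lists, so the classifier never sees a test pair in any form during training.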
6 Results
Classification results for our conditional probability
steganalysis and for the Markov process based technique are shown
in Tables 2 and 3 respectively. Additionally, results for the
wavelet decomposition technique are shown in Table 4. It is clear
from the presented results that classification accuracy is
significantly reduced when double compression is excluded (F5Py).
In fact, none of the steganalysis techniques tested have any
detection ability at the small message sizes that allow F5 to
operate at its optimal performance. The Markov technique is only
able to detect F5 at a message length of 500 bytes, and even then
the accuracy is rather low. Conditional probability steganalysis
seems to perform only slightly better. Finally, as already
observed in [11] and [10], wavelet decomposition steganalysis is
unable to defeat F5 at any message size.

Figure 2. Experiment tasks

Table 2. Classification accuracy for conditional probability
technique (optimal soft margin and γ parameters)

              Message Size (bytes)
              1       10      50      250     500
F5Software    97.0%   97.6%   97.4%   98.2%   99.2%
F5Py          50.0%   50.0%   50.0%   54.6%   66.8%
F5Py also provides a new approach to dealing with the double
compression issue. Comparing our new implementation with previous
approaches, we can see that:

- Using stego images with a zero-length message [3] resolves the
  problem of different quality factors in the preparation and
  embedding processes. One disadvantage is the extra time needed
  to prepare the F5-with-no-message set.
- Using F5Py also resolves the above problem, with the advantage
  of saving time by not involving another image preparation
  process, such as the process needed to estimate the primary
  quantization matrix.
- Using F5Py resolves the issue of unknown additional JPEG
  compression of the cover image. This is a very important factor
  when we consider the real-world deployment of steganalysis.
7 Conclusion
Our discussion of reasonable operating conditions for F5 has shown
two things. Firstly, we have shown that F5 is not designed to
operate at high embedding rates, and we recommend restricting the
use of F5 to less than 0.05 bpc. This ensures that F5 can take
full advantage of matrix encoding and that its optimal embedding
efficiency is not compromised by an excessive message size.

Table 3. Classification accuracy for Markov process technique
(optimal soft margin and γ parameters)

              Message Size (bytes)
              1       10      50      250     500
F5Software    99.8%   99.6%   99.6%   99.6%   99.9%
F5Py          50.0%   50.0%   50.0%   50.0%   62.4%

Table 4. Classification accuracy for wavelet decomposition
technique (optimal soft margin and γ parameters)

              Message Size (bytes)
              1       10      50      250     500
F5Software    50.2%   50.0%   50.6%   50.4%   50.2%
F5Py          50.0%   50.4%   50.6%   50.4%   50.2%
Secondly, we have demonstrated the possibility that the Markov
model based attack by Shi et al. [10] detects double compression
artifacts, which are an artifact of the F5 software implementation
and not of the F5 algorithm itself. It is not nearly as good a
classifier for stegograms from an improved implementation of F5
(F5Py) which does not introduce double compression.
We stress the importance of clarifying the test conditions
when stego-systems and attacks are specified and evaluated. It
is not very useful to attack a system under conditions where it
never was supposed to be used.
It remains to be seen whether double-compressed images
actually exist in the wild, and if so, which image processing
operations tend to produce them. If such images do exist, the
problem of developing and tuning a steganalysis algorithm may
be even more complex than is generally believed; allowing
double-compressed images as cover and stego-images would
mean that the artifacts produced by double compression would
have to be ignored by the classifier.
It is an interesting open problem to see how effective other
attacks are on our implementation of F5 with reasonable em-
bedding rates.
Another open problem, and this is of course well known, is how we
can design stego-systems with higher capacity without becoming
more detectable.
References

[1] A. Westfeld. F5 version 12beta. Software available at
[2] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for
support vector machines, 2001. Software available at
[3] Chunhua Chen, Yun Q. Shi, Wen Chen, and Guorong
Xuan. Statistical moments based universal steganaly-
sis using JPEG 2-d array and 2-d characteristic function.
In Proceedings of the International Conference on Im-
age Processing, ICIP 2006, Atlanta, Georgia, USA, pages
105–108, October 8-11, 2006.
[4] J. Fridrich and J. Lukas. Estimation of primary quantization
matrix in double compressed JPEG images. In International
Conference on Image Processing, September
[5] Jessica Fridrich, Tomáš Pevný, and Jan Kodovský. Statistically
undetectable JPEG steganography: dead ends, challenges, and
opportunities. In MM&Sec '07: Proceedings of the 9th Workshop on
Multimedia & Security, pages 3–14, New York, NY, USA, 2007. ACM.
[6] Jessica J. Fridrich. Feature-based steganalysis for JPEG
images and its implications for future design of stegano-
graphic schemes. In Information Hiding, 2004.
[7] Siwei Lyu and Hany Farid. Detecting hidden messages
using higher-order statistics and support vector machines.
In 5th International Workshop on Information Hiding,
Noordwijkerhout, The Netherlands, 2002.
[8] Siwei Lyu and Hany Farid. Steganalysis using color
wavelet statistics and one-class support vector machines.
Proc. SPIE, 2003.
[9] Tomáš Pevný and Jessica Fridrich. Merging Markov and DCT
features for multi-class JPEG steganalysis. In Proceedings SPIE,
Electronic Imaging, Security, Steganography, and Watermarking of
Multimedia Contents IX, January 2007.
[10] Yun Q. Shi, Chunhua Chen, and Wen Chen. A Markov
process based approach to effective attacking JPEG
steganography. In Information Hiding, 2006.
[11] Ainuddin Wahid Abdul Wahab, Johann A. Briffa, Hans Georg
Schaathun, and Anthony T. S. Ho. Conditional probability based
steganalysis for JPEG steganography. In 2009 International
Conference on Signal Processing Systems (ICSPS 2009), May 15–17,
2009.
[12] Andreas Westfeld. High capacity despite better steganalysis:
F5 – a steganographic algorithm. In Fourth Information Hiding
Workshop, pages 301–315, 2001.
... The stream cipher encrypts k value with the help of PRNG, which is embedded with the message length at the start of stream of messages. The message body is implanted by means of matrix embedding, injecting k message bits to one clutch that contains (2k-1) constants by decreasing the total value by "one" in every group [46] [47] [48]. ...
Full-text available
Blind steganalysis or the universal steganalysis helps to identify hidden information without previous knowledge of the content or the embedding technique. The Support Vector Machine (SVM) and SVM-Particle Swarm Optimization (SVM-PSO) classifiers are adopted for the proposed blind steganalysis. The important features of the JPEG images are extracted using Discrete Cosine Transform (DCT). The kernel functions used for the classifiers in the proposed work are the linear, epanechnikov, multi-quadratic, radial, ANOVA and polynomial. The proposed work uses linear, shuffle, stratified and automatic sampling techniques. The proposed work employs four techniques for image embedding namely, Least Significant Bit (LSB) Matching, LSB replacement, Pixel Value Differencing (PVD) and F5 and applies 25% embedding. The data to the classifier is split as 80:20 for training and testing and 10-fold cross validation is carried out.
Full-text available
The fast evolution of Information and Digital technology had given way for internet to be an effective medium for communication. This has also paved way for data exploitation. Therefore, users must protect their data from misuse. This led to the emergence of security framework like Information Hiding. Steganography and Steganalysis are of the two primary techniques in the field of Information Hiding. Steganography is the science of concealing confidential information, while steganalysis is the art of detecting the existence of that information. The primary goal of this research is to address the general concept of steganalysis, and various breaches associated with it. It involves a blind statistical steganalysis technique that is led in Joint Photographic Experts Group (JPEG) text embedded images by extracting features that illustrate an alteration during an embedding. The images used as embedding medium are uncalibrated and the percentage of the embedding used in this study is 50%. The text embedding is done using various steganographic schemes in the spatial and transform domain. The steganographic schemes considered are Least Significant Bit (LSB) Matching, Least Significant Bit (LSB) Replacement, Pixel Value Differencing and F5. After steganographic embedding of the data, the first order, second order, extended Discrete Cosine Transform (DCT) and Markov features are extracted. Then, Principal Component Analysis (PCA) is used as a system for feature dimensionality reduction. Furthermore, the technique of machine learning is incorporated by means of a classifier to identify the stego image and cover image. Support Vector Machine (SVM) and Support Vector Machine with Particle Swarm Optimization (SVM-PSO) are the classifiers examined in this paper for a comparative study. Moreover, the concept of cross-validation is also incorporated in this work. 
Six dissimilar kernel functions and four diverse samplings are used during classification to check on the effectiveness of the kernels and sampling in classification.
Full-text available
The spectacular progress of technology related to the information and communication arena throughout the past epoch made the internet a powerful media for faster communication of data. Though this technology is being admired at one side, there equally exists a challenge for safeguarding the data and privacy of information of a personal without any leak in the data and corresponding mistreatment. Hence, the proposed work primarily aims to investigate the internet communication as well as deter any unwanted happenings, which could occur because of the covert communication. The probable presence of hidden messages is inspected in the digital mass media using the technique of steganalysis. The distinctive features are to be identified, chosen and extracted for universal (blind) steganalysis and are decided by the format of image and its transformation. In this paper, the analysis is carried out in JPEG format images and 10% embedding with 10 fold cross validation. The technique of calibration is used to obtain an estimate of the cover image. Four embedded techniques that have been applied for stegananlysis are Least Significant Bit Matching, LSB Replacement, Pixel Value Differencing (PVD) and F5 respectively. Four different sampling like linear, shuffle, stratified and automatic are considered in this paper. The classifiers used for a comparative study are Support Vector Machine (SVM) and SVM- Particle Swarm Optimization (SVM-PSO). Several kernels namely linear, epanechnikov, multi-quadratic, radial, ANOVA and polynomial are used in classification. The classifier is trained to examine every single coefficient as a separate unit for analysis and the outcome of this analysis helps in finding the decision of steganalysis.
Full-text available
Inspired by works on the Markov process based steganalysis, we propose a new steganalysis technique based on the conditional probability statistics. Specifically we focus on its performance against the F5 software. In our experiment, we prove that the proposed technique works as well or better than the Markov process based technique in terms of classification accuracy on F5. Our main advantage is a much better computational efficiency. With different number of messages embedded, it can also be seen that the performance of steganalyis depends on the message size embedded. This paper includes the introduction to conditional probability features, how the experiment works, and the discussion of the results.
Conference Paper
Full-text available
Owing to the popular usage of JPEG images, the steganographic tools for JPEG images emerge increasingly nowadays, among which OutGuess, F5, and the model based steganography are the most advanced. Advancing the previous work, we present a new universal steganalysis method based on statistical moments derived from both image 2-D array and JPEG 2-D array in this paper. In addition to the first order histogram, the second order histogram is considered. Consequently, the moments of 2-D characteristic functions are also used for steganalysis. Extensive experimental works have shown that the proposed method outperforms in general the prior-arts of steganalysis methods in attacking the three aforesaid steganographic schemes.
Conference Paper
Full-text available
Steganographic messages can be embedded into digital images in ways that are imperceptible to the human eye. These messages, however, alter the underlying statistics of an image. We previously built statistical models using rst-and higher-order wavelet statistics, and employed a non-linear support vector machines (SVM) to detect steganographic messages. In this paper we extend these results to exploit color statistics, and show how a one-class SVM greatly simplies the training stage of the classier. In previous work,8, 11, 12 we showed that a statistical model based on rst-and higher-order wavelet statistics could discriminate between images with and without hidden messages. In this earlier work, we only considered grayscale images in order to simplify the computations. There is no doubt that strong statistical regularities exist between the color channels, and in this paper we extend our earlier statistical model to capture some of these regularities. In our previous work we used a (linear and non-linear) two-class support vector machine (SVM) to discriminate between the statistical features extracted from images with and without hidden messages. This classier required training from both cover and stego images. From the point of view of universal steganalysis, this training had the drawback of requiring exposure to images from broad range of stego tools. In this paper we employ a one-class SVM that obviates the need for training from stego images, thus making the training easier, and making it more likely that our classier will be able to contend with novel and yet to be developed stego programs. We will rst present the basic statistical model, and then describe the construction of a one-class support vector machine. We then show the eectiv eness of these tools in detecting hidden messages in tens of thousands of images, and from v e dieren t stego programs. We believe that this work brings us closer to realizing a robust tool for universal steganalysis.
The goal of this paper is to determine the steganographic capacity of JPEG images (the largest payload that can be undetectably embedded) with respect to current best steganalytic methods. Additionally, by testing selected steganographic algorithms we evaluate the influence of specific design elements and principles, such as the choice of the JPEG compressor, matrix embedding, adaptive content-dependent selection channels, and minimal distortion steganography using side information at the sender. From our experiments, we conclude that the average steganographic capacity of grayscale JPEG images with quality factor 70 is approximately 0.05 bits per non-zero AC DCT coefficient.
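The trade-off that matrix embedding buys, which is central to the capacity question above and to the optimal F5 embedding rate discussed in this paper, can be made concrete. F5 uses a (1, n, k) Hamming-code scheme with n = 2^k - 1, embedding k message bits into n usable coefficients with at most one change. The sketch below (our own illustration, not code from the cited work) computes the resulting rate and average embedding efficiency:

```python
def matrix_encoding_params(k):
    """Parameters of the (1, n, k) matrix encoding used by F5, n = 2^k - 1.

    Embeds k message bits into n usable coefficients with at most one change.
    Returns (n, embedding rate, average embedding efficiency).
    """
    n = 2 ** k - 1
    rate = k / n                      # message bits per usable coefficient
    change_density = n / (n + 1)      # a change is needed for n of the n + 1 syndromes
    efficiency = k / change_density   # average bits embedded per changed coefficient
    return n, rate, efficiency

for k in range(1, 5):
    n, rate, eff = matrix_encoding_params(k)
    print(f"k={k}: n={n}, rate={rate:.3f}, efficiency={eff:.3f}")
```

The numbers show why a small message is advantageous: as the embedding rate k/n falls, the efficiency k(n+1)/n rises, so fewer coefficients are changed per embedded bit; conversely, a message that is too large forces k = 1 and degrades F5 toward F4.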
In this paper, a novel steganalysis scheme is presented to effectively detect advanced JPEG steganography. For this purpose, we first choose to work on JPEG 2-D arrays formed from the magnitudes of quantized block DCT coefficients. Difference JPEG 2-D arrays along horizontal, vertical, and diagonal directions are then used to enhance changes caused by JPEG steganography. A Markov process is applied to model these difference JPEG 2-D arrays so as to utilize second-order statistics for steganalysis. In addition to the utilization of difference JPEG 2-D arrays, a thresholding technique is developed to greatly reduce the dimensionality of the transition probability matrices, i.e., the dimensionality of the feature vectors, thus making the computational complexity of the proposed scheme manageable. Experimental results are presented to demonstrate that the proposed scheme outperforms existing steganalyzers in attacking OutGuess, F5, and MB1.
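The difference-array and thresholded transition-probability construction described above can be illustrated with a minimal pure-Python sketch (horizontal direction only; the function names are our own, not from the cited work):

```python
def clip(v, T):
    """Threshold a difference value into the range [-T, T]."""
    return max(-T, min(T, v))

def horizontal_difference(arr):
    """Horizontal difference array: d[i][j] = a[i][j] - a[i][j+1]."""
    return [[row[j] - row[j + 1] for j in range(len(row) - 1)] for row in arr]

def transition_matrix(diff, T=4):
    """Empirical Markov transition matrix over clipped difference values.

    Entry [u+T][v+T] estimates P(next = v | current = u) for u, v in [-T, T];
    thresholding at T keeps the matrix (2T+1) x (2T+1) regardless of the
    coefficient range.
    """
    size = 2 * T + 1
    counts = [[0] * size for _ in range(size)]
    for row in diff:
        for j in range(len(row) - 1):
            u = clip(row[j], T) + T
            v = clip(row[j + 1], T) + T
            counts[u][v] += 1
    # Normalize each row into a conditional distribution; rows with no
    # observations stay all-zero.
    probs = []
    for r in counts:
        total = sum(r)
        probs.append([c / total for c in r] if total else r[:])
    return probs
```

In the full scheme, such matrices are computed for the horizontal, vertical, and both diagonal directions, and their flattened entries form the feature vector fed to a classifier.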
Techniques for information hiding have become increasingly more sophisticated and widespread. With high-resolution digital images as carriers, detecting hidden messages has become considerably more difficult. This paper describes an approach to detecting hidden messages in images that uses a wavelet-like decomposition to build higher-order statistical models of natural images. Support vector machines are then used to discriminate between untouched and adulterated images.
Blind steganalysis based on classifying feature vectors derived from images is becoming increasingly more powerful. For steganalysis of JPEG images, features derived directly in the embedding domain from DCT coefficients appear to achieve the best performance (e.g., the DCT features [10] and Markov features [21]). The goal of this paper is to construct a new multi-class JPEG steganalyzer with markedly improved performance. We do so first by extending the 23 DCT feature set [10], then applying calibration to the Markov features described in [21] and reducing their dimension. The resulting feature sets are merged, producing a 274-dimensional feature vector. The new feature set is then used to construct a Support Vector Machine multi-classifier capable of assigning stego images to six popular steganographic algorithms: F5 [22], OutGuess [18], Model-Based Steganography without [19] and with [20] deblocking, JP Hide&Seek [1], and Steghide [14]. Compared to our previous work on multi-classification [11, 12], the new feature set provides significantly more reliable results.
In this report, we present a method for estimating the primary quantization matrix from a double-compressed JPEG image. We first identify characteristic features that occur in the DCT histograms of individual coefficients due to double compression. We then present three different approaches that estimate the original quantization matrix from double-compressed images. Finally, the most successful of them, a neural network classifier, is discussed, and its performance and reliability are evaluated in a series of experiments on various databases of double-compressed images. We also explain how double compression detection techniques and primary quantization matrix estimators can be used in the steganalysis of JPEG files and in digital forensic analysis for the detection of digital forgeries.
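The characteristic DCT-histogram features of double compression can be seen in a toy numeric sketch: quantizing with a coarse step and then re-quantizing with a finer, incompatible step leaves periodic gaps in the value histogram, whereas single compression does not. Plain integers stand in for DCT coefficients here, and `double_quantize` is our own illustrative helper, not code from the cited work:

```python
def double_quantize(values, q1, q2):
    """Quantize with step q1, dequantize, then re-quantize with step q2."""
    return [round(round(v / q1) * q1 / q2) for v in values]

coeffs = list(range(0, 21))
once = sorted(set(round(v / 2) for v in coeffs))       # single compression, step 2
twice = sorted(set(double_quantize(coeffs, q1=5, q2=2)))  # double: step 5, then 2

print("single:", once)   # every bin 0..10 is populated
print("double:", twice)  # bins 1, 3, 4, 6, 7, 9 are empty
```

It is exactly these missing and over-populated histogram bins that betray double compression and allow the primary quantization step (here q1 = 5) to be estimated from the observed gap pattern.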
LIBSVM is a library for support vector machines (SVM). Its goal is to help users easily use SVM as a tool. In this document, we present all its implementation details. For the use of LIBSVM, the README file included in the package and the LIBSVM FAQ provide the information.
In this paper, we introduce a new feature-based steganalytic method for JPEG images and use it as a benchmark for comparing JPEG steganographic algorithms and evaluating their embedding mechanisms. The detection method is a linear classifier trained on feature vectors corresponding to cover and stego images. In contrast to previous blind approaches, the features are calculated as an L1 norm of the difference between a specific macroscopic functional calculated from the stego image and the same functional obtained from a decompressed, cropped, and recompressed stego image. The functionals are built from marginal and joint statistics of DCT coefficients. Because the features are calculated directly from DCT coefficients, conclusions can be drawn about the impact of embedding modifications on detectability. Three different steganographic paradigms are tested and compared. Experimental results reveal new facts about current steganographic methods for JPEGs and new design principles for more secure JPEG steganography.