
Blind Key Based Attack Resistant Audio Steganography Using Cocktail Party Effect

Authors:
  • St. Thomas’ College of Engineering and Technology, Kolkata, India

Abstract

Steganography is a popular technique of digital data security. Among all digital steganography methods, audio steganography is very delicate, as the human auditory system is highly sensitive to noise; hence a small modification in audio can have a significant audible impact. In this paper, a key based blind audio steganography method is proposed which is built on the discrete wavelet transform (DWT) as well as the discrete cosine transform (DCT) and adheres to Kerckhoffs’s principle. Here an image is used as the secret message, which is preprocessed using Arnold’s Transform. To make the system more robust and undetectable, a well-known problem of audio analysis, known as the Cocktail Party Problem, has been explored for wrapping the stego audio. The robustness of the proposed method has been tested against steganalysis attacks like noise addition, random cropping, resampling, requantization, pitch shifting, and mp3 compression. The quality of the resultant stego audio and the retrieved secret image has been measured by various metrics, namely, “peak signal-to-noise ratio”; “correlation coefficient”; “perceptual evaluation of audio quality”; “bit error rate”; and “structural similarity index.” The embedding capacity has also been evaluated and, as seen from the comparison results, the proposed method has outperformed other existing DCT-DWT based techniques.
Research Article
Barnali Gupta Banik 1 and Samir Kumar Bandyopadhyay 2
1 St. Thomas’ College of Engineering & Technology, Kolkata 700023, India
2 University of Calcutta, Kolkata 700098, India
Correspondence should be addressed to Barnali Gupta Banik; gupta.barnali@gmail.com
Received 19 October 2017; Revised 13 January 2018; Accepted 5 February 2018; Published 16 April 2018
Academic Editor: Emanuele Maiorana
Copyright © Barnali Gupta Banik and Samir Kumar Bandyopadhyay. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
In the present era, communicating through the Internet has become vulnerable, as intruders may eavesdrop to capture secret messages and disclose them for unlawful purposes. Hence, nowadays it is necessary to camouflage a secret message in such a way that the stego object cannot be identified as a carrier of a secret message. Camouflaging a secret message inside carrier objects is the age-old technique of steganography. However, with the current enormous use of the Internet and the rise of various steganalysis attacks, an extra shield is required to protect steganography techniques. This is the reason the cocktail party effect has been explored in audio steganography, to ensure enhanced security during data transmission.
2. Related Work
2.1. Audio Steganography Techniques. In audio steganography, audio is used as the cover medium. In [], the authors have described different spatial and frequency domain techniques of audio steganography. The popular spatial domain techniques are as follows.
Least Signicant Bit (LSB) Encoding. is is the simplest
method of audio steganography where Least Signicant Bit
of each audio sample is modied with bits of secret message
vector. With the extensive use of this method it becomes more
prone to attack and its embedding capacity is poor compared
to others. To cope up with the necessity of increasing capacity,
authors of [] have proposed an enhanced method of LSB
technique where it has been proved that nd and rd LSB
modication does not make audible dierence in audio
sample. In [], authors have suggested another enhancement
over LSB technique by shiing LSB modication from rd bit
to th bit which incur more embedding capacity compared to
previous methods of LSB encoding.
Parity Encoding. In this approach, the audio signal is broken into a number of samples []. Depending on each sample’s parity bit, the secret message is embedded in the LSB of the sample byte stream.
Hindawi
Security and Communication Networks
Volume 2018, Article ID 1781384, 21 pages
https://doi.org/10.1155/2018/1781384
[Figure 1: Block diagram of 2D DWT. The input I passes through high-pass and low-pass filters, each followed by downsampling by 2; each output is filtered and downsampled again to give the LL, LH, HL, and HH subbands.]
Echo Hiding. In this method, a short echo signal, in which the secret message is hidden, is introduced into the cover audio []. It has been shown that the echo signal is inaudible provided the delay between the cover audio and the echo signal is kept within a small limit on the order of milliseconds.
The widespread frequency domain techniques are as follows.
Phase Coding. As the human auditory system cannot perceive phase component modulation, in this technique the secret data is embedded by modifying selected phase components of the cover audio signal. Using a psychoacoustic model, a threshold is calculated which can be used as the masking threshold []. In [], the authors have used the difference between the phase values of selected component frequencies and those of their adjacent frequencies in the cover signal as a medium to hide secret data bits. This method provides more robustness than the previous approaches.
Spread Spectrum. The basic principle of spread spectrum is to spread the secret message over the frequency spectrum of the cover audio signal. In [], Direct Sequence Spread Spectrum is used to hide text data in audio; here a key is used to embed the message into the noise. In [], the authors have found that a low spreading rate improves the performance of spread spectrum audio steganography. Therefore, they have proposed a technique which decreases the correlation between the original signal and the spread data signal by applying a phase shift in each subband signal of the original audio.
Discrete Wavelet Transforms (DWT). The 2D DWT decomposes a signal into four frequency components, popularly known as subbands: Low-Low (LL), Low-High (LH), High-Low (HL), and High-High (HH), as shown in Figure 1. The LL subband describes the approximation details. The HL band captures variation along the x-axis (horizontal details), and the LH band captures variation along the y-axis (vertical details) []. In other words, the low frequency subband is a low-pass approximation of the original signal and contains most of its energy, while the other subbands mainly contain detail components with low energy levels. This is the reason the LH subband is very popular for data hiding.
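The subband split can be made concrete with a one-level 2D Haar DWT, the filter the proposed method uses in Section 3.1. The following is a minimal NumPy sketch, assuming an orthonormal Haar filter pair and input dimensions that are even; `haar_dwt2` and `haar_idwt2` are hypothetical helper names. The letter order of the subband labels follows one common convention (first letter: filter applied along each row; second letter: filter applied along each column), and references differ on which detail band is called horizontal or vertical.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D orthonormal Haar DWT of an array with even dimensions.

    Returns (LL, LH, HL, HH), where the first letter is the filter applied
    along each row (pairing adjacent columns) and the second the filter
    applied along each column (pairing adjacent rows).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even dimensions")
    s = np.sqrt(2.0)
    lo = (x[:, 0::2] + x[:, 1::2]) / s      # low-pass along rows, downsample by 2
    hi = (x[:, 0::2] - x[:, 1::2]) / s      # high-pass along rows, downsample by 2
    ll = (lo[0::2, :] + lo[1::2, :]) / s    # then low-/high-pass along columns
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    s = np.sqrt(2.0)
    rows, cols = ll.shape
    lo = np.empty((2 * rows, cols))
    hi = np.empty((2 * rows, cols))
    lo[0::2, :], lo[1::2, :] = (ll + lh) / s, (ll - lh) / s
    hi[0::2, :], hi[1::2, :] = (hl + hh) / s, (hl - hh) / s
    x = np.empty((2 * rows, 2 * cols))
    x[:, 0::2], x[:, 1::2] = (lo + hi) / s, (lo - hi) / s
    return x

# Example: 1024 audio samples reshaped into a 32x32 matrix before the 2D DWT.
audio = np.sin(2 * np.pi * 5 * np.arange(1024) / 1024)
LL, LH, HL, HH = haar_dwt2(audio.reshape(32, 32))
```

Because the transform is orthonormal, the total energy of the four subbands equals the energy of the input, and the inverse reconstructs the input exactly (up to floating-point error). The last two lines show the reshaping step that a one-dimensional audio signal needs before a 2D DWT.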
In [], authors have proposed a method to create DWT
ofcoveraudioandselecthigherfrequencytoembedimage
data using low bit encoding technique. In [], authors have
decomposed the cover audio signal using Haar DWT and
then choose coecient to embed data. is is done using a
precalculated threshold value to ip data. In [], secret audio
is embedded using synchronizing code in the low frequency
part of DWT of cover audio.
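Discrete Cosine Transforms (DCT). DCT is used to convert a signal from the spatial domain into the frequency domain by decomposing it into a series of cosine functions. The two-dimensional DCT can be performed by executing the one-dimensional DCT twice, first in the x direction and then in the y direction. For an input signal $A$ with $M$ rows and $N$ columns, the output signal $B$ of the 2D DCT is given in

$$B_{x,y} = \alpha_x \alpha_y \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} A_{ij} \cos\frac{\pi(2i+1)x}{2M} \cos\frac{\pi(2j+1)y}{2N}, \tag{1}$$

where $0 \le x \le M-1$ and $0 \le y \le N-1$, and

$$\alpha_x = \begin{cases} \dfrac{1}{\sqrt{M}}, & x = 0, \\[4pt] \sqrt{\dfrac{2}{M}}, & 1 \le x \le M-1, \end{cases} \qquad \alpha_y = \begin{cases} \dfrac{1}{\sqrt{N}}, & y = 0, \\[4pt] \sqrt{\dfrac{2}{N}}, & 1 \le y \le N-1. \end{cases} \tag{2}$$

The inverse 2D DCT transforms frequency domain coefficients back into a spatial domain signal, as specified in

$$A_{ij} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \alpha_x \alpha_y B_{xy} \cos\frac{\pi(2i+1)x}{2M} \cos\frac{\pi(2j+1)y}{2N}, \tag{3}$$

where $0 \le i \le M-1$ and $0 \le j \le N-1$.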
DCT can be performed on a block-by-block basis, for example on 4×4, 8×8, and 16×16 blocks.
As shown in Figure (a), the top le coecient is called
DC coecient holding the approximate value of the whole
signal; normally it has coecients with zero frequency and
the remaining  coecients are called AC coecients hold-
ing most detailed parameters of the signal, having coecients
with nonzero frequency. ere are some DCT coecients
which hold quite similar values. Human brains are less
sensitive to detect changes where all the elements hold more
or less the same value. erefore, this region of similar values
canbeselectedfordatahidingpurpose.isregionisknown
as midband region, as shown in Figure (b).
In [], authors have used speech signal as cover, where
voiced and nonvoiced part of the speech are separated by
zero crossing count and short time energy. e secret data is
embedded by modifying DCT coecient of nonvoiced part.
[Figure 2: (a) DC and AC coefficients in a 4×4 block; (b) midband region of a 4×4 block.]
In [], authors have decomposed the cover audio in 8×8
nonoverlapping block and secret data is hidden in the DC
coecient and th AC coecient in line. In [], authors
have embedded secret data in the low frequency component
of DCT quantization. In [], authors have decomposed the
cover audio into 8×8block and then each of those blocks was
decomposed further into 4×4frames. Embedding of secret
message depends on the dierence between rst or last two
frames.
2.2. Correlation Coecient (CC). A correlation coecient is a
measure of linear relationship between two random variables.
is term was rst coined by Karl Pearson in . e value
of correlation coecient can vary from to.Ifthevalue
is perfect  or  that indicates both variables are linearly
related. If the value is  that indicates there is no relation
between the said variables. Moreover, the sign indicates that
thevariablesarepositivelyrelatedornegativelyrelated[].
ere are three types of correlation coecients: Pearson’s
coecient (), Spearman’s rho coecient (𝑠), and Kendall’s
tau coecient (). Pearson’s coecient, which is also known
as product-moment correlation coecient, is the most widely
used popular correlation coecient. It is given by paired
measurements (1,1),(2,2),...,(𝑛,𝑛)as mentioned
in
𝑝=𝑛
𝑖=1 𝑖𝑖
𝑛
𝑖=1 𝑖2𝑛
𝑖=1 𝑖2,()
where and are the mean of (1,2,...,𝑛)and
(1,2,...,𝑛), respectively. Correlation coecient can also
be used as quality metrics to measure similarity between two
signals.
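A direct transcription of (4) in plain Python is shown below; `pearson` is an illustrative name, and in practice a library routine such as `numpy.corrcoef` computes the same quantity.

```python
import math

def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient r_p of eq. (4)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n                      # mean of x
    my = sum(ys) / n                      # mean of y
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Used as a similarity metric, a value near +1 means one signal is close to a positively scaled and shifted copy of the other, since (4) is unchanged by adding a constant to either signal or multiplying it by a positive factor.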
2.3. Arnold Transform. Arnold’s Transform is a chaotic, bijective (and hence reversible) map proposed by Vladimir Arnold. A chaotic map is an evolution function which exhibits some sort of chaotic behavior, as seen in the following transformation function:

$$\Gamma: \mathbb{T}^2 \to \mathbb{T}^2, \qquad \Gamma: (x, y) \to (2x + y,\ x + y) \bmod 1. \tag{5}$$

[Figure 3: Representation of point (a, b) sheared to point (a′, b′).]
An image is a collection of pixels arranged in rows and columns, which can be organized in a square or nonsquare shape. Applying the Arnold transform to an image iteratively, a chosen number of times, scrambles the image, and the degree of scrambling depends on the number of iterations; this makes the image unrecognizable. Such a scrambled image can be used for secure data hiding, as it does not reveal the existence of any secret data. Hence, scrambling an image can be a preprocessing step of a data hiding technique.
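Traditionally, the Arnold transform can be applied only to square matrices; however, it was later extended to apply to any matrix, by

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod N, \qquad x, y \in \{0, 1, 2, \ldots, N-1\}, \tag{6}$$

where $(x, y)$ is an element of the original matrix, $(x', y')$ is the corresponding element of the transformed matrix, and $N$ is the order of the matrix. As shown in Figure 3, the point $(x, y)$ is sheared along the x- and y-axes to obtain $(x', y')$.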
The function mod N is important to regenerate the original N×N image. The functions to shear along the x-axis and the y-axis, and the modulo function, are represented in

$$\begin{aligned} &\text{(a) Shear along the } x\text{-axis: } (x, y) \to (x + y,\ y) \\ &\text{(b) Shear along the } y\text{-axis: } (x, y) \to (x,\ x + y) \\ &\text{(c) Modulo function: } (x, y) \to (x \bmod N,\ y \bmod N). \end{aligned} \tag{7}$$
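The Arnold transformation is reversible []. There are two ways to recover the original image from the scrambled image: the traditional way exploits periodicity (repeatedly applying the forward transform until the image returns to its original state), and the better approach uses the inverse matrix, known as the Reverse Arnold Transformation [] and expressed by

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix} \bmod N. \tag{8}$$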
In [], authors have used Arnold’s transformation to
scramble the image before embedding into the DWT coef-
cient of cover audio. In [], authors have embedded
scrambled image in “Redundant Discrete Wavelet Transform”
coecient using Singular Value Decomposition (SVD) tech-
nique. In [], authors have proposed data hiding in DWT
and DCT domain using SVD where the secret image is
scrambled before embedding.
2.4. Cocktail Party Problem. The Cocktail Party Problem is a classic example of source separation, which is very popular in digital signal processing. In this problem, several people are talking to each other in a banquet room, and a listener is trying to recognize one specific speech from that crowd of partying guests. The human brain can distinguish one explicit signal component from a mixed signal combination in real time, an ability popularly known as “Auditory Scene Analysis.” In digital signal processing, however, it is difficult to extract only one speaker’s voice from the rest in a cocktail party situation.
In [], Colin Cherry rst revealed the ability of human
auditory system to separate a single speech or audio from a
combination of voices, which may turn into noise through
properties like pitch, gender, rate of speech, and/or direction
of speech. is task of separating single source audio from
a noise is known as dichotic listening task []. In [],
authors have reviewed the same techniques to train machine
to segregate signals. In [], Broadbent has concluded that
simultaneous listening can be performed for small messages,
not for long ones. Human ability to identify audio from a
mixed signal can be improved by listening by two ears [].
It has been seen that, in ideal circumstances, the signal
detection threshold of binaural listening is  dB more than
monaural listening. In [], it has been stated that cocktail
party eect can be explained by Binaural Masking Level
Dierence (BMLD). As per BMLD, for binaural listening
the desired signal coming from one direction is ineectively
maskedbythenoisegeneratedindierentdirection.In
[], Kassebaum et al. discussed two methods for sig-
nal separation—Back Propagation (BP) and Self-Organizing
Neural Network (SONN). at experiment was carried out
through  kHz channel using a modem data signal and a male
speech signal. It has been concluded that BP requires more
inputs and training time than SONN.
In [] authors have discussed  types of approach to solve
Cocktail Party Problem:
(i) Temporal binding and oscillatory correlation
(ii) Cortronic network
(iii) Blind source separation.
In [], von der Malsburg explained the temporal binding
technique. He stated that neuron carries two distinct signals
and the binding is accomplished by correlation. e synchro-
nization allows neuron to create topological network. In [],
von der Malsburg and Schneider proposed a cocktail party
processor enhancing this idea—the Oscillatory Correlation
which is the basis of Computational Auditory Scene Analysis.
In [, ], multistage neural model has been proposed to
separate speech from interfering sounds using oscillatory
correlation.
In [], authors have proposed a biological approach to
solve Cocktail Party Problem using articial neural network
named as cortronic network. A cortronic neural network
describes connection among neurons in several regions
which demonstrates the output links of each neuron and the
strength of the connections.
Blind Source Separation (BSS) is the technique of separating signals from a mixed source without knowledge of the source signals or of the mixing process. Among the different methods of BSS, Principal Component Analysis (PCA), Independent Component Analysis (ICA), and time and frequency domain approaches are significant. PCA and ICA are both statistical approaches, which are better than time or frequency domain approaches: in the frequency domain the Fourier components of data segments are fixed, whereas in the statistical domain the transformation depends on the data to be analyzed [].
PCA is a mathematical technique for transforming a large correlated dataset into a small number of major components known as principal components []. It is closely related to the mathematical theory of Singular Value Decomposition (SVD), which is used to implement PCA []. Independent Component Analysis can also be implemented with SVD, though there are subtle differences between PCA and ICA. The aim of PCA is to find decorrelated variables, whereas the aim of ICA is to find independent variables. PCA and ICA both perform matrix factorization for linear transformation, though PCA performs low-rank matrix factorization whereas ICA performs full-rank matrix factorization. The advantage of ICA over PCA is that PCA only removes correlations, whereas ICA removes correlations as well as higher order dependencies []. ICA has extensive use in biomedical imaging and audio processing []. ICA obtains the independent variables by multiplying the observed data by a demixing matrix []. It assumes that there are as many sources as there are available data channels, which are to be separated as independent sources; by utilizing this fact, ICA is used in Blind Source Separation. In [], the author described a fast method for ICA using fixed-point iteration. This algorithm is popularly known as FastICA.

Table 1: Advantages and disadvantages of different approaches for solving the Cocktail Party Problem.

| Approach for solving Cocktail Party Problem | Advantages of the method | Disadvantages of the method |
|---|---|---|
| Temporal binding | As shown in [], this strategy helps: robustness against loss of network elements; richness of representation; processing speed enhancement | As stated in [], this strategy results in inflexible refocusing of the system onto events rapidly occurring in sequence |
| Cortronic network | As mentioned in [], in this method there is no requirement for knowledge of background sounds such as static, traffic, and music | As shown in [], this technique is costly to implement as it requires a separate artificial neural network |
| Blind source separation | As shown in [], in this technique: there is no need for knowledge of the source signals or the process of mixing; no need for defining a cut-off frequency for separation; low computational complexity; helps signal enhancement | As reported in [], in this method the convergence speed is slow |
In Table , comparison of the existing techniques for
solving Cocktail Party Problem has been discussed. It can be
noted that each of these techniques has its own advantage and
disadvantages. However, as blind steganographic approach
is considered more robust and secure than the nonblind
steganography techniques, hence, in this proposed method,
“Blind Source Separation” approach has been chosen for
solving cocktail party eect.
3. Proposed Method
3.1. In a Nutshell. Steganography can be broadly grouped into two types: blind and nonblind techniques. The technique in which the cover object is not required to retrieve the secret is called blind steganography. The method in which the cover object is required to regain the secret is called nonblind or cover escrow steganography. To create a highly robust steganography method, a blind technique has been proposed here.
In this proposed method, an image is used as the secret message. This secret image is scrambled using the Arnold transform. Then the Haar filter is applied for the two-dimensional DWT of the cover source audio. Since audio is a one-dimensional signal, it must be reshaped into a two-dimensional matrix to perform the 2D DWT. Haar is simple, fast, and memory efficient compared to other available DWT filters like Daubechies and Coiflets. After DWT application, the LH subband is chosen for further decomposition into 4×4 blocks, to which the two-dimensional DCT is applied. As shown in Figure 2(b) in Section 2.1, the midband region of those 4×4 blocks is chosen, and embedding is performed by the following equation:
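$$\mathrm{mid}\big(D(C_a)\big) \leftarrow \mathrm{mid}\big(D(C_a)\big) + k \times \mathrm{PN}, \tag{9}$$

where $\mathrm{mid}(D(C_a))$ indicates the midband frequency region of the DCT coefficients $D(C_a)$ of the LH subband of the cover audio $C_a$, $k$ is the embedding factor, and PN is the pseudorandom number sequence. Equation (9) is further explained in Section 3.6; the embedding factor ($k$) is discussed in Section 3.4, and the pseudorandom number (PN) in Section 3.5.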
Aer embedding, the resultant cover becomes stego
audio. To increase security of the proposed method, this stego
audio is blended with other audio signals to produce cocktail
party eect—aerwards this has been securely transmitted
throughthewebtoreachtheintendedrecipient.Evenifany
intruder is able to break the communication channel and get
access to the transmitted media, neither he would decipher
the cocktail party eect to identify stego audio nor he would
able to decode the stego audio to recognize the secret message
without knowing the key required for extraction, whereas the
intended receiver knowing the key as well as the entire algo-
rithms is able to easily extract the secret message implanted
without any loss of data. e proposed method is also tested
against well-known Steganalysis attacks and the outcomes
are quite impressive (discussed in Section .)—hence this
technique provides complete security.
Once the intended recipient receives the cocktail stego audio, s/he can separate the audios using the demixing algorithm (discussed in Section 3.8) and, knowing the key, apply the extraction procedure to the recovered stego audio. The extraction algorithm computes correlations between the midband coefficients and the PN sequences and extracts the secret bits, from which the scrambled secret image can be generated. Finally, by applying the inverse Arnold transform, the secret image is reconstructed. The flowcharts of the embedding and extraction procedures are shown in Figures 4 and 5, respectively.
[Figure 4: Flowchart of embedding procedure. Secret image → Arnold transform → scrambled image. Cover audio, source 1 (S1) → 2D discrete wavelet transform → find the maximum coefficient value of the LH subband (max_f) → set embedding factor (k) = max_f × multiplicative factor; LH → 2D discrete cosine transform → find the midband coefficients. Key based linear feedback shift register → pseudorandom number (PN). Embedding function → inverse 2D DCT followed by inverse 2D DWT → stego audio (St). Stego audio (St) and audio source 2 (S2) → mixing function → cocktail stego audio.]
[Figure 5: Flowchart of extraction procedure. Cocktail stego audio → demixing function → stego audio (St) and audio source 2 (S2). St → 2D discrete wavelet transform → 2D discrete cosine transform → find the midband coefficients. Key based linear feedback shift register → pseudorandom number (PN). Extraction by comparing correlation coefficients → scrambled secret image → reverse Arnold transform → secret image.]
3.2. Input Preparation
Cover Audio Source. Any speech or music can be used here as a cover audio source. For this demonstration, popular English songs have been chosen, as listed below. All the audio sources have been sampled at a common rate in mono channel with a fixed bit depth, cut to a fixed duration (in seconds) to simplify the embedding capacity calculation, and finally saved as .wav files.
The following audio sources have been used for this research experiment:
(1) “My Heart Will Go On” by Celine Dion from the film “Titanic”, saved as tt.wav
(2) “Beat It” by Michael Jackson from the album “Thriller”, saved as mj.wav
(3) “Like a Rolling Stone” by Bob Dylan from the album “Highway 61 Revisited”, saved as bob.wav
(4) Title song from the film “Mamma Mia!” by Meryl Streep, saved as mm.wav
(5) Title song from the film “High School Musical” by chorus, saved as hsm.wav.
Secret Image. Although any type of grayscale image (.jpg or .bmp) can be used here as the secret, binary images (.pbm) have been chosen for this experiment for better extraction quality. In this proposed method, secret images need to be transformed to binary, which is a lossy conversion; hence true-color RGB images cannot be applied here, as after extraction the retrieved image will only have two colors, black and white. The secret image size is taken as 128×128 here, which can be increased further if the input cover audio source is longer. For this experiment, secret images have been either downloaded from the Internet (these do not have any copyright restriction) or drawn with Microsoft Paint software.
3.3. Scrambling and Descrambling Algorithm for Secret Image. The “Arnold transform” algorithm randomizes the input image over a number of iterations to create the scrambled image.
Input: any binary image I (m×n, square for the 128×128 secret used here), number of iterations t
Output: scrambled image out
Algorithm: written as function Arnold(I, t)
Step 1: find the size of I and store it in m and n
Step 2:
for iter = 1 to t
    for x = 0 to m − 1
        for y = 0 to n − 1
            find [x'; y'] = [1 1; 1 2] × [x; y];
            out(mod(x', m) + 1, mod(y', n) + 1) ← I(x + 1, y + 1);
        end;
    end;
    I = out;
end;
Once applied to the scrambled image, the “Reverse Arnold Transform” algorithm returns the original secret image after the specified number of iterations.
Input: any scrambled binary image I (m×n), number of iterations t
Output: descrambled image out
Algorithm: written as function iArnold(I, t)
Step 1: find the size of I and store it in m and n
Step 2:
for iter = 1 to t
    for x = 0 to m − 1
        for y = 0 to n − 1
            find [x'; y'] = [2 −1; −1 1] × [x; y];
            out(mod(x', m) + 1, mod(y', n) + 1) ← I(x + 1, y + 1);
        end;
    end;
    I = out;
end;
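The two algorithms above can be written compactly with NumPy index arrays. This sketch assumes a square N×N image (as for the 128×128 secrets used here), uses zero-based indices instead of the `+ 1` offsets of the MATLAB-style pseudocode, and applies the matrices of (6) and (8); `arnold` and `inverse_arnold` are illustrative names.

```python
import numpy as np

def _grid(n):
    """Zero-based coordinate arrays: x[r, c] = r and y[r, c] = c for an n x n image."""
    return np.meshgrid(np.arange(n), np.arange(n), indexing="ij")

def arnold(img, iterations):
    """Scramble a square image: pixel (x, y) moves to A @ (x, y) mod N, A = [[1, 1], [1, 2]] (eq. (6))."""
    img = np.asarray(img)
    n = img.shape[0]
    if img.ndim != 2 or img.shape[1] != n:
        raise ValueError("expected a square N x N image")
    x, y = _grid(n)
    for _ in range(iterations):
        out = np.empty_like(img)
        out[(x + y) % n, (x + 2 * y) % n] = img[x, y]   # the map is a bijection, so every pixel is written once
        img = out
    return img

def inverse_arnold(img, iterations):
    """Descramble with the inverse matrix [[2, -1], [-1, 1]] of eq. (8)."""
    img = np.asarray(img)
    n = img.shape[0]
    if img.ndim != 2 or img.shape[1] != n:
        raise ValueError("expected a square N x N image")
    x, y = _grid(n)
    for _ in range(iterations):
        out = np.empty_like(img)
        out[(2 * x - y) % n, (y - x) % n] = img[x, y]    # Python's % keeps indices in 0..n-1
        img = out
    return img
```

For N = 8, the matrix of (6) satisfies A³ ≡ 5I and A⁶ ≡ I (mod 8), so six forward iterations return the original image; this is the periodicity route to descrambling, while `inverse_arnold` follows the inverse-matrix route.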
3.4. Embedding and Multiplicative Factors. As shown in (9) in Section 3.1, the embedding factor (k) is multiplied with PN to control the increment of the DCT coefficient values such that, after embedding, the stego audio does not have any audible noise. Hence the value of k must lie within a bounded range. After repeated experiments, it has been observed that when the embedding factor approaches the upper end of this range, the extracted message has very high PSNR and SSIM, indicating high robustness; simultaneously, however, audible artifacts that differentiate the stego audio from the cover audio are identified. This signifies that a large k compromises imperceptibility. On the other hand, if k approaches the lower end of the range, the stego audio becomes just like the original cover audio (the PSNR between the two audios becomes very high), whereas the extracted secret image is completely corrupted. These test results indicate that, to get an optimum outcome, a tradeoff must be made between robustness and imperceptibility.
While experimenting with several cover audios and various secret images, it has also been noticed that keeping a constant value of the embedding factor (k) cannot ensure outcomes of similar quality after extraction. Hence it was decided to set k depending on the cover to generate the optimal result. As the data hiding takes place in the LH subband of the DWT, the maximum coefficient value of the LH subband has been chosen as one of the aspects of the following formula:

$$\text{Embedding Factor } (k) = \text{Multiplicative Factor} \times \max(\text{coefficients of LH}). \tag{10}$$
Finally, for this proposed method, the value of the Multiplicative Factor has been set universally, based on the experimental outcome shown in Table 2.
[Table 2: Experimental results with different embedding and multiplicative factors. Columns: original secret; extracted secret image; embedding factor (multiplicative factor × maximum coefficient value of LH); PSNR of extracted secret; SSIM of extracted secret; PSNR of stego audio. Three multiplicative factor settings are compared.]
3.5. Pseudorandom Number. For embedding the secret into the cover, this proposed method uses a pseudorandom number (PN) sequence generated by a Linear Feedback Shift Register (LFSR), as shown in Figure 6. Here the LFSR has been designed using only the right shift operator, and the operation of this shift register is completely deterministic: it must be initialized with a set of numbers, and at any given point the value of the LFSR is determined by its present state.
[Figure 6: Simplified block diagram of LFSR (register cells Bit 5 to Bit 1).]
In this proposed method, two simple algorithms have been designed to generate two different sets of PN values for a given key with the same initial sequence of numbers. This initial sequence can be altered at any time. Here, for easy illustration, “00001” has been chosen as the initial sequence.
Description: e below algorithm(s) generates endless
non-sequential lists of numbers in binary base
using Linear Feedback Shi Register.
Input: AnumberasKey
Output: Pseudo-random Numbers, PN[]and PN[]
respectively.
Algorithm 1: written as function SRPN (Key)
Step 1: set =Key;
Step 2: set initial state of shi register as
state = 00001
Step 3: set PN = [];
Step 4:
for =1to
PN = [PN state(5)]
if state(1)== state(4)
then set temp = ;
else set temp = ;
end;
set state(1)=state(2);
Security and Communication Networks
set state(2)=state(3);
set state(3)=state(4);
set state(4)=state(5);
set state(5)=temp;
end;
Algorithm 2: written as function SRPN (Key)
Step 1: set =Key;
Step 2: set initial state of shi register as
state = 00001
Step 3: set PN = [];
Step 4:
for =1to
PN = [PN state(5)]
if state(1)== state(2)
then set temp  = ;
else set temp  = ;
end;
if state(4)== temp 
then set temp  = ;
else set temp  = ;
end;
if state(5)== temp 
then set temp  = ;
else set temp  = ;
end;
set state(1)=state(2);
set state(2)=state(3);
set state(3)=state(4);
set state(4)=state(5);
set state(5)=temp;
end;
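Algorithms 1 and 2 translate directly into Python. This sketch assumes that each comparison of the form “if equal then 0, else 1” is XOR feedback (the usual LFSR construction) and that `Key` is simply the number of output bits, as Step 1 suggests; `srpn1` and `srpn2` are illustrative names.

```python
def srpn1(n, seed=(0, 0, 0, 0, 1)):
    """Algorithm 1: n output bits; feedback = state(1) XOR state(4)."""
    state = list(seed)                  # state[0] is state(1), ..., state[4] is state(5)
    pn = []
    for _ in range(n):
        pn.append(state[4])             # output state(5)
        temp = state[0] ^ state[3]      # 0 if state(1) == state(4), else 1
        state = state[1:] + [temp]      # state(i) <- state(i+1); state(5) <- temp
    return pn

def srpn2(n, seed=(0, 0, 0, 0, 1)):
    """Algorithm 2: n output bits; feedback = state(5) XOR state(4) XOR state(1) XOR state(2)."""
    state = list(seed)
    pn = []
    for _ in range(n):
        pn.append(state[4])
        temp1 = state[0] ^ state[1]     # compare state(1) and state(2)
        temp2 = state[3] ^ temp1        # compare state(4) and temp1
        temp3 = state[4] ^ temp2        # compare state(5) and temp2
        state = state[1:] + [temp3]
    return pn
```

With the initial state 00001, Algorithm 1 starts 1, 0, 1, 0, 1, 1, 1, 0 and Algorithm 2 starts 1, 1, 0, 1, 0, 1, 0. Under the XOR assumption both feedback rules correspond to irreducible degree-5 polynomials over GF(2) (x⁵ + x³ + 1 and x⁵ + x⁴ + x³ + x + 1), so each output repeats with period 31; a 5-bit register therefore yields at most 31 distinct bits before the sequence cycles.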
3.6. Embedding Algorithm. To ensure more security and
imperceptibility, in this proposed method, the secret message
is embedded in the transform domain using discrete wavelet
transform (DWT) as well as by discrete cosine transform
(DCT).
Description: algorithm for embedding secret data.
Input: a cover audio (𝑎) and a secret message as an image (𝐼)
Output: a stego audio (Steg Aud).
Algorithm:
Step 1: read cover audio (𝑎)
Step 2: read secret message (𝐼)
Step 3: set iteration as a number =
Step 4: call function Arnold (𝐼, iteration), which returns the scrambled image (𝐼)
Step 5: set Key as a number =
Step 6: call function SRPN (Key) of Algorithm 1, which returns PN1[];
Step 7: call function SRPN (Key) of Algorithm 2, which returns PN2[];
Step 8: apply DWT on 𝑎 to decompose it into LL, LH, HL and HH;
Step 9: find maxf = max(value of coefficients in LH);
Step 10: set embedding factor = Multiplicative Factor × maxf
Step 11: apply DCT over LH and get (𝑎);
Step 12: find the mid-band coefficient region of (𝑎) and term it as mid((𝑎));
Step 13: if 𝐼(,) ==
then set mid((𝑎)) = mid((𝑎)) + embedding factor × PN1[];
else set mid((𝑎)) = mid((𝑎)) + embedding factor × PN2[]; end;
Step 14: perform inverse DCT to get new(LH).
Step 15: perform inverse DWT using LL, new(LH), HL, HH and get Stego
Step 16: write Stego in Steg Aud
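Steps 9 to 13 can be illustrated with the sketch below, which works on a flat list of mid-band DCT coefficients and leaves out the DWT, the DCT, and the selection of the mid-band region. It is a hedged illustration rather than the authors' code: the function names are hypothetical, and because the bit value tested in Step 13 is elided, it assumes that bit 0 selects PN1[] and bit 1 selects PN2[].

```python
def embedding_factor(lh_coeffs: list[float], multiplicative_factor: float) -> float:
    """Steps 9-10: embedding factor = Multiplicative Factor x max(LH coefficients)."""
    maxf = max(lh_coeffs)
    return multiplicative_factor * maxf


def embed_bit(mid: list[float], bit: int, alpha: float,
              pn1: list[int], pn2: list[int]) -> list[float]:
    """Step 13: add alpha * PN1 or alpha * PN2 to the mid-band coefficients.

    Assumption (the tested bit value is elided): bit 0 -> PN1, bit 1 -> PN2.
    """
    if not (len(mid) == len(pn1) == len(pn2)):
        raise ValueError("PN sequences must match the mid-band length")
    pn = pn1 if bit == 0 else pn2
    return [c + alpha * p for c, p in zip(mid, pn)]


if __name__ == "__main__":
    alpha = embedding_factor([0.1, -0.4, 0.25], multiplicative_factor=0.5)
    print(alpha)  # 0.125
    print(embed_bit([1.0, 2.0, 3.0], 1, 0.5, [1, 0, 1], [0, 1, 1]))  # [1.0, 2.5, 3.5]
```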
3.7. Mixing Algorithm. This algorithm mixes two audio sources from two different channels to create a cocktail party effect between the two audio signals.
Input: two mono-channel .wav files (S1 and S2) having the same duration and a sampling rate of  Hz
Output: .wav files having the cocktail sound effect (S3 and S4)
Algorithm: written as function Mixing (S1, S2)
Step 1: set Gain Factor as a decimal strictly between 0 and 1
Step 2: read S1 and S2 into sig1 and sig2, keeping their respective sampling frequencies stored in Fs1 and Fs2
Step 3: set Mixed1 = sig1 + (Gain Factor × sig2) and Mixed2 = sig2 + (Gain Factor × sig1);
Step 4: write Mixed1 in audio file S3 with Fs1 and write Mixed2 in audio file S4 with Fs2
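Step 3 amounts to multiplying the pair of signals by the matrix [[1, g], [g, 1]], where g is the gain factor. A minimal Python sketch, with file reading and writing left out and the signals given as equal-length lists of samples (`mix` is a hypothetical name):

```python
def mix(sig1: list[float], sig2: list[float],
        gain: float) -> tuple[list[float], list[float]]:
    """Cross-mix two equal-length signals with a gain factor in (0, 1)."""
    if not 0 < gain < 1:
        raise ValueError("gain must satisfy 0 < gain < 1")   # Step 1
    if len(sig1) != len(sig2):
        raise ValueError("inputs must have the same duration")
    mixed1 = [a + gain * b for a, b in zip(sig1, sig2)]  # Mixed1 = sig1 + g * sig2
    mixed2 = [b + gain * a for a, b in zip(sig1, sig2)]  # Mixed2 = sig2 + g * sig1
    return mixed1, mixed2


if __name__ == "__main__":
    print(mix([1.0, 0.0], [0.0, 1.0], 0.5))  # ([1.0, 0.5], [0.5, 1.0])
```

Keeping g strictly below 1 keeps the two mixtures distinct, since the matrix determinant 1 − g² stays non-zero.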
3.8. Demixing Algorithm. Here, for demixing, the FastICA MATLAB package (ver. .) has been used, which estimates the independent components of the given multidimensional signals using the Blind Source Separation technique.
Input: two .wav files (S3 and S4) containing mixed signals from different channels
Output: two unmixed source .wav files (1, 2)
Algorithm: written as function Demixing (S3, S4)
Step 1: read S3 and S4 into two signal variables, keeping their respective sampling frequencies stored in Fs1 and Fs2
Step 2: find the complex conjugate transpose of each signal and store the results
Step 3: create one matrix from the two transposed signals and store it
Step 4: apply FastICA() to this matrix and store its output;
Step 5: extract two sources from the FastICA output as source1 and source2
Step 6: write source1 in file 1 with Fs1 and source2 in file 2 with Fs2
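FastICA itself is not reproduced here. As a non-blind sanity check, which is a different technique from the blind separation the authors use, the sketch below shows that when the gain g of the mixing step is known, the two mixtures can be inverted exactly: the mixing matrix [[1, g], [g, 1]] has determinant 1 − g², which is non-zero for 0 < g < 1. FastICA has to estimate this matrix from the data alone and recovers the sources only up to scale and order. The name `unmix_known_gain` is hypothetical.

```python
def unmix_known_gain(mixed1: list[float], mixed2: list[float],
                     gain: float) -> tuple[list[float], list[float]]:
    """Invert Mixed1 = s1 + g*s2, Mixed2 = s2 + g*s1 for a known gain g."""
    det = 1.0 - gain * gain          # determinant of [[1, g], [g, 1]]
    if det == 0:
        raise ValueError("mixing matrix is singular for |gain| == 1")
    s1 = [(m1 - gain * m2) / det for m1, m2 in zip(mixed1, mixed2)]
    s2 = [(m2 - gain * m1) / det for m1, m2 in zip(mixed1, mixed2)]
    return s1, s2


if __name__ == "__main__":
    print(unmix_known_gain([1.0, 0.5], [0.5, 1.0], 0.5))  # ([1.0, 0.0], [0.0, 1.0])
```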
3.9. Extraction Algorithm
Input: stego audio (Steg Aud)
Output: secret image (𝐼)
Algorithm:
Step 1: read stego audio (Steg Aud) in 𝑎
Step 2: set Key as a number =
Step 3: call function SRPN (Key) of Algorithm 1, which returns PN1[];
Step 4: call function SRPN (Key) of Algorithm 2, which returns PN2[];
Step 5: apply DWT on 𝑎 to decompose it into LL, LH, HL and HH;
Step 6: apply DCT over LH and get (𝑎)
Step 7: find the mid-band coefficient region of (𝑎) and term it as mid((𝑎))
Step 8: if Correlation(mid((𝑎)), PN1[]) >= Correlation(mid((𝑎)), PN2[])
then 𝐼(,) = else 𝐼(,) = ; end;
Step 9: reshape the image bits stored in 𝐼 to get the secret scrambled image
Step 10: set iteration as a number =
Step 11: call function iArnold (𝐼, iteration), which returns the secret image (𝐼)
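Step 8, together with the Arnold scrambling used in Step 4 of embedding and undone in Step 11 of extraction, can be sketched as follows. Several assumptions apply: the Pearson correlation stands in for the paper's Correlation function; the standard Arnold cat map (x, y) → ((x + y) mod N, (x + 2y) mod N) on an N × N image stands in for the transform defined earlier in the paper; and PN1[] is taken to correspond to bit 0. All names are hypothetical.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient (assumed stand-in for Correlation())."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def extract_bit(mid: list[float], pn1: list[int], pn2: list[int]) -> int:
    """Step 8: choose the PN sequence that correlates best (PN1 wins ties).

    Assumption: PN1 corresponds to bit 0 and PN2 to bit 1.
    """
    return 0 if pearson(mid, pn1) >= pearson(mid, pn2) else 1


def arnold(img: list[list[int]], iterations: int) -> list[list[int]]:
    """Scramble a square N x N image with the standard Arnold cat map."""
    n = len(img)
    out = img
    for _ in range(iterations):
        nxt = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                nxt[(x + y) % n][(x + 2 * y) % n] = out[x][y]
        out = nxt
    return out


def inverse_arnold(img: list[list[int]], iterations: int) -> list[list[int]]:
    """Undo arnold(): inverse map (x, y) -> ((2x - y) mod N, (y - x) mod N)."""
    n = len(img)
    out = img
    for _ in range(iterations):
        prev = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                prev[(2 * x - y) % n][(y - x) % n] = out[x][y]
        out = prev
    return out
```

Because the cat-map matrix [[1, 1], [1, 2]] has determinant 1, every iteration is a permutation of the pixels, so applying the inverse the same number of times restores the image exactly.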
4. Experimental Results and Analysis
The proposed method has been applied to several sets of cover audio and secret images; however, to save space, only  sets of robustness test results for Steganalysis attacks are presented here.
4.1. Adherence to Kerckhoff’s Principle. In this research article, a key based steganography technique has been proposed. Hence it should follow Kerckhoff’s principle of cryptography [], which states that a well-designed method should remain secure even if the public knows every detail of the method except the key. As mentioned in Section ., LFSR has been used here at both the sender’s end and the receiver’s end. It requires a unique key to generate the same set of pseudorandom numbers [], which are used in embedding equation () and again in Step 8 of the extraction algorithm for comparing correlation coefficients. If the exact same key is not used during embedding and extraction, the LFSR will generate a different set of pseudorandom numbers, with which the secret image cannot be extracted from the stego audio. Hence the proposed method complies with Kerckhoff’s principle.
4.2. Outcome of Quality Metrics
Embedding Capacity (EC). EC is measured as the ratio between the size of the hidden message (in bits) and the size of the cover (in bits), as shown in () below. In this research experiment, it has been observed that hiding a 128 × 128 secret image requires a cover audio of  bits, which implies an embedding capacity of .%. Similarly, implanting a 64 × 64 secret image needs  bits of cover audio, which again confirms an embedding capacity of .%.
$$\text{capacity} = \frac{\text{size of hidden data}}{\text{size of cover data}} \times 100\%.$$
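The capacity ratio is simple arithmetic, sketched below. For illustration, a 128 × 128 binary image carries 16,384 bits if each pixel is stored as one bit (an assumption); the cover size of 1,638,400 bits in the usage line is hypothetical, not a figure from the paper. The function name is also hypothetical.

```python
def embedding_capacity(hidden_bits: int, cover_bits: int) -> float:
    """EC = size of hidden data / size of cover data x 100 (percent)."""
    if cover_bits <= 0:
        raise ValueError("cover size must be positive")
    return 100.0 * hidden_bits / cover_bits


if __name__ == "__main__":
    # Hypothetical cover size; one bit per pixel is assumed for the image.
    print(embedding_capacity(128 * 128, 1_638_400))  # 1.0
```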
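Peak Signal-to-Noise Ratio (PSNR). PSNR represents the ratio between the maximum possible power of a signal and the power of the noise that corrupts it, where the noise power is measured as the mean squared error between the reference and test signals. The mathematical representation for PSNR is as follows: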
$$\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{Max}_{sf}^{2}}{\mathrm{MSE}}\right),$$
where Max_sf is the maximum signal value, or maximum fluctuation, in the input image data type (e.g., for an -bit unsigned integer data type, Max_sf is ), and MSE is the Mean Squared Error, which is given by
$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[\mathrm{Ref}(i,j) - \mathrm{Test}(i,j)\right]^{2},$$
where Ref represents the original signal; Test represents the degraded signal; m and n represent the numbers of rows and columns of the signal matrix, respectively; i represents the index of the row and j represents the index of the column.
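A direct transcription of the MSE and PSNR equations for two equal-size 2-D lists, as a sketch; Max_sf is passed in by the caller (for instance, 255 for 8-bit unsigned data), and the function names are hypothetical.

```python
import math


def mse(ref: list[list[float]], test: list[list[float]]) -> float:
    """Mean squared error over all m x n entries of two equal-size matrices."""
    m, n = len(ref), len(ref[0])
    total = sum((ref[i][j] - test[i][j]) ** 2 for i in range(m) for j in range(n))
    return total / (m * n)


def psnr(ref: list[list[float]], test: list[list[float]], max_sf: float) -> float:
    """PSNR in dB; returns infinity when the two signals are identical."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_sf ** 2 / err)


if __name__ == "__main__":
    print(mse([[1, 2]], [[1, 4]]))                            # 2.0
    print(round(psnr([[0, 0], [0, 0]], [[1, 1], [1, 1]], 255), 2))  # 48.13
```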
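Structural Similarity Index (SSIM). SSIM is a measure of similarity calculated from the luminance, contrast, and structural differences between two images, as given below:
$$\mathrm{SSIM}(S,E) = \frac{(2\mu_S\mu_E + C_1)(2\sigma_{SE} + C_2)}{(\mu_S^{2} + \mu_E^{2} + C_1)(\sigma_S^{2} + \sigma_E^{2} + C_2)},$$
where μ_S and μ_E are the means of the secret image S and the extracted image E, respectively; σ_S and σ_E are the standard deviations of S and E; σ_SE is the covariance of S and E; and C_1 and C_2 are small constants that stabilize the division when the denominators are close to zero.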
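The SSIM equation above, evaluated once over the whole image, can be sketched as below. Two assumptions apply: the constants follow the common choice C1 = (0.01 L)² and C2 = (0.03 L)², with L the dynamic range of the pixel values, and the statistics are global. Many library implementations instead compute SSIM over local windows and average the results, so their values will differ. The name `global_ssim` is hypothetical.

```python
def global_ssim(s: list[list[float]], e: list[list[float]],
                dynamic_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM of two equal-size images using global statistics."""
    xs = [v for row in s for v in row]
    ys = [v for row in e for v in row]
    n = len(xs)
    mu_s, mu_e = sum(xs) / n, sum(ys) / n
    var_s = sum((x - mu_s) ** 2 for x in xs) / n
    var_e = sum((y - mu_e) ** 2 for y in ys) / n
    cov = sum((x - mu_s) * (y - mu_e) for x, y in zip(xs, ys)) / n  # sigma_SE
    c1 = (k1 * dynamic_range) ** 2   # assumed stabilizing constants
    c2 = (k2 * dynamic_range) ** 2
    num = (2 * mu_s * mu_e + c1) * (2 * cov + c2)
    den = (mu_s ** 2 + mu_e ** 2 + c1) * (var_s + var_e + c2)
    return num / den
```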
Bit Error Rate (BER). BER is defined as the number of error bits divided by the total number of transmitted bits, as shown in the following equation:
$$\mathrm{BER} = \frac{\text{Error Bits}}{\text{Bits Transmitted}} \times 100.$$
Here the BER is calculated between the original secret image and the extracted secret image.
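BER as defined above, for two bit sequences of equal length, can be computed as follows; for images, the binary pixel arrays would be flattened first. The function name is hypothetical.

```python
def bit_error_rate(original: list[int], extracted: list[int]) -> float:
    """Percentage of positions where the extracted bit differs from the original."""
    if len(original) != len(extracted):
        raise ValueError("bit sequences must have the same length")
    errors = sum(1 for a, b in zip(original, extracted) if a != b)
    return 100.0 * errors / len(original)


if __name__ == "__main__":
    print(bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0]))  # 25.0
```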
Table  shows the quality of the extracted images relative to the secret images with respect to PSNR, SSIM, BER, and correlation coefficient (CC, discussed in Section .).
Table : Quality analysis of secret and extracted image.
Columns: secret image (S); scrambled secret image; extracted scrambled image; extracted image (E); PSNR (S, E); SSIM (S, E); BER (S, E); CC (S, E). The table reports four image sets.
Figure : Surface plot of NCC between secret and extracted image.
Perceptual Evaluation of Audio Quality (PEAQ). PEAQ is a standardized metric that evaluates audio quality using properties of human perception. Its output is given on a scale of  to  (where  signifies poor and  implies excellent) and is designed to reflect the Mean Opinion Score (MOS) that listeners would assign. The quality of the output audio is measured by comparison with a reference audio.
Normalized Cross-Correlation (NCC). NCC quantifies the degree of similarity between two signals; here it computes the normalized two-dimensional cross-correlation between two image matrices. The values of the correlation coefficient lie between  and , where  signifies identical images and  denotes totally different images. It is formulated as
$$\mathrm{NCC} = \frac{\sum_{p=1}^{A}\sum_{q=1}^{B} E(p,q)\,S(p,q)}{\sqrt{\sum_{p=1}^{A}\sum_{q=1}^{B} E(p,q)^{2}\;\sum_{p=1}^{A}\sum_{q=1}^{B} S(p,q)^{2}}},$$
where E(p, q) is the extracted image, S(p, q) is the reference image, and A and B are the numbers of rows and columns of the images. NCC is used to produce a surface plot, which depicts the functional relationship between two independent variables mapped onto a plane parallel to the -plane. Figure  shows the surface plot of NCC between the secret and extracted images.
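A sketch of the NCC formula above for two equal-size 2-D lists, assuming no mean subtraction and a denominator equal to the square root of the product of the two images' energies; `ncc` is a hypothetical name.

```python
import math


def ncc(s: list[list[float]], e: list[list[float]]) -> float:
    """Normalized cross-correlation of two equal-size images (no mean removal)."""
    num = sum(a * b for row_s, row_e in zip(s, e) for a, b in zip(row_s, row_e))
    energy_s = sum(a * a for row in s for a in row)
    energy_e = sum(b * b for row in e for b in row)
    return num / math.sqrt(energy_s * energy_e)


if __name__ == "__main__":
    print(ncc([[1, 0], [0, 0]], [[0, 1], [0, 0]]))  # 0.0
```

Because of the normalization, scaling either image by a positive constant leaves NCC unchanged.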
Table  presents the quality analysis of the cover and stego audio in terms of PSNR, PEAQ, and CC.
4.3. Robustness Tests by Steganalysis Attacks
By Random Cropping. On average, a piece of English music or a full song lasts over  minutes, that is, more than  seconds. In the proposed method, only  seconds of audio are required to hide a 128 × 128 secret image. This secret can be placed anywhere within the stego, that is, at the start, at the end, or after the th second; in short, the secret can be moved throughout the cover, and the exact place of hiding is not predetermined. That is why  out of  attempts of random cropping left the secret image intact, as the stego was cropped elsewhere. For the remaining  out of  attempts, that is, when the stego audio was cropped in a place where the secret image was embedded, Figures (a), (b), and (c) provide the results.
Figure : (a) Graphical plot of audio during cropping; (b) scrambled secret; (c) extracted secret.
As shown in Figure (a), from a stego audio of  seconds’ duration, a -second-long window (from the nd to the th second) was chosen and the remaining audio signal was replaced with zeros. When the intended recipient applies the extraction mechanism to such a modified stego audio, it generates only a portion of the scrambled secret image, as shown in Figure (b). However, when the reverse Arnold transform is applied to this partially scrambled secret image, it still recovers the secret, as shown in Figure (c). Quality analysis of the extracted secret image against the original embedded secret image gave a PSNR value of . and an SSIM value of ..
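The cropping attack described above, which keeps one window of the stego audio and zeroes everything else, can be sketched as below. The window start and length are parameters rather than the paper's values, and `crop_attack` is a hypothetical name.

```python
def crop_attack(samples: list[float], fs: float,
                start_s: float, length_s: float) -> list[float]:
    """Keep samples in [start_s, start_s + length_s) seconds; zero the rest."""
    start = int(round(start_s * fs))
    stop = min(len(samples), start + int(round(length_s * fs)))
    return [x if start <= i < stop else 0.0 for i, x in enumerate(samples)]


if __name__ == "__main__":
    # 10 samples at a toy rate of 2 Hz; keep the window from 1 s to 3 s.
    print(crop_attack([1.0] * 10, fs=2, start_s=1, length_s=2))
    # [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```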
Table : Quality analysis of cover and stego audio.
Columns: secret image and cover audio; graphic plot of cover audio; graphic plot of stego audio; PSNR (in dB); PEAQ (in MOS); CC. Rows: secret image embedded in bob.wav, mm.wav, mj.wav, and tt.wav.

Table : Experimental results of AWGN Steganalysis attack.
Columns: original secret; extracted secret after adding noise at SNR  dB; extracted secret after adding noise at SNR  dB; extracted secret after adding noise at SNR  dB. Each extracted secret is reported with its PSNR and SSIM.

Table : Outcome of resampling attack.
Columns: original secret embedded in cover audio of sampling rate  Hz; extracted secret after changing the sampling rate to  Hz; extracted secret after changing the sampling rate to  Hz; extracted secret after changing the sampling rate to  Hz. Each extracted secret is reported with its PSNR and SSIM.

Table : Results of requantization attack.
Columns: original secret embedded in cover audio of -bit depth; extracted secret after changing the audio bit depth to  bits; extracted secret after changing the audio bit depth to  bits; extracted secret after changing the audio bit depth to  bits. Each extracted secret is reported with its PSNR and SSIM.

Table : Experimental results of pitch shifting attack.
Columns: original secret in cover audio; extracted secret after % reduction of audio pitch; extracted secret after % increment of audio pitch; extracted secret after % increment of audio pitch. Each extracted secret is reported with its PSNR and SSIM.

Table : Experimental results of MP3 compression attack.
Columns: secret embedded in cover audio .wav file; extracted secret from MP3 of bitrate  kbps; extracted secret from MP3 of bitrate  kbps; extracted secret from MP3 of bitrate  kbps. Each extracted secret is reported with its PSNR and SSIM.

By Adding White Gaussian Noise. In this type of attack, Additive White Gaussian Noise (AWGN) is added to the stego audio to distort the hidden message. AWGN can be added to any signal; it has uniform power across frequencies and Gaussian-distributed amplitude with respect to time. As shown in Table , to test the robustness of the proposed method, noise at , , and  dB of SNR (Signal-to-Noise Ratio) per sample is added to the stego audio signal, assuming the power of the stego signal is  dBW (the decibel-watt is a unit of power on the decibel scale, relative to one watt).
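A minimal sketch of the noise-addition step: the noise variance is chosen so that an assumed stego signal power, given in dBW, divided by the noise power equals the target SNR. The 0 dBW default, the optional seed, and the name `add_awgn` are illustrative assumptions.

```python
import math
import random
from typing import Optional


def add_awgn(samples: list[float], snr_db: float,
             signal_power_dbw: float = 0.0,
             seed: Optional[int] = None) -> list[float]:
    """Add white Gaussian noise at `snr_db` relative to an assumed signal power."""
    rng = random.Random(seed)
    signal_power = 10 ** (signal_power_dbw / 10)     # dBW -> watts
    noise_power = signal_power / 10 ** (snr_db / 10)  # SNR = P_signal / P_noise
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]
```

Measuring the power of the added noise afterwards and comparing it with the assumed signal power should give back roughly the requested SNR.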
By Resampling. When audio data is written into a file, the sampling rate of the audio is generally specified as Fs. In the resampling attack, this sampling rate is first changed to a higher or lower frequency and the audio is saved in a new file. Because resampling changes the length of the audio, the modified audio is truncated or zero-padded to keep the same length as the original cover. Once saved, the modified audio is resampled again to revert it to the original sampling frequency. The result sounds no different; however, the process distorts any embedded secret message. The result of this resampling attack is shown in Table .
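The resampling round trip can be sketched as follows. Linear interpolation is used purely for illustration and is an assumption, since the paper does not state which resampler was used; production resamplers typically also apply an anti-aliasing filter, so real attack results would differ. This sketch applies the length correction (truncate or zero-pad) after the round trip. `resample_linear` and `resampling_attack` are hypothetical names.

```python
def resample_linear(x: list[float], fs_in: float, fs_out: float) -> list[float]:
    """Resample by linear interpolation (illustrative stand-in for a real resampler)."""
    n_out = int(round(len(x) * fs_out / fs_in))
    out = []
    for k in range(n_out):
        t = k * fs_in / fs_out      # position of output sample k on the input grid
        i = int(t)
        frac = t - i
        if i + 1 < len(x):
            out.append(x[i] * (1.0 - frac) + x[i + 1] * frac)
        else:
            out.append(x[-1])       # hold the last input sample at the edge
    return out


def resampling_attack(x: list[float], fs: float, fs_tmp: float) -> list[float]:
    """Resample to fs_tmp and back to fs, then truncate or zero-pad to len(x)."""
    y = resample_linear(x, fs, fs_tmp)
    z = resample_linear(y, fs_tmp, fs)
    return (z + [0.0] * len(x))[:len(x)]
```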
By Requantization. The number of bits used to express each audio sample is known as the bit depth. It is a measure of sound accuracy: the higher the bit depth, the more precise the representation. In the requantization attack, the bit depth of the stego audio has been changed to corrupt the embedded secret image. Table  illustrates the outcome of the extraction process after the requantization attack.
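A sketch of requantizing floating-point samples in [−1, 1] to a given bit depth, assuming a signed integer grid with 2^(b−1) steps per unit and clipping to the signed range; the scaling convention and the name `requantize` are assumptions, not taken from the paper.

```python
def requantize(samples: list[float], bits: int) -> list[float]:
    """Round samples in [-1, 1] onto a signed `bits`-bit grid and back to floats."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    levels = 2 ** (bits - 1)                    # e.g. 128 for 8 bits
    out = []
    for x in samples:
        q = round(x * levels)                   # nearest integer code
        q = max(-levels, min(levels - 1, q))    # clip to [-2^(b-1), 2^(b-1) - 1]
        out.append(q / levels)
    return out


if __name__ == "__main__":
    print(requantize([0.3, 1.0, -1.0], 8))  # [0.296875, 0.9921875, -1.0]
```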
By Pitch Shifting. Pitch is the perceived tone of a signal; it describes the quality of a sound by its rate of vibration. In a pitch shifting attack, the original pitch of an audio is raised or lowered without modifying its length in order to destroy the hidden message embedded in a stego audio. Here pitch shifting has been done using a time-scale modification algorithm called the “Phase Vocoder” [], the result of which is shown in Table .
By MP3 Compression. In this Steganalysis attack, the stego.wav file has been compressed to MP3 format, which eliminates redundant data and could thereby completely remove the embedded secret message. Here the mpwrite MATLAB function has been used to convert the stego.wav file into MP3 format, and the mpread MATLAB function has been applied to read from the MP3 file during the extraction process.
Table  reflects the extraction outcome from three different MP3 files of the same stego audio, encoded at bitrates of  kbps,  kbps, and  kbps, respectively.
4.4. Comparison with Existing Method. For comparison with the proposed method, research articles published in SCI-indexed journals have been searched for methods in which data hiding in audio is performed by DWT along with DCT and the extraction mechanism is blind. The authors of [] proposed a DCT-DWT based data hiding technique that uses a -bit Barker code as a synchronizing code to accommodate a 64 × 64 binary image as the secret message. The comparison results presented in Table  show that the proposed method outperforms the existing one in terms of quality and robustness against Steganalysis attacks.
Table : Comparison results.
Columns: features based comparison and robustness tests; proposed method; existing method [].
Secret message size 128 × 128 64 × 64
Adherence to Kerckhoff’s principle N
Peak signal to noise ratio . -
Structural similarity index . -
Perceptual evaluation of audio quality -
Addition of white Gaussian noise
Random cropping Steganalysis attack N
Resampling Steganalysis attack
Requantization Steganalysis attack
Pitch shifting Steganalysis attack N
MP3 compression Steganalysis attack
In Table , “” signifies “satisfactory result obtained”; “N” signifies “unsatisfactory result or method does not comply”; and “-” implies “details not mentioned.”
5. Conclusion
Secret communication using age-old steganography techniques often increases the chance of detection because of perceivable noise. Hence, in this article, the cocktail party effect has been used, which has effectively reduced the probability of detection; this has also been demonstrated with the help of different Steganalysis techniques. Additionally, PSNR, CC, and PEAQ values have been analyzed to determine the perceptual noise introduced by secret message embedding and extraction. Since all the above results verify the undetectability and robustness of the system, it can be concluded that this audio steganography technique is successful for secret communication with very high robustness.
In future, this proposed method can be further improved by utilizing the speaker diarization technique, which determines “who spoke when.” Applying speaker diarization along with speech recognition would identify a speaker’s voice, which would permit segregating the secret audio stream into multiple speech segments, offering another novel approach to data hiding.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
References
[] B. G. Banik and S. K. Bandyopadhyay, “Review on steganography in digital media,” International Journal of Science and Research.
[] M. Asad, J. Gilani, and A. Khalid, “An enhanced least significant bit modification technique for audio steganography,” in Proceedings of the 1st International Conference on Computer Networks and Information Technology (ICCNIT ’11), IEEE, Pakistan, July.
[] N. Cvejic and T. Seppanen, “Increasing the capacity of LSB-based audio steganography,” in Proceedings of the 2002 5th IEEE Workshop on Multimedia Signal Processing (MMSP ’02), IEEE, USA, December.
[] Jayaram, Ranganatha, and Anupama, “Information Hiding Using Audio Steganography - A Survey,” The International Journal of Multimedia & Its Applications.
[] D. Gruhl, A. Lu, and W. Bender, “Echo hiding,” in Information Hiding, Lecture Notes in Computer Science, Springer Berlin Heidelberg, Berlin, Heidelberg.
[] D. Xiaoxiao, M. F. Bocko, and Z. Ignjatovic, “Robustness analysis of a digital audio steganographic method based on phase manipulation,” in Proceedings of the 7th International Conference on Signal Processing (ICSP ’04), IEEE, Beijing, China.
[] N. Parab, M. Nathan, and K. T. Talele, “Audio Steganography Using Differential Phase Encoding,” in Technology Systems and Management, Communications in Computer and Information Science, Springer Berlin Heidelberg, Berlin, Heidelberg.
[] R. M. Nugraha, “Implementation of Direct Sequence Spread Spectrum steganography on audio data,” in Proceedings of the 2011 International Conference on Electrical Engineering and Informatics (ICEEI ’11), IEEE, Indonesia, July.
[] H. Matsuoka, “Spread spectrum audio steganography using sub-band phase shifting,” in Proceedings of the 2006 International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP ’06), IEEE, USA, December.
[] G. Prabakaran and R. Bhavani, “A modified secure digital image steganography based on discrete wavelet transform,” in Proceedings of the 2012 International Conference on Computing, Electronics and Electrical Technologies (ICCEET ’12), IEEE, India, March.
[] N. Gupta and N. Sharma, “Dwt and LSB based Audio Steganography,” in Proceedings of the 2014 International Conference on Reliability, Optimization and Information Technology (ICROIT ’14), IEEE, India, February.
[] S. S. Verma, R. Gupta, and G. Shrivastava, “A novel technique for data hiding in audio carrier by using sample comparison in DWT domain,” in Proceedings of the 2014 4th International Conference on Communication Systems and Network Technologies (CSNT ’14), IEEE, India, April.
[] W. Junjie, M. Qian, M. Dongxia, and Y. Jun, “Research for synchronic audio information hiding approach based on DWT domain,” in Proceedings of the 2009 International Conference on E-Business and Information System Security (EBISS ’09), IEEE, China, May.
[] A. Kanhe and G. Aghila, “DCT based audio steganography in voiced and un-voiced frames,” in Proceedings of the 1st International Conference on Informatics and Analytics (ICIA ’16), ACM Press, India, August.
[] Z. Zhou and L. Zhou, “A novel algorithm for robust audio watermarking based on quantification DCT domain,” in Proceedings of the 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP ’07), IEEE, Taiwan, November.
[] W. Yongqi and Y. Yang, “A synchronous audio watermarking algorithm based on chaotic encryption in DCT domain,” in Proceedings of the 2008 International Symposium on Information Science and Engineering (ISISE ’08), IEEE, China, December.
[] S. Roy, N. Sarkar, A. K. Chowdhury, and S. M. A. Iqbal, “An efficient and blind audio watermarking technique in DCT domain,” in Proceedings of the 18th International Conference on Computer and Information Technology (ICCIT ’15), IEEE, Bangladesh, December.
[] B. Ratner, “The correlation coefficient: Its values range between +/, or do they?” Journal of Targeting, Measurement and Analysis for Marketing.
[] L. Min, L. Ting, and H. Yu-jie, “Arnold Transform Based Image Scrambling Method,” in Proceedings of the 3rd International Conference on Multimedia Technology (ICMT ’13), Atlantis Press, Guangzhou, China, November.
[] L. Wu, J. Zhang, W. Deng, and D. He, “Arnold transformation algorithm and anti-Arnold transformation algorithm,” in Proceedings of the 1st International Conference on Information Science and Engineering (ICISE ’09), IEEE, China, December.
[] N. V. Lalitha, S. Rao, and P. V. JayaSree, “DWT - Arnold Transform based audio watermarking,” in Proceedings of the 2013 IEEE Postgraduate Research in Microelectronics and Electronics Asia (PrimeAsia), IEEE, Visakhapatnam, India, December.
[] S. Gaur and V. K. Srivastava, “Robust embedding of improved arnold transformed watermark in digital images using RDWT-SVD,” in Proceedings of the 4th IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC ’16), IEEE, India, December.
[] Z. Zhang, C. Wang, and X. Zhou, “Image watermarking scheme based on Arnold transform and DWT-DCT-SVD,” in Proceedings of the 13th IEEE International Conference on Signal Processing (ICSP ’16), IEEE, China, November.
[] E. C. Cherry, “Some experiments on the recognition of speech, with one and with two ears,” The Journal of the Acoustical Society of America.
[] R. Russell, Cognition: Theory and Practice, Worth Publishers.
[] B. Arons, “A Review of The Cocktail Party Effect,” Journal of The American Voice I/O Society.
[] D. E. Broadbent, “Selective listening to speech,” in Perception and Communication, Pergamon Press.
[] N. I. Durlach and H. S. Colburn, “Binaural Phenomena,” in Hearing, Elsevier.
[] J. Blauert and R. A. Butler, “Spatial hearing: the psychophysics of human sound localization,” The Journal of the Acoustical Society of America.
[] J. Kassebaum, M. F. Tenorio, and C. Schaefers, “The Cocktail Party Problem: Speech/Data Signal Separation Comparison between Backpropagation and SONN,” in Proceedings of the 2nd International Conference on Neural Information Processing Systems, MIT Press Cambridge, Cambridge, USA.
[] S. Haykin and Z. Chen, “The cocktail party problem,” Neural Computation.
[] C. von der Malsburg, “The correlation theory of brain function,” in Models of Neural Networks, Temporal Aspects of Coding and Information Processing in Biological Systems, Springer New York, New York, NY, USA.
[] C. von der Malsburg and W. Schneider, “A neural cocktail-party processor,” Biological Cybernetics.
[] D. L. Wang and G. J. Brown, “Separation of speech from interfering sounds based on oscillatory correlation,” IEEE Transactions on Neural Networks and Learning Systems.
[] G. J. Brown and D. L. Wang, “An oscillatory correlation framework for computational auditory scene analysis,” in Advances in Neural Information Processing Systems 12, MIT Press.
[] B. Sagi, S. C. Nemat-Nasser, R. Kerr, R. Hayek, C. Downing, and R. Hecht-Nielsen, “A biologically motivated solution to the cocktail party problem,” Neural Computation.
[] S. Ao, Z. Luo, N. Zhao, and R. Wang, “Blind source separation based on principal component analysis-independent component analysis for acoustic signal during laser welding process,” in Proceedings of the 2010 International Conference on Digital Manufacturing and Automation (ICDMA ’10), IEEE, China, December.
[] I. T. Jolliffe, Principal Component Analysis, Springer Series in Statistics, Springer, New York, NY, USA.
[] M. E. Wall, A. Rechtsteiner, and L. M. Rocha, “Singular value decomposition and principal component analysis,” in A Practical Approach to Microarray Data Analysis, Kluwer Academic Publishers.
[] J. Wellhausen, “Audio signal separation using independent subspace analysis and improved subspace grouping,” in Proceedings of the 7th Nordic Signal Processing Symposium (NORSIG ’06), IEEE, Iceland, June.
[] Q. Cai and X. Tang, “A digital audio watermarking algorithm based on independent component analysis,” in Proceedings of the 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2016, IEEE, China, October.
[] A. Hyvärinen, “Independent component analysis: recent advances,” Philosophical Transactions of the Royal Society A: Mathematical, Physical & Engineering Sciences.
[] A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Transactions on Neural Networks and Learning Systems.
[] H. Helfrich, Time and Mind II: Information Processing Perspectives, Hogrefe & Huber Publishers, Cambridge, MA, USA.
[] M. F. Casanova and I. Opris, Eds., Recent Advances on the Modular Organization of the Cortex, Springer Netherlands, Dordrecht.
[] C. Ionescu and R. De Keyser, “Exploring the advantages of blind source separation in monitoring input respiratory impedance during apneic events,” Journal of Control Engineering and Applied Informatics.
[] Q. Su, Y. Shen, W. Jian, and P. Xu, “Blind source separation algorithm based on modified bacterial colony chemotaxis,” in Proceedings of the 5th International Conference on Intelligent Control and Information Processing (ICICIP ’14), IEEE, Dalian, China, August.
[] F. A. P. Petitcolas, “Kerckhoffs’ Principle,” in Encyclopedia of Cryptography and Security, H. C. A. van Tilborg and S. Jajodia, Eds., Springer, Boston, MA, USA.
[] C. Paar and J. Pelzl, “Stream Ciphers,” in Understanding Cryptography, Springer, Berlin, Heidelberg, Germany.
[] J. Laroche and M. Dolson, “New phase-vocoder techniques for pitch-shifting, harmonizing and other exotic effects,” in Proceedings of the 1999 Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE, New Paltz, NY, USA.
[] X.-Y. Wang and H. Zhao, “A novel synchronization invariant audio watermarking scheme based on DWT and DCT,” IEEE Transactions on Signal Processing.
... The author uses LSB and most significant bit (MSB) differencing method along with fletcher-munson curvebased method to hide a secret message into a cover audio file and add new samples to the existing audio channel. While in [14], Banik and Bandyopadhyay proposes a key based blind audio steganography method with the discrete wavelet transform (DWT) as well as discrete cosine transform (DCT). To make the system more robust and undetectable, the cocktail party problem has been explored for wrapping stego audio. ...
... In the Table 3, whenever the data transmitted is less, the transmission is faster and has less risk and a one-second audio with 16 or 24-bits cannot be hide in one second at the least bit only. Therefore, we suggested full benefit of the cover by hiding these data in the bit 22, 23, and 24 in case of 24 b/s and in 14,15, and 16 in case of 16 b/s according of audio representation. Consequently, hiding audio with length of 1 sec. in the cover with 1 sec. ...
Article
Full-text available
Although variety in hiding methods used to protect data and information transmitted via channels but still need more robustness and difficulty to improve protection level of the secret messages from hacking or attacking. Moreover, hiding several medias in one media to reduce the transmission time and band of channel is the important task and define as a gain channel. This calls to find other ways to be more complexity in detecting the secret message. Therefore, this paper proposes cryptography/steganography method to hide an audio/voice message (secret message) in two different cover medias: audio and video. This method is use least significant bits (LSB) algorithm combined with 4D grid multi-wing hyper-chaotic (GMWH) system. Shuffling of an audio using key generated by GMWH system and then hiding message using LSB algorithm will provide more difficulty of extracting the original audio by hackers or attackers. According to analyses of obtained results in the receiver using peak signal-to-noise ratio (PSNR)/mean square error (MSE) and sensitivity of encryption key, the proposed method has more security level and robustness. Finally, this work will provide extra security to the mixture base of crypto-steganographic methods.
... Finally, the findings of tests and comparisons with the current technique show that the new scheme has greater hiding ability while maintaining imperceptibility and solidity. In [27] 2018, suggested the main based blind steganography approach on discrete wavelet transform (DWT) as well as discrete cosine transformation (DCT). The image used as a hidden message after it processes with Arnold's transformation. ...
Article
Nowadays, cases of theft of important data both by employees of the organization and outside hackers are increasing day-by-day. So, new methods for information hiding and secret communication are need of today. Steganography is an option for it. Embedding a secret message into other meaningful messages (cover media) without disturbing the features of the cover media is known as steganography [1]. The first method proposed is called the cocktail mixed method. In this method hiding a secret image inside the audio cover. Using Discrete Wavelet Transform (DWT) and Discrete cosine transform (DCT) for the cover file while secret information is an image encrypted with a Henon map method to increase the complexity. The resulting stego signal mixed with other audio to increase complexity. The second method proposed is called the Single Value Decomposition (SVD) method. This method used mixing two matrices of the three matrices results from applying SVD on input binary image and embedding these parameters in the audio signal and the third matrix use as a key for extraction. The DWT is applied to the audio file for separating low frequency from the high-frequency signal. The third method proposed is called a speech hiding method. That hides speech signals in audio signals after compressing the speech using a discrete cosine transform (DCT) and another speech is used as a secret key and applying the same operations on it. The cover audio signal is segmented into frames and transform into the frequency domain using Fast Fourier Transform (FFT). Finally, the results proved the quality of the proposed methods and their strength against attacks. The mean secured error, signal to noise ratio, and accuracy for retrieval images after embedding it gave good results.
... The principle is to hide secret information without being noticed by a third party by modifying redundant data in digital media or protocols, such that the carrier's use attributes are not changed during transmission. By this means, a secret message can be embedded into cover objects and transmitted through public channels [2,3]. At present, it is widely used in transmission media such as voice, image, video, and text. ...
Article
The rapid advance and popularization of VoIP (Voice over IP) has also brought security issues. VoIP-based secure voice communication has two sides. Legitimate users can embed secret voice in the carrier and transmit it safely through the channel, preventing privacy leakage and ensuring data security. Illegal users, however, can use VoIP voice communication to hide and transmit illicit information, leading to security incidents. For this reason, VoIP-based steganography and steganalysis have gradually become research hotspots in information security in recent years. Depending on where the secret information is embedded, they can be divided into two categories: methods based on the voice payload and methods based on the protocol. The former regard the voice payload as the carrier and perform steganography or steganalysis on the payload. They can be subdivided into methods based on the FCB (fixed codebook), the LPC (linear prediction coefficients), and the ACB (adaptive codebook). The latter use protocols as the carrier and perform steganography or steganalysis on certain fields of the protocol header and on the timing of voice packets. They can be divided into methods based on the network layer, the transport layer, and the application layer. This paper classifies recent research results on protocol- and payload-based steganography and steganalysis and summarizes their characteristics, advantages, and disadvantages. It also analyzes directions for future research, providing guidance for researchers in related fields.
... Steganography, which embeds a secret message into host signals, is an important means of secure communication. By this means, the secret message is embedded into cover objects and transmitted through public channels [3,35]. Steganography has attracted remarkable attention from researchers due to its potential applications in multimedia security. ...
Article
Speech is one of the essential means of communication, and the study of speech steganography is of great value to information security. To improve the imperceptibility and robustness of speech steganography, the characteristics of speech signals should be fully taken into account. In this paper, a robust speech steganographic scheme based on Singular Value Decomposition (SVD) and the Modified Discrete Cosine Transform (MDCT) is proposed. First, a Voice Activity Detector (VAD) detects voiced frames in the speech signal. The MDCT with a Kaiser-Bessel-derived (KBD) window is then applied to each frame. Next, MDCT coefficients are selected from a certain frequency range and divided into a pair of segments. The largest singular values of the two paired segments are modified according to their difference to embed the secret message. The thresholds are adjusted adaptively according to the largest singular values. Extensive experiments compare the proposed method with three other methods in terms of imperceptibility, robustness, capacity, and security. The experimental results show that, under the simulation parameters β = 320, Nk = 58, fl = 100 Hz, fh = 3 kHz, and α = 0.61, the proposed method has striking advantages in resisting common robustness attacks and state-of-the-art steganalysis attacks while maintaining good imperceptibility.
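The paired-segment idea can be sketched as follows: the relative order of the two segments' largest singular values carries one bit. This is a minimal sketch under stated assumptions. It works on arbitrary 64-coefficient segments reshaped to 8×8 and leaves out the VAD/MDCT/KBD front end. The symmetric gap rule and `alpha` are hypothetical; the abstract does not give the exact modification rule. The sketch relies on the fact that scaling a matrix by a positive constant scales all of its singular values by that constant.

```python
import numpy as np

def largest_sv(seg, shape=(8, 8)):
    """Largest singular value of a coefficient segment reshaped to a matrix."""
    return np.linalg.svd(np.reshape(seg, shape), compute_uv=False)[0]

def embed_bit(seg_a, seg_b, bit, shape=(8, 8), alpha=0.2):
    """Force sigma_a > sigma_b for bit 1 and sigma_a < sigma_b for bit 0."""
    sa, sb = largest_sv(seg_a, shape), largest_sv(seg_b, shape)
    mean = (sa + sb) / 2
    gap = alpha * mean                      # adaptive threshold (hypothetical rule)
    target_a = mean + gap / 2 if bit else mean - gap / 2
    target_b = mean - gap / 2 if bit else mean + gap / 2
    # scaling a segment by c > 0 scales its singular values by c
    return seg_a * (target_a / sa), seg_b * (target_b / sb)

def extract_bit(seg_a, seg_b, shape=(8, 8)):
    """Blind extraction: compare the two largest singular values."""
    return int(largest_sv(seg_a, shape) > largest_sv(seg_b, shape))
```

Extraction needs only the stego segments. The gap of `alpha * mean` gives a margin that small perturbations must overcome before the bit flips.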
... The cocktail party effect is utilized in an audio steganography system in which a blind key concept is applied to resist attacks on the system [10], improving the security of the information. ...
... Complete complementary arrays are used in the encoding stage to benefit from their impulse-like autocorrelation sum. In [4], a watermarking algorithm based on Complete Complementary Codes (CCC) and the Fast Walsh-Hadamard Transform (FWHT) was introduced. The algorithm applies the CCC technique to the watermark and the FWHT to the original image. ...
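The FWHT mentioned above can be computed with a butterfly loop in O(n log n) operations, similar in structure to the FFT. The following is a minimal, unnormalized sketch in natural (Sylvester/Hadamard) order. It is not the implementation used in [4]. For an image, the same 1-D transform would be applied to every row and then to every column.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order.
    The input length must be a power of two."""
    a = np.asarray(x, dtype=float).copy()
    n = a.size
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):        # each block of size 2h
            for j in range(i, i + h):       # butterfly: (u, v) -> (u + v, u - v)
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v
        h *= 2
    return a

def ifwht(y):
    """Inverse transform: H @ H = n * I, so apply fwht again and divide by n."""
    return fwht(y) / len(y)
```

The result equals multiplication by the Sylvester-construction Hadamard matrix, and applying the transform twice then dividing by `n` recovers the input.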
Article
The recent revolution of the Internet as a collaborative medium has opened the door for people who want to share their work. However, this may cause serious problems for privacy and copyright protection. Steganography is a powerful tool for protecting important data during transmission: it hides secret information such as text, images, or audio behind a cover file. In this study, a new robust audio steganography technique is adopted that uses optimum two-dimensional Complete Complementary Codes (CCC) to encode colour image data and obtain two differently encoded versions of it. These two versions are hidden in the DWT coefficients of the two channels of a stereo audio signal. The embedding locations are determined by a random sequence from a 2-D chaotic map. Complete Complementary Codes are families of spread-spectrum sequence sets with ideal auto- and cross-correlation properties. For this reason they have found many applications across science, with the broadest possibilities in telecommunications. Various attacks are applied to the host audio signals, and the simulation results show high robustness and capacity with good quality of the extracted image.
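The "ideal auto and cross-correlation properties" of complete complementary codes can be checked numerically on the smallest case. That case is two flocks of two binary sequences, built from a Golay complementary pair `(a, b)` and its mate `(c, d) = (reverse(b), -reverse(a))`. This illustrates only the correlation property. It is not the optimum two-dimensional CCC construction used in the study.

```python
import numpy as np

def golay_pair(m):
    """Binary Golay complementary pair of length 2**m (recursive concatenation)."""
    a, b = np.array([1]), np.array([1])
    for _ in range(m):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def acorr_sum(x, y):
    """Sum of the aperiodic autocorrelations of the two sequences in a flock."""
    return np.correlate(x, x, "full") + np.correlate(y, y, "full")

def xcorr_sum(x1, y1, x2, y2):
    """Sum of the aperiodic cross-correlations between two flocks."""
    return np.correlate(x1, x2, "full") + np.correlate(y1, y2, "full")

a, b = golay_pair(3)          # flock 1: two length-8 sequences
c, d = b[::-1], -a[::-1]      # flock 2: the mate pair
```

Each flock's autocorrelation sum is a single spike of height `2N` at zero lag and zero elsewhere. The cross-correlation sum between the two flocks is zero at every lag. These are the properties that make such codes attractive for spreading hidden data.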
Article
The purpose of audio steganography is to hide confidential data in digital audio. The objective of the proposed method is to embed a secret image into an audio signal. To this end, it performs optimal audio steganography using Adaptive Cat Swarm Optimization and secure XOR (S-XOR) encryption with a dynamic key. First, the Discrete Wavelet Transform (DWT) decomposes the audio signal into two sub-bands. The novelty of the technique is that the embedding position is selected optimally by the Adaptive Cat Swarm Optimization (ACSO) algorithm. To improve the security of data hiding, S-XOR encryption with a dynamic key is used. Before encryption, the secret image is compressed with Modified Huffman Encoding (MHE), which reduces the storage space required. The quality of the stego audio is analyzed using the mean square error (MSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR). Experimental results show that the method maintains embedding quality with a high PSNR of 52.67 dB. The technique is implemented in MATLAB.
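The first step, splitting audio into two DWT sub-bands, can be sketched with a one-level Haar wavelet. The abstract does not name the wavelet, so Haar is an assumption chosen for simplicity. In practice a library such as PyWavelets (`pywt.dwt`) would usually be used instead of a hand-written transform.

```python
import numpy as np

def haar_dwt(x):
    """One-level orthonormal Haar DWT -> (approximation, detail) sub-bands."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        x = np.append(x, x[-1])            # pad odd lengths by repeating the last sample
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)     # low-frequency sub-band
    detail = (even - odd) / np.sqrt(2)     # high-frequency sub-band
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt (returns the padded length for odd inputs)."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out
```

Because the transform is orthonormal, signal energy is split between the two sub-bands without loss. That is one reason embedding changes in either sub-band can be measured directly against MSE/SNR targets.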
Conference Paper
In this paper, an improved digital watermarking scheme combining multi-level discrete cosine transform (DCT), discrete wavelet transform (DWT), and singular value decomposition (SVD) is proposed. To improve the security of the watermark, the watermark is scrambled with the Arnold transform before embedding. The proposed scheme is shown to be robust, and its performance is evaluated in terms of normalized correlation (NC) and peak signal-to-noise ratio (PSNR). Its robustness is also evaluated under different attacks, including salt-and-pepper noise and Gaussian noise. Experiments are also conducted to verify the effect of the false-positive problem that exists in most SVD-based watermarking schemes.
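The Arnold transform is Arnold's cat map, `(x, y) -> (x + y, x + 2y) mod N`, applied to pixel coordinates of a square image. The paper this page describes also uses it to preprocess its secret image. Below is a minimal sketch. The iteration count plays the role of a key, and that usage is an assumption about how the scrambling is parameterized.

```python
import numpy as np

def arnold(img, iterations=1):
    """Scramble a square image with Arnold's cat map (x, y) -> (x+y, x+2y) mod n."""
    n = img.shape[0]
    if img.shape[0] != img.shape[1]:
        raise ValueError("Arnold's cat map needs a square image")
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    out = img
    for _ in range(iterations):
        scrambled = np.empty_like(out)
        scrambled[(x + y) % n, (x + 2 * y) % n] = out[x, y]   # move each pixel
        out = scrambled
    return out

def inverse_arnold(img, iterations=1):
    """Undo arnold(): read each pixel back from its scrambled position."""
    n = img.shape[0]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    out = img
    for _ in range(iterations):
        restored = np.empty_like(out)
        restored[x, y] = out[(x + y) % n, (x + 2 * y) % n]
        out = restored
    return out
```

The map matrix `[[1, 1], [1, 2]]` has determinant 1, so it is a bijection on the pixel grid. No pixel is lost, and the scrambling is periodic in the number of iterations.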
Article
Today’s large demand for internet applications requires data to be transmitted securely. Data transmission over public communication systems is not secure because of interception and improper manipulation by eavesdroppers. An attractive solution to this problem is steganography. Steganography is the art and science of writing hidden messages in such a way that no one apart from the sender and the intended recipient suspects the existence of the message, a form of security through obscurity. Audio steganography hides the existence of secret information by concealing it in another medium, such as an audio file. In this paper we mainly discuss different types of audio steganographic methods and their advantages and disadvantages.
Conference Paper
In this paper, a robust audio steganography method based on voiced and unvoiced frames is proposed. The key idea is to change the magnitudes of the Discrete Cosine Transform (DCT) coefficients of voiced and unvoiced frames separately. By exploiting the different characteristics of voiced and unvoiced frames, more bits can be embedded in unvoiced frames. The proposed method achieves a high imperceptibility of 43.9 dB, measured as signal-to-noise ratio, for a payload of 1.08 kbps. The paper characterizes the tradeoff between SNR and embedding bit rate, which helps in choosing the imperceptibility requirement. The experimental results show that the method has a high capacity of 240 bps to 1800 bps and is robust against common signal processing attacks such as requantization, resampling, and additive white Gaussian noise.
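The voiced/unvoiced idea can be sketched as two steps. First, a frame is classified; here a zero-crossing-rate test is used, which is an assumption, since the abstract does not say how frames are classified. Second, bits are embedded by quantizing DCT coefficient magnitudes onto even or odd multiples of a step, with more bits in unvoiced frames. `FRAME`, `DELTA`, `START`, the ZCR threshold, and the 4-versus-1 bit allocation are all hypothetical values, not the paper's.

```python
import numpy as np

FRAME, DELTA, START = 256, 0.05, 32    # hypothetical frame size, step, first coefficient

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    m = np.sqrt(2 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2)
    return m

D = dct_matrix(FRAME)

def bits_for(frame, zcr_thresh=0.3):
    """Crude voiced/unvoiced decision: noisy (unvoiced) frames cross zero often."""
    zcr = np.mean(np.signbit(frame[1:]) != np.signbit(frame[:-1]))
    return 4 if zcr > zcr_thresh else 1    # unvoiced frames carry more bits

def embed_frame(frame, bits):
    """Quantize |DCT| magnitudes: even multiples of DELTA -> 0, odd multiples -> 1."""
    c = D @ frame
    idx = START + np.arange(len(bits))
    b = np.asarray(bits, dtype=float)
    mag = np.abs(c[idx])
    sign = np.where(c[idx] < 0, -1.0, 1.0)  # keep the original sign
    c[idx] = sign * (2 * DELTA * np.round((mag - b * DELTA) / (2 * DELTA)) + b * DELTA)
    return D.T @ c

def extract_frame(frame, n_bits):
    """Read the bits back from the parity of the quantized magnitudes."""
    mag = np.abs((D @ frame)[START:START + n_bits])
    return (np.round(mag / DELTA).astype(int) % 2).tolist()
```

In this sketch the extractor is told how many bits each frame holds. A real scheme needs a classification that stays stable after embedding and after attacks.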
Article
Most blind source separation (BSS) algorithms use single-point optimization methods, which suffer from slow convergence, poor separation precision, and a tendency to get trapped in local optima. To address these disadvantages, Chen recently proposed a multiple-point optimization algorithm for BSS named DPBCC, which overcomes them to a certain extent. However, DPBCC escapes local convergence through a random perturbation strategy applied to superior bacteria, and this cannot guarantee an ergodic search of the domain after perturbation. Its global convergence therefore still needs improvement. This paper proposes a modified bacterial colony chemotaxis algorithm (CBCC) for BSS. CBCC combines chaos search with neighborhood random search to achieve an ergodic search of the entire domain, which better avoids local convergence and further improves convergence speed and separation precision. Computer simulations of a kurtosis-based BSS algorithm under the instantaneous linear model validate the superiority of CBCC over existing algorithms.
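Kurtosis-based BSS under the instantaneous linear model, which is the cocktail-party setting, can be illustrated for two sources. This sketch shows the kind of objective such optimizers search. It swaps the paper's CBCC optimizer for a plain grid search over a single rotation angle. The mixture is first whitened. The rotation that maximizes the summed absolute excess kurtosis is then taken as the separating one. The source signals and mixing matrix in the usage check are made up for illustration.

```python
import numpy as np

def kurtosis(y):
    """Excess kurtosis of a 1-D signal (0 for Gaussian data)."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

def whiten(x):
    """Center and decorrelate the mixtures so their covariance becomes the identity."""
    x = x - x.mean(axis=1, keepdims=True)
    vals, vecs = np.linalg.eigh(np.cov(x))
    return vecs @ np.diag(vals ** -0.5) @ vecs.T @ x

def separate_2x2(x, steps=360):
    """Grid-search the rotation of whitened data that maximizes sum |kurtosis|.
    (A stand-in for a global optimizer such as CBCC.)"""
    z = whiten(x)
    best, best_y = -np.inf, None
    # contrast is periodic in pi/2 (sign/permutation ambiguity)
    for theta in np.linspace(0, np.pi / 2, steps, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        y = np.array([[c, s], [-s, c]]) @ z
        score = abs(kurtosis(y[0])) + abs(kurtosis(y[1]))
        if score > best:
            best, best_y = score, y
    return best_y
```

A grid search is exhaustive for a single angle. With more sources the search space grows quickly, which is where the multi-point optimizers discussed above become relevant. The recovered sources are determined only up to order, sign, and scale.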