
A Secure Echo-hiding Audio Watermarking Method based on Improved PN Sequence and Robust Principal Component Analysis


Abstract and Figures

Echo-hiding has been widely studied for audio watermarking. This study proposes a more secure echo-hiding method based on a modified pseudo-noise (PN) sequence and robust principal component analysis (RPCA). In the proposed method, RPCA is used to decompose the original audio signal into low-rank and sparse parts, and then a pair of opposite modified PN sequences is employed to embed watermarks. The modified PN sequence improves the robustness of watermark detection by providing additional correlation peaks. Meanwhile, benefiting from the RPCA and the opposite PN sequences, the security of the proposed method is improved since watermarks cannot be detected from the whole signal even if the PN sequence is known, which is a clear improvement over previous PN-based echo-hiding methods. In the watermark detection process, the authors make use of the low-rank and sparse characteristics of the watermarked signal to detect watermarks from the low-rank and sparse parts, respectively. Based on this basic framework, they also propose a multi-bit embedding scheme, which doubles the embedding capacity compared with previous PN-based echo-hiding methods. The proposed method was evaluated with respect to inaudibility, security, and robustness. The experimental results verified the effectiveness of the proposed method.
IET Signal Processing
Research Article
ISSN 1751-9675
Received on 15th August 2019
Revised 26th December 2019
Accepted on 18th February 2020
E-First on 27th March 2020
doi: 10.1049/iet-spr.2019.0376
www.ietdl.org
Shengbei Wang1, Chao Wang1, Weitao Yuan1, Lin Wang2, Jianming Wang1
1Tianjin Key Laboratory of Autonomous Intelligence Technology and Systems, Tianjin Polytechnic University, 300387 Tianjin, People's Republic
of China
2Techfantasy. Co. Ltd., 300387 Tianjin, People's Republic of China
E-mail: wangjianming@tjpu.edu.cn
1 Introduction
With the rapid development of digital media and network technology, audio transmission has become increasingly convenient. However, illegal dissemination that ignores copyright seriously harms the interests of audio authors, and copyright protection has become a public concern. To address this problem, researchers proposed adding tags to the audio to prove ownership, which led to audio watermarking technology [1, 2]. After many years of exploration, watermarking technology has improved significantly.
Audio watermarking [3] is considered an effective
technique for protecting audio against unauthorised operations. In
general, there are several requirements for audio watermarking,
e.g. inaudibility, blindness, robustness, capacity, and security [3–5].
Inaudibility requires that the embedded watermarks should not
degrade the audio quality [6]. Blindness suggests that the
watermarks can be detected without the original audio. At present,
more and more scholars pay attention to blind watermarking.
Robustness guarantees that the embedded watermarks cannot be
destroyed by allowable audio operations such as compression.
Embedding capacity is also necessary to evaluate the watermarking
method since the copyright of the audio signal can be better
protected when more watermarks are embedded. The last and most
recent concern on watermarking is that the copyright information
contained in the watermarked signal should not be easily
discovered by the attackers, which is called security. In general, the
five requirements described above are the main criteria for
evaluating a watermarking method [7].
In the past, many audio watermarking methods have been
proposed, e.g. support vector [8, 9], phase coding [10], spread
spectrum [11, 12], techniques based on masking [13, 14],
patchwork [15, 16], echo-hiding [17–23], and so on. As one typical
audio watermarking method, echo-hiding has a simple and easy-to-
operate embedding and detection process. Besides, the watermark
detection of echo-hiding does not need the original signal (i.e. it is
a blind method). The echo-hiding method was first proposed by
Gruhl et al. [17], who described how to embed the watermarks
using one backward echo kernel and how to detect it using
cepstrum operations. The echo kernel is a critical factor in echo-hiding and greatly affects the performance of the watermarking method. Therefore, many echo-hiding algorithms have been proposed to design more effective kernels, such as the dual kernel [24–26] and the backward–forward echo kernels [27].
Although the above echo-hiding methods have a simple embedding and detection process, they have a fatal security flaw: the watermarks can be easily obtained with a cepstrum analysis of the watermarked signal. To overcome this drawback, more secure algorithms [18, 19] using a pseudo-noise (PN) sequence were proposed. The PN sequence is used as a secret key to embed multiple echoes into an original signal. Owing to the very small amplitude of each echo, there is no obvious peak in the power spectrum after cepstrum analysis, that is, its power spectrum becomes nearly smooth in the mean time sense. Therefore, it is impossible to detect the watermarks directly through cepstrum analysis in these watermarking schemes. To obtain the watermarks, the corresponding PN sequence must be used for a correlation operation after the cepstrum analysis, which greatly increases the security compared with the basic echo-hiding methods [17,
24–27]. In [19], the PN sequence-based echo-hiding method was
further improved. During the watermark detection process, the
authors used real cepstrum instead of the complex cepstrum to
make the cepstrum peak more obvious. In recent years, some
modified PN sequences have been proposed for echo-hiding
methods to enhance robustness and inaudibility performance. In
[28], the PN sequence was modified to a new sequence denoted by
q(n) and three peaks were obtained after correlation, which
improved the accuracy of watermark detection. In [21], the PN
sequence was improved and more large peaks were produced after
correlation. These two schemes improved the performance of
watermarking methods by modifying the PN sequence and taking
advantage of the correlation operation.
However, the algorithms mentioned above still have the
following shortcomings:
IET Signal Process., 2020, Vol. 14 Iss. 4, pp. 229-242
© The Institution of Engineering and Technology 2020
229
(i) Low security: In these methods, the PN sequence is used as a security key for watermark detection. If the PN sequence is leaked, the watermarks will be easily detected by the attackers. Furthermore, many methods, e.g. [21, 28], try to obtain better robustness by modifying the PN sequence. However, such modifications also make the PN sequence more regular, which reduces its complexity and makes it easier to crack.
(ii) Low capacity: For most PN sequence-based echo-hiding
methods, the PN sequence should be relatively long to achieve
good inaudibility and robustness performance. However, if the
length of the PN sequence is guaranteed to be long enough, the
number of watermarks that can be embedded will be reduced.
To solve the above issues, this paper proposes a more secure echo-hiding method based on an improved PN sequence and robust principal component analysis (RPCA) [29]. In our view, the root cause of the security problem in PN-based echo-hiding methods is that these methods directly add the echoes to the whole original signal. Therefore, watermarks can be easily detected after cepstrum and correlation analysis of the watermarked signal
when the PN sequence is known. In this paper, the RPCA is
employed to improve the PN-based echo-hiding method. The
original signal is first decomposed into two parts, i.e. low-rank and
sparse parts, using RPCA. Watermarks are separately embedded
into them using a pair of opposite PN sequences. Accordingly, the
cepstrum and correlation analysis of whole watermarked signal can
no longer produce any obvious peaks for watermark detection even
using the correct PN sequences, since there is an opposite effect
between the correlation results of the two parts in the watermarked
signal (low-rank and sparse parts). To correctly detect the
watermarks, the proposed method takes advantage of the low-rank
and sparse characteristics of the embedded echoes. Watermarks can
be separately detected from the low-rank and sparse parts using the
correct decomposition parameter of RPCA and the corresponding
PN sequence. In particular, the PN sequence employed above is
designed by considering the correlation property of the PN
sequence and it provides two extra correlation peaks to effectively
improve the robustness of watermark detection. Based on the
above framework, this paper also implements a multi-bit
embedding scheme which achieves a doubled embedding capacity
compared with the previous PN-based echo-hiding methods.
This paper is organised as follows. Section 2 reviews typical
echo-hiding methods. Section 3 describes our proposed method
based on the improved PN sequence and the RPCA. Section 4
evaluates the inaudibility, security, and robustness of the proposed
method with a series of experiments and compares it with the
previous echo-hiding method and the PN sequence-based echo-
hiding method. In the last section, we give a summary of our work.
2 Review of typical echo-hiding
The widely accepted model of echo-hiding is
$$y(n) = x(n) \ast h(n), \tag{1}$$
where $x(n)$ is the original audio signal, $h(n)$ is the echo kernel, $y(n)$ is the watermarked signal, and the operation symbol $\ast$ stands for convolution. The backward echo kernel is defined as
$$h(n) = \delta(n) + \alpha\,\delta(n-d), \tag{2}$$
where $\delta(\cdot)$ is a Dirac delta function, $\alpha$ denotes the attenuation amplitude of the echo, and $d$ is the delay of the echo. To improve the security, the PN sequence-based echo kernel [18, 19] was proposed, which is given by
$$h(n) = \delta(n) + \alpha \sum_{i=0}^{L-1} p(i)\,\delta(n-d-i), \tag{3}$$
where $p(i) \in \{-1, +1\}$, $0 \le i \le L-1$, is the PN sequence of length $L$. The pulse representation of PN-based echo kernel is shown in Fig. 1. In the watermark detection process, the watermarked signal is analysed by real cepstrum analysis of (1), i.e.
$$c_y(n) = c_x(n) + c_h(n), \tag{4}$$
where $c_x(n) = \mathcal{F}^{-1}\{\log|\mathcal{F}(x(n))|\}$, $c_h(n) = \mathcal{F}^{-1}\{\log|\mathcal{F}(h(n))|\}$, $|\cdot|$ is the absolute value operation, and $\mathcal{F}(\cdot)$ and $\mathcal{F}^{-1}(\cdot)$ are Fourier transform and inverse Fourier transform, respectively. For PN-based echo-hiding, the $c_h(n)$ can be calculated in more detail [19]
$$c_h(n) \approx \frac{\alpha}{2}\bigl(p(-n+d) + p(n-d)\bigr). \tag{5}$$
The PN sequence is a necessary condition for obtaining the watermarks. The cross-correlation operation of (4) is carried out with the PN sequence
$$\begin{aligned} d(\tau) &= E\bigl(c_y(n)\,p(n-\tau)\bigr) \\ &\approx E\bigl(c_x(n)\,p(n-\tau)\bigr) + \frac{\alpha}{2}E\bigl(p(-n+d)\,p(n-\tau)\bigr) + \frac{\alpha}{2}E\bigl(p(n-d)\,p(n-\tau)\bigr), \end{aligned} \tag{6}$$
where $E(\cdot)$ calculates the mathematical expectation. We know from (6) that $(\alpha/2)E(p(-n+d)\,p(n-\tau))$ is small and negligible, and when $\tau = d$, the term $(\alpha/2)E(p(n-d)\,p(n-\tau))$ has a maximum value of $\alpha/2$. Hence, we can detect the watermarks by detecting the maximum value (peak value).
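The model in (1), the PN-based kernel in (3), and the cepstral relation in (4) can be illustrated with a small numerical sketch. This is only an illustration under stated assumptions: the frame is Gaussian noise rather than audio, the PN sequence is a random ±1 vector rather than a true PN generator output, and the function names `pn_echo_kernel` and `real_cepstrum` are hypothetical helpers, not part of the original method description.

```python
import numpy as np

def pn_echo_kernel(p, d, alpha):
    """Kernel of (3): h(n) = delta(n) + alpha * sum_i p(i) * delta(n - d - i)."""
    p = np.asarray(p, dtype=float)
    h = np.zeros(d + len(p))
    h[0] = 1.0                      # direct path delta(n)
    h[d:d + len(p)] += alpha * p    # PN-weighted echoes starting at delay d
    return h

def real_cepstrum(x, n_fft):
    """Real cepstrum F^{-1}{log|F(x)|}, with x zero-padded to n_fft samples."""
    spectrum = np.abs(np.fft.fft(x, n_fft))
    return np.real(np.fft.ifft(np.log(spectrum + 1e-12)))

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)           # assumption: noise stand-in for one audio frame
p = rng.choice([-1.0, 1.0], size=63)    # assumption: random +/-1 stand-in for the PN key
d, alpha = 20, 0.01                     # illustrative delay and echo amplitude

h = pn_echo_kernel(p, d, alpha)
y = np.convolve(x, h)                   # watermarked frame, model (1)

# With n_fft = len(y), the linear convolution becomes a product of spectra,
# so the log-magnitudes (and hence the real cepstra) add as in (4).
n_fft = len(y)
c_x = real_cepstrum(x, n_fft)
c_h = real_cepstrum(h, n_fft)
c_y = real_cepstrum(y, n_fft)
print(np.max(np.abs(c_y - (c_x + c_h))))   # numerically negligible difference
```

Detection as in (6) would then correlate `c_y` with shifted copies of `p`; that step is left out here because its outcome depends on the host signal.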
To further improve the above algorithms, some modified PN sequences were proposed. In [28], a modified PN sequence was proposed, which is defined by
$$q(i) = \begin{cases} p(i), & \text{if } i = 0 \text{ or } i = L-1 \\ (-1)^{y(i)}\,p(i), & \text{if } 0 < i < L-1, \end{cases} \tag{7}$$
where
$$y(i) = \operatorname{fix}\!\left(\frac{q(i-1) + p(i-1) + p(i) + p(i+1)}{4}\right) \tag{8}$$
and the $\operatorname{fix}(\cdot)$ function rounds its argument to the nearest integer in the direction of zero. The correlation operation of $q(i)$ is
$$r(\tau) = E\bigl(q(i-d)\,q(i-\tau)\bigr). \tag{9}$$
The calculation result is shown in the top two panels of Fig. 2. There are three peaks at $\tau = d$, $\tau = d+1$, and $\tau = d-1$. By detecting these three peaks, the robustness of watermark detection is effectively improved.
Fig. 1 Echo kernels based on PN sequence

In [21], the PN sequence is modified to
$$\bar{p}(i) = [\underbrace{p(0), p(0), \ldots, p(0)}_{n_r\ \text{elements}}, \underbrace{p(1), p(1), \ldots, p(1)}_{n_r\ \text{elements}}, \ldots, \underbrace{p(L-1), p(L-1), \ldots, p(L-1)}_{n_r\ \text{elements}}], \tag{10}$$
where $n_r$ is defined as the number of repetitions. The correlation result of this sequence can be seen in the bottom two panels of Fig. 2. We can obtain $(2n_r - 1)$ peaks for watermark detection. These peaks are located at
$$\tau = \{(d-(n_r-1)), \ldots, (d-1), d, (d+1), \ldots, (d+(n_r-1))\}.$$
The above two methods indeed improve the robustness of
watermark detection compared with the original PN sequence.
However, such modifications to the PN sequence reduce its
complexity and therefore make it easier to crack, i.e. the
watermarking method will be less secure. In addition, to ensure the
performance of the watermarking method, the PN sequence should
be as long as possible. However, this will reduce the embedding
capacity of the echo-hiding method. In the next section, we will
introduce the proposed echo-hiding method based on the improved
PN sequence and RPCA.
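As a rough illustration of why the repetition-based sequence in (10) yields $(2n_r - 1)$ correlation peaks, the sketch below repeats each element of a ±1 sequence $n_r$ times and inspects its normalised autocorrelation around lag 0 (the delay $d$ only shifts the peak positions). The random ±1 sequence is an assumption standing in for a real PN sequence, and `repeat_pn` and `normalised_autocorr` are hypothetical helper names.

```python
import numpy as np

def repeat_pn(p, n_r):
    """Sequence of (10): every element of p is repeated n_r times."""
    return np.repeat(np.asarray(p, dtype=float), n_r)

def normalised_autocorr(s):
    """Aperiodic autocorrelation of s divided by len(s); index len(s)-1 is lag 0."""
    return np.correlate(s, s, mode="full") / len(s)

rng = np.random.default_rng(1)
p = rng.choice([-1.0, 1.0], size=1023)   # assumption: random +/-1 stand-in, not a true m-sequence
n_r = 3
p_bar = repeat_pn(p, n_r)

r = normalised_autocorr(p_bar)
centre = len(p_bar) - 1                              # index of lag 0
side_peaks = r[centre - (n_r - 1): centre + n_r]     # lags -(n_r-1) ... +(n_r-1)
print(np.round(side_peaks, 2))
```

For $n_r = 3$ the five values around the centre form a triangle, close to 1/3, 2/3, 1, 2/3, 1/3, because neighbouring samples inside a repeated block are identical. This regular structure is also what makes the sequence easier to guess.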
3 Proposed watermarking scheme
3.1 RPCA decomposition
Although the PN sequence increases the security of the echo-hiding
method, the watermarks can be easily obtained by the attackers
when the PN sequence is known. The key reason for this is that
most of the previous echo-hiding methods (including the PN
sequence-based methods) directly add the echoes to the original
signal. Accordingly, the cepstrum of the watermarked signal can be regarded as the sum of the cepstrum of the original signal and that of the echo kernel, which makes watermark detection easy for attackers. We reasoned that if the echoes were not added directly to the original signal, the watermarks would not be easily
obtained even if the PN sequence is known. As a result, the
security of the typical echo-hiding methods and the PN-based
echo-hiding methods will be further improved.
This paper introduces RPCA [29, 30] to the original PN-based
echo-hiding methods. In the proposed method, the original audio
signal is first decomposed into two parts, i.e. low-rank and sparse
parts, and then watermarks are separately embedded into them
using the proposed kernels. That is, the echoes are not added
directly to the original signal but to its sub-signals.
The process of audio decomposition based on RPCA is as follows. Since the audio signal (denoted as $x(n)$) is a one-dimensional signal, we first compute the short-time Fourier transform (STFT) representation of $x(n)$ in the time–frequency (TF) domain. The obtained TF representation is denoted by $M \in \mathbb{C}^{F \times T}$, where $F$ is the number of frequency bins and $T$ is the number of time bins. Note that to obtain a good decomposition effect, the window size and hop size (half of the window size) change with frame lengths, to ensure that the obtained TF representation is (approximately) a square matrix. For example, for a frame length of 5512 (i.e. 8 bps for a 44.1 kHz sampled signal), we use a window size of 144 and a hop size of 72. The obtained $M$ is nearly a square matrix, where $F = 73$ and $T = 75$. The window size and hop size for other frame lengths can be calculated in the same way. The magnitude and phase spectrograms of $M$ are $M_0 \in \mathbb{R}^{F \times T}$ and $P \in \mathbb{R}^{F \times T}$, respectively. The decomposition operation of the RPCA is performed on $M_0$ by solving the following convex optimisation problem using principal component pursuit:
$$\begin{aligned} &\text{minimise} && \|L_0\|_* + \lambda \|S_0\|_1 \\ &\text{subject to} && M_0 = L_0 + S_0, \end{aligned} \tag{11}$$
where $L_0$ is a low-rank matrix, $S_0$ is a sparse matrix, and $\lambda$ ($\lambda > 0$) is a positive parameter that controls the decomposition balance between the low-rank and the sparse parts. $\|\cdot\|_*$ is the nuclear norm and $\|\cdot\|_1$ is the 1-norm. In (11), $\|L_0\|_*$ and $\|S_0\|_1$ are separately defined as
$$\|L_0\|_* = \sum_{i=1}^{\min(f,t)} \sigma_i, \tag{12}$$
$$\|S_0\|_1 = \sum_{f,t} |S_{f,t}|, \tag{13}$$
where $\sigma_i$ is the $i$th singular value of $L_0$, $f$ is the index of frequency bin ($1 < f < F$), and $t$ is the index of time bins ($1 < t < T$). When considering the constraint condition $M_0 = L_0 + S_0$, the objective function in (11) can be set as follows
$$\min_{L_0, S_0} \|L_0\|_* + \lambda \|S_0\|_1 + \frac{\mu}{2}\|M_0 - L_0 - S_0\|_F^2, \tag{14}$$
where $\mu$ is a positive scalar and $\|\cdot\|_F$ is the Frobenius norm. $L_0$ and $S_0$ can be obtained by solving (14) with the proximal gradient method.
The obtained $L_0$ and $S_0$ are then synthesised with the previously obtained phase spectrogram $P$ and we can obtain the time-domain low-rank signal $x_l(n)$ and sparse signal $x_s(n)$, respectively, using inverse STFT. Then the $x(n)$ is decomposed as
$$x(n) = x_l(n) + x_s(n). \tag{15}$$
Watermarks are embedded into $x_l(n)$ and $x_s(n)$, respectively. Note that, since different $\lambda$ will produce different decomposition results, the $\lambda$ can be also used as a secret key for the proposed method. In the next section, we will describe how the proposed method improves security during the watermark embedding and detection process and how to increase the embedding capacity of the proposed method with a multi-bit embedding scheme.
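The low-rank plus sparse split in (11) can be sketched on a small matrix as follows. Note the assumptions: the text solves (14) with a proximal gradient method, whereas this sketch swaps in the commonly used augmented-Lagrangian (ADMM-style) iteration for principal component pursuit; the default $\lambda = 1/\sqrt{\max(F, T)}$ and the choice of $\mu$ are conventional values from the RPCA literature, not the settings used in the experiments; and the test matrix is synthetic rather than a magnitude spectrogram. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise shrinkage, the proximal operator of the 1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value shrinkage, the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def rpca_pcp(M, lam=None, n_iter=500, tol=1e-7):
    """Split M into low-rank L and sparse S with M ~= L + S (ADMM sketch)."""
    F, T = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(F, T))            # assumption: conventional default
    mu = 0.25 * F * T / (np.abs(M).sum() + 1e-12)  # assumption: conventional step
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                           # Lagrange multiplier
    norm_M = np.linalg.norm(M) + 1e-12
    for _ in range(n_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        R = M - L - S                              # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) / norm_M < tol:
            break
    return L, S

# Synthetic check: rank-2 matrix plus a few large isolated entries.
rng = np.random.default_rng(0)
low = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
sparse = np.zeros((40, 40))
idx = rng.choice(40 * 40, size=80, replace=False)
sparse.flat[idx] = rng.choice([-5.0, 5.0], size=80)
M0 = low + sparse

L0, S0 = rpca_pcp(M0)
rel_residual = np.linalg.norm(M0 - L0 - S0) / np.linalg.norm(M0)
```

In the watermarking setting, `M0` would be the magnitude spectrogram of one frame, and `L0` and `S0` would be recombined with the phase spectrogram before the inverse STFT. A different `lam` gives a different split, which is why $\lambda$ can act as a key.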
3.2 Watermark embedding process
When $x_l(n)$ and $x_s(n)$ are obtained, the one-bit watermark $w$ is duplicated as $w_l$ and $w_s$ and then separately embedded into them. The embedding process is shown in the top panel of Fig. 3.
3.2.1 Design of the echo kernel: Before watermark embedding, the echo kernels for low-rank and sparse parts need to be designed. To ensure the robustness of the proposed method, a new echo kernel is designed. In general, the cross-correlation describes the degree to which two functions (sequences) match each other at different relative positions. This degree can be calculated by mathematical expectation, i.e. $E(\cdot) \in [-1, 1]$. The cross-correlation turns into the autocorrelation when these two functions are identical; when they are completely matched, a positive peak will occur, i.e. $E(\cdot) = 1$. Conversely, when they are completely mismatched, a negative peak will occur, i.e. $E(\cdot) = -1$. Inspired by [21, 28], we propose a new modified PN sequence
$$\tilde{p}(n) = \{p(i), -p(i)\}, \tag{16}$$
where $p(i)$, $0 \le i \le L-1$, can be any PN sequence, $0 \le n \le 2L-1$, and $2L$ is determined by the frame length and is less than the frame length. For example, when $p(i) = \{-1, 1, 1, -1\}$ then $\tilde{p}(n) = \{-1, 1, 1, -1, 1, -1, -1, 1\}$.
When calculating $E(\tilde{p}(n-d)\,\tilde{p}(n-\tau))$, in addition to the positive peak at $\tau = d$, there will be two negative peaks appearing at $\tau = d-L$ and $\tau = d+L$ with a peak magnitude of 0.5 (i.e. $E(\tilde{p}(n-d)\,\tilde{p}(n-\tau)) \approx -0.5$), since $E(p(i-d)(-p(i-\tau))) = -1$. As shown in Fig. 4, we can clearly find three peaks in the correlation result. Therefore, detection of the watermarks using these three peaks can considerably improve the robustness.

Fig. 2 Top two panels: left panel: the result of $E(q(i-d)\,q(i-\tau))$ and right panel: the zoom-in observation of the left panel, where the length of $q(i)$ is $L = 1023$ and $d = 154$. Bottom two panels: left panel: the result of $E(\bar{p}(i-d)\,\bar{p}(i-\tau))$ and right panel: the zoom-in observation of the left panel, where the length of $\bar{p}(i)$ is $L = 1023$, $d = 154$, and $n_r = 3$
It should be noted that the proposed PN sequence in (16) can also be designed as
$$\tilde{p}'(n) = \{p(i), p(i)\}. \tag{17}$$
Similar to the above analysis, in addition to the positive peak at $\tau = d$, there will be two positive peaks appearing at $\tau = d-L$ and $\tau = d+L$ with a peak value of 0.5 (i.e. $E(\tilde{p}'(n-d)\,\tilde{p}'(n-\tau)) \approx 0.5$). The result of $E(\tilde{p}'(n-d)\,\tilde{p}'(n-\tau))$ is shown in Fig. 5. By comparing Figs. 4 and 5, we can see that the detection performance of the sequences in (16) and (17) is the same. In the experiment, we use the sequence in (16) to evaluate the proposed method.
Compared with the previous PN sequences, the proposed PN
sequence is also regularised after modification. However, as the
proposed PN sequence is not modified in adjacent (two or three
adjacent) positions [21, 28], but over a wide range (i.e. L), it is less
regular than the previous modified PN sequences.
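The side peaks described for (16) and (17) can be checked directly. The sketch below builds both variants from the four-element example given in the text and evaluates the aperiodic autocorrelation, normalised by the sequence length $2L$, at lags 0 and $L$. The helper names `modified_pn` and `autocorr_at` are hypothetical, and taking the delay $d = 0$ is a simplification (a non-zero $d$ only shifts all three peaks).

```python
import numpy as np

def modified_pn(p, opposite=True):
    """Concatenate p with -p as in (16), or with p itself as in (17)."""
    p = np.asarray(p, dtype=float)
    second_half = -p if opposite else p
    return np.concatenate([p, second_half])

def autocorr_at(s, lag):
    """Aperiodic autocorrelation of s at one lag, normalised by len(s)."""
    lag = abs(lag)
    return float(np.dot(s[:len(s) - lag], s[lag:]) / len(s))

p = np.array([-1, 1, 1, -1])          # example sequence from the text, L = 4
L = len(p)

pt_opposite = modified_pn(p, opposite=True)    # form (16)
pt_same = modified_pn(p, opposite=False)       # form (17)

print(pt_opposite.tolist())
print(autocorr_at(pt_opposite, 0), autocorr_at(pt_opposite, L))
print(autocorr_at(pt_same, 0), autocorr_at(pt_same, L))
```

At lag $L$ the first half of the sequence lines up with the second half, so the $L$ overlapping products are all $-1$ for (16) and all $+1$ for (17); dividing by $2L$ gives the $\mp 0.5$ side peaks regardless of which ±1 sequence $p$ is used.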
3.2.2 Secure embedding with opposite kernels: To improve the security of the proposed method, we use $\tilde{p}(n)$ and $-\tilde{p}(n)$ to design the echo kernel for the low-rank and sparse parts.
In addition, we use the forward and backward echo kernels [27] to strengthen the kernels and increase the robustness of watermark detection. The kernels for the low-rank part and sparse part are defined as
$$h_l(n) = \delta(n) + \alpha\left[\sum_{j=0}^{2L-1} \tilde{p}(j)\,\delta(n-d-j) + \sum_{j=0}^{2L-1} \tilde{p}(j)\,\delta(n+d+j)\right], \tag{18}$$
$$h_s(n) = \delta(n) + \alpha\left[\sum_{j=0}^{2L-1} (-\tilde{p}(j))\,\delta(n-d-j) + \sum_{j=0}^{2L-1} (-\tilde{p}(j))\,\delta(n+d+j)\right], \tag{19}$$
where $h_l(n)$ is the kernel for the low-rank part, $h_s(n)$ is the kernel for the sparse part, $j$ is the index of the proposed PN sequence, and $d \in \{d_0, d_1\}$ in (18) and (19) is set the same for the low-rank part and the sparse part. These two echo kernels are separately applied to the low-rank signal $x_l(n)$ and sparse signal $x_s(n)$ by convolution to obtain the watermarked low-rank signal $y_l(n)$ and the watermarked sparse signal $y_s(n)$. The final watermarked signal is obtained by
$$y(n) = y_l(n) + y_s(n) = x_l(n) \ast h_l(n) + x_s(n) \ast h_s(n). \tag{20}$$
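A minimal sketch of the embedding in (18) to (20) is given below. It assumes that $\alpha$ scales both the backward and forward echo sums, stores the two-sided kernels as centred arrays, and uses random stand-ins for the low-rank and sparse sub-signals; `echo_part`, `opposite_kernels`, and `embed` are hypothetical helper names.

```python
import numpy as np

def echo_part(pt, d, alpha):
    """Echo terms of (18): alpha * sum_j pt(j) * [delta(n-d-j) + delta(n+d+j)].

    Returned as a centred array of odd length; index `half` corresponds to n = 0.
    """
    pt = np.asarray(pt, dtype=float)
    half = d + len(pt) - 1
    e = np.zeros(2 * half + 1)
    for j, value in enumerate(pt):
        e[half + d + j] += alpha * value   # backward echo at n = +(d + j)
        e[half - d - j] += alpha * value   # forward echo at n = -(d + j)
    return e

def opposite_kernels(pt, d, alpha):
    """Return (h_l, h_s): same delta path, echo parts of opposite sign."""
    e = echo_part(pt, d, alpha)
    delta = np.zeros_like(e)
    delta[len(e) // 2] = 1.0
    return delta + e, delta - e

def embed(x_l, x_s, h_l, h_s):
    """Watermarked frame of (20); mode='same' keeps the frame length."""
    return np.convolve(x_l, h_l, mode="same") + np.convolve(x_s, h_s, mode="same")

rng = np.random.default_rng(0)
x_l = rng.standard_normal(512)           # assumption: stand-in for the low-rank signal
x_s = 0.1 * rng.standard_normal(512)     # assumption: stand-in for the sparse signal
p = rng.choice([-1.0, 1.0], size=16)
pt = np.concatenate([p, -p])             # modified sequence of (16)
d, alpha = 10, 0.008

h_l, h_s = opposite_kernels(pt, d, alpha)
y = embed(x_l, x_s, h_l, h_s)

# The echo parts cancel in h_l + h_s, leaving only twice the direct path.
kernel_sum = h_l + h_s
```

By linearity, `y` equals `x_l + x_s` plus the echo kernel applied to `x_l - x_s`, so the echoes seen in the whole signal are driven by the difference of the two sub-signals rather than by the host signal itself.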
Fig. 3 Process of watermark embedding and detection
Fig. 4 Results of E(p
~(nd)p
~(nτ)), where the length of p
~(n) is
2L= 1024 and d= 154
Fig. 5 Results of E(p
~(nd)p
~(nτ)), where the length of p
~(n) is
2L= 1024 and d= 154
232 IET Signal Process., 2020, Vol. 14 Iss. 4, pp. 229-242
© The Institution of Engineering and Technology 2020
3.3 Watermark detection process
3.3.1 Detection from the low-rank part and the sparse part: The watermark detection process is shown in the bottom panel of Fig. 3. Similar to the embedding process, the watermarked signal $y(n)$ is first decomposed into the low-rank part $\hat{y}_l(n)$ and sparse part $\hat{y}_s(n)$ using RPCA
$$y(n) = \hat{y}_l(n) + \hat{y}_s(n), \tag{21}$$
where we use the same $\lambda$ to decompose the watermarked signal. Recall that in the watermark embedding process the embedded echoes are separately generated by the low-rank part $x_l(n)$ and the sparse part $x_s(n)$, so they should have similar low-rank and sparse properties as $x_l(n)$ and $x_s(n)$. As a result, the echoes generated by $x_l(n)$ will be assigned to $\hat{y}_l(n)$ and the echoes generated by $x_s(n)$ will be assigned to $\hat{y}_s(n)$ in the watermark detection process if we use the same $\lambda$ to decompose $y(n)$. Accordingly, we will have
$$\hat{y}_l(n) \approx x_l(n) \ast h_l(n), \tag{22}$$
$$\hat{y}_s(n) \approx x_s(n) \ast h_s(n). \tag{23}$$
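The real cepstrum analysis can be separately performed on the low-rank part $\hat{y}_l(n)$ and the sparse part $\hat{y}_s(n)$ for watermark detection, i.e.
$$c_{\hat{y}_l}(n) \approx c_{x_l}(n) + c_{h_l}(n) = \mathcal{F}^{-1}\{\log|\mathcal{F}(x_l(n))|\} + \mathcal{F}^{-1}\{\log|\mathcal{F}(h_l(n))|\}, \tag{24}$$
$$c_{\hat{y}_s}(n) \approx c_{x_s}(n) + c_{h_s}(n) = \mathcal{F}^{-1}\{\log|\mathcal{F}(x_s(n))|\} + \mathcal{F}^{-1}\{\log|\mathcal{F}(h_s(n))|\}. \tag{25}$$
Since the forward and backward echo kernels are employed in the proposed method, by referring to [27] and the derivation of (5), the following results can be obtained
$$c_{h_l}(n) \approx \frac{\alpha}{2(1-\alpha^2)}\bigl(\tilde{p}(-n+d) + \tilde{p}(n-d)\bigr), \tag{26}$$
$$c_{h_s}(n) \approx -\frac{\alpha}{2(1-\alpha^2)}\bigl(\tilde{p}(-n+d) + \tilde{p}(n-d)\bigr). \tag{27}$$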
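The cross-correlation is then performed on $c_{\hat{y}_l}(n)$ and $c_{\hat{y}_s}(n)$ using the proposed sequences $\tilde{p}(n)$ and $-\tilde{p}(n)$, respectively (see (6))
$$\begin{aligned} d_l(\tau) &= E\bigl(c_{\hat{y}_l}(n)\,\tilde{p}(n-\tau)\bigr) \\ &\approx E\bigl(c_{x_l}(n)\,\tilde{p}(n-\tau)\bigr) + \frac{\alpha}{2(1-\alpha^2)}E\bigl(\tilde{p}(-n+d)\,\tilde{p}(n-\tau)\bigr) + \frac{\alpha}{2(1-\alpha^2)}E\bigl(\tilde{p}(n-d)\,\tilde{p}(n-\tau)\bigr), \end{aligned} \tag{28}$$
$$\begin{aligned} d_s(\tau) &= E\bigl(c_{\hat{y}_s}(n)\,(-\tilde{p}(n-\tau))\bigr) \\ &\approx E\bigl(c_{x_s}(n)\,(-\tilde{p}(n-\tau))\bigr) + \frac{\alpha}{2(1-\alpha^2)}E\bigl(\tilde{p}(-n+d)\,\tilde{p}(n-\tau)\bigr) + \frac{\alpha}{2(1-\alpha^2)}E\bigl(\tilde{p}(n-d)\,\tilde{p}(n-\tau)\bigr). \end{aligned} \tag{29}$$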
According to the analysis in Section 3.2, there are one positive peak at $\tau = d$ and two negative peaks at $\tau = d-L$ and $\tau = d+L$ in the correlation result of the low-rank part and the sparse part, so we use the following equations to detect the watermarks from them
$$\bar{d}_l(\tau) = d_l(\tau) - d_l(\tau-L) - d_l(\tau+L), \tag{30}$$
$$\bar{d}_s(\tau) = d_s(\tau) - d_s(\tau-L) - d_s(\tau+L). \tag{31}$$
Since delta functions with different delays ($d_0$ and $d_1$) are used to embed ‘0’ and ‘1’, the watermark bit can be detected by comparing the peak values at these two positions. For the low-rank part, if $\bar{d}_l(d_0) > \bar{d}_l(d_1)$, the watermark bit for the low-rank part is detected as $\hat{w}_l = 0$, otherwise as $\hat{w}_l = 1$. The detection process of the sparse part is performed in the same way, i.e. if $\bar{d}_s(d_0) > \bar{d}_s(d_1)$, the watermark bit $\hat{w}_s$ for the sparse part is detected as $\hat{w}_s = 0$, otherwise as $\hat{w}_s = 1$. In addition, we use a small trick to improve watermark detection performance: we set the values before the delay $d$ ($d_0$ or $d_1$) of the watermarked signal to 0 to reduce the impact on the correlation operation. This makes the peaks at the delay position more pronounced and improves the detection rate (DR).
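The three-peak statistic in (30) and (31) and the per-part bit decision can be sketched as follows. The correlation curve here is an idealised, hand-built array (one peak of 1 at the true delay and two side peaks of −0.5 at ±L), not the output of a real cepstrum analysis; `zero_index` is a hypothetical argument giving the array position of lag 0, and the function names are illustrative.

```python
import numpy as np

def three_peak_statistic(corr, tau, L, zero_index=0):
    """d_bar(tau) = d(tau) - d(tau - L) - d(tau + L), as in (30) and (31)."""
    i = zero_index + tau
    return corr[i] - corr[i - L] - corr[i + L]

def detect_bit(corr, d0, d1, L, zero_index=0):
    """Return 0 if the statistic at d0 exceeds the one at d1, otherwise 1."""
    s0 = three_peak_statistic(corr, d0, L, zero_index)
    s1 = three_peak_statistic(corr, d1, L, zero_index)
    return 0 if s0 > s1 else 1

# Idealised correlation curve for a bit embedded with delay d1 (assumption).
L, d0, d1 = 16, 20, 30
zero_index = 100                       # array index that represents lag 0
corr = np.zeros(400)
corr[zero_index + d1] = 1.0            # main positive peak at tau = d1
corr[zero_index + d1 - L] = -0.5       # negative side peak at tau = d1 - L
corr[zero_index + d1 + L] = -0.5       # negative side peak at tau = d1 + L

stat_d0 = three_peak_statistic(corr, d0, L, zero_index)
stat_d1 = three_peak_statistic(corr, d1, L, zero_index)
bit = detect_bit(corr, d0, d1, L, zero_index)
```

Subtracting the two side values turns the negative side peaks into positive contributions, so in this idealised case the statistic at the true delay is twice the height of the main peak alone.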
3.3.2 Final decision on the watermark bit: We can separately detect one-bit watermark from the low-rank part and the sparse part of one audio frame. In a general case, if the detected two watermark bits are the same, then the final watermark bit $\hat{w}$ of the current frame is determined as the same bit, i.e.
$$\hat{w} = \hat{w}_l = \hat{w}_s. \tag{32}$$
However, if the detected two bits are different from each other, i.e. one of the following two cases happens: (i) $\bar{d}_l(d_0) > \bar{d}_l(d_1)$ & $\bar{d}_s(d_0) < \bar{d}_s(d_1)$ (i.e. $\hat{w}_l = 0$ and $\hat{w}_s = 1$) or (ii) $\bar{d}_l(d_0) < \bar{d}_l(d_1)$ & $\bar{d}_s(d_0) > \bar{d}_s(d_1)$ (i.e. $\hat{w}_l = 1$ and $\hat{w}_s = 0$), then the final watermark bit of current frame is determined as follows:
(i) when $\bar{d}_l(d_0) > \bar{d}_l(d_1)$ & $\bar{d}_s(d_0) < \bar{d}_s(d_1)$:
$$\hat{w} = \begin{cases} \hat{w}_l, & \bar{d}_l(d_0) - \bar{d}_l(d_1) > \bar{d}_s(d_1) - \bar{d}_s(d_0) \\ \hat{w}_s, & \text{otherwise;} \end{cases} \tag{33}$$
(ii) when $\bar{d}_l(d_0) < \bar{d}_l(d_1)$ & $\bar{d}_s(d_0) > \bar{d}_s(d_1)$:
$$\hat{w} = \begin{cases} \hat{w}_s, & \bar{d}_s(d_0) - \bar{d}_s(d_1) > \bar{d}_l(d_1) - \bar{d}_l(d_0) \\ \hat{w}_l, & \text{otherwise.} \end{cases} \tag{34}$$
That is, we take into account the reliabilities of the detected watermark bits from two parts to determine the final watermark bit.
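The decision rule in (32) to (34) amounts to trusting the part whose two statistics are further apart. A minimal sketch, assuming the four three-peak statistics have already been computed (the function and argument names are hypothetical):

```python
def decide_bit(stat_d0, stat_d1):
    """Per-part decision: 0 if the statistic at d0 is larger, otherwise 1."""
    return 0 if stat_d0 > stat_d1 else 1

def fuse_bits(dl_d0, dl_d1, ds_d0, ds_d1):
    """Final bit from the low-rank statistics (dl_*) and sparse statistics (ds_*)."""
    w_l = decide_bit(dl_d0, dl_d1)
    w_s = decide_bit(ds_d0, ds_d1)
    if w_l == w_s:
        return w_l                                   # agreement, rule (32)
    if w_l == 0:
        # case (i): low-rank says 0, sparse says 1, rule (33)
        return w_l if (dl_d0 - dl_d1) > (ds_d1 - ds_d0) else w_s
    # case (ii): low-rank says 1, sparse says 0, rule (34)
    return w_s if (ds_d0 - ds_d1) > (dl_d1 - dl_d0) else w_l

# Low-rank part clearly favours '0'; sparse part weakly favours '1'.
print(fuse_bits(2.0, 0.1, 0.3, 0.5))
```

Here the low-rank margin (1.9) exceeds the sparse margin (0.2), so the low-rank decision is kept.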
Note that, similar to other previous PN sequence-based echo-hiding methods, the proposed $\tilde{p}(n)$ is also required to be as long as possible to increase the inaudibility and robustness of the proposed method, which reduces the embedding capacity. To address this problem, we make use of the two sub-signals obtained from each audio frame, i.e. the low-rank and sparse parts, to embed more bits. We propose a multi-bit embedding scheme based on the above basic framework. This scheme is covered in more detail in Section 3.5.
3.4 Security analysis of the proposed method
The security of the proposed method is ensured in a way that the
watermarks cannot be detected from the whole watermarked signal.
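According to (20), the cepstrum analysis of the whole watermarked signal $y(n)$ can be written as
$$\begin{aligned} c_y(n) &= \mathcal{F}^{-1}\{\log|\mathcal{F}(y_l(n) + y_s(n))|\} \\ &= \mathcal{F}^{-1}\{\log|\mathcal{F}(y_l(n)) + \mathcal{F}(y_s(n))|\} \\ &= \mathcal{F}^{-1}\left\{\log\left|\mathcal{F}(y_l(n))\left(1 + \frac{\mathcal{F}(y_s(n))}{\mathcal{F}(y_l(n))}\right)\right|\right\} \\ &= \mathcal{F}^{-1}\left\{\log|\mathcal{F}(y_l(n))| + \log\left|1 + \frac{\mathcal{F}(y_s(n))}{\mathcal{F}(y_l(n))}\right|\right\}, \end{aligned} \tag{35}$$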
or
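$$\begin{aligned} c_y(n) &= \mathcal{F}^{-1}\{\log|\mathcal{F}(y_l(n)) + \mathcal{F}(y_s(n))|\} \\ &= \mathcal{F}^{-1}\left\{\log\left|\mathcal{F}(y_s(n))\left(\frac{\mathcal{F}(y_l(n))}{\mathcal{F}(y_s(n))} + 1\right)\right|\right\} \\ &= \mathcal{F}^{-1}\left\{\log|\mathcal{F}(y_s(n))| + \log\left|1 + \frac{\mathcal{F}(y_l(n))}{\mathcal{F}(y_s(n))}\right|\right\}. \end{aligned} \tag{36}$$
Then the $c_y(n)$ can be expressed as half of the sum of (35) and (36)
$$\begin{aligned} c_y(n) = \frac{1}{2}\Bigl\{ &\mathcal{F}^{-1}\bigl(\log|\mathcal{F}(y_l(n))|\bigr) + \mathcal{F}^{-1}\bigl(\log|\mathcal{F}(y_s(n))|\bigr) \\ &+ \mathcal{F}^{-1}\left(\log\left|1 + \frac{\mathcal{F}(y_s(n))}{\mathcal{F}(y_l(n))}\right|\right) + \mathcal{F}^{-1}\left(\log\left|1 + \frac{\mathcal{F}(y_l(n))}{\mathcal{F}(y_s(n))}\right|\right) \Bigr\}. \end{aligned} \tag{37}$$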
Since each audio frame has its own low-rank and sparse characteristics, we cannot confirm the analytical expressions of the low-rank part $x_l(n)$ and the sparse part $x_s(n)$ for each frame, and consequently, the last two terms in (37) cannot be theoretically analysed. Therefore, we experimentally investigated whether the last two terms can produce correlation peaks for watermark detection when applying the correlation operation to them.
Here, we tested two cases, i.e. watermarks were separately embedded into the low-rank and sparse parts with: (i) different delays ($d_l = 10$ and $d_s = 15$); and (ii) the same delay ($d_l = d_s = 10$). The length of the PN sequence $\tilde{p}(n)$ (or $-\tilde{p}(n)$) was set as 60% of the frame length, the embedding capacity was 16 bps, $\alpha = 0.008$ and $\lambda = 0.8$. We calculated the correlation between the last two terms of (37) and the PN sequence. To obtain the statistical results, we randomly selected 35 frames and performed correlation on each of them. Fig. 6 shows the averaged correlation result calculated on all 35 frames. There was no peak at the delay positions, either for different delays or for the same delay. Therefore, the last two terms of (37) have an almost negligible effect on the final correlation results.
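According to the above analysis, the $c_y(n)$ in (37) can be approximately written as
$$\begin{aligned} c_y(n) &\approx \frac{1}{2}\Bigl\{\mathcal{F}^{-1}\bigl(\log|\mathcal{F}(y_l(n))|\bigr) + \mathcal{F}^{-1}\bigl(\log|\mathcal{F}(y_s(n))|\bigr)\Bigr\} \\ &= \frac{1}{2}\bigl(c_{y_l}(n) + c_{y_s}(n)\bigr). \end{aligned} \tag{38}$$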
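We calculated the cross-correlation between (38) and the PN sequences ($\tilde{p}(n)$ and $-\tilde{p}(n)$) to observe if there are correlation peaks. The cross-correlation between (38) and $\tilde{p}(n)$ is
$$\begin{aligned} d(\tau) &= E\bigl(c_y(n)\,\tilde{p}(n-\tau)\bigr) \\ &\approx \frac{1}{2}E\bigl(c_{y_l}(n)\,\tilde{p}(n-\tau)\bigr) + \frac{1}{2}E\bigl(c_{y_s}(n)\,\tilde{p}(n-\tau)\bigr) \\ &\approx \frac{1}{2}E\bigl(c_{x_l}(n)\,\tilde{p}(n-\tau)\bigr) + \frac{1}{2}E\bigl(c_{x_s}(n)\,\tilde{p}(n-\tau)\bigr) \\ &\quad + \frac{\alpha}{4(1-\alpha^2)}E\bigl(\tilde{p}(-n+d)\,\tilde{p}(n-\tau)\bigr) + \frac{\alpha}{4(1-\alpha^2)}E\bigl(\tilde{p}(n-d)\,\tilde{p}(n-\tau)\bigr) \\ &\quad - \frac{\alpha}{4(1-\alpha^2)}E\bigl(\tilde{p}(-n+d)\,\tilde{p}(n-\tau)\bigr) - \frac{\alpha}{4(1-\alpha^2)}E\bigl(\tilde{p}(n-d)\,\tilde{p}(n-\tau)\bigr). \end{aligned} \tag{39}$$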
We can see from the above equation that the first two terms in (37) always produce opposite peaks at the same delay positions when they are correlated with $\tilde{p}(n)$, so these peaks cancel in the correlation result. When using $-\tilde{p}(n)$ for correlation, we get the same result. Therefore, the correlation peaks cannot be discovered from the whole watermarked signal even if the PN sequences of the low-rank and sparse parts are known, which means the watermarks cannot be detected from the whole watermarked signal. The above analysis suggests that the proposed method is more secure than the previous PN sequence-based echo-hiding methods.
3.5 Multi-bit embedding scheme
In the above framework, the same watermark is embedded into
low-rank and sparse parts, so this does not really increase the
embedding capacity. This section describes how to embed two bits
into one frame under the proposed framework.
In general, when we embed two bits into one frame, it turns out
as the following two cases:
Case 1: Two bits are the same. In this case, the two watermark bits can be ‘0’ & ‘0’ or ‘1’ & ‘1’. In the embedding process, we use opposite PN sequences to separately embed them (‘0’ & ‘0’ or ‘1’ & ‘1’) into the low-rank part and the sparse part. In particular, the delay for both the low-rank and the sparse parts is set the same, that is, we set the delay as $d_0$ for the ‘0’ & ‘0’ case and $d_1$ for the ‘1’ & ‘1’ case. An example is shown in Figs. 7a and b. As explained in Section 3.4, the correlation peaks cannot be observed when the PN sequences (positive $\tilde{p}(n)$ or negative $-\tilde{p}(n)$) are correlated with the whole watermarked signal, since we use the same delay but opposite PN sequences. Therefore, this watermark embedding method is also secure. To correctly detect the watermark bits, we apply RPCA decomposition to the watermarked signal, and the watermarks (‘0’ & ‘0’ or ‘1’ & ‘1’) can be detected by finding the correlation peaks from the low-rank and sparse parts, respectively.

Fig. 6 Cross-correlation results between the last two terms of (37) and the PN sequence
(a) The cross-correlation results when embedding watermarks using different delays (the delays are 10 and 15, respectively), (b) The cross-correlation results when embedding watermarks using the same delay (delay is 10)
Fig. 7 Illustration of the cross-correlation results between low-rank and sparse parts with PN sequence in multi-bit embedding scheme
(a), (b) The same two watermark bits are embedded into low-rank and sparse parts, (c), (d) The different watermark bits are embedded into low-rank and sparse parts
Case 2: Two bits are different. In this case, the two watermark bits
are ‘0’ & ‘1’ or ‘1’ & ‘0’. Unlike Case 1, we use the same PN
sequence (i.e. the positive $\tilde{p}(n)$) but different delays to
embed the watermarks into the low-rank part and the sparse part. As
shown in Figs. 7c and d, the delays for bit ‘0’ and bit ‘1’ are set as
d0 and d1, respectively. For both ‘0’ & ‘1’ and ‘1’ & ‘0’, the
correlation between the whole watermarked signal and the PN
sequence can be expressed as
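$$
\begin{aligned}
d(\tau) &= E\big(c_y(n)\,\tilde{p}(n-\tau)\big) \\
&\approx \tfrac{1}{2}E\big(c_{y_l}(n)\,\tilde{p}(n-\tau)\big) + \tfrac{1}{2}E\big(c_{y_s}(n)\,\tilde{p}(n-\tau)\big) \\
&= \tfrac{1}{2}E\big(c_{x_l}(n)\,\tilde{p}(n-\tau)\big) + \tfrac{1}{2}E\big(c_{x_s}(n)\,\tilde{p}(n-\tau)\big) \\
&\quad + \tfrac{\alpha}{4}(1-\alpha^{2})\,E\big(\tilde{p}(-n+d_0)\,\tilde{p}(n-\tau)\big) \\
&\quad + \tfrac{\alpha}{4}(1-\alpha^{2})\,E\big(\tilde{p}(n-d_0)\,\tilde{p}(n-\tau)\big) \\
&\quad + \tfrac{\alpha}{4}(1-\alpha^{2})\,E\big(\tilde{p}(-n+d_1)\,\tilde{p}(n-\tau)\big) \\
&\quad + \tfrac{\alpha}{4}(1-\alpha^{2})\,E\big(\tilde{p}(n-d_1)\,\tilde{p}(n-\tau)\big),
\end{aligned}
\tag{40}
$$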
i.e. there are always two peaks of similar value at τ = d0 and
τ = d1 in d(τ). Therefore, it is difficult to infer the watermarks
from the whole watermarked signal by comparing the peaks at d0 and
d1. As in Case 1, we can correctly detect the watermarks (‘0’ & ‘1’
or ‘1’ & ‘0’) by comparing the peak positions in the low-rank and
sparse parts after RPCA decomposition. With the above scheme, we not
only increase the embedding capacity but also preserve the security
of the multi-bit embedding method.
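The two cases can be summarised as a small lookup rule. The sketch below is illustrative only: `embedding_plan` and `detect_bit` are hypothetical names, the delays reuse the values [10, 20] reported in Section 4.1, the choice of which part receives the negative PN sequence in Case 1 is an assumption (the text only states that the two sequences are opposite), and comparing peak magnitudes in `detect_bit` is likewise an assumed decision rule.

```python
D0, D1 = 10, 20  # delays for bit '0' and bit '1' (values from Section 4.1)

def embedding_plan(bit_low, bit_sparse):
    """Return ((pn_sign, delay) for the low-rank part, (pn_sign, delay) for the sparse part)."""
    delay_low = D1 if bit_low else D0
    delay_sparse = D1 if bit_sparse else D0
    if bit_low == bit_sparse:
        # Case 1: same delay, opposite PN sequences.
        # Assumption: the sparse part carries the negative sequence.
        return (+1, delay_low), (-1, delay_sparse)
    # Case 2: same (positive) PN sequence, different delays.
    return (+1, delay_low), (+1, delay_sparse)

def detect_bit(peak_d0, peak_d1):
    """Decide one bit from the correlation values of ONE part at d0 and d1.

    Assumption: magnitudes are compared so that a negative PN sequence
    (negative peak) is handled in the same way as a positive one.
    """
    return 1 if abs(peak_d1) > abs(peak_d0) else 0

for bits in [(0, 0), (1, 1), (0, 1), (1, 0)]:
    print(bits, embedding_plan(*bits))
```

Each part is decoded on its own after RPCA decomposition, so two calls to `detect_bit` (one per part) recover the two bits of a frame.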
3.6 Frame synchronisation
The watermark detection process assumes that each frame is
synchronised. In the proposed method, frame synchronisation is
realised as follows. Starting from the first audio sample of the
received watermarked signal, we take a frame-length segment as a
calculation unit (indexed by k) and apply RPCA decomposition,
cepstrum calculation, and correlation analysis. The correlation
values at the delay positions (d0 and d1) of the low-rank signal
(denoted by $P^{l0}_k$ and $P^{l1}_k$) and those of the sparse signal
(denoted by $P^{s0}_k$ and $P^{s1}_k$) are recorded. The
synchronisation coefficient $c_k$ for the kth unit is calculated by
$$
c_k = \max\big(P^{l0}_k, P^{l1}_k\big) \times \max\big(P^{s0}_k, P^{s1}_k\big). \tag{41}
$$
Then we move to the next audio sample and repeat this process
until the end of the signal. A synchronisation vector c is obtained
$$
c = \{c_1, c_2, \ldots, c_k, \ldots, c_K\}, \quad K = L - F + 1, \tag{42}
$$
where K is the number of units, L is the length of the watermarked
signal, and F is the frame length. If a calculation unit perfectly
matches a correct frame, its synchronisation coefficient will be a
local maximum in c. We extract the local-maximum coefficients of c
using multi-scale searching and reset the non-maximum
coefficients to 0. The final synchronisation vector $\hat{c}$ is
normalised to [0, 1], which indicates the position of each frame.
We calculated frame synchronisation results for three typical
cases: (i) a normal watermarked signal; (ii) a watermarked signal
with an insertion between the 14th and 15th frames; and (iii) a
watermarked signal with the 15th frame cropped. The synchronisation
results are shown in Fig. 8. One can see that each frame can be
correctly synchronised for case (i) (see Fig. 8b). For cases (ii) and
(iii), all the intact frames are correctly synchronised while the
inserted or cropped segments cannot be synchronised (see Figs. 8c
and d).
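The sliding computation of (41) and (42) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `peaks_fn` is a hypothetical stand-in for the RPCA decomposition, cepstrum calculation, and correlation analysis of one unit, and the multi-scale local-maximum search and the final normalisation are not reproduced.

```python
def sync_coefficient(pl0, pl1, ps0, ps1):
    """Synchronisation coefficient of one unit, following (41)."""
    return max(pl0, pl1) * max(ps0, ps1)

def num_units(signal_length, frame_length):
    """K = L - F + 1 units when the window moves one sample at a time, as in (42)."""
    return signal_length - frame_length + 1

def sync_vector(signal, frame_length, peaks_fn):
    """Slide a frame-length window over the signal and collect c_1..c_K.

    peaks_fn(segment) must return (Pl0, Pl1, Ps0, Ps1) for that segment.
    """
    coeffs = []
    for k in range(num_units(len(signal), frame_length)):
        segment = signal[k:k + frame_length]
        coeffs.append(sync_coefficient(*peaks_fn(segment)))
    return coeffs

# Toy stand-in: each true frame starts with a marker sample of 1.0, and the
# "correlation peak" is simply the first sample of the unit.
toy_signal = [1.0, 0.0, 0.0, 0.0] * 3
toy_peaks = lambda seg: (seg[0], 0.0, seg[0], 0.0)

c = sync_vector(toy_signal, 4, toy_peaks)
print(c)
```

With the toy peak function, the coefficient is non-zero only for the units that start exactly at a frame boundary, which mirrors the local-maximum behaviour described above.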
4 Evaluations
4.1 Database
The experimental database was the RWC Music Database [31], which
contains 102 dual-channel audio files sampled at 44.1 kHz and
quantised with 16 bits. We took 10 s from each audio file as
experimental material, and all watermarking operations were
performed on one channel only. Many parameters of the proposed
method affect its performance, e.g. the length of the PN sequence,
the delays, the attenuation amplitude, the embedding capacity (bps),
and the RPCA decomposition parameter. In the experiments, the length
of the PN sequences (i.e. $\tilde{p}(n)$ and $-\tilde{p}(n)$) was
fixed at 60% of the frame length. The delays used for the low-rank
and sparse parts were fixed at d = [d0, d1] = [10, 20]. We adjusted
one parameter at a time and observed how the experimental results
changed with it.
4.2 Evaluations for inaudibility
The signal-to-noise ratio (SNR), the log-spectrum distortion (LSD)
[32], and the perceptual evaluation of audio quality (PEAQ) [33,
34] were used to measure the inaudibility of the proposed method.
The calculation of SNR is as follows:
$$
\mathrm{SNR\,(dB)} = 10 \times \log_{10} \frac{\sum_{n} x(n)^{2}}{\sum_{n} \big(y(n) - x(n)\big)^{2}}, \tag{43}
$$
where x(n) stands for the original signal and y(n) stands for the
watermarked signal. The higher the value of SNR, the better the
audio quality.
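As a worked illustration of (43), the SNR can be computed directly from the two sample sequences. The function name `snr_db` and the toy signals are assumptions made for this sketch.

```python
import math

def snr_db(original, watermarked):
    """SNR in dB following (43): signal energy over embedding-distortion energy."""
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((y - x) ** 2 for x, y in zip(original, watermarked))
    if noise_energy == 0:
        return math.inf  # identical signals: no distortion at all
    return 10.0 * math.log10(signal_energy / noise_energy)

x = [1.0, -1.0, 1.0, -1.0]
y = [1.1, -1.0, 1.0, -1.0]   # one sample perturbed by 0.1
print(round(snr_db(x, y), 2))
```

Here the signal energy is 4 and the distortion energy is about 0.01, so the ratio is about 400, i.e. roughly 26 dB.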
Fig. 8 Illustration of frame synchronisation
(a) Watermarked signal, frame synchronisation results of, (b) Normal watermarked
signal, (c) Watermarked signal with insertion, (d) Watermarked signal with cropping
The LSD measures the logarithmic distance between the spectra of
the original signal and the watermarked signal, which is defined as
$$
D_{LS}\,(\mathrm{dB}) = \sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi}\left[10\log_{10}\frac{P_y(\omega)}{P_x(\omega)}\right]^{2}\mathrm{d}\omega}, \tag{44}
$$
where Px(ω) is the spectrum of the original signal and Py(ω) is the
spectrum of the watermarked signal. An LSD below 1.0 dB indicates
little distortion of the watermarked signal.
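A discrete approximation of the LSD in (44) can be sketched by replacing the integral over ω with an average over DFT bins. Several choices here are assumptions rather than details given in the paper: the power spectrum is taken as the squared DFT magnitude of a single frame, a small floor avoids taking the logarithm of zero, and the root-mean-square form is used. The names `power_spectrum` and `lsd_db` are hypothetical.

```python
import cmath
import math

def power_spectrum(frame):
    """Squared DFT magnitude of one frame (direct O(N^2) DFT, fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def lsd_db(original, watermarked, floor=1e-12):
    """Root-mean-square log-spectral difference in dB over all DFT bins."""
    px = power_spectrum(original)
    py = power_spectrum(watermarked)
    total = 0.0
    for a, b in zip(px, py):
        ratio = max(b, floor) / max(a, floor)
        total += (10.0 * math.log10(ratio)) ** 2
    return math.sqrt(total / len(px))

impulse = [1.0] + [0.0] * 15          # flat spectrum, every bin has power 1
louder = [2.0] + [0.0] * 15           # same shape, every bin has power 4
print(lsd_db(impulse, impulse), round(lsd_db(impulse, louder), 3))
```

For the toy pair, every bin differs by a power ratio of 4, so the measure equals 10 log10(4), about 6.02 dB. In practice the measure would typically be computed frame by frame and averaged, which is not shown here.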
The PEAQ complies with ITU-R BS.1387 and is applicable to
high-quality audio signals with sampling frequencies of
44.1–48 kHz. It computes a value called the objective difference
grade (ODG) by comparing the perceptual quality of the original
audio signal and the watermarked audio signal. The ODG ranges from
−4 (very annoying) to 0 (imperceptible). The threshold of PEAQ is
−1 ODG. We examined the inaudibility of the proposed method,
measured by SNR, LSD, and PEAQ, under different parameter settings.
4.2.1 Attenuation amplitude: For echo-hiding-based methods, the
attenuation amplitude has a strong effect on inaudibility. Therefore,
we evaluated the inaudibility performance of the proposed method
as a function of attenuation amplitude. In this experiment, the
embedding capacity was fixed at 16 bps, the RPCA decomposition
parameter was fixed at λ = 0.8, and the attenuation amplitudes were
set as [0.002, 0.005, 0.008, 0.01, 0.04].
Since the kernels for both the low-rank and sparse parts have
attenuation amplitudes (see (18) and (19)), we conducted three
experiments. In the first experiment, the attenuation amplitude of
the low-rank part (denoted as αl) was changed while the attenuation
amplitude of the sparse part (denoted as αs) was fixed at
αs = 0.008. In the second experiment, αs was changed while αl was
fixed at αl = 0.008. In the third experiment, αl and αs were
changed simultaneously. The results of the first experiment are
shown in Table 1 (I). The LSD increased steadily with αl, whereas
the SNR and PEAQ changed only slightly for αl up to 0.01 and
dropped at αl = 0.04, where the sound quality of the watermarked
signal was clearly degraded. Table 1 (II) shows the results of the
second experiment, in which the SNR and PEAQ decreased and the LSD
increased consistently as αs increased. Finally, Table 1 (III) shows
that the quality of the watermarked signal decreased when αl and αs
increased together. Overall, the sound quality of the watermarked
signals degraded when the attenuation amplitude became large.
4.2.2 Decomposition parameter: The decomposition parameter
λ controls the balance between the low-rank part and the sparse
part. When λ becomes larger, more information will be assigned to
the low-rank part and vice versa. We checked the inaudibility of the
proposed method using λ= [0.2, 0.4, 0.6, 0.8, 1.0], where the
embedding capacity was 16 bps and the attenuation amplitudes for
the low-rank and sparse parts were αl=αs= 0.008. The
experimental results are shown in Table 2. When λ increased, the
SNR, LSD, and PEAQ became better. In particular, the SNR and
PEAQ showed more significant changes.
4.2.3 Embedding capacity (bps): In general, when more
watermarks are embedded in a signal, the inaudibility will be
degraded. Therefore, we observed the inaudibility performance of
the proposed method under different embedding capacities. In the
experiment, we set the embedding capacity as 4, 8, 16, 32, 64, and
128 bps, αl=αs= 0.008, and λ= 0.8. According to Table 3, one
can see that the sound quality of the watermarked signal was
improved as embedding capacity increased, which was very
different from most general watermarking methods. The reason for
this was that when the embedding capacity increased, the PN
sequence became shorter, that is, the number of added echoes was
reduced, leading to better inaudibility. However, as the embedding
capacity increased, the DR decreased, which will be shown in later
experiments.
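The relation between embedding capacity and PN length can be made concrete with a small calculation. This is a sketch under stated assumptions: one frame carries one bit in the basic scheme (two in the multi-bit scheme), the frame length is the sampling rate divided by the number of frames per second, and non-integer lengths are rounded down, which the paper does not specify. The function name `frame_and_pn_length` is hypothetical.

```python
SAMPLE_RATE = 44100   # Hz, as in Section 4.1
PN_PERCENT = 60       # PN length is 60% of the frame length (Section 4.1)

def frame_and_pn_length(bits_per_second, bits_per_frame=1):
    """Return (frame length, PN length) in samples.

    Assumption: lengths are rounded down to whole samples.
    """
    frame_length = SAMPLE_RATE * bits_per_frame // bits_per_second
    pn_length = frame_length * PN_PERCENT // 100
    return frame_length, pn_length

for bps in (4, 8, 16, 32, 64, 128):
    frame_length, pn_length = frame_and_pn_length(bps)
    print(f"{bps:>3} bps: frame = {frame_length} samples, PN = {pn_length} samples")
```

Under these assumptions the PN sequence shrinks from 6615 samples at 4 bps to 206 samples at 128 bps, which is the shortening referred to above. With `bits_per_frame=2`, a multi-bit capacity of 16 × 2 bps uses the same frame length as 16 bps in the basic scheme.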
We also evaluated the inaudibility of the multi-bit embedding
scheme under varied embedding capacity, where αl = αs = 0.008
and λ = 0.8. The results are shown in Table 4. Compared with
Table 3, the inaudibility of the multi-bit embedding scheme was
only slightly lower even though the embedding capacity was doubled.
According to all these results, the proposed method could satisfy
inaudibility when the attenuation amplitudes of the low-rank and
sparse parts satisfied αl = αs ≤ 0.01, the decomposition parameter
satisfied λ ≥ 0.6, and the embedding capacity was higher than
8 bps. For the multi-bit embedding scheme, we obtained good
inaudibility when the embedding capacity was higher than 16 × 2 bps.
Table 1 Inaudibility under different attenuation amplitude settings: (I) inaudibility affected by αl when αs = 0.008, (II) inaudibility
affected by αs when αl = 0.008, and (III) inaudibility affected by αl and αs
Attenuation amplitude  (I) αs = 0.008                (II) αl = 0.008               (III) αl = αs
αl / αs / αl & αs      SNR, dB  LSD, dB  PEAQ (ODG)  SNR, dB  LSD, dB  PEAQ (ODG)  SNR, dB  LSD, dB  PEAQ (ODG)
0.002                  13.4466  0.4052   −0.2843     26.1629  0.3313   0.0885      26.5489  0.1637   0.1363
0.005                  14.2300  0.4062   −0.2504     19.8797  0.3678   −0.0440     18.9918  0.3164   −0.0455
0.008                  14.9641  0.4317   −0.2307     14.9641  0.4317   −0.2307     14.9641  0.4317   −0.2307
0.01                   15.3917  0.4566   −0.2249     12.6564  0.4774   −0.3493     13.0391  0.4950   −0.3397
0.04                   12.7118  0.8463   −0.6378     −0.5949  0.8947   −1.7957     1.0339   0.9056   −1.5929
Table 2 Inaudibility under different decomposition parameters λ
Decomposition parameter λ  SNR, dB  LSD, dB  PEAQ (ODG)
0.2                        10.9375  0.5409   −0.4695
0.4                        11.1205  0.4783   −0.4498
0.6                        12.1600  0.4455   −0.3694
0.8                        14.9641  0.4317   −0.2307
1.0                        19.2594  0.4088   −0.1256
Table 3 Inaudibility under different embedding capacities (bps)
Embedding capacity, bps  SNR, dB  LSD, dB  PEAQ (ODG)
4                        8.6697   0.7356   −0.6610
8                        13.2167  0.5480   −0.3297
16                       14.9641  0.4317   −0.2307
32                       16.0787  0.3729   −0.1580
64                       19.9342  0.3161   0.0009
128                      27.4793  0.2319   0.0938
4.3 Evaluations for security
4.3.1 Security of the proposed method: The DR is often used to
evaluate whether the watermarks can be detected and thus it can
verify the security of the proposed method. The DR is defined as
$$
\mathrm{DR} = \frac{\text{Number of correctly detected watermarks}}{\text{Number of total watermarks}} \times 100\%. \tag{45}
$$
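The DR in (45) amounts to a bit-wise comparison between the embedded and detected watermark sequences. The helper below is a minimal sketch with a hypothetical name, `detection_rate`.

```python
def detection_rate(embedded_bits, detected_bits):
    """DR in percent following (45)."""
    if len(embedded_bits) != len(detected_bits):
        raise ValueError("bit sequences must have the same length")
    if not embedded_bits:
        raise ValueError("at least one watermark bit is required")
    correct = sum(1 for e, d in zip(embedded_bits, detected_bits) if e == d)
    return 100.0 * correct / len(embedded_bits)

print(detection_rate([0, 1, 1, 0], [0, 1, 0, 0]))
```

Since the watermark bits are binary, random guessing is expected to give a DR of about 50%, so a DR near that level from the whole signal means the detector has learnt essentially nothing about the watermark.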
To prove the security of the proposed method, three different
embedding schemes were compared to the proposed method. In
these experiments, we set αl=αs= 0.008 and λ= 0.8. The
embedding capacities were set as 4, 8, 16, 32, 64, and 128 bps. The
other key parameters, i.e. the delays and the PN sequences, were
taken as variables to implement different embedding schemes.
In the embedding process, we decomposed the original audio
signal into low-rank and sparse parts and then separately embedded
watermarks into them using the following four schemes:
Scheme I: opposite PN sequences and different delays for low-
rank and sparse parts (dl= [10, 20] and ds= [15, 25]);
Scheme II: the same PN sequence and different delays for low-
rank and sparse parts (dl= [10, 20] and ds= [15, 25]);
Scheme III: the same PN sequence and the same delays for low-
rank and sparse parts (dl=ds= [10, 20]);
Scheme IV: opposite PN sequences and same delays for low-
rank and sparse parts (dl=ds= [10, 20]), i.e. the proposed
method.
We detected the watermarks from the whole watermarked signals
and calculated the DRs for each embedding scheme. The
experimental results are compared in Fig. 9. It can be seen from
Figs. 9a–c that the watermarks could be correctly detected from the
whole watermarked signal with the first three embedding schemes,
indicating that these embedding schemes were not secure. In
particular, the DRs of scheme I and scheme II were very high at
low embedding capacities. These results suggested that there were
obvious peaks in the correlation results between the whole
watermarked signal and PN sequences (different or same) when the
delays for the low-rank part and sparse part were different. The
DRs of scheme III were even higher than scheme I and scheme II,
owing to the strengthened embedding, i.e. the same PN sequence
and the same delays. Compared with these embedding schemes,
watermarks could not be detected from the whole watermarked
signal with either the PN sequence of the low-rank part or the PN
sequence of the sparse part in scheme IV, as shown in Fig. 9d.
These results suggested that the proposed method was more secure.
4.3.2 Security of the multi-bit embedding scheme: We also
examined the watermark detection performance of the multi-bit
embedding scheme. In the experiment, we set αl = αs = 0.008 and
λ = 0.8. The embedding capacities were twice those of the above
experiments, i.e. 4 × 2, 8 × 2, 16 × 2, 32 × 2, 64 × 2, and
128 × 2 bps. The results in Fig. 10 demonstrate that the watermarks
could not be detected from the whole watermarked signal.
Therefore, the multi-bit embedding scheme did not compromise the
security of the proposed method.
Table 4 Inaudibility of the multi-bit embedding scheme under different embedding capacities (bps)
Embedding capacity, bps  SNR, dB  LSD, dB  PEAQ (ODG)
4 × 2                    6.9664   0.7595   −0.7829
8 × 2                    11.7453  0.5775   −0.3957
16 × 2                   13.4406  0.4587   −0.2959
32 × 2                   14.5109  0.3917   −0.2212
64 × 2                   18.3256  0.3278   −0.0338
128 × 2                  26.0395  0.2516   0.0762
Fig. 9 DRs of four different embedding schemes (Scheme I–Scheme IV) when applying correlation to the whole watermarked signal, where the black bar
shows the DRs when using the PN sequence of the low-rank part for correlation and the white bar shows the DRs when using the PN sequence of the sparse
part for correlation. For Scheme II and Scheme III, the PN sequences for the low-rank part and the sparse part were the same
(a) Scheme I, (b) Scheme II, (c) Scheme III, (d) Scheme IV
Fig. 10 DRs of the multi-bit embedding scheme when applying correlation
to the whole watermarked signal, where the black bar shows the DRs of the
low-rank part and the white bar shows the DRs of the sparse part
We also show the cross-correlation results of the two embedding
cases, i.e. Case 1 and Case 2, where the correlation was performed
between the whole watermarked signal and the PN sequences.
According to Figs. 11a and b, there were no obvious peaks in the
correlation results for Case 1, i.e. the watermark bits could not be
detected from the whole watermarked signal. The correlation results
for Case 2 are shown in Figs. 11c and d. As predicted, there were
always two peaks at the delay positions, and it was difficult to
infer the watermarks from such results. Therefore, the multi-bit
embedding scheme was also secure and the watermarks could not be
detected from the whole watermarked signal.
4.4 Evaluations for robustness
4.4.1 Evaluations for normal detection: This section evaluated
the normal detection ability of the proposed method under different
parameter settings. Watermarks were embedded into the low-rank
part and the sparse part and then detected from them, respectively.
(i) The attenuation amplitude: The experiments were again divided
into three sub-experiments. As in the inaudibility experiments
in Section 4.2.1, the attenuation amplitudes of the low-rank part,
the sparse part, and both low-rank and sparse parts were controlled
separately. Here, we set the attenuation amplitudes as [0.002,
0.005, 0.008, 0.01, 0.04], the embedding capacity as 16 bps, and λ
as 0.8. The results are shown in Fig. 12. Figs. 12a and b show that
when the amplitude of the low-rank part (or the sparse part)
increased, its own DRs increased while the other part's DRs
decreased. When αl and αs were set as 0.008 and 0.01,
respectively, both the low-rank part and the sparse part had good
DRs. The overall DRs were always close to 100% by considering the
reliabilities of the detected watermark bits from the two parts (see
(32) to (34)), which is an advantage of the proposed method. From
Fig. 12c, one can see that when the attenuation amplitudes of both
low-rank and sparse parts increased simultaneously, their DRs
improved. However, when the attenuation amplitudes were too
large, their DRs started to decline. For the ‘Overall’ case, the DRs
in Fig. 12c were always satisfactory, except for the extremely small
amplitude (0.002). Furthermore, these results also suggested that
the proposed method was feasible, since the low-rank part and the
sparse part could be successfully separated from the whole
watermarked signal and the watermarks could be correctly detected
from them by taking advantage of the low-rank and sparse
characteristics of the embedded echoes (see (22) and (23)).
Fig. 11 Correlation results of two different embedding cases: (a) and (b) are the results when embedding the same bits (‘0’ & ‘0’ and ‘1’ & ‘1’) to low-rank
part and sparse part, (c) and (d) are the results when embedding different bits (‘0’ & ‘1’ and ‘1’ & ‘0’) to low-rank part and sparse part
(a) Low-rank: ‘0’ bit, Sparse: ‘0’ bit, (b) Low-rank: ‘1’ bit, Sparse: ‘1’ bit, (c) Low-rank: ‘0’ bit, Sparse: ‘1’ bit, (d) Low-rank: ‘1’ bit, Sparse: ‘0’ bit
Fig. 12 DRs under different attenuation amplitude settings, where ‘Low-rank’ denoted the DRs calculated from the low-rank part, ‘Sparse’ denoted the DRs
calculated from the sparse part, and ‘Overall’ denoted the DRs calculated by considering the reliabilities of the detected watermark bits from two parts using
(32) to (34)
(a) The attenuation amplitude of sparse part is 0.008, (b) The attenuation amplitude of low-rank part is 0.008, (c) The attenuation amplitudes of low-rank and sparse parts are set the
same, which are 0.002, 0.005, 0.008, 0.01, 0.04
Fig. 13 DRs under different λ, where ‘Low-rank’ denoted the DRs
calculated from the low-rank part, ‘Sparse’ denoted the DRs calculated
from the sparse part, and ‘Overall’ denoted the DRs calculated by
considering the reliabilities of the detected watermark bits from two parts
using (32) to (34)
(ii) The decomposition parameter: In this experiment, different
values of λ were used to embed and detect the watermarks.
The values of λ were set as [0.2, 0.4, 0.6, 0.8, 1.0], the embedding
capacity was 16 bps, and αl = αs = 0.008. According to Fig. 13, the
DRs of the low-rank part increased while the DRs of the sparse part
decreased when λ increased. This was because the low-rank part
contained more information when λ became larger, which led to
better robustness. As a result, there is a DR trade-off between
the low-rank part and the sparse part. According to Fig. 13,
well-balanced DRs for both the low-rank part and the sparse part
were obtained when λ was between about 0.6 and 1.0. In
addition, the ‘Overall’ DRs were always close to 100% for all λ.
(iii) The embedding capacity: We evaluated the watermark
detection performance of the proposed method under different
embedding capacities. As in the previous evaluations, the
embedding capacities were set as 4, 8, 16, 32, 64, and 128 bps,
with αl = αs = 0.008 and λ = 0.8. The results are shown in Fig. 14.
The DRs decreased as the embedding capacity increased, mainly
because the PN sequence became shorter at higher embedding
capacities, which affected the watermark detection. Comparing these
results with the inaudibility results in Table 3, the proposed
method achieved satisfactory robustness and inaudibility when the
embedding capacity was lower than 64 bps.
We also observed the DRs of the multi-bit embedding method
under different embedding capacities. The results are shown in
Fig. 15. As in Fig. 14, the DRs of the multi-bit embedding method
decreased as the embedding capacity increased. Besides, since the
multi-bit embedding method did not use the optimised detection of
the basic framework ((32) to (34)), it did not show better
robustness than the proposed method. In our future work, optimised
detection for the multi-bit embedding method will be explored.
4.4.2 Robustness against attacks: We evaluated the robustness
of the proposed method against different attacks, including
Gaussian-noise addition at an SNR of 20 dB (GNA), band-pass
filtering (BPF), re-sampling at 16 and 22 kHz (Res. 16 and Res.
22), re-quantisation with 8 bits and 32 bits (Req. 8 and Req. 32),
and MP3 compression at 128 and 64 kbps mono (MP3. 128 and
MP3. 64). The experimental results are shown in Table 5. The
proposed method was basically robust against these attacks at 4, 8
and 16 bps. However, for MP3 compression at 64 kbps mono,
GNA, and BPF at high embedding capacities, the robustness
needed to be improved.
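Two of the listed attacks are simple enough to sketch. The code below is an illustration under stated assumptions, not the attack implementation used in the experiments: samples are assumed to be normalised to [−1, 1), the noise level is derived from the average power of the attacked signal, and `add_gaussian_noise` and `requantise` are hypothetical names.

```python
import math
import random

def add_gaussian_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise so that the signal-to-noise ratio is about snr_db."""
    rng = random.Random(seed)
    power = sum(v * v for v in signal) / len(signal)
    noise_std = math.sqrt(power / 10 ** (snr_db / 10))
    return [v + rng.gauss(0.0, noise_std) for v in signal]

def requantise(signal, bits):
    """Quantise samples in [-1, 1) to signed integers of `bits` bits and map them back."""
    levels = 2 ** (bits - 1)
    out = []
    for v in signal:
        q = max(-levels, min(levels - 1, round(v * levels)))  # clip to the integer range
        out.append(q / levels)
    return out

tone = [math.sin(0.01 * n) for n in range(20000)]
noisy = add_gaussian_noise(tone, 20.0)
noise_energy = sum((a - b) ** 2 for a, b in zip(noisy, tone))
measured_snr = 10.0 * math.log10(sum(v * v for v in tone) / noise_energy)
print(round(measured_snr, 1))
print(requantise([0.5, -0.25, 0.1], 8))
```

The measured SNR of the noisy tone is close to the requested 20 dB, and 8-bit re-quantisation moves each sample to the nearest multiple of 1/128.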
4.5 Comparison with previous echo-hiding methods
Finally, we compared the proposed method with two typical
previous echo-hiding methods: the backward echo-hiding method
and the PN sequence-based method [18, 19].
Fig. 14 DRs under different embedding capacities, where ‘Low-rank’ denoted the DRs calculated from the low-rank part, ‘Sparse’ denoted the DRs
calculated from the sparse part, and ‘Overall’ denoted the DRs calculated by considering the reliabilities of the detected watermark bits from two parts
Fig. 15 DRs of the multi-bit embedding method under different embedding capacities
4.5.1 Comparison with backward echo-hiding method: For a
fair comparison, we adjusted the attenuation amplitudes of the two
methods so that their DRs were similar and then compared their
inaudibility performance. In the experiment, both methods used
the same delays (d = [10, 20]), the λ of the proposed method was
set as 0.8, and the embedding capacities were 4, 8, and 16 bps.
First, we compared the watermark detection and security
performance of the proposed method and the backward echo-hiding
method. The DRs of the two methods were controlled to be
similar in normal detection, as shown in Fig. 16a. Fig. 16b shows
that the DRs of the proposed method were much lower than those of
the backward echo-hiding method when detecting the watermarks
over the whole watermarked signal. These results suggested that
the proposed method was more secure than the backward echo-hiding
method.
With the DRs of the two methods similar (see Fig. 16a), we
compared the inaudibility of the two methods, as shown in Fig. 17.
The proposed method had SNR and LSD results similar to those of
the backward echo-hiding method, while its PEAQ results were
clearly better. Since the PEAQ evaluates audio quality more
accurately than SNR and LSD, we can conclude that the proposed
method could provide much better inaudibility than the backward
echo-hiding method while maintaining similar DRs.
4.5.2 Comparison with previous PN-based echo-hiding
method: Since both the proposed method and the original PN-based
method [18, 19] are implemented with the PN sequence, we
could easily keep their inaudibility similar and then
compare their watermark detection and security performance. In
the experiment, both methods were implemented using a PN
sequence of the same length (60% of the frame length) and the
same delays (d = [10, 20]). The λ of the proposed method was set
as 0.8. We compared the results at embedding capacities
of 4, 8, and 16 bps. The inaudibility results of the two methods are
shown in Fig. 18 and the DRs under similar inaudibility are shown
in Fig. 19a. One can see that the proposed method and the previous
PN-based echo-hiding method had similar inaudibility and DR
results.
Moreover, we compared the security performance of these two
methods. The DRs are shown in Fig. 19b. For the previous PN-based
echo-hiding method, the watermarks could be correctly detected
from the whole watermarked signal using the correct PN sequence
(i.e. the PN sequence used for watermark embedding). In contrast,
watermarks could not be detected from the whole watermarked
signal generated by the proposed method even when using the
correct PN sequences of the low-rank part and the sparse part.
These results suggested that the proposed method was more secure
than the previous PN-based echo-hiding method.
Finally, we compared the proposed method with four typical
PN-based echo-hiding methods. The embedding capacities follow
their original implementations. The robustness of these methods
against several common attacks is shown in Table 6. These results
suggested that the proposed method, which greatly improved the
security performance, also achieved similar or even better
robustness than the other PN-based echo-hiding methods.
Table 5 Robustness of the proposed method against different attacks, where inaudibility at each embedding capacity was
given
Inaudibility (SNR (dB), LSD (dB), PEAQ (ODG))
Capacity            4 bps                      8 bps                      16 bps                     32 bps                     64 bps                     128 bps
                    SNR      LSD      PEAQ     SNR      LSD      PEAQ     SNR      LSD      PEAQ     SNR      LSD      PEAQ     SNR      LSD      PEAQ     SNR      LSD      PEAQ
Watermarked signal  8.67     0.74     −0.66    13.22    0.55     −0.33    14.96    0.43     −0.23    16.08    0.37     −0.16    19.93    0.32     0.00     27.48    0.23     0.09
Detection rate, %
Capacity  4 bps                      8 bps                      16 bps                     32 bps                     64 bps                     128 bps
Attacks   Low-rank Sparse   Overall  Low-rank Sparse   Overall  Low-rank Sparse   Overall  Low-rank Sparse   Overall  Low-rank Sparse   Overall  Low-rank Sparse   Overall
GNA       59.39    92.75    89.93    74.61    87.10    84.52    76.83    80.55    80.97    78.06    76.12    78.62    73.87    62.93    68.24    59.35    59.74    55.07
BPF       59.49    96.18    97.60    76.89    87.72    93.86    78.11    73.46    81.05    67.95    65.31    63.94    63.57    54.01    55.03    53.88    55.02    50.41
res. 16   69.02    99.02    98.50    84.30    95.21    93.95    76.42    85.23    80.80    67.69    76.61    65.18    61.04    59.24    51.71    54.90    60.71    51.49
res. 22   74.22    99.44    99.61    89.56    97.10    99.18    87.10    87.80    91.52    78.92    79.55    76.43    68.77    62.51    55.86    58.14    61.54    52.52
req. 8    64.88    98.53    98.19    81.83    96.00    96.24    82.37    86.76    88.66    75.17    77.43    73.91    66.25    60.23    57.51    57.43    59.97    52.24
req. 32   73.97    98.87    100.0    90.80    98.92    100.0    95.95    94.96    99.78    94.87    85.89    95.51    85.54    67.45    77.66    65.39    62.86    57.51
MP3. 128  70.51    98.53    99.71    86.73    95.25    98.93    87.41    82.71    88.87    73.66    73.39    70.19    66.42    58.47    55.50    57.04    58.58    51.71
MP3. 64   65.83    93.73    94.78    74.36    78.65    80.4     68.82    65.88    62.55    60.44    59.46    54.18    57.01    52.24    50.80    52.45    53.08    50.28
Fig. 16 Comparison on watermark detection and security between the
proposed method and the backward echo-hiding method
(a) The DRs the proposed method and the backward echo-hiding method were
controlled to be similar, (b) The DRs of the proposed method was much lower than the
backward echo-hiding method when detecting the watermarks over the whole
watermarked signal, which suggested that the proposed method was more secure
Fig. 17 Inaudibility comparison between the proposed method and the
backward echo-hiding method, where the proposed method outperformed
the backward echo-hiding method on inaudibility while maintaining the
similar DRs (see Fig. 16a)
5 Conclusions
This paper proposed a more secure echo-hiding audio
watermarking method based on improved PN sequence and the RPCA. In
the proposed method, the RPCA was used to decompose the
original audio signal into low-rank and sparse parts and then a pair
of opposite PN sequences was employed to embed watermarks into
them, which greatly improved the security of the proposed method.
In the watermark detection process, we took advantage of the low-
rank and sparse characteristics of the embedded echoes and then
separately detected watermarks from them. Experimental results
revealed that the proposed method had good inaudibility
performance and provided better security compared with the
previous echo-hiding method and the PN-based echo-hiding
method. The overall DR of the proposed method was always
satisfactory by considering the reliabilities of the detected
watermark bits from low-rank and sparse parts. Furthermore, the
embedding capacity of the proposed method could be considerably
increased compared with the previous PN-based methods by using
the multi-bit embedding scheme.
6 Acknowledgments
This work was supported by the National Natural Science
Foundation of China (61902280 and 61373104), the Natural
Science Foundation of Tianjin (19JCYBJC15600 and
18JCYBJC15300), and the Scientific Research Project of Tianjin
Education Commission (19PTZWHZ00020). It was also supported
by the Program for Innovative Research Team in University of
Tianjin (TD13-5032) and Tianjin Major Project for Civil-Military
Integration of Science and Technology (18ZXJMTG00260).
7 References
[1] Diego, R., Ballesteros, L.D.M., Camilo, L.: ‘Authenticity verification of audio
signals based on fragile watermarking for audio forensics’, Expert Syst. Appl.,
2018, 91, pp. 211–222
[2] Liu, Z., Huang, Y., Huang, J.: ‘Patchwork-based audio watermarking robust
against de-synchronization and recapturing attacks’, IEEE Trans. Inf.
Forensics Secur., 2019, 14, (5), pp. 1171–1180
[3] Hua, G., Huang, J., Shi, Y.Q., et al.: ‘Twenty years of digital audio
watermarking – a comprehensive review’, Signal Process., 2016, 128, pp.
222–242
[4] Kondo, K.: ‘Towards estimation of quality of watermarked audio signal using
objective measures’. Proc. Int. Conf. Intelligent Information Hiding and
Multimedia Signal Processing, Beijing, People's Republic of China, 2013, pp.
279–282
[5] Subhashini, R., Bagan, K.B.: ‘Robust audio watermarking for monitoring and
information embedding’. Proc. Int. Conf. Signal Processing, Communication
and Networking (ICSCN), Chennai, India, 2017, pp. 1–4
[6] Mustapha, H., Bachir, B., David, M., et al.: ‘Adjustable audio watermarking
algorithm based on DWPT and psychoacoustic modeling’, Multimed. Tools
Appl., 2018, 77, (10), pp. 11693–11725
[7] Kanhe, A., Gnanasekaran, A.: ‘Robust image-in-audio watermarking
technique based on DCT-SVD transform’, EURASIP J. Audio Speech Music
Process., 2018, 2018, p. 16
[8] Verma, V.S., Bhardwaj, A., Jha, R.K.: ‘A new scheme for watermark
extraction using combined noise-induced resonance and support vector
machine with PCA based feature reduction’, Multimed. Tools Appl., 2019, 78,
pp. 23203–23224
[9] Kanhe, A., Gnanasekaran, A.: ‘Security of electronic patient record using
imperceptible DCT-SVD based audio watermarking technique’, Int. J.
Electron. Telecommun., 2019, 65, (1), pp. 19–24
Fig. 18 Inaudibility comparison between the proposed method and the previous PN-based echo-hiding method
Fig. 19 Comparison on watermark detection and security between the proposed method and the previous PN-based echo-hiding method
(a) The proposed method and the previous PN-based echo-hiding method had similar DRs under similar inaudibility (see Fig. 18), (b) The DRs of the proposed method were much
lower than those of the previous PN-based echo-hiding method when detecting the watermarks over the whole watermarked signal using the corresponding PN sequences, i.e. the proposed method
was more secure than the previous PN-based echo-hiding method
Table 6 Comparison of the proposed method with other PN-based echo-hiding methods, where ‘–’ means corresponding results were not provided by the method
Detection rate, %
Method     MPN [35]   DUAL [36]   OPT [37]   HRS [38]   Proposed
capacity   1 bps      1 bps       5 bps      10 bps     4 bps
normal     97.23      99.57       99.97      96.5       99.26
req. 8     88.36      91.46       96.73      85.8       98.19
GNA        74.34      82.67       93.94      76.0       89.93
MP3. 128   95.14      97.03       93.41      90.8       99.71
BPF        –          –           –          91.8       97.60
IET Signal Process., 2020, Vol. 14 Iss. 4, pp. 229-242
© The Institution of Engineering and Technology 2020
[10] Ngo, N.M., Kurkoski, B.M., Unoki, M.: ‘Robust and reliable audio
watermarking based on dynamic phase coding and error control coding’. Proc.
Int. Conf. European Signal Processing Conf. (EUSIPCO), Nice, France, 2015,
pp. 2276–2280
[11] Xiang, Y., Natgunanathan, I., Peng, D., et al.: ‘Spread spectrum audio
watermarking using multiple orthogonal PN sequences and variable
embedding strengths and polarities’, IEEE Trans. Audio Speech Lang.
Process., 2018, 26, (3), pp. 529–539
[12] Galajit, K., Karnjana, J., Aimmanee, P., et al.: ‘Digital audio watermarking
method based on singular spectrum analysis with automatic parameter
estimation using a convolutional neural network’. Proc. Int. Conf. Intelligent
Information Hiding and Multimedia Signal Processing, Sendai, Japan, 2018,
pp. 63–73
[13] Tiwari, A., Jain, L.: ‘Digital audio watermarking using frequency masking
technique’, Int. J. Comput. Appl., 2015, 126, (4), pp. 1–7
[14] Zebbiche, K., Khelifi, F., Loukhaoukha, K.: ‘Robust additive watermarking in
the DTCWT domain based on perceptual masking’, Multimed. Tools Appl.,
2018, 77, (16), pp. 21281–21304
[15] Natgunanathan, I., Xiang, Y., Rong, Y., et al.: ‘Robust patchwork-based
embedding and decoding scheme for digital audio watermarking’, IEEE
Trans. Audio Speech Lang. Process., 2012, 20, (8), pp. 2232–2239
[16] Natgunanathan, I., Xiang, Y., Hua, G., et al.: ‘Patchwork-based multilayer
audio watermarking’, IEEE Trans. Audio Speech Lang. Process., 2017, 25,
(11), pp. 2176–2187
[17] Gruhl, D., Lu, A., Bender, W.: ‘Echo hiding’. Information Hiding, First Int.
Workshop, Cambridge, UK, 1996, pp. 293–315
[18] Ko, B.-S., Nishimura, R., Suzuki, Y.: ‘Time-spread echo method for digital
audio watermarking using PN sequences’. Proc. Int. Conf. Acoustics, Speech,
and Signal Processing, ICASSP, Orlando, FL, USA, 2002, pp. 2001–2004
[19] Ko, B.-S., Nishimura, R., Suzuki, Y.: ‘Time-spread echo method for digital
audio watermarking’, IEEE Trans. Multimed., 2005, 7, (2), pp. 212–221
[20] Hu, P., Peng, D., Yi, Z., et al.: ‘Robust time-spread echo watermarking using
characteristics of host signals’, Electron. Lett., 2016, 52, (1), pp. 5–6
[21] Natgunanathan, I., Xiang, Y., Pan, L., et al.: ‘Robustness and embedding
capacity enhancement in time-spread echo-based audio watermarking’. Proc.
Int. Conf. Industrial Electronics and Applications (ICIEA), Hefei, People's
Republic of China, 2016, pp. 1536–1541
[22] Shu, N., Kotaro, S., Senya, K.: ‘Audio watermark sharing based on time
spread echo method’, IEICE Tech. Rep., 2017, 117,
(282), pp. 23–26
[23] Shoya, O., Shunsuke, A., Takeru, M., et al.: ‘A study on the system of
detecting falsification for conference records using echo spread method and
octave similarity’, IEICE Tech. Rep., 2019, 118,
(478), pp. 57–64
[24] Oh, H., Seok, J.W., Hong, J.W., et al.: ‘New echo embedding technique for
robust and imperceptible audio watermarking’. Proc. Int. Conf. Acoustics,
Speech, and Signal Processing, ICASSP, Salt Lake City, UT, USA, 2001, pp.
1341–1344
[25] Xu, C., Wu, J., Sun, Q., et al.: ‘Applications of digital watermarking
technology in audio signals’, J. Audio Eng. Soc., 1999, 47, (10), pp. 805–812
[26] Oh, H., Kim, H.W., Seok, J.W., et al.: ‘Transparent and robust audio
watermarking with a new echo embedding technique’. Proc. Int. Conf.
Multimedia and Expo, ICME, Tokyo, Japan, 2001
[27] Kim, H.J., Choi, Y.H.: ‘A novel echo-hiding scheme with backward and
forward kernels’, IEEE Trans. Circuits Syst. Video Technol., 2003, 13, (8), pp.
885–889
[28] Xiang, Y., Peng, D., Natgunanathan, I., et al.: ‘Effective pseudonoise
sequence and decoding function for imperceptibility and robustness
enhancement in time-spread echo-based audio watermarking’, IEEE Trans.
Multimed., 2011, 13, (1), pp. 2–13
[29] Candès, E.J., Li, X., Ma, Y., et al.: ‘Robust principal component analysis?’, J.
ACM, 2011, 58, (3), pp. 1–37
[30] Huang, P.-S., Chen, S.D., Smaragdis, P., et al.: ‘Singing-voice separation from
monaural recordings using robust principal component analysis’. Proc. Int.
Conf. Acoustics, Speech and Signal Processing ICASSP, Kyoto, Japan, 2012,
pp. 57–60
[31] Goto, M., Hashiguchi, H., Nishimura, T., et al.: ‘RWC music database: music
genre database and musical instrument sound database’. Proc. Int. Conf.
Music Information Retrieval, Baltimore, MD, USA, 2003
[32] Hoffmann, E., Kolossa, D., Köhler, B.-U., et al.: ‘Using information theoretic
distance measures for solving the permutation problem of blind source
separation of speech signals’, EURASIP J. Audio Speech Music Process.,
2012, 2012, (1), pp. 1–14
[33] ITU-R: ‘Method for objective measurements of perceived audio quality’,
BS.1387, 2001
[34] Lin, Y., Waleed, H.A.: ‘Perceptual evaluation of audio watermarking using
objective quality measures’. Proc. of the IEEE Int. Conf. on Acoustics,
Speech, and Signal Processing, ICASSP 2008, Las Vegas, Nevada, USA, 30
March – 4 April 2008, pp. 1745–1748
[35] Xiang, Y., Peng, D., Natgunanathan, I., et al.: ‘Effective pseudonoise
sequence and decoding function for imperceptibility and robustness
enhancement in time-spread echo-based audio watermarking’, IEEE Trans.
Multimed., 2011, 13, (1), pp. 2–13
[36] Xiang, Y., Natgunanathan, I., Peng, D., et al.: ‘A dual-channel time-spread
echo method for audio watermarking’, IEEE Trans. Inf. Forensics Secur.,
2012, 7, (2), pp. 383–392
[37] Hua, G., Goh, J., Thing, V.L.L.: ‘Time-spread echo-based audio watermarking
with optimized imperceptibility and robustness’, IEEE/ACM Trans. Audio
Speech Lang. Process., 2015, 23, (2), pp. 227–239
[38] Chen, O.T.-C., Wu, W.-C.: ‘Highly robust, secure, and perceptual-quality
echo hiding scheme’, IEEE Trans. Audio Speech Lang. Process., 2008, 16,
(3), pp. 629–638