Decoding Fingerprints Using the Markov Chain Monte Carlo Method

Teddy Furon, Arnaud Guyader, Frédéric Cérou
INRIA Rennes, IRMAR, Université de Rennes II
Rennes, France
teddy.furon@inria.fr
Abstract— This paper proposes a new fingerprinting decoder based on the Markov Chain Monte Carlo (MCMC) method. A Gibbs sampler generates groups of users according to the posterior probability that these users could have forged the sequence extracted from the pirated content. The marginal probability that a given user belongs to the collusion is then estimated by a Monte Carlo method. The users with the largest empirical marginal probabilities are accused. This MCMC method can decode any type of fingerprinting code.
This paper is in the spirit of the 'Learn and Match' decoding strategy: it assumes that the collusion attack belongs to a family of models. The Expectation-Maximization algorithm estimates the parameters of the collusion model from the extracted sequence. This part of the algorithm is described for the binary Tardos code and with the exploitation of the soft outputs of the watermarking decoder.
The experimental body considers some extreme setups where the fingerprinting code lengths are very small. It reveals that the weak link of our approach is the estimation part. This is a clear warning for the 'Learn and Match' decoding strategy.
I. INTRODUCTION
A. The Application
This paper deals with active fingerprinting, a.k.a. traitor tracing. A robust watermarking technique embeds the user's codeword into the content to be distributed. When a pirated copy of the content is scouted, the watermark decoder extracts the message, which identifies the dishonest user. However, there might exist a group of dishonest users, the so-called collusion, who mix their personal versions of the content to forge the pirated copy. The extracted message no longer corresponds to the codeword of one user, but is a mix of several codewords. The decoder aims at recovering some of these codewords to identify the colluders, while avoiding accusing innocent users.
A popular construction of the codewords is the Tardos
fingerprinting code [1]. The recent literature on this type of
binary probabilistic code aims at improving either the coding
or the decoding side. The first topic uses theoretical arguments
to fine-tune the code construction [2], [3]. Our paper has no
contribution on this side: we use the classical code construc-
tion originally proposed by Tardos in [1] for benchmarking,
but our scheme readily works with these more recent codes.
As for the decoding side, we pinpoint the following features in the literature. Hard decoders work on the binary sequence extracted from the pirated content, whereas soft decoders use some real-valued outputs from the watermarking layer [4]. This paper pertains to this latter trend. Some decoders compute a score per user, which is provably good whatever the collusion strategy, whereas other algorithms first estimate the collusion attack and adapt their scoring function. This paper pertains to this 'Learn and Match' trend [5]. Last but not least, information theory tells us that a joint decoder considering groups of users is potentially more powerful than a single decoder computing a score per user. Our approach is based on joint decoding [3].
B. The problem with joint decoding
Very few papers put into practice the principle of joint decoding. The main difficulty is the matter of complexity. If we consider groups of $e$ people among $n$ users, then we need to browse $\binom{n}{e} = O(n^e)$ such groups, which is not tractable for a large database of users. The approach first proposed in [6] for pair decoding and generalized to bigger subsets in [5] is an iterative decoder. Iteration $t$ considers subsets of $t$ users, which are taken from a small pool of most likely colluders of size $n_t$. If $n_t = O(n^{1/t})$, then the number of subsets to be analyzed remains in $O(n)$ and supposedly within computational resources. But this implies that more and more users are discarded, and this pruning is prone to miss colluders.
Another point is that the scoring function used in [5] is not optimal, for three reasons. The scoring is based on the likelihood that a group of users are all guilty, whereas it should be the likelihood that some of them are guilty. The scoring relies on the way the codewords were generated and is extremely specific to the probabilistic nature of the Tardos code construction. It is also based on a pessimistic estimation of the collusion attack, artificially assuming a bigger collusion size $c_{\max}$. It appears that this scoring is less discriminative if $c_{\max}$ is much larger than the true collusion size $c$.
C. Our contributions
This paper presents a soft decoder in the trend of the ‘Learn
and Match’ approach. The estimation part elegantly uses the
E.-M. algorithm for a fast estimation of the collusion attack.
After this first task, our decoding algorithm puts into practice
the concept of joint decoding more directly than the previous
methods [6], [5]. Specifically, there is no iterative pruning
of the users. Instead of computing scores per user subset,
the decoder generates typical subsets likely to be the real
collusion. This is efficiently done by a Markov chain and a
convenient representation of collusion. A Monte Carlo method
computes statistics from these sampled subsets such as the
marginals of being a colluder. This Markov Chain Monte Carlo
(MCMC) method is the core of our algorithm and is the main
contribution of this paper. It works with any code construction
(probabilistic or error-correction based). This approach has also been used in biology for library screening [7] and in blind deconvolution [8].
Our main contributions are introduced first: the representation of collusion (Sect. II) and the MCMC decoder (Sect. III). The more technical details about the experimental setup (the collusion model for the watermarking layer, the estimation of its parameters with the E.-M. method, and the derivation of the transition probabilities) are presented in the second part of the paper (Sect. IV and V). An experimental investigation shows state-of-the-art performance as well as the limits of our approach with very short codes.
II. THE REPRESENTATION OF COLLUSION
The keystone of our approach is the representation of a collusion by a limited list of colluder identities. Suppose there are $n$ users indexed from 1 to $n$ and denote $[n] \triangleq \{1, \dots, n\}$. Depending on how short the code is, tracing collusions bigger than a given size, denoted by $c_{\max}$, produces unreliable decisions. The collusion representation is a vector $\mathbf{s}$ of $c_{\max}$ integer components, each ranging from 0 to $n$. Some of them may take the value 0, which codes 'nobody'. We denote by $s_0 \triangleq \|\mathbf{s}\|_0$ its $\ell_0$-norm, i.e. the number of non-zero components. This vector represents a collusion of size $s_0$ whose users are given by the non-zero components. Hence, there is no pair of non-zero components with the same value. We denote by $\mathcal{S}$ the set of all such vectors.
For example, with $c_{\max} = 5$, $\tilde{\mathbf{s}} = (6, 0, 3, 2, 0)$ represents the collusion of the $\tilde{s}_0 = 3$ following users: 2, 3, and 6.
A. The Neighborhood of a collusion
We denote by $\mathcal{S}(\mathbf{s})$ the neighborhood of $\mathbf{s}$, the set of collusions differing by at most one component in their representation. We have $\mathbf{s} \in \mathcal{S}(\mathbf{s})$, and the other neighbors have one more colluder, one less colluder, or just one different colluder. For instance, $\{(6,1,3,2,0), (6,0,3,0,0), (6,0,4,2,0)\} \subset \mathcal{S}(\tilde{\mathbf{s}})$.
The neighborhood is decomposed as $\mathcal{S}(\mathbf{s}) = \bigcup_{i=1}^{c_{\max}} \mathcal{S}_i(\mathbf{s})$, with

$\mathcal{S}_i(\mathbf{s}) \triangleq \{\mathbf{s}' \in \mathcal{S} \mid \mathbf{s}'(k) = \mathbf{s}(k), \forall k \neq i\}. \qquad (1)$

The subsets $\mathcal{S}_i(\mathbf{s})$ are not disjoint. If $\mathbf{s}(i) = 0$, then $\mathcal{S}_i(\mathbf{s})$ is composed of $\mathbf{s}$ and some collusions of size $s_0 + 1$ (one user is added). The cardinality of $\mathcal{S}_i(\mathbf{s})$ equals $n + 1 - s_0$. If $\mathbf{s}(i) > 0$, then $\mathcal{S}_i(\mathbf{s})$ is composed of some collusions of size $s_0$ including $\mathbf{s}$ (user $\mathbf{s}(i)$ is replaced by someone else or not) and one collusion of size $s_0 - 1$ (user $\mathbf{s}(i)$ is removed). In this case, $|\mathcal{S}_i(\mathbf{s})| = n + 2 - s_0$.
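To make this representation concrete, here is a minimal Python sketch (ours, not from the paper; the function name neighborhood_i is hypothetical) that enumerates the subset $\mathcal{S}_i(\mathbf{s})$ of (1):

```python
def neighborhood_i(s, i, n):
    """Enumerate S_i(s): all valid collusion vectors (tuples) that agree with s
    on every component except possibly the i-th one (0 codes 'nobody')."""
    used = set(s) - {0, s[i]}                          # users already listed in the other components
    candidates = [0] + [u for u in range(1, n + 1) if u not in used]
    return [s[:i] + (v,) + s[i + 1:] for v in candidates]

# Example from the text: c_max = 5 and s = (6, 0, 3, 2, 0) lists users {2, 3, 6}.
s = (6, 0, 3, 2, 0)
print(len(neighborhood_i(s, i=1, n=10)))               # s(i) = 0: |S_i(s)| = n + 1 - s0 = 8
print(len(neighborhood_i(s, i=0, n=10)))               # s(i) > 0: |S_i(s)| = n + 2 - s0 = 9
```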
B. The prior probability of a collusion
We now cast a probabilistic model on the collusion representation. Our prior is as little informative as possible. Having no information about the size of the collusion, we pose that all sizes are equally probable if less than or equal to $c_{\max}$:

$P(s_0) = c_{\max}^{-1}, \quad \forall s_0 \in [c_{\max}]. \qquad (2)$

We also pose that the $\binom{n}{c}$ collusions of size $c$ are equally likely. Finally, the prior on $\mathbf{s}$ is given by:

$P(\mathbf{s}) = P(\mathbf{s}, s_0) = P(\mathbf{s} \mid s_0) P(s_0) = \binom{n}{s_0}^{-1} c_{\max}^{-1}. \qquad (3)$

With this model, the prior distribution is not uniform: collusions of bigger size have lower probabilities.
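A direct transcription of the prior (3), in log form for later use in the sampler (a minimal sketch with our own naming):

```python
from math import comb, log

def log_prior(s, n, c_max):
    """log P(s) of Eq. (3): uniform over sizes in [c_max], then uniform over
    the C(n, s0) collusions of that size."""
    s0 = sum(1 for v in s if v != 0)           # ||s||_0, the collusion size
    if s0 == 0:
        return float('-inf')                   # sizes range over {1, ..., c_max}
    return -log(comb(n, s0)) - log(c_max)
```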
C. The posterior probability of a collusion
Once a pirated copy is scouted, a sequence $\mathbf{z}$ is extracted. This observation refines our knowledge about the collusion, which is reflected by the posterior probability $P(\mathbf{s} \mid \mathbf{z})$. Thanks to the Bayes rule:

$P(\mathbf{s} \mid \mathbf{z}) = \frac{P(\mathbf{z} \mid \mathbf{s}) P(\mathbf{s})}{P(\mathbf{z})}. \qquad (4)$

Yet, we cannot compute this probability because $P(\mathbf{z})$ is unknown. This is not critical since this quantity is a common denominator to all posteriors: we can still compare posteriors or compute the ratio of two posteriors. The next difficulty is the conditional $P(\mathbf{z} \mid \mathbf{s})$, which in words is the probability that collusion $\mathbf{s}$ could have forged sequence $\mathbf{z}$. This is where we need a probabilistic model of the collusion process. Sect. IV-C presents our models and Sect. V-D gives the expressions of the conditional probabilities.
The Maximum A Posteriori decoder consists in finding the collusion with the biggest posterior. Browsing all of them is, however, computationally intractable for a large database of users. As an alternative, we can build a single decoder based on the marginal posterior probabilities: the probability that user $j$ is a colluder is given by

$P(j \mid \mathbf{z}) = \sum_{\mathbf{s} : \exists i,\, \mathbf{s}(i) = j} P(\mathbf{s} \mid \mathbf{z}). \qquad (5)$

Again, browsing the collusions of all the summands is not practical for a large number of users.
III. THE MARKOV CHAIN MONTE CARLO METHOD
The key idea of our approach is to estimate the above marginals with a Markov Chain Monte Carlo method.
Fig. 1. Illustration of the MCMC method for $K = 500$ and $n = 300$. [Top] The Markov chain: the $K \times n$ binary matrix indicating which users belong to the state $\mathbf{s}^{(t)}$ for $1 \le t \le K$. [Bottom] The Monte Carlo estimation: the empirical marginal probabilities, i.e. the column means of the above binary matrix.
A. The Monte Carlo Method
Instead of computing the marginals with (5), a Monte Carlo method estimates their values: we draw $K$ collusions $\{\mathbf{s}_k\}_{k=1}^{K}$ according to the distribution $P(\mathbf{s} \mid \mathbf{z})$, and the empirical marginals are given by:

$\hat{P}(j \mid \mathbf{z}) = |\{\mathbf{s}_k \mid \exists i,\, \mathbf{s}_k(i) = j\}| / K, \qquad (6)$

which reads as the empirical frequency with which user $j$ belongs to these sampled collusions. The next subsections explain how we manage to sample collusions distributed as $P(\mathbf{s} \mid \mathbf{z})$. The difficulty lies in the fact that we cannot sample them directly because we only know $P(\mathbf{s} \mid \mathbf{z})$ up to the multiplicative constant $P(\mathbf{z})$.
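Assuming the sampled states are available as a list of collusion vectors, the estimator (6) is a simple counting loop (a sketch with hypothetical names):

```python
import numpy as np

def empirical_marginals(samples, n):
    """Eq. (6): fraction of sampled collusion vectors containing user j."""
    counts = np.zeros(n)
    for s in samples:
        for j in set(s) - {0}:                 # each user counted at most once per sample
            counts[j - 1] += 1
    return counts / len(samples)               # entry j-1 is P_hat(j | z)
```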
B. The Markov Chain
The collusions are indeed generated thanks to a Markov chain. It is an iterative process with a state (here a collusion) taking value $\mathbf{s}^{(t)}$ at iteration $t$. The next iteration draws a new state $\mathbf{s}^{(t+1)}$ according to the transition probability $P(\mathbf{s}^{(t+1)} = \mathbf{s} \mid \mathbf{s}^{(t)})$ specified in the next section. The Markov chain is initialized by randomly selecting a collusion $\mathbf{s}^{(0)}$. The transition probabilities are carefully crafted so that the distribution of the state $\mathbf{s}^{(t)}$ converges to the targeted distribution $P(\mathbf{s} \mid \mathbf{z})$ as $t$ increases. After a burn-in period $T$, we assume that the Markov chain has forgotten $\mathbf{s}^{(0)}$ and that the states are correctly sampled from then on: the collusions $\{\mathbf{s}^{(t)}\}_{t=T}^{T+K}$ are then passed to the Monte Carlo part of the algorithm, which computes the empirical marginals thanks to (6). Fig. 1 illustrates the MCMC method. The colluders are the 6 users with the highest empirical marginals. One sees that innocent users often belong to some collusion states $\mathbf{s}^{(t)}$, but this occurs only intermittently for any given innocent user, so that in the end it does no harm.
C. The Gibbs sampler
Instead of computing the transition probabilities from $\mathbf{s}^{(t)}$ to any possible collusion, we restrict the transitions to the collusions of a subset of its neighborhood $\mathcal{S}(\mathbf{s}^{(t)})$ (see Sect. II-A). This is called a multi-stage Gibbs sampler with random scan [9, Alg. A.42]. At iteration $t+1$, an integer $i$ is first uniformly drawn in $[c_{\max}]$, indicating the subset $\mathcal{S}_i(\mathbf{s}^{(t)})$. Then, the following transition distribution is constructed: $\forall \mathbf{s} \in \mathcal{S}_i(\mathbf{s}^{(t)})$,

$P(\mathbf{s}^{(t+1)} = \mathbf{s} \mid \mathbf{s}^{(t)}) := \frac{P(\mathbf{s} \mid \mathbf{z})}{\sum_{\mathbf{s}' \in \mathcal{S}_i(\mathbf{s}^{(t)})} P(\mathbf{s}' \mid \mathbf{z})} = \frac{P(\mathbf{z} \mid \mathbf{s}) P(\mathbf{s})}{\sum_{\mathbf{s}' \in \mathcal{S}_i(\mathbf{s}^{(t)})} P(\mathbf{z} \mid \mathbf{s}') P(\mathbf{s}')}. \qquad (7)$

This choice guarantees that the stationary distribution of this Markov chain is $P(\mathbf{s} \mid \mathbf{z})$, which legitimates our approach [9, Sect. 10.2.1]. The unknown multiplicative constant $P(\mathbf{z})$ in (4) disappears in the ratio. This transition distribution only depends on the priors $P(\mathbf{s}')$ and $P(\mathbf{s})$ given by (3), and on the conditional probabilities $\{P(\mathbf{z} \mid \mathbf{s})\}_{\mathbf{s} \in \mathcal{S}_i(\mathbf{s}^{(t)})}$, which depend on the collusion process that will be estimated (see Sect. IV-C and (17) or (18)). These latter quantities are functions of the codewords of the users listed in $\mathbf{s}$, whatever their construction. This decoding algorithm thus works with any fingerprinting code. The remainder of the paper applies this approach to Tardos codes.
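The following Python sketch summarizes Sect. III (it is our illustration, not the authors' code). It reuses the neighborhood_i and log_prior helpers sketched in Sect. II, and it assumes a user-supplied function log_likelihood(s) returning $\log P(\mathbf{z}|\mathbf{s})$, e.g. a closure over the collusion model estimated in Sect. V.

```python
import numpy as np

def gibbs_sampler(log_likelihood, n, c_max, T, K, seed=0):
    """Random-scan Gibbs sampler targeting P(s|z), Eq. (7).
    T is the burn-in length and K the number of kept states."""
    rng = np.random.default_rng(seed)
    # random initial collusion of size c_max
    s = tuple(int(u) for u in rng.choice(np.arange(1, n + 1), size=c_max, replace=False))
    kept = []
    for t in range(T + K):
        i = int(rng.integers(c_max))               # random scan: component to resample
        neigh = neighborhood_i(s, i, n)            # the subset S_i(s^(t))
        logp = []
        for v in neigh:
            lp = log_prior(v, n, c_max)
            logp.append(lp if lp == float('-inf') else lp + log_likelihood(v))
        logp = np.array(logp)
        w = np.exp(logp - logp.max())              # Eq. (7), numerically safe normalization
        s = neigh[rng.choice(len(neigh), p=w / w.sum())]
        if t >= T:                                 # keep states only after the burn-in
            kept.append(s)
    return kept
```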
IV. THE SETUP
This section briefly reviews the construction of the Tardos code, describes the setup for hiding the user's codeword into the content, and presents the model of the collusion attack.
A. Tardos Code Construction
The binary code is composed of $n$ codewords of $m$ bits. The codeword $\mathbf{x}_j = (x_{j,1}, \dots, x_{j,m})^T$ identifying user $j \in \mathcal{U} = [n]$ is composed of $m$ binary symbols independently drawn at the code construction s.t. $P(x_{j,i} = 1) = p_i$, $\forall (j,i) \in [n] \times [m]$. At initialization, the auxiliary variables $\{p_i\}_{i=1}^{m}$ are independent and identically drawn according to a distribution $f(p) : [0,1] \to \mathbb{R}^+$. This distribution is a parameter to be selected by the code designer. Tardos originally proposed $f_T(p) \triangleq \mathbf{1}_{[t, 1-t]}(p)\, \kappa_t / \sqrt{p(1-p)}$ in [1], where $\mathbf{1}_{[a,b]}$ is the indicator function of the interval (i.e. $\mathbf{1}_{[a,b]}(p) = 1$ if $a \le p \le b$, 0 otherwise), and $\kappa_t$ is the constant s.t. $\int_0^1 f_T(p)\, dp = 1$. The cut-off parameter $t$ is usually set to $1/(300 c_{\max})$. The integer $c_{\max}$ is the maximum expected number of colluders.
The distribution $f$ is public, but both the code $\Xi = [\mathbf{x}_1, \dots, \mathbf{x}_n]$ and the auxiliary sequence $\mathbf{p} = (p_1, \dots, p_m)^T$ must be kept as secret parameters. A user possibly knows his codeword, but the only way to gain knowledge about some other codewords is to team up in a collusion.
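A minimal sketch of this construction (our own code; the inverse-CDF sampling of the truncated arcsine-like density is one possible implementation):

```python
import numpy as np

def tardos_code(n, m, c_max, seed=0):
    """Draw the secret vector p and the n x m binary code of Sect. IV-A."""
    rng = np.random.default_rng(seed)
    t = 1.0 / (300 * c_max)                            # usual cut-off parameter
    F = lambda p: (2 / np.pi) * np.arcsin(np.sqrt(p))  # CDF of the untruncated density f_T
    u = rng.uniform(F(t), F(1 - t), size=m)            # inverse-CDF sampling on [t, 1-t]
    p = np.sin(np.pi * u / 2) ** 2
    X = (rng.random((n, m)) < p).astype(np.uint8)      # P(x_{j,i} = 1) = p_i, independently
    return p, X
```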
B. The Modulation
We assume the content is divided into chunks. For instance, a movie is split into blocks of frames. A watermarking technique embeds a binary symbol per chunk, sequentially hiding the codeword into the content. This implies that the code length $m$ is limited by the watermarking embedding rate times the duration (or size) of the content. The robustness of the watermarking technique strongly depends on the embedding rate: the lower the embedding rate, the more robust the watermark. Therefore, it is crucial to design a fingerprinting decoder able to trace colluders despite a short code length.
The watermarking decoder retrieves the hidden bit $x$ per watermarked chunk of content. We assume that it first computes a statistic $z$, the so-called soft output, which ideally equals 1 (resp. $-1$) if the hidden bit is '1' (resp. '0'). The decoder then thresholds the soft output at 0 to yield a hard output: $\hat{x} =$ '1' if $z \ge 0$, '0' otherwise. This is the case for instance with a spread-spectrum based watermarking technique using an antipodal modulation (a.k.a. BPSK).
In this paper, the fingerprinting decoder (i.e. the accusation process) is called a soft decoder because it uses the soft outputs of the watermarking decoder. This brings more information about the colluders' identities than the hard outputs.
C. The Collusion Attack
The model of the collusion attack is taken from [5]. The partition of the content into chunks is not a secret. We assume that the collusion attack sequentially processes all the chunks in the same way: there is first a fusion of the colluders' versions into one chunk, and then a process (coarse source compression, noise addition, filtering, etc.) further distorts this fused chunk. We foresee the following types of fusion.
1) Attacks of type I: The fusion is a signal processing operation that mixes $c$ chunks of content into one. This includes sample-wise average or median, sample patchwork, etc. The fusion aims at reducing the confidence in the decoded symbol, so we assume that $|z| \le 1$.
This mixing is strongly driven by the number $k$ of chunks watermarked with symbol '1' (denoted '1'-chunk). For a collusion of size $c$, $k \in \{0, \dots, c\}$, and we assume that $z$ can take $c+1$ values $\{\mu_c(k)\}_{k=0}^{c}$. In the manner of the marking assumption, we restrict the power of the collusion by enforcing $\mu_c(c) = -\mu_c(0) = 1$. In other words, there is no fusion when all chunks are watermarked with the same symbol.
The watermarking secret key prevents the colluders from determining the hidden symbol. Yet, they can sort their $c$ chunks into two groups of identical instances¹. This reveals the number of hidden '1' symbols up to an ambiguity: it is $k$ or $c - k$. This implies a symmetry in the model: $\mu_c(k) = -\mu_c(c - k)$, $\forall k \in \{0, \dots, c\}$.
At last, the distortion on the fused chunk adds a noise to the soft output: $z = \mu_c(k) + n$. We assume that this noise is independent of $k$, and i.i.d. with $n \sim \mathcal{N}(0, \sigma^2)$.
We give two examples taken from [4].
• 'Average': $\mu_c(k) = 2k c^{-1} - 1$, $\forall k$. This is the case for instance if the soft decision is a linear process and the colluders fuse all their chunks with a sample-wise average.
• 'Average2': $\mu_c(k) = 0$, $\forall k \in [c-1]$, and $\mu_c(c) = -\mu_c(0) = 1$. This is the case for instance when the soft decision is a linear process and the colluders fuse only two different blocks with a sample-wise average.
¹In other words, the watermarking is deterministic: its result only depends on its inputs (the original chunk, the symbol to be embedded, and the secret key).
2) Attacks of type II: In this type, the colluders benefit from the division into blocks. At a given block index, they copy and paste one of their chunks. There is no fusion of blocks. The probability that they put a '1'-chunk when they have $k$ chunks watermarked with symbol '1' out of $c$ is denoted by $\theta_c(k)$. The marking assumption imposes that $\theta_c(c) = 1 - \theta_c(0) = 1$. The ambiguity about the number of '1'-chunks results in the symmetry $\theta_c(k) = 1 - \theta_c(c - k)$. After the distortion on the selected chunk, $z$ is distributed as $\mathcal{N}(-1, \sigma^2)$ with probability $1 - \theta_c(k)$, or as $\mathcal{N}(1, \sigma^2)$ with probability $\theta_c(k)$.
Here are two classical examples also used in [4]:
• 'Uniform': $\theta_c(k) = k c^{-1}$, $\forall k$. The colluders uniformly draw one of their chunks.
• 'Majority': $\theta_c(k) = 1$ if $k > c/2$, 0 otherwise ($\theta_c(c/2) = 1/2$ if $c$ is even). The colluders sort the chunks into two groups of identical instances and choose a chunk from the bigger group.
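For completeness, here is a sketch (ours, with hypothetical names) that simulates the soft outputs $z$ produced by the four example attacks above, given the $c \times m$ matrix of the colluders' codewords:

```python
import numpy as np

def collude(Xc, attack, sigma, seed=0):
    """Simulate the soft outputs z for one of the example attacks of Sect. IV-C.
    Xc is the c x m matrix of the colluders' codewords."""
    rng = np.random.default_rng(seed)
    c, m = Xc.shape
    k = Xc.sum(axis=0)                                  # number of '1'-chunks per index
    if attack == 'average':                             # type I: mu_c(k) = 2k/c - 1
        mu = 2 * k / c - 1
    elif attack == 'average2':                          # type I: mu = 0 except for k = 0 or c
        mu = np.where(k == c, 1.0, np.where(k == 0, -1.0, 0.0))
    elif attack == 'uniform':                           # type II: theta_c(k) = k/c
        mu = 2.0 * (rng.random(m) < k / c) - 1
    else:                                               # 'majority', type II
        theta = np.where(k > c / 2, 1.0, np.where(k < c / 2, 0.0, 0.5))
        mu = 2.0 * (rng.random(m) < theta) - 1
    return mu + sigma * rng.standard_normal(m)          # additive noise on the soft output
```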
V. THE ESTIMATION WITH THE E.-M. ALGORITHM
The first task of the proposed decoder is the estimation of the collusion attack. It amounts to guessing the type of the attack and its parameters, $(\boldsymbol{\mu}_c, \sigma)$ (type I) or $(\boldsymbol{\theta}_c, \sigma)$ (type II), from the observations $\mathbf{z}$ and the knowledge of the secret $\mathbf{p}$. The model is not identifiable if $c$ is unknown. For this reason, the estimation is done for all collusion sizes ranging from 1 to $c_{\max}$. This results in a set of $c_{\max}$ model parameters. This estimation task heavily relies on the collusion model (Sect. IV-C) and on the Tardos code.
A. The Maximum Likelihood Estimator
The MLE (Maximum Likelihood Estimator) searches the maximum of the log-likelihood. For type I, the latter has the following expression:

$\mathcal{L}^{(I)}(\boldsymbol{\mu}_{\tilde{c}}, \sigma) \triangleq \log P(\mathbf{z} \mid \mathbf{p}, \boldsymbol{\mu}_{\tilde{c}}, \sigma) = \sum_{i=1}^{m} \log\left( \sum_{k=0}^{\tilde{c}} P(k \mid p_i)\, f(z_i; \mu_{\tilde{c}}(k), \sigma) \right), \qquad (8)$

with $f(z; \mu, \sigma) \triangleq e^{-(z - \mu)^2 / 2\sigma^2} / \sqrt{2\pi\sigma^2}$. For type II, we have

$\mathcal{L}^{(II)}(\boldsymbol{\theta}_{\tilde{c}}, \sigma) \triangleq \log P(\mathbf{z} \mid \mathbf{p}, \boldsymbol{\theta}_{\tilde{c}}, \sigma) = \sum_{i=1}^{m} \log\left( \sum_{k=0}^{\tilde{c}} P(k \mid p_i) \left[ \theta_{\tilde{c}}(k)\, f(z_i; 1, \sigma) + (1 - \theta_{\tilde{c}}(k))\, f(z_i; -1, \sigma) \right] \right). \qquad (9)$

The final decision about the type of the attack is taken by comparing the values $\mathcal{L}^{(I)}(\boldsymbol{\mu}^{\star}_{\tilde{c}}, \sigma^{\star}_{I,\tilde{c}})$ and $\mathcal{L}^{(II)}(\boldsymbol{\theta}^{\star}_{\tilde{c}}, \sigma^{\star}_{II,\tilde{c}})$: the type giving the biggest likelihood is selected. Note that both types have the same number of parameters, so that neither is more prone than the other to overfitting, and this justifies the comparison of their respective likelihoods.
Yet, these two optimizations are far from trivial because the functionals have plenty of local maxima. Under both attack types, the observation $\mathbf{z}$ is indeed a mixture of a fixed number of Gaussian distributions, a typical case where the Expectation-Maximization estimator is elegant and efficient, even if convergence to the global maximum is not ensured. The following two subsections are an application of [10, Tab. 3.1].
B. E.-M. Algorithm for Type I Attacks
We introduce the unknown latent variables $\boldsymbol{\varphi} = (\varphi_1, \dots, \varphi_m)$ that capture the number of '1'-chunks per index: $\forall i \in [m]$, $\varphi_i \in \{0, \dots, \tilde{c}\}$. If $\tilde{c} = c$, the true value of $\varphi_i$ is $\sum_{k \in \mathcal{C}} x_{k,i}$. The log-likelihood function with these latent variables is now $\mathcal{L}^{(I)}(\boldsymbol{\mu}_{\tilde{c}}, \sigma) \triangleq \log P(\mathbf{z}, \boldsymbol{\varphi} \mid \mathbf{p}, \boldsymbol{\mu}_{\tilde{c}}, \sigma)$:

$\mathcal{L}^{(I)}(\boldsymbol{\mu}_{\tilde{c}}, \sigma) = \sum_{i=1}^{m} \log\left( P(\varphi_i \mid p_i)\, f(z_i; \mu_{\tilde{c}}(\varphi_i), \sigma) \right), \qquad (10)$

with $P(\varphi \mid p) = \binom{\tilde{c}}{\varphi} p^{\varphi} (1 - p)^{\tilde{c} - \varphi}$. The E.-M. iteratively refines the estimate $(\hat{\boldsymbol{\mu}}_{\tilde{c}}, \hat{\sigma})$ of the parameters and the $(\tilde{c}+1) \times m$ matrix $\mathbf{T}$ storing the conditional probabilities of $\boldsymbol{\varphi}$:

$T_{k,i} \triangleq P(\varphi_i = k \mid z_i, p_i, \hat{\boldsymbol{\mu}}_{\tilde{c}}, \hat{\sigma}), \quad \forall k \in \{0, \dots, \tilde{c}\}. \qquad (11)$

1) E-step: Given the current estimate of the model, the conditional probabilities of $\boldsymbol{\varphi}$ are updated via the Bayes rule:

$\hat{T}_{k,i} = \frac{P(\varphi_i = k \mid p_i)\, f(z_i; \hat{\mu}_{\tilde{c}}(k), \hat{\sigma})}{\sum_{u=0}^{\tilde{c}} P(\varphi_i = u \mid p_i)\, f(z_i; \hat{\mu}_{\tilde{c}}(u), \hat{\sigma})}. \qquad (12)$

2) M-step: Given the conditional probabilities, this step updates the model by finding the parameters that maximize the $Q$ function $\mathbb{E}_{\boldsymbol{\varphi} \mid \mathbf{z}, \mathbf{p}}[\mathcal{L}^{(I)}(\boldsymbol{\mu}_{\tilde{c}}, \sigma)]$. These have a closed-form expression in the case of Gaussian mixtures:

$\hat{\mu}_{\tilde{c}}(k) = \left( \sum_{i=1}^{m} \hat{T}_{k,i} z_i \right) \Big/ \sum_{i=1}^{m} \hat{T}_{k,i}, \quad \forall k \in [\tilde{c} - 1], \qquad (13)$

$\hat{\sigma}^2 = m^{-1} \sum_{k=0}^{\tilde{c}} \sum_{i=1}^{m} \hat{T}_{k,i} (z_i - \hat{\mu}_{\tilde{c}}(k))^2. \qquad (14)$

These two steps are iterated until the increase of the true log-likelihood $\mathcal{L}^{(I)}(\hat{\boldsymbol{\mu}}_{\tilde{c}}, \hat{\sigma})$ becomes negligible.
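The two steps translate into a few lines of Python; this sketch (ours, using scipy for the binomial and Gaussian densities, with an arbitrary initialization and a fixed number of iterations) is one way to implement Eqs. (12)-(14):

```python
import numpy as np
from scipy.stats import binom, norm

def em_type1(z, p, c_tilde, n_iter=50):
    """E.-M. estimation of (mu, sigma) for a type I attack, Eqs. (12)-(14).
    z and p are numpy arrays of length m."""
    m = len(z)
    ks = np.arange(c_tilde + 1)
    prior_k = binom.pmf(ks[:, None], c_tilde, p[None, :])     # P(phi_i = k | p_i), (c+1) x m
    mu = 2 * ks / c_tilde - 1.0                               # init with the 'average' model
    sigma = np.std(z)
    for _ in range(n_iter):
        # E-step: responsibilities T_{k,i}, Eq. (12)
        T = prior_k * norm.pdf(z[None, :], mu[:, None], sigma)
        T /= T.sum(axis=0, keepdims=True)
        # M-step: closed-form Gaussian-mixture updates, Eqs. (13)-(14)
        mu[1:c_tilde] = (T[1:c_tilde] @ z) / T[1:c_tilde].sum(axis=1)
        mu[0], mu[c_tilde] = -1.0, 1.0                        # marking-assumption constraints
        sigma = np.sqrt((T * (z[None, :] - mu[:, None]) ** 2).sum() / m)
    return mu, sigma
```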
C. E.-M. Algorithm for Type II Attacks
Under this type of attack, the latent variable $\zeta_i \in \{-\tilde{c}, \dots, -1, 0, 1, \dots, \tilde{c}\}$ captures the event that the colluders have $|\zeta_i|$ '1'-chunks and that they copy-paste a chunk with symbol $\mathrm{sg}(\zeta_i)$ at the $i$-th index ($\mathrm{sg}(k) = 1$ if $k > 0$, 0 otherwise). The log-likelihood function with these latent variables is now $\mathcal{L}^{(II)}(\boldsymbol{\theta}_{\tilde{c}}, \sigma) \triangleq \log P(\mathbf{z}, \boldsymbol{\zeta} \mid \mathbf{p}, \boldsymbol{\theta}_{\tilde{c}}, \sigma)$:

$\mathcal{L}^{(II)}(\boldsymbol{\theta}_{\tilde{c}}, \sigma) = \sum_{i=1}^{m} \log\left( \pi(\zeta_i, p_i)\, f(z_i; 2\,\mathrm{sg}(\zeta_i) - 1, \sigma) \right), \qquad (15)$

with $\pi(\zeta_i, p_i) = \theta(|\zeta_i|)^{\mathrm{sg}(\zeta_i)} (1 - \theta(|\zeta_i|))^{1 - \mathrm{sg}(\zeta_i)} P(|\zeta_i| \mid p_i)$, and $P(|\zeta| \mid p) = \binom{\tilde{c}}{|\zeta|} p^{|\zeta|} (1 - p)^{\tilde{c} - |\zeta|}$.
1) E-step: The estimates of the conditional probabilities $U_{k,i} \triangleq P(\zeta_i = k \mid z_i, p_i, \hat{\boldsymbol{\theta}}_{\tilde{c}}, \hat{\sigma})$ are updated as follows:

$\hat{U}_{k,i} \propto \hat{\theta}(k)\, P(k \mid p_i)\, f(z_i; 1, \hat{\sigma})$ if $0 \le k \le \tilde{c}$,
$\hat{U}_{k,i} \propto (1 - \hat{\theta}(|k|))\, P(|k| \mid p_i)\, f(z_i; -1, \hat{\sigma})$ if $-\tilde{c} \le k < 0$,

such that $\sum_{k=-\tilde{c}}^{\tilde{c}} \hat{U}_{k,i} = 1$.
2) M-step: The estimate of the model is updated as follows:

$\hat{\theta}(k) = \left( \sum_{i=1}^{m} \hat{U}_{k,i} \right) \Big/ \left( \sum_{i=1}^{m} \hat{U}_{k,i} + \hat{U}_{-k,i} \right), \quad \forall k \in [\tilde{c} - 1],$

$\hat{\sigma}^2 = m^{-1} \sum_{i=1}^{m} \sum_{k=-\tilde{c}}^{\tilde{c}} \hat{U}_{k,i} (z_i - 2\,\mathrm{sg}(k) + 1)^2. \qquad (16)$
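An analogous sketch for type II (again ours, with the same caveats as for type I); the latent variable is split into its '1'-chunk half (Up) and its '0'-chunk half (Um):

```python
import numpy as np
from scipy.stats import binom, norm

def em_type2(z, p, c_tilde, n_iter=50):
    """E.-M. estimation of (theta, sigma) for a type II attack (Sect. V-C)."""
    m = len(z)
    ks = np.arange(c_tilde + 1)
    prior_k = binom.pmf(ks[:, None], c_tilde, p[None, :])     # P(|zeta_i| = k | p_i)
    theta = ks / c_tilde                                      # init with the 'uniform' model
    sigma = np.std(z)
    for _ in range(n_iter):
        # E-step: responsibilities for 'a 1-chunk was pasted' (Up) vs 'a 0-chunk' (Um)
        Up = theta[:, None] * prior_k * norm.pdf(z[None, :], 1.0, sigma)
        Um = (1 - theta[:, None]) * prior_k * norm.pdf(z[None, :], -1.0, sigma)
        Z = (Up + Um).sum(axis=0, keepdims=True)
        Up, Um = Up / Z, Um / Z
        # M-step: update of theta and of sigma, Eq. (16)
        theta[1:c_tilde] = Up[1:c_tilde].sum(axis=1) / (Up[1:c_tilde] + Um[1:c_tilde]).sum(axis=1)
        theta[0], theta[c_tilde] = 0.0, 1.0                   # marking-assumption constraints
        sigma = np.sqrt((Up * (z - 1) ** 2 + Um * (z + 1) ** 2).sum() / m)
    return theta, sigma
```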
D. The expression of the conditional probabilities
Once these $c_{\max}$ estimations of the collusion attacks are done, these models are used in the MCMC decoder via the conditional probabilities $P(\mathbf{z} \mid \mathbf{s})$. Denote by $\boldsymbol{\kappa} = (\kappa_1, \dots, \kappa_m)$ the sequence of the numbers of symbols '1' in the codewords of collusion $\mathbf{s}$: $\kappa_i = \sum_{j \in \mathbf{s}} x_{j,i}$ (with the convention that $x_{0,i} = 0$). If, for the size $s_0$, a type I collusion attack has been diagnosed, then

$P(\mathbf{z} \mid \mathbf{s}) = \prod_{i=1}^{m} f(z_i; \mu^{\star}_{s_0}(\kappa_i), \sigma^{\star}_{I,s_0}); \qquad (17)$

otherwise, for a type II attack:

$P(\mathbf{z} \mid \mathbf{s}) = \prod_{i=1}^{m} \left[ \theta^{\star}_{s_0}(\kappa_i)\, f(z_i; 1, \sigma^{\star}_{II,s_0}) + (1 - \theta^{\star}_{s_0}(\kappa_i))\, f(z_i; -1, \sigma^{\star}_{II,s_0}) \right]. \qquad (18)$
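In log form, (17) and (18) become the following sketch (ours; models is a hypothetical container mapping a collusion size $s_0$ to the attack type and parameters estimated above):

```python
import numpy as np
from scipy.stats import norm

def log_cond_prob(s, z, X, models):
    """log P(z|s) of Eqs. (17)-(18). X is the n x m code matrix; models[s0]
    holds the estimate for size s0: ('I', mu, sigma) or ('II', theta, sigma)."""
    users = np.array([j for j in s if j != 0], dtype=int)
    kappa = X[users - 1].sum(axis=0)                   # kappa_i: number of '1' symbols in s
    kind, param, sigma = models[len(users)]
    if kind == 'I':                                    # Eq. (17)
        return norm.logpdf(z, param[kappa], sigma).sum()
    lik = param[kappa] * norm.pdf(z, 1, sigma) + (1 - param[kappa]) * norm.pdf(z, -1, sigma)
    return np.log(lik).sum()                           # Eq. (18)
```

A partial application of this function (e.g. functools.partial(log_cond_prob, z=z, X=X, models=models)) can play the role of the log_likelihood argument of the sampler sketched in Sect. III; the prior guard there excludes the empty collusion.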
VI. EXPERIMENTAL BODY
The experimental setup is the same as in [4], with the 'Uniform', 'Majority', 'Average' and 'Average2' attacks (see the examples of Sect. IV-C.1 and IV-C.2). There are two scenarios:
(a) $m = 2048$, $c = 8$, $c_{\max} = 10$,
(b) $m = 1024$, $c = 6$, $c_{\max} = 8$.
The common parameters are $n = 10^4$, $T = 400$, $K = 2000$. The variance of the noise is given by $\sigma = 10^{-\mathrm{SNR}/20}$, with the SNR ranging from 10 down to 2 dB. This is the signal-to-noise power ratio at the output of the watermark decoder. It should not be confused with the amount of noise on the content samples. These settings are very extreme in the sense that the codes are too short to be used in practice. We made this choice in order to show the limits of our method.
The set $\mathcal{A}$ is defined as $\mathcal{A} \triangleq \{j \mid \hat{P}(j \mid \mathbf{z}) > \tau\}$. A false negative occurs if this set is empty. In a single decoder, the user in $\mathcal{A}$ with the biggest empirical marginal is accused. In case of a tie where $d$ users have the maximum score, one user is randomly picked. This leads to a false positive with probability $d_i / d$, where $d_i$ is the number of innocents with this maximum score. In a joint decoder, the users in $\mathcal{A}$ are all accused. We record the number of caught colluders $|\mathcal{A} \cap \mathcal{C}|$ and the number of accused innocents $|\mathcal{A}| - |\mathcal{A} \cap \mathcal{C}|$.
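As a small illustration (ours; the random tie-breaking of the single decoder is omitted), the two decision rules read:

```python
import numpy as np

def accuse(marginals, tau, joint=True):
    """Accusation rules of Sect. VI; marginals[j-1] is P_hat(j|z)."""
    A = np.flatnonzero(marginals > tau) + 1            # the set A (user indices start at 1)
    if joint or len(A) == 0:                           # joint decoder: accuse all of A
        return A.tolist()
    return [int(A[np.argmax(marginals[A - 1])])]       # single decoder: top marginal in A
```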
Under scenario (a), the comparison with the performance of the decoder proposed in [4] is mixed, as shown in Fig. 2 (bottom). The baseline of [4] is slightly better against the 'Majority' attack, whereas our decoder is better against the three other attacks. Yet, our decoder has a major drawback: the probability of false alarm is totally unacceptable at low SNR (i.e., at 2 dB in Fig. 2) and when facing the 'Uniform' attack. The same comment holds for scenario (b) (see Fig. 3).
Fig. 2. Scenario (a), 2250 experiments, SNR $\in \{2, 6, 10\}$ dB. [Top] Probabilities of error for the single decoder: (solid) probability of false negative, (dashed) probability of false positive. [Bottom] Average number of caught colluders for the joint decoder: (black) average number of accused innocents, (dashed) the baseline from [4]. Colors: (blue) 'Uniform', (green) 'Majority', (red) 'Average', (pink) 'Average2'.
We suspect the 'Uniform' attack to be close to the worst attack of type I at a given SNR: it has been proven that the achievable rate against the 'Uniform' attack quickly converges to that of the worst-case attack when considering hard outputs [3].
Scenario (b) gives us another hint. Both the length of the code and the square of the collusion size are now halved. Since the code length should be of order $m \sim O(c^2)$, the performance of the decoder should be roughly the same under both scenarios. This is not the case: the probability of false accusation is bigger. This leads us to suspect the estimation part to be the Achilles' heel of our algorithm: it produces a bad-quality estimation because the code is too short, and this spoils the MCMC decoding part. A close look at the estimated collusion models reveals that the type of model and the variance of the noise on the soft outputs are almost always evaluated with high fidelity. Under the 'Uniform' attack, the problem indeed stems from the estimates $\hat{\boldsymbol{\theta}}_{\tilde{c}}$ with $\tilde{c} > c$, which are prone to overfitting: these estimated models are quite different in nature from the 'Uniform' attack of size $\tilde{c}$.
To confirm this intuition with a last experiment, we bypass the estimation and give the MCMC method the collusion models exactly matching the 'Uniform' attack. This simulates a 'perfect' estimation. The results are shown in Fig. 3 (bottom) in light blue. While the number of caught colluders decreases a little, the accusation is now much more reliable, with a probability of false positive around $3 \cdot 10^{-3}$.

Fig. 3. Scenario (b), 2715 experiments, SNR $\in \{2, 6, 10\}$ dB. [Top] Probabilities of error for the single decoder: (solid) probability of false negative, (dashed) probability of false positive. [Bottom] Average number of caught colluders for the joint decoder: (black) average number of accused innocents, (dashed) the baseline from [4]. Colors: (blue) 'Uniform', (green) 'Majority', (red) 'Average', (pink) 'Average2', and (light blue) 'Uniform' attack with true model.

VII. CONCLUSION
The fingerprinting decoder studied in this paper shows some novelty: i) the accusation is probabilistic because of the randomness of the MCMC, ii) it resorts to a posteriori probabilities for groups of users more directly than previous approaches, iii) the decoding part makes no assumption
on the code construction. It also has some drawbacks: i) the estimation part detailed here is specific to Tardos fingerprinting codes, ii) the complexity of the MCMC is quite high, in $O(Kmn)$, iii) the probability of false alarm is not easy to control.
The main message is that the performance is limited by the quality of the estimation of the collusion strategy. This is a clear warning for the 'Learn and Match' decoding strategy, and this issue deserves more research effort.
REFERENCES
[1] G. Tardos, “Optimal probabilistic fingerprint codes,” in Proc. of the 35th
annual ACM symposium on theory of computing. San Diego, CA, USA:
ACM, 2003, pp. 116–125.
[2] K. Nuida, S. Fujitsu, M. Hagiwara, T. Kitagawa, H. Watanabe, K. Ogawa, and H. Imai, "An improvement of discrete Tardos fingerprinting codes," Designs, Codes and Cryptography, vol. 52, no. 3, pp. 339–362, 2009.
[3] Y.-W. Huang and P. Moulin, “On the saddle-point solution and the large-
coalition asymptotics of fingerprinting games,” IEEE Transactions on
Information Forensics and Security, vol. 7, no. 1, pp. 160–175, 2012.
[4] M. Kuribayashi, "Bias equalizer for binary probabilistic fingerprinting codes," in Proc. of the 14th Information Hiding Conference, ser. LNCS. Springer Verlag, Berkeley, CA, USA, May 2012.
[5] P. Meerwald and T. Furon, “Towards practical joint decoding of binary
Tardos fingerprinting codes,” Information Forensics and Security, IEEE
Transactions on, vol. 7, no. 4, pp. 1168–1180, August 2012.
[6] E. Amiri, “Fingerprinting codes: higher rates, quick accusations,” Ph.D.
dissertation, Simon Fraser University, Fall 2010.
[7] E. Knill, A. Schliep, and D. C. Torney, “Interpretation of pooling
experiments using the Markov chain Monte Carlo method,” J Comput
Biol, vol. 3, no. 3, pp. 395–406, 1996.
[8] D. Ge, J. Idier, and E. L. Carpentier, “Enhanced sampling schemes for
MCMC based blind Bernoulli-Gaussian deconvolution,” Signal Process-
ing, vol. 91, no. 4, pp. 759–772, 2011.
[9] C. Robert and G. Casella, Monte Carlo statistical methods. Springer
Verlag, 2004.
[10] M. R. Gupta and Y. Chen, “Theory and use of the EM algorithm,”
Foundations and Trends in Signal Processing, vol. 4, no. 3, pp. 223–
296, 2010.