Noisy Adaptive Group Testing:
Bounds and Algorithms
Jonathan Scarlett
Abstract
The group testing problem consists of determining a small set of defective items from a larger set of items based
on a number of possibly-noisy tests, and is relevant in applications such as medical testing, communication protocols,
pattern matching, and many more. One of the defining features of the group testing problem is the distinction between
the non-adaptive and adaptive settings: In the non-adaptive case, all tests must be designed in advance, whereas in
the adaptive case, each test can be designed based on the previous outcomes. While tight information-theoretic limits
and near-optimal practical algorithms are known for the adaptive setting in the absence of noise, surprisingly little is
known in the noisy adaptive setting. In this paper, we address this gap by providing information-theoretic achievability
and converse bounds under a widely-adopted symmetric noise model, as well as a slightly weaker achievability bound
for a computationally efficient variant. These bounds are shown to be tight or near-tight in a broad range of scaling
regimes, particularly at low noise levels. The algorithms used for the achievability results have the notable feature of
only using two or three stages of adaptivity.
Index Terms
Group testing, sparsity, information-theoretic limits, adaptive algorithms
I. INTRODUCTION
The group testing problem consists of determining a small subset $S$ of "defective" items within a larger set of items $\{1,\ldots,p\}$, based on a number of possibly-noisy tests. This problem has a history in medical testing [1], and has regained significant attention following new applications in areas such as communication protocols [2], pattern matching [3], and database systems [4], and new connections with compressive sensing [5], [6]. In the noiseless setting, each test takes the form
$$Y = \bigvee_{j \in S} X_j, \qquad (1)$$
where the test vector $X = (X_1,\ldots,X_p) \in \{0,1\}^p$ indicates which items are included in the test, and $Y$ is the resulting observation. That is, the output indicates whether at least one defective item was included in the test. One wishes to design a sequence of tests $X^{(1)},\ldots,X^{(n)}$, with $n$ ideally as small as possible, such that the outcomes can be used to reliably recover the defective set $S$ with probability close to one.
The author is with the Department of Computer Science & Department of Mathematics, National University of Singapore (e-mail:
scarlett@comp.nus.edu.sg). This work was supported by an NUS Startup Grant.
One of the defining features of the group testing problem is the distinction between the non-adaptive and adaptive settings. In the non-adaptive setting, every test must be designed prior to observing any outcomes, whereas in the adaptive setting, a given test $X^{(i)}$ can be designed based on the previous outcomes $Y^{(1)},\ldots,Y^{(i-1)}$. It is an active area of research to determine the extent to which this additional freedom helps in reducing the number of tests. In the noiseless setting, a number of interesting results have been discovered along these lines:
• When the number of defectives $k := |S|$ scales as $k = O(p^{1/3})$, the minimal number of tests permitting vanishing error probability scales as $n = k \log_2\frac{p}{k}\,(1+o(1))$ in both the adaptive and non-adaptive settings [7], [8]. Hence, at least information-theoretically, there is no asymptotic adaptivity gain.
• For scalings of the form $k = \Theta(p^{\theta})$ with $\theta \in \big(\frac{1}{3},1\big)$, the behavior $n = k \log_2\frac{p}{k}\,(1+o(1))$ remains unchanged in the adaptive setting [7], but it remains open as to whether this can be attained non-adaptively. For $\theta$ close to one, the best known non-adaptive achievability bounds are far from this threshold.
• Even in the first case above with no adaptivity gain, the adaptive algorithms known to achieve the optimal threshold are practical, having low storage and computation requirements [9]. In contrast, in the non-adaptive case, only computationally intractable algorithms have been shown to attain the optimal threshold [8], [10].
• It has recently been established that there is a provable adaptivity gap under certain scalings of the form $k = \Theta(p)$, i.e., the linear regime [11], [12].
Despite this progress for the noiseless setting, there has been surprisingly little work on adaptivity in noisy settings;
the vast majority of existing group testing algorithms for random noise models are non-adaptive [13]–[16]. In this
paper, we address this gap by providing new achievability and converse bounds for noisy adaptive group testing,
focusing primarily on a widely-adopted symmetric noise model. Before outlining our contributions, we formally
introduce the setup.
A. Problem Setup
Except where stated otherwise, we let the defective set $S$ be uniform on the $\binom{p}{k}$ subsets of $\{1,\ldots,p\}$ of cardinality $k$. As mentioned above, an adaptive algorithm iteratively designs a sequence of tests $X^{(1)},\ldots,X^{(n)}$, with $X^{(i)} \in \{0,1\}^p$, and the corresponding outcomes are denoted by $\mathbf{Y} = (Y^{(1)},\ldots,Y^{(n)})$, with $Y^{(i)} \in \{0,1\}$. A given test is allowed to depend on all of the previous outcomes.
Generalizing (1), we consider the following widely-adopted symmetric noise model:
$$Y = \Big(\bigvee_{j \in S} X_j\Big) \oplus Z, \qquad (2)$$
where $Z \sim \mathrm{Bernoulli}(\rho)$ for some $\rho \in \big(0,\frac{1}{2}\big)$, and $\oplus$ denotes modulo-2 addition. In Section V, we will also consider other asymmetric noise models.
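To make the observation model concrete, the following minimal Python sketch simulates (2) under the i.i.d. $\mathrm{Bernoulli}\big(\frac{\nu}{k}\big)$ test design used later in the paper; setting $\rho = 0$ recovers the noiseless model (1). The function and parameter names are illustrative rather than taken from any existing implementation.

```python
import numpy as np

def run_tests(X, S, rho, rng):
    """Simulate the symmetric noise model (2): Y = (OR_{j in S} X_j) XOR Z,
    where X is an (n, p) binary test matrix and Z ~ Bernoulli(rho) i.i.d."""
    defective = np.zeros(X.shape[1], dtype=bool)
    defective[list(S)] = True
    U = (X[:, defective].sum(axis=1) > 0).astype(int)  # noiseless OR outcomes
    Z = (rng.random(X.shape[0]) < rho).astype(int)     # symmetric noise
    return U ^ Z

rng = np.random.default_rng(0)
p, k, n, rho = 1000, 10, 400, 0.11
S = set(rng.choice(p, size=k, replace=False).tolist())
nu = np.log(2)                                         # design parameter (Appendix A)
X = (rng.random((n, p)) < nu / k).astype(int)          # i.i.d. Bernoulli(nu/k) design
Y = run_tests(X, S, rho, rng)
```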
Given the tests and their outcomes, a decoder forms an estimate $\hat{S}$ of $S$. We consider the exact recovery criterion, in which the error probability is given by
$$P_{\mathrm{e}} := \mathbb{P}[\hat{S} \ne S], \qquad (3)$$
and is taken over the randomness of the defective set $S$, the tests $X^{(1)},\ldots,X^{(n)}$ (if randomized), and the noisy outcomes $Y^{(1)},\ldots,Y^{(n)}$.
As a stepping stone towards exact recovery results, we will also consider a less stringent partial recovery criterion, in which we allow for up to $d_{\max}$ false positives and up to $d_{\max}$ false negatives, for some $d_{\max} > 0$. That is, the error probability is
$$P_{\mathrm{e}}(d_{\max}) := \mathbb{P}[d(S,\hat{S}) > d_{\max}], \qquad (4)$$
where
$$d(S,\hat{S}) = \max\{|S \setminus \hat{S}|, |\hat{S} \setminus S|\}. \qquad (5)$$
Understanding partial recovery is, of course, also of interest in its own right. However, the results of [8], [17] indicate that there is no adaptivity gain under this criterion, at least when $k = o(p)$ and $d_{\max} = o(k)$.
Except where stated otherwise, we assume that the noise level $\rho$ and the number of defectives $k$ are known. In Section IV, we will consider cases where $k$ is only approximately known.
B. Related work
Non-adaptive setting. The information-theoretic limits of group testing were first studied in the Russian literature
[13], [18], and have recently become increasingly well-understood [8], [10], [17], [19]–[21]. Among the existing
works, the results most relevant to the present paper are as follows:
• In the adaptive setting, it was shown by Baldassini et al. [7] that if the output $Y$ is produced by passing the noiseless outcome $U = \vee_{j \in S} X_j$ through a binary channel $P_{Y|U}$, then the number of tests for attaining $P_{\mathrm{e}} \to 0$ must satisfy $n \ge \frac{1}{C} k \log\frac{p}{k}\,(1-o(1))$,¹ where $C$ is the Shannon capacity of $P_{Y|U}$ in nats. For the symmetric noise model (2), this yields
$$n \ge \frac{k \log\frac{p}{k}}{\log 2 - H_2(\rho)}\,(1-o(1)), \qquad (6)$$
where $H_2(\rho) = \rho \log\frac{1}{\rho} + (1-\rho)\log\frac{1}{1-\rho}$ is the binary entropy function.
• In the non-adaptive setting with symmetric noise, it was shown that an information-theoretic threshold decoder attains the bound (6) for $k = o(p)$ under the partial recovery criterion with $d_{\max} = \Theta(k)$ and an arbitrarily small implied constant [8], [17]. For exact recovery, a more complicated bound was also given in [8] that matches (6) when $k = \Theta(p^{\theta})$ for sufficiently small $\theta > 0$.
Several non-adaptive noisy group testing algorithms have been shown to come with rigorous guarantees. We will use two of these non-adaptive algorithms as building blocks in our adaptive methods (a sketch of both per-item decoding rules is given after this list):
• The Noisy Combinatorial Orthogonal Matching Pursuit (NCOMP) algorithm checks, for each item, the proportion of tests it was included in that returned positive, and declares the item to be defective if this proportion exceeds a suitably-chosen threshold. This is known to provide optimal scaling laws for the regime $k = \Theta(p^{\theta})$ ($\theta \in (0,1)$) [14], [15], albeit with somewhat suboptimal constants.
¹Here and subsequently, the function $\log(\cdot)$ has base $e$.
• The method of separate decoding of items, also known as separate testing of inputs [13], [16], also considers the items separately, but uses all of the tests. Specifically, a given item's status is selected via a binary hypothesis test. This method was studied for $k = O(1)$ in [13], and for $k = \Theta(p^{\theta})$ in [16]; in particular, it was shown that the number of tests is within a factor $\log 2$ of the optimal information-theoretic threshold under exact recovery as $\theta \to 0$, and under partial recovery (with $d_{\max} = \Theta(k)$) for all $\theta \in (0,1)$.
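As a rough illustration of these two building blocks, the sketch below implements both per-item rules for the Bernoulli design of the earlier snippet. The decision thresholds `thresh` and `gamma` are left as free parameters; the tuned choices analyzed in [14]–[16] are more delicate, so this should be read as a structural sketch rather than the algorithms of those works.

```python
import numpy as np

def ncomp_decode(X, Y, thresh):
    """NCOMP-style rule: declare item j defective when the fraction of
    positive outcomes, among the tests containing j, exceeds `thresh`."""
    included = X == 1
    positives = (included & (Y[:, None] == 1)).sum(axis=0)
    frac = positives / np.maximum(included.sum(axis=0), 1)
    return set(np.flatnonzero(frac > thresh).tolist())

def separate_decode(X, Y, k, rho, nu, gamma):
    """Separate decoding of items: accumulate a per-item information density
    (log-likelihood ratio against the marginal output law) over all tests,
    and threshold it at `gamma`."""
    e0 = (1 - nu / k) ** (k - 1)                # P[no *other* defective in a test]
    p1 = {1: 1 - rho,                           # P[Y=1 | X_j=1, j defective]
          0: (1 - e0) * (1 - rho) + e0 * rho}   # P[Y=1 | X_j=0, j defective]
    pY1 = (nu / k) * p1[1] + (1 - nu / k) * p1[0]   # marginal P[Y=1]
    score = np.zeros(X.shape[1])
    for x in (0, 1):
        for y in (0, 1):
            num = p1[x] if y == 1 else 1 - p1[x]
            den = pY1 if y == 1 else 1 - pY1
            count = ((X == x) & (Y == y)[:, None]).sum(axis=0)
            score += count * np.log(num / den)
    return set(np.flatnonzero(score > gamma).tolist())
```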
A different line of works has considered group testing with adversarial errors (e.g., see [22]–[24]); these are less
relevant to the present paper.
Adaptive setting. As mentioned above, adaptive algorithms have enjoyed a great deal of success in the noiseless setting [25], [26]. To our knowledge, the first algorithm that was proved to achieve $n = k \log_2\frac{p}{k}\,(1+o(1))$ for all $k = o(p)$ is Hwang's generalized binary splitting algorithm [9], [25]. More recently, there has been interest in algorithms that only use limited rounds of adaptivity [26]–[28], and among other things, it has been shown that the same guarantee can be attained using at most four stages [26].
In the noisy adaptive setting, the existing work is relatively limited. In [29], an adaptive algorithm called
GROTESQUE was shown to provide optimal scaling laws in terms of both samples and runtime. Our focus in
this paper is only on the number of samples, but with a much greater emphasis on the constant factors. In [30,
Ch. 4], noisy adaptive group testing algorithms were proposed for two different noise models based on the Z-channel
and reverse Z-channel, also achieving an order-optimal required number of tests with reasonable constant factors.
We discuss these noise models further in Section V.
C. Contributions
In this paper, we characterize both the information-theoretic limits and the performance of practical algorithms for noisy adaptive group testing, characterizing the asymptotic required number of tests for $P_{\mathrm{e}} \to 0$ as $p \to \infty$. For the achievability part, we propose an adaptive algorithm whose first stage can be taken as any non-adaptive algorithm that comes with partial recovery guarantees, and whose second stage (and third stage in a refined version) improve this initial estimate. By letting the first stage use the information-theoretic threshold decoder of [8], we attain an achievability bound that is near-tight in many cases of interest, whereas by using separate decoding of items as per [13], [16], we attain a slightly weaker guarantee while still maintaining computational efficiency. In addition, we provide a novel converse bound showing that $\Omega(k \log k)$ tests are always necessary, and hence, the implied constant in any scaling of the form $n = \Theta\big(k \log\frac{p}{k}\big)$ with $k = \Theta(p^{\theta})$ must grow unbounded as $\theta \to 1$.
Our results are summarized in Figure 1, where we observe a considerable gain over the best known non-adaptive guarantees, particularly when the noise level $\rho$ is small. Although there is a gap between the achievability and converse bounds for most values of $\theta$, the converse has the notable feature of showing that $n = \frac{k\log\frac{p}{k}}{\log 2 - H_2(\rho)}\,(1+o(1))$ is not always achievable, as one might conjecture based on the noiseless setting. In addition, the gap between the (refined) achievability bound and the converse bound is zero or nearly zero in at least two cases: (i) $\theta$ is small; (ii) $\theta$ is close to one and $\rho$ is close to zero. The algorithms used in our upper bounds have the notable feature of only using two or three rounds of adaptivity, i.e., two in the simple version, and three in the refined version.
[Figure 1 plots the asymptotic ratio of $k\log_2\frac{p}{k}$ to $n$ against the value $\theta$ such that $k = \Theta(p^{\theta})$, with curves labeled Non-adaptive, Adaptive (Simple), Adaptive (Practical), Adaptive (Refined), and Converse.]
Figure 1: Asymptotic thresholds on the number of tests required for vanishing error probability under the noise levels $\rho = 0.11$ (Left) and $\rho = 10^{-4}$ (Right).
In addition to these contributions for the symmetric noise model, we provide the following results for other observation models:
• In the noiseless case, we recover the threshold $n = k\log_2\frac{p}{k}\,(1+o(1))$ for all $\theta \in (0,1)$ using a two-stage adaptive algorithm. Previously, the best known number of stages was four [26].
• For the Z-channel noise model (defined formally in Section V), we show that one can attain $n = \frac{1}{C}\,k\log\frac{p}{k}\,(1+o(1))$ for all $\theta \in (0,1)$, where $C$ is the Shannon capacity of the channel. This matches the general converse bound given in [7], i.e., the generalized version of (6). As a result, we improve on the above-mentioned bounds of [30], which contain reasonable yet strictly suboptimal constant factors.
• For the reverse Z-channel noise model (defined formally in Section V), we prove a similar converse bound to the one mentioned above for the symmetric noise model, thus showing that one cannot match the converse bound of [7] for all $\theta \in (0,1)$.
The remainder of the paper is organized as follows. For the symmetric noise model, we present the simple version
of our achievability bound in Section II, the refined version in Section III, and the converse in Section IV. The
other observation models mentioned above are considered in Section V, and conclusions are drawn in Section VI.
II. ACHIEVABILITY (SIMPLE VERSION)
In this section, we formally state our simplest achievability results; a more complicated but powerful variant is given in Section III. Using a common two-stage approach, we provide achievability bounds for both a computationally intractable information-theoretic decoder, and a computationally efficient decoder.
A. Information-theoretic decoder
The two-stage algorithm that we adopt is outlined informally in Algorithm 1; we describe the steps more precisely
in the proof of Theorem 1 below. The high-level intuition is to use a non-adaptive algorithm with partial recovery
guarantees, and then refine the solution by resolving the false negatives and false positives separately, i.e., Steps
2a and 2b. While these latter steps are stated separately in Algorithm 1, the tests that they use can be performed
together in a single round of adaptivity, so that the overall algorithm is a two-stage procedure.
Algorithm 1: Two-stage algorithm for noisy group testing (informal).
1. Apply the information-theoretic threshold decoder of [8] (see Appendix A) to the ground set $\{1,\ldots,p\}$ to find an estimate $\hat{S}_1$ of cardinality $k$ such that
$$\max\{|\hat{S}_1 \setminus S|, |S \setminus \hat{S}_1|\} \le \alpha_1 k \qquad (7)$$
with high probability, for some small $\alpha_1 > 0$.
2a. Apply a variation of NCOMP [14] (see Appendix C) to the reduced ground set $\{1,\ldots,p\} \setminus \hat{S}_1$ to exactly identify the false negatives $S \setminus \hat{S}_1$ from the first step. Let these items be denoted by $\hat{S}'_{2\mathrm{a}}$.
2b. Test the items in $\hat{S}_1$ individually $\tilde{n}$ times, and let $\hat{S}'_{2\mathrm{b}}$ contain the items that returned positive at least $\frac{\tilde{n}}{2}$ times.
The final estimate of the defective set is given by $\hat{S} := \hat{S}'_{2\mathrm{a}} \cup \hat{S}'_{2\mathrm{b}}$.
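The following Python sketch shows how the two stages fit together. The first-stage threshold decoder and the NCOMP variant of Appendix C are abstracted as the assumed callables `stage1_estimate` and `resolve_false_negatives`; only Step 2b, which is fully specified above, is spelled out, using the choice of $\tilde{n}$ from (16) in the proof below.

```python
import numpy as np

def d2(a, b):
    """Binary KL divergence D2(a || b) in nats."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

def individual_test(is_defective, n_rep, rho, rng):
    """Number of positive outcomes when one item is tested n_rep times:
    each test is Bernoulli(1 - rho) if defective, Bernoulli(rho) otherwise."""
    q = 1 - rho if is_defective else rho
    return int((rng.random(n_rep) < q).sum())

def two_stage(p, k, rho, S, rng, stage1_estimate, resolve_false_negatives):
    """Structure of Algorithm 1 (sketch); S is only used to simulate tests."""
    S1 = stage1_estimate()                                # Step 1: |S1| = k, few errors
    S2a = resolve_false_negatives(set(range(p)) - S1)     # Step 2a
    n_rep = int(np.ceil(np.log(k) / d2(0.5, rho)))        # choice (16), eta -> 0
    S2b = {j for j in S1                                  # Step 2b: majority vote
           if individual_test(j in S, n_rep, rho, rng) >= n_rep / 2}
    return S2a | S2b
```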
Our first main information-theoretic achievability result is as follows.
Theorem 1. Under the symmetric noisy group testing setup with crossover probability $\rho \in \big(0,\frac12\big)$, with $k = \Theta(p^{\theta})$ for some $\theta \in (0,1)$, there exists a two-stage adaptive group testing algorithm such that
$$n \le \bigg(\frac{k\log\frac{p}{k}}{\log 2 - H_2(\rho)} + \frac{k\log k}{\frac12 \log\frac{1}{4\rho(1-\rho)}}\bigg)(1+o(1)) \qquad (8)$$
and such that $P_{\mathrm{e}} \to 0$ as $p \to \infty$.
Proof. We study the guarantees of the three steps in Algorithm 1, and the number of tests used for each one.
Step 1. It was shown in [8] that, for an arbitrarily small constant $\alpha_1 > 0$, there exists a non-adaptive group testing algorithm returning some set $\hat{S}_1$ of cardinality $k$ such that
$$\max\{|\hat{S}_1 \setminus S|, |S \setminus \hat{S}_1|\} \le \alpha_1 k, \qquad (9)$$
with probability approaching one, with the number of tests being at most
$$n_1 \le \frac{k\log\frac{p}{k}}{\log 2 - H_2(\rho)}\,(1+o(1)). \qquad (10)$$
In Appendix A, we recap the decoding algorithm and its analysis. The non-adaptive test design for this stage is the ubiquitous i.i.d. Bernoulli design.
Step 2a. Let us condition on the first step being successful, in the sense that (9) holds. We claim that there exists a non-adaptive algorithm that, when applied to the reduced ground set $\{1,\ldots,p\} \setminus \hat{S}_1$, returns $\hat{S}'_{2\mathrm{a}}$ containing
precisely the set of (at most $\alpha_1 k$) defective items $S \setminus \hat{S}_1$ with probability approaching one, with the number of samples behaving as
$$n_2 = O\Big(\alpha_1 k \log\frac{p}{\alpha_1 k}\Big). \qquad (11)$$
If the number of defectives $k_1 := |S \setminus \hat{S}_1|$ in the reduced ground set were known, this would simply be an application of the $O(k_1 \log p)$ scaling derived in [14] for the NCOMP algorithm. In Appendix C, we adapt the algorithm and analysis of [14] to handle the case that $k_1$ is only known up to a constant factor.
In fact, in the present setting, we only know that $k_1 \in [0, \alpha_1 k]$, so we do not even know $k_1$ up to a constant factor. To get around this, we apply a simple trick that is done purely for the purpose of the analysis: Instead of applying the modified NCOMP algorithm directly to $\{1,\ldots,p\} \setminus \hat{S}_1$, apply it to the slightly larger set in which $\alpha_1 k$ "dummy" defective items are included. Then, the number of defectives is in $[\alpha_1 k, 2\alpha_1 k]$, and is known up to a factor of two. We do not expect that this trick would ever be useful in practice, but it is convenient for the sake of the analysis.
Step 2b. Since we conditioned on the first step being successful, at most $\alpha_1 k$ of the $k$ items in $\hat{S}_1$ are non-defective. In the final step, we simply test each item in $\hat{S}_1$ individually $\tilde{n}$ times, and declare the item positive if and only if at least half of the outcomes are positive.
To study the success probability, we use a well-known Chernoff-based concentration bound for Binomial random variables: If $Z \sim \mathrm{Binomial}(N,q)$, then
$$\mathbb{P}[Z \le Nq_0] \le e^{-N D_2(q_0 \| q)}, \quad q_0 < q, \qquad (12)$$
where $D_2(q_0\|q) = q_0 \log\frac{q_0}{q} + (1-q_0)\log\frac{1-q_0}{1-q}$ is the binary KL divergence function.
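As a quick numerical sanity check of (12), with illustrative values of $N$, $q$, and $q_0$ and using SciPy's exact Binomial tail, the Chernoff bound indeed dominates the true tail probability:

```python
import numpy as np
from scipy.stats import binom

def d2(a, b):                                   # binary KL divergence in nats
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

N, q, q0 = 100, 0.7, 0.5                        # e.g. q = 1 - rho with rho = 0.3
exact = binom.cdf(int(N * q0), N, q)            # P[Z <= N q0], Z ~ Binomial(N, q)
bound = np.exp(-N * d2(q0, q))                  # right-hand side of (12)
print(f"exact tail {exact:.2e} <= bound {bound:.2e}")
```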
Fix an arbitrary item $j$, and let $\tilde{N}_{j,1}$ be the number of its $\tilde{n}$ tests that are positive. Since the test outcomes are distributed as $\mathrm{Bernoulli}(1-\rho)$ for defective $j$ and $\mathrm{Bernoulli}(\rho)$ for non-defective $j$, we obtain from (12) that
$$\mathbb{P}\Big[\tilde{N}_{j,1} \le \tfrac{\tilde{n}}{2}\Big] \le e^{-\tilde{n} D_2(\frac12 \| 1-\rho)} = e^{-\tilde{n} D_2(\frac12 \| \rho)}, \quad j \in S \qquad (13)$$
$$\mathbb{P}\Big[\tilde{N}_{j,1} \ge \tfrac{\tilde{n}}{2}\Big] \le e^{-\tilde{n} D_2(\frac12 \| \rho)}, \quad j \notin S. \qquad (14)$$
Hence, we obtain from the union bound over the $k$ items in $\hat{S}_1$ that
$$\mathbb{P}[\hat{S}'_{2\mathrm{b}} \ne (S \cap \hat{S}_1)] \le k \cdot e^{-\tilde{n} D_2(\frac12\|\rho)}. \qquad (15)$$
For any $\eta > 0$, the right-hand side tends to zero as $p \to \infty$ under the choice
$$\tilde{n} = \frac{\log k}{D_2(\frac12\|\rho)}\,(1+\eta), \qquad (16)$$
which gives a total number of tests in Step 2b of
$$n_{2\mathrm{b}} = \frac{k\log k}{D_2(\frac12\|\rho)}\,(1+\eta). \qquad (17)$$
The proof is concluded by noting that $\eta$ can be arbitrarily small, and writing $D_2\big(\frac12\|\rho\big) = \frac12\log\frac{1}{4\rho(1-\rho)}$.
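For completeness, the identity used in the final step is immediate from the definition of $D_2$:
$$D_2\Big(\tfrac12\,\Big\|\,\rho\Big) = \tfrac12\log\frac{1/2}{\rho} + \tfrac12\log\frac{1/2}{1-\rho} = \tfrac12\log\frac{1}{4\rho(1-\rho)}.$$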
A weakness of Theorem 1 is that it does not achieve the threshold $n = \frac{k\log\frac{p}{k}}{\log 2 - H_2(\rho)}\,(1+o(1))$ for any value of $\theta > 0$ (see Figure 1), even though such a threshold is achievable for sufficiently small $\theta$ even non-adaptively [8]. We overcome this limitation via a refined three-stage algorithm in Section III.
B. Practical decoder
Of the three steps given in Algorithm 1 and the proof of Theorem 1, the only one that is computationally challenging is the first, which uses an information-theoretic threshold decoder to identify $S$ up to a distance (cf., (5)) of $d(S,\hat{S}) \le \alpha_1 k$, for small $\alpha_1 > 0$. A similar approximate recovery result was also shown in [16] for separate decoding of items, which is computationally efficient. The asymptotic threshold on $n$ for separate decoding of items is only a $\log 2$ factor worse than the optimal information-theoretic threshold [16], and this fact leads to the following counterpart to Theorem 1.
Theorem 2. Under the symmetric noisy group testing setup with crossover probability $\rho \in \big(0,\frac12\big)$, and $k = \Theta(p^{\theta})$ for some $\theta \in (0,1)$, there exists a computationally efficient two-stage adaptive group testing algorithm such that
$$n \le \bigg(\frac{k\log\frac{p}{k}}{\log 2 \cdot (\log 2 - H_2(\rho))} + \frac{k\log k}{\frac12\log\frac{1}{4\rho(1-\rho)}}\bigg)(1+o(1)) \qquad (18)$$
and $P_{\mathrm{e}} \to 0$ as $p \to \infty$.
The proof is nearly identical to that of Theorem 1, except that the required number of tests in the first stage is multiplied by $\frac{1}{\log 2}$ in accordance with [16]. For brevity, we omit the details.
III. ACHIEVABILITY (REFINED VERSION)
As mentioned previously, a weakness of Theorem 1 is that it only achieves the behavior $n \le \frac{k\log\frac{p}{k}}{\log 2 - H_2(\rho)}\,(1+o(1))$ (for which a matching converse is known [7]) in the limit as $\theta \to 0$, even though this can be achieved even non-adaptively for sufficiently small $\theta$ [8]. Since adaptivity provides extra freedom in the design, we should expect the corresponding bounds to be at least as good as the non-adaptive setting.
While we can simply take the better of Theorem 1 and the exact recovery result of [8], this is a rather unsatisfying solution, and it leads to a discontinuity in the asymptotic threshold (cf., Figure 1). It is clearly more desirable to construct an adaptive scheme that "smoothly" transitions between the two. In this section, we attain such an improvement by modifying Algorithm 1 in two ways. The resulting algorithm is outlined informally in Algorithm 2, and the modifications are as follows:
• In the first stage, instead of learning $S$ up to a distance of $\alpha_1 k$ for some constant $\alpha_1 \in (0,1)$, we learn it up to a distance of $k^{\gamma}$ for some $\gamma \in (0,1)$. The non-adaptive partial recovery analysis of [8] requires non-trivial modifications for this purpose; we provide the details in Appendix A.
• We split Step 2b of Algorithm 1 into two stages, one comprising Step 2b in Algorithm 2, and the other comprising Step 3. The former of these identifies most of the defective items, and the latter resolves the rest.
It is worth noting that, at least using our analysis techniques, neither of the above modifications alone is enough to
obtain a bound that is always at least as good as the non-adaptive exact recovery result of [8]. We will shortly see,
however, that the two modifications combined do suffice.
Algorithm 2: Three-stage algorithm for noisy group testing (informal).
1. Apply the information-theoretic threshold decoder of [8] (see Appendix A) to the ground set $\{1,\ldots,p\}$ to find an estimate $\hat{S}_1$ of cardinality $k$ such that
$$\max\{|\hat{S}_1 \setminus S|, |S \setminus \hat{S}_1|\} \le k^{\gamma} \qquad (19)$$
with high probability, where $\gamma \in (0,1)$.
2a. Apply a variation of NCOMP [14] (see Appendix C) to the reduced ground set $\{1,\ldots,p\} \setminus \hat{S}_1$ to exactly identify the false negatives from the first step. Let these items be denoted by $\hat{S}'_{2\mathrm{a}}$.
2b. Test each item in $\hat{S}_1$ individually $\check{n}$ times, and let $\hat{S}'_{2\mathrm{b}} \subseteq \hat{S}_1$ contain the $k - \alpha_2 k$ items that returned positive the highest number of times, for some small $\alpha_2 > 0$.
3. Test the items in $\hat{S}_1 \setminus \hat{S}'_{2\mathrm{b}}$ (of which there are $\alpha_2 k$) individually $\tilde{n}$ times, and let $\hat{S}'_3$ contain the items that returned positive at least $\frac{\tilde{n}}{2}$ times. The final estimate of the defective set is given by $\hat{S} := \hat{S}'_{2\mathrm{a}} \cup \hat{S}'_{2\mathrm{b}} \cup \hat{S}'_3$.
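The new ranking step is the only structural change relative to the earlier sketch of Algorithm 1; a minimal version is given below, where the per-item positive counts are assumed to come from the individual-testing routine sketched earlier.

```python
def refined_steps_2b_and_3(counts_2b, counts_3, k, alpha2):
    """Steps 2b and 3 of Algorithm 2 (sketch).

    counts_2b : dict item -> positives among its n-check tests (Step 2b)
    counts_3  : dict item -> (positives, total) from Step 3 re-testing
    """
    keep = int(k - alpha2 * k)
    ranked = sorted(counts_2b, key=counts_2b.get, reverse=True)
    S2b = set(ranked[:keep])                         # most-positive items survive
    remainder = ranked[keep:]                        # ~ alpha2*k items go to Step 3
    S3 = {j for j in remainder
          if counts_3[j][0] >= counts_3[j][1] / 2}   # majority vote
    return S2b, S3
```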
The following theorem characterizes the asymptotic number of tests required.
Theorem 3. Under the symmetric noisy group testing setup with crossover probability $\rho \in \big(0,\frac12\big)$, under the scaling $k = \Theta(p^{\theta})$ for some $\theta \in (0,1)$, there exists a three-stage adaptive group testing algorithm such that
$$n \le \inf_{\gamma \in (0,1),\, \delta_2 \in (0,1)} \Big(\max\big\{n_{\mathrm{MI},1},\, n_{\mathrm{MI},2}(\gamma,\delta_2),\, n_{\mathrm{Conc}}(\gamma,\delta_2)\big\} + n_{\mathrm{Indiv}}(\gamma)\Big)(1+o(1)) \qquad (20)$$
and $P_{\mathrm{e}} \to 0$ as $p \to \infty$, where:
• The standard mutual information based term is
$$n_{\mathrm{MI},1} = \frac{k\log\frac{p}{k}}{\log 2 - H_2(\rho)}. \qquad (21)$$
• An additional mutual information based term is
$$n_{\mathrm{MI},2}(\gamma,\delta_2) = \frac{2}{(\log 2)(1-2\rho)\log\frac{1-\rho}{\rho}} \cdot \frac{1}{1-\delta_2} \cdot \big((1-\theta)k\log p + 2(1-\gamma)k\log k\big). \qquad (22)$$
• The term associated with a concentration bound is
$$n_{\mathrm{Conc}}(\gamma,\delta_2) = \frac{4\big(1 + \frac13\delta_2(1-2\rho)\big)}{(\log 2)\,\delta_2^2 (1-2\rho)^2} \cdot (1-\gamma)k\log k. \qquad (23)$$
• The term associated with individual testing is
$$n_{\mathrm{Indiv}}(\gamma) = \frac{\gamma k\log k}{D_2(\rho\|1-\rho)}. \qquad (24)$$
While the theorem statement is somewhat complex, it is closely related to other simpler results on group testing:
• In the limit as $\gamma \to 0$, the term $\max\{n_{\mathrm{MI},1}, n_{\mathrm{MI},2}(\gamma,\delta_2), n_{\mathrm{Conc}}(\gamma,\delta_2)\}$ corresponds to the condition for exact recovery derived in [8]. Since $n_{\mathrm{Indiv}}(\gamma)$ becomes negligible as $\gamma \to 0$, this means that we have the above-mentioned desired property of being at least as good as the exact recovery result.
• Taking $\gamma \to 1$ and $\delta_2 \to 0$ in a manner such that $\frac{1-\gamma}{\delta_2^2} \to 0$, we recover a strengthened version of Theorem 1 with $D_2\big(\frac12\|1-\rho\big) = \frac12\log\frac{1}{4\rho(1-\rho)}$ increased to $D_2(\rho\|1-\rho)$.²
The parameter $\delta_2$ controls the trade-off between the concentration behavior associated with $n_{\mathrm{Conc}}$ and the mutual information term associated with $n_{\mathrm{MI},2}$.
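To see how Theorem 3 and the converse of Section IV (Theorem 4) translate into curves like those of Figure 1, one can normalize every term in (20)–(24) by $k\log p$ using $\log k = \theta\log p$ and $\log\frac{p}{k} = (1-\theta)\log p$, and optimize over $(\gamma,\delta_2)$ numerically. The grid-search sketch below follows these normalizations; the grid resolution is an arbitrary choice.

```python
import numpy as np

LN2 = np.log(2)

def h2(r):                       # binary entropy in nats
    return r * np.log(1 / r) + (1 - r) * np.log(1 / (1 - r))

def d2(a, b):                    # binary KL divergence in nats
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

def achievable_ratio(theta, rho):
    """Asymptotic ratio k*log2(p/k) / n under Theorem 3, via (20)-(24)."""
    best = np.inf
    for g in np.linspace(0.01, 0.99, 99):            # gamma
        for d in np.linspace(0.01, 0.99, 99):        # delta_2
            mi1 = (1 - theta) / (LN2 - h2(rho))
            mi2 = (2 / (LN2 * (1 - 2*rho) * np.log((1 - rho) / rho))
                   / (1 - d) * ((1 - theta) + 2 * (1 - g) * theta))
            conc = (4 * (1 + d * (1 - 2*rho) / 3)
                    / (LN2 * d**2 * (1 - 2*rho)**2) * (1 - g) * theta)
            indiv = g * theta / d2(rho, 1 - rho)
            best = min(best, max(mi1, mi2, conc) + indiv)
    return (1 - theta) / LN2 / best

def converse_ratio(theta, rho):
    """Asymptotic ratio k*log2(p/k) / n under the converse bound (34)."""
    n = max((1 - theta) / (LN2 - h2(rho)), theta / np.log((1 - rho) / rho))
    return (1 - theta) / LN2 / n

for theta in (0.2, 0.5, 0.8):
    print(theta, achievable_ratio(theta, 0.11), converse_ratio(theta, 0.11))
```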
A. Proof of Theorem 3
The proof follows similar steps to those of Theorem 1, considering the four steps of Algorithm 2 separately.
Step 1. We show in Appendix A that the approximate recovery result of [8] can be extended as follows: There exists a non-adaptive algorithm recovering an estimate $\hat{S}_1$ of cardinality $k$ such that $d(S,\hat{S}_1) \le k^{\gamma}$ with probability approaching one, provided that the number of tests $n_1$ satisfies
$$n_1 \ge \max\big\{n_{\mathrm{MI},1},\, n_{\mathrm{MI},2}(\gamma,\delta_2),\, n_{\mathrm{Conc}}(\gamma,\delta_2)\big\} \cdot (1+o(1)) \qquad (25)$$
for some $\delta_2 \in (0,1)$, under the definitions in (21)–(23). This algorithm and its corresponding estimate $\hat{S}_1$ constitute the first step.
Step 2a. The algorithm and analysis for this stage are identical to that of Theorem 1: We use the variation of NCOMP given in Appendix C to identify all defective items in $\{1,\ldots,p\} \setminus \hat{S}_1$ with probability approaching one, while only using $O(k^{\gamma}\log p) = o(k\log p)$ tests.
Step 2b. For this step, we need to show that the set $\hat{S}'_{2\mathrm{b}}$ constructed in Algorithm 2 only contains defective items. Recall that this set is constructed by testing each item in $\hat{S}_1$ individually $\check{n}$ times, and keeping the items that returned positive the highest number of times. Since $\hat{S}'_{2\mathrm{b}}$ contains $|\hat{S}_1| - \alpha_2 k$ items, requiring all of these items to be defective is equivalent to requiring that the set of $\alpha_2 k$ items with the smallest number of positive outcomes includes the $k^{\gamma}$ (or fewer) non-defective items in $\hat{S}_1$. For any $\zeta > 0$, the following two conditions suffice for this purpose:
• Event $\mathcal{A}_1$: All non-defective items in $\hat{S}_1$ return positive less than $\zeta\check{n}$ times;
• Event $\mathcal{A}_2$: At most $\alpha_2 k - k^{\gamma}$ defective items return positive less than $\zeta\check{n}$ times.
Here we assume that $k$ is sufficiently large so that $\alpha_2 k > k^{\gamma}$, which is valid since $\gamma < 1$ and $\alpha_2 > 0$ are constant.
Fix an arbitrary item $j$, and let $\check{N}_{j,1}$ be the number of its $\check{n}$ tests that are positive. Since the test outcomes are distributed as $\mathrm{Bernoulli}(1-\rho)$ for defective $j$ and $\mathrm{Bernoulli}(\rho)$ for non-defective $j$, we obtain from (12) that
$$\mathbb{P}\big[\check{N}_{j,1} \le \zeta\check{n}\big] \le e^{-\check{n} D_2(\zeta\|1-\rho)}, \quad j \in S \qquad (26)$$
$$\mathbb{P}\big[\check{N}_{j,1} \ge \zeta\check{n}\big] \le e^{-\check{n} D_2(\zeta\|\rho)}, \quad j \notin S. \qquad (27)$$
²By letting the first stage of Algorithm 2 use separate decoding of items [16], one can obtain a strengthened version of Theorem 2 with the same improvement. This result is omitted for the sake of brevity, as the main purpose of the refinements given in this section is to obtain a bound that is always at least as good as the non-adaptive information-theoretic bound of [8].
Hence, we obtain from the union bound over the non-defective items in $\hat{S}_1$ that
$$\mathbb{P}[\mathcal{A}_1^c] \le k^{\gamma} \cdot e^{-\check{n} D_2(\zeta\|\rho)}, \qquad (28)$$
which is upper bounded by $\delta_3 > 0$ as long as
$$\check{n} \ge \frac{\log\frac{k^{\gamma}}{\delta_3}}{D_2(\zeta\|\rho)}. \qquad (29)$$
Moreover, regarding the event $\mathcal{A}_2$, the average number of defective items that return positive less than $\zeta\check{n}$ times is upper bounded by $k e^{-\check{n} D_2(\zeta\|1-\rho)}$ (recall that $|\hat{S}_1| = k$), and hence, Markov's inequality gives
$$\mathbb{P}[\mathcal{A}_2^c] \le \frac{k e^{-\check{n} D_2(\zeta\|1-\rho)}}{\alpha_2 k - k^{\gamma}}. \qquad (30)$$
This is upper bounded by $\frac{k/\log k}{\alpha_2 k - k^{\gamma}} \to 0$ as long as $\check{n} \ge \frac{\log\log k}{D_2(\zeta\|1-\rho)}$. This, in turn, behaves as $o(\log k)$ for any $\zeta < 1-\rho$. Hence, we are left with only the condition on $\check{n}$ in (29), and choosing $\zeta$ arbitrarily close to $1-\rho$ means that we only need the following to hold for arbitrarily small $\eta > 0$:
$$\check{n} \ge \frac{\gamma\log k}{D_2(1-\rho\|\rho)}\,(1+\eta), \qquad (31)$$
since $\log\frac{k^{\gamma}}{\delta_3} = (\gamma\log k)(1+o(1))$ no matter how small $\delta_3$ is. Multiplying by $k$ (i.e., the number of items that are tested individually $\check{n}$ times) and noting that $D_2(1-\rho\|\rho) = D_2(\rho\|1-\rho)$, we deduce that the number of tests in this stage is asymptotically at most $n_{\mathrm{Indiv}}(\gamma)$, defined in (24).
Step 3. This step is the same as Step 2b in Algorithm 1, but we are now working with $\alpha_2 k$ items rather than $k$ items. As a result, the number of tests required is $O(\alpha_2 k\log k)$, meaning that the coefficient to $k\log k$ can be made arbitrarily small by a suitable choice of $\alpha_2$.
IV. CONVERSE
To our knowledge, the best-known existing converse bound for the symmetric noise model in the adaptive setting is that of Baldassini et al. [7], shown in (6). On the other hand, the achievability bound of Theorem 1 contains a $k\log k$ term, meaning that the gap between the achievability and converse grows unbounded as $\theta \to 1$ under the scaling $k = \Theta(p^{\theta})$. In this section, we provide a novel converse bound revealing that the $\Omega(k\log k)$ behavior is unavoidable.
There is a minor caveat to this converse result: We have not been able to prove it in the case that $S$ is known to have cardinality exactly $k$, but rather, only in the case that it is known to have cardinality either $k$ or $k-1$. We strongly conjecture that this distinction has no impact on the fundamental limits; we argue in Appendix B that Theorem 1 remains true even when $k$ is only known up to a multiplicative $1+o(1)$ term, and Theorem 3 remains true when $k$ is only known up to an additive $o(k^{\gamma})$ term. Since we assume that $k \to \infty$, these assumptions are much weaker than the assumption $|S| \in \{k-1,k\}$.
To make the model definition more precise, fix $k \le \frac{p}{2}$, and define
$$\mathcal{S}_{k,p} = \big\{S \subseteq \{1,\ldots,p\} : |S| = k\big\}, \qquad (32)$$
and similarly for $\mathcal{S}_{k-1,p}$. We consider the following distribution for the random defective set:
$$S \sim \mathrm{Uniform}(\mathcal{S}_{k,p} \cup \mathcal{S}_{k-1,p}). \qquad (33)$$
Under this slightly modified model, we have the following.
Theorem 4. Consider the symmetric noisy group testing setup with crossover probability $\rho \in \big(0,\frac12\big)$, $S$ distributed according to (33), and $k \to \infty$ with $k \le \frac{p}{2}$. For any adaptive algorithm, in order to achieve $P_{\mathrm{e}} \to 0$, it is necessary that
$$n \ge \max\bigg\{\frac{k\log\frac{p}{k}}{\log 2 - H_2(\rho)},\; \frac{k\log k}{\log\frac{1-\rho}{\rho}}\bigg\}(1-o(1)). \qquad (34)$$
The first term is precisely (6), so our novelty is in deriving the second term. This result provides the first counter-example to the natural conjecture that the optimal number of tests is $\frac{k\log\frac{p}{k}}{\log 2 - H_2(\rho)}\,(1+o(1))$ whenever $k = \Theta(p^{\theta})$ with $\theta \in (0,1)$. Indeed, the $\Omega(k\log k)$ lower bound reveals that the constant pre-factor to $k\log\frac{p}{k}$ must grow unbounded as $\theta \to 1$.
It is interesting to observe the behavior of Theorems 1, 3, and 4 in the limit as $\rho \to 0$. As one should expect, under the scaling $k = \Theta(p^{\theta})$ for fixed $\theta \in (0,1)$, both the achievability and converse bounds (see (8) and (34)) tend towards the noiseless limit $k\log_2\frac{p}{k}\,(1+o(1))$ as $\rho \to 0$. Moreover, the achievability and converse bounds scale similarly with respect to $\rho$, in the sense that the $k\log k$ term is scaled by $\Theta\big(\frac{1}{\log\frac{1}{\rho}}\big)$ in both cases.
In fact, if we consider the refined achievability bound (Theorem 3), we can make a stronger claim. If we take $\gamma \to 1$ and $\theta \to 1$ simultaneously, then the bound in (20) is asymptotically equivalent to $n_{\mathrm{Indiv}}(1)$, since $n_{\mathrm{MI},1}$ scales as $k\log\frac{p}{k} \ll k\log k$, whereas the constant factors in $n_{\mathrm{MI},2}$ and $n_{\mathrm{Conc}}$ vanish (see (21)–(23)). Hence, we are only left with $n_{\mathrm{Indiv}}(1)$ in (24), and if $\rho$ is small, then the denominator $D_2(\rho\|1-\rho) = \rho\log\frac{\rho}{1-\rho} + (1-\rho)\log\frac{1-\rho}{\rho}$ is approximately equal to $\log\frac{1}{\rho}$. The exact same statement is true for the denominator in (34), and hence, the achievability and converse bounds exhibit matching constant factors. Specifically, this statement holds when the order of the limits is first $n \to \infty$, then $\theta \to 1$, then $\rho \to 0$. This fact explains the near-identical behavior of the achievability and converse in Figure 1 for $\theta$ close to one in the low noise setting, $\rho = 10^{-4}$.
On the other hand, for fixed $\theta \in (0,1)$, the logarithmic decay of the $\Theta\big(\frac{1}{\log\frac{1}{\rho}}\big)$ factor to zero is quite slow, which explains the non-negligible deviation from the noiseless threshold (i.e., a straight line at height 1) in Figure 1, both in the high-noise and low-noise cases.
Another interesting consequence of Theorem 4 is that in the linear regime $k = \Theta(p)$, one requires $n = \Omega(p\log p)$ in the presence of noise. This is in stark contrast to the noiseless setting, where individual testing trivially identifies $S$ with only $p$ tests. In the non-adaptive setting, establishing the necessity of $n = \Omega(p\log p)$ is straightforward:³ If a genie reveals $S \cup \{j\}$ to the decoder for some $j \notin S$, then the decoder can only identify the final non-defective $j$ by testing each item $\Omega(\log p)$ times. However, this argument does not extend to the adaptive setting.
³This argument is due to Sidharth Jaggi, whose insight is gratefully acknowledged.
Instead, the proof of Theorem 4 is inspired by that of a converse bound for the top-$m$ arm identification problem from the multi-armed bandit (MAB) literature [31]. Compared with the latter, our setting has a number of distinct features that are non-trivial to handle:
• In group testing, one does not necessarily test one item at a time, whereas in the MAB setting of [31], one pulls one arm at a time.
• In contrast with [31], we do not consider a minimax lower bound, but rather, a Bayesian lower bound for a given distribution on $S$. The latter is more difficult, in the sense that a Bayesian lower bound implies a minimax lower bound but not vice versa.
• In our setting, the status of each item is binary-valued (defective or non-defective), whereas the construction of a hard MAB problem in [31] consists of three distinct types of items (or "arms" in the MAB terminology), corresponding to high reward, medium reward, and low reward.
We now proceed with the proof.
A. Proof of Theorem 4
We assume without loss of optimality that any given test $X^{(i)}$ is deterministic given $Y^{(1)},\ldots,Y^{(i-1)}$, and that the final estimate $\hat{S}$ is similarly deterministic given the test outcomes. To see that it suffices to consider this case, we note that
$$\mathbb{P}[\mathrm{error}] = \mathbb{E}\big[\mathbb{P}[\mathrm{error}\,|\,\mathbf{A}]\big] \ge \min_{A}\,\mathbb{P}[\mathrm{error}\,|\,\mathbf{A} = A], \qquad (35)$$
where $\mathbf{A}$ denotes a randomized algorithm (i.e., combination of test design and decoder), and $A$ is a realization of $\mathbf{A}$ corresponding to a deterministic algorithm.
Suppose that after $S$ is randomly generated according to (33), a genie reveals $S \cup T$ to the decoder, where $T$ is a uniformly random set of non-defective items such that $|S \cup T| = 2k$ (i.e., $T$ has cardinality $2k - |S| \in \{k, k+1\}$). Hence, we are left with an easier group testing problem consisting of $2k$ items, $k-1$ or $k$ of which are defective. Since the prior distribution on $S$ in (33) is uniform, we see that conditioned on the ground set of size $2k$, the defective set $S$ is uniform on the $\binom{2k}{k} + \binom{2k}{k-1}$ possibilities.
Without loss of generality, assume that the $2k$ revealed items are $\{1,\ldots,2k\}$, and hence, the new distribution of $S$ given the information from the genie is
$$S \sim \mathrm{Uniform}(\mathcal{S}_{k,2k} \cup \mathcal{S}_{k-1,2k}). \qquad (36)$$
We first study the error probability conditioned on a given defective set $S \subset \{1,\ldots,2k\}$ having cardinality $k$. For any such fixed choice, we denote probabilities and expectations by $\mathbb{P}_S$ and $\mathbb{E}_S$.
Fix $\epsilon \in (0,1)$, and for each $j \in S$, let $N_j$ be the (random) number of tests containing item $j$ and no other defective items. Since $\sum_{j\in S} N_j \le n$ with probability one, we have $\sum_{j\in S} \mathbb{E}_S[N_j] \le n$, meaning that at most $(1-\epsilon)k$ of the $j \in S$ have $\mathbb{E}_S[N_j] \ge \frac{n}{(1-\epsilon)k}$. For all other $j$, we have $\mathbb{E}_S[N_j] \le \frac{n}{(1-\epsilon)k}$, and Markov's inequality gives $\mathbb{P}_S\big[N_j \ge \frac{(1+\epsilon)n}{k}\big] \le \frac{1}{(1-\epsilon)(1+\epsilon)} = \frac{1}{1-\epsilon^2}$. We have therefore proved the following.
Lemma 1. For any $\epsilon > 0$, and any set $S \subset \{1,\ldots,2k\}$ of cardinality $k$, there exist at least $\epsilon k$ items $j \in S$ such that $\mathbb{P}_S\big[N_j \ge \frac{(1+\epsilon)n}{k}\big] \le \frac{1}{1-\epsilon^2}$.
The following lemma, consisting of a change of measure between the probabilities under two different defective sets, will also be crucial. Recalling that we are considering test designs that are deterministic given the past samples, we see that $N_j$ is a deterministic function of $\mathbf{Y}$, so we write the corresponding function as $n_j(\mathbf{y})$. Moreover, we let $\mathcal{Y}_S$ be the set of $\mathbf{y}$ sequences that are decoded as $S$, and we write $\mathbb{P}[\mathbf{y}]$ and $\mathbb{P}[\mathcal{Y}_S]$ as shorthands for $\mathbb{P}[\mathbf{Y} = \mathbf{y}]$ and $\mathbb{P}[\mathbf{Y} \in \mathcal{Y}_S]$, respectively.
Lemma 2. Given $S$ of cardinality $k$, for any $j \in S$, and any output sequence $\mathbf{y}$ such that $n_j(\mathbf{y}) \le \frac{(1+\epsilon)n}{k}$, we have
$$\mathbb{P}_{S\setminus\{j\}}[\mathbf{y}] \ge \mathbb{P}_S[\mathbf{y}]\Big(\frac{\rho}{1-\rho}\Big)^{\frac{(1+\epsilon)n}{k}}. \qquad (37)$$
Consequently, if $j \in S$ is such that $\mathbb{P}_S\big[N_j \ge \frac{(1+\epsilon)n}{k}\big] \le \frac{1}{1-\epsilon^2}$, then
$$\mathbb{P}_{S\setminus\{j\}}[\mathcal{Y}_S] \ge \Big(\mathbb{P}_S[\mathcal{Y}_S] - \frac{1}{1-\epsilon^2}\Big)\Big(\frac{\rho}{1-\rho}\Big)^{\frac{(1+\epsilon)n}{k}}. \qquad (38)$$
Proof. Again using the fact that the test designs are deterministic given the past samples, we can write
$$\mathbb{P}_S[\mathbf{y}] = \prod_{i=1}^{n} \mathbb{P}_S[y^{(i)} \,|\, y^{(1)},\ldots,y^{(i-1)}] \qquad (39)$$
$$= \prod_{i=1}^{n} \mathbb{P}_S[y^{(i)} \,|\, x^{(i)}], \qquad (40)$$
where $x^{(i)} \in \{0,1\}^p$ is the $i$-th test. Note that (40) holds because $Y^{(i)}$ depends on the previous samples only through $X^{(i)}$. An analogous expression also holds for $\mathbb{P}_{S\setminus\{j\}}[\mathbf{y}]$.
Due to the "or" operation in the observation model (2), the only tests for which the outcome probability changes as a result of removing $j$ from $S$ are those for which $j$ was the unique defective item tested. We have at most $\frac{(1+\epsilon)n}{k}$ such tests by assumption, and each of them causes the probability of $y^{(i)}$ (given $x^{(i)}$) to be multiplied or divided by $\frac{\rho}{1-\rho}$. Since $\rho < 0.5$, we deduce the lower bound in (37), corresponding to the case that all $\frac{(1+\epsilon)n}{k}$ of them are multiplied by this factor.
To prove the second part, we write
$$\mathbb{P}_{S\setminus\{j\}}[\mathcal{Y}_S] \ge \mathbb{P}_{S\setminus\{j\}}\Big[\mathbf{Y} \in \mathcal{Y}_S \cap \Big\{N_j \le \frac{(1+\epsilon)n}{k}\Big\}\Big] \qquad (41)$$
$$\ge \mathbb{P}_S\Big[\mathbf{Y} \in \mathcal{Y}_S \cap \Big\{N_j \le \frac{(1+\epsilon)n}{k}\Big\}\Big]\Big(\frac{\rho}{1-\rho}\Big)^{\frac{(1+\epsilon)n}{k}} \qquad (42)$$
$$\ge \Big(\mathbb{P}_S[\mathcal{Y}_S] - \frac{1}{1-\epsilon^2}\Big)\Big(\frac{\rho}{1-\rho}\Big)^{\frac{(1+\epsilon)n}{k}}, \qquad (43)$$
where (42) follows from the first part of the lemma, and (43) follows by writing $\mathbb{P}[A \cap B] \ge \mathbb{P}[A] - \mathbb{P}[B^c]$.
The idea behind applying this lemma is that if a given $\mathbf{y}$ is decoded to $S$, then it cannot be decoded to $S\setminus\{j\}$; hence, if a given sequence $\mathbf{y}$ contributes to $\mathbb{P}_S[\mathrm{no\ error}]$, then it also contributes to $\mathbb{P}_{S\setminus\{j\}}[\mathrm{error}]$. We formalize this idea as follows. Recalling that $\mathcal{S}_{k,2k}$ is the set of all subsets of $\{1,\ldots,2k\}$ of cardinality $k$, we have
$$\sum_{S'\in\mathcal{S}_{k-1,2k}} \mathbb{P}_{S'}[\mathrm{error}] \ge \sum_{S'\in\mathcal{S}_{k-1,2k}} \sum_{j \notin S'} \mathbb{P}_{S'}[\mathcal{Y}_{S'\cup\{j\}}] \qquad (44)$$
$$= \sum_{S'\in\mathcal{S}_{k-1,2k}} \sum_{j \notin S'} \sum_{S\in\mathcal{S}_{k,2k}} \mathbb{1}\{S = S'\cup\{j\}\}\, \mathbb{P}_{S'}[\mathcal{Y}_S] \qquad (45)$$
$$= \sum_{S'\in\mathcal{S}_{k-1,2k}} \sum_{j=1}^{2k} \sum_{S\in\mathcal{S}_{k,2k}} \mathbb{1}\{S = S'\cup\{j\}\}\, \mathbb{P}_{S'}[\mathcal{Y}_S] \qquad (46)$$
$$= \sum_{S\in\mathcal{S}_{k,2k}} \sum_{j\in S} \sum_{S'\in\mathcal{S}_{k-1,2k}} \mathbb{1}\{S = S'\cup\{j\}\}\, \mathbb{P}_{S'}[\mathcal{Y}_S] \qquad (47)$$
$$= \sum_{S\in\mathcal{S}_{k,2k}} \sum_{j\in S} \mathbb{P}_{S\setminus\{j\}}[\mathcal{Y}_S], \qquad (48)$$
where (44) follows since $S'$ differs from $S'\cup\{j\}$, (45) follows since the indicator function is only equal to one for $S = S'\cup\{j\}$, (46) follows since the extra $j$ included in the middle summation (i.e., $j \in S'$) also make the indicator function equal zero, (47) follows by re-ordering the summations and noting that the indicator function equals zero when $j \notin S$, and (48) follows by only keeping the $S'$ for which the indicator function is one.
The following lemma is based on lower bounding (48) using Lemma 2.
Lemma 3. If $\frac{1}{|\mathcal{S}_{k,2k}|}\sum_{S\in\mathcal{S}_{k,2k}} \mathbb{P}_S[\mathrm{error}] \le \delta$ for some $\delta > 0$, then
$$\frac{1}{|\mathcal{S}_{k-1,2k}|}\sum_{S'\in\mathcal{S}_{k-1,2k}} \mathbb{P}_{S'}[\mathrm{error}] \ge \frac{\epsilon k}{2}\cdot\Big(1-2\delta-\frac{1}{1-\epsilon^2}\Big)\cdot\Big(\frac{\rho}{1-\rho}\Big)^{\frac{(1+\epsilon)n}{k}} \qquad (49)$$
for any $\epsilon \in (0,1)$.
Proof. Since $\frac{1}{|\mathcal{S}_{k,2k}|}\sum_{S\in\mathcal{S}_{k,2k}} \mathbb{P}_S[\mathrm{error}] \le \delta$ and $|\mathcal{S}_{k,2k}| = \binom{2k}{k}$, there must exist at least $\frac12\binom{2k}{k}$ defective sets $S \in \mathcal{S}_{k,2k}$ such that $\mathbb{P}_S[\mathrm{error}] \le 2\delta$. We lower bound the first summation in (48) by a summation over such $S$, and for each one, we lower bound the summation over $j \in S$ by the set of size at least $\epsilon k$ given in Lemma 1. For the choices of $S$ and $j$ that are kept in this lower bound, the summand $\mathbb{P}_{S\setminus\{j\}}[\mathcal{Y}_S]$ is lower bounded by $\big(1-2\delta-\frac{1}{1-\epsilon^2}\big)\big(\frac{\rho}{1-\rho}\big)^{\frac{(1+\epsilon)n}{k}}$ by the second part of Lemma 2 (with $\mathbb{P}_S[\mathcal{Y}_S] = \mathbb{P}_S[\mathrm{no\ error}] \ge 1-2\delta$). Putting it all together, we obtain
$$\sum_{S'\in\mathcal{S}_{k-1,2k}} \mathbb{P}_{S'}[\mathrm{error}] \ge \frac12\binom{2k}{k}\cdot\epsilon k\cdot\Big(1-2\delta-\frac{1}{1-\epsilon^2}\Big)\cdot\Big(\frac{\rho}{1-\rho}\Big)^{\frac{(1+\epsilon)n}{k}}. \qquad (50)$$
Using the identity $\binom{2k}{k} = \binom{2k}{k-1}\cdot\frac{2k-k+1}{k} \ge \binom{2k}{k-1}$, this yields
$$\frac{1}{\binom{2k}{k-1}}\sum_{S'\in\mathcal{S}_{k-1,2k}} \mathbb{P}_{S'}[\mathrm{error}] \ge \frac{\epsilon k}{2}\cdot\Big(1-2\delta-\frac{1}{1-\epsilon^2}\Big)\cdot\Big(\frac{\rho}{1-\rho}\Big)^{\frac{(1+\epsilon)n}{k}}, \qquad (51)$$
thus proving the lemma.
Observe that for sufficiently small $\delta$, we can choose $\epsilon$ to be arbitrarily small while still ensuring that $1-2\delta-\frac{1}{1-\epsilon^2} > 0$. Moreover, $\mathbb{P}[S \in \mathcal{S}_{k,2k}]$ and $\mathbb{P}[S \in \mathcal{S}_{k-1,2k}]$ are both bounded away from zero under the distribution in (33). Most
Figure 2: Z-channel (Left) and reverse Z-channel (Right).
importantly, the term $\frac{\epsilon k}{2}\big(\frac{\rho}{1-\rho}\big)^{\frac{(1+\epsilon)n}{k}}$ appearing in (49) is lower bounded by $\delta_0 > 0$ as long as $n \le \frac{k\log(k\delta_0)}{(1+\epsilon)\log\frac{1-\rho}{\rho}}$. Since $\epsilon$ may be arbitrarily small and $\log(k\delta_0) = (\log k)(1+o(1))$, we deduce that the following condition is necessary for attaining arbitrarily small error probability:
$$n \ge \frac{k\log k}{\log\frac{1-\rho}{\rho}}\,(1-\eta), \qquad (52)$$
where $\eta > 0$ is arbitrarily small. This completes the proof of Theorem 4.
V. OTHER OBSERVATION MODELS
While we have focused on the symmetric noise model (2) for concreteness, most of our algorithms and analysis techniques can be extended to other observation models. In this section, we present some of the resulting bounds for three different models: The noiseless model (1), the Z-channel model,
$$P_{Y|U}(0|0) = 1, \quad P_{Y|U}(1|0) = 0, \qquad (53)$$
$$P_{Y|U}(0|1) = \rho, \quad P_{Y|U}(1|1) = 1-\rho, \qquad (54)$$
and the reverse Z-channel model,
$$P_{Y|U}(0|0) = 1-\rho, \quad P_{Y|U}(1|0) = \rho, \qquad (55)$$
$$P_{Y|U}(0|1) = 0, \quad P_{Y|U}(1|1) = 1, \qquad (56)$$
where in both cases we define $U = \vee_{j\in S} X_j$. That is, we pass the noiseless observation through its corresponding binary channel; see Figure 2 for an illustration. Under the Z-channel model, positive tests indicate with certainty that a defective item is included, whereas under the reverse Z-channel model, negative tests indicate with certainty that no defective item is included. While the two channels have the same capacity, it is interesting to ask whether one of the two is fundamentally more difficult to handle in the context of group testing.
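The two channels, and the capacity $C(\rho)$ appearing below, can be sketched in a few lines of Python; the grid search over the input distribution is an illustrative shortcut rather than the closed-form capacity expression. For the Z-channel, $I(U;Y) = H_2(\pi(1-\rho)) - \pi H_2(\rho)$ in nats when $\mathbb{P}[U=1] = \pi$, and the maximizing $\pi$ also dictates the Bernoulli design parameter used in the proof of Theorem 6 via $1 - e^{-\nu} = \pi$.

```python
import numpy as np

def h2(x):
    """Binary entropy in nats."""
    return 0.0 if x <= 0 or x >= 1 else -x*np.log(x) - (1 - x)*np.log(1 - x)

def z_channel(U, rho, rng):
    """(53)-(54): a 1 input flips to 0 w.p. rho; a 0 input is never flipped."""
    return np.where(U == 1, (rng.random(U.shape) >= rho).astype(int), 0)

def reverse_z_channel(U, rho, rng):
    """(55)-(56): a 0 input flips to 1 w.p. rho; a 1 input is never flipped."""
    return np.where(U == 0, (rng.random(U.shape) < rho).astype(int), 1)

def z_capacity(rho, grid=100_000):
    """Capacity of the Z-channel in nats via grid search over pi = P[U=1];
    the reverse Z-channel has the same capacity by relabeling."""
    pis = np.linspace(1e-6, 1 - 1e-6, grid)
    I = np.array([h2(q * (1 - rho)) - q * h2(rho) for q in pis])
    j = int(np.argmax(I))
    return I[j], pis[j], -np.log(1 - pis[j])    # capacity, pi*, and nu

C, pi_star, nu = z_capacity(0.11)
print(f"C = {C:.4f} nats at P[U=1] = {pi_star:.3f}, giving nu = {nu:.3f}")
```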
A. Noiseless setting
In the noiseless setting, the final step of Algorithm 1 is much simpler: Simply test the items in $\hat{S}_1$ individually once each. This only requires $k$ tests, and succeeds with certainty, yielding the following.
Theorem 5. Under the scaling $k = \Theta(p^{\theta})$ for some $\theta \in (0,1)$, there exists a two-stage algorithm for noiseless adaptive group testing that succeeds with probability approaching one, with a number of tests bounded by
$$n \le k\log_2\frac{p}{k}\,(1+o(1)). \qquad (57)$$
Moreover, there exists a computationally efficient two-stage algorithm that succeeds with probability approaching one, with a number of tests bounded by
$$n \le \frac{1}{\log 2}\,k\log_2\frac{p}{k}\,(1+o(1)). \qquad (58)$$
The upper bound (57) is tight, as it matches the so-called counting bound, e.g., see [32]. To our knowledge, the minimum number of stages used to attain this bound previously for all $\theta \in (0,1)$ was four [26]. It is worth noting, however, that the algorithm of [26] has low computational complexity, unlike Algorithm 1.
The bound (57) does not contradict the converse bound of Mézard and Toninelli [28]; the latter states that any two-stage algorithm with zero error probability must have an average number of tests of $\frac{1}{\log 2}\,k\log_2\frac{p}{k}\,(1+o(1))$ or higher. In contrast, (57) corresponds to vanishing error probability and a fixed number of tests.
B. Z-channel model
Under the Z-channel model, the capacity-based converse bound of [7] turns out to be tight for all $\theta \in (0,1)$, as stated in the following.
Theorem 6. Under the noisy group testing model with Z-channel noise having parameter $\rho \in (0,1)$, and a number of defectives satisfying $k = \Theta(p^{\theta})$ for some $\theta \in (0,1)$, there exists a three-stage adaptive algorithm achieving vanishing error probability with
$$n \le \frac{k\log\frac{p}{k}}{C(\rho)}\,(1+o(1)), \qquad (59)$$
where $C(\rho)$ is the capacity of the Z-channel in nats.
Proof. The analysis is similar to that of the symmetric noise model, so we omit most of the details.
In the first stage, we use i.i.d. Bernoulli testing with parameter $\nu > 0$ chosen to ensure that the induced distribution $P_U$ of $U = \vee_{j\in S} X_j$ equals the capacity-achieving input distribution of the Z-channel. Under this choice, a straightforward extension of the analysis of [8] (see the final part of Appendix A for details) reveals that we can find a set $\hat{S}_1$ of cardinality $k$ such that $d(S,\hat{S}_1) \le \alpha_1 k$ with $n$ satisfying (59), where $d$ is defined in (5), and $\alpha_1 > 0$ is arbitrarily small.
The second stage is similar to Steps 2a and 2b in Algorithm 2. The modifications required in Step 2a are stated in Appendix C, and Step 2b is in fact simpler: We include a given item in $\hat{S}'_{2\mathrm{b}}$ if and only if any of its tests returned positive. Due to the nature of the Z-channel, no non-defectives are included in $\hat{S}'_{2\mathrm{b}}$. On the other hand, the probability of a defective item returning negative on all $\check{n}$ tests is given by $\rho^{\check{n}}$, and is asymptotically vanishing if $\check{n} = \log\log k$ (say). Hence, by Markov's inequality, with probability approaching one, the number of defective items that fail to be placed in $\hat{S}'_{2\mathrm{b}}$ is smaller than $\alpha_1 k$. Moreover, the required number of tests is $O(k\log\log k)$, which is asymptotically negligible.
In the third stage, as in Algorithm 2, we test each item individually $\tilde{n}$ times. Here, however, we let $\hat{S}'_3$ contain the items that returned positive in any test. There are again no false positives, and a given defective item is a false negative with probability $\rho^{\tilde{n}}$. By the union bound and the fact that there are at most $2\alpha_1 k$ items and $\alpha_1 k$ defective items, we readily deduce vanishing error probability as long as $\tilde{n} = O(\log k)$, meaning the total number of tests is $O(\alpha_1 k\log k)$. This is asymptotically negligible, since $\alpha_1$ is arbitrarily small.
This result shows that under Z-channel noise, the conjecture of the optimal (inverse) coefficient to $k\log\frac{p}{k}$ equaling the channel capacity (e.g., see [7]) is true for all $\theta \in (0,1)$, in stark contrast to the symmetric noise model.
It is worth noting that the converse analysis of Section IV does not apply to the Z-channel model. This is because any analog of Lemma 2 is impossible: If there exists a test outcome $y_i = 1$ where $j$ is the only defective included, then $\mathbb{P}_{S\setminus\{j\}}[\mathbf{y}] = 0$, meaning we cannot hope for an inequality of the form (37).
C. Reverse Z-channel model
Under the reverse Z-channel model, we have the following analog of the converse bound in Theorem 4.
Theorem 7. Consider the noisy group testing setup with reverse Z-channel noise having parameter $\rho \in (0,1)$, $S$ distributed according to (33), and $k \to \infty$ with $k \le \frac{p}{2}$. For any adaptive algorithm, in order to achieve $P_{\mathrm{e}} \to 0$, it is necessary that
$$n \ge \max\bigg\{\frac{k\log\frac{p}{k}}{C(\rho)},\; \frac{k\log k}{\log\frac{1}{\rho}}\bigg\}(1-o(1)), \qquad (60)$$
where $C(\rho)$ is the capacity of the Z-channel in nats.
Proof. The first bound in (60) is the capacity-based bound from [7]. On the other hand, the second bound follows from a near-identical analysis to the proof of Theorem 4, with the only difference being that $\frac{\rho}{1-\rho}$ is replaced by $\rho$ in (37) and the subsequent equations that make use of (37).
We note that unlike the Z-channel, the cases where one of $\mathbb{P}_S[\mathbf{y}]$ and $\mathbb{P}_{S\setminus\{j\}}[\mathbf{y}]$ is zero and the other is non-zero are not problematic. Specifically, this only occurs when $\mathbb{P}_S[\mathbf{y}] = 0$, and in this case, any inequality of the form (37) is trivially true.
Interestingly, this result shows that reverse Z-channel noise is more difficult to handle than Z-channel noise by an arbitrarily large factor as $\theta$ gets closer to one, even though the two channels have the same capacity. In this sense, we deduce that at least under optimal adaptive testing, having reliable positive test outcomes is preferable to having reliable negative test outcomes.
VI. CONCLUSION
We have developed both information-theoretic limits and practical performance guarantees for noisy adaptive group testing. Some of the main implications of our results include the following:
• Under the scaling $k = \Theta(p^{\theta})$, for most $\theta \in (0,1)$, our information-theoretic achievability guarantees for the symmetric noise model are significantly better than the best known non-adaptive achievability guarantees, and similarly when it comes to practical guarantees.
• Our converse for the symmetric noise model reveals that $n = \Omega(k\log k)$ is necessary, and hence, the implied constant to $n = \Theta\big(k\log\frac{p}{k}\big)$ must grow unbounded as $\theta \to 1$. This phenomenon also holds true for the reverse Z-channel noise model, but not for the Z-channel noise model.
• Our bounds are tight or near-tight in several cases of interest, including small values of $\theta$ and low noise levels. Moreover, in the noiseless case, we obtain the optimal threshold using a two-stage algorithm; previously the smallest known number of stages was four.
It is worth noting that our two-stage (or three-stage) algorithm and its analysis remain applicable when any non-
adaptive algorithm is used in the first stage, as long as it identifies a suitably high fraction of the defective set. Hence,
improved practical or information-theoretic guarantees for partial recovery in the non-adaptive setting immediately
transfer to improved exact recovery guarantees in the adaptive setting.
APPENDIX
A. Non-Adaptive Partial Recovery Result with Vanishing Number of Errors
The analysis of [8] considers the case that the number of allowed errors scales as $d_{\max} = \Theta(k)$. In this section, we adapt the analysis therein to the case $d_{\max} = \Theta(k^{\gamma})$ for some $\gamma \in (0,1)$. This generalization is useful for the refined achievability bound given in Section III (cf., Theorem 3), and is also of interest in its own right.
1) Notation: Recall that $S$ is uniform on the set of subsets of $\{1,\ldots,p\}$ having a given cardinality $k$. As in [8], we consider non-adaptive i.i.d. Bernoulli testing, where each item is placed in a given test with probability $\frac{\nu}{k}$ for some $\nu > 0$. We focus our attention on $\nu = \log 2$, though we will still write $\nu$ for the parts of the analysis that apply more generally. The test matrix is denoted by $\mathbf{X} \in \{0,1\}^{n\times p}$ (i.e., the $i$-th row is $X^{(i)}$), and the notation $\mathbf{X}_s$ denotes the sub-matrix obtained by keeping only the columns indexed by $s \subseteq \{1,\ldots,p\}$.
Next, we recall some notation from [8]. It will prove convenient to work with random variables that are implicitly conditioned on a fixed value of $S$, say $s = \{1,\ldots,k\}$. We write $P_{Y|X_s}$ for the conditional test outcome probability, where $X_s$ is the subset of the test vector $X$ indexed by $s$. Moreover, we write
$$P_{X_s Y}(x_s, y) := P_X^k(x_s)\, P_{Y|X_s}(y|x_s) \qquad (61)$$
$$P_{\mathbf{X}_s \mathbf{Y}}(\mathbf{x}_s, \mathbf{y}) := P_X^{n\times k}(\mathbf{x}_s)\, P_{Y|X_s}^n(\mathbf{y}|\mathbf{x}_s), \qquad (62)$$
where $P_{Y|X_s}^n(\cdot|\cdot)$ is the $n$-fold product of $P_{Y|X_s}(\cdot|\cdot)$, and $P_X^{(\cdot)}$ denotes the i.i.d. $\mathrm{Bernoulli}\big(\frac{\nu}{k}\big)$ distribution for a vector or matrix of the size indexed in the superscript. The random variables $(X_s, Y)$ and $(\mathbf{X}_s, \mathbf{Y})$ are distributed as
$$(X_s, Y) \sim P_{X_s Y} \qquad (63)$$
$$(\mathbf{X}_s, \mathbf{Y}) \sim P_{\mathbf{X}_s \mathbf{Y}}, \qquad (64)$$
and the remaining entries of the measurement matrix are distributed as $\mathbf{X}_{s^c} \sim P_X^{n\times(p-k)}$, independent of $(\mathbf{X}_s, \mathbf{Y})$.
In our analysis, we consider partitions of the defective set $s$ into two sets $s_{\mathrm{dif}} \ne \emptyset$ and $s_{\mathrm{eq}}$. One can think of $s_{\mathrm{eq}}$ as corresponding to an overlap $s \cap \overline{s}$ between the true set $s$ and some incorrect set $\overline{s}$, with $s_{\mathrm{dif}}$ corresponding to the indices $s \setminus \overline{s}$ in one set but not the other. For a fixed defective set $s$, and a corresponding pair $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$, we write
$$P_{Y|X_{s_{\mathrm{dif}}} X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}) := P_{Y|X_s}(y|x_s), \qquad (65)$$
where $P_{Y|X_s}$ is the marginal distribution of (62). This form of the conditional probability allows us to introduce the marginal distribution
$$P_{Y|X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{eq}}}) := \sum_{x_{s_{\mathrm{dif}}}} P_X^{\ell}(x_{s_{\mathrm{dif}}})\, P_{Y|X_{s_{\mathrm{dif}}} X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}), \qquad (66)$$
where $\ell := |s_{\mathrm{dif}}|$. Using the preceding definitions, we introduce the information density [33]
$$\imath^n(\mathbf{x}_{s_{\mathrm{dif}}}; \mathbf{y} \,|\, \mathbf{x}_{s_{\mathrm{eq}}}) := \sum_{i=1}^{n} \imath\big(x^{(i)}_{s_{\mathrm{dif}}}; y^{(i)} \,\big|\, x^{(i)}_{s_{\mathrm{eq}}}\big) \qquad (67)$$
$$\imath(x_{s_{\mathrm{dif}}}; y \,|\, x_{s_{\mathrm{eq}}}) := \log\frac{P_{Y|X_{s_{\mathrm{dif}}} X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}})}{P_{Y|X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{eq}}})}, \qquad (68)$$
where $(\cdot)^{(i)}$ denotes the $i$-th entry (respectively, row) of a vector (respectively, matrix). Averaging (68) with respect to $(X_s, Y)$ in (63) yields a conditional mutual information, which we denote by
$$I_{\ell} := I(X_{s_{\mathrm{dif}}}; Y \,|\, X_{s_{\mathrm{eq}}}), \qquad (69)$$
where $\ell := |s_{\mathrm{dif}}|$; by symmetry, the mutual information for each $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ depends only on this quantity.
2) Choice of decoder: We use the same information-theoretic threshold decoder as that in [8]: Fix the constants $\{\gamma_{\ell}\}_{\ell=d_{\max}+1}^{k}$, and search for a set $\overline{s}$ of cardinality $k$ such that
$$\imath^n(\mathbf{X}_{\overline{s}_{\mathrm{dif}}}; \mathbf{Y} \,|\, \mathbf{X}_{\overline{s}_{\mathrm{eq}}}) \ge \gamma_{|\overline{s}_{\mathrm{dif}}|}, \quad \forall (\overline{s}_{\mathrm{dif}}, \overline{s}_{\mathrm{eq}})\ \text{such that}\ |\overline{s}_{\mathrm{dif}}| > d_{\max}. \qquad (70)$$
If multiple such $\overline{s}$ exist, or if none exist, then an error is declared. This decoder is inspired by analogous thresholding techniques from the channel coding literature [34], [35].
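Since the decoder in (70) is defined by an exhaustive search, it can be written out directly for toy problem sizes. The sketch below does so for the symmetric noise model under the Bernoulli design, with thresholds $\gamma_{\ell}$ chosen in the spirit of (71) below; it is exponential-time and intended purely to make the definition concrete.

```python
import itertools
import numpy as np
from math import comb, log

def info_density(X, Y, s, sdif, rho, q):
    """i^n(X_sdif; Y | X_seq) for candidate set s, with the denominator
    marginalized over X_sdif under the Bernoulli(q) design, q = nu/k."""
    q0 = (1 - q) ** len(sdif)                   # P[no item of sdif in a test]
    or_s = X[:, sorted(s)].max(axis=1)
    seq = sorted(set(s) - set(sdif))
    or_seq = X[:, seq].max(axis=1) if seq else np.zeros(len(Y), dtype=int)
    p_num1 = np.where(or_s == 1, 1 - rho, rho)          # P[Y=1 | x_s]
    p_den1 = np.where(or_seq == 1, 1 - rho,
                      q0 * rho + (1 - q0) * (1 - rho))  # P[Y=1 | x_seq]
    num = np.where(Y == 1, p_num1, 1 - p_num1)
    den = np.where(Y == 1, p_den1, 1 - p_den1)
    return float(np.log(num / den).sum())

def threshold_decoder(X, Y, k, dmax, rho, nu, delta1=0.1):
    """Exhaustive search for sets satisfying (70); the decoder declares an
    error unless exactly one feasible set is found."""
    p = X.shape[1]
    feasible = []
    for s in itertools.combinations(range(p), k):
        ok = True
        for ell in range(dmax + 1, k + 1):
            gamma = log(comb(p - k, ell)) + log(k * comb(k, ell) / delta1)
            for sdif in itertools.combinations(s, ell):
                if info_density(X, Y, s, sdif, rho, nu / k) < gamma:
                    ok = False
                    break
            if not ok:
                break
        if ok:
            feasible.append(set(s))
    return feasible
```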
3) Useful existing results: We build heavily on several intermediate results given in [17], stated as follows:
• Initial bounds. Since the analysis is the same for any defective set $s$ of cardinality $k$, we assume without loss of generality that $s = \{1,\ldots,k\}$. The initial non-asymptotic bound of [8] takes the form
$$P_{\mathrm{e}}(d_{\max}) \le \mathbb{P}\bigg[\bigcup_{(s_{\mathrm{dif}},s_{\mathrm{eq}})\,:\,|s_{\mathrm{dif}}| > d_{\max}} \bigg\{\imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \,|\, \mathbf{X}_{s_{\mathrm{eq}}}) \le \log\binom{p-k}{|s_{\mathrm{dif}}|} + \log\frac{k\binom{k}{|s_{\mathrm{dif}}|}}{\delta_1}\bigg\}\bigg] + \delta_1 \qquad (71)$$
for any $\delta_1 > 0$. A simple consequence of this non-asymptotic bound is the following: For any positive constants $\{\delta_{2,\ell}\}_{\ell=d_{\max}+1}^{k}$, if the number of tests is at least
$$n \ge \max_{\ell=d_{\max}+1,\ldots,k} \frac{\log\binom{p-k}{\ell} + \log\frac{k\binom{k}{\ell}}{\delta_1}}{I_{\ell}\,(1-\delta_{2,\ell})}, \qquad (72)$$
and if each information density satisfies a concentration bound of the form
$$\mathbb{P}\big[\imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \,|\, \mathbf{X}_{s_{\mathrm{eq}}}) \le n(1-\delta_{2,\ell}) I_{\ell}\big] \le \psi_{\ell}(n, \delta_{2,\ell}), \qquad (73)$$
for some functions $\{\psi_{\ell}\}_{\ell=d_{\max}+1}^{k}$, then
$$P_{\mathrm{e}}(d_{\max}) \le \sum_{\ell=d_{\max}+1}^{k} \binom{k}{\ell}\,\psi_{\ell}(n, \delta_{2,\ell}) + \delta_1. \qquad (74)$$
• Characterization of mutual information. Under the symmetric noise model with crossover probability $\rho \in \big(0,\frac12\big)$, the conditional mutual information behaves as follows:
– If $\frac{\ell}{k} \to 0$, then
$$I_{\ell} = e^{-\nu}\,\frac{\nu\ell}{k}\,(1-2\rho)\log\frac{1-\rho}{\rho}\,(1+o(1)). \qquad (75)$$
– If $\frac{\ell}{k} \to \alpha \in (0,1]$, then
$$I_{\ell} = e^{-(1-\alpha)\nu}\big(H_2\big(e^{-\alpha\nu}\star\rho\big) - H_2(\rho)\big)(1+o(1)), \qquad (76)$$
where $\star$ denotes binary convolution, $a\star b = a(1-b) + (1-a)b$.
• Concentration bounds. The following concentration bounds provide explicit choices for $\psi_{\ell}$ satisfying (73):
– For all $\ell$ and $\delta > 0$, we have
$$\mathbb{P}\Big[\big|\imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \,|\, \mathbf{X}_{s_{\mathrm{eq}}}) - nI_{\ell}\big| \ge n\delta\Big] \le 2\exp\bigg(-\frac{\delta^2 n}{4(8+\delta)}\bigg) \qquad (77)$$
for all $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ with $|s_{\mathrm{dif}}| = \ell$.
– If $\frac{\ell}{k} \to 0$, then for any $\epsilon > 0$ and $\delta_2 > 0$ (not depending on $p$), the following holds for sufficiently large $p$:
$$\mathbb{P}\big[\imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \,|\, \mathbf{X}_{s_{\mathrm{eq}}}) \le nI_{\ell}(1-\delta_2)\big] \le \exp\bigg(-\frac{n\ell}{k}\,e^{-\nu}\nu\,\frac{\delta_2^2(1-2\rho)^2}{2\big(1+\frac13\delta_2(1-2\rho)\big)}\,(1-\epsilon)\bigg) \qquad (78)$$
for all $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ with $|s_{\mathrm{dif}}| = \ell$.
With these tools in place, we proceed by obtaining an explicit bound on the number of tests for the case $d_{\max} = \Theta(k^{\gamma})$.
4) Bounding the error probability: We split the summation over $\ell$ in (74) into two terms:
$$T_1 := \sum_{\ell=d_{\max}+1}^{k/\sqrt{\log k}} \binom{k}{\ell}\,\psi_{\ell}\big(n, \delta_2^{(1)}\big), \qquad T_2 := \sum_{\ell=k/\sqrt{\log k}}^{k} \binom{k}{\ell}\,\psi_{\ell}\big(n, \delta_2^{(2)}\big), \qquad (79)$$
where we have let $\delta_{2,\ell}$ equal a given value $\delta_2^{(1)} \in (0,1)$ for all $\ell$ in the first sum, and a different value $\delta_2^{(2)} \in (0,1)$ for all $\ell$ in the second sum.
To bound $T_1$, we consider $\psi_{\ell}(n,\delta_2)$ equaling the right-hand side of (78). Letting $c(\delta_2) = \frac{e^{-\nu}\nu\,\delta_2^2(1-2\rho)^2}{2(1+\frac{1}{3}\delta_2(1-2\rho))}$ for brevity, we have
$$T_1 \le k \max_{\ell=d_{\max}+1,\ldots,k/\sqrt{\log k}} \binom{k}{\ell}\, e^{-n\cdot\frac{\ell}{k}\cdot c(\delta_2^{(1)})}, \qquad (80)$$
where we have upper bounded the summation defining $T_1$ by $k$ times the maximum. Re-arranging, we find that in order to attain $T_1 \le \delta_1$, it suffices that
$$n \ge \max_{\ell=d_{\max}+1,\ldots,k/\sqrt{\log k}} \frac{1}{c(\delta_2^{(1)})}\cdot\frac{k}{\ell}\cdot\bigg(\log\binom{k}{\ell} + \log\frac{k}{\delta_1}\bigg). \qquad (81)$$
Writing $\log\binom{k}{\ell} = \ell\log\frac{k}{\ell}\,(1+o(1))$, this simplifies to
$$n \ge \max_{\ell=d_{\max}+1,\ldots,k/\sqrt{\log k}} \frac{1}{c(\delta_2^{(1)})}\cdot\bigg(k\log\frac{k}{\ell} + \frac{k}{\ell}\log\frac{k}{\delta_1}\bigg) \qquad (82)$$
$$= \frac{1}{c(\delta_2^{(1)})}\cdot(1-\gamma)\,k\log k\,(1+o(1)), \qquad (83)$$
since the maximum is achieved by the smallest value $d_{\max}+1 = \Theta(k^{\gamma})$, and for that value, the second term is asymptotically negligible compared to the first. Substituting the definition of $c(\cdot)$, we obtain the condition
$$n \ge \frac{2\big(1+\frac13\delta_2(1-2\rho)\big)}{e^{-\nu}\nu\,\delta_2^2(1-2\rho)^2}\cdot(1-\gamma)\,k\log k\,(1+o(1)). \qquad (84)$$
To bound $T_2$, we consider $\psi_{\ell}(n,\delta_2)$ equaling the right-hand side of (77) with $\delta = \delta_2 I_{\ell}$. Again upper bounding the summation by $k$ times the maximum, and defining $c'(\delta_2) = \frac{\delta_2^2}{4(8+\delta_2 I_{\ell})}$, we obtain
$$T_2 \le k \max_{\ell=k/\sqrt{\log k},\ldots,k} \binom{k}{\ell}\cdot 2\exp\big(-c'(\delta_2)\, I_{\ell}^2\, n\big). \qquad (85)$$
It follows that in order to attain $T_2 \le \delta_1$, it suffices that
$$n \ge \frac{1}{c'(\delta_2^{(2)})\, I_{\ell}^2}\,\log\frac{2k\binom{k}{\ell}}{\delta_1}. \qquad (86)$$
By the mutual information characterizations in (75)–(76), we have $c'(\delta_2^{(2)}) = \Theta(1)$ for any $\delta_2^{(2)} \in (0,1)$, and $I_{\ell}^2 = \Theta\big(\big(\frac{\ell}{k}\big)^2\big)$. By also writing $\log\frac{2k\binom{k}{\ell}}{\delta_1} = \Theta\big(\ell\log\frac{k}{\ell}\big)$, we find that (86) takes the form $n = \Omega\big(\frac{k^2}{\ell}\log\frac{k}{\ell}\big)$. The most stringent condition is then provided by the smallest value $\ell = \frac{k}{\sqrt{\log k}}$, yielding $n = \Omega\big(k\cdot\log\log k\cdot\sqrt{\log k}\big)$. Thus, $T_2$ vanishes for any scaling of the form $n = \Omega\big(k\log\frac{p}{k}\big)$, since $\log\frac{p}{k} = \Theta(\log p)$ in the sub-linear regime $k = \Theta(p^{\theta})$ with $\theta \in (0,1)$.
5) Characterizing the mutual-information based condition (72): Recall that we require the number of tests to satisfy (72). For the values of $\ell$ corresponding to $T_1$ in (79), we have chosen $\delta_{2,\ell} = \delta_2^{(1)}$, and the mutual information characterization (75) yields the condition
$$n \ge \max_{\ell=d_{\max}+1,\ldots,k/\sqrt{\log k}} \frac{k\log\frac{p}{\ell} + k\log\frac{k}{\ell} + \frac{k}{\ell}\log\frac{k}{\delta_1}}{e^{-\nu}\nu(1-2\rho)\log\frac{1-\rho}{\rho}\,(1-\delta_2^{(1)})}\,(1+o(1)), \qquad (87)$$
where we have applied $\log\binom{p-k}{\ell} = \ell\log\frac{p}{\ell}\,(1+o(1))$ and $\log\binom{k}{\ell} = \ell\log\frac{k}{\ell}\,(1+o(1))$ for $\ell = o(k)$. Writing $k\log\frac{p}{\ell} + k\log\frac{k}{\ell} = k\log\frac{p}{k} + k\log\frac{k^2}{\ell^2}$ and recalling that $k = \Theta(p^{\theta})$ and $d_{\max} = \Theta(k^{\gamma})$, we find that (87) simplifies to
$$n \ge \frac{(1-\theta)k\log p + 2(1-\gamma)k\log k}{e^{-\nu}\nu(1-2\rho)\log\frac{1-\rho}{\rho}\,(1-\delta_2^{(1)})}\,(1+o(1)), \qquad (88)$$
since the maximum over $\ell$ is achieved by the smallest value, $\ell = d_{\max}+1 = \Theta(k^{\gamma})$.
For the $\ell$ values corresponding to $T_2$ in (79), the condition (72) was already simplified in [8]. It was shown that under the choice $\nu = \log 2$, the dominant condition is that of the highest value, $\ell = k$, and the resulting condition on the number of tests is
$$n \ge \frac{k\log\frac{p}{k}}{(\log 2 - H_2(\rho))(1-\delta_2^{(2)})}\,(1+o(1)). \qquad (89)$$
6) Wrapping up: We obtain the final condition on $n$ by combining (84), (88), and (89). We take $\delta_2^{(2)}$ to be arbitrarily small, while renaming $\delta_2^{(1)}$ to $\delta_2$ and letting it remain a free parameter. Also recalling the choice $\nu = \log 2$, we obtain the following generalization of the partial recovery bound given in [8].

Theorem 8. Under the symmetric noise model (2), in the regime $k = \Theta(p^\theta)$ and $d_{\max} = \Theta(k^\gamma)$ with $\theta, \gamma \in (0,1)$, there exists a non-adaptive group testing algorithm such that $P_e \to 0$ as $p \to \infty$ with a number of tests satisfying
$$n \le \inf_{\delta_2\in(0,1)} \max\big\{n_{\mathrm{MI},1},\, n_{\mathrm{MI},2}(\gamma,\delta_2),\, n_{\mathrm{Conc}}(\gamma,\delta_2)\big\}(1+o(1)), \tag{90}$$
where $n_{\mathrm{MI},1}$, $n_{\mathrm{MI},2}$, and $n_{\mathrm{Conc}}$ are defined in (21)–(23).
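For illustration, the bound (90) can be evaluated numerically by combining the leading-order expressions (84), (88), and (89) derived above and grid-searching over $\delta_2$. The following sketch is our own, with hypothetical parameter values, $\nu = \log 2$, and $\delta_2^{(2)} \to 0$:

```python
import math

LOG2 = math.log(2)

def H2(x):
    """Binary entropy in nats."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def n_bound(p, theta, gamma, rho, nu=LOG2, grid=200):
    """Grid-search delta_2 in (90), using the leading terms of (84), (88), (89)."""
    k = p ** theta
    best = float("inf")
    for i in range(1, grid):
        d2 = i / grid
        n_conc = (2 * (1 + d2 * (1 - 2 * rho) / 3)
                  / (math.exp(-nu) * nu * d2 ** 2 * (1 - 2 * rho) ** 2)
                  * (1 - gamma) * k * math.log(k))
        n_mi2 = (((1 - theta) * k * math.log(p) + 2 * (1 - gamma) * k * math.log(k))
                 / (math.exp(-nu) * nu * (1 - 2 * rho)
                    * math.log((1 - rho) / rho) * (1 - d2)))
        n_mi1 = k * math.log(p / k) / (LOG2 - H2(rho))
        best = min(best, max(n_mi1, n_mi2, n_conc))
    return best

print(f"n ~ {n_bound(p=10**6, theta=0.4, gamma=0.5, rho=0.05):.3e}")
```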
Variation for the Z-channel. For general $\gamma \in (0,1)$, the preceding analysis is non-trivial to extend to the Z-channel noise model, which we consider in Section V. However, it is relatively easy to obtain a partial recovery result for the case $d_{\max} = \Theta(k)$, and such a result suffices for our purposes. We outline the required changes here. We continue to assume that the test matrix $\mathsf{X}$ is i.i.d. Bernoulli, but now the probability of a given entry being one is $\frac{\nu}{k}$ for some $\nu > 0$ to be chosen later.

As was observed in [8], the analysis is considerably simplified by the fact that we do not need to consider the case $\frac{\ell}{k} \to 0$. This means that we can rely exclusively on (77), which is known to hold for any binary-output noise model [8]. Consequently, one finds that the only requirement on $n$ is that (72) holds, with the conditional mutual information $I_\ell = I(X_{s_{\mathrm{dif}}}; Y | X_{s_{\mathrm{eq}}})$ suitably modified due to the different noise model. By some asymptotic simplifications and the fact that $\ell = \Theta(k)$ for all $\ell$ under consideration, this condition simplifies to
$$n \ge \max_{\ell > d_{\max}} \frac{\ell\log\frac{p}{k}}{I_\ell}(1+o(1)). \tag{91}$$
Next, we note that an early result of Malyutov and Mateev [13] (see also [36]) implies that $\frac{\ell}{I_\ell}$ is maximized at $\ell = k$. For completeness, we provide a short proof. Assuming without loss of generality that $s = \{1,\dots,k\}$, and letting $X_j^{j'}$ denote the collection $(X_j,\dots,X_{j'})$ for indices $1 \le j \le j' \le k$, we have
$$\frac{I_\ell}{\ell} = \frac{1}{\ell}\, I\big(X_{k-\ell+1}^k; Y \,\big|\, X_1^{k-\ell}\big) \tag{92}$$
$$= \frac{1}{\ell} \sum_{j=k-\ell+1}^{k} I\big(X_j; Y \,\big|\, X_1^{j-1}\big) \tag{93}$$
$$= \frac{1}{\ell} \sum_{j=k-\ell+1}^{k} \Big(H(X_j) - H\big(X_j \,\big|\, Y, X_1^{j-1}\big)\Big), \tag{94}$$
where (92) follows since $I_\ell = I(X_{s_{\mathrm{dif}}}; Y | X_{s_{\mathrm{eq}}})$ only depends on the sets $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ through their cardinalities, (93) follows from the chain rule for mutual information, and (94) follows since $X_j$ is independent of $X_1^{j-1}$. We establish the desired claim by observing that $\frac{I_\ell}{\ell}$ is decreasing in $\ell$: The term $H(X_j)$ is the same for all $j$, whereas the term $H(X_j | Y, X_1^{j-1})$ is smaller for higher values of $j$ because conditioning reduces entropy. Hence, the summands in (94) are increasing in $j$, and since $\frac{I_\ell}{\ell}$ is the average of the last $\ell$ of them, increasing $\ell$ brings smaller terms into the average.
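For the symmetric noise model, this monotonicity is also easy to check numerically: conditioning on whether any item of $s_{\mathrm{eq}}$ is included in the test gives the closed form $I_\ell = (1-q)^{k-\ell}\big[H_2(\rho + \pi_\ell(1-2\rho)) - H_2(\rho)\big]$ with $q = \frac{\nu}{k}$ and $\pi_\ell = 1-(1-q)^\ell$ (a standard calculation, stated here for illustration only). The following Python sketch of ours evaluates $\frac{I_\ell}{\ell}$ on this formula and confirms that it is decreasing:

```python
import math

def H2(x):
    """Binary entropy in nats."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def I_ell(ell, k, nu, rho):
    """I(X_sdif; Y | X_seq) for |s_dif| = ell under Bernoulli(nu/k) testing
    and symmetric noise: nonzero only when no item of s_eq is tested."""
    q = nu / k
    pi = 1 - (1 - q) ** ell                # P[some item of s_dif is tested]
    return (1 - q) ** (k - ell) * (H2(rho + pi * (1 - 2 * rho)) - H2(rho))

k, nu, rho = 100, math.log(2), 0.1
ratios = [I_ell(ell, k, nu, rho) / ell for ell in range(1, k + 1)]
assert all(a > b for a, b in zip(ratios, ratios[1:])), "I_ell/ell not decreasing"
print(ratios[0], ratios[-1])   # hence ell/I_ell is maximized at ell = k
```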
Using this observation, the condition in (91) simplifies to
$$n \ge \frac{k\log\frac{p}{k}}{I_k}(1+o(1)). \tag{95}$$
We can further replace $I_k = I(X_s; Y)$ by the capacity of the Z-channel upon optimizing the i.i.d. Bernoulli parameter $\nu > 0$. The optimal value is the one that makes $\mathbb{P}\big[\vee_{j\in s} X_j = 1\big]$ the same as $P_U^*(1)$, where $P_U^*$ is the capacity-achieving input distribution of the Z-channel $P_{Y|U}$. (In fact, this analysis applies to any binary channel $P_{Y|U}$.)
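For illustration, the following sketch (ours; the Blahut–Arimoto routine and the particular Z-channel convention are our assumptions, not taken from the analysis) computes the capacity-achieving input $P_U^*$ of a binary channel and then solves $1 - e^{-\nu} = P_U^*(1)$ for $\nu$, using the fact that $\mathbb{P}[\vee_{j\in s} X_j = 1] \to 1 - e^{-\nu}$ under $\mathrm{Bernoulli}\big(\frac{\nu}{k}\big)$ testing:

```python
import math

def blahut_arimoto(W, iters=3000):
    """Capacity-achieving input of a binary-input binary-output channel.
    W[u][y] = P(Y = y | U = u). Returns (capacity in nats, P_U)."""
    def divergences(p):
        q = [sum(p[u] * W[u][y] for u in range(2)) for y in range(2)]
        return [sum(W[u][y] * math.log(W[u][y] / q[y])
                    for y in range(2) if W[u][y] > 0) for u in range(2)]
    p = [0.5, 0.5]
    for _ in range(iters):
        d = divergences(p)
        w = [p[u] * math.exp(d[u]) for u in range(2)]
        p = [x / sum(w) for x in w]
    d = divergences(p)
    return sum(p[u] * d[u] for u in range(2)), p

rho = 0.1
# One Z-channel convention: a negative outcome is flipped to positive with
# probability rho (swap the roles of the rows for the opposite convention).
W = [[1 - rho, rho], [0.0, 1.0]]
cap, p_star = blahut_arimoto(W)
nu_opt = -math.log(1 - p_star[1])  # solves 1 - exp(-nu) = P*_U(1)
print(f"capacity = {cap:.4f} nats, P_U*(1) = {p_star[1]:.4f}, nu = {nu_opt:.4f}")
```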
B. Partial Recovery Result with Unknown k

In this section, we explain how to adapt the partial recovery analysis of [8] (as well as that of Appendix A for $d_{\max} = \Theta(k^\gamma)$) to the case that $k$ is only known to lie within a certain interval $\mathcal{K}$ of length $\Delta = o(d_{\max})$, where $d_{\max}$ is the partial recovery threshold. Specifically, we argue that for any defective set $s$ with $|s| \in \mathcal{K}$, there exists a decoder that knows $\mathcal{K}$ but not $|s|$, such that the error probability $\mathbb{P}[\hat{S} \ne s \,|\, S = s]$ vanishes under i.i.d. Bernoulli testing, with the same requirement on $n$ as in the case of known $|s|$. Of course, this also implies that $\mathbb{P}[\hat{S} \ne S]$ vanishes under any prior distribution on $S$ such that $|S| \in \mathcal{K}$ almost surely.
We consider the same non-adaptive setup of Appendix A, denoting the test matrix by $\mathsf{X} \in \{0,1\}^{n\times p}$ and making extensive use of the information densities defined in (67)–(68). Since $k := |s|$ is unknown, we can no longer assume that the test matrix is i.i.d. with distribution $P_X \sim \mathrm{Bernoulli}\big(\frac{\nu}{k}\big)$, so we instead use $P_X \sim \mathrm{Bernoulli}\big(\frac{\nu}{k_{\max}}\big)$, with $k_{\max}$ equaling the maximum value in $\mathcal{K}$.
In the case of known $k$, we considered the decoder in (70), first proposed in [8]. In the present setting, we modify the decoder to consider all possible $k$, and to allow $s_{\mathrm{dif}} \cup s_{\mathrm{eq}}$ to be a strict subset of $s$. More specifically, the decoder is defined as follows. For any pair $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ such that $|s_{\mathrm{dif}} \cup s_{\mathrm{eq}}|$ equals some constant $k'$, let $\imath^n_{k'}(x_{s_{\mathrm{dif}}}; y | x_{s_{\mathrm{eq}}})$ be the information density corresponding to the case that the defective set equals $s_{\mathrm{dif}} \cup s_{\mathrm{eq}}$, with an explicit dependence on the cardinality $k'$. We consider a decoder that searches over all $s \subseteq \{1,\dots,p\}$ whose cardinality is in $\mathcal{K}$, and seeks a set such that
$$\imath^n_{k'}(X_{s_{\mathrm{dif}}}; Y | X_{s_{\mathrm{eq}}}) \ge \gamma_{k',\ell}, \quad \forall (s_{\mathrm{dif}}, s_{\mathrm{eq}}) \in \tilde{\mathcal{S}}_s, \tag{96}$$
where $\{\gamma_{k',\ell}\}$ is a set of constants depending on $k' := |s_{\mathrm{dif}} \cup s_{\mathrm{eq}}|$ and $\ell := |s_{\mathrm{dif}}|$, and $\tilde{\mathcal{S}}_s$ is the set of pairs $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ satisfying the following:
1. $s_{\mathrm{dif}} \subseteq s$ and $s_{\mathrm{eq}} \subseteq s$ are disjoint;
2. The total cardinality $k' = |s_{\mathrm{dif}} \cup s_{\mathrm{eq}}|$ lies in $\mathcal{K}$;
3. The "distance" $\ell + k - k'$ exceeds $d_{\max}$. Specifically, if $s$ is the true defective set and $\hat{s}$ is some estimate of cardinality $k' \le k$ with $s \cap \hat{s} = s_{\mathrm{eq}}$ and $|s_{\mathrm{eq}}| = k' - \ell$, then we have $\ell + k - k'$ false negatives, and $\ell$ false positives, so that $d(s, \hat{s}) = \ell + k - k'$ under the distance function in (5).

If multiple $s$ satisfy (96), then the one with the smallest cardinality $k := |s|$ is chosen, with any remaining ties broken arbitrarily. If none of the $s$ satisfy (96), an error is declared.
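For concreteness, the following brute-force Python sketch implements the threshold decoder just described for toy instances. It is our own illustration and is exponential-time by design; the thresholds $\gamma_{k',\ell}$ are left as a user-supplied function, we restrict to $\ell \ge 1$, and the candidate's own cardinality plays the role of $k$ in condition 3:

```python
import itertools, math

def info_density(x_dif, x_seq, y, q, rho):
    """i^n_{k'}(x_dif; y | x_seq) for the symmetric noise model with
    Bernoulli(q) inclusions; x_dif, x_seq are lists of per-test 0/1 rows."""
    ell, total = len(x_dif[0]), 0.0
    for row_d, row_q, yi in zip(x_dif, x_seq, y):
        num = 1 - rho if max(row_d + row_q) == 1 else rho   # P(y=1 | both parts)
        if max(row_q, default=0) == 1:
            den = 1 - rho                                   # P(y=1 | x_seq)
        else:
            p0 = (1 - q) ** ell                             # P(OR of X_dif = 0)
            den = (1 - p0) * (1 - rho) + p0 * rho
        if yi == 0:
            num, den = 1 - num, 1 - den
        total += math.log(num / den)
    return total

def far_pairs(s, K, d_max):
    """Pairs (s_dif, s_eq) in S~_s: disjoint subsets of s with
    |s_dif| + |s_eq| in K and distance ell + |s| - k' > d_max (ell >= 1)."""
    for kp in K:
        for sub in itertools.combinations(s, kp):
            for ell in range(1, kp + 1):
                if ell + len(s) - kp > d_max:
                    for s_dif in itertools.combinations(sub, ell):
                        yield s_dif, tuple(j for j in sub if j not in s_dif)

def decode(X, y, K, d_max, q, rho, gamma):
    """Smallest-cardinality candidate set passing all threshold tests (96)."""
    n, p = len(X), len(X[0])
    cols = lambda idx: [[X[i][j] for j in idx] for i in range(n)]
    for kk in sorted(K):
        for s in itertools.combinations(range(p), kk):
            if all(info_density(cols(sd), cols(se), y, q, rho)
                   >= gamma(len(sd) + len(se), len(sd))
                   for sd, se in far_pairs(s, K, d_max)):
                return set(s)
    return None  # error declared
```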
Under this decoder, an error occurs if the true defective set $s$ fails the threshold test (96), or if some $s'$ with $|s'| \le |s|$ and $d(s, s') > d_{\max}$ passes it. By the union bound, the first of these occurs with probability at most
$$P_e^{(1)}(s, d_{\max}) \le \sum_{\substack{(k',\ell)\,:\, k' \in \mathcal{K},\; \ell \le k' \le k, \\ \ell+k-k' > d_{\max}}} \binom{k}{k'}\binom{k'}{\ell}\, \mathbb{P}\big[\imath^n_{k'}(X_{s_{\mathrm{dif}}}; Y | X_{s_{\mathrm{eq}}}) < \gamma_{k',\ell}\big], \tag{97}$$
where $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ is an arbitrary pair with $|s_{\mathrm{dif}}| = \ell$ and $|s_{\mathrm{dif}} \cup s_{\mathrm{eq}}| = k'$. Here the combinatorial terms arise by choosing $k'$ elements of $s$ to form $s_{\mathrm{dif}} \cup s_{\mathrm{eq}}$, and then choosing $\ell$ of those elements to form $s_{\mathrm{dif}}$.
As for the probability of some incorrect $s'$ passing the threshold test, we have the following. Let $s_{\mathrm{eq}} = s' \cap s$ and $s_{\mathrm{dif}} = s' \setminus s$. Since only sets with $|s'| \le |s|$ can cause errors, $k' := |s'| = |s_{\mathrm{eq}} \cup s_{\mathrm{dif}}|$ is upper bounded by $k$, and since only sets with $d(s, s') > d_{\max}$ can cause errors, we can also assume that this holds. Defining $\ell = |s_{\mathrm{dif}}|$, we can upper bound the probability of $s'$ passing the test (96) for all $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ by the probability of passing it for the specific pair $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$. By doing so, and summing over all possible $s'$, we find that the second error event is upper bounded as follows for any given $s$:
$$P_e^{(2)}(s, d_{\max}) \le \sum_{\substack{(k',\ell)\,:\, k' \in \mathcal{K},\; \ell \le k' \le k, \\ \ell+k-k' > d_{\max}}} \binom{p-k}{\ell}\binom{k}{k'-\ell}\, \mathbb{P}\big[\imath^n_{k'}(X_{s_{\mathrm{dif}}}; Y | X_{s_{\mathrm{eq}}}) \ge \gamma_{k',\ell}\big], \tag{98}$$
where the combinatorial terms correspond to choosing $\ell$ elements of $\{1,\dots,p\} \setminus s$ to form $s_{\mathrm{dif}}$, and choosing $k'-\ell$ elements of $s$ to form $s_{\mathrm{eq}}$.
Combining the above, the overall upper bound on the error probability given $s$ is
$$P_e(s) \le P_e^{(1)}(s, d_{\max}) + P_e^{(2)}(s, d_{\max}). \tag{99}$$
Upon substituting the upper bounds in (97) and (98), we obtain an expression that is nearly the same as that when $k$ is known [8], except that we sum over a number of different $k'$, rather than only $k' = k$. We proceed by arguing that this does not affect the final bound, as long as $d_{\max} = \Theta(k^\gamma)$ for some $\gamma \in (0,1]$, and $\Delta = o(d_{\max})$ (recall that $\Delta$ is the highest possible difference between two possible $k$ values).

The main additional difficulty here is that the information density $\imath_{k'}(x_{s_{\mathrm{dif}}}; y | x_{s_{\mathrm{eq}}}) = \log\frac{P_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{dif}}},x_{s_{\mathrm{eq}}})}{P_{Y|X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{eq}}})}$ is defined with respect to $(s_{\mathrm{eq}}, s_{\mathrm{dif}})$ of total cardinality $k'$, whereas the output variables $Y$ are distributed according to the true model in which there are $k$ defectives. The following lemma allows us to perform a change of measure to circumvent this issue.
Lemma 4. Fix a defective set $s$ of cardinality $k$, let $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ be disjoint subsets of $s$ with total cardinality $k' \le k$, and let $P^{(k)}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}$ be the conditional probability of $Y$ given the partial test vector $(X_{s_{\mathrm{dif}}}, X_{s_{\mathrm{eq}}})$, in the case of a test vector with i.i.d. $\mathrm{Bernoulli}\big(\frac{\nu}{k_{\max}}\big)$ entries, where $k_{\max} = k(1+o(1))$. Similarly, let $P^{(k')}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}$ denote the conditional transition law when $s' = s_{\mathrm{dif}} \cup s_{\mathrm{eq}}$ is the true defective set. Then, if $|k - k'| \le \Delta = o(k)$, we have
$$\max_{x_{s_{\mathrm{dif}}},\, x_{s_{\mathrm{eq}}},\, y} \frac{P^{(k)}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}})}{P^{(k')}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}})} \le 1 + O\Big(\frac{\Delta}{k}\Big). \tag{100}$$
Consequently, the corresponding $n$-letter product distributions $P^{(k)}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}},\mathbf{X}_{s_{\mathrm{eq}}}}$ and $P^{(k')}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}},\mathbf{X}_{s_{\mathrm{eq}}}}$ for conditionally independent observations satisfy the following:
$$\max_{\mathbf{x}_{s_{\mathrm{dif}}},\, \mathbf{x}_{s_{\mathrm{eq}}},\, \mathbf{y}} \frac{P^{(k)}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}},\mathbf{X}_{s_{\mathrm{eq}}}}(\mathbf{y}|\mathbf{x}_{s_{\mathrm{dif}}}, \mathbf{x}_{s_{\mathrm{eq}}})}{P^{(k')}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}},\mathbf{X}_{s_{\mathrm{eq}}}}(\mathbf{y}|\mathbf{x}_{s_{\mathrm{dif}}}, \mathbf{x}_{s_{\mathrm{eq}}})} \le e^{O(n\Delta/k)}. \tag{101}$$
Proof. First observe that if $x_{s_{\mathrm{dif}}}$ or $x_{s_{\mathrm{eq}}}$ contain an entry equal to one, then the ratio in (100) equals one, as $Y = 1$ with probability $1-\rho$ in either case. Hence, it suffices to prove the claim for $x_{s_{\mathrm{dif}}}$ and $x_{s_{\mathrm{eq}}}$ having all entries equal to zero. In the denominator, we have
$$P^{(k')}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}(1|x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}) = \rho, \tag{102}$$
since $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ then corresponds to the entire defective set. On the other hand, in the numerator, there are $k - k'$ additional defective items, and the probability of one or more of them being included in the test is $\epsilon := 1 - \big(1 - \frac{\nu}{k_{\max}}\big)^{k-k'} = O\big(\frac{\Delta}{k_{\max}}\big)$, where we applied the assumptions $|k - k'| \le \Delta = o(k)$ and $k_{\max} = k(1+o(1))$, along with some asymptotic simplifications. Therefore, we have
$$P^{(k)}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}(1|x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}) = (1-\epsilon)\rho + \epsilon(1-\rho) \tag{103}$$
$$= \rho + \epsilon(1-2\rho). \tag{104}$$
The ratio of (104) and (102) evaluates to $1 + O(\epsilon)$, and similarly for the conditional probabilities of $Y = 0$ obtained by taking one minus the right-hand sides. Since $\epsilon = O\big(\frac{\Delta}{k}\big)$, this proves (100).

We obtain (101) by raising the right-hand side of (100) to the power of $n$, and applying $1 + \alpha \le e^\alpha$.
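The single-letter bound (100) is easy to verify numerically. The sketch below (our own check, with $k_{\max} = k$ for simplicity and hypothetical parameters) evaluates the exact ratio of (104) to (102) and confirms that $(\mathrm{ratio}-1)/(\Delta/k)$ approaches a constant:

```python
import math

def ratio(k, kp, nu, rho):
    """Exact ratio of (104) to (102) for the all-zero test vector,
    with Bernoulli(nu/k_max) entries and k_max = k for simplicity."""
    eps = 1 - (1 - nu / k) ** (k - kp)  # P[some extra defective is tested]
    return ((1 - eps) * rho + eps * (1 - rho)) / rho

delta = 10
for k in (10 ** 3, 10 ** 4, 10 ** 5):
    r = ratio(k, k - delta, math.log(2), rho=0.1)
    print(f"k={k}: (ratio - 1) / (Delta / k) = {(r - 1) / (delta / k):.4f}")
```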
We now show how to use Lemma 4 to bound $P_e^{(1)}(s, d_{\max})$ and $P_e^{(2)}(s, d_{\max})$. Starting with the former, we observe that $Y$ in (97) is conditionally distributed according to $P^{(k)}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}$, and hence, (101) yields
$$\mathbb{P}\big[\imath^n_{k'}(X_{s_{\mathrm{dif}}}; Y | X_{s_{\mathrm{eq}}}) < \gamma_{k',\ell}\big] \le e^{O(n\Delta/k)}\cdot\mathbb{P}\big[\imath^n_{k'}(X_{s_{\mathrm{dif}}}; \tilde{Y} | X_{s_{\mathrm{eq}}}) < \gamma_{k',\ell}\big], \tag{105}$$
where $\tilde{Y}$ is conditionally distributed according to $P^{(k')}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}$.
For $P_e^{(2)}(s, d_{\max})$, we first note that a similar bound to (101) holds when we condition on $X_{s_{\mathrm{eq}}}$ alone; this is seen by simply moving the denominator to the right-hand side and averaging over $X_{s_{\mathrm{dif}}}$ on both sides. Since $Y$ in (98) is conditionally distributed according to $P^{(k)}_{Y|X_{s_{\mathrm{eq}}}}$, we obtain from (101) that
$$\mathbb{P}\big[\imath^n_{k'}(X_{s_{\mathrm{dif}}}; Y | X_{s_{\mathrm{eq}}}) \ge \gamma_{k',\ell}\big] \le e^{O(n\Delta/k)}\cdot\mathbb{P}\big[\imath^n_{k'}(X_{s_{\mathrm{dif}}}; \tilde{Y} | X_{s_{\mathrm{eq}}}) \ge \gamma_{k',\ell}\big], \tag{106}$$
where $\tilde{Y}$ is conditionally distributed according to $P^{(k')}_{Y|X_{s_{\mathrm{eq}}}}$.
Next, observe that if the number of tests satisfies $n = O(k\log p)$, then we can simplify the term $e^{O(n\Delta/k)}$ to $e^{O(\Delta\log p)}$. By doing so, and substituting (105) and (106) into (97)–(99), we obtain
$$P_e(s) \le e^{O(\Delta\log p)} \sum_{\substack{(k',\ell)\,:\, k' \in \mathcal{K},\; \ell \le k' \le k, \\ \ell+k-k' > d_{\max}}} \binom{k}{k'}\binom{k'}{\ell}\, \mathbb{P}\big[\imath^n_{k'}(X_{s_{\mathrm{dif}}}; \tilde{Y} | X_{s_{\mathrm{eq}}}) < \gamma_{k',\ell}\big] \tag{107}$$
$$+\, e^{O(\Delta\log p)} \sum_{\substack{(k',\ell)\,:\, k' \in \mathcal{K},\; \ell \le k' \le k, \\ \ell+k-k' > d_{\max}}} \binom{p-k}{\ell}\binom{k}{k'-\ell}\, \mathbb{P}\big[\imath^n_{k'}(X_{s_{\mathrm{dif}}}; \tilde{Y} | X_{s_{\mathrm{eq}}}) \ge \gamma_{k',\ell}\big]. \tag{108}$$
This bound is now of a similar form to that analyzed in [8], in the sense that the joint distributions of the tests and outcomes match those that define the information density. The only differences are the presence of additional $k'$ values beyond only $k' = k$, and the presence of the $e^{O(\Delta\log p)}$ terms. We conclude by explaining how these differences do not impact the final result as long as $\Delta = o(d_{\max})$ with $d_{\max} = \Theta(k^\gamma)$ for some $\gamma \in (0,1]$:

•The term $\binom{p-k}{\ell}$ satisfies $\log\binom{p-k}{\ell} = \ell\log\frac{p}{\ell}(1+o(1))$, and the assumption $|k-k'| \le \Delta = o(d_{\max}) = o(k)$ implies that the term $\binom{k'}{\ell}$ satisfies $\log\binom{k'}{\ell} = \ell\log\frac{k}{\ell}(1+o(1))$. On the other hand, the logarithm of $e^{O(\Delta\log p)}$ is $O(\Delta\log p)$, so it is dominated by the other combinatorial terms due to the fact that $\Delta = o(d_{\max})$ and $\ell = \Omega(d_{\max})$. Similarly, the term $\binom{k}{k'} = \binom{k}{k-k'}$ satisfies $\log\binom{k}{k'} = O(\Delta\log k)$, and is dominated by $\binom{k'}{\ell}$.

•The term $\binom{k}{k'-\ell}$ simplifies to $\binom{k}{k-k'+\ell} = \binom{k}{\ell(1+o(1))}$ (by the assumption $\Delta = o(d_{\max})$), and hence, the asymptotic behavior for any $k'$ is the same as that of $\binom{k}{k-\ell}$, the term corresponding to $k' = k$. Similarly, the asymptotics of the tail probabilities of the information densities are unaffected by switching from $k$ to $k' = k(1+o(1))$.

•In [8] the number of $\ell$ being summed over is upper bounded by $k$, whereas here we can upper bound the number of $(k', \ell)$ pairs being summed over by $k\Delta$. Since $\Delta = o(k)$, this simplifies to $k^{1+o(1)}$. Since it is the logarithm of this term that appears in the final expression, this difference only amounts to a multiplication by $1+o(1)$.
C. NCOMP with Unknown Number of Defectives

Chan et al. [14] showed that Noisy Combinatorial Orthogonal Matching Pursuit (NCOMP), used in conjunction with i.i.d. Bernoulli test matrices, ensures exact recovery of a defective set $S$ of cardinality $k$ with high probability under the scaling $n = O(k\log p)$, which in turn behaves as $O\big(k\log\frac{p}{k}\big)$ when $k = O(p^\theta)$ for some $\theta < 1$. However, the random test design and the decoding rule in [14] assume knowledge of $k$, meaning the result cannot immediately be used for our purposes in Step 2 of Algorithm 1. In this section, we modify the algorithm and analysis of [14] to handle the case that $k$ is only known up to a constant factor.

Suppose that $k \in [c_0 k_{\max}, k_{\max}]$ for some $k_{\max} = \Theta(p^\theta)$, where $c_0 \in (0,1)$ and $\theta \in (0,1)$ do not depend on $p$. We adopt a Bernoulli design in which each item is independently placed in each test with probability $\frac{\nu}{k_{\max}}$ for fixed $\nu > 0$. It follows that for a given test vector $X = (X_1,\dots,X_p)$, we have
$$\mathbb{P}\Big[\bigvee_{j\in S} X_j = 1\Big] = 1 - \Big(1 - \frac{\nu}{k_{\max}}\Big)^k = (1 - e^{-c\nu})(1+o(1)) \tag{109}$$
for some $c \in [c_0, 1]$, and hence, the corresponding observation $Y$ satisfies
$$\mathbb{P}[Y = 1] = \big((1-\rho)(1-e^{-c\nu}) + \rho e^{-c\nu}\big)(1+o(1)). \tag{110}$$
In contrast, for any $j \in S$, we have
$$\mathbb{P}[Y = 1 \,|\, X_j = 1] = 1 - \rho. \tag{111}$$
The idea of the NCOMP algorithm is the following: For each item $j$, consider the set of tests in which the item is included, and define the total number as $N'_j$. If $j$ is defective, we should expect a proportion of roughly $1-\rho$ of these tests to be positive according to (111), whereas if $j$ is non-defective, we should expect the proportion to be roughly $(1-\rho)(1-e^{-c\nu}) + \rho e^{-c\nu}$ according to (110). Hence, we set a threshold in between these two values, and declare $j$ to be defective if and only if the proportion of positive tests exceeds that threshold; a simple simulation of this rule is sketched below.
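The following Python sketch is our own minimal simulation of the modified decoder, not the implementation of [14], and all parameter values are hypothetical. The decoder is given only $k_{\max}$, $\rho$, and the margin $\Delta$ (written `delta` below), matching the setup above:

```python
import random

def ncomp(p, k, k_max, n, nu, rho, delta, seed=0):
    """Bernoulli(nu/k_max) testing with symmetric noise, then declare item j
    defective iff its fraction of positive tests is at least 1 - rho - delta."""
    rng = random.Random(seed)
    S = set(rng.sample(range(p), k))     # true defective set (unknown to decoder)
    q = nu / k_max
    counts = [[0, 0] for _ in range(p)]  # per item: [#tests, #positive tests]
    for _ in range(n):
        test = [j for j in range(p) if rng.random() < q]
        out = any(j in S for j in test)
        if rng.random() < rho:           # symmetric noise: flip the outcome
            out = not out
        for j in test:
            counts[j][0] += 1
            counts[j][1] += out
    est = {j for j in range(p)
           if counts[j][0] > 0
           and counts[j][1] >= (1 - rho - delta) * counts[j][0]}
    return S, est

S, est = ncomp(p=2000, k=30, k_max=50, n=3000, nu=0.7, rho=0.05, delta=0.15)
print("exact recovery:", est == S)
```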
We first study the behavior of $N'_j$. Under the above Bernoulli test design, we have $N'_j \sim \mathrm{Binomial}\big(n, \frac{\nu}{k_{\max}}\big)$, and hence, standard Binomial concentration [37, Ch. 4] gives
$$\mathbb{P}\Big[N'_j \le \frac{n\nu}{2k_{\max}}\Big] \le e^{-\Theta(1)\frac{n}{k_{\max}}} \tag{112}$$
$$\le \frac{1}{p^2}, \tag{113}$$
where (113) holds provided that $n = \Omega(k\log p)$ with a suitably-chosen implied constant (recall that $k = \Theta(k_{\max})$).
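As a quick numeric sanity check on (112)–(113) (our own illustration; the constant 25 below is hand-picked to play the role of the "suitably-chosen implied constant"), one can compare the exact lower tail of $\mathrm{Binomial}\big(n, \frac{\nu}{k_{\max}}\big)$ against the $\frac{1}{p^2}$ target:

```python
import math

def binom_lower_tail(n, q, m):
    """Exact P[Binomial(n, q) <= m], summed in log-space for stability."""
    return sum(math.exp(math.lgamma(n + 1) - math.lgamma(i + 1)
                        - math.lgamma(n - i + 1)
                        + i * math.log(q) + (n - i) * math.log(1 - q))
               for i in range(int(m) + 1))

p, theta, nu = 10 ** 5, 0.5, math.log(2)
k_max = int(p ** theta)
n = int(25 * k_max * math.log(p))  # n = Omega(k log p); constant hand-picked
tail = binom_lower_tail(n, nu / k_max, n * nu / (2 * k_max))
print(f"P[N'_j <= n nu/(2 k_max)] = {tail:.2e}  vs  1/p^2 = {1 / p ** 2:.2e}")
```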
Next, we present the modified NCOMP decoding rule, and study its performance under the assumption that $N'_j = n'_j$ with $n'_j \ge \frac{n\nu}{2k_{\max}}$, for each $j \in \{1,\dots,p\}$. Observe that the gap between (110) and (111) behaves as $\Theta(1)$ for any $c \in [c_0, 1]$. Hence, for sufficiently small $\Delta > 0$, we have $\mathbb{P}[Y = 1] \le 1 - \rho - 2\Delta$. Accordingly, letting $N'_{j,1}$ be the number of the $N'_j$ tests including $j$ that returned positive, we declare $j$ to be defective if and only if $N'_{j,1} \ge (1 - \rho - \Delta)N'_j$. We then have the following:
•If $j$ is defective, then the probability of incorrectly declaring it to be non-defective given $N'_j = n'_j$ satisfies
$$\mathbb{P}\big[N'_{j,1} < (1-\rho-\Delta)n'_j\big] \le e^{-\Theta(1) n'_j} \le e^{-\Theta(1)\frac{n\nu}{2k_{\max}}}, \tag{114}$$
where the first inequality is standard Binomial concentration, and the second holds for $n'_j \ge \frac{n\nu}{2k_{\max}}$.

•Similarly, if $j$ is non-defective, the probability of incorrectly declaring it to be defective given $N'_j = n'_j$ satisfies
$$\mathbb{P}\big[N'_{j,1} \ge (1-\rho-\Delta)n'_j\big] \le e^{-\Theta(1) n'_j} \le e^{-\Theta(1)\frac{n\nu}{2k_{\max}}}. \tag{115}$$
Combining these bounds with (113) and a union bound over the $p$ items, the overall error probability $P_e = \mathbb{P}[\hat{S} \ne S]$ of the modified NCOMP algorithm is upper bounded by
$$P_e \le \frac{1}{p} + p\,e^{-\Theta(1)\frac{n\nu}{2k_{\max}}}. \tag{116}$$
Since $k_{\max} = \Theta(k)$, this vanishes when $n = \Omega(k\log p)$ with a suitably-chosen implied constant, thus establishing the desired result.
Z-channel noise. Under the Z-channel noise model introduced in Section V, the preceding analysis is essentially unchanged. It only relied on there being a constant gap between the probabilities $\mathbb{P}[Y = 1]$ and $\mathbb{P}[Y = 1 \,|\, X_j = 1]$, and this is still the case here: Equations (110) and (111) still hold true when $(1-\rho)(1-e^{-c\nu}) + \rho e^{-c\nu}$ is replaced by $(1-e^{-c\nu}) + \rho e^{-c\nu}$ in the former.
ACKNOWLEDGMENT
The author thanks Volkan Cevher, Sidharth Jaggi, Oliver Johnson, and Matthew Aldridge for helpful discussions,
and Leonardo Baldassini for sharing his PhD thesis [30].
REFERENCES
[1] R. Dorfman, “The detection of defective members of large populations,” Ann. Math. Stats., vol. 14, no. 4, pp. 436–440, 1943.
[2] A. Fernández Anta, M. A. Mosteiro, and J. Ramón Muñoz, “Unbounded contention resolution in multiple-access channels,” in Distributed
Computing. Springer Berlin Heidelberg, 2011, vol. 6950, pp. 225–236.
[3] R. Clifford, K. Efremenko, E. Porat, and A. Rothschild, “Pattern matching with don’t cares and few errors,” J. Comp. Sys. Sci., vol. 76,
no. 2, pp. 115–124, 2010.
[4] G. Cormode and S. Muthukrishnan, “What’s hot and what’s not: Tracking most frequent items dynamically,” ACM Trans. Database Sys.,
vol. 30, no. 1, pp. 249–278, March 2005.
[5] A. Gilbert, M. Iwen, and M. Strauss, “Group testing and sparse signal recovery,” in Asilomar Conf. Sig., Sys. and Comp., Oct. 2008, pp.
1059–1063.
[6] A. C. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin, “One sketch for all: Fast algorithms for compressed sensing,” in Proc.
ACM-SIAM Symp. Disc. Alg. (SODA), New York, 2007, pp. 237–246.
[7] L. Baldassini, O. Johnson, and M. Aldridge, “The capacity of adaptive group testing,” in IEEE Int. Symp. Inf. Theory, July 2013, pp.
2676–2680.
[8] J. Scarlett and V. Cevher, “Phase transitions in group testing,” in Proc. ACM-SIAM Symp. Disc. Alg. (SODA), 2016.
[9] D.-Z. Du and F. K. Hwang, Combinatorial group testing and its applications, ser. Series on Applied Mathematics. World Scientific, 1993.
[10] M. Aldridge, “The capacity of Bernoulli nonadaptive group testing,” 2015, http://arxiv.org/abs/1511.05201.
[11] A. Agarwal, S. Jaggi, and A. Mazumdar, “Novel impossibility results for group-testing,” 2018, http://arxiv.org/abs/1801.02701.
[12] M. Aldridge, “Individual testing is optimal for nonadaptive group testing in the linear regime,” 2018, http://arxiv.org/abs/1801.08590.
[13] M. B. Malyutov and P. S. Mateev, “Screening designs for non-symmetric response function,” Mat. Zametki, vol. 29, pp. 109–127, 1980.
[14] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama, “Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds
with efficient algorithms,” in Allerton Conf. Comm., Ctrl., Comp., Sep. 2011, pp. 1832–1839.
[15] C. L. Chan, S. Jaggi, V. Saligrama, and S. Agnihotri, “Non-adaptive group testing: Explicit bounds and novel algorithms,” IEEE Trans.
Inf. Theory, vol. 60, no. 5, pp. 3019–3035, May 2014.
[16] J. Scarlett and V. Cevher, “Efficient and near-optimal noisy group testing: An information-theoretic framework,” 2017, https://arxiv.org/abs/1710.08704.
[17] ——, “Limits on support recovery with probabilistic models: An information-theoretic framework,” IEEE Trans. Inf. Theory, vol. 63, no. 1,
pp. 593–620, 2017.
[18] M. Malyutov, “The separating property of random matrices,” Math. Notes Acad. Sci. USSR, vol. 23, no. 1, pp. 84–91, 1978.
[19] G. Atia and V. Saligrama, “Boolean compressed sensing and noisy group testing,” IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1880–1901,
March 2012.
[20] M. Aldridge, L. Baldassini, and K. Gunderson, “Almost separable matrices,” J. Comb. Opt., pp. 1–22, 2015.
[21] J. Scarlett and V. Cevher, “Converse bounds for noisy group testing with arbitrary measurement matrices,” in IEEE Int. Symp. Inf. Theory,
Barcelona, 2016.
[22] A. J. Macula, “Error-correcting nonadaptive group testing with $d^e$-disjunct matrices,” Disc. App. Math., vol. 80, no. 2-3, pp. 217–222, 1997.
[23] H. Q. Ngo, E. Porat, and A. Rudra, “Efficiently decodable error-correcting list disjunct matrices and applications,” in Int. Colloq. Automata,
Lang., and Prog., 2011.
[24] M. Cheraghchi, “Noise-resilient group testing: Limitations and constructions,” Disc. App. Math., vol. 161, no. 1, pp. 81–95, 2013.
[25] F. Hwang, “A method for detecting all defective members in a population by group testing,” J. Amer. Stats. Assoc., vol. 67, no. 339, pp.
605–608, 1972.
[26] P. Damaschke and A. S. Muhammad, “Randomized group testing both query-optimal and minimal adaptive,” in Int. Conf. Current Trends
in Theory and Practice of Computer Science. Springer, 2012, pp. 214–225.
[27] A. J. Macula, “Probabilistic nonadaptive and two-stage group testing with relatively small pools and DNA library screening,” J. Comb.
Opt., vol. 2, no. 4, pp. 385–397, 1998.
[28] M. Mézard and C. Toninelli, “Group testing with random pools: Optimal two-stage algorithms,” IEEE Trans. Inf. Theory, vol. 57, no. 3,
pp. 1736–1745, 2011.
[29] S. Cai, M. Jahangoshahi, M. Bakshi, and S. Jaggi, “GROTESQUE: Noisy group testing (quick and efficient),” 2013,
https://arxiv.org/abs/1307.2811.
[30] L. Baldassini, “Rates and algorithms for group testing,” Ph.D. dissertation, Univ. Bristol, 2015.
[31] S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone, “PAC subset selection in stochastic multi-armed bandits.” in Int. Conf. Mach. Learn.
(ICML), 2012.
[32] O. Johnson, “Strong converses for group testing from finite blocklength results,” IEEE Trans. Inf. Theory, vol. 63, no. 9, pp. 5923–5933,
Sept. 2017.
[33] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp.
2307–2359, May 2010.
[34] A. Feinstein, “A new basic theorem of information theory,” IRE Prof. Group. on Inf. Theory, vol. 4, no. 4, pp. 2–22, Sept. 1954.
[35] T. S. Han, Information-Spectrum Methods in Information Theory. Springer, 2003.
[36] M. B. Malyutov, “Search for sparse active inputs: A review,” in Inf. Theory, Comb. and Search Theory, 2013, pp. 609–647.
[37] R. Motwani and P. Raghavan, Randomized Algorithms. Chapman & Hall/CRC, 2010.