
Group testing with nested pools

Inés Armendáriz∗, Pablo A. Ferrari†, Daniel Fraiman‡, Silvina Ponce Dawson§

June 9, 2020

Abstract

We iterate Dorfman's pool testing algorithm [7] to identify infected individuals in a large population, a classification problem. This is an adaptive scheme with nested pools: pools at a given stage are subsets of the pools at the previous stage. We compute the mean and variance of the number of tests per individual as a function of the pool sizes $m=(m_1,\dots,m_k)$ in the first $k$ stages; in the $(k+1)$-th stage all remaining individuals are tested. Denote by $D_k(m,p)$ the mean number of tests per individual, which we call the cost of the strategy $m$. The goal is to minimize $D_k(m,p)$ and to find the optimizing values $k$ and $m$. We show that the cost of the strategy $(3^k,\dots,3)$ with $k\approx\log_3(1/p)$ is of order $p\log(1/p)$, and differs from the optimal cost by a fraction of this value. To prove this result we bound the difference between the cost of this strategy and the minimal cost when pool sizes take real values. We conjecture that the optimal strategy, depending on the value of $p$, is indeed of the form $(3^k,\dots,3)$ or of the form $(3^{k-1}4,3^{k-1},\dots,3)$, with a precise description for $k$. This conjecture is supported by inspection of a family of values of $p$. Finally, we observe that for these values of $p$ and the best strategy of the form $(3^k,\dots,3)$, the standard deviation of the number of tests per individual is of the same order as the cost. As an example, when $p=0.02$ the optimal strategy is $k=3$, $m=(27,9,3)$. The cost of this strategy is 0.20, that is, the mean number of tests required to screen 100 individuals is 20.

Keywords Dorfman's retesting, Group testing, Nested pooled testing, Adaptive testing.

AMS Math Classification Primary 62P10

1 Introduction

The outbreak of COVID-19, caused by the novel coronavirus SARS-CoV-2, has spread over almost all countries in the world [19]. Running diagnostic tests is a key tool not only for the treatment of those infected but also to make decisions on how to handle the spread of the epidemic within nations and communities. The possibility of running the current gold standard test, RT-qPCR, on sample pools was investigated in [25], finding that the identification of individuals infected with SARS-CoV-2 is in fact possible using mixtures of up to 32 individual samples. The use of more sensitive tests [24, 6] would likely improve this limit.

∗Universidad de Buenos Aires & IMAS-CONICET-UBA. Email: iarmend@dm.uba.ar
†Universidad de Buenos Aires & IMAS-CONICET-UBA. Email: pferrari@dm.uba.ar
‡Universidad de San Andrés & CONICET. Email: dfraiman@udesa.edu.ar
§Universidad de Buenos Aires & IFIBA-CONICET-UBA. Email: silvina@df.uba.ar


arXiv:2005.13650v2 [math.ST] 6 Jun 2020

Dorfman [7] was the first to propose a group testing strategy, in 1943. The samples from $n$ individuals are pooled and tested together. If the test is negative, the $n$ individuals of the pool are cleared. If the test is positive, each of the $n$ individuals must be tested separately, and $n+1$ tests are required to test $n$ people. In the present note we focus on the mathematical aspects of a sequential multi-stage extension of Dorfman's algorithm. This scheme belongs to the family of adaptive models, in which the course of action chosen at each stage depends on the results of previous stages. Our approach assumes that each test is conclusive, i.e. there are no false positives or negatives.

Dorfman's 2-stage strategy was subsequently improved by Sterrett [23] to further reduce the number of tests, and extended to more stages of group testing by Sobel and Groll [22] and Finucan [10]. Noticing that the optimal strategy depends on the fraction of infected individuals within the tested population, the approach of [22] was extended to estimate the infection probability $p$, and to situations with subpopulations characterized by different infection probabilities. References [16, 2, 3] proposed to use information on heterogeneous populations to improve Dorfman's algorithm. An extension of Dorfman's algorithm in which each group is tested several times to minimize testing errors was presented in [11]. The previous strategies are classified as adaptive. Adaptive strategies can lead to errors if the test gives false positives or negatives. When testing errors are present and/or tests are time consuming, it may be convenient to perform tests in parallel; these methods are called nonadaptive [20, 8, 14, 5, 4, 17, 1]. Nonadaptive testing is not necessarily free of errors. The impact of test sensitivity in both adaptive and nonadaptive testing was analyzed in [14].

Different pool testing strategies have been analyzed specifically for the case of SARS-CoV-2. In [12] Dorfman's algorithm is applied, including the use of replicates to check for false negatives or positives. In [18] adaptive and nonadaptive methods that use binary splitting are compared numerically. The work in [21] evaluates numerically the performance of two-dimensional array pooling, comparing it with Dorfman's strategy.

Under Dorfman's strategy [7] the mean number of tests per pool of $n$ people is
$$1 + n\bigl(1-(1-p)^n\bigr), \qquad (1)$$
where $p$ is the probability that an individual is infected, assuming that the events that different people are infected are independent. The first term above is the number of tests in the first stage: one test per pool. The second term accounts for the $n$ additional tests required in the second stage, one test per individual in the pool, when there is at least one infected individual, an event with probability $1-(1-p)^n$. Dividing by $n$ in (1), the cost of the scheme, that is, the mean number of tests per individual, is then
$$D(n,p) = \frac{1}{n} + 1 - (1-p)^n. \qquad (2)$$
The cost of the one-stage strategy consisting in testing every person in the pool is 1, hence Dorfman's strategy is worth pursuing only if $D(n,p)<1$. Solving for $p$, this means that
$$p < 1 - \frac{1}{e^{1/e}} \approx 0.3077992\ldots, \qquad (3)$$
and in this case $n$ must be greater than or equal to 3. For small values of $p$, the value of $n$ that minimizes $D$ as a function of $p$ is approximately $p^{-1/2}$, and $D(p^{-1/2},p)\approx 2\sqrt p$; see Feller [9].
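Formulas (1)–(3) are easy to check numerically. The following sketch (our own illustration; the function names are not from the paper) evaluates the Dorfman cost $D(n,p)$ of (2) and scans for the best integer pool size.

```python
import math

def dorfman_cost(n, p):
    """Mean number of tests per individual under Dorfman's 2-stage
    strategy, equation (2): 1/n + 1 - (1-p)^n."""
    return 1 / n + 1 - (1 - p) ** n

def best_pool_size(p, n_max=200):
    """Integer pool size n >= 2 minimizing D(n, p)."""
    return min(range(2, n_max + 1), key=lambda n: dorfman_cost(n, p))

p = 0.02
n_star = best_pool_size(p)
# Feller's approximation: the optimal n is close to p**-0.5 and the
# optimal cost is close to 2*sqrt(p).
print(n_star, dorfman_cost(n_star, p), 2 * math.sqrt(p))
```

For $p=0.02$ the scan returns $n=8$ with cost $\approx 0.274$, close to the approximation $2\sqrt p\approx 0.283$; for $p$ above the threshold in (3) the minimal cost over $n$ exceeds 1 and pooling no longer pays off.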

In this note we propose to consider a sequence of nested pools, a scheme mentioned by Sobel and Groll [22]. In the first stage, test pools of size $m_1$; individuals in pools that tested negative are healthy. Pools that tested positive are split into (smaller) pools of size $m_2$, and these pools are tested in the second stage. Iterate until the $k$-th stage, where the pools are of size $m_k$; finally, in the $(k+1)$-th stage, test every remaining individual belonging to pools that tested positive in the previous stage. For each infection proportion $p$ and pool strategy given by the choice of $k$ and $m=(m_1,\dots,m_k)$, let
$$T_k(m,p) := \text{total number of tests per initial pool (of size } m_1\text{)},$$
$$D_k(m,p) := \frac{1}{m_1}\,E\,T_k = \text{cost of this strategy}.$$
We compute a precise formula for $T_k(m,p)$ and derive its mean and variance in §2.

Consider $D_k(m,p):\mathbb R_{\ge 0}^k\to\mathbb R$, that is, allow pool sizes to take real values. We show in §3.1 that for a given $p$ and fixed $k$, the minimum of $D_k(m,p)$ can be computed using the Lambert $W$ function. Unfortunately there is no amenable formula for this minimum cost as a function of $k$. On the other hand, the linearization $L_k(m,p)$ in $p$ of the cost function has a very simple form, and its minimum is achieved at $(e^k,\dots,e)$ with $k\approx\log(1/p)$, see §3.3. We note that $L_k(m,p)$ coincides with the cost of the simplified scheme proposed by Finucan [10].

Drawing inspiration from the study of the linear approximation, we consider in §3.4 the family of pool strategies $(3^k,\dots,3)$, $k\in\mathbb N$, which are associated to feasible testing schemes. We compute in Lemma 8 the optimal choice of $k$ for this family, which we denote by $k_3=k_3(p)$, and prove in Theorem 9 that the cost $D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr)$ is of the same order as the minimum cost
$$D^\star(p) = \min_{k,m} D_k(m,p) = O\bigl(p\log(1/p)\bigr) \qquad (4)$$
and bound their difference. Notice that the optimization in (4) is carried out over real-valued $m\in\mathbb R^k$, and the minimum attained is a lower bound to the optimum value achieved when considering actual testing schemes. Thus Theorem 9 provides a bound for the difference between $D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr)$ and the cost of the optimal feasible strategy, and shows that both are $O(p\log(1/p))$.

In §3.5 we study by inspection the cost function restricted to feasible nested pooling strategies, for a family of values of $p\in[0.002,0.1]$. The results confirm that in most cases $(3^{k_3},\dots,3)$ is the optimal strategy, see Table 1. These findings can be used as a guide to designing concrete group testing strategies. In §3.6 we contrast these discrete optimization results with those obtained in the linearized optimization problem, Table 2.

After plotting many different strategies, we conjecture in §3.7 that the optimal strategy is given by $(3^j,\dots,3)$ when $p\in[\lambda_j,\rho_j]$, and by $(3^{j-1}4,3^{j-1},\dots,3)$ when $p\in[\rho_{j+1},\lambda_j]$, where $\rho_1>\lambda_1>\rho_2>\lambda_2>\dots$ are given explicitly.

Lemma 1 in §2 gives a closed formula for $\mathrm{Var}\,T_k(m,p)$ in the particular case of interest here, when the pool sizes $m_1,\dots,m_k$ are powers of a fixed quantity. We apply it to compute the standard deviation of the number of tests per person in the discrete optimization solution in §3.5. It turns out that, for the family of values of $p$ considered, the standard deviation is of the same order as the mean number of tests, Table 1.

The article is organized as follows. In §2 we define the pool strategy and compute the mean and variance of the random number of tests per individual. We consider the problem of optimizing the cost function in §3. We include three appendices with technical computations.

2 Nested strategy

We iterate Dorfman’s strategy using nested pools, that is, pools in each stage are obtained as a

partition of the positive-testing pools in the previous stage.

We work under the assumption that the events that different individuals in the population to be tested are infected are independent. Let $X=(X_1,\dots,X_N)$ with i.i.d. $X_i\sim\mathrm{Bernoulli}(p)$. When $X_i=1$ we say that individual $i$ is infected; the parameter $p$ is the probability that an individual is infected. For any subset $A$ of the population, that is, $A\subset\{1,\dots,N\}$, the function
$$\varphi_A(X) := \prod_{i\in A}(1-X_i) \qquad (5)$$
is called a test. Notice that if there is no infected individual in $A$ then the test takes the value 1, and otherwise it vanishes. The goal is to reveal the values of $X$ as the result of a family of tests.

To describe Dorfman's strategy [7] in these terms, let $m_1=n$ denote the chosen size for the pools, let $W=\{X_i\}_{1\le i\le m_1}$ be one of the pools, and compute
$$\varphi := (1-X_1)\cdots(1-X_{m_1}). \qquad (6)$$
This test is the first stage of the strategy. If $\varphi=1$ the pool has no infected individuals, and we conclude that $X_1=\dots=X_{m_1}=0$. If $\varphi=0$ we move on to the second stage, where we compute each $X_i$ individually. The number $T=T(X)$ of performed tests is a function of $X$ given by
$$T = 1 + m_1(1-\varphi), \qquad (7)$$
and the cost (2) is
$$D(m_1,p) := \frac{1}{m_1}\,ET = \frac{1}{m_1} + 1 - q^{m_1}, \qquad (8)$$
with
$$q := 1-p. \qquad (9)$$

Let us now describe the 3-stage procedure. Denote by $m_1,m_2\ge 1$ the sizes of the pools in the first and second stage, respectively, where $m_1$ is a multiple of $m_2$. Let $W=\{X_i\}_{1\le i\le m_1}$ be the pool in the first stage. We partition this family into $\frac{m_1}{m_2}$ subsets $W_0,\dots,W_{\frac{m_1}{m_2}-1}$, by setting
$$W_i := \bigl(X_{m_2 i+1},\dots,X_{m_2(i+1)}\bigr), \qquad 0\le i\le \frac{m_1}{m_2}-1. \qquad (10)$$
These are the pools of the second stage. Let
$$\varphi := \prod_{i=1}^{m_1}(1-X_i) \quad\text{and}\quad \varphi_i := \prod_{j=1}^{m_2}\bigl(1-X_{m_2 i+j}\bigr), \qquad 0\le i\le \frac{m_1}{m_2}-1. \qquad (11)$$
As before, $\varphi=1$ if the pool $W$ has no infected individuals and 0 otherwise, and similarly $\varphi_i=1$ if and only if $W_i$ contains no infected individuals.

In order to reveal the set of variables $X_i=1$, $1\le i\le m_1$, we propose the following sequential group testing scheme: first evaluate $\varphi$. If $\varphi=1$ then there are no infected individuals in $W$. If $\varphi=0$ then evaluate $\varphi_i$, $0\le i\le\frac{m_1}{m_2}-1$. This is the second stage of testing. Note that, conditional on $\varphi=0$, there must be at least one index $i$ with $\varphi_i=0$; in this case we say that the $i$-th pool is infected. Finally, in the third stage, apply the test to each individual belonging to an infected pool in the second stage. Denote by $T_2$ the number of tests for the sample $W$ under this 3-stage scheme. Note that we label $T$ according to the number $k$ of pooled stages (2 in this case) rather than the total number of stages. $T_2$ is a function of $W=(X_1,\dots,X_{m_1})$. We get
$$T_2 = 1 + (1-\varphi)\frac{m_1}{m_2} + (1-\varphi)\,m_2\sum_{i=0}^{\frac{m_1}{m_2}-1}(1-\varphi_i). \qquad (12)$$

Now $1-\varphi\sim\mathrm{Bernoulli}(1-q^{m_1})$, and $\{1-\varphi_i\}_{0\le i\le\frac{m_1}{m_2}-1}$ are independent, identically distributed random variables with distribution $\mathrm{Bernoulli}(1-q^{m_2})$, hence
$$I := \sum_{i=0}^{\frac{m_1}{m_2}-1}(1-\varphi_i) \sim \mathrm{Binomial}\Bigl(\frac{m_1}{m_2},\,1-q^{m_2}\Bigr). \qquad (13)$$
Note that $\varphi=1$ implies $\varphi_i=1$ for $i=0,\dots,\frac{m_1}{m_2}-1$, hence
$$\varphi(1-\varphi_i)\equiv 0 \quad\text{and}\quad I\varphi\equiv 0. \qquad (14)$$
Replacing in (12) we get
$$T_2 = 1 + (1-\varphi)\frac{m_1}{m_2} + m_2\sum_{i=0}^{\frac{m_1}{m_2}-1}(1-\varphi_i). \qquad (15)$$

Expected number of tests per individual. Let $D_2(m_1,m_2,p) := \frac{1}{m_1}ET_2$ denote the cost of the 3-stage scheme with size-$m_1$ pools in the first stage, size-$m_2$ pools in the second stage, and probability of infection $p$. From (15) we get
$$D_2(m_1,m_2,p) = \frac{1}{m_1} + \frac{1}{m_2}\bigl(1-q^{m_1}\bigr) + 1 - q^{m_2}. \qquad (16)$$

In general, for the $(k+1)$-stage scheme, let us denote by $T_k$ the total number of tests that are needed to classify the variables in the pool $W$. We get
$$T_k = 1 + (1-\varphi)\frac{m_1}{m_2} + \frac{m_2}{m_3}\sum_{i_1=0}^{\frac{m_1}{m_2}-1}(1-\varphi_{i_1}) + \frac{m_3}{m_4}\sum_{i_1=0}^{\frac{m_1}{m_2}-1}\sum_{i_2=0}^{\frac{m_2}{m_3}-1}(1-\varphi_{i_1 i_2}) + \dots + m_k\sum_{i_1=0}^{\frac{m_1}{m_2}-1}\sum_{i_2=0}^{\frac{m_2}{m_3}-1}\cdots\sum_{i_{k-1}=0}^{\frac{m_{k-1}}{m_k}-1}\bigl(1-\varphi_{i_1 i_2\dots i_{k-1}}\bigr), \qquad (17)$$
where the $(1-\varphi_{i_1 i_2\dots i_{k-1}})$ are i.i.d. $\mathrm{Bernoulli}(1-q^{m_k})$. Then, denoting $m=(m_1,\dots,m_k)$ and
$$D_k(m,p) := \frac{1}{m_1}\,ET_k, \qquad (18)$$
the cost is
$$D_k(m,p) = \frac{1}{m_1} + 1 - q^{m_k} + \sum_{j=2}^{k}\frac{1}{m_j}\bigl(1-q^{m_{j-1}}\bigr). \qquad (19)$$
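The cost (19) is straightforward to evaluate. The sketch below (our own helper, not code from the paper) computes $D_k(m,p)$ for a nested strategy and reproduces the cost 0.1979772 reported in Table 1 of §3.5 for $p=0.02$, $m=(27,9,3)$.

```python
def nested_cost(m, p):
    """Cost D_k(m, p) of the (k+1)-stage nested scheme, equation (19):
    1/m_1 + (1 - q^{m_k}) + sum_{j=2}^{k} (1 - q^{m_{j-1}})/m_j,  q = 1-p."""
    q = 1.0 - p
    cost = 1.0 / m[0] + 1.0 - q ** m[-1]
    for j in range(1, len(m)):       # j = 2, ..., k in the paper's indexing
        cost += (1.0 - q ** m[j - 1]) / m[j]
    return cost

print(nested_cost((27, 9, 3), 0.02))   # cost of the strategy in the abstract
```

With $k=1$ the formula reduces to Dorfman's cost (2), and with $k=2$ it agrees with (16).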

Note that, if we denote by
$$T_k^j := \frac{m_{j-1}}{m_j}\sum_{i_1=0}^{\frac{m_1}{m_2}-1}\sum_{i_2=0}^{\frac{m_2}{m_3}-1}\cdots\sum_{i_{j-2}=0}^{\frac{m_{j-2}}{m_{j-1}}-1}\bigl(1-\varphi_{i_1 i_2\dots i_{j-2}}\bigr) \qquad (20)$$
the number of tests performed at the $j$-th stage, then its mean is independent of $k$:
$$E\,T_k^j = \frac{m_1}{m_j}\bigl(1-q^{m_{j-1}}\bigr). \qquad (21)$$

Variance. The variance of $T_k$ can be computed explicitly. We write down here the case $k=2$; the proof is given in Appendix A:
$$\mathrm{Var}\,T_2 = \frac{m_1^2}{m_2^2}\,q^{m_1}\bigl(1-q^{m_1}\bigr) + m_1 m_2\,q^{m_2}\bigl(1-q^{m_2}\bigr) + 2\,\frac{m_1^2}{m_2}\,q^{m_1}\bigl(1-q^{m_2}\bigr). \qquad (22)$$

An important case is when the ratio between consecutive pool sizes is constant and given by the last pool size $m_k$. This case will be relevant for the linearized optimization problem in §3.3 and for the discrete optimization problem in §3.5.

Lemma 1 (Variance when pool sizes are powers of $m_k$). Let $\frac{m_j}{m_{j+1}}=\mu=m_k$ for $1\le j\le k$, with $m_{k+1}:=1$, so that $m_j=\mu^{k-j+1}$. Then
$$\mathrm{Var}\,T_k = \mu^2\sum_{i=1}^{k}\mu^{i-1}\bigl(1-q^{m_i}\bigr)\Bigl[q^{m_i} + 2\sum_{j=1}^{i-1}q^{m_j}\Bigr] \qquad (23)$$
$$= \mu^2\sum_{i=1}^{k}\mu^{i-1}\bigl(1-q^{\mu^{k-i+1}}\bigr)\Bigl[q^{\mu^{k-i+1}} + 2\sum_{j=1}^{i-1}q^{\mu^{k-j+1}}\Bigr]. \qquad (24)$$

This lemma is proved in Appendix A. Computations are similar for the general case, without assumptions on the sequence of pool sizes.
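As a numerical sanity check (our own sketch; the helper names are not from the paper), the two-stage formula (22) can be compared against the constant-ratio formula of Lemma 1, and combined with the results of §3.5: for $p=0.1$ and $m=(9,3)$ the standard deviation per individual, $\sqrt{\mathrm{Var}\,T_2}/m_1$, matches the value 0.4027611 reported in Table 1.

```python
import math

def var_T2(m1, m2, p):
    """Variance of T_2, equation (22)."""
    q = 1.0 - p
    return ((m1 / m2) ** 2 * q ** m1 * (1 - q ** m1)
            + m1 * m2 * q ** m2 * (1 - q ** m2)
            + 2 * (m1 ** 2 / m2) * q ** m1 * (1 - q ** m2))

def var_Tk_powers(mu, k, p):
    """Variance of T_k when m_j = mu**(k-j+1), Lemma 1, equation (23)."""
    q = 1.0 - p
    m = [mu ** (k - j) for j in range(k)]      # m[0] = mu^k = m_1, ..., m[k-1] = mu
    total = 0.0
    for i in range(1, k + 1):                  # i-th term of the sum in (23)
        mi = m[i - 1]
        bracket = q ** mi + 2 * sum(q ** m[j - 1] for j in range(1, i))
        total += mu ** (i - 1) * (1 - q ** mi) * bracket
    return mu ** 2 * total

p = 0.1
print(math.sqrt(var_T2(9, 3, p)) / 9)               # sigma(p) for m = (9, 3)
print(abs(var_T2(9, 3, p) - var_Tk_powers(3, 2, p)))
```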

3 Optimization

In this section we search for the values of $k$ and $m=(m_1,\dots,m_k)$ which minimize $D_k(m,p)$. Throughout §3.1, §3.2 and §3.3 pool sizes are allowed to take values in $\mathbb R_{\ge 0}$. In §3.1 we show that it is possible to optimize the cost $D_k$ for fixed $p$ and $k$ using the Lambert $W$ function, and derive some estimates for the optimal strategy in §3.2. In §3.3 we optimize the linearization of the cost, and bound its difference with the cost. We optimize over the family of strategies $(3^k,\dots,3)$, $k\in\mathbb N$, in §3.4, and show that the cost of the best choice is of the same order as that of the optimal strategy (Theorem 9); we compare this cost with the linearized cost for the same strategy in §3.6. In §3.5 we optimize over nested pool sizes $(m_1,\dots,m_k)$ in $\mathbb N^k$ by inspection, for some values of $p$. Finally, in §3.7 we conjecture the optimal nested strategy for any $p\in[0,1]$.

3.1 Exact optimization with $(m_1,\dots,m_k)\in\mathbb R_{\ge 0}^k$

We now optimize (19) for fixed $p$ and $k$, over the vector of nested pool sizes $(m_1,\dots,m_k)$ in $\mathbb R_{\ge 0}^k$. Denote
$$m_0=\infty, \qquad m_{k+1}=1, \qquad x_i=m_i\log q. \qquad (25)$$
With these definitions, (19) reads
$$D_k(m,p) = \log q\,\sum_{j=1}^{k+1}\frac{1-e^{x_{j-1}}}{x_j} =: h(x)\log q. \qquad (26)$$

Consider $x=(x_1,\dots,x_k)\in\mathbb R_{\le 0}^k$. We look for a maximum of $h$. Setting $\frac{\partial h}{\partial x_i}=0$, $i=1,\dots,k$, we get the following equations:
$$x_j^2 e^{x_j} = -x_{j+1}\bigl(1-e^{x_{j-1}}\bigr), \qquad j=1,\dots,k. \qquad (27)$$
This system can be solved exactly in terms of the principal branch $W_0$ of the Lambert function. As an example, for $k=2$ the two equations are
$$x_1^2 e^{x_1} = -x_2\bigl(1-e^{x_0}\bigr), \qquad x_2^2 e^{x_2} = -x_3\bigl(1-e^{x_1}\bigr), \qquad (28)$$
or, using (25),
$$x_1^2 e^{x_1} = -x_2, \qquad x_2^2 e^{x_2} = -\bigl(1-e^{x_1}\bigr)\log q. \qquad (29)$$
From the second equation we get
$$\frac{x_2}{2}\,e^{x_2/2} = -\frac12\sqrt{\bigl(1-e^{x_1}\bigr)\log(1/q)}.$$
The function $z\mapsto z e^z$ maps $(-\infty,0]$ to $[-e^{-1},0]$. Hence, if the right-hand side of the above expression falls in the interval $[-e^{-1},0]$, we can solve
$$x_2 = 2\,W_0\Bigl(-\frac12\sqrt{\bigl(1-e^{x_1}\bigr)\log(1/q)}\Bigr). \qquad (30)$$
The choice of the principal branch $W_0:[-e^{-1},0]\to[-1,0]$ of the Lambert function is required by the first equation in (29) and the fact that $\inf_{z\le 0}\bigl(-z^2e^z\bigr)=-4e^{-2}$. Plugging (30) into the first equation of (29), we have
$$x_1^2 e^{x_1} = -2\,W_0\Bigl(-\frac12\sqrt{\bigl(1-e^{x_1}\bigr)\log(1/q)}\Bigr). \qquad (31)$$

For $p=0.04$, we use Mathematica [13] to get $x_1\approx-0.452041460261919$ and $x_2\approx-0.1300281628$. Dividing by $\log 0.96$ we obtain $m_1\approx 11.07347805$ and $m_2\approx 3.185247667$. Rounding to a feasible strategy such that $m_1$ is a multiple of $m_2$, we get $m_1=12$ and $m_2=3$. This strategy turns out to be optimal among the strategies with $m_1\le 100$; its cost is shown in Table 1. When $p=0.08$ we get $m_1\approx 7.893901064$ and $m_2\approx 2.69028519$. These values are close to the strategies $(8,2)$ and $(9,3)$, and $D_2((9,3),0.08)\approx 0.508369323 < D_2((8,2),0.08)\approx 0.521990563$. In fact, $(9,3)$ is optimal among the strategies with $m_1\le 100$.
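The fixed-point equation (31) can also be solved with a few lines of standard numerics. The sketch below is our own (it implements $W_0$ by Newton iteration instead of calling a library, and uses bisection for (31)); it reproduces the values $m_1\approx 11.073$ and $m_2\approx 3.185$ quoted above for $p=0.04$.

```python
import math

def lambert_w0(y, tol=1e-14):
    """Principal branch W_0 on (-1/e, 0] via Newton iteration on w*e^w = y."""
    w = y                                  # good starting point for small |y|
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - y) / (ew * (1 + w))
        w -= step
        if abs(step) < tol:
            break
    return w

def optimal_real_pools_k2(p):
    """Solve (31) for x1 by bisection, then x2 from (30); m_i = x_i / log q."""
    L = math.log(1.0 / (1.0 - p))          # log(1/q) > 0

    def f(x1):
        arg = -0.5 * math.sqrt((1 - math.exp(x1)) * L)
        return x1 * x1 * math.exp(x1) + 2 * lambert_w0(arg)

    lo, hi = -1.5, -1e-6                   # f(lo) > 0 > f(hi) for small p
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    x1 = 0.5 * (lo + hi)
    x2 = 2 * lambert_w0(-0.5 * math.sqrt((1 - math.exp(x1)) * L))
    logq = -L
    return x1 / logq, x2 / logq

m1, m2 = optimal_real_pools_k2(0.04)
print(m1, m2)   # approximately 11.0735 and 3.1852
```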

This procedure can be carried out for any $k$. For instance, for $k=3$, denoting $V:[-4e^{-2},0]\to\mathbb R$ by $V(x)=2W_0\bigl(-\frac{\sqrt{-x}}{2}\bigr)$ and $g(z)=V^{-1}(z)=-z^2e^z$, we get
$$x_1 = V\Bigl(V\bigl((1-e^{x_1})\,V\bigl((1-e^{g(x_1)})\log q\bigr)\bigr)\Bigr). \qquad (32)$$
Once we know $x_1$ we can derive $x_2$ and $x_3$ using (27).

3.2 Some estimates for the optimal strategy

In this subsection we establish bounds for the size $m_1$ of the first pool and the number $k$ of stages under an optimal strategy. We then show that, under some assumptions, the cost of a strategy may be lowered by adding one stage and reducing the ratio between two consecutive pool sizes. As a consequence of these results, we obtain a bound on the ratio of consecutive pool sizes of the optimal strategy.

For $k\in\mathbb N$, let
$$\mathbb R^{k,\downarrow}_{\ge 0} := \Bigl\{m=(m_1,\dots,m_k)\in\mathbb R^k:\ m_i\ge m_{i+1}\ge 1 \ \text{and}\ \frac{m_i}{m_{i+1}}\ge 2,\ 1\le i\le k-1\Bigr\}. \qquad (33)$$
Note that the constraints in this definition are naturally satisfied when the vector $m\in\mathbb N^k$ defines a nested strategy, while a generic $m\in\mathbb R^{k,\downarrow}_{\ge 0}$ will not be associated to a feasible nested strategy if any of its coordinates belongs to $\mathbb R_{\ge 0}\setminus\mathbb N$, or if any of the entries $m_i$ fails to be a multiple of the next coordinate $m_{i+1}$, $1\le i\le k-1$. For the rest of this section we consider the cost functions as mappings $D_k:\mathbb R^{k,\downarrow}_{\ge 0}\to\mathbb R$. The estimates in Lemmas 2 and 3 below will be useful in §3.4 to determine feasible and economical nested strategies.

Lemma 2 (Bounds on the first pool size $m_1$ and the number of stages $k$). Let $p\in[0,1]$, and let $k^\star\in\mathbb N$ and $m^\star=(m^\star_1,\dots,m^\star_{k^\star})$ be minimizers of the cost function $D_k(m,p)$:
$$k^\star = \arg\min_{k\in\mathbb N}\ \min_{m\in\mathbb R^{k,\downarrow}_{\ge 0}} D_k(m,p), \qquad m^\star = \arg\min_{m\in\mathbb R^{k^\star,\downarrow}_{\ge 0}} D_{k^\star}(m,p). \qquad (34)$$
Then
$$m^\star_1 \le \frac{1}{\log(1/q)} \quad\text{and}\quad k^\star \le 1 + \frac{|\log\log(1/q)|}{\log 2}. \qquad (35)$$

Proof. The second inequality in (35) follows from the first. Indeed, by the conditions in (33),
$$1 \le m^\star_{k^\star} \le \frac{1}{2^{k^\star-1}}\,m^\star_1 \ \Longrightarrow\ \frac{\log m^\star_1}{\log 2} \ge k^\star-1 \ \Longrightarrow\ k^\star \le 1 + \frac{|\log\log(1/q)|}{\log 2}.$$
Now we show the first inequality. By optimality, $D_{k^\star}(m^\star,p) \le D_{k^\star-1}\bigl((m^\star_2,\dots,m^\star_{k^\star}),p\bigr)$, that is,
$$\frac{1}{m^\star_1} + \frac{1-q^{m^\star_1}}{m^\star_2} \le \frac{1}{m^\star_2},$$
which implies $m^\star_2 \le m^\star_1 q^{m^\star_1}$ and
$$m^\star_2 \le \max_{x\ge 0} x q^x = \frac{e^{-1}}{\log(1/q)}. \qquad (36)$$
Fix $(m^\star_2,\dots,m^\star_{k^\star})$ and minimize $D_{k^\star}\bigl((x,m^\star_2,\dots,m^\star_{k^\star}),p\bigr)$ over $x$. We find that the critical point $\widehat m_1$ satisfies
$$\widehat m_1 = \frac{2}{\log q}\,W_0\Bigl(-\frac{\sqrt{\log(1/q)\,m^\star_2}}{2}\Bigr),$$
$W_0$ being the principal branch of the Lambert function. From (36) we get
$$-\frac{\sqrt{\log(1/q)\,m^\star_2}}{2} \ge -\frac{e^{-1/2}}{2} \ \Longrightarrow\ W_0\Bigl(-\frac{\sqrt{\log(1/q)\,m^\star_2}}{2}\Bigr) \ge -\frac12 \quad\text{and}\quad \widehat m_1 \le \frac{1}{\log(1/q)}.$$
It is easy to check that $\widehat m_1 \ge 2m^\star_2$, hence by the optimality of $m^\star\in\mathbb R^{k^\star,\downarrow}_{\ge 0}$ we conclude that $\widehat m_1 = m^\star_1$.

Lemma 3 (Bounded ratio between consecutive pool sizes). Let $k\in\mathbb N$ and $m=(m_1,\dots,m_k)\in\mathbb R^{k,\downarrow}_{\ge 0}$ be such that

i) $m_1 \le \frac{1}{\log(1/q)}$,

ii) $m_{i-1}=\ell m_i$ for some $\ell\ge 6$ and $i\in\{2,\dots,k\}$.

Let
$$m' = (m_1,\dots,\ell m_i,\,3m_i,\,m_i,\dots,m_k)\in\mathbb R^{k+1,\downarrow}_{\ge 0}.$$
Then
$$D_{k+1}(m',p) \le D_k(m,p). \qquad (37)$$

Proof. From (19) we get
$$D_{k+1}(m',p) - D_k(m,p) = \frac{1-q^{\ell m_i}}{3m_i} + \frac{1-q^{3m_i}}{m_i} - \frac{1-q^{\ell m_i}}{m_i} \le 0$$
if and only if
$$1 + 2q^{\ell m_i} - 3q^{3m_i} \le 0 \ \Longleftrightarrow\ x=q^{m_i}\ \text{satisfies}\ 1 + 2x^\ell - 3x^3 \le 0.$$
For $\ell=6$ we have
$$2x^6 - 3x^3 + 1 = 0,\ x\in\mathbb R \ \Longleftrightarrow\ x=1 \ \text{or}\ x=\frac{1}{\sqrt[3]{2}},$$
and
$$1 + 2q^{6m_i} - 3q^{3m_i} \le 0 \ \Longleftrightarrow\ \frac{1}{\sqrt[3]{2}} \le q^{m_i} \le 1.$$
In order to show that this last inequality holds under the hypotheses of the lemma, note that $m_i \le \frac{m_1}{6} \le \frac{1}{6\log(1/q)}$ by i) and ii), so that $q^{m_i} \ge q^{\frac{1}{6\log(1/q)}} = e^{-1/6}$ and
$$q^{m_i} \ge e^{-1/6} \ge \frac{1}{\sqrt[3]{2}}, \qquad (38)$$
as wanted.

To prove the lemma for $\ell>6$, write
$$2x^\ell - 3x^3 + 1 = (x-1)\bigl(2x^{\ell-1} + \dots + 2x^3 - x^2 - x - 1\bigr).$$
Since $x-1\le 0$ and, for $x\in[0,1]$, the second factor is bounded below by $2x^5+2x^4+2x^3-x^2-x-1$, we get
$$2x^\ell - 3x^3 + 1 \le (x-1)\bigl(2x^5+2x^4+2x^3-x^2-x-1\bigr) = 2x^6 - 3x^3 + 1 \le 0, \qquad x\in\Bigl[\frac{1}{\sqrt[3]{2}},1\Bigr],$$
and the result follows from (38).

Corollary 4 (Bounded ratio between consecutive pool sizes in $m^\star$). Let $p\in[0,1]$, and let $k^\star\in\mathbb N$ and $m^\star=(m^\star_1,\dots,m^\star_{k^\star})$ be minimizers of the cost function $D_k(m,p)$ as in (34). Then
$$m^\star_{i+1} > \frac16\,m^\star_i, \qquad 1\le i\le k^\star-1. \qquad (39)$$

Proof. The conditions in Lemma 3 are satisfied by $m^\star$, by Lemma 2.

3.3 Linearization of the cost function

The exact computation of the optimal strategy becomes complicated as the number of stages increases. In this subsection we study the linearized version of the cost, which is easier to optimize and gives a good approximation to the cost for small $p$.

Let us fix $p$ and a stage number $k+1$. We linearize the expected number of tests per individual $D_k=D_k(m,p)$ obtained in (19):
$$D_k = \frac{1}{m_1} + 1 - e^{m_k\log q} + \sum_{j=2}^{k}\frac{1}{m_j}\bigl(1-e^{m_{j-1}\log q}\bigr) = L_k + \text{error}, \qquad (40)$$
where the linear approximation $L_k=L_k(m,p)$ is given by
$$L_k := \frac{1}{m_1} + m_k\,p + p\sum_{j=2}^{k}\frac{m_{j-1}}{m_j}. \qquad (41)$$
The linearized cost $L_k$ coincides with the cost proposed by Finucan [10], who assumed that for suitable $p$ and $m_1$ there is at most one infected individual per pool at all stages; we give some details after Lemma 7.

In the next lemma we show that the cost is bounded above by the linearized cost, and provide an estimate for the difference. The result is proved in Appendix B.

Lemma 5 (Domination and error bounds). Let $p\in[0,\frac12]$ and let $m=(m_1,\dots,m_k)\in\mathbb R^k_{\ge 1}$. Then:

1. The linearized cost is an upper bound to the cost,
$$D_k(m,p) \le L_k(m,p). \qquad (42)$$

2. If $\frac{m_{i-1}}{m_i}\le\ell$ for $2\le i\le k+1$, with $m_{k+1}:=1$, then
$$|D_k(m,p) - L_k(m,p)| \le \ell\,m_1\log^2 q + \ell\,k\,p^2. \qquad (43)$$

In particular, when $\frac{m_j}{m_{j+1}}=m_k$ for $1\le j\le k-1$, equation (43) becomes
$$|D_k(m,p) - L_k(m,p)| \le m_k^{k+1}\log^2 q + m_k\,k\,p^2. \qquad (44)$$

Define the optimal values for $L_k$ by
$$m^\sharp(k) = \bigl(m^\sharp_1(k),\dots,m^\sharp_k(k)\bigr) := \arg\min_{(m_1,\dots,m_k)\in\mathbb R^k_+} L_k \qquad (45)$$
and
$$L^\sharp_k := L_k\bigl(m^\sharp(k),p\bigr). \qquad (46)$$
In the next two lemmas we compute the optimal linearized values; see also [10].

Lemma 6 (Optimal pool sizes). Let $p\in(0,1)$ and $k\in\mathbb N$, $k\ge 2$. Then
$$m^\sharp_j(k) = p^{-\frac{k-j+1}{k+1}}, \qquad 1\le j\le k, \qquad (47)$$
$$L^\sharp_k = (k+1)\,p^{\frac{k}{k+1}}. \qquad (48)$$

Proof. For $k\ge 3$ we get
$$\frac{\partial L_k}{\partial m_1} = -\frac{1}{m_1^2} + \frac{p}{m_2}, \qquad (49)$$
$$\frac{\partial L_k}{\partial m_i} = -\frac{p\,m_{i-1}}{m_i^2} + \frac{p}{m_{i+1}}, \qquad 2\le i\le k-1, \qquad (50)$$
$$\frac{\partial L_k}{\partial m_k} = -\frac{p\,m_{k-1}}{m_k^2} + p. \qquad (51)$$
In order to find critical points we look for values of $m_j$, $1\le j\le k$, where these derivatives vanish. We get
$$\frac{\partial L_k}{\partial m_k} = 0 \ \Longleftrightarrow\ m_{k-1} = m_k^2 \quad\text{from (51)}, \qquad (52)$$
$$\frac{\partial L_k}{\partial m_i} = 0 \ \Longleftrightarrow\ m_{i-1} = \frac{m_i^2}{m_{i+1}} \quad\text{for } 2\le i\le k-1, \qquad (53)$$
$$\frac{\partial L_k}{\partial m_1} = 0 \ \Longleftrightarrow\ m_2 = p\,m_1^2. \qquad (54)$$
Given $m_k$ we use (52) and (53) to solve backwards in the index $i$, and we get
$$m_{k-1} = m_k^2, \qquad m_{k-2} = \frac{m_{k-1}^2}{m_k} = m_k^3, \qquad m_{k-3} = \frac{m_{k-2}^2}{m_{k-1}} = m_k^4,$$
$$\text{and, in general,}\quad m_{k-j} = m_k^{j+1}, \qquad 0\le j\le k-1. \qquad (55)$$
We replace the values of $m_1$ and $m_2$ in (54) to obtain the equation
$$m_k^{k-1} = p\,m_k^{2k} \ \Longleftrightarrow\ m_k = p^{-\frac{1}{k+1}}, \qquad (56)$$
from which we get (47). We show in Appendix C that the Hessian matrix of $L_k$ evaluated at the critical point $\bigl(m^\sharp_1(k),\dots,m^\sharp_k(k)\bigr)$ is positive definite, and hence this is a minimum of $L_k$.

Substituting (47) in (41) yields $L^\sharp_k = p^{\frac{k}{k+1}} + k\,p\,p^{-\frac{1}{k+1}} = (k+1)\,p^{\frac{k}{k+1}}$.

We now optimize $L^\sharp_k$ as a function of $k$. Denote
$$L^\sharp = L^\sharp(p) := \min_{k\in\mathbb R_+} L^\sharp_k; \qquad k^\sharp := \arg\min_{k\in\mathbb R_+} L^\sharp_k. \qquad (57)$$
In general $k^\sharp\in\mathbb R\setminus\mathbb N$. Notice that when $k$ is not a positive integer it is not possible to define a vector $(m_1,\dots,m_k)$ at which to evaluate $L_k$.

Lemma 7 (Optimal number of stages). For any $p\in(0,1)$ we have
$$k^\sharp = \log\frac1p - 1, \qquad L^\sharp = e\,p\log(1/p). \qquad (58)$$
Furthermore, if $p=e^{-u}$ for some integer $u\ge 2$, then $k^\sharp=u-1$ and
$$L^\sharp = L_{k^\sharp}(m^\sharp,p), \quad\text{where } m^\sharp = (e^{u-1},\dots,e). \qquad (59)$$

Proof of Lemma 7. We compute the derivative
$$\frac{\partial L^\sharp_k}{\partial k} = p^{\frac{k}{k+1}}\Bigl[1 + \frac{\log p}{1+k}\Bigr],$$
which vanishes at $k=k^\sharp=\log\frac1p-1$. This is in fact a global minimum of $L^\sharp_k$ for a given value of $p$. We now replace this value in $L^\sharp_k$ to get
$$L^\sharp = p^{1-\frac{1}{\log(1/p)}}\log\frac1p = e\,p\log(1/p).$$
Under $p=e^{-u}$ we have $k^\sharp=u-1$, which replaced in (47) gives
$$m^\sharp_{k^\sharp}=e \quad\text{and}\quad m^\sharp_j = e^{k^\sharp-j+1}.$$
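Lemmas 6 and 7 admit a quick numerical check (a sketch under our own naming): for $p=e^{-u}$ the minimizing stage number is $k^\sharp=u-1$, the linearized cost of $m^\sharp=(e^{k^\sharp},\dots,e)$ equals $(k^\sharp+1)p^{k^\sharp/(k^\sharp+1)}$, and this value coincides with $e\,p\log(1/p)=u\,e^{1-u}$.

```python
import math

def linearized_cost(m, p):
    """L_k(m, p), equation (41): 1/m_1 + m_k * p + p * sum_j m_{j-1}/m_j."""
    return 1 / m[0] + m[-1] * p + p * sum(m[j - 1] / m[j] for j in range(1, len(m)))

def L_sharp_k(k, p):
    """Optimal linearized cost for fixed k, equation (48)."""
    return (k + 1) * p ** (k / (k + 1))

u = 4
p = math.exp(-u)
k_sharp = u - 1
m_sharp = tuple(math.e ** (k_sharp - j) for j in range(k_sharp))   # (e^{k#}, ..., e)
print(linearized_cost(m_sharp, p), L_sharp_k(k_sharp, p), u * math.e ** (1 - u))
```

All three printed values agree, and $L^\sharp_k$ is indeed larger at $k^\sharp\pm 1$, consistent with (57) and (58).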

Remark. Finucan [10] proposes to iterate Dorfman's strategy with not necessarily nested pools, under the assumption that at every stage each pool has at most one infected individual. Call $U$ the number of infected individuals in a population of size $N$; $U$ has Binomial$(N,p)$ distribution. The number of individuals to be tested in the $i$-th stage is $U m_{i-1}$, and the total number of tests is
$$\frac{N}{m_1} + U\frac{m_1}{m_2} + U\frac{m_2}{m_3} + \dots + U\frac{m_{k-1}}{m_k} + U m_k. \qquad (60)$$
Dividing by $N$ and taking expectation, Finucan gets the linearized cost $L_k(m,p)$ defined in (41) and derives the results of Lemmas 6 and 7. He also shows that these optimal values maximize the information gain per test in the case that there is at most one infected individual per pool.

However, the hypothesis that there is at most one infected individual per pool is not satisfied for the optimal values (47). Indeed, when $m_1\approx 1/p$, the number of infected individuals per pool is approximately Poisson with mean 1. In any case, Finucan's cost provides an upper bound to the true cost of the strategy $(k,m)$, as it in fact computes the number of tests in the worst-case scenario; this is proved rigorously in Lemma 5. The result can also be derived using an information-based approach, since the least informative case is that in which the infected samples are as uniformly distributed as possible, which, in the case of interest here, corresponds to having at most one infected individual per pool at all stages.

3.4 The strategy $(3^{k_3},\dots,3)$

In this subsection we compute the $k$ that minimizes the cost $D_k\bigl((\mu^k,\dots,\mu),p\bigr)$ for a given $\mu\ge 0$, and prove in Theorem 9 that this optimal choice when $\mu=3$ leads to a cost of the same order as the minimum possible cost associated to $p$,
$$D^\star(p) := \min_{k\in\mathbb N}\ \min_{m\in\mathbb R^{k,\downarrow}_{\ge 0}} D_k(m,p). \qquad (61)$$
The choice of pool sizes is inspired by the optimal results for the linearization of the cost obtained in the previous subsection.

Lemma 8. Among the strategies $m=(\mu^k,\dots,\mu)$, the cost $D_k(m,p)$ is minimized at $k=k_\mu(p)$, defined by
$$k_\mu(p) := \Bigl\lfloor \log_\mu\frac{1}{\log_\mu(1/(1-p))} \Bigr\rfloor. \qquad (62)$$

Proof. From the definition (19) of $D_k$ we have
$$D_{k+1}(m,p) - D_k(m,p) = \frac{1}{\mu^{k+1}} - \frac{1}{\mu^k} + \frac{1}{\mu^k}\bigl(1-q^{\mu^{k+1}}\bigr) = \frac{1}{\mu^{k+1}} - \frac{1}{\mu^k}\,q^{\mu^{k+1}}.$$
This expression is greater than zero if and only if $\mu^{k+1} > \frac{1}{\log_\mu(1/q)}$, which implies that the $k_\mu(p)$ given by (62) minimizes $D_k(m,p)$.
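Lemma 8 is easy to verify numerically. The sketch below (helper names are ours) computes $k_\mu(p)$ from (62) and checks by brute force that it minimizes the cost (19) over the strategies $(\mu^k,\dots,\mu)$; for $p=0.02$, $\mu=3$ it returns $k_3=3$, i.e. the strategy $(27,9,3)$ of the abstract.

```python
import math

def nested_cost(m, p):
    """Cost D_k(m, p), equation (19)."""
    q = 1.0 - p
    return (1 / m[0] + 1 - q ** m[-1]
            + sum((1 - q ** m[j - 1]) / m[j] for j in range(1, len(m))))

def k_mu(p, mu=3):
    """Optimal number of pooled stages for strategies (mu^k, ..., mu), eq. (62)."""
    L_mu = math.log(1 / (1 - p)) / math.log(mu)     # log_mu(1/q)
    return math.floor(math.log(1 / L_mu, mu))

p, mu = 0.02, 3
k3 = k_mu(p, mu)
brute = min(range(1, 11),
            key=lambda k: nested_cost(tuple(mu ** (k - j) for j in range(k)), p))
print(k3, brute)   # both equal 3
```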

Theorem 9 (Estimation of the cost and accuracy of the strategy $(3^{k_3},\dots,3)$). Let $p\in[0,\frac12]$ and $k_3=k_3(p)$ as in (62). Then
$$D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr) \le \frac{3}{\log 3}\,p\log(1/p) + p + 5p^2, \qquad (63)$$
and, with $D^\star(p)$ defined in (61),
$$D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr) - D^\star(p) \le \Bigl(\frac{3}{\log 3}-e\Bigr)p\log(1/p) + p + 15p^2\log(1/p) + 180p^3 \qquad (64)$$
$$\le 0.013\,p\log(1/p) + p + O\bigl(p^2\log(1/p)\bigr).$$

Proof. Since the cost of a strategy is bounded by its linearized cost (42), we have
$$D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr) \le L_{k_3}\bigl((3^{k_3},\dots,3),p\bigr) = \frac{1}{3^{k_3}} + 3pk_3, \qquad (65)$$
by (41) and the constant ratio $\frac{m_{j-1}}{m_j}=3$. Now
$$3^{k_3} \ge \frac{\log 3}{3}\,\frac{1}{\log(1/q)} \ge \frac{\log 3}{3}\,\frac{1}{p(1+\frac p2)}, \qquad (66)$$
and
$$3pk_3 \le 3p\log_3\frac{1}{\log_3(1/q)} = -\frac{3p}{\log 3}\bigl[\log\log(1/q) - \log\log 3\bigr] \le \frac{3}{\log 3}\,p\log(1/p) + \frac{3\log\log 3}{\log 3}\,p + 5p^2. \qquad (67)$$
Apply the bounds (66) and (67) to (65) to obtain
$$D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr) \le \frac{3}{\log 3}\,p\log(1/p) + p + 5p^2, \qquad (68)$$
which is (63).

We next derive a lower bound. Let $k^\star\in\mathbb N$ and $m^\star\in\mathbb R^{k^\star,\downarrow}_{\ge 0}$ be as in (34), so that $D^\star(p)=D_{k^\star}(m^\star,p)$. By Lemma 5, we have
$$D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr) \ge D_{k^\star}(m^\star,p) \ge L_{k^\star}(m^\star,p) - \ell\,m^\star_1\log^2 q - \ell\,k^\star p^2, \qquad (69)$$
where $\ell=\max_{2\le i\le k^\star}\frac{m^\star_{i-1}}{m^\star_i}\le 6$ (Corollary 4), $k^\star\le 1+\frac{|\log\log(1/q)|}{\log 2}$ and $m^\star_1\le\frac{1}{\log(1/q)}$ (Lemma 2). Replacing these bounds in (69) and using that $L_{k^\star}(m^\star,p)\ge e\,p\log(1/p)$ by Lemma 7, we get
$$D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr) \ge D_{k^\star}(m^\star,p) \ge e\,p\log(1/p) - 6\log(1/q)\log^2 q - 6p^2\Bigl(1+\frac{|\log\log(1/q)|}{\log 2}\Bigr)$$
$$\ge e\,p\log(1/p) - 6\log^3(1/q) - 6p^2\Bigl(1+\frac{|\log\log(1/q)|}{\log 2}\Bigr)$$
$$\ge e\,p\log(1/p) - 15p^2\log(1/p) - 180p^3. \qquad (70)$$
Combining (68) and (70), we conclude that
$$e\,p\log\tfrac1p - 15p^2\log\tfrac1p - 180p^3 \le D_{k^\star}(m^\star,p) \le D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr) \le \frac{3}{\log 3}\,p\log\tfrac1p + p + 5p^2, \qquad (71)$$
and in particular
$$D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr) - D^\star(p) \le \Bigl(\frac{3}{\log 3}-e\Bigr)p\log(1/p) + p + 15p^2\log(1/p) + 180p^3 \le 0.013\,p\log(1/p) + p + O\bigl(p^2\log(1/p)\bigr).$$
The result follows.

3.5 Optimizing the cost by inspection

We report here some results obtained by inspecting the values taken by $D_k$ over the family
$$B_k := \bigl\{(m_1,\dots,m_k)\in(\mathbb Z\cap[2,100])^k:\ m_j=b_j m_{j+1}\ \text{for some } b_j\in\mathbb N,\ j=1,\dots,k-1\bigr\}, \qquad (72)$$
$k\in\{1,\dots,5\}$. That is, $B_k$ consists of vectors of $k$ pool sizes in the range $[2,100]\cap\mathbb N$ such that each size $m_j$ is a multiple of the next-stage size $m_{j+1}$. Denote by
$$m^\bullet = \bigl(m^\bullet_1(k),\dots,m^\bullet_k(k)\bigr) := \arg\min_{(m_1,\dots,m_k)\in B_k} D_k \qquad (73)$$
$$\text{and}\quad D^\bullet_k := D_k(m^\bullet,p). \qquad (74)$$
Optimizing now over $k$, let
$$D^\bullet := \min_{k\in\{1,\dots,5\}} D^\bullet_k; \qquad k^\bullet := \arg\min_{k\in\{1,\dots,5\}} D^\bullet_k; \qquad m^\bullet_j := m^\bullet_j(k^\bullet). \qquad (75)$$
Consider the standard deviation of the number of tests per person,
$$\sigma(p) = \sigma_k(m,p) := \frac{1}{m^\bullet_1}\sqrt{\mathrm{Var}\,T_k(m^\bullet,p)}. \qquad (76)$$
To compute the variance we use (22), or (24) in Lemma 1. We note that the standard deviation is of the same order as $D^\bullet(p)$. Hence, if one performs, say, 100 independent realizations of the strategy, we can expect that the average number of tests per individual will be within $\frac{3}{\sqrt{100}}$ (that is, 30%) of the optimal value $D^\bullet(p)$, with probability $\approx 0.99$, by the central limit theorem.

If we assume that $m_1$ does not exceed 100, then the values in (75) are optimal: they minimize the (non-linearized) cost $D$. We observe in Table 1 critical values for several instances of $p$; see Fig. 1 for a graphical representation.

p        $m^\bullet_1$  $m^\bullet_2$  $m^\bullet_j$, $j\ge 3$  $k^\bullet$  $D^\bullet(p)$  $\sigma(p)$   $-\log_3(-\log_3(1-p))$
0.1      9         3         --              2    0.5863043     0.4027611     2.133979
0.08     9         3         --              2    0.5083693     0.3934454     2.346938
0.06     9         3         --              2    0.4228622     0.3725679     2.618467
0.04     12        3         --              2    0.3276941     0.3145522     2.997037
0.02     27        9         3               3    0.1979772     0.1997479     3.637304
0.01     $3^4$     $3^3$     $3^{4-j+1}$     4    0.1179085     0.1059675     4.272842
0.008    $3^4$     $3^3$     $3^{4-j+1}$     4    0.09877677    0.09875318    4.476873
0.006    $3^4$     $3^3$     $3^{4-j+1}$     4    0.07876518    0.08931578    4.739649
0.004    $3^5$     $3^4$     $3^{5-j+1}$     5    0.05722486    0.04901306    5.109633
0.002    $3^5$     $3^4$     $3^{5-j+1}$     5    0.03220212    0.03821587    5.741475
0.0001   $3^8$     $3^7$     $3^{8-j+1}$     8    0.002425894   0.002686147   8.469174
0.00001  $3^{10}$  $3^9$     $3^{10-j+1}$    10   0.000305373   0.000363323   10.565118

Table 1: Upper eight rows: optimal values of $D^\bullet$ found by full inspection of $D(m,p)$ over all possible pool sizes $m\in B_k$ and $k\in\{1,2,3,4,5\}$. We notice that, except for $p=0.04$, we have $k^\bullet=k_3$ and $m^\bullet=(3^{k_3},\dots,3)$, see (62). Lower four (gray) rows ($p=0.004$ to $p=0.00001$): here $D^\bullet(p)=D\bigl((3^{k_3},\dots,3),p\bigr)$, the cost of the strategy $(3^{k_3},\dots,3)$ for these values of $p$. We have not proved that these strategies are optimal, but they are feasible and their cost is computable; $D^\bullet(p)$ is an upper bound on the optimal cost for these values.


Figure 1: $D^\bullet$ as a function of $p$ in log-log scale. Colored dots represent the values of $D^\bullet$ for each $p$. Filled dots correspond to the true minimum value, and empty dots correspond to $D_{k_3}\bigl((3^{k_3},\dots,3),p\bigr)$, at $p=10^{-5}$ and $p=10^{-4}$. The continuous line is the graph of the function $L^\sharp(p)=e\,p\log(1/p)$ obtained in Lemma 7.

3.6 Cost and linearized cost of strategy (3k3,...,3)

We now consider infection probabilities of the form p = e^{-u} for u ∈ N and compare

(i) the linearized cost of the optimal linearized strategy, L♯(p) = L_{k♯}(m♯, p) = u e^{1-u}; see (58) and (59). The choice p = e^{-u} implies k♯ = u - 1 ∈ N, so we can define m♯ = (e^{k♯}, . . . , e);

(ii) the linearized cost L_{k3}(m3, p) of the strategy m3 := (3^{k3}, . . . , 3), where k3 is defined in (62);

(iii) the cost D_{k3}(m3, p) of the strategy m3 := (3^{k3}, . . . , 3), where k3 is defined in (62). We concluded by inspection that for p ≥ e^{-5} this is the optimal nested strategy in B_k, k ≤ 5, defined in (72).

The choice of p is motivated by the requirement that k♯ = log(1/p) - 1 belong to N, so that L_{k♯}(m♯, p) can be defined; see the comment above Lemma 7. In this case k♯ = u - 1. Notice that (42) implies

    D_{k3}(m3, p) ≤ L_{k3}(m3, p).    (77)

Table 2 shows these values. For 2 ≤ u ≤ 5 the values of D_{k3}(m3, p) are optimal in the sets of strategies B_k defined in (72), for k ≤ 5; this was observed by inspection. We also include

(iv) the difference L_{k3}(m3, p) - D_{k3}(m3, p);

(v) the bound (44) for this difference obtained in Lemma 5.
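For a single row of Table 2 the three quantities can be evaluated directly. In this sketch (ours, not the paper's code) the linearized cost L_k is assumed to replace each term 1 − q^m of D_k by its linearization mp, with the convention m_{k+1} := 1; this assumption reproduces the tabulated values and exhibits the inequality (77).

```python
import math

def cost(m, p):
    # exact nested cost D_k(m, p), assumed as in Table 1
    q = 1.0 - p
    d = 1.0 / m[0]
    for j in range(1, len(m)):
        d += (1.0 - q ** m[j - 1]) / m[j]
    return d + 1.0 - q ** m[-1]

def lin_cost(m, p):
    # assumed linearized cost L_k(m, p): replace each 1 - q^m by m p, m_{k+1} := 1
    sizes = list(m) + [1]
    return 1.0 / m[0] + p * sum(sizes[j - 1] / sizes[j] for j in range(1, len(sizes)))

u = 4
p = math.exp(-u)                  # p = e^{-4} ≈ 0.01831564
m3 = (27, 9, 3)                   # (3^{k3}, ..., 3) with k3 = 3
Lsharp = u * math.exp(1 - u)      # L♯(p) = u e^{1-u} ≈ 0.1991483
assert cost(m3, p) <= lin_cost(m3, p)   # inequality (77)
print(Lsharp, lin_cost(m3, p), cost(m3, p))
```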


u    p = e^{-u}       k♯   k3   L_{k♯}(m♯,p)   L_{k3}(m3,p)   D_{k3}(m3,p)   L_{k3}-D_{k3}   Bound (44)
2    0.1353353        1    1    0.7357589      0.739339183    0.686871       0.052468183     0.245252580231
3    0.04978707       2    2    0.4060058      0.4098335      0.3759855      0.033848        0.0852901665983
4    0.01831564       3    3    0.1991483      0.2018778      0.1857311      0.0161467       0.0306978149437
5    0.006737947      4    4    0.09157819     0.09320104     0.08625753     0.00694351      0.011651778305
6    0.002478752      5    5    0.04042768     0.04129651     0.03843346     0.00286305      0.0045824219304
7    0.000911882      6    6    0.01735127     0.01778562     0.01662603     0.00115959      0.0018351805189
8    0.000335462      7    7    0.007295056    0.007501963    0.007036057    0.000465906     0.0007409542827
9    0.000123409      8    8    0.003019164    0.003114251    0.002927807    0.000186444     0.0003001742097
10   4.539993e-05     9    9    0.001234098    0.001276603    0.001202175    0.000074428     0.0001217702372
11   1.67017e-05      10   10   0.000499399    0.000517986    0.000488330    2.96557e-05     4.9423784149e-05
12   6.144212e-06     11   11   0.000200420    0.000201261    0.000196608    4.6534e-06      2.0063981836e-05
13   2.260329e-06     12   11   7.987476e-05   8.02359e-05    7.8386e-05     1.8499e-06      2.7153541193e-06
14   8.315287e-07     13   12   3.164461e-05   3.181671e-05   3.107271e-05   7.44e-07        1.1024045206e-06
15   3.059023e-07     14   13   1.247293e-05   1.255742e-05   1.225847e-05   2.9895e-07      4.4757599204e-07
16   1.125352e-07     15   14   4.894437e-06   4.935552e-06   4.815546e-06   1.20006e-07     1.8171748609e-07
17   4.139938e-08     16   15   1.913098e-06   1.932664e-06   1.88454e-06    4.8124e-08      7.3778218113e-08
18   1.522998e-08     17   16   7.451888e-07   7.542696e-07   7.349931e-07   1.92765e-08     2.9954367138e-08
19   5.602796e-09     18   17   2.893696e-07   2.934861e-07   2.85774e-07    7.7121e-09      1.2161645332e-08
20   2.061154e-09     19   18   1.120559e-07   1.138835e-07   1.10802e-07    3.0815e-09      4.9376984908e-09

Table 2: Comparison of the costs L_{k♯}(m♯, p), L_{k3}(m3, p) and D_{k3}(m3, p); see (i)-(v) above for the definitions. Only the strategy (k3, m3) corresponds to a practical application. For 2 ≤ u ≤ 5 the values of D_{k3}(m3, p) are optimal in the sets of strategies B_k defined in (72), for k ≤ 5.

3.7 Conjecturing the optimal strategy

After plotting the cost of different strategies we have observed that, depending on p, the cost is minimized by strategies of the form (3^k, . . . , 3) or (4·3^{k-1}, 3^{k-1}, . . . , 3). The transition from one of these strategies to another occurs at points λk and ρk, where ρ1 := 1 - 3^{-1/3}, the solution of D1(3, p) = 1, and for k ≥ 1,

    λk := solution p in [0, ρk) of Dk((3^k, . . . , 3), p) = Dk((4·3^{k-1}, 3^{k-1}, . . . , 3), p);    (78)

    ρ_{k+1} := solution p in [0, λk) of D_{k+1}((3^{k+1}, . . . , 3), p) = Dk((4·3^{k-1}, 3^{k-1}, . . . , 3), p).    (79)

The solution of each equation is unique in the corresponding interval. Denote the space of nested feasible strategies of size k by

    N_k := { m = (m1, . . . , mk) ∈ N^k : m_j = b_j m_{j+1} for some b_j ∈ N, j = 1, . . . , k - 1 },    (80)

and let

    Dopt(p) := min_{k ∈ N} min_{m ∈ N_k} Dk(m, p).    (81)

Conjecture 1. The optimal cost Dopt(p) is realized by the strategy k = 1, m = 1 (no pooling) if p ≥ ρ1, and

    Dopt(p) = Dk((3^k, . . . , 3), p)                       if λk ≤ p ≤ ρk,
              Dk((4·3^{k-1}, 3^{k-1}, . . . , 3), p)        if ρ_{k+1} ≤ p ≤ λk.    (82)

This conjecture has been checked by J. M. Martinez for one million values of p chosen uniformly in the interval [0, 0.31], using dynamic programming [15].
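A brute-force version of such a check is easy to sketch for a single p. The code below is our illustration, not the dynamic program of [15]: it enumerates nested strategies with bounded m1 and k (using the cost D_k assumed as in Table 1) and compares the minimizer with the strategy predicted by Conjecture 1. At p = 0.02 we have λ3 ≤ p ≤ ρ3, so the predicted optimum is (3^3, 3^2, 3) = (27, 9, 3).

```python
def cost(m, p):
    # nested cost D_k(m, p), assumed as in Table 1
    q = 1.0 - p
    d = 1.0 / m[0]
    for j in range(1, len(m)):
        d += (1.0 - q ** m[j - 1]) / m[j]
    return d + 1.0 - q ** m[-1]

def nested(prefix, k):
    # all strategies in N_k extending `prefix`, with strictly decreasing sizes
    if len(prefix) == k:
        yield prefix
        return
    last = prefix[-1]
    for nxt in range(2, last):
        if last % nxt == 0:          # each size must divide the previous one
            yield from nested(prefix + (nxt,), k)

p = 0.02
best = min((cost(m, p), m)
           for k in range(1, 5)
           for m1 in range(2, 61)
           for m in nested((m1,), k))
print(best[1])  # → (27, 9, 3), as Conjecture 1 predicts for lambda_3 <= p <= rho_3
```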


Computation of ρk, λk. Recalling q = 1 - p, we have

    Dk((3^k, . . . , 3), p) - Dk((4·3^{k-1}, 3^{k-1}, . . . , 3), p) = 0
        if and only if x = q^{3^{k-1}} satisfies 1 + 12x^4 - 12x^3 = 0,    (83)

and

    D_{k+1}((3^{k+1}, . . . , 3), p) - Dk((4·3^{k-1}, 3^{k-1}, . . . , 3), p) = 0
        if and only if y = q^{3^{k-1}} satisfies 12y^9 + 36y^3 - 36y^4 - 9 = 0.    (84)

Using Mathematica [13], we find that the solutions of (83) and (84) are x ≈ 0.876057169753174 and y ≈ 0.89015230755103751611. Using λk = 1 - x^{1/3^{k-1}} and ρ_{k+1} = 1 - y^{1/3^{k-1}}, we list some of these transition points in Table 3.

k    λk ≈                      ρk ≈
1    0.12394283024682595       0.30663872564936529515
2    0.043149364977271065      0.10984769244896248389
3    0.0145951023362655970     0.03804496086305981708
4    0.004888896470446         0.01284596585087308883
5    0.00163229509440177       0.004300456028242352465
6    0.0005443946765845142     0.001435545146496115234
7    0.00018149783166476752    0.0004787442082735720083
8    0.00006050293775328175    0.0001596068757573522991

Table 3: Transition points ρkand λkdeﬁned in (78) and (79).

We plot the strategies of Conjecture 1 for 1 ≤ k ≤ 7 in Figs. 2 and 3.

Figure 2: Strategy cost as a function of p. Each line represents one of the strategies of Conjecture 1. The conjecture states that the optimal cost at p is realized by the minimum of these curves at p. Dotted lines are strategies (3^k, . . . , 3), optimal for p ∈ [λk, ρk], and full lines are strategies (4·3^{k-1}, 3^{k-1}, . . . , 3), optimal for p ∈ [ρ_{k+1}, λk]. Plots obtained using desmos.com software; detailed plot at https://www.desmos.com/calculator/tr3bek9hm0.


Figure 3: Details of Fig. 2. The figure on the left shows the transition at point λ1 ≈ 0.1239 from strategy k = 1, m = 3 (black dotted line) to k = 1, m = 4 (black continuous line), and then at point ρ2 ≈ 0.1098 to strategy k = 2, m = (9, 3) (blue dotted line). The figure on the right shows the transition at point ρ7 ≈ 0.0004787 from strategy k = 6, m = (4·3^5, 3^5, . . . , 3) (green continuous line) to k = 7, m = (3^7, . . . , 3) (orange dotted line). Plots obtained with desmos.com software.

Finally, the cost of the conjectured optimal strategy as a function of p in log-log scale is shown in Fig. 4.

Figure 4: Cost Dopt(p) of the conjectured optimal strategy as a function of p in log-log scale. Blue segments correspond to m1^{opt} = 4 × 3^{k_opt - 1} and grey segments to m1^{opt} = 3^{k_opt}.


A Computation of the variance

We compute here Var Tk. We start with k + 1 = 3. From (15) we get

    Var T2 = (m1/m2)^2 q^{m1}(1 - q^{m1}) + m2^2 (m1/m2) q^{m2}(1 - q^{m2})    (85)

           + 2 m2^2 Σ_{i ≠ j} Cov(1 - φ_i, 1 - φ_j)    (86)

           + 2 m1 Σ_{i=0}^{m1/m2 - 1} Cov(1 - φ, 1 - φ_i).    (87)

The sum in (86) vanishes because 1 - φ_j and 1 - φ_k are independent if j ≠ k, while

    Cov(1 - φ, 1 - φ_i) = E[(1 - φ)(1 - φ_i)] - E[1 - φ] E[1 - φ_i]
                        = E[1 - φ_i] - (1 - q^{m1})(1 - q^{m2})    by (14)
                        = (1 - q^{m2}) q^{m1}.    (88)

Replacing in (85)-(87) we get

    Var T2 = (m1^2/m2^2) q^{m1}(1 - q^{m1}) + m2 m1 q^{m2}(1 - q^{m2}) + 2 (m1^2/m2) q^{m1}(1 - q^{m2}).    (89)

The previous argument can be extended to several stages, as long as each pool size is a multiple of the pool size at the following stage. This is the content of Lemma 1 in §2, which we prove next.
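Formula (89) can be sanity-checked by simulation. The sketch below is ours: it draws pools of m1 = 9 individuals with p = 0.02, counts the tests of the nested procedure (pool test, sub-pool tests when the pool is positive, individual tests per positive sub-pool), and compares the empirical variance with (89).

```python
import random

m1, m2, p, q = 9, 3, 0.02, 0.98

def var_T2():
    # Var T2 from (89)
    return ((m1 / m2) ** 2 * q**m1 * (1 - q**m1)
            + m1 * m2 * q**m2 * (1 - q**m2)
            + 2 * (m1**2 / m2) * q**m1 * (1 - q**m2))

def simulate_T2(rng):
    pool = [rng.random() < p for _ in range(m1)]
    tests = 1                                  # stage-1 pool test
    if any(pool):
        tests += m1 // m2                      # stage-2 sub-pool tests
        for i in range(m1 // m2):
            if any(pool[i * m2:(i + 1) * m2]):
                tests += m2                    # individual tests in a positive sub-pool
    return tests

rng = random.Random(0)
n = 200_000
samples = [simulate_T2(rng) for _ in range(n)]
mean = sum(samples) / n
var = sum((t - mean) ** 2 for t in samples) / n
print(var_T2(), var)   # the two variances should agree to about 1%
```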

Proof of Lemma 1. From (17) we get

    Var Tk = µ^2 { Var(1 - φ) + · · · + Σ_{i1=0}^{m1/m2-1} Σ_{i2=0}^{m2/m3-1} · · · Σ_{i_{k-1}=0}^{m_{k-1}/m_k-1} Var(1 - φ_{i1 i2 ... i_{k-1}}) }    (90)

    + 2µ^2 { Σ_{i=0}^{m1/m2-1} Cov(1 - φ, 1 - φ_i) + · · · + Σ_{i1=0}^{m1/m2-1} Σ_{i2=0}^{m2/m3-1} · · · Σ_{i_{k-1}=0}^{m_{k-1}/m_k-1} Cov(1 - φ, 1 - φ_{i1 i2 ... i_{k-1}}) }    (91)

    + 2µ^2 { Σ_{i1=0}^{m1/m2-1} [ Σ_{i2=0}^{m2/m3-1} Cov(1 - φ_{i1}, 1 - φ_{i1 i2}) + · · · + Σ_{i2=0}^{m2/m3-1} · · · Σ_{i_{k-1}=0}^{m_{k-1}/m_k-1} Cov(1 - φ_{i1}, 1 - φ_{i1 i2 ... i_{k-1}}) ] }    (92)

    + . . .

    + 2µ^2 { Σ_{i1=0}^{m1/m2-1} · · · Σ_{i_{k-2}=0}^{m_{k-2}/m_{k-1}-1} [ Σ_{i_{k-1}=0}^{m_{k-1}/m_k-1} Cov(1 - φ_{i1...i_{k-2}}, 1 - φ_{i1...i_{k-2} i_{k-1}}) ] }.    (93)

The first line (90) follows by adding the variances of each of the sums in (17), and using that terms belonging to the same sum are independent, hence no covariance terms arise within each of the individual sums. We then compute the covariances between the different sums, and we take advantage of the fact that if l < n, then 1 - φ_{i1...il} and 1 - φ_{j1...jn} are independent unless i_ℓ = j_ℓ for all 1 ≤ ℓ ≤ l, and in this case

    Cov(1 - φ_{i1...il}, 1 - φ_{i1...in}) = q^{m_{l+1}} (1 - q^{m_{n+1}}),    (94)

by a computation similar to (88). Recall that 1 - φ_{i1...ij} ∼ Bernoulli(1 - q^{m_{j+1}}), hence

    Var(1 - φ_{i1...ij}) = q^{m_{j+1}} (1 - q^{m_{j+1}}).    (95)

Substituting (94) and (95) in the expression for the variance above, we have

    Var Tk = µ^2 { q^{m1}(1 - q^{m1}) + µ q^{m2}(1 - q^{m2}) + · · · + µ^{k-1} q^{m_k}(1 - q^{m_k}) }
           + 2µ^2 { µ q^{m1}(1 - q^{m2}) + µ^2 q^{m1}(1 - q^{m3}) + · · · + µ^{k-1} q^{m1}(1 - q^{m_k}) }
           + 2µ^2 { µ^2 q^{m2}(1 - q^{m3}) + · · · + µ^{k-1} q^{m2}(1 - q^{m_k}) }    (96)
           + . . .
           + 2µ^2 { µ^{k-1} q^{m_{k-1}}(1 - q^{m_k}) }.

If we rewrite (96) by collecting all terms that have a factor (1 - q^{m_i}), 1 ≤ i ≤ k, we get the expression in (23).

B Error in the linear approximation

Proof of Lemma 5. We have

    Dk(m, p) - Lk(m, p) = 1 - e^{m_k log q} - m_k p + Σ_{j=2}^{k} (1/m_j) (1 - e^{m_{j-1} log q} - m_{j-1} p).    (97)

To show that the error is non-positive it suffices to prove that

    f(p) = 1 - e^{m_j log q} - m_j p ≤ 0,    0 ≤ p < 1,

for any given 1 ≤ j ≤ k. We have f(0) = 0 and

    f'(p) = (m_j/q) e^{m_j log q} - m_j = m_j [ e^{m_j log q}/q - 1 ] = m_j [ q^{m_j}/q - 1 ] = m_j [ q^{m_j - 1} - 1 ] ≤ 0,

because m_j ≥ 1. This implies that f is decreasing and f(p) ≤ 0 for all p, and item i) in the lemma follows.

To prove item ii), note that by the inequality |1 - e^x + x| ≤ x^2/2 for x ≤ 0, we have

    | (1 - e^{m_{j-1} log q})/m_j + (m_{j-1}/m_j) log q | ≤ (1/2) m_{j-1}^2 log^2 q / m_j.

Denote m = (m1, . . . , mk) and recall the notation m_{k+1} := 1. Then

    | Dk(m, p) - 1/m1 + m_k log q + Σ_{j=2}^{k} (m_{j-1}/m_j) log q | ≤ (1/2) Σ_{j=2}^{k+1} m_{j-1}^2 log^2 q / m_j
        ≤ (1/2) ℓ log^2 q Σ_{j=2}^{k+1} m1/2^{j-2}    (using m_j ≤ m1/2^{j-1})
        ≤ ℓ m1 log^2 q.    (98)

On the other hand

    | Lk(m, p) - 1/m1 + m_k log q + Σ_{j=2}^{k} (m_{j-1}/m_j) log q | = |p + log q| Σ_{j=2}^{k+1} m_{j-1}/m_j ≤ k ℓ p^2,    (99)

where the last inequality follows from |x + log(1 - x)| ≤ x^2 for 0 ≤ x ≤ 1/2. The result follows from (98) and (99).

C Positive definite Hessian matrix

We prove here that the critical point (47) is indeed a minimum of Lk, for given k. The Hessian matrix of Lk is a tridiagonal symmetric matrix Hk with entries

    H11 = 2/m1^3,    Hii = 2p m_{i-1}/m_i^3,  2 ≤ i ≤ k,
    H_{i,i+1} = H_{i+1,i} = -p/m_{i+1}^2,  1 ≤ i ≤ k - 1,

and Hij = 0 if |i - j| > 1.

Let us denote H♯ := H(m♯1(k), . . . , m♯k(k)). To simplify notation, let µ := p^{-1/(k+1)}, so that m♯j(k) = µ^{k-j+1}. We have

    Hii = 2µ p^3 µ^{2i},  1 ≤ i ≤ k,
    H_{i,i+1} = H_{i+1,i} = -p^3 µ^{2(i+1)},  1 ≤ i ≤ k - 1,

and Hij = 0 if |i - j| > 1. (For i = 1 the first formula reads H11 = 2p^3 µ^3 = 2/(m♯1)^3, consistent with the general definition.)

Given x = (x1, . . . , xk) ∈ R^k, let us define y := (µ x1, µ^2 x2, . . . , µ^k xk). We compute

    x^t H♯ x = p^3 [ Σ_{i=1}^{k} 2µ µ^{2i} x_i^2 - 2 Σ_{i=1}^{k-1} µ^{2(i+1)} x_i x_{i+1} ]
             = p^3 [ Σ_{i=1}^{k} 2µ y_i^2 - 2µ Σ_{i=1}^{k-1} y_i y_{i+1} ]
             = µ p^3 [ y1^2 + (y1 - y2)^2 + (y2 - y3)^2 + · · · + (y_{k-1} - y_k)^2 + y_k^2 ] ≥ 0,

and x^t H♯ x = 0 if and only if y1 = y2 = · · · = yk = 0, or, in terms of the original vector, x = 0. We conclude that H♯ is positive definite.
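For a concrete k and p, positive definiteness can also be verified numerically. The following sketch (ours) builds the Hessian at the critical point for k = 3, p = 0.01 and applies Sylvester's criterion (all leading principal minors positive):

```python
k, p = 3, 0.01
mu = p ** (-1.0 / (k + 1))
m = [mu ** (k - j) for j in range(k)]     # m♯_j = µ^{k-j+1}, j = 1, ..., k

def hessian():
    # tridiagonal Hessian of L_k evaluated at m♯, entries as derived above
    H = [[0.0] * k for _ in range(k)]
    H[0][0] = 2.0 / m[0] ** 3
    for i in range(1, k):
        H[i][i] = 2.0 * p * m[i - 1] / m[i] ** 3
    for i in range(k - 1):
        H[i][i + 1] = H[i + 1][i] = -p / m[i + 1] ** 2
    return H

def det(a):
    # determinant by Gaussian elimination with partial pivoting
    n = len(a)
    a = [row[:] for row in a]
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        if a[c][c] == 0.0:
            return 0.0
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= f * a[c][cc]
    return d

H = hessian()
minors = [det([row[:i + 1] for row in H[:i + 1]]) for i in range(k)]
print(all(x > 0 for x in minors))  # True: H♯ is positive definite
```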


Acknowledgements

We would like to thank Pablo Aguilar, Alejandro Colaneri, Hugo Menzella, Juliana Sesma, and Sergio Chialina for bringing this problem to our attention and encouraging us to study it. P.A.F. would like to thank Luiz-Rafael Santos for comments and reference suggestions.

References

[1] M. Aldridge, L. Baldassini, and O. Johnson, Group testing algorithms: Bounds and simulations, IEEE Transactions on Information Theory, 60 (2014), pp. 3671–3687.

[2] C. R. Bilder and J. M. Tebbs, Pooled-testing procedures for screening high volume clinical specimens in heterogeneous populations, Statistics in Medicine, 31 (2012), pp. 3261–3268.

[3] M. S. Black, C. R. Bilder, and J. M. Tebbs, Optimal retesting configurations for hierarchical group testing, Journal of the Royal Statistical Society: Series C (Applied Statistics), 64 (2015), pp. 693–710.

[4] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama, Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms, in 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Sep. 2011, pp. 1832–1839.

[5] A. De Bonis, New combinatorial structures with applications to efficient group testing with inhibitors, Journal of Combinatorial Optimization, 15 (2008), pp. 77–94.

[6] L. Dong, J. Zhou, C. Niu, Q. Wang, Y. Pan, S. Sheng, X. Wang, Y. Zhang, J. Yang, M. Liu, Y. Zhao, X. Zhang, T. Zhu, T. Peng, J. Xie, Y. Gao, D. Wang, Y. Zhao, X. Dai, and X. Fang, Highly accurate and sensitive diagnostic detection of SARS-CoV-2 by digital PCR, medRxiv, (2020).

[7] R. Dorfman, The detection of defective members of large populations, Ann. Math. Statist., 14 (1943), pp. 436–440.

[8] D.-Z. Du and F. K. Hwang, Pooling Designs and Nonadaptive Group Testing, World Scientific, 2006.

[9] W. Feller, An Introduction to Probability Theory and Its Applications. Vol. I, John Wiley and Sons, 1968.

[10] H. M. Finucan, The blood testing problem, Journal of the Royal Statistical Society. Series C (Applied Statistics), 13 (1964), pp. 43–50.

[11] L. E. Graff and R. Roeloffs, Group testing in the presence of test error; an extension of the Dorfman procedure, Technometrics, 14 (1972), pp. 113–122.

[12] R. Hanel and S. Thurner, Boosting test-efficiency by pooled testing strategies for SARS-CoV-2, 2020.

[13] Wolfram Research, Inc., Mathematica, Version 12.0, Champaign, IL, 2019.

[14] H.-Y. Kim, M. G. Hudgens, J. M. Dreyfuss, D. J. Westreich, and C. D. Pilcher, Comparison of group testing algorithms for case identification in the presence of test error, Biometrics, 63 (2007), pp. 1152–1163.

[15] J. M. Martinez, Personal communication.

[16] C. S. McMahan, J. M. Tebbs, and C. R. Bilder, Informative Dorfman screening, Biometrics, 68 (2012), pp. 287–296.

[17] C. S. McMahan, J. M. Tebbs, and C. R. Bilder, Two-dimensional informative array testing, Biometrics, 68 (2012), pp. 793–804.

[18] C. Mentus, M. Romeo, and C. DiPaola, Analysis and applications of non-adaptive and adaptive group testing methods for COVID-19, medRxiv, (2020).

[19] World Health Organization, Coronavirus disease 2019 (COVID-19) situation report 82, April 11, 2020.

[20] R. M. Phatarfod and A. Sudbury, The use of a square array scheme in blood testing, Statistics in Medicine, 13 (1994), pp. 2337–2343.

[21] N. Sinnott-Armstrong, D. Klein, and B. Hickey, Evaluation of group testing for SARS-CoV-2 RNA, medRxiv, (2020).

[22] M. Sobel and P. A. Groll, Group testing to eliminate efficiently all defectives in a binomial sample, The Bell System Technical Journal, 38 (1959), pp. 1179–1252.

[23] A. Sterrett, On the detection of defective members of large populations, The Annals of Mathematical Statistics, 28 (1957), pp. 1033–1036.

[24] T. Suo, X. Liu, M. Guo, J. Feng, W. Hu, Y. Yang, Q. Zhang, X. Wang, M. Sajid, D. Guo, Z. Huang, L. Deng, T. Chen, F. Liu, K. Xu, Y. Liu, Q. Zhang, Y. Liu, Y. Xiong, G. Guo, Y. Chen, and K. Lan, ddPCR: a more sensitive and accurate tool for SARS-CoV-2 detection in low viral load specimens, medRxiv, (2020).

[25] I. Yelin, N. Aharony, E. Shaer-Tamar, A. Argoetti, E. Messer, D. Berenbaum, E. Shafran, A. Kuzli, N. Gandali, T. Hashimshony, Y. Mandel-Gutfreund, M. Halberthal, Y. Geffen, M. Szwarcwort-Cohen, and R. Kishony, Evaluation of COVID-19 RT-qPCR test in multi-sample pools, medRxiv, (2020).