arXiv:1107.4890v1 [math.NT] 25 Jul 2011

Counting Square-Free Numbers

Jakub Pawlewicz

Institute of Informatics

University of Warsaw

pan@mimuw.edu.pl

Abstract. The main topic of this contribution is the problem of counting square-free numbers not exceeding $n$. Before this work we were able to do it in time$^1$ $\tilde{O}(\sqrt{n})$. Here, an algorithm with time complexity $\tilde{O}(n^{2/5})$ and memory complexity $\tilde{O}(n^{1/5})$ is presented. Additionally, a parallel version is shown, which achieves full scalability.

Until now, the highest computed value was for $n = 10^{17}$. Using our implementation we were able to calculate the value for $n = 10^{36}$ on a cluster.

Keywords: square-free number, Möbius function, Mertens function

1 Introduction

A square-free number is an integer which is not divisible by the square of any integer greater than one. Let $S(n)$ denote the number of square-free positive integers less than or equal to $n$. We can approximate the value of $S(n)$ using the asymptotic equation:

\[ S(n) = \frac{6}{\pi^2}\, n + O(\sqrt{n}). \]

Under the assumption of the Riemann hypothesis the error term can be further reduced [3]:

\[ S(n) = \frac{6}{\pi^2}\, n + O(n^{17/54+\varepsilon}). \]

Although these asymptotic equations allow us to compute $S(n)$ with high accuracy, they do not help to compute the exact value.

The basic observation for efficient algorithms is the following formula.

Theorem 1.

\[ S(n) = \sum_{d=1}^{\lfloor\sqrt{n}\rfloor} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor, \tag{1} \]

where $\mu(d)$ is the Möbius function.

$^1$ Compared to the Big-O notation, Soft-O ($\tilde{O}$) ignores logarithmic factors.

A simple proof of this theorem using the inclusion-exclusion principle is presented in App. A. The same proof can be found in [4], where it allowed the author to develop an $\tilde{O}(\sqrt{n})$ algorithm and to compute $S(10^{17})$. In Sect. 2 we show the details of this algorithm together with the reduction of the memory complexity to $O(\sqrt[4]{n})$.
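Formula (1) is easy to try out on small inputs. The following Python sketch is only an illustration and not the blocked algorithm of Sect. 2: it tabulates $\mu$ with a plain sieve that keeps the whole table in memory, and the function names (`mobius_sieve`, `squarefree_count`, `squarefree_count_naive`) are chosen here for the example.

```python
from math import isqrt

def mobius_sieve(limit):
    """Return a list mu with mu[k] = Möbius function of k for 1 <= k <= limit (mu[0] = 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for k in range(p, limit + 1, p):
            mu[k] = -mu[k]            # one more prime divisor
            is_prime[k] = (k == p)    # proper multiples of p are composite
        for k in range(p * p, limit + 1, p * p):
            mu[k] = 0                 # divisible by a prime square
    return mu

def squarefree_count(n):
    """S(n) computed by formula (1)."""
    root = isqrt(n)
    mu = mobius_sieve(root)
    return sum(mu[d] * (n // (d * d)) for d in range(1, root + 1))

def squarefree_count_naive(n):
    """Direct count by trial division, for cross-checking small n only."""
    def is_squarefree(m):
        d = 2
        while d * d <= m:
            if m % (d * d) == 0:
                return False
            d += 1
        return True
    return sum(1 for m in range(1, n + 1) if is_squarefree(m))
```

For small arguments the two functions can be compared directly, and `squarefree_count(10**6)` can be checked against the corresponding entry of Table 2.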

To construct a faster algorithm we have to transform the summation (1). In Sect. 3.1 the new formula is derived and stated in Theorem 2. Using this theorem we are able to construct an algorithm working in time $\tilde{O}(n^{2/5})$. It is described in the rest of Sect. 3. However, to achieve a memory efficient procedure more research is required. The memory reduction problem is discussed in Sect. 4, where the modifications leading to the memory complexity $\tilde{O}(n^{1/5})$ are presented. The result is put into Algorithm 4.

Applying Algorithm 4 for huge $n$ leads to computing time measured in years. Therefore, a practical algorithm should be distributed. Section 5 addresses the parallelization problem. At first sight it seems that Algorithm 4 can be easily distributed, but a deeper analysis uncovers new problems. We present a solution for these problems, and get a fully scalable method. As practical evidence, we computed $S(10^e)$ for all integers $e \le 36$, whereas before, the largest known value of $S(n)$ was for $n = 10^{17}$ [4,6]. For instance, the value $S(10^{36})$ was computed in 88 hours using 256 processors. The detailed computation results are given in Sect. 6.

2 The $\tilde{O}(\sqrt{n})$ Algorithm

We simply use Theorem 1 to compute $S(n)$. In order to compute summation (1) we need to find the values of $\mu(d)$ for $d = 1, \ldots, K$, where $K = \lfloor\sqrt{n}\rfloor$. This can be done in time $O(K \log\log K)$ and in memory $O(\sqrt{K})$ using a sieve similar to the sieve of Eratosthenes [2]. See App. B for a detailed description. This sieving algorithm tabulates values in blocks of size $B = \lfloor\sqrt{K}\rfloor$. We assume we have the function TabulateMöbiusBlock such that the call TabulateMöbiusBlock($a$, $b$) outputs the array mu containing the values of the Möbius function: $\mu(k) = \mathrm{mu}[k]$ for each $k \in (a, b]$. This function works in time $O(b \log\log b)$ and in memory $O(\max(\sqrt{b}, b - a))$. Now, to calculate $S(n)$, we split the interval $[1, K]$ into $O(\sqrt{K})$ blocks of size $B$. The procedure is presented in Algorithm 1.

Algorithm 1 Calculating $S(n)$ in time $\tilde{O}(\sqrt{n})$ and in memory $O(\sqrt[4]{n})$
1: $s \leftarrow 0$, $b \leftarrow 0$, $K \leftarrow \lfloor\sqrt{n}\rfloor$, $B \leftarrow \lfloor\sqrt{K}\rfloor$
2: repeat
3:     $a \leftarrow b$, $b \leftarrow \min(b + B, K)$
4:     TabulateMöbiusBlock($a$, $b$)
5:     for $k = a + 1, \ldots, b$ do
6:         $s \leftarrow s + \mathrm{mu}[k] \cdot \lfloor n/k^2 \rfloor$
7:     end for
8: until $b \ge K$
9: return $s$

Summarizing, the basic algorithm has $\tilde{O}(\sqrt{n})$ time complexity and $O(\sqrt[4]{n})$ memory complexity.

3 The New Algorithm

The key to a faster algorithm is the derivation of a new formula from (1) in Sect. 3.1. The new formula depends on the Mertens function (the Möbius summation function). Section 3.2 explains how one may compute the needed values. Section 3.3 states the algorithm. In Sect. 3.4 the optimal values of the algorithm parameters are estimated, and the resulting time complexity of $\tilde{O}(n^{2/5})$ is derived.

3.1 Establishing the New Formula

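To alter (1) we break the sum. We split the summation range $[1, \lfloor\sqrt{n}\rfloor]$ into two smaller intervals $[1, D]$ and $(D, \lfloor\sqrt{n}\rfloor]$:

\[ S(n) = S_1(n) + S_2(n), \]

where

\[ S_1(n) = \sum_{1 \le d \le D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor, \qquad S_2(n) = \sum_{d > D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor. \]

We introduced a new variable $D$. The optimal value of this variable will be determined later. The sum $S_2(n)$ can be rewritten using Iverson's convention$^2$:

\[ S_2(n) = \sum_{d > D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor = \sum_{d > D} \sum_{i} \left[ i = \left\lfloor \frac{n}{d^2} \right\rfloor \right] i\, \mu(d). \tag{2} \]

The predicate in brackets transforms as follows:

\[ i = \left\lfloor \frac{n}{d^2} \right\rfloor \iff i \le \frac{n}{d^2} < i + 1 \iff \sqrt{\frac{n}{i+1}} < d \le \sqrt{\frac{n}{i}}. \]

To shorten the notation we introduce a new variable $I$ and a new sequence $x_i$:

\[ x_i = \left\lfloor \sqrt{\frac{n}{i}} \right\rfloor \quad \text{for } i = 1, \ldots, I. \tag{3} \]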

$^2$ $[P] = \begin{cases} 1 & \text{if } P \text{ is true}, \\ 0 & \text{otherwise}. \end{cases}$

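The sequence $x_i$ should be strictly decreasing. To ensure this, it is enough to set $I$ such that

\begin{align*}
\sqrt{\frac{n}{I-1}} - \sqrt{\frac{n}{I}} &\ge 1 \\
\sqrt{n} &\ge \frac{\sqrt{I}\sqrt{I-1}}{\sqrt{I} - \sqrt{I-1}} \\
\sqrt{n} &\ge \sqrt{I}\sqrt{I-1}\,(\sqrt{I} + \sqrt{I-1}) \\
n &\ge I(I-1)(\sqrt{I} + \sqrt{I-1})^2. \tag{4}
\end{align*}

Because

\[ I(I-1)(\sqrt{I} + \sqrt{I-1})^2 < I \cdot I \cdot (2\sqrt{I})^2 = 4I^3, \]

to satisfy (4), it is enough to set

\[ I \le \sqrt[3]{\frac{n}{4}}. \tag{5} \]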

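Suppose we set $I$ satisfying (5). Now, we take $D = x_I$ and we use the $x_i$ notation (3) in (2). Since $d$ is an integer, the condition $\sqrt{n/(i+1)} < d \le \sqrt{n/i}$ is equivalent to $x_{i+1} < d \le x_i$, hence

\[ S_2(n) = \sum_{d > x_I} \sum_{i} [x_{i+1} < d \le x_i]\, i\, \mu(d) = \sum_{1 \le i < I} i \sum_{x_{i+1} < d \le x_i} \mu(d). \tag{6} \]

Finally, it is convenient to use the Mertens function:

\[ M(x) = \sum_{1 \le i \le x} \mu(i) = \sum_{i=1}^{\lfloor x \rfloor} \mu(i), \tag{7} \]

thus we simplify (6) to:

\[ S_2(n) = \sum_{1 \le i < I} i \bigl( M(x_i) - M(x_{i+1}) \bigr) = \Biggl( \sum_{1 \le i < I} M(x_i) \Biggr) - (I-1)\, M(x_I). \tag{8} \]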

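Theorem 2 summarizes the above analysis.

Theorem 2. Let $I$ be a positive integer satisfying $I \le \sqrt[3]{n/4}$. Let $x_i = \lfloor\sqrt{n/i}\rfloor$ for $i = 1, \ldots, I$ and $D = x_I$. Then $S(n) = S_1(n) + S_2(n)$, where

\[ S_1(n) = \sum_{d=1}^{D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor, \]

\[ S_2(n) = \Biggl( \sum_{i=1}^{I-1} M(x_i) \Biggr) - (I-1)\, M(x_I). \]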
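Theorem 2 can be sanity-checked numerically before any of the later machinery is in place. The sketch below is an illustration under simplifying assumptions: it tabulates $\mu$ and $M$ all the way up to $\lfloor\sqrt{n}\rfloor$ with a plain sieve (so it gains nothing in speed), and the names `mobius_sieve` and `squarefree_count_split` are introduced here only for the example.

```python
from itertools import accumulate
from math import isqrt

def mobius_sieve(limit):
    """mu[k] = Möbius function of k for 1 <= k <= limit (mu[0] = 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for k in range(p, limit + 1, p):
            mu[k] = -mu[k]
            is_prime[k] = (k == p)
        for k in range(p * p, limit + 1, p * p):
            mu[k] = 0
    return mu

def squarefree_count_split(n, I):
    """S(n) = S1(n) + S2(n) as in Theorem 2; I must satisfy 4 * I**3 <= n."""
    assert I >= 1 and 4 * I**3 <= n
    mu = mobius_sieve(isqrt(n))
    M = list(accumulate(mu))                  # M[k] = M(k), because mu[0] = 0
    # x[i] = floor(sqrt(n / i)); isqrt(n // i) gives the same integer
    x = [0] + [isqrt(n // i) for i in range(1, I + 1)]
    D = x[I]
    s1 = sum(mu[d] * (n // (d * d)) for d in range(1, D + 1))
    s2 = sum(M[x[i]] for i in range(1, I)) - (I - 1) * M[D]
    return s1 + s2
```

For $n = 1000$ every admissible $I \in \{1, \ldots, 6\}$ should return the same value $S(1000) = 608$ listed in Table 2.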

3.2 Computing Values of the Mertens Function

By applying the Möbius inversion formula to (7) we can get a nice recursion for the Mertens function:

\[ M(x) = 1 - \sum_{d \ge 2} M\!\left(\frac{x}{d}\right). \tag{9} \]

Here, an important observation is that having all values $M(x/d)$ for $d \ge 2$, we are able to calculate $M(x)$ in time $O(\sqrt{x})$. This is because there are at most $2\sqrt{x}$ different integers of the form $\lfloor x/d \rfloor$, since $x/d < \sqrt{x}$ for $d > \sqrt{x}$.
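Recursion (9) and the grouping of equal quotients $\lfloor x/d \rfloor$ can be sketched in a few lines of Python. This is a stand-alone illustration with memoization, not the scheduling used later in the paper; the function name `mertens` is an assumption of the example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def mertens(x):
    """M(x) for an integer x >= 0 via recursion (9)."""
    if x <= 0:
        return 0
    total = 1
    d = 2
    while d <= x:
        q = x // d
        d_last = x // q               # largest d' with x // d' == q
        # all d in [d, d_last] contribute the same value M(q)
        total -= (d_last - d + 1) * mertens(q)
        d = d_last + 1
    return total
```

Each call loops over the distinct values of $\lfloor x/d \rfloor$ only, which is the $O(\sqrt{x})$ bound quoted above once the smaller values are available.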

3.3 The Algorithm

The simple algorithm exploiting the above ideas is presented in Algorithm 2.

Algorithm 2 Efficient counting of square-free numbers
1: compute $S_1(n)$ and $M(d)$ for $d = 1, \ldots, D$
2: for $i = I - 1, \ldots, 1$ do
3:     compute $M(x_i)$ by (9)
4: end for
5: compute $S_2(n)$ by (8)
6: return $S_1(n) + S_2(n)$

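To compute $M(x_i)$ (line 3) we need the values $M(x_i/d)$ for $d \ge 2$. If $x_i/d \le D$ then $M(x_i/d)$ was determined during the computation of $S_1(n)$. If $x_i/d > D$ then observe that

\[ \left\lfloor \frac{x_i}{d} \right\rfloor = \left\lfloor \frac{\lfloor\sqrt{n/i}\rfloor}{d} \right\rfloor = \left\lfloor \frac{\sqrt{n/i}}{d} \right\rfloor = \left\lfloor \sqrt{\frac{n}{d^2 i}} \right\rfloor = x_{d^2 i}, \tag{10} \]

thus $M(x_i/d) = M(x_j)$ for $j = d^2 i$. Of course $j < I$, because otherwise $\lfloor\sqrt{n/j}\rfloor \le D$. Observe that it is important to compute $M(x_i)$ in decreasing order of $i$ (line 2).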
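Algorithm 2 together with the lookup rule (10) can be sketched as follows. The code is an illustrative reading of the pseudocode, not the author's implementation: it keeps $M(1), \ldots, M(D)$ in a list (so it has the $O(D)$ memory footprint discussed in Sect. 3.4), and the names `mobius_sieve` and `squarefree_count_alg2` are assumptions of the example.

```python
from itertools import accumulate
from math import isqrt

def mobius_sieve(limit):
    """mu[k] = Möbius function of k for 1 <= k <= limit (mu[0] = 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for k in range(p, limit + 1, p):
            mu[k] = -mu[k]
            is_prime[k] = (k == p)
        for k in range(p * p, limit + 1, p * p):
            mu[k] = 0
    return mu

def squarefree_count_alg2(n, I):
    """S(n) following Algorithm 2; I must satisfy 4 * I**3 <= n."""
    assert I >= 1 and 4 * I**3 <= n
    x = [0] + [isqrt(n // i) for i in range(1, I + 1)]   # x[i] = floor(sqrt(n/i))
    D = x[I]
    mu = mobius_sieve(D)
    M = list(accumulate(mu))                             # M[k] = M(k) for k <= D
    s1 = sum(mu[d] * (n // (d * d)) for d in range(1, D + 1))
    Mx = [0] * (I + 1)                                   # Mx[i] will hold M(x_i)
    Mx[I] = M[D]
    for i in range(I - 1, 0, -1):                        # decreasing i, as in line 2
        total, d = 1, 2
        while d <= x[i]:
            q = x[i] // d
            d_last = x[i] // q                           # last d with the same quotient
            if q <= D:
                value = M[q]                             # known from the sieve
            else:
                value = Mx[d * d * i]                    # by (10); here d_last == d
            total -= (d_last - d + 1) * value
            d = d_last + 1
        Mx[i] = total
    s2 = sum(Mx[1:I]) - (I - 1) * Mx[I]
    return s1 + s2
```

The branch `q > D` relies on the sequence $x_i$ being strictly decreasing, which is why the sketch insists on condition (5).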

3.4 The Complexity

Let us estimate the time complexity of Algorithm 2. Computing $S_1(n)$ has complexity $O(D \log\log D)$.

Computing $M(x_i)$ takes $O(\sqrt{x_i})$ time. The entire for loop (lines 2–4) has the time complexity:

\[ \sum_{i=1}^{I} O(\sqrt{x_i}) = \sum_{i=1}^{I} O\!\left( \sqrt{\sqrt{\frac{n}{i}}} \right) = O\!\left( \sqrt[4]{n} \sum_{i=1}^{I} \frac{1}{\sqrt[4]{i}} \right). \tag{11} \]

Using the asymptotic equality

\[ \sum_{i=1}^{I} \frac{1}{\sqrt[4]{i}} = \Theta(I^{3/4}), \]

(11) rewrites to:

\[ O\bigl(n^{1/4} I^{3/4}\bigr). \]

The computation of $S_2(n)$ is dominated by the for loop. Summarizing, the time complexity of Algorithm 2 is

\[ O\bigl(D \log\log D + n^{1/4} I^{3/4}\bigr). \tag{12} \]

We have to tune the selection of $I$ and $D$ to minimize the expression (12). The larger $I$ we take, the smaller $D$ will be, thus the parameters $I$ and $D$ are optimal when

\[ O(D \log\log D) = O\bigl(n^{1/4} I^{3/4}\bigr). \]

This takes place for

\[ I = n^{1/5} (\log\log n)^{4/5}, \]

and then

\[ O(D \log\log D) = O\bigl(n^{1/4} I^{3/4}\bigr) = O\bigl(n^{2/5} (\log\log n)^{3/5}\bigr). \]

Theorem 3. The time complexity of Algorithm 2 is $O(n^{2/5} (\log\log n)^{3/5}) = \tilde{O}(n^{2/5})$ for $I = n^{1/5} (\log\log n)^{4/5} = \tilde{O}(n^{1/5})$.

The bad news is the memory requirement. To compute the $M(x_i)$ values we need to remember $M(d)$ for all $d = 1, \ldots, D$, thus we need $O(D) = \tilde{O}(n^{2/5})$ memory. This is even greater memory usage than in the basic algorithm. In the next section we show how to overcome this problem.

4 Reducing Memory

To reduce memory we have to process values of the Möbius function in blocks. As described in Sect. 4.1, this affects the computation of the needed Mertens function values, which were previously computed by the recursion (9). These values have to be computed in a more organized manner. Section 4.2 provides the necessary utilities for that. Moreover, in Sect. 4.3 some data structures are introduced in order to achieve a satisfactory time complexity. Finally, Sect. 4.4 states the algorithm together with a short complexity analysis.

4.1 Splitting into Blocks

We again apply the idea of splitting computations into smaller blocks. To compute $S_1(n)$ we need to determine $\mu(d)$ and $M(d)$ for $d = 1, \ldots, D$. We do it in blocks of size $B = \Theta(\sqrt{D})$ by calling the procedure TabulateMöbiusBlock. That way we are able to compute $S_1(n)$, but to compute $S_2(n)$ we face the following problem.

We need to compute $M(x_i)$ for every integer $i \in [1, I)$. Previously, we memorized all needed values $M(1), \ldots, M(D)$ and used recursion (9). Now, we do not have unrestricted access to values of the Mertens function. After processing a block $(a, b]$ we only have access to the values $M(k)$ for $k \in (a, b]$. We have to utilize these values before we switch to the next block. If a value $M(k)$ occurs on the right hand side of the recursion (9) for $x = x_i$ for some $i \in [1, I)$, then we should make an update.

The algorithm should look as follows. We start the algorithm by creating an array Mx:

\[ \mathrm{Mx}[i] \leftarrow 1 \quad \text{for } i = 1, \ldots, I - 1. \]

During the computation of $S_1(n)$ we determine $M(k)$ for some $k$. Then, for every $i \in [1, I)$ such that $M(k)$ occurs in the sum

\[ \sum_{d \ge 2} M\!\left(\frac{x_i}{d}\right), \tag{13} \]

i.e. for every $i \in [1, I)$ such that there exists an integer $d \ge 2$ such that

\[ k = \left\lfloor \frac{x_i}{d} \right\rfloor, \tag{14} \]

we determine the number of occurrences $m$ of $M(k)$ in (13) and update

\[ \mathrm{Mx}[i] \leftarrow \mathrm{Mx}[i] - m \cdot M(k). \tag{15} \]

After processing all $k = 1, \ldots, D$, it remains to update Mx[$i$] by $M(x_i/d)$ for all $\lfloor x_i/d \rfloor > D$. With the help of equality (10) it is enough to update Mx[$i$] by $M(x_{d^2 i})$ for all $d^2 i < I$. After these updates we will have $\mathrm{Mx}[i] = M(x_i)$.

4.2 Dealing with Mx Array Updates

The problem is how to quickly find, for a given $k$, all values of $i$ for which there exists an integer $d \ge 2$ fulfilling (14). There is no simple way to do it in expected constant time. Instead, for a given $i$ we can easily calculate successive values of $k$.

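Lemma 1. Suppose that for a given integer $i \in [1, I)$ and an integer $k$ there exists an integer $d$ satisfying (14). Let us denote

\[ d_a = \left\lfloor \frac{x_i}{k} \right\rfloor, \qquad d_b = \left\lfloor \frac{x_i}{k+1} \right\rfloor, \]

then

(i) the number of occurrences $m$, needed for update (15), equals $d_a - d_b$,

(ii) the next integer $k$ satisfying (14) is for $d = d_b$, and it is equal to $\lfloor x_i/d_b \rfloor$.

Proof. All possible integers $d$ satisfying (14) are:

\[ k \le \frac{x_i}{d} < k + 1 \iff \frac{x_i}{k+1} < d \le \frac{x_i}{k} \iff \left\lfloor \frac{x_i}{k+1} \right\rfloor < d \le \left\lfloor \frac{x_i}{k} \right\rfloor, \tag{16} \]

so (14) is satisfied for $d \in (d_b, d_a]$, and the next $k$ satisfying (14) is for $d = d_b$. $\square$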

Lemma 1 allows us, for every $i$, to walk through the successive values of $k$ for which we have to update Mx[$i$]. Since the target is to reduce the memory usage, we need to group all updates into blocks. Algorithm 3 shows how to utilize Lemma 1 in order to update Mx[$i$] for the entire block $(a, b]$.

Algorithm 3 Updating Mx[$i$] for a block $(a, b]$
Require: bounds $0 \le a < b$, index $i \in [1, I)$, the smallest $k \in (a, b]$ for which there exists $d$ satisfying (14)
Ensure: Mx[$i$] is updated by all $M(k)$ for $k \in (a, b]$, the smallest $k > b$ for the next update is returned
1: function MxBlockUpdate($a$, $b$, $i$, $k$)
2:     $d_a \leftarrow \lfloor x_i/k \rfloor$
3:     repeat
4:         $d_b \leftarrow \lfloor x_i/(k+1) \rfloor$
5:         $\mathrm{Mx}[i] \leftarrow \mathrm{Mx}[i] - (d_a - d_b) \cdot M(k)$
6:         $k \leftarrow \lfloor x_i/d_b \rfloor$
7:         $d_a \leftarrow d_b$
8:     until $k > b$
9:     return $k$
10: end function
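The walk described by Lemma 1 and Algorithm 3 can be exercised in isolation. The Python sketch below is an illustration only: `mertens_prefix` and `mertens_by_blocks` are helper names invented for the demo, the Mertens values are kept in one list rather than per block, and the unused lower bound $a$ is dropped from the signature of `mx_block_update`. For an integer $x \ge 2$ every quotient $\lfloor x/d \rfloor$ with $d \ge 2$ is at most $\lfloor x/2 \rfloor$, so walking the blocks up to that bound must reproduce $M(x)$.

```python
def mertens_prefix(limit):
    """M[k] = M(k) for 0 <= k <= limit, by a plain Möbius sieve (demo helper)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for k in range(p, limit + 1, p):
            mu[k] = -mu[k]
            is_prime[k] = (k == p)
        for k in range(p * p, limit + 1, p * p):
            mu[k] = 0
    M, total = [], 0
    for value in mu:
        total += value
        M.append(total)
    return M

def mx_block_update(Mx, M, x_i, i, b, k):
    """Algorithm 3: apply update (15) for every relevant k <= b, return the next k > b.

    Requires k <= b < x_i, so that d_b below never becomes zero.
    """
    d_a = x_i // k
    while True:
        d_b = x_i // (k + 1)
        Mx[i] -= (d_a - d_b) * M[k]   # M(k) occurs d_a - d_b times, Lemma 1 (i)
        k = x_i // d_b                # next k, Lemma 1 (ii)
        d_a = d_b
        if k > b:
            return k

def mertens_by_blocks(x, block):
    """Recover M(x), x >= 2, from M(1), ..., M(x // 2) by walking blocks (a, b]."""
    top = x // 2
    M = mertens_prefix(top)
    Mx = {0: 1}                       # single entry, initialised to 1 as in Sect. 4.1
    k = 1
    while k <= top:
        a = (k - 1) // block * block  # block (a, a + block] containing k
        b = min(a + block, top)
        k = mx_block_update(Mx, M, x, 0, b, k)
    return Mx[0]
```

The result does not depend on the block size, which is the property Algorithm 4 relies on when it processes one block of Mertens values at a time.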

4.3 Introducing Additional Structures

Let $B = \lfloor\sqrt{D}\rfloor$ be the block size, and $L = \lceil D/B \rceil$ be the number of blocks. We process $k$ values in blocks $(a_0, a_1], (a_1, a_2], \ldots, (a_{L-1}, a_L]$, where $a_l = Bl$ for $0 \le l < L$ and $a_L = D$. We need additional structures to keep track, for every $i \in [1, I)$, of where the next update is:

– mink[$i$] stores the next smallest $k$ for which Mx[$i$] has to be updated,

– ilist[$l$] is a list of indexes $i$ for which the next update will be for $k$ belonging to the block $(a_l, a_{l+1}]$.

Using these structures we are able to perform every update in constant time. Once we update Mx[$i$] for all necessary $k \in (a_l, a_{l+1}]$ by MxBlockUpdate, we get the next $k > a_{l+1}$ for which the next update should be done. We can easily calculate the block index $l'$ for this $k$ and schedule it by putting $i$ into ilist[$l'$].

4.4 The Algorithm

The result of the entire above discussion is presented in Algorithm 4. We managed to preserve the number of operations, therefore the time complexity remained $\tilde{O}(n^{2/5})$. Each of the additional structures has $I = \tilde{O}(n^{1/5})$ or $L = O(\sqrt{D}) = \tilde{O}(n^{1/5})$ elements. Blocks have size $O(B) = O(\sqrt{D}) = \tilde{O}(n^{1/5})$. Therefore, the memory complexity of Algorithm 4 is $\tilde{O}(n^{1/5})$.

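Algorithm 4 Calculating $S(n)$ in time $\tilde{O}(n^{2/5})$ and in memory $\tilde{O}(n^{1/5})$
1: $I \leftarrow \Theta(n^{1/5} (\log\log n)^{4/5})$, $D \leftarrow \lfloor\sqrt{n/I}\rfloor$, $B \leftarrow \lfloor\sqrt{D}\rfloor$, $L \leftarrow \lceil D/B \rceil$
2: for $l = 0, \ldots, L - 1$ do
3:     ilist[$l$] $\leftarrow \emptyset$
4: end for
5: for $i = 1, \ldots, I - 1$ do
6:     Mx[$i$] $\leftarrow 1$
7:     mink[$i$] $\leftarrow 1$
8:     ilist[0] $\leftarrow$ ilist[0] $\cup\ \{i\}$
9: end for
10: $s_1 \leftarrow 0$
11: for $l = 0, \ldots, L - 1$ do    ▷ blocks processing loop
12:     TabulateMöbiusBlock($a_l$, $a_{l+1}$)
13:     for $k \in (a_l, a_{l+1}]$ do
14:         $s_1 \leftarrow s_1 + \mathrm{mu}[k] \cdot \lfloor n/k^2 \rfloor$
15:     end for
16:     compute $M(k)$ for $k \in (a_l, a_{l+1}]$ from values mu[$k$] and $M(a_l)$
17:     for each $i \in$ ilist[$l$] do
18:         mink[$i$] $\leftarrow$ MxBlockUpdate($a_l$, $a_{l+1}$, $i$, mink[$i$])
19:         $l' \leftarrow \lfloor \mathrm{mink}[i]/B \rfloor$    ▷ next block where Mx[$i$] has to be updated
20:         if $l' \le L$ and mink[$i$] $< x_i$ then
21:             ilist[$l'$] $\leftarrow$ ilist[$l'$] $\cup\ \{i\}$
22:         end if
23:     end for
24:     ilist[$l$] $\leftarrow \emptyset$
25: end for
26: for $i = I - 1, \ldots, 1$ do    ▷ updating Mx[$i$] by $M(k)$ for $k > D$
27:     for all $d \ge 2$ such that $d^2 i < I$ do
28:         Mx[$i$] $\leftarrow$ Mx[$i$] $-$ Mx[$d^2 i$]
29:     end for
30: end for
31: compute $s_2 = S_2(n)$ by (8)
32: return $s_1 + s_2$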

Observe that most work is done in the blocks processing loop (lines 11–25), because every other part of the algorithm takes at most $\tilde{O}(n^{1/5})$ operations. Initialization of structures (lines 1–10) is proportional to their size $\tilde{O}(n^{1/5})$. Computing $S_2(n)$ by (8) (line 31) takes $O(I) = \tilde{O}(n^{1/5})$. Only the time complexity of the part responsible for updating Mx[$i$] by $M(k)$ for $k > D$ (lines 26–30) is unclear. The total number of updates in this part is:

\[ \sum_{i=1}^{I-1} \sum_{\substack{2 \le d \\ d^2 i < I}} 1 \le \sum_{i=1}^{I} \sqrt{\frac{I}{i}} = \sqrt{I} \cdot \sum_{i=1}^{I} \frac{1}{\sqrt{i}} = \sqrt{I} \cdot O(\sqrt{I}) = O(I), \]

thus it is $O(I) = \tilde{O}(n^{1/5})$.
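To make the interplay of blocks, mink and ilist concrete, here is a compact single-threaded Python sketch in the spirit of Algorithm 4. It is an illustration under stated assumptions rather than the author's implementation: $I$ is passed in by the caller and must satisfy (5), the block index of a value $k$ is computed as $\lfloor (k-1)/B \rfloor$ and rescheduling is guarded by $k \le D$ (a boundary handling chosen for this sketch), and all function names are invented for the example.

```python
from math import isqrt

def primes_up_to(limit):
    """All primes <= limit by the sieve of Eratosthenes."""
    flags = [True] * (limit + 1)
    primes = []
    for p in range(2, limit + 1):
        if flags[p]:
            primes.append(p)
            for q in range(p * p, limit + 1, p):
                flags[q] = False
    return primes

def mobius_block(a, b, primes):
    """mu[k - a - 1] = mu(k) for k in (a, b]; primes must contain every prime <= sqrt(b)."""
    mu = [1] * (b - a)
    m = [1] * (b - a)                     # product of the prime divisors found so far
    for p in primes:
        if p * p > b:
            break
        for k in range((a // (p * p) + 1) * p * p, b + 1, p * p):
            mu[k - a - 1] = 0
        for k in range((a // p + 1) * p, b + 1, p):
            mu[k - a - 1] = -mu[k - a - 1]
            m[k - a - 1] *= p
    for k in range(a + 1, b + 1):
        if m[k - a - 1] < k:              # one prime factor > sqrt(b) is left (or mu is 0)
            mu[k - a - 1] = -mu[k - a - 1]
    return mu

def squarefree_count_blocks(n, I):
    """S(n) with block-wise processing; I must satisfy 4 * I**3 <= n."""
    assert I >= 1 and 4 * I**3 <= n
    x = [0] + [isqrt(n // i) for i in range(1, I + 1)]   # x[i] = floor(sqrt(n/i))
    D = x[I]
    B = isqrt(D)
    L = -(-D // B)                                       # ceil(D / B)
    primes = primes_up_to(isqrt(D))
    Mx = [1] * I                                         # Mx[0] is unused
    mink = [1] * I
    ilist = [[] for _ in range(L)]
    ilist[0] = list(range(1, I))
    s1, m_end = 0, 0                                     # m_end tracks M(a_l)
    for l in range(L):                                   # blocks processing loop
        a, b = B * l, min(B * (l + 1), D)
        mu = mobius_block(a, b, primes)
        M = []                                           # M[k - a - 1] = M(k)
        for k in range(a + 1, b + 1):
            s1 += mu[k - a - 1] * (n // (k * k))
            m_end += mu[k - a - 1]
            M.append(m_end)
        for i in ilist[l]:
            k, d_a = mink[i], x[i] // mink[i]
            while k <= b:                                # Algorithm 3
                d_b = x[i] // (k + 1)
                Mx[i] -= (d_a - d_b) * M[k - a - 1]
                k, d_a = x[i] // d_b, d_b
            mink[i] = k
            if k <= D:                                   # later block still needs i
                ilist[(k - 1) // B].append(i)
        ilist[l] = []
    for i in range(I - 1, 0, -1):                        # updates by M(k) for k > D
        d = 2
        while d * d * i < I:
            Mx[i] -= Mx[d * d * i]
            d += 1
    s2 = sum(Mx[1:I]) - (I - 1) * m_end                  # formula (8), m_end = M(D)
    return s1 + s2
```

Only one block of $\mu$ and $M$ values is alive at a time, so the working memory is dominated by the arrays of length $I$, $L$ and $B$, matching the structure of the memory estimate in Sect. 4.4.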

5 Parallelization

As noted in Sect. 4.4, the most time consuming part of Algorithm 4 is the blocks processing loop. The basic idea is to distribute the calculations made by this loop between $P$ processors. We split the interval $[1, D]$ into a list of $P$ smaller intervals: $(a_0, a_1], (a_1, a_2], \ldots, (a_{P-1}, a_P]$, where $0 = a_0 < a_1 < \cdots < a_P = D$. Processor number $p$, $0 \le p < P$, focuses only on the interval $(a_p, a_{p+1}]$, and it is responsible for

(i) calculating part of the sum $S_1(n)$

\[ \sum_{k \in (a_p, a_{p+1}]} \mu(k) \cdot \left\lfloor \frac{n}{k^2} \right\rfloor, \tag{17} \]

(ii) making updates of the array Mx[$1, \ldots, I - 1$] for all $k \in (a_p, a_{p+1}]$.

All processors share the $s_1$ value and the Mx array. The only changes are additions of an integer, and it is required that these changes are atomic. Alternatively, a processor can collect all changes in its own memory and, at the end, change the value $s_1$ and each entry of the Mx array only once.

Although the above approach is extremely simple, there are two drawbacks. First, for updates (ii), a processor needs to calculate successive values of the Mertens function: $M(a_p + 1), \ldots, M(a_{p+1})$. Computation of (17) produces successive values of the Möbius function starting from $\mu(a_p + 1)$, therefore the Mertens function values could also be computed if only we knew the value of $M(a_p)$. Unfortunately, there is no other way than computing it from scratch. However, to compute $M(x)$ there is an algorithm working in time $\tilde{O}(x^{2/3})$ and memory $\tilde{O}(x^{1/3})$. See for instance [2], or [5] for a simpler algorithm without the memory reduction.

In our application we have $x \le D = \tilde{O}(n^{2/5})$, therefore the cumulative additional time we spend on computing Mertens function values from scratch is $\tilde{O}(P D^{2/3}) = \tilde{O}(P n^{4/15})$. We want this not to exceed the targeted time of $\tilde{O}(n^{2/5})$, therefore the number of processors is limited by:

\[ P = \tilde{O}\!\left( \frac{n^{2/5}}{n^{4/15}} \right) = \tilde{O}(n^{2/15}). \tag{18} \]

The second drawback comes from the observation that the number of updates of the Mx array is not uniformly distributed over $k \in [1, D]$. For example, for $k \le \sqrt{D} = \tilde{O}(n^{1/5})$ and every $i \in [1, I)$ there always exists $d \ge 2$ such that (14) is satisfied, therefore for every such $k$ there will be $I - 1 = \tilde{O}(n^{1/5})$ updates. It means that in the very small block $(1, \lfloor\sqrt{D}\rfloor]$ there will be $\tilde{O}(n^{2/5})$ updates, which is proportional to the total number of updates. We see that splitting into blocks is non-trivial and we need better tools for measuring work in the blocks processing loop.

Let $t_s$ be the average time of computing a single summand of the sum $S_1(n)$, and let $t_u$ be the average time of a single update of an Mx array entry. Consider a block $(0, a]$. Denote by $U(a)$ the number of updates which must be done in this block. Then the expected time of processing this block is

\[ T(a) = t_s\, a + t_u\, U(a). \tag{19} \]

It turns out that $U(a)$ can be very accurately approximated by a closed formula:

\[ U(a) = \begin{cases} I a & \text{for } a \le \sqrt[4]{n/I}, \\[4pt] \dfrac{1}{3} \dfrac{n}{a^3} - 2\, \dfrac{n^{1/2} I^{1/2}}{a} + \dfrac{8}{3}\, n^{1/4} I^{3/4} & \text{for } \sqrt[4]{n/I} < a \le D = \sqrt{n/I}. \end{cases} \tag{20} \]

See App. C for the estimation.

The work measuring function (19) says that the amount of work for the block $(a_p, a_{p+1}]$ is $T(a_{p+1}) - T(a_p)$. Using this we are able to distribute blocks between processors in such a way that the work is assigned evenly.
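One way to turn (19) and (20) into split points is a binary search on the increasing function $T$. The Python sketch below is only an illustration: the paper does not specify how the split points were computed, the default weights $t_s = t_u = 1$ are placeholders rather than measured values, and the function names are invented for the example.

```python
from math import isqrt

def updates_estimate(a, n, I):
    """Closed-form approximation (20) of U(a), intended for 0 <= a <= sqrt(n / I)."""
    if a <= (n / I) ** 0.25:
        return I * a
    return n / (3 * a**3) - 2 * (n * I) ** 0.5 / a + (8 / 3) * n**0.25 * I**0.75

def work(a, n, I, t_s, t_u):
    """Estimated time T(a) of processing the block (0, a], formula (19)."""
    return t_s * a + t_u * updates_estimate(a, n, I)

def split_points(n, I, P, t_s=1.0, t_u=1.0):
    """Return [a_0 = 0, a_1, ..., a_P = D] so that each (a_p, a_{p+1}] gets about T(D) / P work."""
    D = isqrt(n // I)
    total = work(D, n, I, t_s, t_u)
    points = [0]
    for p in range(1, P):
        target = total * p / P
        lo, hi = points[-1], D
        while lo < hi:                    # smallest a with T(a) >= target
            mid = (lo + hi) // 2
            if work(mid, n, I, t_s, t_u) < target:
                lo = mid + 1
            else:
                hi = mid
        points.append(lo)
    points.append(D)
    return points
```

Because the first regime of (20) grows $I$ times faster than $a$ itself, the first intervals come out much shorter than the last ones, which reflects the non-uniform distribution of updates described above.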

6 Results

We calculated $S(10^e)$ for all integers $0 \le e \le 36$. The computed values are listed in App. D. First, for $e \le 26$ we prepared the results using Algorithm 1, the simpler and slower algorithm. Then we applied Algorithm 4 on a single thread. Thus we verified its correctness for $e \le 26$ and we prepared further values for $e \le 31$. Finally, we used the parallel implementation for $24 \le e \le 36$. The computations were performed at ICM UW under grant G43-5 on the cluster Halo2. See [1] for a specification. The results for $e \le 31$ agreed with the previously prepared results. The timings of these computations are presented in Table 1.

Computation time is the calendar time in seconds of cluster occupation. Ideal time represents how long the computations could take if communication between processors was ignored and if the work was distributed equally. This was calculated by taking the cumulative time of the actual work done by each processor and dividing it by the number of processors. We see that the ideal time is close to the computation time, which gives experimental evidence of the scalability of the parallel algorithm.

 e   processors used   computation time   ideal time
24          16                  51             40
25          16                 124            107
26          16                 279            266
27          16                 769            720
28          16                1928           1863
29          32                2594           2446
30          64                3439           3317
31          64                9157           8912
32         128               12138          11771
33         256               18112          16325
34         256               46540          43751
35         256              119749         115448
36         256              315313         303726

Table 1. Computation times in seconds of $S(10^e)$ for $24 \le e \le 36$

References

1. Halo2 cluster on ICM UW, http://www.icm.edu.pl/kdm/Halo2

2. Deléglise, M., Rivat, J.: Computing the summation of the Möbius function. Experimental Mathematics 5(4), 291–295 (1996)

3. Jia, C.H.: The distribution of square-free numbers. Science in China Series A: Mathematics 36(2), 154–169 (1993)

4. Michon, G.P.: On the number of square-free integers not exceeding n (May 2008), http://www.numericana.com/answer/counting.htm#euler193

5. Pawlewicz, J., Pătraşcu, M.: Order statistics in the Farey sequences in sublinear time and counting primitive lattice points in polygons. Algorithmica 55(2), 271–282 (2009)

6. Sloane, N.J.A.: Sequence A071172, the number of square-free integers $\le 10^n$, http://oeis.org/A071172

A Proof of Theorem 1

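Let our universe be all positive integers less than or equal to $n$:

\[ U = \{1, \ldots, n\}. \tag{21} \]

For a prime integer $p$, let us define a set $A_p$:

\[ A_p = \{ a \in U : p^2 \text{ divides } a \}. \tag{22} \]

The complement $\overline{A_p}$ of the set $A_p$ is the set of all integers less than or equal to $n$ not divisible by $p^2$. We want to count integers not divisible by any prime square, therefore the number we are searching for is the size of the set $\bigcap_{p \text{ prime}} \overline{A_p}$. By the inclusion-exclusion principle we have:

\begin{align*}
\Bigl| \bigcap_{p \text{ prime}} \overline{A_p} \Bigr| &= |U| - \sum_{p \text{ prime}} |A_p| + \sum_{\substack{p < q \\ p, q \text{ prime}}} |A_p \cap A_q| - \sum_{\substack{p < q < r \\ p, q, r \text{ prime}}} |A_p \cap A_q \cap A_r| + \ldots \\
&= \sum_{i=0}^{\infty} (-1)^i \sum_{\substack{p_1 < \cdots < p_i \\ p_1, \ldots, p_i \text{ prime}}} |A_{p_1} \cap \cdots \cap A_{p_i}| \tag{23}
\end{align*}

Now, observe that

\[ |A_{p_1} \cap \cdots \cap A_{p_i}| = \left\lfloor \frac{n}{p_1^2 \cdot \ldots \cdot p_i^2} \right\rfloor. \]

Using the Iverson bracket we can write (23) as

\[ (23) = \sum_{d} \sum_{i} [\, d = p_1 \cdot \ldots \cdot p_i \wedge p_1 < \cdots < p_i \wedge p_1, \ldots, p_i \text{ prime} \,]\, (-1)^i \left\lfloor \frac{n}{d^2} \right\rfloor. \]

The expression $\sum_i [\, d \text{ is a product of } i \text{ distinct primes} \,]\, (-1)^i$ is, in other words, the Möbius function $\mu(d)$, therefore we get the final formula of Theorem 1.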

B Computing the Möbius Function

To compute values of the Möbius function we exploit the following property:

\[ \mu(k) = \begin{cases} 0 & \text{if } p^2 \text{ divides } k \text{ for some prime } p, \\ (-1)^e & \text{if } k = p_1 \cdot \ldots \cdot p_e \text{ for distinct primes } p_1, \ldots, p_e. \end{cases} \tag{24} \]

Using a sieve we can find values of $\mu(k)$ for all $k = 1, \ldots, K$ simultaneously, where $K = \lfloor\sqrt{n}\rfloor$, as presented in Algorithm 5. To generate all primes less than or equal to $K$ we can use the sieve of Eratosthenes. The memory complexity is $O(K)$ and the time complexity is $O(K \log\log K)$.

The above method can be improved to fit in $O(\sqrt{K})$ memory by tabulating in blocks. We split the array mu into blocks of size $B = \Theta(\sqrt{K})$, and for each block we tabulate $\mu(\cdot)$ separately using Algorithm 6. For each block we use only primes less than or equal to $\sqrt{K}$, and we need only $O(\sqrt{K})$ memory. There are at most $K/B = O(\sqrt{K})$ blocks. For each block the number of operations is

\[ O(\sqrt{K}) + \sum_{p \le \sqrt{K}} \left( 1 + \frac{B}{p^2} + \frac{B}{p} \right) = O(\sqrt{K} + B \log\log K) = O(\sqrt{K} \log\log K), \tag{25} \]

which results in $O(K \log\log K)$ time complexity for the whole algorithm.

Algorithm 5 Computing values of the Möbius function: the basic approach
Require: bound $1 \le K$
Ensure: $\mu(k) = \mathrm{mu}[k]$ for $k = 1, \ldots, K$
1: procedure TabulateMöbius($K$)
2:     for $k = 1, \ldots, K$ do
3:         mu[$k$] $\leftarrow 1$
4:     end for
5:     for each prime $p \le K$ do
6:         for each $k \in [1, K]$ divisible by $p^2$ do
7:             mu[$k$] $\leftarrow 0$
8:         end for
9:         for each $k \in [1, K]$ divisible by $p$ do
10:             mu[$k$] $\leftarrow -$mu[$k$]
11:         end for
12:     end for
13: end procedure

C Estimating U(a)

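Instead of computing the exact number of updates in a block $(0, a]$, we will compute an approximation of the expected number of updates as follows. Let us fix $k \in (0, a]$ and $x = x_i$ for some $i \in [1, I)$. The probability that there exists $d$ such that (14) is satisfied equals

\[ P(k, x) = \begin{cases} 1 & \text{for } k \le \sqrt{x}, \\[2pt] \dfrac{x}{k} - \dfrac{x}{k+1} & \text{for } k > \sqrt{x}. \end{cases} \]

Let us define $U(a, x_i)$ as the expected number of updates of the entry Mx[$i$] for all $k \in (0, a]$. Let $x = x_i$. If $a \le \sqrt{x}$ then

\[ U(a, x) = \sum_{k=1}^{a} P(k, x) = \sum_{k=1}^{a} 1 = a, \]

and if $a > \sqrt{x}$ then

\[ U(a, x) = \sum_{1 \le k \le \sqrt{x}} 1 + \sum_{\sqrt{x} < k \le a} \left( \frac{x}{k} - \frac{x}{k+1} \right) = \lfloor\sqrt{x}\rfloor + \frac{x}{\lfloor\sqrt{x}\rfloor} - \frac{x}{a} \approx 2\sqrt{x} - \frac{x}{a}, \]

thus $U(a, x)$ can be presented as the formula:

\[ U(a, x) = \begin{cases} a & \text{for } a \le \sqrt{x}, \\[2pt] 2\sqrt{x} - \dfrac{x}{a} & \text{for } a > \sqrt{x}. \end{cases} \tag{26} \]

Now we are ready to compute $U(a)$:

\[ U(a) = \sum_{1 \le i < I} U(a, x_i). \tag{27} \]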

Algorithm 6 Computing values of the Möbius function: memory efficient sieving in blocks
Require: bounds $0 \le a < b$
Ensure: $\mu(k) = \mathrm{mu}[k]$ for each $k \in (a, b]$
1: procedure TabulateMöbiusBlock($a$, $b$)
2:     for each $k \in (a, b]$ do
3:         mu[$k$] $\leftarrow 1$
4:         m[$k$] $\leftarrow 1$    ▷ product of all found prime divisors of $k$
5:     end for
6:     for each prime $p \le \sqrt{b}$ do
7:         for each $k \in (a, b]$ divisible by $p^2$ do
8:             mu[$k$] $\leftarrow 0$
9:         end for
10:         for each $k \in (a, b]$ divisible by $p$ do
11:             mu[$k$] $\leftarrow -$mu[$k$]
12:             m[$k$] $\leftarrow$ m[$k$] $\cdot\ p$
13:         end for
14:     end for
15:     for each $k \in (a, b]$ do
16:         if m[$k$] $< k$ then    ▷ $k = \mathrm{m}[k] \cdot q$, where $q$ is prime and $q > \sqrt{b}$
17:             mu[$k$] $\leftarrow -$mu[$k$]
18:         end if
19:     end for
20: end procedure
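A direct Python transcription of Algorithm 6 might look as follows. It is a sketch for experimentation: the result is returned as a list indexed by $k - a - 1$ instead of a global array, the primes up to $\sqrt{b}$ are regenerated on every call (a real implementation would share them between blocks), and `primes_up_to` and `mobius_naive` are helper names introduced for the example.

```python
from math import isqrt

def primes_up_to(limit):
    """All primes <= limit by the sieve of Eratosthenes."""
    flags = [True] * (limit + 1)
    primes = []
    for p in range(2, limit + 1):
        if flags[p]:
            primes.append(p)
            for q in range(p * p, limit + 1, p):
                flags[q] = False
    return primes

def tabulate_mobius_block(a, b):
    """Return a list mu with mu[k - a - 1] = mu(k) for k in (a, b], 0 <= a < b."""
    mu = [1] * (b - a)
    m = [1] * (b - a)                     # product of the prime divisors found so far
    for p in primes_up_to(isqrt(b)):
        for k in range((a // (p * p) + 1) * p * p, b + 1, p * p):
            mu[k - a - 1] = 0             # divisible by p^2
        for k in range((a // p + 1) * p, b + 1, p):
            mu[k - a - 1] = -mu[k - a - 1]
            m[k - a - 1] *= p
    for k in range(a + 1, b + 1):
        if m[k - a - 1] < k:              # a prime q > sqrt(b) is left, or mu(k) is already 0
            mu[k - a - 1] = -mu[k - a - 1]
    return mu

def mobius_naive(k):
    """Möbius function by trial division, used only as a cross-check."""
    result, p = 1, 2
    while p * p <= k:
        if k % p == 0:
            k //= p
            if k % p == 0:
                return 0
            result = -result
        p += 1
    return -result if k > 1 else result
```

Comparing `tabulate_mobius_block(a, b)` with `[mobius_naive(k) for k in range(a + 1, b + 1)]` on a few blocks is a quick way to check the sieve, including blocks that do not start at zero.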

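Expanding a term $U(a, x_i)$ using (26) depends on the inequality:

\[ a \le \sqrt{x_i} \iff a \le \sqrt{\sqrt{\frac{n}{i}}} \iff a \le \sqrt[4]{\frac{n}{i}} \iff i \le \frac{n}{a^4}. \]

Therefore $U(a, x_i)$ always expands to $a$ if $a \le \sqrt[4]{n/I}$, so then $U(a) = Ia$, and this is the first case of (20). Otherwise, if $a > \sqrt[4]{n/I}$, we split the summation (27):

\begin{align*}
\sum_{1 \le i < I} U(a, x_i) &= \sum_{1 \le i \le n/a^4} U(a, x_i) + \sum_{n/a^4 < i < I} U(a, x_i) \\
&= \frac{n}{a^4}\, a + \sum_{n/a^4 < i < I} \left( 2 \sqrt[4]{\frac{n}{i}} - \frac{1}{a} \sqrt{\frac{n}{i}} \right).
\end{align*}

Now, we apply the following approximation formulas for sums:

\[ \sum_{k=1}^{x} k^{-1/4} \approx \frac{4}{3}\, x^{3/4}, \qquad \sum_{k=1}^{x} k^{-1/2} \approx 2\, x^{1/2}, \]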

  e    S(10^e)
  0    1
  1    7
  2    61
  3    608
  4    6083
  5    60794
  6    607926
  7    6079291
  8    60792694
  9    607927124
 10    6079270942
 11    60792710280
 12    607927102274
 13    6079271018294
 14    60792710185947
 15    607927101854103
 16    6079271018540405
 17    60792710185403794
 18    607927101854022750
 19    6079271018540280875
 20    60792710185402613302
 21    607927101854026645617
 22    6079271018540266153468
 23    60792710185402662868753
 24    607927101854026628773299
 25    6079271018540266286424910
 26    60792710185402662866945299
 27    607927101854026628664226541
 28    6079271018540266286631251028
 29    60792710185402662866327383816
 30    607927101854026628663278087296
 31    6079271018540266286632795633943
 32    60792710185402662866327694188957
 33    607927101854026628663276901540346
 34    6079271018540266286632767883637220
 35    60792710185402662866327677953999263
 36    607927101854026628663276779463775476
6/π²   0.60792710185402662866327677925836583342615264 ...

Table 2. Values of $S(10^e)$ for $0 \le e \le 36$