arXiv:1107.4890v1 [math.NT] 25 Jul 2011
Counting Square-Free Numbers
Jakub Pawlewicz
Institute of Informatics
University of Warsaw
pan@mimuw.edu.pl
Abstract. The main topic of this contribution is the problem of counting square-free numbers not exceeding $n$. Before this work we were able to do it in time¹ $\tilde O(\sqrt{n})$. Here, an algorithm with time complexity $\tilde O(n^{2/5})$ and with memory complexity $\tilde O(n^{1/5})$ is presented. Additionally, a parallel version is shown, which achieves full scalability.
As of now the highest computed value was for $n = 10^{17}$. Using our implementation we were able to calculate the value for $n = 10^{36}$ on a cluster.
Keywords: square-free number, Möbius function, Mertens function
1 Introduction
A square-free number is an integer which is not divisible by the square of any integer greater than one. Let $S(n)$ denote the number of square-free positive integers less than or equal to $n$. We can approximate the value of $S(n)$ using the asymptotic equation:
\[ S(n) = \frac{6}{\pi^2}\,n + O(\sqrt{n}). \]
Under the assumption of the Riemann hypothesis the error term can be further reduced [3]:
\[ S(n) = \frac{6}{\pi^2}\,n + O(n^{17/54+\varepsilon}). \]
Although these asymptotic equations allow us to compute $S(n)$ with high accuracy, they do not help to compute the exact value.
The basic observation behind the efficient algorithms is the following formula.

Theorem 1.
\[ S(n) = \sum_{d=1}^{\lfloor\sqrt{n}\rfloor} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor, \tag{1} \]
where $\mu(d)$ is the Möbius function.

¹ Compared to the Big-O notation, Soft-O ($\tilde O$) ignores logarithmic factors.
The simple proof of this theorem using the inclusion-exclusion principle is presented in App. A. The same proof can be found in [4]. It allows the author of [4] to develop an $\tilde O(\sqrt{n})$ algorithm and to compute $S(10^{17})$. In Sect. 2 we show the details of this algorithm together with a reduction of the memory complexity to $O(\sqrt[4]{n})$.
To construct a faster algorithm we have to play with the summation (1). In Sect. 3.1 the new formula is derived and stated in Theorem 2. Using this theorem we are able to construct an algorithm working in time $\tilde O(n^{2/5})$. It is described in the rest of Sect. 3. However, to achieve a memory-efficient procedure more research is required. The memory reduction problem is discussed in Sect. 4, where the modifications leading to the memory complexity $\tilde O(n^{1/5})$ are presented. The result is put into Algorithm 4.
Applying Algorithm 4 to huge $n$ leads to computing time measured in years. Therefore, a practical algorithm should be distributed. Section 5 addresses the parallelization problem. At first sight it looks as if Algorithm 4 can be easily distributed, but a deeper analysis uncovers new problems. We present a solution to these problems and get a fully scalable method. As practical evidence, we computed $S(10^e)$ for all integers $e \le 36$, whereas before, the largest known value of $S(n)$ was for $n = 10^{17}$ [4,6]. For instance, the value $S(10^{36})$ was computed in 88 hours using 256 processors. The detailed computation results are attached in Sect. 6.
2 The $\tilde O(\sqrt{n})$ Algorithm
We simply use Theorem 1 to compute $S(n)$. In order to compute the summation (1) we need to find the values of $\mu(d)$ for $d = 1,\ldots,K$, where $K = \lfloor\sqrt{n}\rfloor$. This can be done in time $O(K \log\log K)$ and in memory $O(\sqrt{K})$ using a sieve similar to the sieve of Eratosthenes [2]. See App. B for a detailed description. This sieving algorithm tabulates values in blocks of size $B = \lfloor\sqrt{K}\rfloor$. We assume we have a function TabulateMöbiusBlock such that the call TabulateMöbiusBlock$(a, b)$ outputs the array mu containing the values of the Möbius function: $\mu(k) = \texttt{mu}[k]$ for each $k \in (a, b]$. This function works in time $O(b \log\log b)$ and in memory $O(\max(\sqrt{b}, b - a))$. Now, to calculate $S(n)$, we split the interval $[1, K]$ into $O(\sqrt{K})$ blocks of size $B$, as presented in Algorithm 1.
Algorithm 1 Calculating S(n) in time Õ(√n) and in memory O(n^{1/4})
 1: s ← 0, b ← 0, K ← ⌊√n⌋, B ← ⌊√K⌋
 2: repeat
 3:   a ← b, b ← min(b + B, K)
 4:   TabulateMöbiusBlock(a, b)
 5:   for k = a + 1, ..., b do
 6:     s ← s + mu[k] · ⌊n/k²⌋
 7:   end for
 8: until a ≥ K
 9: return s

Summarizing, the basic algorithm has $\tilde O(\sqrt{n})$ time complexity and $O(\sqrt[4]{n})$ memory complexity.
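A minimal Python sketch of Algorithm 1 may make the procedure concrete. For brevity it substitutes a plain, non-blocked Möbius sieve for TabulateMöbiusBlock, so its memory is $O(\sqrt{n})$ rather than $O(\sqrt[4]{n})$; the blocked sieve of App. B would restore the stated bound. All function names here are ours, not the paper's.

```python
import math

def mobius_sieve(limit):
    """Tabulate mu(1..limit) with an Eratosthenes-style sieve (cf. App. B)."""
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for k in range(p, limit + 1, p):
                if k > p:
                    is_prime[k] = False
                mu[k] = -mu[k]          # one more prime divisor: flip the sign
            for k in range(p * p, limit + 1, p * p):
                mu[k] = 0               # p^2 divides k, so mu(k) = 0
    return mu

def count_squarefree_basic(n):
    """S(n) by formula (1), the O~(sqrt(n))-time approach of Sect. 2."""
    K = math.isqrt(n)
    mu = mobius_sieve(K)
    return sum(mu[d] * (n // (d * d)) for d in range(1, K + 1))

# count_squarefree_basic(10**6) == 607926 (cf. Table 2)
```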
3 The New Algorithm

The key point in discovering a faster algorithm is the derivation of a new formula from (1) in Sect. 3.1. The new formula depends on the Mertens function (the summatory function of the Möbius function). Section 3.2 explains how one may compute the needed values. Section 3.3 states the algorithm. In Sect. 3.4 the optimal values of the algorithm parameters are estimated, and the resulting time complexity of $\tilde O(n^{2/5})$ is derived.
3.1 Establishing the New Formula

To alter (1) we break the sum. We split the summation range $[1, \lfloor\sqrt{n}\rfloor]$ into two smaller intervals $[1, D]$ and $(D, \lfloor\sqrt{n}\rfloor]$:
\[ S(n) = S_1(n) + S_2(n), \]
where
\[ S_1(n) = \sum_{1 \le d \le D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor, \qquad S_2(n) = \sum_{d > D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor. \]
We have introduced a new variable $D$; its optimal value will be determined later. The sum $S_2(n)$ can be rewritten using Iverson's convention²:
\[ S_2(n) = \sum_{d > D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor = \sum_{d > D} \sum_i \left[ i = \left\lfloor \frac{n}{d^2} \right\rfloor \right] i\,\mu(d). \tag{2} \]
The predicate in brackets transforms as follows:
\[ i = \left\lfloor \frac{n}{d^2} \right\rfloor \iff i \le \frac{n}{d^2} < i + 1 \iff \sqrt{\frac{n}{i+1}} < d \le \sqrt{\frac{n}{i}}. \]
To shorten the notation we introduce a new variable $I$ and a new sequence $x_i$:
\[ x_i = \sqrt{\frac{n}{i}} \quad \text{for } i = 1,\ldots,I. \tag{3} \]

² $[P] = 1$ if $P$ is true, and $0$ otherwise.
The sequence $x_i$ should be strictly decreasing. To ensure this, it is enough to choose $I$ such that
\[ \sqrt{\frac{n}{I-1}} - \sqrt{\frac{n}{I}} \ge 1, \]
\[ \sqrt{n} \ge \frac{\sqrt{I}\,\sqrt{I-1}}{\sqrt{I} - \sqrt{I-1}}, \]
\[ \sqrt{n} \ge \sqrt{I}\,\sqrt{I-1}\,(\sqrt{I} + \sqrt{I-1}), \]
\[ n \ge I(I-1)(\sqrt{I} + \sqrt{I-1})^2. \tag{4} \]
Because
\[ I(I-1)(\sqrt{I} + \sqrt{I-1})^2 < I \cdot I \cdot (2\sqrt{I})^2 = 4I^3, \]
to satisfy (4) it is enough to set
\[ I \le \sqrt[3]{\frac{n}{4}}. \tag{5} \]
Suppose we set $I$ satisfying (5). Now we take $D = x_I$ and use the $x_i$ notation (3) in (2):
\[ S_2(n) = \sum_{d > x_I} \sum_i [x_{i+1} < d \le x_i]\, i\,\mu(d) = \sum_{1 \le i < I} i \sum_{x_{i+1} < d \le x_i} \mu(d). \tag{6} \]
Finally, it is convenient to use the Mertens function:
\[ M(x) = \sum_{1 \le i \le x} \mu(i) = \sum_{i=1}^{\lfloor x \rfloor} \mu(i), \tag{7} \]
thus we simplify (6) to:
\[ S_2(n) = \sum_{1 \le i < I} i \left( M(x_i) - M(x_{i+1}) \right) = \left( \sum_{1 \le i < I} M(x_i) \right) - (I-1)\,M(x_I). \tag{8} \]
Theorem 2 summarizes the above analysis.

Theorem 2. Let $I$ be a positive integer satisfying $I \le \sqrt[3]{n/4}$. Let $x_i = \sqrt{n/i}$ for $i = 1,\ldots,I$ and $D = x_I$. Then $S(n) = S_1(n) + S_2(n)$, where
\[ S_1(n) = \sum_{d=1}^{D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor, \qquad S_2(n) = \left( \sum_{i=1}^{I-1} M(x_i) \right) - (I-1)\,M(x_I). \]
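Theorem 2 is easy to check numerically. The sketch below evaluates $S(n)$ by the theorem with deliberately naive helpers (mobius_naive and mertens_naive are our illustrative names, not the efficient routines of this paper); condition (5) is enforced in the exact integer form $4I^3 \le n$.

```python
import math

def mobius_naive(d):
    """mu(d) by trial division, using property (24) from App. B."""
    result = 1
    p = 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0            # p^2 divides the original d
            result = -result
        p += 1
    if d > 1:
        result = -result            # one remaining prime factor
    return result

def mertens_naive(x):
    """M(x) by definition (7)."""
    return sum(mobius_naive(d) for d in range(1, x + 1))

def s_by_theorem2(n, I):
    """Evaluate S(n) via Theorem 2 for any I with 4*I^3 <= n (condition (5))."""
    assert I >= 1 and 4 * I**3 <= n
    x = [math.isqrt(n // i) for i in range(1, I + 1)]   # x[i-1] = floor(x_i)
    D = x[I - 1]
    s1 = sum(mobius_naive(d) * (n // (d * d)) for d in range(1, D + 1))
    s2 = sum(mertens_naive(xi) for xi in x[:I - 1]) - (I - 1) * mertens_naive(D)
    return s1 + s2

# s_by_theorem2(10**6, 10) == 607926, matching Table 2
```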
3.2 Computing Values of the Mertens Function

By applying the Möbius inversion formula to (7) we can get a nice recursion for the Mertens function:
\[ M(x) = 1 - \sum_{d \ge 2} M\!\left( \left\lfloor \frac{x}{d} \right\rfloor \right). \tag{9} \]
Here, an important observation is that having all values $M(\lfloor x/d \rfloor)$ for $d \ge 2$, we are able to calculate $M(x)$ in time $O(\sqrt{x})$. This is because there are at most $2\sqrt{x}$ different integers of the form $\lfloor x/d \rfloor$, since $\lfloor x/d \rfloor < \sqrt{x}$ for $d > \sqrt{x}$.
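A direct way to exploit (9) together with the quotient-grouping observation is memoized recursion, sketched below. It reuses mobius_sieve from the Sect. 2 sketch for a table of small values; the cutoff 10**6 is an arbitrary illustrative choice, not a tuned parameter.

```python
from functools import lru_cache

CUTOFF = 10**6
_MU = mobius_sieve(CUTOFF)                     # from the sketch in Sect. 2
_M_SMALL = [0] * (CUTOFF + 1)
for _k in range(1, CUTOFF + 1):
    _M_SMALL[_k] = _M_SMALL[_k - 1] + _MU[_k]  # prefix sums give M(1..CUTOFF)

@lru_cache(maxsize=None)
def mertens(x):
    """M(x) via recursion (9), grouping the d with equal floor(x/d)."""
    if x <= CUTOFF:
        return _M_SMALL[x]
    total = 1
    d = 2
    while d <= x:
        q = x // d
        d_next = x // q + 1       # first d' with floor(x/d') < q
        total -= (d_next - d) * mertens(q)
        d = d_next
    return total
```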
3.3 The Algorithm

The simple algorithm exploiting the above ideas is presented in Algorithm 2.

Algorithm 2 Efficient counting of square-free numbers
 1: compute S₁(n) and M(d) for d = 1, ..., D
 2: for i = I − 1, ..., 1 do
 3:   compute M(x_i) by (9)
 4: end for
 5: compute S₂(n) by (8)
 6: return S₁(n) + S₂(n)

To compute $M(x_i)$ (line 3) we need the values $M(\lfloor x_i/d \rfloor)$ for $d \ge 2$. If $\lfloor x_i/d \rfloor \le D$ then $M(\lfloor x_i/d \rfloor)$ was determined during the computation of $S_1(n)$. If $\lfloor x_i/d \rfloor > D$ then observe that
\[ \left\lfloor \frac{x_i}{d} \right\rfloor = \left\lfloor \frac{\lfloor\sqrt{n/i}\rfloor}{d} \right\rfloor = \left\lfloor \frac{\sqrt{n/i}}{d} \right\rfloor = \left\lfloor \sqrt{\frac{n}{d^2 i}} \right\rfloor = x_{d^2 i}, \tag{10} \]
thus $M(x_i/d) = M(x_j)$ for $j = d^2 i$. Of course $j < I$, because otherwise $\sqrt{n/j} \le D$. Observe that it is important to compute the values $M(x_i)$ in order of decreasing $i$ (line 2).
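A runnable sketch of Algorithm 2 follows. It keeps $M(1),\ldots,M(D)$ in full, so its memory is $\tilde O(n^{2/5})$, exactly the issue discussed in Sect. 3.4 below; mobius_sieve is the one from the Sect. 2 sketch, and the simple choice $I \approx n^{1/5}$ drops the $(\log\log n)$ tuning factor of Theorem 3.

```python
import math

def count_squarefree(n):
    """S(n) via Theorem 2 and Algorithm 2 (time ~O(n^{2/5}), memory ~O(n^{2/5}))."""
    if n < 4:
        return n                                   # 1, 2, 3 are all square-free
    I = max(1, int(n ** 0.2))
    while 4 * I**3 > n:                            # enforce condition (5) exactly
        I -= 1
    x = [0] + [math.isqrt(n // i) for i in range(1, I + 1)]   # x[i] = floor(x_i)
    D = x[I]

    mu = mobius_sieve(D)                           # Sect. 2 sketch
    M = [0] * (D + 1)
    for d in range(1, D + 1):
        M[d] = M[d - 1] + mu[d]
    s1 = sum(mu[d] * (n // (d * d)) for d in range(1, D + 1))

    Mx = [0] * I                                   # Mx[i] will hold M(x_i), 1 <= i < I
    for i in range(I - 1, 0, -1):                  # decreasing i, as Algorithm 2 requires
        m, xi = 1, x[i]
        d = 2
        while d <= xi:                             # recursion (9), grouping quotients
            q = xi // d
            d_next = xi // q + 1
            if q <= D:
                m -= (d_next - d) * M[q]
            else:                                  # identity (10): M(q) = M(x_{d^2 i})
                m -= sum(Mx[dd * dd * i] for dd in range(d, d_next))
            d = d_next
        Mx[i] = m

    s2 = sum(Mx[1:I]) - (I - 1) * M[D]             # formula (8); M(x_I) = M(D)
    return s1 + s2

# count_squarefree(10**6) == 607926 (cf. Table 2)
```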
3.4 The Complexity

Let us estimate the time complexity of Algorithm 2. Computing $S_1(n)$ has complexity $O(D \log\log D)$.
Computing $M(x_i)$ takes $O(\sqrt{x_i})$ time. The entire for loop (lines 2–4) has time complexity:
\[ \sum_{i=1}^{I} O(\sqrt{x_i}) = \sum_{i=1}^{I} O\!\left( \sqrt{\sqrt{\frac{n}{i}}}\, \right) = O\!\left( \sqrt[4]{n} \sum_{i=1}^{I} \frac{1}{\sqrt[4]{i}} \right). \tag{11} \]
Using the asymptotic equality
\[ \sum_{i=1}^{I} \frac{1}{\sqrt[4]{i}} = \Theta(I^{3/4}), \]
(11) rewrites to
\[ O\!\left( n^{1/4} I^{3/4} \right). \]
The computation of $S_2(n)$ is dominated by the for loop. Summarizing, the time complexity of Algorithm 2 is
\[ O\!\left( D \log\log D + n^{1/4} I^{3/4} \right). \tag{12} \]
We have to tune the selection of $I$ and $D$ to minimize the expression (12). The larger we take $I$, the smaller $D$ becomes, thus the parameters $I$ and $D$ are optimal when
\[ O(D \log\log D) = O\!\left( n^{1/4} I^{3/4} \right). \]
This takes place for
\[ I = n^{1/5} (\log\log n)^{4/5}, \]
and then
\[ O(D \log\log D) = O\!\left( n^{1/4} I^{3/4} \right) = O\!\left( n^{2/5} (\log\log n)^{3/5} \right). \]

Theorem 3. The time complexity of Algorithm 2 is $O(n^{2/5} (\log\log n)^{3/5}) = \tilde O(n^{2/5})$ for $I = n^{1/5} (\log\log n)^{4/5} = \tilde O(n^{1/5})$.
The bad news is the memory requirement. To compute the values $M(x_i)$ we need to remember $M(d)$ for all $d = 1,\ldots,D$, thus we need $O(D) = \tilde O(n^{2/5})$ memory. This is an even greater memory usage than in the basic algorithm. In the next section we show how to overcome this problem.
4 Reducing Memory

To reduce memory we have to process the values of the Möbius function in blocks, as described in Sect. 4.1. This affects the computation of the needed Mertens function values, which were previously obtained by the recursion (9); these values now have to be computed in a more organized manner. Section 4.2 provides the necessary utilities for that. Moreover, in Sect. 4.3 some data structures are introduced in order to achieve a satisfying time complexity. Finally, Sect. 4.4 states the algorithm together with a short complexity analysis.
4.1 Splitting into Blocks

We again apply the idea of splitting computations into smaller blocks. To compute $S_1(n)$ we need to determine $\mu(d)$ and $M(d)$ for $d = 1,\ldots,D$. We do it in blocks of size $B = \Theta(\sqrt{D})$ by calling the procedure TabulateMöbiusBlock. That way we are able to compute $S_1(n)$, but to compute $S_2(n)$ we face the following problem.
We need to compute $M(x_i)$ for integer $i \in [1, I)$. Previously, we memorized all the needed values $M(1),\ldots,M(D)$ and used the recursion (9). Now, we do not have unrestricted access to values of the Mertens function. After processing a block $(a, b]$ we only have access to the values $M(k)$ for $k \in (a, b]$. We have to utilize these values before we switch to the next block. If a value $M(k)$ occurs on the right-hand side of the recursion (9) for $x = x_i$ for some $i \in [1, I)$, then we should make an update.
The algorithm should look as follows. We start the algorithm by creating an array Mx:
\[ \texttt{Mx}[i] \leftarrow 1 \quad \text{for } i = 1,\ldots,I-1. \]
During the computation of $S_1(n)$ we determine $M(k)$ for successive $k$. Then, for every $i \in [1, I)$ such that $M(k)$ occurs in the sum
\[ \sum_{d \ge 2} M\!\left( \left\lfloor \frac{x_i}{d} \right\rfloor \right), \tag{13} \]
i.e. for every $i \in [1, I)$ such that there exists an integer $d \ge 2$ with
\[ k = \left\lfloor \frac{x_i}{d} \right\rfloor, \tag{14} \]
we determine the number of occurrences $m$ of $M(k)$ in (13) and update
\[ \texttt{Mx}[i] \leftarrow \texttt{Mx}[i] - m \cdot M(k). \tag{15} \]
After processing all $k = 1,\ldots,D$, it remains to update $\texttt{Mx}[i]$ by $M(x_i/d)$ for all $\lfloor x_i/d \rfloor > D$. With the help of equality (10) it is enough to update $\texttt{Mx}[i]$ by $M(x_{d^2 i})$ for all $d^2 i < I$. After these updates we will have $\texttt{Mx}[i] = M(x_i)$.
4.2 Dealing with Mx Array Updates

The problem is how, for a given $k$, to quickly find all possible values of $i$ such that there exists an integer $d \ge 2$ fulfilling (14). There is no simple way to do it in expected constant time. Instead, for a given $i$ we can easily calculate the successive values of $k$.

Lemma 1. Suppose that for a given integer $i \in [1, I)$ and an integer $k$ there exists an integer $d$ satisfying (14). Let us denote
\[ d_a = \left\lfloor \frac{x_i}{k} \right\rfloor, \qquad d_b = \left\lfloor \frac{x_i}{k+1} \right\rfloor; \]
then
(i) the number of occurrences $m$, needed for the update (15), equals $d_a - d_b$,
(ii) the next integer $k$ satisfying (14) is attained for $d = d_b$, and it is equal to $\lfloor x_i/d_b \rfloor$.

Proof. All possible integers $d$ satisfying (14) are characterized by:
\[ k \le \frac{x_i}{d} < k + 1 \iff \frac{x_i}{k+1} < d \le \frac{x_i}{k} \iff \left\lfloor \frac{x_i}{k+1} \right\rfloor < d \le \left\lfloor \frac{x_i}{k} \right\rfloor, \tag{16} \]
so (14) is satisfied exactly for $d \in (d_b, d_a]$, and the next $k$ satisfying (14) is attained for $d = d_b$. ⊓⊔

Lemma 1 allows us, for every $i$, to walk through the successive values of $k$ for which we have to update $\texttt{Mx}[i]$. Since the target is to reduce the memory usage, we need to group all updates into blocks. Algorithm 3 shows how to utilize Lemma 1 in order to update $\texttt{Mx}[i]$ for the entire block $(a, b]$.
Algorithm 3 Updating Mx[i] for a block (a, b]
Require: bounds 0 ≤ a < b, index i ∈ [1, I), the smallest k ∈ (a, b] for which there exists d satisfying (14)
Ensure: Mx[i] is updated by all M(k) for k ∈ (a, b]; the smallest k > b for the next update is returned
 1: function MxBlockUpdate(a, b, i, k)
 2:   d_a ← ⌊x_i/k⌋
 3:   repeat
 4:     d_b ← ⌊x_i/(k + 1)⌋
 5:     Mx[i] ← Mx[i] − (d_a − d_b) · M(k)
 6:     k ← ⌊x_i/d_b⌋
 7:     d_a ← d_b
 8:   until k > b
 9:   return k
10: end function
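In Python, Algorithm 3 might look as follows. Here x and Mx are the arrays of the surrounding discussion, and M_of is assumed to be a callable returning $M(k)$ for $k$ inside the current block; all three names are illustrative.

```python
def mx_block_update(a, b, i, k, x, Mx, M_of):
    """Sketch of MxBlockUpdate: apply to Mx[i] all updates with k in (a, b].

    Precondition: k is the smallest value in (a, b] admitting some d >= 2
    with (14). Returns the smallest k > b at which the next update occurs.
    """
    xi = x[i]
    da = xi // k
    while True:
        db = xi // (k + 1)
        Mx[i] -= (da - db) * M_of(k)   # M(k) occurs (da - db) times, Lemma 1(i)
        k = xi // db                   # next k satisfying (14), Lemma 1(ii)
        da = db
        if k > b:
            return k
```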
4.3 Introducing Additional Structures

Let $B = \lfloor\sqrt{D}\rfloor$ be the block size, and $L = \lceil D/B \rceil$ the number of blocks. We process the values $k$ in blocks $(a_0, a_1], (a_1, a_2], \ldots, (a_{L-1}, a_L]$, where $a_l = Bl$ for $0 \le l < L$ and $a_L = D$. We need additional structures to keep track, for every $i \in [1, I)$, of where the next update is:

– mink[i] stores the next smallest $k$ for which Mx[i] has to be updated,
– ilist[l] is a list of the indices $i$ for which the next update will be for $k$ belonging to the block $(a_l, a_{l+1}]$.

Using these structures we are able to perform every update in constant time. Once we have updated Mx[i] for all necessary $k \in (a_l, a_{l+1}]$ by MxBlockUpdate, we get the next $k > a_{l+1}$ for which an update should be done. We can easily calculate the block index $l'$ for this $k$ and schedule the update by putting $i$ into ilist[$l'$].
4.4 The Algorithm

The result of the entire above discussion is presented in Algorithm 4. We managed to preserve the number of operations, therefore the time complexity remains $\tilde O(n^{2/5})$. Each of the additional structures has $I = \tilde O(n^{1/5})$ or $L = O(\sqrt{D}) = \tilde O(n^{1/5})$ elements. Blocks have size $O(B) = O(\sqrt{D}) = \tilde O(n^{1/5})$. Therefore, the memory complexity of Algorithm 4 is $\tilde O(n^{1/5})$.

Algorithm 4 Calculating S(n) in time Õ(n^{2/5}) and in memory Õ(n^{1/5})
 1: I ← Θ(n^{1/5}(log log n)^{4/5}), D ← √(n/I), B ← ⌊√D⌋, L ← ⌈D/B⌉
 2: for l = 0, ..., L − 1 do
 3:   ilist[l] ← ∅
 4: end for
 5: for i = 1, ..., I − 1 do
 6:   Mx[i] ← 1
 7:   mink[i] ← 1
 8:   ilist[0] ← ilist[0] ∪ {i}
 9: end for
10: s₁ ← 0
11: for l = 0, ..., L − 1 do            ▷ blocks processing loop
12:   TabulateMöbiusBlock(a_l, a_{l+1})
13:   for k ∈ (a_l, a_{l+1}] do
14:     s₁ ← s₁ + mu[k] · ⌊n/k²⌋
15:   end for
16:   compute M(k) for k ∈ (a_l, a_{l+1}] from the values mu[k] and M(a_l)
17:   for each i ∈ ilist[l] do
18:     mink[i] ← MxBlockUpdate(a_l, a_{l+1}, i, mink[i])
19:     l′ ← ⌊mink[i]/B⌋                ▷ next block where Mx[i] has to be updated
20:     if l′ ≤ L and mink[i] < x_i then
21:       ilist[l′] ← ilist[l′] ∪ {i}
22:     end if
23:   end for
24:   ilist[l] ← ∅
25: end for
26: for i = I − 1, ..., 1 do            ▷ updating Mx[i] by M(k) for k > D
27:   for all d ≥ 2 such that d²i < I do
28:     Mx[i] ← Mx[i] − Mx[d²i]
29:   end for
30: end for
31: compute s₂ = S₂(n) by (8)
32: return s₁ + s₂
Observe that most of the work is done in the blocks processing loop (lines 11–25), because every other part of the algorithm takes at most $\tilde O(n^{1/5})$ operations. Initialization of the structures (lines 1–10) is proportional to their size, $\tilde O(n^{1/5})$. Computing $S_2(n)$ by (8) (line 31) takes $O(I) = \tilde O(n^{1/5})$. Only the time complexity of the part responsible for updating $\texttt{Mx}[i]$ by $M(k)$ for $k > D$ (lines 26–30) is unclear. The total number of updates in this part is:
\[ \sum_{i=1}^{I-1} \sum_{\substack{d \ge 2 \\ d^2 i < I}} 1 \;\le\; \sum_{i=1}^{I} \sqrt{\frac{I}{i}} = \sqrt{I} \cdot \sum_{i=1}^{I} \frac{1}{\sqrt{i}} = \sqrt{I} \cdot O(\sqrt{I}) = O(I), \]
thus it is $O(I) = \tilde O(n^{1/5})$.
5 Parallelization

As noted in Sect. 4.4, the most time-consuming part of Algorithm 4 is the blocks processing loop. The basic idea is to distribute the calculations made by this loop among $P$ processors. We split the interval $[1, D]$ into a list of $P$ smaller intervals: $(a_0, a_1], (a_1, a_2], \ldots, (a_{P-1}, a_P]$, where $0 = a_0 < a_1 < \cdots < a_P = D$. Processor number $p$, $0 \le p < P$, focuses only on the interval $(a_p, a_{p+1}]$, and it is responsible for

(i) calculating its part of the sum $S_1(n)$:
\[ \sum_{k \in (a_p, a_{p+1}]} \mu(k) \cdot \left\lfloor \frac{n}{k^2} \right\rfloor, \tag{17} \]
(ii) making the updates of the array $\texttt{Mx}[1,\ldots,I-1]$ for all $k \in (a_p, a_{p+1}]$.

All processors share the value $s_1$ and the Mx array. The only changes are additions of an integer, and these changes are required to be atomic. Alternatively, a processor can collect all changes in its own memory and, at the end, change the value $s_1$ and each entry of the Mx array only once.
Although the above approach is extremely simple, there are two drawbacks. First, for the updates (ii), a processor needs to calculate successive values of the Mertens function: $M(a_p + 1),\ldots,M(a_{p+1})$. The computation of (17) produces successive values of the Möbius function starting from $\mu(a_p + 1)$, therefore the Mertens function values could also be computed if only we knew the value of $M(a_p)$. Unfortunately, there is no other way than computing it from scratch. However, there is an algorithm computing $M(x)$ in time $\tilde O(x^{2/3})$ and memory $\tilde O(x^{1/3})$. See for instance [2], or [5] for a simpler algorithm missing the memory reduction.
In our application we have $x \le D = \tilde O(n^{2/5})$, therefore the cumulative additional time spent on computing Mertens function values from scratch is $\tilde O(P D^{2/3}) = \tilde O(P n^{4/15})$. We want this not to exceed the targeted time of $\tilde O(n^{2/5})$, therefore the number of processors is limited by:
\[ P = \tilde O\!\left( \frac{n^{2/5}}{n^{4/15}} \right) = \tilde O(n^{2/15}). \tag{18} \]
The second drawback comes from the observation that the number of updates of the Mx array is not uniformly distributed over $k \in [1, D]$. For example, for $k \le \sqrt{D} = \tilde O(n^{1/5})$ and for every $i \in [1, I)$ there always exists $d \ge 2$ such that (14) is satisfied, therefore for every such $k$ there will be $I - 1 = \tilde O(n^{1/5})$ updates. It means that in the very small block $(1, \lfloor\sqrt{D}\rfloor]$ there will be $\tilde O(n^{2/5})$ updates, which is proportional to the total number of updates. We see that the splitting into blocks is non-trivial and we need better tools for measuring the work in the blocks processing loop.
Let $t_s$ be the average time of computing a single summand of the sum $S_1(n)$, and let $t_u$ be the average time of a single update of an Mx array entry. Consider a block $(0, a]$. Denote by $U(a)$ the number of updates which must be done in this block. Then the expected time of processing this block is
\[ T(a) = t_s a + t_u U(a). \tag{19} \]
It turns out that $U(a)$ can be very accurately approximated by a closed formula:
\[ U(a) = \begin{cases} Ia & \text{for } a \le \sqrt[4]{n/I}, \\[4pt] \dfrac{1}{3}\dfrac{n}{a^3} - \dfrac{2 n^{1/2} I^{1/2}}{a} + \dfrac{8}{3}\, n^{1/4} I^{3/4} & \text{for } \sqrt[4]{n/I} < a \le D = \sqrt{n/I}. \end{cases} \tag{20} \]
See App. C for the estimation.
The work-measuring function (19) says that the amount of work for the block $(a_p, a_{p+1}]$ is $T(a_{p+1}) - T(a_p)$. Using this we are able to distribute the blocks between processors in such a way that the work is assigned evenly.
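As an illustration of how (19) and (20) can drive the assignment, the sketch below picks the boundaries $a_p$ by bisecting the increasing function $T(a)$ so that each processor receives roughly $T(D)/P$ of the total work. The bisection strategy and the default constants are our choices for the sketch; in practice $t_s$ and $t_u$ would be measured on the target machine.

```python
import math

def expected_updates(a, n, I):
    """U(a), formula (20): approximate number of Mx updates for k in (0, a]."""
    if a <= (n / I) ** 0.25:
        return I * a
    return n / (3 * a**3) - 2 * math.sqrt(n * I) / a + (8 / 3) * n**0.25 * I**0.75

def block_work(a, n, I, ts, tu):
    """T(a), formula (19): expected time to process the prefix block (0, a]."""
    return ts * a + tu * expected_updates(a, n, I)

def split_work(n, I, P, ts=1.0, tu=1.0):
    """Boundaries 0 = a_0 < ... < a_P = D giving each processor ~T(D)/P work."""
    D = math.isqrt(n // I)
    total = block_work(D, n, I, ts, tu)
    bounds = [0]
    for p in range(1, P):
        lo, hi = bounds[-1], D
        target = total * p / P
        while lo + 1 < hi:                 # bisection; T(a) is increasing in a
            mid = (lo + hi) // 2
            if block_work(mid, n, I, ts, tu) < target:
                lo = mid
            else:
                hi = mid
        bounds.append(hi)
    bounds.append(D)
    return bounds
```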
6 Results

We calculated $S(10^e)$ for all integers $0 \le e \le 36$. The computed values are listed in App. D. First, for $e \le 26$ we prepared the results using Algorithm 1, the simpler and slower algorithm. Then we applied Algorithm 4 on a single thread. Thus we verified its correctness for $e \le 26$ and we prepared further values for $e \le 31$. Finally, we used the parallel implementation for $24 \le e \le 36$. The computations were performed at ICM UW under grant G43-5 on the cluster Halo2; see [1] for a specification. The results for $e \le 31$ agreed with the previously prepared results. The timings of these computations are presented in Table 1.
Computation time is the calendar time in seconds of cluster occupation. Ideal time represents how long the computations would take if communication between processors were ignored and the work were distributed equally. It was calculated by taking the cumulative time of the actual work done by each processor and dividing it by the number of processors. We see that the ideal time is close to the computation time, giving experimental evidence of the scalability of the parallel algorithm.
 e   processors used   computation time   ideal time
24   16                    51                 40
25   16                   124                107
26   16                   279                266
27   16                   769                720
28   16                  1928               1863
29   32                  2594               2446
30   64                  3439               3317
31   64                  9157               8912
32   128                12138              11771
33   256                18112              16325
34   256                46540              43751
35   256               119749             115448
36   256               315313             303726

Table 1. Computation times in seconds of S(10^e) for 24 ≤ e ≤ 36

References

1. Halo2 cluster at ICM UW, http://www.icm.edu.pl/kdm/Halo2
2. Deléglise, M., Rivat, J.: Computing the summation of the Möbius function. Experimental Mathematics 5(4), 291–295 (1996)
3. Jia, C.H.: The distribution of square-free numbers. Science in China Series A: Mathematics 36(2), 154–169 (1993)
4. Michon, G.P.: On the number of square-free integers not exceeding n (May 2008), http://www.numericana.com/answer/counting.htm#euler193
5. Pawlewicz, J., Pătrașcu, M.: Order statistics in the Farey sequences in sublinear time and counting primitive lattice points in polygons. Algorithmica 55(2), 271–282 (2009)
6. Sloane, N.J.A.: Sequence A071172, the number of square-free integers ≤ 10^n, http://oeis.org/A071172
A Proof of Theorem 1

Let our universe be all positive integers less than or equal to $n$:
\[ U = \{1,\ldots,n\}. \tag{21} \]
For a prime integer $p$, let us define the set $A_p$:
\[ A_p = \{ a \in U : p^2 \text{ divides } a \}. \tag{22} \]
The complement $\overline{A_p}$ of the set $A_p$ represents the set of all integers less than or equal to $n$ not divisible by $p^2$. We want to count the integers not divisible by any prime square, therefore the number we are searching for is the size of the set $\bigcap_{p \text{ prime}} \overline{A_p}$. By the inclusion-exclusion principle we have:
\[ \left| \bigcap_{p \text{ prime}} \overline{A_p} \right| = |U| - \sum_{p \text{ prime}} |A_p| + \sum_{\substack{p < q \\ p,q \text{ prime}}} |A_p \cap A_q| - \sum_{\substack{p < q < r \\ p,q,r \text{ prime}}} |A_p \cap A_q \cap A_r| + \ldots = \sum_{i=0}^{\infty} (-1)^i \sum_{\substack{p_1 < \cdots < p_i \\ p_1,\ldots,p_i \text{ prime}}} |A_{p_1} \cap \cdots \cap A_{p_i}|. \tag{23} \]
Now, observe that
\[ |A_{p_1} \cap \cdots \cap A_{p_i}| = \left\lfloor \frac{n}{p_1^2 \cdot \ldots \cdot p_i^2} \right\rfloor. \]
Using the Iverson bracket we can write (23) as
\[ (23) = \sum_d \sum_i \left[\, d = p_1 \cdot \ldots \cdot p_i \;\wedge\; p_1 < \cdots < p_i \;\wedge\; p_1,\ldots,p_i \text{ prime} \,\right] (-1)^i \left\lfloor \frac{n}{d^2} \right\rfloor. \]
The expression $\sum_i [\, d \text{ is a product of } i \text{ distinct primes} \,](-1)^i$ is, in other words, the Möbius function $\mu(d)$, therefore we get the final formula of Theorem 1.
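For completeness, formula (1) is easy to test numerically against the definition; mobius_naive is the helper from the sketch following Theorem 2.

```python
import math

def s_by_theorem1(n):
    """S(n) by formula (1)."""
    return sum(mobius_naive(d) * (n // (d * d)) for d in range(1, math.isqrt(n) + 1))

def s_bruteforce(n):
    """Count the square-free 1..n directly from the definition."""
    return sum(1 for k in range(1, n + 1)
               if all(k % (p * p) for p in range(2, math.isqrt(k) + 1)))

assert all(s_by_theorem1(n) == s_bruteforce(n) for n in range(1, 2000))
```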
B Computing the Möbius Function

To compute values of the Möbius function we exploit the following property:
\[ \mu(k) = \begin{cases} 0 & \text{if } p^2 \text{ divides } k \text{ for some prime } p, \\ (-1)^e & \text{if } k = p_1 \cdot \ldots \cdot p_e \text{ for distinct primes } p_1,\ldots,p_e. \end{cases} \tag{24} \]
Using a sieve we can find the values of $\mu(k)$ for all $k = 1,\ldots,K$ simultaneously, where $K = \sqrt{n}$, as presented in Algorithm 5. To generate all primes less than or equal to $K$ we can use the sieve of Eratosthenes. The memory complexity is $O(K)$ and the time complexity is $O(K \log\log K)$.
The above method can be improved to fit in $O(\sqrt{K})$ memory by tabulating in blocks. We split the array mu into blocks of size $B = \Theta(\sqrt{K})$, and for each block we tabulate $\mu(\cdot)$ separately using Algorithm 6. For each block we use only the primes less than or equal to $\sqrt{K}$, so we need only $O(\sqrt{K})$ memory. There are at most $K/B = O(\sqrt{K})$ blocks. Therefore, for each block the number of operations is
\[ O(\sqrt{K}) + \sum_{p \le \sqrt{K}} \left( 1 + \frac{B}{p^2} + \frac{B}{p} \right) = O(\sqrt{K} + B \log\log K) = O(\sqrt{K} \log\log K), \tag{25} \]
which results in the $O(K \log\log K)$ time complexity for the whole algorithm.
Algorithm 5 Computing values of the Möbius function: the basic approach
Require: bound 1 ≤ K
Ensure: µ(k) = mu[k] for k = 1, ..., K
 1: procedure TabulateMöbius(K)
 2:   for k = 1, ..., K do
 3:     mu[k] ← 1
 4:   end for
 5:   for each prime p ≤ K do
 6:     for each k ∈ [1, K] divisible by p² do
 7:       mu[k] ← 0
 8:     end for
 9:     for each k ∈ [1, K] divisible by p do
10:       mu[k] ← −mu[k]
11:     end for
12:   end for
13: end procedure

Algorithm 6 Computing values of the Möbius function: memory-efficient sieving in blocks
Require: bounds 0 ≤ a < b
Ensure: µ(k) = mu[k] for each k ∈ (a, b]
 1: procedure TabulateMöbiusBlock(a, b)
 2:   for each k ∈ (a, b] do
 3:     mu[k] ← 1
 4:     m[k] ← 1                    ▷ product of the prime divisors of k found so far
 5:   end for
 6:   for each prime p ≤ √b do
 7:     for each k ∈ (a, b] divisible by p² do
 8:       mu[k] ← 0
 9:     end for
10:     for each k ∈ (a, b] divisible by p do
11:       mu[k] ← −mu[k]
12:       m[k] ← m[k] · p
13:     end for
14:   end for
15:   for each k ∈ (a, b] do
16:     if m[k] < k then            ▷ k = m[k] · q, where q is prime and q > √b
17:       mu[k] ← −mu[k]
18:     end if
19:   end for
20: end procedure
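A Python sketch of the blocked sieve follows; it is the mobius_block assumed by the Algorithm 4 sketch in Sect. 4.4. The returned list is indexed by $k - a$, and primes_up_to is an ordinary Eratosthenes sieve included for self-containment.

```python
import math

def primes_up_to(limit):
    """All primes <= limit by the sieve of Eratosthenes."""
    flags = [True] * (limit + 1)
    primes = []
    for p in range(2, limit + 1):
        if flags[p]:
            primes.append(p)
            for q in range(p * p, limit + 1, p):
                flags[q] = False
    return primes

def mobius_block(a, b):
    """mu(k) for k in (a, b], using only primes <= sqrt(b) (Algorithm 6)."""
    size = b - a
    mu = [1] * (size + 1)        # mu[k - a] holds mu(k); index 0 is unused
    m = [1] * (size + 1)         # product of distinct primes <= sqrt(b) dividing k
    for p in primes_up_to(math.isqrt(b)):
        p2 = p * p
        for k in range((a // p2 + 1) * p2, b + 1, p2):
            mu[k - a] = 0        # p^2 divides k
        for k in range((a // p + 1) * p, b + 1, p):
            mu[k - a] = -mu[k - a]
            m[k - a] *= p
    for k in range(a + 1, b + 1):
        if m[k - a] < k:         # k has one extra prime factor q > sqrt(b)
            mu[k - a] = -mu[k - a]
    return mu
```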
C Estimating U(a)

Instead of computing the exact number of updates in a block $(0, a]$, we will compute an approximation of the expected number of updates as follows. Let us fix $k \in (0, a]$ and $x = x_i$ for some $i \in [1, I)$. The probability that there exists $d$ such that (14) is satisfied equals
\[ P(k, x) = \begin{cases} 1 & \text{for } k \le \sqrt{x}, \\[2pt] \dfrac{x}{k} - \dfrac{x}{k+1} & \text{for } k > \sqrt{x}. \end{cases} \]
Let us define $U(a, x_i)$ as the expected number of updates of the entry $\texttt{Mx}[i]$ for all $k \in (0, a]$. Let $x = x_i$. If $a \le \sqrt{x}$ then
\[ U(a, x) = \sum_{k=1}^{a} P(k, x) = \sum_{k=1}^{a} 1 = a, \]
and if $a > \sqrt{x}$ then
\[ U(a, x) = \sum_{1 \le k \le \sqrt{x}} 1 + \sum_{\sqrt{x} < k \le a} \left( \frac{x}{k} - \frac{x}{k+1} \right) = \lfloor\sqrt{x}\rfloor + \frac{x}{\lfloor\sqrt{x}\rfloor} - \frac{x}{a} \approx 2\sqrt{x} - \frac{x}{a}, \]
thus $U(a, x)$ can be presented as the formula:
\[ U(a, x) = \begin{cases} a & \text{for } a \le \sqrt{x}, \\[2pt] 2\sqrt{x} - \dfrac{x}{a} & \text{for } a > \sqrt{x}. \end{cases} \tag{26} \]
Now we are ready to compute $U(a)$:
\[ U(a) = \sum_{1 \le i < I} U(a, x_i). \tag{27} \]
Expanding a term $U(a, x_i)$ using (26) depends on the inequality:
\[ a \le \sqrt{x_i} \iff a \le \sqrt{\sqrt{\frac{n}{i}}} \iff a \le \sqrt[4]{\frac{n}{i}} \iff i \le \frac{n}{a^4}. \]
Therefore $U(a, x_i)$ always expands to $a$ if $a \le \sqrt[4]{n/I}$, so then $U(a) = Ia$, and this is the first case of (20). Otherwise, if $a > \sqrt[4]{n/I}$, we split the summation (27):
\[ \sum_{1 \le i < I} U(a, x_i) = \sum_{1 \le i \le n/a^4} U(a, x_i) + \sum_{n/a^4 < i < I} U(a, x_i) = \frac{n}{a^4}\, a + \sum_{n/a^4 < i < I} \left( 2 \sqrt[4]{\frac{n}{i}} - \frac{1}{a} \sqrt{\frac{n}{i}} \right). \]
Now, we apply the following approximation formulas for sums:
\[ \sum_{k=1}^{x} k^{-1/4} \approx \frac{4}{3} x^{3/4}, \qquad \sum_{k=1}^{x} k^{-1/2} \approx 2 x^{1/2}; \]
substituting them into the split summation and simplifying yields the second case of (20).
D Computed Values

 e   S(10^e)
 0   1
 1   7
 2   61
 3   608
 4   6083
 5   60794
 6   607926
 7   6079291
 8   60792694
 9   607927124
10   6079270942
11   60792710280
12   607927102274
13   6079271018294
14   60792710185947
15   607927101854103
16   6079271018540405
17   60792710185403794
18   607927101854022750
19   6079271018540280875
20   60792710185402613302
21   607927101854026645617
22   6079271018540266153468
23   60792710185402662868753
24   607927101854026628773299
25   6079271018540266286424910
26   60792710185402662866945299
27   607927101854026628664226541
28   6079271018540266286631251028
29   60792710185402662866327383816
30   607927101854026628663278087296
31   6079271018540266286632795633943
32   60792710185402662866327694188957
33   607927101854026628663276901540346
34   6079271018540266286632767883637220
35   60792710185402662866327677953999263
36   607927101854026628663276779463775476

6/π² = 0.60792710185402662866327677925836583342615264...

Table 2. Values of S(10^e) for 0 ≤ e ≤ 36