

The main topic of this contribution is the problem of counting square-free numbers not exceeding $n$. Before this work we were able to do it in time $\tilde{O}(\sqrt{n})$ (compared to Big-O notation, Soft-O ($\tilde{O}$) ignores logarithmic factors). Here, an algorithm with time complexity $\tilde{O}(n^{2/5})$ and memory complexity $\tilde{O}(n^{1/5})$ is presented. Additionally, a parallel version is shown, which achieves full scalability. Until now, the largest $n$ for which the value had been computed was $n=10^{17}$. Using our implementation we were able to calculate the value for $n=10^{36}$ on a cluster.
arXiv:1107.4890v1 [math.NT] 25 Jul 2011
Counting Square-Free Numbers
Jakub Pawlewicz
Institute of Informatics
University of Warsaw
Keywords: square-free number, Möbius function, Mertens function
1 Introduction
A square-free number is an integer which is not divisible by the square of any integer greater than one. Let $S(n)$ denote the number of square-free positive integers less than or equal to $n$. We can approximate the value of $S(n)$ using the asymptotic equation:
$$S(n) = \frac{6}{\pi^2} n + O(\sqrt{n}).$$
Under the assumption of the Riemann hypothesis the error term can be further reduced [3]:
$$S(n) = \frac{6}{\pi^2} n + O(n^{17/54+\varepsilon}).$$
Although these asymptotic equations allow us to compute $S(n)$ with high accuracy, they do not help to compute the exact value.
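As a quick numerical illustration of the first asymptotic formula, the following Python sketch compares the leading term $\frac{6}{\pi^2} n$ with a few exact values copied from Table 2 (App. D). The function names are illustrative and not part of the paper, and double precision limits the comparison to moderate exponents.

```python
import math

# Exact values S(10^e), copied from Table 2 of this paper.
EXACT = {
    10: 6079270942,
    12: 607927102274,
    14: 60792710185947,
    16: 6079271018540405,
}

def leading_term(n: int) -> float:
    """Leading term 6/pi^2 * n of the asymptotic formula for S(n)."""
    return 6.0 / math.pi ** 2 * n

def approximation_errors() -> dict:
    """Map e -> S(10^e) - 6/pi^2 * 10^e for the tabulated exponents."""
    return {e: s - leading_term(10 ** e) for e, s in EXACT.items()}

for e, err in approximation_errors().items():
    print(f"e = {e}: error {err:+.1f}, sqrt(n) = {math.isqrt(10 ** e)}")
```

For each listed exponent the absolute error stays far below $\sqrt{n}$, in line with the error term of the formula.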
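The basic observation for efficient algorithms is the following formula.
Theorem 1.
$$S(n) = \sum_{d=1}^{\lfloor\sqrt{n}\rfloor} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor, \tag{1}$$
where $\mu(d)$ is the Möbius function.
Footnote 1. Compared to Big-O notation, Soft-O ($\tilde{O}$) ignores logarithmic factors.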
The simple proof of this theorem using the inclusion-exclusion principle is presented in App. A. The same proof can be found in [4]. It allowed the author of [4] to develop an $\tilde{O}(\sqrt{n})$ algorithm and to compute $S(10^{17})$. In Sect. 2 we show details of this algorithm together with the reduction of the memory complexity to $O(\sqrt[4]{n})$.

To construct a faster algorithm we have to rework the summation (1). In Sect. 3.1 the new formula is derived and stated in Theorem 2. Using this theorem we are able to construct an algorithm working in time $\tilde{O}(n^{2/5})$. It is described in the rest of Sect. 3. However, to achieve a memory-efficient procedure more research is required. The memory reduction problem is discussed in Sect. 4, where the modifications leading to the memory complexity $\tilde{O}(n^{1/5})$ are presented. The result is put into Algorithm 4.

Applying Algorithm 4 for huge $n$ leads to computing time measured in years. Therefore, a practical algorithm should be distributed. Section 5 addresses the parallelization problem. At first sight it looks as if Algorithm 4 can be easily distributed, but a deeper analysis uncovers new problems. We present a solution for these problems and obtain a fully scalable method. As practical evidence, we computed $S(10^e)$ for all integers $e \le 36$, whereas before, the largest known value of $S(n)$ was for $n = 10^{17}$ [4,6]. For instance, the value $S(10^{36})$ was computed in 88 hours using 256 processors. The detailed computation results are given in Sect. 6.
2 The $\tilde{O}(\sqrt{n})$ algorithm
We simply use Theorem 1 to compute $S(n)$. In order to compute summation (1) we need to find the values of $\mu(d)$ for $d = 1, \ldots, K$, where $K = \lfloor\sqrt{n}\rfloor$. This can be done in time $O(K \log\log K)$ and in memory $O(\sqrt{K})$ using a sieve similar to the sieve of Eratosthenes [2]. See App. B for a detailed description. This sieving algorithm tabulates values in blocks of size $B = \lfloor\sqrt{K}\rfloor$. We assume we have the function TabulateMöbiusBlock such that the call TabulateMöbiusBlock($a$, $b$) outputs the array mu containing the values of the Möbius function: $\mu(k)$ = mu[$k$] for each $k \in (a, b]$. This function works in time $O(\sqrt{b} \log\log b)$ for blocks of length $b - a = O(\sqrt{b})$ (see (25) in App. B) and in memory $O(\max(\sqrt{b}, b - a))$. Now, to calculate $S(n)$, we split the interval $[1, K]$ into $O(\sqrt{K})$ blocks of size $B$. The procedure is presented in Algorithm 1.
Summarizing, the basic algorithm has $\tilde{O}(\sqrt{n})$ time complexity and $O(\sqrt[4]{n})$ memory complexity.
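The blocked computation of this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the helper names `primes_up_to`, `tabulate_mobius_block` and `count_squarefree_basic` are hypothetical, and the block sieve follows the description of Algorithm 6 in App. B.

```python
from math import isqrt

def primes_up_to(limit: int) -> list:
    """All primes <= limit (plain sieve of Eratosthenes)."""
    sieve = [True] * (limit + 1)
    primes = []
    for p in range(2, limit + 1):
        if sieve[p]:
            primes.append(p)
            for q in range(p * p, limit + 1, p):
                sieve[q] = False
    return primes

def tabulate_mobius_block(a: int, b: int, primes: list) -> list:
    """Return mu with mu[k - a - 1] = mu(k) for k in (a, b].

    `primes` must contain every prime <= sqrt(b).
    """
    mu = [1] * (b - a)
    m = [1] * (b - a)                      # product of the small primes dividing k
    for p in primes:
        if p * p > b:
            break
        for k in range((a // (p * p) + 1) * p * p, b + 1, p * p):
            mu[k - a - 1] = 0              # p^2 divides k
        for k in range((a // p + 1) * p, b + 1, p):
            mu[k - a - 1] = -mu[k - a - 1]
            m[k - a - 1] *= p
    for j in range(b - a):
        if m[j] < a + 1 + j:               # one prime factor > sqrt(b) is left
            mu[j] = -mu[j]
    return mu

def count_squarefree_basic(n: int) -> int:
    """S(n) by summation (1), tabulating mu in blocks of size about n^(1/4)."""
    K = isqrt(n)
    B = max(1, isqrt(K))
    primes = primes_up_to(isqrt(K))
    s, b = 0, 0
    while b < K:
        a, b = b, min(b + B, K)
        mu = tabulate_mobius_block(a, b, primes)
        for k in range(a + 1, b + 1):
            s += mu[k - a - 1] * (n // (k * k))
    return s
```

Only one block of $\mu$ values and the primes up to $\sqrt{K}$ are alive at any time, which is where the $O(\sqrt[4]{n})$ memory bound comes from.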
Algorithm 1 Calculating S(n) in time Õ(√n) and in memory O(n^{1/4})
  s ← 0, b ← 0, K ← ⌊√n⌋, B ← ⌊√K⌋
  repeat
    a ← b, b ← min(b + B, K)
    TabulateMöbiusBlock(a, b)
    for k = a + 1, …, b do
      s ← s + mu[k] · ⌊n/k²⌋
    end for
  until b ≥ K
  return s

3 The New Algorithm
The key point of discovering a faster algorithm is the derivation of a new formula from (1) in Sect. 3.1. The new formula depends on the Mertens function (the Möbius summation function). Section 3.2 explains how one may compute the needed values. Section 3.3 states the algorithm. In Sect. 3.4 the optimal values of the algorithm parameters are estimated, and the resulting time complexity of $\tilde{O}(n^{2/5})$ is derived.
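3.1 Establishing the New Formula
To alter (1) we break the sum. We split the summation range $[1, \lfloor\sqrt{n}\rfloor]$ into two smaller intervals $[1, D]$ and $(D, \lfloor\sqrt{n}\rfloor]$:
$$S(n) = S_1(n) + S_2(n),$$
$$S_1(n) = \sum_{1 \le d \le D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor,$$
$$S_2(n) = \sum_{d > D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor.$$
We introduced a new variable $D$. The optimal value of this variable will be determined later. The sum $S_2(n)$ can be rewritten using Iverson's convention (footnote 2):
$$S_2(n) = \sum_{d > D} \sum_{i \ge 1} \left[ \left\lfloor \frac{n}{d^2} \right\rfloor = i \right] i\, \mu(d). \tag{2}$$
The predicate in brackets transforms as follows:
$$\left\lfloor \frac{n}{d^2} \right\rfloor = i \iff i \le \frac{n}{d^2} < i + 1 \iff \sqrt{\frac{n}{i+1}} < d \le \sqrt{\frac{n}{i}}.$$
To shorten the notation we introduce a new variable $I$ and a new sequence $x_i$:
$$x_i = \left\lfloor \sqrt{\frac{n}{i}} \right\rfloor \quad \text{for } i = 1, \ldots, I. \tag{3}$$
Footnote 2. $[P] = 1$ if $P$ is true, and $[P] = 0$ otherwise.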
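The sequence $x_i$ should be strictly decreasing. To ensure this, it is enough to set $I$ such that
$$\sqrt{\frac{n}{I-1}} - \sqrt{\frac{n}{I}} \ge 1. \tag{4}$$
Multiplying both sides by $\sqrt{I(I-1)}\,(\sqrt{I} + \sqrt{I-1})$ and squaring shows that (4) is equivalent to $n \ge I(I-1)(\sqrt{I} + \sqrt{I-1})^2$. Because
$$I(I-1)\left(\sqrt{I} + \sqrt{I-1}\right)^2 < I \cdot I \cdot (2\sqrt{I})^2 = 4I^3,$$
to satisfy (4), it is enough to set
$$I \le \sqrt[3]{\frac{n}{4}}. \tag{5}$$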
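Suppose we set $I$ satisfying (5). Now, we take $D = x_I$ and we use the $x_i$ notation (3) in (2):
$$S_2(n) = \sum_{d > D} \sum_{1 \le i < I} [x_{i+1} < d \le x_i]\, i\, \mu(d) = \sum_{1 \le i < I} i \sum_{x_{i+1} < d \le x_i} \mu(d). \tag{6}$$
Finally, it is convenient to use the Mertens function:
$$M(x) = \sum_{1 \le i \le x} \mu(i), \tag{7}$$
thus we simplify (6) to:
$$S_2(n) = \sum_{1 \le i < I} i \left( M(x_i) - M(x_{i+1}) \right) = \left( \sum_{1 \le i < I} M(x_i) \right) - (I-1)\, M(x_I). \tag{8}$$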
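Theorem 2 summarizes the above analysis.
Theorem 2. Let $I$ be a positive integer satisfying $I \le \sqrt[3]{n/4}$. Let $x_i = \lfloor\sqrt{n/i}\rfloor$ for $i = 1, \ldots, I$ and $D = x_I$. Then $S(n) = S_1(n) + S_2(n)$, where
$$S_1(n) = \sum_{d=1}^{D} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor,$$
$$S_2(n) = \left( \sum_{i=1}^{I-1} M(x_i) \right) - (I-1)\, M(x_I).$$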
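The last simplification, from the weighted differences of Mertens values to the closed form of $S_2(n)$ in Theorem 2, is a summation by parts. Written out, with the shorthand $M_i = M(x_i)$ introduced here only for brevity:

```latex
\begin{align*}
\sum_{i=1}^{I-1} i\,(M_i - M_{i+1})
  &= \sum_{i=1}^{I-1} i\,M_i - \sum_{i=2}^{I} (i-1)\,M_i \\
  &= M_1 + \sum_{i=2}^{I-1} \bigl(i - (i-1)\bigr)\,M_i - (I-1)\,M_I \\
  &= \sum_{i=1}^{I-1} M_i - (I-1)\,M_I .
\end{align*}
```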
3.2 Computing Values of the Mertens Function
By applying the Möbius inversion formula to (7) we can get a nice recursion for the Mertens function:
$$M(x) = 1 - \sum_{d \ge 2} M\!\left( \left\lfloor \frac{x}{d} \right\rfloor \right). \tag{9}$$
Here, an important observation is that having all values $M(\lfloor x/d \rfloor)$ for $d \ge 2$, we are able to calculate $M(x)$ in time $O(\sqrt{x})$. This is because there are at most $2\sqrt{x}$ different integers of the form $\lfloor x/d \rfloor$, since $\lfloor x/d \rfloor < \sqrt{x}$ for $d > \sqrt{x}$.
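A minimal Python sketch of the recursion for $M(x)$ described above, assuming memoization through `functools.lru_cache` (an implementation choice made here, not taken from the paper). Equal values of $\lfloor x/d \rfloor$ are grouped, so each call performs $O(\sqrt{x})$ arithmetic steps once the needed smaller values are cached.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def mertens(x: int) -> int:
    """M(x) via M(x) = 1 - sum over d >= 2 of M(floor(x / d))."""
    if x <= 0:
        return 0
    if x == 1:
        return 1
    total = 1
    d = 2
    while d <= x:
        q = x // d
        d_last = x // q                    # largest d' with floor(x / d') == q
        total -= (d_last - d + 1) * mertens(q)
        d = d_last + 1
    return total
```

Used on its own like this, the recursion recomputes every small value it needs; the algorithms in the paper instead take those small values from a sieve.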
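3.3 The Algorithm
The simple algorithm exploiting the above ideas is presented in Algorithm 2.

Algorithm 2 Efficient counting of square-free numbers
1: compute S1(n) and M(d) for d = 1, …, D
2: for i = I − 1, …, 1 do
3:   compute M(x_i) by (9)
4: end for
5: compute S2(n) by (8)
6: return S1(n) + S2(n)

To compute $M(x_i)$ (line 3) we need the values $M(\lfloor x_i/d \rfloor)$ for $d \ge 2$. If $\lfloor x_i/d \rfloor \le D$ then $M(\lfloor x_i/d \rfloor)$ was determined during the computation of $S_1(n)$. If $\lfloor x_i/d \rfloor > D$ then see that
$$\left\lfloor \frac{x_i}{d} \right\rfloor = \left\lfloor \frac{\lfloor \sqrt{n/i} \rfloor}{d} \right\rfloor = \left\lfloor \sqrt{\frac{n}{d^2 i}} \right\rfloor = x_{d^2 i}, \tag{10}$$
thus $M(\lfloor x_i/d \rfloor) = M(x_j)$ for $j = d^2 i$. Of course $j < I$, because otherwise $\lfloor\sqrt{n/j}\rfloor \le D$. Observe that it is important to compute $M(x_i)$ in decreasing order of $i$ (line 2).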
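The steps of Algorithm 2 can be sketched in Python as below. This is an illustration under assumptions, not the paper's code: the names `mobius_upto` and `count_squarefree_alg2` are hypothetical, the Möbius values come from a plain (unblocked) sieve, and the parameter `I` is passed in by the caller, who must respect the bound $4I^3 \le n$ of Theorem 2.

```python
from math import isqrt

def mobius_upto(K: int) -> list:
    """mu[k] = mu(k) for 1 <= k <= K by a simple sieve; mu[0] is unused."""
    mu = [1] * (K + 1)
    mu[0] = 0
    is_prime = [True] * (K + 1)
    for p in range(2, K + 1):
        if not is_prime[p]:
            continue
        for k in range(2 * p, K + 1, p):
            is_prime[k] = False
        for k in range(p * p, K + 1, p * p):
            mu[k] = 0
        for k in range(p, K + 1, p):
            mu[k] = -mu[k]
    return mu

def count_squarefree_alg2(n: int, I: int) -> int:
    """S(n) = S1(n) + S2(n) as in Theorem 2; needs I >= 1 and 4 * I**3 <= n."""
    if I < 1 or 4 * I ** 3 > n:
        raise ValueError("I must satisfy 1 <= I <= (n / 4) ** (1 / 3)")
    x = [0] + [isqrt(n // i) for i in range(1, I + 1)]   # x[i] = floor(sqrt(n / i))
    D = x[I]
    mu = mobius_upto(D)
    M = [0] * (D + 1)                      # M[d] = M(d) for d <= D
    s1 = 0
    for d in range(1, D + 1):
        M[d] = M[d - 1] + mu[d]
        s1 += mu[d] * (n // (d * d))
    Mx = [0] * (I + 1)                     # Mx[i] = M(x_i)
    Mx[I] = M[D]
    for i in range(I - 1, 0, -1):          # decreasing order of i
        total, d = 1, 2
        while d <= x[i]:
            q = x[i] // d
            if q <= D:
                d_last = x[i] // q         # all d in [d, d_last] share the value q
                total -= (d_last - d + 1) * M[q]
                d = d_last + 1
            else:
                total -= Mx[d * d * i]     # floor(x_i / d) = x_{d^2 i} by (10)
                d += 1
        Mx[i] = total
    s2 = sum(Mx[1:I]) - (I - 1) * Mx[I]
    return s1 + s2
```

For example, `count_squarefree_alg2(1000, 6)` uses $D = x_6 = 12$ and only sieves $\mu$ up to 12, while the remaining five Mertens values come from the recursion.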
3.4 The Complexity
Let us estimate the time complexity of Algorithm 2. Computing $S_1(n)$ has complexity $O(D \log\log D)$.
Computing $M(x_i)$ takes $O(\sqrt{x_i})$ time. The entire for loop (lines 2–4) has the time complexity:
$$\sum_{i=1}^{I-1} O(\sqrt{x_i}) = \sum_{i=1}^{I-1} O\!\left( \sqrt{\sqrt{\frac{n}{i}}} \right) = O\!\left( n^{1/4} \sum_{i=1}^{I} \frac{1}{i^{1/4}} \right). \tag{11}$$
Using the asymptotic equality
$$\sum_{i=1}^{I} \frac{1}{i^{1/4}} = \Theta(I^{3/4}),$$
(11) rewrites to:
$$O(n^{1/4} I^{3/4}).$$
The computation of $S_2(n)$ is dominated by the for loop. Summarizing, the time complexity of Algorithm 2 is
$$O(D \log\log D + n^{1/4} I^{3/4}). \tag{12}$$
We have to tune the selection of $I$ and $D$ to minimize the expression (12). The larger $I$ we take, the smaller $D$ will be, thus the parameters $I$ and $D$ are optimal when
$$O(D \log\log D) = O(n^{1/4} I^{3/4}).$$
This takes place for
$$I = n^{1/5} (\log\log n)^{4/5},$$
and then
$$O(D \log\log D) = O(n^{1/4} I^{3/4}) = O(n^{2/5} (\log\log n)^{3/5}).$$
Theorem 3. The time complexity of Algorithm 2 is $O(n^{2/5}(\log\log n)^{3/5}) = \tilde{O}(n^{2/5})$ for $I = n^{1/5}(\log\log n)^{4/5} = \tilde{O}(n^{1/5})$.
The bad news is the memory requirement. To compute the $M(x_i)$ values we need to remember $M(d)$ for all $d = 1, \ldots, D$, thus we need $O(D) = \tilde{O}(n^{2/5})$ memory. This is even greater memory usage than in the basic algorithm. In the next section we show how to overcome this problem.
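The parameter choice of this section can be turned into a few lines of Python. The sketch below is only an illustration: the name `choose_parameters` is hypothetical, the constant factor in front of $n^{1/5}(\log\log n)^{4/5}$ is taken to be 1 (the analysis fixes it only up to a constant), and floating point is used for the initial guess, so $n$ must fit in the range of a double.

```python
from math import isqrt, log

def choose_parameters(n: int) -> tuple:
    """Return (I, D) with I near n^(1/5) (log log n)^(4/5) and D = x_I."""
    if n < 4:
        raise ValueError("n >= 4 is needed so that I = 1 satisfies 4 * I**3 <= n")
    I = max(1, int(n ** 0.2 * log(log(n)) ** 0.8))
    while 4 * I ** 3 > n:                  # enforce I <= (n / 4)^(1/3) from Theorem 2
        I -= 1
    D = isqrt(n // I)                      # D = x_I = floor(sqrt(n / I))
    return I, D
```

The cap on `I` only matters for small `n`; for large `n` the value $n^{1/5}(\log\log n)^{4/5}$ is far below $\sqrt[3]{n/4}$.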
4 Reducing Memory
To reduce memory we have to process values of the Möbius function in blocks. This affects the computation of the needed Mertens function values, which were previously computed by the recursion (9), as described in Sect. 4.1. These values have to be computed in a more organized manner. Section 4.2 provides the necessary utilities for that. Moreover, in Sect. 4.3 some data structures are introduced in order to achieve a satisfactory time complexity. Finally, Sect. 4.4 states the algorithm together with a short complexity analysis.
4.1 Splitting into Blocks
We again apply the idea of splitting computations into smaller blocks. To compute $S_1(n)$ we need to determine $\mu(d)$ and $M(d)$ for $d = 1, \ldots, D$. We do it in blocks of size $B = \Theta(\sqrt{D})$ by calling the procedure TabulateMöbiusBlock. That way we are able to compute $S_1(n)$, but to compute $S_2(n)$ we face the following problem.
We need to compute $M(x_i)$ for integer $i \in [1, I)$. Previously, we memorized all needed values $M(1), \ldots, M(D)$ and used recursion (9). Now, we do not have unrestricted access to values of the Mertens function. After processing a block $(a, b]$ we have access only to the values $M(k)$ for $k \in (a, b]$. We have to utilize these values before we switch to the next block. If a value $M(k)$ occurs on the right-hand side of the recursion (9) for $x = x_i$ for some $i \in [1, I)$, then we should make an update.
The algorithm should look as follows. We start the algorithm by creating an array Mx:
$$\mathrm{Mx}[i] \leftarrow 1 \quad \text{for } i = 1, \ldots, I-1.$$
During the computation of $S_1(n)$ we determine $M(k)$ for some $k$. Then, for every $i \in [1, I)$ such that $M(k)$ occurs in the sum
$$\sum_{d \ge 2} M\!\left( \left\lfloor \frac{x_i}{d} \right\rfloor \right), \tag{13}$$
i.e. for every $i \in [1, I)$ such that there exists an integer $d \ge 2$ such that
$$k = \left\lfloor \frac{x_i}{d} \right\rfloor, \tag{14}$$
we determine the number of occurrences $m$ of $M(k)$ in (13) and update
$$\mathrm{Mx}[i] \leftarrow \mathrm{Mx}[i] - m \cdot M(k). \tag{15}$$
After processing all $k = 1, \ldots, D$, it remains to update Mx[$i$] by $M(\lfloor x_i/d \rfloor)$ for all $\lfloor x_i/d \rfloor > D$. With the help of equality (10) it is enough to update Mx[$i$] by $M(x_{d^2 i})$ for all $d^2 i < I$. After these updates we will have Mx[$i$] $= M(x_i)$.
4.2 Dealing with Mx Array Updates
The problem is how, for a given $k$, to quickly find all values of $i$ such that there exists an integer $d \ge 2$ fulfilling (14). There is no simple way to do it in expected constant time. Instead, for a given $i$ we can easily calculate successive $k$.
Lemma 1. Suppose that for a given integer $i \in [1, I)$ and an integer $k$ there exists an integer $d$ satisfying (14). Let us denote
$$d_a = \left\lfloor \frac{x_i}{k} \right\rfloor, \qquad d_b = \left\lfloor \frac{x_i}{k+1} \right\rfloor.$$
Then
(i) the number of occurrences $m$, needed for update (15), equals $d_a - d_b$,
(ii) the next integer $k$ satisfying (14) is for $d = d_b$, and it is equal to $\lfloor x_i/d_b \rfloor$.
Proof. All possible integers $d$ satisfying (14) are:
$$k \le \frac{x_i}{d} < k+1 \iff \frac{x_i}{k+1} < d \le \frac{x_i}{k} \iff \left\lfloor \frac{x_i}{k+1} \right\rfloor < d \le \left\lfloor \frac{x_i}{k} \right\rfloor,$$
so (14) is satisfied for $d \in (d_b, d_a]$, and the next $k$ satisfying (14) is for $d = d_b$.
Lemma 1, for every $i$, allows us to walk through the successive values of $k$ for which we have to update Mx[$i$]. Since the target is to reduce the memory usage, we need to group all updates into blocks. Algorithm 3 shows how to utilize Lemma 1 in order to update Mx[$i$] for the entire block $(a, b]$.
Algorithm 3 Updating Mx[i] for a block (a, b]
Require: bounds 0 ≤ a < b, index i ∈ [1, I), the smallest k ∈ (a, b] for which there exists d satisfying (14)
Ensure: Mx[i] is updated by all M(k) for k ∈ (a, b]; the smallest k > b for the next update is returned
function MxBlockUpdate(a, b, i, k)
  d_a ← ⌊x_i / k⌋
  repeat
    d_b ← ⌊x_i / (k + 1)⌋
    Mx[i] ← Mx[i] − (d_a − d_b) · M(k)
    k ← ⌊x_i / d_b⌋
    d_a ← d_b
  until k > b
  return k
end function
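The walk of Lemma 1 can be checked on small inputs with the following Python sketch. It is an illustration only: the names `update_schedule` and `brute_force_schedule` are hypothetical, and instead of updating Mx[$i$] the code just records the pairs $(k, m)$ that MxBlockUpdate would use, starting from $k = 1$.

```python
def update_schedule(x_i: int, limit: int) -> list:
    """Pairs (k, m) with k <= limit, where m = #{d >= 2 : floor(x_i / d) == k} > 0.

    Assumes 1 <= limit < x_i, which holds for k <= D < x_i in the text.
    """
    schedule = []
    k, d_a = 1, x_i                        # d_a = floor(x_i / k) for k = 1
    while k <= limit:
        d_b = x_i // (k + 1)
        schedule.append((k, d_a - d_b))    # Lemma 1 (i): number of occurrences
        k, d_a = x_i // d_b, d_b           # Lemma 1 (ii): next k
    return schedule

def brute_force_schedule(x_i: int, limit: int) -> list:
    """The same pairs obtained by enumerating every d >= 2 (for checking only)."""
    counts = {}
    for d in range(2, x_i + 1):
        k = x_i // d
        if k <= limit:
            counts[k] = counts.get(k, 0) + 1
    return sorted(counts.items())
```

For $x_i = 31$ and limit 12 the walk visits $k = 1, 2, 3, 4, 5, 6, 7, 10$ and skips the values 8, 9, 11 and 12, which are not of the form $\lfloor 31/d \rfloor$.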
4.3 Introducing Additional Structures
Let $B = \lfloor\sqrt{D}\rfloor$ be the block size, and $L = \lceil D/B \rceil$ be the number of blocks. We process $k$ values in blocks $(a_0, a_1], (a_1, a_2], \ldots, (a_{L-1}, a_L]$, where $a_l = Bl$ for $0 \le l < L$ and $a_L = D$. We need additional structures to keep track, for every $i \in [1, I)$, of where the next update is:
- mink[$i$] stores the next smallest $k$ for which Mx[$i$] has to be updated,
- ilist[$l$] is a list of indexes $i$ for which the next update will be for $k$ belonging to the block $(a_l, a_{l+1}]$.
Using these structures we are able to perform every update in constant time. Once we update Mx[$i$] for all necessary $k \in (a_l, a_{l+1}]$ by MxBlockUpdate, we get the next $k > a_{l+1}$ for which the next update should be done. We can easily calculate the block index $l$ for this $k$ and schedule it by putting $i$ into ilist[$l$].
4.4 The Algorithm
The result of the entire above discussion is presented in Algorithm 4. We managed to preserve the number of operations, therefore the time complexity remains $\tilde{O}(n^{2/5})$. Each of the additional structures has $I = \tilde{O}(n^{1/5})$ or $L = O(\sqrt{D}) = \tilde{O}(n^{1/5})$ elements. Blocks have size $O(B) = O(\sqrt{D}) = \tilde{O}(n^{1/5})$. Therefore, the memory complexity of Algorithm 4 is $\tilde{O}(n^{1/5})$.
Algorithm 4 Calculating S(n) in time Õ(n^{2/5}) and in memory Õ(n^{1/5})
1: I ← Θ(n^{1/5} (log log n)^{4/5}), D ← ⌊√(n/I)⌋, B ← ⌊√D⌋, L ← ⌈D/B⌉
2: for l = 0, …, L − 1 do
3:   ilist[l] ← ∅
4: end for
5: for i = 1, …, I − 1 do
6:   Mx[i] ← 1
7:   mink[i] ← 1
8:   ilist[0] ← ilist[0] ∪ {i}
9: end for
10: s1 ← 0
11: for l = 0, …, L − 1 do ▷ blocks processing loop
12:   TabulateMöbiusBlock(a_l, a_{l+1})
13:   for k ∈ (a_l, a_{l+1}] do
14:     s1 ← s1 + mu[k] · ⌊n/k²⌋
15:   end for
16:   compute M(k) for k ∈ (a_l, a_{l+1}] from values mu[k] and M(a_l)
17:   for each i ∈ ilist[l] do
18:     mink[i] ← MxBlockUpdate(a_l, a_{l+1}, i, mink[i])
19:     l′ ← ⌈mink[i]/B⌉ − 1 ▷ next block where Mx[i] has to be updated
20:     if mink[i] ≤ D then
21:       ilist[l′] ← ilist[l′] ∪ {i}
22:     end if
23:   end for
24:   ilist[l] ← ∅
25: end for
26: for i = I − 1, …, 1 do ▷ updating Mx[i] by M(k) for k > D
27:   for all d ≥ 2 such that d²i < I do
28:     Mx[i] ← Mx[i] − Mx[d²i]
29:   end for
30: end for
31: compute s2 = S2(n) by (8)
32: return s1 + s2
Observe that most work is done in the blocks processing loop (lines 11–25), because every other part of the algorithm takes at most $\tilde{O}(n^{1/5})$ operations. Initialization of the structures (lines 1–10) is proportional to their size $\tilde{O}(n^{1/5})$. Computing $S_2(n)$ by (8) (line 31) takes $O(I) = \tilde{O}(n^{1/5})$. Only the time complexity of the part responsible for updating Mx[$i$] by $M(k)$ for $k > D$ (lines 26–30) is unclear. The total number of updates in this part is:
$$\sum_{i=1}^{I-1} \sum_{\substack{d \ge 2 \\ d^2 i < I}} 1 \le \sum_{i=1}^{I-1} \sqrt{\frac{I}{i}} = \sqrt{I} \cdot O(\sqrt{I}) = O(I), \tag{16}$$
thus it is $O(I) = \tilde{O}(n^{1/5})$.
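The whole procedure of this section can be condensed into the Python sketch below. It is a hedged illustration rather than the author's implementation: the names `small_primes`, `mobius_block` and `count_squarefree` are hypothetical, `I` is supplied by the caller (with $4I^3 \le n$), the block index of the next update is computed as $\lfloor (k-1)/B \rfloor$ so that $k$ lies in $(a_l, a_{l+1}]$, and an index is rescheduled only while $k \le D$.

```python
from math import isqrt

def small_primes(limit):
    """All primes <= limit (plain sieve of Eratosthenes)."""
    sieve = [True] * (limit + 1)
    primes = []
    for p in range(2, limit + 1):
        if sieve[p]:
            primes.append(p)
            for q in range(p * p, limit + 1, p):
                sieve[q] = False
    return primes

def mobius_block(a, b, primes):
    """mu(k) for k in (a, b], stored at index k - a - 1 (cf. Algorithm 6)."""
    mu = [1] * (b - a)
    m = [1] * (b - a)                      # product of the small primes dividing k
    for p in primes:
        if p * p > b:
            break
        for k in range((a // (p * p) + 1) * p * p, b + 1, p * p):
            mu[k - a - 1] = 0
        for k in range((a // p + 1) * p, b + 1, p):
            mu[k - a - 1] = -mu[k - a - 1]
            m[k - a - 1] *= p
    for j in range(b - a):
        if m[j] < a + 1 + j:               # one prime factor > sqrt(b) is left
            mu[j] = -mu[j]
    return mu

def count_squarefree(n, I):
    """S(n) along the lines of Algorithm 4; needs I >= 1 and 4 * I**3 <= n."""
    if I < 1 or 4 * I ** 3 > n:
        raise ValueError("I must satisfy 1 <= I <= (n / 4) ** (1 / 3)")
    x = [0] + [isqrt(n // i) for i in range(1, I + 1)]   # x[i] = floor(sqrt(n / i))
    D = x[I]
    B = isqrt(D)
    L = -(-D // B)                         # ceil(D / B)
    primes = small_primes(isqrt(D))
    Mx = [1] * I                           # Mx[1..I-1]; Mx[0] is unused
    mink = [1] * I
    ilist = [[] for _ in range(L)]
    ilist[0] = list(range(1, I))
    s1 = 0
    mertens = 0                            # running value M(k)
    for l in range(L):                     # blocks processing loop
        a, b = l * B, min((l + 1) * B, D)
        mu = mobius_block(a, b, primes)
        M = []                             # M[k - a - 1] = M(k) for k in (a, b]
        for k in range(a + 1, b + 1):
            s1 += mu[k - a - 1] * (n // (k * k))
            mertens += mu[k - a - 1]
            M.append(mertens)
        for i in ilist[l]:
            k, d_a = mink[i], x[i] // mink[i]
            while k <= b:                  # MxBlockUpdate (Algorithm 3)
                d_b = x[i] // (k + 1)
                Mx[i] -= (d_a - d_b) * M[k - a - 1]
                k, d_a = x[i] // d_b, d_b
            mink[i] = k
            if k <= D:                     # schedule i for the block containing k
                ilist[(k - 1) // B].append(i)
        ilist[l] = []
    Mx.append(mertens)                     # Mx[I] = M(x_I) = M(D)
    for i in range(I - 1, 0, -1):          # updates by M(k) for k > D, via (10)
        d = 2
        while d * d * i < I:
            Mx[i] -= Mx[d * d * i]
            d += 1
    return s1 + sum(Mx[1:I]) - (I - 1) * Mx[I]
```

At any moment the sketch keeps one block of Möbius and Mertens values plus the arrays indexed by $i$ and $l$, mirroring the memory discussion above; Python lists are used for `ilist` purely for simplicity.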
5 Parallelization
As noted in Sect. 4.4, the most time-consuming part of Algorithm 4 is the blocks processing loop. The basic idea is to distribute the calculations made by this loop between $P$ processors. We split the interval $[1, D]$ into a list of $P$ smaller intervals: $(a_0, a_1], (a_1, a_2], \ldots, (a_{P-1}, a_P]$, where $0 = a_0 < a_1 < \cdots < a_P = D$. Processor number $p$, $0 \le p < P$, focuses only on the interval $(a_p, a_{p+1}]$, and it is responsible for
(i) calculating part of the sum $S_1(n)$:
$$\sum_{k \in (a_p, a_{p+1}]} \mu(k) \left\lfloor \frac{n}{k^2} \right\rfloor, \tag{17}$$
(ii) making updates of the array Mx[$1, \ldots, I-1$] for all $k \in (a_p, a_{p+1}]$.
All processors share the value $s_1$ and the Mx array. The only changes are additions of an integer, and it is required that these changes are atomic. Alternatively, a processor can collect all changes in its own memory and, at the end, change the value $s_1$ and each entry of the Mx array only once.
Although the above approach is extremely simple, there are two drawbacks. First, for updates (ii), a processor needs to calculate successive values of the Mertens function: $M(a_p + 1), \ldots, M(a_{p+1})$. Computation of (17) produces successive values of the Möbius function starting from $\mu(a_p + 1)$, therefore the Mertens function values could also be computed if only we knew the value of $M(a_p)$. Unfortunately, there is no other way than computing it from scratch. However, to compute $M(x)$ there is an algorithm working in time $\tilde{O}(x^{2/3})$ and memory $\tilde{O}(x^{1/3})$. See for instance [2], or [5] for a simpler algorithm missing a memory reduction.
In our application we have $x \le D = \tilde{O}(n^{2/5})$, therefore the cumulative additional time we spend computing Mertens function values from scratch is $\tilde{O}(P D^{2/3}) = \tilde{O}(P n^{4/15})$. We want this not to exceed the targeted time of $\tilde{O}(n^{2/5})$, therefore the number of processors is limited by:
$$P = \tilde{O}\!\left( \frac{n^{2/5}}{n^{4/15}} \right) = \tilde{O}(n^{2/15}). \tag{18}$$
The second drawback comes from the observation that the number of updates of the Mx array is not uniformly distributed over $k \in [1, D]$. For example, for $k \le \sqrt{D} = \tilde{O}(n^{1/5})$ and for every $i \in [1, I)$ there always exists $d \ge 2$ such that (14) is satisfied, therefore for every such $k$ there will be $I - 1 = \tilde{O}(n^{1/5})$ updates. It means that in a very small block $(1, \sqrt{D}]$ there will be $\tilde{O}(n^{2/5})$ updates, which is proportional to the total number of updates. We see that splitting into blocks is non-trivial and we need better tools for measuring work in the blocks processing loop.
Let $t_s$ be the average time of computing a single summand of the sum $S_1(n)$, and let $t_u$ be the average time of a single update of an Mx array entry. Consider a block $(0, a]$. Denote by $U(a)$ the number of updates which must be done in this block. Then the expected time of processing this block is
$$T(a) = t_s a + t_u U(a). \tag{19}$$
It turns out that $U(a)$ can be very accurately approximated by a closed formula:
$$U(a) = \begin{cases} I a & \text{for } a \le \sqrt[4]{n/I}, \\ \frac{8}{3} n^{1/4} I^{3/4} + \frac{n}{3a^3} - \frac{2\sqrt{nI}}{a} & \text{for } \sqrt[4]{n/I} < a \le D = \sqrt{n/I}. \end{cases} \tag{20}$$
See App. C for the estimation.
The work measuring function (19) says that the amount of work for the block $(a_p, a_{p+1}]$ is $T(a_{p+1}) - T(a_p)$. Using this we are able to distribute blocks between processors in such a way that the work is assigned evenly.
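One way to turn the work measure $T(a)$ into block boundaries is a binary search for equal work shares, sketched below in Python. The paper does not spell out how the boundaries are found, so the search strategy, the function names and the default unit costs `t_s = t_u = 1.0` are assumptions made for this illustration; the formula for $U(a)$ is the closed form derived in App. C.

```python
from math import isqrt

def expected_updates(a, n, I):
    """Closed-form estimate U(a) of the number of updates in (0, a], 0 <= a <= D."""
    if a <= (n / I) ** 0.25:
        return I * a
    return (8 / 3) * n ** 0.25 * I ** 0.75 + n / (3 * a ** 3) - 2 * (n * I) ** 0.5 / a

def work(a, n, I, t_s=1.0, t_u=1.0):
    """T(a) = t_s * a + t_u * U(a), the estimated cost of the block (0, a]."""
    return t_s * a + t_u * expected_updates(a, n, I)

def split_points(n, I, P, t_s=1.0, t_u=1.0):
    """Boundaries a_0 = 0, a_1, ..., a_P = D with roughly equal work per interval."""
    D = isqrt(n // I)
    total = work(D, n, I, t_s, t_u)
    bounds = [0]
    for p in range(1, P):
        target = total * p / P
        lo, hi = bounds[-1], D
        while lo < hi:                     # smallest a with T(a) >= target
            mid = (lo + hi) // 2
            if work(mid, n, I, t_s, t_u) >= target:
                hi = mid
            else:
                lo = mid + 1
        bounds.append(lo)
    bounds.append(D)
    return bounds
```

The search relies on $T(a)$ being non-decreasing on $[0, D]$, which holds for the closed form above. With $n = 10^{12}$, $I = 200$ and $P = 8$ the first interval ends well before $\sqrt[4]{n/I}$, reflecting how densely the updates are concentrated at small $k$.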
6 Results
We calculated $S(10^e)$ for all integers $0 \le e \le 36$. The computed values are listed in App. D. First, for $e \le 26$ we prepared the results using Algorithm 1, the simpler and slower algorithm. Then we applied Algorithm 4 on a single thread. Thus we verified its correctness for $e \le 26$ and we prepared further values for $e \le 31$. Finally, we used the parallel implementation for $24 \le e \le 36$. The computations were performed at ICM UW under grant G43-5 on the cluster Halo2. See [1] for a specification. The results for $e \le 31$ agreed with the previously prepared results. The timings of these computations are presented in Table 1.
Computation time is the calendar time in seconds of cluster occupation. Ideal time represents how long the computations could take if communication between processors was ignored and if the work was distributed equally. This was calculated by taking the cumulative time of the actual work done by each processor and dividing by the number of processors. We see that the ideal time is close to the computation time, which gives experimental evidence of the scalability of the parallel algorithm.

 e   processors used   computation time   ideal time
24         16                  51              40
25         16                 124             107
26         16                 279             266
27         16                 769             720
28         16                1928            1863
29         32                2594            2446
30         64                3439            3317
31         64                9157            8912
32        128               12138           11771
33        256               18112           16325
34        256               46540           43751
35        256              119749          115448
36        256              315313          303726
Table 1. Computation times in seconds of $S(10^e)$ for $24 \le e \le 36$

References
1. Halo2 cluster on ICM UW,
2. Deléglise, M., Rivat, J.: Computing the summation of the Möbius function. Experimental Mathematics 5(4), 291–295 (1996)
3. Jia, C.H.: The distribution of square-free numbers. Science in China Series A: Mathematics 36(2), 154–169 (1993)
4. Michon, G.P.: On the number of square-free integers not exceeding n (May 2008),
5. Pawlewicz, J., Pătraşcu, M.: Order statistics in the Farey sequences in sublinear time and counting primitive lattice points in polygons. Algorithmica 55(2), 271–282
6. Sloane, N.J.A.: Sequence A071172, the number of square-free integers ≤ 10^n,
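A Proof of Theorem 1
Let our universe be all positive integers less than or equal to $n$:
$$U = \{1, \ldots, n\}. \tag{21}$$
For a prime integer $p$, let us define a set $A_p$:
$$A_p = \{a \in U : p^2 \text{ divides } a\}. \tag{22}$$
The complement $\overline{A_p}$ of the set $A_p$ is the set of all integers less than or equal to $n$ not divisible by $p^2$. We want to count integers not divisible by any prime square, therefore the number we are searching for is the size of the set $\bigcap_{p \text{ prime}} \overline{A_p}$. By the inclusion-exclusion principle we have:
$$\Bigl| \bigcap_{p \text{ prime}} \overline{A_p} \Bigr| = |U| - \sum_{p \text{ prime}} |A_p| + \sum_{\substack{p < q \\ p, q \text{ prime}}} |A_p \cap A_q| - \sum_{\substack{p < q < r \\ p, q, r \text{ prime}}} |A_p \cap A_q \cap A_r| + \cdots = \sum_{i \ge 0} (-1)^i \sum_{\substack{p_1 < \cdots < p_i \\ p_1, \ldots, p_i \text{ prime}}} |A_{p_1} \cap \cdots \cap A_{p_i}|. \tag{23}$$
Now, observe that
$$|A_{p_1} \cap \cdots \cap A_{p_i}| = \left\lfloor \frac{n}{p_1^2 \cdots p_i^2} \right\rfloor.$$
Using the Iverson bracket we can write (23) as
$$(23) = \sum_{d \ge 1} \sum_{i \ge 0} [d = p_1 \cdots p_i \wedge p_1 < \cdots < p_i \wedge p_1, \ldots, p_i \text{ prime}]\, (-1)^i \left\lfloor \frac{n}{d^2} \right\rfloor.$$
The expression $\sum_{i \ge 0} [d \text{ is a product of } i \text{ distinct primes}]\,(-1)^i$ is, in other words, the Möbius function $\mu(d)$, therefore we get the final formula of Theorem 1.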
B Computing the Möbius Function
To compute values of the Möbius function we exploit the following property:
$$\mu(k) = \begin{cases} 0 & \text{if } p^2 \text{ divides } k \text{ for some prime } p, \\ (-1)^e & \text{if } k = p_1 \cdots p_e \text{ for distinct primes } p_1, \ldots, p_e. \end{cases} \tag{24}$$
Using a sieve we can find the values of $\mu(k)$ for all $k = 1, \ldots, K$ simultaneously, where $K = \lfloor\sqrt{n}\rfloor$, as presented in Algorithm 5. To generate all primes less than or equal to $K$ we can use the sieve of Eratosthenes. The memory complexity is $O(K)$ and the time complexity is $O(K \log\log K)$.
The above method can be improved to fit in $O(\sqrt{K})$ memory by tabulating in blocks. We split the array mu into blocks of size $B = \Theta(\sqrt{K})$, and for each block we tabulate $\mu(\cdot)$ separately using Algorithm 6. For each block we use only primes less than or equal to $\sqrt{K}$, and we need only $O(\sqrt{K})$ memory. There are at most $K/B = O(\sqrt{K})$ blocks. For each block the number of operations is
$$O(\sqrt{K}) + \sum_{p \le \sqrt{K}} \left( 1 + \frac{B}{p} \right) = O(\sqrt{K} + B \log\log K) = O(\sqrt{K} \log\log K), \tag{25}$$
which results in $O(K \log\log K)$ time complexity for the whole algorithm.
Algorithm 5 Computing values of the Möbius function: the basic approach
Require: bound 1 ≤ K
Ensure: µ(k) = mu[k] for k = 1, …, K
procedure TabulateMöbius(K)
  for k = 1, …, K do
    mu[k] ← 1
  end for
  for each prime p ≤ K do
    for each k ∈ [1, K] divisible by p² do
      mu[k] ← 0
    end for
    for each k ∈ [1, K] divisible by p do
      mu[k] ← −mu[k]
    end for
  end for
end procedure
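A direct Python transcription of the basic approach, given as a minimal sketch: the function name `tabulate_mobius` is chosen here, and the primes are discovered on the fly by an Eratosthenes-style marking instead of being generated in a separate pass.

```python
def tabulate_mobius(K: int) -> list:
    """Return mu with mu[k] = mu(k) for 1 <= k <= K; mu[0] is unused."""
    mu = [1] * (K + 1)
    mu[0] = 0
    is_prime = [True] * (K + 1)
    for p in range(2, K + 1):
        if not is_prime[p]:
            continue
        for k in range(2 * p, K + 1, p):   # mark proper multiples as composite
            is_prime[k] = False
        for k in range(p * p, K + 1, p * p):
            mu[k] = 0                      # p^2 divides k
        for k in range(p, K + 1, p):
            mu[k] = -mu[k]                 # one more prime factor p
    return mu
```

Summing the returned values gives the Mertens function, which is a convenient sanity check on small inputs.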
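C Estimating U(a)
Instead of computing the exact number of updates in a block $(0, a]$, we will compute an approximation of the expected number of updates as follows. Let us fix $k \in (0, a]$ and $x = x_i$ for some $i \in [1, I)$. The probability that there exists $d$ such that (14) is satisfied equals
$$P(k, x) = \begin{cases} 1 & \text{for } k \le \sqrt{x}, \\ \frac{x}{k} - \frac{x}{k+1} & \text{for } k > \sqrt{x}. \end{cases}$$
Let us define $U(a, x_i)$ as the expected number of updates of the entry Mx[$i$] for all $k \in (0, a]$. Let $x = x_i$. If $a \le \sqrt{x}$ then
$$U(a, x) = \sum_{k \le a} P(k, x) = \sum_{k \le a} 1 = a,$$
and if $a > \sqrt{x}$ then
$$U(a, x) = \sum_{k \le \sqrt{x}} 1 + \sum_{\sqrt{x} < k \le a} \left( \frac{x}{k} - \frac{x}{k+1} \right) \approx \sqrt{x} + \frac{x}{\sqrt{x}} - \frac{x}{a} = 2\sqrt{x} - \frac{x}{a},$$
thus $U(a, x)$ can be presented as the formula:
$$U(a, x) = \begin{cases} a & \text{for } a \le \sqrt{x}, \\ 2\sqrt{x} - \frac{x}{a} & \text{for } a > \sqrt{x}. \end{cases} \tag{26}$$
Now we are ready to compute $U(a)$:
$$U(a) = \sum_{1 \le i < I} U(a, x_i). \tag{27}$$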
Algorithm 6 Computing values of the Möbius function: memory efficient sieving in blocks
Require: bounds 0 ≤ a < b
Ensure: µ(k) = mu[k] for each k ∈ (a, b]
procedure TabulateMöbiusBlock(a, b)
  for each k ∈ (a, b] do
    mu[k] ← 1
    m[k] ← 1 ▷ product of all found prime divisors of k
  end for
  for each prime p ≤ √b do
    for each k ∈ (a, b] divisible by p² do
      mu[k] ← 0
    end for
    for each k ∈ (a, b] divisible by p do
      mu[k] ← −mu[k]
      m[k] ← m[k] · p
    end for
  end for
  for each k ∈ (a, b] do
    if m[k] < k then ▷ k = m[k] · q, where q is prime and q > √b
      mu[k] ← −mu[k]
    end if
  end for
end procedure
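Expanding a term $U(a, x_i)$ using (26) depends on the inequality (taking $x_i \approx \sqrt{n/i}$):
$$a \le \sqrt{x_i} \iff a \le \sqrt[4]{\frac{n}{i}} \iff i \le \frac{n}{a^4}.$$
Therefore $U(a, x_i)$ always expands to $a$ if $a \le \sqrt[4]{n/I}$, so then $U(a) = Ia$, and this is the first case of (20). Otherwise, if $a > \sqrt[4]{n/I}$, we split the summation (27):
$$\sum_{1 \le i < I} U(a, x_i) = \sum_{i \le n/a^4} U(a, x_i) + \sum_{n/a^4 < i < I} U(a, x_i) \approx \frac{n}{a^4} \cdot a + \sum_{n/a^4 < i < I} \left( 2 \sqrt[4]{\frac{n}{i}} - \frac{1}{a} \sqrt{\frac{n}{i}} \right).$$
Now, we apply the following approximation formulas for sums:
$$\sum_{i \le m} \frac{1}{\sqrt[4]{i}} \approx \frac{4}{3} m^{3/4}, \qquad \sum_{i \le m} \frac{1}{\sqrt{i}} \approx 2 \sqrt{m},$$
and as a result we get the second case of (20):
$$U(a) \approx \frac{n}{a^3} + 2 n^{1/4} \cdot \frac{4}{3} \left( I^{3/4} - \frac{n^{3/4}}{a^3} \right) - \frac{\sqrt{n}}{a} \cdot 2 \left( \sqrt{I} - \frac{\sqrt{n}}{a^2} \right) = \frac{8}{3} n^{1/4} I^{3/4} + \frac{n}{3a^3} - \frac{2\sqrt{nI}}{a}.$$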
D $S(n)$ Values for Powers of 10
e  $S(10^e)$
0 1
10 6079270942
11 60792710280
12 607927102274
13 6079271018294
14 60792710185947
15 607927101854103
16 6079271018540405
17 60792710185403794
18 607927101854022750
19 6079271018540280875
20 60792710185402613302
21 607927101854026645617
22 6079271018540266153468
23 60792710185402662868753
24 607927101854026628773299
25 6079271018540266286424910
26 60792710185402662866945299
27 607927101854026628664226541
28 6079271018540266286631251028
29 60792710185402662866327383816
30 607927101854026628663278087296
31 6079271018540266286632795633943
32 60792710185402662866327694188957
33 607927101854026628663276901540346
34 6079271018540266286632767883637220
35 60792710185402662866327677953999263
36 607927101854026628663276779463775476
$6/\pi^2 = 0.60792710185402662866327677925836583342615264\ldots$
Table 2. Values of $S(10^e)$ for $0 \le e \le 36$