Lecture Notes on Machine Learning
Properties of Sample Means (Part 2)
Christian Bauckhage and Olaf Cremers
B-IT, University of Bonn
This short note points out that the sample mean of a finite sample of
Euclidean data points can be computed recursively or in parallel; both
these facts are of considerable practical utility.
Setting the Stage
Previously,¹ we considered a finite sample $\mathcal{X} = \{x_1, \ldots, x_n\}$ of $n$ data points $x_j \in \mathbb{R}^m$ whose sample mean amounts to
$$
\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j. \tag{1}
$$
¹ C. Bauckhage and O. Cremers. Lecture Notes on Machine Learning: Properties of Sample Means (Part 1). B-IT, University of Bonn, 2019.
So far, so good. However, for our purposes in this note, it is even
better to slightly extend our notation. In particular, we henceforth
write
$$
\bar{x}_n = \frac{1}{n} \sum_{j=1}^{n} x_j \tag{2}
$$
in order to keep track of the number of points the sample mean is
computed from. For example, for samples of sizes 1, 2,. . . , n−1, we
will express the sample means as
$$
\bar{x}_1 = \frac{1}{1} \sum_{j=1}^{1} x_j \tag{3}
$$
$$
\bar{x}_2 = \frac{1}{2} \sum_{j=1}^{2} x_j \tag{4}
$$
$$
\vdots
$$
$$
\bar{x}_{n-1} = \frac{1}{n-1} \sum_{j=1}^{n-1} x_j. \tag{5}
$$
Looking at these expressions, it may already be obvious where
our discussion is headed.
Indeed, our goal in this note is to show that sample means can be
computed recursively. Once we have established this, it will be easy
to see that sample means can also be computed in parallel.
Recursive Computation of Sample Means
Imagine we were analyzing a data stream where observations arrive
one at a time and suppose we had to determine their mean. Would
this require us to store all the incoming data and to recompute the
sum over increasingly many terms in (2) every time a new data point
arrives? No, it would not!
Proving this claim is easy. We simply note that the summation required to compute the sample mean of $n$ points can be written as
$$
\bar{x}_n = \frac{1}{n} \sum_{j=1}^{n} x_j \tag{6}
$$
$$
\phantom{\bar{x}_n} = \frac{1}{n} \sum_{j=1}^{n-1} x_j + \frac{1}{n} x_n. \tag{7}
$$
Given the expression (5), we furthermore realize that
$$
\sum_{j=1}^{n-1} x_j = (n-1) \, \bar{x}_{n-1} \tag{8}
$$
which immediately establishes the following recursion
$$
\bar{x}_n = \frac{n-1}{n} \, \bar{x}_{n-1} + \frac{1}{n} \, x_n \tag{9}
$$
$$
\phantom{\bar{x}_n} = \bar{x}_{n-1} + \frac{1}{n} \bigl[ x_n - \bar{x}_{n-1} \bigr]. \tag{10}
$$
In other words, the sample mean of $n$ points can be computed as a weighted sum of the mean of the first $n-1$ points and the $n$-th point. This is what allows for great efficiency in stream processing. Moreover, since the weights $\frac{n-1}{n}$ and $\frac{1}{n}$ in (9) are positive and add to one, we find that $\bar{x}_n$ is a convex combination of $\bar{x}_{n-1}$ and $x_n$. Equivalently, (10) tells us that $\bar{x}_n$ is the point we reach when stepping from $\bar{x}_{n-1}$ in the direction of $x_n$ using a step size of $\frac{1}{n}$.
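As a concrete illustration, here is a minimal Python sketch of the update rule in (10), assuming NumPy is available; the function name `update_mean` and the toy stream are our own choices for this example, not part of the derivation above.

```python
import numpy as np

def update_mean(mean_prev, x_new, n):
    """Recursive mean update as in (10): step from the old mean towards x_new with step size 1/n."""
    return mean_prev + (x_new - mean_prev) / n

# Usage: process a stream one point at a time without storing the data.
stream = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([2.0, 4.0])]
mean = np.zeros(2)
for n, x in enumerate(stream, start=1):
    mean = update_mean(mean, x, n)

print(mean)                     # [2. 2.]
print(np.mean(stream, axis=0))  # [2. 2.], identical to the batch computation
```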
Parallel Computation of Sample Means
Next, imagine we had to determine the sample mean of a data set so
massive that it does not fit into the memory of a single machine.
Using the above, we could read the data one by one from disk and
compute their mean recursively. But we can also work with larger
batches!
To see this, we first of all introduce yet another notation for the sample mean of $\mathcal{X}$, namely
$$
\bar{x}_{\mathcal{X}} = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} x. \tag{11}
$$
Second of all, we note that, in (7), we split the data in $\mathcal{X}$ into two subsets $\mathcal{X}_1 = \mathcal{X} \setminus \{x_n\}$ and $\mathcal{X}_2 = \{x_n\}$ and computed nothing but
$$
\bar{x}_{\mathcal{X}} = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}_1} x + \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}_2} x \tag{12}
$$
which, third of all, led to
$$
\bar{x}_{\mathcal{X}} = \frac{|\mathcal{X}_1|}{|\mathcal{X}|} \, \bar{x}_{\mathcal{X}_1} + \frac{|\mathcal{X}_2|}{|\mathcal{X}|} \, \bar{x}_{\mathcal{X}_2}. \tag{13}
$$
Of course, we can be more general. That is, we can split the given data set $\mathcal{X}$ into any two subsets $\mathcal{X}_1$ and $\mathcal{X}_2$ such that $\mathcal{X} = \mathcal{X}_1 \cup \mathcal{X}_2$ and $|\mathcal{X}| = |\mathcal{X}_1| + |\mathcal{X}_2|$. The result in (13) will still hold.
But this is to say that we can seamlessly distribute the computation of sample means over different cores or machines. Instead of considering only two batches, we might just as well partition the data into $k$ disjoint batches $\mathcal{X} = \mathcal{X}_1 \cup \mathcal{X}_2 \cup \ldots \cup \mathcal{X}_k$ and therefore implement the computation of sample means, for instance, in a MapReduce framework.
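As an illustration, the following Python sketch combines per-batch means according to the $k$-batch generalization of (13); the helper names `batch_stats` and `combine`, the random toy data, and the sequential map step (which one would distribute across cores or machines in practice) are our own assumptions.

```python
import numpy as np

def batch_stats(batch):
    """Map step: each batch independently reports its size and its mean."""
    batch = np.asarray(batch)
    return len(batch), batch.mean(axis=0)

def combine(stats):
    """Reduce step: weighted sum of batch means, the k-batch version of (13)."""
    total = sum(size for size, _ in stats)
    return sum(size / total * mean for size, mean in stats)

# Usage: partition a data set into k disjoint batches and merge their means.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
batches = np.array_split(X, 4)              # k = 4 batches of (roughly) equal size

stats = [batch_stats(b) for b in batches]   # this map could run on separate machines
print(np.allclose(combine(stats), X.mean(axis=0)))  # True
```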
Summary and Outlook
In this note, we were concerned with the sample mean $\bar{x}$ of a sample $\mathcal{X} = \mathcal{X}_1 \cup \mathcal{X}_2 = \{x_1, \ldots, x_n\}$ of $n$ Euclidean data points $x_j \in \mathbb{R}^m$. Writing it as
$$
\bar{x}_n = \frac{1}{n} \sum_{j=1}^{n} x_j
$$
or
$$
\bar{x}_{\mathcal{X}} = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} x
$$
revealed the crucial identities
$$
\bar{x}_n = \frac{n-1}{n} \, \bar{x}_{n-1} + \frac{1}{n} \, x_n
$$
and
$$
\bar{x}_{\mathcal{X}} = \frac{|\mathcal{X}_1|}{|\mathcal{X}|} \, \bar{x}_{\mathcal{X}_1} + \frac{|\mathcal{X}_2|}{|\mathcal{X}|} \, \bar{x}_{\mathcal{X}_2}
$$
which establish that sample means can be computed recursively as
well as in parallel.
Both these ways of computing means will come in handy later on;
for instance, when we study the idea of online k-means clustering.
Acknowledgments
This material was prepared within project P3ML which is funded by
the Ministry of Education and Research of Germany (BMBF) under
grant number 01/S17064. The authors gratefully acknowledge this
support.
References
C. Bauckhage and O. Cremers. Lecture Notes on Machine Learning:
Properties of Sample Means (Part 1). B-IT, University of Bonn, 2019.