IEEE SIGNAL PROCESSING MAGAZINE [134] SEPTEMBER 2011
1053-5888/11/$26.00©2011IEEE
[dsp TIPS & TRICKS]
Hongwei Guo
Gaussian functions are suitable for describing many
processes in mathematics,
science, and engineering,
making them very useful in
the fields of signal and image processing.
For example, the random noise in a signal,
induced by complicated physical factors,
can be simply modeled with the Gaussian
distribution according to the central limit
theorem of probability theory.
Another typical example in image process-
ing is the Airy disk resulting from the dif-
fraction of a limited circular aperture as the
point-spread function of an imaging sys-
tem. Usually an Airy disk is approximately
represented by a two-dimensional Gaussian
function. As such, fitting Gaussian func-
tions to experimental data is very impor-
tant in many signal processing disciplines.
This article proposes a simple and
improved algorithm for estimating the
parameters of a Gaussian function fitted
to observed data points.
GAUSSIAN CURVE FITTING
Recall that a Gaussian function is of the form
$$y = A e^{-(x-\mu)^2/(2\sigma^2)}. \tag{1}$$
This function can be graphed with a symmetrical bell-shaped curve centered at the position $x = \mu$, with $A$ being the height of the peak and $\sigma$ controlling its width, and on both sides of the peak the tails (low-amplitude portions) of the curve quickly fall off and approach the x-axis.
The focus of this article is on how we fit a Gaussian function to observed data points and determine the parameters $A$, $\mu$, and $\sigma$ accurately. The centroid method takes advantage of the symmetry of a Gaussian function, allowing the Gaussian peak position to be determined very efficiently [1], [2]. Although this method is widely used in image processing for the subpixel peak detection of a point or a line, it does not enable us to estimate the width or height of a peak. In practice, it is not easy to determine all the Gaussian parameters, including $A$, $\mu$, and $\sigma$, because this problem is generally associated with the solution of an overdetermined system of nonlinear equations, which is generated by substituting the observed data into (1).
The standard solution for such a nonlinear problem is to employ an iterative procedure such as the Newton-Raphson algorithm, for which a sufficiently good initial guess is crucial for correctly solving for the unknowns, and the procedure may still fail to converge to the true solution [3]. By noting that a
Gaussian function is the exponential of a
quadratic function, a simpler method
was proposed by Caruana et al. [4]. It cal-
culates the natural logarithm of the data
first and then fits the results to a parab-
ola. Another method is to fit the straight
line resulting from the differential of the
quadratic function just mentioned [5],
[6]. With these algorithms, however,
noise in the observed data may induce
relatively large errors in the estimated
parameters, and the accuracies strongly
depend on the y-amplitude range of the
observed data points.
To overcome the aforementioned prob-
lems, this article analyzes the effects of
noise on Caruana’s algorithm and derives
a simple and improved technique for esti-
mating the Gaussian parameters. The
technique uses a weighted least-squares
method, deduced from a noise model, to
fit the logarithm of Gaussian data. As such,
the influences of the statistical fluctua-
tions on the estimated Gaussian parame-
ters are significantly reduced. Based on
this principle, we also suggest an iterative
procedure that is considerably less sensi-
tive to the amplitude range of the observed
data points. As a result, the initial guesses
for the estimated parameters will no lon-
ger be critical. We proceed by reviewing
Caruana’s algorithm in the next section.
CARUANA’S ALGORITHM
Caruana’s algorithm is based on the fact
that a Gaussian function is the exponen-
tial of a quadratic function. Taking the
natural logarithm of the Gaussian func-
tion in (1) yields
$$\ln(y) = \ln(A) + \frac{-(x-\mu)^2}{2\sigma^2} = \ln(A) - \frac{\mu^2}{2\sigma^2} + \frac{2\mu x}{2\sigma^2} - \frac{x^2}{2\sigma^2} = a + bx + cx^2, \tag{2}$$
where $a = \ln(A) - \mu^2/(2\sigma^2)$, $b = \mu/\sigma^2$, and $c = -1/(2\sigma^2)$. By doing this, the nonlinear equation with unknowns $A$, $\mu$, and $\sigma$ is transformed into a linear one with unknowns being $a$, $b$, and $c$, thus alleviating its computational complexity. The Gaussian parameters $A$, $\mu$, and $\sigma$ can be calculated from $a$, $b$, and $c$.
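As a quick numerical check of (2), the following Python sketch (NumPy is assumed; the helper name `log_quadratic_coeffs` and the sampling grid are illustrative choices, not part of the article) maps $A$, $\mu$, and $\sigma$ to $a$, $b$, and $c$ and compares the logarithm of noise-free Gaussian samples with the parabola.

```python
import numpy as np

def log_quadratic_coeffs(A, mu, sigma):
    """Return (a, b, c) such that ln(y) = a + b*x + c*x**2 for the Gaussian in (1)."""
    a = np.log(A) - mu**2 / (2.0 * sigma**2)
    b = mu / sigma**2
    c = -1.0 / (2.0 * sigma**2)
    return a, b, c

# Parameters of the Figure 1 example; the grid below is an arbitrary assumption.
A, mu, sigma = 1.0, 7.0, 1.5
x = np.linspace(0.0, 10.0, 11)
y = A * np.exp(-(x - mu)**2 / (2.0 * sigma**2))

a, b, c = log_quadratic_coeffs(A, mu, sigma)
# Largest deviation between ln(y) and the parabola a + b*x + c*x^2.
print(np.max(np.abs(np.log(y) - (a + b * x + c * x**2))))
```

The deviation should be at the level of floating-point rounding, which is what makes a linear least squares fit of $\ln(y)$ possible in the first place.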
Note that (2) denotes a parabola whose peak position is the same as that of the Gaussian function described in (1). We show an example of this in Figure 1, where the black solid curve in Figure 1(a) illustrates a Gaussian function with $A = 1$, $\mu = 7$, and $\sigma = 1.5$, and the black solid curve in Figure 1(b) plots its logarithm.

A Simple Algorithm for Fitting a Gaussian Function
Digital Object Identifier 10.1109/MSP.2011.941846
Date of publication: 22 August 2011

“DSP Tips and Tricks” introduces practical design and implementation signal processing algorithms that you may wish to incorporate into your designs. We welcome readers to submit their contributions. Contact Associate Editors Rick Lyons (R.Lyons@ieee.org) or Clay Turner (clay@claysturner.com).

The fundamental principle of Caruana’s algorithm is to fit this parabola in Figure 1(b) in the least squares sense so that its coefficients $a$, $b$, and $c$ are determined, and then the Gaussian parameters $A$, $\mu$, and $\sigma$ are calculated as we shall show. To perform this task, an error function based on (2) is defined, namely
$$\delta = \ln(y) - (a + bx + cx^2). \tag{3}$$
Differentiating the sum of $\delta^2$ with respect to $a$, $b$, and $c$ and setting the resultant expressions to zero yields a linear system of equations
$$\begin{bmatrix} N & \sum x & \sum x^2 \\ \sum x & \sum x^2 & \sum x^3 \\ \sum x^2 & \sum x^3 & \sum x^4 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} \sum \ln(y) \\ \sum x \ln(y) \\ \sum x^2 \ln(y) \end{bmatrix}, \tag{4}$$
where $N$ is the number of observed data points and $\sum$ denotes $\sum_{n=1}^{N}$ for shortening the expression. After solving (4) for $a$, $b$, and $c$, the desired parameters of the Gaussian function are calculated using
$$\mu = \frac{-b}{2c}, \tag{5}$$
$$\sigma = \sqrt{\frac{-1}{2c}}, \tag{6}$$
and
$$A = e^{a - b^2/(4c)}. \tag{7}$$
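The steps in (4) through (7) can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the function name `caruana_fit` is hypothetical, and dropping non-positive samples before taking the logarithm is an assumption that mirrors the article's remark about excluding negative-valued data in Figure 1(b).

```python
import numpy as np

def caruana_fit(x, y):
    """Estimate (A, mu, sigma) by an unweighted parabola fit to ln(y), per (4)-(7)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y > 0.0                      # assumption: ln(y) needs positive samples
    x, y = x[keep], y[keep]

    X = np.vander(x, 3, increasing=True)   # columns: 1, x, x^2
    lhs = X.T @ X                          # matrix of sums of x^0 ... x^4 in (4)
    rhs = X.T @ np.log(y)                  # right-hand side of (4)
    a, b, c = np.linalg.solve(lhs, rhs)

    mu = -b / (2.0 * c)                    # (5)
    sigma = np.sqrt(-1.0 / (2.0 * c))      # (6); NaN if the fitted c is not negative
    A = np.exp(a - b**2 / (4.0 * c))       # (7)
    return A, mu, sigma

# Noise-free usage example with the Figure 1 parameters (grid is an assumption).
x = np.linspace(0.0, 10.0, 101)
y = np.exp(-(x - 7.0)**2 / (2.0 * 1.5**2))
print(caruana_fit(x, y))
```

On noise-free data the sketch should return values very close to the true parameters; its behavior on noisy data is the subject of the next section.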
EFFECTS OF NOISE
Caruana’s algorithm is computationally efficient, since it is noniterative. In the presence of noise, however, its accuracy decreases dramatically. See Figure 1 for an example, where the dots in Figure 1(a) denote the data obtained by sampling the black solid curve. The observed data are contaminated by zero-mean random noise having a standard deviation (SD) of 0.05. Excluding the negative-valued data points, their logarithms are plotted in Figure 1(b), also with dots. We see from them that the fluctuations of the data from their theoretical values, induced by the noise, may be magnified by the logarithmic operation, especially for the points far away from $\mu$, where the Gaussian values are small. Using Caruana’s algorithm, we fit the logarithmic data to a quadratic function. The resulting parabola is plotted in Figure 1(b) with the blue dashed curve, which noticeably deviates from the theoretical curve. The estimates of the Gaussian parameters are $A = 0.5946$, $\mu = 7.6933$, and $\sigma = 2.3768$, and the reconstructed Gaussian curve is shown in Figure 1(a), also with the dashed blue curve. From these results, the errors induced by noise are apparent. This phenomenon can be theoretically explained by considering an additive noise model.
If there is an additive random noise $\eta$, the data we observe are not the ideal value $y$ but
$$\hat{y} = y + \eta. \tag{8}$$
Accordingly, the error function becomes
$$\delta = \ln(\hat{y}) - (a + bx + cx^2) = \ln(y + \eta) - (a + bx + cx^2). \tag{9}$$
Expanding it into a Taylor series and reasonably omitting the high-order terms, we have
$$\delta \approx \ln(y) - (a + bx + cx^2) + \frac{\eta}{y} \tag{10}$$
so the expectation of $\delta^2$ is
$$E\{\delta^2\} = [\ln(y) - (a + bx + cx^2)]^2 + \frac{\sigma_\eta^2}{y^2}, \tag{11}$$
where $\sigma_\eta$ is the standard deviation of the noise.
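The step from (9) to (11) can be written out explicitly. The sketch below relies on the zero-mean noise stated earlier and additionally assumes that $|\eta|$ is smaller than $y$ so that the series converges; the shorthand $\delta_0$ for the noise-free residual is introduced here only for illustration.

```latex
\begin{align*}
\ln(y+\eta) &= \ln(y) + \ln\!\left(1+\frac{\eta}{y}\right)
             = \ln(y) + \frac{\eta}{y} - \frac{\eta^{2}}{2y^{2}} + \cdots, \\
\delta &\approx \delta_0 + \frac{\eta}{y},
  \qquad \delta_0 = \ln(y) - (a + bx + cx^{2}), \\
E\{\delta^{2}\} &\approx \delta_0^{2} + \frac{2\delta_0}{y}\,E\{\eta\}
  + \frac{E\{\eta^{2}\}}{y^{2}}
  = \delta_0^{2} + \frac{\sigma_\eta^{2}}{y^{2}}.
\end{align*}
```

The cross term vanishes because $E\{\eta\} = 0$, and $E\{\eta^2\} = \sigma_\eta^2$ then gives the second term of (11).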
Noting the second term of (11), if $y$ is very small, the noise at this point will introduce very large errors in the estimates. This fact means that, when we use Caruana’s algorithm, the observed data points used in our computations should be limited to those within a narrow range near the peak position of the Gaussian curve (for example, within the x-interval $\mu - 2\sigma \le x \le \mu + 2\sigma$), where the Gaussian function has relatively large values. In practice, because the parameters $\mu$ and $\sigma$ are unknown, we usually set a threshold to exclude the data points having very small amplitude values.
If we use only the data points whose y-amplitude values are greater than 0.2, the fitting results are those illustrated with red dashed curves in Figure 1. (The threshold value of 0.2 is determined empirically and must be several times greater than the noise SD value of 0.05.) In this scenario the estimated Gaussian parameters become $A = 0.9639$, $\mu = 6.9806$, and $\sigma = 1.5758$, which are close to the theoretical values.
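The thresholding step can be sketched as follows. This is only an illustration assuming NumPy: the helper name `select_above_threshold`, the sampling grid, and the random seed are hypothetical choices, so the printed estimates will not reproduce the article's numbers. The unweighted parabola fit is done with `np.polyfit`, which solves the same least squares problem as (4).

```python
import numpy as np

def select_above_threshold(x, y_obs, threshold):
    """Keep only the samples whose observed amplitude exceeds the threshold."""
    x = np.asarray(x, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    keep = y_obs > threshold
    return x[keep], y_obs[keep]

# Article's example settings: A = 1, mu = 7, sigma = 1.5, noise SD = 0.05.
rng = np.random.default_rng(0)            # arbitrary seed (assumption)
x = np.linspace(0.0, 10.0, 101)           # assumed sampling grid
y_obs = np.exp(-(x - 7.0)**2 / (2.0 * 1.5**2)) + rng.normal(0.0, 0.05, x.size)

x_sel, y_sel = select_above_threshold(x, y_obs, threshold=0.2)

# np.polyfit returns the highest power first: c, b, a of the parabola in (2).
c, b, a = np.polyfit(x_sel, np.log(y_sel), 2)
mu_est = -b / (2.0 * c)
sigma_est = np.sqrt(-1.0 / (2.0 * c))
A_est = np.exp(a - b**2 / (4.0 * c))
print(A_est, mu_est, sigma_est)
```

Because the threshold is positive, every retained sample has a well-defined logarithm, so no separate check for non-positive data is needed here.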
Even so, the estimation using (11) remains more dependent on the observed points with small values than on those with large ones, and usually a manual intervention has to be performed for thresholding the data in advance. We shall solve this problem by employing our proposed weighted least squares algorithm in the next section.

[FIG1] Parts (a) and (b) show the results of Caruana’s algorithm in the presence of noise. Panel (a): $y$ versus $x$; panel (b): $\ln(y)$ versus $x$. Legend: noisy data; theoretical curves; curves fitted with all data; curves fitted with data above 0.2.

WEIGHTED LEAST SQUARES ESTIMATION
The description of our proposed weighted least squares Gaussian curve fitting algorithm, which overcomes the noise sensitivity of Caruana’s algorithm, begins by redefining the error function, using (3), as
$$\varepsilon = y\delta = y[\ln(y) - (a + bx + cx^2)] \tag{12}$$
and in the presence of noise
$$\varepsilon = y[\ln(y + \eta) - (a + bx + cx^2)] \approx y[\ln(y) - (a + bx + cx^2)] + \eta \tag{13}$$
so the expectation of $\varepsilon^2$ is
$$E\{\varepsilon^2\} = [y\ln(y) - y(a + bx + cx^2)]^2 + \sigma_\eta^2. \tag{14}$$
In (14), the influence of $y$ on the second term is removed. Minimizing the sum of $\varepsilon^2$ implies an optimal weighted least squares estimation with the weights equaling $y$. Differentiating the sum of $\varepsilon^2$ with respect to $a$, $b$, and $c$ and setting the resultant expressions to zero yields a linear system of equations of the form
$$\begin{bmatrix} \sum \hat{y}^2 & \sum x\hat{y}^2 & \sum x^2\hat{y}^2 \\ \sum x\hat{y}^2 & \sum x^2\hat{y}^2 & \sum x^3\hat{y}^2 \\ \sum x^2\hat{y}^2 & \sum x^3\hat{y}^2 & \sum x^4\hat{y}^2 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} \sum \hat{y}^2 \ln(\hat{y}) \\ \sum x\hat{y}^2 \ln(\hat{y}) \\ \sum x^2\hat{y}^2 \ln(\hat{y}) \end{bmatrix}. \tag{15}$$
In this equation system, because the true values of $y$ are unknown, we have to use $\hat{y}$ instead of $y$ for the weights. Solving (15) for $a$, $b$, and $c$, the parameters of the Gaussian function are further calculated via (5) through (7).
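A minimal Python sketch of the weighted fit in (15) follows, assuming NumPy. The function name `weighted_gaussian_fit` is hypothetical, and discarding non-positive samples (whose logarithm is undefined) is an assumption rather than something the article prescribes for this step.

```python
import numpy as np

def weighted_gaussian_fit(x, y_obs):
    """Estimate (A, mu, sigma) from the weighted normal equations (15)."""
    x = np.asarray(x, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    keep = y_obs > 0.0                      # assumption: drop samples with undefined ln
    x, y_obs = x[keep], y_obs[keep]

    X = np.vander(x, 3, increasing=True)    # columns: 1, x, x^2
    w2 = y_obs**2                           # squared weights, i.e. y-hat squared
    lhs = X.T @ (w2[:, None] * X)           # sums of x^k * y-hat^2, k = 0..4
    rhs = X.T @ (w2 * np.log(y_obs))        # sums of x^k * y-hat^2 * ln(y-hat)
    a, b, c = np.linalg.solve(lhs, rhs)

    mu = -b / (2.0 * c)                     # (5)
    sigma = np.sqrt(-1.0 / (2.0 * c))       # (6); NaN if the fitted c is not negative
    A = np.exp(a - b**2 / (4.0 * c))        # (7)
    return A, mu, sigma

# Noise-free usage example with the Figure 2 parameters (grid is an assumption).
x = np.linspace(0.0, 10.0, 101)
y = np.exp(-(x - 7.0)**2 / (2.0 * 1.5**2))
print(weighted_gaussian_fit(x, y))
```

Compared with an unweighted fit, the only change is the factor $\hat{y}^2$ inside every sum, so samples in the low-amplitude tails contribute very little to the normal equations.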
Figure 2 illustrates the performance of our weighted least squares technique. The solid curve and dots in Figure 2(a) denote the theoretical Gaussian function and its sampling data, respectively, which are the same as those in Figure 1(a). The curve reconstructed using our technique is plotted in Figure 2(a) with the blue dashed line. The estimated Gaussian parameters are $A = 0.9689$, $\mu = 7.0184$, and $\sigma = 1.6251$, demonstrating that our technique, which does not exclude small-valued data, is more accurate than Caruana’s method.
Of course, if we limit our computations by using only the data points having amplitudes greater than 0.2, more accurate results may be obtained, say $A = 0.9807$, $\mu = 7.0114$, and $\sigma = 1.5683$.
The above results are obtained from one simulation, and a more convincing descriptor is the root-mean-square (RMS) error. This descriptor, as a statistic, is more suitable for describing the behavior of an algorithm in the presence of random noise. As such, we performed 5,000 simulations with varying noise (SD = 0.05) and the threshold held at 0.2. With Caruana’s algorithm, the RMS errors for the parameters $A$, $\mu$, and $\sigma$ are 0.0340, 0.0431, and 0.0779, respectively, whereas when our proposed technique is used, the RMS errors for the three parameters are reduced to 0.0179, 0.0315, and 0.0554, respectively. These results demonstrate the improved accuracy of our proposed technique over Caruana’s algorithm when fitting a Gaussian peak.
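A Monte Carlo comparison of this kind can be sketched as below, assuming NumPy. The function names, the sampling grid, and the random seed are hypothetical, so the resulting RMS values are not expected to match the article's figures exactly. The fits use `np.polyfit`, whose `w` argument multiplies each residual before squaring; passing `w = y_obs` therefore corresponds to the weighting in (15).

```python
import numpy as np

def fit_log_parabola(x, y_obs, weighted):
    """Fit ln(y_obs) with a parabola and convert (a, b, c) to (A, mu, sigma)."""
    w = y_obs if weighted else None           # polyfit minimizes sum((w * residual)**2)
    c, b, a = np.polyfit(x, np.log(y_obs), 2, w=w)
    return np.exp(a - b**2 / (4.0 * c)), -b / (2.0 * c), np.sqrt(-1.0 / (2.0 * c))

def rms_errors(n_trials=5000, noise_sd=0.05, threshold=0.2, seed=0):
    """Return a 2x3 array: RMS errors of (A, mu, sigma), unweighted then weighted."""
    truth = np.array([1.0, 7.0, 1.5])         # A, mu, sigma of the article's example
    x = np.linspace(0.0, 10.0, 101)           # assumed sampling grid
    y = truth[0] * np.exp(-(x - truth[1])**2 / (2.0 * truth[2]**2))
    rng = np.random.default_rng(seed)
    sq_err = np.zeros((2, 3))
    for _ in range(n_trials):
        y_obs = y + rng.normal(0.0, noise_sd, x.size)
        keep = y_obs > threshold              # positive threshold keeps ln() defined
        for row, weighted in enumerate((False, True)):
            est = fit_log_parabola(x[keep], y_obs[keep], weighted)
            sq_err[row] += (np.array(est) - truth)**2
    return np.sqrt(sq_err / n_trials)

unweighted_rms, weighted_rms = rms_errors(n_trials=500)
print("unweighted (A, mu, sigma):", unweighted_rms)
print("weighted   (A, mu, sigma):", weighted_rms)
```

Both estimators see exactly the same noisy samples in each trial, so any difference in the RMS errors comes from the weighting alone.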
The reason for the improvement of
the proposed technique is graphically
explained in Figure 2(b), where the verti-
cal axis denotes the logarithms of the
Gaussian data multiplied by the weights
y. From it, we see that the random fluctu-
ations of the data, induced by the noise,
are much more uniform across the full
data x-range. Therefore, data having small
values, as indicated by (14), will not have
an excessively detrimental impact on our
estimation results.
ITERATIVE PROCEDURE
Our technique, compared with Caruana’s algorithm, is less sensitive to noise and more accurate in fitting a Gaussian peak. However, if a long tail of the Gaussian function curve is included in the observed data range, the large noise contamination in those data points far away from the peak position still adversely affects our curve fitting accuracy and may even cause the estimation to fail, for the following reasons. First, (10) cannot fully describe the behavior of the noise, because the high-order terms of the Taylor series have been omitted in this equation. Second, the true values of $y$ are unknown, so we have to use the noisy values $\hat{y}$ instead of $y$ for the weights in (15). For the observed data points whose values of $y$ are very small, the signal-to-noise ratios are low. In other words, the relative difference between $\hat{y}$ and $y$ may be very large, thus inducing considerable error.
[FIG2] Results of the weighted least squares estimation in the presence of noise. Panel (a): $y$ versus $x$; panel (b): $y \cdot \ln(y)$ versus $x$. Legend: noisy data; theoretical curves; reconstructed curves.

We solve this large noise contamination problem by iterating the estimating technique described by (15) with the weight values being updated in each iteration. The procedure is summarized by
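$$\begin{bmatrix} \sum y_{(k-1)}^2 & \sum x y_{(k-1)}^2 & \sum x^2 y_{(k-1)}^2 \\ \sum x y_{(k-1)}^2 & \sum x^2 y_{(k-1)}^2 & \sum x^3 y_{(k-1)}^2 \\ \sum x^2 y_{(k-1)}^2 & \sum x^3 y_{(k-1)}^2 & \sum x^4 y_{(k-1)}^2 \end{bmatrix} \begin{bmatrix} a_{(k)} \\ b_{(k)} \\ c_{(k)} \end{bmatrix} = \begin{bmatrix} \sum y_{(k-1)}^2 \ln(\hat{y}) \\ \sum x y_{(k-1)}^2 \ln(\hat{y}) \\ \sum x^2 y_{(k-1)}^2 \ln(\hat{y}) \end{bmatrix}, \tag{16}$$
where
$$y_{(k)} = \begin{cases} \hat{y} & \text{for } k = 0 \\ e^{a_{(k)} + b_{(k)} x + c_{(k)} x^2} & \text{for } k > 0 \end{cases} \tag{17}$$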
with the parenthesized subscripts being
the iteration indices. Compared with a
standard iterative algorithm (e.g.,
Newton-Raphson) for solving a nonlinear
system, our proposed iterative algorithm
is computationally much simpler, and the
initial guesses for the unknown a, b, and
c are not required.
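The iteration defined by (16) and (17) can be sketched in Python as follows, assuming NumPy. The function name `iterative_gaussian_fit` and the fixed iteration count are hypothetical choices, and dropping non-positive samples before taking the logarithm is an assumption. The sketch has no safeguards against overflow in the exponential or a singular system, which a production implementation would need.

```python
import numpy as np

def iterative_gaussian_fit(x, y_obs, n_iter=10):
    """Iterate the weighted fit (16), refreshing the weights with (17).

    Returns a list holding one (A, mu, sigma) tuple per iteration.
    """
    x = np.asarray(x, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    keep = y_obs > 0.0                        # assumption: drop samples with undefined ln
    x, y_obs = x[keep], y_obs[keep]

    X = np.vander(x, 3, increasing=True)      # columns: 1, x, x^2
    log_y = np.log(y_obs)
    weights = y_obs.copy()                    # y_(0) is the observed data, per (17)
    history = []
    for _ in range(n_iter):
        w2 = weights**2
        lhs = X.T @ (w2[:, None] * X)         # left-hand matrix of (16)
        rhs = X.T @ (w2 * log_y)              # right-hand vector of (16)
        a, b, c = np.linalg.solve(lhs, rhs)
        weights = np.exp(a + b * x + c * x**2)    # y_(k) used by the next pass

        mu = -b / (2.0 * c)
        sigma = np.emath.sqrt(-1.0 / (2.0 * c))   # complex when the fitted c is positive
        A = np.exp(a - b**2 / (4.0 * c))
        history.append((A, mu, sigma))
    return history

# Noise-free usage example with A = 1, mu = 9.2, sigma = 0.75 (grid is an assumption).
x = np.linspace(0.0, 10.0, 101)
y = np.exp(-(x - 9.2)**2 / (2.0 * 0.75**2))
for k, estimate in enumerate(iterative_gaussian_fit(x, y, n_iter=3), start=1):
    print(k, estimate)
```

Note that the logarithm on the right-hand side always uses the observed data; only the weights change between iterations, which is why no initial guess for $a$, $b$, and $c$ is needed.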
To verify the performance of this iterative procedure, we define a Gaussian function with the parameters $A = 1$, $\mu = 9.2$, and $\sigma = 0.75$. The x-range of observed data points is from zero to ten, and the noisy data are illustrated with the black solid curve in Figure 3, where the SD of the additive noise is 0.1.
This noisy Gaussian function has a nar-
row peak located near the right edge of the
data’s x-range. On the left side of the peak
there exists a long tail where the Gaussian
function has very small values and noise is
relatively large (the signal-to-noise ratio
here is very small). The weighted least
squares technique in the previous section
is not effective in this situation, but we can
succeed by using the iterative procedure
that was just introduced. Although the
curves in Figure 3 show that the first sev-
eral iterations cannot produce a satisfac-
tory result, after ten iterations the
reconstructed curve fits the theoretical
noise-free Gaussian data quite well.
Table 1 lists the estimated Gaussian parameters versus the number of iterations. There we see the convergence of our iterative procedure. In the first two iterations, the large noise in the data makes the estimated $\sigma$ value imaginary, but the final iteration results accurately approximate the theoretical values.
SUMMARY
We proposed an improved technique for
estimating the parameters of a Gaussian
function from its observed data points.
With it, the logarithms of Gaussian data
are fitted by using a weighted least
squares method derived from a noise
model, so that the influences of random
fluctuations on the estimation are
substantially reduced. Compared to
Caruana’s algorithm, our technique is
much less sensitive to random noise.
Based on our weighted least squares
method we also suggested an iterative
procedure suitable for reconstructing a
Gaussian curve when a long tail of
the Gaussian curve is included in the observed
data points. Because the iterative
procedure starts directly from the origi-
nal data, the initial guesses, which are
generally crucial for guaranteeing the
convergence of an iterative procedure,
are not required, and the implementa-
tion of setting a threshold for excluding
the small Gaussian data is also unneces-
sary. Although the techniques proposed
in this article focused on fitting a one-
dimensional Gaussian function, their
principles are easy to extend to multidi-
mensional Gaussian fitting.
ACKNOWLEDGMENTS
The author gratefully appreciates Richard
(Rick) Lyons for his suggestions on the
content and his assistance with the text of
this article. The author also acknowledges
the China Scholarship Council and the
Mechatronics Engineering Innovation
Group Project from Shanghai Education
Commission for their support.
AUTHOR
Hongwei Guo (hw-guo@yeah.net) is a
professor in the Lab of Applied Optics and
Metrology, the Department of Precision
Mechanical Engineering at Shanghai
University, China.
REFERENCES
[1] Y. Feng, J. Goree, and B. Liu, “Accurate particle position measurement from images,” Rev. Scient. Instrum., vol. 78, no. 5, pp. 53–59, May 2007.
[2] R. B. Fisher and D. K. Naidu, “A comparison of algorithms for subpixel peak detection,” in Image Technology: Advances in Image Processing, Multimedia and Machine Vision, J. Sanz, Ed. Berlin: Springer-Verlag, 1996, pp. 385–404.
[3] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes: The Art of Scientific Computing, 3rd ed. New York: Cambridge Univ. Press, 2007, pp. 733–836.
[4] R. Caruana, R. Searle, T. Heller, and S. Shupack, “Fast algorithm for the resolution of spectra,” Anal. Chem., vol. 58, no. 6, pp. 1162–1167, May 1986.
[5] W. Zimmermann, “Evaluation of photopeaks in scintillation Gamma-ray spectroscopy,” Rev. Scient. Instrum., vol. 32, no. 9, pp. 1063–1065, Sept. 1961.
[6] R. Abdel-Aal, “Comparison of algorithmic and machine learning approaches for the automatic fitting of Gaussian peaks,” Neural Comput. Appl., vol. 11, no. 1, pp. 17–29, June 2002.
[FIG3] Results of the proposed iterative algorithm. Axes: $y$ versus $x$. Legend: noisy data; reconstructed curves with (k) iterations, labeled (1), (3), (5), and (10).
[TABLE 1] ESTIMATED GAUSSIAN PARAMETERS ($A = 1$, $\mu = 9.2$, AND $\sigma = 0.75$).
NUMBER OF ITERATIONS    $A$       $\mu$      $\sigma$
1                       0.0648    -6.8010    j7.0601
2                       0.0270    0.4573     j3.4835
3                       0.8175    11.0473    2.7769
4                       0.8237    9.9600     1.5710
5                       0.8890    9.1644     0.9004
6                       0.9778    9.1482     0.7564
7                       0.9966    9.1468     0.7272
8                       0.9990    9.1473     0.7232
9                       0.9994    9.1474     0.7226
10                      0.9994    9.1474     0.7225