Simulation framework for detecting and tracking moving sound sources using
acoustical beamforming methods
Péter Tapolczai, Péter Fiala, Gergely Firtha, Péter Rucz
Laboratory of Acoustics and Studio Technologies,
Budapest University of Technology and Economics, H1117 Budapest, Hungary, Email: email@example.com
Microphone arrays and beamforming procedures enable the localization of sound sources and are used in a great number of industrial applications. Remote sound sources can be localized by creating an acoustical image of the spatial distribution of the source strength. Acoustical focusing can also be implemented in order to obtain the filtered signal of selected sources. Application to moving sound sources involves a number of challenges, such as taking the Doppler shift into account or applying acoustical focusing with a time-varying focal point. In this contribution, a simulation framework for the detection and tracking of moving sources is introduced. It is shown that, in the case of moving sources, source localization and acoustical focusing can be implemented in a feedback loop to enhance the quality of the detection. The object-oriented simulator is capable of evaluating the sound field of arbitrarily moving sound sources and contains implementations of various beamforming methods. The reconstruction of the trajectory of the moving source is supplemented by a nonlinear Kalman filter. The tracking of a small unmanned aerial vehicle is demonstrated as an example application.
The framework introduced in this paper is a MATLAB-based object-oriented toolbox. The simulator serves two main purposes. On the one hand, it allows for the simulation of scenarios involving multiple moving sound sources and arbitrary microphone array configurations. On the other hand, it provides a unified structure for testing beamforming and source localization algorithms.
The sound field simulation module is capable of reproducing the sound signals at stationary receiver locations (i.e., microphone positions) created by an arbitrary number of sound sources moving along arbitrary trajectories. In order to properly account for the movement of the sources, the emission times are calculated at each time instance for all source–receiver pairs.
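The emission-time computation can be illustrated with a minimal sketch. It is not the toolbox's MATLAB code: the function name `emission_time`, the fixed-point iteration and the speed of sound c = 343 m/s are assumptions. For a receiver at position r and a source trajectory s(τ), the emission time τ of the sound received at time t satisfies t − τ = |r − s(τ)|/c; for subsonic sources the update below is a contraction with factor at most |v|/c, so the iteration converges.

```python
import math

C_SOUND = 343.0  # assumed speed of sound in m/s

def emission_time(t, receiver, trajectory, c=C_SOUND, tol=1e-12, max_iter=100):
    """Return the emission time tau at which the sound received at `receiver`
    at time `t` left the moving source described by `trajectory(tau)`.

    Solves t - tau = |receiver - trajectory(tau)| / c by fixed-point
    iteration (convergent for subsonic sources, |v| < c).
    """
    tau = t  # initial guess: zero propagation delay
    for _ in range(max_iter):
        dist = math.dist(receiver, trajectory(tau))
        tau_new = t - dist / c
        if abs(tau_new - tau) < tol:
            return tau_new
        tau = tau_new
    return tau
```

For example, for a source moving at 40 m/s along a straight line 10 m above a microphone at the origin, `emission_time(0.5, (0.0, 0.0, 0.0), lambda s: (40.0 * s, 0.0, 10.0))` returns the retarded time; repeating this for every microphone and output sample gives the delays used when synthesizing the microphone signals.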
The signal processing tools enable acoustical focusing on the moving sources using the delay-and-sum method. For acoustical imaging and subsequent source localization, the framework currently contains implementations of the CLEAN-PSF, CLEAN-SC [1], DAMAS [2], and MUSIC [3] algorithms besides the conventional beamforming method. By means of a Kalman filter, source localization estimates can be averaged and fused with predictions from a dynamical model. Finally, in the case of sound signals with significant harmonic content, such as the noise radiated by the rotating propeller blades of small unmanned aerial vehicles (UAV), a frequency tracking algorithm can supplement localization and source tracking.
Figure 1: Structure and data flow of the framework (diagram blocks: Sources: signals and trajectories; Environment: wave propagation; Microphone array: sampling; Sound field simulation; Mixdown and filtering; Delay and sum).
Figure 1 displays the structure and the data ﬂow of the
simulation framework. The next sections introduce and
demonstrate the signal processing parts of the simulator,
i.e., acoustical focusing, source localization, and tracking.
Focusing to moving sound sources
In the case of moving sources, the acoustical camera is focused on the target in each block with different phase shifts. As a result, consecutive blocks can overlap, or a gap can occur between the segments, as illustrated in Figure 2.
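The per-block focusing step can be sketched as a delay-and-sum beamformer with a focal point that is fixed within one block. This is a minimal illustration under assumptions, not the framework's implementation: the names `delay_and_sum` and `_sample`, the linear interpolation for fractional delays, zero padding outside the block and c = 343 m/s are all hypothetical choices.

```python
import math

def _sample(sig, pos):
    """Linearly interpolated value of `sig` at fractional index `pos` (zero outside)."""
    i = math.floor(pos)
    if i < 0 or i >= len(sig):
        return 0.0
    frac = pos - i
    nxt = sig[i + 1] if i + 1 < len(sig) else 0.0
    return (1.0 - frac) * sig[i] + frac * nxt

def delay_and_sum(signals, mic_positions, focal_point, fs, c=343.0):
    """Focus one block of microphone signals on `focal_point`.

    Each channel is advanced by its extra propagation delay relative to the
    microphone closest to the focal point, then all channels are averaged.
    """
    dists = [math.dist(p, focal_point) for p in mic_positions]
    d_min = min(dists)
    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, dists):
        shift = (d - d_min) / c * fs  # extra delay of this channel in samples
        for k in range(n):
            out[k] += _sample(sig, k + shift)
    return [v / len(signals) for v in out]
```

When the focal point changes from one block to the next, the channel shifts change as well, which is exactly what produces the gaps and overlaps at the block boundaries.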
Two methods are implemented in the framework to solve this problem. One method is to interpolate each block to the length of the path in the previous cycle by stretching its time scale, so that the gaps disappear. The advantage of this solution is that it also takes the speed of the source into account and changes the size of the current block; thus, it can partly compensate for the change in frequency resulting from the Doppler effect. However, it is computationally intensive, which limits its real-time use.
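The time-scale stretching of the first method can be sketched as resampling a focused block to a prescribed number of samples. The paper does not specify the interpolation kernel; the linear interpolation and the function name `stretch_block` below are assumptions for illustration only.

```python
def stretch_block(block, target_len):
    """Resample `block` to `target_len` samples by linear interpolation,
    stretching (target_len > len(block)) or compressing its time scale."""
    n = len(block)
    if target_len == 1 or n == 1:
        return [block[0]] * target_len
    scale = (n - 1) / (target_len - 1)  # input samples per output sample
    out = []
    for k in range(target_len):
        pos = k * scale
        i = min(int(pos), n - 2)  # keep i + 1 inside the block
        frac = pos - i
        out.append((1.0 - frac) * block[i] + frac * block[i + 1])
    return out
```

Because every output sample requires an interpolation, the cost grows with the block length and the number of blocks, which is consistent with the limited real-time applicability noted above.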
DAGA 2019 Rostock
Figure 2: Block summation problem. Different delays in two consecutive blocks (light blue bricks) can result in gaps (yellow block parts) or overlapping segments (red block parts).
Alternatively, an additional block is calculated between each pair of blocks, focused on a position interpolated between those of the two blocks; each shifted block is weighted using a window function, and the block summation is then performed. The advantage of this method is that it is computationally much less demanding than the previous approach. Similar windowing methods are used in lossy audio compression, where windowing is required to smooth the noise resulting from quantization with different numbers of bits in adjacent frames.
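The windowed block summation can be sketched as a weighted overlap-add. The paper does not name the window; here it is assumed that the focused blocks and the interpolated intermediate blocks are laid out with 50 % overlap and weighted by a periodic Hann window, whose shifted copies sum to exactly one, so no gain ripple remains at the block boundaries. The names `hann` and `overlap_add` are hypothetical.

```python
import math

def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]

def overlap_add(blocks, hop):
    """Weight each block (length 2*hop) with a Hann window and sum the
    blocks with a hop size of `hop` samples."""
    n = 2 * hop
    w = hann(n)
    out = [0.0] * (hop * (len(blocks) - 1) + n)
    for b, block in enumerate(blocks):
        start = b * hop
        for k in range(n):
            out[start + k] += w[k] * block[k]
    return out
```

Each output sample needs only one multiplication per overlapping block, which is why this approach is cheaper than resampling every block.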
During testing, no significant differences were observed between the focused signals produced by the two methods; the received signal was free of gaps and overlapping artefacts in both cases.
Examination of the Doppler effect
In the case of moving sources, the Doppler effect can have a significant influence on the frequency of the received signal. In our case, this is of particular interest because the amplitude map is evaluated in the vicinity of a predefined frequency.
In this simulation case, the Doppler frequency shift of a signal resulting from acoustical focusing is examined. The sound field is sampled using a circular microphone array consisting of 24 microphones with a radius of 1 m. The source travels along a straight line at a constant speed of 40 m/s, parallel to the plane of the microphone array at a distance of 10 m, and emits a harmonic signal with a frequency of 1 kHz. Acoustical focusing is performed using a block length of 0.01 s, with the moving focal point set to the position of the source at the emission time corresponding to the middle of the block.
The analytically calculated frequency and the measured frequency of the focused signal are compared in Figure 3. The results nearly coincide, and the frequency of the signal decreases continuously, as expected.
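The analytical reference frequency for a single receiver at the array center can be sketched for the straight-line geometry of this simulation. Assumptions: the receiver is at the origin, the source moves along (vτ, 0, h) with h = 10 m and passes overhead at τ = 0, and c = 343 m/s; the function name `doppler_frequency` is hypothetical. The emission time follows in closed form from c²(t − τ)² = (vτ)² + h², and the received frequency is f₀/(1 − v_r/c), where v_r is the velocity component towards the receiver at the emission time.

```python
import math

def doppler_frequency(t, f0=1000.0, v=40.0, h=10.0, c=343.0):
    """Frequency received at time t at the origin from a source moving
    along (v*tau, 0, h) and emitting a harmonic signal of frequency f0."""
    # Emission time: root of c^2 (t - tau)^2 = (v tau)^2 + h^2 with tau < t
    disc = c**2 * v**2 * t**2 + h**2 * (c**2 - v**2)
    tau = (c**2 * t - math.sqrt(disc)) / (c**2 - v**2)
    # Source velocity component pointing towards the receiver at tau
    dist = math.hypot(v * tau, h)
    v_towards = -v * (v * tau) / dist
    return f0 / (1.0 - v_towards / c)
```

Far before the pass-by, the frequency approaches f₀/(1 − v/c) ≈ 1132 Hz; it equals f₀ for the sound emitted directly overhead and then decreases continuously, which is the trend described for Figure 3.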
Note that in Figure 3 the frequency of the focused signal is slightly above the analytical frequency until the source passes the microphone array. Then, a small jump is observed, and while the source moves away from the microphone array, the measured frequency is lower than the analytical one. This phenomenon is explained by the fact that the analytical frequency was calculated for a single receiver at the center of the microphone array, whereas the actual microphone array has a finite extent. In the vicinity of the jump, the source is still approaching some of the microphones while it is already receding from the others. At the point of the frequency jump, the total weights of these two groups of microphones change rapidly. It was observed that a larger array results in a greater frequency jump.
Figure 3: Effect of acoustical focusing on the measured frequency of a moving harmonic source.
Source localization
Using the microphone array data, an acoustical image is created by the conventional beamforming method or by a high-resolution imaging algorithm (MUSIC, DAMAS, CLEAN). As these methods operate in the frequency domain, the signals are band-pass filtered and the cross spectral matrix (CSM) is estimated. The imaging algorithms use the CSM to compute the amplitude map. Once the acoustical image is created, the locations of the sources in the observation area have to be estimated. It is also desirable to quantify the uncertainty of the estimated locations (e.g., by their covariance matrix).
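The CSM estimation and the conventional beamforming map can be sketched as follows. Assumptions: each snapshot is the vector of complex microphone spectra at one frequency bin (e.g., one FFT block), the steering vectors are phase-only and normalized, w_m = exp(−jkr_m)/M, and c = 343 m/s; the function names are hypothetical. Other steering-vector formulations exist and change the map scaling.

```python
import cmath
import math

def estimate_csm(snapshots):
    """Cross spectral matrix C = (1/K) * sum_k p_k p_k^H from K snapshots,
    each a list of complex microphone spectra at one frequency bin."""
    m = len(snapshots[0])
    csm = [[0j] * m for _ in range(m)]
    for p in snapshots:
        for i in range(m):
            for j in range(m):
                csm[i][j] += p[i] * p[j].conjugate()
    k = len(snapshots)
    return [[v / k for v in row] for row in csm]

def conventional_beamforming(csm, mic_positions, grid, freq, c=343.0):
    """Amplitude map B(x) = w^H C w for every focus point x in `grid`,
    using phase-only steering vectors w_m = exp(-1j*k*r_m) / M."""
    k = 2.0 * math.pi * freq / c
    m = len(mic_positions)
    result = []
    for x in grid:
        w = [cmath.exp(-1j * k * math.dist(p, x)) / m for p in mic_positions]
        b = sum(w[i].conjugate() * csm[i][j] * w[j]
                for i in range(m) for j in range(m))
        result.append(b.real)
    return result
```

With this normalization, a single source with unit-amplitude spectra yields B = 1 at its true position and smaller values elsewhere (by the Cauchy–Schwarz inequality).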
To estimate the source locations, the acoustical image is processed in a cyclical manner: our localization algorithm repeats a sequence of internal steps as long as amplitudes above a certain threshold remain in the map. The steps of one cycle are illustrated in parts (a)–(e) of Figure 4. At the beginning of each iteration, the amplitude map (a) is cleared where its values are below a given threshold level (b). Then, a so-called connectivity analysis is carried out on this modified amplitude map, and the largest contiguous object is found (c). The found source must then be "deleted" from the amplitude map, which is done by resetting the amplitude map to a lower threshold level in a wider contiguous area around the found source (d), so that the same source is definitely not found again in the next cycle. Finally, an enclosing ellipsoid is fitted to the contiguous object (e) in order to estimate the statistics of the localization: the center of the ellipsoid indicates the expected value of the distribution, and the lengths and orientations of its axes are related to the covariance matrix. The steps of fitting the minimum volume enclosing ellipsoid (MVEE) are described in [4]. In the next cycle, the same steps are performed anew, until no new source is found.
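The localization cycle can be sketched on a 2D amplitude map. This is a simplified illustration, not the authors' code: it assumes 4-connectivity, interprets step (d) as zeroing the region grown from the found object with the lower threshold (0 < lower_threshold < threshold), and, instead of fitting the MVEE, estimates the mean and covariance from amplitude-weighted moments of the object. The names `localize` and `_component` are hypothetical.

```python
from collections import deque

def _component(amp, seed, level):
    """4-connected cells reachable from `seed` whose amplitude exceeds `level`."""
    rows, cols = len(amp), len(amp[0])
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen
                    and amp[nr][nc] > level):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen

def localize(amp, threshold, lower_threshold):
    """Extract sources one by one from a 2D amplitude map (list of lists).
    Returns a list of (mean, covariance) pairs in grid-index coordinates."""
    amp = [row[:] for row in amp]  # work on a copy of the map
    sources = []
    while True:
        # (b) cells above the threshold; stop when none are left
        above = [(r, c) for r, row in enumerate(amp)
                 for c, v in enumerate(row) if v > threshold]
        if not above:
            break
        # (c) connectivity analysis: pick the largest contiguous object
        comps, visited = [], set()
        for cell in above:
            if cell not in visited:
                comp = _component(amp, cell, threshold)
                visited |= comp
                comps.append(comp)
        obj = max(comps, key=len)
        # (e) amplitude-weighted moments (computed before the cells are cleared)
        wsum = sum(amp[r][c] for r, c in obj)
        mr = sum(amp[r][c] * r for r, c in obj) / wsum
        mc = sum(amp[r][c] * c for r, c in obj) / wsum
        crr = sum(amp[r][c] * (r - mr) ** 2 for r, c in obj) / wsum
        ccc = sum(amp[r][c] * (c - mc) ** 2 for r, c in obj) / wsum
        crc = sum(amp[r][c] * (r - mr) * (c - mc) for r, c in obj) / wsum
        sources.append(((mr, mc), ((crr, crc), (crc, ccc))))
        # (d) delete the source: clear the wider area grown with the lower threshold
        for r, c in _component(amp, next(iter(obj)), lower_threshold):
            amp[r][c] = 0.0
    return sources
```

The moment-based covariance is cheaper than the MVEE but does not enclose the object; it is shown only to make the loop structure concrete.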
Figure 4: Illustration of the source localization steps in the case of two moving objects. (a) Detection amplitude map. (b)–(e) Localization steps for the first source.
Tracking supported by Kalman filter
The Kalman filter provides an optimal state estimate of an object by recursively averaging the noisy input data and combining the measurements with estimates based on the dynamic model of the object. A useful feature of the Kalman filter is that it also estimates the covariance matrix of the estimated state vector.
The filter is optimal if the dynamics of the system are linear and the noise is additive with a Gaussian distribution. However, if any of these conditions does not hold, the conventional Kalman filter can no longer be applied directly, and extensions are required. Extended Kalman Filters (EKF) are based on linearization, requiring the evaluation of the Jacobian matrix, which can be computationally demanding. As an alternative, scattered state variables can be generated around the estimated state vector, and the statistics are evaluated by transforming these scattered states using the nonlinear system or output equation, respectively. The Unscented Kalman Filter (UKF) uses a special selection of scattered states: it calculates the transformed statistics using so-called sigma points, which are located on the standard deviation ellipsoid. Hence, the number of scatter points is well defined and the computational costs are moderate. Moreover, it can be shown that even in the case of multiple nonlinear transformations, the sigma point choice of the UKF is optimal [5].
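The sigma-point idea can be sketched with the unscented transform in Julier's original κ-parameterization: 2n + 1 points at the mean and at the mean ± the columns of a Cholesky factor of (n + κ)P, i.e., on a √(n + κ)-scaled standard deviation ellipsoid. Many UKF implementations use the scaled (α, β, κ) variant instead; the names and the default κ = 1 below are assumptions, not the framework's settings.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate (mean, cov) through a nonlinear function f using 2n+1
    sigma points at mean +/- columns of sqrt((n + kappa) * cov)."""
    n = len(mean)
    low = cholesky([[(n + kappa) * v for v in row] for row in cov])
    sigma = [list(mean)]
    for j in range(n):
        col = [low[i][j] for i in range(n)]
        sigma.append([m + d for m, d in zip(mean, col)])
        sigma.append([m - d for m, d in zip(mean, col)])
    weights = [kappa / (n + kappa)] + [0.5 / (n + kappa)] * (2 * n)
    ys = [f(s) for s in sigma]
    dim = len(ys[0])
    y_mean = [sum(w * y[i] for w, y in zip(weights, ys)) for i in range(dim)]
    y_cov = [[sum(w * (y[i] - y_mean[i]) * (y[j] - y_mean[j])
                  for w, y in zip(weights, ys)) for j in range(dim)]
             for i in range(dim)]
    return y_mean, y_cov
```

For a linear function the transform reproduces the exact mean and covariance; for nonlinear maps such as the Cartesian-to-spherical observation equation it captures the first terms of the Taylor expansion without evaluating any Jacobian.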
In our framework, a nonlinear relationship exists between the observation variables, which are spherical coordinates (r, ϑ, ϕ), and the state variables, which contain the position and velocity of the tracked object in the Cartesian coordinate system. This choice of observation variables is explained as follows: in the case of infinite focal length, no information is available about the distance r, while for finite focal lengths, it can reasonably be assumed that the distance estimate has a different accuracy than the angle estimates.
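The nonlinear observation equation can be sketched as a map from the Cartesian state to spherical coordinates seen from the array center. The axis conventions are assumptions: the array lies in the xy-plane, ϑ is the polar angle measured from the array axis z, and ϕ is the azimuth in the xy-plane; the function name `observation` is hypothetical.

```python
import math

def observation(state, array_center=(0.0, 0.0, 0.0)):
    """Map the Cartesian state [x, y, z, vx, vy, vz] to (r, theta, phi).

    theta: polar angle from the array axis (z); phi: azimuth in the xy-plane.
    The velocity components are not observed directly.
    """
    dx, dy, dz = (s - o for s, o in zip(state[:3], array_center))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.acos(dz / r)
    phi = math.atan2(dy, dx)
    return (r, theta, phi)
```

In a UKF this function is applied to each sigma point; a diagonal measurement noise covariance with a larger variance for r than for ϑ and ϕ would express the different accuracies mentioned above, and the ϕ residual should be wrapped to (−π, π].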
Figure 5: Left: Amplitude spectrum of a quadrocopter recorded in a semi-anechoic chamber at ≈6000 RPM. Right: Result of the HPS calculation.
The dynamic model of the source is described by assuming uniform motion, i.e.,
$$\dot{\mathbf{x}} = \mathbf{v}, \qquad \dot{\mathbf{v}} \approx \mathbf{0},$$
with $\mathbf{x}$ and $\mathbf{v}$ denoting the position and the velocity of the object. This system of equations is discretized in time, with the sampling interval $T_s$ being the time difference between two consecutive source localization steps. Naturally, the source is not expected to follow a linear trajectory. This is taken into account by assuming a large system noise for the $\dot{\mathbf{v}} \approx \mathbf{0}$ equation, resulting in a velocity estimate that relies far more on the measured positions than on the model prediction. If a position estimate is not available, i.e., the source was not found on the acoustical image, uniform motion is predicted with a steeply increasing variance of the estimated state.
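The discretized constant-velocity model can be sketched for one Cartesian axis with a linear Kalman filter. This is a simplification of the setup above (which uses spherical observations and a UKF): the position is assumed to be measured directly, and the large system noise of the v̇ ≈ 0 equation is modeled as Q = diag(0, q). The function name `cv_kalman_step` and all parameter values are hypothetical.

```python
def cv_kalman_step(x, p, z, ts, q, r):
    """One predict/update cycle of a 1D constant-velocity Kalman filter.

    x = [position, velocity], p = 2x2 covariance (list of lists),
    z = measured position, or None if the source was not found,
    ts = sampling interval, q = system noise variance of the velocity
    equation, r = measurement noise variance.
    """
    # Predict: x_{k+1} = x_k + ts * v_k,  v_{k+1} = v_k + w_k
    x = [x[0] + ts * x[1], x[1]]
    p00 = p[0][0] + ts * (p[0][1] + p[1][0]) + ts * ts * p[1][1]
    p01 = p[0][1] + ts * p[1][1]
    p10 = p[1][0] + ts * p[1][1]
    p11 = p[1][1] + q
    p = [[p00, p01], [p10, p11]]
    if z is None:
        return x, p  # no detection: prediction only, the variance grows
    # Update with the position measurement (H = [1, 0])
    s = p[0][0] + r
    k0, k1 = p[0][0] / s, p[1][0] / s
    innov = z - x[0]
    x = [x[0] + k0 * innov, x[1] + k1 * innov]
    p = [[(1.0 - k0) * p[0][0], (1.0 - k0) * p[0][1]],
         [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]
    return x, p
```

Feeding the filter position measurements of a source moving at 20 m/s with T_s = 0.1 s lets the velocity estimate converge to 20 m/s, although velocity is never measured; calling it with `z=None` shows the growing variance of a prediction-only step.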
Example case: tracking a simulated UAV
In order to supply more realistic input signals for the simulation framework, the sound of a DJI–F450 quadrocopter was recorded in a semi-anechoic chamber under static (non-flight) conditions with different propeller configurations and at various rotor speeds. The directivity of the sound radiation was also measured. It was found that assuming uniform directivity is a satisfactory approximation in the case of the small quadrocopter; thus, this assumption was used in the simulation environment.
The left diagram of Figure 5 shows the amplitude spectrum of the recorded sound with four active rotors at ≈6000 RPM. The blade passing frequency (BPF) is ≈200 Hz in this case. As observed, the received signal has significant harmonic content emerging from the broadband noise. This harmonic content can also be exploited in the detection and localization of the source.
In order to detect the BPF, a frequency detection routine such as the Harmonic Product Spectrum (HPS) can be applied. The right diagram of Figure 5 shows the result of the HPS calculation and demonstrates the achievable gain in signal-to-noise ratio.
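The Harmonic Product Spectrum can be sketched on a magnitude spectrum: the spectrum is downsampled by the factors 1, 2, …, H, the copies are multiplied bin by bin, and the maximum marks the fundamental, since only there do all harmonics line up. The function name and H = 4 are assumptions; a bin index k corresponds to the frequency k·f_s/N for an N-point FFT at sampling rate f_s.

```python
def harmonic_product_spectrum(mag, n_harmonics=4):
    """Return (best_bin, hps): the bin maximizing the product of the
    magnitude spectrum `mag` downsampled by factors 1..n_harmonics."""
    n = len(mag) // n_harmonics  # bins for which all harmonics are available
    hps = []
    for k in range(n):
        prod = 1.0
        for h in range(1, n_harmonics + 1):
            prod *= mag[k * h]
        hps.append(prod)
    best = max(range(1, n), key=hps.__getitem__)  # skip the DC bin
    return best, hps
```

A single strong non-harmonic peak does not win, because its integer fractions do not coincide with other strong components; this is the source of the robustness against broadband noise.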
Figure 6 depicts the simulation arrangement in which a moving UAV is tracked. A moving broadband noise source disturbs the detection and localization of the object. The simulated quadrocopter emits the recorded sound of the real copter and moves along a spiral trajectory with a constant speed of |v| = 20 m/s. The sound field is sampled by a circular microphone array with a radius of 1 m, consisting of 24 sensors.
Figure 6: Arrangement for tracking a simulated quadrocopter. A broadband noise source disturbs the detection.
The results of the source tracking are displayed in Figure 7. As depicted, the Kalman filter gives a reliable estimate of both the position and the velocity from position measurements alone. During the initial settling time of the filter, the predicted covariance decreases steeply. After settling, the estimated variances of the state variables are in good agreement with the real statistics.
The bottom diagram of Figure 7 shows the result of the frequency tracking algorithm. Once the BPF is determined using the HPS method, a harmonic to be tracked is selected. One of the criteria for this selection stems from the applied MUSIC algorithm, which is only effective for Helmholtz numbers He > 4 [6]. The other requirement is that the signal-to-noise ratio (SNR) should be high in the neighborhood of the selected harmonic. As seen, the frequency tracking algorithm adaptively selects a harmonic of the estimated BPF, depending on the temporal changes of the SNR. The selected frequencies are close to the harmonics of the theoretical BPF. Since the frequency tracking, which operates on the focused signal, determines the frequency used for source localization, while the localization output sets the focal point of the focusing, a feedback loop is formed in the tracking mechanism.
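The two selection criteria can be combined in a small sketch. Assumptions, not taken from the paper: the Helmholtz number is defined as He = f·D/c with the array aperture D = 2 m (the diameter of the 1 m radius array), c = 343 m/s, the harmonic with the highest SNR among the admissible ones is chosen, and an SNR floor of 10 dB applies. The function `select_harmonic` and its parameters are hypothetical.

```python
def select_harmonic(bpf, snr_db, aperture=2.0, c=343.0, he_min=4.0,
                    snr_min_db=10.0, max_order=20):
    """Return the harmonic order n of `bpf` to track, or None.

    Only orders with Helmholtz number He = n * bpf * aperture / c above
    `he_min` are considered; among these, the one with the highest SNR
    (snr_db is a callable returning the SNR in dB at a frequency) is
    chosen, provided that this SNR reaches `snr_min_db`.
    """
    candidates = [n for n in range(1, max_order + 1)
                  if n * bpf * aperture / c > he_min]
    best = max(candidates, key=lambda n: snr_db(n * bpf), default=None)
    if best is None or snr_db(best * bpf) < snr_min_db:
        return None
    return best
```

With a BPF of 200 Hz, these assumptions exclude the first three harmonics (He < 4) even if their SNR is high; re-evaluating the SNR in every localization cycle lets the selected order change over time.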
This example demonstrates the capability of the simulation framework to follow a moving sound source in a noisy environment. The focusing, localization, and source tracking algorithms will be tested on data from real measurements in the near future. Finally, techniques for acoustical source classification will be investigated.
This project is supported by the European Union, project ref.: GINOP-2.2.1-15-2017-00087. P. Rucz acknowledges the support of the Bolyai János research grant provided by the Hungarian National Academy of Sciences.
Figure 7: Tracking of a moving quadrocopter. Top: position
and velocity from the UKF. Bottom: frequency tracking.
Supported by the ÚNKP-18-4 New National Excellence Program of the Ministry of Hu-
[1] P. Sijtsma. CLEAN based on spatial source coherence. Technical Report NLR-TP-2007-345, National Aerospace Laboratory NLR, 2007.
[2] T. F. Brooks and W. M. Humphreys. A deconvolution approach for the mapping of acoustic sources (DAMAS) determined from phased microphone arrays. Journal of Sound and Vibration, 294:856–879,
[3] H. L. van Trees. Optimum array processing, pages 1158–1163. Wiley & Sons, New York, 2002.
[4] S. Bonnet, C. Bassompierre, C. Godin, S. Lesecq, and A. Barraud. Calibration methods for inertial and magnetic sensors. Sensors and Actuators A: Physical,
[5] D. Simon. Optimal state estimation – Kalman, H∞, and nonlinear approaches. John Wiley and Sons, New
[6] G. Herold and E. Sarradj. Performance analysis of microphone array methods. Journal of Sound and Vibration, 401:152–168, 2017.