Virtual Reality 3D Sound Field Surveillance
for a Local and Remote Tower Environment
The Frederic Chopin University of Music
Abstract— The 3D sound surveillance techniques in a Local
and Remote Tower environment help raise situational
awareness and provide important sound cues to Air Traffic
Controllers in order to bring virtual reality audio-video
perception to the next level. The system concept is
based on multiple isotropic high-order microphone
arrays supported by high-resolution spherical video
cameras, which preserve the convergence of visuals and sounds,
and by computer hardware and software for sound
display and sound field visualization. The multi-sensory
system is intended for 3D spatial sound playback over
headphones, combined with a high-resolution monitor or a
head-tracked Head Mounted Display, or for a specialized 3D
speaker array. The surveillance system is intended to help
Air Traffic Controllers utilize the power of
underestimated psychoacoustic hearing and to visualize
sound sources in 3D space by overlaying dynamic sound
pressure mapping graphics on the video, in
order to help track airplanes and other sound
sources on the ground and in the air. It is especially useful
in Reduced Aerodrome Visibility Conditions, but it can also
serve as a very advanced 3D audio-visual recorder for accident
Keywords- Local and Remote Tower; Spatialized audio;
Binaural audio; Microphone arrays; Head tracking; 3D sound
field; Visual sound field surveillance; Situational awareness;
Isotropic microphones; 3D sound recorder system; 3D Audio-Video
Surveillance Data Center.
I. INTRODUCTION
So far, I have not found in the scientific literature any serious
study that treats sound cues in the Remote Tower environment
with proper attention. Many Air Traffic Controllers raise the
need for a proper sound display accompanying the visuals. It is
simply impossible to convey the wealth of nuances of the
real-world airport soundscape into the Remote Tower
environment while bypassing the powerful human 3D
psychoacoustic hearing abilities. We take advantage of
them on a daily basis, without realizing how many stimuli and
how much spatial information reach our ears thanks to our
advanced sense of hearing. In the Remote Tower literature, the
importance of sound in Remote Tower applications is rarely
discussed. The need for “at least one speaker for reproducing sound
from the airport” is mentioned, as is the presence of “takeoff
power” sound. A professional-level discussion of the specificity
and essence of the aural experience in Air Traffic Controller
work is hard to find.
I believe there is no way to render real-world realism properly
in the Local and Remote Tower virtual space while bypassing
the features of binaural hearing.
We would have to exploit all the senses to achieve the
maximum situational awareness possible. These days it is not
enough to propose "radially mounted directional microphones"
without knowing the polar sensitivity characteristics of
traditional directional microphones. In fact, these
characteristics are radically different from those of
telephoto lenses and do not match them. They depend strongly
on sound frequency. For distances of several hundred meters and
more, the microphone's sensitivity range is far wider than the
area captured by the telephoto video camera (in addition, the
microphone does not provide directional sound cues). Therefore,
the adequacy of using typical directional microphones located
on the tower as the main sound cue source can easily be
questioned.
II. BINAURAL HEARING
Whatever we hear exists at a specific location in space.
Humans can localize sounds thanks to binaural hearing, that is,
hearing with two ears. It offers important advantages over
one-ear listening: it provides spatial information
encoded in the differences between the left and right ear input
signals. Those differences include, among others,
interaural time differences (ITD) and interaural level
differences (ILD), both of which are frequency dependent. Two
auditory events that are only 1 degree of arc apart in azimuth
can be discriminated binaurally for frontal sound incidence,
while in monaural listening the localization blur is 10 degrees
of arc. Thanks to binaural hearing, humans perceive sound cues
over 360 degrees of azimuth and approx. ±90 degrees of elevation.
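To give a feel for the size of the interaural time differences mentioned above, the following minimal sketch uses Woodworth's classical spherical-head approximation, ITD = (r/c)(θ + sin θ). This formula is not part of the system described in this paper; the head radius and the speed of sound are assumed typical values, and the function name `woodworth_itd` is introduced only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed (air at roughly 20 degrees C)
HEAD_RADIUS = 0.0875     # m, assumed average head radius

def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate ITD in seconds for a distant source.

    Azimuth is measured from straight ahead and is meant for the
    range 0..90 degrees (mirror the result for the other side).
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))

# Print the approximate ITD in microseconds for a few azimuths.
for az in (0, 1, 30, 90):
    print(f"azimuth {az:3d} deg -> ITD {woodworth_itd(az) * 1e6:6.1f} us")
```

Under these assumptions, a 1-degree offset from the front corresponds to an ITD of roughly 9 microseconds, and a source directly to the side to roughly 650 microseconds.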
The head-related transfer functions (HRTFs) describe the
filtering effect of the head, pinna and torso. The primary
application of HRTFs is headphone-based binaural rendering of
virtual auditory spaces. The HRTF filter sets are individual for
each listener, so for the best source localization the listener
should have their own individual HRTFs, measured under
free-field conditions in an anechoic chamber, loaded into the
playback device. Promising new methods for HRTF generation that
are faster and less troublesome include ray tracing and other
techniques. The quality of localization is greatly improved
when a head tracking device is used. Head movements are
essential when front-back confusion affects the
precision of sound source localization.
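The role of head tracking can be illustrated with a minimal sketch of the angle bookkeeping a binaural renderer would need: the source direction is re-expressed relative to the current head orientation before an HRTF pair is selected. The function names and the sign convention (positive angles to the right) are assumptions made for this example, not a description of a particular tracker or renderer.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def head_relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of a source relative to the listener's nose.

    Both angles use the same world reference and the same sign
    convention (assumed here: positive to the right).
    """
    return wrap_degrees(source_az_deg - head_yaw_deg)

# A front source (30 deg) and a back source (150 deg) give similar
# interaural cues for a still head. After a 20 deg head turn to the
# right they move in opposite directions relative to the ears.
print(head_relative_azimuth(30, 20))    # front source moves toward the center
print(head_relative_azimuth(150, 20))   # back source moves toward the side
```

Because the two candidate positions shift differently when the head turns, the tracked rotation gives the listener the dynamic cue needed to resolve front-back confusion.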
III. SOUND FIELD AURALIZATION
The expression “auralization” is analogous to the well-known
technique of visualization. The sound field auralization
displays a real-time 2D plot of sound pressure and frequency,
showing the azimuth and elevation of the sound sources for each
3D microphone set installed in the airport area. Fig. 1 shows the
side view of a low-to-mid frequency sound field representing a
jet airplane located far behind the listener's head.
IV. THE SOUND ADVANTAGE
In many cases the 3D spatial sound system can not only
complement the video experience but also offer the following
advantages:
•Objects that make sounds such as small birds, flocks of
birds, and other animals, can easily be noticed,
identified and localized in airport 3D space.
•High sampling rate audio (96 kHz) has a huge
resolution: 3840 times higher temporal resolution than
standard video at 25 frames per second. The spherical
nature of sound propagation helps with high-precision
sound source localization.
Such a high sampling rate is well suited for the analysis of
unexpected, short-term, transient sounds (e.g.,
for accident and crash investigations), even without
the video data available.
•3D Sound can support the visual experience when the
latter is burdened with insufficient spatial resolution
e.g. beginning of aircraft visual acceleration, reversers
•No computer monitor aliasing or pixelation issue.
•Synchronous multiple 3D microphone recordings at a
high sampling rate serve as a perfect resource for
simultaneous sound source localization analysis in the
•The sound quality, unlike that of low-resolution video from
long focal length lenses, does not change significantly
with atmospheric phenomena or time of day.
At night, in fog, in heavy precipitation, or
with a very high contrast image, the quality coming
from camera systems drops significantly while the
sound quality remains unchanged.
•Multi-channel audio transmission at a high sampling
rate requires much lower data bandwidth than a
high-quality, high-resolution video stream.
•Sight and auditory perception complement each
other, creating a harmonious multi-sensory
surveillance system. Only the proper use of both senses
ensures high situational awareness, as well as accurate
and quick reactions.
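The two numerical claims in the list above can be checked with simple arithmetic. The sketch below reproduces the factor of 3840 (96 kHz audio against an assumed 25 frames per second video) and estimates the raw bit rate of an uncompressed multi-channel audio stream. The 24-bit sample depth and the 32-channel array are hypothetical values chosen for illustration; the paper does not specify them.

```python
AUDIO_RATE_HZ = 96_000   # sampling rate stated in the text
VIDEO_FPS = 25           # assumed frame rate of "standard video"

def samples_per_video_frame(audio_rate_hz=AUDIO_RATE_HZ, video_fps=VIDEO_FPS):
    """Number of audio samples captured during one video frame."""
    return audio_rate_hz / video_fps

def pcm_bitrate_mbps(channels, bit_depth=24, audio_rate_hz=AUDIO_RATE_HZ):
    """Uncompressed PCM bit rate in megabits per second."""
    return channels * bit_depth * audio_rate_hz / 1e6

print(samples_per_video_frame())   # 3840.0 audio samples per video frame
print(1e6 / AUDIO_RATE_HZ)         # time between samples in microseconds
print(pcm_bitrate_mbps(32))        # hypothetical 32-channel array, 24-bit
```

For the hypothetical 32-channel, 24-bit case the raw stream is about 74 Mbit/s before any compression; the actual figure depends on the microphone array and coding used.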
V. ADVANCED AUDIO IMMERSIVE SYSTEM
The system has two distinct modes: the Local and the Global
Surveillance Mode. Each of them has different characteristics
and applications. The Global Mode is intended for generic
large-scale sound field surveillance, showing the operator a 2D
view of the whole airport area covered by the microphones. In
this mode the cameras are not needed and not used.
In the Local Mode we “zoom in” on specific
microphone locations to see the sound field graphically
overlaid on the spherical video. We get more precise
sound source detail and can look around the hemisphere, from 0
to 360 degrees of azimuth and from 0 to 90 degrees of elevation,
using head-tracked computer displays or Head Mounted Displays.
A. The Global Surveillance Mode
The new generation surveillance system (Fig. 2) should
serve as a multi-sensory hub, enabling instant virtual reality
access to the key points of the airport. The area is equipped
with specialized 3D microphone systems, supported by
360-degree cameras for auralization purposes (small red circles
at the center of the white circles, marked with numbers 1 to 8)
and situated in key airport locations. They provide detailed
audio-visual data cues, which are sent to the Surveillance Data
Center, operating in 24-hour data recording mode. The system
operators can view the entire airport area with a vertical
projection of the real-time sound field sent from all
of the 3D microphones.
The white circles show the upper hemisphere sound field.
In the legend of the drawing I have placed three examples of
the elevation angle visualization with the sound field sources
at: 0 degrees, 45 degrees and 90 degrees elevation. In addition,
the sound source frequencies are color coded as follows:
•Red - low frequencies
•Orange - medium frequencies
•Blue - high frequencies
The location of the colored traces indicates the actual azimuth
of the sound sources. The source elevation is represented by
the dot’s position between the white circle and its center.
Fig. 3 shows a perspective drawing.
The 3D sound sources located on the white circle correspond
to an elevation angle of 0 degrees. Sounds at the center of
the white circle represent 90 degrees of elevation,
vertically above the 3D microphone.
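The display convention described above (0 degrees of elevation on the white circle, 90 degrees at its center) can be sketched as a polar mapping. The linear relation between elevation and radius, and the choice of azimuth 0 pointing up with angles growing clockwise, are assumptions made for this example; the function name `polar_dot_position` is hypothetical.

```python
import math

def polar_dot_position(azimuth_deg, elevation_deg, circle_radius=1.0):
    """Return (x, y) of a source dot inside the display circle.

    Elevation 0 lies on the circle, elevation 90 at its center
    (assumed linear in between). Assumed convention: azimuth 0
    points up and angles grow clockwise.
    """
    if not 0.0 <= elevation_deg <= 90.0:
        raise ValueError("elevation must be within 0..90 degrees")
    r = circle_radius * (1.0 - elevation_deg / 90.0)
    a = math.radians(azimuth_deg)
    return (r * math.sin(a), r * math.cos(a))

print(polar_dot_position(0, 0))     # on the circle, straight up
print(polar_dot_position(90, 45))   # halfway to the center, to the right
print(polar_dot_position(200, 90))  # at the center, directly overhead
```

A source on the ground plane therefore stays on the rim of the circle, while an aircraft passing overhead moves through its center.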
Fig. 2 shows a top view of an airport with low traffic
intensity. There are 6 sound sources depicted in white color:
•A 4-engine airplane is approaching RWY 33.
•A 2-engine jet is taxiing to RWY 29.
•A 2-engine jet is on the taxiway parallel to the RWY
•A helicopter is about to take off near the threshold of
•A truck is near RWY 11.
•A flock of birds is flying across the airport.
With 8 microphones and 8 cameras recording synchronously,
we can track sound sources with high precision in the airport
hemisphere. In this example each of the 8 microphones senses
the spatial presence of at least 2 to 3 important sound sources
at once. The aural cues of the same source are registered by
multiple 3D microphones at the same time, allowing spatial
localization and/or offline analysis:
•The landing jet sound cues are recorded with 4
microphones: 5, 6, 7, 8.
•The bird flock sound cues are recorded with 4
microphones: 1, 2, 3, 4.
•The RWY 29 taxiing jet cues are recorded with 4
microphones: 5, 6, 7, 8.
•The helicopter take-off sound cues are recorded with 3
microphones: 1, 2, 3.
•The smaller taxiing jet sound cues are recorded with 3
microphones: 2, 4, 5.
•The truck engine sound cues are recorded with 2
microphones: 3, 4.
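Because the same source is heard by several microphones at once, its ground position can in principle be estimated by intersecting the azimuth bearings reported by two of them. The following is a minimal 2D triangulation sketch, not the localization method of the described system; the coordinate frame (x east, y north, azimuth clockwise from north) and the microphone positions in the example are hypothetical.

```python
import math

def bearing_intersection(p1, az1_deg, p2, az2_deg):
    """Intersect two bearing rays on the ground plane.

    p1, p2: (x, y) microphone positions in meters (x east, y north).
    az*_deg: azimuth toward the source, clockwise from north (assumed).
    Returns the (x, y) estimate, or None if the bearings are parallel
    or the crossing point lies behind one of the microphones.
    """
    d1 = (math.sin(math.radians(az1_deg)), math.cos(math.radians(az1_deg)))
    d2 = (math.sin(math.radians(az2_deg)), math.cos(math.radians(az2_deg)))
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < 1e-9:
        return None  # parallel bearings never cross
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / cross  # distance along ray 1
    t2 = (dx * d1[1] - dy * d1[0]) / cross  # distance along ray 2
    if t1 < 0 or t2 < 0:
        return None  # crossing is behind a microphone
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Two hypothetical microphones 100 m apart, both hearing the same source.
print(bearing_intersection((0.0, 0.0), 45.0, (100.0, 0.0), -45.0))
```

With three or more microphones, as in the list above, the extra bearings could be used to average out measurement errors; that refinement is left out of this sketch.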
Each 3D microphone sound field shows the 3 main frequency
bands: low, mid and high frequencies. This feature helps to
understand the origin of the sound source and speeds up the
recognition process. The elevation display shows whether the
sound source is located on the ground plane or in the air.
Microphones 2, 3 and 4 show the flock of birds flying across the
airport as blue traces. The whole sound field graph is updated
25 times per second to show the dynamic changes across the
whole airport area, and it could serve as a very detailed
off-line accident analysis tool.
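The three-band color coding can be sketched as a simple classification of spectral energy for each display update. The band edges below (250 Hz and 2 kHz) are hypothetical, since the paper does not state where low, medium and high frequencies are separated, and the input is assumed to be a list of (frequency, power) pairs produced by some upstream spectral analysis.

```python
LOW_MID_EDGE_HZ = 250.0     # hypothetical band edge
MID_HIGH_EDGE_HZ = 2000.0   # hypothetical band edge
BAND_COLORS = {"low": "red", "mid": "orange", "high": "blue"}

def band_of(freq_hz):
    """Name of the display band a frequency falls into."""
    if freq_hz < LOW_MID_EDGE_HZ:
        return "low"
    if freq_hz < MID_HIGH_EDGE_HZ:
        return "mid"
    return "high"

def dominant_band_color(spectrum):
    """Sum power per band and return (color of strongest band, energies).

    spectrum: iterable of (frequency_hz, power) pairs for one update.
    """
    energy = {"low": 0.0, "mid": 0.0, "high": 0.0}
    for freq_hz, power in spectrum:
        energy[band_of(freq_hz)] += power
    strongest = max(energy, key=energy.get)
    return BAND_COLORS[strongest], energy

# Hypothetical spectra: a rumbling engine and high-pitched bird calls.
print(dominant_band_color([(80, 5.0), (1000, 1.0), (5000, 0.5)])[0])
print(dominant_band_color([(100, 0.1), (4000, 2.0), (6000, 1.0)])[0])
```

If the audio were sampled at 96 kHz, one update at 25 updates per second would span 3840 samples, which corresponds to a spectral bin spacing of 25 Hz.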
B. The Local Surveillance Mode
The Local Surveillance Mode can be used by multiple
operators looking at and listening to the sound cues of any
microphone. The soundscape is visually augmented by real-time
sound field graphics overlaid on the 2D or 3D spherical video
for multi-sensory projection. The Local Mode is intended for
more detailed surveillance. The operator is supplied with 3D
binaural audio cues through headphone playback. High-quality
binaural headphone playback requires the use of a head tracker,
because head movements are essential for the best sound
Fig. 4 shows the real-time sound field overlaid on the
equirectangular video. In front of us, behind the parking lot and
the big trees on the right side, there is a busy street with a
quite loud traffic soundscape. The jet airplane overshadowed by
the yellow/green sound field traces is taking off from RWY 29
at EPWA Airport. The 3D microphone is located approx. 520 meters
from the runway threshold. The horizontal axis shows the
360-degree azimuth range: -180 (Back), -90 (Left), 0
(Front), 90 (Right) and 180 (Back) degrees. The
vertical axis shows the elevation from -90 to 90 degrees. The
sound field frequency mapping shows the audible spectrum
from the lowest to the high frequencies as red/yellow/green/blue
traces. The next equirectangular video screenshot, in Fig. 5,
shows the jet airplane location 5 seconds later.
The frequency spectrum has changed. There is no high
frequency content, only the very low end and the medium
This video could be displayed on a computer screen in
equirectangular form (Fig. 4, 5) or in a spherical view using a
3-axis head tracker connected to the computer system. Fig. 6
shows the third option: using a Head Mounted Display for
immersive multi-sensory display. With a computer display, the
audio playback device could be a 3D speaker-based system or
binaural headphones. The Head Mounted Display requires binaural
headphone playback only.
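The equirectangular axes described for Fig. 4 (azimuth from -180 to 180 degrees, elevation from -90 to 90 degrees) imply a linear mapping from a sound direction to a pixel in the video frame. The sketch below shows one way such an overlay position could be computed; placing positive elevation at the top of the frame and the function name `equirect_pixel` are assumptions made for this example.

```python
def equirect_pixel(azimuth_deg, elevation_deg, width, height):
    """Map a direction to (column, row) in an equirectangular frame.

    Azimuth -180..180 runs left to right (0 = front, image center).
    Elevation 90 is the top row and -90 the bottom row (assumed).
    """
    if not -180.0 <= azimuth_deg <= 180.0:
        raise ValueError("azimuth must be within -180..180 degrees")
    if not -90.0 <= elevation_deg <= 90.0:
        raise ValueError("elevation must be within -90..90 degrees")
    col = (azimuth_deg + 180.0) / 360.0 * (width - 1)
    row = (90.0 - elevation_deg) / 180.0 * (height - 1)
    return round(col), round(row)

# Small hypothetical frame with one pixel per degree.
print(equirect_pixel(0, 0, 361, 181))      # front, on the horizon
print(equirect_pixel(90, 0, 361, 181))     # right, on the horizon
print(equirect_pixel(-180, 90, 361, 181))  # top-left corner
```

The same mapping applies to a frame of any resolution, since only the `width` and `height` arguments change.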
The purpose of the multi-sensory system is to increase the
situational awareness within the area covered by the array of
specialized 3D microphones and accompanying spherical
Sound carries a huge amount of significant information
about the surrounding 3D area. Human spatial sound
perception is possible thanks to binaural hearing. The new
methods of sound auralization help not only to understand the
airport area sound field, but also to see the sound cue traces on
I have presented the two main methods of sound field
visualization: the Global Surveillance Mode and the Local
Surveillance Mode. They differ in the way they display the
sound field cues, but they also complement each other. Multiple
system users can instantly switch between the modes,
choosing the one more appropriate for completing the task.
The system will raise the safety and security of the airport
area by giving brand new tools to the Air Traffic Controllers
and the Surveillance Services. The system will help to detect
expected and unexpected sound sources, such as taxiing
airplanes, other vehicles and other sounding objects, especially
in reduced aerodrome visibility conditions.
The extremely high temporal resolution of the audio recorded in
the Surveillance Data Center could also serve as a very
high-precision chronometer in off-line accident data
analysis, to locate sound sources in the airport area.