Figure 1 - uploaded by Liang Sun
Source publication
When an array of two acoustic sensors is used to localize sound sources based on time differences alone, the possible solutions form a cone of confusion. This study, together with a similar one for human listeners, demonstrates that localization of a sound source in both the azimuth and vertical planes using only time-difference information is feasible when self-motio...
Contexts in source publication
Context 1
... the psychoacoustic experiment setups used in sound source localization studies consist of a seat and either a moving loudspeaker or an array of stationary loudspeakers. Because sound waves travel at a finite velocity, an acoustic signal originating from one side of a human listener arrives first at the nearer ear and later at the farther ear. The interaural time difference, or ITD, constitutes one of the most important spatial hearing cues. The value of the ITD depends on the spatial angle of the incoming sound wave (Figure 1.2). The theoretical model for the computation of the ITD was summarized by Woodworth (1962). The model assumed that the incoming sound wave was planar, i.e., that the listener was sufficiently far from the sound source for the curvature of the wave front to be negligible. When the sound wave arrives from an angle other than 0° or 180°, the additional distance it must travel to reach the far ear consists of two parts: the longest straight-line distance the wave travels to reach the side of the head, and an arc over which it travels along the surface of the head to the other ear. A careful and detailed measurement by Kuhn (1977) showed that, for a given frequency, a given ITD value corresponds to only a single location in the quarter field. There is an inherent limitation to using the ITD alone for localization of sound sources. At lower frequencies, where the wavelength is long, the time delay corresponds to only a small phase difference. As the frequency increases, the wavelength is ...
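To make the Woodworth model concrete, here is a minimal Python sketch that computes the far-field ITD as the sum of the two path components described above (straight-line segment plus arc around the head). The head radius of 8.75 cm and speed of sound of 343 m/s are common textbook assumptions, not values taken from the source publication:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s, air at ~20 degrees C (assumption)
HEAD_RADIUS = 0.0875     # m, a commonly assumed average head radius

def woodworth_itd(azimuth_deg, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Far-field (planar-wave) ITD for a spherical head.

    The straight-line part of the extra path is a*sin(theta) and the
    arc along the head surface is a*theta, giving the classic
    Woodworth result ITD = (a/c) * (theta + sin(theta)).
    Valid for azimuths in [0, 90] degrees (the quarter field).
    """
    theta = np.radians(azimuth_deg)
    return (a / c) * (theta + np.sin(theta))

for az in (0, 30, 60, 90):
    print(f"azimuth {az:2d} deg -> ITD {woodworth_itd(az) * 1e6:6.1f} us")
```

At 90° azimuth this gives roughly 0.66 ms, the commonly quoted maximum human ITD.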
Context 2
... larger, causing an increasingly large phase difference for the same ITD value. Above 1.6 kHz the phase difference exceeds 180°, and it becomes difficult to tell the leading signal from the lagging signal. As a result, the ITD is not useful for high-frequency horizontal localization of sound sources. When sound waves propagate to reach an object they are diffracted, the extent of diffraction depending on the dimensions of the object relative to the sound wavelength. At mid to high frequencies, the sound level is attenuated by diffraction in the spatial region behind the object along the direction of sound propagation. Owing to this head-shadow effect, shown in Figure 1.2, there is an interaural level difference (ILD) for human listeners, except for sound sources in the mid-sagittal plane. A careful study by Kuhn (1983) measured the ILDs at different spatial angles on the horizontal plane. As the angle increases from 15° to 60°, the ILD increases at most frequencies. At 90°, however, the curve intersects the others, meaning that for a given frequency and a given ILD, multiple sound source locations are possible. This is because the head is largely spherical: for angular positions above 60°, and especially close to 90°, the acoustic diffractions from different paths arrive in phase at the head-shadowed ear, adding together and causing an increase in sound level. A major difference between the spatial hearing cues of ITD and ILD is therefore that the latter varies with angle in a less linear manner. ...
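The frequency limitation can be illustrated with a short calculation: a pure tone of frequency f arriving with interaural delay tau carries an interaural phase difference of 360°·f·tau, which becomes ambiguous once it exceeds 180°. The sketch below uses the maximum (90°-azimuth) ITD as an illustrative value; for less lateral sources the ITD is smaller and the ambiguity sets in at correspondingly higher frequencies:

```python
import numpy as np

MAX_ITD = 6.6e-4  # s; roughly the Woodworth ITD at 90 deg azimuth (illustrative)

def interaural_phase_deg(freq_hz, itd_s=MAX_ITD):
    """Interaural phase difference implied by a pure tone with a given ITD."""
    return 360.0 * freq_hz * itd_s

for f in (250, 500, 1000, 1600, 2000):
    phase = interaural_phase_deg(f)
    flag = "ambiguous" if phase > 180.0 else "unambiguous"
    print(f"{f:5d} Hz -> {phase:6.1f} deg ({flag})")
```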
Similar publications
For augmented reality experiences, users wear head-mounted displays (HMD) while listening to real and virtual sound sources. This paper assesses the impact of wearing an HMD on localization accuracy of real sources. Eighteen blindfolded participants completed a localization task on 32 loudspeakers while wearing either no HMD, a bulky visor HMD, or...
Hearing protection devices (HPDs) such as earplugs offer to mitigate noise exposure and reduce the incidence of hearing loss among persons frequently exposed to intense sound. However, distortions of spatial acoustic information and reduced audibility of low-intensity sounds caused by many existing HPDs can make their use untenable in high-risk (e....
Auditory spatial perception relies on more than one spatial cue. This study investigated the effects of cue congruence on auditory localization and the extent of visual bias between two binaural cues: interaural time differences (ITDs) and interaural level differences (ILDs). Interactions between these binaural cues were manipulated by stereophonic...
Blind people use auditory information to locate sound sources and sound-reflecting objects (echolocation). Sound source localization benefits from the hearing system's ability to suppress distracting sound reflections, whereas echolocation would benefit from "unsuppressing" these reflections. To clarify how these potentially conflicting aspects of...
Introduction
Visual-to-auditory sensory substitution devices are assistive devices for the blind that convert visual images into auditory images (or soundscapes) by mapping visual features to acoustic cues. To convey spatial information with sounds, several sensory substitution devices use a Virtual Acoustic Space (VAS) using Head Related Transfe...
Citations
... Figure 1.1 depicts these concepts in a simplified way. (Sun et al., 2015) However, these cues are non-existent for vertical spatial perception in the median plane. Vertical localization relies solely on spectral cues, that is, the coloration of sound resulting from reflections off the ear folds, head, and torso. ...
This dissertation work concerns the possibility of using ambisonics to create immersive impressions for the recipient when recordings are made remotely. As part of the work, issues related to the impression of externalization at the location of a virtual sound object were presented. Methods for creating a virtual music recording in the remote mode were also presented. Listening experiments were carried out, and the behavior of participants in remote recordings was examined using an eye-tracking device to measure visual fixation. Methods were developed to enable the remote, creative work of a choir using techniques of low-level audiovisual communication. A method of combining high-resolution 360° image and high-quality surround sound (3rd-order ambisonics) was presented. Experiments related to six-degrees-of-freedom techniques were also carried out. These experiments were performed in accordance with the state-of-the-art methodology of research on virtual reality (VR) techniques, i.e., using VR glasses and ensuring free rotational and translational movement for the participants. The resulting data were subjected to statistical analyses using modern methods based on linear models. The summary of the work contains general conclusions, the author's original achievements, and directions for further development of the research.
... Others have since confirmed the occurrence of the illusion [3][4][5][6], but have also found that it does not universally apply. It usually succeeds where the acoustic signal is of sufficiently low frequency [7]; less than approximately 1500 Hz [8]. ...
Wallach (J. Exp. Psychol. 1940, 27, 339–368) predicted that a human subject rotating about a vertical axis through the auditory centre, having an acoustic source rotating around the same axis at twice the rotation rate of the human subject, would perceive the acoustic source to be stationary. His prediction, which he confirmed by experiment, was made to test the hypothesis that humans integrate head movement information that is derived from the vestibular system and visual cues, with measurements of arrival time differences between the acoustic signals received at the ears, to determine directions to acoustic sources. The simulation experiments described here demonstrate that a synthetic aperture calculation performed as the head turns, to determine the direction to an acoustic source (Tamsett, Robotics 2017, 6, 10), is also subject to the Wallach illusion. This constitutes evidence that human audition deploys a synthetic aperture process in which a virtual image of the field of audition is populated as the head turns, and from which directions to acoustic sources are inferred. The process is akin to those in synthetic aperture sonar/radar technologies and to migration in seismic profiler image processing. It could be implemented in a binaural robot localizing acoustic sources from arrival time differences in emulation of an aspect of human audition.
... Kumon and Uozumi [36] proposed a binaural system on a robot to localize a mobile sound source but it requires the robot to move with a constant velocity to achieve 2D localization. Also, Zhong et al. [37,39] utilized the extended Kalman filtering technique to perform orientation localization using the inter-channel time difference (ICTD) data acquired by a self-rotating bi-microphone array. However, large errors were observed in [39] when the elevation angle of a sound source was close to zero. ...
... Align the microphone array perpendicular to the source-center vector; estimate the azimuth and elevation angles for each rotation step using the EKF; estimate the distance for each shift using the EKF. Given the process and measurement models for the 2D, 3D, and distance estimation described by Equations (20), (21), (23), (24), (37) and (38), the corresponding A_2D, C_2D, A_3D, C_3D, A_dist and C_dist matrices are given by ...
... Setup of the KEMAR dummy head on a rotating chair in the middle of the sound-treated room [37]. ...
While vision-based localization techniques have been widely studied, sound source localization capabilities have not been fully enabled. In this dissertation, I present novel three-dimensional (3D) sound source localization (SSL) techniques based only on inter-channel time difference (ICTD) information. Both the azimuth and elevation angles of a stationary sound source are identified using the phase angle and amplitude of the acquired ICTD signal. An SSL algorithm based on an extended Kalman filter (EKF) is developed. The observability analysis reveals the singularity of the state estimates when the sound source is placed directly above the microphone array. A means of detecting this singularity is then proposed and incorporated into the proposed SSL algorithm. The proposed technique is tested in both a simulated environment and on two hardware platforms, i.e., a KEMAR dummy binaural head and a robotic platform. All results show fast and accurate convergence of the estimates.
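To illustrate the phase/amplitude idea in the paragraph above: for a far-field source and a horizontally rotating two-microphone array, the ICTD is well approximated by a sinusoid in the array heading psi, tau(psi) = (d/c)·cos(elevation)·cos(psi - azimuth), so the sinusoid's phase encodes azimuth and its amplitude encodes elevation. The following Python sketch is a simplified model of that relationship, not the dissertation's algorithm; the 0.3 m microphone spacing and the least-squares fit are illustrative assumptions:

```python
import numpy as np

C = 343.0  # m/s, speed of sound
D = 0.30   # m, microphone spacing (illustrative assumption)

def ictd_model(heading, azimuth, elevation, d=D, c=C):
    """Far-field ICTD of a rotating two-microphone array:
    tau(psi) = (d/c) * cos(elevation) * cos(psi - azimuth)."""
    return (d / c) * np.cos(elevation) * np.cos(heading - azimuth)

def fit_azimuth_elevation(headings, taus, d=D, c=C):
    """Recover azimuth (phase) and elevation (amplitude) by fitting
    tau = a*cos(psi) + b*sin(psi) in the least-squares sense."""
    A = np.column_stack([np.cos(headings), np.sin(headings)])
    (a, b), *_ = np.linalg.lstsq(A, taus, rcond=None)
    azimuth = np.arctan2(b, a)                        # sinusoid phase
    cos_el = np.hypot(a, b) * c / d                   # amplitude -> cos(elevation)
    elevation = np.arccos(np.clip(cos_el, 0.0, 1.0))  # ill-conditioned near 90 deg
    return azimuth, elevation

rng = np.random.default_rng(0)
headings = np.linspace(0.0, 2.0 * np.pi, 90)
true_az, true_el = np.radians(40.0), np.radians(25.0)
taus = ictd_model(headings, true_az, true_el) + rng.normal(0.0, 2e-6, headings.size)
az, el = fit_azimuth_elevation(headings, taus)
print(np.degrees(az), np.degrees(el))  # ~40, ~25
```

The degeneracy is visible in the model itself: at 90° elevation the amplitude term cos(elevation) vanishes, the ICTD is identically zero, and the azimuth (phase) becomes unobservable, mirroring the observability singularity noted above.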
Chapter 4 presents a novel technique that performs both orientation and distance localization of a sound source in a 3D space using only the ICTD cue generated by the self-rotating bi-microphone array mounted on the robotic platform. The system dynamics are established in the spherical coordinate frame using a state-space model. The observability analysis of the state-space model shows that the system is unobservable when the sound source is placed at elevation angles of 90 and 0 degrees. The proposed method utilizes the difference between the azimuth estimates resulting from the 3D and two-dimensional (2D) models, respectively, to check the zero-degree-elevation condition and further estimates the elevation angle using a polynomial curve-fitting approach. The proposed method is also capable of detecting a 90-degree elevation by extracting the zero-ICTD signal 'buried' in noise. Additionally, distance localization is performed by first rotating the microphone array to face the sound source and then shifting the microphone array perpendicular to the source-robot vector by a predefined distance for a fixed number of steps. The integrated rotational and translational motions of the microphone array provide complete orientation and distance localization using only the ICTD cue. The proposed technique is first tested in simulation and is then verified on the robotic platform. Experimental data collected by the microphones installed on a KEMAR dummy head are also used to test the proposed technique. All results show the effectiveness of the proposed technique.
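The translation step above amounts to triangulation: after the array has turned to face the source, a sideways shift of known length changes the apparent bearing, and the two together fix the range. A minimal sketch of that geometry (generic triangulation under a far-from-degenerate configuration, with made-up numbers, not the dissertation's estimator):

```python
import numpy as np

def range_from_shift(baseline_m, bearing_after_rad):
    """Source starts straight ahead (bearing 0). After shifting sideways by
    `baseline_m`, perpendicular to the line of sight, the source appears at
    bearing `bearing_after_rad`; triangulation gives r = baseline / tan(bearing)."""
    return baseline_m / np.tan(bearing_after_rad)

# Example: a 0.5 m sideways shift that moves the source bearing to 5.7 degrees
print(range_from_shift(0.5, np.radians(5.7)))  # ~5.0 m
```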
In Chapter 5, I present two novel approaches to perform 3D multi-sound-source localization (MSSL) using only the ICTD signal generated by a self-rotating bi-microphone array. The two approaches are based on two machine learning techniques, viz., the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Random Sample Consensus (RANSAC) algorithms, respectively, whose performances are tested and compared in both simulations and experiments. The results show that both approaches are capable of correctly identifying the number of sound sources along with their 3D orientations in a reverberant environment.
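As a rough illustration of how a clustering stage can separate sources: if each rotation segment yields one candidate (azimuth, elevation) estimate, DBSCAN can group candidates belonging to the same source while rejecting reverberation-induced outliers as noise. The sketch below runs scikit-learn's DBSCAN on synthetic direction estimates; the eps and min_samples values, and the data themselves, are illustrative assumptions rather than the dissertation's pipeline:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)

# Synthetic per-segment direction estimates (deg) for three sources,
# plus a few reverberation-induced outliers.
sources = np.array([[40.0, 25.0], [120.0, 10.0], [250.0, 45.0]])
estimates = np.vstack([s + rng.normal(0.0, 2.0, (30, 2)) for s in sources])
outliers = rng.uniform([0.0, 0.0], [360.0, 90.0], (8, 2))
X = np.vstack([estimates, outliers])

# (A real implementation would also handle azimuth wraparound at 0/360 deg.)
labels = DBSCAN(eps=6.0, min_samples=10).fit_predict(X)
n_sources = len(set(labels) - {-1})      # label -1 marks noise points
print("detected sources:", n_sources)
for k in range(n_sources):
    print(f"source {k}: mean direction {X[labels == k].mean(axis=0)}")
```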
Chapter 6 presents three approaches to localizing and tracking a sound source moving in a 3D space using a bi-microphone array rotating at a fixed angular velocity. The motion of the sound source, combined with the rotation of the bi-microphone array, results in a sinusoidal ICTD signal with time-varying amplitude and phase. Four state-space models are employed to develop EKFs that identify the instantaneous amplitude and phase of the signal, and observability analysis of the four state-space models is conducted to reveal their singularities. A method based on the Hilbert transform is also developed, which compares the analytic signal of the true ICTD signal with a virtual signal having zero elevation and azimuth angles. A moving average filter is then applied to reduce the noise and the effect of artifacts at the beginning and ending portions of the estimates. The effectiveness of the proposed methods is tested, and comparison studies are conducted, in simulation.
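The Hilbert-transform approach mentioned above can be sketched in a few lines: the analytic signal of the (time-varying) sinusoidal ICTD yields its instantaneous amplitude and phase directly, and a moving average suppresses noise and the edge artifacts of the transform. The sample rate and the synthetic amplitude/phase trajectories below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import hilbert

FS = 100.0                      # Hz, ICTD sample rate (assumption)
t = np.arange(0.0, 20.0, 1.0 / FS)

# Synthetic ICTD of a moving source: slowly varying amplitude and phase.
amp = 6.6e-4 * (1.0 - 0.02 * t)
phase = 2.0 * np.pi * 0.1 * t + 0.5 * np.sin(0.2 * t)
ictd = amp * np.cos(phase) + np.random.default_rng(2).normal(0.0, 1e-5, t.size)

analytic = hilbert(ictd)                     # analytic signal
inst_amp = np.abs(analytic)                  # instantaneous amplitude
inst_phase = np.unwrap(np.angle(analytic))   # instantaneous phase

def moving_average(x, n=25):
    """Simple moving average to reduce noise and end-of-record artifacts."""
    return np.convolve(x, np.ones(n) / n, mode="same")

smooth_amp = moving_average(inst_amp)
print(f"instantaneous amplitude at t = 10 s: {smooth_amp[int(10 * FS)]:.2e} s")
```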
... Binaural robotic systems can be designed to locate acoustic sources in space in emulation of the acoustic localization capabilities of natural binaural systems, and of those of humans in particular (e.g., [1][2][3][4][5][6]). Binaural acoustic source localization has been largely restricted to estimating azimuth at zero elevation, except where audition has been fused with vision for estimates of elevation [7][8][9][10][11]. ...
... Wallach [13] speculated on the existence of a process in the human auditory system for integrating ITD information as the head is turned, to locate both the azimuth and elevation of acoustic sources. Kalman filters acting on a changing ITD as the head turns have been applied to determine both azimuth and elevation in robotic binaural systems [5,[14][15][16]. Robotic binaural localization based on rotation of the listening antennae rather than the head has also been proposed [16,17]. ...
The representation of multiple acoustic sources in a virtual image of the field of audition based on binaural synthetic-aperture computation (SAC) is described through use of simulated inter-aural time delay (ITD) data. Directions to the acoustic sources may be extracted from the image. ITDs for multiple acoustic sources at an effective instant in time are implied for example by multiple peaks in the coefficients of a short-time base (≈2.25 ms for an antennae separation of 0.15 m) cross correlation function (CCF) of acoustic signals received at the antennae. The CCF coefficients for such peaks at the time delays measured for a given orientation of the head are then distended over lambda circles in a short-time base instantaneous acoustic image of the field of audition. Numerous successive short-time base images of the field of audition generated as the head is turned are integrated into a mid-time base (up to say 0.5 s) acoustic image of the field of audition. This integration as the head turns constitutes a SAC. The intersections of many lambda circles at points in the SAC acoustic image generate maxima in the integrated CCF coefficient values recorded in the image. The positions of the maxima represent the directions to acoustic sources. The locations of acoustic sources so derived provide input for a process managing the long-time base (>10s of seconds) acoustic image of the field of audition representing the robot’s persistent acoustic environmental world view. The virtual images could optionally be displayed on monitors external to the robot to assist system debugging and inspire ongoing development.
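A minimal sketch of the lambda-circle integration described above: each measured ITD tau constrains the source to directions at angle lambda = arccos(c·tau/d) from the auditory axis (a lambda circle), and accumulating these constraints over successive head orientations makes the true direction emerge as a maximum in the virtual field of audition. The grid, Gaussian scoring width, and rotation schedule below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

C, D = 343.0, 0.15   # speed of sound (m/s), antenna separation (m)

def unit(az, el):
    """Unit direction vector from azimuth/elevation in radians."""
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

# Candidate directions on a coarse grid: the virtual field of audition.
az_g, el_g = np.meshgrid(np.radians(np.arange(0, 360, 2)),
                         np.radians(np.arange(-88, 90, 2)), indexing="ij")
grid = np.stack([np.cos(el_g) * np.cos(az_g),
                 np.cos(el_g) * np.sin(az_g),
                 np.sin(el_g)], axis=-1)

true_dir = unit(np.radians(50.0), np.radians(20.0))
image = np.zeros(az_g.shape)                  # integrated CCF-coefficient image

for heading in np.radians(np.arange(0, 180, 5)):     # turn the head
    axis = unit(heading, 0.0)                         # auditory axis, world frame
    tau = (D / C) * true_dir @ axis                   # simulated ITD for this pose
    lam = np.arccos(np.clip(C * tau / D, -1.0, 1.0))  # lambda-circle colatitude
    ang = np.arccos(np.clip(grid @ axis, -1.0, 1.0))  # angle of each cell to axis
    image += np.exp(-(((ang - lam) / np.radians(3.0)) ** 2))  # score the circle

i, j = np.unravel_index(np.argmax(image), image.shape)
# Rotation about a single vertical axis leaves an up/down mirror ambiguity,
# so the maximum appears at ~(50, +20) or its mirror (50, -20).
print(np.degrees(az_g[i, j]), np.degrees(el_g[i, j]))
```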
... Information gathered as the head is turned has been exploited either to locate the azimuth at which the ITD reduces to zero, thereby determining the azimuthal direction to a source, or to resolve the front-back ambiguity associated with estimating only the azimuth [4][5][6][7][8][9][10][11][12][13][14]19,20]. Recently, Kalman filters acting on a changing ITD have also been applied in robotic systems for acoustic localization [21][22][23]. ...
A solution to binaural direction finding described in Tamsett (Robotics 2017, 6(1), 3) is a synthetic aperture computation (SAC) performed as the head is turned while listening to a sound. A far-range approximation in that paper is relaxed in this one and the method extended for SAC as a function of range for estimating range to an acoustic source. An instantaneous angle λ (lambda) between the auditory axis and direction to an acoustic source locates the source on a small circle of colatitude (lambda circle) of a sphere symmetric about the auditory axis. As the head is turned, data over successive instantaneous lambda circles are integrated in a virtual field of audition from which the direction to an acoustic source can be inferred. Multiple sets of lambda circles generated as a function of range yield an optimal range at which the circles intersect to best focus at a point in a virtual three-dimensional field of audition, providing an estimate of range. A proof of concept is demonstrated using simulated experimental data. The method enables a binaural robot to estimate not only direction but also range to an acoustic source from sufficiently accurate measurements of arrival time/level differences at the antennae.
... Information gathered as the head is turned has been exploited either to locate the azimuth at which ITD reduces to zero thereby determining the azimuthal direction to a source, or to resolve the front-back ambiguity associated with estimating only azimuth [28][29][30][31][32][33][34]39,40]. Recently, the application of Kalman filters acting on a changing ITD has produced promising results [41][42][43]. ...
Binaural systems measure instantaneous time/level differences between acoustic signals received at the ears to determine angles λ between the auditory axis and directions to acoustic sources. An angle λ locates a source on a small circle of colatitude (a lamda circle) on a sphere symmetric about the auditory axis. As the head is turned while listening to a sound, acoustic energy over successive instantaneous lamda circles is integrated in a virtual/subconscious field of audition. The directions in azimuth and elevation to maxima in integrated acoustic energy, or to points of intersection of lamda circles, are the directions to acoustic sources. This process in a robotic system, or in nature in a neural implementation equivalent to it, delivers its solutions to the aurally informed worldview. The process is analogous to migration applied to seismic profiler data, and to that in synthetic aperture radar/sonar systems. A slanting auditory axis, e.g., possessed by species of owl, leads to the auditory axis sweeping the surface of a cone as the head is turned about a single axis. Thus, the plane in which the auditory axis turns continuously changes, enabling robustly unambiguous directions to acoustic sources to be determined.
... We would like to thank Yi Zhou, Steve Helms-Tillery and Michael Dorman for their assistance in this research effort. Pilot studies were presented at the 169th Meeting of the Acoustical Society of America in May 2015 [51][52][53][54]. ...
Sound source localization is a significant capability for autonomous robots that conduct missions such as search and rescue and target tracking in challenging environments. However, localizing multiple sound sources and tracking static sound sources during self-motion are both challenging tasks, especially when the number of sound sources or reflections increases. This study presents two robotic hearing approaches based on a human perception model (Wallach, 1939) that combines interaural time difference (ITD) and head-turn motion data to locate sound sources. The first method uses a fitting-based approach to recognize the changing trends of the cross-correlation function of sound sources. The effectiveness of the first method was validated using data collected from a two-microphone array rotating in a non-anechoic environment, and the experiments reveal its ability to separate and localize up to three sound sources of the same spectral content (white noise) at different azimuth and elevation angles. The second method uses an extended Kalman filter (EKF) that estimates the orientation of a sound source by fusing the robot's self-motion and ITD data to reduce the localization errors recursively. This method requires limited memory resources and is able to keep tracking the relative position changes of a number of static sources while the robot moves. In the experiments, up to three sources could be tracked simultaneously with a two-microphone array.
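The cross-correlation function referred to above is the standard tool for measuring the ITD between two microphone channels: the lag at which the correlation peaks is the time-difference estimate. A minimal numpy sketch under an assumed 48 kHz sample rate, illustrating only the ITD measurement, not the paper's fitting method:

```python
import numpy as np

FS = 48_000  # Hz, assumed sample rate

def estimate_itd(left, right, max_lag):
    """ITD from the peak of the cross-correlation function.
    Positive result: right channel lags (source nearer the left mic)."""
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-len(left) + 1, len(right))
    keep = np.abs(lags) <= max_lag           # only physically possible lags
    return lags[keep][np.argmax(corr[keep])] / FS

rng = np.random.default_rng(3)
s = rng.normal(0.0, 1.0, FS // 10)                   # 100 ms of white noise
d = 20                                               # true delay: 20 samples ~ 417 us
left = s
right = np.concatenate([np.zeros(d), s])[: s.size]   # delayed copy of s
print(estimate_itd(left, right, max_lag=48))         # ~4.17e-4 s
```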
This work presents a novel technique that performs both orientation and distance localization of a sound source in a three-dimensional (3D) space using only the interaural time difference (ITD) cue, generated by a newly developed self-rotational bi-microphone robotic platform. The system dynamics are established in the spherical coordinate frame using a state-space model. The observability analysis of the state-space model shows that the system is unobservable when the sound source is placed at elevation angles of 90 and 0 degrees. The proposed method utilizes the difference between the azimuth estimates resulting from the 3D and the two-dimensional models, respectively, to check the zero-degree-elevation condition and further estimates the elevation angle using a polynomial curve-fitting approach. The proposed method is also capable of detecting a 90-degree elevation by extracting the zero-ITD signal 'buried' in noise. Additionally, distance localization is performed by first rotating the microphone array to face the sound source and then shifting the microphone array perpendicular to the source-robot vector by a predefined distance for a fixed number of steps. The integrated rotational and translational motions of the microphone array provide complete orientation and distance localization using only the ITD cue. A novel robotic platform using a self-rotational bi-microphone array was also developed for unmanned ground robots performing sound source localization. The proposed technique was first tested in simulation and was then verified on the newly developed robotic platform. Experimental data collected by the microphones installed on a KEMAR dummy head were also used to test the proposed technique. All results show the effectiveness of the proposed technique.