Conference Paper

A study of 3D audio rendering by headphones


Abstract

An efficient implementation of a three-dimensional audio rendering system (3D-ARS) over headphones is presented and its ability to render natural spatial sound is analyzed. In its most straightforward implementation, spatial rendering is achieved by convolving a monophonic signal with the head-related transfer function (HRTF). Several methods have been proposed in the literature to improve the naturalness of the spatial sound and the ability of the headphone wearer to localize sound sources. Among these methods, externalization by incorporating room reflections, personalization to the anthropometric attributes of the user, and the introduction of head movements are known to yield improved performance. This work provides a unified and flexible platform incorporating the various optional components, together with software tools to statistically analyze their contribution. Preliminary statistical analysis suggests that the additional components indeed contribute to the overall localization ability of the user.
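The core rendering step the abstract describes — convolving a mono signal with an HRTF, i.e. with the corresponding head-related impulse response (HRIR) pair in the time domain — can be sketched as follows. This is a minimal illustration, not the paper's implementation; `render_binaural` and the two-tap "HRIRs" are toy values chosen here to encode only an interaural level difference.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair,
    yielding a two-column (L, R) binaural signal."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# Toy example: a unit click through 2-tap "HRIRs" that encode only
# an interaural level difference (source toward the left ear).
click = np.zeros(8)
click[0] = 1.0
out = render_binaural(click, np.array([1.0, 0.0]), np.array([0.5, 0.0]))
```

Real systems replace the direct convolution with FFT-based or partitioned convolution for long responses, but the signal path is the same.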


References
Article
High-quality virtual audio scene rendering is required for emerging virtual and augmented reality applications, perceptual user interfaces, and sonification of data. We describe algorithms for creation of virtual auditory spaces by rendering cues that arise from anatomical scattering, environmental scattering, and dynamical effects. We use a novel way of personalizing the head related transfer functions (HRTFs) from a database, based on anatomical measurements. Details of algorithms for HRTF interpolation, room impulse response creation, HRTF selection from a database, and audio scene presentation are presented. Our system runs in real time on an office PC without specialized DSP hardware.
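One of the components this entry mentions, HRTF interpolation between measured directions, can be illustrated with the simplest possible scheme: time-domain linear interpolation between two neighboring measurements. The function name and toy HRIRs below are hypothetical; practical systems often interpolate in the frequency domain or extract the ITD before blending.

```python
import numpy as np

def interpolate_hrir(hrir_a, hrir_b, az_a, az_b, az):
    """Linearly blend HRIRs measured at azimuths az_a and az_b
    to approximate the HRIR at an intermediate azimuth az."""
    w = (az - az_a) / (az_b - az_a)  # blend weight in [0, 1]
    return (1.0 - w) * hrir_a + w * hrir_b

h0 = np.array([1.0, 0.5, 0.0])   # toy HRIR measured at 0 degrees
h10 = np.array([0.8, 0.6, 0.1])  # toy HRIR measured at 10 degrees
h5 = interpolate_hrir(h0, h10, 0.0, 10.0, 5.0)  # midway blend
```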
Article
Image methods are commonly used for the analysis of the acoustic properties of enclosures. In this paper we discuss the theoretical and practical use of image techniques for simulating, on a digital computer, the impulse response between two points in a small rectangular room. The resulting impulse response, when convolved with any desired input signal, such as speech, simulates room reverberation of the input signal. This technique is useful in signal processing or psychoacoustic studies. The entire process is carried out on a digital computer so that a wide range of room parameters can be studied with accurate control over the experimental conditions. A FORTRAN implementation of this model has been included.
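The image method this entry describes can be sketched compactly. The following is a bare-bones illustration of the idea (mirror the source across the walls, sum attenuated, delayed impulses), not the paper's FORTRAN implementation: it uses a single frequency-independent reflection coefficient, rounds arrivals to the nearest sample, and omits the fractional-delay filtering a serious simulator needs. All names and parameter values are chosen here for the example.

```python
import numpy as np

def image_method_rir(room, src, mic, c=343.0, fs=8000, beta=0.9, order=2):
    """Minimal image-source room impulse response for a rectangular
    room of dimensions `room` (meters), source `src`, microphone
    `mic`, wall reflection coefficient `beta`, reflections up to
    `order` room lengths away."""
    Lx, Ly, Lz = room
    n = int(fs * 2 * order * max(room) / c) + fs // 10  # buffer length
    h = np.zeros(n)
    rng = range(-order, order + 1)
    for nx in rng:
        for ny in rng:
            for nz in rng:
                for px in (0, 1):
                    for py in (0, 1):
                        for pz in (0, 1):
                            # image-source position (mirrored + shifted)
                            ix = (1 - 2 * px) * src[0] + 2 * nx * Lx
                            iy = (1 - 2 * py) * src[1] + 2 * ny * Ly
                            iz = (1 - 2 * pz) * src[2] + 2 * nz * Lz
                            d = np.sqrt((ix - mic[0]) ** 2
                                        + (iy - mic[1]) ** 2
                                        + (iz - mic[2]) ** 2)
                            # one beta factor per wall bounce
                            refl = beta ** (abs(nx - px) + abs(nx)
                                            + abs(ny - py) + abs(ny)
                                            + abs(nz - pz) + abs(nz))
                            k = int(round(fs * d / c))
                            if k < n:
                                h[k] += refl / (4 * np.pi * max(d, 1e-9))
    return h

# 4 x 5 x 3 m room; the first nonzero sample is the direct path.
h = image_method_rir((4.0, 5.0, 3.0), (1.0, 2.0, 1.5), (2.0, 3.0, 1.5))
```

Convolving `h` with a dry signal (as in the `render_binaural` sense, one channel at a time) adds the simulated room reverberation.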
Article
This paper gives HRTF magnitude data in numerical form for 43 frequencies between 0.2 and 12 kHz, the average of 12 studies representing 100 different subjects. However, no phase data is included in the tables; group delay simulation would need to be included in order to account for ITD. In 3-D sound applications intended for many users, we might want to use HRTFs that represent the common features of a number of individuals. But another approach might be to use the features of a person who has desirable HRTFs, based on some criteria. (One can imagine a future 3-D sound system where the pinnae of various famous musicians are simulated.) A set of HRTFs from a good localizer (discussed in Chapter 2) could be used if the criterion were localization performance. If the localization ability of the person is relatively accurate, or more accurate than average, it might be reasonable to use these HRTF measurements for other individuals. The Convolvotron 3-D audio system (Wenzel, Wightman, and Foster, 1988) has used such sets, particularly because elevation accuracy is affected negatively when listening through a bad localizer's ears (see Wenzel et al., 1988). It is best when any single nonindividualized HRTF set is psychoacoustically validated using a statistical sample of the intended user population, as shown in Chapter 2. Otherwise, the use of one HRTF set over another is a purely subjective judgment based on criteria other than localization performance. The technique used by Wightman and Kistler (1989a) exemplifies a laboratory-based HRTF measurement procedure where accuracy and replicability of results were deemed crucial. A comparison of their techniques with those described in Blauert (1983), Shaw (1974), Mehrgardt and Mellert (1977), Middlebrooks, Makous, and Gree...
Article
Head-related transfer function and virtual auditory display are hot issues in researches of acoustics, signal processing, and hearing etc., and have been employed in a variety of applications. In recent years, they have received an increasing attention in China. This paper reviews the latest development of head-related transfer function and virtual auditory display in China, especially works accomplished by our group.
Article
For as long as we humans have lived on Earth, we have been able to use our ears to localize the sources of sounds. Our ability to localize warns us of danger and helps us sort out individual sounds from the usual cacophony of our acoustical world. Characterizing this ability in humans and other animals makes an intriguing physical, physiological, and psychological study (see figure 1). Relying on a variety of cues, including intensity, timing, and spectrum, our brains recreate a three‐dimensional image of the acoustic landscape from the sounds we hear.
Book
From the Publisher: This title is no longer being mass-produced; it is now printed on demand by the publisher. While this process keeps information readily available, the print quality is generally that of a copier rather than of a normal book. This is a copy of the original book. Intended for a one-semester advanced graduate course in digital signal processing or as a reference for practicing engineers and researchers.
Article
A simulation of the acoustics of a simple rectangular prism room has been constructed using the MATLAB m-code programming language. The aim of this program (Roomsim) is to provide a signal generation tool for the speech and hearing research community, and an educational tool for illustrating the image method of simulating room acoustics and some acoustical effects. The program is menu driven for ease of use, and will be made freely available under a GNU General Public Licence by publishing it on the MATLAB Central user contributed programs website. This paper describes aspects of the program and presents new research data resulting from its use in a project evaluating a binaural processor for missing data speech recognition.
Article
Based on measurements from 52 Chinese subjects (26 males and 26 females), a high-spatial-resolution head-related transfer function (HRTF) database with corresponding anthropometric parameters is established. Using the database, cues relating to sound source localization, including interaural time difference (ITD), interaural level difference (ILD), and spectral features introduced by the pinna, are analyzed. Moreover, the statistical relationship between ITD and anthropometric parameters is estimated. The mean values of maximum ITD for males and females are shown to be significantly different, as are those for Chinese and Western subjects; the difference in ITD is due to differences in individual anthropometric parameters. It is further shown that the spectral features introduced by the pinna strongly depend on the individual, and that at high frequencies (f ≥ 5.5 kHz) HRTFs are left-right asymmetric. This work is instructive and helpful for research on binaural hearing and applications of virtual auditory displays in the future.
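The ITD analysis this entry describes can be approximated directly from a measured HRIR pair as the lag that maximizes the interaural cross-correlation. A minimal sketch — the function name and the toy impulse-like HRIRs are hypothetical, and real analyses typically low-pass or onset-detect first:

```python
import numpy as np

def estimate_itd(hrir_left, hrir_right, fs):
    """Estimate the interaural time difference in seconds as the
    cross-correlation lag between the two ear responses.
    Positive means the left-ear signal arrives later."""
    xc = np.correlate(hrir_left, hrir_right, mode="full")
    lag = int(np.argmax(xc)) - (len(hrir_right) - 1)
    return lag / fs

# Toy HRIRs: the right ear leads by 3 samples (source on the right).
fs = 48000
hr = np.zeros(16); hr[0] = 1.0
hl = np.zeros(16); hl[3] = 1.0
itd = estimate_itd(hl, hr, fs)
```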
Article
With its power to transport the listener to a distant real or virtual world, realistic spatial audio has a significant role to play for immersive communications. Headphone-based rendering is particularly attractive for mobile communications systems. Augmented realism and versatility in applications can be achieved when the headphone signals respond dynamically to the motion of the listener. The timely development of miniature low-power motion sensors is making this technology possible. This article reviews the physical and psychoacoustic foundations, practical methods, and engineering challenges to the realization of motion-tracked sound over headphones. Some new applications that are enabled by this technology are outlined.
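The coordinate update at the heart of motion-tracked rendering is simple: the HRTF is selected for the source direction relative to the tracked head, so a world-fixed source stays put as the head turns. A minimal sketch under a yaw-only head model, with hypothetical names:

```python
def relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of a world-fixed source in head coordinates: world
    azimuth minus head yaw, wrapped to (-180, 180] degrees. This
    angle is what indexes the HRTF at each tracker update."""
    az = (source_az_deg - head_yaw_deg) % 360.0
    return az - 360.0 if az > 180.0 else az
```

For example, a source straight ahead (0°) heard while the head is turned 90° to the right should render at -90° (hard left); elevation and roll require a full rotation-matrix treatment.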
Article
Relative importance of different frequency regions in binaural release from masking (for detection) and binaural gain in intelligibility was investigated. Experiments showed that the release from masking (SπN0 case) for single words in high-level, broad-band Gaussian noise is roughly 13 dB and is determined primarily by interaural phase opposition in the low-frequency (<500 Hz) region. The binaural gain in intelligibility at the 50% level was on the order of 6 dB and only partly dependent on interaural phase opposition in the low-frequency region. Interaural amplitude differences were not considered in the investigation. Subjecting the speech to a large interaural time delay with the noise binaurally in phase resulted in a relatively constant masking level difference approaching 13 dB over the measured range from 0.5 to 10 msec. The corresponding binaural gain in intelligibility at the 50% level was on the order of 3 dB.
Conference Paper
This paper describes a public-domain database of high-spatial-resolution head-related transfer functions measured at the UC Davis CIPIC Interface Laboratory and the methods used to collect the data. Release 1.0 (see http://interface.cipic.ucdavis.edu) includes head-related impulse responses for 45 subjects at 25 different azimuths and 50 different elevations (1250 directions) at approximately 5° angular increments. In addition, the database contains anthropometric measurements for each subject. Statistics of anthropometric parameters and correlations between anthropometry and some temporal and spectral features of the HRTFs are reported.
Conference Paper
V. Algazi, R. Duda, J. Melick, and D. Thompson, "Customization for Personalized Rendering of Motion-Tracked Binaural Sound," in Audio Engineering Society Convention 117. Audio Engineering Society, 2004.