Article

A Real-time Audio System for Adjusting the Sweet Spot to the Listener's Position

Abstract

In the present study, a new stereophonic playback system was proposed, in which the crosstalk signals would be reasonably cancelled at an arbitrary listener position. The system was composed of two major parts: the listener position tracking part and the sound rendering part. The position of the listener was estimated using acoustic signals from the listener (i.e., voice or hand-clapping signals). A direction of arrival (DOA) algorithm that took room reverberation effects into consideration was adopted to estimate the directions of the acoustic sources. A crosstalk cancellation filter was designed using a free-field model. To determine the maximum tolerable shift of the listener position, a quantitative analysis of the channel separation ratio as a function of the displacement of the listener position was performed. Prototype hardware was implemented using a microprocessor board, a DSP board, a multi-channel ADC board and an analog frontend. The results showed that the average mean square error between the true direction of a listener and the estimated direction was about 5 degrees. More than 80% of the tested subjects indicated that better stereo images were obtained by the proposed system, compared with the non-processed signals.
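The abstract does not say which DOA algorithm was used, so the following is only a minimal sketch of the general idea of estimating a listener's direction from a voice or hand-clap picked up by two microphones. It assumes a far-field source, a plain cross-correlation delay estimate and no reverberation handling at all; the function names, the microphone spacing and the speed of sound are illustrative assumptions, not details from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay_samples(x, y, max_lag):
    """Return the lag (in samples) at which y best matches x.

    A positive result means y is a delayed copy of x, i.e. the sound
    reached microphone x first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]  # cross-correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def doa_degrees(delay_samples, sample_rate, mic_spacing):
    """Far-field direction of arrival from the inter-microphone delay.

    0 degrees is broadside (straight ahead of the microphone pair).
    """
    tau = delay_samples / sample_rate
    # Clamp so that noise cannot push the argument outside asin's domain.
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))
    return math.degrees(math.asin(s))
```

For a clap that arrives three samples later at the second microphone, with a 48 kHz sample rate and 0.2 m spacing, `doa_degrees(3, 48000, 0.2)` gives an angle of roughly 6 degrees off broadside. In a reverberant room, plain cross-correlation of this kind is known to produce anomalous peaks, which is the effect the paper's reverberation-aware estimator is meant to address.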

... Although it offers several advantages in terms of ease of implementation and robustness, this approach also has drawbacks: it requires a carefully tuned setup, and the listener has to be located on the symmetry plane between the two loudspeakers. Furthermore, a listener position tracking algorithm [10,11,12,13], which is an important aspect of the crosstalk cancellation procedure, is not directly applicable because of how RACE is derived. ...
... Starting from this scenario, an innovative implementation of acoustic crosstalk cancellation for 3D audio rendering is proposed in this paper, taking advantage of the RACE technique. In particular, the introduction of new parameters derived from the free-field model [13] makes it possible to improve the performance of the RACE algorithm in terms of sweet spot dimension and to add a listener position tracking system based on Microsoft Kinect. Furthermore, the RACE algorithm is also improved in terms of computational complexity: this aspect is highlighted by presenting an efficient real-time implementation on a PC and on a DSP-based platform. ...
... In order to overcome the main limitation of RACE, different values of attenuation ∆a and delay ∆t have been considered for each channel so that the RACE algorithm can also be used in non-symmetric two-loudspeaker setups. As a matter of fact, considering the free-field sound propagation relationship [13] and the geometrical representation of a stereophonic reproduction environment of Fig. 2 ...
Conference Paper
Full-text available
The paper deals with the development of an efficient real-time system for the reproduction of a spatialized audio field that takes the listener's position into account. The system is composed of two parts: a sound rendering system based on a crosstalk canceller, which is required for spatialized audio reproduction, and a listener position tracking system used to set the crosstalk canceller parameters. An efficient implementation of a time-domain crosstalk cancellation algorithm is then presented, based on an improved version of the recursive ambiophonics crosstalk elimination algorithm. A real-time application is proposed that introduces a Kinect control capable of accurately tracking the listener position and changing the crosstalk parameters according to that position. Several results are presented comparing the proposed approach with the state of the art in order to confirm its validity.
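To make the "recursive attenuation and delay" idea behind RACE concrete, here is a minimal time-domain sketch. It follows the generic RACE structure, in which each output channel subtracts an attenuated, delayed copy of the opposite output channel, and it allows separate gain and delay values per channel as the snippets above describe. The function name and parameters are hypothetical, the sketch omits any filtering of the cancellation path, and it is not the improved algorithm of the cited paper.

```python
def race(left_in, right_in, gain_l, delay_l, gain_r, delay_r):
    """Recursive crosstalk canceller with per-channel gain and delay.

    gain_l / delay_l describe the crosstalk path into the left ear
    (from the right loudspeaker); gain_r / delay_r the path into the
    right ear. Delays are in samples and must be at least 1 so that
    the recursion only uses output samples that already exist.
    """
    if delay_l < 1 or delay_r < 1:
        raise ValueError("delays must be at least one sample")
    n = len(left_in)
    left_out = [0.0] * n
    right_out = [0.0] * n
    for i in range(n):
        # Past output of the opposite channel, or silence before start-up.
        from_right = right_out[i - delay_l] if i >= delay_l else 0.0
        from_left = left_out[i - delay_r] if i >= delay_r else 0.0
        left_out[i] = left_in[i] - gain_l * from_right
        right_out[i] = right_in[i] - gain_r * from_left
    return left_out, right_out
```

Feeding a unit impulse into the left channel with both gains at 0.5 and both delays at 2 samples produces the alternating, decaying train 1, -0.5, 0.25, -0.125 that bounces between the two outputs. Gains below 1 keep this recursion decaying; a tracking system would update the four parameters as the listener moves.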
... A great deal of research on high-performance CTC has been carried out for stereo systems [1], [2], [4]. Multispeaker systems with more than two speakers have also attracted interest for providing enhanced stereophonic sound image in large rooms. ...
... If a CTC was designed based on $H(x_0, y_0)$ but the actual listening position is changed to $(x_0 + \Delta x, y_0 + \Delta y)$, then the final signal at the ear is not $S$ because the working CTC filter $C(x_0, y_0) = H^{+}(x_0, y_0)$ differs from $H^{+}(x_0 + \Delta x, y_0 + \Delta y)$, which is a pseudo inverse of the true acoustic path. The performance degradation by position error $(\Delta x, \Delta y)$ is measured from the channel separation ratio (CSR) defined by (3), where $G_{ij}$ is an element of $G = H(x_0 + \Delta x, y_0 + \Delta y)\, C(x_0, y_0)$ [4]. ...
... In an ideal case when $(\Delta x, \Delta y) = (0, 0)$, $\mathrm{CSR}_L$ and $\mathrm{CSR}_R$ become infinity because $G_{12}(f) = G_{21}(f) = 0$. If both $\mathrm{CSR}_L$ and $\mathrm{CSR}_R$ at $(x_0 + \Delta x, y_0 + \Delta y)$ are larger than 10 dB, $(x_0 + \Delta x, y_0 + \Delta y)$ is accepted as the sweet spot of CTC [4]. Figure 2 shows the theoretical sweet spot size of the conventional CTC as a function of $x_0$ when applied to the four-speaker system shown in Fig. 1 ...
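The degradation described above can be sketched numerically for a two-loudspeaker, free-field case. Equation (3) is not reproduced in the snippet, so the CSR used here is an assumed single-frequency form, the ratio of direct to cross term in each row of G expressed in dB; the cited work may average over frequency or define it differently. The geometry helper, the 9 cm half head width and all function names are illustrative assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def ears_at(x, y, half_width=0.09):
    """Left and right ear positions for a head centred at (x, y)."""
    return [(x - half_width, y), (x + half_width, y)]


def free_field_matrix(ears, speakers, freq):
    """H[i][j]: free-field path from speaker j to ear i (1/r and delay)."""
    matrix = []
    for ear in ears:
        row = []
        for spk in speakers:
            r = math.dist(ear, spk)
            phase = -2j * math.pi * freq * r / SPEED_OF_SOUND
            row.append(cmath.exp(phase) / r)
        matrix.append(row)
    return matrix


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def matmul_2x2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def csr_db(g):
    """Assumed CSR: direct-to-cross magnitude ratio per ear, in dB."""
    def ratio(direct, cross):
        if abs(cross) == 0.0:
            return math.inf
        return 20.0 * math.log10(abs(direct) / abs(cross))
    return ratio(g[0][0], g[0][1]), ratio(g[1][1], g[1][0])
```

A typical use is to design `C = inverse_2x2(free_field_matrix(ears_at(x0, y0), speakers, f))` once, then evaluate `csr_db(matmul_2x2(free_field_matrix(ears_at(x0 + dx, y0 + dy), speakers, f), C))` over a grid of displacements and mark the points where both values exceed 10 dB as the sweet spot. At zero displacement the cross terms vanish up to rounding error, so the CSR is very large rather than exactly infinite.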
Article
We propose a method of enhancing the performance of a cross-talk canceller for a four-speaker system with respect to sweet spot size and ringing effect. For the large sweet spot of a cross-talk canceller, the speaker layout needs to be symmetrical to the listener's position. In addition, a ringing effect of the cross-talk canceller is reduced when many speakers are located close to each other. Based on these properties, the proposed method first selects the two speakers in a four-speaker system that are most symmetrical to the target listener's position and then adds the remaining speakers between these two to the final selection. By operating only these selected speakers, the proposed method enlarges the sweet spot size and reduces the ringing effect. We conducted objective and subjective evaluations and verified that the proposed method improves the performance of the cross-talk canceller compared to the conventional method. © Copyright 2015 The Institute of Electronics, Information and Communication Engineers.
... Considering the second category, a system based on freefield model for the impulse response definition has been presented in [11], [12], [13], [14]. In this case, a very simple model that gives an estimation of the system impulse response along x-y coordinate is designed, introducing a low accuracy of the impulse responses but providing good performance in terms of audio quality since low alterations of the timbre are introduced. ...
... In this paper, an efficient implementation of an advanced audio spatializer based on acoustic crosstalk cancellation is proposed, taking into account the free-field model [11]. In particular, since the proposed crosstalk cancellation algorithm leads to low timbre alterations due to the adopted model, an efficient equalization procedure that compensates the loudspeaker and room response transfer functions is introduced here, based on a combined multipoint approach [21], [16], [19]. ...
... In particular, since the proposed crosstalk cancellation algorithm leads to low timbre alterations due to the adopted model, an efficient equalization procedure that compensates the loudspeaker and room response transfer functions is introduced here, based on a combined multipoint approach [21], [16], [19]. Then, since the approach of [11] relies on manual tuning of the system, an automatic procedure is proposed here to determine the parameter values employed in the real-time processing. Finally, to highlight the effectiveness of the proposed algorithm, objective and subjective comparisons with the state of the art have been carried out. ...
Conference Paper
Full-text available
The paper deals with the development of a real time system for the reproduction of an immersive audio field considering the crosstalk cancellation and the room response equalization issues. In particular, the real-time system is composed of two parts: a crosstalk cancellation network and a combined multipoint equalization structure. The former is required in order to have a spatialized audio and it is based on the free-field relationship in order to not introduce a timbre alteration. The latter is used to improve the objective and subjective quality of sound reproduction systems by compensating the room and loudspeakers transfer function. Both steps are based on a-priori analysis of the real environment using real impulse responses measured in different positions. In particular, an offline procedure capable of determining the tuning parameters for the crosstalk network and of deriving the final filters for the equalization structure, is adopted. Several results are presented in order to show the effectiveness of the proposed algorithms considering objective and subjective evaluations and comparing the presented approach with the state of the art.
... Besides, a common drawback of employing HRTFs is that they demand a high computational load for convolution because of their long filters, especially in a dynamic CTC system, which must update the CTC filters as the listener moves. Alternatively, the free-field transfer function is also frequently used because of its simplicity [19]. In the time domain, the CTC filters can be implemented as a recursive attenuation and delay process, such as RACE (Recursive Ambiophonic Crosstalk Elimination) [20,21], which demands a lower computational load [22]. ...
... Instead of using individualized or generic HRTFs for crosstalk cancellation, the free-field model is also commonly employed to approximate the path responses from the loudspeakers to the listener's ears because it is simple and not user-specific, especially for real-time hardware implementation [19]. The free-field model characterizes only the attenuation and the delay due to distance, and it does not take any effects of the listener's head into account. ...
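As a rough time-domain illustration of that statement, the sketch below reduces one loudspeaker-to-ear path to a single gain and a single delay, assuming a point source with 1/r spherical spreading and a speed of sound of 343 m/s. The function names and the rounding of the delay to whole samples are illustrative choices, not details taken from the cited work.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def free_field_path(source, ear, sample_rate):
    """Return (gain, delay_in_samples) for a point source in free field."""
    r = math.dist(source, ear)
    gain = 1.0 / r  # spherical spreading: amplitude falls as 1/r
    delay = round(r / SPEED_OF_SOUND * sample_rate)  # propagation time
    return gain, delay


def apply_path(signal, gain, delay):
    """Scaled, delayed copy of the signal (same length, zero-padded)."""
    out = [0.0] * len(signal)
    for n in range(delay, len(signal)):
        out[n] = gain * signal[n - delay]
    return out
```

Because the whole path is one multiply and one delay line, the canceller can be updated cheaply whenever the tracked position changes. The trade-off noted in the snippet is that head shadowing and diffraction are ignored.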
Article
Cross-talk cancellation (CTC) plays an important role in 3-D audio systems with loudspeaker reproduction. To design the CTC filter coefficients, the transfer functions between the loudspeakers and the listener’s ears need to be determined in advance. Ideally, the individualized head-related transfer function (HRTF) is preferred. It provides good performance but is impractical to obtain for every listener, and it carries a high computational load because of convolution with long filters. Alternatively, the free-field transfer function is also frequently used because of its simplicity, especially for real-time applications. However, because it does not consider the effects of the listener’s head, its performance is limited. In this paper, based on the spherical head model, an attenuation factor and a phase difference factor characterizing the head effects are integrated into the free-field CTC method. The proposed CTC method shows better performance than the free-field CTC method and has a lower computational cost than the HRTF-based CTC method with comparable performance.
... However, since all these approaches require a-priori knowledge of the head related transfer functions (HRTFs), a technique capable of solving the crosstalk problem without the measured HRTF is requested for mobile device applications. In this context, a free-field model for a-priori definition of the path between the source and the listener's ears has been presented in [6], however this approach is still dependent on the listener distance. Differently, a heuristic approach named RACE (recursive ambiophonics crosstalk elimination) has been presented in [7]. ...
... Taking into consideration Fig. 1, different values of attenuation a and delay t have been considered for each channel. Therefore, considering the free-field sound propagation relationship of [6] and the geometrical representation of a stereophonic reproduction environment of Fig. 2, gain and delay values employed in the algorithm extended to the three-dimensions can be obtained as reported in the following equations: ...
Article
In recent years the use of portable devices has increased enormously, reaching a very high level of expansion. Due to size constraints, the use of small and very close loudspeakers leads to a poor spatial sound image. In this context, an audio algorithms architecture for stereo portable devices is proposed. The system is composed of a spatializer based on an improved version of the recursive ambiophonics crosstalk elimination algorithm, integrated with a combined quasi-Anechoic equalization approach and a virtual bass algorithm capable of enhancing the loudspeakers' performance. To investigate the effectiveness of the proposed algorithm, an efficient real time implementation has been realized on both Android and iOS operating systems. Finally, starting from objective results, a complete listening test session has been performed providing a deep analysis of the proposed approach. The test results have shown the positive effect of audio algorithms on personal portable devices.
... A more recent study is presented in [190]. A new stereophonic playback system is suggested, where the cross-talk signals would be reasonably cancelled at any arbitrary listener position. ...
Thesis
Full-text available
In this work, three complementary topics regarding the use of multichannel spatial audio in professional applications have been studied. SIRIUS is an audio transport mechanism designed to convey multiple professional-grade audio channels over a regular LAN while maintaining their synchronization. The system reliability is guaranteed by using a FEC mechanism and selective redundancy, without introducing any significant network overload. The system also offers a low latency that meets the requirements of professional applications, and it can operate on existing infrastructure and coexist with other IT traffic. The system relies on standard protocols and offers a high level of interoperability with equivalent technologies. The overall performance satisfies Pro Audio requirements. The second contribution is AQUA, a comprehensive framework for multichannel audio quality assessment that provides efficient tools for both subjective and objective quality evaluation. The subjective part consists of a new design of reliable listening tests for multichannel sound that analyze both perceptual and spatial information. Audio localization accuracy is reliably evaluated using our gesture-based protocol built around the Kinect. Additionally, this protocol relies on EEG signal analysis to monitor psychological biases and to screen subjects efficiently. The objective method uses a binaural model to down-mix the multichannel audio signal into a 2-channel binaural mix that maintains the spatial cues and provides a simple and scalable analysis. The binaural stream is processed by perceptual and spatial models that calculate relevant cues. Their combination is equivalent to the internal representation and allows the cognitive model to estimate an objective quality grade. In parallel, the psychological model simulates human behavior by adjusting the output grades according to the previous ones (i.e., the experience effect). 
The overall performance shows that the AQUA model can accurately predict the perceptual and spatial quality of multichannel audio in a very realistic manner. The third focus of the study is to optimize the listening experience in surround sound systems (OPTIMUS). Considering the sweet spot issue in these systems and the complexity of widening it, we introduce a tracking technique that virtually moves the sweet spot location to the actual position of the listener(s). Our approach is non-intrusive and uses thermal imaging for listener identification and tracking. The original channels are considered as virtual sources and remixed using the VBAP technique. Accordingly, the audio system virtually follows the listener's actual position. For home-cinema applications, the Kinect can be used for the tracking part and the audio adjustment can be done using HRTFs and cross-talk cancellation filters. The system shows an improvement in localization accuracy and in the quality of the listening experience.
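The VBAP remixing step mentioned in this abstract can be sketched for the simplest case, a virtual source panned between one pair of loudspeakers in the horizontal plane. This is standard two-dimensional pairwise VBAP with constant-power normalization. The assumption that all angles are measured from the tracked listener's position, so that the gains are recomputed when the listener moves, is an interpretation of the abstract rather than a stated detail, and the function name is hypothetical.

```python
import math


def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """2-D VBAP gains for a virtual source between two loudspeakers.

    Angles are azimuths in degrees as seen from the listener. Solves
    p = g1 * l1 + g2 * l2 for the unit direction vectors, then scales
    the gains so that g1**2 + g2**2 == 1.
    """
    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)

    p, l1, l2 = unit(source_deg), unit(spk1_deg), unit(spk2_deg)
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    # Cramer's rule for the 2x2 system with columns l1 and l2.
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

With loudspeakers at +30 and -30 degrees, a source straight ahead receives equal gains on both loudspeakers, and a source at +30 degrees is played by the first loudspeaker alone. A negative gain indicates that the source lies outside the arc spanned by the pair, in which case a different pair should be chosen.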
Chapter
Many artists produce and mix their virtual reality, game, or screen media audio productions only with headphones, but deploy them to stereo or multi-channel loudspeaker setups. Because of the acoustical and perceptual differences, listening on headphones might sound very different compared to loudspeakers, including the perception of sound sources inside the head (externalisation problem). Nevertheless, by using Head Related Transfer Functions (HRTFs) and accurate movement tracking, it is possible to simulate a loudspeaker setup with proper externalisation. In this paper, an infrared-based positional tracking system with non-individualised HRTFs to simulate a loudspeaker setup is conceptualised, designed and implemented. The system can track the user with six degrees of freedom (6-DOF); an improvement over current commercial systems that only use 3-DOF tracking. The system was evaluated on 20 participants to see if the additional DOF increased the degree of externalisation. While tracking increased the externalisation in general, there was no significant difference between 3-DOF and 6-DOF. Another test indicated that positional movement coupled with positional tracking may have a greater effect on externalisation compared to positional movement coupled with only head movement tracking. Comparisons between these results and previous studies are discussed and improvements for future experiments are proposed.
Article
Full-text available
The principle of 3D audio technology is introduced, and the application of signal processing methods in 3D audio is reviewed, covering the measurement, computation, interpolation, and approximation of the head-related transfer function (HRTF) as well as methods for crosstalk cancellation. Current hot topics in this area are summarized. Finally, future research trends in 3D audio are discussed.
Conference Paper
The paper deals with the development of a real time system for the reproduction of an immersive audio field considering the listeners’ position. The system is composed of two parts: a sound rendering system based on a crosstalk canceller that is required in order to have a spatialized reproduction and a listener position tracking system in order to model the crosstalk canceller parameters. Therefore, starting from the free-field model, a new model is considered introducing a directivity function for the loudspeakers and considering a three-dimensional environment. A real time application is proposed introducing a Kinect control, capable of accurately tracking the listener position and changing the crosstalk parameters. Several results are presented comparing the proposed approach with the state of the art in order to confirm its validity.
Article
The present study tested a new stereo playback system that effectively cancels cross-talk signals at an arbitrary listening position. Such a playback system was implemented by integrating listener position tracking techniques and crosstalk cancellation techniques. The entire listening space was partitioned into a number of non-overlapped cells and a crosstalk cancellation filter was assigned to each cell. The listening space partitions and the corresponding crosstalk cancellation filters were constructed by maximizing the average channel separation ratio (CSR). Since the proposed method employed cell-based crosstalk cancellation, estimation of the exact position of the listener was not necessary. Instead, it was only necessary to determine the cell in which the listener was located. This was achieved by simply employing an artificial neural network (ANN) where the time delay to each pair of microphones was used as the ANN input and the ANN output corresponded to the index of cells. The experimental results showed that more than 95% of the experimental listening space had a CSR ≥ 10 dB when the number of clusters exceeded 12. Under these conditions, the correlation between the true directions of the virtual sound sources and the directions recognized by the subjects was greater than 0.9.
Article
Full-text available
In this paper we describe the underlying concepts behind the spatial sound renderer built at the University of Southern California's Immersive Audio Laboratory. In creating this sound rendering system, we were faced with three main challenges. First the rendering of sound using the Head-Related Transfer Functions, second the cancellation of the crosstalk terms and third the localization of the listener's ears. To deal with the spatial rendering sound we use a two-layer method of modeling the HRTF's. The first layer accurately reproduces the ITD's and IAD's, and the second layer reproduces the spectral characteristics of the HRTF's. A novel method for generating the required crosstalk cancellation filters as the listener moves was developed based on Low-Rank modeling. Using Karhunen-Loeve expansion we can interpolate among listener positions from a small number of HRTF measurements. Finally we present a Head Detection algorithm for tracking the location of the listener's ears in real time using a laser scanner.
Article
Full-text available
This study examined inter-subject differences in the transfer functions from the free field to the human ear canal, which are commonly known as head-related transfer functions. The directional components of such transfer functions are referred to here as directional transfer functions (DTFs). The DTFs of 45 subjects varied systematically among subjects in regard to the frequencies of spectral features such as peaks and notches. Inter-subject spectral differences in DTFs were quantified between 3.7 and 12.9 kHz for sound-source directions throughout the coordinate sphere. For each pair of subjects, an optimal frequency scale factor aligned spectral features between subjects and, thus, minimized inter-subject spectral differences. Frequency scaling of DTFs reduced spectral differences by a median value of 15.5% across all pairs of subjects and by more than half in 9.5% of subject pairs. Optimal scale factors showed a median value of 1.061 and a maximum of 1.38. The optimal scale factor between any pair of subjects correlated highly with the ratios of subjects' maximum interaural delays, sizes of their external ears, and widths of their heads.
Article
Full-text available
Knowledge of the direction-dependent filter characteristics of the external ears is useful for the study of spatial hearing in experimental animals. The present study examined individual differences in the directional components of external-ear transfer functions (directional transfer functions, DTFs) among 24 anesthetized cats. Ears were fixed in a frontal position. Inter-cat differences in DTFs were quantified across a mid-frequency range from 8 to 16 kHz and across 30 locations in the horizontal plane and vertical midline. Across cats, DTFs showed similar direction dependence, but tended to differ in regard to the center frequencies of spectral features, such as spectral peaks and notches. Certain mid-frequency notches, for instance, varied in frequency across cats by nearly a factor of 2. Scaling of DTFs in frequency could reduce the overall differences between pairs of cats. Scale factors that minimized inter-cat differences ranged as high as 1.57 and correlated moderately with cats' body weights. Nevertheless, appreciable individual differences remained after frequency scaling. Inter-cat differences in DTFs were substantially larger than differences that resulted from variability in positioning the ears. The results suggest some guidelines regarding the conditions under which it is acceptable to apply DTF measurements from one cat to another.
Article
Full-text available
In this comprehensive study, algorithms for upmixing, downmixing, and joint up/downmixing are examined and compared. Five upmixing algorithms based on signal decorrelation and reverberation are employed to convert two-channel stereo signals to five-channel signals. For downmixing, methods ranging from mixing with simple gain adjustment to more sophisticated head related transfer function (HRTF) filtering and crosstalk cancellation system (CCS) are utilized to downmix the center channel and the surround channels into the available two frontal loudspeakers. For situations where only two-channel content and loudspeakers are available, a number of up/downmixing schemes are used to simulate a virtual surround environment. Emphasis of comparison is placed on two consumer electronic products: a 5.1 home theater system and a dual-loudspeaker MP3 handset. The effect of loudspeaker spacing on rendering performance is examined. Listening tests are conducted to compare the processing methods in terms of three levels of subjective indices. The results are processed by using the Multivariate ANalysis Of VAriance (MANOVA) to justify the statistical significance, followed by a multiple regression analysis to correlate the auditory preference with various timbral and spatial attributes.
Article
Full-text available
This paper gives HRTF magnitude data in numerical form for 43 frequencies between 0.2 and 12 kHz, the average of 12 studies representing 100 different subjects. However, no phase data is included in the tables; group delay simulation would need to be included in order to account for ITD. In 3-D sound applications intended for many users, we might want to use HRTFs that represent the common features of a number of individuals. But another approach might be to use the features of a person who has desirable HRTFs, based on some criteria. (One can sense a future 3-D sound system where the pinnae of various famous musicians are simulated.) A set of HRTFs from a good localizer (discussed in Chapter 2) could be used if the criterion were localization performance. If the localization ability of the person is relatively accurate or more accurate than average, it might be reasonable to use these HRTF measurements for other individuals. The Convolvotron 3-D audio system (Wenzel, Wightman, and Foster, 1988) has used such sets particularly because elevation accuracy is affected negatively when listening through a bad localizer's ears (see Wenzel, et al., 1988). It is best when any single nonindividualized HRTF set is psychoacoustically validated using a statistical sample of the intended user population, as shown in Chapter 2. Otherwise, the use of one HRTF set over another is a purely subjective judgment based on criteria other than localization performance. The technique used by Wightman and Kistler (1989a) exemplifies a laboratory-based HRTF measurement procedure where accuracy and replicability of results were deemed crucial. A comparison of their techniques with those described in Blauert (1983), Shaw (1974), Mehrgardt and Mellert (1977), Middlebrooks, Makous, and Gree...
Article
Full-text available
Integrated media workstations are increasingly being used for creating, editing, and monitoring sound that is associated with video or computer-generated images. While the requirements for high quality reproduction in large-scale systems are well understood, these have not yet been adequately translated to the workstation environment. In this paper we discuss several factors that pertain to high quality sound reproduction at the desktop including acoustical and psychoacoustical considerations, signal processing requirements, and the importance of dynamically adapting the reproduced sound as the listener's head moves. We present a desktop audio system that incorporates several novel design requirements and integrates vision-based listener-tracking for accurate spatial sound reproduction. We conclude with a discussion of the role the pinnae play in immersive (3D) audio reproduction and present a method of pinna classification that allows users to select a set of parameters that closely m...
Article
In general a head-related-transfer-function-based virtual sound system inherently generates a sweet spot, and a listener positioned outside the sweet spot cannot feel the surround sound effect well. A novel virtual sound rendering method is presented which allows the listener located at an arbitrary position to obtain a good impression of surround sound by using two-channel loudspeakers only. This smart virtual sound rendering system consists of a listener position tracking system with infrared and ultrasonic sensors, and an adaptive virtualizer algorithm optimized for the listener position. Based on this new method, a TV viewer at an arbitrary position can enjoy good surround sound by the simple push of a button on a remote controller.
Article
This paper seeks to pinpoint the optimal loudspeaker span that best reconciles the robustness and performance of the crosstalk cancellation system (CCS). Two sweet spot definitions are employed for assessment of robustness. Besides the point source model, head related transfer functions are employed in the simulation to capture more design aspects in practical situations. Three span angles, 10 degrees, 60 degrees, and 120 degrees, are compared via subjective experiments. Analysis of Variance is applied for analysis. The results indicate that not only the CCS performance but also the panning effect and head shadowing will dictate the overall performance and robustness. The 120-degree arrangement performs comparably well as the 60-degree arrangement, but is more preferred than the 10-degree arrangement.
Article
In conventional two-channel stereophonic reproduction the sound image appears only between the left and right loudspeakers. A new localization theory, derived from a model of hearing, governs a reproducing system that extends the sound-image area beyond the loudspeakers. The basic theory of this new technology and its applications in stereophonic systems are described.
Conference Paper
This paper focuses on a stereophonic playback system designed to adjust the "sweet spot" to the listener's position. The system includes an optical face tracker which provides information about the listener's x-y position. Accordingly, the loudspeaker signals are manipulated in real time in order to move the "sweet spot". The stereophonic perception with an adjusted "sweet spot" is theoretically investigated on the basis of several models of binaural hearing. The results indicate that an adjustment of signals corresponding to the center of the listener's head does improve the localization over the whole listening area. Some localization error remains due to asymmetric signal paths for off-center listening positions, but it can be estimated and compensated for.
Article
A theoretical analysis of the performance of the maximum likelihood (ML) time delay estimate in a multi-path propagation is proposed. An expression for the probability of anomaly of the ML time delay estimate is obtained. Percentages of anomalous time delay estimates obtained through Monte Carlo simulation are shown to be in close agreement with theoretically predicted values.
Article
Virtual acoustic imaging systems are effective when the listener's head location is close to the head location assumed when the system was designed. The "sweet spot" refers to the spatial bubble of head location in which the system is still effective. Some of the previous work investigating the "stereo dipole" acoustic imaging system shows that for the traditional on-axis listener location the "sweet spot" is about +/-5 cm for lateral head translations. Larger head movements than this require an update of the virtual acoustic imaging filters. The interest here is the "sweet spot" size at off-axis asymmetric listener locations or an understanding of how often one needs to update the filters to ensure the listener perceives a stable virtual image as they move. The examination of the off-axis "sweet spot" size comprises a theoretical acoustic analysis, computer simulations, and a subjective study. The simulations and subjective evaluation both demonstrate that the width of tolerable lateral head translations is comparable for the symmetric on-axis listener location and asymmetric listener locations that are as far as 25 cm off-axis.
Conference Paper
Upmixing, downmixing, and joint up/downmixing are examined. Two upmixing algorithms are employed to convert two-channel stereo signals to five-channel signals. For downmixing, methods ranging from mixing with simple gain adjustment to more sophisticated Head Related Transfer Function (HRTF) filtering and Crosstalk Cancellation System (CCS) are utilized to downmix the center channel and the surround channels into the available two frontal loudspeakers. For situations where only two-channel content and loudspeakers are available, a number of up/downmixing schemes are used to simulate a virtual surround environment. Emphasis of comparison is placed on a dual-loudspeaker MP3 handset. Listening tests are conducted to compare the processing methods in terms of three levels of subjective indices. The results are processed using Multivariate Analysis of Variance (MANOVA) to assess their statistical significance.
Conference Paper
Real room acoustic impulse responses (AIRs) modelled by infinite impulse response (IIR) filters require high model orders. Many problems involving the estimation of AIRs reduce to high dimensional optimisation problems. Subband autoregressive (AR) modelling techniques reduce this difficult optimisation problem to a number of simpler low dimensional optimisations. This paper introduces a formulation for subband AR modelling in a probabilistic framework which facilitates robust Bayesian parameter estimation. The paper also provides new results to show that the subband AR representation accurately models typical AIRs and, therefore, is suitable for modelling room reverberation.
Conference Paper
It is well known that the effectiveness of loudspeaker-based 3D audio systems is critically dependent on the listener being in a known position, the so-called “sweet spot”. In this paper we propose a new system that provides increased robustness to perturbations such as head movement, reverberation, and different head shapes. The system is based on using a particular geometry that offers a combination of symmetric and asymmetric loudspeaker positions. This combination increases the bandwidth over which the proposed system is robust, when compared with existing 3D audio systems.
A maximum likelihood (ML) estimator is developed for determining time delay between signals received at two spatially separated sensors in the presence of uncorrelated noise. This ML estimator can be realized as a pair of receiver prefilters followed by a cross correlator. The time argument at which the correlator achieves a maximum is the delay estimate. The ML estimator is compared with several other proposed processors of similar form. Under certain conditions the ML estimator is shown to be identical to one proposed by Hannan and Thomson [10] and MacDonald and Schultheiss [21]. Qualitatively, the role of the prefilters is to accentuate the signal passed to the correlator at frequencies for which the signal-to-noise (S/N) ratio is highest and, simultaneously, to suppress the noise power. The same type of prefiltering is provided by the generalized Eckart filter, which maximizes the S/N ratio of the correlator output. For low S/N ratio, the ML estimator is shown to be equivalent to Eckart prefiltering.
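The "prefilter, cross-correlate, pick the peak" structure described above can be sketched as follows. This is a minimal illustration, not the ML estimator itself: the ML prefilters depend on signal and noise spectra that are not given here, so the sketch uses either no weighting (plain cross-correlation) or the phase transform (PHAT) as a stand-in weighting. The function name `estimate_delay` and its parameters are assumptions made for this example.

```python
import numpy as np


def estimate_delay(x1, x2, sample_rate, phat=False):
    """Estimate the delay of x2 relative to x1 in seconds.

    Positive values mean x2 lags x1. The delay is the lag at which the
    (optionally PHAT-weighted) cross-correlation reaches its maximum.
    """
    n = len(x1) + len(x2) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two, avoids circular wrap
    X1 = np.fft.rfft(x1, nfft)
    X2 = np.fft.rfft(x2, nfft)
    cross = X2 * np.conj(X1)  # cross-power spectrum
    if phat:
        # PHAT weighting: keep only the phase (a substitute for the ML prefilters).
        cross = cross / np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, nfft)
    # Reorder so that lags run from -(len(x1) - 1) to +(len(x2) - 1).
    max_neg = len(x1) - 1
    cc = np.concatenate((cc[nfft - max_neg:], cc[:len(x2)]))
    lag = int(np.argmax(cc)) - max_neg
    return lag / sample_rate


# Example: x2 is a copy of x1 delayed by 7 samples.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(1024)
x2 = np.concatenate((np.zeros(7), x1))[:1024]
delay_seconds = estimate_delay(x1, x2, sample_rate=8000)
```

The frequency-domain weighting step is where the different processors compared in the abstract would differ; swapping in another weighting only changes the line that scales `cross`.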
Modeling and measurement of cross-talk cancellation zones for small displacements of the listener in transaural sound reproduction with different loudspeaker arrangements
  • J J Lopez
  • F Orduna
  • A Gonzalez
Available: ftp://sound.media.mit
  • B Gardner
  • K Martin
  • Kemar Hrtf