Conference Paper

Multi-Channel Listening-Room Compensation using a Decoupled Filtered-X LMS Algorithm

Abstract

Dereverberation of speech signals in a hands-free scenario by inverse filtering has been a research topic for several years. However, it remains a challenging problem because of the nature of common room impulse responses (RIRs), which are time-variant, mixed-phase systems with a large number of zeros close to, on, and even outside the unit circle in the z-domain. In this contribution, an adaptive multi-channel equalization algorithm based on a decoupled version of the modified filtered-X LMS (mFxLMS) is derived in the partitioned frequency domain. This new algorithm allows for fast convergence, computationally efficient implementation, and a low system delay under realistic conditions such as ambient noise and imperfect RIR estimates.
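To make the filtered-x idea behind the abstract more concrete, the following is a minimal single-channel, time-domain sketch of a plain filtered-x LMS equalizer. It is not the decoupled mFxLMS or the partitioned frequency-domain algorithm of the paper. The function name `fxlms_equalizer`, the toy two-tap room response, the step size, and all other parameters are assumptions chosen for illustration, and only the Python standard library is used to keep the sketch self-contained.

```python
import random

def fxlms_equalizer(h, h_hat, num_taps=8, delay=0, mu=0.01,
                    num_samples=5000, seed=0):
    """Adapt an FIR equalizer w so that (w convolved with h) approximates a pure delay.

    h      : true room impulse response (hypothetical toy values)
    h_hat  : estimate of h used to compute the filtered-x signal
    """
    rng = random.Random(seed)
    w = [0.0] * num_taps
    # All buffers hold the newest sample at index 0.
    x_buf = [0.0] * max(num_taps, len(h_hat), delay + 1)
    u_buf = [0.0] * len(h)       # equalizer output, i.e. the loudspeaker signal
    xf_buf = [0.0] * num_taps    # input filtered by the RIR estimate
    errors = []
    for _ in range(num_samples):
        x = rng.gauss(0.0, 1.0)  # white-noise excitation
        x_buf = [x] + x_buf[:-1]
        # Equalizer output, then propagation through the true room response.
        u = sum(wk * xk for wk, xk in zip(w, x_buf))
        u_buf = [u] + u_buf[:-1]
        y = sum(hk * uk for hk, uk in zip(h, u_buf))
        # Filtered-x signal: the input passed through the RIR estimate.
        xf = sum(hk * xk for hk, xk in zip(h_hat, x_buf))
        xf_buf = [xf] + xf_buf[:-1]
        # Error between the delayed input (target) and the equalized signal.
        e = x_buf[delay] - y
        # LMS update driven by the filtered-x vector.
        w = [wk + mu * e * xfk for wk, xfk in zip(w, xf_buf)]
        errors.append(e)
    return w, errors

# Toy minimum-phase "room" with a perfect estimate; the exact inverse is 1, -0.5, 0.25, ...
h = [1.0, 0.5]
w, errors = fxlms_equalizer(h, h_hat=h)
final_mse = sum(e * e for e in errors[-500:]) / 500
```

In this toy setting the room response is minimum phase, so no modelling delay is needed. Realistic RIRs are mixed phase, which is why a non-zero `delay` and much longer filters would be required in practice.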
... Therefore, in this paper we consider equalization using multiple loudspeakers to achieve acoustic transparency, which has similarities with approaches for listening room compensation, e.g., [10,11]. Specifically, we consider a custom earpiece with three microphones and two loudspeakers [7,12], depicted in Figure 1. ... (Footnote 1: This work was supported in part by the Research Unit FOR 1732 "Individualized Hearing Acoustics", the Collaborative Research Centre 1330 "Hearing Acoustics", and the Cluster of Excellence 1077 "Hearing4all", funded by the German Research Foundation (DFG).)
... In order to evaluate the performance of the proposed multi-loudspeaker equalization approach, we will consider the amplitude responses of the system transfer function for the processed open ear O_des(q) in (11) and the system transfer function for the (equalized) occluded ear O_occ(q) in (10). Furthermore, in order to assess the perceptual quality of the signal at the eardrum, we use a speech intelligibility weighted signal distortion (SD_int) measure and the perceptual evaluation of speech quality (PESQ) measure [20]. ...
Conference Paper
To improve the sound quality of hearing devices, equalization algorithms can be used that aim at achieving acoustic transparency, i.e., listening with the device in the ear is perceptually similar to the open ear. The equalization filter needs to ensure that the superposition of the processed and equalized signal played by the device and the signal leaking through the device into the ear canal matches a processed version of the signal reaching the eardrum of the open ear. Since equalization using a single loudspeaker typically does not allow for perfect equalization, in this paper we propose to use a multi-loudspeaker equalization filter to achieve acoustic transparency in a custom multi-loudspeaker hearing device. The equalization filter is computed by minimizing a regularized least-squares optimization problem. Experimental results using measured acoustic transfer functions show that the proposed multi-loudspeaker equalization filter is able to provide the desired signal at the eardrum for different gains and delays of the hearing device.
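As a hedged illustration of the regularized least-squares step mentioned in the abstract above, a generic Tikhonov-regularized formulation and its closed-form solution are sketched here. The symbols are assumptions for illustration and not the paper's notation: H stacks the acoustic transfer functions from the loudspeakers to the eardrum, g stacks the equalization filter coefficients of all loudspeakers, d is the desired response (assumed here to already account for the leakage component), and λ > 0 is the regularization weight.

```latex
\min_{\mathbf{g}} \; \left\| \mathbf{H}\mathbf{g} - \mathbf{d} \right\|_2^2
  + \lambda \left\| \mathbf{g} \right\|_2^2
\quad\Longrightarrow\quad
\mathbf{g}_{\mathrm{opt}}
  = \left( \mathbf{H}^{\mathsf{H}} \mathbf{H} + \lambda \mathbf{I} \right)^{-1}
    \mathbf{H}^{\mathsf{H}} \mathbf{d}
```

The regularization term keeps the matrix to be inverted well conditioned and limits the filter energy, at the cost of a small bias in the equalized response.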
... The topic of listening room equalization [40][41][42][43][44] is not in the scope of this work and we would like to refer to literature here [45]. However, we still want to investigate how the utilization of real microphones for identifying the RIRs at the control points affects the reproduction performance. ...
Preprint
In this paper, a recently proposed approach to multizone sound field synthesis, referred to as Joint Pressure and Velocity Matching (JPVM), is investigated analytically using a spherical harmonics representation of the sound field. The approach is motivated by the Kirchhoff-Helmholtz integral equation and aims at controlling the sound field inside the local listening zones by evoking the sound pressure and particle velocity on surrounding contours. Based on the findings of the modal analysis, an improved version of JPVM is proposed which provides both better performance and lower complexity. In particular, it is shown analytically that the optimization of the tangential component of the particle velocity vector, as is done in the original JPVM approach, is very susceptible to errors and thus not pursued anymore. The analysis furthermore provides fundamental insights as to how the spherical harmonics used to describe the 3D variant sound field translate into 2D basis functions as observed on the contours surrounding the zones. By means of simulations, it is verified that discarding the tangential component of the particle velocity vector ultimately leads to an improved performance. Finally, the impact of sensor noise on the reproduction performance is assessed.
... Some remedies to this problem can be found in the field of adaptive LRE, where real-time implementations are the ultimate goal. There, the so-called filtered-x structure [22] is often employed to determine inverse filters by exciting an adaptive filter with noise [9]. These approaches can, in principle, include only a frequency-domain weighting of the error. ...
Article
For many inverse filtering problems, finite impulse response (FIR) filters are designed according to least-squares (LS) criteria, where time-domain and frequency-domain weights are often applied to achieve optimal results for the considered application. While LS-optimal filter coefficients are given by an explicit formula, the cost of computing this solution grows with up to the third power of the number of jointly optimized filter coefficients. A joint optimization of all filter coefficients is necessary whenever a time-domain or a frequency-domain weight is introduced. This imposes limits on filter lengths and numbers of channels in many real-world scenarios. In this contribution, an algorithm is presented that yields time-domain filter coefficients optimized to meet such a weighted LS criterion, while performing the most expensive computation steps efficiently in the discrete Fourier transform (DFT) domain. As a consequence, the demands on computational power and memory are kept at a moderate level, even for large-scale problems. A rigorous mathematical derivation is provided that identifies all approximations used in the algorithm. Additionally, an effective regularization method is proposed that does not depend much on the regularization parameters. Furthermore, the proposed approach is experimentally evaluated considering a sound-zones scenario, which is one of many possible application areas. In that way, the applicability of the proposed approach is verified.
Article
In recent years, correntropy-based algorithms, which include the maximum correntropy criterion (MCC), generalized MCC (GMCC), and kernel MCC (KMCC), and hyperbolic cosine function-based algorithms, such as the hyperbolic cosine adaptive filter (HCAF), logarithmic HCAF (LHCAF), and least lncosh (Llncosh), have been widely used in adaptive filtering due to their robustness against non-Gaussian/impulsive background noise. However, the performance of such algorithms suffers from high steady-state misalignment. To reduce the steady-state misalignment while keeping a comparable computational complexity, a new robust norm based on an exponential hyperbolic cosine function (EHCF) is introduced, and a corresponding EHCF-based adaptive filter called the exponential hyperbolic cosine adaptive filter (EHCAF) is developed in this letter. Further, the computational complexity and a bound on the learning rate for stability of the proposed algorithm are also studied. A set of simulation studies has been carried out for a system identification scenario to assess the performance of the proposed algorithm. Further, the EHCAF algorithm has been extended, and the filtered-x EHCAF (Fx-EHCAF) algorithm is proposed for robust room equalization.
Chapter
The importance of personalized and adaptable user interfaces has been extensively discussed (European Ambient Assisted Living Innovation Alliance, 2009; Alexandersson et al., 2009). However, it often remains unclear how to specifically implement such concepts. In the field of acoustic communication, existing models and technologies offer a wide range of possibilities. Based on these technologies, this chapter presents a concrete realization of a model-based interface in the field of acoustic human-computer interaction. The core element of the implementation is a holistic approach towards a hearing perception model, which incorporates information about the acoustic environment, the context, and the user, and provides relevant information for the control and adjustment of adaptable and personalized acoustic user interfaces. In principle, this way of integrating state-of-the-art technologies and models into user interfaces could be applied to other sensory perceptions, e.g., vision.
Article
Room equalization has become essential for sound reproduction systems to provide the listener with the desired acoustical sensation. Recently, adaptive filters have been proposed as an effective tool at the core of these systems. In this context, this paper introduces several novel schemes based on the idea of combining adaptive filters, a versatile and flexible approach that yields adaptive schemes combining the capabilities of several independent adaptive filters. In this way, we have investigated the advantages of a scheme called combination of block-based adaptive filters, which allows a blockwise combination by splitting the adaptive filters into nonoverlapping blocks. This idea was previously applied to the plant identification problem, but has to be properly modified to obtain a suitable behavior in the equalization application. Moreover, we propose a scheme that aims to further improve the equalization performance using a priori knowledge of the energy distribution of the optimal inverse filter, where the block filters are chosen to fit the energy distribution of the coefficients. Furthermore, the biased block-based filter is also introduced as a particular case of the combination scheme, especially suited for low signal-to-noise ratios (SNRs) or sparse scenarios. Although the combined schemes can be employed with any kind of adaptive filter, we employ the filtered-x improved proportionate normalized least mean square algorithm as the basis of the proposed algorithms, which allows us to introduce a novel combination scheme based on partitioned block schemes where different blocks of the adaptive filter use different parameter settings. Several experiments are included to evaluate the proposed algorithms in terms of convergence speed and steady-state behavior for different degrees of sparseness and SNRs.
Conference Paper
The performance of sound reproduction systems for spatial audio is impaired by time-variant, reverberant listening environments. To tackle this issue, the Loudspeaker-Enclosure-Microphone System (LEMS) between the loudspeakers and reference microphones in the listening environment can be identified adaptively to allow an LEMS-specific pre-processing of the loudspeaker signals. This contribution introduces a broadband implementation of a narrowband Listening Room Compensation (LRC) method with additive compensation signals, recently proposed by Talagala et al. [1]. It extends the concept to higher-order compensation and compares LRC to Listening Room Equalization (LRE) analytically. Evaluations in an image-source environment confirm the efficacy of higher-order LRC and its suitability as a complexity-reduced alternative to LRE.
Article
In this contribution objective measures for quality assessment of speech signals are evaluated for listening-room compensation algorithms. Dereverberation of speech signals by means of equalization of the room impulse response and reverberation suppression has been an active research topic within the last years. However, no commonly accepted objective quality measures exist for assessment of the enhancement achieved by those algorithms. This paper discusses several objective quality measures and their applicability for dereverberation of speech signals focusing on algorithms for listening-room compensation.
Conference Paper
Multichannel adaptive equalization (AE) systems require high computational capacity, which constrains their practical implementation. Graphics Processing Units (GPUs) are well known for their potential for highly parallel data processing. Although GPUs seem to be suitable platforms for multichannel scenarios, an efficient use of parallel computation in the adaptive filtering context is not straightforward due to the feedback loops. This paper presents a GPU implementation of a multichannel AE system based on the filtered-x LMS algorithm running on a real-time prototype. Details of the parallelization of the algorithm are given. Experimental results are presented to validate and computationally analyze the real-time performance of the AE GPU implementation. Results show the usefulness of GPUs for developing versatile, scalable, and low-cost multichannel AE systems.
Article
Signal-processing methods such as digital equalization can in theory achieve a reduction in acoustic reverberation. In practice, however, the realization of these methods is only partially successful for a number of objective and subjective (perceptual) reasons. Two of these problems, the dependence of the equalizer performance on the source and receiver positions and the requirement for extremely lengthy filters, are addressed. It is proposed that all-pole modeling of room responses can relax the equalizer filter length requirement, and the use of vector quantization can optimally classify such responses, obtained at different source and receiver positions. Such classification can be used as a spatial equalization library, achieving reduction in reverberation over a wide range of positions within an enclosure, as was confirmed by a number of tests.
Conference Paper
Equalization of room impulse responses is an attractive approach for dereverberation of speech signals in a hands-free scenario. In this contribution we address the choice of the delay which has to be introduced in least-squares
Article
When a conversation takes place inside a room, the acoustic speech signal is distorted by wall reflections. The room's effect on this signal can be characterized by a room impulse response. If the impulse response happens to be minimum phase, it can easily be inverted. Synthetic room impulse responses were generated using a point image method to solve for wall reflections. A Nyquist plot was used to determine whether a given impulse response was minimum phase. Certain synthetic room impulse responses were found to be minimum phase when the initial delay was removed. A minimum phase inverse filter was successfully used to remove the effect of a room impulse response on a speech signal.
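The Nyquist-plot test described in the abstract above can be sketched numerically with the argument principle: the number of times the frequency response of an FIR filter encircles the origin equals the number of its zeros that do not lie inside the unit circle. The snippet is a minimal sketch under stated assumptions. The function name `count_nonminimum_phase_zeros`, the number of frequency points, and the toy impulse responses are hypothetical, and the response is assumed to have no zero exactly on the unit circle.

```python
import cmath
import math

def count_nonminimum_phase_zeros(h, num_points=4096):
    """Count zeros of H(z) = sum_k h[k] z^-k outside the unit circle.

    Uses the winding number of H(e^{j omega}) around the origin.
    A leading pure delay counts as zeros at infinity.
    Assumes H has no zero exactly on the unit circle.
    """
    def freq_response(omega):
        return sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))

    total_phase = 0.0
    prev = freq_response(0.0)
    for n in range(1, num_points + 1):
        cur = freq_response(2.0 * math.pi * n / num_points)
        # Accumulate the small phase increment between neighbouring points.
        total_phase += cmath.phase(cur / prev)
        prev = cur
    # Each zero outside the unit circle contributes one clockwise encirclement.
    return round(-total_phase / (2.0 * math.pi))

minimum_phase = count_nonminimum_phase_zeros([1.0, 0.5])            # zero at z = -0.5
mixed_phase = count_nonminimum_phase_zeros([0.5, 1.0])              # zero at z = -2
with_initial_delay = count_nonminimum_phase_zeros([0.0, 0.0, 1.0, 0.5])
```

A count of zero indicates a minimum-phase response that can be inverted by a stable causal filter. The third example illustrates why the abstract removes the initial delay first: the same response preceded by two samples of delay no longer passes the test.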
Article
Image methods are commonly used for the analysis of the acoustic properties of enclosures. In this paper we discuss the theoretical and practical use of image techniques for simulating, on a digital computer, the impulse response between two points in a small rectangular room. The resulting impulse response, when convolved with any desired input signal, such as speech, simulates room reverberation of the input signal. This technique is useful in signal processing or psychoacoustic studies. The entire process is carried out on a digital computer so that a wide range of room parameters can be studied with accurate control over the experimental conditions. A FORTRAN implementation of this model has been included.
Conference Paper
Modern hands-free telecommunication devices jointly apply several subsystems, e.g., for noise reduction (NR), acoustic echo cancellation (AEC), and listening-room compensation (LRC). In this contribution the combination of an equalizer for listening-room compensation and an acoustic echo canceller is analyzed. Inverse filtering of room impulse responses (RIRs) is a challenging task since they are, in general, mixed-phase systems having hundreds of zeros near the unit circle in the z-domain, both inside and outside it. Furthermore, a reliable estimate of the RIR to be inverted is important. Since RIRs are time-variant due to possible changes of the acoustic environment, they have to be identified adaptively. If an AEC (or any other adaptive method) is used to identify the time-variant room impulse responses, the distance between the estimate and the real RIRs may be too large for a satisfactory equalization, especially in periods of initial convergence of the AEC or after RIR changes. Therefore, we propose to estimate the convergence state of the AEC and to incorporate this knowledge into the equalizer design.
Article
A description is given of adaptive signal processing in its current state of development, with attention to the mean square error surface versus the adaptive parameters as well as the employment of adaptive algorithms as methods for the determination of the error surface minimum. Applications for adaptive techniques noted include noise cancelling and system identification, and attention is given to the case of a linear adaptive filter.
Article
A method is presented for designing an equalization filter for a sound-reproduction system by adjusting the filter coefficients to minimize the sum of the squares of the errors between the equalized responses at multiple points in the room and delayed versions of the original electrical signal. Such an equalization filter can give a more uniform frequency response over a greater volume of the enclosure than a filter designed to equalize at one point only. The results of computer simulations are presented for equalization in a 'room' with dimensions and acoustic damping typical of a car interior, using various algorithms to adapt automatically the coefficients of a digital equalization filter.
Conference Paper
Teleconferencing systems employ acoustic echo cancelers (AECs) to reduce echoes that result from coupling between the loudspeaker and microphone. To enhance the sound realism, two-channel audio is necessary. However, in this case (stereophonic sound) the acoustic echo cancellation problem is more difficult to solve because of the necessity to uniquely identify two acoustic paths. We explain these problems in detail and give an interesting solution which is much better than previously known solutions. The basic idea is to introduce a small nonlinearity into each channel that has the effect of reducing the interchannel coherence while not being noticeable for speech due to self masking.
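The abstract above does not state which nonlinearity is used. One commonly used choice for this purpose is a half-wave rectifier added to each channel with a small gain, and the following minimal sketch assumes that choice. The function name `half_wave_distort`, the gain `alpha`, and the sample values are hypothetical and chosen only for illustration.

```python
def half_wave_distort(samples, alpha=0.3, positive=True):
    """Add a scaled half-wave rectified copy of the signal to itself.

    positive=True  adds alpha * (x + |x|) / 2  (only positive samples change)
    positive=False adds alpha * (x - |x|) / 2  (only negative samples change)
    """
    out = []
    for x in samples:
        rectified = (x + abs(x)) / 2.0 if positive else (x - abs(x)) / 2.0
        out.append(x + alpha * rectified)
    return out

# Using opposite polarities on the two channels distorts them differently,
# which is what reduces the interchannel coherence.
left = half_wave_distort([1.0, -1.0, 0.5], alpha=0.5, positive=True)
right = half_wave_distort([1.0, -1.0, 0.5], alpha=0.5, positive=False)
```

The gain `alpha` trades off decorrelation against audible distortion, so a value used in practice would have to be tuned by listening tests.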