All content in this area was uploaded by Stefan Goetze on Jul 05, 2015
... The performance in terms of instrumental speech quality measures for the different considered conditions is presented in Table 2 for the simulated data and in Table 3 for the real data. Since various instrumental speech quality measures exist which can be used to assess the quality of denoised and dereverberated signals [37][38][39] and since it is difficult to assess the quality using only a single measure, the performance of the proposed system has been evaluated using the five signal-based quality measures suggested in [26], i.e., the speech-to-reverberation modulation energy ratio (SRMR) [40], the cepstral distance (CD) [41], the log-likelihood ratio (LLR) [41], the frequency-weighted segmental SNR (FWSSNR) [41], and the perceptual evaluation of speech quality (PESQ) [42]. ...
... For each condition and for each instrumental quality measure, the best performance is highlighted by means of italic typeface to allow for an easier comparison. As expected, the selected instrumental measures do not always show completely consistent results [37,38]. Nevertheless, some common tendencies can clearly be observed, which will be summarized next. ...
... Since instrumental quality assessment, especially for the task of assessing dereverberation performance, may not always correlate well with the opinion of human listeners [37], we conducted a listening experiment in addition to the instrumental quality assessment described before. ...
This paper presents a system aiming at joint dereverberation and noise reduction by applying a combination of a beamformer with a single-channel spectral enhancement scheme. First, a minimum variance distortionless response beamformer with an online estimated noise coherence matrix is used to suppress noise and reverberation. The output of this beamformer is then processed by a single-channel spectral enhancement scheme, based on statistical room acoustics, minimum statistics, and temporal cepstrum smoothing, to suppress residual noise and reverberation. The evaluation is conducted using the REVERB challenge corpus, designed to evaluate speech enhancement algorithms in the presence of both reverberation and noise. The proposed system is evaluated using instrumental speech quality measures, the performance of an automatic speech recognition system, and a subjective evaluation of the speech quality based on a MUSHRA test. The performances achieved by beamforming, single-channel spectral enhancement, and their combination are compared, and experimental results show that the proposed system is effective in suppressing both reverberation and noise while improving the speech quality. The achieved improvements are particularly significant in conditions with high reverberation times.
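The beamforming stage described in this abstract can be illustrated with the textbook MVDR weight formula w = R^{-1} d / (d^H R^{-1} d) for a single frequency bin. The following is only a minimal sketch under stated assumptions: the function name `mvdr_weights`, the diagonal loading, the steering vector and the randomly generated noise covariance are all hypothetical and stand in for the online estimated noise coherence matrix of the paper, whose estimator is not reproduced here.

```python
import numpy as np


def mvdr_weights(noise_cov, steering, diag_load=1e-6):
    """Textbook MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    num_mics = noise_cov.shape[0]
    # Diagonal loading is an assumption added here to keep R well conditioned.
    loading = diag_load * np.trace(noise_cov).real / num_mics
    loaded = noise_cov + loading * np.eye(num_mics)
    r_inv_d = np.linalg.solve(loaded, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)


rng = np.random.default_rng(0)
num_mics = 4

# Hypothetical steering vector: a pure phase ramp across the microphones.
d = np.exp(-1j * 2 * np.pi * 0.1 * np.arange(num_mics))

# Hypothetical noise covariance estimated from random complex snapshots.
snapshots = (rng.standard_normal((num_mics, 200))
             + 1j * rng.standard_normal((num_mics, 200)))
R = snapshots @ snapshots.conj().T / snapshots.shape[1]

w = mvdr_weights(R, d)

# Distortionless constraint: the response towards the target, w^H d, equals 1.
response = w.conj() @ d
print(abs(response))
```

In a complete system, such weights would be computed per frequency bin and applied to the multichannel short-time spectra before the single-channel spectral enhancement stage.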
... types of distortions that may be introduced by dereverberation algorithms, e.g. pre-, late-, and ringing echoes, distortions of the remaining speech signal or residual reverberation [3]. The subjective quality is assessed for the dimensions reverberant, colored, distorted and overall quality (cf. ...
... leads to the weighted least-squares (WLS) equalizer. Contrary to complete equalization as in (1), RIR shaping as e.g. in (5) emphasizes the suppression of late parts of the equalized IR to prevent perceptually disturbing late echoes [2,3]. In (3) and (4), the constants N0, N1 and N2 are defined as follows [7]: N0 = (t0 + 0.2) fs, N1 = (t0 + 0.004) fs and N2 = L_h + L_EQ − 1 − N1, with t0, fs, L_h and L_EQ being the time of the direct path, the sampling rate, the length of the RIR and of the equalizer, respectively. ...
... In general, various objective quality measures exist that can be applied for quality assessment of dereverberated speech signals [16,3]. The detailed description of the implementation of the objective quality measures for the dereverberation algorithms can be found in [3]. ...
This paper reports on the evaluation of several objective quality measures for predicting the quality of dereverberated speech signals. The correlations between subjective quality assessment for single-channel dereverberation techniques and objective speech quality as well as speech intelligibility measures are analyzed and discussed. Six different single-channel dereverberation algorithms were included in the evaluation to account for different types of distortions. The subjective quality was assessed along the four attributes reverberant, colored, distorted and overall quality following the recommendations of ITU-T P.835. The objective measures included system-based, i.e., channel-based, as well as signal-based measures.
... to result in the so-called weighted least-squares equalizer that emphasizes the suppression of late parts of the equalized impulse response to prevent perceptually disturbing late echoes [1,9]. In (4) and (5), the constants N0, N1 and N2 are defined as follows: N0 = (t0 + 0.2) fs, N1 = (t0 + 0.004) fs and N2 = L_h + L_EQ − 1 − N1, with t0, fs, L_h and L_EQ being the time of the direct path of the impulse response, the sampling rate, and the lengths of the RIR and of the equalization filter, respectively. ...
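As a small worked example of the constants quoted in this excerpt, the sketch below evaluates N0, N1 and N2 for one set of input values. The function name `shaping_constants`, the rounding to whole samples and the numerical inputs (a direct path at 10 ms, 16 kHz sampling, an 8000-tap RIR and a 4000-tap equalizer) are assumptions chosen purely for illustration and are not taken from the cited papers.

```python
def shaping_constants(t0, fs, len_rir, len_eq):
    """Evaluate N0, N1 and N2 as defined in the excerpt.

    t0      : time of the direct path in seconds
    fs      : sampling rate in Hz
    len_rir : length L_h of the room impulse response in samples
    len_eq  : length L_EQ of the equalizer in samples
    Rounding to the nearest sample is an assumption made here.
    """
    n0 = round((t0 + 0.2) * fs)      # N0 = (t0 + 0.2) fs
    n1 = round((t0 + 0.004) * fs)    # N1 = (t0 + 0.004) fs
    n2 = len_rir + len_eq - 1 - n1   # N2 = L_h + L_EQ - 1 - N1
    return n0, n1, n2


# Hypothetical values, for illustration only.
n0, n1, n2 = shaping_constants(t0=0.01, fs=16000, len_rir=8000, len_eq=4000)
print(n0, n1, n2)  # 3360 224 11775
```

With these inputs, N1 and N2 together account for all L_h + L_EQ − 1 samples of the equalized impulse response.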
... For α = 1, the window corresponds to the masking found in human listeners [10]. It is known that impulse response shaping (e.g. by WLS equalization) is more robust to RIR estimation errors and spatial mismatch [9] than the conventional LS approach. Therefore, the third algorithm under test is the p-norm-based RIR shaping approach as described in [4], implemented here in two variants, i.e. (i) using the window function defined in (5) with α = 1 (denoted here as p-norm standard) and (ii) using the same approach with a window function limited to −60 dB (denoted here as p-norm adapted) [8]. ...
In this contribution, six different single-channel dereverberation algorithms are evaluated subjectively in terms of speech intelligibility and speech quality. In order to study the influence of the dereverberation algorithms on speech intelligibility, speech reception thresholds in noise were measured for different reverberation times. The quality ratings were obtained following the ITU-T P.835 recommendations (with slight changes for adaptation to the problem of dereverberation) and included assessment of the attributes: reverberant, colored, distorted, and overall quality. Most of the algorithms improved speech intelligibility for short as well as long reverberation times compared to the reverberant condition. The best performance in terms of speech intelligibility and quality was observed for the regularized spectral inverse approach with pre-echo removal. The overall quality of the processed signals was highly correlated with the attributes reverberant and/or distorted. To generalize the present outcomes, further studies are needed to account for the influence of the estimation errors.
... A well-known multichannel equalization technique aiming at acoustic system inversion is the multiple-input/output inverse theorem (MINT) technique [11], which however suffers from drawbacks in practice. Since the available RIRs typically differ from the true RIRs due to, e.g., temperature or position variations [15][16][17] or due to the sensitivity of blind and supervised system identification methods to near-common zeros or background noise [13][18][19][20][21], MINT fails to invert the true RIRs. This may lead to perceptually severe distortions in the output signal [13,14]. ...
Dereverberation techniques based on acoustic multichannel equalization, such as the relaxed multichannel least squares (RMCLS) technique and the partial multichannel equalization technique based on the multiple-input/output inverse theorem (PMINT), are known to be sensitive to room impulse response perturbations. In order to increase their robustness, several methods have been proposed, e.g., using a shorter reshaping filter length, incorporating regularization, or incorporating a sparsity-promoting penalty function. This paper focuses on evaluating the performance of these methods for single-source multi-microphone scenarios, both using instrumental performance measures as well as using subjective listening tests. While commonly used instrumental performance measures indicate that the regularized RMCLS technique yields the largest reverberant energy suppression, subjective listening tests show that the regularized and sparsity-promoting PMINT techniques yield the best perceptual speech quality. By analyzing the correlation between the instrumental and the perceptual results, it is shown that signal-based performance measures are more advantageous than channel-based performance measures to evaluate the perceptual speech quality of signals dereverberated by equalization techniques. Furthermore, this analysis also demonstrates the need to develop more reliable instrumental performance measures.
... AAL-2013-6-144) funded by the European Commission (EC) and the BMBF, as well as by the DFG-Cluster of Excellence EXC 1077/1 "Hearing4All". ... and the microphones [5][6], and (b) suppression of the late reverberation, which has a major impact on ASR performance [7], by applying a non-linear operation to the received microphone signals in the spectral domain, often ignoring the phases. These dereverberation approaches do not require the complete RIR, but operate on a few parameters, e.g. the reverberation time T60 [8] and/or the direct-to-reverberation ratio (DRR) [9]. ...
This work evaluates multi-microphone beamforming and single-microphone spectral enhancement strategies to alleviate the reverberation effect for robust automatic speech recognition (ASR) systems in different reverberant environments characterized by different reverberation times T60 and direct-to-reverberation ratios (DRRs). The systems consist of minimum variance distortionless response (MVDR) beamformers in combination with minimum mean square error (MMSE) estimators, and late reverberation spectral variance (LRSV) estimators, the latter employing a generalized model of the room impulse response (RIR). Various system architectures are analyzed with a focus on optimal speech recognition performance. The system combining an MVDR beamformer and a subsequent MMSE estimator was found to lead to the best results, with relative reductions of 27.7% compared to the baseline system. This is attributed to a more accurate LRSV estimate from spatial averaging and diffuse field refinement for the MMSE estimator.
Essential theory and proven techniques for acoustic echo and noise suppression
Today's voice communication systems, with their increasing demand for user comfort, necessitate a growing focus on acoustic noise suppression and echo reduction. Drawing on the results of a twenty-year career with the Signal Theory Group at Darmstadt University of Technology, Darmstadt, Germany, Acoustic Echo and Noise Control: A Practical Approach addresses proven methods for suppressing acoustic echoes and noise in various sound systems, from hands-free telephones to video conferencing systems, hearing aids, and speech recognition systems. With emphasis on single-channel systems, the authors, both recognized experts in the field, deliver a combination of theoretical research and practical solutions to acoustical problems across a broad range of industries. In addition to presenting a detailed description of practical methods for controlling acoustic echoes and noise, they also develop a theory for optimal control parameters and present practical estimation and approximation methods. Some of the topics covered include:
Basic algorithms for filtering, linear prediction, and adaptation of filter coefficients
Application of these algorithms to acoustic echo cancellation and residual echo and noise suppression
Estimation of nonmeasurable quantities necessary to control algorithms
Devising control structures based on these quantities
Requiring only a basic knowledge of linear systems theory and digital signal processing, this text offers an ideal resource for students, researchers, and systems engineers seeking to familiarize themselves with the latest developments and practical applications in this fascinating field.
Acoustic MIMO Signal Processing
Yiteng (Arden) Huang
Jacob Benesty
Jingdong Chen
Telecommunication systems and human-machine interfaces start employing multiple microphones and loudspeakers in order to make conversations and interactions more lifelike, hence more efficient. This development gives rise to a variety of acoustic signal processing problems under multiple-input multiple-output (MIMO) scenarios, encompassing distant speech acquisition, sound source localization and tracking, echo and noise control, source separation and speech dereverberation, and many others. The last decade has witnessed a growing interest in exploring these problems, but there has been little effort to develop a theory to have all these problems investigated in a unified framework. This unique book attempts to fill the gap.
Acoustic MIMO Signal Processing is divided into two major parts - the theoretical and the practical. The authors begin by introducing an acoustic MIMO paradigm, establishing the fundamentals of the field, and linking acoustic MIMO signal processing with the concepts of classical signal processing and communication theories in terms of system identification, equalization, and adaptive algorithms. In the second part of the book, a novel and penetrating analysis of the aforementioned acoustic applications is carried out in the paradigm to reinforce the fundamental concepts of acoustic MIMO signal processing.
Acoustic MIMO Signal Processing is a timely and important professional reference for researchers and practicing engineers from universities and a wide range of industries. It is also an excellent text for graduate students who are interested in this exciting field.
Time-domain equalization is crucial in reducing channel state dimension in maximum likelihood sequence estimation and intercarrier and intersymbol interference in multicarrier systems. A time-domain equalizer (TEQ) placed in cascade with the channel produces an effective impulse response that is shorter than the channel impulse response. This paper analyzes two TEQ design methods amenable to cost-effective real-time implementation: minimum mean square error (MMSE) and maximum shortening SNR (MSSNR) methods. We reduce the complexity of computing the matrices in the MSSNR and MMSE designs by a factor of 140 and a factor of 16 (respectively) relative to existing approaches, without degrading performance. We prove that an infinite-length MSSNR TEQ with unit norm TEQ constraint is symmetric. A symmetric TEQ halves FIR implementation complexity, enables parallel training of the frequency-domain equalizer and TEQ, reduces TEQ training complexity by a factor of 4, and doubles the length of the TEQ that can be designed using fixed-point arithmetic, with only a small loss in bit rate. Simulations are presented for designs with a symmetric TEQ or target impulse response.
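The channel-shortening idea in this abstract can be sketched as follows: the TEQ w is chosen so that the effective response, the channel h convolved with w, has as little energy as possible outside a target window. The sketch below is a simplified illustration and not the reduced-complexity method of the paper: it minimizes the energy outside the window under a unit-norm TEQ constraint by taking the eigenvector belonging to the smallest eigenvalue of H_wall^T H_wall. The function names, the channel, the TEQ length, the delay and the window length are all hypothetical.

```python
import numpy as np


def convolution_matrix(h, teq_len):
    """Return H such that H @ w equals np.convolve(h, w) for len(w) == teq_len."""
    H = np.zeros((len(h) + teq_len - 1, teq_len))
    for k in range(teq_len):
        H[k:k + len(h), k] = h
    return H


def design_unit_norm_teq(h, teq_len, delay, win_len):
    """Minimize energy outside the target window subject to ||w|| = 1."""
    H = convolution_matrix(h, teq_len)
    inside = np.zeros(H.shape[0], dtype=bool)
    inside[delay:delay + win_len] = True
    H_wall = H[~inside]  # rows producing the taps outside the window
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors,
    # so column 0 minimizes w^T (H_wall^T H_wall) w over unit-norm w.
    _, eigvecs = np.linalg.eigh(H_wall.T @ H_wall)
    return eigvecs[:, 0], inside


def shortening_snr_db(h, w, inside):
    """Ratio of effective-response energy inside vs. outside the window, in dB."""
    c = np.convolve(h, w)
    return 10 * np.log10(np.sum(c[inside] ** 2) / np.sum(c[~inside] ** 2))


rng = np.random.default_rng(1)
# Hypothetical exponentially decaying channel with 32 taps.
h = rng.standard_normal(32) * np.exp(-0.15 * np.arange(32))

w, inside = design_unit_norm_teq(h, teq_len=8, delay=2, win_len=6)
print(shortening_snr_db(h, w, inside))
```

A practical design would additionally search over the delay and would typically use the generalized eigenvalue formulation, which constrains the energy inside the window instead of the TEQ norm.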
Resonances are fundamental to the production of musical pitch and timbre. A review of previous work and new experimental results describe the thresholds of audibility of resonances as a function of frequency, Q, relative amplitude, time delay, program material, listener hearing performance, loudspeaker directivity, and reverberation added during recording or reproduction. The findings are discussed in terms of the measured amplitude and time responses of the systems through which the audio signal is passed. While the emphasis is on reproduced sound, there are some interesting relationships to the perceived timbre of sound in live performances.