R.C. Hendriks

Technische Universiteit Delft, Delft, South Holland, Netherlands

Publications (20) · 48.69 Total impact

  • Article: An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech
    C.H. Taal, R.C. Hendriks, R. Heusdens, J. Jensen
    ABSTRACT: In the development process of noise-reduction algorithms, an objective machine-driven intelligibility measure which shows high correlation with speech intelligibility is of great interest. Besides reducing time and costs compared to real listening experiments, an objective intelligibility measure could also help provide answers on how to improve the intelligibility of noisy unprocessed speech. In this paper, a short-time objective intelligibility measure (STOI) is presented, which shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) of three different listening experiments. In general, STOI showed better correlation with speech intelligibility compared to five other reference objective intelligibility models. In contrast to other conventional intelligibility models which tend to rely on global statistics across entire sentences, STOI is based on shorter time segments (386 ms). Experiments indeed show that it is beneficial to take segment lengths of this order into account. In addition, a free Matlab implementation is provided.
    IEEE Transactions on Audio Speech and Language Processing 10/2011; · 1.50 Impact Factor
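The abstract's central idea, judging intelligibility from short-time segments rather than whole-sentence statistics, can be sketched as a correlation between clean and processed envelopes averaged over sliding segments. This is a simplified illustration, not the published STOI: the band decomposition, envelope extraction, and clipping/normalization steps are omitted, and the names `correlation` and `short_time_correlation_score` and the default `seg_len=30` frames are assumptions chosen for illustration.

```python
import math


def correlation(a, b):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # Guard against constant segments (zero variance).
    return num / den if den > 0 else 0.0


def short_time_correlation_score(clean_env, proc_env, seg_len=30):
    """Average correlation over bands and over all segments of seg_len frames.

    clean_env, proc_env: lists of bands, each a list of per-frame envelope values.
    Simplified sketch (assumption): no clipping or normalization of the
    processed envelope, and seg_len is a hypothetical default.
    """
    scores = []
    for x_band, y_band in zip(clean_env, proc_env):
        # Slide a window of seg_len frames over the band.
        for end in range(seg_len, len(x_band) + 1):
            x_seg = x_band[end - seg_len:end]
            y_seg = y_band[end - seg_len:end]
            scores.append(correlation(x_seg, y_seg))
    return sum(scores) / len(scores)
```

Envelopes that follow the clean ones up to an affine scaling give a score of 1; envelopes unrelated to the clean ones push it toward 0.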
  • Article: A Generalized Poisson Summation Formula and its Application to Fast Linear Convolution
    J. Martinez, R. Heusdens, R.C. Hendriks
    ABSTRACT: In this letter, a generalized Fourier transform is introduced and its corresponding generalized Poisson summation formula is derived. For discrete, Fourier-based signal processing, this formula shows that a special form of control over the periodic repetitions that occur due to sampling in the reciprocal domain is possible. This paper focuses on the derivation and analysis of a weighted circular convolution theorem. We use this specific result to compute linear convolutions in the generalized Fourier domain without the need for zero-padding. This results in faster, more resource-efficient computations. Other techniques that achieve this have been introduced in the past using different approaches. The newly proposed theory, however, constitutes a unifying framework for the previously published methods.
    IEEE Signal Processing Letters 10/2011; · 1.39 Impact Factor
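One concrete instance of computing a linear convolution without zero-padding is to combine an ordinary circular convolution with a weighted one whose weights θ^n satisfy θ^N = -1, which turns the wrap-around term into a sign flip (a negacyclic convolution). The sketch below assumes this particular weighting (a special case, not the letter's general transform), real-valued inputs of equal length N, and a naive O(N²) DFT for readability; all function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) DFT (or inverse DFT), for illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def circular_convolution(x, h):
    """Length-N circular convolution via IDFT(DFT(x) * DFT(h))."""
    X, H = dft(x), dft(h)
    return dft([a * b for a, b in zip(X, H)], inverse=True)


def linear_convolution_no_padding(x, h):
    """Linear convolution of two real length-N sequences using only length-N transforms.

    cyclic[n] = y[n] + y[n + N]  (ordinary wrap-around)
    nega[n]   = y[n] - y[n + N]  (wrap-around flipped by the theta**n weighting)
    """
    n = len(x)
    theta = cmath.exp(1j * cmath.pi / n)  # theta**n == -1
    cyclic = circular_convolution(x, h)
    xw = [x[m] * theta ** m for m in range(n)]
    hw = [h[m] * theta ** m for m in range(n)]
    # Undo the weighting on the output to obtain the negacyclic convolution.
    nega = [c * theta ** (-m) for m, c in enumerate(circular_convolution(xw, hw))]
    low = [(c + d) / 2 for c, d in zip(cyclic, nega)]   # y[0 .. N-1]
    high = [(c - d) / 2 for c, d in zip(cyclic, nega)]  # y[N .. 2N-1]; last entry is ~0
    return [v.real for v in low] + [v.real for v in high[: n - 1]]
```

A practical implementation would replace `dft` with an FFT; the point here is only that every transform has length N rather than 2N - 1.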
  • Conference Proceeding: Spectral magnitude minimum mean-square error binary masks for DFT based speech enhancement
    J. Jensen, R.C. Hendriks
    ABSTRACT: Ideal binary mask (IdBM) techniques were originally used as a tool for studying aspects of the auditory system. More recently, IdBM techniques have been adapted to the practical problem of retrieving a target speech signal from a noisy observation. In this practical setting, binary mask techniques show similarities with existing DFT-based speech enhancement techniques. In this context, we derive single-channel binary mask estimators that minimize the spectral magnitude mean-square error. We show in simulation experiments with natural speech and noise signals that the proposed estimators perform significantly better than existing binary mask estimators. However, even the best of the proposed estimators is clearly outperformed by non-binary estimators, in terms of both speech quality and intelligibility.
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on; 06/2011 · 4.63 Impact Factor
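For a binary gain g ∈ {0, 1} applied to the noisy magnitude R, the conditional magnitude error is E[(A - gR)² | Y] = E[A² | Y] - 2gR·E[A | Y] + g²R², so keeping the bin (g = 1) beats discarding it exactly when E[A | Y] > R/2. The sketch below applies that decision rule per bin. It assumes the posterior mean E[A | Y] is supplied by some conventional magnitude estimator, the function name is hypothetical, and it is not necessarily the exact estimator derived in the paper.

```python
def mmse_binary_mask(noisy_mag, posterior_mean_mag):
    """Per-bin binary gain minimizing E[(A - g*R)^2 | Y] over g in {0, 1}.

    noisy_mag: noisy spectral magnitudes R.
    posterior_mean_mag: E[A | Y] per bin, assumed to come from some
    conventional magnitude estimator (not computed here).
    """
    mask = []
    for r, a_hat in zip(noisy_mag, posterior_mean_mag):
        # g = 1 wins when R^2 - 2 R E[A|Y] < 0, i.e. when E[A|Y] > R / 2.
        mask.append(1 if a_hat > r / 2 else 0)
    return mask
```

For example, a bin with R = 1.0 is kept if the posterior mean magnitude exceeds 0.5 and discarded otherwise.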
  • Conference Proceeding: A short-time objective intelligibility measure for time-frequency weighted noisy speech
    C.H. Taal, R.C. Hendriks, R. Heusdens, J. Jensen
    ABSTRACT: Existing objective speech-intelligibility measures are suitable for several types of degradation; however, they turn out to be less appropriate for methods where noisy speech is processed by a time-frequency (TF) weighting, e.g., noise reduction and speech separation. In this paper, we present an objective intelligibility measure that shows high correlation (ρ = 0.95) with the intelligibility of both noisy and TF-weighted noisy speech. The proposed method shows significantly better performance than three other, more sophisticated, objective measures. Furthermore, it is based on an intermediate intelligibility measure for short-time (approximately 400 ms) TF regions and uses a simple DFT-based TF decomposition. In addition, a free Matlab implementation is provided.
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on; 04/2010 · 4.63 Impact Factor
  • Conference Proceeding: MMSE based noise PSD tracking with low complexity
    R.C. Hendriks, R. Heusdens, J. Jensen
    ABSTRACT: Most speech enhancement algorithms depend heavily on the noise power spectral density (PSD). Because this quantity is unknown in practice, it must be estimated from the noisy data. We present a low-complexity method for noise PSD estimation. The algorithm is based on a minimum mean-squared error estimator of the noise magnitude-squared DFT coefficients. Compared to minimum-statistics-based noise tracking, segmental SNR and PESQ are improved for non-stationary noise sources by 1 dB and 0.25 MOS points, respectively. Compared to recently published algorithms, similarly good noise tracking performance is obtained, but at a computational complexity that is roughly a factor of 40 lower.
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on; 04/2010 · 4.63 Impact Factor
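Under a complex Gaussian model for speech and noise with a priori SNR ξ, the posterior mean of the noise periodogram is E[|N|² | Y] = (1/(1+ξ))²|Y|² + (ξ/(1+ξ))σ_N², which can then be smoothed recursively over time. The sketch below shows only that core update for one DFT bin. The crude ξ estimate, the smoothing constant `alpha`, the floor `xi_min`, the omission of any bias compensation, and the function names are all assumptions, not details taken from the paper.

```python
def mmse_noise_periodogram(noisy_power, noise_psd, xi):
    """E[|N|^2 | Y] for one bin under a complex Gaussian speech/noise model.

    noisy_power: |Y|^2; noise_psd: current noise PSD estimate; xi: a priori SNR.
    """
    return (1.0 / (1.0 + xi)) ** 2 * noisy_power + (xi / (1.0 + xi)) * noise_psd


def update_noise_psd(noisy_power, noise_psd, alpha=0.8, xi_min=1e-3):
    """One recursive-smoothing update of the noise PSD for one bin.

    Assumptions: xi is a crude ML-style guess max(|Y|^2 / noise_psd - 1, xi_min),
    alpha and xi_min are hypothetical defaults, and no bias compensation is applied.
    """
    xi = max(noisy_power / noise_psd - 1.0, xi_min)
    expected = mmse_noise_periodogram(noisy_power, noise_psd, xi)
    return alpha * noise_psd + (1.0 - alpha) * expected
```

When ξ is near zero (no speech), the update pulls the estimate toward |Y|²; when ξ is large, it leans on the previous noise PSD, which is what lets the estimate keep updating while speech is present.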
  • Conference Proceeding: On linear versus non-linear magnitude-DFT estimators and the influence of super-Gaussian speech priors
    R.C. Hendriks, R. Heusdens
    ABSTRACT: Although the linear mean-squared error (MSE) complex-DFT estimator, i.e., the Wiener filter, is well known, its magnitude-DFT (MDFT) counterpart has never been considered in the context of speech enhancement. Therefore, certain theoretical questions regarding MDFT estimators have remained unanswered. For example, it is unknown to what extent the performance of existing MSE MDFT estimators depends on the chosen speech prior or on the non-linearity of the estimators. In this paper, we present linear MSE MDFT estimators for speech enhancement. In contrast to the linear complex-DFT estimator, the presented linear MSE MDFT estimators do depend on the assumed distribution of the speech DFT coefficients. Based on objective and subjective experiments, it can be concluded that the chosen speech prior, i.e., Gaussian versus super-Gaussian, has a significant effect on the performance of MDFT estimators, while linearity versus non-linearity has only a minor influence.
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on; 04/2010 · 4.63 Impact Factor
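Restricting a magnitude estimator to the form Â = gR, with A = |S| and R = |S + N|, gives the MSE-optimal gain g = E[AR]/E[R²]. Unlike the complex Wiener gain, which depends only on the speech and noise variances, this ratio involves more than second moments, so it changes with the speech prior. The Monte Carlo sketch below estimates g for a complex Gaussian prior and for a super-Gaussian one. The linear form Â = gR, the Laplacian real/imaginary model, the unit variances, and the function names are assumptions for illustration.

```python
import math
import random


def linear_magnitude_gain(speech_sampler, noise_var, n=50000, seed=0):
    """Monte Carlo estimate of g = E[A R] / E[R^2] for the estimator A_hat = g * R.

    A = |S| (clean magnitude), R = |S + N| (noisy magnitude), N complex Gaussian.
    speech_sampler(rng) returns one complex speech DFT coefficient.
    """
    rng = random.Random(seed)
    sd = math.sqrt(noise_var / 2)  # per-component noise standard deviation
    num = den = 0.0
    for _ in range(n):
        s = speech_sampler(rng)
        noise = complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))
        a, r = abs(s), abs(s + noise)
        num += a * r
        den += r * r
    return num / den


def gaussian_speech(rng, var=1.0):
    """Complex Gaussian coefficient with E[|S|^2] = var."""
    sd = math.sqrt(var / 2)
    return complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))


def laplacian_speech(rng, var=1.0):
    """Complex coefficient with independent Laplacian real and imaginary parts."""
    b = math.sqrt(var / 4)  # Laplace(0, b) has variance 2 b^2 = var / 2 per part

    def lap():
        return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / b)

    return complex(lap(), lap())
```

Calling `linear_magnitude_gain(gaussian_speech, 1.0)` and `linear_magnitude_gain(laplacian_speech, 1.0)` for the same noise variance shows how the optimal linear gain shifts with the prior; a larger `n` trades speed for accuracy.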
  • Conference Proceeding: On robustness of multi-channel minimum mean-squared error estimators under super-Gaussian priors
    R.C. Hendriks, R. Heusdens, J. Jensen
    ABSTRACT: The use of microphone arrays in speech enhancement applications offers additional features, such as directivity, over classical single-channel speech enhancement algorithms. A common strategy for multi-microphone noise reduction is to apply the multi-channel Wiener filter, which is often claimed to be mean-squared error optimal. However, this is only true if the estimator is constrained to be linear or if the speech and noise processes are assumed to be Gaussian. Based on histograms of speech DFT coefficients, it can be argued that optimal multi-channel minimum mean-squared error (MMSE) estimators should instead be derived under super-Gaussian speech priors. In this paper, we investigate the robustness of these estimators when the steering vector is affected by estimation errors. Further, we discuss the sensitivity of the estimators when the true underlying distribution of speech DFT coefficients deviates from the assumed distribution.
    Applications of Signal Processing to Audio and Acoustics, 2009. WASPAA '09. IEEE Workshop on; 11/2009
  • Article: On Optimal Multichannel Mean-Squared Error Estimators for Speech Enhancement
    R.C. Hendriks, R. Heusdens, U. Kjems, J. Jensen
    ABSTRACT: In this letter we present discrete Fourier transform (DFT) domain minimum mean-squared error (MMSE) estimators for multichannel noise reduction. The estimators are derived assuming that the clean speech magnitude DFT coefficients are generalized-Gamma distributed. We show that for Gaussian distributed noise DFT coefficients, the optimal filtering approach consists of a concatenation of a minimum variance distortionless response (MVDR) beamformer followed by well-known single-channel MMSE estimators. The multichannel Wiener filter follows as a special case of the presented MSE estimators and is in general suboptimal. For non-Gaussian distributed noise DFT coefficients the resulting spatial filter is in general nonlinear with respect to the noisy microphone signals and cannot be decomposed into an MVDR beamformer and a post-filter.
    IEEE Signal Processing Letters 11/2009; · 1.39 Impact Factor
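The decomposition in the abstract can be checked numerically in its Gaussian special case: the multichannel Wiener filter σ_S² Φ_Y^{-1} d equals the MVDR beamformer Φ_N^{-1} d / (d^H Φ_N^{-1} d) followed by a single-channel Wiener gain ξ/(1+ξ), where ξ = σ_S² d^H Φ_N^{-1} d is the SNR at the beamformer output. The two-microphone sketch below uses plain Python complex numbers; the helper names, the 2×2 Cramer's-rule solver, and the example covariance are assumptions.

```python
def solve2(m, v):
    """Solve the 2x2 complex linear system m @ x = v with Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def hdot(u, v):
    """Hermitian inner product u^H v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def mvdr_weights(noise_cov, steering):
    """MVDR beamformer w = Phi_N^{-1} d / (d^H Phi_N^{-1} d)."""
    phi_inv_d = solve2(noise_cov, steering)
    return [x / hdot(steering, phi_inv_d) for x in phi_inv_d]


def mwf_via_decomposition(noise_cov, steering, speech_var):
    """Multichannel Wiener filter as MVDR followed by a single-channel Wiener gain."""
    phi_inv_d = solve2(noise_cov, steering)
    xi = speech_var * hdot(steering, phi_inv_d).real  # SNR at the MVDR output
    gain = xi / (1.0 + xi)
    return [gain * w for w in mvdr_weights(noise_cov, steering)]
```

Comparing `mwf_via_decomposition` against a direct computation of σ_S² Φ_Y^{-1} d, with Φ_Y = σ_S² d d^H + Φ_N, gives the same weights up to rounding; per the abstract, this clean split no longer holds for non-Gaussian noise.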
  • Conference Proceeding: Fast noise PSD estimation with low complexity
    R.C. Hendriks, R. Heusdens, J. Jensen, U. Kjems
    ABSTRACT: Although noise PSD estimation is a crucial part of noise reduction algorithms, most noise PSD estimators have problems tracking non-stationary noise sources. Recently, a noise PSD estimator based on DFT-subspace decompositions was proposed, which improves estimation of the PSD of such noise sources. However, as this approach is based on eigenvalue decompositions per DFT bin, it might be too computationally demanding for low-complexity applications like hearing aids. In this paper, we present a method with noise tracking performance similar to that of the DFT-subspace approach, but with low computational cost. This method is based on the computation of high-resolution periodograms and can estimate the noise PSD when both speech and noise are present in a frequency bin. When combined with a complete noise reduction system, the proposed method can improve performance for non-stationary noise sources by more than 1 dB segmental SNR and 0.3 on a PESQ scale compared to standard noise tracking methods such as minimum statistics and the quantile-based approach, while its computational complexity is of the same order of magnitude.
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on; 05/2009 · 4.63 Impact Factor
  • Conference Proceeding: Comparison of complex-DFT estimators with and without the independence assumption of real and imaginary parts
    R.C. Hendriks, J.S. Erkelens, R. Heusdens
    ABSTRACT: MMSE estimators for DFT-domain-based single-microphone speech enhancement can broadly be classified into those that estimate the complex-DFT coefficients and those that estimate the DFT magnitudes. Existing complex-DFT MMSE estimators have generally been derived under assumptions that conflict with measured histograms and are inconsistent with the assumptions made to derive DFT magnitude estimators. Recently, it has been shown that these inconsistencies can be eliminated, i.e., no independence has to be assumed between the real and imaginary parts of DFT coefficients if the phase of the DFT coefficients is assumed to be uniformly distributed. In this paper, we discuss the assumptions that underlie the different complex-DFT estimators and show that the uniform phase assumption matches actual speech data. Furthermore, we show experimentally that the estimators without the independence assumption lead to a lower mean-square error.
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on; 05/2008 · 4.63 Impact Factor
  • Article: Noise Tracking Using DFT Domain Subspace Decompositions
    R.C. Hendriks, J. Jensen, R. Heusdens
    ABSTRACT: All discrete Fourier transform (DFT) domain-based speech enhancement gain functions rely on knowledge of the noise power spectral density (PSD). Since the noise PSD is unknown in advance, it must be estimated from the noisy speech signal. An overestimation of the noise PSD leads to a loss in speech quality, while an underestimation leads to an unnecessarily high level of residual noise. We present a novel approach for noise tracking that updates the noise PSD for each DFT coefficient in the presence of both speech and noise. This method is based on the eigenvalue decomposition of correlation matrices that are constructed from time series of noisy DFT coefficients. The presented method is well capable of tracking gradually changing noise types. In comparison to state-of-the-art noise tracking algorithms, the proposed method reduces the estimation error between the estimated and the true noise PSD. In combination with an enhancement system, the proposed method improves the segmental SNR by several decibels for gradually changing noise types. Listening experiments show that the proposed system is preferred over the state-of-the-art noise tracking algorithm.
    IEEE Transactions on Audio Speech and Language Processing 04/2008; · 1.50 Impact Factor
  • Article: On the Estimation of Complex Speech DFT Coefficients Without Assuming Independent Real and Imaginary Parts
    J.S. Erkelens, R.C. Hendriks, R. Heusdens
    ABSTRACT: This letter considers the estimation of speech signals contaminated by additive noise in the discrete Fourier transform (DFT) domain. Existing complex-DFT estimators assume independence of the real and imaginary parts of the speech DFT coefficients, although this is not in line with measurements. In this letter, we derive some general results on these estimators under more realistic assumptions. Assuming that speech and noise are independent, that speech DFT coefficients have uniform phase, and that noise DFT coefficients have a Gaussian density, we show theoretically that the spectral gain function for speech DFT estimation is real and upper-bounded by the corresponding gain function for spectral magnitude estimation. We also show that the minimum mean-square error (MMSE) estimator of the speech phase equals the noisy phase. No assumptions are made about the distribution of the speech spectral magnitudes. Recently, speech spectral amplitude estimators have been derived under a generalized-Gamma amplitude distribution. As an example, we derive the corresponding complex-DFT estimators without making the independence assumption.
    IEEE Signal Processing Letters 02/2008; · 1.39 Impact Factor
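The upper bound on the complex-DFT gain follows in one line from the triangle inequality for conditional expectations. Writing S = A e^{jΦ}, Ŝ = E[S | Y] = G_C Y with G_C real (taking the letter's result as given), and Â = E[A | Y] = G_A |Y|, a sketch of that step is:

```latex
|G_C|\,|Y| = \bigl|\mathbb{E}[A e^{j\Phi} \mid Y]\bigr|
  \le \mathbb{E}\bigl[\,|A e^{j\Phi}| \bigm| Y\bigr]
  = \mathbb{E}[A \mid Y] = G_A\,|Y|
  \quad\Longrightarrow\quad |G_C| \le G_A .
```

This step uses no assumption on the distribution of A, consistent with the abstract's statement that the magnitude distribution is left unspecified.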
  • Article: Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors
    ABSTRACT: This paper considers techniques for single-channel speech enhancement based on the discrete Fourier transform (DFT). Specifically, we derive minimum mean-square error (MMSE) estimators of speech DFT coefficient magnitudes as well as of complex-valued DFT coefficients based on two classes of generalized gamma distributions, under an additive Gaussian noise assumption. The resulting generalized DFT magnitude estimator has as a special case the existing scheme based on a Rayleigh speech prior, while the complex DFT estimators generalize existing schemes based on Gaussian, Laplacian, and Gamma speech priors. Extensive simulation experiments with speech signals degraded by various additive noise sources verify that significant improvements are possible with the more recent estimators based on super-Gaussian priors. The increase in perceptual evaluation of speech quality (PESQ) over the noisy signals is about 0.5 points for street noise and about 1 point for white noise, nearly independent of input signal-to-noise ratio (SNR). The assumptions made for deriving the complex DFT estimators are less accurate than those for the magnitude estimators, leading to a higher maximum achievable speech quality with the magnitude estimators.
    IEEE Transactions on Audio Speech and Language Processing 09/2007; · 1.50 Impact Factor
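A common parametrization of the generalized Gamma magnitude prior is p(a) = γβ^ν/Γ(ν) · a^{γν-1} exp(-βa^γ) for a > 0. The sketch below assumes this form (the paper's exact parametrization may differ) and checks two properties: the density integrates to one, and with γ = 2, ν = 1, β = 1/σ² it reduces to the Rayleigh density of a complex Gaussian coefficient's magnitude, the special case mentioned in the abstract. The function names are hypothetical.

```python
import math


def generalized_gamma_pdf(a, gamma, nu, beta):
    """p(a) = gamma * beta**nu / Gamma(nu) * a**(gamma*nu - 1) * exp(-beta * a**gamma), a > 0."""
    if a <= 0:
        return 0.0
    return (gamma * beta ** nu / math.gamma(nu)
            * a ** (gamma * nu - 1) * math.exp(-beta * a ** gamma))


def rayleigh_pdf(a, sigma2):
    """Rayleigh density of |S| for a complex Gaussian S with E[|S|^2] = sigma2."""
    return 2.0 * a / sigma2 * math.exp(-a * a / sigma2) if a > 0 else 0.0


# Check 1: gamma = 2, nu = 1, beta = 1 / sigma2 reproduces the Rayleigh density.
for a in (0.1, 0.5, 1.0, 2.0):
    assert abs(generalized_gamma_pdf(a, 2.0, 1.0, 1 / 1.5) - rayleigh_pdf(a, 1.5)) < 1e-12

# Check 2: the density integrates to ~1 (midpoint rule on [0, 10], gamma = 2, nu = 1.5).
step = 1e-3
total = sum(generalized_gamma_pdf((i + 0.5) * step, 2.0, 1.5, 1.0) for i in range(10000)) * step
assert abs(total - 1.0) < 1e-4
```

Smaller values of γν make the density more peaked near zero with a heavier tail, which is the super-Gaussian behaviour the abstract associates with the improved estimators.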
  • Article: An MMSE Estimator for Speech Enhancement Under a Combined Stochastic–Deterministic Speech Model
    R.C. Hendriks, R. Heusdens, J. Jensen
    ABSTRACT: Although many discrete Fourier transform (DFT) domain-based speech enhancement methods rely on stochastic models, such as the Gaussian and Laplace distributions, to derive clean speech estimators, certain speech sounds clearly show a more deterministic character. In this paper, we study the use of a deterministic model in combination with the well-known stochastic models for speech enhancement. We derive a minimum mean-square error (MMSE) estimator under a combined stochastic-deterministic speech model with speech presence uncertainty and show that, for different distributions of the DFT coefficients, the combined model leads to an improvement of approximately 0.8 dB segmental signal-to-noise ratio (SNR) over the use of a stochastic model alone. Evaluation with perceptual evaluation of speech quality (PESQ) shows performance improvements of approximately 0.15 on a MOS scale.
    IEEE Transactions on Audio Speech and Language Processing 03/2007; · 1.50 Impact Factor
  • Article: Adaptive Time Segmentation for Improved Speech Enhancement
    R.C. Hendriks, R. Heusdens, J. Jensen
    ABSTRACT: Single-channel enhancement algorithms are widely used to overcome the degradation of noisy speech signals. Speech enhancement gain functions are typically computed from two quantities, namely, an estimate of the noise power spectrum and an estimate of the noisy speech power spectrum. The variance of these power spectral estimates degrades the quality of the enhanced signal, and smoothing techniques are therefore often used to decrease the variance. In this paper, we present a method to determine the noisy speech power spectrum based on an adaptive time segmentation. More specifically, the proposed algorithm determines for each noisy frame which of the surrounding frames should contribute to the corresponding noisy power spectral estimate. Further, we demonstrate the potential of our adaptive segmentation in both maximum-likelihood and decision-directed speech enhancement methods by making a better estimate of the a priori signal-to-noise ratio (SNR) ξ. Objective and subjective experiments show that an adaptive time segmentation leads to significant performance improvements compared to the conventionally used fixed segmentations, particularly in transitional regions, where we observe local SNR improvements on the order of 5 dB.
    IEEE Transactions on Audio Speech and Language Processing 12/2006; · 1.50 Impact Factor
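Given a set of frames judged to belong to one segment, a maximum-likelihood style a priori SNR estimate averages |Y|²/σ_N² over those frames, subtracts one, and applies a lower floor. The sketch below takes the segment as an input rather than computing it, so the adaptive segmentation itself is not reproduced; the -25 dB floor and the function name are assumptions.

```python
def ml_a_priori_snr(noisy_power, noise_psd, segment, xi_min=10 ** (-25 / 10)):
    """ML-style a priori SNR for one DFT bin, averaged over the frames in `segment`.

    noisy_power: |Y(k, l)|^2 per frame l for one bin k.
    noise_psd: noise PSD estimate per frame (same length).
    segment: frame indices assumed to belong to one stationary segment
    (supplied by hand here; in the paper, chosen by the adaptive segmentation).
    xi_min: hypothetical floor (-25 dB) keeping the estimate positive.
    """
    ratios = [noisy_power[l] / noise_psd[l] for l in segment]
    return max(sum(ratios) / len(ratios) - 1.0, xi_min)
```

For example, with |Y|² = [4, 4, 4, 100] and unit noise PSD, the segment [0, 1, 2] gives ξ = 3, while wrongly including frame 3 inflates the estimate to 27; avoiding this kind of smearing across a transition is what the adaptive segmentation is for.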
  • Conference Proceeding: Speech Enhancement Under a Combined Stochastic-Deterministic Model
    R.C. Hendriks, R. Heusdens, J. Jensen
    ABSTRACT: Most DFT-domain-based enhancement methods rely on stochastic models to derive clean speech estimators. In this paper, we investigate the use of a deterministic speech model and present an MMSE estimator under a combined stochastic-deterministic speech model. Experimental results show an increase in segmental SNR of 1.18 dB compared to the use of a stochastic model alone. Furthermore, PESQ evaluations show an increase of 0.3 on the MOS scale. Listening tests show a preference for the proposed MMSE estimator under the combined stochastic-deterministic speech model.
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on; 06/2006 · 4.63 Impact Factor
  • Conference Proceeding: Adaptive Time Segmentation of Noisy Speech for Improved Speech Enhancement
    R.C. Hendriks, R. Heusdens, J. Jensen
    ABSTRACT: Not Available
    Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on; 02/2005
  • Conference Proceeding: Perceptual linear predictive noise modelling for sinusoid-plus-noise audio coding
    R.C. Hendriks, R. Heusdens, J. Jensen
    ABSTRACT: Sinusoidal coding of audio subject to a bit-rate constraint generally results in a noise-like residual signal. This residual signal is of high perceptual importance; reconstructing audio from the sinusoidal representation alone typically yields an artificial-sounding result. We present a new method, called perceptual linear predictive coding (PLPC), in which the residual is encoded by applying LPC in the perceptual domain. This method minimizes a perceptual modelling error and therefore represents only perceptually relevant residual components, while automatically discarding components masked by the sinusoidally coded part. Subjective listening tests show that PLPC performs significantly better than ordinary LPC as a sinusoidal residual coding technique. Furthermore, PLPC combined with a flexible segmentation and model-order allocation algorithm leads to a significant gain in rate-distortion (R/D) performance for fragments with fast-changing characteristics.
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on; 06/2004 · 4.63 Impact Factor
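PLPC applies LPC in a perceptual domain; the perceptual mapping is not described in the abstract and is not reproduced here. As a sketch of the ordinary LPC step that serves as the baseline, the Levinson-Durbin recursion below solves the normal equations from an autocorrelation sequence. The function name and the sign convention of the predictor polynomial are assumptions.

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation r[0..order].

    Returns (a, err): predictor polynomial a = [1, a1, ..., a_order], such that the
    prediction error is e[n] = x[n] + sum_k a_k x[n-k], and the final error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        # Update the lower-order coefficients using the previous stage.
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

For the autocorrelation of a first-order autoregressive process, r = [1, 0.5, 0.25], a second-order fit recovers a = [1, -0.5, 0] with error power 0.75: the second coefficient is zero because the process is fully explained by one lag.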
  • Article: Low Complexity DFT-Domain Noise PSD Tracking Using High-Resolution Periodograms
    R. Heusdens, R.C. Hendriks, J. Jensen, U. Kjems
    ABSTRACT: Although most noise reduction algorithms are critically dependent on the noise power spectral density (PSD), most procedures for noise PSD estimation fail to obtain good estimates in nonstationary noise conditions. Recently, a DFT-subspace-based method was proposed which improves noise PSD estimation under these conditions. However, this approach is based on eigenvalue decompositions per DFT bin and might be too computationally demanding for low-complexity applications like hearing aids. In this paper, we present a noise tracking method with low complexity but approximately the same noise tracking performance as the DFT-subspace approach. The presented method uses a periodogram with a resolution higher than the spectral resolution used in the noise reduction algorithm itself. This increased resolution enables estimation of the noise PSD even when speech energy is present at the time-frequency point under consideration. This holds in particular for voiced speech sounds, which can be modelled using a small number of complex exponentials.
    EURASIP Journal on Advances in Signal Processing, Volume 2009 (2009), Article ID 925870, 16 pages.
  • Article: Advances in DFT-Based Single-Microphone Speech Enhancement
    R.C. Hendriks
    ABSTRACT: Interest in the field of speech enhancement stems from the increased use of digital speech processing applications, such as mobile telephony, digital hearing aids, and human-machine communication systems, in daily life. The trend to make these applications mobile increases the variety of potential sources of quality degradation. Speech enhancement methods can be used to increase the quality of these speech processing devices and make them more robust under noisy conditions. The name "speech enhancement" refers to a large group of methods that are all meant to improve certain quality aspects of these devices. Examples of speech enhancement algorithms are echo control, bandwidth extension, packet loss concealment, and noise reduction. In this thesis, we focus on single-microphone additive noise reduction and aim at methods that work in the discrete Fourier transform (DFT) domain. The main objective of the presented research is to improve on existing single-microphone schemes for an extended range of noise types and noise levels, thereby making these methods more suitable than state-of-the-art algorithms for mobile speech communication applications. The research in this thesis is threefold. First, we focus on improved estimation of the a priori signal-to-noise ratio (SNR) from the noisy speech, addressing two aspects: we present an adaptive time-segmentation algorithm, which we use to reduce the variance of the estimated a priori SNR, and an approach to reduce the bias of the estimated a priori SNR, which is often present during transitions between speech sounds. Second, we investigate the derivation of clean speech estimators under models that take properties of speech into account. This problem is approached from two different angles.
    On the one hand, we consider the derivation of clean speech estimators under a combined stochastic/deterministic model for the complex DFT coefficients. The use of a deterministic model is motivated by the fact that certain speech sounds have a more deterministic character. On the other hand, we focus on the derivation of complex-DFT and magnitude-DFT estimators under super-Gaussian densities, a choice motivated by measured histograms of speech DFT coefficients. We present two different types of estimators under super-Gaussian densities. Minimum mean-square error (MMSE) estimators are derived under a generalized Gamma density for the clean speech DFT coefficients and DFT magnitudes. Maximum a posteriori (MAP) estimators are derived under the multivariate normal inverse Gaussian (MNIG) density for the clean speech DFT coefficients. Estimators derived under the MNIG density have some theoretical advantages over those derived under the generalized Gamma density. More specifically, under the MNIG density the statistical models in the complex-DFT and polar domains are consistent, which is not the case under the generalized Gamma density. In addition, the MNIG density can model vector processes, which allows the dependency between the real and imaginary parts of DFT coefficients to be taken into account. Third, we develop a method for tracking the noise power spectral density (PSD). This method is based on the eigenvalue decomposition of correlation matrices that are constructed from time series of noisy DFT coefficients. In contrast to existing methods, this approach makes it possible to update the noise PSD while speech is continuously present. Furthermore, the tracking delay is considerably reduced compared to state-of-the-art noise tracking algorithms.
    A combination of the individual components presented in this thesis is compared with a state-of-the-art speech enhancement system from the literature. Subjective experiments by means of a listening test show that the system based on the contributions of this thesis improves significantly over the state-of-the-art speech enhancement system.