
Speech dereverberation and source separation using DNN-WPE and LWPR-PCA

Authors:
  • ROHINI COLLEGE OF ENGINEERING AND TECHNOLOGY

Abstract

Speech signals captured by distant microphones are often corrupted by acoustic interference such as noise and reverberation, which degrades speech quality. The acquired signals must therefore be processed to separate the blind sources and remove the reverberation. We propose a novel speech separation and dereverberation method that combines Locally Weighted Projection Regression (LWPR)-based Principal Component Analysis (PCA) with Deep Neural Network (DNN)-based Weighted Prediction Error (WPE). The method preprocesses the mixed reverberant signal before applying Blind Source Separation (BSS) and Blind Dereverberation (BD). Preprocessing uses the fast Fourier transform (FFT) to convert the time-domain signal into the frequency domain and whitening to generate the transformation matrices. LWPR-PCA then performs the BSS, and DNN-WPE performs the BD. We compare the proposed method experimentally with the existing RPCA-SNMF, CBF, BA-CNMF, AFMNMF, and ISC-LPKF approaches. The results show that the proposed method effectively separates the original signals from the mixed reverberant signals.
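The preprocessing described in the abstract (FFT followed by whitening) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the frame length, hop size, Hann window, per-bin PCA whitening, and the toy two-channel instantaneous mixture are all assumptions made for the example, and the helper names to_frequency_domain and whiten are hypothetical.

```python
import numpy as np

def to_frequency_domain(x, frame_len=512, hop=256):
    """Short-time FFT of a (channels, samples) array -> (channels, frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    frames = np.stack(
        [x[:, i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=-1)

def whiten(X):
    """PCA whitening of one frequency bin; X has shape (channels, frames).

    Returns the whitened data and the whitening (transformation) matrix W.
    """
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.conj().T / X.shape[1]          # Hermitian spatial covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = np.diag(1.0 / np.sqrt(np.maximum(eigvals, 1e-12))) @ eigvecs.conj().T
    return W @ X, W

# Toy data (assumption): two synthetic sources mixed instantaneously.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 16000))
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
mixture = mixing @ sources

spec = to_frequency_domain(mixture)            # (channels, frames, bins)
Z, W = whiten(spec[:, :, 40])                  # whiten a single frequency bin
cov_z = Z @ Z.conj().T / Z.shape[1]            # close to the identity matrix
```

After whitening, the channels of the chosen bin are uncorrelated with unit variance, which is the usual starting point for a PCA- or ICA-style separation stage.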
ORIGINAL ARTICLE
Jasmine J. C. Sheeja¹ · B. Sankaragomathi²
Received: 7 April 2021 / Accepted: 22 September 2022 / Published online: 8 January 2023
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023
Keywords: Locally Weighted Projection Regression (LWPR) · Blind Source Separation (BSS) · Dereverberation · Reverberation · PCA · Speech signals
1 Introduction
Sound signals recorded with microphones are usually mixed with
unwanted components such as noise, reverberation [1], and
interference [2]. Reverberation is the distortion that arises
when the source signal reaches the destination over multiple
paths of different lengths and attenuations. A reverberation
delay of up to 50 ms is acceptable for both human perception
and Automatic Speech Recognition (ASR); with longer delays,
both are quickly degraded [1]. Noise may also be added to the
signal depending on the position of the microphones, and it
increases with the distance between the speakers and the
microphones. Interference, in turn, is the addition of unwanted
signals as the signal travels from the source to the
destination.
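The multipath picture of reverberation given above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the path delays and attenuations are made-up values, white noise stands in for speech, and the 50 ms figure from the text is used here as the boundary between early and late reflections.

```python
import numpy as np

fs = 16000                                   # sampling rate in Hz (assumption)
rng = np.random.default_rng(1)
dry = rng.standard_normal(fs)                # 1 s of noise standing in for speech

# Hypothetical propagation paths: (delay in ms, attenuation).
paths = [(0.0, 1.0), (12.0, 0.6), (35.0, 0.4), (80.0, 0.25), (150.0, 0.1)]

# Build a sparse room impulse response with one tap per path.
rir = np.zeros(int(0.2 * fs))
for delay_ms, gain in paths:
    rir[int(round(delay_ms * 1e-3 * fs))] += gain

# Split the response at 50 ms into early and late parts.
boundary = int(0.05 * fs)
early = rir.copy()
early[boundary:] = 0.0
late = rir - early

# The microphone signal is the source convolved with the impulse response.
reverberant = np.convolve(dry, rir)
early_part = np.convolve(dry, early)
late_part = np.convolve(dry, late)
```

Because convolution is linear, reverberant equals early_part + late_part; dereverberation methods generally aim to suppress the late part while leaving the direct path and early reflections intact.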
Several methods have been designed to separate the source
signal, such as independent component analysis [3],
independent vector analysis [4], spatial clustering-based
time–frequency masking, and beamforming [5]. Blind Source
Separation (BSS) reduces the detrimental components in the
observed signals and separates the source signals from the
mixture without any prior knowledge. For blind
dereverberation, many researchers have adopted the weighted
prediction error (WPE) minimization method [6].
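To make the WPE idea concrete, the following is a minimal single-channel, single-frequency-bin sketch of the conventional iterative formulation: late reverberation is predicted from delayed past frames by weighted least squares and subtracted. It is not the paper's DNN-WPE. The number of taps, the prediction delay, the smoothed power estimate, and the synthetic test signal are assumptions, and wpe_single_bin is a hypothetical helper name. In DNN-based variants the variance estimate is typically supplied by a network instead of the iterative update used here.

```python
import numpy as np

def wpe_single_bin(x, taps=5, delay=3, iterations=3, context=5, floor=1e-6):
    """Conventional WPE for one channel and one frequency bin.

    x is a complex array of shape (frames,). Returns the dereverberated
    signal d and the prediction filter g.
    """
    T = x.shape[0]
    # Row t holds the delayed past frames x[t-delay], ..., x[t-delay-taps+1].
    X_tilde = np.zeros((T, taps), dtype=complex)
    for k in range(taps):
        shift = delay + k
        X_tilde[shift:, k] = x[: T - shift]

    d = x.copy()
    kernel = np.ones(2 * context + 1) / (2 * context + 1)
    for _ in range(iterations):
        # Time-varying variance: power of the current estimate, smoothed
        # over neighbouring frames and floored for numerical stability.
        power = np.convolve(np.abs(d) ** 2, kernel, mode="same")
        lam = np.maximum(power, floor * power.max())
        # Weighted normal equations for min_g sum_t |x_t - X_tilde[t] @ g|^2 / lam_t.
        R = X_tilde.conj().T @ (X_tilde / lam[:, None])
        p = X_tilde.conj().T @ (x / lam)
        g = np.linalg.solve(R + 1e-10 * np.eye(taps), p)
        d = x - X_tilde @ g          # subtract the predicted late reverberation
    return d, g

# Synthetic bin (assumption): a source with a slowly varying envelope plus
# a recursive echo three frames later with gain 0.5.
rng = np.random.default_rng(2)
T = 4000
envelope = 0.5 + np.abs(np.sin(2 * np.pi * np.arange(T) / 400))
s = envelope * (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
x = s.copy()
for t in range(3, T):
    x[t] += 0.5 * x[t - 3]

d, g = wpe_single_bin(x)
```

The prediction delay keeps the filter from cancelling the direct signal and its short-term correlation; only frames at least delay steps in the past are used to predict the reverberant tail.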
Recently, researchers have used several methods to jointly
optimize BSS and blind dereverberation [7]. These methods
also exploit multi-input and multi-output (MIMO)
✉ Jasmine J. C. Sheeja
jasminejcsheeja@gmail.com
¹ Department of ECE, Rohini College of Engineering and Technology, Palkulam, Kanyakumari, India
² Department of Biomedical Engineering, Sri Sakthi Institute of Engineering and Technology, Coimbatore, India
Neural Computing and Applications (2023) 35:7339–7356
https://doi.org/10.1007/s00521-022-07884-0
... As seen in Table 4, WOA-BSS results show that the polarization of the sources is an essential indicator of the performance of the Whale Optimization Algorithm. Significantly, three fundamental measures, the SDR, ISR, and SAR indices, show much higher noise reduction results for the enhanced dataset than the original (Sheeja and Sankaragomathi, 2023). For example, it inferred that the sound values of sensor 4 decreased a lot while those of sensor 5 rose sharply, showing the algorithm's good productivity in data enhancement. ...
Particularly, environmental pollution, such as air pollution, is still a significant issue of concern all over the world and thus requires the identification of good models for prediction to enable management. Blind Source Separation (BSS), Copula functions, and Long Short-Term Memory (LSTM) network integrated with the Greylag Goose Optimization (GGO) algorithm have been adopted in this research work to improve air pollution forecasting. The proposed model involves preprocessed data from the urban air quality monitoring dataset containing complete environmental and pollutant data. The application of Noise Reduction and Isolation techniques involves the use of methods such as Blind Source Separation (BSS). Using copula functions affords an even better estimate of the dependence structure between the variables. Both the BSS and Copula parameters are then estimated using GGO, which notably enhances the performance of these parameters. Finally, the air pollution levels are forecasted using a time series employing LSTM networks optimized by GGO. The results reveal that GGO-LSTM optimization exhibits the lowest mean squared error (MSE) compared to other optimization methods of the proposed model. The results underscore that certain aspects, such as noise reduction, dependence modeling and optimization of parameters, provide much insight into air quality. Hence, this integrated framework enables a proper approach to monitoring the environment by offering planners and policymakers information to help in articulating efficient environment air quality management strategies.
... For this challenging speech augmentation task, Convolutional Neural Network (CNN)-based models are proposed in particular due to their parameter efficiency and state-of-the-art performance. Sheeja et al. [13] developed a novel approach to voice separation and dereverberation using Principal Component Analysis (PCA) based on Locally Weighted Projection Regression (LWPR) and Weighted Prediction Error (WPE) based on a Deep Neural Network (DNN). The technique applies Blind Source Separation (BSS) and Blind Dereverberation (BD) after preprocessing the mixed reverberant signal. ...
... To solve the problem of noise removal and dereverberation of recorded speech signals, [8] proposes a method that combines locally weighted regression and weighted prediction errors. However, this method has not been tested for the situation in which an extraneous noise source of varying nature is present while the speech signals are recorded. ...
A microphone positioned far away observes speech signals with some acoustic interference, in terms of both reverberation and noise. As a result, the quality of blind speech degrades, and blind source separation (BSS) from the obtained speech samples and blind dereverberation (BD) are among the most challenging issues. BSS and BD were examined separately in previous studies. This study proposes a novel approach for both BD and BSS. Based on the discrete Fourier transform (DFT), the time-domain signals are converted into equivalent frequency-domain signals using the fast Fourier transform. The lightweight Convolutional Neural Network (CNN)-based Quantum Teaching–Learning-Based Optimization (QTLBO) algorithm, called lightweight CNN-QTLBO, effectively removes the reverberation prior to the BSS. Next, Principal Component Discriminant Power-based Linear Discriminant Analysis (PCDP-LDA) is applied for blind source separation. In the comparative results, the proposed technique performed better in terms of direct-to-reverberation ratio (DRR), signal-to-interference ratio (SIR), and target-to-interference ratio (TIR) than other existing techniques. The proposed technique accurately separates the original signals from the mixed reverberant signals.
In order to resolve engineering problems that the performance of the traditional blind source separation (BSS) methods deteriorates or even becomes invalid when the unknown source signals are interfered by impulse noise with a low signal-to-noise ratio (SNR), a more effective and robust BSS method is proposed. Based on dual-parameter variable tailing (DPVT) transformation function, moving average filtering (MAF), and median filtering (MF), a filtering system that can achieve noise suppression in an impulse noise environment is proposed, noted as MAF-DPVT-MF. A hybrid optimization objective function is designed based on the two independence criteria to achieve more effective and robust BSS. Meanwhile, combining quantum computation theory with slime mould algorithm (SMA), quantum slime mould algorithm (QSMA) is proposed and QSMA is used to solve the hybrid optimization objective function. The proposed method is called BSS based on QSMA (QSMA-BSS). The simulation results show that QSMA-BSS is superior to the traditional methods. Compared with previous BSS methods, QSMA-BSS has a wider applications range, more stable performance, and higher precision.
This paper describes a time-varying extension of independent vector analysis (IVA) based on the normalizing flow (NF), called NF-IVA, for determined blind source separation of multichannel audio signals. As in IVA, NF-IVA estimates demixing matrices that transform mixture spectra to source spectra in the complex-valued spatial domain such that the likelihood of those matrices for the mixture spectra is maximized under some non-Gaussian source model. While IVA performs a time-invariant bijective linear transformation, NF-IVA performs a series of time-varying bijective linear transformations (flow blocks) adaptively predicted by neural networks. To regularize such transformations, we introduce a soft volume-preserving (VP) constraint. Given mixture spectra, the parameters of NF-IVA are optimized by gradient descent with backpropagation in an unsupervised manner. Experimental results show that NF-IVA successfully performs speech separation in reverberant environments with different numbers of speakers and microphones and that NF-IVA with the VP constraint outperforms NF-IVA without it, standard IVA with iterative projection, and improved IVA with gradient descent.
In space-based Automatic Identification Systems (AIS), due to high satellite orbits, several Ad Hoc cells within the observation range of the satellite are vulnerable to interference by an external signal. To increase efficiency in target detection and improve system security, a blind source separation method is adopted for processing the conflicting signals received by satellites. Compared to traditional methods, we formulate the separation problem as a clustering problem. Since our algorithm is affected by the sparseness of source signals, to get satisfactory results, our algorithm assumes that the distance between two arbitrary mixed-signal vectors is less than the doubled sum of variances of distribution of the corresponding mixtures. Signal sparsity is overcome by computing the Short-Time Fourier Transform, and the mixed source signals are separated using the improved PSO clustering. We evaluated the performance and the robustness of the proposed network architecture by several simulations. The experimental results demonstrate the effectiveness of the proposed method in not only improving satellite signal receiving ability but also in enhancing space-based AIS security.
In this paper, we propose a fast time-frequency mask technique that relies on the sparseness of source signals for blind source separation (BSS) to separate a mixture of two input sounds in a single signal automatically. Due to the sparseness of source signals, the signal can be distinguished when it is transformed into the time-frequency domain. Most previous methods did not mention the effect of different angles on accuracy. To overcome such problems, we first define two features which are normalized level-ratio and phase-difference. Next, we use our method to decrease the variance of Direction of Arrival (DOA). This can reduce the variance of features so that it can reduce the iterations of k-means. Finally, a mask is generated according to the clustered features. Our method does not require any prior information or parameter estimation. The motivation of the proposed design is to incorporate the BSS system with some smart voice appliances. In the application scenario, all the non-human voices may appear and regard as interference. We use Signal to Distortion Ratio (SDR) and Signal to Interference Ratio (SIR) to make some comparison. Based on the proposed system, then we present a hardware design. We use the TSMC 90-nm CMOS process. As a cost-effective result, it consumes about 120 K gates and executes with a frequency of 10 MHz. The power consumption is only 2.92 mW with low power design considerations.
Telecommunications systems with a Multi-Input Multi-Output (MIMO) structure using Orthogonal Frequency Division Multiplexing (OFDM) have great potential for efficient application to a high-data-rate Internet of Things (IoT) network. When the IoT network consists of underwater sensory devices, known as the Internet of Underwater Things (IoUT), the electromagnetic wave cannot play the role of baseband signal because it falls off rapidly inside the water. Thus, Acoustic OFDM is a reliable replacement for conventional OFDM inside the water. A blind structure for MIMO acoustic OFDM using Independent Component Analysis (ICA) brings further advantages in data rate and energy consumption by avoiding the required pilot and preamble data. This research work presents a blind MIMO Acoustic OFDM transceiver for IoUT based on Probabilistic Stone's Blind Source Separation (PS-BSS). The proposed technique has several times lower complexity than the ICA-based technique while maintaining comparable efficiency. In results obtained from one hundred Monte Carlo runs transmitting random data bits, the proposed PS-BSS-based technique dominates the ICA-based one over a highly sparse channel, which is the common case in an underwater environment, and as the sparseness of the channel decreases its efficiency is comparable to that of the ICA-based technique. Thus, over a highly sparse channel the proposed technique is superior in both efficiency and complexity, while over channels of lower sparseness its comparable efficiency means it can be employed as an optimum technique offering a fair trade-off between efficiency and complexity.