About
428
Publications
77,105
Reads
15,115
Citations
Introduction
Updated information (including publications and software) is available on:
https://israelcohen.com
Publications (428)
Deep learning has revolutionized speech enhancement, enabling impressive high-quality noise reduction and dereverberation. However, state-of-the-art methods often demand substantial computational resources, hindering their deployment on edge devices and in real-time applications. Computationally efficient approaches like deep filtering and Deep Fil...
We consider the problem of estimating the direction-of-arrival (DoA) of a desired source located in a known region of interest in the presence of interfering sources and multipath. We propose an approach that precedes the DoA estimation and relies on generating a set of reference steering vectors. The steering vectors' generative model is a free sp...
In this paper, we examine the use of the Single-Sideband Transform (SSBT) for convolutive beamformers. We explore its unique properties and implications for beamformer design. Our study sheds light on the tradeoffs involved in using the SSBT in beamforming applications, offering insights into both its strengths and limitations. Despite the advantag...
T1 mapping is a valuable quantitative MRI technique for diagnosing diffuse myocardial diseases. Traditional methods, relying on breath-hold sequences and echo triggering, face challenges with patient compliance and arrhythmias, limiting their effectiveness. Image registration can enable motion-robust T1 mapping, but inherent intensity differences b...
The success of large pretrained models in natural language processing (NLP) and computer vision (CV) has opened new avenues for constructing foundation models for time series forecasting (TSF). Traditional TSF foundation models rely heavily on numerical data fitting. In contrast, the human brain is inherently skilled at processing visual informatio...
In the realm of automatic speech recognition (ASR), robustness in noisy environments remains a significant challenge. Recent ASR models, such as Whisper, have shown promise, but their efficacy in noisy conditions can be further enhanced. This study is focused on recovering from packet loss to improve the word error rate (WER) of ASR models. We prop...
Real-time source localization is crucial for high-end automation and artificial intelligence (AI) products. However, a low signal-to-noise ratio (SNR) and limited processing time can reduce localization accuracy. This work proposes a new architecture for a time-domain feedback-based beamformer that meets real-time processing demands. The main objec...
This paper presents a unified model for combining beamforming and blind source separation (BSS). The validity of the model's assumptions is confirmed by recovering target speech information in noise accurately using Oracle information. Using real static human-robot interaction (HRI) data, the proposed combination of BSS with the minimum-variance di...
This work introduces a new architecture for a time-domain feedback-based beamformer designed to meet real-time processing demands. The primary goal of this design is to locate reflective sources by estimating their direction of arrival (DOA) and signal range. Incorporating a feedback mechanism in this architecture is a unique aspect that refines lo...
This paper introduces a new technique for automatic modulation classification (AMC) in Cognitive Radio (CR) networks. The method employs a straightforward classifier that utilizes high-order cumulant for training. It focuses on the statistical behavior of both analog modulation and digital schemes, which have received limited attention in previous...
In-car speech communication is particularly challenging due to environmental noise. The speaker's microphone also acquires car and road noises, resulting in a low signal-to-noise ratio and persistent frequency-howls that do not decrease, which degrade the system's output sound quality. In this paper, we address the problem of howling control for in...
A hybrid approach is proposed to efficiently design a constant elevation-beamwidth beamformer with concentric ring arrays (CRAs). The design exploits the degrees of freedom of the array geometry for superior performance. In particular, the ring radii and the beamformer coefficients are optimized simultaneously for all frequencies. We introduce a c...
We introduce a user-centric residual-echo suppression (URES) framework in double-talk. This framework receives a user operating point (UOP) that consists of two metric values: the residual echo suppression level (RESL) and the desired speech-maintained level (DSML) that the user expects from the RES outcome. Then, the URES pipeline undergoes three...
Incremental speech enhancement (ISE), with the ability to incrementally adapt to new noise domains, represents a critical yet comparatively under-investigated topic. While the regularization-based method has been proposed to solve the ISE task, it usually suffers from the dilemma wherein the gain of one domain directly entails the loss of another....
This paper presents a robust microphone array beamforming approach specifically designed for multispeaker audio conferencing, where the directions of arrival (DOAs) of the speakers can vary. First, we address the configuration of the array geometry. To achieve a consistent spatial response across all potential directions on the $x$-$y$ pl...
In this paper, we present a new method for multitask learning applied to ultrasound beamforming. Beamforming is a critical component in the ultrasound image formation pipeline. Ultrasound images are constructed using sensor readings from multiple transducer elements, with each element typically capturing multiple acquisitions per frame. Hence, the...
\(T_1\) mapping is a quantitative magnetic resonance imaging (qMRI) technique that has emerged as a valuable tool in the diagnosis of diffuse myocardial diseases. However, prevailing approaches have relied heavily on breath-hold sequences to eliminate respiratory motion artifacts. This limitation hinders accessibility and effectiveness for patients...
Determining the cyclic-alternating-pattern (CAP) phases in sleep using electroencephalography (EEG) signals is crucial for assessing sleep quality. However, most current methods for CAP classification primarily rely on classical machine learning techniques, with limited implementation of deep-learning-based tools. Furthermore, these methods often r...
This paper presents a novel method for automatic modulation classification (AMC) for cognitive radio (CR) networks based on a simple classifier that is trained with high-order cumulant. The proposed method focuses on the statistical behavior of modulated signals and includes analog modulation and digital schemes, which received less attention in th...
Weighted prediction error (WPE) is a linear prediction-based method extensively used to predict and attenuate the late reverberation component of an observed speech signal. This paper introduces an extended version of the WPE method to enhance the modeling accuracy in the time–frequency domain by incorporating crossband filters. Two approaches to e...
T1 mapping is a quantitative magnetic resonance imaging (qMRI) technique that has emerged as a valuable tool in the diagnosis of diffuse myocardial diseases. However, prevailing approaches have relied heavily on breath-hold sequences to eliminate respiratory motion artifacts. This limitation hinders accessibility and effectiveness for patients who...
This paper presents a novel approach utilizing uniform rectangular arrays to design a constant-beamwidth (CB) linearly constrained minimum variance (LCMV) beamformer, which also improves white noise gain and directivity. By employing a generalization of the convolutional Kronecker product beamforming technique, we decompose a physical array into vi...
Tuberculosis (TB) has long been recognized as a significant health concern worldwide. Recent advancements in noninvasive wearable devices and machine learning (ML) techniques have enabled rapid and cost-effective testing for the real-time detection of TB. However, small datasets are often encountered in biomedical and chemical engineering domains,...
This paper presents an echo suppression system that combines a linear acoustic echo canceller (AEC) with a deep complex convolutional recurrent network (DCCRN) for residual echo suppression. The filter taps of the AEC are adjusted in subbands by using the normalized sign-error least mean squares (NSLMS) algorithm. The NSLMS is compared with the com...
Modern depth sensors are often characterized by low spatial resolution, which hinders their use in real-world applications. However, the depth map in many scenarios is accompanied by a corresponding high-resolution color image. In light of this, learning-based methods have been extensively used for guided super-resolution of depth maps. A guided su...
Acoustic echo cancellers are integrated into various speech communication devices, such as hands-free conferencing systems and speakerphones. Microphone arrays can be employed to enhance the performance of such systems, though they assume a static environment when transitioning to double-talk, and rely on double-talk detection. This work introduces...
This paper presents a Kronecker-product (KP) beamforming approach incorporating sparse concentric circular arrays (SCCAs). The locations of the microphones on the SCCA are optimized concerning the broadband array directivity over a wide range of direction-of-arrival (DOA) deviations of a desired signal. A maximum directivity factor (MDF) sub-beamfo...
In this paper, we address the problem of howling detection in speech reinforcement system applications for utilization in howling control mechanisms. A general speech reinforcement system acquires speech from a speaker’s microphone, and delivers a reinforced speech to other listeners in the same room, or another room, through loudspeakers. The amou...
Acoustic echo in full-duplex telecommunication systems is a common problem that may cause desired-speech quality degradation during double-talk periods. This problem is especially challenging in low signal-to-echo ratio (SER) scenarios, such as hands-free conversations over mobile phones when the loudspeaker volume is high. This paper proposes a tw...
Source localization is a common problem in various fields and has applications in both military and civil sectors. Localization of acoustic sources generally requires a few microphones, but it is also possible to use a single microphone and data that was prerecorded in the same environment. Unfortunately, existing single-microphone localization met...
This paper presents a multistage rectangular approach for steerable differential beamforming. As a first step, we propose employing a two-dimensional (2-D) differentiation scheme that operates independently on the columns and rows of a uniform rectangular array (URA). This yields a differentials matrix controlled by two parameters, Pc and Pr, which...
Speech quality, as evaluated by humans, is most accurately assessed by subjective human ratings. The objective acoustic echo cancellation mean opinion score (AECMOS) metric was recently introduced and achieved high accuracy in predicting human perception during double-talk. Residual-echo suppression (RES) systems, however, employ the signal-to-dist...
We review current solutions and technical challenges for automatic speech recognition, keyword spotting, device arbitration, speech enhancement, and source localization in multidevice home environments to provide context for the INTERSPEECH 2022 special session, "Challenges and opportunities for signal processing and machine learning for multiple s...
In this paper, we address the problem of constant-beamwidth beamforming using nonuniform planar arrays. We propose two techniques for designing planar beamformers that can maintain different beamwidths in the XZ and YZ planes based on constant-beamwidth linear arrays. In the first technique, we utilize Kronecker product beamforming to find the weig...
In this paper, we address the problem of dual-microphone speech reinforcement for improving in-car speech communication via howling control. A speech reinforcement system acquires speech from a speaker’s microphone and delivers it to the other listeners in the car cabin through loudspeakers. A car cabin’s small space makes it vulnerable to acoustic...
Depth information captured by affordable depth sensors is characterized by low spatial resolution, which limits potential applications. Several methods have recently been proposed for guided super-resolution of depth maps using convolutional neural networks to overcome this limitation. In a guided super-resolution scheme, high-resolution depth maps...
Residual-echo suppression (RES) systems suppress the echo and preserve the speech from a mixture of the two. In hands-free speech communication, RES may also be addressed as a source separation (SS) or speech enhancement (SE) problem, where the echo can be manipulated as an interfering speech signal. In this study, we fine-tune three pre-trained d...
We propose a general framework for adaptation control using deep neural networks (NNs) and apply it to acoustic echo cancellation (AEC). First, the optimal step-size that controls the adaptation is derived offline by solving a constrained non-linear optimization problem that minimizes the adaptive filter misadjustment. Then, a deep NN is trained to...
Reverberation, which is caused by late reflections, impairs not only speech quality but also intelligibility. Consequently, dereverberation, a process to mitigate the impact of reverberation, has attracted significant research interest. Numerous approaches have been developed in the literature, among which the weighted-prediction-error (WPE) one ha...
Solutions for frequency-invariant beamforming with concentric circular arrays are used in many applications, including microphone arrays and audio communication. However, existing methods focus on the azimuth-beamwidth and consider uniformly spaced circular arrays. This paper presents an approach to designing a constant elevation-beamwidth beamform...
In sparse coding, we attempt to extract features of input vectors, assuming that the data is inherently structured as a sparse superposition of basic building blocks. Similarly, neural networks perform a given task by learning features of the training dataset. Recently, both data- and model-driven feature extracting methods have become extremely po...
Designing beampatterns with constant beamwidth over a wide range of frequencies is useful in many applications in speech, radar, sonar and communication. In this paper, we design constant-beamwidth beamformers for concentric ring arrays. The proposed beamformers utilize the circular geometry to provide improved beamwidth consistency compared to bea...
In this paper, we introduce an extension of the image method for generating room impulse responses in a structure with more than a single confined space, namely, the structure image method (StIM). The proposed method, StIM, can efficiently generate a large number of environmental examples for a structure impulse response, which is required by curre...
We consider the problem of carrier frequency offset estimation in OFDM underwater acoustic communication. In our previous work, we suggested transmitting equi-power and equi-spaced pilot tones which led to a simple carrier frequency offset estimator. Here, we extend this work in two directions: First, equi-power pilots may result in a large peak to...
Human subjective evaluation is optimal to assess speech quality for human perception. The recently introduced deep noise suppression mean opinion score (DNSMOS) metric was shown to estimate human ratings with great accuracy. The signal-to-distortion ratio (SDR) metric is widely used to evaluate residual-echo suppression (RES) systems by estimating...
In sparse coding, we attempt to extract features of input vectors, assuming that the data is inherently structured as a sparse superposition of basic building blocks. Similarly, neural networks perform a given task by learning features of the training data set. Recently both data-driven and model-driven feature extracting methods have become extrem...
We propose a nonlinear acoustic echo cancellation system, which aims to model the echo path from the far-end signal to the near-end microphone in two parts. Inspired by the physical behavior of modern hands-free devices, we first introduce a novel neural network architecture that is specifically designed to model the nonlinear distortions these dev...
We address voice activity detection in acoustic environments of transients and stationary noises, which often occur in real-life scenarios. We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure. This process is done through a deep encoder-decoder based neural network...
In this paper, we propose a residual echo suppression method using a UNet neural network that directly maps the outputs of a linear acoustic echo canceler to the desired signal in the spectral domain. This system embeds a design parameter that allows a tunable tradeoff between the desired-signal distortion and residual echo suppression in double-ta...
State-of-the-art deep-learning-based voice activity detectors (VADs) are often trained with anechoic data. However, real acoustic environments are generally reverberant, which causes the performance to significantly deteriorate. To mitigate this mismatch between training data and real data, we simulate an augmented training set that contains nearly...
This paper studies the problem of designing square differential microphone arrays (SDMAs). It presents a multistage approach, which first divides an SDMA composed of M^2 microphones into (M − 1)^2 subarrays with each subarray being a 2 × 2 square array formed by four adjacent microphones. Then, differential beamforming is performed with each subarr...
Microphone arrays combined with beamforming have been widely used to solve many important acoustic problems in a wide range of applications. Much effort has been devoted in the literature to microphone array beamforming, among which the Kronecker product beamforming method developed recently has demonstrated some interesting properties. Generally,...
In this paper, we introduce a robust approach for rectangular differential beamforming. We present a 2-D multistage spatial mean operator which operates independently on the columns and rows of the observation signals of a uniform rectangular array (URA). The multistage approach enables high flexibility: two design parameters, Qc and Qr, set the nu...
In this paper, we introduce an optimal quadratic Wiener beamformer for magnitude estimation of a desired signal. For simplicity, we focus on a two-microphone array and develop an iterative algorithm for magnitude estimation based on a quadratic multichannel noise reduction approach. We analyze two test cases, with uncorrelated and correlated noises...
In this paper, we propose a frequency-domain adaptive line enhancer (ALE) to reduce nonstationary harmonic noise, such as medical equipment beeps, from a noisy speech signal captured by a single microphone. The reduction of nonstationary noise is very challenging, with the tradeoff between noise reduction and speech distortion, often resulting with...
Solutions for frequency-invariant or constant-beamwidth beamforming with concentric circular arrays (CCAs) generally control only the azimuth-beampattern. This work introduces a beamforming methodology, which instead of designing a fixed beamwidth across the spectrum, maximizes the directivity-factor under the constraint that the beamwidth is great...
In this paper, we present a generalized approach for differential microphone array (DMA) beamforming in the short-time Fourier transform (STFT) domain. We propose a multistage beamforming approach which considers a Kronecker product (KP) decomposition of the global beamformer into two independent sub-beamformers. We derive differential KP beamforme...
In this chapter, we briefly explain how graphs work and then show how beamforming with linear difference equations can be studied from this perspective.
This chapter is a generalization of the two previous ones. We clearly show how to do beamforming with any order linear difference equations by first explaining the signal model. Then, performance measures are defined and useful fixed and adaptive beamformers are derived in this general context, which also includes conventional beamforming as a part...
This chapter is dedicated to the study of beamforming with second-order linear difference equations.
In this chapter, we briefly review the well-established conventional linear beamforming technique in the two-dimensional scenario. We start by explaining the signal model. Then, some very important performance measures in this context are defined. Finally, the most well-known (fixed and adaptive) conventional beamformers are derived.
As our own previous work and that of others has shown, pressure differences among microphones offer another fundamental way of looking at beamforming, especially when the sensors are close to each other. Conventional beamforming does not include this important information explicitly in its formulation. Although there are different manne...
The concept of linear difference beamforming can be applied to the general filtering technique for speech enhancement. This is the purpose of this chapter. First, we explain the signal model and show how linear difference filtering works in noise reduction. Second, we derive the most important performance measures. Finally, we develop some examples...
Reverberation impairs not only the speech quality, but also intelligibility. The weighted-prediction-error (WPE) method, which estimates the late reverberation component based on a multichannel linear predictor, is by far one of the most effective algorithms for dereverberation. Generally, the WPE prediction filter in every short-time-Fourier-trans...