
Elie Laurent Benaroya, PhD
Research Associate at the Institute for Research and Coordination in Acoustics/Music (IRCAM)
About
33 Publications · 5,402 Reads · 1,098 Citations
Publications (33)
Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate part of its content, primarily its identity, while maintaining the rest unchanged. Research in neural VC has accomplished considerable breakthroughs with the capacity to falsify a voice identity using a small amount of data with a highly realistic renderin...
This paper introduces voice reenactment as the task of voice conversion (VC) in which the expressivity of the source speaker is preserved during conversion while the identity of a target speaker is transferred. To do so, an original neural-VC architecture is proposed based on sequence-to-sequence voice conversion (S2S-VC) in which the speech proso...
This paper presents a sequence-to-sequence voice conversion (S2S-VC) algorithm which makes it possible to preserve some aspects of the source speaker during conversion, typically its prosody, which is useful in many real-life applications of voice conversion. In S2S-VC, the decoder is usually conditioned on linguistic and speaker embeddings only, with the cons...
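As a rough illustration of the conditioning scheme sketched in the abstract above, the following PyTorch snippet (an illustrative sketch with made-up dimensions and module names, not the authors' implementation) shows a decoder that receives linguistic content embeddings together with a target-speaker embedding and source prosody features, so that identity is taken from the target while prosody is carried over from the source:

# Minimal sketch of a sequence-to-sequence VC decoder conditioned on
# linguistic content, a target-speaker embedding, and source prosody.
# Dimensions and module layout are illustrative assumptions only.
import torch
import torch.nn as nn

class ConditionedDecoder(nn.Module):
    def __init__(self, content_dim=256, speaker_dim=64, prosody_dim=2,
                 hidden_dim=512, mel_dim=80):
        super().__init__()
        self.rnn = nn.GRU(content_dim + speaker_dim + prosody_dim,
                          hidden_dim, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden_dim, mel_dim)

    def forward(self, content, speaker, prosody):
        # content: (B, T, content_dim)  linguistic embeddings of the source
        # speaker: (B, speaker_dim)     embedding of the *target* speaker
        # prosody: (B, T, prosody_dim)  e.g. log-F0 and energy of the *source*
        spk = speaker.unsqueeze(1).expand(-1, content.size(1), -1)
        x = torch.cat([content, spk, prosody], dim=-1)
        h, _ = self.rnn(x)
        return self.proj(h)            # (B, T, mel_dim) converted mel frames

dec = ConditionedDecoder()
mel = dec(torch.randn(1, 200, 256), torch.randn(1, 64), torch.randn(1, 200, 2))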
This paper presents non-negative factorization of audio signals for the binaural localization of multiple sound sources within realistic and unknown sound environments. Non-negative tensor factorization (NTF) provides a sparse representation of multi-channel audio signals in time, frequency, and space that can be exploited in computational audio sc...
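As a loose illustration of the kind of factorization involved (not the paper's exact model), the numpy sketch below factorizes a nonnegative (frequency, time, channel) magnitude tensor into K components with multiplicative updates; the per-channel factors Q are the part that can be read as spatial cues for localization:

# Nonnegative CP/PARAFAC-style factorization of a (freq, time, channel)
# magnitude tensor with multiplicative updates (Euclidean cost).
# Illustrative sketch only; the paper's exact model may differ.
import numpy as np

def ntf(V, K=8, n_iter=100, eps=1e-12):
    F, T, C = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((F, K)) + eps    # spectral templates
    H = rng.random((T, K)) + eps    # temporal activations
    Q = rng.random((C, K)) + eps    # per-channel (spatial) gains
    for _ in range(n_iter):
        A = np.einsum('fk,tk,ck->ftc', W, H, Q)
        W *= (np.einsum('ftc,tk,ck->fk', V, H, Q)
              / (np.einsum('ftc,tk,ck->fk', A, H, Q) + eps))
        A = np.einsum('fk,tk,ck->ftc', W, H, Q)
        H *= (np.einsum('ftc,fk,ck->tk', V, W, Q)
              / (np.einsum('ftc,fk,ck->tk', A, W, Q) + eps))
        A = np.einsum('fk,tk,ck->ftc', W, H, Q)
        Q *= (np.einsum('ftc,fk,tk->ck', V, W, H)
              / (np.einsum('ftc,fk,tk->ck', A, W, H) + eps))
    return W, H, Q

V = np.abs(np.random.default_rng(1).standard_normal((257, 100, 2)))
W, H, Q = ntf(V)    # Q[c, k]: how strongly component k appears in channel c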
Multimodal clustering/diarization tries to answer the question “who spoke when” by using audio and visual information. Diarization consists of two steps: first, segmentation of the audio information and detection of the speech segments, and then clustering of the speech segments to group the speakers. This task has been mainly studied on audiovisua...
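For the clustering step described above, a bare-bones illustration (with made-up segment embeddings, not the method evaluated in the paper) could look like this:

# Toy version of the clustering step of diarization: group detected speech
# segments (represented here by made-up embedding vectors) into speakers.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Pretend we detected 10 speech segments and extracted a 16-dim embedding each.
segment_embeddings = np.vstack([rng.normal(0, 1, (5, 16)),
                                rng.normal(3, 1, (5, 16))])

labels = AgglomerativeClustering(n_clusters=2).fit_predict(segment_embeddings)
print(labels)   # speaker index assigned to each segment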
In this paper, we propose a way of using the Audio-Visual Description Profile (AVDP) of the MPEG-7 standard for 2D or stereo video and multichannel audio content description. Our aim is to provide means of using AVDP in such a way that 3D video and audio content can be correctly and consistently described. Since AVDP semantics do not include ways...
IRCAM internal reference: Papachristou14a
There is a rise in the number of 3D audio-visual productions and archives, which creates a need for indexing 3D content. Event detection using the audio modality is a difficult task. The standard way to do classification on 3D audio is to first down-mix to mono audio and classify on that. In this paper, we describe a generic classifier for multi-chann...
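The down-mix baseline mentioned in the abstract is simple enough to sketch; the snippet below (illustrative only, with arbitrary feature choices) averages the channels to mono and computes a small feature vector that a conventional classifier could consume:

# The "standard" baseline from the abstract: down-mix multichannel audio to
# mono, then extract features for a conventional classifier (sketch only).
import numpy as np

def downmix_and_features(x, sr=48000, frame=2048, hop=1024):
    mono = x.mean(axis=0)                      # x: (channels, samples)
    frames = np.lib.stride_tricks.sliding_window_view(mono, frame)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    centroid = (spec * freqs).sum(axis=1) / (spec.sum(axis=1) + 1e-12)
    energy = (frames ** 2).mean(axis=1)
    return np.array([centroid.mean(), centroid.std(),
                     energy.mean(), energy.std()])

feat = downmix_and_features(np.random.randn(6, 48000))   # e.g. 5.1 content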
This demo will show a prototype of a new software radio enabled broadcast media navigator implemented on an FPGA and quad-core processor, which is able to demodulate simultaneously all channels in the FM band and perform real-time classification of the musical genre. This prototype represents the elementary component of a navigator capable of sear...
Broadcast radio is a rich yet underexploited source of multimedia content. To make this content available to users, it will be indispensable to develop new types of navigators capable of searching the large quantities of information contained in the radio bands. The article introduces a prototype of a new software radio enabled broadcast media navi...
Broadcast radio is a rich but underexploited source of multimedia content. To make this available to users, it will be indispensable to develop new types of navigators capable of searching the large quantities of information contained in the radio bands. The article introduces a prototype of a new software radio enabled broadcast media navigator im...
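The per-channel FM demodulation step at the heart of such a navigator can be sketched in a few lines of numpy (a toy single-channel version operating on complex baseband samples, not the FPGA implementation described in the articles):

# Toy FM demodulation of one already-isolated channel, given complex
# baseband samples (the articles' FPGA front end handles the full band).
import numpy as np

def fm_demod(iq):
    # Instantaneous frequency from the phase difference of consecutive samples.
    return np.angle(iq[1:] * np.conj(iq[:-1]))

# Synthetic test: an FM carrier modulated by a 1 kHz tone at 250 kHz sampling.
fs, f_tone, dev = 250_000, 1_000.0, 75_000.0
t = np.arange(fs) / fs
msg = np.sin(2 * np.pi * f_tone * t)
phase = 2 * np.pi * dev * np.cumsum(msg) / fs
iq = np.exp(1j * phase)
audio = fm_demod(iq)            # proportional to the original message signal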
The development of a continuous visual speech recognizer for a silent speech interface has been investigated using a visual speech corpus of ultrasound and video images of the tongue and lips. By using high-speed visual data and tied-state cross-word triphone HMMs, and including syntactic information via domain-specific language models, word-level...
This paper presents recent developments on our "silent speech interface" that converts tongue and lip motions, captured by ultrasound and video imaging, into audible speech. In our previous studies, the mapping between the observed articulatory movements and the resulting speech sound was achieved using a unit selection approach. We investigate her...
This article presents a segmental vocoder driven by ultrasound and optical images (standard CCD camera) of the tongue and lips for a “silent speech interface” application, usable either by a laryngectomized patient or for silent communication. The system is built around an audio–visual dictionary which associates visual to acoustic observations for...
Recent improvements are presented for phonetic decoding of continuous-speech from ultrasound and optical observations of the tongue and lips in a silent speech interface application. In a new approach to this critical step, the visual streams are modeled by context-dependent multi-stream Hidden Markov Models (CD-MSHMM). Results are compared to a ba...
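A drastically simplified, context-independent stand-in for this kind of HMM-based decoding (using hmmlearn on synthetic visual features; the tied-state, multi-stream, context-dependent models of the paper are far richer) might look like this:

# Heavily simplified stand-in for HMM phonetic decoding of visual (tongue/lip)
# feature streams: one context-independent HMM per "phone", synthetic data.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

def train_phone_hmm(sequences):
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=3, covariance_type='diag', n_iter=20)
    model.fit(X, lengths)
    return model

# One HMM per "phone", trained on fake 12-dim visual feature sequences.
models = {p: train_phone_hmm([rng.normal(i, 1, (30, 12)) for _ in range(20)])
          for i, p in enumerate(['aa', 'iy', 'uw'])}

test = rng.normal(1, 1, (30, 12))       # should look most like 'iy'
best = max(models, key=lambda p: models[p].score(test))
print(best)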
This paper focuses on the problem of noise compensation in speech signals for robust speech recognition. We investigate a novel paradigm based on source separation techniques to remove music from speech, a common situation in broadcast news transcription tasks. The two methods proposed, namely adaptive Wiener filtering and adaptive shrinkage, re...
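One of the two techniques mentioned, shrinkage of time-frequency coefficients, can be illustrated with a generic soft-thresholding sketch (scipy STFT, with a crude per-bin noise estimate standing in for the paper's adaptive rule):

# Generic soft-thresholding ("shrinkage") of STFT coefficients, as a stand-in
# for the adaptive shrinkage idea; the threshold here is a plain
# noise-proportional choice, not the paper's adaptive scheme.
import numpy as np
from scipy.signal import stft, istft

def shrink_denoise(x, fs, k=2.0, nperseg=512):
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    noise_level = np.median(np.abs(X), axis=1, keepdims=True)  # crude per-bin estimate
    mag = np.abs(X)
    X_shrunk = np.maximum(mag - k * noise_level, 0.0) * np.exp(1j * np.angle(X))
    _, y = istft(X_shrunk, fs=fs, nperseg=nperseg)
    return y[:len(x)]

fs = 16000
clean = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(fs)
denoised = shrink_denoise(noisy, fs)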
In this paper, we address the problem of audio source separation with one single sensor, using a statistical model of the sources. The approach is based on a learning step from samples of each source separately, during which we train Gaussian scaled mixture models (GSMM). During the separation step, we derive maximum a posteriori (MAP) and/or poste...
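A simplified flavour of this model-based separation can be sketched with plain GMMs on log power spectra (rather than the Gaussian scaled mixture models of the paper) and a Wiener-style mask built from the best-fitting pair of states (rather than the exact MAP estimator); fit_spectral_gmm would be trained on isolated examples of each source before separating the mixture:

# Simplified model-based single-sensor separation: plain GMMs on log power
# spectra, with a Wiener-style mask from the best-fitting pair of states.
# Illustrative only; the paper uses GSMMs and MAP/posterior estimates.
import numpy as np
from scipy.signal import stft, istft
from sklearn.mixture import GaussianMixture

def fit_spectral_gmm(x, fs, K=8, nperseg=512):
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    logpow = np.log(np.abs(X).T ** 2 + 1e-10)        # (frames, bins)
    return GaussianMixture(n_components=K, covariance_type='diag',
                           random_state=0).fit(logpow)

def separate(mix, fs, gmm_a, gmm_b, nperseg=512):
    f, t, M = stft(mix, fs=fs, nperseg=nperseg)
    Pa = np.exp(gmm_a.means_)                        # (Ka, bins) state PSDs
    Pb = np.exp(gmm_b.means_)
    logmix = np.log(np.abs(M).T ** 2 + 1e-10)        # (frames, bins)
    est_a = np.zeros_like(M)
    for n in range(M.shape[1]):
        # Crude state selection: pick the state pair whose summed PSD best
        # matches the observed mixture spectrum in the log domain.
        err = ((np.log(Pa[:, None, :] + Pb[None, :, :] + 1e-10)
                - logmix[n]) ** 2).sum(axis=2)
        i, j = np.unravel_index(np.argmin(err), err.shape)
        mask = Pa[i] / (Pa[i] + Pb[j])               # Wiener-style gain
        est_a[:, n] = mask * M[:, n]
    _, a_hat = istft(est_a, fs=fs, nperseg=nperseg)
    return a_hat[:len(mix)]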
The aim of this paper is to investigate the use of multiple-window Short-Time Fourier Transform (STFT) representation for single sensor source separation. We propose to iteratively split the observed signal into target sources and residuals. Each target source is modeled as the sum of elementary components with known power spectral densities (...
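A minimal sketch of this pseudo-Wiener extraction at one time-frequency resolution, with the component PSDs simply assumed known (in the paper they come from trained models), could be:

# Pseudo-Wiener extraction of one target source at one STFT resolution;
# the remaining residual would be split further at another resolution.
# PSDs are assumed known here for illustration only.
import numpy as np
from scipy.signal import stft, istft

def extract(x, fs, source_psd, residual_psd, nperseg):
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    gain = source_psd / (source_psd + residual_psd + 1e-12)
    _, s = istft(gain[:, None] * X, fs=fs, nperseg=nperseg)
    return s[:len(x)]

fs = 16000
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs) + 0.5 * rng.standard_normal(fs)

# Long window for the tonal target; a peaked PSD around 440 Hz versus a flat one.
nbins = 2048 // 2 + 1
psd_tone = np.exp(-0.5 * ((np.arange(nbins) - 440 * 2048 / fs) / 3.0) ** 2) + 1e-3
tonal = extract(x, fs, psd_tone, np.full(nbins, 0.25), nperseg=2048)

residual = x - tonal   # would next be split with a short-window STFT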
We propose a new method to learn overcomplete dictionaries for sparse coding structured as unions of orthonormal bases. The interest of such a structure is manifold. Indeed, it seems that many signals or images can be modeled as the superimposition of several layers with sparse decompositions in as many bases. Moreover, in such dictionaries, the ef...
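The computational benefit of such a structure can be illustrated with a small sparse-coding sketch over a fixed union of two orthonormal bases (DCT plus identity), solved by block coordinate relaxation with an exact soft-thresholding step per block; the paper itself is about learning the bases, which this sketch does not do:

# Sparse coding over a union of two orthonormal bases (DCT and identity) by
# block coordinate relaxation: each block update is an exact soft-thresholding,
# which is where the orthonormal structure pays off.
import numpy as np
from scipy.fft import dct, idct

def soft(u, lam):
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def bcr_union(x, lam=0.1, n_iter=50):
    a = np.zeros_like(x)   # DCT-basis coefficients ("smooth" layer)
    b = np.zeros_like(x)   # identity-basis coefficients ("spiky" layer)
    for _ in range(n_iter):
        a = soft(dct(x - b, norm='ortho'), lam)        # exact block update
        b = soft(x - idct(a, norm='ortho'), lam)
    return a, b

n = 256
smooth = np.cos(2 * np.pi * 4 * np.arange(n) / n)
spikes = np.zeros(n)
spikes[[40, 180]] = 3.0
a, b = bcr_union(smooth + spikes)
recon = idct(a, norm='ortho') + b      # layered reconstruction of the signal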
In this paper, we address the problem of noise compensation in speech signals for robust speech recognition. Several classical denoising methods in the field of speech and signal processing are compared on speech corrupted by music, as it is often the case in broadcast news transcription tasks. We also present two new source separation techniques,...
We propose a preliminary step towards the construction of a global evaluation framework for Blind Audio Source Separation (BASS) algorithms. BASS covers many potential applications that involve a more restricted number of tasks. An algorithm may perform well on some tasks and poorly on others. Various factors affect the difficulty of each task and...
In this article, we present applications of audio source separation and propose some ideas for building common resources for the evaluation of separation algorithms. Our approach is divided into three parts: identifying the typical tasks to be solved by the algorithms, constructing criteria for m...
In this paper, we address the problem of noise compensation in speech signals for robust speech recognition. Several classical denoising methods in the field of speech and signal processing are compared on speech corrupted by music, which corresponds to a frequent situation in broadcast news transcription tasks. We also present two new source separa...
We propose a new method to perform the separation of two sound sources from a single sensor. This method generalizes Wiener filtering with locally stationary, non-Gaussian, parametric source models. The method involves a learning phase for which we propose three different algorithms. In the separation phase, we use a sparse non-negative decompositio...
In this paper, we address a few issues related to the evaluation of the performance of source separation algorithms. We propose several measures of distortion that take into account the gain indeterminacies of BSS algorithms. The total distortion includes interference from the other sources as well as noise and algorithmic artifacts, and we define...
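The gain-invariant flavour of such measures can be shown with a minimal numpy sketch: the estimate is projected onto the true source (absorbing any scaling), and the ratio of projected to residual energy gives an overall distortion score; a full framework would further split the residual into interference, noise, and artifact terms:

# Minimal gain-invariant distortion measure: project the estimate onto the
# true source (absorbing the scaling indeterminacy), then compare energies.
import numpy as np

def sdr_gain_invariant(est, ref):
    ref = ref - ref.mean()
    est = est - est.mean()
    s_target = (np.dot(est, ref) / np.dot(ref, ref)) * ref   # projection onto ref
    e = est - s_target
    return 10.0 * np.log10(np.dot(s_target, s_target) / (np.dot(e, e) + 1e-12))

rng = np.random.default_rng(0)
src = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
est = 0.5 * src + 0.05 * rng.standard_normal(16000)   # scaled + noisy estimate
print(round(sdr_gain_invariant(est, src), 1), "dB")   # unchanged by the 0.5 gain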
We propose a new method to perform the separation of two sound sources from a single sensor. This method generalizes Wiener filtering with locally stationary, non-Gaussian, parametric source models. The method involves a learning phase for which we propose three different algorithms. In the separation phase, we use a sparse non-negative decompos...
We propose a new method to perform the separation of two audio sources from a single sensor. This method generalizes Wiener filtering with Gaussian mixture distributions and with hidden Markov models. The method involves a training phase for the model parameters, which is done with the classical EM algorithm. We derive a new algorithm for the...
We study a paradigm for the separation of two sound sources with a single sensor. We experimentally compare three sparse decomposition methods for a dictionary formed as the union of two bases. We evaluate the separation quality as a function of the sparsity of each source. As a complement, we also study...
The aim of this paper is to investigate the use of a multi-resolution framework for single sensor source separation based on pseudo-Wiener filtering. We propose a scheme in which the signal is iteratively split into target sources and a residual. Each target source is modeled as the sum of elementary components with known Power Spectral Densities (P...
We report on the evaluation of a new method for audio source separation using only one sensor. The method can be viewed as a generalization of Wiener filtering to locally stationary signals, where the sources are modelled using power spectral density dictionaries which are estimated during a training step. The experiments were designed to measure h...