David Virette's research while affiliated with Orange Labs and other places

Publications (35)

Patent
Full-text available
A method for coding a multi-channel audio signal representing a sound scene comprising a plurality of sound sources is provided. This method comprises decomposing the multi-channel signal into frequency bands and, per frequency band, obtaining directivity information per sound source of the sound scene, the information being representative of the s...
Patent
Full-text available
A method of binary allocation in an enhancement coding/decoding for improving a hierarchical coding/decoding of digital audio signals, including a core coding/decoding in a first frequency band and a band extension coding/decoding in a second frequency band. For a predetermined number of bits to be allocated for the enhancement coding/decoding, a f...
Patent
Full-text available
The invention is aimed at improving the quality of the filtering by transfer functions of HRTF type of signals (L, R) compressed in a transformed domain, for binaural playing on two channels (L-BIN, R-BIN), using a combination of HRTF filters (hL,L, hL,R) including a decorrelated version (HRTF-C*, HRTF-E*) of a few of these filters. For this purpos...
Patent
Full-text available
A method for encoding and decoding a digital audio signal is provided, said method comprising the steps of: encoding a first sequence of samples of the digital signal according to a transform encoding; encoding a second sequence of samples of the digital signal according to a predictive encoding; wherein the second sequence starts before the end of...
Patent
Full-text available
A method for processing sound data is provided for the reconstruction of multi-channel audio data on the basis at least of data on a reduced number of channels and of spatialization data. A test is carried out to determine whether the spatialization data received are valid. If the test is positive, a spatialization value is predicted according to a...
Patent
Full-text available
A method is provided for coding a multi-channel audio signal representing a sound scene comprising a plurality of sound sources. The method comprises decomposing the multi-channel signal into frequency bands and the following performed per frequency band: obtaining data representative of the direction of the sound sources of the sound scene, select...
Patent
A method of hierarchical coding of a digital audio frequency input signal into several frequency sub-bands, including a core coding of the input signal according to a first throughput and at least one enhancement coding of higher throughput, of a residual signal. The core coding uses a binary allocation according to an energy criterion. The method...
Patent
Full-text available
A method for updating the processing capacity of an encoder or decoder to use a modulated transform having a size greater than a predetermined initial size is provided, particularly, where the encoders or decoders are for storing an initial prototype filter defined by an ordered set of initial size coefficients. A step is provided for constructing...
Patent
A method of bitrate switching on decoding an audio signal coded by an audio coding system, said decoding comprising a post-processing step depending on the bitrate. On switching from an initial bitrate to a final bitrate, said method includes a transition step of continuous change from a signal at the initial bitrate to a signal at the final bitrate...
Patent
Full-text available
The invention relates to transform coding/decoding of a digital audio signal represented by a succession of frames, using windows of different lengths. For the coding within the meaning of the invention, it is sought to detect (51) a particular event, such as an attack, in a current frame (Ti); and, at least if said particular event is detected at...
Patent
Full-text available
The invention concerns a method and a system for sound spatialization of a first set of not less than one of the audio channels encoded on a number of frequency subbands (SBk) and decoded in a transformed domain (Fl, C, Fr, Sr, Sl, lfe) into a second set of not less than two (Bl, Br) sound channels in the time domain, from modelling filters conv...
Patent
Full-text available
A method and associated device are provided for spatial synthesis of a sum signal to obtain at least two output signals, the sum signal as well as the spatialization parameters being output from a parametric coding by matrixing of an original multi-channel signal. The method comprises: decorrelation of the sum signal to obtain a decorrelated signal...
Conference Paper
In this work, we study the usefulness of several types of sparsity penalties in the task of speech separation using supervised and semi-supervised Nonnegative Matrix Factorization (NMF). We compare different criteria from the literature to two novel penalty functions based on Wiener Entropy, in a large-scale evaluation on spontaneous speech overlai...
Conference Paper
This paper presents a novel low bit rate parametric stereo coding scheme which uses whole band inter-channel time difference (WITD) and whole band inter-channel phase difference (WIPD) together with a new effective downmixing method. The inter-channel level differences and inter-channel phase differences are also employed in the proposed stereo cod...
Conference Paper
We introduce a method for 2-D spatial multizone soundfield reproduction based on describing the desired multizone soundfield as an orthogonal expansion of basis functions over the desired reproduction region. This approach finds the solution to the Helmholtz equation that is closest to the desired soundfield in a weighted least squares sense. The b...
Conference Paper
We present a novel method to integrate noise estimates by unsupervised speech enhancement algorithms into a semi-supervised non-negative matrix factorization framework. A multiplicative update algorithm is derived to estimate a non-negative noise dictionary given a time-varying background noise estimate with a stationarity constraint. A large-scale...
Conference Paper
This paper presents the two new ITU-T Recommendations G.722 Annex D and G.711.1 Annex F, which are stereo extensions of the wideband codecs ITU-T G.722 and G.711.1 and their superwideband extensions (G.722 Annex B and G.711.1 Annex D). An embedded scalable structure is used to add stereo extension layers on top of the wideband or superwideband core...
Patent
Full-text available
The invention proposes the synthesis of a signal consisting of consecutive blocks. It proposes more particularly, on receipt of such a signal, to replace, by synthesis, lost or erroneous blocks of this signal. To this end, it proposes an attenuation of the overvoicing during the generation of a signal synthesis. More particularly, a voiced excitati...
Patent
A system for coding a hierarchical audio signal, comprising, at least, a core layer using parametric coding by analysis by synthesis in a first frequency band, a band extension layer for widening said first frequency band into a second frequency band, or wideband. The system also comprises a wideband audio coding quality enhancement layer based on...
Patent
Full-text available
A system and a method for coding by principal component analysis (PCA) of a multi-channel audio signal comprising the following steps: decomposing at least two channels (L, R) of said audio signal into a plurality of frequency sub-bands (l(b1), . . . , l(bN), r(b1), . . . , r(bN)), calculating at least one transformation parameter (θ(b1), . . . , θ...
Patent
Full-text available
A system and a method for the scalable coding of a multi-channel audio signal comprising a principal component analysis (PCA) transformation of at least two channels (L, R) of the audio signal into a principal component (CP) and at least one residual sub-component (r) by rotation defined by a transformation parameter (θ), comprising the following s...
Article
In recent years there has been a phenomenal increase in the number of products and applications which make use of audio coding formats. Among the most successful audio coding schemes, the MPEG-1 Layer III (mp3), the MPEG-2 Advanced Audio Coding (AAC) or its evolution MPEG-4 High Efficiency-Advanced Audio Coding (HE-AAC) can be cited. More recently, p...
Conference Paper
Full-text available
In this paper, we present an on-line semi-supervised algorithm for real-time separation of speech and background noise. The proposed system is based on Nonnegative Matrix Factorization (NMF), where fixed speech bases are learned from training data whereas the noise components are estimated in real-time on the recent past. Experiments with spontaneo...
Conference Paper
In this paper, we present novel low complexity coherence estimation and synthesis algorithms and their application to parametric stereo coding. Inter-channel correlation/coherence (IC) is an important parameter for parametric stereo coding as it represents the degree of similarity of the channels and is strongly related to the perception of width...
Conference Paper
High quality audio communication is a current challenge addressed by the standardisation committees. In this context, ITU and MPEG recently issued standards for high quality coding of both speech and music contents. Transform coding is used and allows quality commensurate with bit rates regardless of the audio content. Up to now, only constant tran...
Conference Paper
Full-text available
Cosine-modulated transforms such as the Modified Discrete Cosine Transform (MDCT) are key elements in audio coding. They allow efficient energy compaction and perceptual irrelevancy reduction. The frequency localization can be adapted to the signal characteristics and fast implementations exist. All of this has made MDCT the most popular transform...
Conference Paper
Full-text available
This paper describes the scalable coder - G.729.1 - which has been recently standardized by ITU-T for wideband telephony and voice over IP (VoIP) applications. G.729.1 can operate at 12 different bit rates from 32 down to 8 kbit/s with wideband quality starting at 14 kbit/s. This coder is a bitstream interoperable extension of ITU-T G.729 based on...
Conference Paper
Full-text available
The MPEG Surround standard includes two "native" binaural processing modules for reproducing 3D audio content over headphones. In this paper, we present a novel and efficient Binaural Room Impulse Response (BRIR) modeling algorithm extending their possibilities. It is based on a parametric decomposition of the BRIR and is integrated within the subb...
Conference Paper
Full-text available
In this paper, we present a scalable audio coder based on the MPEG-4 SSC (SinuSoidal Coder) parametric coder. This new coder combines two coding strategies, the first being sinusoidal audio coding (MPEG-4 SSC) and the second being ACELP (Algebraic Code-Excited Linear Prediction) coding, usually used for...
Article
Full-text available
Low bit rate parametric coding of multichannel audio is mainly based on Binaural Cue Coding (BCC). Another multichannel audio processing method called upmix can also be used to deliver multichannel audio, typically 5.1 signals, at low data rates. More precisely, we focus on existing upmix method based on Principal Component Analysis (PCA). This PCA...
Conference Paper
This paper describes the CELP coding module within the Adaptive Rate-Distortion Optimized sound codeR (ARDOR). The ARDOR codec combines coding techniques of different nature using a rate-distortion control mechanism, and is able to adapt to a large range of signal characteristics and system constraints. The implemented CELP codec is derived from th...
Conference Paper
This paper describes an 8-32 kbit/s scalable speech and audio coder submitted as a candidate for the ITU-T G729-based embedded variable bitrate (G729EV) standardization. The coder is built upon a 3-stage coding structure consisting of: narrowband cascade CELP coding at 8 and 12 kbit/s, bandwidth extension based on wideband linear-predictive coding (...
Article
Full-text available
Low-bit-rate parametric audio coding for multichannel audio is mainly based on Binaural Cue Coding (BCC). In this paper we show that the Unified Domain Representation (UDR) of multichannel audio, recently introduced, is equivalent to BCC scheme. We also discuss another method, called multichannel audio upmix, which classically converts existing two...

Citations

... Peak detection. Peak detection is proposed in [28] to encode a single acoustic response with the goal of accelerating run-time convolution of input signals during playback of MPEG streams. Neither spatial interpolation nor compactness is a concern. ...
... In contrast to source-driven SCSS methods described in previous Section, model-based SCSS methods have been of high popularity due to their top separation performance. They rely on prior knowledge in the form of pre-trained dictionaries of the spectral amplitude information, using some statistical modeling techniques including factorial hidden Markov models [14,15], vector quantizers [16,17], graphical models [2], Gaussian mixture models [18], non-negative matrix factorization (NMF) [19,20,21,22], and deep models [23,24]. As our second proof-of-concept in this work, we select active-set Newton algorithm for overcomplete NMF recently proposed in [19], as it outperforms other conventional source separation techniques. ...
... • The phase-align principle, where the signals are temporally or phase aligned prior to the mixing process. This has been proposed, for instance, to improve parametric stereo coders [6,7,8]. A continuous and robust phase-alignment is not an easy task and any misalignment will immediately result in comb-filter artifacts. ...
... A well-known technique named multi-zone reproduction has been studied over the years [99][100][101][102][103][104]. The original idea of multi-zone reproduction is to present multiple individual sound fields at the target areas. ...
... Other works have considered log-regularization (i.e., penalizing the sum of the logarithms of the entries of the factor) which leads to more "aggressive" sparsity, e.g., [22]-[24]. Sparse regularization with information measures is also considered in [25], [26]. Another approach to induce sparsity consists in applying hard constraints to the factors (rather than mere penalization), using ℓ0 constraints [27], [28] or using the sparseness measure introduced in [13]. ...
... The recommendation annex B (G.729B) [6] is designed with an idea of reducing average bit rate by using silence compression techniques for DSVD and other bit rate sensitive applications. The ITU-T G.729 based Embedded Variable bit rate (G.729EV) [8] provides a scalable coding scheme with narrowband to wideband audio quality in the range of 8-32 Kbps. There has been a significant work carried out about various schemes to reduce the bit rate and improving the speech quality and not much work has been reported regarding the techniques used for secured speech data transmission. ...
... The aliasing cancelation process of the proposed algorithm is conceptually similar to that of the block switching compensation scheme proposed for low delay advanced audio coding (AAC-LD [6,7]). In the literature, the scheme introduced time domain weightings applicable as a post processing in the decoder in order to remove a look-ahead delay inevitable for a window transition from the long window to the short window. ...
... As a result this hybrid configuration performs well for both speech and music signals. For more detailed results see [26]. Another hybrid example is the combination of sinusoidal and transform coding. ...
... In semi-supervised approach for real-time speech and background noise separation researcher used NMF, in which fixed speech bases are learnt from training data and noise components are calculated in real time using current data. Experiments with spontaneous conversational speech and real-world nonstationary noise reveal that proposed system outperforms a supervised NMF algorithm that learns noise components from the same noise environment as the test sample [32]. ...
... Primary ambient extraction techniques can be used for the task on decomposing a sound scene into separate objects. Principal component analysis is a popular technique for this task, whose performance depends on the correlation between the different components [134]. The decomposition task can be also performed by utilising blind source separation (BSS) separation techniques, assuming that the objects in the multichannel recording are mutually uncorrelated or statistically uncorrelated [135]. ...