
Philippe Depalle - McGill University
Publications: 124
Reads: 21,152
Citations: 2,069
Publications (124)
Two experiments were conducted for the derivation of psychophysical scales of the following audio descriptors: spectral centroid, spectral spread, spectral skewness, odd-to-even harmonic ratio, spectral deviation, and spectral slope. The stimulus sets of each audio descriptor were synthesized and (wherever possible) independently controlled through...
Temporal audio features play an important role in timbre perception and sound identification. An experiment was conducted to test whether listeners are able to rank order synthesized stimuli over a wide range of feature values restricted within the range of instrument sounds. The following audio descriptors were tested: attack and decay time, tempo...
Estimating mixtures of damped chirp sinusoids in noise is a problem that affects audio analysis, coding, and synthesis applications. Phase-based non-stationary parameter estimators assume that sinusoids can be resolved in the Fourier transform domain, whereas high-resolution methods estimate superimposed components with accuracy close to the theore...
Supervised source separation requires expensive synthetic datasets containing clean, ground truth-source signals, while unsupervised separation requires only data mixtures. Existing unsupervised methods still use supervision to avoid over-separation and compete with fully supervised methods. We present a new method of completely unsupervised single...
Source code that performs variational inference and learning of a Bayesian linear dynamical system.
The "demo" script runs a variational Bayesian EM algorithm to infer and learn the parameters of the Bayesian hierarchical model. The expectation (E) step is completed by our fully Bayesian Kalman filter and smoother, which incorporates uncertainty...
A psychophysical experiment was conducted to perceptually validate several spectral audio features through ordinal scaling: spectral centroid, spectral spread, spectral skewness, odd-to-even harmonic ratio, spectral slope, and harmonic spectral deviation. Several sets of stimuli per audio feature were synthesized at different fundamental frequencie...
State space models have been extensively applied to model and control dynamical systems in disciplines including neuroscience, target tracking, and audio processing. A common modeling assumption is that both the state and data noise are Gaussian because it simplifies the estimation of the system's state and model parameters. However, in many real-w...
Orchestral blend happens when sounds coming from two or more instruments are perceived as a single sonic stream. Several studies have suggested that different musical properties contribute to create such an effect. We developed models to identify orchestral blend effects from symbolic information taken from scores based on calculations related to t...
Humans excel at using sounds to make judgements about their immediate environment. In particular, timbre is an auditory attribute that conveys crucial information about the identity of a sound source, especially for music. While timbre has been primarily considered to occupy a multidimensional space, unravelling the acoustic correlates of timbre re...
Matlab code that implements a state space filter for univariate Laplace-distributed data sequences, as detailed in the following paper.
This Bayesian filter uses exact inference to infer the latent state sequence from temporal data. It successfully filters outliers and heavy-tailed noise, in addition to Laplace noise, Gaussian noise, and Cauchy n...
Auditory Scene Analysis (ASA) research has provided knowledge about the principles underlying auditory organization processes and has been successfully applied in computational ASA. Based on these findings, we applied these principles within a musical context. The aim is to understand and computationally model the percepti...
The integration of vibrotactile feedback in digital music instruments (DMIs) is thought to improve the instrument’s response and make it more suitable for expert musical interactions. However, given the extreme requirements of musical performances, there is a need for solutions allowing for independent control of frequency and amplitude over a wide...
Variational inference of a Bayesian linear dynamical system is a powerful method for estimating latent variable sequences and learning sparse dynamic models in domains ranging from neuroscience to audio processing. The hardest part of the method is inferring the model's latent variable sequence. Here, we propose a solution using matrix inversion le...
We present a Bayesian filter for state space models with Laplace-distributed observation noise that is robust to heavy-tailed and outlier-ridden univariate time-series data. We analytically derive a closed-form expression of the exact posterior for a Laplace likelihood conditioned on a Gaussian prior. Posterior statistics are propagated forward in...
This paper proposes a new partial tracking method, based on linear programming, that can run in real time, is simple to implement, and performs well in difficult tracking situations by considering spurious peaks, crossing partials, and a non-stationary short-term sinusoidal model. Complex constant parameters of a generalized short-term signal mode...
Musical instrument timbre has been intensively investigated through dissimilarity rating tasks. It is now well known that audio descriptors such as attack time and spectral centroid, among others, account well for the dimensions of the timbre spaces underlying these dissimilarity ratings. Nevertheless, it remains very difficult to reproduce these p...
In this chapter we consider control structures and mapping in the process of deciding upon the underlying sonic algorithm for a digital musical instrument. We focus on control of timbral and textural phenomena that arise from the interaction and modulation of stationary spectral components, as well as from stochastic elements of sound. Given this o...
In this paper, we introduce a function designed specifically for sparse audio representations. A progression in the selection of dictionary elements (atoms) to sparsely represent audio has occurred: starting with symmetric atoms, then to damped sinusoid and hybrid atoms, and finally to the re-appropriation of the gammatone (GT) and formant-wave-fun...
The ability of a listener to recognize sound sources, and in particular musical instruments from the sounds they produce, raises the question of determining the acoustical information used to achieve such a task. It is now well known that the shapes of the temporal and spectral envelopes are crucial to the recognition of a musical instrument. More...
Modulation Power Spectra include dimensions of spectral and temporal modulation that contribute significantly to the perception of musical instrument timbres. Nevertheless, it remains unknown whether each instrument's identity is characterized by specific regions in this representation. A recognition task was applied to tuba, trombone, cello, saxop...
The objective of this study is to understand listeners' sensitivity to directional variations in non-ideal diffuse field reverberation. An ABX discrimination test was conducted using a semi-spherical 28-loudspeaker array; perceptual thresholds were estimated by systematically varying the level of a segment of loudspeakers for lateral, height, and f...
We present Vibrato Nonnegative Tensor Factorization, an algorithm for single-channel unsupervised audio source separation with an application to separating instrumental or vocal sources with nonstationary pitch from music recordings. Our approach extends Nonnegative Matrix Factorization for audio modeling by including local estimates of frequency m...
While a variety of techniques exist to record and reproduce point sources, there is not a systematic tool for the recording and reproduction of diffuse sound fields. Diffuse Field Modeling uses decorrelation filters based on the statistical description of reverberation to "virtualize" an array of outward-facing cardioid microphones from linear comb...
This paper discusses methods for the adaptive reconstruction of the modes of multicomponent AM–FM signals by their time–frequency (TF) representation derived from their short-time Fourier transform (STFT). The STFT of an AM–FM component or mode spreads the information relative to that mode in the TF plane around curves commonly called ridges. An al...
This article contributes a holistic conceptual framework for the notion of “mapping” that extends the classical view of mapping as parameter association. In presenting this holistic approach to mapping techniques, we apply the framework to existing works from the literature as well as to new implementations that consider this approach in their cons...
This paper examines complex non-negative matrix factorization (CMF) as a tool for separating overlapping partials in mixtures of harmonic musical sources. Unlike non-negative matrix factorization (NMF), CMF allows for the development of source separation procedures founded on a mixture model rooted in the complex-spectrum domain (in which the super...
Matching Pursuit (MP) is a greedy algorithm that iteratively builds a sparse signal representation. This work presents an analysis of MP in the context of audio denoising. By interpreting the algorithm as a simple shrinkage approach, we identify the factors critical to its success, and propose several approaches to improve its performance and robus...
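The greedy iteration mentioned in the abstract above can be illustrated with a minimal sketch of textbook Matching Pursuit. This is not the authors' denoising method or their proposed improvements; the function name `matching_pursuit`, the toy dictionary, and the fixed iteration count are all assumptions made for illustration, and the atoms are assumed to have unit norm.

```python
def matching_pursuit(signal, dictionary, n_iter):
    """Textbook Matching Pursuit (illustrative sketch).

    signal: list of floats.
    dictionary: list of unit-norm atoms, each a list of floats.
    Returns (coefficients, residual) after n_iter greedy steps.
    """
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_iter):
        # Inner product of the current residual with every atom.
        corrs = [sum(r * a for r, a in zip(residual, atom))
                 for atom in dictionary]
        # Greedy choice: the atom most correlated with the residual.
        best = max(range(len(dictionary)), key=lambda k: abs(corrs[k]))
        coeffs[best] += corrs[best]
        # Subtract the selected atom's contribution from the residual.
        residual = [r - corrs[best] * a
                    for r, a in zip(residual, dictionary[best])]
    return coeffs, residual


# Toy example with a hypothetical orthonormal dictionary.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = matching_pursuit([3.0, 0.0, -2.0], atoms, n_iter=2)
```

In a denoising setting, stopping after a small number of iterations keeps only the strongest atoms, which is one way to read the shrinkage interpretation the abstract refers to.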
This paper considers the estimation of time-frequency coefficients of audio signals from the viewpoint of spectro-temporal modulation analysis. It is shown that estimators employing neighborhood-smoothed shrinkage masks are closely related to modulation filters. The usefulness of this perspective is first demonstrated by separating an artificial mi...
Noise-based nonlinear system identification techniques using Hammerstein and Wiener forms have found wide application in biological system modeling, and have been applied to modeling nonlinear audio processors such as the ring modulator. These methods apply noise to the system, and project the system output onto a set of orthogonal polynomials to re...
In this paper, we present a unified view of three non-stationary sinusoidal parameter estimation methods which are based on taking linear transforms of a signal and its derivatives. These methods, the Distribution Derivative Method (DDM), the Generalized Derivative Method (GDM), and the Generalized Reassignment Method (GRM), are shown to be subcase...
In this paper, we present comparisons of non-stationary sinusoidal parameter estimation methods for an exponential polynomial signal model. We compare the estimations of the Distribution Derivative Method (DDM), the Generalized Derivative Method (GDM), the Generalized Reassignment Method (GRM), and the Quadratically Interpolated Fast Fourier Transf...
Sound morphing is an important topic in signal processing of musical sounds and covers a wide variety of techniques whose aim is to "interpolate" between two sound signals. We present here an approach based on the alteration of time-frequency representations. Time-frequency analysis is a classical tool in sound analysis/synthesis. A time-frequency...
Extreme metal genres such as death metal and black metal force music analysts to seek alternative methods to Western notation-based analysis, especially when one asks what means of expression their vocalists may draw from in order to seem convincing and powerful to fans. Using spectrograms generated by AudioSculpt, a powerful sound analysis, proces...
The digital waveguide mesh (DWM) has proven to be an efficient and accurate method for simulating multi-dimensional wave propagation in various applications such as physical modeling of musical instruments and room acoustics. However, problems appear when fitting a DWM to an arbitrary boundary because of the geometric constraints of a given mesh eleme...
Three listening tests were conducted to perceptually evaluate different versions of a new real-time synthesis approach for sounds of sustained contact interactions. This study aims to identify the most effective algorithm to create a realistic sound for rolling objects. In Experiments 1 and 2, participants were asked to rate the extent to which 6 di...
A physically-informed audio analysis framework for the identification of plucking gestures on the classical guitar is presented. The Digital Waveguide (DW) paradigm is used to devise a model for the vibration of one guitar string that takes into account the AOR. The first step in the analysis chain is to pre-process the sound. Then, the signal is s...
The article proposes a simple but physically intuitive method to extract the excitation signal from a plucked string signal in the time domain. In order to observe the motion of a string when plucked, a string of an electric guitar is plucked and the output signal from an electromagnetic pickup is recorded using a high impedance port of an audio in...
In a previous study, mechanical and expressive clarinet performances of Bach's Suite no. II and Mozart's Quintet for Clarinet and Strings were analysed to determine whether some acoustical correlates of timbre (e.g., Spectral Centroid), timing (Intertone Onset Interval) and dynamics (Root Mean Square envelope) showed significant differences dependi...
Research into sparse atomic models has recently intensified in the image and audio processing communities. While other reviews exist, we believe this paper provides a good starting point for the uninitiated reader as it concisely summarizes the state-of-the-art, and presents most of the major topics in an accessible manner. We discuss several appro...
This study deals with the acoustical factors liable to account for expressiveness in clarinet performances. Mechanical and expressive performances of excerpts from Bach's Suite No. II and Mozart's Quintet for Clarinet and Strings were recorded. Timbre, timing, dynamics, and pitch descriptors were extracted from the recorded performances. The data w...
This paper presents some results on automatic characterisation of musical and acoustic signals in terms of features attributed to signal segments. These features describe some of the musical and acoustical content of the sound and can be used in applications such as intelligent sound processing, retrieval of music and sound in databases or music ed...
This paper introduces an analysis/synthesis scheme for the reproduction of sounds generated by sustained contact between rigid bodies. This scheme is rooted in a Source/Filter decomposition of the sound where the filter is described as a set of poles and the source is described as a set of impulses representing the energy transfer between the inter...
In this paper, the analysis and synthesis of a rolling ball sound is proposed. The approach is based on the assumption that the rolling sound is generated by a concatenation of micro-impacts between a ball and a surface, each having associated resonances. Contact timing information is first extracted from the rolling sound using an onset detection...
Producing a tone by increasing the blowing pressure to excite a higher frequency impedance minimum, or overblowing, is widely used in standard flute technique. In this paper, the effect of overblowing a fingering is explored with spectral analysis, and a fingering detector is designed based on acoustical knowledge and pattern classification techniq...
In the context of non-stationary sinusoidal analysis, the theoretical comparison of the reassignment method (RM) and the derivative method (DM) for the estimation of the frequency slope is investigated. It is shown that for the estimation of the frequency slope the DM differs from the RM in that it does not consider the group delay. Theoretical equ...
In the context of non-stationary sinusoidal modeling, this paper introduces the generalization of the derivative method (presented at the first DAFx edition) for the analysis stage. This new method is then compared to the reassignment method for the estimation of all the parameters of the model (phase, amplitude, frequency, amplitude modulation, and...
Our previous research focused on vibrato modeling of the harmonics of musical sounds. We demonstrated the need to account for fundamental frequency modulation, global amplitude modulation, but also spectral envelope modulation linked to brightness modulations. This may result from nonlinearities between the excitation signal and the resonating body...
The indirect acquisition of musicians' gestures consists in retrieving information about gestures from the sound. Inspired by previous studies investigating clarinetists' gestures, we focus on the flute, which features numerous playing modes and a rich palette of sounds. Our goal is to provide musicians with guidelines for using indirect acquisitio...
For the modeling of percussive (non-sustained) sounds, the excitation signal can be estimated from an original sound in several ways, usually by a time-domain deconvolution process. The source signal obtained by such a process cannot be compared with the original excitation because it is usually unknown. Hence in most of the approaches available in...
Source-filter models are widely used in numerous audio processing fields, from speech processing to percussive/contact sound synthesis. The design of filters for these models, be it from scratch or from spectral analysis, usually involves tuning frequency and damping parameters and/or providing an all-pole model of the resonant part of the filter. In this...
This chapter considers some of the tools available for analyzing musical sound from acoustical and psychoacoustical perspectives. Spectrograms are shown of various different kinds of musical sound, to demonstrate the features and limitations of this type of representation. Different ways of representing the basic attributes of sound are illustrated...
Time-frequency analysis and wavelet analysis are generally used for providing signal expansions that are suitable for various further tasks such as signal analysis, de-noising, compression, source separation, ... However, time-frequency analysis and wavelet analysis also provide efficient ways for constructing signals' transformations. They are mod...
The perceptual importance of timbre variations was investigated in clarinet expressive music performance. Three basic transformations acting on timbre, rhythm and dynamics and four combinations of them were applied to solo clarinet recordings in order to remove or flatten some of the expressive variations of the performer. Twenty skilled musicia...
In this paper we present an approach for the indirect acquisition of specific fingerings that produce harmonic notes on the flute. We analyse both temporal and spectral characteristics of the attack of harmonic notes produced by specific control gestures involving fingering and potentially overblowing. We then show that it is possible to acquire t...
A numerical technique for determining the acoustic impedance of axisymmetric waveguides based on the lattice Boltzmann method (LBM) is proposed here. This approach presents some desirable characteristics, namely, the possibility of explicitly considering the propagation of waves at low Mach number conditions with a relative low computational cost w...
This paper discusses explicit mapping strategies for gestural and adaptive control of digital audio effects. We address the problem of defining what is the control and what is the effect. We then propose a mapping strategy derived from mapping techniques used in sound synthesis. The explicit mapping strategy we developed has two levels and two laye...
This paper presents an implementation of the phase vocoder within a Gaussian state-space framework. Rather than formulate the problem as a deterministic evolution of frequencies centered around a given bin, this evolution is treated stochastically by introducing noise into the dynamics matrix of the recursive state equation. This produces effec...
jAudio is a new framework for feature extraction designed to eliminate the duplication of effort in calculating features from an audio signal. This system meets the needs of MIR researchers by providing a library of analysis algorithms that are suitable for a wide array of MIR tasks. In order to provide these features with a minimal learning...
Classical guitarists vary plucking position to achieve different timbres, from nasal and metallic (closer to the bridge) to round and mellow (closer to the middle of the string). An interesting set of timbre descriptors commonly used by guitarists seems to refer to phonetic gestures: thin, nasal, round, open, etc. The magnitude spectrum of guitar to...
The choice of mapping strategies to effectively map controller variables to sound synthesis algorithms is examined. Specifically, we look at continuous mappings that have a geometric representation. Drawing from underlying mathematical theory, this paper presents a way to compare mapping strategies, with the goal of achieving an appropriate match be...
In this paper we propose an adaptive digital signal processing system that aims at achieving intuitive control over singing voice timbre. The sound transformations are operated using an analysis/synthesis system based on the harmonic plus noise model, from which singing voice timbre descriptors are extracted and related to common perceptual timbre...
This paper provides a review of gestural control of sound synthesis in the context of the design and evaluation of digital musical instruments. It discusses research in various areas related to this field and equally focuses on four main topics: analysis of music performers' gestures, gestural capture technologies, real-time sound synthesis methods...
This paper takes the opportunity of presenting a set of new adaptive effects to propose a generic scheme for adaptive effects built upon a spectral source-filter decomposition and a Short-Time Fourier analysis-resynthesis. This allows for a better formalization of the involved signal processing algorithms and leads to a simple classification of a...
This paper focuses on the extraction of the excitation point location on a guitar string by an iterative estimation of the structural parameters of the spectral envelope. In a simple digital physical model of a plucked-string instrument, the resonant modes translate into an all-pole structure while the initial conditions (a triangular shape for the...
In this paper, we describe a multi-level approach for the extraction of instrumental gesture parameters taken from the characteristics of the signal captured by a microphone and based on the knowledge of physical mechanisms taking place on the instrument. We also explore the relationships between some features of timbre and gesture parameters, taki...
IRCAM internal reference: McAdams03b
We present a new additive synthesis method based on spectral envelopes and inverse Fast Fourier Transform (FFT⁻¹). User control is facilitated by the use of spectral envelopes to describe the characteristics of the short-term spectrum of the sound in terms of sinusoidal and noise components. Such characteristics can be given by users or obtained...
IRCAM internal reference: Wanderley99c
In this paper we study the effects of performers' gesture on the sound produced by an acoustic instrument as well as the modeling of these effects. More particularly, we propose to focus on the effects of ancillary gestures – those not primarily intended to produce the sound, but nevertheless omnipresent in professional instrumentalists' technique....
This paper deals with vibrato detection, vibrato extraction on the f0 trajectory, and vibrato parameter estimation and modification. Vibrato detection and extraction are aimed at being a first step for note segmentation of singing voice signals. The aim is also to characterize sounds with the descriptor: "presence of vibrato" or "absence of vibrato"....
IRCAM internal reference: Wanderley98b
IRCAM internal reference: Depalle97a
IRCAM internal reference: Tassart97a
A new method which improves the estimation of frequency, amplitude and phase of the partials of a sound is presented. It allows the reduction of the analysis-window size from four periods to two periods. It therefore gives better accuracy in parameter determination, and has proved to remain efficient at low signal-to-noise ratios. The basic idea co...
We propose in this paper a new point of view which unifies two well-known filter families for approximating ideal fractional delay filters: Lagrange interpolator filters (LIF) and Thiran allpass filters. We achieve this unification by approximating the ideal Fourier transform of the fractional delay according to two different Padé approximations: s...
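As a point of reference for the Lagrange interpolator filters named in the abstract above, the sketch below computes FIR coefficients with the standard textbook closed form h[n] = ∏_{k≠n} (D − k)/(n − k), for n = 0..N. It does not reproduce the paper's Padé-based unification, and the function name `lagrange_fd` and its parameters are assumptions introduced only for illustration.

```python
def lagrange_fd(delay, order):
    """Lagrange fractional-delay FIR coefficients (textbook formula).

    delay: desired delay D in samples (may be fractional).
    order: filter order N; the filter has N + 1 taps.
    """
    h = []
    for n in range(order + 1):
        c = 1.0
        for k in range(order + 1):
            if k != n:
                # One factor of the Lagrange basis polynomial at D.
                c *= (delay - k) / (n - k)
        h.append(c)
    return h


# First-order case with D = 0.5 reduces to linear interpolation.
h_linear = lagrange_fd(0.5, 1)
# Third-order filter for a delay of 1.3 samples.
h_cubic = lagrange_fd(1.3, 3)
```

The coefficients sum to one for any delay, so the filter has unit gain at DC. Accuracy is usually best when the delay lies near the middle of the tap range, around N/2.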
Sound recordings include transients and sustained parts. Their analysis with a basis expansion is not rich enough to represent efficiently all such components. Pursuit algorithms choose the decomposition vectors depending upon the signal properties. The dictionary among which these vectors are selected is much larger than a basis. Matching Pursuit...
IRCAM internal reference: Tassart96a