Sacha Krstulovic

About

36
Publications
2,619
Reads
523
Citations

Publications (36)
Preprint
The ranking of sound event detection (SED) systems may be biased by assumptions inherent to evaluation criteria and to the choice of an operating point. This paper compares conventional event-based and segment-based criteria against the Polyphonic Sound Detection Score (PSDS)'s intersection-based criterion, over a selection of systems from DCASE 20...
Preprint
This work defines a new framework for performance evaluation of polyphonic sound event detection (SED) systems, which overcomes the limitations of the conventional collar-based event decisions, event F-scores and event error rates (ERs). The proposed framework introduces a definition of event detection that is more robust against labelling subjecti...
Chapter
After giving a brief overview of the relevance and value of deploying automatic audio event recognition (AER) in the smart home market, this chapter reviews three aspects of the productization of AER which are important to consider when developing pathways to impact between fundamental research and “real-world” applicative outlets. In the first sec...
Article
Full-text available
In the context of the Internet of Things (IoT), sound sensing applications are required to run on embedded platforms where notions of product pricing and form factor impose hard constraints on the available computing power. Whereas Automatic Environmental Sound Recognition (AESR) algorithms are most often developed with limited consideration for co...
Conference Paper
Full-text available
For the task of sound source recognition, we introduce a novel data set based on 6.8 hours of domestic environment audio recordings. We describe our approach of obtaining annotations for the recordings. Further, we quantify agreement between obtained annotations. Finally, we report baseline results for sound source recognition using the obtained da...
Patent
Full-text available
A text-to-speech method for use in a plurality of languages, including: inputting text in a selected language; dividing the inputted text into a sequence of acoustic units; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model, wherein the model has a plurality of model parameters describing probability d...
Article
The classical analogy between linear filtering and acoustical filtering by tubes is applied in the non-classical case where the tubes are made of unequal-length sections (such as the DRM case). It is shown that the filtering process identity is substantially more complicated than in the case of equal-length sections. In particular, it prevents the...
Conference Paper
Hidden Markov model-based text-to-speech (HMM-TTS) systems are often trained on manual phonetic transcriptions of the voice corpus, even though these manual pronunciations cannot be predicted with complete accuracy at synthesis time, which results in a training/synthesis mismatch. In this paper, an alternate approach is proposed in which a se...
Conference Paper
Full-text available
To achieve high-quality synthetic emotional speech, unit selection is the state-of-the-art technique. Nevertheless, a large, expensive, phonetically segmented corpus is needed, and cost-effective automatic techniques should be studied. According to the HMM experiments in this paper: segmentation performance can depend heavily on the segmental or...
Conference Paper
In the context of the Neologos French speech database creation project, a general methodology was defined for the selection of representative speaker recordings. The selection aims at providing a good coverage in terms of speaker variability while limiting the number of recorded speakers. This is intended to make the resulting database both more ad...
Conference Paper
The paper assesses the capability of an HMM-based TTS system to produce German speech. The results are discussed in qualitative terms, and compared over three different choices of context features. In addition, the system is adapted to a small set of football announcements, in an exploratory attempt to synthesise expressive speech. We conclude...
Article
The paper investigates the potential of HMM-based synthesis to support the parameterisation of expressive speech in German. First, we review the assets of HMMs in the perspective of previous works in speech modelling and speech transformation. It is shown that HMMs define a flexible parametric model of the speech acoustics, which readily i...
Article
In the context of the Neologos French speech database creation project, a general methodology was defined for the selection of representative speaker recordings. The selection aims at providing a good coverage in terms of speaker variability while limiting the number of recorded speakers. This is intended to make the resulting database both more a...
Conference Paper
Matching Pursuit (MP) aims at finding sparse decompositions of signals over redundant bases of elementary waveforms. Traditionally, MP has been considered too slow an algorithm to be applied to real-life problems with high-dimensional signals. Indeed, in terms of floating-point operations, its typical numerical implementations have a complexity of...
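For readers unfamiliar with the technique, the greedy loop behind Matching Pursuit can be sketched as below. This is a minimal textbook illustration, not the implementation described in the paper: the function name `matching_pursuit`, the list-based dictionary of unit-norm atoms, and the toy three-atom example are all assumptions made for the sake of the sketch.

```python
import math


def matching_pursuit(signal, dictionary, n_iter):
    """Greedy Matching Pursuit over a list of unit-norm atoms (hypothetical helper).

    Returns the selected (atom index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    decomposition = []
    for _ in range(n_iter):
        # Inner product of the current residual with every atom.
        products = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        # Pick the atom that is most correlated with the residual.
        best = max(range(len(dictionary)), key=lambda k: abs(products[k]))
        coeff = products[best]
        if coeff == 0.0:
            break  # nothing left to explain
        decomposition.append((best, coeff))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
    return decomposition, residual


# Toy redundant dictionary in 2-D: two axis-aligned atoms plus one diagonal atom.
s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]
decomposition, residual = matching_pursuit([3.0, 3.0], atoms, n_iter=1)
```

In this toy case the diagonal atom alone explains the signal, so a single iteration leaves a residual that is zero up to rounding. Each iteration here correlates the residual with every atom, which is the cost that makes naive implementations slow on high-dimensional signals.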
Conference Paper
This paper focuses on under-determined source separation when the mixing parameters are known. The approach is based on a sparse decomposition of the mixture. In the proposed method, the mixture is decomposed with Matching Pursuit by introducing a new class of multi-channel dictionaries, where the atoms are given by a spatial direction and a wavefo...
Conference Paper
In the framework of audio signal analysis, it is desired to obtain sparse representations that are able to reflect the harmonic structures, e.g., issued from musical instruments. In this paper, we compare two approaches which introduce some explicit models of harmonic features into the matching pursuit analysis framework. The first approach is the...
Article
By definition, the Matching Pursuit algorithm with constant (or "flat") Gabor atoms provides a coarse estimate of frequency modulated sinusoids in music and voice signals. Chirped Gabor atoms, closer to the nature of these signals, would fit them in a finer and sparser way. Though a method for the direct analytic estimation of chirped Gabor atoms h...
Article
Full-text available
This article addresses source separation in the under-determined case when the mixing matrix is known. We work within the framework of approaches based on sparse decomposition of the mixture. In the proposed new method, the mixture is decomposed with Matching Pursuit by introducing a new class of multi-channel dictionaries...
Conference Paper
Full-text available
The Neologos project is a speech database creation project for the French language, resulting from a collaboration between universities and industrial companies and supported by the French Ministry of Research. The goal of Neologos is to re-think the design of the speech databases in order to enable the development of new algorithms in the field of...
Conference Paper
This article presents a new class of constrained and specialized autoregressive (AR) processes. They are derived from lattice filters where some reflection coefficients are forced to zero at a priori locations. Optimizing the filter topology makes it possible to build parametric spectral models that have a greater number of poles than the number of parameters...
Article
Thesis no. 2501, technical sciences, EPF Lausanne. Includes bibliography.
Article
Lab sessions given in relation to Herve Bourlard's Speech Recognition course at EPFL (Ecole Polytechnique Federale de Lausanne), second semester 2001. The full session is available from the web as ftp://ftp.idiap.ch/pub/sacha/labs/Session1.tgz .
Article
Lab sessions given in relation to Herve Bourlard's Speech Recognition course at EPFL (Ecole Polytechnique Federale de Lausanne), second semester 2001. The full session is available from the web as ftp://ftp.idiap.ch/pub/sacha/labs/Session2.tgz .
Article
Despite the approximations it supposes, performing LPC-based acoustico-articulatory inversion is justified in some applicative frameworks. By illustrating this assertion through experiments aiming at incorporating speech production constraints from the DRM model and from a factor-based model into an LPC modeling scheme, we promote the use of LPC-ba...
Conference Paper
A particular form of constraint is incorporated into Linear Prediction lattice filter models in the form of unequal-length delays. This constraint amounts to reducing the number of intrinsic degrees of freedom defined by the reflection coefficients without modifying the LPC order of the corresponding transfer function. It can be optimized by a simple...
Conference Paper
This paper proposes a method for recovering the articulatory parameters of a factor-based vocal tract shape model from the speech waveform. This is realized by analytically relating the shape model to a Linear Prediction lattice filter. Results pertaining to human vowels are presented. They show a good agreement with phonetic characteristics in a r...
Article
A particular form of constraint is incorporated into Linear Prediction lattice filter models in the form of unequal-length delays. This constraint amounts to reducing the number of intrinsic degrees of freedom defined by the reflection coefficients without modifying the LPC order of the corresponding transfer function. It can be optimized by a simple...
Conference Paper
Articulatory representations are expected to bring better speech recognition results. This requires estimating the parameters of a speech production model from the speech sound, a problem known as acoustico-articulatory inversion. Known methods to solve this problem usually introduce a heavy computational cost. Alternately, it is known that Linear P...
Article
Constraints related to the Distinctive Regions and Modes (DRM) speech production model are incorporated in the framework of speech analysis by inverse filtering. It is shown that the analogy between Auto-Regressive modeling and acoustic models based on acoustic tubes is still respected when using tubes with unequal length elementary sections. This...

Projects (2)
Archived project
Archived project