Gabriel Sargent's research while affiliated with IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires and other places


Publications (20)


Estimating the Structural Segmentation of Popular Music Pieces Under Regularity Constraints
  • Article

December 2016

·

63 Reads

·

20 Citations

IEEE/ACM Transactions on Audio Speech and Language Processing

Gabriel Sargent

·

·

Emmanuel Vincent

Music structure estimation has recently emerged as a central topic within the field of Music Information Retrieval. Indeed, as music is a highly structured information stream, knowledge of how a music piece is organized represents a key challenge to enhance the management and exploitation of large music collections. This article focuses on the benefits that can be expected from a regularity constraint on the structural segmentation of popular music pieces. Specifically, we study how a constraint that favors structural segments of comparable size provides a better conditioning of the boundary estimation process. First, we propose a formulation of the structural segmentation task as an optimization process which separates the contribution of the audio features from that of the constraint. We illustrate how the corresponding cost function can be minimized using a Viterbi algorithm. We briefly present its implementation and results in three systems designed for and submitted to the MIREX 2010, 2011 and 2012 evaluation campaigns. Then, we explore the benefits of the regularity constraint as an efficient means of combining the outputs of a selection of systems presented at MIREX between 2010 and 2015, yielding a level of performance competitive with the state of the art on the “MIREX10” dataset (100 JPop songs from the RWC database).
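The abstract above describes segmentation as the minimization of a cost that adds an audio term and a regularity term. As a rough illustration only, and not the authors' system, the Python sketch below solves a toy version of that problem with a Viterbi-style dynamic program. The names `boundary_score`, `target_len` and `lam`, as well as the absolute-deviation form of the regularity penalty, are assumptions made for this example.

```python
def segment_with_regularity(boundary_score, target_len, lam=1.0):
    """Toy segmentation under a regularity constraint (illustrative sketch).

    boundary_score[t] is a hypothetical audio-based score for placing a
    boundary at position t (higher means more likely); positions run 0..n.
    target_len is the assumed typical segment length, and lam weights the
    regularity penalty against the audio term.
    """
    n = len(boundary_score) - 1
    best = [float("inf")] * (n + 1)   # best[e]: minimal cost of segmenting 0..e
    prev = [0] * (n + 1)              # prev[e]: start of the last segment ending at e
    best[0] = 0.0
    for e in range(1, n + 1):
        for s in range(e):
            audio = -boundary_score[e]                       # reward likely boundaries
            regularity = abs((e - s) / target_len - 1.0)     # assumed penalty shape
            cost = best[s] + audio + lam * regularity
            if cost < best[e]:
                best[e] = cost
                prev[e] = s
    # Backtrack from the end of the piece to position 0.
    bounds = [n]
    while bounds[-1] > 0:
        bounds.append(prev[bounds[-1]])
    return bounds[::-1]


if __name__ == "__main__":
    score = [0.0] * 13
    score[4] = score[8] = 1.0
    print(segment_with_regularity(score, target_len=4))
```

With a flat score, the regularity term alone drives the result toward segments of length `target_len`; raising `lam` makes the output less sensitive to isolated peaks in the audio term.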


Figure 5 : Result of the analysis of S&C #1  
Figure 6 : Result of the analysis of S&C #2  
System & Contrast : A Polymorphous Model of the Inner Organization of Structural Segments within Music Pieces
  • Article
  • Full-text available

January 2016

·

171 Reads

·

14 Citations

Music Perception: an interdisciplinary journal


At a large timescale, music pieces can be described as the succession of structural segments which form the global organization of the piece. The present article proposes a model called "System & Contrast", which aims at describing the inner organization of such structural segments in terms of: (i) a carrier system, i.e. a sequence of morphological elements forming a matrix network of self-deducible syntagmatic relationships, and (ii) a contrast, i.e. a substitutive element, usually the last one, which partly departs from the logic of the system. The S&C model applies at several timescales and to a wide variety of musical dimensions in a very polymorphous way, therefore offering an efficient meta-description of mid-level musical content.



A scalable summary generation method based on cross-modal consensus clustering and OLAP cube modeling

September 2015

·

24 Reads

·

6 Citations

Multimedia Tools and Applications

Gabriel Sargent

·

·

·

[...]

·

Video summarization has been a core problem in managing the growing amount of content in multimedia databases. An efficient video summary should display an overview of the video content, and most existing approaches fulfill this goal. However, such an overview does not allow the user to reach all details of interest selectively and progressively. This paper proposes a novel scalable summary generation approach based on the On-Line Analytical Processing (OLAP) data cube. Such a structure integrates tools like the drill-down operation, which allows efficient browsing of multiple descriptions of a dataset at increasing levels of detail. We adapt this model to video summary generation by expressing a video within a cross-media feature space and by performing clusterings according to particular subspaces. Consensus clustering is used to guide the subspace selection strategy at small dimensions, as the novelty brought by the least consensual subspaces is interesting for the refinements of a summary. Our approach is designed for weakly-structured contents such as cultural documentaries. We evaluate it on a corpus of cultural archives provided by the French Audiovisual National Institute (INA) using information retrieval metrics handling single and multiple reference annotations. Overall, the approach improves on two baseline systems performing random and arbitrary segmentations, showing a better balance between Precision and Recall.


Scalable video summarization of cultural video documents in cross-media space based on data cube approach

June 2014

·

27 Reads

·

3 Citations

Video summarization has been a core problem in managing the growing amount of content in multimedia databases. An efficient video summary should display an overview of the video content, and most existing approaches fulfil this goal. However, such an overview does not allow the user to get all details of interest selectively and progressively. This paper proposes a scalable video summarization approach which provides multiple views and levels of detail. Our method relies on a cross-media space and a consensus clustering method. A video document is modelled as a data cube where the level of detail is refined over nonconsensual features of the space. The method is designed for weakly structured content such as cultural documentaries and was tested on the INA corpus of cultural archives.


Segmentation of Music Video Streams in Music Pieces through Audio-Visual Analysis

May 2014

·

105 Reads

·

2 Citations

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

Today, technologies for information storage and transmission allow the creation and development of huge databases of multimedia content. Tools are needed to facilitate their access and browsing. In this context, this article focuses on the segmentation of a particular category of multimedia content, audio-visual musical streams, into music pieces. This category includes concert audio-video recordings, and sequences of music videos such as the ones found on musical TV channels. Current approaches consist of supervised clustering into a few audio classes (music, speech, noise), and, to our knowledge, no consistent evaluation has been performed yet in the case of audio-visual musical streams. In this paper, we aim at estimating the temporal boundaries of music pieces relying on the assumed homogeneity of their musical and visual properties. We consider an unsupervised approach based on the generalized likelihood ratio to evaluate the presence of statistical breakdowns of MFCCs, Chroma vectors, dominant Hue and Lightness over time. An evaluation of this approach on 15 manually annotated concert streams shows the advantage of combining tonal content features with timbral ones, and a modest impact from the joint use of visual features in boundary estimation.
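The generalized likelihood ratio mentioned in this abstract compares two hypotheses: that two adjacent windows come from one Gaussian, or from two different ones. The Python sketch below is a simplified, hedged illustration on a scalar feature; the paper works with multivariate features such as MFCCs and chroma vectors, and the function names, window size and variance floor here are assumptions for this example.

```python
import math


def _log_var(xs, floor=1e-8):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, floor))


def glr_distance(left, right):
    """GLR distance between two windows under a 1-D Gaussian model.

    Large values suggest the two windows are better explained by two
    separate Gaussians than by a single one, i.e. a likely boundary.
    """
    both = list(left) + list(right)
    return 0.5 * (len(both) * _log_var(both)
                  - len(left) * _log_var(left)
                  - len(right) * _log_var(right))


def glr_curve(signal, win):
    """GLR distance at each position t with `win` samples on each side."""
    return [glr_distance(signal[t - win:t], signal[t:t + win])
            for t in range(win, len(signal) - win + 1)]


if __name__ == "__main__":
    signal = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0,
              5.0, 5.1, 4.9, 5.05, 4.95, 5.0]
    curve = glr_curve(signal, win=3)
    print(curve.index(max(curve)) + 3)  # position of the strongest change
```

Candidate boundaries would then be taken at the peaks of the curve; how peaks are selected and how several feature streams are combined is left out of this sketch.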


Table 7 summarizes global statistics over the entire dataset.
Figure 8: number of label occurrences in the annotations as a function of their frequency rank (log-log plot)
Table 11 inventories the most frequent types of variants and their annotation.
Semiotic Description of Music Structure: an Introduction to the Quaero/Metiss Structural Annotations

January 2014

·

140 Reads

·

7 Citations

Interest has been steadily growing in semantic audio and music information retrieval for the description of music structure, i.e. the global organization of music pieces in terms of large-scale structural units. This article presents a detailed methodology for the semiotic description of music structure, based on concepts and criteria which are formulated as generically as possible. We sum up the essential principles and practices developed during an annotation effort deployed by our research group (Metiss) on audio data, in the context of the Quaero project, which has led to the public release of over 380 annotations of pop songs from three different data sets. The paper also includes a few case studies and a concise.


Music structure estimation using multi-criteria analysis and regularity constraints

February 2013

·

10 Reads

Recent progress in information and communication technologies makes it easier to access large collections of digitized music. New representations and algorithms must be developed in order to get a representative overview of these collections, and to browse their content efficiently. It is therefore necessary to characterize music pieces through relevant macroscopic descriptions. In this thesis, we focus on the estimation of the structure of music pieces: the goal is to produce for each piece a description of its organization by means of a sequence of a few dozen structural segments, each of them defined by its boundaries (starting time and ending time) and a label reflecting its audio content. The notion of music structure corresponds to a wide range of meanings depending on the musical properties and the temporal scale under consideration. We introduce an annotation methodology based on the concept of "semiotic structure" which covers a large variety of musical styles. Structural segments are determined through the analysis of their similarities within the music piece, the coherence of their inner organization ("system-contrast" model) and their contextual relationship. A corpus of 383 pieces has been annotated according to this methodology and released to the scientific community. In terms of algorithmic contributions, this thesis concentrates in the first place on the estimation of structural boundaries. We formulate the segmentation process as the optimization of a cost function which is composed of two terms. The first one corresponds to the characterization of structural segments by means of audio criteria. The second one relies on the regularity of the target structure with respect to a "structural pulsation period". In this context, we compare several regularity constraints and study the combination of audio criteria through fusion. Secondly, we consider the estimation of structural labels as a probabilistic finite-state automaton selection process: in this scope, we propose an auto-adaptive criterion for model selection, applied to a description of the tonal content. We also propose a labeling method derived from the system-contrast model. We evaluate several systems for structural segmentation of music based on these approaches in the context of national and international evaluation campaigns (Quaero, MIREX). Additional diagnostics are finally presented to complement this work.


A MUSIC STRUCTURE INFERENCE ALGORITHM BASED ON MORPHOLOGICAL ANALYSIS

October 2012

·

21 Reads

·

3 Citations

Music structure refers to the description of the long term organization of a music piece through a sequence of structural segments. A structural segment can be defined by its structural borders (a start time, an end time) and a label reflecting the similarity of its music content compared to the other segments'. Its duration is typically around 16 s or more. This document presents the music structure estimation system submitted to MIREX's structural segmentation task in 2012. It is composed of three steps: feature extraction, structural border estimation and segment labeling. First, the system produces a sequence of chroma vectors [6] expressed at the snap scale [1] (section 1). This sequence is used to calculate a segmentation criterion based on a morphological model of the structural segments [2] (section 2.1). The structural border estimation is performed by searching for the segmentation with the lowest cost, which combines this criterion and a regularity constraint (section 2.2). The segments are then labeled by clustering according to their similarity, through the minimization of an adaptive model selection criterion (section 3).


Figure 1 : composite labels (schematic configurations)  
Semiotic structure labeling of music pieces: Concepts, methods and annotation conventions

October 2012

·

133 Reads

·

21 Citations

Music structure description, i.e. the task of representing the high-level organization of music pieces in a concise, generic and reproducible way, is currently a scientific challenge both algorithmically and conceptually. In this paper, we focus on semiotic structure, i.e. the description of similarities and internal relationships within a music piece, as a low-rate stream of arbitrary symbols from a limited alphabet and we address methodological questions related to annotation. We formulate the labeling task as a blind demodulation problem, whose goal is to identify a minimal set of semiotic codewords, whose realizations within the music piece are subject to a number of connotative variations viewed as modulations. The determination of labels is achieved by combining morphological, paradigmatic and syntagmatic considerations relying respectively on (i) a morphological model of semiotic blocks in order to define their individual properties, (ii) the support of prototypical structural patterns to guide the comparison between blocks and (iii) a methodology for the determination of distinctive features across semiotic classes. Specific notations are introduced to account for unresolvable semiotic ambiguities, which are occasional but must be considered as inherent to the music matter itself. A set of 500 music pieces labeled in accordance with the proposed concepts and annotation conventions is being released with this article.


Citations (15)


... First, we need an overall model (a theory) of which segmentations are likely before observing any data. This is similar in a broad sense to the method employed in Sargent et al. (2017). For example, using their overall model, a segmentation that would divide a piece into a few very short segments and a very long one seems unlikely to be correct, whereas a segmentation comprising segments of similar and phrase-length sizes could be much more plausible, before even considering the data. ...

Reference:

End-to-End Bayesian Segmentation and Similarity Assessment of Performed Music Tempo and Dynamics without Score Information
Estimating the Structural Segmentation of Popular Music Pieces Under Regularity Constraints
  • Citing Article
  • December 2016

IEEE/ACM Transactions on Audio Speech and Language Processing

... This can be a simple function of the duration: on average, sync points occur every 15 to 30 seconds. Close topics that will be worth studying for sync point detection are video summarization (Sargent et al., 2016; Baraldi, Grana, and Cucchiara, 2017) and captioning (Chen et al., 2018). Finally, if we are able to reliably detect sync points in the silent video, we can consider using those to automatically segment the queries. ...

A scalable summary generation method based on cross-modal consensus clustering and OLAP cube modeling
  • Citing Article
  • September 2015

Multimedia Tools and Applications

... In that viewpoint, C. Guichaoua [1] developed a compression-driven model for retrieving the musical structure, based on the "System and Contrast" model [2], and on polytopes, which are extensions of n-hypercubes. We present this model, which we call "polytopic analysis of music", along with a new open-source dedicated toolbox called MusicOnPolytopes (in Python). ...

System & Contrast : A Polymorphous Model of the Inner Organization of Structural Segments within Music Pieces
Music Perception: an interdisciplinary journal


... The second MIREX dataset was MIREX10, formed by the RWC [37] dataset. This dataset has 2 annotation versions: RWC-A, from the QUAERO project, which is the one that corresponds to MIREX10, and RWC-B [38], which is the original annotated version following the annotation guidelines established by Bimbot et al. [39]. ...

Methodology and conventions for the latent semiotic annotation of music structure

... The second parameterization amounts to ignoring the adjustment step of n iter . 7. This target duration τ0 = √T, for a piece of duration T, is obtained by minimizing the predominant informative context defined in [BLSV10a]. It is a prior on the number of frames that are useful for predicting the acoustic content of a particular frame for a given music piece. ...

Décomposition en blocs autonomes comparables - Une proposition de description et d'annotation de structure pour le traitement automatique des morceaux de musique
  • Citing Article
  • May 2010

... The proposed segmentation pipeline is studied on the RWC Pop dataset, which consists of 100 Pop songs of high recording quality [16], along with the MIREX10 annotations [31]. We compared the barwise compression schemes described in the present work with three blind methods and a supervised method: Foote's novelty kernel [3], Spectral Clustering by McFee and Ellis [5], our former NTD method [9], and the supervised CNN of Grill and Schlüter [13], the latter being the current state-of-the-art. ...

Semiotic Description of Music Structure: an Introduction to the Quaero/Metiss Structural Annotations

... (ANVIL) and real-time annotating during video playback (VIA) [9]. These tools have been used to analyze sign languages [8] [30], gestures [7] [17], eye movements [5], head pose, velocity and acceleration [21] [14], children's touch-screen supported collaboration [10], children's conversational behaviors while interacting with people from cartoon and video [13], music video streams [25], video-based e-learning [2], gesture and speech production for humanoid robots [20], interactions between cognitively impaired older adults and the therapeutic robot [6], and many others. Manual annotation is a laborious and time consuming task. ...

Segmentation of Music Video Streams in Music Pieces through Audio-Visual Analysis

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

... More recently, we also showed how to exploit the morphological properties of the blocks for this task [103]. These algorithms are too complex to be detailed here, but let us just say that the algorithm in [53] ranked first for the "Audio Structural Segmentation" task of MIREX 2011 in terms of segment boundary F-measure, both with 0.5 s and 3 s tolerance [104]. ...

A music structure inference algorithm based on symbolic data analysis
  • Citing Article
  • October 2011

... Music structure in MIR most often refers to sectional form, with the task of structure analysis simplified to identifying boundaries and assigning labels indicating similar sections. Methodologies for annotating music corpora (Peeters and Deruty, 2009; Smith et al., 2011; Bimbot et al., 2012) and for evaluating structural analyses (Lukashevich, 2008; Nieto et al., 2014; McFee et al., 2015) have become important subtopics in MIR. Music corpora annotated with structure information still privilege abstract compositional form and not actual experienced or performed music structures. ...

Semiotic structure labeling of music pieces: Concepts, methods and annotation conventions

... 2) Segmentation Performance: While many evaluation methods [34] exist for measuring the performance of structural segmentation algorithms, the majority attempt to address two factors, the temporal accuracy of the segment boundaries and [36], KSP3 [35] and SBV1 [37]. (b) AMU compared to different parameterizations. ...

A MUSIC STRUCTURE INFERENCE ALGORITHM BASED ON MORPHOLOGICAL ANALYSIS
  • Citing Article
  • October 2012