Hélène C. Crayencour’s research while affiliated with CentraleSupélec and other places


Publications (8)


Dig That Lick: Exploring Patterns in Jazz with Computational Methods
  • Chapter
  • Full-text available

September 2024 · 38 Reads · 4 Citations

Lucas Henry · [...] · Gabriel Solis · [...] · Hélène-Camille Crayencour

Self-Supervised Learning of Multi-Level Audio Representations for Music Segmentation

January 2024 · 34 Reads · 1 Citation

IEEE/ACM Transactions on Audio Speech and Language Processing

Music structure analysis refers to automatically identifying the location and the nature of musical sections within a song. In the supervised scenario, structural annotations generally result from exhaustive data collection processes, which represents one of the main challenges of this task. Moreover, both the subjectivity of music structure and its hierarchical characteristics make the obtained annotations not fully reliable, in the sense that, unlike in other music information retrieval tasks, they do not convey a “universal ground truth”. On the other hand, the quickly growing quantity of available music data has enabled weakly supervised and self-supervised approaches to achieve impressive results on a wide range of music-related problems. In this work, a self-supervised method based on contrastive learning is proposed to learn robust multi-level music representations prior to structural segmentation. To this end, sets of frames sampled at different levels of detail are used to train a deep neural network in a disentangled manner. The proposed method is evaluated on both flat and multi-level segmentation. We show that each distinct sub-region of the output embeddings can efficiently account for structural similarity at its own targeted level of detail, which ultimately improves the performance of downstream flat and multi-level segmentation. Finally, complementary experiments study how the obtained representations can be further adapted to specific datasets with a supervised fine-tuning objective, in order to facilitate structure retrieval in domains where human annotations remain scarce.
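The abstract does not specify implementation details; as an illustration of the core idea (training distinct sub-regions of the embedding with positives sampled at level-specific time scales), a hypothetical PyTorch sketch could look like the following. The function name, the embedding split and the sampling scheme are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def multi_level_nt_xent(z_anchor, z_positives, level_slices, temperature=0.1):
    """NT-Xent-style contrastive loss applied independently to each embedding
    sub-region, one sub-region per structural level.

    z_anchor:     (batch, dim) embeddings of anchor frames.
    z_positives:  dict level -> (batch, dim) embeddings of positive frames,
                  sampled from a level-specific time window around each anchor.
    level_slices: dict level -> slice over the embedding dimension.
    """
    per_level = {}
    for level, sl in level_slices.items():
        a = F.normalize(z_anchor[:, sl], dim=1)
        p = F.normalize(z_positives[level][:, sl], dim=1)
        logits = a @ p.t() / temperature           # pairwise cosine similarities
        targets = torch.arange(a.size(0))          # positives lie on the diagonal
        per_level[level] = F.cross_entropy(logits, targets)
    return sum(per_level.values()) / len(per_level), per_level

# Hypothetical usage: a 512-d embedding split into three level-specific regions.
level_slices = {"fine": slice(0, 128), "mid": slice(128, 320), "coarse": slice(320, 512)}
z_anchor = torch.randn(32, 512)
z_positives = {lvl: torch.randn(32, 512) for lvl in level_slices}
loss, per_level_losses = multi_level_nt_xent(z_anchor, z_positives, level_slices)
```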


Neural Models for Target-Based Computer-Assisted Musical Orchestration: A Preliminary Study

February 2022 · 39 Reads · 3 Citations

Journal of Creative Music Systems

In this paper we will perform a preliminary exploration of how neural networks can be used for the task of target-based computer-assisted musical orchestration. We will show how it is possible to model this musical problem as a classification task and we will propose two deep learning models. We will show, first, how they perform as classifiers for musical instrument recognition by comparing them with specific baselines. We will then show how they perform, both qualitatively and quantitatively, on the task of computer-assisted orchestration by comparing them with state-of-the-art systems. Finally, we will highlight the benefits and problems of neural approaches to assisted orchestration and propose possible future steps. This paper is an extended version of the paper "A Study on Neural Models for Target-Based Computer-Assisted Musical Orchestration" published in the proceedings of the 2020 Joint Conference on AI Music Creativity.
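The abstract frames assisted orchestration as a classification problem over instruments; a toy sketch of that framing is given below, in which a hypothetical CNN predicts an instrument distribution for the target sound and for candidate mixtures, which are then compared. Architecture, names and dimensions are illustrative assumptions, not the paper's models.

```python
import torch
import torch.nn as nn

class InstrumentClassifier(nn.Module):
    """Toy CNN mapping a mel-spectrogram patch to instrument logits.
    Orchestration candidates can then be ranked by how closely their predicted
    instrument distribution matches the one predicted for the target sound."""
    def __init__(self, n_instruments=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_instruments)

    def forward(self, x):                 # x: (batch, 1, n_mels, n_frames)
        h = self.features(x).flatten(1)
        return self.head(h)               # unnormalised instrument logits

model = InstrumentClassifier()
target_patch = torch.randn(1, 1, 128, 64)        # spectrogram of the target sound
candidate_patch = torch.randn(1, 1, 128, 64)     # spectrogram of a candidate mixture
p_target = torch.softmax(model(target_patch), dim=1)
p_candidate = torch.softmax(model(candidate_patch), dim=1)
similarity = torch.nn.functional.cosine_similarity(p_target, p_candidate)
```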


Figure 1. Example of microtiming deviations at the sixteenth-note level for a beat-length rhythmic pattern from the tamborim in samba de enredo.
Figure 4. Example of the microtiming values for a chico drum recording in the candombe dataset. Dark and light lines represent the ground truth with and without median filtering, respectively.
Figure 6. Microtiming distribution depending on style, view of the plane (m2_t, m3_t), denoted as m2 m3 for simplicity. A dot at (0.50, 0.75) indicates the expected position of a beat that shows no microtiming, that is, where all onsets are evenly spaced.
Figure 7. Microtiming distribution depending on performer (the top musician plays candombe and the others play samba). A dot at (0.25, 0.50, 0.75) indicates the point of no microtiming.
Tracking Beats and Microtiming in Afro-Latin American Music Using Conditional Random Fields and Deep Learning

July 2019 · 228 Reads

Events in music frequently exhibit small-scale temporal deviations (microtiming) with respect to the underlying regular metrical grid. In some cases, as in music from the Afro-Latin American tradition, such deviations appear systematically, disclosing their structural importance in rhythmic and stylistic configuration. In this work we explore the idea of automatically and jointly tracking beats and microtiming in timekeeper instruments of Afro-Latin American music, in particular Brazilian samba and Uruguayan candombe. To that end, we propose a language model based on conditional random fields that integrates beat and onset likelihoods as observations. We derive those activations using deep neural networks and evaluate the model's performance on manually annotated data using a scheme adapted to this task. We assess our approach in controlled conditions suitable for these timekeeper instruments, and study the dependency of the microtiming profiles on genre and performer, illustrating promising aspects of this technique towards a more comprehensive understanding of these music traditions.
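As the figure captions above suggest, microtiming can be read as the relative position of onsets within each beat, with (0.25, 0.50, 0.75) marking evenly spaced sixteenth notes. A small NumPy sketch of that measurement follows; it is an illustration only, not the paper's CRF-based tracker.

```python
import numpy as np

def microtiming_per_beat(beat_times, onset_times):
    """For each beat interval, return the relative positions (0..1) of the
    onsets falling inside it. With four evenly spaced sixteenth notes per beat,
    onsets at 0.25, 0.50 and 0.75 correspond to "no microtiming"."""
    profiles = []
    for start, end in zip(beat_times[:-1], beat_times[1:]):
        inside = onset_times[(onset_times >= start) & (onset_times < end)]
        profiles.append((inside - start) / (end - start))
    return profiles

# Hypothetical annotated beats and slightly "ahead of the grid" onsets.
beats = np.array([0.0, 0.5, 1.0])
onsets = np.array([0.0, 0.11, 0.24, 0.37, 0.5, 0.62, 0.74, 0.86])
for i, prof in enumerate(microtiming_per_beat(beats, onsets)):
    print(f"beat {i}: {np.round(prof, 2)}")
```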


Figure 5. Flowchart of the test system for estimating the robustness to noise.
A simple MIDI track is made up of 8 verses containing 4 bars in 4/4 of Em, C, G and D, repeated 4 times each (one distinct chord per bar), resulting in a total of 128 chords. The MIDI score is then converted at 60 BPM to CD-quality audio using the grand piano virtual instrument of Ableton Live. Chroma are then extracted using the Python library Librosa [26]: first the harmonic part is extracted with the function harmonic(y=y, margin=5) (see [27]), then CQT chromas are computed with the function feature.chroma_cqt. Finally, an average chroma for each beat is computed. Gaussian noise with a standard deviation of σ is then added to the chroma vectors and the observation vectors are computed as in [28] from the corrupted chroma. For each algorithm and each σ, a set of 100 corrupted chroma vectors is generated and the average chord estimation error rate is recorded. The results are presented in Figure 6. We see that the max-product BP clearly beats the sum-product flavour. All in all, we argue that adding noise to the chroma allows for blending the whole complexity of music into the performance estimation process: chroma vectors are indeed sensitive to arrangements, e.g., percussive events that may
Figure 6. Chord estimation error rate of the various algorithms vs. noise amplitude.
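The excerpt above names concrete Librosa calls (harmonic extraction with margin=5, CQT chroma, beat-wise averaging, added Gaussian noise). Below is a minimal sketch of that pipeline, assuming Librosa's bundled example audio as a stand-in for the Ableton-rendered piano track used in the paper; the noise level is likewise illustrative.

```python
import numpy as np
import librosa

# Any audio file can stand in for the rendered piano track used in the paper.
y, sr = librosa.load(librosa.example("nutcracker"), sr=None)

# Keep only the harmonic component, as in the excerpt (margin=5).
y_harm = librosa.effects.harmonic(y=y, margin=5)

# CQT-based chroma, then beat-synchronous averaging.
chroma = librosa.feature.chroma_cqt(y=y_harm, sr=sr)
_, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
chroma_beats = librosa.util.sync(chroma, beat_frames, aggregate=np.mean)

# Corrupt the beat-level chroma with Gaussian noise of standard deviation sigma.
sigma = 0.2
chroma_noisy = chroma_beats + np.random.normal(0.0, sigma, chroma_beats.shape)
```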
Belief Propagation algorithm for Automatic Chord Estimation

May 2019 · 88 Reads · 1 Citation

This work aims at bridging the gap between two completely distinct research fields: digital communications and Music Information Retrieval (MIR). While works in the MIR community have long used algorithms borrowed from speech signal processing, text recognition or image processing, to our knowledge very little work based on digital communications algorithms has been produced. This paper specifically targets the use of the Belief Propagation algorithm for the task of Automatic Chord Estimation (ACE). This algorithm is widely used in iterative decoders for error-correcting codes, and we show that it offers improved performance in ACE by genuinely incorporating the ability to take constraints between distant parts of the song into account. It certainly represents a promising alternative to traditional MIR graphical-model approaches, in particular Hidden Markov Models.
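On a plain left-to-right chain of chord labels, max-product belief propagation reduces to Viterbi decoding; the sketch below illustrates that special case with toy scores. It is only an illustration of the message-passing principle, not the paper's factor graph, which additionally connects distant parts of the song.

```python
import numpy as np

def max_product_chain(log_emissions, log_transitions):
    """Max-product message passing on a chain: for each beat t and chord c,
    keep the best-scoring path ending in c, then backtrack. Equivalent to
    Viterbi decoding of an HMM-like chord model."""
    T, C = log_emissions.shape
    delta = np.zeros((T, C))
    backptr = np.zeros((T, C), dtype=int)
    delta[0] = log_emissions[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_transitions      # (prev_chord, chord)
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emissions[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Illustrative 3-chord example with "sticky" transitions.
log_em = np.log(np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.05, 0.05, 0.9]]))
log_tr = np.log(np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]))
print(max_product_chain(log_em, log_tr))   # -> [0, 0, 2]
```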


Fig. 1. SCCRF graph. Observations and labels are indicated as gray and white nodes respectively. Beats of repeated section occurrences are connected to each other.
A Music Structure Informed Downbeat Tracking System Using Skip-Chain Conditional Random Fields and Deep Learning

May 2019 · 138 Reads · 23 Citations

In recent years the task of downbeat tracking has received increasing attention and the state of the art has been improved with the introduction of deep learning methods. Among proposed solutions, existing systems exploit short-term musical rules as part of their language modelling. In this work we show, in an oracle scenario, how including longer-term musical rules, in particular music structure, can enhance downbeat estimation. We introduce a skip-chain conditional random field language model for downbeat tracking designed to include section information in a unified and flexible framework. We combine this model with a state-of-the-art convolutional-recurrent network and contrast the system's performance with the commonly used Bar Pointer model. Our experiments on the popular Beatles dataset show that incorporating structure information in the language model leads to more consistent and more robust downbeat estimations.
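Figure 1's caption indicates that beats of repeated section occurrences are connected to each other. The sketch below shows one hypothetical way such skip edges could be enumerated from section annotations; the per-beat annotation format is an assumption, not the paper's data model.

```python
from collections import defaultdict

def skip_edges(section_per_beat):
    """Given, for each beat index, the (section_label, occurrence, position)
    it belongs to, connect corresponding beats across occurrences of the same
    section. These edges become the long-range potentials of a skip-chain CRF."""
    groups = defaultdict(list)
    for beat, (label, _occurrence, position) in enumerate(section_per_beat):
        groups[(label, position)].append(beat)
    edges = []
    for beats in groups.values():
        for i in range(len(beats) - 1):
            edges.append((beats[i], beats[i + 1]))
    return edges

# Two occurrences of a 4-beat verse separated by a 2-beat bridge.
annotation = (
    [("verse", 0, p) for p in range(4)]
    + [("bridge", 0, p) for p in range(2)]
    + [("verse", 1, p) for p in range(4)]
)
print(skip_edges(annotation))   # [(0, 6), (1, 7), (2, 8), (3, 9)]
```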


FIGURE 1 | Extract of Schubert's lied "Der Doppelgänger" (Heinrich Heine). Chords in a chord progression are not independent from each other but are linked according to musical rules or musical ideas. An example of such a musical idea is shown in the work of Mishkin (1978), who observes that, in some lieder of his last year, Schubert employs harmonic parallelism between triads a half step apart when the poetic images evoke the supernatural. Mishkin provides an example with "Der Doppelgänger", where "the vocal line is supported by hollow sonorities in the piano accompaniment to suggest the dreamlike horror of beholding one's own double weeping before the house of the departed beloved. The hallucinatory impression is intensified, in the concluding passage for piano alone (indicated by the blue line in this figure), through an organum-like progression that moves in parallel motion through a triad on the lowered supertonic degree".
FIGURE 4 | Depending on the context, identical sets of notes can assume different functions and, as a consequence, different names. In the figure we see the chord made of the notes [C, D, Eb, G]: this chord can be called C minor 9th in the context of the key of C minor and G minor sus4 6th in the context of the key of G minor. Note that this is not a simple change of function: in this case not only the function is different but also the very name of the chord. This ambiguity can be a problem in the context of data labeling, where different experts could assume different hypotheses on the context, thus leading to different naming.
FIGURE 7 | The figure shows the beginning of Mendelssohn's Song Without Words Op. 19 No. 5. The excerpt contains two musical phrases (bars 1-4 and 5-7) with a highly regular structure; indeed, bars 1 and 5 and bars 2 and 6 are respectively equal, while the only differences are in the ending part of bars 4 and 7. This type of musical regularity is very common in classical and romantic music.
FIGURE 9 | The figure shows a high-level compositional representation used by the author in his piece Reflets de l'ombre (2013) for large orchestra and electronics. In the left part of the image there is a drawing aimed at outlining the spectral morphology of the piece, together with high-level information such as musical figures and harmonies. It sketches a high-level visualization of an imaginary musical situation before the actual realization by means of notes, durations and so on. From a composer's perspective, such an image is a static mono-dimensional entity that is related to other musical ideas during the development of the piece: for example, this drawing represents a sound that the composer describes as hollow. In the right part there is the corresponding orchestral score: the actual realization of the musical idea lies in a high-dimensional space given by the multiple entities involved: notes, durations, playing styles and so on. Each entity has a large space of variations that interacts with other entities: each time-point of the piece can be used for any combination of notes with any duration, giving a large space of possibilities. It is very difficult (if at all possible) to find a formal relation between the two representations. Even though these mental representations probably cannot be fully captured and described by StarAI methods, we believe that these methods can help approach such representations: a logic-based language in connection with sound-related descriptions helps in defining the connection between musical ideas and their realizations.
FIGURE 10 | Although the data are of different natures, images in films and audio samples in music signals share common semantic relations. For instance, the phenomenon of fade-out/fade-in in a film, where images are superimposed during the transition [see the example from D. W. Griffith's film Abraham Lincoln (top)], finds a correspondence in music when one subphrase finishes while another begins [see the extract of Schubert's D.960 sonata (bottom)]. Transferring knowledge of transitions in movies could help find transitions in audio signals. Source of the images: screenshots of the movie. No permission is required for their use: the film entered the public domain in 1958 when the initial copyright expired. See Paolo Cherchi Usai (2008), The Griffith Project: Essays on D.W. Griffith, British Film Institute, p. 208. Retrieved January 16, 2016.
Learning, Probability and Logic: Toward a Unified Approach for Content-Based Music Information Retrieval

April 2019 · 626 Reads · 2 Citations

Frontiers in Digital Humanities

Within the last 15 years, the field of Music Information Retrieval (MIR) has made tremendous progress in the development of algorithms for organizing and analyzing the ever-increasing, large and varied amount of music and music-related data available digitally. However, the development of content-based methods to enable or ameliorate multimedia retrieval still remains a central challenge. In this perspective paper, we critically look at the problem of automatic chord estimation from audio recordings as a case study of content-based algorithms, and point out several bottlenecks in current approaches: expressiveness and flexibility are obtained at the expense of robustness and vice versa; available multimodal sources of information are little exploited; modeling multi-faceted and strongly interrelated musical information is limited with current architectures; models are typically restricted to short-term analysis that does not account for the hierarchical temporal structure of musical signals. Dealing with music data requires the ability to tackle both uncertainty and complex relational structure at multiple levels of representation. Traditional approaches have generally treated these two aspects separately, probability and learning being the usual way to represent uncertainty in knowledge, and logical representation the usual way to represent knowledge and complex relational information. We advocate that the identified hurdles of current approaches could be overcome by recent developments in the area of Statistical Relational Artificial Intelligence (StarAI), which unifies probability, logic and (deep) learning. We show that existing approaches used in MIR find powerful extensions and unifications in StarAI, and we explain why we think it is time to consider the new perspectives offered by this promising research field.


Figure 2. Encoder architecture: the input representation is either a beat/tatum-synchronous chromagram or multiband spectral flux. Each time unit is fed into the encoder with a context window. The CNN outputs a sequence of dimension T × N which is fed to the Bi-GRU. Finally, the encoder output dimension is T × 512.
Analysis of Common Design Choices in Deep Learning Systems for Downbeat Tracking

September 2018 · 132 Reads · 21 Citations

Downbeat tracking consists of annotating a piece of musical audio with the estimated position of the first beat of each bar. In recent years, increasing attention has been paid to applying deep learning models to this task, and various architectures have been proposed, leading to a significant improvement in accuracy. However, there are few insights about the role of the various design choices and the delicate interactions between them. In this paper we offer a systematic investigation of the impact of largely adopted variants. We study the effects of the temporal granularity of the input representation (i.e., beat-level vs. tatum-level) and the encoding of the network's outputs. We also investigate the potential of convolutional-recurrent networks, which have not been explored in previous downbeat tracking systems. To this end, we exploit a state-of-the-art recurrent neural network in which we introduce those variants, while keeping the training data, network learning parameters and post-processing stages fixed. We find that temporal granularity has a significant impact on performance, and we analyze its interaction with the encoding of the network's outputs.
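Figure 2 above describes a CNN whose T × N output sequence feeds a bidirectional GRU producing a T × 512 sequence. A hypothetical PyTorch sketch of an encoder with those shapes follows; filter sizes, context length and channel counts are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CRNNEncoder(nn.Module):
    """Convolutional-recurrent encoder: a small CNN summarises each time unit's
    context window into an N-dim vector; a bidirectional GRU then yields a
    T x 512 sequence (256 per direction), one vector per beat or tatum."""
    def __init__(self, n_features=12, context=9, n_cnn_out=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, n_cnn_out)
        self.rnn = nn.GRU(n_cnn_out, 256, batch_first=True, bidirectional=True)

    def forward(self, x):
        # x: (batch, T, context, n_features) - one context window per time unit
        b, t, c, f = x.shape
        h = self.cnn(x.reshape(b * t, 1, c, f)).flatten(1)   # (b*t, 32)
        h = self.proj(h).reshape(b, t, -1)                   # (b, T, n_cnn_out)
        out, _ = self.rnn(h)                                 # (b, T, 512)
        return out

enc = CRNNEncoder()
chromagram_windows = torch.randn(2, 100, 9, 12)   # 100 tatums, 9-frame context
print(enc(chromagram_windows).shape)              # torch.Size([2, 100, 512])
```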

Citations (5)


... One influential theoretical framework suggests improvisation is only possible because improvisers reuse learned patterns of notes inserted in the ongoing improvisation [3]. This view is supported by evidence of a high degree of repeated musical patterns in extant improvisations by experts [4,5]. These patterns are part of expert performers' knowledge base, which also includes strategies for concatenating the known patterns, rules for generating new patterns, how the patterns relate to underlying harmonic and rhythmic contexts, which motor movements are needed to execute the patterns, and more general information about style and performance context [3]. ...

Reference:

Functional network connectivity during Jazz improvisation
Dig That Lick: Exploring Patterns in Jazz with Computational Methods

... Our approach can serve as a blueprint for other tasks that benefit from hierarchical embeddings, including few-shot audio event recognition [6], audio source separation [15], and music structure analysis [16]. Also, with the model's inter-label generalisation ability improved, it can better handle in-the-wild data by linking new examples to familiar (and similar) labels. ...

Self-Supervised Learning of Multi-Level Audio Representations for Music Segmentation
  • Citing Article
  • January 2024

IEEE/ACM Transactions on Audio Speech and Language Processing

... The study of music features, distinct attributes encapsulating the essence of a piece of music, is pivotal for understanding the intricate structure of compositions and propelling the advancement of music information retrieval (MIR) technologies [1]. In the digital era, machine learning algorithms have become instrumental in MIR tasks due to their precision and efficiency in analyzing vast music datasets [17]. At the core of this technological revolution is the process of feature extraction [18]. ...

Learning, Probability and Logic: Toward a Unified Approach for Content-Based Music Information Retrieval

Frontiers in Digital Humanities

... It is a longstanding area of research in music information retrieval (MIR) with applications ranging from automatic DJ mixing [2] to musicological studies [3]. Meter tracking has gone through a big transformation in the last decade due to the introduction of deep learning (DL) techniques [4][5][6][7], which brought an improvement in performance as well as a change in the design paradigm of related methods [8]. ...

Analysis of Common Design Choices in Deep Learning Systems for Downbeat Tracking

... downbeat, phrase) are built on the concept of beat. Beat tracking for music audio has been one of the central topics in the MIR research community, and it has been achieved using diverse techniques including RNNs [45], CRNNs [46], Transformer-based models [47], and CNN-based models such as temporal convolutional networks (TCNs) [48]. For cross-modal applications, beat timing can serve as an effective reference to align audio and visual modalities, since audio and visual data often show corresponding features at beat times. ...

A Music Structure Informed Downbeat Tracking System Using Skip-Chain Conditional Random Fields and Deep Learning