September 2024 · 38 Reads · 4 Citations
January 2024 · 34 Reads · 1 Citation
IEEE/ACM Transactions on Audio, Speech, and Language Processing
The task of music structure analysis refers to automatically identifying the location and the nature of musical sections within a song. In the supervised scenario, structural annotations generally result from exhaustive data collection processes, which represents one of the main challenges of this task. Moreover, both the subjectivity of music structure and its hierarchical characteristics make the obtained annotations not fully reliable, in the sense that, unlike in other music information retrieval tasks, they do not convey a "universal ground truth". On the other hand, the quickly growing quantity of available music data has enabled weakly supervised and self-supervised approaches to achieve impressive results on a wide range of music-related problems. In this work, a self-supervised method based on contrastive learning is proposed to learn robust multi-level music representations prior to structural segmentation. To this end, sets of frames sampled at different levels of detail are used to train a deep neural network in a disentangled manner. The proposed method is evaluated on both flat and multi-level segmentation. We show that each distinct sub-region of the output embeddings can efficiently account for structural similarity at its own targeted level of detail, which ultimately improves the performance of downstream flat and multi-level segmentation. Finally, complementary experiments study how the obtained representations can be further adapted to specific datasets using a supervised fine-tuning objective, in order to facilitate structure retrieval in domains where human annotations remain scarce.
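As a rough illustration of the disentangled contrastive objective described above, the sketch below (PyTorch; the tensor shapes, number of levels, and all names are assumptions, not the paper's implementation) trains one sub-region of the embedding per level of detail with an InfoNCE loss.

import torch
import torch.nn.functional as F

def level_infonce(z_a, z_b, level, n_levels=4, temperature=0.1):
    # z_a, z_b: (batch, dim) embeddings of frame pairs that are positives
    # at the given level of detail; only that level's sub-region is trained.
    chunk = z_a.shape[1] // n_levels
    a = F.normalize(z_a[:, level * chunk:(level + 1) * chunk], dim=1)
    b = F.normalize(z_b[:, level * chunk:(level + 1) * chunk], dim=1)
    logits = a @ b.t() / temperature    # (batch, batch) cosine similarities
    targets = torch.arange(a.shape[0])  # positive pairs sit on the diagonal
    return F.cross_entropy(logits, targets)

Training would presumably alternate batches of pairs sampled at each level, so that gradients for level k only reach its own sub-region of the embedding.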
February 2022 · 39 Reads · 3 Citations
Journal of Creative Music Systems
In this paper we perform a preliminary exploration of how neural networks can be used for the task of target-based computer-assisted musical orchestration. We show how this musical problem can be modeled as a classification task and propose two deep learning models. We first show how they perform as classifiers for musical instrument recognition by comparing them with specific baselines. We then show how they perform, both qualitatively and quantitatively, in the task of computer-assisted orchestration by comparing them with state-of-the-art systems. Finally, we highlight the benefits and problems of neural approaches to assisted orchestration and propose possible future steps. This paper is an extended version of the paper "A Study on Neural Models for Target-Based Computer-Assisted Musical Orchestration" published in the proceedings of the 2020 Joint Conference on AI Music Creativity.
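To make the classification framing concrete, here is a minimal, hypothetical PyTorch classifier over mel-spectrogram inputs; the architecture, the class inventory (instrument/pitch/dynamics combinations), and the selection rule are illustrative assumptions, not the paper's two models.

import torch
import torch.nn as nn

class InstrumentClassifier(nn.Module):
    def __init__(self, n_classes=300):  # n_classes: assumed sample-class inventory
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, mel):                         # mel: (batch, 1, n_mels, time)
        return self.fc(self.conv(mel).flatten(1))   # multi-label logits

A candidate orchestration could then be read off as the k classes with the highest sigmoid scores for the target sound.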
July 2019 · 228 Reads
Events in music frequently exhibit small-scale temporal deviations (microtiming) with respect to the underlying regular metrical grid. In some cases, as in music from the Afro-Latin American tradition, such deviations appear systematically, disclosing their structural importance in rhythmic and stylistic configuration. In this work we explore the idea of automatically and jointly tracking beats and microtiming in timekeeper instruments of Afro-Latin American music, in particular Brazilian samba and Uruguayan candombe. To that end, we propose a language model based on conditional random fields that integrates beat and onset likelihoods as observations. We derive those likelihoods using deep neural networks and evaluate the model's performance on manually annotated data using a scheme adapted to this task. We assess our approach in controlled conditions suitable for these timekeeper instruments, and study the microtiming profiles' dependency on genre and performer, illustrating promising aspects of this technique towards a more comprehensive understanding of these music traditions.
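The paper decodes with a conditional random field; as a generic illustration of that decoding step, the max-product (Viterbi) pass below operates over an assumed joint state space of beat phase and a discretised microtiming deviation, with observation scores built from the DNN beat and onset activations. Everything here is a sketch under those assumptions, not the paper's actual model.

import numpy as np

def viterbi(obs_logp, trans_logp):
    # obs_logp: (T, S) per-frame log-scores of each (phase, deviation) state,
    # combined from the beat and onset activations (assumed construction);
    # trans_logp: (S, S) transition log-potentials encoding metrical
    # continuity and smooth microtiming changes (also assumed).
    T, S = obs_logp.shape
    delta = obs_logp[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + trans_logp   # cand[i, j]: score ending in j via i
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + obs_logp[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # backtrace the best state sequence
        path.append(int(back[t, path[-1]]))
    return path[::-1]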
May 2019 · 88 Reads · 1 Citation
This work aims at bridging the gap between two completely distinct research fields: digital communications and Music Information Retrieval (MIR). While works in the MIR community have long used algorithms borrowed from speech signal processing, text recognition, or image processing, to our knowledge very little work based on digital communications algorithms has been produced. This paper specifically targets the use of the Belief Propagation algorithm for the task of Automatic Chord Estimation (ACE). This algorithm is in widespread use in iterative decoders for error-correcting codes, and we show that it improves ACE performance by natively incorporating the ability to take constraints between distant parts of the song into account. It represents a promising alternative to traditional MIR graphical-model approaches, in particular Hidden Markov Models.
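On a plain chain of chord variables, sum-product Belief Propagation reduces to the familiar forward-backward recursion sketched below (NumPy; the potentials are assumed inputs). The paper's interest lies in factors connecting distant, repeated parts of the song, which this minimal sketch does not show.

import numpy as np

def chain_bp(obs, trans):
    # obs: (T, C) per-frame chord observation likelihoods;
    # trans: (C, C) chord transition potentials.
    T, C = obs.shape
    fwd = np.empty((T, C))
    bwd = np.empty((T, C))
    fwd[0] = obs[0] / obs[0].sum()
    for t in range(1, T):
        m = fwd[t - 1] @ trans               # message from the previous frame
        fwd[t] = m * obs[t]
        fwd[t] /= fwd[t].sum()
    bwd[-1] = 1.0
    for t in range(T - 2, -1, -1):           # messages flowing backwards
        m = trans @ (obs[t + 1] * bwd[t + 1])
        bwd[t] = m / m.sum()
    post = fwd * bwd                         # per-frame chord posteriors
    return post / post.sum(axis=1, keepdims=True)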
May 2019 · 138 Reads · 23 Citations
In recent years, the task of downbeat tracking has received increasing attention, and the state of the art has improved with the introduction of deep learning methods. Among proposed solutions, existing systems exploit short-term musical rules as part of their language modelling. In this work we show, in an oracle scenario, how including longer-term musical rules, in particular music structure, can enhance downbeat estimation. We introduce a skip-chain conditional random field language model for downbeat tracking designed to include section information in a unified and flexible framework. We combine this model with a state-of-the-art convolutional-recurrent network and contrast the system's performance with the commonly used Bar Pointer model. Our experiments on the popular Beatles dataset show that incorporating structure information in the language model leads to more consistent and more robust downbeat estimations.
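A hypothetical sketch of what "skip-chain" means in practice: besides the usual factors between consecutive beats, extra pairwise factors tie beats that the (oracle) section annotations mark as structurally equivalent, encouraging consistent downbeat labels across repetitions. The function and potential names are assumptions for illustration.

def build_factors(n_beats, section_pairs, chain_pot, skip_pot):
    # section_pairs: list of (i, j) beat indices in matching sections,
    # assumed to come from structure annotations (the oracle scenario).
    factors = [((t, t + 1), chain_pot) for t in range(n_beats - 1)]
    factors += [((i, j), skip_pot) for (i, j) in section_pairs]
    return factors  # decoded with (loopy) belief propagation, for instance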
April 2019 · 626 Reads · 2 Citations
Frontiers in Digital Humanities
Within the last 15 years, the field of Music Information Retrieval (MIR) has made tremendous progress in the development of algorithms for organizing and analyzing the ever-increasing amount and variety of music and music-related data available digitally. However, the development of content-based methods to enable or ameliorate multimedia retrieval remains a central challenge. In this perspective paper, we critically examine the problem of automatic chord estimation from audio recordings as a case study of content-based algorithms, and point out several bottlenecks in current approaches: expressiveness and flexibility are obtained at the expense of robustness and vice versa; available multimodal sources of information are scarcely exploited; current architectures are limited in modeling multi-faceted and strongly interrelated musical information; and models are typically restricted to short-term analysis that does not account for the hierarchical temporal structure of musical signals. Dealing with music data requires the ability to tackle both uncertainty and complex relational structure at multiple levels of representation. Traditional approaches have generally treated these two aspects separately, with probability and learning as the usual way to represent uncertainty in knowledge, and logical representation as the usual way to represent knowledge and complex relational information. We advocate that the identified hurdles of current approaches could be overcome by recent developments in the area of Statistical Relational Artificial Intelligence (StarAI), which unifies probability, logic, and (deep) learning. We show that existing approaches used in MIR find powerful extensions and unifications in StarAI, and we explain why we think it is time to consider the new perspectives offered by this promising research field.
September 2018 · 132 Reads · 21 Citations
Downbeat tracking consists of annotating a piece of musical audio with the estimated position of the first beat of each bar. In recent years, increasing attention has been paid to applying deep learning models to this task, and various architectures have been proposed, leading to a significant improvement in accuracy. However, there are few insights about the role of the various design choices and the delicate interactions between them. In this paper we offer a systematic investigation of the impact of widely adopted variants. We study the effects of the temporal granularity of the input representation (i.e., beat-level vs. tatum-level) and the encoding of the network's outputs. We also investigate the potential of convolutional-recurrent networks, which have not been explored in previous downbeat tracking systems. To this end, we exploit a state-of-the-art recurrent neural network into which we introduce those variants, while keeping the training data, network learning parameters and post-processing stages fixed. We find that temporal granularity has a significant impact on performance, and we analyze its interaction with the encoding of the network's outputs.
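For readers unfamiliar with the architecture family, a minimal convolutional-recurrent sketch in PyTorch is given below; the layer sizes, the tatum-level framing, and the three-class output encoding (downbeat/beat/none) are assumptions for illustration, not the systems compared in the paper.

import torch
import torch.nn as nn

class CRNNDownbeat(nn.Module):
    def __init__(self, n_feats=80, n_classes=3):
        super().__init__()
        # small convolutional front end over the input features
        self.conv = nn.Sequential(
            nn.Conv1d(n_feats, 64, 5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # bidirectional GRU models longer-range temporal context
        self.rnn = nn.GRU(64, 64, batch_first=True, bidirectional=True)
        self.out = nn.Linear(128, n_classes)

    def forward(self, x):                    # x: (batch, n_feats, n_tatums)
        h = self.conv(x).transpose(1, 2)     # (batch, n_tatums, 64)
        h, _ = self.rnn(h)
        return self.out(h)                   # per-tatum class logits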
... One influential theoretical framework suggests improvisation is only possible because improvisers reuse learned patterns of notes inserted in the ongoing improvisation [3]. This view is supported by evidence of a high degree of repeated musical patterns in extant improvisations by experts [4,5]. These patterns are part of expert performers' knowledge base, which also includes strategies for concatenating the known patterns, rules for generating new patterns, how the patterns relate to underlying harmonic and rhythmic contexts, which motor movements are needed to execute the patterns, and more general information about style and performance context [3]. ...
September 2024
... Our approach can serve as a blueprint for other tasks that benefit from hierarchical embeddings, including few-shot audio event recognition [6], audio source separation [15], and music structure analysis [16]. Also, with the model's inter-label generalisation ability improved, it can better handle in-the-wild data by linking new examples to familiar (and similar) labels. ...
January 2024
IEEE/ACM Transactions on Audio, Speech, and Language Processing
... The study of music features, distinct attributes encapsulating the essence of a piece of music, is pivotal for understanding the intricate structure of compositions and propelling the advancement of music information retrieval (MIR) technologies [1]. In the digital era, machine learning algorithms have become instrumental in MIR tasks due to their precision and efficiency in analyzing vast music datasets [17]. At the core of this technological revolution is the process of feature extraction [18]. ...
April 2019
Frontiers in Digital Humanities
... It is a longstanding area of research in music information retrieval (MIR) with applications ranging from automatic DJ mixing [2] to musicological studies [3]. Meter tracking has gone through a big transformation in the last decade due to the introduction of deep learning (DL) techniques [4][5][6][7], which brought an improvement in performance as well as a change in the design paradigm of related methods [8]. ...
September 2018
... downbeat, phrase) are built on the concept of beat. Beat tracking for music audio has been one of the central topics in the MIR research community, and has been achieved using diverse techniques including RNNs [45], CRNNs [46], Transformer-based models [47], and CNN-based models such as temporal convolutional networks (TCNs) [48]. For cross-modal applications, beat timing can serve as an effective reference to align audio and visual modalities, since audio and visual data often show corresponding features at beats. ...
May 2019