Fig. 4
Context in source publication
Context 1
... respectively) that come after the tonality∼ prefix. Evidently, the transformer component is able to extract this information and pass it to the tonality classifier. Because the masking error decreases more steeply, errors in the other downstream tasks also fall more sharply, as shown in Fig. 4. Unmasking therefore appears to accelerate error reduction on the validation set, providing further evidence that this approach mitigates the risk of ...
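The passage above describes a transformer trained jointly on a masked-token objective and a tonality classifier driven by a tonality∼ prefix. Below is a minimal sketch of such a multi-task setup; the class name, vocabulary and model sizes, and the assumed position of the tonality token are illustrative assumptions, not details taken from the paper.

```python
# Minimal multi-task sketch (hypothetical names and shapes): a transformer
# reads a lead-sheet token sequence carrying a "tonality~" prefix; one head
# recovers masked tokens, another classifies the tonality.
import torch
import torch.nn as nn

VOCAB_SIZE, D_MODEL, N_TONALITIES = 512, 256, 24  # assumed sizes

class LeadSheetTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.mask_head = nn.Linear(D_MODEL, VOCAB_SIZE)        # masked-token recovery
        self.tonality_head = nn.Linear(D_MODEL, N_TONALITIES)  # tonality classification

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))
        # By assumption, the tonality token follows the "tonality~" prefix at
        # position 1, so the tonality head reads that position's embedding.
        return self.mask_head(h), self.tonality_head(h[:, 1])

model = LeadSheetTransformer()
tokens = torch.randint(0, VOCAB_SIZE, (2, 64))  # toy batch of token sequences
mask_logits, tonality_logits = model(tokens)
```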
Citations
... Such models can theoretically be applied to all kinds of music, including Western classical (Ens & Pasquier, 2020; Kong et al., 2020; Mahmud Rafee et al., 2023; Stamatatos & Widmer, 2005; Tang et al., 2023; Yang & Tsai, 2021) and popular music (Chou et al., 2024). Here, we provide a case study of jazz, which is particularly interesting for the freedoms it affords performers and which has been studied in a number of prior works (e.g., Cheston et al., 2024b; Edwards et al., 2023; Ramirez et al., 2010; Velenis et al., 2023). Through improvisation, jazz performers manipulate many different aspects of the music they play, such as the harmony, melody, and rhythm of a composition. ...
... This multi-input architecture differs from the multi-task learning approach described earlier by Velenis et al. (2023), in which a single input representation from a jazz "lead sheet" was used to generate multiple downstream predictions (e.g., composer, tonality, and form identification). Instead, our approach can be thought of as conceptually similar to a mixture-of-experts model without a gating network, allowing us to explicitly control the contribution of each input and sub-network ("expert") towards a single downstream task (performer identification). ...
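For illustration, here is a minimal sketch of the multi-input idea just described: one sub-network ("expert") per musical domain, concatenated without a gating network, feeding a single performer-identification head. All names and dimensions below are assumptions, not the authors' implementation.

```python
# Sketch of a gateless multi-input ("mixture-of-experts without a gate")
# classifier: one expert per musical domain, outputs concatenated and fed
# to a single performer-identification head.
import torch
import torch.nn as nn

DOMAINS = ("melody", "harmony", "rhythm", "dynamics")
FEAT_DIM, HIDDEN, N_PERFORMERS = 128, 64, 20  # assumed dimensions

class MultiInputClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Independent expert per input domain; with no gating network, each
        # expert's contribution can be inspected or ablated directly.
        self.experts = nn.ModuleDict({
            d: nn.Sequential(nn.Linear(FEAT_DIM, HIDDEN), nn.ReLU())
            for d in DOMAINS
        })
        self.head = nn.Linear(HIDDEN * len(DOMAINS), N_PERFORMERS)

    def forward(self, inputs):  # inputs: {domain: (batch, FEAT_DIM) tensor}
        z = torch.cat([self.experts[d](inputs[d]) for d in DOMAINS], dim=-1)
        return self.head(z)

model = MultiInputClassifier()
batch = {d: torch.randn(4, FEAT_DIM) for d in DOMAINS}
logits = model(batch)  # (4, N_PERFORMERS): one performer prediction per clip
```

Because there is no gate, the contribution of each domain can be controlled simply by zeroing or re-weighting its branch, which matches the explicit-control motivation stated above.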
Artistic style has been studied for centuries, and recent advances in machine learning create new possibilities for understanding it computationally. However, ensuring that machine-learning models produce insights aligned with the interests of practitioners and critics remains a significant challenge. Here, we focus on musical style, which benefits from a rich theoretical and mathematical analysis tradition. We train a variety of supervised-learning models to identify 20 iconic jazz musicians across a carefully curated dataset of 84 hours of recordings, and interpret their decision-making processes. Our models include a novel multi-input architecture that enables four musical domains (melody, harmony, rhythm, and dynamics) to be analysed separately. These models enable us to address fundamental questions in music theory and also advance the state of the art in music performer identification (94% accuracy across 20 classes). We release open-source implementations of our models and an accompanying web application for exploring musical styles.