Figure 11. A fragment from Kabalevsky's Clowns (a) is pitch randomised (b) and then morphed (c).

Source publication
Article
Research applying machine learning to music modeling and generation typically proposes model architectures, training methods and datasets, and gauges system performance using quantitative measures like sequence likelihoods and/or qualitative listening tests. Rarely does such work explicitly question and analyse its usefulness for and impact on real...

Citations

... NSynth contains 305,979 notes, each with a unique pitch, timbre and envelope. For the 1,006 instruments from the commercial sample library, four-second, single-note recordings are generated across the full pitch range of a standard MIDI piano at five different velocities (25, 50, 75, 100, 127). ...
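To make the sampling scheme above concrete, here is a minimal Python sketch (not the actual NSynth pipeline) that enumerates the note specifications it describes; the 88-key range 21-108 is an assumption based on the phrase "standard MIDI piano".

```python
# Enumerate one four-second note per (pitch, velocity) pair, per instrument.
PIANO_PITCHES = range(21, 109)          # MIDI notes A0..C8 (assumed range)
VELOCITIES = (25, 50, 75, 100, 127)     # the five velocities quoted above

notes = [
    {"pitch": p, "velocity": v, "duration_s": 4.0}
    for p in PIANO_PITCHES
    for v in VELOCITIES
]
print(len(notes))  # 88 pitches x 5 velocities = 440 notes per instrument
```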
... Additionally, because audience tastes differ from person to person, the uncertainty in music ratings or selections increases, resulting in inconsistent ratings. Sturm et al. [75] extensively discuss the advantages of evaluating music in concerts, or in extensions of concerts such as music competitions: a concert is one of the most natural ways to experience music and can, to some extent, reduce subject fatigue compared with laboratory settings. However, this method inevitably introduces differences arising from variations in performers, venues, and environments. ...
Article
With the introduction of ChatGPT, the public's perception of AI-generated content has begun to change. Artificial intelligence has significantly lowered the barrier to entry for non-professionals in creative endeavors, enhancing the efficiency of content creation. Recent advancements have brought significant improvements in the quality of symbolic music generation, enabled by modern generative algorithms that extract patterns implicit in a piece of music from rule constraints or a musical corpus. Nevertheless, existing literature reviews tend to present a conventional and conservative perspective on future development trajectories, with a notable absence of thorough benchmarking of generative models. This paper provides a survey and analysis of recent intelligent music generation techniques, outlining their respective characteristics and discussing existing methods for evaluation. Additionally, the paper compares the characteristics of music generation techniques in the East and West, and analyses the field's development prospects.
... Regarding the evaluation of creativity, Boden (2007) presents a perspective that helps neutralise biases in the evaluation of what artificial systems produce, incorporating criteria for rating the output of a system with creative potential. Regarding the audience, Jordanous (2019) mentions the importance of presenting what a system produces to others, whose critical reaction determines its worth. Finally, regarding context, it is proposed that artificially generated music be evaluated in a concert, just as listeners would evaluate a live musical situation (Sturm et al., 2018). ...
Article
The integration of Artificial Intelligence into the artistic domain raises questions about its creative capacity, its influence on the essence and appreciation of art, and the role of the artist. This research conducts a systematic review of the literature to address these questions, focusing on music, and to establish the state of the art. Today there are AIs capable of generating compositions autonomously, raising the question of whether the perception of originality and beauty changes when it is a machine that creates. By simulating cognitive processes, AI offers perspectives on how human creativity manifests itself. Human creativity, however, is unique, shaped by emotions and lived experiences. Machines, by contrast, rely on replicating pre-existing patterns, yet with the right direction they can enhance human creativity. Even so, they require constant calibration, relying on human judgement to assess their output. The convergence of technology and creativity has led to ethical and copyright debates over the resulting works. It is crucial that technology remain at the service of human beings, underscoring the urgency of establishing a firm ethical framework in our digital era.
... Music generation can be implemented in many ways, with the help of various technologies. Machine learning algorithms can analyze large datasets of existing music and identify patterns and structures that can be used to generate new musical material [1]. Machine learning can also automatically create accompaniment or harmonization for a given tune, among other facets of the music-making process. ...
Article
The task of music generation is complex and demanding, necessitating the understanding and modeling of intricate musical patterns and structures. RNNs have been demonstrated to be effective at generating music, as they can learn to generate sequence data, including musical notes. This project proposes a novel approach for generating melodies using Long Short-Term Memory (LSTM) networks. As RNNs, LSTMs are well equipped to learn long-term dependencies in sequence data, making them an ideal choice for music generation, where the model must learn patterns and relationships among musical notes over long spans. The proposed methodology is based on a hierarchical LSTM structure, which enables the model to comprehend multiple levels of musical structure, including the melody's note order, rhythm, and contour. The model is first trained on a set of MIDI files, enabling it to learn musical patterns and structures across various genres and styles. Once trained, it can create new melodies: provided with a seed melody, the model uses its musical knowledge to extend it. The model can also generate melodies in a particular style; this is achieved by training it on a set of melodies in that style, after which it can create new melodies in the same style as the training set. This approach has various applications, including creating new and original music for games, films, and other uses, and educational resources for musicians and songwriters.
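To illustrate the seed-and-extend generation loop this abstract describes, here is a minimal PyTorch sketch of a next-note LSTM; the vocabulary, layer sizes, and sampling scheme are illustrative assumptions, not the paper's actual (hierarchical) configuration.

```python
import torch
import torch.nn as nn

class MelodyLSTM(nn.Module):
    """Toy next-note model: one token per MIDI pitch (an assumption)."""
    def __init__(self, vocab=128, embed=64, hidden=256, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, tokens, state=None):
        out, state = self.lstm(self.embed(tokens), state)
        return self.head(out), state

@torch.no_grad()
def extend_seed(model, seed, n_new=64, temperature=1.0):
    """Autoregressively extend a seed melody (a list of token ids)."""
    melody = list(seed)
    logits, state = model(torch.tensor([seed]))  # warm up on the whole seed
    for _ in range(n_new):
        probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
        nxt = torch.multinomial(probs, 1)        # sample the next note
        melody.append(int(nxt))
        logits, state = model(nxt, state)        # feed the sample back in
    return melody

# Usage (untrained weights, so the continuation is random):
print(extend_seed(MelodyLSTM(), seed=[60, 62, 64, 65], n_new=8))
```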
... Music creation: it can be used to improve and create music by automatically identifying musical notes and accompaniment [106, 107]. ...
Article
Music transcription is the process of transforming recorded musical performances into symbolic representations such as sheet music or MIDI files. Extensive research and development have been carried out in this field and its technology. This comprehensive review surveys the diverse methodologies, techniques, and advancements that have shaped the landscape of music transcription. It outlines the significance of transcription in preserving, analyzing, and disseminating musical compositions across genres and cultures, and provides a historical perspective by tracing its evolution from traditional manual methods to modern automated approaches. It highlights the challenges posed by complex singing techniques, variations in instrumentation, ambiguity in pitch, tempo changes, rhythm, and dynamics. The review categorizes transcription techniques into four types, frame-level, note-level, stream-level, and notation-level, discussing their strengths and limitations. It covers research domains ranging from general melody extraction to vocal melody extraction, from note-level monophonic to polyphonic vocal transcription, from single-instrument to multi-instrument transcription, and multi-pitch estimation. The survey further covers a broad spectrum of transcription applications in music production and creation, and reviews state-of-the-art open-source and commercial tools for pitch estimation, onset and offset detection, general melody detection, and vocal melody detection, as well as the currently available Python libraries that can be used for transcription. Furthermore, it highlights open-source benchmark datasets for different areas of music transcription and provides a wide range of references supporting the historical context, theoretical frameworks, and foundational concepts, helping readers understand the background of music transcription and the context of our paper.
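As a toy illustration of the frame-level transcription category discussed above, the sketch below uses librosa's pYIN pitch tracker to estimate a fundamental frequency per frame and quantise voiced frames to MIDI note numbers; librosa is one example of the Python libraries the review refers to, and the file name is a placeholder.

```python
import numpy as np
import librosa

# Frame-level pitch estimation; note segmentation (onsets/offsets) would be
# a further step toward note-level transcription.
y, sr = librosa.load("melody.wav", sr=None, mono=True)   # placeholder file
f0, voiced, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

midi = np.full_like(f0, np.nan)
midi[voiced] = np.round(librosa.hz_to_midi(f0[voiced]))
print(midi[:20])  # per-frame MIDI note estimates (NaN = unvoiced)
```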
... A variety of generative music systems have been proposed [35], [36], along with methods to evaluate them [37]. Lately the attention of researchers has focused primarily on the use of machine learning techniques to create artificial agents capable of generating music in an effective and artistically meaningful manner [38], [39]. ...
... However, the creation of high-quality chord music remains a formidable undertaking, necessitating a profound comprehension of musical patterns and the compositional techniques associated with chords [21, 22, 41]. Long short-term memory (LSTM) neural networks, characterized by a recurrent architecture with memory units and gate mechanisms, excel at capturing and retaining long-term dependencies within sequential data [18, 19, 34]. LSTM has demonstrated remarkable achievements across diverse domains, including natural language processing, speech recognition, and time series prediction. ...
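For reference, HMM chord recognition of the kind used in the study below is typically decoded with the Viterbi algorithm; here is a minimal sketch in which states are chords and observations are per-frame symbols. The tiny two-chord model is invented for illustration and is not the paper's actual HMM.

```python
import numpy as np

states = ["C", "G"]
start = np.log([0.6, 0.4])               # initial chord probabilities
trans = np.log([[0.8, 0.2],              # P(next chord | current chord)
                [0.3, 0.7]])
emit = np.log([[0.9, 0.1],               # P(observed symbol | chord)
               [0.1, 0.9]])

def viterbi(obs):
    """Most likely chord sequence for a list of observation indices."""
    T, N = len(obs), len(states)
    dp = np.zeros((T, N))
    back = np.zeros((T, N), dtype=int)
    dp[0] = start + emit[:, obs[0]]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + trans       # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + emit[:, obs[t]]
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi([0, 0, 1, 1, 0]))  # -> ['C', 'C', 'G', 'G', 'C']
```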
Article
With the rapid development of artificial intelligence (AI), music generation has gained widespread attention. Long short-term memory (LSTM) has advantages in handling time series data and has achieved success in music generation: this neural network can capture the long-term dependencies in music and thus generate chord music that is coherent and innovative. To develop a creative and artistic music generation model, this study first establishes a hidden Markov model (HMM) for chord recognition in music. Subsequently, an algorithm based on the multi-style chord music generation (MSCMG) network is proposed and applied to chord music generation. Furthermore, the chord music generation algorithm is evaluated using LSTM neural networks. The findings indicate that the HMM devised in this study attains an 81.8% chord recognition rate for piano compositions. The MSCMG-based algorithm achieves a similarity score of 82.1% for generating classical-style music, with corresponding scores of 3.45, 3.42, and 3.44 for folk-style, classical-style, and pop-style music, respectively. This investigation lays the groundwork for the fusion of AI technology and music composition, exploring novel avenues for music generation and providing new tools and insights for creative and theoretical exploration within the realm of music.
... The system uses basic rules to represent each musical element, such as pitch, duration, and ornamentation. One of the advantages of ABC notation is its simplicity and ease of use, as it can be quickly learned by musicians and non-musicians alike (Sturm et al., 2018). ...
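For readers unfamiliar with ABC, here is a small example of the notation, wrapped in a Python snippet; the header fields are real ABC conventions, but the tune itself is invented for illustration.

```python
# A complete ABC tune is plain text: header fields (X: index, T: title,
# M: metre, L: unit note length, K: key) followed by the music body, where
# letters are pitches (uppercase = lower octave, lowercase = higher),
# digits multiply the unit note length, and '|' marks bar lines.
abc_tune = """X:1
T:Example Reel
M:4/4
L:1/8
K:Gmaj
GABc dedB|dedB dedB|c2ec B2dB|c2A2 A4|"""

body = abc_tune.splitlines()[-1]
print(body.count("|"), "bar lines in the first line of music")
```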
Conference Paper
Variational Autoencoders (VAEs) have proven to be effective models for producing latent representations of cognitive and semantic value. We assess the degree to which VAEs trained on a prototypical tonal music corpus of 371 Bach chorales define latent spaces representative of the circle of fifths and of the hierarchical relation of each key component pitch as drawn in music cognition. In detail, we compare the latent spaces of different VAE corpus encodings (piano roll, MIDI, ABC, Tonnetz, DFT of pitch, and pitch class distributions) in providing a pitch space for key relations that aligns with cognitive distances. We evaluate the model performance of these encodings using objective metrics capturing accuracy, mean square error (MSE), KL-divergence, and computational cost. The ABC encoding performs best in reconstructing the original data, while the Pitch DFT seems to capture more information from the latent space. Furthermore, an objective evaluation over 12 major or minor transpositions per piece is adopted to quantify the alignment of 1) intra- and inter-segment distances per key and 2) the key distances to cognitive pitch spaces. Our results show that Pitch DFT VAE latent spaces align best with cognitive spaces and provide a common-tone space in which overlapping objects within a key are fuzzy clusters that impose a well-defined order of structural significance or stability, i.e., a tonal hierarchy. Tonal hierarchies of different keys can be used to measure key distances and the relationships of their in-key components at multiple hierarchical levels (e.g., notes and chords). The implementation of our VAE and the encodings framework are made available online.
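As background for the pitch class distribution DFT encoding compared above, here is a short numpy sketch; the example histogram is invented, and keeping coefficients 1 through 6 follows the common convention in which the fifth coefficient relates to the circle of fifths.

```python
import numpy as np

# 12-bin pitch-class histogram of a segment (C major scale tones; made up).
pc_hist = np.array([2, 0, 1, 0, 1, 1, 0, 2, 0, 1, 0, 1], dtype=float)
pc_hist /= pc_hist.sum()                  # normalise to a distribution

coeffs = np.fft.fft(pc_hist)[1:7]         # DFT coefficients 1..6
print(np.abs(coeffs).round(3))            # coefficient magnitudes
```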
... Creating computing systems which can generate music has arguably been both a dream and a goal of researchers since the 1800s when Ada Lovelace noted that machines would one day generate "elaborate and scientific pieces of music of any degree of complexity and extent" [1]. Recently, advances in the field of generative music have relied on increasingly complex Machine Learning models [2][3][4] -such as neural networks [5,6] and deep learning techniques [7][8][9][10] -to create convincing musical outputs. However, the complex nature of these models means that people often require some knowledge of these techniques and algorithms in order to use or adapt them effectively, making them difficult for people, especially non-experts, to understand and debug. ...
Preprint
Explainable AI has the potential to support more interactive and fluid co-creative AI systems which can creatively collaborate with people. To do this, creative AI models need to be amenable to debugging by offering eXplainable AI (XAI) features which are inspectable, understandable, and modifiable. However, currently there is very little XAI for the arts. In this work, we demonstrate how a latent variable model for music generation can be made more explainable; specifically we extend MeasureVAE which generates measures of music. We increase the explainability of the model by: i) using latent space regularisation to force some specific dimensions of the latent space to map to meaningful musical attributes, ii) providing a user interface feedback loop to allow people to adjust dimensions of the latent space and observe the results of these changes in real-time, iii) providing a visualisation of the musical attributes in the latent space to help people understand and predict the effect of changes to latent space dimensions. We suggest that in doing so we bridge the gap between the latent space and the generated musical outcomes in a meaningful way which makes the model and its outputs more explainable and more debuggable.
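As a sketch of point i) above, latent space regularisation of this kind can be implemented by encouraging one latent dimension to vary monotonically with a musical attribute across a batch; the snippet below is loosely modelled on attribute-regularised VAEs (Pati and Lerch) and is illustrative rather than MeasureVAE's exact loss.

```python
import torch
import torch.nn.functional as F

def attribute_reg_loss(z_dim, attr, delta=1.0):
    """Penalise sign mismatches between pairwise differences of one latent
    dimension (z_dim) and of a musical attribute (attr), both 1-D batch
    tensors, so the dimension orders itself by the attribute."""
    dz = z_dim[:, None] - z_dim[None, :]     # pairwise latent differences
    da = attr[:, None] - attr[None, :]       # pairwise attribute differences
    return F.mse_loss(torch.tanh(delta * dz), torch.sign(da))

# Usage: add to the VAE objective for a chosen dimension, e.g.
#   loss = recon + beta * kl + gamma * attribute_reg_loss(z[:, 0], density)
z = torch.randn(8, 4)                        # fake batch of latent codes
note_density = torch.rand(8)                 # fake attribute values
print(attribute_reg_loss(z[:, 0], note_density))
```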
... Amershi et al. [2] provide guidelines on dealing with such unpredictable AI systems, mostly focusing on keeping the user informed about the system's capabilities and on understanding its outputs. AI systems have seen use in musical practice-based research [12], [19]; the Folk-RNN model by Sturm et al. has been noted to have a number of impacts on musical creation, such as inspiring ideas, breaking habits, and giving a sense of creating something that could not have been created otherwise. ...
Preprint
Recent work in the field of symbolic music generation has shown value in using a tokenization based on the GuitarPro format, a symbolic representation supporting guitar expressive attributes, as an input and output representation. We extend this work by fine-tuning a pre-trained Transformer model on ProgGP, a custom dataset of 173 progressive metal songs, for the purposes of creating compositions from that genre through a human-AI partnership. Our model is able to generate multiple guitar, bass guitar, drums, piano and orchestral parts. We examine the validity of the generated music using a mixed methods approach by combining quantitative analyses following a computational musicology paradigm and qualitative analyses following a practice-based research paradigm. Finally, we demonstrate the value of the model by using it as a tool to create a progressive metal song, fully produced and mixed by a human metal producer based on AI-generated music.
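The fine-tuning step this abstract describes follows the standard recipe of continuing causal language model training on a small genre-specific corpus; the sketch below uses GPT-2 from Hugging Face transformers purely as a stand-in for the pre-trained symbolic-music Transformer, with random token ids in place of the GuitarPro-derived tokens.

```python
import torch
from transformers import GPT2LMHeadModel

# Stand-in model and data; the real work fine-tunes a symbolic-music
# Transformer on ProgGP's GuitarPro-token sequences.
model = GPT2LMHeadModel.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

corpus = [torch.randint(0, model.config.vocab_size, (1, 256))
          for _ in range(4)]                      # placeholder token ids

model.train()
for epoch in range(3):
    for ids in corpus:
        out = model(input_ids=ids, labels=ids)    # causal LM shift is internal
        out.loss.backward()
        opt.step()
        opt.zero_grad()
```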
... However, these systems still face challenges in generating high-quality music comparable to human compositions. The most important requirements for music generation are a dataset and training methods [13], which also form the basis of this research. ...
Preprint
Having a computer do the work for you has become more and more common over time. In the entertainment domain, however, where a human is the creator, we want to avoid giving technology too much influence. On the other hand, inspiration remains important, so we developed a virtual conductor that can generate an emotionally associated interpretation of a known musical work. This was done by surveying a number of people to determine which emotions were associated with a specific interpretation and set of instruments. Through machine learning, the conductor was then able to achieve this goal. Unlike earlier virtual conductors, which were meant to replace the role of a human conductor, this one is intended as an assisting tool for conductors. As a result, starting on a new interpretation becomes easier, because the tool streamlines research time and provides a technical perspective that can inspire new ideas. By using this technology as a supplement to human creativity, we can create richer, more nuanced interpretations of musical works.
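A minimal sketch of the survey-to-model step this abstract describes might map simple interpretation features to the emotion labels listeners reported; every feature, value, and label below is an invented placeholder, and the real system's features and model are not specified here.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy training data: [tempo_bpm, mean_dynamics, strings_present] -> emotion.
X = [[120, 0.8, 1],
     [60,  0.3, 0],
     [140, 0.9, 1],
     [70,  0.4, 0]]
y = ["joyful", "melancholic", "joyful", "melancholic"]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[110, 0.7, 1]]))  # likely ['joyful'] for this toy model
```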