Thesis
PDF Available

Guiding human-computer music improvisation: introducing authoring and control with temporal scenarios

Authors: Jérôme Nika

Abstract and Figures

This thesis focuses on the introduction of authoring and control in human-computer music improvisation through the use of temporal scenarios to guide or compose interactive performances, and addresses the dialectic between planning and reactivity in interactive music systems dedicated to improvisation. An interactive system dedicated to music improvisation generates music "on the fly", in relation to the musical context of a live performance. This work follows on from research on machine improvisation seen as navigation through a musical memory: typically the music played by an "analog" musician co-improvising with the system during a performance, or an offline corpus. That research was mainly dedicated to free improvisation; here we focus on pulsed and "idiomatic" music. Within an idiomatic context, an improviser deals with issues of acceptability regarding the stylistic norms and aesthetic values implicitly carried by the musical idiom. This is also the case for an interactive music system that aims to play jazz, blues, or rock... without being limited to imperative rules that would rule out any kind of transgression or digression. Various repertoires of improvised music rely on a formalized and temporally structured object, for example a harmonic progression in jazz improvisation. In the same way, the models and architecture we developed rely on a formal temporal structure. This structure does not carry the narrative dimension of the improvisation, that is, its fundamentally aesthetic and non-explicit evolution, but is a sequence of formalized constraints for the machine improvisation. This thesis thus presents: a music generation model guided by a "scenario" introducing mechanisms of anticipation; a framework to compose improvised interactive performances at the "scenario" level; an architecture combining anticipatory behavior with reactivity using mixed static/dynamic scheduling techniques; an audio rendering module performing live re-injection of captured material in synchrony with a non-metronomic beat; and a study carried out with ten musicians through performances, work sessions, listening sessions, and interviews. First, we propose a music generation model guided by a formal structure. In this framework, "improvising" means navigating through an indexed memory to collect contiguous or disconnected sequences matching the successive parts of a "scenario" guiding the improvisation (for example a chord progression). The musical purpose of the scenario is to ensure that the improvisations generated by the machine conform to the idiom it carries, and to introduce anticipation mechanisms into the generation process, by analogy with a musician anticipating the resolution of a harmonic progression. Using the formal genericity of the "scenario / memory" pair, we sketch a protocol to compose improvisation sessions at the scenario level. Defining scenarios in terms of audio-musical descriptors, or of any user-defined alphabet, makes it possible to address other dimensions of guided interactive improvisation. In this framework, musicians for whom the definition of a musical alphabet and the design of scenarios for improvisation are part of the creative process can be involved upstream, in the "meta-level of composition" that consists in designing the musical language of the machine.
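To make the memory-navigation mechanism concrete, here is a minimal sketch of scenario-guided generation, assuming a toy representation in which memory events and scenario steps are plain chord labels; the data structures, the greedy matching policy, and all names below are illustrative assumptions rather than the actual generation model of the thesis.

```python
from dataclasses import dataclass

@dataclass
class Event:
    label: str       # e.g. a chord symbol annotating this slice of the memory
    content: object  # the musical content itself (MIDI slice, audio segment, ...)

def navigate(memory: list[Event], scenario: list[str]) -> list[Event]:
    """Collect memory events whose labels match the successive scenario steps,
    preferring contiguous continuations in the memory when possible."""
    out, pos = [], None
    for step in scenario:
        # 1) try to continue contiguously from the previous position
        if pos is not None and pos + 1 < len(memory) and memory[pos + 1].label == step:
            pos += 1
        else:
            # 2) otherwise jump to any position whose label matches the step
            candidates = [i for i, e in enumerate(memory) if e.label == step]
            if not candidates:
                raise ValueError(f"no memory event matches scenario step {step!r}")
            pos = candidates[0]
        out.append(memory[pos])
    return out

# Example: improvising on a ii-V-I with a memory annotated by chord labels.
memory = [Event("Dm7", "lick 1"), Event("G7", "lick 2"), Event("Cmaj7", "lick 3"),
          Event("G7", "lick 4"), Event("Cmaj7", "lick 5")]
print([e.content for e in navigate(memory, ["Dm7", "G7", "Cmaj7"])])
```

A real model would, among other things, rank the candidate jumps (for continuity with the past and anticipation of the scenario's future) instead of taking the first match.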
This model can be used in a compositional workflow and is "offline" in the sense that one run produces a whole timed and structured musical gesture satisfying the designed scenario, which is then unfolded through time during performance. We then present a dynamic architecture embedding such generation processes with formal specifications in order to combine anticipation and reactivity in a context of guided improvisation. In this context, a reaction of the system to the external environment, such as control interfaces or live player input, cannot be seen only as a spontaneous instantaneous response. Indeed, it has to take advantage of the knowledge of this temporal structure to benefit from anticipatory behavior. A reaction can be considered as a revision of mid-term anticipations, that is, of musical sequences previously generated by the system ahead of the time of the performance, in the light of new events or controls. To cope with the issue of combining long-term planning and reactivity, we therefore propose to model guided improvisation as dynamic calls to "compositional" processes, that is to say, to embed intrinsically offline generation models in a reactive architecture. In order to be able to play with the musicians, and with the sound of the musicians, this architecture includes a novel audio rendering module that makes it possible to improvise by re-injecting live audio material (processed and transformed online to match the scenario) in synchrony with a non-metronomic, fluctuating pulse. Finally, this work fully integrated the results of frequent interactions with expert musicians into the iterative design of the models and architectures. The latter are implemented in the interactive music system ImproteK, one of the offspring of the OMax system, which was used on various occasions during live performances with improvisers. During these collaborations, work sessions were combined with listening sessions and interviews to gather the musicians' evaluations of the system in order to validate and refine the scientific and technological choices.
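The "reaction as a revision of mid-term anticipations" can be sketched as follows: the system keeps a buffer of output generated ahead of the performance time and, when a reactive event or control arrives, regenerates only the part of the buffer beyond the current date against the remaining scenario suffix. The buffer layout and the regenerate callback are illustrative assumptions, not the ImproteK scheduling architecture itself.

```python
def revise_anticipations(buffer, scenario, now, regenerate):
    """Revise anticipations after a reactive event.

    buffer     -- events already generated ahead of time, one per scenario step
    scenario   -- the full list of scenario labels
    now        -- index of the current position in the performance
    regenerate -- function (scenario_suffix) -> new events, e.g. a dynamic call
                  to an intrinsically offline generation model
    """
    keep = buffer[:now + 1]                    # what is past (or imminent) stays
    new_tail = regenerate(scenario[now + 1:])  # anticipations beyond 'now' are rewritten
    return keep + new_tail
```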
... This is due to the combinatorial explosion of possibilities when one wants to model the evolution along several dimensions. Following this research, the work around Jérôme Nika's thesis [Nika, 2016] gave rise to the ImproteK software, whose objective is to introduce the concept of a temporal scenario to guide automatic improvisation in the idiomatic case. This method is particularly suited to musical styles that rely on an underlying structure, such as a harmonic grid in jazz or rock improvisation. ...
... The digital agent must also be able to adapt to an improvised playing situation with a local style, and must therefore also address the problem of the limited amount of data available for style modelling. We also want to generalise the concept of scenario proposed in Nika [2016] and learn it automatically, allowing a multi-level modelling of form through a hierarchical model and, consequently, better guidance of the improvisation for the scenario-anticipation and memory-navigation mechanisms. ...
... Note that some excerpts from these workshops were later used by Varèse for his Poème électronique [Nika, 2016]. ...
Thesis
Current musical improvisation systems are able to generate unidimensional musical sequences by recombining musical material. However, taking several dimensions (melody, harmony, ...) into account and modelling several temporal levels are difficult problems. In this thesis, we propose to combine probabilistic approaches with methods from formal language theory in order to better apprehend the complexity of musical discourse, from both a multidimensional and a multi-level point of view, in the context of improvisation where the amount of data is limited. We first present a system able to follow the contextual logic of an improvisation represented by a factor oracle while enriching its musical discourse with multidimensional knowledge represented by interpolated probabilistic models. This work is then extended to model the interaction between several musicians, or between several dimensions, with a belief propagation algorithm in order to generate multidimensional improvisations. Finally, we propose a system able to improvise on a temporal scenario with multi-level information represented by a hierarchical grammar. We also propose a learning method for the automatic analysis of hierarchical temporal structures. All the systems are evaluated by expert musicians and improvisers during listening sessions.
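Since the system described above follows the contextual logic of an improvisation through a factor oracle, a minimal sketch of the standard online factor oracle construction may help; the input can be any list of symbols (pitches, chord labels, ...). This is the generic textbook construction, not the specific implementation evaluated in the thesis.

```python
def factor_oracle(sequence):
    """Build a factor oracle over `sequence`.
    transitions[i] maps a symbol to the state reachable from state i;
    suffix[i] is the suffix link of state i (suffix[0] = -1 by convention)."""
    n = len(sequence)
    transitions = [dict() for _ in range(n + 1)]
    suffix = [-1] * (n + 1)
    for i in range(1, n + 1):
        symbol = sequence[i - 1]
        transitions[i - 1][symbol] = i            # "spine" transition
        k = suffix[i - 1]
        while k > -1 and symbol not in transitions[k]:
            transitions[k][symbol] = i            # forward factor links
            k = suffix[k]
        suffix[i] = 0 if k == -1 else transitions[k][symbol]
    return transitions, suffix

transitions, suffix = factor_oracle(list("abbab"))
```

Improvisation systems in the OMax family typically navigate such an oracle by alternating linear progressions along the spine with jumps along suffix links.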
... In recent years, another family of software that provides musical control while being style-sensitive has been proposed by Nika et al. [4]. In this work, the generative strategy is further improved through the concept of guidance. ...
... The specification of high-level musical structures that define hard constraints on the generated sequences has been defined by Nika [4] as a musical scenario. This approach allows for defining a global orientation of the music generation process. ...
... An example of scenario-based systems is ImproteK [4], which uses pattern-matching algorithms on symbolic sequences. The generation process navigates a memory while matching predefined scenarios at a symbolic level. ...
... We refer here to guided improvisation when the control on music generation follows a more declarative approach, that is, specifying targeted outputs or behaviors using an aesthetic, musical, or audio vocabulary independent of the system implementation, whether this control is short term or long term (see Nika [2016] for a comprehensive review). ...
... We presented a guided generation model and a reactive architecture introducing authoring and control in an interactive music improvisation process. Both are implemented in the improvisation system ImproteK, whose development process integrates frequent interactions with numerous expert musicians through concerts, work sessions, and filmed listening sessions and interviews (see Nika [2016]). ...
Article
Full-text available
This article focuses on the introduction of control, authoring, and composition in human-computer music improvisation through the description of a guided music generation model and a reactive architecture, both implemented in the software ImproteK. This interactive music system is used with expert improvisers in work sessions and performances of idiomatic and pulsed music and more broadly in situations of structured or composed improvisation. The article deals with the integration of temporal specifications in the music generation process by means of a fixed or dynamic “scenario” and addresses the issue of the dialectic between reactivity and planning in interactive music improvisation. It covers the different levels involved in machine improvisation: the integration of anticipation relative to a predefined structure in a guided generation process at a symbolic level, an architecture combining this anticipation with reactivity using mixed static/dynamic scheduling techniques, and an audio rendering module performing live re-injection of captured material in synchrony with a non-metronomic beat. Finally, it sketches a framework to compose improvisation sessions at the scenario level, extending the initial musical scope of the system. All of these points are illustrated by videos of performances or work sessions with musicians.
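The audio rendering mentioned above must re-inject recorded material in synchrony with a non-metronomic beat; in the simplest reading, each re-injected slice is time-stretched so that its original beat duration matches the beat duration currently estimated from the live performance. The ratio below is a generic illustration, not ImproteK's actual rendering code.

```python
def stretch_ratio(source_beat_duration: float, target_beat_duration: float) -> float:
    """Time-stretch ratio so that a slice recorded at one tempo lands on the
    beats of the current, fluctuating tempo."""
    return target_beat_duration / source_beat_duration

# A slice captured at 120 BPM (0.5 s per beat) replayed while the band has
# drifted to about 109 BPM (0.55 s per beat) must be stretched by a factor 1.1.
print(stretch_ratio(0.5, 0.55))
```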
... My work on the Voyager "interactive virtual improvisor" and "virtual orchestra" systems, which began in 1987, has been taken by many as emblematic of early work in this area; indeed, in his 2016 doctoral thesis on human-computer music improvisation, Jérôme Nika, who received his doctorate in computer science under the guidance of Gérard Assayag and Marc Chemillier at IRCAM and the Université Pierre et Marie Curie, graciously called Voyager a "pioneer system" (Nika 2016). Even so, my work at IRCAM from 1982-84, which preceded the advent of Voyager (Beurot 1982, Davaud 1984, Lewis 1984), could be regarded as an early adoption of Assayag and Chemillier's notion of co-creation. ...
... The system is primarily focused on the analysis, learning, and construction of musical elements. In a recent dissertation, Nika elaborates on the OMax system, emphasizing the themes of intentions, anticipations, playing, and practicing (Nika, 2016). ...
Article
Full-text available
We report on a series of workshops with musicians and robotics engineers aimed at studying how human and machine improvisation can be explored through interdisciplinary design research. In the first workshop, we posed two leading questions to participants. First, what can AI and robotics learn from how improvisers think about time, space, actions, and decisions? Second, how can improvisation and musical instruments be enhanced by AI and robotics? The workshop included sessions led by the musicians, which provided an overview of the theory and practice of musical improvisation. In other sessions, AI and robotics researchers introduced AI principles to the musicians. Two smaller follow-up workshops, comprising only engineering and information science students, provided an opportunity to elaborate on the principles covered in the first workshop. The workshops revealed parallels and discrepancies in the conceptualization of improvisation between musicians and engineers. These thematic differences could inform considerations for future designers of improvising robots.
... Moreover, such predictive models can suffer from error propagation if used to predict more than a single chord at a time. Since we want to use our chord predictor in a real-time improvisation system [1,10], the ability to predict coherent long-term sequences is of utmost importance. ...
Preprint
This paper studies the prediction of chord progressions for jazz music by relying on machine learning models. The motivation of our study comes from the recent success of neural networks for performing automatic music composition. Although high accuracies are obtained in single-step prediction scenarios, most models fail to generate accurate multi-step chord predictions. In this paper, we postulate that this comes from the multi-scale structure of musical information and propose new architectures based on an iterative temporal aggregation of input labels. Specifically, the input and ground truth labels are merged into increasingly large temporal bags, on which we train a family of encoder-decoder networks for each temporal scale. In a second step, we use these pre-trained encoder bottleneck features at each scale in order to train a final encoder-decoder network. Furthermore, we rely on different reductions of the initial chord alphabet into three adapted chord alphabets. We perform evaluations against several state-of-the-art models and show that our multi-scale architecture outperforms existing methods in terms of accuracy and perplexity, while requiring relatively few parameters. We analyze musical properties of the results, showing the influence of downbeat position within the analysis window on accuracy, and evaluate errors using a musically-informed distance metric.
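A minimal sketch of the "temporal bag" aggregation described above: beat-level chord labels are merged into increasingly large unordered windows, one family of targets per temporal scale. Window sizes and the use of unordered sets are assumptions for illustration; the paper trains one encoder-decoder per scale on such aggregated labels.

```python
def temporal_bags(chords, bag_size):
    """Merge a beat-level chord sequence into unordered bags of `bag_size` labels."""
    return [frozenset(chords[i:i + bag_size])
            for i in range(0, len(chords), bag_size)]

chords = ["Dm7", "Dm7", "G7", "G7", "Cmaj7", "Cmaj7", "Cmaj7", "Cmaj7"]
for scale in (1, 2, 4):  # increasingly coarse temporal scales
    print(scale, temporal_bags(chords, scale))
```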
... The music generation model anticipates the future of a scenario while ensuring consistency with the past of the improvisation. It received positive feedback from professional improvisers and evolved according to the musicians' expertise [17]. However, these scenarios are purely sequences of symbols. ...
Conference Paper
Full-text available
This paper presents a method taking into account the form of a tune upon several levels of organisation to guide music generation processes to match this structure. We first show how a phrase structure grammar can represent a hierarchical analysis of chord progressions and be used to create multi-level progressions. We then explain how to exploit this multi-level structure of a tune for music generation and how it enriches the possibilities of guided machine improvisation. We illustrate our method on a prominent jazz chord progression called 'rhythm changes'. After creating a phrase structure grammar for 'rhythm changes' with a professional musician, the terminals of this grammar are automatically learnt on a corpus. Then, we generate melodic improvisations guided by multi-level progressions created by the grammar. The results show the potential of our method to ensure the consistency of the improvisation regarding the global form of the tune, and how the knowledge of a corpus of chord progressions sharing the same hierarchical organisation can extend the possibilities of music generation.
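How a phrase structure grammar can unfold a multi-level form down to a chord progression can be sketched as follows; the toy AABA grammar below is a deliberately simplified stand-in for the 'rhythm changes' grammar built with the professional musician in the paper.

```python
import random

# Toy grammar: each non-terminal maps to a list of possible right-hand sides.
GRAMMAR = {
    "TUNE": [["A", "A", "B", "A"]],                           # AABA form
    "A":    [["I", "vi", "ii", "V"], ["I", "VI7", "ii", "V"]],
    "B":    [["III7", "VI7", "II7", "V7"]],                   # the bridge
}

def expand(symbol):
    """Recursively expand a symbol until only terminal chord degrees remain."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    return [terminal for s in production for terminal in expand(s)]

print(expand("TUNE"))  # one multi-level progression drawn from the grammar
```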
... On the other hand, guiding can mean defining, upstream, temporal structures or descriptions driving the generation process of a whole music sequence. Some studies aim at generating new sequences favoring transitions or subsequences that are characteristic of a chosen corpus [12,13,14], at enforcing a formalized temporal structure in a music generation process [15,16], or at running generative models using temporal queries [17,18] (see [19] for an exhaustive taxonomy of the related work dedicated to guided human-computer music improvisation). The milestones of the project described in the two following subsections serve as a basis for the ongoing research and development presented later in Section 3. ...
Conference Paper
Full-text available
The collaborative research and development project DYCI2, Creative Dynamics of Improvised Interaction, focuses on conceiving, adapting, and bringing into play efficient models of artificial listening, learning, interaction, and generation of musical contents. It aims at developing creative and autonomous digital musical agents able to take part in various human projects in an interactive and artistically credible way; and, in the end, at contributing to the perceptive and communicational skills of embedded artificial intelligence. The concerned areas are live performance, production, pedagogy, and active listening. This paper gives an overview focusing on one of the three main research issues of this project: conceiving multi-agent architectures and models of knowledge and decision in order to explore scenarios of music co-improvisation involving human and digital agents. The objective is to merge the usually exclusive "free", "reactive", and "scenario-based" paradigms in interactive music generation to adapt to a wide range of musical contexts involving hybrid temporality and multimodal interactions.
Chapter
Billie Holiday, Edith Piaf, and Elisabeth Schwarzkopf are three great musical ladies, born in 1915. Could we make them sing together? This was the goal of a performance made for a music festival in L'Aquila (Italy) in 2015. This raises musical questions: what kind of sound could link Billie Holiday to Schwarzkopf, Schwarzkopf to Piaf? What kind of musical structure? With the help of the software ImproteK, the three ladies eventually sang Autumn Leaves and The Man I Love together. ImproteK allows the harmony of the used material (called "memories") to be matched to the harmonic progression of the reference song (called "scenario"). The machine can also include in its "memory" what is currently played during the performance. This process raises fascinating questions about musical notation and style: the chosen common notation defines how the improvisation follows the reference scenario, while the chosen memories bring their own sound and character into interaction with this structure. The result can be arresting, surrealistic, kitsch; it is often fun.
Article
Full-text available
We present the formal model and implementation of a computer-aided composition system allowing for the "composition of musical processes". Rather than generating static data, this framework considers musical objects as dynamic structures likely to be updated and modified at any time. After formalizing a number of basic concepts, this paper describes the architecture of a framework comprising a scheduler, programming tools, and graphical interfaces. The operation of this architecture, which allows both regular and dynamic-process composition to be performed, is explained through concrete musical examples.
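A minimal sketch of the "dynamic process" idea: a scheduler whose planned actions can themselves re-plan the timeline while it unfolds, so that musical objects are not frozen once scheduled. The heap-based structure and names are a generic illustration, not the formal model of the paper.

```python
import heapq

class DynamicScheduler:
    """Planned actions are (date, action) pairs; an action may schedule further
    actions, so the content of the timeline can change while it is executed."""
    def __init__(self):
        self._queue = []

    def schedule(self, date, action):
        heapq.heappush(self._queue, (date, id(action), action))

    def run_until(self, horizon):
        while self._queue and self._queue[0][0] <= horizon:
            date, _, action = heapq.heappop(self._queue)
            action(self, date)

def play(name):
    return lambda sched, date: print(f"{date:4.2f}s  {name}")

sched = DynamicScheduler()
sched.schedule(0.0, play("initial chord"))
# An action that revises the plan at t = 1 s by scheduling new material.
sched.schedule(1.0, lambda s, d: s.schedule(d + 0.5, play("revised phrase")))
sched.run_until(2.0)
```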
Chapter
Full-text available
This chapter introduces the vision and the technical challenges of the Flow Machines project. Flow Machines aim at fostering creativity in artistic domains such as music and literature. We first observe that typically, great artists do not output just single artefacts but develop novel, individual styles. Style mirrors an individual’s uniqueness; style makes an artist’s work recognised and recognisable. Artists develop their own style after prolonged periods of imitation and exploration of the style of others. We envision style exploration as the application of existing styles, considered as texture, to arbitrary constraints, considered as structure. The goal of Flow Machines is to assist this process by allowing users to explicitly manipulate styles as computational objects. During interactions with Flow Machines, the user can create artefacts (melodies, texts, orchestrations) by combining styles with arbitrary constraints. Style exploration under user-defined constraints raises complex sequence generation issues that were addressed and solved for the most part during the first half of the project. We illustrate the potential of these techniques for style exploration with three examples.
Article
Full-text available
This article reports on a survey conducted in 2011-2013 with the jazz musician Bernard Lubat for the development of the ImproteK improvisation software. This software captures the musician's playing as recorded phrases in order to create new ones within an idiomatic framework (jazz standards) where improvisation is based on a regular pulse and a harmonic grid. We collected many appreciations formulated by Bernard Lubat about the results produced by the machine, which constitute as many "judgements of taste". They deal with purely idiomatic aspects such as phrasing (stiffness versus suppleness) or the accuracy of the tempo, and more generally with the notions of error and of transgressing the limits of what is acceptable with respect to the idiom. Beyond these technical aspects, however, the survey reveals an aesthetic way of thinking that interweaves music (improvisation considered as the overcoming of obstacles) with more political and philosophical considerations.
Conference Paper
Full-text available
Markov processes are widely used to generate sequences that imitate a given style, using random walk. Random walk generates sequences by iteratively concatenating states to prefixes of length equal to or less than the given Markov order. However, at higher orders, Markov chains tend to replicate chunks of the corpus with a size possibly higher than the order, a primary form of plagiarism. The Markov order defines a maximum length for training but not for generation. In the framework of constraint satisfaction (CSP), we introduce MAXORDER. This global constraint ensures that generated sequences do not include chunks larger than a given maximum order. We exhibit an automaton that recognises the solution set, with a size linear in the size of the corpus. We propose a linear-time procedure to generate this automaton from a corpus and a given max order. We then use this automaton to achieve generalised arc consistency for the MAXORDER constraint, holding on a sequence of size n, in O(n·T) time, where T is the size of the automaton. We illustrate our approach by generating text sequences from text corpora with a maximum order guarantee, effectively controlling plagiarism.
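As a complement, a brute-force sketch of the property that the MAXORDER constraint enforces: the longest chunk of the corpus replicated verbatim in a generated sequence must not exceed the maximum order. This verifier only checks the property after the fact; it is not the automaton-based propagator of the paper.

```python
def _occurs(corpus, chunk):
    return any(corpus[k:k + len(chunk)] == chunk
               for k in range(len(corpus) - len(chunk) + 1))

def longest_corpus_chunk(sequence, corpus):
    """Length of the longest contiguous chunk of `sequence` also found in `corpus`."""
    best = 0
    for i in range(len(sequence)):
        for j in range(i + best + 1, len(sequence) + 1):
            if _occurs(corpus, sequence[i:j]):
                best = j - i          # found a longer replicated chunk
            else:
                break                 # extending further cannot help at this i
    return best

def satisfies_max_order(sequence, corpus, max_order):
    return longest_corpus_chunk(sequence, corpus) <= max_order

corpus = list("abracadabra")
print(satisfies_max_order(list("acadab"), corpus, max_order=4))  # False: 6 symbols copied
print(satisfies_max_order(list("abda"), corpus, max_order=2))    # True
```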
Conference Paper
Full-text available
This work builds upon existing work in which second-order allpass filters are used in a feedback network, with parameters made time-varying to enable effects such as phase distortion in a generative audio system; the term "audio" is used here to distinguish from generative "music" systems, emphasizing the strong coupling between processes governing the production of high-level music and lower-level audio. The previous system was subject to issues of instability that can arise when time-invariant filter parameters are allowed to vary over time. These instabilities are examined herein, along with the adoption of a power-preserving rotation matrix formulation of the allpass filter to ensure stability and ultimately an improved synthesis for a generative audio system.
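The stability argument can be illustrated with a minimal sketch: a 2x2 rotation preserves the norm of its state at every sample, so even a wildly time-varying rotation angle cannot make the loop energy grow. This only illustrates the power-preservation property, not the paper's actual allpass network.

```python
import math

def rotate(state, theta):
    """Apply a power-preserving 2x2 rotation to the state (its norm is unchanged)."""
    x, y = state
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

state = (1.0, 0.0)
for n in range(1000):
    theta = math.sin(0.1 * n)   # arbitrarily time-varying parameter
    state = rotate(state, theta)
print(math.hypot(*state))       # still 1.0 up to rounding error
```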
Article
Full-text available
In this paper, we present the programming of time and interaction in Antescofo, a real-time system for performance coordination between musicians and computer processes during live music performance. To this end, Antescofo relies on artificial machine listening and a domain-specific real-time programming language. It extends each paradigm through strong coupling of the two and a strong emphasis on the temporal semantics and behavior of the system. The challenge of bringing human actions into the loop of computing is strongly related to the temporal semantics of the language and the timeliness of live execution despite the heterogeneous nature of time in the two mediums. Interaction scenarios are expressed at a symbolic level through the management of musical time (i.e. events like notes or beats in relative tempi) and of 'physical' time (with relationships like succession, delay, duration, speed). Antescofo's unique features are presented through a series of paradigmatic program samples which illustrate how to manage the execution of different audio processes through time and their interactions with an external environment. The Antescofo approach has been validated through numerous uses of the system in live electronic performances of the contemporary music repertoire by various international music ensembles.
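The coexistence of musical time (beats in a relative tempo) and physical time (seconds) can be illustrated, independently of Antescofo's own language, by the conversion a scheduler must perform when the tempo is continuously re-estimated; the function below is a generic illustration, not Antescofo code.

```python
def beats_to_seconds(delay_in_beats: float, tempo_bpm: float) -> float:
    """Convert a symbolic delay (in beats) into physical time (in seconds)
    under the tempo currently estimated by the listening machine."""
    return delay_in_beats * 60.0 / tempo_bpm

# The same half-beat delay lasts longer when the player slows down.
print(beats_to_seconds(0.5, 120.0))  # 0.25 s
print(beats_to_seconds(0.5, 90.0))   # ~0.33 s
```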
Article
Pattern matching is one of the important algorithms often used in intrusion detection systems. The BM (Boyer-Moore) algorithm described in this paper is improved by exploiting its characteristically small average shift distance in single-pattern matching, yielding a novel pattern matching algorithm that increases the average shift distance. Besides calculating the shift with a new preprocessing function, the algorithm also defines a variable e during pattern matching to record part of the substrings matched in the previous matching step, so that a minimum shift can be achieved. The maximum among this shift, delta(x), the shift distance under the bad-character rule, and delta2(j), the shift distance under the good-suffix rule, is then taken to determine the next comparison position. This increases the shift to a certain degree, so that the efficiency of the pattern matching algorithm is greatly improved. The results of experimental tests show that the novel algorithm can effectively improve the efficiency of the matching process.
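For context, a minimal sketch of the baseline bad-character shift that such improvements build on; this is the classic Boyer-Moore rule only, not the variant with the extra variable e and the combined delta shifts proposed in the article.

```python
def bad_character_table(pattern):
    """Rightmost position of each symbol in the pattern (bad-character rule)."""
    return {ch: i for i, ch in enumerate(pattern)}

def bm_search(text, pattern):
    """Index of the first occurrence of `pattern` in `text`, or -1."""
    table, m, n = bad_character_table(pattern), len(pattern), len(text)
    i = 0
    while i <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1                     # compare right to left
        if j < 0:
            return i                   # full match at position i
        # shift so the mismatching text symbol aligns with its rightmost
        # occurrence in the pattern (always at least one position forward)
        i += max(1, j - table.get(text[i + j], -1))
    return -1

print(bm_search("here is a simple example", "example"))  # 17
```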
Chapter
Music is a pattern of sounds in time. A swarm is a dynamic pattern of individuals in space. The structure of a musical composition is shaped in advance of the performance, but the organization of a swarm is emergent, without pre-planning. What use, therefore, might swarms have in music?