Spirio Sessions: Experiments in Human-Machine
Improvisation with a Digital Player Piano
Sebastian Trump¹,², Ismael Agchar², Ilja Baumann², Franziska Braun²,
Korbinian Riedhammer², Lea Siemandel², and Martin Ullrich¹
¹ Nuremberg University of Music
² Technische Hochschule Nürnberg
sebastian.trump@hfm-nuernberg.de
Abstract. This paper presents an ongoing interdisciplinary research
project that deals with free improvisation and human-machine inter-
action, involving a digital player piano and other musical instruments.
Various technical concepts are developed by student participants in the
project and continuously evaluated in artistic performances. Our goal
is to explore methods for co-creative collaborations with artificial intel-
ligences embodied in the player piano, enabling it to act as an equal
improvisation partner for human musicians.
Keywords: Human-Machine Improvisation, Co-creativity, Player Piano
1 Introduction
Many attempts have been made in the last decades to develop interfaces which
allow computers to become actors in musical performances, ranging from direct
control as in digital musical instruments (cf. e.g. Miranda, Wanderley, & Kirk,
2006) to virtual autonomous players with a higher degree of creative agency
(Gifford et al., 2018).
The Spirio Sessions project aims to explore concepts of free improvisation
among humans and machines in different research directions by prototype devel-
opment, different combinations of software modules, and artistic evaluation. To
give the computer-generated musical material in this human-machine collabora-
tion scenario a physical presence comparable to that of other traditional musical
instruments the machine player here acts in an embodied form of a digital player
piano³ (cf. similar approaches e.g. in Brown, 2018, or the marimba-playing robot
improvisor Shimon by Hoffman & Weinberg, 2010) instead of using loudspeakers
for the actual sonic realization. Within this framing of a duo setting consisting of
the player piano controlled by an AI system and a human musician, we are aim-
ing at the exploration of various computational approaches for the interactive
generation of musical material.
³ We use a Steinway & Sons digitally-enabled Spirio-R grand piano, which gave the
project its initial working title that has been retained since then.
The theoretical-epistemic foundations of our project refer to the concept of
“musical cyborgs”, which assembles the diverse configurations of human-machine
co-creativity in the context of musical performance within the framework of crit-
ical posthumanities (Braidotti, 2016). From this perspective, the setting exam-
ined here can be described as one possible variation of more-than-human sonic
collaborations (Ullrich & Trump, in press). Therefore, the objective is not to sim-
ulate human pianism—even if distinct building blocks and processes involving
machine learning seem to point in this direction—but to establish a relational
aesthetics that encourages genuine machine artifacts and at the same time min-
imises human preselection.
This paper will briefly outline our research design and general methodological
approach, then go into more detail on each of the current research directions,
and give an outlook towards future work.
2 Research Design
The Spirio Sessions project is designed around questions of interactivity in
free musical improvisation with computational systems following Rowe’s (1993)
player paradigm and its constitutive criterion of creative agency (Bown & Mc-
Cormack, 2011). The improvisational setting around the player piano forms
a conceptual framework within which a wide-ranging spectrum of technical
approaches—music information retrieval (MIR), rule-based AI, statistical model-
ing, and neural networks—is to be prototypically explored. The research group
brings together scholars from interdisciplinary music research and computer
science, as well as graduate students from computer science, media computer
science, jazz performance, and music pedagogy.
2.1 Methods
All newly developed software elements in this project are modular in design
and intended to allow flexible combination with one another. A
Max/MSP patch serves as a hub for the individual modules, which are inte-
grated via virtual MIDI ports and Open Sound Control (OSC). The project uses
an experimental approach and therefore investigates the artistic potential and
creative capacities of different technical concepts within the given setting, rather
than searching for a single ideal solution. Many of the AI techniques studied so far have
already been used in other computational music generation projects, but often
not in interactive scenarios. Hence, the artistic research (Klein, 2018), carried
out by the participating music students, is a crucial methodological component
for the evaluation of modified software prototypes. Such elements of subjective
assessment commonly used in research on computational systems for music im-
provisation (Gifford et al., 2018, 25) are applied here in systematically recorded
sessions⁴ after each major development step and in defined parameter configu-
rations.
⁴ Demo videos of performances using software from the following research directions
are available in Trump (2021).
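To make the module integration concrete, the following is a minimal sketch of how an external software module might publish note events to the Max/MSP hub over OSC using the python-osc library; the port number and the OSC address /spirio/note are illustrative assumptions, not the project's actual configuration.

```python
# Hedged sketch: a generator module sending note events to the Max/MSP hub
# via OSC. The host/port and the address "/spirio/note" are assumptions.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 8000)  # hub assumed to listen on UDP port 8000

def send_note(pitch: int, velocity: int, duration_ms: int) -> None:
    """Publish one generated note event to the hub as an OSC message."""
    client.send_message("/spirio/note", [pitch, velocity, duration_ms])

send_note(60, 80, 250)  # middle C, medium velocity, 250 ms
```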
2.2 Research Directions
First prototype: Markov Chains, Adaptive Attention, and Arpeggios
During the preparatory phase of the research project, a first prototype operating
with simple Markov chains as elementary building blocks of machine learning
was created. This early experiment was implemented entirely in Max/MSP and
used the extension ml.lib (Smith & Garnett, 2012) to embed 2nd-order Markov
chains for modeling pitch progressions. A continuous measure of note density
influences the degree of attention to new input material from a pitch-tracked
audio signal and the addition of randomly selected arpeggios of symmetrical
interval structures (cf. Fig. 1).
Fig. 1: Process flow of the first prototype Max/MSP patch (modules: Pitch Tracker,
MIDI Data, Density Measure, Listen?, Markov Chain Model, Sampling, Arpeggio
Generator, Rhythm Generator; output: MIDI Data to Player Piano)
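As an illustration of the statistical core of this prototype, the following is a minimal sketch of a 2nd-order Markov chain over MIDI pitches; the actual prototype realizes this inside Max/MSP via ml.lib, and the toy training phrase and class names here are purely illustrative.

```python
# Sketch of a 2nd-order Markov chain over MIDI pitches (the prototype itself
# uses ml.lib inside Max/MSP; this standalone version is for illustration only).
import random
from collections import defaultdict

class PitchMarkov2:
    def __init__(self):
        # (pitch[t-2], pitch[t-1]) -> list of pitches observed to follow
        self.transitions = defaultdict(list)

    def train(self, pitches):
        for a, b, c in zip(pitches, pitches[1:], pitches[2:]):
            self.transitions[(a, b)].append(c)

    def sample(self, a, b):
        candidates = self.transitions.get((a, b))
        if not candidates:  # unseen context: fall back to any observed continuation
            candidates = [p for seq in self.transitions.values() for p in seq]
        return random.choice(candidates)

model = PitchMarkov2()
model.train([60, 62, 64, 62, 60, 67, 65, 64, 62, 60])  # toy input phrase
prev2, prev1, melody = 62, 60, []
for _ in range(8):
    nxt = model.sample(prev2, prev1)
    melody.append(nxt)
    prev2, prev1 = prev1, nxt
print(melody)
```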
HMM-based Improvisation
Building on the statistical approach of the first
prototype, the focus of this subproject is on the investigation of automatically
generated musical improvisation using Hidden Markov Models (HMMs). Extend-
ing regular Markov models, the HMM topology is defined by the number of
hidden states, the arrangement of the state transitions, and a set of possible ob-
servable emissions (Jurafsky & Martin, 2020). For musical improvisation, states
and emissions can be assigned different meanings (Marom, 1997), such as notes,
note durations, velocity, intervals, or chords (Simon, Morris, & Basu, 2008). In
HMM training, a distinction can be made as to whether training is event-triggered,
e.g., after a note is played, or time-triggered, e.g., after each quarter beat. In-
vestigated parameters affecting the training process itself are the window size,
the transition and emission probabilities, as well as the weighting for retraining.
The probabilities can be pre-trained on MIDI data, initialized with specific
distributions such as Gaussian or discrete ones, or set randomly. It can make a difference
whether the training is performed with a flat start or if retraining algorithms like
Viterbi and EM are applied (Jurafsky & Martin, 2020). For music generation,
it is possible to sample from the HMM or to make a prediction based on an
observed sequence. The number of generated samples and the sample rate will
affect the resulting melody.
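As an illustration of the sampling-based generation described above, the following is a toy discrete HMM sampler; the number of hidden states, the emission alphabet, and all probabilities are illustrative assumptions rather than parameters used in the subproject.

```python
# Toy discrete HMM sampler: hidden states emit MIDI pitches; all parameters
# are illustrative assumptions, not those investigated in the subproject.
import numpy as np

rng = np.random.default_rng(0)

pitches = [60, 62, 64, 65, 67]                    # emission alphabet (MIDI pitches)
start = np.array([0.6, 0.3, 0.1])                 # initial state distribution
trans = np.array([[0.7, 0.2, 0.1],                # state transition probabilities
                  [0.1, 0.7, 0.2],
                  [0.2, 0.2, 0.6]])
emit = np.array([[0.40, 0.30, 0.20, 0.05, 0.05],  # per-state emission probabilities
                 [0.05, 0.20, 0.40, 0.25, 0.10],
                 [0.05, 0.05, 0.20, 0.30, 0.40]])

def sample_melody(length: int):
    state = rng.choice(len(start), p=start)
    notes = []
    for _ in range(length):
        notes.append(pitches[rng.choice(len(pitches), p=emit[state])])
        state = rng.choice(len(start), p=trans[state])
    return notes

print(sample_melody(8))
```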
Fig. 2: Client-server architecture for neural network experiments (frontend: piano
and sampler receiving player input as MIDI key-down/key-up data; backend: model
and generator returning generated multi-track MIDI data via sampling and prediction)
Neural Networks for Interactive Multitrack Music Generation
As part
of a master’s thesis, neural network approaches for the generation of an inter-
active multi-instrument accompaniment with the lowest possible latency have
been tested. The system is implemented using a client-server architecture (cf.
Fig. 2) with a web interface for symbolic MIDI input. For the generation, exist-
ing models, e.g. from the Google Magenta project, were examined and adapted
where necessary. In particular, generative deep learning models such as Varia-
tional Autoencoders (VAE) (Roberts, Engel, Raffel, Hawthorne, & Eck, 2019),
Generative Adversarial Networks (GANs) (Dong, Hsiao, Yang, & Yang, 2017)
and Transformers (Huang et al., 2018) were considered. The MusicVAE model
has turned out to be the most suitable for this purpose and was adopted as the
basis for the implementation. The server provides a REST API so that other
front-end systems can be connected. After the server has generated the accom-
paniment, the data is sent to the front-end and rendered time-synchronously
into MIDI data.
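As an illustration of how an additional front-end might connect to such a REST API, the following hedged sketch posts played notes to a generation endpoint and reads back multi-track MIDI events; the server address, the endpoint path, and the JSON schema are assumptions for illustration, not the actual interface of the master's-thesis implementation.

```python
# Hedged sketch of a front-end client for the generation backend. The server
# address, the "/generate" endpoint, and the JSON schema are assumptions.
import requests

SERVER = "http://localhost:5000"   # assumed address of the running backend

def request_accompaniment(notes):
    """Send played MIDI notes, receive generated multi-track MIDI events."""
    response = requests.post(f"{SERVER}/generate", json={"notes": notes}, timeout=2.0)
    response.raise_for_status()
    return response.json()["tracks"]   # assumed response field

if __name__ == "__main__":             # requires the backend to be running
    tracks = request_accompaniment([
        {"pitch": 60, "start": 0.0, "duration": 0.5},
        {"pitch": 64, "start": 0.5, "duration": 0.5},
    ])
    print(len(tracks), "accompaniment tracks received")
```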
Fig. 4: Steps of the rhythm detector following Bello et al. (2005): (1) waveform signal;
(2) signal smoothed by convolving it with a Hann window w(n) = sin²(πn/N), with
frame index n and window length N; (3) detection signal highlighting rhythmically
important beats (energetically high, i.e. a dominant peak in the amplitude envelope),
calculated as the discrete derivative s′(n) = s(n) − s(n−1) of (2); (4) maxima in (3)
exceeding a certain threshold, marked as detected beats.
Rhythm Detector
In order to improve rhythmic synchronization and entrain-
ment (Clayton, Sager, & Will, 2005), a custom implementation for a beat detec-
tor (cf. Fig. 4) was developed as an additional subproject. We found the window
length N of the Hann window and the peak threshold to be most crucial for achieving
good detection results, both depending heavily on the input instrument. To enable
real-time processing, the input audio signal must be processed in suitable
chunks and analyzed in parallel on multiple threads. The detected beats are then
sent out as OSC messages.
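The following sketch reproduces the described detection chain (envelope smoothing with a Hann window, discrete differentiation, threshold-based peak picking) in plain NumPy; the window length and threshold are placeholders that, as noted above, would need to be tuned per input instrument, and the OSC output stage is omitted.

```python
# Sketch of the detection chain from Fig. 4: Hann-window smoothing of the
# amplitude envelope, discrete derivative, threshold-based peak picking.
# Window length and threshold are placeholders to be tuned per instrument.
import numpy as np

def detect_beats(signal, sr, win_len=2048, threshold=1e-4):
    env = np.abs(signal)                                        # amplitude envelope
    hann = np.sin(np.pi * np.arange(win_len) / win_len) ** 2    # w(n) = sin^2(pi*n/N)
    smooth = np.convolve(env, hann / hann.sum(), mode="same")   # smoothed envelope
    diff = np.diff(smooth, prepend=smooth[0])                   # s'(n) = s(n) - s(n-1)
    beats = []
    for n in range(1, len(diff) - 1):                           # local maxima above threshold
        if diff[n] > threshold and diff[n] >= diff[n - 1] and diff[n] > diff[n + 1]:
            beats.append(n / sr)                                # beat time in seconds
    return beats

sr = 22050
test = np.zeros(sr)
test[2000:2200] = 1.0                                           # two synthetic "attacks"
test[12000:12200] = 1.0
print(detect_beats(test, sr, threshold=1e-5))
```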
Sequence-to-Sequence Neural Networks
The goal of this upcoming subpro-
ject is to use sequence-to-sequence neural networks (S2SNN) to model one part
of the interacting musical duo. As a typical example across musical epochs, it will
focus on a duo of a melody instrument (woodwind) and an accompanying instrument (piano),
comparing the very structured baroque basso continuo setup with free improvi-
sation. The central question is: provided symbolic input/output (e.g., MIDI), can
an S2SNN generate an accompaniment for a melody and vice versa? Related ques-
tions are: how much context is needed, how can the model anticipate the other
player, how to achieve rhythmic synchronization? The models will be trained
on prerecorded duo performances, potentially leveraging the full context. The
test scenario however will be stream-based, i.e. the model may store history but
cannot look ahead. This work will focus on symbolic data (e.g., MIDI), which can
easily be discretized (input) or synthesized (output) via the player piano.
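As a starting point for this upcoming subproject, the following is a minimal encoder-decoder sketch (GRU-based, in PyTorch) over MIDI pitch tokens for the melody-to-accompaniment direction; the vocabulary, model dimensions, and toy tensors are illustrative assumptions and do not reflect a trained system.

```python
# Minimal seq2seq sketch: a GRU encoder summarizes the melody, a GRU decoder
# predicts accompaniment tokens. All sizes and data here are illustrative.
import torch
import torch.nn as nn

VOCAB = 128          # MIDI pitch tokens
EMB, HID = 64, 128

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, melody, accomp_in):
        _, h = self.encoder(self.embed(melody))          # summarize melody context
        dec_out, _ = self.decoder(self.embed(accomp_in), h)
        return self.out(dec_out)                         # logits over next accompaniment token

model = Seq2Seq()
melody = torch.randint(0, VOCAB, (1, 16))                # toy melody fragment
accomp = torch.randint(0, VOCAB, (1, 16))                # toy target accompaniment
logits = model(melody, accomp[:, :-1])                   # teacher-forced decoding
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), accomp[:, 1:].reshape(-1))
print(loss.item())
```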
3 Future Work
We can already see that the creative potential for our systems lies less in isolated
software elements than in their intelligent combination and the choice of appro-
priate parameters. In this sense, each new Spirio Sessions subproject expands
the field of possibilities in several new directions. One follow-up line of research during
the next phase of the project will dive further into the idea of rhythm-like, rule-
driven music generation techniques. For this, probabilistic and transformational
grammars will be explored in a linguistic approach to the process of music gener-
ation (Keller & Morrison, 2007; Putman & Keller, 2015). Another promising
approach will address the specifics of the piano pedal and examine modeling
techniques for this purpose. In addition, the comparison of the different con-
ceptual designs should also contribute to the desideratum of clearly specified
evaluation methods (Gifford et al., 2018, 32) for such systems. The artistic re-
search in music will extend from the dyadic interaction between human soloist
and machine to more complex collective settings.
4 Acknowledgements
This work was supported by LEONARDO – Center for Creativity and Innova-
tion, which is funded by the German Federal and States initiative “Innovative
Hochschule”.
References
Bello, J., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M.
(2005). A tutorial on onset detection in music signals. IEEE Transactions
on Speech and Audio Processing, 13(5).
Bown, O., & McCormack, J. (2011). Creative Agency: A Clearer Goal for
Artificial Life in the Arts. In G. Kampis, I. Karsai, & E. Szathmáry (Eds.),
Advances in Artificial Life. Darwin Meets von Neumann (pp. 254–261).
Berlin, Heidelberg: Springer.
Braidotti, R. (2016). The Critical Posthumanities; or, Is Medianatures to
Naturecultures as Zoe Is to Bios? Cultural Politics, 12(3), 380–390.
Brown, A. R. (2018). Creative improvisation with a reflexive musical bot. Digital
Creativity, 29(1), 5–18.
Clayton, M., Sager, R., & Will, U. (2005). In time with the music: The con-
cept of entrainment and its significance for ethnomusicology. In European
Meetings in Ethnomusicology (Vol. 11, pp. 1–82). Romanian Society for
Ethnomusicology.
Dong, H.-W., Hsiao, W.-Y., Yang, L.-C., & Yang, Y.-H. (2017). MuseGAN:
Multi-track sequential generative adversarial networks for symbolic music
generation and accompaniment. arXiv:1709.06298.
Gifford, T., Knotts, S., McCormack, J., Kalonaris, S., Yee-King, M., & d’Inverno,
M. (2018). Computational systems for music improvisation. Digital Cre-
ativity, 29, 19–36.
Hoffman, G., & Weinberg, G. (2010). Shimon: An interactive improvisational
robotic marimba player. In CHI ’10 Extended Abstracts on Human Factors
in Computing Systems (pp. 3097–3102). New York, NY, USA: Association
for Computing Machinery.
Huang, C.-Z. A., Vaswani, A., Uszkoreit, J., Shazeer, N., Simon, I., Hawthorne,
C., . . . Eck, D. (2018). Music Transformer. arXiv:1809.04281 .
Jurafsky, D., & Martin, J. H. (2020). Speech and language processing. Prentice
Hall.
Keller, R., & Morrison, D. (2007). A grammatical approach to automatic impro-
visation. Proceedings of the 4th Sound and Music Computing Conference,
SMC 2007 .
Klein, J. (2018). The Mode is the Method - or How Research Can Become
Artistic. Artistic Research - Is There Some Method? Retrieved from
https://www.academia.edu/36239994/The_Mode_is_the_Method_-_or_How_Research_Can_Become_Artistic
Marom, Y. (1997). Improvising Jazz with Markov Chains (Unpublished doctoral
dissertation). The University of Western Australia.
Miranda, E. R., Wanderley, M. M., & Kirk, R. (2006). New digital musical
instruments: Control and interaction beyond the keyboard. A-R Editions.
Putman, A., & Keller, R. (2015). A transformational grammar framework for
improvisation. In First International Conference on New Music Concepts.
Roberts, A., Engel, J., Raffel, C., Hawthorne, C., & Eck, D. (2019). A hi-
erarchical latent vector model for learning long-term structure in music.
arXiv:1803.05428 .
Rowe, R. (1993). Interactive music systems: Machine listening and composing.
Cambridge, Mass.: MIT Press.
Simon, I., Morris, D., & Basu, S. (2008). MySong: Automatic accompaniment
generation for vocal melodies. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems (pp. 725–734). New York, NY,
USA: Association for Computing Machinery.
Smith, B. D., & Garnett, G. E. (2012). Unsupervised play: Machine learning
toolkit for Max. In G. Essl, R. B. Gillespie, M. Gurevich, & S. O’Modhrain
(Eds.), 12th International Conference on New Interfaces for Musical Expres-
sion, NIME 2012, Ann Arbor, Michigan. nime.org.
Trump, S. (2021). Spirio Sessions Demo Videos. Zenodo. Retrieved from
https://doi.org/10.5281/zenodo.4635617
Ullrich, M., & Trump, S. (in press). Sonic Collaborations between Humans,
Nonhuman Animals and Artificial Intelligences: Contemporary and Future
Aesthetics in More-Than-Human Worlds. Organised Sound, 28(1).
... In "Spirio Sessions" project (Trump et al., 2021), we combine and test different generative approaches in an interactive setting as a duo improvisation between a digital player piano and a melody instrument. Specific criteria and guidelines for evaluating the systems involved in such interactive scenarios are the subject of further research in this project. ...
Preprint
Full-text available
A Note to the Reader This report largely written back in Spring 2021, but unfortunately never submitted to peer review. In the meantime, two extensive reviews of similar nature have been published by Civit et al. (peer reviewed) and Zhao et al. (arxiv). We believe that this manuscript still adds value to the scientific community, since it focuses on music generation in an interactive, hence real-time, scenario, and how such systems can be evaluated. 2 Abstract In recent years, machine learning, and in particular generative adversarial neural networks (GANs) and attention-based neural networks (transformers), have been successfully used to compose and generate music, both melodies and polyphonic pieces. Current research focuses foremost on style replication (e.g., generating a Bach-style chorale) or style transfer (e.g., classical to jazz) based on large amounts of recorded or transcribed music, which in turn also allows for fairly straightforward "performance" evaluation. However, most of these models are not suitable for human-machine co-creation through live interaction, neither is clear, how such models and resulting creations would be evaluated. This article presents a thorough review of music representation, feature analysis, heuristic algorithms, statistical and parametric modelling, and human and automatic evaluation measures, along with a discussion of which approaches and models seem most suitable for live interaction.
Conference Paper
Full-text available
Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, and thereby introducing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in the underlying assumptions and accordingly the network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intratrack and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e. without human inputs). We also extend our models to human-AI cooperative music generation: given a specific track composed by human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/.
Article
Full-text available
This paper discusses improvisatory musical interactions between a musician and a machine. The focus is on duet performances, in which a human pianist and the Controlling Interactive Music (CIM) software system both perform on mechanized pianos. It also discusses improvisatory behaviours, using reflexive strategies in machines, and describes interfaces for musical communication and control between human and machine performers. Results are derived from trials with six expert improvising musicians using CIM. Analysis reveals that creative partnerships are fostered by several factors. The reflexive generative system provides aesthetic cohesion by ensuring that generated material has a direct relationship to that played by the musician. The interaction design relies on musical communication through performance as the primary mechanism for feedback and control. It can be shown that his approach to musical human-machine improvisation allows technical concerns to fall away from the musician's awareness and attention to shift to the musical dialogue within the duet.
Article
Full-text available
This article situates the geological turn in media theory within the critical posthumanities, defining them in both quantitative and qualitative terms. They can be assessed quantitatively by reviewing the proliferation of interdisciplinary “studies” areas — such as media and gender studies — that have transformed the modes of knowledge production within the academic humanities and beyond. They are framed qualitatively by the neomaterialist, vital philosophy proposed by Gilles Deleuze’s Spinozism, based on the concepts of monism, radical immanence, and relational ontology. They not only support the idea of a nature-culture continuum but also provide the philosophical grounding for technological mediation to be defined not as a form of representation but as the expression of “medianaturecultural” ethical relations and forces.
Conference Paper
Full-text available
Jazz improvisations can be constructed from common idioms woven over a chord progression fabric. Prior art has shown that probabilistic generative grammars are one effective means of achieving such improvisations. Here we introduce another approach using transformational grammars instead. One advantage that transformational grammars provide is a form of steering from an underlying melodic outline. We demonstrate by showing how idioms can be defined in a transformational grammar and how the placement of idioms conforms to the outline and chord structure. We illustrate how transformational grammars can provide unique and varied improvisations that are suggestive of the outline. We illustrate the application of this approach in an educational software tool.
Article
Computational music systems that afford improvised creative interaction in real time are often designed for a specific improviser and performance style. As such the field is diverse, fragmented and lacks a coherent framework. Through analysis of examples in the field, we identify key areas of concern in the design of new systems, which we use as categories in the construction of a taxonomy. From our broad overview of the field, we select significant examples to analyse in greater depth. This analysis serves to derive principles that may aid designers scaffold their work on existing innovation. We explore successful evaluation techniques from other fields and describe how they may be applied to iterative design processes for improvisational systems. We hope that by developing a more coherent design and evaluation process, we can support the next generation of improvisational music systems.
Conference Paper
We introduce MySong, a system that automatically chooses chords to accompany a vocal melody. A user with no musical experience can create a song with instrumental accompaniment just by singing into a microphone, and can experiment with different styles and chord patterns using interactions designed to be intuitive to non-musicians. We describe the implementation of MySong, which trains a Hidden Markov Model using a music database and uses that model to select chords for new melodies. Model parameters are intuitively exposed to the user. We present results from a study demonstrating that chords assigned to melodies using MySong and chords assigned manually by musicians receive similar subjective ratings. We then present results from a second study showing that thirteen users with no background in music theory are able to rapidly create musical accompaniments using MySong, and that these accompaniments are rated positively by evaluators.
Conference Paper
Shimon is an autonomous marimba-playing robot designed to create interactions with human players that lead to novel musical outcomes. The robot combines music perception, interaction, and improvisation with the capacity to produce melodic and harmonic acoustic responses through choreographic gestures. We developed an anticipatory action framework, and a gesture-based behavior system, allowing the robot to play improvised Jazz with humans in synchrony, fluently, and without delay. In addition, we built an expressive non-humanoid head for musical social communication. This paper describes our system, used in a performance and demonstration at the CHI 2010 Media Showcase.
Conference Paper
One of the goals of artificial life in the arts is to develop systems that exhibit creativity. We argue that creativity {it per se} is a confusing goal for artificial life systems because of the complexity of the relationship between the system, its designers and users, and the creative domain. We analyse this confusion in terms of factors affecting individual human motivation in the arts, and the methods used to measure the success of artificial creative systems. We argue that an attempt to understand emph{creative agency} as a common thread in nature, human culture, human individuals and computational systems is a necessary step towards a better understanding of computational creativity. We define creative agency with respect to existing theories of creativity and consider human creative agency in terms of human evolution. We then propose how creative agency can be used to analyse the creativity of computational systems in artistic domains. @InProceedings{bown_et_al:DSP:2009:2216, author = {Oliver Bown and Jon McCormack}, title = {Creative Agency: A Clearer Goal for Artificial Life in the Arts}, booktitle = {Computational Creativity: An Interdisciplinary Approach}, year = {2009}, editor = {Margaret Boden and Mark D'Inverno and Jon McCormack}, number = {09291}, series = {Dagstuhl Seminar Proceedings}, ISSN = {1862-4405}, publisher = {Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany}, address = {Dagstuhl, Germany}, URL = {http://drops.dagstuhl.de/opus/volltexte/2009/2216}, annote = {Keywords: Creativity, agency} }