Collaborating With An Autonomous Agent
To Generate Affective Music
FABIO MORREALE, Queen Mary University of London
ANTONELLA DE ANGELI, University of Trento
Multidisciplinary research has recently been investigating solutions to offer new experiences of music
making to musically untrained users. Our approach proposes to distribute the process of music making
between the user and an autonomous agent by encoding this collaboration in the emotional domain. In this
framework, users communicate the emotions they wish to express to Robin, the autonomous agent, which
interprets this information to generate music with matching affective flavour. Robin is taught a series of
basic compositional rules of tonal music, which are used to create original compositions in Western
classical-like music. Associations between alterations to musical factors and changes in the communicated
emotions are operationalised on the basis of recent outcomes that emerged from research in the
psychology of music. At each new bar, a number of stochastic processes determine the
values of seven musical factors, whose combinations best match the intended emotion. The ability of Robin
to validly communicate emotions was tested in an experimental study (N=33). Results indicated that
listeners correctly identified the intended emotions. Robin was employed for the purposes of two
interactive artworks, which are also discussed in the paper, showing the potential of the algorithm to be
employed in interactive installations.
CCS Concepts: • Applied computing ~ Sound and music computing • Human computer interaction ~ Auditory
feedback • Interaction design ~ User interface design
Additional Key Words and Phrases: Algorithmic composition, music and emotions, musical metacreation,
interactive installations
Musical metacreation is a branch of computational creativity that investigates the
capability of an autonomous agent to generate creative musical output on its own
[Pasquier, Eigenfeldt and Bown, 2012]. So far, researchers and practitioners working
in this area have been mainly focused on producing software that can (a) improvise
with performers playing traditional instruments, or (b) autonomously compose new
scores offline [Eigenfeldt, Bown, Pasquier and Martin, 2013]. Our intuition is to
employ metacreative software as a support tool to ease music making, thus opening
this activity to musically untrained users. Creating musical experiences accessible to
anyone is a challenge that has been increasingly tackled by multidisciplinary
research in the last couple of decades [Machover, 1996]. In this paper we propose an
algorithmic composer that allows users to control some aspects of the composition in
real time.
Autonomous agents can be defined as algorithmic solutions to create new music
with limited or absent human supervision. Autonomous agents can directly map the
user input into musical and sonic features [Franinovic and Salter, 2013].
Alternatively, the input can be arbitrarily mapped into combinations of musical
parameters and acoustic events using some kind of representation conceived by the
artist. Being an arbitrary decision, the mapping is often unclear and many artists are
likely to specifically pursue this aspect to provide their users with ambiguous
experiences. For instance, Iamascope processes visual information describing the
current status of the installation and maps it into specific pitches [Fels and Mase,
1999]. Metaphone detects bio-data from the visitors and uses this information to
modulate the frequency and the amplitude of predefined tones [Šimbelis et al., 2014].
In both cases, the mapping between sensed input and musical output is unintelligible
to the listeners, who might fail to give a meaning to music and to understand how to
control it.
Our work has addressed this challenge through design solutions aimed at
increasing the transparency of input-output, so that users can intentionally
manipulate the melody they are creating. The idea was to reconsider the process of
interactive music making as a meaningful collaboration between the human and an
autonomous agent, structuring it as an interaction based on emotions, which are
available to everybody, intuitive and naturally connected with music [Morreale et al.,
2014]. As a consequence, a metacreative software had to be developed, able to
autonomously generate a musical composition and systematically convert user input,
described in terms of emotions, into musical rules, which are in turn used to direct
the composition.
This paper presents a new type of generative system based on a dimensional
representation of emotions, which are used as an interactive metaphor to allow the
user to control the music. Specifically, the user communicates their intended
emotions to Robin, an algorithmic composer that interprets this information and
immediately reconfigures the composition, so that it mirrors the emotions conveyed
by the user [Morreale, Masu and De Angeli, 2013]. Associations between alterations
of musical parameters and changes in the communicated emotions were
operationalised following research in the psychology of music [Gabrielsson and
Lindström, 2010; Juslin and Sloboda, 2010]. Robin was manually fed a series of rules
that are used to generate original music played by virtual instruments. These rules
drive a number of stochastic processes that constantly update the values of seven
musical factors (i.e. tempo, mode, sound level, pitch register, pitch contour,
consonance, and repetitions), whose combinations best match the intended emotion.
This work falls under the domain of the recent research area of algorithmic
affective composers, i.e. autonomous systems that generate music with affective
flavour. This branch currently counts only a handful of studies [Hoeberechts and
Shantz, 2009; Legaspi et al., 2007; Livingstone et al., 2010; Oliveira and Cardoso,
2010; Wallis et al., 2011], and it is characterised by a number of shortcomings largely
ascribable to the early stage of its development. In particular, none of these systems
was systematically tested to validate its capability to communicate the intended
response in the listener. Also, the quality of the musical output still has large
improvement margins.
The contribution of the present study is threefold. First, it presents a new
autonomous agent that creates original music by collaborating in real time with the
user employing emotions as a medium. Second, it proposes a methodology to evaluate
the capability of an algorithmic affective composer to communicate the intended
emotions and to test user liking. Third, it presents two interactive systems where the
collaboration between Robin and the user is employed to create music with a specific
emotional character.
This paper is structured as follows. Section 2 reviews existing algorithmic
affective composers and the related theoretical foundations grounded in the
psychology of music and algorithmic composition. Section 3 introduces Robin,
detailing the architecture and the implementation. Section 4 describes the
experimental study aimed at testing Robin. Section 5 presents two interactive
applications of Robin: The Music Room and The TwitterRadio. The paper concludes
with reflections about the implications of this autonomous agent and discusses
possible future work.
The automatic generation of musical output with affective flavour is a
multidisciplinary subject that relates closely to the psychology of music and
algorithmic composition. While the former investigates the human perception of
music variations, eventually drawing a mapping between combinations of musical
parameters and perceived attributes, the latter studies the musicians’ capability of
composing musical scores and employs findings to automatically generate new music.
2.1 Psychology of music
Research in psychology of music has long been investigating the association between
variations of musical factors and changes in the emotional expression [Bresin and
Friberg, 2000; Bresin and Friberg, 2011; Fritz et al., 2009; Gabrielsson and
Lindström, 2010; Hevner, 1937; Meyer, 2008]. Two main approaches can be adopted
for measuring and classifying emotions: the categorical and the dimensional approach.
The categorical approach postulates that all emotions can be derived from a finite
number of monopolar factors of universal basic affects [Ekman, 1992]. This approach
was adopted by several experimental studies; yet, there is severe disagreement about the
number and the labels of the categories [Zentner and Eerola, 2010]. The
dimensional approach, on the other hand, discredits the assumption of independence,
postulating that emotions are systematically related to each other and can be
described using a limited number of dimensions. The most common dimensional
model was proposed by [Russell, 1980]. It describes emotions as a continuum along
two dimensions: valence, which refers to the pleasure vs. displeasure affective state,
and arousal, which refers to the arousal vs. sleep difference. Even though this model
is largely adopted in a wide range of research fields, its limitations were acknowledged
by the author himself [Russell, 1980]. Among other shortcomings, he noted that the
affective states in which the two dimensions are convergent (i.e. positive valence and
high arousal, and negative valence and low arousal) occur more frequently than the
affective states in which they diverge [Russell, 1980].
In the psychology of music, both approaches have been widely employed [Juslin
and Sloboda, 2010], with a predominance of the dimensional approach [Ilie and
Thompson, 2006; Juslin and Sloboda, 2010; Schubert, 1999]. A general consensus
suggests that the most expressive parameters are tempo and mode, with a slight
predominance of tempo1 [Gagnon and Peretz, 2003; Gundlach, 1935; Juslin, 1997;
Rigg, 1964]. Reporting these findings in the valence/arousal dimensions, tempo has a
major impact on arousal and a minor impact on valence, while mode only impacts on
valence (Figure 1) [Gagnon and Peretz, 2003]. Specifically, fast tempo communicates
high arousal and, to a lesser extent, positive valence, while slow tempo communicates
low arousal and, to a lesser extent, negative valence. Mode influences valence only:
major mode generally communicates positive valence and minor mode generally
communicates negative valence [Gabrielsson and Lindström, 2010]. Interestingly,
music played with diverging conditions of mode and tempo (i.e. major mode and slow
tempo, or minor mode and fast tempo) seems to communicate similar, neutral levels
of valence [Webster and Weir, 2005]. Non-musicians, in particular, cannot easily
differentiate the valence in musical pieces where valence and arousal diverge
[Morreale et al., 2013].
1 In most cases, tempo describes the number of notes per unit of time rather than simply measuring
BPM. This measure is also known as note density. For the sake of simplicity, the term tempo also refers
to note density in this paper.
Fig 1. The double effect of mode and tempo on valence and arousal.
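The double effect of mode and tempo just described can be summarised in a toy model. The weights below are hypothetical, chosen only to reproduce the qualitative pattern reported by [Webster and Weir, 2005]: converging conditions give clear valence, diverging ones cancel to a neutral level.

```python
def expected_emotion(mode, tempo):
    """Illustrative reading of Figure 1: mode shifts valence only, while tempo
    shifts arousal strongly and valence weakly. Weights are hypothetical."""
    fast = tempo == "fast"
    arousal = 1.0 if fast else -1.0                                  # major effect of tempo
    valence = (0.5 if fast else -0.5) + (0.5 if mode == "major" else -0.5)
    return valence, arousal

# Converging conditions give clear valence; diverging ones cancel to neutral.
print(expected_emotion("major", "fast"))   # (1.0, 1.0)
print(expected_emotion("major", "slow"))   # (0.0, -1.0): neutral valence
```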
In addition to tempo and mode, other musical factors have a clear influence on the
expressiveness of a composition. In particular, we wish to focus on sound level, pitch
contour, pitch register and dissonance. This subset of musical factors was selected on
the basis of their relevance for communicating emotions [Gabrielsson and Lindström,
2010] and their applicability to the architecture of Robin, which mainly operates on
structural factors (i.e. those related to the musical score itself), given the objective of
algorithmically generating new compositions. The emotional response related to
these musical factors is discussed below and summarised in Table I.
Sound level. Sound level is a continuous variable that determines the volume
of the musical output, i.e. the velocity of individual notes. It is directly
proportional to the arousal communicated to the listener [Gabrielsson and
Lindström, 2010].
Pitch contour. The emotional effect of ascending and descending melodic lines
has been widely discussed in literature [Zeiner-Henriksen, 2015], but a
general consensus on its relevance for emotional expression has not been
reached. However, a number of studies have suggested that ascending
melodies tend to be associated with positive emotions while descending
melodies are associated with negative emotions [Gabrielsson and Lindström, 2010].
Pitch register. High pitch register is associated with positive emotions (but, at
times, also fear and anger). Low pitch register is mostly associated with
sadness [Gabrielsson and Lindström, 2010].
Dissonance. [Fritz et al., 2009] suggested that consonance is universally
perceived as more positive than dissonance. Moreover, listeners’ culture and
musical training do not seem to influence the perception of consonance.
In addition to these factors, the psychological response of expectations is particularly
relevant to the emotional response to a musical piece. Related work suggests that the
emotional impact of expectation is remarkably complex [Huron, 2006]. In general,
listener expectations can be either fulfilled or frustrated. Fulfilment/frustration
affects the emotional response of the listener [Meyer, 2008]. According to this
perspective, resolution and repetitions suggest positive emotions, while lack of
resolution is indicative of negative emotions.
Table I. Mapping between musical structures and the emotional dimensions of valence and arousal

Musical factor    Valence                           Arousal
Tempo             slow: -, fast: + (minor effect)   slow: -, fast: + (major effect)
Mode              minor: -, major: +                n/a
Sound level       n/a                               soft: -, loud: +
Pitch contour     descending: -, ascending: +       n/a
Pitch register    low: -, high: +                   n/a
Dissonance        dissonant: -, consonant: +        n/a
Expectations      frustrated: -, fulfilled: +       n/a
2.2 Algorithmic composition
The algorithmic composition of original music is a creative process combining formal
compositional rules with randomness. This combination has been exploited to
compose music for centuries. Mozart's Musikalisches Würfelspiel ('Musical Dice
Game') uses the randomness associated with dice to compose a minuet. Short sections
of music are assembled by rolling dice to form a composition with 1.3 × 10^29 possible
combinations. Given these rules, the musicality of the resulting outcome relied on the
coherence of the pre-composed sections. Three of the finest musicians of the 20th century,
John Cage, Iannis Xenakis and Lejaren Hiller, engaged with a number of
compositions that explored stochastic processes for composing music [Schwartz and
Godfrey, 1993]. In the final decades of the last century, the interest in exploiting
randomness in composition resurfaced, partly due to the improved power of
computational systems. Computers have been used to develop algorithms capable of
generating unpredictable complex structures that are correct from a phraseological
perspective [Cope, 2005; Jacob, 1996; Lewis, 1999].
The next three subsections review the most common approaches to algorithmic
composition: rule-based, learning-based, and evolutionary [Todd and Werner, 1999].
For a more complete review, refer to [Roads and Strawn, 1985; Miranda, 2001].
Finally, the last subsection discusses the algorithmic affective composers, an
encounter between studies in music perception and algorithmic composition.
2.2.1 Rule-Based Approach
The rule-based approach proposes to manually or statistically define a set of
compositional rules that provide the system with information on how to compose
music autonomously [Boenn, Brain and De Vos, 2008; Henz, Lauer and
Zimmermann, 1996]. These rules drive a number of stochastic processes that
generate an original musical composition. They can be very basic, as in the previously
mentioned musical dice games by Mozart, but they can also embody complex
harmonisation rules [Todd and Werner, 1999]. The quality of the music generated
with this approach substantially depends on the quality of human intervention, i.e.
the number of taught rules [Steedman, 1984]. As a consequence, meta-composers
(those who design the algorithm) need to have a deep knowledge of music theory and
a clear sense of their compositional goals.
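As a minimal illustration of the kind of constraint a meta-composer might encode by hand, the sketch below checks two voices for parallel perfect fifths, a classic rule of tonal voice leading. The example is ours, not taken from any of the cited systems.

```python
def parallel_fifths(voice_a, voice_b):
    """Flag positions where two voices move in parallel perfect fifths.

    voice_a, voice_b: lists of MIDI pitches of equal length.
    Returns the indices at which the rule is violated.
    """
    flagged = []
    for i in range(1, len(voice_a)):
        prev = (voice_a[i - 1] - voice_b[i - 1]) % 12   # interval class before the move
        curr = (voice_a[i] - voice_b[i]) % 12           # interval class after the move
        # A fifth followed by a fifth with actual motion is a violation;
        # a repeated (static) fifth is not parallel motion.
        if prev == 7 and curr == 7 and voice_a[i] != voice_a[i - 1]:
            flagged.append(i)
    return flagged

print(parallel_fifths([72, 74], [65, 67]))  # [1]: both voices move up a step in fifths
```

A rule-based composer would call such checks while sampling candidate notes, rejecting candidates that violate any taught rule.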
2.2.2 Learning-Based Approach
The learning-based approach proposes to reduce the reliance on human skills.
Systems adopting this approach are trained with existing musical excerpts and
automatically learn compositional rules [Hiller and Isaacson, 1957]. Following this
approach, [Simon, Morris and Basu, 2008] developed MySong, a system that
automatically selects chord accompaniments given a vocal track. This study was
followed by Songsmith2, a commercial application that empowers users to compose an
entire song starting from the vocal track sung by the user. After roughly predicting
the notes in the vocal melody, the system selects the sequence of chords that best fits
the singing. A music database of 300 excerpts trained a Hidden Markov Model
(HMM) that feeds the system with basic statistics related to chord progressions.
Another system exploiting the learning-based approach is The Continuator [Pachet,
2003], ideated to provide realistic interaction with human players. The algorithm
exploits Markov models to react to musical input, and can learn and generate any
style of music. While this approach reduces the human involvement in the
algorithmic composition process, the quality of music is heavily dependent on the
training set. Also, this approach is not suitable when there is a need to have direct
control on individual musical factors.
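The core idea of the learning-based approach, inferring transition statistics from a corpus rather than hand-coding them, can be sketched as follows. This is a toy first-order Markov model over a made-up three-progression corpus, not a rendering of MySong's or The Continuator's actual models.

```python
from collections import defaultdict

def train_markov(progressions):
    """Learn first-order chord-transition probabilities from example progressions."""
    counts = defaultdict(lambda: defaultdict(int))
    for prog in progressions:
        for prev, nxt in zip(prog, prog[1:]):
            counts[prev][nxt] += 1
    # Normalise the counts of each row into probabilities.
    return {
        prev: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
        for prev, nxts in counts.items()
    }

# Hypothetical training corpus of chord progressions.
corpus = [
    ["I", "IV", "V", "I"],
    ["I", "vi", "IV", "V"],
    ["I", "IV", "I", "V"],
]
model = train_markov(corpus)
print(model["IV"])  # e.g. {'V': 0.666..., 'I': 0.333...}
```

The quality of the learned rules is bounded by the corpus, which is exactly the limitation noted above.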
2.2.3 Evolutionary Approach
Evolutionary algorithms are stochastic optimisation techniques loosely based upon
the process of evolution by natural selection. Evolutionary algorithms have been used
to generate original musical compositions [Mitchell, 1996; Miranda, 2007]. In most
cases, evolutionary compositions attempt to evolve music pieces in the style of a
particular composer or genre [Miranda, 2007]. In this approach, a population of
short, monophonic motifs evolves during the composition. Some other systems also
evolve pitch and/or rhythm sequences [Miranda, 2007]. In general, the evolutionary
approach is particularly effective in producing unpredictable, and at times chaotic,
outputs. However, the music might sound unnatural and experimental if compared
with rule-based systems, which are generally superior by virtue of the context-
sensitive nature of tonal music [Nierhaus, 2009]. Furthermore, the evolutionary
approach lacks structure in its reasoning and cannot simulate human composers’
ability to develop subtle solutions to solve compositional problems such as
harmonisation [Wiggins et al., 1998].
2.2.4 Algorithmic Affective Compositions
Over the last few years, a handful of studies have attempted to combine theory on
music and emotion with algorithmic composition in order to automatically compose
expressive music. One of the most interesting examples is AMEE, a patented rule-
based algorithm focused on generating adaptive soundtracks [Hoeberechts and
Shantz, 2009]. The algorithm generates monophonic piano melodies that can be
influenced in real time by adjusting the values of ten emotions with a web applet.
The categorical approach was also adopted by [Legaspi et al., 2007], who employed an
evolutionary approach to composition. Both systems propose interesting methods to
adaptive composition, but they employ a categorical approach to emotion
classification that fails to address the complexity of the human emotional space
(Section 2.1.1).
2 Microsoft Corporation. Microsoft Research Songsmith, 2009.
The dimensional approach can limit this problem, as explained by [Livingstone et
al., 2010], who follow a rule-based approach, manually collating a set of rules of
music theory. The system maps emotions, described along the dimensions of valence
and arousal, into structural and performative features. The user interaction is
limited to a GUI, where the user can select the desired values of valence and arousal.
A similar interface is proposed by [Wallis et al., 2011] and by [Oliveira and Cardoso,
2010] to allow users to interact with the composition. These systems have
contributed to defining a novel research topic concerning algorithmic composition of
tonal music, allowing users to alter its expressivity in real time. However, a number
of significant limitations reduce their practical applicability:
1. The actual capability of the algorithms to communicate correct emotions in the
listener has not been validated. The only attempt to determine the extent to which
the listener evaluation of valence and arousal in music corresponded to system
parameters was performed by [Wallis et al., 2011]. However, the limited number
of participants who took part in the study (11), combined with the lack of
discussion on the results, undermined the validity of the study.
2. By our estimation, the quality of the music generated by these systems seems
acceptable only for testing the possibilities of an autonomous agent to
compose expressive music, rather than being enjoyable by listeners on the basis of
its own merits. Again, a formal user study that can disprove this assertion is
missing from all of the reviewed literature.
3. In most cases, the actual interface consists of a simple applet that allows users to
select the intensity of discrete emotions, or values of valence and arousal.
This limited utilisation, combined with the low quality of the compositions, suggests
that these systems are primarily intended as pioneering explorations of a new
research field, rather than serving as fully functional systems to be used in
interactive contexts. To date, indeed, only [Oliveira and Cardoso, 2010] have
attempted to apply their algorithm to a simple interactive installation, but the
audience interaction merely consists of transforming pre-composed musical pieces
rather than creating music de novo.
Robin was designed to make the experience of musical creation accessible to all
users. The system generates a tonal composition in real time while the user interacts
with it through control strategies based on basic emotions, described in the valence
and arousal dimensions. To ensure consistency with user interaction, the system
continuously monitors input changes and adapts the music accordingly, by managing
seven musical factors (Section 2.1). As these factors need to be directly accessed and
manipulated, a rule-based approach to composition was adopted. This approach
allows the designer to manually code the compositional rules and therefore to have
full control on the musical factors of interest. As this approach largely relies on
human intervention (Section 2.2.1), the quality of the generated music depends on
the characteristics and the correctness of the taught rules. To this end, a professional
composer was continuously involved at design and testing stages.
Considering the target user population, a second requirement had to be met: the
generated music style had to be understandable even by musically untrained users.
For this reason, tonal music was adopted. As opposed to atonal and experimental
music, tonal compositions are indeed ubiquitously present in Western culture: even
those who lack musical training internalise the grammar of tonality as a result of
being exposed to it [Winner, 1982]. The process of score generation is grounded upon
a number of compositional rules of tonal music driving stochastic processes, which in
turn generate harmony, rhythm, and melody (Figure 2). The harmony module
determines the chord progression following a probabilistic approach. The selected
chord is combined with (i) a rhythmic pattern that is completed with pitches from the
scale thus generating the solo line; and (ii) an accompaniment line selector that
generates an accompaniment line. Finally, the system outputs a stream of MIDI
messages that are processed by a Digital Audio Workstation and transformed into
music. Robin is currently implemented in SuperCollider.
Fig 2. The architecture of Robin, the algorithmic affective composer.
Traditionally, harmony is examined on the basis of chord progressions and cadences.
Following previous works [Nierhaus, 2009; Steedman, 1984], the transition
probabilities between successive chords are defined as Markov processes. Chord
transition data can be collected by analysing existing music, surveying music theory,
or following personal aesthetic principles [Chai and Vercoe, 2001]. In our case, a
Markov process determines the harmonic progression as a continuous stream of
chords. The algorithm starts from a random key and then iteratively processes a
Markov matrix to compute the successive chords (Table II). The architecture of the
system supports nth order Markov chains. However, for the sake of simplicity, in the
current version of the system the choice of a chord depends only on the current
chord, i.e. a first-order Markov chain.
Table II. Transition probability matrix among the degrees of the scale.
The 10 x 10 matrix contains the transition probabilities among the degrees of the
scale. The entries are the seven degrees of the scale as triads in root position, plus
three degrees (II, IV, V) set as seventh chords. The transition probabilities are based
on the study of harmony presented by [Piston, 1941]. For each new bar, the system
analyses the transition matrix and selects the degree of the successive bar. The
probability for a degree to be selected is directly proportional to the transition value:
for instance, if VII is the current degree of the scale, the I degree will be selected as
the successive chord in 80% of cases on average, whereas the II7 degree will be
selected in 20% of cases. In addition, in order to divide the composition into
phrases, every eight3 bars the system forces the harmonic progression to a cadence
(i.e. a conclusion of a phrase or a period). Finally, in order to generate compositions
with more variability, Robin can switch between different keys by modulating towards the V and IV degrees.
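The harmonic step above can be sketched as sampling from one row of the transition matrix. Only the VII row below follows the 80/20 example given in the text; the other rows are hypothetical placeholders, not the published Piston-derived values.

```python
import random

# Illustrative rows of the 10 x 10 transition matrix. The VII row encodes the
# 80/20 example from the text; the other rows are hypothetical placeholders.
TRANSITIONS = {
    "I":   {"IV": 0.35, "V": 0.35, "vi": 0.20, "II7": 0.10},
    "V":   {"I": 0.70, "vi": 0.20, "IV": 0.10},
    "VII": {"I": 0.80, "II7": 0.20},
}

def next_degree(current, rng=random):
    """Select the next scale degree with probability proportional to its transition value."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()))[0]

random.seed(0)
# Starting from VII, degree I should follow in roughly 80% of cases.
hits = sum(next_degree("VII") == "I" for _ in range(1000))
print(hits / 1000)  # close to 0.8
```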
For each new bar, a new rhythmic pattern is selected; nearly all rhythm
combinations composed of whole, half, quarter, eighth and sixteenth notes are
available. The same combinations of notes in triplets are also available. The rhythmic
pattern is computed in three steps. First, the time signature of the bar is chosen;
second, the values of all the notes played in the bar are selected; third, the selected
note values are placed in a particular order. Different time signatures and note
values and placement are influenced by two factors, complexity and density, as
1. Time signature. The complexity factor determines the time signature: in case of
simple rhythms the time signature is duple, triple, or quadruple. By contrast,
complex rhythms have irregular time signatures.
2. Note values selection. The selection of the values of the notes is influenced by
both complexity and density. In case of simple rhythms, all the notes in the bar
have similar values. By contrast, complex rhythms permit notes with very
different values to be played in the same bar. In addition, density determines the
value of the longest note available: very dense rhythms generally have short
note values, whereas low-density rhythms are mostly composed of long note values.
3. Note values placement. The complexity factor also determines the placement of
the note values. In case of simple rhythms, notes of the same value are placed
one after another, whereas in complex rhythms notes with very different values
can be placed nearby.
Figure 3 illustrates a number of rhythmic patterns generated by Robin in 4/4 time
signature in the complexity/density dimensions. This technique results in a space of
possible solutions with a Gaussian distribution (grey area). Very complex rhythms
can occur only in combination with mid-range density, and very high (low) density
necessarily corresponds to very low (high) complexity.
3 Setting the length of a section to eight bars is an arbitrary choice made by the authors. Given the
architecture of the system, adopting a different unit or even using a different unit for each section are
feasible options.
Fig 3. The grey area represents the space of possible rhythmic solutions.
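The three-step pattern selection can be sketched as below. The thresholds, the note-value set, and the way complexity and density are numerically encoded are assumptions for illustration only; the published system's parameters may differ.

```python
import random

# Note values in quarter-note units (4 = whole note, 0.25 = sixteenth note).
NOTE_VALUES = [4, 2, 1, 0.5, 0.25]

def rhythmic_pattern(complexity, density, rng=random):
    """Sketch of the three-step rhythmic pattern selection.

    complexity, density: floats in [0, 1]; all thresholds are hypothetical.
    Returns (beats per bar, list of note values filling the bar).
    """
    # Step 1: complexity determines the time signature.
    if complexity < 0.5:
        beats = rng.choice([2, 3, 4])   # duple, triple, quadruple
    else:
        beats = rng.choice([5, 7])      # irregular signatures
    # Step 2: density bounds the longest note value available (dense = short notes);
    # low complexity restricts the bar to a single, uniform value.
    longest = NOTE_VALUES[min(int(density * len(NOTE_VALUES)), len(NOTE_VALUES) - 1)]
    allowed = [v for v in NOTE_VALUES if v <= longest]
    if complexity < 0.5:
        allowed = allowed[:1]           # simple rhythms: similar values throughout
    # Step 3: place note values until the bar is full; the final value is
    # truncated so the bar length is met exactly.
    pattern, left = [], float(beats)
    while left > 0:
        v = min(rng.choice(allowed), left)
        pattern.append(v)
        left -= v
    return beats, pattern
```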
In order to generate the melody of the solo line, the rhythmic pattern is filled with
suitable pitches. This process happens in three steps:
1. The pitch selector receives the rhythmic pattern and the current chord (Figure 4.a).
2. All the significant notes in the bar are filled with notes of the chord. The notes
regarded as significant are those whose duration is an eighth note or longer,
or that are at the first or the last place in the sequence (Figure 4.b).
3. The remaining spaces are filled with notes of the scale. Starting from the
leftmost note, when Robin meets an empty space, it checks the note on the left
and it turns it into a higher or a lower pitch, depending on the value of the
pitch contour (Figure 4.c). The pseudo-code of the algorithm follows.
Input: The rhythmic pattern and the current chord
Output: The rhythmic pattern filled with pitches
starting from the leftmost note of the bar, repeat
    if (current note value >= eighth note) do
        melody(current note) = random(note from the chord)
    else do
        if (pitch contour == ascending) do
            melody(current note) = melody(previous note).next note from the scale
        else do
            melody(current note) = melody(previous note).previous note from the scale
until (number of notes left) == 0
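The pseudo-code above can be rendered as a small runnable sketch (Python here for illustration; the actual system is implemented in SuperCollider). The scale, the chord, and the MIDI pitch numbering are assumptions for the example.

```python
import random

C_MAJOR = [0, 2, 4, 5, 7, 9, 11]   # pitch classes of the scale (assumed C major)
CHORD = [0, 4, 7]                  # current chord (assumed C major triad)

def fill_melody(durations, contour="ascending", rng=random):
    """Fill a rhythmic pattern with pitches, as in the pseudo-code.

    durations: note values in quarter-note units; 0.5 is an eighth note.
    Returns MIDI pitches around middle C (60).
    """
    melody = []
    for i, dur in enumerate(durations):
        first_or_last = i == 0 or i == len(durations) - 1
        if dur >= 0.5 or first_or_last:
            # Significant notes take a random chord tone.
            melody.append(60 + rng.choice(CHORD))
        else:
            # Other notes step up or down the scale from the previous note.
            prev = melody[-1]
            degree = C_MAJOR.index(prev % 12) if prev % 12 in C_MAJOR else 0
            step = 1 if contour == "ascending" else -1
            octave, degree = divmod(degree + step, len(C_MAJOR))
            melody.append((prev // 12 + octave) * 12 + C_MAJOR[degree])
    return melody
```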
The accompaniment line is selected at each new bar from a number of available
lines, which essentially differ in the density of the notes in the arpeggio. Each
accompaniment line defines the rhythm of the accompaniment, and its notes are
degrees of the chord.
Fig 4. Melody notes selection. a) The pitch selector receives the rhythmic pattern and the chord. b) The
relevant notes of the melody are filled with notes of the chord. c) The remaining spaces are filled with
notes of the scale to form a descending or ascending melody.
Definition of High-Level Musical Structures
As opposed to similar affective composers such as AMEE [Hoeberechts and Shantz,
2009], Robin does not allow the definition of high-level musical structures like
phrases and sections. Human composers often make wide use of high-level structures
such as phrasing and articulation, to create emotional peaks, or to develop changes in
the character of the composition. However, including such structures in a real-time algorithmic composer is not a viable solution: in order to deal with them, the system would need to know the evolution of the piece from the beginning, but this evolution cannot be predicted in advance, as it is controlled by the user. AMEE simulated high-level musical structures by introducing forced abortions in the process of music generation [Hoeberechts and Shantz, 2009]. However, this solution causes dramatic interruptions, reducing both the musical coherence and the natural evolution of the composition. For these reasons, the only high-level structural elements manipulated by Robin are repetitions of short themes (which partially simulate choruses and phrases) and cadences (which define phrases).
Operational Definition of Emotion
Seven musical factors are manipulated to induce changes in the communicated emotions, defined in terms of valence and arousal. These factors are: tempo, mode, sound level, pitch contour, pitch register, dissonance and expectations (Table I). This section discusses how alterations to each of these factors are operationalised to change the emotional response.
Tempo. Tempo is a continuous variable measured in BPM. Note density is also
manipulated by selecting rhythmic patterns and accompaniment lines with
appropriate density (Section 3.2).
Mode. The change between modes is supported in the Harmony module, where the chord transition probability matrix is populated with notes based on the selected mode.
Sound level. Sound level is changed by manipulating the MIDI velocity.
Pitch Contour. The direction of the melody is determined employing the
method described in Section 3.3.
Pitch Register. The pitch register centre of the compositions generated by
Robin ranges from C2 (lowest valence) to C5 (highest valence).
Dissonance. Dissonance is achieved by inserting a number of out-of-scale notes
in both melody and harmony.
Expectations. Fulfilment of expectations is operationalised by repeating themes and recurring patterns that the listener quickly comes to recognise as familiar. By contrast, frustration of expectations is operationalised by avoiding such repetitions.
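The chord transition probability matrix mentioned under Mode can be sketched as a first-order Markov chain over scale-degree chords. The chords and probabilities below are invented placeholders, not the values used by Robin's Harmony module.

```python
import random

# Illustrative transition probabilities between scale-degree chords.
TRANSITIONS = {
    "I":  {"IV": 0.4, "V": 0.4, "vi": 0.2},
    "IV": {"I": 0.3, "V": 0.5, "vi": 0.2},
    "V":  {"I": 0.7, "vi": 0.3},
    "vi": {"IV": 0.6, "V": 0.4},
}

def next_chord(current, rng=random):
    """Draw the next chord from the row of the transition matrix."""
    row = TRANSITIONS[current]
    return rng.choices(list(row), weights=list(row.values()))[0]

def progression(start="I", length=4, rng=random):
    """Generate a chord progression by repeated stochastic transitions."""
    chords = [start]
    for _ in range(length - 1):
        chords.append(next_chord(chords[-1], rng))
    return chords
```

Switching mode would amount to swapping in a different matrix whose chords are built from the selected scale.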
Evaluation of Robin
This section reports an experimental study aimed at testing the capability of Robin to communicate specific emotions to the listener. For the purposes of this experiment, participants listened to a number of snippets generated by Robin in different emotional conditions and self-reported the communicated levels of valence and arousal. The experiment could be declared successful if participants correctly identified the intended levels of valence and arousal.
The experimental design was a 2×2 within-subjects design with intended valence (positive vs. negative) and intended arousal (high vs. low) as factors. The tested variables were the reported valence and arousal. For each condition, we used Robin to generate five different piano snippets (30 seconds long), for a total of 20 snippets. A 3-second fade-out effect was added at the end of each snippet. No other processing was applied, nor was any generated snippet discarded. Mode, tempo, sound level, pitch contour, pitch register and expectations were manipulated in order to generate music in the four emotional conditions. All other musical parameters were kept constant, and high-level structures were not considered at this time.
All factors, except for tempo, influence either valence or arousal. Tempo, on the other
hand, has a major effect on arousal, but it also influences valence (Table I). This
secondary effect is particularly evident for non-musicians [Morreale et al., 2013]. The
double influence of tempo was operationalised as follows:
- snippets with high arousal were twice as fast as snippets with low arousal;
- snippets with high valence were 8/7 times faster than snippets with low valence.
(Control on dissonance was added to the architecture of the system at a later stage, so it was left out of the experiment. Evaluating listeners' responses to changes in dissonance is left to future work.)
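Under these two ratios, the tempo of every condition follows from a single reference tempo. A sketch, where BASE_BPM for the negative-valence/low-arousal condition is an assumed value (the paper does not report absolute tempi at this point):

```python
BASE_BPM = 60.0  # assumed reference tempo for the - - condition

def tempo(valence_positive, arousal_high):
    """Apply the stated ratios: high arousal doubles the tempo,
    positive valence multiplies it by 8/7."""
    bpm = BASE_BPM
    if arousal_high:
        bpm *= 2        # high-arousal snippets are twice as fast
    if valence_positive:
        bpm *= 8 / 7    # positive-valence snippets are 8/7 faster
    return bpm
```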
Table III shows the mapping between the six factors and the four conditions of
valence/arousal (+ + = positive valence / high arousal, + - = positive valence / low
arousal, - + = negative valence / high arousal, - - = negative valence / low arousal).
The hypotheses of the study are listed in Table IV.
Table III. The value of the six factors in the four different conditions of valence and arousal.
[Columns: + +, + -, - +, - -. Rows: Tempo (BPM), Mode, Sound level, Pitch Contour, Pitch Register, Expectations. Cell values were lost in this extraction.]
Table IV. The expected values of valence and arousal.

Intended emotion                   Reported Valence   Reported Arousal
Positive valence / High Arousal    Positive           High
Positive valence / Low Arousal     Positive           Low
Negative valence / High Arousal    Negative           High
Negative valence / Low Arousal     Negative           Low
Participants were recruited among students and staff of the University of Trento,
Italy. A total of 33 participants (11 F, average age 29) took part in the experiment.
Sessions ran in a silent room at the Department of Information Engineering and Computer Science. Participants sat in front of a monitor wearing AKG K550
headphones. Before the experiment, participants were given written instructions
about the task they had to perform. They were initially presented with four training
excerpts in order to become familiar with the interface and the task. Then, the
snippets were presented in a random order. In order to measure valence and arousal
separately, participants were asked to rate them on two semantic differential items,
from 1 (negative or relaxing) to 7 (positive or exciting). In addition, they were asked
to indicate, from 1 to 7, how much they liked each snippet (liking). To assign the
desired value of valence, arousal and liking they typed the numbers 1-7 on a
keyboard when prompted by the interface (e.g. "Please rate 1-7 arousal"). Between listenings, the computer played one of five pre-recorded 15-second sequences of random notes. Such random sequences are necessary to mask the effects of previously played music [Bharucha and Stoeckig, 1987].
Results
A two-way within-subjects ANOVA was performed separately on the reported valence, arousal and liking ratings. In each case, intended valence (positive and negative) and intended arousal (high and low) were the within-subject factors. To disambiguate between the manipulated valence and arousal (independent variables) and the measured valence and arousal (dependent variables), we refer to the first pair as intended and to the second as reported. We used a p level of .05 for all statistics and report all analyses that reach this level. The average values of reported valence, arousal and liking are illustrated in Figure 5.
Fig 5. Graphs describing the averages for the reported valence, arousal and liking in the four conditions
4.2.1 Reported Valence
The analysis showed significant main effects for intended valence [F(1,32) = 32.90,
p<.001] and for intended arousal [F(1,32) = 36.8, p<.001]. The interaction between
the two factors was not significant. As expected, the analysis of the means of the
reported valence revealed that + + scored the highest value (5.21), and - - scored the
lowest value (3.22). The double effect of tempo on arousal and valence produced side
effects: + - and - + resulted in similar neutral scores (4.26 and 3.91, respectively).
These results indicate that the manipulation of both valence and arousal contributes to defining the perception of valence, but that the two factors do not interact:
- regardless of arousal, the snippets with positive valence result in more positive values than those with negative valence;
- regardless of valence, the snippets with high arousal result in higher values than those with low arousal.
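These two main effects can be read directly off the reported cell means. A quick marginal-means check, using the valence means reported above and the + / - condition notation of the tables:

```python
# Reported mean valence per (valence, arousal) condition.
means = {("+", "+"): 5.21, ("+", "-"): 4.26, ("-", "+"): 3.91, ("-", "-"): 3.22}

def marginal(factor, level):
    """Average the cell means over one level of one factor."""
    vals = [m for (v, a), m in means.items()
            if (v if factor == "valence" else a) == level]
    return sum(vals) / len(vals)

# Both differences are positive, matching the two significant main effects.
valence_effect = marginal("valence", "+") - marginal("valence", "-")  # 1.17
arousal_effect = marginal("arousal", "+") - marginal("arousal", "-")  # 0.82
```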
4.2.2 Reported Arousal
The ANOVA showed significant main effects for intended valence [F(1,32) = 29.4,
p<.001] and for intended arousal [F(1,32) = 147.9, p<.001]. The interaction between
the two factors was also significant [F(1,32) = 12.6, p<.005]. The analysis of the
means of the reported arousal matched our expectations. Snippets composed with high arousal (+ + and - +) scored high values (5.31 and 4.95 respectively), and those with low arousal (+ - and - -) scored low values (3.52 and 2.52 respectively). These data suggest that the manipulation of both valence and arousal contributes to defining the perception of arousal, and that their interaction also has an effect, which is evident in the difference between the + - and - - conditions: snippets composed with low arousal communicate higher arousal when combined with positive valence.
4.2.3 Liking
The rating values for each snippet varied between 3.72 and 5.15, with an average of
4.38. The ANOVA revealed that intended arousal was the most significant factor
with respect to liking [F(1,32) = 8.978, p<.01]. The interaction effect of arousal and
valence was also significant [F(1,32) = 4.735, p<.05]. The favourite condition was positive valence combined with high arousal (mean 4.80, SD .88), while all other conditions produced similar values (mean 4.18, SD 1.21).
The experiment showed that listeners’ emotional responses to the music composed by
Robin met our expectations to a significant extent. The reported arousal matched the
intended arousal in all conditions. Results on reported valence are more complex. The
reported valence matched the intended valence only when the conditions converged.
In the case of diverging conditions, - + and + - showed similar, neutral averages (Table V).
Table V. Measured levels of reported valence and arousal (means).

Intended emotion                   Reported Valence   Reported Arousal
Positive valence / High Arousal    5.21               5.31
Positive valence / Low Arousal     4.26               3.52
Negative valence / High Arousal    3.91               4.95
Negative valence / Low Arousal     3.22               2.52
This finding can be explained in light of the difficulty experienced by non-musicians in distinguishing divergent emotional stimulations [Morreale et al., 2013; Webster and Weir, 2005]. A possible solution to improve the accuracy of Robin in eliciting the intended valence in these conditions would be to decrease tempo in the - + condition or to increase it in the + - condition. The new values for rebalancing tempo might follow the results of a recent study by Bresin and Friberg [2011], who suggested that happy performances are usually played almost four times faster than sad performances.
This section presents two interactive installations, The Music Room and The
TwitterRadio, in which the generated music results from a collaboration between
Robin and the visitors. The contribution to the field of interactive art lies in the
employment of an algorithmic composer that is specifically designed to communicate
predictable emotions. The input communicates to the system the emotions users
want to convey; Robin interprets this information by adapting the values of seven
musical parameters to match the desired emotional configuration. The collaboration
is mediated by the metaphor of emotions and put into practice through application-
specific metaphors. In The Music Room, the user communicates their intended
emotions via body gestures; in The TwitterRadio, emotions are inferred from textual information describing people's feelings on trending topics.
Fig 6. The Music Room
The Music Room
The Music Room (Figure 6) is an interactive installation for collaborative music
making [Morreale et al., 2014]. The installation was designed to be experienced by
couples of visitors, who can direct the emotional character of the music by means of
their movements. In order to communicate the desired emotions in an intuitive and
engaging manner, we adopted the metaphor of intimacy. Distance between people
influences valence: the more proximal the visitors are, the more positive the music.
The speed of their movements influences arousal: the faster they move, the louder
and faster the music. The process of generating music from user movements involves
two steps.
1. Participants’ movements are detected using computer vision techniques. The
motion of the couples is captured through a downward-looking bird's-eye
camera installed on the ceiling of the room. The detection of the moving
subjects has been implemented by applying a standard background
subtraction algorithm.
2. The extracted values of average speed and relative distance are communicated
to Robin. Following the mapping detailed in Section 3.5, valence and arousal
are transformed into combinations of musical factors, which determine the
change produced in the generated music. By matching the values of speed and proximity to emotions, Robin adapts the musical flow as previously described.
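The mapping in step 2 can be sketched as follows. The normalisation bounds (MAX_DISTANCE, MAX_SPEED) are assumptions about the room, not published values; only the mapping directions (closer couples mean more positive valence, faster movement means higher arousal) come from the description above.

```python
MAX_DISTANCE = 5.0   # assumed maximum distance between the two visitors, metres
MAX_SPEED = 2.0      # assumed maximum average speed, metres per second

def movement_to_emotion(distance, speed):
    """Closer couples -> more positive valence; faster movement -> higher
    arousal. Both outputs are scaled to [0, 1] before being sent to Robin."""
    valence = 1.0 - min(distance, MAX_DISTANCE) / MAX_DISTANCE
    arousal = min(speed, MAX_SPEED) / MAX_SPEED
    return valence, arousal
```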
For the purpose of increasing the diversity and liking of the composed music, different musical instruments were associated with different conditions. The piano was present in all conditions; a violin harmonised the piano voice when couples were particularly close, and a trombone harmonised the piano voice when couples were on opposite sides of the room6. This choice, which was grounded in both personal taste and related work [Eerola, Friberg and Bresin, 2013; Juslin and Sloboda, 2010], was particularly appreciated by the audience of the installation.
6 An extract from The Music Room can be viewed at
Formal evaluations of the installation are reported in previous publications, which describe visitor experiences as collected during three exhibitions of the installation [Morreale and De Angeli, 2015]. On each occasion, people queued at length to try the installation, and attendee reviews also seemed to confirm its successful reception. Integrating evidence collected through an array of evaluation techniques disclosed a number of interesting themes. Several visitors reported the feeling of being empowered to create "meaningful" music simply by means of their movements. Others stressed that they had been enabled to have control over music for the very first time in their lives. Furthermore, a quantitative analysis revealed a significant negative correlation between visitors' musical expertise and engagement, suggesting that non-musicians had a more creative experience [Morreale and De Angeli, 2015].
These results confirmed that the system is capable of offering the audience a unique experience of music making in which control over the composition is shared between the visitor and Robin. Some users argued that they would have preferred to have more control over the music, for instance by moving their limbs or fingers. We purposely decided to let users interact on a semantic level only, so as to ensure quick engagement with the installation, which might have been hindered by a more complex mapping between user gestures and musical output. However, given the modular architecture of Robin, future editions of the installation could allow users to directly interact with lower-level parameters such as rhythmic complexity and pitch.
Fig 7. The TwitterRadio
The TwitterRadio
The next case study utilises Robin as a sonification tool for interactive visualization of data. The TwitterRadio offers a novel environment for experiencing user-generated content in an auditory form [Morreale, Miniukovich and De Angeli, 2014]. The idea is to use music as a means to express data describing public opinions on trending topics. Visitors of The TwitterRadio can browse a list of trending topics and listen to the mood of the world population on those specific topics. The adopted data source is Twitter, which counted over 300 million active users (as of May 2015) who constantly
share their thoughts and feelings on personal and social issues. The system collects
all recent tweets labelled with trending hashtags and retrieves information about
their emotions and popularity. These features are then mapped in the musical
domain in order to create melodies that match the mood of the tweets. The
architecture of The TwitterRadio is composed of three main modules: the user
interface, the server and Robin.
1. The user interface resembles a retro-style radio composed of a wooden box, a colour display, a knob, and four LED lights (Figure 7). The display shows information about the list of available channels and a red bar indicating the currently playing station. The user can operate the radio by rotating the knob, whose position is digitalised by an Arduino hidden inside the box. Besides choosing existing trending topics, the user can type their favourite hashtag on a wireless keyboard. Finally, the LED lights communicate the status of the system: playing, loading or waiting.
2. The server forwards the user's request for a new station to Twitter and gathers all the tweets labelled with that particular hashtag that were posted within the previous 5 hours. The scraped messages are then processed to extract the average tweet mood, the tweet frequency, and the re-tweet percentage. Tweet mood is computed by means of the MPQA Subjectivity Lexicon [Riloff and Wiebe, 2003], which describes the polarity (positive, neutral, negative) of 8221 English words. The frequency of the tweets is defined as the number of tweets per minute. The re-tweet percentage refers to the share of re-tweets among the collected messages. This information is then forwarded to Robin, while data describing the status of the system is displayed through the LED lights.
3. Robin collects the information coming from the server and generates music
accordingly, diffusing sound through two desktop loudspeakers, which are also
hidden inside the box. Tweet mood is mapped into valence and tweet frequency
into arousal. Also, when the re-tweet percentage is above a certain threshold,
theme repetition is triggered. Resembling the functionality of traditional radios, when the bar is not perfectly aligned with the indicator of a radio channel, the auditory output is a buzzing noise.
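The server-to-Robin mapping in steps 2 and 3 can be sketched as follows. The toy polarity lexicon (a stand-in for MPQA), the frequency normalisation constant, and the re-tweet threshold are illustrative assumptions; only the mapping directions (mood to valence, frequency to arousal, re-tweet percentage to theme repetition) come from the description above.

```python
POLARITY = {"love": 1, "great": 1, "sad": -1, "awful": -1}  # stand-in for MPQA
MAX_TWEETS_PER_MIN = 100.0   # assumed normalisation bound for tweet frequency
RETWEET_THRESHOLD = 0.5      # assumed threshold that triggers theme repetition

def tweet_features(tweets, tweets_per_minute, retweet_ratio):
    """Map scraped tweet statistics to the values Robin consumes:
    valence and arousal in [0, 1], plus a theme-repetition flag."""
    words = " ".join(tweets).lower().split()
    scores = [POLARITY[w] for w in words if w in POLARITY]
    mood = sum(scores) / len(scores) if scores else 0.0       # in [-1, 1]
    valence = (mood + 1) / 2                                  # rescale to [0, 1]
    arousal = min(tweets_per_minute, MAX_TWEETS_PER_MIN) / MAX_TWEETS_PER_MIN
    repeat_theme = retweet_ratio > RETWEET_THRESHOLD
    return valence, arousal, repeat_theme
```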
The installation was showcased during two academic events held in Trento and at
the Art Museum of Rovereto7. A formal evaluation of the experience is currently
under study. However, preliminary observations suggested that the audience visibly appreciated both the aesthetics and the functionality of The TwitterRadio, and found
it particularly entertaining. Furthermore, a number of creative interpretations of the
system took place. For instance, some visitors rotated the knob in and out of a
channel to rhythmically alternate noise and music and others tried to create a song
structure by purposely switching between themes with different moods.
7 A short video demoing The TwitterRadio can be found at
The work presented in this paper provided substantial contributions to the research
field intertwining metacreative software with interactive installations. The first contribution is Robin, an algorithmic composer that generates real-time tonal compositions with affective connotations. A possible alternative to the generative approach would have been a database of pre-composed melodies with different combinations of valence and arousal. However, we maintained that the generative approach would better match our objectives for two main reasons. First, the generative approach permits continuous adaptation of different musical parameters, and therefore a nearly endless number of combinations. The combinatory approach would have required an enormous database of pre-composed melodies, which, besides requiring a huge amount of time to be created for each new interaction, would have dramatically increased the size of the software. Second, the generative approach creates a completely new and original composition, thus allowing users to create unique music.
This paper also offered a methodology to assess the capability of an algorithmic affective composer to communicate intended emotions. We validated this capability in an experimental study. This study was of primary importance in that the collaboration between the human and the autonomous agent is encoded with the metaphor of emotions; it was therefore necessary to make sure that the music created by Robin actually elicits the intended emotional flavour in listeners. A systematic validation of the mapping proposed to communicate user meanings is new to the interactive art community. In interactive artworks, mapping strategies have to be defined to transform audience behaviours into musical output. Instead of arbitrarily mapping audience behaviours onto musical parameters, we introduced an intermediate layer that mediates users' intentions through semantic descriptors, translating them into rules that Robin uses to compose matching music.
Robin also contributes to the field of automatic composition of affective music. In particular, in the authors' opinion the quality of the music generated by Robin constitutes progress with respect to the music generated in related studies. Moreover, the tunes generated by these systems do not match our personal aesthetic. We believe that this is an important issue that should be taken into primary consideration when discussing works intersecting art and research. This belief echoes the statement of Eigenfeldt, Burnett, and Pasquier [2012], who suggested that metacreative works should reflect the artistic sentiment of their designers. Aesthetics and taste indeed play a crucial role in the evaluation of such systems, which might potentially be flawless from a methodological point of view, but still unable to meet the wishes of designers and listeners.
A number of shortcomings, partly ascribable to the infancy of this field, suggest that there are wide margins for future improvement. First and foremost, the evaluation of Robin has so far been limited to validating its capability to communicate the intended emotions to the listener. Appreciation of the quality of the music, however, was only informally probed during an exhibition of The Music Room. On that occasion, visitors generally enjoyed the music [Morreale and De Angeli, 2015]. In the future, we aim to set up an experimental study with both experts and non-musicians to systematically enquire into the quality of the compositions generated by Robin.
Our investigations disclosed that naïve listeners tend to use an emotional vocabulary when describing musical pieces. In the process of simplifying music access for this category of users, then, our first objective was to allow them to control the affective flavour of the song. However, we acknowledge that musical grammar is much more complex and can by no means be reduced to an emotional grammar. Future implementations of the system will allow users to interact on other dimensions.
Currently, the system does not support high-level musical structures, as the progression of the music cannot be predicted in advance: the evolution of the piece is under the control of the user. Should high-level structures be included in the system, sudden input changes from the user would make transitions unnatural. This issue remains open to investigation.
The current implementation of Robin only deals with structural factors to induce a change in the communicated emotions. To enhance the communication of the correct emotional flavour, future implementations of the system will include performative behaviours whose variations define a change in the communicated emotions. Phrasing, for instance, has a direct effect on the communicated emotions: forward-phrasing is usually associated with sad and tender performances, whereas reverse-phrasing is usually associated with aggressive performances [Bresin and Friberg, 2011]. The real-time score generation capability of Robin can easily be combined with existing systems for the automatic modelling of the expressive content of musical scores, such as pDM [Bresin and Friberg, 2011], in which performative factors are mapped into emotions.
So far, musical metacreation systems have mainly been designed for the community of musicians, advancing solutions to autonomously improvise with performers or to autonomously generate new compositions [Eigenfeldt, Bown, Pasquier and Martin, 2013]. This paper proposed a new direction for musical metacreation by employing computational creativity to provide musically untrained users with experiences of music making. We presented a computational system that distributes the complexity of music making between the user and Robin, an autonomous agent that generates music on its own, allowing the user to interact with the composition on a semantic level. This protocol, which was employed in two interactive artworks, proved particularly effective in light of the engagement and the boost of musical creativity experienced by the users.
References
Georg Boenn, Martin Brain, and Marina De Vos. 2008. Automatic composition of melodic and harmonic
music by answer set programming. Logic Programming. Springer Berlin Heidelberg, 2008. 160-174.
Jamshed J. Bharucha, and Keiko Stoeckig. 1987. Priming of chords: spreading activation or overlapping
frequency spectra?. Perception & Psychophysics 41.6: 519-524.
Roberto Bresin, and Anders Friberg. 2000. Emotional coloring of computer-controlled music performances.
Computer Music Journal 24.4 (2000): 44-63.
Roberto Bresin, and Anders Friberg. 2011. Emotion rendering in music: range and characteristic values of
seven musical variables. Cortex 47.9 (2011): 1068-1081.
Wei Chai, and Barry Vercoe. 2001. Folk music classification using hidden Markov models. Proceedings of
International Conference on Artificial Intelligence. Vol. 6. No. 6.4.
David Cope. 2005. Computer models of musical creativity (p. xi462). Cambridge: MIT Press.
Tuomas Eerola, Anders Friberg, and Roberto Bresin. 2013. Emotional expression in music: contribution,
linearity, and additivity of primary musical cues. Frontiers in psychology 4.
Arne Eigenfeldt, Oliver Bown, Philippe Pasquier, and Aengus Martin. 2013. Towards a Taxonomy of
Musical Metacreation: Reflections on the First Musical Metacreation Weekend. Workshop on Musical
Metacreation: 4047.
Arne Eigenfeldt, Adam Burnett, and Philippe Pasquier. 2012. Evaluating musical metacreation in a live
performance context. Proceedings of the Third International Conference on Computational Creativity.
Paul Ekman. 1992. An argument for basic emotions. Cognition & Emotion 6.3-4: 169-200.
Sydney Fels and Kenji Mase. 1999, Iamascope: A graphical musical instrument. Computers & Graphics
23.2: 277-286.
Karmen Franinovic and Christopher Salter. 2013. 2 The Experience of Sonic Interaction. Sonic Interaction
Design: 39.
Thomas Fritz, et al. 2009. Universal recognition of three basic emotions in music. Current biology 19.7
(2009): 573-576.
Alf Gabrielsson, and Erik Lindström. 2010. The role of structure in the musical expression of emotions.
Handbook of music and emotion: Theory, research, applications (2010): 367-400.
Lise Gagnon, and Isabelle Peretz. 2003. Mode and tempo relative contributions to “happy-sad” judgements
in equitone melodies. Cognition & Emotion 17.1 (2003): 25-40.
Ralph H. Gundlach. 1935. Factors determining the characterization of musical phrases. The American
Journal of Psychology (1935): 624-643.
Martin Henz, Stefan Lauer, and Detlev Zimmermann. 1996. COMPOzE-intention-based music
composition through constraint programming. Tools with Artificial Intelligence, 1996., Proceedings
Eighth IEEE International Conference on. IEEE.
Kate Hevner. 1935. The affective character of the major and minor modes in music. The American Journal
of Psychology, 103-118.
Lejaren A. Hiller Jr, and Leonard M. Isaacson. 1957. Musical composition with a high speed digital
computer. Audio Engineering Society Convention 9. Audio Engineering Society.
Maia Hoeberechts, and Jeffrey Shantz. 2009. Realtime Emotional Adaptation in Automated Composition.
Audio Mostly (2009): 1-8.
David B. Huron. 2006. Sweet anticipation: Music and the psychology of expectation. MIT press, 2006.
Gabriella Ilie, and William Forde Thompson. 2006. A comparison of acoustic cues in music and speech for
three dimensions of affect. (2006): 319-330.
Bruce L. Jacob. 1996. Algorithmic composition as a model of creativity. Organised Sound 1.03 (1996): 157-
Patrik N. Juslin. 1997. Perceived emotional expression in synthesized performances of a short melody:
Capturing the listener's judgment policy. Musicae scientiae 1.2 (1997): 225-256.
Patrik N. Juslin, and John A. Sloboda. 2010. Handbook of music and emotion: Theory, research,
applications. Oxford University Press.
Roberto Legaspi, et al. 2007. Music compositional intelligence with an affective flavor. Proceedings of the
12th international conference on Intelligent user interfaces. ACM, 2007.
Steven R. Livingstone, et al. 2010. Changing musical emotion: A computational rule system for modifying
score and performance. Computer Music Journal 34.1 (2010): 41-64.
George E. Lewis. 1999. Interacting with latter-day musical automata. Contemporary Music Review 18.3:
Tod Machover. 1996. The Brain Opera and active music. Catálogo Ars Electronica.
Leonard B. Meyer. 2008. Emotion and meaning in music. University of Chicago Press.
Eduardo Miranda. 2001. Composing Music with Computers with Cdrom. Butterworth-Heinemann.
Eduardo Miranda. 2007. Evolutionary computer music. London: Springer, 2007.
Fabio Morreale, et al. 2013. The Effect of Expertise in Evaluating Emotions in Music. 3rd International
Conference on Music & Emotion.
Fabio Morreale, Raul Masu, and Antonella De Angeli. 2013. Robin: an algorithmic composer for interactive
scenarios. Proceedings of Sound and Music Computing Conference.
Fabio Morreale, et al. 2014. Collaborative creativity: The music room. Personal and Ubiquitous Computing,
18(5): 1187-1199.
Fabio Morreale, Aliaksei Miniukovich, and Antonella De Angeli. 2014. Twitterradio: translating tweets
into music. Extended Abstracts on Human Factors in Computing Systems. ACM.
Fabio Morreale, and Antonella De Angeli. 2015. Evaluating Visitor Experiences with Interactive Art.
Proceedings of the 11th Biannual Conference on Italian SIGCHI Chapter (pp. 50-57). ACM.
Gerhard Nierhaus. 2009. Algorithmic composition: paradigms of automated music generation. Springer
Science & Business Media.
António Pedro Oliveira, and Amílcar Cardoso. 2010. A musical system for emotional expression.
Knowledge-Based Systems 23.8 (2010): 901-913.
François Pachet. 2003. The continuator: Musical interaction with style. Journal of New Music Research
32.3 (2003): 333-341.
Philippe Pasquier, Arne Eigenfeldt, and Oliver Bown. 2012. Preface. Musical Metacreation: Papers from the
2012 AIIDE Workshop AAAI Technical Report WS-12-16 .(2012).
Piston, W. 1941. Harmony. W. W. Norton, Incorporated.
Ellen Riloff, and Janyce Wiebe. 2003. Learning extraction patterns for subjective expressions. Proceedings
of the 2003 conference on Empirical methods in natural language processing. Association for
Computational Linguistics.
Melvin G. Rigg. 1964. The mood effects of music: A comparison of data from four investigators. The journal
of psychology 58.2 (1964): 427-438.
Curtis Roads, and John Strawn. 1985. Foundations of computer music. Vol. 28. Cambridge, MA: Mit Press.
James A. Russell. 1980. A circumplex model of affect. Journal of Personality and Social Psychology 39.6 (1980):
Emery Schubert. 1999. Measuring emotion continuously: Validity and reliability of the two-dimensional
emotion-space. Australian Journal of Psychology 51.3 (1999): 154-165.
Elliott Schwartz, and Daniel Godfrey. 1993. Music since 1945: issues, materials, and literature. New York:
Schirmer Books.
Ian Simon, Dan Morris, and Sumit Basu. 2008. MySong: automatic accompaniment generation for vocal
melodies. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM.
Mark J. Steedman. 1984. A generative grammar for jazz chord sequences. Music Perception (1984): 52-77.
Vygandas Šimbelis et al. 2014. Metaphone: machine aesthetics meets interaction design. Proceedings of
the SIGCHI Conference on Human Factors in Computing Systems. ACM.
Peter M. Todd, and Gregory M. Werner. 1999. Frankensteinian methods for evolutionary music. Musical
networks: parallel distributed perception and performance (1999): 313.
Isaac Wallis, et al. 2011. A rule-based generative music system controlled by desired valence and arousal.
Proceedings of 8th international sound and music computing conference.
Gregory D. Webster and Catherine G. Weir. 2005. Emotional responses to music: Interactive effects of mode, texture, and tempo. Motivation and Emotion 29.1 (2005): 19-39.
Geraint Wiggins et al. 1998. Evolutionary methods for musical composition. International Journal of Computing Anticipatory Systems.
Ellen Winner. 1982. Invented worlds: The psychology of the arts. Harvard University Press.
Hans T. Zeiner-Henriksen. 2015. Emotional Experiences of Ascending Melodic Lines. Proceedings of the 11th International Symposium on CMMR.
Marcel Zentner and Tuomas Eerola. 2010. Self-report measures and models. Handbook of Music and Emotion (2010): 187-221.