Sound recycling from public databases
Another BigData approach to sound collections
Hernán Ordiales
RedPanal.org
MediaLab CASo
Riobamba 985
Buenos Aires C1116ABC, Argentina
h@ordia.com.ar

Matías Lennie Bruno
RedPanal.org
MediaLab CASo
Riobamba 985
Buenos Aires C1116ABC, Argentina
matias.lennie@casadelbicentenario.gob.ar
ABSTRACT
Discovering new sounds in large databases or on the Internet is a tedious task. Standard search tools and manual exploration fail to manage the amount of information now available. This paper presents a new approach to the problem which takes advantage of mature technologies like Big Data and Machine Learning, keeping compositional concepts in mind and focusing on artistic performances. Among the several distributed systems useful for music experimentation, a new workflow is proposed based on analysis techniques from Music Information Retrieval (MIR) combined with massive online databases, dynamic user interfaces, physical controllers and real-time synthesis. It is built on Free Software tools and standard communication protocols to classify, cluster and segment sound. The control architecture allows multiple clients to request the API services concurrently, enabling collaborative work. The resulting system can retrieve well-defined or pseudo-aleatory audio samples from the web, mix and transform them in real time during a live-coding performance, play like another instrument in a band or with a solo artist combined with visual feedback, or work alone as an automated multimedia installation.
CCS CONCEPTS
• Applied computing → Performing arts; Sound and music computing; • Information systems → Multimedia databases; Digital libraries and archives; Collaborative search; RESTful web services; Clustering; • Computing methodologies → Classification and regression trees; Markov decision processes;

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
AM ’17, London, United Kingdom
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. 978-1-4503-5373-1/17/08...$15.00
DOI: 10.1145/3123514.3123550
KEYWORDS
Music Information Retrieval, BigData, Audio discovery, Performance, Experimental, Live coding, Collaborative, NetMusic, Realtime, User Interface, Machine Learning

ACM Reference format:
Hernán Ordiales and Matías Lennie Bruno. 2017. Sound recycling from public databases. In Proceedings of AM ’17, London, United Kingdom, August 23–26, 2017, 8 pages.
DOI: 10.1145/3123514.3123550
1 INTRODUCTION
Nowadays many musicians use the same tools, both for composition and for live performance: the same acoustic or electronic instruments, the same computer software for synthesis or real-time processing, even the same workflows or effects chains. This leads to similar results, as Curtis Roads notes in the opening of one of his books: "Each tool opens up aesthetic possibilities but also imposes aesthetic constraints" [25]. On the other hand, in the technical domain (not music related), people are realizing the potential of Big Data and Analytics, an indirect consequence of which is easy access to the IT resources that enable them.

In the music domain, that can be reflected in new instruments based on the web or local networks, using new sounds discovered in large online databases, previously classified either by Music Information Retrieval (MIR) techniques or by clustering algorithms. The synthesis process could also be rethought using already available live-coding tools, some of them usually reserved for a small group of highly qualified users. The Internet and the incredible amount of information it offers also allow new collaboration processes and composition techniques to arise. Even the live-coding environment can be improved by adding dynamic graphical user interfaces to control the whole process or part of it (based on usage or public participation).
2 RELATED WORK
Although most of the topics covered by this work have seen many publications in the past decade, most related research addresses either MIR, Big Data or music performance; little existing work combines the three areas. One approach from the Music Technology Group (MTG) [26] identifies loops in unstructured audio and presents an instrument prototype, but it does not focus on the performer experience, working only on Freesound.org web sounds and its public (but closed, in the open-source sense) API [1]. More focused on Music Information Retrieval or Audio Content Analysis [15], there are impressive works oriented to classifying or organizing large audio databases [2][23] or with pedagogical intentions [30], but they have not been applied to professional performance. There is also work regarding network architectures and collaborative development [17][8], proposing frameworks for algorithmic composition or performance using digital musical instruments, and about exploring musical collections [9] through graphical views, allowing navigation of the visual representations and listening to the samples or audio extracts. Regarding sonification using online datasets, there is the ATLAS project, which translates CERN data into sound, more precisely mapping it as notes and rhythms [13].

This work also explores and proposes adapting user interfaces to control sound in real time.
3 WHY SOUND RECYCLING?
The amount of recorded audio currently online is impossible to hear and manage in any reasonable time. Unquestionably, resource availability is evolving faster than musical tools and workflows. The lack of prior classification and of easy access to those sounds through uncomplicated, intuitive interfaces leaves open a world of possibilities and enables collaborative developments. For example, there are nowadays about a billion Creative Commons [10] (CC) licensed works (of all kinds) available online (see Figure 1). Audio-related content beyond music pieces, such as samples, live and speech recordings, is hosted on online platforms like the Internet Archive, SoundCloud, Freesound.org, Free Music Archive (FMA), Redpanal.org, ccMixter and others. Audio tracks reach 4 million; videos and other multimedia are close to 43 million [21]. Initiatives like Europeana aggregate content from national libraries and institutions, shared under CC licenses or in the public domain [11].

In this context, developing new ways of live collaboration with musical goals and exploiting the available resources is mandatory in order to take advantage of them and achieve new results.
Figure 1: Growth of Creative Commons licensed
works
4 DEFINING A WORKFLOW FOR LIVE PERFORMANCES
A series of guidelines were defined to model the whole process:
- Use the availability of technologies from Big Data and Analytics, especially those related to Music Information Retrieval
- Modular design
- Enable collaboration over a private or public network
- Free software components
- Cross-platform (at least Windows, Mac, Linux and ARM OS)
- Live-coding oriented, for live performances and experimental music composition
- Include physical controllers
- Dynamic user interface
- Use of well-known standards and open protocols (MIDI, OSC)
5 ARCHITECTURE DESIGN
A distributed system, including an online database, a public REST API, controllers and synthesis engine tools, is proposed as a solution (see Figure 2). This workflow can be useful for offline work but is also effective for live performances or multimedia installations. A primary goal of a distributed system is to appear to its users as a single computer, regardless of whether it is composed of a network of independent computers or not: the end user must see a single coherent entity [28].
Figure 2: Live performance workflow

In the proposed system, sounds are retrieved from a local database, the network or the Internet (the Cloud) using a standard REST API. When and how to retrieve them can be triggered from various controllers (digital or physical) over the local network, which also control how the synthesis process is performed, modifying it in real time, possibly concurrently, since this architecture allows many clients. Under this architecture, there is the option to be deterministic or stochastic. Stochastic procedures differ from deterministic ones in that they integrate random choice into the decision-making process [25]. Under this scheme, every time a sample is retrieved there are many candidates (or neighbors in a cluster), and if one of them is chosen using a random function (directly or indirectly) the system is defined as stochastic. That leaves the choice of decision-making style to the composer.
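The candidate-selection step described above can be sketched in a few lines; the neighbour list and its `distance` field are illustrative stand-ins, not part of the actual APICultor data model:

```python
import random

def pick_sample(candidates, stochastic=True, rng=random):
    """Deterministic mode always returns the nearest neighbour;
    stochastic mode draws one candidate at random."""
    ranked = sorted(candidates, key=lambda c: c["distance"])
    return rng.choice(ranked) if stochastic else ranked[0]

# Hypothetical neighbours of a cluster centre
neighbours = [{"id": 401, "distance": 0.02},
              {"id": 87, "distance": 0.11},
              {"id": 19, "distance": 0.35}]
print(pick_sample(neighbours, stochastic=False)["id"])  # always 401
```

Swapping a single flag thus turns the same retrieval call from deterministic to stochastic, which is exactly the choice the architecture leaves to the composer.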
6 IMPLEMENTATION OF THE CLOUD INSTRUMENT
All the software, web services, samples and documentation built were named APICultor (standing for the Application Programming Interface of the Culture). All the tools are publicly available online¹. This work makes extensive use of techniques from Music Information Retrieval or Audio Content Analysis (ACA) [15] to extract audio features or descriptors and apply them for musical purposes. It started as a dedicated piece of software and evolved into a framework for live music performances. Built in Python (a well-known general purpose language), it includes sound pre-processing and MIR analysis using the Essentia [4] and LibRosa [19] libraries. In the RedPanal.org case, the sounds were previously processed, classified, clustered and segmented using the Essentia [4] and aubio [5] libraries to perform MIR analysis. An offline demo database was also built, including a mock of an HTTP API for testing purposes. Engine tools like SuperCollider [18] or Python plus the Pyo [3] module are proposed as a solution, although any software which performs real-time processing and can receive MIDI or OSC messages could be used.

¹ Another Big Data approach to sound collections. Available at https://github.com/sonidosmutantes/apicultor.
MIR Descriptors and sound textures
Defining a sound texture is not an easy task, but many agree it should be treated as a low-level phenomenon [27]. Simple sound elements are called atoms and can be classified as a low-level property, while their distribution and arrangement are a high-level one. There is some consensus that a sound texture should exhibit similar characteristics over time.

Audio features or descriptors extracted from the raw input after the pertinent analysis can be thought of at three levels or data representations: low-level descriptors like the Cepstrum, spectral flux (amount of change of spectral shape), and HFC and LFC (high and low frequency content); mid-level representations such as pitch, onsets or beats; and high-level ones like music style, artist, mood, etc. Another classification separates them into spectral, temporal, tonal or rhythm descriptors, and there is no limitation on building new ones to describe new features [12].
Take for example the Spectral Centroid, which shows the balancing point of the spectrum (in other words, its center of gravity). It determines the frequency area around which most of the signal energy concentrates, and in the literature it is often correlated with the timbre dimensions of brightness or sharpness [15]. It is defined as the frequency-weighted sum of the power spectrum normalized by its unweighted sum:

Centroid(n) = \frac{\sum_{k=0}^{K/2-1} k \, |X(k,n)|^2}{\sum_{k=0}^{K/2-1} |X(k,n)|^2}

with X(k, n) representing the Short Time Fourier Transform (STFT) of frame n and k running over the frequency bins (block length equal to K). The resulting value can be converted to a parameter in the 0..1 range by dividing by (K/2 - 1), or to a frequency value using Centroid(n) \cdot F_s / N, with F_s the sampling rate and N the number of FFT points used. Tweaking this value could be one of the many approaches to obtaining different sounds with diverse textures.
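As an illustration of the formula above, a minimal NumPy sketch (not the Essentia implementation used by APICultor) computes the per-frame centroid over the power spectrum and converts it from bin index to Hz:

```python
import numpy as np

def spectral_centroid(x, frame_size=1024, hop=512, fs=44100):
    """Per-frame spectral centroid: power-spectrum-weighted mean
    of the bin index, converted to Hz via bin_width = fs / K."""
    window = np.hanning(frame_size)
    centroids = []
    for start in range(0, len(x) - frame_size + 1, hop):
        frame = x[start:start + frame_size] * window
        X = np.fft.rfft(frame)[: frame_size // 2]  # bins 0 .. K/2 - 1
        power = np.abs(X) ** 2
        k = np.arange(frame_size // 2)
        denom = power.sum()
        c = (k * power).sum() / denom if denom > 0 else 0.0
        centroids.append(c * fs / frame_size)  # bin index -> Hz
    return np.array(centroids)

# Sanity check: a pure tone's centroid sits near the tone's frequency
t = np.arange(44100) / 44100.0
tone = np.sin(2 * np.pi * 1000 * t)
print(spectral_centroid(tone).mean())  # ~ 1000 Hz
```

Dividing `c` by (frame_size // 2 - 1) instead of converting to Hz would give the normalized 0..1 parameter mentioned in the text.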
A more elaborate use consists in defining or configuring several MIR descriptor values in order to find a sound that matches all of them. Even "x, y" diagrams comparing two descriptors can be useful and interesting for finding new sounds, as in Figure 3, where guitar sounds are grouped according to those descriptors and the numbers represent the ID of each sound. Later in Section 6 this idea is developed and a user interface with "x, y widgets" or controls is shown (Figure 8).

Figure 3: Mean dissonance value vs. high frequency content mean
Early experiments
State machine (Markov process). One of the first experiments included an automated music machine. A Markov model [22] was used, and for each state a sound with different MIR descriptors was defined: each "MIR state" in the sense of a series of values for each descriptor (for example, mean values for Inharmonicity or low frequency content), plus the transition probabilities associated with each state (see Figure 4). This scheme can be thought of as a stochastic composition using web sounds as the source, useful for multimedia installations.

Figure 4: Simple MIR sound state machine

As Figure 5 shows, it is very simple to "compose" or define transitions in plain text. With another JSON file defining the MIR states, giving values for each proposed descriptor, all that is needed as a "score" for the system is fulfilled.
{"statesArray": [
    {"id": 0, "text": "harmonic"},
    {"id": 1, "text": "inharmonic"},
    {"id": 2, "text": "idle"}
],
"linkArray": [
    {"from": 0, "to": 0, "text": 0.7},
    {"from": 0, "to": 1, "text": 0.1},
    {"from": 1, "to": 0, "text": 0.1},
    {"from": 1, "to": 1, "text": 0},
    {"from": 1, "to": 2, "text": 0.9},
    {"from": 2, "to": 1, "text": 0.1},
    {"from": 0, "to": 2, "text": 0.2},
    {"from": 2, "to": 0, "text": 0.4},
    {"from": 2, "to": 2, "text": 0.5}
],
"statesArray": [
    {"id": 0, "mir": [{
        "content": "hang",
        "sfx.duration": "TO 3",
        "sfx.inharmonicity.mean": 0.1}]},
    {"id": 1, "mir": [{
        "sfx.duration": 7.2,
        "lowlevel.hfc.mean": "TO 0.0005",
        "lowlevel.spectral_complexity.mean": 1}]}
]
}

Figure 5: JSON definition of the state machine
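A minimal driver for such a score might walk the transition table as follows; the probabilities are transcribed from the Figure 5 JSON, while the function and table names are illustrative:

```python
import random

# Transition probabilities from the Figure 5 JSON
# (states: 0 = harmonic, 1 = inharmonic, 2 = idle)
TRANSITIONS = {
    0: [(0, 0.7), (1, 0.1), (2, 0.2)],
    1: [(0, 0.1), (1, 0.0), (2, 0.9)],
    2: [(0, 0.4), (1, 0.1), (2, 0.5)],
}

def walk(start=0, steps=8, rng=random):
    """Random walk over the MIR state machine; each visited state
    would trigger retrieval of a sound matching its descriptors."""
    state, path = start, [start]
    for _ in range(steps):
        targets, probs = zip(*TRANSITIONS[state])
        state = rng.choices(targets, weights=probs)[0]
        path.append(state)
    return path

print(walk(steps=10))
```

In the actual installation, entering a state would map to an API query for a sound matching that state's descriptor values rather than just printing the state index.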
Clustering. Clusters were identified using a k-means algorithm [16], which allows predefining the number of groups wanted, so that they can be associated with each of the states from Section 6. A concept, idea or sound texture can be settled on for each cluster, guaranteeing similar sound characteristics by choosing the correct MIR descriptors and values for each one. This allows automated performances that evolve in time, always different but with the same underlying structure.

This algorithm minimizes the distance of the observations to their cluster means:

\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2

where x_1, x_2, \dots, x_n are the observations (descriptor values) and \mu_i is the mean of cluster S_i, with i ranging over the k clusters required.
Visualization is also useful: for example, when clustering sounds from a test dataset using Inharmonicity as the value, the radius of each circle represents the number of neighbors in that cluster (the number is the ID of the sound), see Figure 6.

On the other hand, it is possible to find outliers and discard unwanted samples by using a hierarchical clustering algorithm.
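As a sketch of this step, a few lines of NumPy implement plain k-means on synthetic two-dimensional descriptor values; this is only an illustration of the objective above, not the implementation used in the system:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain k-means: assign each descriptor vector to its nearest
    centroid, then recompute each centroid as the cluster mean."""
    # Deterministic init: k points spread across the dataset
    centroids = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Squared distances (n_samples x k) -> nearest-cluster labels
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for i in range(k):
            if (labels == i).any():
                centroids[i] = X[labels == i].mean(axis=0)
    return labels, centroids

# Two well-separated blobs of fake 2-D descriptor values
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.1, 0.02, (20, 2)),
               rng.normal(0.9, 0.02, (20, 2))])
labels, centroids = kmeans(X, 2)
```

Each resulting cluster could then be bound to one state of the Markov machine from the previous section, so that entering a state draws samples only from that cluster's neighbors.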
Figure 6: Inharmonicity clusters
Prototype
Our first prototype instrument was based on the workflow described in Figure 2, keeping in mind that the audience would be developers and experimental musicians with some algorithmic knowledge. To test the system, Free/Libre sounds from RedPanal.org, licensed under Creative Commons licenses [10], were used as a source. Some short public live performances were produced to demonstrate and test all the technologies involved.

All the development was influenced by agile methodologies and software engineering techniques in general, including an iterative and incremental approach [7], automated testing, version control and branching strategies. That resulted in a development community with collaborators in different places all over the world. The system was contrasted periodically with real users during live-coding and live performance sessions, keeping in mind all the guidelines defined in Section 4 and testing it consistently on several operating systems: Linux (Ubuntu 15.04 to 16.10), Mac OS (10.11 El Capitan and Sierra) and Windows 10. The synthesis module was also tested on a Raspberry Pi (using Raspbian OS).
API calls are based on different MIR descriptors like tempo, rhythm, mood, frequency content and others, working with frame values (time series of feature vectors) or statistical ones like mean or variance when needed.
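Such a call can be sketched by building a query string from descriptor values; the endpoint and parameter names below are hypothetical stand-ins, not the actual APICultor routes:

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real routes are defined in the
# APICultor repository.
BASE = "http://localhost:5000/search/mir"

def mir_query(**descriptors):
    """Build a query URL asking the service for samples whose
    descriptor values match the requested ones."""
    return BASE + "?" + urlencode(sorted(descriptors.items()))

url = mir_query(bpm=120, hfc_mean=0.0005, duration_max=8)
print(url)
```

Because retrieval is just an HTTP GET, any client on the network (a live-coding session, a mobile phone, another laptop) can issue the same query concurrently, which is what enables the collaborative setup described in Section 5.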
Effects chain
The freeze effect plays a central part in the proposed workflow because it allows managing previously unknown sounds, achieving infinite sustain and privileging (or not) harmonic content. Following the chain of Figure 7, unknown segmented samples retrieved from a database can fit and be played during a live performance. Basic parameters are the width of the freeze and the start position. Other real-time effects like pitch shifting, modulation and panning (among others) help to elaborate the sound modeling.

Figure 7: Freeze FX chain

A SuperCollider [18] synth based on a standard library freeze effect is enough to obtain decent results. With Pyo (the Python DSP toolbox [3]), a tool oriented to music composers who want to build tools, it is easy to implement much of the real-time spectral processing, such as the freeze effect, using a phase vocoder [6] or granular synthesis.
An attractive option for freezing audio signals in the time domain consists in using the audio extrapolation method [14], with an autoregressive (AR) process where the time-domain signal is modeled by

x(n) = \sum_{k=1}^{p} a_k \, x(n-k)

where a_1, a_2, \dots, a_p are the AR coefficients (which need to be identified by another algorithm).
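The idea can be sketched with a simple least-squares fit of the AR coefficients followed by recursive extension; the method in [14] identifies the coefficients differently, so this is only a minimal illustration:

```python
import numpy as np

def ar_extrapolate(x, p=8, n_extra=500):
    """Fit AR coefficients a_k by least squares on the known samples,
    then extend the signal with x(n) = sum_k a_k * x(n - k)."""
    # Each row holds the p previous samples, most recent first
    rows = np.array([x[n - p:n][::-1] for n in range(p, len(x))])
    a, *_ = np.linalg.lstsq(rows, x[p:], rcond=None)
    out = list(x)
    for _ in range(n_extra):
        out.append(float(np.dot(a, out[-1:-p - 1:-1])))
    return np.array(out)

# Freezing a short pure tone: the extension continues the oscillation
t = np.arange(2000) / 44100.0
x = np.sin(2 * np.pi * 440 * t)
y = ar_extrapolate(x)
```

For steady, harmonic material this keeps the sound ringing past the end of the buffer, which is the sustain behaviour the freeze effect is after.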
UI Design
The basic approach is to use currently available commercial controllers (capable of sending MIDI or OSC messages) to control MIR parameters, sample changes, effects configuration, etc. Pads, knobs, sliders, x-y controls, touch surfaces (tablets or mobile phones) and keyboards are useful, but not enough to explore the new dynamics and possibilities enabled by the proposed workflow.

Dynamic UI design using the microservice provided by Open Stage Control² (which sends OSC and MIDI) enables UI prototyping and live editing from a browser.

² A libre desktop OSC bi-directional control surface application. Available at http://osc.ammd.net.

Figure 8: MIR State UI Prototype

It also allows concurrent access and widget synchronization, enabling collaborative use. MIR descriptor values are controlled with knobs and sliders before being sent to the synthesis process (see Figure 8). It is also possible to retrieve the neighbors of the cluster defined by x, y controls, which, for example, allows defining clusters quickly using Spectral Centroid and Dissonance, or HFC (High Frequency Content) and Tempo, or any desired combination.
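Since the whole control path speaks OSC, the wire format is simple enough to build by hand. The sketch below encodes one descriptor value as an OSC 1.0 message using only the standard library; the address pattern is illustrative, and in practice a dedicated OSC library would do this:

```python
import struct

def osc_message(address, value):
    """Hand-encode a single-float OSC 1.0 message: null-terminated
    address padded to a 4-byte boundary, ',f' typetag string
    (same padding), then the argument as big-endian float32."""
    def pad(b):
        return b + b"\x00" * (4 - len(b) % 4)
    return pad(address.encode()) + pad(b",f") + struct.pack(">f", value)

msg = osc_message("/mir/hfc/mean", 0.0005)
# A real client would now send `msg` as a UDP datagram to the
# synthesis engine, e.g. SuperCollider listening on port 57120.
```

Every payload (controller moves, MIR descriptor targets, synthesis parameters) travels in this same compact binary form, which is why heterogeneous clients can share one control bus.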
7 SEARCHING FOR NEW COMPOSING TECHNIQUES
Some lines of work have been explored and incorporated into the aforementioned APICultor framework, including linking or matching sound processes with climate ones (using online datasets and the freeze effect), and the use of complementary no-input techniques [29] and of geographic and demographic information about a given place linked with its folk music.

The architecture (Section 5) is open to deterministic or stochastic compositions, changing only the nature of the algorithms involved in the retrieval method and sample selection: either choosing which sample of the group to use or when to change it.

Manipulation of the samples through the freeze effect (see Figure 7) allows establishing a textural concept of music instead of the typical approach based on defined melodies or motives.

On the other hand, most efforts were centered on constructing efficient models with intrinsic value, reflecting creative ideas and giving meaning to the process, taking proper care in the calculation of the aforementioned MIR descriptors and testing and contrasting them experimentally.

Slightly explored fields (not yet used in a live performance) include Big Data sonification and allowing participation from the public through mobile devices and local area networks.
8 RESULTS
The new software, APICultor, was achieved successfully, enabling live performances since its beginnings. Sounds from the Cloud (RedPanal.org and Freesound.org) or from local databases were retrieved without problems using the same REST API. These calls were based on previously selected MIR descriptors such as tempo, rhythm, mood, frequency content and others, allowing composers to apply their own criteria to sample selection. Live-coding tools and convenient real-time effects like pitch transposing (shifting) and the freeze effect (to achieve a harmonically sustained sound) were used to process those samples during live performances. MIDI controllers and OSC devices like mobile phones and tablets were used to trigger those sounds in real time. A couple of interdisciplinary performances combined with other musicians and conventional instruments took place in Buenos Aires during 2016, including "La Casa del Bicentenario" and "Noche de los museos". Sound recycling emerged as a side work and a useful tool for electronic musicians, in tune with the contemporary world, the Internet and the amount of sound data publicly available on the web.

Also, a soundwork named "Dialectic in suspense" was composed and performed using the proposed workflow and tools. Its central theme treats the conflicting relation between nature and contradictory human development: natural spaces and ambient sounds mixed with human residual pollution are combined with real-time audio and data processing that shows both human and nature strategies to overcome the critical anthropocentric presence.

No specific operating system is required; all the tools involved were tested and worked well on all the platforms (Windows, Mac, Linux x86 and ARM). All the guidelines in Section 4 were accomplished (in more or less detail). The final instrument architecture, including the User Interface for Live Coding, can be seen in Figure 9, and samples can be heard at the RedPanal website³.
9 CONCLUSIONS AND FUTURE WORK
New workflows were tested with success, and during the process new related research paths were discovered, like dynamic UIs and online public participation through setting MIR features. APICultor fulfilled its goals and became a framework, although because of its experimental nature it is still in development.

³ Sound recycling of the sounds from the RedPanal free culture community. Available at http://redpanal.org/p/reciclado-de-samples.

Figure 9: Cloud Instrument

The architecture based on Big Data, networks, controllers and a synthesis process proved to be useful, original and interesting to many musicians. On the other hand, the experimental prototype performed live with commitment but went unnoticed: without a previous explanation, nobody in the audience could perceive the nature of the instrument, since by definition a distributed system should appear as a single one. It was observed that this system could easily be integrated with a real-time visualization engine, taking MIR descriptors or the final sound as the input source. An even more exciting option is to take information from previously modeled 3D environments (i.e. games or buildings) and add it to the process in real time, having feedback and sound modeling at the same time [24].

In the remaining work, there is the option of enhancing the sample recommendation process during live performances (even UI advice based on past use) and of providing the system as a Docker [20] image, to bundle dependencies in a lightweight container and avoid dependency trouble.
ACKNOWLEDGMENTS
Thanks to the support of the MediaLab/CASo (Centro de Arte Sonoro) of "La Casa del Bicentenario", Ciudad Autónoma de Buenos Aires, Argentina, and to the RedPanal and APICultor communities.
REFERENCES
[1] Vincent Akkermans, Frederic Font, Jordi Funollet, Bram De Jong, Gerard Roma, Stelios Togias, and Xavier Serra. [n. d.]. Freesound 2: An improved platform for sharing audio clips. http://mtg.upf.edu/system/files/publications/freesound
[2] Paolo Annesi, Roberto Basili, Raffaele Gitto, Alessandro Moschitti, and Riccardo Petitti. 2007. Audio Feature Engineering for Automatic Music Genre Classification. In RIAO 2007. http://art.uniroma2.it/publications/docs/2007
[3] Olivier Bélanger. 2016. Pyo, the Python DSP toolbox. In Proceedings of the 2016 ACM Multimedia Conference (MM ’16). ACM Press, New York, NY, USA, 1214–1217. https://doi.org/10.1145/2964284.2973804
[4] D. Bogdanov, Nicolas Wack, Emilia Gómez, Sankalp Gulati, P. Herrera, O. Mayor, G. Roma, J. Salamon, J. Zapata, and Xavier Serra. 2013. Essentia: An audio analysis library for music information retrieval. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR 2013), 493–498.
[5] Paul M. Brossier. 2006. The Aubio Library at MIREX 2006. In Proceedings of the International Society for Music Information Retrieval Conference.
[6] Jean-François Charles. 2008. A Tutorial on Spectral Sound Processing Using Max/MSP and Jitter. Computer Music Journal 32, 3 (2008). http://www.mitpressjournals.org/doi/pdf/10.1162/comj.2008.32.3.87
[7] Alistair Cockburn. 2008. Using Both Incremental and Iterative Development. CrossTalk: The Journal of Defense Software Engineering 21, 5 (2008), 27–30. http://alistair.cockburn.us/Using+both+incremental+and+iterative+development
[8] Diego Dorado and Matías Zabaljáuregui. 2016. Opensemble: A framework for collaborative algorithmic music. (2016).
[9] Stephane Dupont, Thomas Dubuisson, Jerome Urbain, Raphael Sebbe, Nicolas D’Alessandro, and Christian Frisson. 2009. AudioCycle: Browsing musical loop libraries. In 7th International Workshop on Content-Based Multimedia Indexing (CBMI 2009), 73–80. https://doi.org/10.1109/CBMI.2009.19
[10] Terry Flew. 2005. Creative Commons and the Creative Industries. Media and Arts Law Review 10, 4 (2005), 257–264.
[11] Frederic Font, T. Brookes, G. Fazekas, M. Guerber, A. La Burthe, D. Plans, M. Plumbley, M. Shaashua, W. Wang, and Xavier Serra. 2016. Audio Commons: bringing Creative Commons audio content to the creative industries. In 61st AES Conference on Audio for Games. http://mtg.upf.edu/node/3423
[12] Geoffroy Peeters. 2004. A large set of audio features for sound description (similarity and classification) in the CUIDADO project. Technical Report, IRCAM. http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Peeters
[13] Ewan Hill, Juliana Cherston, Steven Goldfarb, and Joseph A. Paradiso. 2016. ATLAS data sonification: a new interface for musical expression and public interaction. (2016). http://pos.sissa.it/
[14] I. Kauppinen and K. Roth. 2002. Audio signal extrapolation – Theory and applications. In Proceedings of the 5th International Conference on Digital Audio Effects (DAFx-02), 105–110. http://www.unibw-hamburg.de/EWEB/ANT/dafx2002/papers/DAFX02
[15] Alexander Lerch. 2012. An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics. Wiley/IEEE Press. https://doi.org/10.1002/9781118393550
[16] David J. C. MacKay. 2004. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Chapter 20, 628. http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
[17] Joseph Malloch, Stephen Sinclair, and Marcelo M. Wanderley. 2008. A network-based framework for collaborative development and performance of digital musical instruments. In Proceedings of the International Conference on Sound and Music Computing, Vol. 4969. Springer, Berlin, Heidelberg, 401–425. https://doi.org/10.1007/978-3-540-85035-9_28
[18] James McCartney. 1996. SuperCollider: a new real time synthesis language. In Proceedings of the 1996 International Computer Music Conference, 257–258. http://www.audiosynth.com/icmc96paper.html
[19] Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. 2015. librosa: Audio and Music Signal Analysis in Python. In Proceedings of the 14th Python in Science Conference. http://dawenl.github.io/publications/McFee15-librosa.pdf
[20] Dirk Merkel. 2014. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal (2014), 2 pages.
[21] Ryan Merkley. 2015. State of the Commons 2015. Creative Commons Newsletter (2015). https://stateof.creativecommons.org/2015/
[22] F. Richard Moore. 1991. Elements of Computer Music. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
[23] Nikolaos Nikolaou. 2011. Music Emotion Classification. (2011).
[24] Natanael Olaiz, Pau Arumi, Toni Mateos, and David Garcia. 2009. 3D-audio with CLAM and Blender’s Game Engine. In Linux Audio Conference 2009. http://lac.linuxaudio.org/2009/cdm/Thursday/05
[25] Curtis Roads. 2015. Composing Electronic Music: A New Aesthetic. Oxford University Press.
[26] Gerard Roma and Xavier Serra. 2015. Music performance by discovering community loops. (2015). http://mtg.upf.edu/node/3177 https://www.youtube.com/watch?v=uDC3M
[27] Nicolas Saint-Arnaud and Kris Popat. 1998. Analysis and Synthesis of Sound Textures. (1998). http://alumni.media.mit.edu/
[28] Andrew S. Tanenbaum and Maarten van Steen. 2007. Distributed Systems: Principles and Paradigms. Pearson Prentice Hall. 686 pages.
[29] Dominic Thibault. 2015. fXfD, a Digital Approach to the No-Input Practice. In ICMC 2015. https://quod.lib.umich.edu/cgi/p/pod/dod-idx/fxfd-a-digital-approach-to-the-no-input-practice.pdf?c=icmc;idno=bbp2372.2015.053
[30] Anna Xambó, Alexander Lerch, and Jason Freeman. 2016. Learning to code through MIR. In Extended Abstracts for the Late-Breaking Demo Session of the 17th International Society for Music Information Retrieval Conference, 2–4.
... Music live coding is a music improvisation practice that is based on generating code in real time by either writing it directly or using interactive programming [4,9,12,21]. With a few exceptions [18], most of the examples of audio repurposing in live coding use audio clips retrieved by textual queries, as shown in live coding sessions with programming languages for music such as Gibber [19] and Tidal [17], social media remixing tools such as Live Coding Youtube [13], and data-agnostic tools such as DataToMusic [27]. In gibberwocky [20], multiple audio clips are similarly used from a sound palette managed in Ableton Live connected to the live coding environment. ...
... A number of systems have explored content-based audio retrieval in creative domains, such as Floop [23], Freesound Explorer [11], BBCut [8], LoopMashVST,9 CataRT [25], and APICultor [18], which can inform our application to live coding. The common trends of these systems are: (1) content-based retrieval is often combined with textual queries, which give flexibility and a higher-level query closer to human language (Floop, Freesound Explorer); (2) a combination of low-level and mid-level content-based queries used for real-time interaction (CataRT); (3) generally, the descriptors available are limited to a small number to avoid the complexity of navigating a high-dimensional space (Floop, Freesound Explorer, BBCut, LoopMashVST); (4) the systems start from filtered subspaces of the database that assure the sound results are within a range, with the purpose of having a more controlled subspace (Floop, Freesound Explorer); (5) most of these systems are based on GUIs or hardware to control the MIR parameters (LoopMashVST, CataRT, APICultor). ...
... The commands related to query by content were the most frequently used by the performer. Target sounds were retrieved by mid-level descriptors (e.g., bpm, pitch), and similar sounds were found by filtering results according to some metrics (e.g., 120 bpm, pitch with high confidence measure). When rehearsing with the tool, the performer found preferred combinations of parameters and values (e.g., applying a low confidence measure to be able to create contrast). ...
Conference Paper
Full-text available
The recent increase in the accessibility and size of personal and crowdsourced digital sound collections brought about a valuable resource for music creation. Finding and retrieving relevant sounds in performance leads to challenges that can be approached using music information retrieval (MIR). In this paper, we explore the use of MIR to retrieve and repurpose sounds in musical live coding. We present a live coding system built on SuperCollider enabling the use of audio content from online Creative Commons (CC) sound databases such as Freesound or personal sound databases. The novelty of our approach lies in exploiting high-level MIR methods (e.g., query by pitch or rhythmic cues) using live coding techniques applied to sounds. We demonstrate its potential through the reflection of an illustrative case study and the feedback from four expert users. The users tried the system with either a personal database or a crowdsourced database and reported its potential in facilitating tailorability of the tool to their own creative workflows.
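The citation contexts above describe retrieving target sounds by mid-level descriptors (e.g., bpm, pitch) and then filtering results by a confidence metric. A minimal sketch of such a descriptor filter is shown below; the metadata keys ("bpm", "pitch_confidence") and values are illustrative assumptions, not the actual fields of Freesound or any specific API.

```python
# Hypothetical sketch: filter retrieved sound metadata by mid-level
# descriptors, in the spirit of "120 bpm, pitch with high confidence".
def filter_sounds(sounds, bpm=None, bpm_tolerance=2.0, min_pitch_confidence=None):
    """Keep only sounds whose descriptors satisfy the requested constraints."""
    results = []
    for s in sounds:
        # Reject sounds whose tempo falls outside the tolerance window.
        if bpm is not None and abs(s.get("bpm", 0.0) - bpm) > bpm_tolerance:
            continue
        # Reject sounds whose pitch estimate is not confident enough.
        if (min_pitch_confidence is not None
                and s.get("pitch_confidence", 0.0) < min_pitch_confidence):
            continue
        results.append(s)
    return results

sounds = [
    {"id": 1, "bpm": 120.3, "pitch_confidence": 0.92},
    {"id": 2, "bpm": 95.0,  "pitch_confidence": 0.40},
    {"id": 3, "bpm": 119.1, "pitch_confidence": 0.15},
]

print([s["id"] for s in filter_sounds(sounds, bpm=120, min_pitch_confidence=0.5)])  # -> [1]
```

Lowering (or dropping) the confidence threshold widens the result set, which matches the performer's strategy of using a low confidence measure to create contrast.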
... Collaborative live performance can balance out the potential issues with speed and mistakes that make solo live coding more challenging [2]. There exist several approaches to collaboration in live coding [3][4][5][6][7][8][9][10][2], including the use of crowdsourced sounds from the cloud [11][12][13][14], which can be seen as an example of asynchronous collaboration. When using large crowdsourced databases, there is a risk factor in the unpredictability of search results. ...
... Collins, who adapted music information retrieval (MIR) algorithms for tasks such as beat tracking, onset detection, and pitch detection in the Algoravethmic remix system [30]. In Ordiales and Bruno's APICultor [11], sounds are mixed and transformed in a live coding fashion using hardware where multiple audio features based on tempo, rhythm, mood, and frequency content, among others, are selected to filter the incoming sounds in real time. Another example is Navarro and Ogborn's Cacharpo [7], which provides a VA that 'listens' to the audio produced by the live coder using machine listening and music information retrieval techniques. ...
Conference Paper
Full-text available
The use of crowdsourced sounds in live coding can be seen as an example of asynchronous collaboration. It is not uncommon for crowdsourced databases to return unexpected results to the queries submitted by a user. In such a situation, a live coder is likely to require some degree of additional filtering to adapt the results to her/his musical intentions. We refer to these context-dependent decisions as situated musical actions. Here, we present directions for designing a customisable virtual companion to help live coders in their practice. In particular, we introduce a machine learning (ML) model that, based on a set of examples provided by the live coder, filters the crowdsourced sounds retrieved from the Freesound online database at performance time. We evaluated a first illustrative model using objective and subjective measures. We tested a more generic live coding framework in two performances and two workshops, where several ML models have been trained and used. We discuss the promising results for ML in education, live coding practices and the design of future NIMEs.
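The abstract above describes an ML model that filters retrieved sounds using examples labelled by the live coder. One simple way to realise that idea, sketched here as an assumption rather than the authors' actual model, is a nearest-neighbour filter over feature vectors; the two-dimensional features are purely illustrative (e.g., normalised tempo and spectral centroid).

```python
# Minimal sketch (not the paper's actual model): a 1-NN filter that keeps
# only candidates whose closest labelled example was marked "keep".
import math

def nearest_label(examples, vector):
    """Return the label of the closest labelled example (1-nearest-neighbour)."""
    best_label, best_dist = None, math.inf
    for features, label in examples:
        dist = math.dist(features, vector)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def filter_retrieved(examples, candidates):
    """Keep candidates whose nearest labelled example is marked 'keep'."""
    return [c for c in candidates if nearest_label(examples, c) == "keep"]

examples = [
    ([0.6, 0.2], "keep"),    # sounds the performer liked
    ([0.9, 0.9], "reject"),  # sounds the performer rejected
]
candidates = [[0.55, 0.25], [0.95, 0.8]]
print(filter_retrieved(examples, candidates))  # -> [[0.55, 0.25]]
```

Because the filter is trained only on the performer's own examples, it runs cheaply at performance time and can be retrained between pieces.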
... Combinations of content descriptor values called MIR states can be defined by UI elements and nearest neighbours of the specified parameters, and are returned from the web service API (Ordiales and Bruno, 2017). Automated categorization: the idea behind this feature is, just like the text-based categorization described in 3.1.1, ...
Thesis
Full-text available
After years of research and prototyping, audio search for non-musical files still does not incorporate content-based methods to enhance the search process. This thesis concentrates on exploring methods that may be novel to professional sound designers, and leads to an expansion of a user interface towards a unified experience of sophisticated text retrieval and novel audio content-analysis-based data visualization methods, in order to showcase the potential of a hybrid approach. Info: Reviewed until chapter 5, more still to come.
... Sampleswap.org and others are used by composers and producers for various types of multimedia applications, such as motion pictures, advertisements, video games and music compositions [28]. APICultor [26] uses machine learning techniques to provide an environment for repurposing sound samples from online databases. Lee et al. proposed a live coding tool with the YouTube API for free improvisation [20]. ...
Conference Paper
Playsound.space is a web-based tool to search for and play Creative Commons-licensed sounds, which can be applied to free improvisation, experimental music production and soundscape composition. It provides fast access to about 400k non-musical and musical sounds provided by Freesound, and allows users to play/loop single or multiple sounds retrieved through text-based search. Sound discovery is facilitated by the use of semantic searches and visual representations of sounds (spectrograms). Guided by the motivation to create an intuitive tool to support music practice that could suit both novice and trained musicians, we developed and improved the system in a continuous process, gathering frequent feedback from a range of users with various skills. We assessed the prototype with 18 non-musician and musician participants during free music improvisation sessions. Results indicate that the system was found easy to use and supports creative collaboration and expressiveness irrespective of musical ability. We identified further design challenges linked to creative identification, control and content quality.
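Tools like Playsound.space retrieve sounds from Freesound through text-based queries. A sketch of building such a query URL is shown below; the endpoint path follows the public Freesound API v2 text-search documentation, while the field list and `API_TOKEN` placeholder are assumptions for illustration (no network call is made).

```python
# Illustrative sketch: build a text-search request URL for the
# Freesound API v2. API_TOKEN is a placeholder, not a real credential.
from urllib.parse import urlencode

FREESOUND_SEARCH = "https://freesound.org/apiv2/search/text/"

def build_search_url(query, page_size=15, token="API_TOKEN"):
    """Return a text-search URL asking only for the fields a player needs."""
    params = urlencode({
        "query": query,
        "page_size": page_size,
        "fields": "id,name,previews",  # keep the response small
        "token": token,
    })
    return f"{FREESOUND_SEARCH}?{params}"

print(build_search_url("rain forest"))
```

The returned JSON would then be parsed for preview URLs, which a client can stream and loop without downloading the original files.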
Article
Full-text available
Music information retrieval (MIR) has great potential in musical live coding because it can help the musician-programmer to make musical decisions based on audio content analysis and explore new sonorities by means of MIR techniques. The use of real-time MIR techniques can be computationally demanding, and thus they have been rarely used in live coding; when they have been used, it has been with a focus on low-level feature extraction. This article surveys and discusses the potential of MIR applied to live coding at a higher musical level. We propose a conceptual framework of three categories: (1) audio repurposing, (2) audio rewiring, and (3) audio remixing. We explored the three categories in live performance through an application programming interface library written in SuperCollider, MIRLC. We found that it is still a technical challenge to use high-level features in real time, yet using rhythmic and tonal properties (mid-level features) in combination with text-based information (e.g., tags) helps to achieve a closer perceptual level centered on pitch and rhythm when using MIR in live coding. We discuss challenges and future directions of utilizing MIR approaches in the computer music field.
Conference Paper
Full-text available
An approach to teaching computer science (CS) in high-schools is using EarSketch, a free online tool for teaching CS concepts while making music. In this demonstration we present the potential of teaching music information retrieval (MIR) concepts using EarSketch. The aim is twofold: to discuss the benefits of introducing MIR concepts in the classroom and to shed light on how MIR concepts can be gently introduced in a CS curriculum. We conclude by identifying the advantages of teaching MIR in the classroom and pointing to future directions for research.
Thesis
Full-text available
In this thesis we focus on the automatic emotion classification of music samples. We extract a set of features from the music signal and examine their discriminatory capability using various classification techniques. Our goal is to determine the features and the classification methods that lead to the best classification of the emotion a music sample conveys. During the course of the thesis, we generated our own dataset of annotated song samples and we examined two distinct methods of describing an emotion: using clusters consisting of various emotional states, and using a two-dimensional representation of the emotion in the Valence-Activation plane. The latter method was chosen as the most successful. We also tried other approaches to music emotion classification (MEC), such as treating the song sample as an amplitude and frequency modulated (AM-FM) signal, on which we subsequently perform multiband demodulation analysis (MDA), testing various Gabor filter banks (Mel scale-based filter bank, Bark scale-based filter bank, and a number of fractional octave-based filter banks). Statistics of the Frequency Modulation Percentages (FMPs) of each band derived from the demodulation proved to be quite successful features in the classification of emotion. Finally, we explored other modalities besides the music sound signal itself, such as a number of features derived from the chords of the song samples, classification of the song samples' lyrics using various techniques, and a brief investigation of Electroencephalogram (EEG) data generated by one of the annotators while performing the annotation of the song samples. Our final feature-pack included a combination of the most successful features among the ones we studied: (i) music-inspired features (features based on music theory and psychoacoustics, derived from either the sound signal or the chords of the sample), (ii) statistics of the FMPs and (iii) statistics of the Mel-frequency cepstral coefficients (MFCCs). This feature-pack proved to be more robust than its three individual components, and in the end we achieved results that reached 85.7% correct classification rate in the dimension of Valence and 85.1% correct classification rate in the dimension of Activation. We finally demonstrate that by discarding training samples that are assigned a label too close to the neutral value, our results can improve even further, especially in the dimension of Activation.
Conference Paper
This paper introduces pyo, a Python module dedicated to the digital processing of sound. This audio engine distinguishes itself from other alternatives by being natively integrated into a common general programming language. This integration allows incorporating audio processes quickly into other programming tasks, like mathematical computations, network communications or graphical interface programming. We will expose the main features of the library as well as the different contexts of use where pyo can be of great benefit to composers and audio software developers.
Article
Incremental development is distinctly different from iterative development in both its purpose and its management implications. Teams get into trouble by doing one and not the other, or by trying to manage them the same way. This article illustrates their differences and how to use them together.