Proceedings of the 21st International Conference on Digital Audio Effects (DAFx-18), Aveiro, Portugal, September 4–8, 2018
TU-NOTE VIOLIN SAMPLE LIBRARY – A DATABASE OF VIOLIN SOUNDS WITH
SEGMENTATION GROUND TRUTH
Henrik von Coler
Audio Communication Group
TU Berlin
Germany
voncoler@tu-berlin.de
ABSTRACT
The presented sample library of violin sounds is designed as a tool for the research, development and testing of sound analysis/synthesis algorithms. The library features single sounds which cover the entire frequency range of the instrument in four dynamic levels, two-note sequences for the study of note transitions and vibrato, as well as solo pieces for performance analysis. All parts come with a hand-labeled segmentation ground truth which marks attack, release and transition/transient segments. Additional relevant information on the samples’ properties is provided for single sounds and two-note sequences. Recordings took place in an anechoic chamber with a professional violinist and a recording engineer, using two microphone positions. This document describes the content and the recording setup in detail, alongside basic statistical properties of the data.
1. INTRODUCTION
Sample libraries for use in music production are manifold. Ever since digital recording and storage technology made it possible, they have been created for most known instruments. Commercial products like the Vienna Symphonic Library¹ or the EastWest Quantum Leap² offer high-quality samples with many additional techniques for expressive sample-based synthesis. For several reasons, these libraries are not well suited for use in research on sound analysis and synthesis. Many relevant details are subject to business secrets and thus not documented. Copyright issues may prevent the free use desired in a scientific application. These libraries also lack the annotation and metadata which are essential for research applications, for instance in machine learning or sound analysis/synthesis tasks.
¹www.vsl.co.at/
²http://www.soundsonline.com/symphonic-orchestra
The audio research community has released several databases with single instrument sounds in the past, usually closely related to a specific aspect. Libraries like the RWC [1] or the MUMS [2] aim at genre or instrument classification and timbre analysis [3]. Databases for onset and transient detection which include hand-labeled onset segments have been presented by Bello et al. [4] and von Coler et al. [5].
The presented library of violin sounds is designed as a tool for the research, development and testing of sound analysis/synthesis algorithms or machine learning tasks. The contained data is structured to enable the training of sinusoidal modeling systems which distinguish between stationary and transient segments. By design, the library allows the analysis of several performance aspects, such as different articulation styles, glissando [6] and vibrato. It features recordings of a violin in an anechoic chamber and consists of three parts:
1. single sounds
2. two-note sequences
3. solo (scales and compositions/excerpts)
For single sounds and two-note sequences, hand-labeled segmentation files are delivered with the data set. These files focus on the distinction between steady-state and transient or transitional segments. The prepared audio files and the segmentation files are uploaded to a static repository with a DOI [7]³. A Creative Commons BY-ND 4.0 license ensures the unaltered distribution of the library.
³https://depositonce.tu-berlin.de//handle/11303/7527
The purpose of this paper is a more thorough introduction of the library. Section 2 explains the composition of the content, followed by details on the recording setup and procedure in Section 3. The segmentation data is introduced in Section 4. Section 5 presents selected statistical properties of the sample library. Final remarks are included in Section 6.
2. CONTENT DESCRIPTION
2.1. Single Sounds
Similar to libraries for sample-based instruments, the single sounds capture the dynamic and frequency range of the violin, using sustained sounds. The violinist was instructed to play the sounds as long as possible, using just one bow stroke, without any expression. The steady-state segments, i.e. the sustain parts, of these notes are thus played as steadily as possible. This task proved to be highly demanding and unusual, even for an experienced concert violinist.
On each of the four strings, the number of semitones listed in Table 1 was captured, each starting with the open string. This leads to a total of 84 positions. All positions were captured in the four dynamic levels pp, mp, mf and ff, resulting in a total of 336 single sounds. According to Meyer [8], the dynamic range of a violin covers 58–99 dB.
Table 1: Number of positions on each string
String Positions
G 18
D 18
A 18
E 30
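The sample grid implied by Table 1 can be made explicit in a few lines. The following Python sketch is purely illustrative (the tuple layout is our own, not the library's file naming); it merely enumerates the combinations and confirms the counts of 84 positions and 336 single sounds.

from itertools import product

# Semitone positions per string, each range starting with the open string (Table 1).
POSITIONS = {"G": 18, "D": 18, "A": 18, "E": 30}
DYNAMICS = ("pp", "mp", "mf", "ff")  # the four recorded dynamic levels

# One tuple per (string, position, dynamic) combination.
items = [
    (string, pos, dyn)
    for (string, count), dyn in product(POSITIONS.items(), DYNAMICS)
    for pos in range(count)
]

print(sum(POSITIONS.values()))  # 84 positions
print(len(items))               # 336 single sounds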
Each item was recorded in several takes, until the recording engineer, the author and the violinist agreed on a successful take. Although all sounds were explicitly captured in both up-stroke and down-stroke technique, these bowing directions have not been considered individually in the data set and thus appear in random order.
2.2. Two-Note Sequences
Figure 1: Violin fingerboard with the positions used for the two-note sequences (legend: fifth on two strings; fourth, low; fourth, high; fifth on one string)
For the study of basic articulation styles, a set of two-note sequences was recorded at different intervals, listed in Table 2. The respective positions on the fingerboard are visualized in Figure 1. All combinations were recorded at the two dynamic levels mp and ff. Three different articulation styles (detached, legato, glissando) were used, and some combinations were captured with additional vibrato. These combinations lead to a grand total of 344 two-note items.
Five semitones on one string were captured in 8 pairs with 24 versions each (2 dynamic levels, 2 directions, with and without vibrato, 3 articulation styles): 2·2·2·3 = 24.
Repeated tones were captured in 4 pairs with 6 versions each (2 dynamic levels; detached with and without vibrato, plus legato): 2·2 + 2 = 6.
Seven semitones on one string were captured in 4 pairs with 20 versions each (2 dynamic levels, 2 directions; detached only without vibrato, legato and glissando with and without vibrato): 2·2 + 2·2·2·2 = 20.
Seven semitones on two strings were captured in 3 pairs with 16 versions each (2 dynamic levels, 2 directions, with and without vibrato, 2 articulation styles [legato, detached]): 2·2·2·2 = 16.
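As a cross-check, the grand total of 344 follows directly from these rules; a minimal Python sketch (the category names are only labels for the list above):

# (pairs, versions per pair) for each two-note category
rules = {
    "5 semitones, one string":  (8, 2 * 2 * 2 * 3),          # dyn x dir x vib x articulation
    "repeated tones":           (4, 2 * 2 + 2),              # detached +/- vib, plus legato
    "7 semitones, one string":  (4, 2 * 2 + 2 * 2 * 2 * 2),  # detached; legato/gliss +/- vib
    "7 semitones, two strings": (3, 2 * 2 * 2 * 2),          # dyn x dir x vib x articulation
}

total = sum(pairs * versions for pairs, versions in rules.values())
assert total == 344  # matches the number of two-note items in the library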
Table 2: All two-note pairs

5 semitones, one string
Item no.   Note 1: ISO  Pos.  String    Note 2: ISO  Pos.  String
01-24      D4           7     G         A3           2     G
25-48      A4           7     D         E4           2     D
49-72      E5           7     A         B4           2     A
73-96      B5           7     E         F#5          2     E
97-120     D4           7     G         G4           12    G
121-144    A4           7     D         D5           12    D
145-168    E5           7     A         A5           12    A
169-192    B5           7     E         E6           12    E

Repeated tones
Item no.   Note 1: ISO  Pos.  String    Note 2: ISO  Pos.  String
193-198    D4           7     G         D4           7     G
199-204    A4           7     D         A4           7     D
205-210    E5           7     A         E5           7     A
211-216    B5           7     E         B5           7     E

7 semitones, one string
Item no.   Note 1: ISO  Pos.  String    Note 2: ISO  Pos.  String
217-236    D4           7     G         G3           0     G
237-256    A4           7     D         D4           0     D
257-276    E5           7     A         A4           0     A
277-296    B5           7     E         E5           0     E

7 semitones, two strings
Item no.   Note 1: ISO  Pos.  String    Note 2: ISO  Pos.  String
297-312    D4           7     G         A4           7     D
313-328    A4           7     D         E5           7     A
329-344    E5           7     A         B5           7     E
2.3. Solo: Scales and Compositions
Two scales – an ascending major scale and a descending minor scale – were each played in three interpretation styles, as listed in Table 3. The first style was plain, without any expressive gestures, followed by two expressive interpretations. Six solo pieces and excerpts, listed in Table 4, which mostly contain cantabile legato passages, were recorded. All compositions were proposed by the violinist, ensuring familiarity with the material.
Table 3: Scales in the solo part
Item Type Interpretation
01 major, ascending plain
02 major, ascending expressive 1
03 major, ascending expressive 2
04 minor, descending plain
05 minor, descending expressive 1
06 minor, descending expressive 2
Table 4: Solo recordings

Item  Composition                                         Composer
07    Sonata in A major for Violin and Piano              César Franck
08    Violin Concerto in E minor, Op. 64, 2nd movement    Felix Mendelssohn
09    Méditation (Thaïs)                                  Jules Massenet
10    Chaconne in G minor                                 Tomaso Antonio Vitali
11    Violin Concerto in E minor, Op. 64, 3rd movement    Felix Mendelssohn
12    Violin Sonata No. 5, Op. 24, 1st movement           Ludwig van Beethoven
3. RECORDING SETUP
The recordings took place in the anechoic chamber at SIM⁴, Berlin. Above a cutoff frequency of 100 Hz the room shows an attenuation coefficient of µ > 0.99, hence the recordings are free of reverberation in the relevant frequency range. The recordings were conducted within two days, taking one day for the single sounds and the second day for the two-note sequences and solo pieces. All material was captured at a sample rate of 96 kHz with a depth of 24 bit.
⁴http://www.sim.spk-berlin.de/refelxionsarmer_raum_544.html
Microphones
The following microphones were used:
1x DPA 4099 cardioid clip microphone
1x Brüel & Kjær 4006 omnidirectional small-diaphragm microphone with free-field equalization, henceforth BuK
The DPA microphone was mounted as shown in Figure 2, above the lower end of the f-hole at a distance of 2 cm. Due to its fixed position, movements of the musician do not influence the recording. The B&K microphone was mounted at a distance of 1.5 m above the instrument, at an elevation angle of approximately 45°, as shown in Figure 3.
Figure 2: Position of the DPA microphone
Figure 3: Position of the B&K microphone
Instructions
For each of the single-sound, two-note and scale items, a minimal score snippet was generated using LilyPond [9]. Examples of such item instructions are shown in Fig. 4. The resulting 63-page score was then used to guide the recordings. Although the isolated tasks may seem simple and unambiguous, this procedure ensured smooth recording sessions.
Figure 4: Instruction scores for a two-note example with vibrato and glissando (a) and a single-sound example with up-bow and down-bow (b)
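The generation scripts themselves are not part of the paper; as an illustration of the approach, the following Python sketch assembles a minimal LilyPond source string for a hypothetical two-note item (the template layout, note names and dynamics are assumptions, not the actual item definitions).

# Hypothetical generator for a minimal per-item LilyPond snippet.
LY_TEMPLATE = """\\version "2.18.2"
\\header {{ title = "Two-note item {item:03d}" tagline = ##f }}
{{
  \\clef treble
  {note1}2\\{dyn}{gliss} {note2}2
}}
"""

def two_note_snippet(item: int, note1: str, note2: str,
                     dyn: str = "mf", gliss: bool = False) -> str:
    """Render one two-note item as a LilyPond source string."""
    return LY_TEMPLATE.format(item=item, note1=note1, note2=note2,
                              dyn=dyn, gliss="\\glissando" if gliss else "")

# Example: a hypothetical item 22, D4 to G4, fortissimo, with glissando.
print(two_note_snippet(22, "d'", "g'", dyn="ff", gliss=True))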
4. SEGMENTATION
The segmentation of a monophonic musical performance into notes, and even more so into a note’s subsegments, is not trivial [10, 11]. During the labeling process, the best take for each item was selected from the raw recordings, and the manual segmentation scheme proposed by von Coler et al. [5] was applied using Sonic Visualiser [12].
Figure 5: Sonic Visualiser setup for the annotation of single sound 333: (a) energy trajectory, (b) peak frequency spectrogram
4.1. Single Sounds
Each single sound is divided into three segments, which are defined by four location markers in the segmentation files⁵, as shown in Table 5. The first time instant (A) marks the beginning of the attack segment; the second instant (C) marks the end of the attack segment and thus the beginning of the sustain part. The end of the sustain, which is also the beginning of the release segment, is labeled (D). The label (B) marks the end of the release portion and thus of the complete sound. The left column holds the corresponding time instants in seconds.
⁵The segmentation files are part of the repository [7].
Table 5: Example of a single-sound segmentation file (SampLib_DPA_01.txt)
0.000000 A
0.940646 C
7.373000 D
8.730500 B
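Since the marker files are plain text with one time/label pair per line, they can be parsed in a few lines. A minimal Python sketch, assuming exactly the whitespace-separated layout of Table 5 (the function and key names are our own):

def read_single_sound_markers(path: str) -> dict:
    """Parse a single-sound segmentation file into segment boundaries.

    Expects four lines of "<time in seconds> <label>", with labels
    A (attack start), C (attack end / sustain start),
    D (sustain end / release start) and B (end of sound).
    """
    markers = {}
    with open(path) as f:
        for line in f:
            time, label = line.split()
            markers[label] = float(time)
    return {
        "attack":  (markers["A"], markers["C"]),
        "sustain": (markers["C"], markers["D"]),
        "release": (markers["D"], markers["B"]),
    }

# Example with the file from Table 5:
# read_single_sound_markers("SampLib_DPA_01.txt")
# -> {'attack': (0.0, 0.940646), 'sustain': (0.940646, 7.373), 'release': (7.373, 8.7305)}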
The definition of the attack segment is ambiguous in the literature [13] and shall thus be specified for this context: attack here refers to the actual attack transient, the very first part of a sound with significant inharmonic content and rapid fluctuations. In other contexts, the attack may be regarded as the segment of rising energy up to the local maximum, and there is often still a significant increase in energy after the attack transient has finished. Since the attack transient is characterized by unsteady, evolving partials and low relative partial amplitudes, the manual segmentation is performed using both a temporal and a spectral representation. Figure 5 shows a typical Sonic Visualiser setup for the annotation of a single sound. The noisiness of the signal during attack and release can be seen in the spectral representation. How the attack transient and the rising slope may differ is illustrated in Fig. 6: the gray area represents the labeled attack segment, which ends before the end of the rising slope is reached.
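To make the distinction concrete, the end of the rising slope can be approximated as the maximum of a frame-wise RMS trajectory, which typically lies after the hand-labeled end of the attack transient. A minimal numpy sketch (frame and hop sizes are arbitrary choices, not values used in the annotation):

import numpy as np

def rms_trajectory(x: np.ndarray, frame: int = 4096, hop: int = 1024) -> np.ndarray:
    """Frame-wise RMS of a mono signal x (e.g. sampled at 96 kHz)."""
    n = 1 + max(0, len(x) - frame) // hop
    return np.array([np.sqrt(np.mean(x[i * hop : i * hop + frame] ** 2))
                     for i in range(n)])

def end_of_rising_slope(rms: np.ndarray) -> int:
    """Index of the RMS maximum, taken here as a proxy for the end of the rising slope."""
    return int(np.argmax(rms))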
The release part is less ambiguous: it is labeled as the segment from the end of the excitation until the complete disappearance of the tone.
Figure 6: RMS trajectory of a note beginning, with attack segment (gray) and end of the rising slope (single sound no. 19)
Figure 7: RMS trajectory of a note end, with release segment (gray) and beginning of the falling slope (SampLib_19)
As shown in Fig. 7, there is often a significant decrease in signal energy before the actual release starts. For items with low dynamics, the release also covers the very last part of the excitation.
The ease of annotation varies between dynamic levels, as well as with the fundamental frequency of the items. Notes played at fortissimo show clear attack and decay segments with a steady sustain part, whereas pianissimo tones have less prominent boundary segments and a parabolic amplitude envelope. The higher SNR of fortissimo notes allows a more reliable annotation of the transients. Tones with a high fundamental frequency have less prominent partials, whereas the bow noise is emphasized. They are thus more difficult to label, since their attack transients are less clear in the spectrogram. The segmentation of high-pitched notes at low dynamic levels is hence the most difficult.
4.2. Two-Note Sequences
The two-note sequences contain the segments note, rest and transition, with the labels listed in Table 6. Stationary sustain parts are labeled as notes, whereas the transition class includes attack and release segments as well as note transitions, such as glissandi. All two-note sequences follow the same sequence of segments (0-2-1-2-1-2). Figure 8 shows a labeling project in Sonic Visualiser for a two-note item with glissando. The transition segment is placed according to the slope of the glissando transition.
Table 6: Segments in the two-note labeling scheme

Label  Segment
0      rest
1      note
2      transition
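Because every two-note item follows the fixed order 0-2-1-2-1-2, a label file can be sanity-checked automatically. A small Python sketch, under the assumption that the two-note segmentation files, like the single-sound files, hold one time/label pair per line (the exact file layout is an assumption):

EXPECTED = ["0", "2", "1", "2", "1", "2"]  # rest, transition, note, transition, note, transition

def check_two_note_labels(path: str) -> bool:
    """Return True if the file's label column follows the 0-2-1-2-1-2 scheme."""
    with open(path) as f:
        labels = [line.split()[1] for line in f if line.strip()]
    return labels == EXPECTED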
Figure 8: Sonic Visualiser setup for the annotation of two-note item 22: (a) energy trajectory, (b) peak frequency spectrogram
4.3. Solo
Solo items have been annotated using the guidelines proposed by von Coler et al. [5]. Due to the choice of compositions, only a few passages violate the restriction to pure monophony. Solo item 10, for example, starts with a chord, which is labeled as a single transitional segment.
5. STATISTICS
This section reports selected descriptive statistical properties of the sample library, which may be useful when considering the use of the data.
5.1. Single Sounds
Fig. 9 shows the RMS of all single sounds in box plots for each dynamic level. The medians of the dynamic levels are approximately logarithmically spaced.
Table 7: Segment length statistics for the single sounds

Segment   µ(l)/s   σ(l)/s
Attack    0.247    0.206
Sustain   5.296    1.118
Release   0.705    0.802
Statistics for the segment lengths of the single sounds are presented in Table 7 and Figure 10, respectively. With a mean of 5.296 s, the sustain segments are the longest, followed by the release segments with a mean of 0.705 s. Attack segments have a mean length of 0.247 s. Extreme outliers in the attack length are caused by high-pitched notes with low dynamics.
Figure 9: Box plot of RMS for the sustain segments, from the BuK microphone
Figure 10: Box plots of segment lengths for all single sounds
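Statistics such as those in Table 7 can be reproduced from the published marker files using the parser sketched in Section 4.1; the glob pattern below is a hypothetical naming scheme, not the repository's actual file listing.

import glob
import statistics

# Reuses read_single_sound_markers() from the sketch in Section 4.1.
lengths = {"attack": [], "sustain": [], "release": []}
for path in glob.glob("SampLib_DPA_*.txt"):  # hypothetical file pattern
    for name, (start, end) in read_single_sound_markers(path).items():
        lengths[name].append(end - start)

for name, values in lengths.items():
    print(name, statistics.mean(values), statistics.stdev(values))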
5.2. Two-Note
The two-note sequences allow a comparison of different articulation styles. Figure 11 shows the lengths of detached, legato and glissando transitions in a box plot. With a median duration of 0.72 s, glissando transitions tend to be longer than legato (0.38 s) and detached (0.37 s) transitions.
Figure 11: Box plot of transition lengths for all two-note sequences
5.3. Solo
Table 8: Note statistics for the items in the solo category

Solo item   Number of notes   µ(l)/s   σ(l)/s
1           8                 0.698    0.745
2           8                 0.721    0.768
3           8                 0.728    0.776
4           8                 0.707    0.753
5           8                 0.724    0.771
6           8                 0.774    0.848
7           104               0.695    0.661
8           75                1.074    0.899
9           89                0.911    0.923
10          63                0.735    0.690
11          76                0.689    0.707
12          56                0.615    0.740
For the solo category, basic statistics on the note counts and note lengths are listed in Table 8. All scales (items 1-6) contain 8 notes; the compositions (items 7-12) have a mean of 77 notes per item. With a mean note length of 0.615 s, item 12 has the shortest notes, and with 1.074 s, item 8 has the longest.
6. CONCLUSION
The presented sample library is already in use within sinusoidal modeling projects and for the analysis of expressive musical content. The overall recording quality has proven well suited for most tasks in sound analysis. Since the segmentation ground truth follows strict rules and has undergone repeated reviews, it may be considered consistent.
7. ACKNOWLEDGMENTS
The author would like to thank the violin player, Michiko Feuerlein, and the sound engineer, Philipp Pawlowski, for their work during the recordings, as well as the SIM Berlin for its support. Further acknowledgment is addressed to Moritz Götz, Jonas Margraf, Paul Schuladen and Benjamin Wiemann for their contributions to the annotation.
8. REFERENCES
[1] Masataka Goto et al. “Development of the RWC music database”. In: Proceedings of the 18th International Congress on Acoustics (ICA 2004). Vol. 1. 2004, pp. 553–556.
[2] Tuomas Eerola and Rafael Ferrer. “Instrument library (MUMS) revised”. In: Music Perception: An Interdisciplinary Journal 25.3 (2008), pp. 253–255.
[3] Gregory J. Sandell. “A Library of Orchestral Instrument Spectra”. In: Proceedings of the International Computer Music Conference. 1991, p. 98.
[4] J.P. Bello et al. “A Tutorial on Onset Detection in Music Signals”. In: IEEE Transactions on Speech and Audio Processing 13.5 (2005), pp. 1035–1047.
[5] Henrik von Coler and Alexander Lerch. “CMMSD: A Data Set for Note-Level Segmentation of Monophonic Music”. In: Proceedings of the AES 53rd International Conference on Semantic Audio. London, England, 2014.
[6] Henrik von Coler, Moritz Götz, and Steffen Lepa. “Parametric Synthesis of Glissando Note Transitions - A User Study in a Real-Time Application”. In: Proc. of the 21st Int. Conference on Digital Audio Effects (DAFx-18). Aveiro, Portugal, 2018.
[7] Henrik von Coler, Jonas Margraf, and Paul Schuladen. TU-Note Violin Sample Library. TU Berlin, 2018. DOI: 10.14279/depositonce-6747.
[8] Jürgen Meyer. “Musikalische Akustik”. In: Handbuch der Audiotechnik. Ed. by Stefan Weinzierl. VDI-Buch. Springer Berlin Heidelberg, 2008, pp. 123–180.
[9] Han-Wen Nienhuys and Jan Nieuwenhuizen. “LilyPond, a system for automated music engraving”. In: Proceedings of the XIV Colloquium on Musical Informatics (XIV CIM 2003). Vol. 1. 2003, pp. 167–171.
[10] E. Gómez et al. “Melodic Characterization of Monophonic Recordings for Expressive Tempo Transformations”. In: Proceedings of the Stockholm Music and Acoustics Conference. 2003.
[11] Norman H. Adams, Mark A. Bartsch, and Gregory H. Wakefield. “Note Segmentation and Quantization for Music Information Retrieval”. In: IEEE Transactions on Speech and Audio Processing 14.1 (2006), pp. 131–141.
[12] Chris Cannam, Christian Landone, and Mark Sandler. “Sonic Visualiser: An open source application for viewing, analysing, and annotating music audio files”. In: Proceedings of the 18th ACM International Conference on Multimedia. ACM, 2010, pp. 1467–1468.
[13] Xavier Rodet and Florent Jaillet. “Detection and Modeling of Fast Attack Transients”. In: Proceedings of the International Computer Music Conference. 2001, pp. 30–33.