All content in this area was uploaded by Asteris Zacharakis on Sep 27, 2023
Proceedings of Timbre 2023, 3rd International Conference on Timbre, Thessaloniki, Greece
Marika Ninou vs. Sotiria Bellou: a timbral comparison
between two iconic female singers in Rebetiko
Asterios Zacharakis1†, Savvas Kazazis1 and Emilios Cambouropoulos1
1 School of Music Studies, Aristotle University of Thessaloniki, Thessaloniki, Greece
† Corresponding author: aszachar@mus.auth.gr
Keywords: voice, audio features, Timbre Toolbox, Rebetiko
Introduction
Marika Ninou and Sotiria Bellou were two iconic female voices of the urban popular Greek idiom Rebetiko.
Both singers became prominent through their collaboration with composer Vassilis Tsitsanis, a leading
figure of the Greek Rebetiko scene. However, their singing styles were quite different. On the one hand,
Ninou was a highly accomplished soprano whose vocal interpretations were characterized by high
expressivity and embellishments, elements that were drawn from the Asia Minor singing tradition. Bellou,
on the other hand, featured a distinctive unadorned and metallic vocal quality.
One of the few computational studies of Rebetiko was conducted by Holzapfel & Stylianou (2007), who
used a database of Rebetiko songs for a singer identification task. The motivation behind the current
study is not to achieve singer identification per se, but to examine whether differences in perceived vocal
qualities between the two singers could be captured by some of the audio features that are often used in
timbre research. Until recently, this type of analysis would not have been possible for old recordings with
no access to separate tracks. However, recent advances in audio source separation through deep learning
techniques make it possible to achieve a decent isolation of a singer from a musical mix and thus extract
audio features to characterize vocal qualities. Acoustic analyses of the singing voice typically involve a
plethora of acoustic measures such as formant positions, jitter, shimmer and vibrato (see Gunjawate et
al., 2018, for a review) but tend to omit some of the standard acoustic correlates of timbre such as
the spectral centroid or spectrotemporal features. This study tests whether such
acoustic representations should be taken into consideration in future singing voice analysis studies.
Method
The vocal parts of a number of recorded compositions by Vassilis Tsitsanis, made between 1947 and 1960
and sung by either Ninou or Bellou, were extracted using the Demucs (v4) music source separation model
(Défossez, 2021; Rouard et al., 2023), which is available on GitHub. Then, one musical phrase from each song
where the singer of interest was solo and not backed by a second voice (as is usually the case in Rebetiko)
was manually edited and retained. Subsequently, silent parts within these phrases were computationally
eliminated. This resulted in 27 solo vocal parts for Bellou and 12 for Ninou, each coming from a different
song by Tsitsanis. The duration of these parts ranged from 5 to 53 s (mean = 17.8 s, std = 9.8 s). Finally, a
number of sustained vowels were manually extracted from the above musical phrases, creating a collection
of 33 vowels for each singer containing the phonemes /a/, /i/, /ε/, /u/ and /ɔ/ in roughly equal
proportions. These sustained tones were used to acquire characteristics of the vibrato, which is deemed
important in providing identity to a singing voice (Herbst et al., 2016). The sound stimuli are available
online at https://asteriszacharakis.wixsite.com/science/sounds.
An updated version of the Timbre Toolbox (Kazazis et al., 2021) was used for extracting harmonic audio
features both from the musical phrases and the isolated vowels. The window size used for the musical
phrases was 4096 samples (512 samples hop size) while for the analysis of the vowels it was set at 2048
samples (512 samples hop size). The features extracted from the longer excerpts were: pitch (median and
standard deviation), normalized spectral centroid (median and standard deviation), frequency and amplitude
of energy modulation, Tristimulus 1, 2 and 3 (medians and standard deviations). For the short vowels, the
odd-to-even ratio, spectral variation, spectral deviation and inharmonicity (medians) were also added to the
feature set.
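As an illustration of the harmonic descriptors listed above, the following Python sketch computes the Tristimulus values, the normalised spectral centroid and the odd-to-even ratio for one analysis frame, and then takes medians across frames. It is not the Timbre Toolbox code: the function names are hypothetical, the harmonic amplitudes are assumed to have been estimated already, and the weighting choices (linear amplitudes for Tristimulus and centroid, squared amplitudes for the odd-to-even ratio) are assumptions. The Tristimulus 3 band follows the text, which describes it as the energy in partials from the 5th and above.

```python
from statistics import median


def harmonic_descriptors(amps):
    """Descriptors for one analysis frame.

    amps[h - 1] is the linear amplitude of harmonic h (assumed to be
    already estimated; at least five harmonics are expected).
    """
    total = sum(amps)
    if total <= 0:
        raise ValueError("frame has no harmonic energy")
    # Tristimulus: amplitude share of harmonic 1, harmonics 2-4, harmonics 5+.
    t1 = amps[0] / total
    t2 = sum(amps[1:4]) / total
    t3 = sum(amps[4:]) / total
    # Centroid expressed in harmonic-number units (centroid in Hz / F0).
    sc_norm = sum(h * a for h, a in enumerate(amps, start=1)) / total
    # Odd-to-even ratio on squared amplitudes (assumed energy-based variant).
    odd = sum(a * a for a in amps[0::2])
    even = sum(a * a for a in amps[1::2])
    oer = odd / even if even > 0 else float("inf")
    return {"t1": t1, "t2": t2, "t3": t3, "sc_norm": sc_norm, "oer": oer}


def summarize(frames):
    """Median of each descriptor across a list of frames."""
    rows = [harmonic_descriptors(f) for f in frames]
    return {key: median(r[key] for r in rows) for key in rows[0]}
```

A standard deviation across frames could be added in the same way as the median to mirror the statistics reported for the longer excerpts.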
As mentioned above, the timbral features were complemented by a calculation of the basic vibrato
characteristics from the isolated vowels, namely the rate, the extent and the regularity (Sundberg, 1995;
Herbst et al., 2016). The rate is defined as the dominant modulation frequency in Hz, the extent as the
maximum amplitude deviation from the mean in cents, and the regularity as the deviation of
the modulation pattern from a pure sinewave. For the latter we have adopted the formula by Herbst et al.
(2016) that estimates this deviation by comparing the amplitudes of the two most prominent
frequency components (A1 and A2) of the vibrato spectrum: regularity = (A1 – A2)/A1. This metric ranges
from 0, indicating the presence of two equally strong frequency components, to 1 for a vibrato featuring a
pure sinewave modulation pattern. Prior to the estimation of vibrato parameters, DC removal was applied
to the extracted F0 along with some slight moving average smoothing (smoothing factor: .1 for the
MATLAB smoothdata function). The full amplitude (ΔF) of the F0 modulation was calculated as the range
between the 5th and the 95th percentile to eliminate artifacts originating either from the audio file itself or
from the F0 estimation algorithm.
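The vibrato measures described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used in the study: the function and parameter names are hypothetical, the extent is taken as half of the 5th-to-95th percentile range (ΔF/2), the two most prominent components are taken as the two largest local maxima of a plain DFT magnitude spectrum, and a simple centred moving average stands in for the smoothdata call.

```python
import cmath
import math


def percentile(values, q):
    """Linearly interpolated percentile of a list, with q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def moving_average(x, width):
    """Centred moving average; the window is truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def vibrato_parameters(f0_hz, frame_rate, smooth_frames=1, max_rate_hz=12.0):
    """Rate (Hz), extent (cents) and regularity of an F0 track.

    f0_hz: F0 estimates of one sustained vowel, one value per frame.
    frame_rate: frames per second (sampling rate divided by hop size).
    """
    n = len(f0_hz)
    ref = sum(f0_hz) / n
    cents = [1200.0 * math.log2(f / ref) for f in f0_hz]
    dc = sum(cents) / n
    # DC removal followed by light smoothing (width 1 means no smoothing).
    cents = moving_average([c - dc for c in cents], smooth_frames)
    # Assumed definition: extent = half of the 5th-95th percentile range.
    extent = (percentile(cents, 95) - percentile(cents, 5)) / 2.0
    # DFT magnitudes of the modulation up to max_rate_hz (bin 0 is DC).
    kmax = min(n // 2, int(max_rate_hz * n / frame_rate))
    mags = [0.0]
    for k in range(1, kmax + 1):
        acc = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                  for i, c in enumerate(cents))
        mags.append(abs(acc))
    # Local maxima of the spectrum, strongest first.
    peaks = sorted(((mags[k], k) for k in range(1, kmax)
                    if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]),
                   reverse=True)
    if not peaks:
        raise ValueError("no spectral peak found in the modulation spectrum")
    a1, k1 = peaks[0]
    a2 = peaks[1][0] if len(peaks) > 1 else 0.0
    return {"rate_hz": k1 * frame_rate / n,
            "extent_cents": extent,
            "regularity": (a1 - a2) / a1}
```

With this definition a purely sinusoidal modulation yields a regularity close to 1, while two equally strong modulation components yield a value close to 0, in line with the range described in the text.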
Results
The features whose distributions differed significantly between the two singers for the musical
phrases (Mann-Whitney U test, p<.05) are shown in the boxplots of Figure 1. The median
pitch sung by Ninou is higher than the median pitch of Bellou (324 Hz vs. 295 Hz). However, the
distribution of energy in the harmonic spectrum is skewed towards the higher partials for Bellou (median
normalised spectral centroid: 4.48 vs. 4 for Ninou; median Tristimulus 3: .25 vs. .15 for Ninou) and is
correspondingly more concentrated on the fundamental frequency for Ninou (median Tristimulus 1: .22 for
Bellou vs. .35 for Ninou). Finally, the standard deviation of the normalised spectral centroid is
significantly higher for Ninou than for Bellou
(median value 2 vs. 1.25).
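The paper does not state which implementation of the Mann-Whitney U test was used. As a minimal sketch of the kind of comparison reported here, the following Python function computes the U statistic with average ranks for ties and a two-sided p-value from the normal approximation with tie and continuity corrections. The function name is hypothetical, and an exact test may be preferable for very small samples.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U for sample x, p-value). Ties receive average ranks and
    the variance is tie-corrected; a continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold tied values with ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = max((abs(u1 - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p = math.erfc(z / math.sqrt(2.0))
    return u1, p
```

In the setting of this study, x and y would hold the per-excerpt values of one feature (for example the median pitch of each phrase) for the two singers.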
The above findings for the longer phrases were also confirmed for the isolated vowels. The only
additional feature showing a statistically significant difference (Mann-Whitney U test, p<.05) between the
two singers was inharmonicity, with Bellou being significantly more inharmonic than Ninou (median
values: 0.137 vs. 0.122, respectively).
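For orientation, one common way to quantify inharmonicity is the energy-weighted deviation of the measured partial frequencies from exact multiples of the fundamental, scaled by 2/F0. The Python sketch below follows that definition; whether it matches the exact formula of the toolbox version used here is an assumption, and the function name is hypothetical.

```python
def inharmonicity(freqs_hz, amps, f0_hz):
    """Energy-weighted deviation of partials from harmonic positions.

    freqs_hz[h - 1] and amps[h - 1] are the measured frequency and linear
    amplitude of partial h; f0_hz is the fundamental frequency estimate.
    Returns 0 for a perfectly harmonic spectrum.
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    num = 0.0
    den = 0.0
    for h, (f, a) in enumerate(zip(freqs_hz, amps), start=1):
        num += abs(f - h * f0_hz) * a * a
        den += a * a
    if den == 0:
        raise ValueError("no partial energy")
    return (2.0 / f0_hz) * num / den
```

Under this definition, partials that all sit 5% of F0 away from their harmonic positions give a value of 0.1, which is of the same order as the medians reported above.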
The analysis of vibrato characteristics also revealed statistically significant differences (Mann-Whitney U
test, p<.05) between the two singers. Even a visual inspection of Figure 2 shows that Ninou’s
vibrato had a faster rate than Bellou’s (median rate = 5.8 vs. 2.6 Hz) and was also deeper
(median extent of 36 cents for Ninou vs. 24 cents for Bellou). At the same time, both featured quite low
levels of vibrato regularity according to the formula by Herbst et al. (2016). The median regularity equals
.27 for Ninou and .19 for Bellou, a difference that is not statistically significant at the .05 level
according to a Mann-Whitney U test. The distributions of the vibrato features are shown in the boxplots of
Figure 3.
Discussion
The above results show that it was possible to differentiate between the two singers and potentially identify
some sources of their distinctive vocal properties through a timbral feature analysis complemented with
extraction of vibrato characteristics. For example, Ninou, apart from reaching higher pitches in comparison
to Bellou, also featured a significantly stronger fundamental frequency. This is a consistent finding since a
strong fundamental is indicative of soprano voices (Khine et al., 2008). On the contrary, Bellou possessed
a richer spectrum with a higher normalised spectral centroid and more energy in the partials from the 5th
and above (i.e., Tristimulus 3). The stronger variation of the normalised spectral centroid for Ninou could
potentially be attributed to the stronger expressive characteristics of her delivery, while the higher
inharmonicity of Bellou could be partially responsible for the metallic quality of her voice.
Figure 1. Boxplots showing the median distributions of pitch, normalised spectral
centroid, normalised spectral centroid standard deviation and Tristimulus values for
musical phrases by both Bellou and Ninou. Apart from Tristimulus 2, every
difference was statistically significant (p < .05).
[Panels, Bellou vs. Ninou: Pitch (Median), in Hz; SC Norm (Median); SC Norm Std (Median); T1 (Median); T2 (Median); T3 (Median).]
Figure 2. The modulation of pitch for the 33 sustained vowels for each singer.
[Panels: Ninou vowels; Bellou vowels. Axes: Time (s) against Pitch (Hz).]
Figure 3. Boxplots showing the median distributions of rate, extent and regularity of
vibrato by Ninou and Bellou. Rate and extent feature statistically significant differences
(p < .05).
[Panels, Bellou vs. Ninou: Rate (Median), in Hz; Extent (Median), in cents; Regularity (Median).]
The vibrato attributes quantified what is apparent by ear. Ninou had a deeper and faster vibrato that falls
within the lower range exhibited by classical singers (Sundberg, 1994; Prame, 1997). On the contrary,
Bellou’s median rate was only around 2.6 Hz, which is well below the 4 Hz that usually defines the
lower end of vibrato for singers (Hirano et al., 1995). Similarly, the median extent of Bellou’s vibrato (24
cents) is below the lower end (30.9 cents) reported for contemporary popular music (Pecoraro et al., 2013),
whereas the median extent of Ninou (36 cents) is significantly higher and almost equal to the 35.38 cents
reported for Freddie Mercury (Herbst et al., 2016). These results show that the vibrato properties captured the
characteristically flat and mostly unembellished delivery by Bellou in contrast to the more expressive style
of Ninou. Finally, both singers demonstrated a rather irregular vibrato, indicating a modulation pattern that
deviated substantially from a pure sinusoid.
Such findings could be applied to enrich existing music databases with additional information regarding
the timbral properties of singers. In addition, future behavioural experiments on description of singing
voices could help determine the perceptual salience of each of the features employed in this work.
Acknowledgements
This research was co-financed by the European Regional Development Fund of the European Union and
Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation,
under the call Research - Create - Innovate (MIS 5149205). We would like to thank George Kokkonis and
Nikos Ordoulidis for kindly providing the dataset of the audio files for analysis and offering musicological
context around Marika Ninou, Sotiria Bellou, and Rebetiko in general.
References
Défossez, A. (2021). Hybrid spectrogram and waveform source separation. In Proceedings of the ISMIR
2021 Workshop on Music Source Separation.
Gunjawate, D. R., Ravi, R., & Bellur, R. (2018). Acoustic analysis of voice in singers: A systematic review.
Journal of Speech, Language, and Hearing Research, 61, 40-51.
Herbst, C. T., Hertegard, S., Zangger-Borch, D., & Lindestad, P. Å. (2016). Freddie Mercury—acoustic
analysis of speaking fundamental frequency, vibrato, and subharmonics. Logopedics Phoniatrics
Vocology, 42, 29-38.
Hirano, M., Hibi, S., & Hagino, S. (1995). Physiological aspects of vibrato. In Dejonckere, P. H., Hirano,
M., & Sundberg, J. (Eds.), Vibrato (pp. 9-33). San Diego: Singular Publishing Group.
Holzapfel, A., & Stylianou, Y. (2007). Singer identification in Rembetiko music. In Conference on Sound
and Music Computing (SMC) (pp. 326-329). Sound and Music Computing Network.
Kazazis, S., Depalle, P., & McAdams, S. (2021). The Timbre Toolbox Version R2021a, User’s Manual.
Available online at: https://github.com/MPCL-McGill/TimbreToolbox-R2021a
Khine, S. Z. K., Nwe, T. L., & Li, H. (2008). Exploring perceptual based timbre feature for singer
identification. In Computer Music Modeling and Retrieval. Sense of Sounds: 4th International
Symposium, CMMR 2007, Copenhagen, Denmark, August 27-31, 2007. Revised Papers 4 (pp. 159-171).
Springer Berlin Heidelberg.
Prame, E. (1997). Vibrato extent and intonation in professional Western lyric singing. The Journal of the
Acoustical Society of America, 102, 616-621.
Pecoraro, G., Curcio, D. F., & Behlau, M. (2013). Vibrato rate variability in three professional singing
styles: Opera, Rock and Brazilian country. In Proceedings of Meetings on Acoustics ICA2013 (Vol. 19,
No. 1, p. 035026). Acoustical Society of America.
Rouard, S., Massa, F., & Défossez, A. (2023). Hybrid Transformers for Music Source Separation. ICASSP
23.
Sundberg, J. (1994). Acoustic and psychoacoustic aspects of vocal vibrato. STL-QPSR, 35, 45-68.