Impact of prosodic structure and information density on vowel space size

Erika Schulz¹, Yoon Mi Oh², Zofia Malisz¹, Bistra Andreeva¹, Bernd Möbius¹
¹ Computational Linguistics and Phonetics, Saarland University, Saarbrücken, Germany
² Laboratoire Dynamique du Langage, Université de Lyon and CNRS, France
{eschulz; zmalisz; andreeva; moebius}@coli.uni-saarland.de, yoon-mi.oh@univ-lyon2.fr
Abstract
We investigated the influence of prosodic structure and infor-
mation density on vowel space size. Vowels were measured in
five languages from the BonnTempo corpus, French, German,
Finnish, Czech, and Polish, each with three female and three
male speakers. Speakers read the text at normal, slow, and fast
speech rate. The Euclidean distance between vowel space mid-
point and formant values for each speaker was used as a mea-
sure for vowel distinctiveness. The prosodic model consisted of
prominence and boundary. Information density was calculated
for each language using the surprisal of the biphone $X_n|X_{n-1}$.
On average, there is a positive relationship between vowel space
expansion and information density. Detailed analysis revealed
that this relationship did not hold for Finnish, and was only
weak for Polish. When vowel distinctiveness was modeled as
a function of prosodic factors and information density in lin-
ear mixed effects models (LMM), only prosodic factors were
significant in explaining the variance in vowel space expansion.
All prosodic factors, except word boundary, showed significant
positive results in LMM. Vowels were more distinct in stressed
syllables, before a prosodic boundary and at normal and slow
speech rate compared to fast speech.
Index Terms: vowel space, prosodic structure, speech rate, in-
formation density
1. Introduction
Vowel space size is influenced by several factors, such as sex
[1], speaking style [2], language redundancy and prosodic struc-
ture [3]. Regarding its relationship with speech rate, there have
been inconsistent findings. Some studies found strong vowel re-
duction as speech rate increases [4, 5] which was also reflected
in the perception of speech tempo [6], while others only found
minimal or no impact of speech rate on vowel space size [7, 8].
Fast speech rate is associated with overall shorter segment
durations leading to reduced spectral characteristics of vowels
[9]. This means that vowel formants move to a more central
position in the vowel space. These effects have also been ob-
served in German for naturally occurring differences in local
speech rate [5]. Also, with varied intended speech rates the US
English vowel space has been shown to be reduced in size with
increasing speech rate. Vowel space size also added to speech
intelligibility: normal and slow speech received higher ratings
in intelligibility than fast speech [4].
Small differences in vowel quality were found between
slow and fast speech of US English when the expansion of the
vowel space was described as the Euclidean distance between
the measured formant values and a neutral position of the for-
mant track [8]. This neutral position is described by a uniform
vocal tract [10] and is not based on the measured formant val-
ues. Spectral vowel reduction with increased tempo was neither
found in the analysis of Dutch read speech from newspapers [7]
nor in dynamic vowel measurements of Dutch [11]. However, the
latter two studies investigated the speech of only a single
speaker.
Prosodic structure, in general, is thought to have an influ-
ence on vowel space size [12]. In Aylett and Turk’s Smooth
Signal Redundancy hypothesis [13, 3] language redundancy and
acoustic redundancy show an inverse relationship which is me-
diated and implemented through prosodic structure. In their
study on the influence of prosodic structure and information
density on vowel characteristics in US English they investigated
read speech from the Rhetorical corpus [3]. Their language re-
dundancy model was designed using high, mid and low lan-
guage redundancy based on unigram, bigram and trigram prob-
abilities of syllables. The prosodic model defined prominence
and prosodic boundaries. Results of the study showed that vow-
els were more centralized with increased language redundancy,
vowel quality in prominent syllables was more distinct than in
syllables that were not prominent, and spectral characteristics
of vowels were also more distinct in syllables before prosodic
boundaries than in syllables at word or no boundary.
While Aylett and Turk [3] made use of the language redun-
dancy model described above there are alternative measures of
information density, e.g. surprisal, which is frequently used in
psycholinguistic studies [14]. Surprisal $S(X_n)$ measures the
surprise of encountering a linguistic unit $X_n$ in a specific context
$X_c$ based on language models (LMs) which estimate the
distribution of sequences of linguistic units in a language [15]
(see eq. 1).
$$S(X_n) = -\log_2 P(X_n \mid X_c) \quad (1)$$
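As a brief worked illustration of eq. 1, with surprisal taken as the negative base-2 logarithm of the conditional probability: a fully predictable unit carries no surprisal, and each halving of the conditional probability adds one bit. The probabilities below are made-up values chosen for easy arithmetic, not estimates from the corpora used in this study.

```latex
\begin{align*}
P(X_n \mid X_c) = 1     &\;\Rightarrow\; S(X_n) = -\log_2 1     = 0 \text{ bits} \\
P(X_n \mid X_c) = 0.5   &\;\Rightarrow\; S(X_n) = -\log_2 0.5   = 1 \text{ bit}  \\
P(X_n \mid X_c) = 0.125 &\;\Rightarrow\; S(X_n) = -\log_2 0.125 = 3 \text{ bits}
\end{align*}
```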
The current study investigated the influence of information den-
sity and prosodic structure on vowel space size. French (FRA),
German (DEU), Polish (POL), Czech (CES), and Finnish (FIN)
were included in the analysis. Contrary to previous research [3],
the current study used data with various intended speech rates.
In addition, the language redundancy model and the measure-
ment of vowel distinctiveness differed in this cross-linguistic
approach from Aylett and Turk [3] to simplify comparisons be-
tween languages.
2. Method
2.1. Materials
A subset of the BonnTempo corpus [16] was analyzed with
three female and three male speakers of FRA, DEU, FIN, CES,
and POL. FIN was added to the BonnTempo corpus using the
original instructions [16]. Speakers were given an excerpt of
a novel in their native language, and were asked to familiar-
ize themselves with the text. Next, speakers were recorded at
what they considered to be reading at normal pace. Then, subjects
were asked to slow down, and to slow down even more.
In a third step, fast speech was recorded by asking speakers
to speak fast, and then to speed up until they felt they
could not speak any faster. From these steps, the normal
speech rate as well as the first steps of the slow and
fast speech rates were used for analysis.

Speech Prosody 2016, May 31 - June 3, 2016, Boston, MA, USA

Table 1: Number of items and vowel identity per language.
Language  Items  Vowels
FRA       689    /i, e, a, u/
DEU       825    /iː, ɪ, eː, ɛː, ɛ, aː, a, uː, ʊ/
FIN       1178   /i, eː, e, æ, æː, ɑː, ɑ, uː, u/
POL       790    /i, ɛ, a, u/
CES       1156   /iː, ɪ, ɛː, ɛ, aː, a, u/

Table 2: Corpus and corpus size for language modeling.
Language  Corpus                No. of tokens (M)
FRA       LEXIQUE 3.80          9.1
DEU       WebCelex              4.6
FIN       Finnish PAROLE        180
POL       Frequency dictionary  901
CES       Frequency dictionary  398

The corpus was automatically segmented using SPPAS for
French [17] and WebMaus [18] for all other languages. Auto-
matic segmentation was manually verified by phonetic experts
based on the beginnings and ends of vowels which were marked
by clearly visible formant structure. Vowel phonemes were cho-
sen to facilitate comparative analysis between the different lan-
guages in the corpus. If available in the data, tense and lax
vowels in closed front, closed back, low and front mid position
were used for analysis (see table 1). The total number of ana-
lyzed tokens was 4638.
In order to build a LM, large text corpora were collected for
the five languages (see table 2). First, each corpus was cleaned
by removing erroneous entries. Then, the data was phoneti-
cally transcribed into IPA and, if not provided, automatically
syllabified. For French, Lexique 3.80 [19] was retrieved on-
line, which provides phonetic transcription and syllabification.
For German, the WebCelex corpus [20] includes syllabification,
transcription, and stress assignment. For Finnish, the Finnish
Parole Corpus [21] was acquired online. The data was auto-
matically converted into IPA by the speech synthesizer eSpeak
[22] and was automatically syllabified by a bash shell script.
For Polish, a frequency dictionary derived from large-scale web
corpora [23] was converted into IPA and syllabified by an auto-
matic tool for transcription and syllabification [24]. For Czech,
a frequency dictionary acquired from large-scale web corpora
[23] was automatically transcribed by eSpeak and syllabified
by a bash shell script.
2.2. Data analysis
F1 and F2 were measured at the temporal midpoint in vocalic
nuclei. Formant analysis was conducted with the Burg algo-
rithm in Praat [25] with a maximum of five formants, window
size of 0.025 s, pre-emphasis from 50 Hz, and a maximum
formant threshold of 5000 Hz (male speakers) and 5500 Hz (fe-
male speakers). Formant values were cleaned and manually
checked before speaker-dependent normalization was applied to
control for differences in formant values due to sex or speaker
[26]. As a measure of vowel distinctiveness, the Euclidean distance
between the midpoint of the vowel space and the formant values
of every vowel was calculated for each speaker [27]. The
larger the distance between the vowel space midpoint and an
individual vowel, the more distinct its vowel quality. This measure
is independent of differences in vowel inventory between
the languages because it assumes that vowel distinctiveness is
defined by vowel space expansion.
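The distance measure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' script: it assumes that the speaker-dependent normalization [26] is a per-speaker z-score of each formant, and that the vowel space midpoint is the centroid of the speaker's per-vowel category means. The paper does not spell out either detail, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def lobanov(values):
    """z-score one speaker's values for a single formant.

    Assumption: population standard deviation; the paper does not say
    which variant was used.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def vowel_distinctiveness(tokens):
    """tokens: list of (vowel_label, F1_hz, F2_hz) for ONE speaker.

    Returns one Euclidean distance per token, measured from the
    speaker's vowel space midpoint in normalized F1/F2 space.
    """
    f1 = lobanov([t[1] for t in tokens])
    f2 = lobanov([t[2] for t in tokens])

    # Group normalized values by vowel category.
    by_vowel = {}
    for (label, _, _), z1, z2 in zip(tokens, f1, f2):
        by_vowel.setdefault(label, []).append((z1, z2))

    # Assumed midpoint: centroid of the category means.
    cat_means = [
        (mean(p[0] for p in pts), mean(p[1] for p in pts))
        for pts in by_vowel.values()
    ]
    mid1 = mean(m[0] for m in cat_means)
    mid2 = mean(m[1] for m in cat_means)

    return [math.hypot(z1 - mid1, z2 - mid2) for z1, z2 in zip(f1, f2)]


# Made-up formant values in Hz: three peripheral vowels, one central token.
example = [("i", 300, 2300), ("a", 800, 1300), ("u", 300, 700), ("e", 470, 1430)]
distances = vowel_distinctiveness(example)
```

In this toy example the central token receives a much smaller distance than the three peripheral vowels, which is the sense in which a larger distance indicates a more distinct vowel quality.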
The prosodic model consisted of prominence and boundary.
Prominence was a binary factor using primary lexical stress
(stressed vs. unstressed) based on the BonnTempo corpus¹.
If monosyllabic, function words were counted as unstressed,
whereas content words were counted as stressed. Boundary
had three factor levels: none, word boundary, and high likelihood
of prosodic boundary. A vowel was coded as the nucleus of a
syllable with a high likelihood of preceding a prosodic boundary
when a pause of at least 100 ms followed [3].
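The coding scheme for the two prosodic factors can be written down as a pair of small functions. This is only a sketch under stated assumptions: the argument names are hypothetical, the monosyllable rule is read as applying to both function and content words, and the special treatment of French phrase-final accent is not covered.

```python
def code_boundary(word_final, following_pause_s, min_pause_s=0.100):
    """Return the boundary level for a syllable.

    'prosodic': a pause of at least min_pause_s seconds follows
    'word'    : the syllable is word-final, without such a pause
    'none'    : neither of the above
    """
    if following_pause_s >= min_pause_s:
        return "prosodic"
    if word_final:
        return "word"
    return "none"


def code_prominence(lexically_stressed, n_syllables, is_function_word):
    """Return 'stressed' or 'unstressed' for a syllable.

    Assumption: in monosyllabic words the word class decides;
    otherwise primary lexical stress decides.
    """
    if n_syllables == 1:
        return "unstressed" if is_function_word else "stressed"
    return "stressed" if lexically_stressed else "unstressed"


# A word-final syllable followed by a 250 ms pause.
level = code_boundary(word_final=True, following_pause_s=0.25)
```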
As a measure of information density, surprisal values were
estimated from a biphone LM with function and content words,
taking the previous context into account. Contrary to Aylett and
Turk [3], we chose a biphone LM due to the restricted num-
ber of different syllables in the BonnTempo corpus. In addi-
tion, the relationship between information density and phonetic
structures is assumed to be better reflected by phoneme LMs
[30]. The factor levels high, medium and low language redun-
dancy [3] are not necessarily comparable across languages, in
contrast to the continuous variable surprisal. Surprisal values
were mean-centered.
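To make the biphone surprisal measure concrete, the sketch below estimates the surprisal of each phone given the preceding phone from a toy frequency lexicon and then mean-centers a list of values. It assumes unsmoothed maximum-likelihood estimates and a word-initial boundary symbol "#"; neither detail is stated in the paper, and the lexicon is invented for illustration.

```python
import math
from collections import Counter


def biphone_surprisal(words):
    """words: list of (phone_sequence, frequency).

    Returns {(previous_phone, phone): surprisal in bits}, using
    unsmoothed relative frequencies (an assumption of this sketch).
    """
    pair_counts = Counter()
    context_counts = Counter()
    for phones, freq in words:
        padded = ["#"] + list(phones)  # '#' marks the word onset
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += freq
            context_counts[prev] += freq
    return {
        pair: -math.log2(count / context_counts[pair[0]])
        for pair, count in pair_counts.items()
    }


def mean_center(values):
    """Subtract the mean so that the values average to zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Toy lexicon: 'ba' occurs three times, 'bi' once.
toy = [(("b", "a"), 3), (("b", "i"), 1)]
surprisal = biphone_surprisal(toy)
```

In the toy lexicon, /i/ after /b/ has a conditional probability of 0.25 and therefore a surprisal of 2 bits, while the more frequent /a/ after /b/ is less surprising.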
3. Results
3.1. Correlation analysis
When averaged over all languages there was a positive relation-
ship between vowel distinctiveness and surprisal of the biphone
$X_n|X_{n-1}$ (r = 0.171, t(4636) = 11.808, p < .001). However,
this relationship did not hold for all languages when each lan-
guage was investigated individually. There was no significant
correlation between information density and vowel distinctive-
ness in Finnish, and only a small, but significant positive rela-
tionship for these measures in Polish (see fig. 1).
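The relation between a Pearson correlation and its t statistic can be checked with a few lines of code. The sketch below uses the standard formula t = r * sqrt((n - 2) / (1 - r^2)); the helper names are hypothetical, and the only values taken from the text are r = 0.171 and the 4638 analyzed tokens.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Overall correlation reported in the text, with n = 4638 tokens.
t_value = t_from_r(0.171, 4638)
```

With the rounded r this gives a t of approximately 11.8 on 4636 degrees of freedom, in line with the reported value.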
In addition, the previously observed positive relationship
between vowel distinctiveness and information density did not
hold for all vowel identities in the corpus. In CES and DEU,
all significant correlations described a positive relationship be-
tween the two variables for all vowel phonemes. Both lan-
guages also had the strongest positive correlation between in-
formation density and vowel distinctiveness (see fig. 1). The
most consistent positive results were found for /a, aː/ and /i, iː,
ɪ/, except for French /a/ (r = -0.163, p = 0.026) (see table 3).
3.2. Linear mixed effects model
We used R [31] and lmerTest [32] to perform a linear mixed
effects analysis of the relationship between the dependent vari-
able vowel distinctiveness and the fixed effects information den-
sity, vowel identity, speech rate, prosodic boundary, and promi-
nence. Since the correlation analysis suggested that the effect of
information density on vowel distinctiveness differs with vowel
identity, the models also included the interaction of surprisal
and vowel identity. Factor levels of vowel identity were expressed
using deviation coding [33]. As random effects, we had
intercepts for word and language, as well as a by-word random
slope for the effect of information density. Visual inspection of
residual plots did not show any obvious violation of the normality
assumption or homoscedasticity. The model with the best fit
was identified by likelihood ratio tests [34] (see fig. 2).

¹ In French, accent was marked on the last syllable of a phrase with
a full vowel [28, 29].

Figure 1: Pearson's correlation of surprisal of the biphone
$X_n|X_{n-1}$ and vowel distinctiveness for all languages (** p
< 0.01, *** p < 0.001).

Table 3: Significant results of Pearson's correlation of surprisal
of the biphone $X_n|X_{n-1}$ and vowel distinctiveness for each
vowel identity (* p < 0.05, ** p < 0.01, *** p < 0.001).
LANG  V    r
FRA   i    0.251***
      e    -0.140*
      a    -0.163*
FIN   eː   -0.635***
      e    0.296**
      æ    0.316***
      ɑ    0.135*
POL   ɛ    -0.267***
      a    0.135**
      u    -0.256*
DEU   iː   0.336**
      ɪ    0.264***
      eː   0.564***
      aː   0.476***
      a    0.261***
      uː   0.442**
      ʊ    0.229*
CES   ɪ    0.139*
      ɛː   0.680***
      ɛ    0.241***
      u    0.848***
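Deviation coding, as used for the vowel identity factor, compares each level with the grand mean rather than with a reference level. The sketch below builds such a contrast matrix by hand; it corresponds to what R calls sum-to-zero contrasts (`contr.sum`). Which level was left without its own column in the original analysis is not stated, so the choice of the last level here is arbitrary, and the function name is hypothetical.

```python
def deviation_contrasts(levels):
    """Return {level: contrast row} with k - 1 columns for k levels.

    Each of the first k - 1 levels gets a 1 in its own column; the last
    level is coded -1 in every column, so each column sums to zero and
    each coefficient is a deviation from the grand mean.
    """
    k = len(levels)
    matrix = {}
    for i, level in enumerate(levels):
        if i == k - 1:
            matrix[level] = [-1] * (k - 1)
        else:
            matrix[level] = [1 if j == i else 0 for j in range(k - 1)]
    return matrix


# Four broad vowel categories, used here only for illustration.
contrasts = deviation_contrasts(["i", "e", "a", "u"])
```

Under this coding, a coefficient for /i/ is read as the difference between /i/ and the average over all vowel categories, which is how the vowel identity effects are reported in this paper.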
Surprisal of the biphone $X_n|X_{n-1}$ did not affect vowel
distinctiveness significantly. Stress increased vowel distinctiveness
significantly (t(2562) = 6.511, p < 0.001) by 0.208 (SD =
0.032). Vowel space expanded by 0.031 (SD = 0.013) from fast
to normal speech (t(4274) = 2.482, p = 0.013), and expanded
even more, by 0.115 (SD = 0.013), in slow speech compared to
fast speech (t(4285) = 9.150, p < 0.001). Vowels in syllables before
prosodic boundaries were more distinct by 0.104 (SD = 0.034)
than vowels in no boundary position (t(3753) = 2.987, p
= 0.003). Significant results for vowel identity indicated differences
in distance to the midpoint of the vowel space for different
vowel categories. Front mid vowel /e/ (d = -0.685 (SD = 0.030))
and low vowel /a/ (d = -0.249 (SD = 0.024)) were less distant
from the vowel space midpoint than all vowels on average,
whereas high closed vowel /i/ was more distant from the midpoint
than the average by 0.359 (SD = 0.031) (t(2693) = 11.467, p
< 0.001) (see fig. 2). The significant interactions between surprisal
and vowel identity /a/ and /e/ showed that low mid vowels
were more expanded by 0.157 (SD = 0.017, t(2756) = 9.043, p < 0.001), and
front mid vowels by 0.076 (SD = 0.017, t(3261) = 3.679,
p < 0.001), under high surprisal than all other
vowels on average.

Figure 2: Regression estimates for LMM with best fit (* p < 0.05,
** p < 0.01, *** p < 0.001).

Regarding the random structure of the model, the intercept
for language explained 0.013 (SD = 0.116) of the variance of
the model, whereas the intercept for word accounted for 0.299 (SD
= 0.547) of the variance. The by-word random slope for the effect of
information density explained 0.167 (SD = 0.409) of the model
variance; the by-word intercept and slope correlated only slightly
with each other (r = 0.29).
The effect size of the entire model was $\omega_0^2 = 0.713$.
4. Discussion
4.1. Correlation analysis
Averaged over all languages included in this paper there was
a positive relationship between vowel distinctiveness and surprisal
of the biphone $X_n|X_{n-1}$. This relationship did not hold
when investigated for each language individually. The weak
positive relationship between the two variables found in Polish
may be due to the fact that there is only slight spectral vowel
reduction in this language [35, 36]. This finding contrasts with
previous observations on Polish as a language without spectral
vowel reduction [9]. In Finnish, on the other hand, vowel reduction
is realized by means of differences in duration [37]. Also,
vowel quality in Finnish is morphophonemic: front vowels (/æ,
y, ø/) never appear with back vowels (/ɑ, u, o/) in the same lexeme
[38], which is why vowel reduction would be detrimental to
identifying lexeme boundaries in this language.
As visible in figures 3 and 4, differences in vowel space
size at different speech rates are relatively small in these two
languages. The polygon area of the Polish vowel space changes
only slightly with tempo (A(slow) = 0.347 kHz², A(normal) =
0.335 kHz², A(fast) = 0.309 kHz²). Similarly small effects were
observed for Finnish in the vowel space size under different
speech rates (A(slow) = 0.648 kHz², A(normal) = 0.607 kHz²,
A(fast) = 0.522 kHz²).
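A polygon area of this kind can be computed from the corner points of a vowel space with the shoelace formula. The paper does not state how its areas were obtained, so the sketch below is only one standard way of doing it, and the corner values are invented (F2 and F1 in kHz) rather than taken from the corpus.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    vertices: list of (x, y) points in order around the polygon.
    With F2 and F1 given in kHz, the result is in kHz squared.
    """
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


# Hypothetical mean (F2, F1) values in kHz for /i/, /a/, /u/.
corners = [(2.3, 0.3), (1.3, 0.8), (0.7, 0.3)]
area = polygon_area(corners)
```

Comparing such areas across the slow, normal and fast conditions gives a single number per condition for the overall size of the vowel space.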
When vowel distinctiveness and information density were
correlated for each vowel identity separately, it was observed
that general patterns found for one language did not necessarily
hold across languages. The two languages with the strongest
correlation, DEU and CES, unsurprisingly only showed significant
positive correlations between the two variables. Although
there is no correlation between information density and vowel
distinctiveness averaged over all vowels in FIN, the short open
mid vowels /æ, ɑ/ and the short front mid vowel /e/ showed a
significant positive relationship. In FRA and POL, the overall
positive result of correlations could not be replicated for all
vowel qualities. These findings were in line with Aylett and
Turk [3] who observed that their language redundancy model
did not explain the variance in all investigated US English vowels.
However, it should be noted that there was only a small
number of items per vowel identity. Interestingly, vowel identities
/i/ and /a/ showed the most consistent positive results in
correlation analysis. These vowel phonemes also denote the
maximum distance in perceptual contrast [39].

Figure 3: Normalised Finnish vowel space at different speech
rates.

Figure 4: Normalised Polish vowel space at different speech
rates.

4.2. Linear mixed effects model
Contrary to previous findings [3], results of the LMM did not
show a significant effect of information density on vowel dis-
tinctiveness. However, we found significant effects for all
prosodic factors in the model, except for word boundary. The
faster the intended speech rate, the more reduced
vowel quality was. This finding is in line with previous work which
found a connection between speech rate and vowel space size
[4, 5, 6]. Also, this study used the actual vowel space midpoint
for each speaker to measure vowel distinctiveness and not an
abstract neutral position [8], facilitating a more realistic estimation
of vowel distinctiveness at different speech rates.
Regarding prominence, stressed vowels were more distinct in their
vowel quality than unstressed vowels, as previously observed
[9, 8]. Also, vowels in syllables before prosodic boundaries
were more distinct in their spectral characteristics than vowels
in syllables at no boundaries [3]. Prominence was the strongest
predictor of vowel distinctiveness in this model.
Significant results for the fixed effect vowel identity indi-
cated that vowels differed in their distance to the vowel space
midpoint. Front mid vowels and low vowels were less distant
from the vowel space midpoint than all vowels on average, while
closed front vowels were more distant than the average. This
finding might be due to the fact that the vowel space midpoint
was lowered towards the direction of vowel identities /e/ and
/a/ because they were more numerous in the corpus (n = 3078)
than closed vowels /i/ and /u/ (n = 1560). Open mid vowels
and to a lesser extent also front mid vowels expanded more than
all other vowel identities on average under high surprisal. This
interaction could be explained by jaw undershoot for open vow-
els which are highly predictable from their preceding context,
thereby reducing articulatory effort.
Although the correlation analysis revealed that information
density and vowel distinctiveness did not correlate positively
in all languages, the intercept for language explained only
a small amount of variance in the LMM (0.013 (SD =
0.116)). This finding was due to the fact that this positive
relationship held for the majority of the languages investigated
in this study. Also, an LMM with language as a fixed factor did not
yield a significantly better log likelihood (log(L) = -2188.6) than
the model with language as intercept (log(L) = -2187.5; χ²(3) =
2.266, p = 0.519).
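The model comparison reported here is a likelihood ratio test: twice the difference in log likelihood is referred to a chi-square distribution. The sketch below implements the chi-square upper-tail probability for integer degrees of freedom with closed-form series, so that it needs only the standard library. The function names are hypothetical, and the paper's own comparison was run in R, not with this code.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable
    with a positive integer number of degrees of freedom."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density_part = 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(math.sqrt(x / 2)) + density_part * total


def likelihood_ratio_test(loglik_a, loglik_b, df):
    """Return (statistic, p-value) for two nested models."""
    stat = 2 * abs(loglik_a - loglik_b)
    return stat, chi2_sf(stat, df)


# Test statistic and degrees of freedom as reported in the text.
p_value = chi2_sf(2.266, 3)
```

Evaluating the reported statistic with three degrees of freedom reproduces a p-value of about 0.519. Twice the difference of the two rounded log likelihoods is 2.2, slightly below the reported 2.266, presumably because the log likelihoods are given to one decimal place.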
Vowel formants have been reported to be context-dependent
[9, 40]. However, this additional factor was not included in the
current study because of the relatively small number of inves-
tigated items. In order to minimize the contextual influence,
vowel formants were measured at the temporal midpoint of the
vowel. Assumed vowel targets with a small degree of spectral
change occur close to the midpoint of the vowel [9, 41], but cf.
[42].
Further development of this research will involve alterna-
tive measures of vowel distinctiveness as well as a revision of
the prosodic model. In particular, we are interested in investi-
gating the relationship between vowel space expansion and re-
alised prominence.
5. Summary
The results of this study showed that the relationship between
information density and vowel distinctiveness depended on the
language and vowel identity under investigation. Averaged over
all languages there was a positive relationship between the two
variables: Vowel distinctiveness increased with increasing sur-
prisal. However, this relationship could not be observed when
vowel distinctiveness was modeled as a function of prosodic
factors and information density. Here, only prosodic factors had
an influence on vowel space size. This was in contrast to previ-
ous research for US English [3]. In addition, this study added
support to the literature that there is indeed a relationship be-
tween speech rate variation and differences in vowel space size.
6. Acknowledgements
This research was funded by the German Research Foundation
(DFG) as part of SFB 1102 ’Information Density and Linguistic
Encoding’ at Saarland University.
7. References
[1] A. P. Simpson and C. Ericsdotter, “Sex-specific differences in f0
and vowel space,” in Proceedings of XVIth ICPhS, 2007, pp. 933–
936.
[2] A. R. Bradlow, N. Kraus, and E. Hayes, “Speaking clearly for
children with learning disabilities: Sentence perception in noise,”
Journal of Speech, Language, and Hearing Research, vol. 46, pp.
80–97, 2003.
[3] M. Aylett and A. Turk, “Language redundancy predicts syllabic
duration and the spectral characteristics of vocalic syllable nu-
clei,” Journal of the Acoustical Society of America, vol. 119, pp.
3048–3058, 2006.
[4] G. S. Turner, K. Tjaden, and G. Weismer, “The influence of speak-
ing rate on vowel space and speech intelligibility for individuals
with amyotrophic lateral sclerosis,” Journal of Speech and Hear-
ing Research, vol. 38, pp. 1001–1013, 1995.
[5] B. Weiss, “Rate dependent vowel reduction in German,” in Pro-
ceedings of the 12th SPECOM, Moscow, 2007.
[6] M. Weirich and A. P. Simpson, “Differences in acoustic vowel
space and the perception of speech tempo,” Journal of Phonetics,
vol. 43, pp. 1–10, 2014.
[7] R. J. J. H. van Son and L. C. W. Pols, “Formant frequencies of
Dutch vowels in a text, read at normal and fast rate,” Journal of
the Acoustical Society of America, vol. 88, pp. 1683–1693, 1990.
[8] M. Fourakis, “Tempo, stress, and vowel reduction in American
English,” Journal of the Acoustical Society of America, vol. 90,
no. 4, pp. 1816–1827, 1991.
[9] B. Lindblom, “Spectrographic study of vowel reduction,” Journal
of the Acoustical Society of America, vol. 35, no. 11, pp. 1773–
1781, November 1963.
[10] G. Fant, Acoustic theory of speech production. The Hague,
Netherlands: Mouton, 1970, vol. 2.
[11] L. C. W. Pols and R. J. J. H. van Son, “Acoustics and perception
of dynamic vowel segments,” Speech Communication, vol. 13, pp.
135–147, 1993.
[12] D. R. van Bergem, “Acoustic vowel reduction as a function of
sentence accent, word stress, and word class,” Speech Communi-
cation, vol. 12, pp. 1–23, 1993.
[13] M. Aylett and A. Turk, “The smooth signal redundancy hy-
pothesis: A functional explanation for relationships between re-
dundancy, prosodic prominence, and duration in spontaneous
speech,” Language and Speech, vol. 47, no. 1, pp. 31–56, 2004.
[14] A. Frank and T. F. Jaeger, “Speaking rationally: Uniform infor-
mation density as an optimal strategy for language production,” in
CogSci 2008. Washington, DC, USA: Cognitive Science Society,
23-26 July 2008, pp. 939–944.
[15] C. D. Manning and H. Schütze, Foundations of Statistical Natural
Language Processing. Cambridge, MA: MIT Press, 1999.
[16] V. Dellwo, I. Steiner, B. Aschenberner, J. Dankovicova, and
P. Wagner, “BonnTempo-corpus and BonnTempo-tools: a
database for the study of speech rhythm and rate,” in Interspeech
2004, 2004, pp. 777–780.
[17] B. Bigi. (2013) SPPAS - Automatic Annotation of Speech.
Banque de données parole et langage (SLDR/ORTOLANG).
[18] T. Kisler, F. Schiel, and H. Sloetjes, “Signal processing via web
services: the use case WebMAUS,” in Digital Humanities 2012,
Hamburg, Germany, 2012.
[19] B. New, C. Pallier, L. Ferrand, and R. Matos, “Une base
de données lexicales du français contemporain sur internet:
LEXIQUE 3.80,” L'Année Psychologique, vol. 101, pp. 447–462,
2001. [Online]. Available: http://www.lexique.org
[20] Max Planck Institute for Psycholinguistics. Webcelex. Retrieved
on March 18, 2013 and on August 6, 2014. [Online]. Available:
http://celex.mpi.nl
[21] Department of General Linguistics. (1996–1998) Finnish parole
corpus. University of Helsinki AND Institute for the Languages
of Finland. [Online]. Available:
http://kaino.kotus.fi/sanat/taajuuslista/parole.php
[22] J. Duddington. eSpeak text to speech. Retrieved on 1 February
2015. [Online]. Available: http://espeak.sourceforge.net/
[23] A. Zséder, G. Recski, D. Varga, and A. Kornai, “Rapid creation
of large-scale corpora and frequency dictionaries,” in LREC 2012,
2012.
[24] A. Zeldes. (2008–2014) Automatic phonetic transcription and
syllable analysis. Georgetown University. [Online]. Available:
http://corpling.uis.georgetown.edu/amir/phon.php
[25] P. Boersma and D. Weenink. (2015) Praat: doing phonetics
by computer [computer program]. version 5.4.22. [Online].
Available: http://www.praat.org/
[26] B. M. Lobanov, “Classification of Russian vowels spoken by dif-
ferent speakers,” Journal of the Acoustical Society of America,
vol. 49, pp. 606–608, 1971.
[27] N. Amir and O. Amir, “Novel measures for vowel reduction,” in
ICPhS XVI, Saarbrücken, 2007, pp. 849–852.
[28] S.-A. Jun and C. Fougeron, “A phonological model of
French intonation,” in Intonation, ser. Text, Speech and
Language Technology, A. Botinis, Ed. Springer Netherlands,
2000, vol. 15, pp. 209–242. [Online]. Available:
http://dx.doi.org/10.1007/978-94-011-4317-2_10
[29] C. Féry, “Final compression in French as a phrasal phenomenon,”
in Perspectives on linguistic structure and context: Studies in hon-
our of Knud Lambrecht, S. Katz Bourns and L. L. Myers, Eds.
Amsterdam: John Benjamins, 2014, pp. 133–156.
[30] Y. M. Oh, C. Coupe, E. Marsico, and F. Pellegrino, “Bridging
phonological system and lexicon: Insights from a corpus study
of functional load,” Journal of Phonetics, vol. 53, pp. 153–176,
2015.
[31] R Core Team, R: A Language and Environment for Statistical
Computing, Vienna, Austria, 2015. [Online]. Available:
http://www.R-project.org/
[32] A. Kuznetsova, P. Bruun Brockhoff, and R. Haubo Bojesen
Christensen, lmerTest: Tests in Linear Mixed Effects Models,
2014, R package version 2.0-20. [Online]. Available:
http://CRAN.R-project.org/package=lmerTest
[33] M. Kuhn, contributions from Steve Weston, J. Wing, J. Forester,
and T. Thaler, contrast: A collection of contrast methods,
2013, R package version 0.19. [Online]. Available:
http://CRAN.R-project.org/package=contrast
[34] A. Zeileis and T. Hothorn, “Diagnostic checking in regression re-
lationships,” R News, vol. 2, no. 3, pp. 7–10, 2002.
[35] P. M. Nowak, “Vowel reduction in Polish,” Ph.D. dissertation,
University of California, Berkeley, 2006.
[36] W. J. Barry and B. Andreeva, “Cross-language similarities and
differences in spontaneous speech patterns,” Journal of the Inter-
national Phonetic Association, vol. 31, no. 1, pp. 51–66, 2001.
[37] K. Suomi, J. H. Toivanen, and R. Ylitalo, Finnish sound structure.
Phonetics, phonology, phonotactics and prosody. Oulu University
Press, 2008.
[38] R. Bertram, A. Pollatsek, and J. Hyönä, “Morphological parsing
and the use of segmentation cues in reading Finnish compounds,”
Journal of Memory and Language, vol. 51, pp. 325–345, 2004.
[39] J. Liljencrants and B. Lindblom, “Numerical simulation of vowel
quality systems: The role of perceptual contrast,” Language,
vol. 48, no. 4, pp. 839–862, 1972.
[40] J. Hillenbrand and M. J. Clark, “Effects of consonant environment
on vowel formant patterns,” Journal of the Acoustical Society of
America, vol. 109, no. 2, pp. 748–763, 2001.
[41] K. N. Stevens and A. S. House, “Perturbation of vowel articulations
by consonantal context: An acoustical study,” Journal of
Speech, Language, and Hearing Research, vol. 6, pp. 111–128,
1963.
[42] J. Olive, J. van Santen, B. Möbius, and C. Shih, “Synthesis,” in
Multilingual Text-to-Speech Synthesis: The Bell Labs Approach,
R. Sproat, Ed. Dordrecht: Kluwer, 1998, ch. 7, pp. 191–228.
354
... Since German vowels in the stressed position and under high surprisal are known to be more dispersed in the vowel space (Malisz et al., 2018;Schulz et al., 2016), we would expect higher average F2 and lower average F1 values for front vowels in the stressed position and under high surprisal than for those in the unstressed position. Judging from the GAMM heatmaps for biphone surprisal of the preceding context, that is, the same surprisal measure as that used in our previous studies, we find the predicted pattern for front vowels. ...
... This means that prominent back vowels are produced with lower F2 values and prominent front vowels are produced with higher F2 values than their non-prominent counterparts. This effect is captured by expanded vowel dispersion for stressed vowels in German (Schulz et al., 2016) and could also be replicated in our study. The German vowel system, however, differentiates between tense and lax vowels, which can both stand in stressed or unstressed positions. ...
Article
Full-text available
Phonetic structures expand temporally and spectrally when they are difficult to predict from their context. To some extent, effects of predictability are modulated by prosodic structure. So far, studies on the impact of contextual predictability and prosody on phonetic structures have neglected the dynamic nature of the speech signal. This study investigates the impact of predictability and prominence on the dynamic structure of the first and second formants of German vowels. We expect to find differences in the formant movements between vowels standing in different predictability contexts and a modulation of this effect by prominence. First and second formant values are extracted from a large German corpus. Formant trajectories of peripheral vowels are modeled using generalized additive mixed models, which estimate nonlinear regressions between a dependent variable and predictors. Contextual predictability is measured as biphone and triphone surprisal based on a statistical German language model. We test for the effects of the information-theoretic measures surprisal and word frequency, as well as prominence, on formant movement, while controlling for vowel phonemes and duration. Primary lexical stress and vowel phonemes are significant predictors of first and second formant trajectory shape. We replicate previous findings that vowels are more dispersed in stressed syllables than in unstressed syllables. The interaction of stress and surprisal explains formant movement: unstressed vowels show more variability in their formant trajectory shape at different surprisal levels than stressed vowels. This work shows that effects of contextual predictability on fine phonetic detail can be observed not only in pointwise measures but also in dynamic features of phonetic segments.
... Vowels are more dispersed in high information content. [18,11] broadened this field by analyzing the effect and interaction of ID and prosodic structure on segmental variability in production studies from a cross-language perspective, including six languages (American English, German, French, Finnish, Czech, and Polish). In these studies, as in the present one, ID is defined as contextual predictability, or surprisal (Equation 1) ...
... This study investigated whether Bulgarian L2 speakers of German behave similarly to German native speakers in their vowel dispersion in different surprisal contexts, and whether their vowel productions depended on their proficiency level of German. German vowels were more dispersed when they were difficult to predict from their preceding context [18,11]. Advanced L2 speakers showed a tendency to modulate their vowel productions in the same way as German natives with regard to ID factors, whereas intermediate L2 speakers were not able to make these distinctions. ...
... Lower uncertainty has been shown to be associated with shorter words, syllables and segments (Aylett and Turk, 2004; Cohen Priva, 2015) and more centralized vowels (Wright, 2004; Aylett and Turk, 2006; Munson, 2007; Malisz et al., 2018; Brandt et al., 2019). This has been demonstrated by studies that operationalized uncertainty by means of word frequency (Wright, 1979, 2004; Fosler-Lussier and Morgan, 1999; Bybee, 2002), conditional probability (Jurafsky et al., 2001a,b; Aylett and Turk, 2004; Bell et al., 2009), or informativity (Cohen Priva, 2015; Schulz et al., 2016; Malisz et al., 2018; Brandt et al., 2019, 2021). Aylett and Turk's (2004, 2006) Smooth Signal Redundancy Hypothesis explains these reduction phenomena from an information theoretic perspective (Shannon, 1948), arguing that the amount of information in the speech signal is balanced against the amount of information conveyed at the syntagmatic level. ...
The uncertainty associated with paradigmatic families has been shown to correlate with their phonetic characteristics in speech, suggesting that representations of complex sublexical relations between words are part of speaker knowledge. To better understand this, recent studies have used two-layer neural network models to examine the way paradigmatic uncertainty emerges in learning. However, to date this work has largely ignored the way choices about the representation of inflectional and grammatical functions (IFS) in models strongly influence what they subsequently learn. To explore the consequences of this, we investigate how representations of IFS in the input-output structures of learning models affect the capacity of uncertainty estimates derived from them to account for phonetic variability in speech. Specifically, we examine whether IFS are best represented as outputs to neural networks (as in previous studies) or as inputs by building models that embody both choices and examining their capacity to account for uncertainty effects in the formant trajectories of word-final [ɐ], which in German discriminates around sixty different IFS. Overall, we find that formants are enhanced as the uncertainty associated with IFS decreases. This result dovetails with a growing number of studies of morphological and inflectional families that have shown that enhancement is associated with lower uncertainty in context. Importantly, we also find that in models where IFS serve as inputs, as our theoretical analysis suggests they ought to, the uncertainty measures provide better fits to the empirical variance observed in [ɐ] formants than in models where IFS serve as outputs. This supports our suggestion that IFS serve as cognitive cues during speech production, and should be treated as such in modeling. It is also consistent with the idea that when IFS serve as inputs to a learning network, this maintains the distinction between those parts of the network that represent message and those that represent signal. We conclude by describing how maintaining a “signal-message-uncertainty distinction” can allow us to reconcile a range of apparently contradictory findings about the relationship between articulation and uncertainty in context.
... According to the Smooth Signal Redundancy Hypothesis (Aylett and Turk 2004, 2006), words that are syntagmatically more predictable are less informative and more redundant, thus phonetically reduced (see also Cohen Priva 2015; Schulz et al. 2016; Hall et al. 2018; Brandt et al. 2019; Le Maguer et al. 2016; Priva and Jaeger 2018; Jaeger 2010; Malisz et al. 2018). Since high frequency words are also syntagmatically more predictable, it follows that high frequency words are more redundant than low frequency words. ...
Many theories of word structure in linguistics and morphological processing in cognitive psychology are grounded in a compositional perspective on the (mental) lexicon in which complex words are built up during speech production from sublexical elements such as morphemes, stems, and exponents. When combined with the hypothesis that storage in the lexicon is restricted to the irregular, the prediction follows that properties specific to regular inflected words cannot co-determine the phonetic realization of these inflected words. This study shows that the stem vowels of regular English inflected verb forms that are more frequent in their paradigm are produced with more enhanced articulatory gestures in the midsagittal plane, challenging compositional models of lexical processing. The effect of paradigmatic probability dovetails well with the Paradigmatic Enhancement Hypothesis and is consistent with a growing body of research indicating that the whole is more than its parts.
... Language use varies according to a number of factors, from pragmatic over cognitive to social. In on-line processing, it has been shown that specific forms of variation directly serve rational communicative goals by offering ways to modulate information density in language production, and there is ample evidence that particular linguistic choices are associated with specific levels of surprisal in language comprehension (Jaeger and Levy, 2007; Levy, 2008; Schulz et al., 2016; Delogu et al., 2017; Sikos et al., 2017). It is much less clear, however, what the communicative effects might be of particular linguistic choices recurring across interactants and interaction instances. ...
We present empirical evidence of the communicative utility of conventionalization, i.e., convergence in linguistic usage over time, and diversification, i.e., linguistic items acquiring different, more specific usages/meanings. From a diachronic perspective, conventionalization plays a crucial role in language change as a condition for innovation and grammaticalization (Bybee, 2010; Schmid, 2015) and diversification is a cornerstone in the formation of sublanguages/registers, i.e., functional linguistic varieties (Halliday, 1988; Harris, 1991). While it is widely acknowledged that change in language use is primarily socio-culturally determined pushing towards greater linguistic expressivity, we here highlight the limiting function of communicative factors on diachronic linguistic variation showing that conventionalization and diversification are associated with a reduction of linguistic variability. To be able to observe effects of linguistic variability reduction, we first need a well-defined notion of choice in context. Linguistically, this implies the paradigmatic axis of linguistic organization, i.e., the sets of linguistic options available in a given or similar syntagmatic contexts. Here, we draw on word embeddings, weakly neural distributional language models that have recently been employed to model lexical-semantic change and allow us to approximate the notion of paradigm by neighbourhood in vector space. Second, we need to capture changes in paradigmatic variability, i.e., reduction/expansion of linguistic options in a given context. As a formal index of paradigmatic variability we use entropy, which measures the contribution of linguistic units (e.g., words) in predicting linguistic choice in bits of information.
Using entropy provides us with a link to a communicative interpretation, as it is a well-established measure of communicative efficiency with implications for cognitive processing (Linzen and Jaeger, 2016; Venhuizen et al., 2019); also, entropy is negatively correlated with distance in (word embedding) spaces which in turn shows cognitive reflexes in certain language processing tasks (Mitchel et al., 2008; Auguste et al., 2017). In terms of domain we focus on science, looking at the diachronic development of scientific English from the 17th century to modern time. This provides us with a fairly constrained yet dynamic domain of discourse that has witnessed a powerful systematization throughout the centuries and developed specific linguistic conventions geared towards efficient communication. Overall, our study confirms the assumed trends of conventionalization and diversification shown by diachronically decreasing entropy, interspersed with local, temporary entropy highs pointing to phases of linguistic expansion pertaining primarily to introduction of new technical terminology.
... Vowels are also strengthened in their spectral features when they are difficult to predict from their context compared to easily predictable vowels [4]. Closely related languages, such as German [11] and Dutch [10], also seem to show the same positive relationship between vowel dispersion and predictability. These examples, among many others, illustrate that speakers' choices and listeners' preferences are affected by the occurrence probability and frequency of how such units are realized in a variety of contexts. ...
... We have evidence of this link at all linguistic levels. For example, at the phonetic level we find across languages that word informativity influences acoustic duration (Pellegrino et al. 2011) and vowel space size (Schulz et al. 2016); or at the syntactic level, we encounter omission of syntactic elements (e.g. complementizers or relativizers; Sikos et al. 2017) or condensation (e.g. ...
We present a model of the linguistic development of scientific English from the mid-seventeenth to the late-nineteenth century, a period that witnessed significant political and social changes, including the evolution of modern science. There is a wealth of descriptive accounts of scientific English, both from a synchronic and a diachronic perspective, but only few attempts at a unified explanation of its evolution. The explanation we offer here is a communicative one: while external pressures (specialization, diversification) push for an increase in expressivity, communicative concerns pull toward convergence on particular options (conventionalization). What emerges over time is a code which is optimized for written, specialist communication, relying on specific linguistic means to modulate information content. As we show, this is achieved by the systematic interplay between lexis and grammar. The corpora we employ are the Royal Society Corpus (RSC) and for comparative purposes, the Corpus of Late Modern English (CLMET). We build various diachronic, computational n-gram language models of these corpora and then apply formal measures of information content (here: relative entropy and surprisal) to detect the linguistic features significantly contributing to diachronic change, estimate the (changing) level of information of features and capture the time course of change.
Glossolalia can be regarded as an instance of speech production in which practitioners produce syllables in seemingly random sequences. However, a closer inspection of glossolalia's statistical properties reveals that sequences show a Zipfian pattern similar to natural languages, with some syllables being more probable than others. It is well established that statistical properties of sequences are implicitly learned, and that these statistical properties correlate with changes in kinematic and speech behavior. For speech, this means that more predictable items are phonetically shorter. Accordingly, we hypothesized for glossolalia that if practitioners have learned a serial pattern in glossolalia in the same manner as in natural languages, its statistical properties should correlate with its phonetic characteristics. Our hypothesis was supported. We find significantly shorter syllables associated with higher syllable probabilities in glossolalia. We discuss this finding in relation to theories about the sources of probability-related changes in the speech signal.
By applying data-driven methods based on information theory, this study adds to previous work on the development of the scientific register by measuring the informativity of alternative phrasal structures shown to be involved in change in language use in 20th-century Scientific English. The analysis based on data-driven periodization shows compounds to be distinctive grammatical structures from the 1920s onwards in Proceedings A of the Royal Society of London. Compounds not only increase in frequency, but also show higher informativity than their less dense prepositional counterparts. Results also show that the lower the informativity of particular items, the more alternative, more informationally dense options might be favoured (e.g., of-phrases vs. compounds) - striving for communicative efficiency thus being one force shaping the scientific register.
This paper presents acoustic and articulatory (ultrasound) data on vowel reduction in Polish. The analysis focuses on the question of whether the change in formant value in unstressed vowels can be explained by duration-driven undershoot alone or whether there is also evidence for additional stress-specific articulatory mechanisms that systematically affect vowel formants. On top of the expected durational differences between the stressed and unstressed conditions, the duration is manipulated by inducing changes in the speech rate. The observed vowel formants are compared to expected formants derived from the articulatory midsagittal tongue data in different conditions. The results show that the acoustic vowel space is reduced in size and raised in unstressed vowels compared to stressed vowels. Most of the spectral reduction can be explained by reduced vowel duration, but there is also an additional systematic effect of F1-lowering in unstressed non-high vowels that does not follow from tongue movement. The proposed interpretation is that spectral vowel reduction in Polish behaves largely as predicted by the undershoot model of vowel reduction, but the effect of undershoot is enhanced for low unstressed vowels, potentially by a stress marking strategy which involves raising the fundamental frequency.
One of the frequent questions by users of the mixed model function lmer of the lme4 package has been: How can I get p values for the F and t tests for objects returned by lmer? The lmerTest package extends the 'lmerMod' class of the lme4 package by overloading the anova and summary functions to provide p values for tests for fixed effects. We have implemented Satterthwaite's method for approximating degrees of freedom for the t and F tests. We have also implemented the construction of Type I - III ANOVA tables. Furthermore, one may also obtain the summary as well as the anova table using the Kenward-Roger approximation for denominator degrees of freedom (based on the KRmodcomp function from the pbkrtest package). Some other convenient mixed model analysis tools are provided by the package as well, such as a step method that performs backward elimination of nonsignificant effects (both random and fixed), calculation of population means, and multiple comparison tests together with plot facilities.
In this paper, we propose a functional and cross-language perspective on the organization of phonological systems based on the notion of functional load (FL). Using large corpora, we quantitatively characterize the relationships between phonological components (segments, stress and tones) by estimating their role at the lexical level. In a first analysis, we examine the relative contribution of each phonological subsystem to the pool of lexical distinctions and compare the results between two tonal (Cantonese and Mandarin) and seven non-tonal languages (English, French, German, Italian, Japanese, Korean, and Swahili). The equal weight of vowels and tones in lexical distinction is confirmed as well as the phenomenon of consonantal bias – advocated in several psycholinguistic studies – in five languages (English, French, German, Italian, and Swahili), with various corpus configurations in order to assess the influence of morphology and usage frequency. Our results reflect a strong preference toward consonant-based distinctions rather than vowel-based distinctions in a reduced (lemmatized) configuration of the lexicon. This preference is nevertheless modulated when inflectional morphology and usage frequency were considered. A second analysis consists in a cross-language comparison of the internal FL distribution within vocalic and consonantal subsystems in nine languages. We observe uneven FL distributions with only a few salient high-FL contrasts. Shared trends in terms of the mostly employed phonological features are also revealed but a few language-specific patterns are also present. These results are discussed in terms of organization and processing of the mental lexicon.
Novel measures for vowel reduction are presented here, for examining vowel space as a whole, and for quantifying reduction of individual vowels. These measures were used to evaluate the degree of vowel reduction in continuous speech, as manifested in the F1-F2 plane. The new measures were applied to a set of 1500 tokens, extracted from a database of spontaneous Hebrew speech (30 tokens of each vowel, recorded from five men and five women). Using a similarity measure, we found that vowels were reduced by a factor of 2.09 for men and by 2.93 for women. The reduced vowel space for men was more distorted than for women. Error measure estimations were larger for men in comparison to women (0.0714 versus 0.0525, respectively). While vowel reduction in women exhibited a relatively symmetric pattern across vowels, it showed a skewed pattern in men. This was attributed to a more pronounced reduction in the back vowels /o/ and /u/.
Despite various studies describing longer segment durations and slower speaking rates in females than males, there appears to be a stereotype of women speaking faster than men. To investigate the mismatch between empirical evidence and this widespread stereotype, listening experiments were conducted to test whether a relationship between perceived tempo and acoustic vowel space size might exist. If a speaker traverses a larger acoustic vowel space than another speaker within the same time, then this speaker might be perceived as speaking faster. To test this, two listening experiments with either exclusively female or male speakers but with varying vowel space sizes were conducted. Listeners were asked to rate the perceived speech tempo of same-sex speaker pairs. The stimuli were manipulated to have the same segment durations and f0 contour. Results indicate that a positive correlation between acoustic vowel space size and perceived speech tempo exists. Since females exhibit on average a larger acoustic vowel space than males, it is suggested that the stereotype of faster speaking women might arise from this.
A significant body of evidence has accumulated indicating that vowel identification is influenced by spectral change patterns. For example, a large scale study of vowel formant patterns showed substantial improvements in category separability when a pattern classifier was trained on multiple samples of the formant pattern rather than a single sample at steady state [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099–3111 (1995)]. However, in the earlier study all utterances were recorded in a constant /hVd/ environment. The purpose of the present study was to determine whether a close relationship between vowel identity and spectral change patterns is maintained when the consonant environment is allowed to vary. Recordings were made of six men and six women producing the vowels /i, I,ε,æ,■,■,■,u/ in isolation and in CVC syllables. The CVC utterances consisted of all combinations of seven initial consonants (/h,b,d,g,p,t,k/) and six final consonants (/b,d,g,p,t,k/). Formant frequencies for F1–F3 were measured every 5 ms during the vowel nucleus using an interactive editing tool. Results showed highly significant effects of phonetic environment. As with an earlier study of this type, particularly large shifts in formant patterns were seen for rounded vowels in alveolar environments [K. Stevens and A. House, J. Speech Hear. Res. 6, 111–128 (1963)]. Despite these context effects, large improvements in category separability were observed when a pattern classifier incorporated spectral change information. [Work supported by NIH.]