
Proceedings of the Winter School Speech Production and Perception: Learning and memory


Abstract

Here are the proceedings of the winter school.
Proceedings of the 5th International Winter School
Speech Production and Perception: Learning and Memory

Chorin, Germany, 9th to 13th January 2017

This work was supported by a grant from the German-French University Saarbrücken (UFA)
and the Leibniz Institute General Linguistics (ZAS) Berlin.
Program
Monday 9th
Introduction – Canonical, exemplar representations
Sensorimotor learning in the lab
13:00-13:45
Registration
13:45-14:00
Introduction, basic information of the day
14:00-14:30
Susanne Fuchs
Motivation of the school, context and overview of the week.
14:30-15:30 Francesco Cangemi
The role of canonical forms in memory (and in the history of linguistics)
15:30-15:50
Lisa Morano
Looking for exemplar effects: testing the comprehension and memory
representations of reduced words in Dutch learners of French.
15:50-16:10
Coffee break
16:10-17:10
Amélie Rochet-Capellan
Investigating sensorimotor learning and its transfer in laboratory as a way to
assess the nature of representations in speech production.
17:10-17:30
Eugen Klein
Degree of flexibility of the acoustics-to-articulation mapping in vowels and
consonants.
17:30-17:50
Mareike Flögel
Adaptation to dichotically presented spectral and temporal real-time
perturbations of auditory-feedback.
17:50-18:20
General discussion
18:20-19:30
Break
19:30
Dinner
Tuesday 10th
Speech motor learning and rehabilitation techniques
9:00-9:15
Introduction, basic information of the day
9:15-10:15 Edwin Maas
Principles of motor learning: What are they and do they work for speech motor
learning?
10:15-10:35
Anne Hermes
Aging in speech motor control.
10:35-11:05
Cornelia Heyde
CV transitions in the fluent speech of people who stutter – a kinematic
approach using ultrasound.
11:05-11:20
Coffee break
11:20-12:20
Joanne Cleland
Using visual biofeedback to teach children with speech sound
disorders new articulations.
12:20-14:00
Lunch
14:00-15:00 Peter Birkholz
Visual biofeedback of speech movements by electro-optical stomatography.
15:00-15:20
Elina Rubertus
Anticipatory V-to-V coarticulation in German children and adults
– An ultrasound study.
15:20-15:40
Tom Starr-Marshall
Transcription agreement of children with speech disorders using standard
speed and slow motion video.
15:40-16:00
Coffee break
16:00-17:00 Ian Howard
Models of speech motor learning, the role of reinforcement.
17:00-17:30
General discussion
17:30-18:50
Short presentation of the posters that will be presented on Wednesday
(3 min/poster)
18:50-19:30
Break
19:30-21:00
Dinner
21:00
Demo evening: bring your experimental set-up!
Wednesday 11th
Touristic event – Speech development - Posters
9:00-12:30
Free time, touristic and cultural activities
12:30-14:00
Lunch
14:00-15:00
Aude Noiray
Acquisition of spoken language proficiency in relation to
phonological and speech motor control developments.
15:00-17:00
Posters and
coffee break
Mélaine Cherdieu et al.
Can arm-hand gestures help to learn arm-hand anatomical
structures?
Elisabet Eir Cortes
Motorics of speech movements in conditions of varying vocal effort.
Daniel Duran
Towards an exemplar-theoretic model of phonetic convergence.
Lei He & Volker Dellwo
Speaker-specific variability in intensity dynamics.
Friederike Charlotte Hechler
The role of feedback in production and perception.
Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen
Auditory-visual perception of VCVs produced by people with Down
Syndrome.
Helena Levy, Lars Konieczny, Adriana Hanulíková
Long-term effects of accent exposure on perception and
production.
Monika Lindauer
Factors influencing early acquisition of German by bilinguals.
Louise McKeever
Speech motor control in autism.
Fereshteh Modaressi
Lexical Access: The interaction of phonological and semantic features.
Alexandra Kati Müller
German dialects.
Gediminas Schüppenhauer and Katarzyna Stoltmann
Short-term memory processes of German Sign Language speakers.
Tabea Thies, Anne Hermes, Doris Mücke
Atypical speech production of Essential Tremor Patients.
Anastasiia Tsukanova
Articulatory speech synthesis.
Eline van Knijff
Listening abilities and school success for children with a Cochlear
Implant in mainstream primary education.
Pauline Veenstra
Efficacy of ASR software for congenitally blind speakers.
Anna Womack
An MRI investigation of gestural interdependence of the tongue and
larynx during speech: My study in relation to learning and memory.
Sophia Wulfert
Consonant clusters as units of processing: the roles of frequency and
sonority (abstract of work in progress).
Johann Philipp Zöllner
Assessing transient disruptions in human functional speech networks
following tumor surgery of the dominant hemisphere.
17:00-18:00
Marion Dohen
Manual gestures in communication, language and speech development.
18:00-18:30
General discussion
18:30-19:30
Break
19:30
Dinner
21:00
Share your experiences and ideas as well as problems and questions
regarding communication and dissemination of research and results to a
broader public.
Thursday 12th
Neurophysiological basis of memory - Models of speech perception - Statistics
9:00-9:15
Introduction, basic information of the day
9:15-10:15
Simon Hanslmayr
Searching for memory in brain waves – The
synchronization/desynchronization Conundrum.
10:15-10:35
Orsolya Beatrix Kolozsvari
Audio-visual perception of familiar and unfamiliar syllables: a MEG study.
10:35-10:55
Coffee break
10:55-11:55 Leonardo Lancia
Balancing stability and flexibility in speech perception: paradigms, data and
models.
11:55-12:30
General discussion
12:30-14:00
Lunch
14:00-16:00
Martijn Wieling
Using generalized additive modeling for analyzing articulography data (and
other time series data).
16:00-16:20
Coffee break
16:20-17:50
Martijn Wieling
Statistics, part 2
17:50-19:30
Break
19:30
Dinner
Friday 13th
Listener-speaker, Multilingualism
9:00-10:00 Malte Viebahn, Sophie Dufour, Audrey Bürki
Imitation of sublexical speech segments.
10:00-10:20
Margarethe McDonald
Effect of cross-linguistic overlap and bilingual’s language dominance on
lexical co-activation.
10:20-10:40
Pamela Fuhrmeister
The role of native language interference in perceptual learning of non-
native
speech sounds.
10:40-11:00
Maria Dokovova
The effect of language mode on stop voice contrast cue weighting for
Bulgarian-English bilinguals.
11:00-11:30
Closing session
12:00
Lunch
Organizing and scientific committee
Joanne Cleland
Lecturer at School of Psychological Sciences and Health, University of Strathclyde, Scotland, UK
Susanne Fuchs
Senior researcher at the Centre for General Linguistics (ZAS) Berlin, Germany
Amélie Rochet-Capellan
CNRS researcher at Gipsa-lab, CNRS, Grenoble Alpes University, Grenoble, France
Cornelia Heyde
PhD student and research assistant, Clinical Audiology, Speech and Language (CASL)
Research Centre, Queen Margaret University, Edinburgh, UK
Olivia Maky
Bachelor student at the HU Berlin studying Germanic Linguistics, Student assistant at
ZAS, Berlin, Germany
Monday, 9th of January 2017
Francesco Cangemi
The role of canonical forms in memory (and in the history of linguistics)
University of Cologne, Germany
Until a few decades ago, the consensus view on how words are learnt, memorised and
accessed revolved around linear, underspecified and abstract representations (or
“canonical forms”), such as /nəʊ/ for (British) English ‘know’. In the second half of the last
century, this view has been vigorously shaken by developments in a variety of seemingly
disconnected fields, such as philosophy (non-Aristotelian approaches to categoriality;
Wittgenstein 1953), phonetics (development of spectrographic techniques; Koenig et al.
1946), psychology (theoretical, experimental and simulation-based work on episodic
memory; Medin & Schaffer 1978, Hintzman 1986), informatics (tools for storage and
annotation of large datasets of spontaneous speech) and linguistic theory (modularity vs.
interfacing vs. integration of linguistic sub-components; Ohala 1990, Durand & Laks 2002).
As a result, much experimental and theoretical work has been devoted to demonstrating the
role of knowledge which is not easily captured by canonical forms, for example with the
exploration of phonetic detail and reduction. This body of research has successfully
shown the importance of non-canonical representations in a wide variety of domains,
from auditory recognition (Johnson 2004) to second language learning (Ernestus & Warner
2011). Thus, rather than further elaborating on the merits of non-canonical representations,
with the present submission I take the complementary path, and question the reasons
behind the long-lasting (and perhaps excessive) success of canonical forms in the first
place (Cangemi & Niebuhr, in press).
I will first argue that the primacy of canonical forms in the early stages of modern
linguistics can be explained on methodological grounds. Given the questions of interest in
the scientific community, symbolic, linear and minimalistic representations were the main
(if not only) tool available. This applies not only to written words in the early phase of
diachronic studies in comparative philology (on account of the available data sources;
Jones 1786, Bopp 1816, Schleicher 1861), but also to spoken words in the later phase of
synchronic studies of living languages (on account of the early focus on lexical meaning in
semiotics; Saussure 1916).
Then I will suggest that, while canonical forms enjoyed a lasting success in the research
and teaching practices of linguists, the very building blocks behind this notion were already
put under scrutiny in the first half of the 20th century. I will briefly review three threads of
work in this spirit, focussing respectively on non-lexical meaning (and thus beyond the
semiotic mode; Spitzer 1921, Jakubinskij 1923, Benveniste 1974), on non-linear
phonological representations (e.g. with the notion of prosodies; Firth 1948), and on the
impact of alphabetic writing on the development of phonemic awareness (rather than
innate phonemic awareness resulting in the creation of alphabets; Gelb 1952). I will
highlight how these relatively underrepresented aspects prepare the terrain for the
paradigm shift towards exemplar- and usage-based models. As such, the submission
offers a critical reflection on the reasons behind the success of canonical forms in
phonological representations and in models of memory for linguistic knowledge.
Lisa Morano
Looking for exemplar effects: testing the comprehension and memory
representations of reduced words in Dutch learners of French
Radboud University
L1 identity priming experiments (e.g. Tulving and Schacter, 1990) have repeatedly shown
that listeners recognize words faster when they occur for the second time (as "targets")
than when they occur for the first time (as "primes"), especially if the two tokens share
fine acoustic characteristics such as the speaker's voice (i.e. both the prime and the
target are uttered by the same person; McLennan and Luce, 2005). The same findings
have also been reported with allophonic variation: e.g. better was recognized faster when
its intervocalic consonant was consistently produced either as a /t/ or as a flap for both
prime and target (McLennan, Luce, and Charles-Luce, 2003). These specificity (or
exemplar) effects suggest that participants stored the occurrences they heard with at
least a certain degree of acoustic detail, in the form of exemplars.
While most researchers now assume that the mental lexicon is hybrid in nature
(Goldinger, 2007), containing both abstract representations and exemplars, exemplar
research has almost never been applied to second language (L2) learners. We
could find only one published study reporting exemplar effects for L2 learners
manipulating indexical variation (i.e. speaker's voice; Trofimovich, 2005). We wanted to
replicate this finding by manipulating acoustical (or allophonic) variation instead of
indexical (or speaker) variation and by using a lexical decision task instead of an
immediate repetition task. Indeed, it is likely that L2 learners use the same mechanisms
to deal with speaker variation (for example distinguishing between a male and a female
voice) in their L1 and their L2, whereas when dealing with acoustical variation, their L1
phonological filter is likely to interfere. Furthermore, a lexical decision task is a more
faithful measure of the comprehension process than an immediate repetition task which
also involves production.
One way to look at exemplar effects without manipulating speaker voice is to use
pronunciation variants resulting from reduction (as has been done by McLennan et al.,
2003). Reduction is the weakening or deleting of sounds (such as features, phonemes or
even whole syllables) compared to the canonical pronunciation. For example, in
American English, the word better is often pronounced as [bɛɾɚ] and yesterday can be
pronounced as [jɛʃej].
In our study, we investigated the reduction phenomenon of high vowel devoicing. In
casual French, in a word like la cité, the /i/ can be devoiced as the voicing fails to be re-
established in time after the /s/. Our research question was thus: Can L2 learners show
exemplar effects in a lexical decision task using devoicing?
Our target words were 24 bisyllabic words containing a high vowel (/i/, /y/, or /u/)
following a devoiced consonant in the first syllable, and they were always repeated either
as a variant match (i.e. both prime and target were either voiced or devoiced) or as a
variant mismatch (i.e. when the prime was voiced, the target was devoiced, and vice
versa). The prime and target were separated by ten to 100 trials. The remainder of the
264 trials included 72 fillers (twelve of which were repeated) and 96 pseudowords (36 of
which were repeated). For each experiment we tested 40 Dutch university students who
had studied French for four to seven years in high school.
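For concreteness, the sketch below shows one way such a trial list could be assembled. The item lists and function names are placeholders; only the counts and constraints come from the description above (24 targets appearing twice, a 10-100 trial prime-target gap, and 48 target trials plus 84 filler and 132 pseudoword tokens, counting repetitions, giving the 264 trials).

```python
import random

def build_trial_list(targets, distractors, n_trials=264,
                     min_gap=10, max_gap=100, seed=1):
    """Minimal sketch of the lexical-decision list described above.

    Each target word appears twice (prime, then target) 10-100 trials
    apart; half of the pairs match in voicing, half mismatch. All item
    names are placeholders, not the actual stimuli.
    """
    rng = random.Random(seed)
    slots = [None] * n_trials
    free = set(range(n_trials))

    for i, word in enumerate(targets):
        prime_devoiced = rng.random() < 0.5
        match = i % 2 == 0                      # half match, half mismatch
        target_devoiced = prime_devoiced if match else not prime_devoiced
        while True:                             # draw a legal prime/target pair
            p = rng.randrange(n_trials - min_gap)
            t = p + rng.randint(min_gap, min(max_gap, n_trials - 1 - p))
            if p in free and t in free:
                break
        slots[p] = (word, prime_devoiced, "prime")
        slots[t] = (word, target_devoiced, "target")
        free -= {p, t}

    # Remaining slots are fillers and pseudowords ("no" responses).
    rest = list(distractors)
    rng.shuffle(rest)
    for s, item in zip(sorted(free), rest):
        slots[s] = (item, None, "distractor")
    return slots
```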
In Experiment 1, we used different recordings (or tokens) for the primes and the targets,
meaning that even in case of a match, the prime (token A) and the target (token B) were
different recordings. In this way, we wanted to replicate the more ecologically valid testing
conditions of Hanique, Aalders, and Ernestus (2013). If exemplars are actually used for
speech comprehension, this condition is more faithful to real-life situations in which we
rarely hear the exact same token twice. Our results (cf. Figure 1, first graph) show that the
voiced tokens were answered to significantly faster when they were primed by the
devoiced tokens than the voiced ones, and conversely (as we found no significant
interaction between the voicing and matching of the primes and targets). Given that
besides Hanique et al. (2013) all the previous literature on exemplars uses the same
token for prime and target, we wanted to test whether exemplar effects could be observed
for L2 learners when using the same token for prime and target in the match condition.
Experiment 2a was identical to Experiment 1 except that token B was used for prime and
target (while it was used as target only in Experiment 1). Our results show no mismatch
effects nor any exemplar effects (cf. Figure 1, third graph). Given that the participants
were significantly less accurate and slower than in Experiment 1, we ran the experiment
again with 40 new participants (Experiment 2b). Again, there was no significant difference
in reaction times between the match and mismatch conditions (cf. Figure 1, fourth graph).
In Experiment 3, we used for prime and target the same tokens that were used in
Experiment 1 as primes (token A). We obtained significant exemplar effects (cf. Figure 1,
second graph): participants were faster at responding to a devoiced target when the
prime was also devoiced and vice versa. We also found a significant interaction between
Experiments 1 and 3 and the matching of prime and target.
We are currently investigating what type of difference in the tokens (A and B) could have
caused such different results. In any case, our mixed results go in the direction of Hanique
et al.'s conclusion that the role of exemplars in speech comprehension is probably much
more limited than it is currently assumed. Hanique et al. could only find exemplar effects
in very restrictive conditions (only one in four of their experiments with Dutch natives). In
our study, we could only find exemplar effects when the same recording was used for
prime and target, and only for token A, although it is unclear why token B did not trigger
exemplar effects. Finally, we are working on how to interpret what these findings mean for
words' representation in the mental lexicon.

Figure 1: Reaction times (from word onset in milliseconds) to the correctly answered target
items (provided the prime was also answered to correctly), for all experiments, divided by
the match (devoiced prime-devoiced target / voiced prime-voiced target) and mismatch
(devoiced prime-voiced target / voiced prime-devoiced target) conditions.

References
Goldinger, S. (2007). A complementary systems approach to abstract and episodic speech
perception. Proc. of ICPhS, Saarbrücken, pp. 49-54.
Hanique, I., E. Aalders, & M. Ernestus (2013). How robust are exemplar effects in word
comprehension? The Mental Lexicon 8, 269-294.
McLennan, C.T., & P.A. Luce (2005). Examining the time course of indexical specificity effects in
spoken word recognition. Journal of Experimental Psychology: Learning, Memory, and
Cognition 31, 306-321.
McLennan, C.T., P.A. Luce, & J. Charles-Luce (2003). Representation of lexical form. Journal of
Experimental Psychology: Learning, Memory, and Cognition 29, 539-553.
Trofimovich, P. (2005). Spoken-word processing in native and second languages: An investigation
of auditory word priming. Applied Psycholinguistics 26, 479-504.
Tulving, E., & D.L. Schacter (1990). Priming and human memory systems. Science 247(4940),
301-306.
Amélie Rochet-Capellan
Sensorimotor learning in laboratory and transfer of learning as a way to
assess the nature of representations in speech production
Univ. Grenoble Alpes, CNRS, GIPSA-lab Grenoble, France
Since the end of the 19th century, scientists have developed paradigms to investigate
sensorimotor recalibrations in the laboratory. This was first
done by psychologists using prismatic vision in the case of visuo-motor coupling. It was
later applied to the production of vowel sounds using the “auditory prism”, consisting
of real-time alteration of auditory feedback during vowel production (Houde and Jordan,
1998). Using this paradigm, numerous studies have shown that people compensate for
and adapt to the perturbation: they learn new auditory-motor associations.
Sensorimotor learning paradigms are also appropriate experimental situations to question
principles of motor learning and, in particular, the links between sensory and motor
representations and their plasticity. Beyond these questions, these paradigms also allow
for testing generalization or transfer effects: once a speaker has learnt to change her
motor commands for the vowel /ε/, it is possible, for example, to evaluate if this learning
also applies to other vowels or to the same vowel in other contexts. In motor learning and
motor control literature, transfer of learning is considered a behavioral window onto
the way mental representations or mental processes supporting actions are built.
In this talk, I will first introduce methods that enable the investigation of sensorimotor
learning in laboratory and the questions they make it possible to address. I will focus more
deeply on auditory-motor learning in vowel production. I will then present the empirical
research we conducted to question the specificity of auditory-motor learning and then,
the nature of phonological representations underlying speech production.
Eugen Klein
Degree of flexibility of the acoustics to articulation mapping in vowels
and consonants
Humboldt University of Berlin
A recent experiment combining auditory and articulatory perturbation has shown that
participants
did not compensate for articulatory perturbations as long as the produced
auditory signal matched
their desired output (Feng, Gracco, and Max, 2011). To test for
this, participants’ jaw was pulled
up by a robotic arm, which would have resulted in a
deviant first formant (F1), but at the same
time F1 frequency was auditorily perturbed
upwards such that there was no impact of the applied
kinetic force on the acoustic
signal. On the other hand, in the condition without applied auditory
perturbation, which
would correct for the articulatory perturbation, participants articulatorily
compensated for
the pulling of the robotic arm. These findings suggest that participants monitor
their
speech production in terms of acoustic targets as they seem to accept a modified
articulatory
configuration as long as the acoustic output conforms to their production
intention. Furthermore,
this result implies high flexibility of the acoustics-to-articulation
(A-to-A) mapping and the
possibility that participants are able to produce intended
speech sounds with different articulatory
configurations without the necessity to relearn
these relations. However, several questions
remain. Which degree of complexity in
the A-to-A mapping is feasible for a speaker? How
flexible is the (re-)mapping
mechanism and how reliable is it in producing the desired speech
output? Which role
do articulatory constraints play in this mechanism?
Speech of seven native Russian speakers was acoustically recorded while they were
producing CV syllables which contained the central unrounded vowel /ɨ/ (Klein,
Brunner, and
Hoole, 2016). The second formant (F2) of the vowel /ɨ/ was perturbed
in real-time during the
production and fed back to the participant via insert
headphones. From previous auditory
perturbation studies it is known that participants
start to produce formant values opposing the
direction of the applied perturbation
(e.g., MacDonald, Goldberg, and Munhall, 2010). Based on
this general finding, the F2
was shifted upwards on one half of the trials and downwards on the
other half in order
to encourage participants to produce the vowel /ɨ/ with two different
articulatory
configurations. The trials were presented in random order. The direction of the
perturbation was dependent on the consonant preceding /ɨ/.
Each experimental session contained three phases which differed in the amount of
the
applied perturbation from 220 to 520 Hz. First analyses have shown that
participants produced
significantly different F2 values for the different tokens of /ɨ/
depending on the direction and the
amount of applied perturbation (cf. Figure 1). This
finding suggests that participants used
different articulatory configurations in order to
produce the target vowel /ɨ/.
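As a minimal sketch of this manipulation (not the authors' implementation), the shift schedule can be written as a function of syllable and phase. Only the 220-520 Hz range, the three phases, and the consonant-dependent direction come from the text; the intermediate amount and the exact consonant-to-direction mapping are assumptions.

```python
def f2_shift_hz(syllable, phase):
    """Sketch of the consonant-dependent F2 perturbation described above.

    The direction of the shift depends on the consonant preceding /ɨ/;
    the amount grows over the three phases from 220 to 520 Hz (the
    middle value is an assumed interpolation).
    """
    amount = {1: 220.0, 2: 370.0, 3: 520.0}[phase]    # Hz per phase
    direction = {"d": -1.0, "g": +1.0}[syllable[0]]   # assumed: /dɨ/ down, /gɨ/ up
    return direction * amount

# Example: in phase 3, /gɨ/ trials are fed back with F2 + 520 Hz and
# /dɨ/ trials with F2 - 520 Hz, pushing the two variants apart.
```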
As seen in Figure 1, the F2 values for the two tested syllables /dɨ/ and /gɨ/ kept drifting
apart across the three phases as the amount of auditory perturbation increased. This
suggests that the participants strongly relied on the acoustic information to produce the
intended speech sound. We further can infer from the data that the A-to-A mapping is
highly flexible: the compensation occurs rather immediately, judging from the duration of
each perturbation phase (50 trials). In Figure 1, we also observe that the amount of
compensation was bigger for /gɨ/ compared to /dɨ/ for all participants. This result
demonstrates that the preceding consonants with their different constriction sites acted as
articulatory constraints upon the auditorily guided remapping mechanism.
In order to assess the articulatory stability of the consonants, the spectral moments of the
burst-onsets for the syllables /dɨ/ and /gɨ/ are currently being analyzed. Also, data
collection of additional speakers is currently being carried out to assess the influence of
the preceding consonant on the amount of compensation in the vowel /ɨ/ by interchanging
the perturbation directions in the syllables /dɨ/ and /gɨ/. This data will be presented during
the winter school and provide insights into the A-to-A mapping of consonants.

Figure 1: F2 values produced during the baseline and perturbation phases split by response.
Data is pooled across seven participants. Red dots represent sample means. For /dɨ/, F2 is
perturbed downwards, whereas F2 is perturbed upwards for /gɨ/.

References:
Feng, Y., Gracco, V. L., & Max, L. (2011). Integration of auditory and somatosensory error signals
in the neural control of speech movements. J. Neurophysiol., 106, 667–679.
Klein, E., Brunner, J., & Hoole, P. (2016). Relation between articulatory and acoustic information
in phonemic representations. In C. Draxler & F. Kleber (Eds.), Proceedings of the 12th
Conference on Phonetics and Phonology (P&P). München: LMU.
MacDonald, E. N., Goldberg, R., & Munhall, K. G. (2010). Compensations in response to real-time
formant perturbations of different magnitudes. J. Acoust. Soc. Am., 127, 1059-1068.
Mareike Flögel
Adaptation to dichotically presented spectral and temporal real-time
perturbations of auditory feedback
Goethe University, Frankfurt
Speech production is influenced by auditory feedback. Perturbing spectral or temporal
aspects of the auditory feedback in real-time during speech production prompts
speakers to alter their production to compensate for the disturbance (Villacorta et al.
2007; Mitsuya et al. 2014). Models of speech production assume that auditory
feedback affects speech production via processes in the right hemisphere (Tourville &
Guenther 2011). Because, so far, only spectral feedback manipulations have been
investigated, it is unclear whether the observed right-lateralization reflects a right
hemispheric specialization for feedback analyses in general (Tourville & Guenther 2011)
or a right hemispheric specialization for spectral processing (Zatorre & Belin 2001).
Thus, we tested whether the adaptation to perturbations of spectral and temporal speech
features lateralizes differently. German speakers’ auditory feedback was altered during
the production of CVC monosyllabic pseudowords either spectrally (data will be
shown) or temporally (work in progress). Spectral perturbations increased vowels' F1
frequency (20% over 40 trials in steps of 0.05%). Temporal manipulations decelerated
CVC pseudowords locally (Tourville et al. 2013). Auditory feedback was presented
dichotically (feedback manipulation only in one ear while the other ear perceived
unperturbed feedback) or diotically (perturbed/unperturbed feedback in both ears).
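A minimal sketch of the two manipulations follows, with hypothetical function names. The linear ramp up to the +20% plateau over the first 40 trials is one reading of the schedule given above, and the ear-routing labels are assumptions.

```python
def f1_gain(trial, n_ramp=40, max_shift=0.20):
    """F1 scaling factor on a given trial: the perturbation grows over
    the first 40 trials up to +20% and is then held (shape assumed)."""
    return 1.0 + min(trial / n_ramp, 1.0) * max_shift

def route_feedback(perturbed, clean, condition):
    """Return the (left, right) ear signals for one trial.

    'diotic'     -> both ears hear the perturbed signal
    'dichotic_L' -> only the left ear hears the perturbed signal
    'dichotic_R' -> only the right ear hears the perturbed signal
    """
    return {"diotic": (perturbed, perturbed),
            "dichotic_L": (perturbed, clean),
            "dichotic_R": (clean, perturbed)}[condition]
```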
Over trials participants decreased produced vowels’ F1 to compensate for the spectral
feedback perturbation if perturbed auditory feedback was presented to both ears, or only
to the left ear. In contrast, if only the right ear perceived altered feedback no significant
change in produced F1 frequency was observed. This indicates that indeed the right
more than the left hemisphere processes auditory feedback to adapt to auditory real-time
perturbations of spectral speech features. Investigating adaptation to temporally altered
speech feedback is still in progress.
References:
Villacorta, V. M., Perkell, J. S., & Guenther, F. H. (2007). Sensorimotor adaptation to feedback
perturbations of vowel acoustics and its relation to perception. The Journal of the
Acoustical Society of America 122(4), 2306–2319. doi:10.1121/1.2773966.
Mitsuya, T. E., MacDonald, N., & Munhall, K.G. (2014). Temporal Control and Compensation for
Perturbed Voicing Feedback. The Journal of the Acoustical Society of America, 135(5),
2986–2994. doi: 10.1121/1.4871359.
Tourville, J. A., & Guenther, F.H. (2011). The DIVA Model: A Neural Theory of Speech Acquisition
and Production. Language and Cognitive Processes 26(7), 952–81.
doi:10.1080/01690960903498424.
Zatorre, R.J., & Belin, P. (2001). Spectral and Temporal Processing in Human Auditory Cortex.
Cerebral Cortex, 11(10), 946-953. doi:10.1093/cercor/11.10.946.
Tourville, J.A., Cai, S. & Guenther, F.H. (2013). Exploring auditory-motor interactions in normal
and disordered speech. Proceedings of Meeting on Acoustics. 9:060180.
Tuesday, 10th of January 2017
Edwin Maas
Principles of motor learning: What are they and do they work for speech
motor learning?
Department of Communication Sciences and Disorders, Temple University,
Philadelphia, USA
People with speech disorders may require prolonged speech therapy to achieve
improvements of speech motor skill. In most countries, health care resources (money,
clinician time) are limited, posing a significant challenge to achieving optimal treatment
outcomes. In the last decade or two, researchers and clinicians have turned to the motor
learning literature to find ways to maximize treatment outcomes in the face of these
limitations. In particular, there has been growing interest in so-called principles of motor
learning (e.g., Maas et al., 2008). Principles of motor learning refer to the use of practice
and feedback conditions that have been found to facilitate learning (retention and transfer)
of other motor skills. The relevance of such principles of motor learning lies in the notion
that speech production involves a complex motor skill, which requires considerable
practice and experience to acquire.
This presentation will introduce several principles of motor learning and review available
evidence regarding their application to speech motor learning. We will also discuss
explanations of why or how these practice and feedback conditions are thought to operate,
so that attendees can make inferences regarding the likely benefit of a given condition for
a given situation even when direct evidence is (as yet) unavailable.
Anne Hermes
Aging in speech motor control
University of Cologne
In 2014 nearly 19% of Europe’s population was over 65. This demographic change is
one of the major
challenges faced by the social, biological and health sciences. Ageing
is an inevitable natural process and entails changes at several physiological levels,
including the central nervous system, the musculoskeletal system, the skeletal system,
the cardiovascular system and the respiratory system (Jacobs-Condit & Ortenzo,
1985). Crucially, increasing age affects motor control, involving a slowing down of
movements and deficits in the coordination of these movements. Although ageing has
been reported to lead to a general slowing down and a high degree of variability in
speech production, very little is known about how it affects speech in terms of motor
coordination.
Effects of ageing on motor control in general:
The most striking effect of ageing on motor control in general is that movements are
slowed down, both in their initiation and in their execution (Cooke et al. 1989; Seidler et
al. 2002). This process of slowing down crucially affects the entire structure of the
movements. Movement patterns in older individuals show an asymmetrical pattern in
gestural intervals (Brown, 1996; Cooke et al., 1989) as opposed to a rather symmetrical
pattern for younger individuals (Hogan, 1984). Motor control in ageing individuals also
entails a high amount of variability in limb coordination, owing to a decrease in
accuracy, which in turn results in coordination deficits (Brown, 1996; Cooke et al., 1989).
These age-related deficits play an important role in motor control, since “[c]oordination
is a part of most tasks of daily living and therefore it is essential to understand breakdowns
in control and regulation” (Ketcham & Stelmach, 2004:6). Under time pressure older
individuals perform better on simultaneous movements (in-phase, both hands go up
and down in synchrony) than on alternating movements (anti-phase, hands go up and
down in opposite directions).
Effects of ageing on speech:
Coordinating the movement of the articulators to produce speech is referred to as speech
motor control. Speech almost exclusively involves fine motor control with the millimetre
precision and split-second timing needed to perform this highly complex task. As in
motor control in general, a commonly reported effect of ageing on speech is that the
tempo is slower (Amerman & Parnell, 1992; Ramig 1983). Our knowledge of ageing
effects on speech motor control is limited by the fact that most studies are primarily
based on acoustic recordings, precluding a detailed analysis of articulatory coordination
patterns. It is unlikely that age-related speech rate reduction compares to a deliberate
rate reduction in younger individuals (e.g. when attempting to speak clearly), much like
a slower walking tempo due to ageing is not the same as an intentionally slower walking
tempo at a younger age (Kang &
Dingwell, 2008). Mefferd & Corder, 2014 state that
“relatively little is known about how ageing affects the speech motor system, and its
potential contribution to a slowed rate of speech” (Mefferd & Corder, 2014:347). Little
is known about how ageing affects the coordination of articulators and whether or
not
coordination deficits appear (cf. motor control in general).
Present study:
This study investigates ageing in speech motor control rather than estimating from the
acoustic signal only. The project focuses on healthy ageing subjects rather than on
subjects with speech and language disorders. This problem is addressed by analysing
articulatory coordination patterns in ageing individuals using electromagnetic articulo-
graphy (EMA) in order to track the movements of the speech articulators. In a preliminary
analysis on speech of ageing individuals, I will analyse gestural coordination patterns
within the framework of Articulatory Phonology (Browman & Goldstein 2000), capturing
velocity profiles and the degree of variability in the realisation of syllable patterns in
German.
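As a concrete illustration of the kinematic measures involved, the sketch below derives a velocity profile and a simple interval-symmetry index from one EMA sensor trajectory; the 250 Hz sampling rate and the index itself are assumptions, not the study's actual pipeline.

```python
import numpy as np

def velocity_profile(position, fs=250.0):
    """Central-difference velocity (mm/s) of one EMA sensor trajectory,
    sampled at `fs` Hz (250 Hz is assumed here)."""
    return np.gradient(position) * fs

def interval_symmetry(position, fs=250.0):
    """Ratio of acceleration-phase to deceleration-phase duration.

    Values near 1 correspond to the symmetrical velocity profiles
    reported for younger speakers; ageing has been associated with
    more asymmetrical gestural intervals.
    """
    speed = np.abs(velocity_profile(position, fs))
    peak = int(np.argmax(speed))
    return peak / max(len(speed) - 1 - peak, 1)
```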
References
Amerman, J. D., & Parnell, M. M. (1992). Speech timing strategies in elderly adults. Journal of
Phonetics, 20(1), 65-76.
Browman, C. P., & Goldstein, L. (2000). Competing constraints on intergestural coordination and
self-
organization of phonological structures. Bulletin De La Communication Parlée, 5, 25–
34.
Brown, S. H. (1996). Control of Simple arm Movements in the Elderly. Changes in Sensory Motor
Behavior in Aging (Vol. 114, pp. 27–52). Elsevier Masson SAS.
Cooke, J. D., Brown, S. H., & Cunningham, D. A. (1989). Kinematics of arm movements in elderly
humans. Neurobiology of Aging, 10(2), 159–165.
Jacobs-Condit, L., & Ortenzo, M. L. (1985). Physical changes in aging. In Gerontology and
Communication Disorders. American Speech-Language-Hearing Association Rockville, MD.
Ketcham, C. J., & Stelmach, G. E. (2004). Movement control in the older adult. Technology for
Adaptive Aging, 64–92.
Mefferd, A. S., & Corder, E. E. (2014). Assessing Articulatory Speed Performance as a Potential
Factor
of Slowed Speech in Older Adults. Journal of Speech, Language, and Hearing
Research, 57(2), 347.
Ramig, L. A. (1983). Effects of physiological aging on speaking and reading rates. Journal of
Communication Disorders, 16(3), 217–226.
Sadagopan, N., & Smith, A. (2013). Age Differences in Speech Motor Performance on a Novel
Speech Task. Journal of Speech, Language, and Hearing Research, 56(5), 1552–1566.
Seidler, R. D., Alberts, J. L., & Stelmach, G. E. (2002). Changes in multi-joint performance with
age. Motor Control, 6(1), 19–31.
Cornelia Heyde
CV transitions in the fluent speech of people who stutter – a kinematic
approach using ultrasound
Queen Margaret University, Edinburgh
The symptoms of developmental stuttering tend to appear between the ages of 2
and 3
years. During this period, the storage of phonological representations appears
to shift from
primarily holistic representations of lexical items to a segmental storage
system which
permits incremental retrieval and phonological encoding during
speech production. This
shift accompanies, and likely facilitates, the vocabulary
explosion seen between the ages
of 2 and 3 in typically developing children.
It has been suggested that difficulties in transitioning to segmental encoding are responsible
for the symptoms of stuttering [1]. Although the details of psycholinguistic theories of
stuttering differ, most hypothesise a link between breakdown in the process of phonological
encoding and the symptoms of stuttering
[2, 3, 4, 5]. This has been suggested to relate to
deficits in sequence skill learning
and the development of automaticity over the course of
practice: A number of researchers have suggested that the encoding difficulties underlying
stuttering are
associated with general motor encoding and sequencing difficulties, with
evidence
indicating that, as a group, people who stutter (PWS) perform more poorly on
motor
sequencing tasks than controls (PNS; see [6] for a review). An enduring question,
however, has been why the symptoms of stuttering are intermittent if the underlying
cause
is a pervasive difficulty in mapping from semantic representations to motor
output.
For the current study we employ dynamic ultrasound tongue imaging [7] to explore
whether
the perceptually fluent speech of PWS exhibits symptoms of difficulty in phonological-
phonetic processing. This would offer support to the theory that, far
from being
momentary lapses in an otherwise functioning system, instances of stuttering are rather the
points at which atypical retrieval and encoding processes
become apparent to the listener.
The object of the study is to test Wingate's Fault Line Hypothesis [5]. Wingate claims
that
disfluencies in the speech of PWS do not result from difficulty initiating speech,
but instead
from sequencing difficulty. Disfluencies, according to Wingate, indicate
difficulty integrating
a syllable onset and the subsequent syllable rhyme.
We obtained measures for movement duration, distance and peak velocity for the
fluent
recordings of 9 PWS and 9 control speakers producing overall 321 repetitions
of /əkV/
with the vowel being /ɑ/ (n=112), /i/ (n=93) or /ə/ (n=116). As was
hypothesised, results
show comparable behaviours for the two speaker groups when
initiating the syllable onset
with none of the measures of lingual movement
exhibiting significant differences.
Regarding the transition from syllable onset onto the syllable rhyme, however, groups
reveal significantly different movement patterns for stroke duration (β = 0.030, SE = 0.009, t
= 3.143), and peak velocity (β = - 24.576, SE = 9.923, t = -2.477).
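For illustration, the three movement measures named above can be computed from a tracked tongue-point trajectory as in the sketch below; the input format and frame rate are assumptions rather than the authors' processing chain.

```python
import numpy as np

def stroke_measures(xy, fs):
    """Duration (s), distance (mm) and peak velocity (mm/s) of one
    lingual stroke, given an (n, 2) array of tongue-point positions
    between two velocity minima and the ultrasound frame rate `fs`."""
    step = np.diff(xy, axis=0)                  # per-frame displacement
    frame_dist = np.linalg.norm(step, axis=1)   # mm travelled per frame
    duration = len(xy) / fs                     # stroke duration
    distance = frame_dist.sum()                 # path length
    peak_velocity = (frame_dist * fs).max()     # fastest frame-to-frame step
    return duration, distance, peak_velocity
```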
Our results show that even the perceptually fluent speech of PWS differs from the
speech
of control speakers thereby supporting the claim that symptoms of stuttering
may not be
intermittent, but more pervasive than what is perceptually salient.
Differences between groups in the transition from syllable onset to syllable rhyme
support
Wingate's Fault Line Hypothesis [5], suggesting that PWS have difficulty in phonological-
phonetic processing.
References
[1] Byrd, C. T., Conture, E. G., & Ohde, R. N. (2007). Phonological priming in young
children
who stutter: Holistic versus incremental processing. American Journal of Speech-
Language Pathology, 16(1), 43-53.
[2] Howell, P., & Au-Yeung, J. (2002). The EXPLAN theory of fluency control applied to the
diagnosis of stuttering. Amsterdam Studies in the Theory and History of Linguistic Science
Series 4, 75-94.
[3] Kolk, H., & Postma, A. (1997). Stuttering as a covert repair phenomenon. Nature
and
treatment of stuttering: New directions, 2, 182-203.
[4] Perkins, W. H., Kent, R. D., & Curlee, R. F. (1991). A theory of neuropsycholinguistic function in
stuttering. Journal of Speech, Language, and Hearing Research, 34(4), 734-752.
[5] Wingate, M. E. (1988). The structure of stuttering: A psycholinguistic analysis. Springer
Verlag, New York, NY.
[6] Smits-Bandstra, S., & Luc, F. (2007). Sequence skill learning in persons who stutter: implications
for cortico-striato-thalamo-cortical dysfunction. Journal of fluency disorders, 32(4), 251-278.
[7] Articulate Instruments Ltd 2012. Articulate Assistant Advanced User
Guide: Version 2.14.
Edinburgh, UK: Articulate Instruments Ltd.
Joanne Cleland
Using visual biofeedback to teach children with Speech Sound
Disorders new articulations
University of Strathclyde, Scotland, UK
Children with persistent Speech Sound Disorders (SSDs) are often resistant to traditional
speech therapy approaches. Visual biofeedback (VBF) is often cited as the missing piece
of the puzzle required to help these children establish new articulations long after the age
when they are normally acquired. Visual biofeedback techniques in this context are
instrumental phonetic techniques that allow speakers to see their own articulators moving
in real-time and use this information to correct erroneous motor programmes. These
techniques are especially useful for errors involving lingual articulations and offer speakers
real-time biofeedback of their own tongue moving and a visual model of what their tongue
ought to be doing – in essence a target motor programme. Current literature (e.g. Cleland
et al., 2015; Preston et al., 2014) suggests that visualising the tongue moving in real-time
provides speakers with a “knowledge of performance” which enables them to learn new
articulations and stabilise motor plans. This in turn leads to remediation of previously
persistent speech sound disorders. However, beyond the idea that visual biofeedback makes
knowledge of the movements of the articulators explicit, it is poorly understood why viewing
tongue movements, which are normally hidden, might in fact be useful to disordered
speakers when it is clearly unnecessary for typical learners. Further, almost no studies
look at the changes in movement patterns that take place while children acquire their new
articulations, with the process itself being largely unexplored.
This talk will explore some potential theoretical explanations that might be at play in visual
biofeedback therapy. I will provide an overview of my research using high-speed
ultrasound to both measure tongue-movements in children with speech disorders and
provide visual biofeedback therapy. By recording the children’s attempts at new
articulations, we can see that when children first acquire new segments they are often
poorly co-articulated and/or abnormally articulated (for example, velar stops realised as
uvular stops, even in the context of a high-front vowel). I will discuss this finding in relation
to the idea that the underlying cause of persistent speech sound disorders is motor-based,
even when children initially appear to have disorders which are phonological in nature.
Peter Birkholz
Visual biofeedback of speech movements by electro-optical
stomatography.
Institute of Acoustics and Speech Communication, Technical University Dresden,
Germany
We are currently developing a novel method to measure and visualize speech movements in
real-time, which we call electro-optical stomatography (EOS). The method essentially
combines the well-known technique of electropalatography (EPG) with the less-known
technique of optopalatography (OPG) and its enhancement. While EPG essentially
captures the spatial pattern of contact between the tongue and the hard palate, OPG
captures the distance of the tongue from multiple points along the hard palate, as well as
the position of the upper lip, using optical distance sensors. Both the contact sensors for
EPG and the distance sensors for OPG are mounted on the same thin artificial palate fitted
to the hard palate of the subject, which is worn inside the mouth. The data are captured at
a rate of 100 Hz to provide a detailed picture of tongue and lip movements that can be
used to drive an animation model of the speaker’s vocal tract. In addition, I will briefly
present a new means to measure the position of the velum in a minimally invasive way,
which is based on acoustic reflectometry of the nasal cavity. In combination, both EOS and
the velic sensing enable the measurement and visual feedback of the articulation to
facilitate second language learning and the real-time articulatory synthesis of speech
driven by only the (silent) speech movements of a person.
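The data shape implied by this description can be sketched as follows; the field layout and sensor counts are assumptions, not the actual EOS format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EOSFrame:
    """One 10-ms sample of electro-optical stomatography data (a sketch
    of the data shape implied above; the exact layout is an assumption).

    EPG contributes a boolean tongue-palate contact pattern, OPG a set
    of optical tongue (and upper-lip) distances along the hard palate.
    """
    t: float                     # time in seconds (100 Hz -> 0.01 s steps)
    contacts: List[bool]         # EPG contact electrodes on the palate
    distances_mm: List[float]    # OPG optical distance sensors
    lip_distance_mm: float       # upper-lip distance sensor
```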
Elina Rubertus
Anticipatory V to V coarticulation in German children and adults – An
ultrasound study
University of Potsdam
Coarticulation is defined as connecting single speech sounds by varying degrees of
articulatory overlap. Not only adjacent sounds are coarticulated with each other but
coarticulatory influences can span several segments in both directions. To become
fluent speakers, young children must
acquire the typical spans and magnitudes of their
native language’s coarticulation patterns. There fore, they have to develop a fine control
of their speech production system on the one hand and learn to plan their articulation on
the other hand. Albeit studied quite frequently, coarticulation in
child speech remains
poorly understood because of contradictory results in previous studies. Limited numbers of
participants, varying methods, as well as different languages investigated are some
of the
factors hindering concluding interpretations.
The present study is part of a larger project aiming to track the developmental course of
coarticulation mechanisms in German children. It investigates multiple age groups and
combines traditional
acoustic measurements with direct measures of articulation via
ultrasound imaging. Here, we more
specifically focus on articulatory investigations of
anticipatory lingual vowel-to-vowel (V-to-V) coarticulation in German preschoolers. Some
studies on child speech reported a systematic change
of the first vowel depending on
the following vowel across a consonant in VCV sequences (Nittrouer, Studdert-
Kennedy, & Neely (1996), Boucher (2007), Repp (1986) for 9;5-year-old
child). Others
did not find such an effect (Repp (1986) for 4;8-year-old child, Barbier et al. (2013)). For
adults, there is strong evidence for anticipatory V-to-V coarticulation. This effect has
been shown to be at least slightly modulated by the intervocalic consonant’s resistance,
the articulatory constraints imposed on the tongue during consonant articulation:
Intervocalic non-resistant labials
allowed for more crossing V-to-V coarticulation than
highly resistant alveolars (Öhman (1966), Recasens (1984), Recasens (1987), Fowler &
Brancazio (2000)). However, as Recasens (1987) emphasizes, the impact of the
consonant’s resistance is a lot smaller in anticipatory V-to-V coarticulation than it is in
carry-over V-to-V coarticulation.
Our study is the first to investigate anticipatory V-to-V coarticulation in German children.
Pre- schoolers aged 3, 4, and 5 years as well as children at the end of their first school
year (6-7 years of age) were tested and compared with each other to trace the temporal
development of coarticulation patterns. German monolingual adults were recorded as a
reference group. In a repetition task,
participants produced pseudowords of the form
C1VC2ǝ (C = /b/, /d/, /g/, V = /i/, /y/, /u/, /a/, /e/, /o/, C1 ≠ C2) preceded by the German
female article /aɪnə/. Tongue movement was recorded via
ultrasound imaging to track
the spatial and temporal organization of lingual gestures. To measure
the magnitude of
V-to-V anticipatory coarticulation we investigated whether the horizontal position of the
highest point of the tongue during the vowel predicts its position during the preceding
schwa of the article. For the statistical analysis, we applied linear regressions to the
articulatory signal and fit linear mixed effects models.
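The regression logic can be sketched as follows with synthetic stand-in data (all column names and effect sizes are hypothetical): a steeper vowel-to-schwa slope indicates stronger anticipatory coarticulation, and its interaction with age group tests the developmental claim.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the measurements: one row per token, with the
# horizontal position of the highest tongue point during the vowel and
# during the preceding schwa of the article.
rng = np.random.default_rng(0)
n = 240
df = pd.DataFrame({
    "vowel_tongue_x": rng.normal(0, 1, n),            # position in the vowel
    "speaker": rng.integers(0, 12, n),                # 12 hypothetical speakers
    "age_group": rng.choice(["3yo", "5yo", "adult"], n),
})
slope = df["age_group"].map({"3yo": 0.8, "5yo": 0.5, "adult": 0.3})
df["schwa_tongue_x"] = slope * df["vowel_tongue_x"] + rng.normal(0, 0.3, n)

# Does the vowel's tongue position predict the schwa's tongue position,
# and does that slope vary with age group? Random intercept per speaker.
model = smf.mixedlm("schwa_tongue_x ~ vowel_tongue_x * age_group",
                    data=df, groups=df["speaker"])
print(model.fit().summary())
```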
First results provide strong evidence for anticipatory V-to-V coarticulation to be present in
3- and 5-year-olds as well as in adults. Interestingly, the magnitude of coarticulation
decreases significantly with age, 3-year-olds showing most coarticulation, 5-year-olds
intermediate and adults least.
In none of the age cohorts analyzed so far did the intermediate
consonant's resistance have a significant impact on the crossing V-to-V coarticulation.
Although this is not in line with several previous
studies in adults, it might correspond to
Recasens’ (1987) claim about the difference between anticipatory and carry-over
coarticulation. Data from our two last age cohorts (4- and 6/7-year-olds) are currently
under analysis. The 4-year-olds will enrich the data set by bridging the gap between 3-
and 5-year-olds and the first graders will provide insights into possible influences of literacy.
To
tackle possible differences in coarticulation span and track the temporal unfolding,
we will also
analyze the tongue configuration at the acoustic end of the schwa in addition
to the currently used schwa midpoint.
References
Barbier, G., Perrier, P., Ménard, L., Payan, Y., Tiede, M. K., & Perkell, J. S. (2013). Speech
planning as an index of speech motor control maturity. 14th Annual Conference of the
International Speech Communication Association.
Boucher, K. M. (2007). Patterns of anticipatory coarticulation in adults and typically developing
children. M.S. thesis, Brigham Young University, Provo.
Fowler, C. A. & Brancazio, L. (2000). Coarticulation resistance of American English consonants
and its effects on transconsonantal vowel-to-vowel coarticulation. Language and Speech,
43(1), 1-41.
Nittrouer, S., Studdert-Kennedy, M., & Neely, S. T. (1996). How children learn to organize their
speech gestures: Further evidence from fricative-vowel syllables. Journal of Speech and
Hearing Research, 39, 379-389.
Öhman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal
of the Acoustical Society of America, 39, 151-168.
Repp, B. (1986). Some observations on the development of anticipatory coarticulation. Journal of
the Acoustical Society of America, 79(5), 1616-1619.
Recasens, D. (1984). Vowel-to-vowel coarticulation in Catalan VCV sequences. Journal of the
Acoustical Society of America, 76, 1624-1635.
Recasens, D. (1987). An acoustic analysis of V-to-C and V-to-V coarticulatory effects in Catalan
and Spanish VCV sequences. Journal of Phonetics, 15, 299-312.
Tom Starr-Marshall
Transcription agreement of children with speech disorders using
standard speed and slow motion video
Canterbury Christ Church University
Starr-Marshall, Knight, Martin & Pring (submitted) found that transcriptions of severely
disordered speech were less reliable than transcriptions of mild-moderately disordered
speech and, therefore, suggested that alternative assessment tools may be needed
to assess severe speech disorders. Informal feedback from the participants in this study
suggests that one of the problems with transcribing severely disordered speech is the
amount of information one needs to process in a short space of time.
Speech contains phenomena which are only a few milliseconds in duration and may
be difficult to perceive, yet still play an important role in speech sound production; for
example, the duration of an alveolar tap may be as little as 16 milliseconds
(Ladefoged, 2006). Although such rapid phenomena are not a problem when
perceiving speech under normal conditions, it can be difficult to reliably transcribe
speech phenomena occurring at speed (Hollett, 1982). This can be even more difficult
when transcribing disordered speech as the sounds may not always fall neatly into our
own perceptual categories (Ball, 2008). Thus transcribing unfamiliar speech sounds
occurring at speed can be problematic.
Recent research into the mental processes involved in transcription highlights the
importance of short term memory (Maguire & Knight, 2011), and the facility of
holding spoken material in the phonological loop whilst analysing it (Knight, 2010).
This suggests that retaining information about a particular speech segment in short term
memory whilst the speech continues may be a possible source of difficulty for some
transcribers.
This study explores the utility of slow motion video for transcribing severely
disordered speech. It uses a video editing program to play the audio-visual signal frame-
by-frame. The assumption is that, by segmenting the video and audio stream in this way,
it will benefit the transcriber in two ways. Firstly it will be easier to attend to fleeting visual
information, which might be imperceptible at full speed; so that the listener can see the
exact mouth shape and sometimes the tongue position associated with a particular
sound (the viewer can also view the video at full speed to double check coarticulatory
events). Secondly it will reduce the short term memory load required to transcribe
disordered speech effectively. This study showed video footage of two children with
severely disordered speech (less than 50% consonants correct in the Diagnostic
Evaluation of Articulation and Phonology (Dodd et al., 2002) phonology section) to 10
Speech and Language Therapists, with at least 10 years’ experience, in the slow motion
condition and at normal speed. Analysis compared inter-transcriber agreement between
the 2 conditions. This study found that the agreement of phonetic transcriptions of
severely disordered speech increased when using slow motion video.
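A minimal sketch of the frame-by-frame viewing condition could look as follows (the study used a video editing program, not a script; the window name and key binding are arbitrary).

```python
import cv2

def step_through(path):
    """Play a recording frame by frame: any key advances one frame,
    'q' quits. Stepping lets the transcriber attend to fleeting visual
    cues and reduces the memory load of continuous playback."""
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:                              # end of the recording
            break
        cv2.imshow("transcription", frame)
        if cv2.waitKey(0) & 0xFF == ord("q"):   # block until a keypress
            break
    cap.release()
    cv2.destroyAllWindows()
```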
A secondary question, to be explored in a further study, is whether students with
poorer short term memory increase their transcription agreement significantly more than
other students when using slow motion video. If slow motion video improves transcription
agreement, or aids transcribers that have poorer short term memory, there are important
implications for employing this tool in clinical settings as increased reliability in
assessment will result in more appropriately targeted intervention.
Ian S. Howard
Learning motor actions for speech production
Centre for Robotics and Neural Systems, Plymouth University, UK
In order to communicate using speech, we first need to collect our thoughts and formulate
a message. This is then converted into a speech signal by means of an appropriate
movement sequence of the articulators in our vocal apparatus. At an abstract level, speech
production thus requires the generation of a sequence of acoustic contrasts that encode
the underlying message, whereas at a lower level it involves motor skills, which are
needed to ensure the articulators are moved around with sufficient speed and precision to
realize the desired acoustic contrasts. In a new born infant, both of these abilities will be at
best limited, if not absent. Here we consider how the necessary motor action generation
needed for effective speech production can develop through learning. To do so, we first
examine the relationship of an infant with its environment in terms of signal flow
interactions. We then consider the role of reinforcement learning and discuss how
interaction with a learned caregiver plays an essential part in the process of speech
development. We also describe how modern control theory can shed light on the lower
level motor control issues the human motor system needs to address in order to make
speech articulations. Finally we report the results from some experiments using
articulatory models of speech production and discuss how the field of cognitive
developmental robotics can help test theories of speech acquisition and production.
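A minimal sketch of the reinforcement idea (not the authors' model): an agent samples articulatory settings, receives a caregiver-like reward for acoustic closeness to a target, and shifts preference towards rewarded settings. All quantities here are abstract placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
goal = 0.7                               # abstract "acoustic" target
actions = np.linspace(0.0, 1.0, 21)      # discretised articulatory settings
value = np.zeros(len(actions))           # running reward estimate per action

for step in range(2000):
    p = np.exp(5 * value)                # softmax policy over actions
    p /= p.sum()
    a = rng.choice(len(actions), p=p)    # explore, favouring valued actions
    reward = -abs(actions[a] - goal)     # closer sound -> higher reward
    value[a] += 0.1 * (reward - value[a])  # incremental value update

print(actions[value.argmax()])           # converges near the goal setting
```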
Wednesday, 11th of January 2017
Aude Noiray
Acquisition of spoken language proficiency in relation to phonological
and speech motor control developments
University of Potsdam, Linguistics Department, Germany
A source of both excitement and challenge for developmental psycholinguists is the
complex, intertwined nature of language acquisition. Indeed, in the first years of life,
children develop spoken language fluency in parallel to developing perceptual, motor,
lexical and phonological knowledge. While most of those competences have been well
studied in the last decades, work addressing speech production has lagged behind due to
practical difficulties in measuring articulation at a young age. However, recent
developments of methodologies appropriate for child studies have made direct observation
of speech articulation feasible and provided new opportunities for investigating the
development of articulatory skills together with other language-related skills. In this talk, I
will give an overview of the research addressing the acquisition of spoken language
proficiency in relation to speech motor control and phonological development. I will present
some of the research I conducted in the past few years that specifically examines the
maturation of coarticulation, from the preschool years, when children have limited
phonological knowledge and control over their speech production system, to the first school
year, when both motor and phonological knowledge are recruited for literacy
acquisition. I will discuss results in light of current debates regarding the development of
linguistic organization of spoken language.
Mélaine Cherdieu, Pascal Perrier, J. Troccaz, O. Palombi & Amélie
Rochet-Capellan
Can arm-hand gestures help to learn arm-hand anatomical structures?
Univ. Grenoble Alpes, CNRS, GIPSA-lab, Grenoble
Anatomy learning is broadly investigated in educational research as challenging for
many students in medical school, physiotherapy, sport, etc. The learning of structural and
functional anatomy requires spatial abilities and cognitive resources. Students have to
construct mental three-dimensional (3D) representations of structures and their joints and
mentally simulate their deformation in specific functions. They should also be able to
recall the names of the structures (which can be very complex) and describe their
functional role. In the light of the literature about anatomy learning and education, it is clear
that anatomy is a broad field that can be linked to different types of knowledge. Structural
and functional anatomy is particularly complicated for students, and the classical
methods are probably not sufficient to overcome the difficulties (Sugand, Abrahams, &
Khurana, 2010; Zumwalt et al., 2010).
Recently, and in the framework of embodied cognition, direct implication of the body has
been tested, such as making facial movements to learn functional anatomy of the face
(Dickson & Stephens, 2015). Embodied cognition theories describe cognition as an activity
linked to the body and to its interaction with the environment (Clark & Chalmers, 1998;
Varela, Thompson, & Rosch, 1993). The cognitive processes depend on previous
experiences and are grounded in the sensorimotor processes involved in these experiences
(Borghi & Pecher, 2011; Wilson, 2002). Previous work provides evidence of embodiment for
different kinds of learning. For example, manual gestures can help problem solving and
mathematical knowledge acquisition in children (Alibali & Nathan, 2012), second language
learning (Church, Ayman-Nolley, & Mahootian, 2004; for review see Goldin-Meadow &
Wagner, 2005; Hostetter & Alibali, 2008), or processing and memorization of sentences
(Chao, Huang, Fang, & Chen, 2013; Kelly, Ozyurek, & Maris, 2010). Some studies have
investigated the effect of manual gesture in learning and memorization of anatomical
concepts (Dickson & Stephens, 2015; Oh, Won, Kim, & Jang, 2011). In line with this
previous work, our aim is to investigate the effect of manual gesture when learning
structural and functional anatomy of the arm. In particular, we want to evaluate if gesturing
during learning could facilitate the memorization of the names and functions of the
anatomical structures involved in the gesture.
In our experiment, two groups of naïve participants watch a video explaining the structures
and muscles of the upper limbs involved in movements of pronation and supination (cf.
Figure 1). One of the groups performs gestures linked to the lesson, e.g., doing a pronation
movement while learning the muscles involved in pronation. The other group watches the
lesson without doing any movement. After watching the video, the two groups perform an
assessment test evaluating their knowledge of the different structures and their functions.
The role of gestures in learning is evaluated by comparing the performances of the
two groups. Evaluations were run right after learning and again a few days later. Our results
show no significant short-term effect of making gestures, while gestures significantly
improve the long-term recall of new vocabulary (cf. Figure 2). Hence, gesture seems to
consolidate the learning of new vocabulary and/or facilitate its recall.
Figure 1. Methods: A to D: main schema used during learning and testing. E: Example of gestures.
Figure 2. Main results: gestures improve long-term recall.
Elisabet Eir Cortes
Motorics of speech movements in conditions of varying vocal effort
Stockholm University
Speech production requires a fine-tuned control of the articulatory system. This system is
particularly difficult to study since large parts of it are not visible to an observer’s eye, and
advanced non-invasive techniques are therefore required to model it. Studying speech as an
articulatory process is essential to gain a deeper understanding of articulators’
organization and control. The outcomes are also relevant for understanding the
relationship between kinematics and acoustics of speech, as well as its phylogenetic and
ontogenetic development.
My PhD project aims to investigate the motor control of speech movements in speech
uttered with varying vocal effort. Previous research has shown that the articulatory
behaviour of speaking loudly includes lowering the jaw (Schulman, 1989; Geumann,
2001). This effort-induced jaw lowering will provide a means to assess the constancy and
reorganization of speech movements in the context of varying communicative demands.
The goal is to collect experimental data as a basis for two studies: One focusing on
physiological and acoustic aspects of speech, the other on sound structure (the admissible
arrangement of sounds within a certain language, here Swedish).
The speech study addresses several issues, of which one is the phonetic specification of
(Swedish) speech sounds, i.e. the systematic way in which speakers modify their
articulation and acoustic output as a function of vocal effort. Another important issue is the
motor control of compensation, assessed through adaptations of other articulators to
effort-induced jaw lowering.
The sound structure study aims at investigating the relationship between articulation and
sound structure, with special focus on the role of the jaw cycle (MacNeilage, 1998).
Existing views on the jaw’s role range from it being a largely irrelevant secondary
articulator to a key factor in shaping both speech and phonological structure (cf. Saltzman
& Munhall, 1989; Browman & Goldstein, 1990; Wood, 1979; and Lindblom, 1983). One of
the goals of this project is to present experimental data that clarifies the status of the jaw.
Another aspect, which this second study will address, is how sub-syllabic content
(segmental structure) in turn modifies jaw movement.
My presentation aims to give an overview of my PhD project and the ongoing data
collection and analysis.
References:
Browman, C. & Goldstein, L. 1990. Gestural specification using dynamically-defined articulatory
structures. J. Phonetics 18, 299-320.
Geumann, A. 2001. Invariance and variability in articulation and acoustics of natural perturbed
speech. PhD dissertation, University of Munich.
Lindblom, B. 1983. Economy of speech gestures. In MacNeilage, P. (ed). The production of
speech. New York: Springer-Verlag.
MacNeilage, P.F. 1998. The frame/content theory of evolution of speech production. BBS 21(4),
499-511.
Saltzman, E. & Munhall, K. 1989. A dynamical approach to gestural patterning in speech
production. Ecological psychology 1(4), 333-382.
Schulman, R. 1989. Articulatory dynamics of loud and normal speech. JASA 85(1), 295-312.
Wood, S. 1979. A radiographic analysis of constriction location for vowels. J. Phonetics 7, 25-43.
Daniel Duran
Towards an exemplar-theoretic model of phonetic convergence
University of Stuttgart
As experimental methods and speech technology advanced during the past decades, a
number of challenging observations have been made for theories in phonetics and
phonology. To mention some examples, consider the issues of gradual acquisition of
phonetic detail, the phenomenon described as the native language magnet which alters
speech perception depending on the listener’s linguistic experience, or the ubiquitous
variability in phonetic realizations in contrast with the apparent equivalence of phonemes
across different languages. Usage-based approaches have been developed to address
these issues, and this line of research gave rise to exemplar-theoretic models of speech
production and perception. Exemplar theory, as it is applied in phonetics and phonology,
has its origins in cognitive psychology. The central question was how individual stimuli are
categorized. More specifically, the problem that psychologists faced was to develop a
formal model that could explain how categories can be learned based on individual
concrete instances and how they are represented cognitively. The computational
exemplar-theoretic models of speech perception and production have their roots in early
connectionist and formal simulation models of learning and classification [cf. 1, 2, 3, 4, 5].
Here, I present on-going work which investigates the dynamics of speaker adaptation and
phonetic convergence following an exemplar-theoretic approach. Phonetic convergence
(or accommodation) is the phenomenon when two speakers become more alike within
the course of a dialog [cf. 6]. Current research into speaker accommodation and variation
increasingly focuses on the socially-relevant details of an encounter as well as on
speakers’ individual differences of both psychological (personality-related) and cognitive
(processing skill-related) nature [cf. 7, 8, 9]. The proposed model incorporates these
factors.
A simple exemplar-theoretic model of the convergence phenomenon can be outlined as
follows: The basic assumption that needs to be made is the existence of a speech
perception–production feedback loop. Based on this premise it can be assumed that
previously perceived speech items serve as targets or templates for speech production.
Recency will lead to a higher activation of exemplars which correspond to the
interlocutor’s productions relative to older exemplars stored in memory. Thus, productions
of the interlocutor will influence one’s own production targets. Phonetic convergence, as a
result, is expected to be a trivial outcome of a dialog situation within a simple
exemplar-theoretic framework (see the sketch at the end of this abstract). However,
studies on phonetic convergence show that the phenomenon is elusive, with large
individual variability between subjects. In some cases speakers may even become more
dissimilar in their productions within the course of a dialog. A complete exemplar-theoretic
model thus needs to take into account individual cognitive and psychological features
which may enhance or attenuate the convergence effect. Lewandowski discusses such a
hybrid model of phonetic convergence (i.e. partially automatic but prone to various within-
and inter-speaker influences [10]). A recently completed data collection extends the
GECO database with new dialogs from male and female subjects along with their
individual personality and psychological data with a particular focus on attention
[11, 12, 13]. Our computational simulation model is based on this empirical data which
combines dialog speech recordings with personality and cognitive data for all participants
of the study.
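The recency-activation idea outlined above can be made concrete with a toy simulation. The following Python sketch is only an illustration under simplifying assumptions (a single one-dimensional phonetic parameter and exponential activation decay); it is not the simulation model built on the GECO data, and all names and constants are hypothetical.

    import math
    import random

    class ExemplarStore:
        """Toy store of perceived exemplars for one phonetic parameter (e.g. F1 in Hz)."""
        def __init__(self, decay=0.1):
            self.exemplars = []      # list of (time, value) pairs
            self.decay = decay       # larger decay = stronger recency effect

        def perceive(self, t, value):
            self.exemplars.append((t, value))

        def produce(self, now):
            # activation decays exponentially with exemplar age, so the
            # interlocutor's recent tokens dominate the production target
            w = [math.exp(-self.decay * (now - t)) for t, _ in self.exemplars]
            return sum(wi * v for wi, (_, v) in zip(w, self.exemplars)) / sum(w)

    random.seed(1)
    store = ExemplarStore()
    for t in range(50):              # own earlier productions around 500
        store.perceive(t, random.gauss(500, 10))
    for t in range(50, 60):          # interlocutor's recent tokens around 560
        store.perceive(t, random.gauss(560, 10))
    print(round(store.produce(60)))  # target drifts toward 560: convergence

In this simple form convergence falls out automatically; modelling the attenuating individual factors discussed above would require additional weighting terms.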
References
[1] Douglas L. Hintzman. “Schema abstraction” in a multiple-trace memory model.
Psychological Review, 93(4):411–428, 1986.
[2] Robert M. Nosofsky. Exemplar-based accounts of relations between classification,
recognition, and typicality. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 14(4):700–708, 1988.
[3] Keith Johnson. Speech perception without speaker normalization: An exemplar model.
In Keith Johnson and John Mullennix, editors, Talker Variability in Speech
Processing, pages 145–165. Academic Press, 1997.
[4] Janet B. Pierrehumbert. Exemplar dynamics: Word frequency, lenition, and contrast. In
Joan L. Bybee and Paul J. Hopper, editors, Frequency and the Emergence of
Linguistic Structure, pages 137–157. John Benjamins Publishing Company, 2001.
[5] Daniel Duran. Computer simulation experiments in phonetics and phonology:
simulation technology in linguistic research on human speech. Doctoral dissertation,
Universität Stuttgart, 2013.
[6] Jennifer S. Pardo. On phonetic convergence during conversational interaction. The
Journal of the Acoustical Society of America, 119(4):2382 – 2393, 2006.
[7] Molly Babel and Grant McGuire. The effects of talker variability on phonetic
accommodation. In Proceedings of the 18th International Congress of Phonetic
Sciences (ICPhS), pages 1–5, Glasgow, UK, 2015. Paper number 661.
[8] Natalie Lewandowski. Phonetic convergence and individual differences in non-native
dialogs (a). Abstract presented at New Sounds 2013, Montreal, Canada, 2013.
[9] Jonathan Vais, Michael Walsh, and Natalie Lewandowski. Investigating frequency of
occurrence effects in l2 speakers: Talent matters. In Proceedings of the 18th
International Congress of Phonetic Sciences, pages 1–5, Glasgow, UK, 2015. Paper
number 723.
[10] Natalie Lewandowski. Talent in nonnative phonetic convergence. PhD thesis, Institut
für Maschinelle Sprachverarbeitung, Universität Stuttgart, 2012.
[11] Natalie Lewandowski, Antje Schweitzer, Daniel Duran, and Grzegorz Dogil. An
exemplar-based hybrid model of phonetic adaptation. In GURT 2014 – Usage-based
Approaches to Language, Language Learning, and Multilingualism, Georgetown
University, Washington D.C., March 2014.
[12] Antje Schweitzer, Natalie Lewandowski, and Daniel Duran. Attention, please!
Expanding the GECO database. In The Scottish Consortium for ICPhS 2015, editor,
Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS),
Glasgow, UK, 2015. Paper number 620.
[13] Natalie Lewandowski, Carolin Krämer, Daniel Duran, and Antje Schweitzer. Impact of
personality and social factors on phonetic convergence. In Architectures and
Mechanisms for Language Processing (AMLAP), Bizkaia Aretoa, Bilbao, Spain,
September 2016.
Lei He & Volker Dellwo
Speaker-specific variability in intensity dynamics
University of Zurich
The anatomical idiosyncrasies of the vocal tracts and speech organs are fundamental to
speaker-specific motor controls of the articulators, and such individual differences are
encoded in the speech signal [1]. Our present study investigates how speakers differ in
the signal intensity dynamics, defined here as the velocity of intensity increases and
decreases in the amplitude envelope, i.e. from a trough point to a neighbouring peak point
(positive dynamics) and from a peak point to an adjacent trough point (negative
dynamics). Such intensity dynamics are largely regulated by mandibular movement during
articulation.
Previous works have shown that speaker-specific information is reflected in durational
variability at different levels of speech rhythm as well as in syllabic intensity variability
[4, 5]. Such variabilities have been measured using duration- and intensity-based metrics
of speech rhythm [2, 3]. In addition to these “static” measures of duration and intensity,
we now try to measure dynamic fluctuations of intensity across an utterance (i.e., how
fast intensity increases or decreases between consecutive trough and peak points) as a
better approximation to articulatory behavior, especially the jaw movement. The rationale
is based on Chandrasekaran et al.’s [6] finding that mouth aperture and envelope height
covary (see Fig. 1). By measuring the rate of change of intensity, we can indirectly
estimate the velocity of jaw opening and closing, and examine which of these two
gestures encodes more speaker-idiosyncratic information.
The positive dynamics is defined as the rate of intensity increase from a trough point to
the neighbouring peak point, and the negative dynamics as the rate of intensity decrease
from a peak point to the adjacent trough point, where I_Peak and I_Trough refer to the
intensity values at these points. Fig. 2 shows a geometrical illustration: the peak and
trough points in the intensity curve “I(t)”, derived from the envelope “Env(t)”, are
pinpointed, and the positive and negative dynamics are calculated as the steepness of the
secant lines (YZ and XY) between them. An utterance is thus composed of both a series
of positive dynamics and a series of negative dynamics.
For each sentence, we calculated the distributional characteristics (mean, standard
deviation, and the pairwise variability index, PVI [7]) of the positive and of the negative
dynamics separately. We used the TEVOID corpus [2, 3] (16 Zurich German speakers *
256 sentences) and measured the intensity dynamics per speaker. We fitted a multinomial
logistic regression model with speaker as the nominal response variable and the intensity
dynamics measures as predictor variables. The contribution of each measure to
explaining the speaker effect can be calculated as the proportion of the likelihood-ratio
chi^2 of each measure out of all chi^2 values combined. Results showed that negative
dynamics explained more between-speaker differences than positive dynamics (see
Fig. 3), suggesting that the closing gesture of jaw movement encodes more
speaker-idiosyncratic information.
References
[1] V. Dellwo, M. Huckvale & M. Ashby, How is individuality expressed in voice? An
introduction to speech production and description for speaker classification, in Speaker
Classification I, edited by C. Müller (Springer, Berlin and Heidelberg, 2007), pp. 1–20.
[2] A. Leemann, M.-J. Kolly & V. Dellwo, Speaker-individual rhythmic features: implications
for forensic voice comparison, Forensic Science International 238, 59–67 (2014).
[3] V. Dellwo, A. Leemann & M.-J. Kolly, Rhythmic variability between speakers:
articulatory, prosodic, and linguistic factors, JASA 137, 1513–1528 (2015).
[4] L. He & V. Dellwo, Speaker idiosyncratic variability of intensity across syllables, in
Proceedings of INTERSPEECH 2014, Singapore, pp. 233–237.
[5] L. He & V. Dellwo, The role of syllable intensity in between-speaker rhythmic
variability, International Journal of Speech, Language and the Law 23, 245–275 (2016).
[6] C. Chandrasekaran, A. Trubanova, S. Stillittano, A. Caplier & A.A. Ghazanfar, The
natural statistics of audiovisual speech, PLoS Computational Biology 5, e1000436 (2009).
[7] E. Grabe & E.L. Low, Durational variability in speech and the rhythm class hypothesis,
in Laboratory Phonology 7, edited by C. Gussenhoven & N. Warner (Mouton, Berlin,
2002), pp. 514–546.
Fig. 1. Covariation of mouth aperture and amplitude envelope magnitude. (Source:
Chandrasekaran et al. [6])
Fig. 2. Illustration of calculating the positive and negative intensity dynamics: the peak
and trough points in the envelope “Env(t)” and the intensity curve “I(t)” are pinpointed; the
measures are calculated as the steepness of the secant lines YZ and XY in I(t).
Fig. 3. Negative intensity dynamics explain more between-speaker variability than positive
intensity dynamics.
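As a worked illustration of the dynamics measures described in this abstract, the following Python sketch computes positive and negative intensity dynamics from an intensity contour and summarises each series. The peak-picking settings and the raw PVI variant are assumptions on my part, not details taken from the abstract.

    import numpy as np
    from scipy.signal import find_peaks

    def intensity_dynamics(intensity, times):
        """intensity: contour in dB; times: matching time stamps in seconds."""
        peaks, _ = find_peaks(intensity)       # local maxima of the contour
        troughs, _ = find_peaks(-intensity)    # local minima of the contour
        points = sorted([(i, "peak") for i in peaks] + [(i, "trough") for i in troughs])
        pos, neg = [], []
        for (i, kind_i), (j, kind_j) in zip(points, points[1:]):
            if kind_i == kind_j:               # skip rare non-alternating cases
                continue
            slope = (intensity[j] - intensity[i]) / (times[j] - times[i])  # dB/s
            (pos if kind_i == "trough" else neg).append(slope)  # rise vs. fall
        return np.array(pos), np.array(neg)

    def describe(series):
        # per-sentence descriptors: mean, standard deviation, and a raw pairwise
        # variability index (mean absolute difference of successive values)
        pvi = np.mean(np.abs(np.diff(series))) if len(series) > 1 else 0.0
        return np.mean(series), np.std(series), pvi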
Friederike Charlotte Hechler
The role of feedback in production and perception
University of Potsdam
The human articulators form a highly versatile system, which allows the production of a
large number of different speech sounds, but also prevents the repetition of “one and the
same” sound. Although this variability usually poses no problem in communication, it is
challenging for the study of phonetics, and it is still unclear how exactly interlocutors deal
with variability, e.g., how conscious they are of it. Since the acoustic signal is the main
information source that conversational partners share, its study promises to reveal
important insights into the mental processes at work.
1. Purpose and Background
Previous research showed that sensory feedback from the acoustic signal serves human
beings to rapidly control and align speech (Houde & Jordan, 1998). In perturbation studies,
such as Tremblay, Shiller, and Ostry (2003), speakers compensated for altered feedback
and adapted this behaviour to situations where feedback was blocked or the
consonantal/vowel context changed. Given that these designs were usually unnatural and
required an artificial setting, they left aside the communicative properties of language.
However, from investigations which demonstrated that listeners are sensitive to
information about coarticulation, use it for speech processing (Fowler, 2005) and acquire
phonetic aspects of model talker voices in shadowing/auditory naming tasks (Babel, 2010),
a close connection between perception and production can be derived. Therefore,
interlocutors might not only compensate for their own feedback, but might also be sensitive
to the compensatory behaviour of their conversational partner. This information could then
again serve to deal with the variability of the signal.
I propose a first (preparatory) study to investigate the importance of auditory feedback for
production and perception in a communicative environment by relating the previously
described branches of research: An interactive task for both speaker and listener, based
on a perturbation paradigm, could provide insights into how speech is planned,
represented and processed.
2. Research Questions:
The experimental design I shall discuss shortly is guided by the following questions: I. How
do interlocutors react, when acoustic perturbation changes the speakers’ auditory
feedback? II. How are the abilities to compensate for/adapt to feedback shifts (production)
and to detect such behaviour (perception) related?
3. Hypothesis and Concrete Predictions:
i. My main hypothesis is that interlocutors are sensitive to slight changes in their own and
others’ speech signal.
a. On the one hand, speakers can compensate for auditory feedback shifts due to
perturbation and adapt this behaviour to situations where feedback is blocked and the
linguistic context changed.
b. On the other hand, listeners can register this behaviour and align their own productions
accordingly.
ii. Furthermore, the abilities to compensate for/adapt to feedback shifts (production) and to
detect such compensation (perception) are related.
a. In particular, more compensation/adaptation in the speaker correlates with a stronger
perception of someone else’s compensation in the listener.
4. Methodology and Key Comparison
Native speakers of Standard German without phonetic training, balanced for sex, will
participate.
Pre-Test: One week in advance each individual’s vowel space will be quantified, based on
which productions are later shifted and compensation/adaptation measured.
Tasks: To create a more natural communicative situation, participants are led to believe
they solve a task together with a partner (for simplicity of the same sex). Although this
partner is not present, participants are told that he accomplishes his part simultaneously in
another room. Due to the fake “live” character of the task, fatigue and attention loss of listeners can be
reduced by letting them listen to just a sample of speakers’ productions, while the task
appears interactive. The second research question is addressed by letting each participant
take over both the speaker and the listener role (memory effects are minimized by a one-week
interruption).
The Speaker’s Task: The speaker believes that the words he whispers (which reduces
effects of bone conduction) are used by his partner to solve a task.
The Listener’s Task: The listener reads a carrier phrase in which the final word is missing
and has to fill this gap by shadowing (whispering it as soon as he identifies it) the production of
the speaker. The participant is told that the audio is transmitted live from another room.
Post-Task: An interview follows each session and after the last session, a questionnaire
has to be filled in (see below/Appendix).
The Speaker’s Material: Speakers see singly displayed monosyllabic C1VC2 words. The
first consonant is a voiced stop [d, g, b], the second is (due to terminal devoicing in
German) voiceless [t, k, p]. Vowels are long front unrounded vowels from the German
vowel inventory:
close: i: (written ie)
close-mid: e: (written eh)
open-mid: ɛ: (written äh)
Although single words instead of carrier sentences make the task less natural, they reduce
coarticulation effects. For the same reason, the word-length is kept constant in the task
and the pre-test. To avoid lexical frequency effects, only non-words are used. There are
two filler lists consisting of similarly structured CVCs that contain any consonant apart from
stops and any German vowel. Thus, each member of a pair receives (as a speaker) a
different set than his partner.
The Listener’s Material: For listeners, the visually displayed material consists of carrier
phrases such as “Das Alien Sib besucht seinen Freund.” Via headphones they hear the
speakers’ productions from the compensation-and-adaptation phase (see below), which
shall be used to fill the gap in the sentence. Thus, the non-words can be interpreted as
proper names in a foreign language. Again, there are two filler lists.
Independent Variables: Evaluated will be the influence on reactions of
- the participant’s sex (SEX): female vs. male
- the role in the interactive task (ROLE): speaker vs. listener
- the feedback shift direction (SHIFT DIRECTION): positive vs. negative
- the production phase a listener hears (PHASE): normal-feedback phase vs.
compensation-and-adaptation phase
Manipulation of the Speakers’ Audio: While the speakers’ feedback for words that contain
the vowels [i: or ɛ:] is blocked, feedback for those with [e:] is shifted. The primary acoustic
dimensions of vowel quality in these changes are F1 and F2, because these frequencies
are sensitive to modifications in the overall tongue body activity (Recasens, Fontdevila &
Pallarès, 1995). F1 corresponds to the articulatory dimension of “openness/height” (higher
F1 = more closed) and F2 to “frontness“ (higher F2 = more fronted). Thus, since they are
relevant for the distinction between the three front unrounded German vowels, the speaker
hears his shifted production of [e:] either as [i:] (positive) or [ɛ:] (negative). For the two
independent frequency shift directions, a digital signal processing (DSP) board and
measurements from the pre-test, as in Houde and Jordan (1998), will be used. The
perturbation is structured into the following phases:
- normal-feedback phase
- ramp phase (slowly altered feedback)
- alternating: compensation (altered feedback) vs. adaptation phase (feedback blocked by
noise)
Measurements: Repeated measures will be taken during the experiment for both speaker
and listener. To test the first hypothesis (i.e., the extent of compensation/adaptation),
acoustic measures of difference (centre frequencies of the first three vowel formants,
estimated from a spectral analysis using linear predictive coding) will be utilised; for
speakers relative to the pre-test baseline (AMS) and for listeners relative to the speakers’
production (AML). To study how the behaviour develops over time and to reveal potential
learning, practice or fatigue effects, these measures will be additionally connected to the
time course (AMST/AMLT). Furthermore, reaction times (RTS/RTL, defined as the time
between stimulus display and articulatory onset) show the velocity of
compensation/adaptation.
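For illustration, the LPC-based formant measurement planned above could look roughly like the following Python sketch; the pre-emphasis coefficient, window, and model order are common default choices rather than values specified in this abstract.

    import numpy as np
    import librosa

    def formants(frame, sr, order=12):
        """Estimate formant frequencies (Hz) of one voiced frame via LPC."""
        frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
        a = librosa.lpc(frame * np.hamming(len(frame)), order=order)
        roots = [r for r in np.roots(a) if np.imag(r) > 0]           # upper half-plane
        freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
        return [f for f in freqs if f > 90][:3]                      # F1-F3, drop near-DC

Comparing the first two formants of the shifted [e:] productions against the pre-test baseline would then quantify compensation along the openness (F1) and frontness (F2) dimensions described above.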
A comparison of AMST and RTS vs. AMLT and RTL for each individual will evaluate the
second hypothesis (i.e., the relationship between production and perception).
Although in previous research (e.g., Houde & Jordan, 1998) most participants did not
realize feedback manipulations, speakers’ awareness of the shifts or of delays in the
feedback and listeners’ perception of the speakers’ productions will be checked in the
subsequent interviews. Since perturbation studies found, just as those on accommodation
(e.g., Namy et al., 2002), variation between participants, individual differences (see also
confounding factors in the Appendix) will be addressed by estimating the vowel space in
the pre-test and perceptual sensitivity in the questionnaire.
Apparatus: To conduct my experiment I will need:
- soundproof booth/room
- microphone
- high volume headphones
- computer with a screen and a programme to present items as well as record answers
- DSP board
5. Future Research
Although neither the ability to compensate nor the ability to perceive compensatory
behaviour demonstrates that such information is used to deal with variability in the speech
signal, their presence would suggest its possibility. In case the design described above
finds effects, additional factors (here, for the sake of simplicity, still uncontrolled, but
potentially important) should be manipulated (see Appendix). Furthermore, with simultaneously
present participants, the listener’s response could be transmitted to the speaker to create
a more natural design. Additional support for the second hypothesis might come from an
AXB similarity judgement for another couple’s productions that each participant fulfils.
Given that accommodation was found in social interaction as well as laboratory contexts
(Babel, 2010), this behaviour might reduce effects for listeners. Therefore, it could be
interesting to investigate how the listener reacts, if the correct answer is displayed after the
trial.
Also, these findings could offer suggestions for practical use: For L1 and L2 acquisition,
factors that improve the skills to analyse productions or to convey the correct
pronunciation could be identified as well as strategies to align productions to a native-like
pronunciation. Moreover, these results can be helpful in clinical set-ups/treatments
(overcoming articulation problems, stuttering, etc.).
References
Babel, M. (2010). Dialect divergence and convergence in New Zealand English. Language in
Society, 39, 437–456.
Fowler, C. A. (2005). Parsing coarticulated speech in perception: Effects of coarticulation
resistance. Journal of Phonetics, 33, 199–213.
Goldstein, L., Pouplier, M., Chen L., Saltzman E., & Byrd, D. (2007). Dynamic action units slip in
speech production errors. Cognition, 103, 386–412.
Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279,
1213–1216.
Namy, L. L., Nygaard, L. C., & Sauerteig, D. (2002). Gender differences in vocal accommodation: The
role of perception. Journal of Language and Social Psychology. 21(4), 422–432.
Pardo, J. S. (2006). On phonetic convergence during conversational interaction. Journal of the
Acoustical Society of America, 119(4), 2382–2393.
Recasens, D., Fontdevila, J., & Pallarès, M. D. (1995). Velarization degree and coarticulatory
resistance for /l/ in Catalan and German. Journal of Phonetics, 23, 37-52.
Tremblay, S., Shiller, D. M., & Ostry, D. J. (2003). Somatosensory basis of speech production.
Nature, 423(19), 866-869.
Alexandre Hennequin, Amélie Rochet-Capellan & Marion Dohen
Auditory-visual perception of VCVs produced by people with Down
Syndrome
Univ. Grenoble Alpes, CNRS, GIPSA-lab, Grenoble
Introduction
Down syndrome (DS) is the most frequent genetic disorder in humans. Among other
things, it is the cause of an intellectual deficiency and of speech production difficulties.
Speech difficulties in people with DS originate from anatomical and physiological
specificities as well as motor impairments and appear in early childhood (see Kent &
Vorperian, 2013 for a review). To our knowledge no study has explored auditory-visual
perception of speech produced by adults with DS, whereas it is well known that speech
perception benefits from the addition of vision, especially in disturbed conditions (for
example in noise; Grant & Seitz, 1998). Our study aims at exploring if and how vision can
improve the phonetic intelligibility of people with DS when perceived by naive listeners.
Materials and methods
AV stimuli
The stimuli belong to a larger corpus (Rochet-Capellan & Dohen, 2015). Four control
speakers (CS; 2 females, 2 males) and four speakers with DS (DS: 2f, 2m) were selected
from this corpus; all were native speakers of French. Speech was recorded in a sound
proof room with a head mounted microphone. The participants’ task was to repeat Vowel-
Consonant-Vowel (VCV) sequences played on a loudspeaker. A total of 16 stimuli were
selected for each speaker, with the vowel /a/ and 16 different consonants. The mean
intensity of all audio files was normalized to 70dB. A 74dB (-4 SNR) cocktail party noise
(Zeiliger, Serignat, Autessere, & Meunier, 1994) was added to all audio files. Adding noise
allows comparisons with CS speakers (avoiding a ceiling effect) and quantification of the
visual contribution depending on the consonant.
Perception experiment
Forty-eight
native speakers of French participated in the
perceptual study. In order to limit
the
duration of the session, participants were
divided into two groups. Group 1 performed
the experiment with two speakers with DS and 2 control speakers and Group 2 with the
4 others speakers. The experiment was divided into 3 blocs each corresponding to
one modality: Auditory only (A), Visual
only (V), Auditory-Visual (AV). The order of the
blocs was randomized across participants. Each bloc consisted of 64 trials (16 VCVs * 4).
The order of the stimuli was randomized within blocks and across participants.
Participants were told that they would hear or see, or hear and see a stimulus and then
asked to say what they had perceived.
Analyses
All the responses were manually transcribed and considered correct if they
corresponded to the intended VCV. A logistic regression analysis was performed using
glmer (from the lme4 package; Bates, Mächler, Bolker, & Walker, 2015) on the binary
variable: correct response vs. incorrect response. The model included Modality and
Speaker Group as fixed effects, and Participant and Stimulus as random effects.
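For readers who do not use R, a minimal Python analogue of the fixed-effects part of this analysis is sketched below. It deliberately omits the Participant and Stimulus random effects that glmer adds, and the file and column names are hypothetical.

    import pandas as pd
    import statsmodels.formula.api as smf

    # hypothetical trial-level table: correct (0/1), modality (A/V/AV), group (CS/DS)
    df = pd.read_csv("responses.csv")
    model = smf.logit("correct ~ C(modality) * C(group)", data=df).fit()
    print(model.summary())   # effects of Modality, Group and their interaction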
Results
Figure 1 shows that the perception of VCVs is significantly more difficult when produced
by the DS group (p<.001). This is, however, not true for the V modality. AV perception is
better than A for both groups. The CS group is, however, significantly more intelligible
than the DS group in the AV modality (p<.001). There is a significant interaction between
group and modality: the difference between groups in AV is significantly smaller than the
difference in the A modality (p<.01).
The results suggest that the visual information provided by speakers with DS is relatively
good: it helps in perception, partly compensating for the differences between groups.
Further analyses of the errors made will be discussed, especially errors on the
consonant, which correspond to 80% of the errors. We will also introduce the work we
started to do on the effect of speaker familiarization on perception.
Figure 1: Mean percentages of correct responses over all participants for the 3 modalities
and the 2 groups of speakers. Error bars represent the 95% confidence intervals.
Acknowledgements
This research is part of the “Communiquons Ensemble” research project funded by the
FIRAH. The authors thank the Association for Research and Social Integration of People
with DS (ARIST). The research leading to these results has received funding from the
European Research Council under the European Community’s Seventh Framework
Programme (FP7/2007-2013 Grant Agreement no. 339152 “Speech Unit(e)s”).
References
Bates, D., Mächler, M., Bolker, B. M., & Walker, S. C. (2015). Fitting linear mixed-effects
models using lme4. Journal of Statistical Software, 67(1), 1–48.
Grant, K. W., & Seitz, P. F. (1998). The use of visual speech cues (speechreading) for
directing auditory attention: Reducing temporal and spectral uncertainty in auditory
detection of spoken sentences. Proceedings of the 16th International Congress on
Acoustics and the 135th Meeting of the Acoustical Society of America, 108(3 Pt 1), 2022.
Kent, R. D., & Vorperian, H. K. (2013). Speech impairment in Down syndrome: A review.
Journal of Speech, Language and Hearing Research, 56(1), 178–210.
Rochet-Capellan, A., & Dohen, M. (2015). Acoustic characterisation of vowel production
by young adults with Down syndrome. ICPhS.
Zeiliger, J., Serignat, J., Autessere, D., & Meunier, C. (1994). BD_bruit, une base de
données de parole de locuteurs soumis à du bruit. Actes des XXèmes JEP, 287–290.
Helena Levy, Lars Konieczny & Adriana Hanulíková
Long-term effects of accent exposure on perception and production
Albert-Ludwigs-University, Freiburg
Many children in Germany are growing up in a language environment that is highly
diverse in terms of accents. In addition to a rich dialect-scape with a variety of regional
accents, children are increasingly exposed to different languages as well as
foreign-accented German. We still know little about how school-aged children perceive and
produce speech as a consequence of frequent exposure to regional or foreign accents
(Cristia et al. 2012). Children’s cognitive and linguistic skills are still developing during
the school years, and their ability to retrieve meaning from accented speech is lower
than that of adults (Bent & Atagi, 2015). While there are studies on the comprehension of
regional (Nathan et al., 1998) and foreign accents (Bent, 2014) for this age-group, less
attention has been paid to the influence of experience with accented speech on
accent comprehension. This project is concerned especially with the effects of type of
accent (regional or foreign) and amount of accent experience on children’s
comprehension of unfamiliar accents. Also, we are interested in how children with
different amounts of experience with regional and foreign accents differ in their
production of German words.
Perception study:
We know from previous studies that both adults (Bradlow & Bent, 2008) and children
(Best
et al. 2009; Schmale et al. 2010; Schmale & Seidl, 2009) have more difficulties
recognizing words in accented speech than in unaccented speech. The impact of these
perceptual
difficulties is lowered by experience with a particular accent (Sumner &
Samuel 2009;
Floccia et al. 2009). Even when listening to previously unknown
accents, experience with regional or foreign accents seems to be a facilitating factor
(Baese-Berk 2013, Bent &
Bradlow 2003). Frequent exposure to general accent
properties might provide listeners with a
benefit for the comprehension of accented
word forms. The assumption behind this is that
experience with more than one
pronunciation variant might facilitate the mapping of an accented word form onto the
correct mental representation (Sumner & Samuel 2009). Most of the previous studies
that tested the influence of experience on the perception of accented speech,
however, operationalized experience as a binary variable.
The aim of the present study was to use amount of experience as a continuous variable
and to test children’s comprehension of sentences in unfamiliar foreign and regional
accents. Experience with accents was measured via parental questionnaire and the
main criterion was the number of hours per week each child spends with Standard
German, languages other than German, regionally accented German and foreign
accented German. We asked 65 German primary school children (mean age 9 years,
10 months) to repeat sentences spoken by three
different speakers: one who spoke
standard German, one with a foreign accent (Korean accented German) and one with
a regional accent in German (Palatinate German). All of the
children had experience
with regional and foreign accents but the amount of accent exposure
to both kinds of
accents varied considerably. None of the children had any experience with Palatinate or
Korean accented German. Half of the children were monolingual, the other half were
bilingual. Results showed that experience with regional accents helped repeating
sentences correctly in the regional accent condition. Also, bilingual children performed
significantly worse than monolingual children across conditions. However, those
bilingual
children who had more accent experience still performed better than those
with less accent experience. The results suggest that type and amount of experience
co-determine processing ease of accented speech.
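As a minimal sketch of treating experience as a continuous predictor, one could relate the weekly exposure hours from the questionnaire to repetition accuracy as below; the file and column names are illustrative, and the full analysis would of course also control for age and bilingualism.

    import pandas as pd
    from scipy.stats import pearsonr

    df = pd.read_csv("children.csv")   # hypothetical: one row per child
    # hours_regional: weekly hours of regionally accented German (questionnaire)
    # acc_regional:   proportion of correctly repeated sentences, regional condition
    r, p = pearsonr(df["hours_regional"], df["acc_regional"])
    print(f"r = {r:.2f}, p = {p:.3f}")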
Production study:
The same subjects from the perception study were also tested on their production of
eight
different German vowels in spontaneously produced words. Firstly, we are
interested in how children’s pronunciation differs acoustically according to the type
and amount of accent
experience each child has. In a study by Darcy & Krüger (2012),
it was shown that bilingual
Turkish and German speaking children and monolingual
German children (aged 10-11) produced these vowels with very little difference. Formant
measurements, however, revealed that there was greater variability in the productions
of the bilingual children. Also, the
bilingual children produced some of the vowels
([a], [a:] and [e:]) differently from
monolingual children. With the present study, we
aimed to replicate this for children with more and less experience with accented
speech. Secondly, we examined how children who have a regional or foreign accent
themselves produce frequent and infrequent German words, hypothesizing that accent
features will be more apparent in productions of frequent words. We would like to
present the findings of this second experiment, which we are currently analyzing, at
the Winter-school and discuss them in relation with the results from our perception study.
References:
Baese-Berk, M. M., Bradlow, A. R., & Wright, B. A. (2013). Accent-independent
adaptation to foreign accented speech. JASA 133(3), EL174–EL180.
Bent, T. & Bradlow, A. R. (2003). The interlanguage speech intelligibility benefit. JASA 114
(3), 1600-1610.
Bent, T. (2014). Children’s perception of foreign-accented words. Journal of Child Language
41 (6), 1 – 22.
Bent, T. & Atagi, E. (2015). Children's perception of nonnative-accented sentences in
noise and quiet. JASA 138, 3985.
Best, C. T. et al. (2009). Development of phonological constancy: Toddlers’ perception of
native- and Jamaican-
accented words. Psychological Science 20, 539–542.
Bradlow, A. & Bent, T. (2008). Perceptual adaptation to non-native speech. Cognition 106 (2),
707-729.
Cristia, A., Seidl, A., Vaughn, C., Schmale, R., Bradlow, A., & Floccia, C. (2012).
Linguistic processing of accented speech across the life span. Frontiers in
Psychology 3: 479.
Darcy I., Krüger F. (2012). Vowel perception and production in Turkish children acquiring L2
German. Journal
of Phonetics 40, 568–581.
Floccia, C., Butler, J., Goslin, J., & Ellis, L. (2009). Regional and foreign accent processing
in English: Can
listeners adapt? Journal of Psycholinguistic Research 38(4), 379–412.
Nathan, L., Wells, B., & Donlan, C. (1998). Children’s comprehension of unfamiliar
regional accents: a
preliminary investigation. Journal of Child Language 25, 343–365.
Schmale, R., Cristia, A., Seidl, A., & Johnson, E.K. (2010). Developmental changes in
infants' ability to cope
with dialect variation in word recognition. Infancy 15, 650-662.
Schmale, R., and Seidl, A. (2009). Accommodating Variability in Voice and Foreign Accent:
Flexibility of Early
Word Representations. Developmental Science 12, 583-601.
Sumner, M., and Samuel, A. (2009). The effect of experience on the perception and
representation of dialect
variants. Journal of Memory and Language 60, 487–501.
Monika Lindauer
Factors influencing early acquisition of German by bilinguals
University of Konstanz
Bilingual children show the same acquisition steps as monolinguals, though there may
be
delaying or accelerating cross-linguistic influences (e.g., Meisel & Müller 1992, Döpke
1998,
Müller 1990, 2000, Eichler et al. 2013, Kupisch 2006, Müller et al. 2002,
Haberzettl 2005,
Lleó 2006, among others). Furthermore, age of onset of acquisition
and language exposure
play a crucial role (Meisel 2009, Unsworth 2013). Recent
studies also suggest that the
acquisition process differs for each grammatical
phenomenon. For instance, German word stress is acquired earlier than verb position
and verb-object order is acquired earlier and faster than the definite article. The
acquisition of German verb order starts at the two-word stage with the canonical V-final
position followed by raising of the finite verb to the second position at the age of 3-4
years (Meisel 1992, Tracy & Thoma 2009). The definite article is acquired at around 3-
5 years with the feminine form often being overgeneralized (Bittner
2006, Dieser 2009,
Mills 1986, Ruberg 2013, Szagun, Stumper, Sondag & Franik 2007). For the acquisition
of German word stress, children start to learn to reduce unstressed syllables with their
first lexical words, after a period where they produce each syllable with more or less the
same acoustic prominence (Lintfert 2010).
The current research project considers these phenomena and influence factors together
by
analysing productions of bilingual and monolingual children between 3;6 and 8 years.
In a sentence completion test (Hulk & Cornips 2006, Zuckerman 2001), we collected
data on verb order and the gender of the definite article. In a picture-naming task (Ruberg
2013), the
children had to produce target words and we analysed their use of the
definite article and their word stress productions. The test stimuli containing child-
appropriate lexical material were chosen from the databases CHILDES and WebCelex
as well as from literature and test
material about lexical acquisition (Bockmann &
Kiese-Himmel 2006, Laaha, Ravid, Korecky- Kröll, Laaha & Dressler 2006, Quay 1995,
Taeschner 1983, Vollmann, Sedlak, Müller & Vassilakou 1997, Volterra & Taeschner
1978). The sentence completion test contained 48
test sentences, distributed among 4
lists. The picture-naming task contained 18 target words distributed between two lists.
All lists had filler items and all test items and lists were
balanced according to
several morphological, prosodic and combinatory aspects. After trying
out and evaluating
different methodological procedures of the experiments, we chose the
most appropriate
one for the sentence completion test and the picture-naming task. We
tested 14 children in each of the chosen procedures, in 3 age groups (3;6-4;11 | 5;0-6;11 | 7;0-
8;10) and 2 language groups (bilingual vs. monolingual).
The pilot results reveal tendencies of influence factors on language competence. As for
age,
older children tend to perform better with regard to verb position and the definite
article. Verb position is more problematic in V2 than in V-final target clauses. This might
be due to the more complex operation of verb raising. The definite article is mostly
correct in feminine
gender, whereas children have more problems with masculine and
neuter gender. This seems to confirm previous research about the feminine article
being the first one in the
acquisition process. There is also a tendency for cross-
language influence. Bilingual children make slightly more errors with verb order and the
definite article than monolinguals. Especially with regard to word stress production,
we find that children with Italian and Turkish as heritage language reduce German
unstressed syllables to a lesser degree than German monolingual children do.
Influence from the syllable timed heritage languages might be a possible explanation. We
thus see cross-linguistic influence especially in prosody. For the acquisition of verb order
and the gender of the definite article, our data indicate the same acquisition steps
among
bilinguals and monolinguals.
References
Bittner, Dagmar. 2006. Case before Gender in the Acquisition of German. Folia Linguistica:
Acta Societatis Linguisticae Europaeae 40(1-2). 115-134.
Bockmann, Ann-Katrin & Christiane Kiese-Himmel. 2006. ELAN Eltern Antworten.
Elternfragebogen zur Wortschatzentwicklung im frühen Kindesalter. Göttingen: BELTZ,
Hogrefe.
Dieser, Elena. 2009. Genuserwerb im Russischen und Deutschen. Korpusgestützte Studie zu
ein- und zweisprachigen Kindern und Erwachsenen. München, Berlin: Otto Sagner.
Hulk, Aafke & Leonie Cornips. 2006. Between 2L1- and Child L2 Acquisition: An Experimental
Study of Bilingual Dutch. In Conxita Lleó (ed.), Interfaces in Multilingualism:
Acquisition and Representation, 115-137. Amsterdam: Benjamins.
Laaha, Sabine, Dorit Ravid, Katharina Korecky-Kröll, Gregor Laaha & Wolfgang U. Dressler.
2006. Early Noun Plurals in German: Regularity, Productivity, or Default? Journal of
Child Language 33(2). 271-302.
Lintfert, Britta. 2010. Phonetic and phonological development of stress in German. Dissertation,
Universität Stuttgart.
Meisel, Jürgen M. 1992. The acquisition of verb placement: functional categories and V2
phenomena in language acquisition. Dordrecht [u.a.]: Kluwer.
Mills, Anne E. 1986. The acquisition of gender: a study of English and German. Berlin [u.a.]:
Springer.
Quay, Suzanne. 1995. The bilingual lexicon: implications for studies of language choice.
Journal of Child Language 22(02). 369-387.
Ruberg, Tobias. 2013. Der Genuserwerb ein- und mehrsprachiger Kinder. Hamburg: Kovac.
Szagun, Gisela, Barbara Stumper, Nina Sondag & Melanie Franik. 2007. The
Acquisition of Gender Marking by Young German-Speaking Children: Evidence for
Learning Guided by Phonological Regularities. Journal of Child Language 34(3). 445-471.
Taeschner, Traute. 1983. The sun is feminine: a study on language acquisition in bilingual
children. Berlin [u.a.]: Springer.
Tracy, Rosemarie & Dieter Thoma. 2009. Convergence on Finite V2 Clauses in L1, Bilingual L1
and Early L2 Acquisition. In Christine Dimroth & Peter Jordens (eds.), Functional
Categories in Learner Language, 1-43. Berlin: Mouton de Gruyter.
Vollmann, Ralf, Maria Sedlak, Brigitta Müller & Maria Vassilakou. 1997. Early Verb Inflection and
Noun Plural Formation in Four Austrian Children: The Demarcation of Phases and
Interindividual Variation. Papers and Studies in Contrastive Linguistics 33. 59-78.
Volterra, Virginia & Traute Taeschner. 1978. The Acquisition and Development of Language
by
Bilingual Children. Journal of Child Language 5. 311-326.
Zuckerman, Shalom. 2001. The Acquisition of “Optional” Movement. Enschede: Print
Partners Ipskamp.
Louise McKeever
Speech motor control in autism
University of Strathclyde, Glasgow
Background
Fine motor control is frequently impaired in children with autism; however, speech motor
control has been found to be unimpaired in some studies using perceptual methods. This
is despite the need for intricate movement of the tongue required for accurate speech.
However, a small number of studies found that residual and non-developmental speech
errors are significantly more frequent in samples of children with autism (33-40%) than in
the normal adult population (1-2%; Shriberg et al., 2001; Cleland et al., 2010). Conflicting
evidence may be due to unreliable perceptual analysis that relies on the auditory skills of
the assessor. The cause of these speech errors is still under debate.
Aim of the study
Our research will investigate speech errors in autism using Ultrasound Tongue Imaging
(UTI). It will be used to identify any inaccurate or uncoordinated movements of the tongue
which could indicate a motor impairment. We aim to determine whether errors in fine motor
control are echoed in errors of speech. Using UTI avoids the greater variability and
inaccuracy of perceptual assessments. Additionally, we will compare UTI and
perceptual assessments to determine whether there are speech errors missed in clinic
through use of perceptual assessment only.
Methods
We will compare UTI data with standardized speech assessments. UTI is used in the
imaging of speech as it allows investigation of tongue movement. By placing a standard
medical ultrasound probe under the chin, most of the surface of the tongue in a midsagittal
view is imaged. UTI has been used in the field for decades; however, until recently it was
hard to obtain useful data from it. Now ultrasound is portable, provides fast frame rates and
can synchronize ultrasound images with audio. This allows analysis of tongue movement
that can be compared across participants. Fine motor control will be assessed using
standardized assessment and specific fine motor measurements. All assessments will
analyse the coordination and accuracy of movements of fingers and speech muscles.
Conclusions/importance of work
It is important to investigate the relationship between speech motor control and fine motor
control as it can ultimately change the treatment provided by speech and language
therapists (SLTs). If speech errors in autism are a result of a motor control difficulty then
traditional speech therapy is less likely to be successful. Therapy needs to specifically
target speech motor planning. If perceptual assessments are not sensitive enough to
identify speech errors in autism, UTI may be an effective instrument to improve diagnostic
accuracy to inform practice.
Fereshteh Modaressi
Lexical access: The interaction of phonological and semantic features
ZAS Berlin
One of the major research directions in psycholinguistics is to explore the processes we
use to access words in our mental lexicon. The goal of the current study is to investigate
the effect of phonological priming in sentences and how it interacts with other contextual
information in sentences and affects lexical access, using a behavioral reaction time (RT)
paradigm (Experiment 1) and event-related potential (ERP) measures (Experiment 2).
An attempt is then made to explain the results of the research within existing models of
lexical access, in support of interactive models of speech perception.
The experimental design is inspired by a Persian language game, in which the
presence of a non-word prime (presented before the question) interferes with
access to a semantically appropriate word (in the answer). In this game, the
participant is asked to respond as fast as possible to the following question:
a) Pangal Pangal Ash-o bachi mikhori?
Pangal Pangal soup-obj with what eat
“Pangal Pangal what do you eat the soup with?”
The correct response would be ghashogh ‘spoon’. Due to the presence of the non-word
prime ‘Pangal’, it is expected that a number of participants will respond with changal ‘fork’,
because they are prompted by the memory of the rhyme. Participants who do reply with
the correct answer, ghashogh ‘spoon’, are expected to show a longer reaction time: their
response is influenced by the presence of the rhyme. In other words, an incorrect yet
expected response is changal, which is semantically related to eat and phonologically
related to pangal. In both experiments the stimuli consisted of questions and one-word
answers (targets). The target item either rhymed or not with the non-word prime preceding
the question, and was semantically related or not to the highly expected correct response.
The highly expected correct responses that either rhymed or not with the non-word prime
were also presented. Participants listened to the stimuli and their task was to decide
whether the answer to the question was correct or incorrect.
Behavioral results suggest a speed-accuracy trade-off and demonstrate that the
interaction of rhyme and semantics affects lexical access, while preliminary ERP results
suggest that the N400 is in fact affected by sentential phonological congruency. This study
also supports a bi-directional flow of information between different stages, as proposed by
interactive models of lexical access.
Alexandra Kati Müller
The lexicon in German dialects
University of Leipzig
In my studies of German dialects, and especially dialect syntax, one important aspect is
language change. Schmidt & Herrgen (2011) discuss this, coining the term
Sprachdynamik. They describe in fictitious terms the language acquisition of two children
and the problems reported to occur when the system of dialect collides with the system of
literary language. As a result, a child socialised in dialect has to learn an entirely new
system, or rather has to completely reorganise its stabilised existing system of dialect,
while passing through literary language acquisition. In other words, for this child, writing
the standard variety means writing in another language system, and trying to speak close
to the standard variety means knowing what the written word looks like and “reading” it
aloud (Schmidt & Herrgen 2011:38-48). This raises the question of which linguistic units
are stored in the lexicon when, for instance, the standard expresses ‘die Füße‘ as
/di: ˈfy:sə/, while a person speaking dialect would use the term /də fɪs/ or /də bɔdn̩/. The
choice of one or the other term depends on how adequately it fits the situation, on
individual competence, and on the will to perform in a socially expected manner. In the
perspective of Eckert (2012), this also means that patterns of variation “are part of the
active––stylistic––production of social differentiation” (98). There are therefore always
reasons for the choice of exactly that one term, register or variety (an approach linking
social meanings and chosen varieties has been given by Silverstein and Agha with key
concepts such as indexicality and enregisterment; Johnstone 2014: 1f.). And through each
act of performance one gets a glimpse of the individual competence of the speaker in
using adequate and expected words and registers for the current social setting.
A special point of interest is that a speaker can intentionally choose words or registers that
differ from the expected choice; this, however, is usually tolerated as individual style. A
dialect-speaking person in the area around Hohenstein-Ernstthal in Saxony applies the
preterite marking twice on selected irregular verbs, using the irregular form via vowel
gradation (= Ablaut) plus the dental suffix of the regular past tense, e.g. kamen > kamten
or gingen > gingten. In the system of that dialect this is quite correct, while the form
fliegten would be incorrect. In language acquisition, this is known as overgeneralisation,
and it illustrates how important it is to analyse errors and their genesis. Spillner (1990)
was one of the first to suggest a systematisation of error analysis for forensic purposes,
because most work on error analysis had focused on first or second language acquisition
(even so, specialised methodology and didactics for German as a second language still
point to remaining desiderata in error analysis and psycholinguistics regarding transfer of
knowledge (Transferleistungen); Tönshoff 2015: 104f.). This potential has even been
recognized by German law enforcement, who have assembled a corpus of blackmail
letters (KISTE: Kriminaltechnisches Informations-System Texte), which were investigated
with methods of error analysis in order to help analyse new blackmail letters by comparing
them to already-known ones. Furthermore, forensic linguistics can help identify which
errors were simulated in order to conceal the identity of the writer (Bülow 2016:208f.,
Schall 2007). Last but not least, error analysis can also assist research into
aphasia-induced relearning; therapy with music or art to improve the relearning process
has shown promising results (Baumann 2011: online). As Deutscher puts it, our brain can
be imagined as a kind of huge office building, with us scientists merely permitted to stand
in front of it, watching the lights being switched on and off in all those rooms. We can only
guess what the rooms are used for and what happens inside. In short: we actually have no
idea what exactly is going on; we can only observe where and when the lights are turned
on (2012:270f.).
For all these reasons I am looking forward to the winter school, hoping that I can infer,
from the information we are going to exchange there, the possible meanings of the lighted
rooms I observe.
References
Baumann, M. (2011): „Musiktherapie in der Behandlung von Aphasie“. Homepage of Deutsche
Musiktherapeutische Gesellschaft. Available at:
http://www.musiktherapie.de/musiktherapie/arbeitsfelder/neurologische-rehabilitation/aphasie.html
Bülow, L. (2016): „Textsortenkonstituierende Parameter von Erpresserschreiben. Zur performativen
Wirkung des Textsortenwissens“. In: Bülow, L., Bung, J., Harnisch, R. & Wernsmann, R. (Eds.):
Performativität in Sprache und Recht. Berlin/Boston: 191-226.
Deutscher, G. (2010): Im Spiegel der Sprache. München.
Eckert, P. (2012): “Three Waves of Variation Study: The Emergence of Meaning in the Study of
Sociolinguistic Variation“. In: Annual Review of Anthropology 41: 87-100.
Johnstone, B. (2014): “Enregisterment: Linguistic Form and Meaning in Time and Space“. In: Busse,
B. & Warnke, I.H. (eds.): Sprache im urbanen Raum/Language in Urban Space. Berlin: 33 pages.
(=Handbücher Sprachwissenschaft 20) Available at: http://works.bepress.com/barbara_johnstone/66/
Schall, S. (2007): „Forensische Linguistik“. In: Knapp, K. et al. (eds.): Angewandte Linguistik. Ein
Lehrbuch. Tübingen/Basel: 566-584.
Schmidt, J.E., Herrgen, J. (2011): Sprachdynamik. Eine Einführung in die moderne
Regionalsprachenforschung. Berlin. (=Grundlagen der Germanistik 49)
Spillner, B. (1990): „Status und Erklärungspotential sprachlicher Fehler“. In: Kniffka, H. (ed.): Texte
zu Theorie und Praxis forensischer Linguistik. Tübingen: 97-113. (=Linguistische Arbeiten 249)
Tönshoff, W. (2015): „Deutsch als zweite oder weitere Fremdsprache – vom Lernerprofil zu einer
spezifischen Didaktik und Methodik“. In: Stork, A. & Hoffmann, S. (eds.): Lernerorientierte
Fremdsprachenforschung und -didaktik. Tübingen: 97-110. (=Giessener Beiträge zur
Fremdsprachendidaktik)
Gediminas Schüppenhauer & Katarzyna Stoltmann
Short-term memory processes of German Sign Language speakers
ZAS, Berlin
While studies on the semantic integration of co-speech gesture have recently gained a lot
of attention, the processes of accessing modality-specific information through gesture still
remain to be investigated in more detail. Focusing on effects on short-term memory, Hall
and Bavelier (2011) distinguished between three main processing stages, which they
manipulated individually in their study on hearing ASL-English (American Sign Language)
bilinguals: perception, encoding and recall. The task performed by the participants
consisted of recalling previously encountered sequences of digits. It was shown that
perception and encoding in the verbal modality generally increased the span of digits that
could be accurately remembered. Surprisingly, the opposite was true for the recall
condition: participants showed an advantage over verbal recall when they performed
recall by using their hands to represent digits. This argues against the common
assumption of a global disadvantage for sign languages in memory tasks.
To date, studies investigating number processing in sign language speakers have
generally supported this notion. For the most part, the goal of these studies was to
compare the processing of written digits (e.g. 1, 2, 3) and spelled-out numerals (e.g. one,
two, three). As a result, they did not contain stimuli in the gestural modality using the
respective sign language number system (Iversen 2008: 95). The above-mentioned study
by Hall and Bavelier (2011) is the first to investigate individual stages of short-term
memory processing in sign language speakers by using signed digits as stimuli. In the
light of the emerging paradigm of Embodied Cognition, it seems counterintuitive to neglect
the modality in which specific information is encoded. All the more so because findings in
neurology have long suggested that the processing of numbers and the processing of
visuospatial information are both located within the intraparietal sulcus (Dehaene et al.,
2003). Further investigation might confirm a functional connectivity between the two,
especially considering the work of Wilson and Emmorey (1997), according to which sign
language speakers rely on visuospatial coding during serial recall tasks.
For this reason, we would like to repeat Hall and Bavelier’s study with bilinguals fluent in
DGS (Deutsche Gebärdensprache; English: German Sign Language) and German. In
order to account for similarities in signed digits and counting gestures commonly used by
German speakers, we plan to add German and DGS monolinguals as control groups.
There is also considerable variation between sign languages with regard to how digits are
signed. The system in ASL is based on 5 fingers, whereas the system in DGS is based on
10 fingers with a sub base of 5 (Iversen 2008, Hall 2011). This difference is illustrated in
the following pictures:
In accordance with Hall and Bavelier (2011), we plan to use digits from 1 to 9 as study
material. These digits will be presented in an oral as well as a gestural condition, with no
more than two consecutive digits in a presented sequence. The fact that DGS speakers
use both of their hands for signing digits may lead to additional effects that could not be
observed in ASL. Since there may be general memory advantages for stimuli associated
with manual gestures (Hall and Bavelier 2011: 62), we hope to account for this by
including the aforementioned monolingual control groups. In their study, Hall and Bavelier
hypothesize that improved recall in the gestural modality might be a result of “retain[ing]
speech-based representations in memory while performing recall in a modality that did not
interfere with those representations” (Hall and Bavelier 2011: 62). Consolidating findings
for DGS could therefore provide further evidence for this
model of short-term memory and might in turn have implications for the further
development of new learning environments for disordered speakers.
Figure 1: 1a. Number system for single figure digits in DGS (Iversen 2008, 81); 1b.
Number system in ASL (Hall 2011, 64).
Tabea Thies, Anne Hermes, Doris Mücke
Atypical speech production of Essential Tremor Patients
IfL Phonetik, University of Cologne
Theoretical Background
Deep Brain Stimulation (DBS) is an effective treatment for patients with
medication-resistant Essential Tremor (ET). A frequent side effect of this treatment is that
ET patients report deleterious effects on their speech under stimulation. Acoustic studies
of fast syllable repetition tasks such as /tatata/, /kakaka/ or /papapa/ in patients with
multiple sclerosis (Pützer et al. 2007) and essential tremor (Mücke et al. 2014) showed a
deterioration of speech in the acoustic domain. They found frication during intended
voiceless closures in /ta/, /ka/ and /pa/ as well as an increase in voicing during the entire
syllable cycle, both indicating a deterioration of speech motor control of the oral and glottal
system under stimulation. In a preliminary analysis, Mücke and Hermes (2016) showed
that DBS affects syllable production in German ET patients. They investigated coronal
consonants in CV and CCV syllables, such as /li/ vs. /kli/, using Electromagnetic
Articulography, and found stimulation-induced timing deficits for ET patients treated with
VIM-DBS. However, it is unclear whether the timing deficits result from synergies of the
lingual system, i.e. the tongue tip and tongue body. The present study therefore focusses
on syllable coordination of the labial and lingual system. We analyse articulographic data
by testing coordination patterns in syllables of different complexity, CV and CCV, within
the framework of Articulatory Phonology, drawing on the coupling hypothesis of syllable
structure (Browman & Goldstein 2000, Nam & Saltzman 2003). These different
complexities place different demands on the speech motor system. More specifically, this
study investigates the timing relation of competitive and non-competitive coupling
structures in /lima/, /pina/ and /plina/. In simple syllable onsets, CV, it is assumed that C
and V are timed in-phase, i.e. they are initiated at the same time (non-competitive coupling
structure). In complex syllable onsets, CCV, it is assumed that both Cs are timed in
anti-phase with each other and at the same time in-phase with V (competitive coupling
structure). This underlying coupling mode entails a leftward and a rightward shift of the
two Cs relative to the following V. Thus, these shifts are used as a measure for
phonological syllable diagnosis.
Method
The speech material is designed to test gestural coordination in simple (CV, e.g. /lima/
and /pina/) and complex syllable onsets (CCV, e.g. /plina/). This study deals with the
analysis of 5 ET patients with bilaterally implanted VIM-DBS. Patients were recorded in
the articulograph with stimulation on (DBS-on) and off (DBS-off). The articulatory data
were recorded with a 3D Electromagnetic Articulograph (AG 501). We measured the
target-to-target distance from the target of the consonantal gesture(s) to the target of the
vocalic gesture, referred to as leftward and rightward shift. We compared the latencies,
and thus the direction of the shift, from CV to CCV to test whether the consonants are
adjusted to make room for one another. Based on the coupling hypothesis of syllable
structure, it is assumed that the plosive should shift to the left comparing /pina/ vs. /plina/;
the latencies between the gestural targets should therefore increase, leading to a leftward
shift. Further, the lateral should shift to the right from /lima/ to /plina/, with decreasing
latencies, referred to as a rightward shift.
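As a minimal illustration of this latency comparison (a sketch under assumed data
structures; not the authors' analysis code), the shift direction follows from whether the
target-to-target latency grows or shrinks from CV to CCV:

# Minimal sketch of the shift measure described above; the function names,
# toy time stamps and threshold-free comparison are illustrative assumptions.

def target_latency(c_target_time, v_target_time):
    """Target-to-target distance between a consonantal and a vocalic gesture (s)."""
    return v_target_time - c_target_time

def shift_direction(latency_cv, latency_ccv):
    """Compare latencies in CV vs. CCV items.

    A larger latency in CCV means the consonant moved away from the vowel
    (leftward shift); a smaller latency means it moved towards it (rightward shift).
    """
    delta = latency_ccv - latency_cv
    return "leftward" if delta > 0 else "rightward"

# Toy example: /pina/ vs. /plina/ for the plosive /p/ (times in seconds).
lat_cv = target_latency(c_target_time=0.10, v_target_time=0.22)   # /pina/
lat_ccv = target_latency(c_target_time=0.08, v_target_time=0.25)  # /plina/
print(shift_direction(lat_cv, lat_ccv))  # expected: "leftward"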
Results
Figure 1 represents the results for the timing of the consonants comparing CV to CCV for
ET patients in the DBS-off and DBS-on condition. All patients reveal a shift of the leftward
C to the left from the CV to the CCV condition. However, only 1 out of 5 speakers
produced the shift of the rightmost C to the right (S3). Under stimulation the performance
of the patients gets worse because latencies increase. As presented, the rightmost C in
the cluster does not shift at all to the right comparing /lima/ with /plina/; it shifts further
away from the V, indicating a mistiming in the articulatory domain.
Figure 1: Shift distances and directions of the shifts from simple CV to complex CCV
condition, comparing DBS-off vs. DBS-on.
Conclusions
We conclude that ET patients with VIM-DBS have difficulties in producing syllable
coordination patterns requiring the labial and lingual system. The patients show
coordination deficits. Furthermore, under stimulation these deficits get even worse. They
are not able to produce patterns requiring competitive coupling structures in complex
syllable onsets. Therefore, no rightward shift was found when a consonant is added to the
syllable: the consonant shifts to the left, away from the vowel, leading to a mistiming of the
vocalic constituent.
References
Pützer, M., Barry, W. J. & Moringlane, J. R. (2007). Effect of deep brain stimulation on
different speech subsystems in patients with multiple sclerosis. Journal of Voice, 21(6):
741-753.
Mücke, D., Becker, J., Barbe, M. T., Roettger, T. B., Meister, I., Liebhart, L., Timmermann,
L., & Grice, M. (2014). The effect of Deep Brain Stimulation on the speech motor system
in Essential Tremor patients. JSLHR, 57(4), 1206-1218.
Mücke, D. & Hermes, A. (2016). Wie Tiefe Hirnstimulation die Sprechmotorik beeinflusst
[How deep brain stimulation affects speech motor control]. Invited talk at the colloquium
"Korpuslinguistik und Phonetik", HU Berlin, Germany.
Nam, H. & Saltzman, E. (2003). A competitive, coupled oscillator model of syllable
structure. In: Proc. of the 15th ICPhS, 2253-2256.
Browman, C. P. & Goldstein, L. (2000). Competing constraints on intergestural
coordination and self-organization of phonological structures. Bulletin De La
Communication Parlée, 5, 25–34.
Anastasiia Tsukanova
Articulatory speech synthesis
University of Lorraine
Most state-of-the-art techniques for text-to-speech synthesis (TTS), even though they
produce results of very good quality, are purely technical solutions which provide little or
no information about the acoustics of speech or about how the articulators (mandible,
tongue, lips, velum…) are controlled to produce the signal.
By contrast, the articulatory approach generates the speech signal from the vocal tract
shape and the acoustic phenomena modelled within it. The control of vocal tract
deformation comprises slow anticipation of the main constriction and fast, imperatively
accurate aiming for consonants. The baseline system is an attempt to work with the notion
of an articulatory target as the position that the speaker aims for to produce a particular
phoneme. It may be interpreted as the closest constriction position, such as the catch
position in the production of a stop, or, potentially, even a physically impossible position,
e.g., the tip of the tongue going beyond the hard palate. For this we follow the approach of
Birkholz (2013) and use a collection of ca. 100 static magnetic resonance imaging (MRI)
captures of the vocal tract shape, in which a speaker of French holds his articulators in the
position he would adopt to produce a required consonant followed by a required vowel.
The shapes are encoded by a principal-component-analysis-based (PCA-based)
articulatory model (Laprie and Busset, 2011; Laprie et al., 2014, 2015) to obtain
articulatory vectors of variable length, in our case of length 26. Since the collection does
not cover all consonant-vowel (CV) pairs, we have to expand the dataset. To do so, in the
space of the articulatory parameters, we project any vowel /V/ onto the convex hull of the
cardinal vowels (/a/, /i/, and /u/). This allows us to estimate a missing /CV/ sample from
the present ones: /Ca/, /Ci/, and /Cu/.
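As a minimal illustration of this estimation step (a sketch, not the authors' implementation;
the soft sum-to-one constraint and all names are assumptions), the projection can be
treated as a non-negative least-squares fit over the three cardinal vowels, whose weights
are then reused to combine /Ca/, /Ci/ and /Cu/:

import numpy as np
from scipy.optimize import nnls

def convex_weights(v, cardinals):
    """Non-negative weights over the cardinal vowels, pushed to sum to one."""
    scale = 1e3  # weight of the soft sum-to-one constraint row (assumption)
    A = np.vstack([cardinals.T, scale * np.ones((1, cardinals.shape[0]))])
    b = np.concatenate([v, [scale]])
    w, _ = nnls(A, b)
    return w / w.sum()

def estimate_cv(v, cardinals, cv_samples):
    """Estimate a missing /CV/ vector from /Ca/, /Ci/, /Cu/ (rows of cv_samples)
    by reusing the convex weights of the vowel V over /a/, /i/, /u/."""
    return cv_samples.T @ convex_weights(v, cardinals)

# Toy usage with random 26-dimensional articulatory vectors.
rng = np.random.default_rng(0)
cardinals = rng.normal(size=(3, 26))   # rows: /a/, /i/, /u/
cv_known = rng.normal(size=(3, 26))    # rows: /Ca/, /Ci/, /Cu/
v = 0.5 * cardinals[0] + 0.3 * cardinals[1] + 0.2 * cardinals[2]
cv_missing = estimate_cv(v, cardinals, cv_known)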
This expanded dataset is then used in a rule-based coarticulation model, which allows for
anticipation of an upcoming vowel even across several consonants, within limits on timing,
the number of phonemes, and the spatial arrangement of the articulators.
The baseline model is evaluated both on the animated graphics representing the evolution
of the vocal tract shape (how natural and efficient the movement is) and on the
synthesised speech signals, which are compared, perceptually and in terms of formants,
to identical utterances produced by a human speaker.
Our results show that many effects of the dynamic process of speech can be reproduced
by manipulating solely static data. We discuss the generation of pure V, VV and VCV
transitions and the articulators’ behaviour in phrases, report which acoustic properties
have been rendered correctly and what could be the reasons for the system failing to
produce the desired result in other cases, and consider how to reduce the after-effects of
target-oriented moves in order to obtain a more gesture-like motion.
This baseline system serves as a starting point for working with dynamic articulatory data:
electromagnetic articulography in the near future, real-time MRI later. Considering the
increased availability of larger amounts of data and the recent advances in applying deep
learning to TTS (such as Oord et al., 2016), we see an opportunity to capture the
complicated phenomena occurring in speech production with machine learning techniques
and, hopefully, improve the current results in articulatory speech synthesis.
References
Birkholz, P. (2013). Modeling consonant-vowel coarticulation for articulatory speech synthesis.
PloS one, 8(4), e60603.
Laprie, Y. and Busset, J. (2011). Construction and evaluation of an articulatory model of the vocal
tract. In 19th European Signal Processing Conference - EUSIPCO-2011. Barcelona, Spain.
Laprie, Y., Elie, B., and Tsukanova, A. (2015). 2D articulatory velum modeling applied to copy
synthesis of sentences containing nasal phonemes. In International Congress of Phonetic
Sciences.
Laprie, Y., Vaxelaire, B., and Cadot, M. (2014). Geometric articulatory model adapted to the
production of consonants. In 10th International Seminar on Speech Production (ISSP). Köln,
Germany. URL http://hal.inria.fr/hal-01002125.
Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N.,
Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv
preprint arXiv:1609.03499.
E.C. van Knijff
Listening abilities and school success for children with a cochlear
implant in mainstream primary education
University of Amsterdam
Twenty years ago, most children in the Netherlands faced with deafness or a severe to
profound hearing impairment were enrolled in special education, a context in which
communication was mainly based on sign language. However, thanks to advances in
hearing rehabilitation over the last few years, an increasing number of deaf or
hard-of-hearing children with cochlear implants are enrolling in mainstream primary
education. The recently implemented legislation regarding ‘Tailored Education’ in the
Netherlands will only further increase this number. However, little is known about the
exact challenges Dutch children with cochlear implants face in day-to-day mainstream
classroom scenarios, and information on their speech perception performance in realistic
educational settings is sparse. The current PhD project therefore aims to study acoustic
and linguistic factors potentially contributing to the academic achievement of these
children.
In the overall project, the academic achievements of the children with cochlear implants in
mainstream classrooms will be related to parent and teacher reports and standardized test
results on hearing and listening abilities. This information will then be complemented by
data collected in later studies on classroom acoustics (noise levels, reverberation time)
and its effect on speech transmission on the one hand, and on the role of syntactic
complexity in the understanding of sentences in noise on the other. Furthermore, the final
stages of the project involve the evaluation of current testing and training materials in
terms of effectiveness and ecological validity. The first study in the project thus aims to
gain further insight into the specific relationship between the school success and the
hearing and listening abilities of children with cochlear implants. The target population will
include at least 10 early cochlear-implanted children around six or seven years old
entering the formal education settings of Dutch third grade. Each of these participants will
be matched to 3 hearing classmates of equivalent chronological (and, if possible,
language) age to form a 30-participant control group.
Classroom performance in comparison to hearing peers will be analysed using
questionnaires such as the Screening Instrument for Targeting Educational Risk
(Anderson, 1989) and the Speech, Spatial, and Qualities of Hearing Scale for Teachers
(Galvin & Noble, 2013), which will give valuable insight into the specific performance
issues of children with implants in different school settings. Additionally, the Listening
Inventories for Education (Anderson & Smaldino, 1998) will give insight into the children’s
personal listening experiences in these settings. With appropriate permissions, the
questionnaire data will be supplemented by performance in a variety of academic fields
from the school’s student information system, ‘het leerlingvolgsysteem’, including both
language-based skills, such as reading and writing, and, for example, mathematics. These
data will be analysed to clarify which academic areas remain at risk for this subgroup of
children with cochlear implants who perform well enough to enrol in mainstream
education, and how this relates to their listening skills.
Pauline Veenstra
Efficacy of ASR software for congenitally blind speakers
University of Groningen
Research question
Does the efficacy of automatic speech recognition (ASR) software differ between
congenitally blind and sighted speakers?
Approach
Earlier research by Ménard and colleagues [1] focused on speech production and
perception of blind French speakers and mainly looked at the acoustic and articulatory
properties of vowels.
This study concentrates on speech production of Dutch speakers and focuses on vowels,
consonants, words, sentences, stories and free (running) speech. The acoustic properties
(formant measurements) and articulatory properties (quantifiable; lip movement recorded
on video and tongue and lip movement recorded with an electromagnetic articulography
device) are analysed, and the acoustic recordings are used as input for testing the
efficacy of speech recognition software (for instance Dragon and Google speech API).
Motivation
Visual and acoustic input is important in language acquisition. Kuhl and Meltzoff [3] and
Legerstee [4] state that when babies are approximately 4 months old, they start to
associate lip movement with sound. Speakers who are blind from birth do not receive this
visual input; a recent study by Ménard [2] has shown that the articulation of Canadian
French speakers who are blind from birth (congenitally blind, CB) differs from the
articulation of non-blind (NB) speakers. The lip movements of CB speakers are less strong
than those of NB speakers, but they compensate for this by using stronger tongue
movements. However, this compensation strategy is not sufficient, as the difference
between certain vowel pairs (for instance é vs. è) is more pronounced in NB speech [2].
This suggests a clear link between visual input during language acquisition and the
resulting pronunciation.
The effect of this difference on the intelligibility of CB speakers remains an open question,
given that the speech of CB speakers is articulatorily different and seemingly acoustically
less distinctive. The effect of this reduced vowel distinction on the intelligibility of running
speech is minimal for the human ear, as we use sentence context, amongst other
information, to make sense of spoken language [5]. However, ASR systems are not yet as
evolved as the human ear and brain; their performance relies on the acoustic model. The
acoustic model is based on the relation between the acoustic speech signal and the
individual sounds, which in turn is based on the speech of NB speakers. This means that
if CB speakers indeed have a tighter formant space in their running speech, ASR systems
will have more trouble recognizing their speech.
Data type
The acoustics and articulatory movements of Dutch CB and NB speech are recorded with
a microphone, video camera and electromagnetic articulography (EMA) device. The
participants pronounce sounds (all Dutch vowels and consonants), words and non-words,
sentences, stories and running speech. The articulatory information includes lip movement
recorded with the camera and lip and tongue movement recorded with the EMA device.
This ensures that we have diverse pronunciation data with which to properly compare CB
speech with NB speech. It also ensures that we can use recordings of the type of speech
most often used for ASR purposes, namely sentences and running speech.
Method
The population studied consists of speakers who have been blind since birth, or who are
visually impaired to the degree that they have never been able to read lips. To make sure
the instructions are fully understood, the minimum age of participants is 12. The CB
participants have been contacted via several websites of importance to the visually
impaired community as well as via associations such as the Oogvereniging and audio
magazines such as 'Moet je horen'. More than 35 CB and visually impaired participants
expressed their willingness to cooperate in the first part of this study. The control group
consists of participants matched for age, sex, and accent. Each of the groups for the
speech production experiment (CB, NB) consists of 20 participants. The experiment is
conducted at the participant's home, as the equipment is portable. It consists of two parts:
the first comprises audio and video recordings of sounds (vowels and consonants), words,
non-words, sentences, stories and free running speech, read from a screen or printed
braille; this part takes approx. 90 minutes. The second part consists of attaching the
sensors to the lips and tongue of the participant and making audio and articulography
recordings of sounds (vowels and consonants), words, non-words, stories and free
running speech; this part takes approx. 60-75 minutes. The participant receives a
compensation of €20. The audio recordings from this experiment are used as input for the
ASR systems.
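The abstract does not specify how ASR efficacy will be quantified; a common choice is
word error rate (WER), sketched below under that assumption (pure Python, no external
dependencies):

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j]: edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    return dist[len(ref)][len(hyp)] / len(ref)

# Toy comparison of an ASR transcript against the prompt that was read aloud.
print(word_error_rate("de kat zit op de mat", "de kat zat op mat"))  # 0.333...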
References
[1] Ménard, L., Dupont, S., Baum, S. R., & Aubin, J. (2009). Production and perception of French
vowels by congenitally blind adults and sighted adults. The Journal of the Acoustical Society
of America, 126(3), 1406-14.
[2] Ménard, L., Toupin, C., Baum, S. R., Drouin, S., Aubin, J., & Tiede, M. (2013). Acoustic and
articulatory analysis of French vowels produced by congenitally blind adults and sighted
adults. The Journal of the Acoustical Society of America, 134(4), 2975-87.
[3] Kuhl, P. K., & Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science,
218(4577), 1138-1141.
[4] Legerstee, M. (1990). Infants use multimodal information to imitate speech sounds. Infant
Behavior and Development, 13, 343-54.
[5] McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive
Psychology, 18(1), 1-86.
Anna Womack
An MRI investigation of gestural interdependence of the tongue and
larynx during speech.
Queen Margaret University, Edinburgh
Little is known about the relationship between the tongue and larynx during speech. This
research will build on current knowledge through a comprehensive and systematic
investigation into larynx height in lingual articulations. The high degree of anatomical and
physiological interdependence within the vocal apparatus is well established (Laver 1980)
and it is clear that alterations in one area of the vocal tract may effect changes in other
areas of this interlinked system. For example, if the tongue is raised and no compensatory
muscle adjustments are made, the larynx is also likely to be pulled upwards. This means
that in a speech task which involves upward and forward movement of the tongue, it may be
harder to lower the larynx.
Segmental articulations and tongue settings will be investigated. These are differentiated
by the extent of positional adjustment, timescale and communicative function. Segmental
articulations are rapid movements that signal linguistic contrasts, and include the most
extreme tongue adjustments that occur during speech. Tongue settings are longer term
tendencies for tongue posture to be biased in a particular direction (Beck 2005, Laver
1980) and typically involve less extreme tongue adjustments. Setting underlies segmental
articulation and may reflect habitual speech patterns associated with language or accent,
or temporary paralinguistic adjustments (e.g. signalling mood).
The project will explore to what extent different speakers show similar laryngeal changes
whilst producing equivalent lingual adjustments, whether patterns of lingual movement are
linked to adjustments in specific regions of the tongue, and whether the laryngeal
consequences of the extreme tongue adjustments involved in segmental articulations can
reliably predict the laryngeal consequences of habitual tongue settings. Confirmation
would show that there is some physiological inevitability about covariation and contribute
to our understanding of motor learning by beginning to pinpoint elements of
(sub)conscious variation versus potentially inevitable coordination of movement.
The project will use imaging techniques, such as Magnetic Resonance Imaging (MRI), to
provide a visual depiction of the vocal tract for analysis. Limited research addresses the
relationship between larynx position and tongue position (Ewan and Krones 1973; Honda
1983; Tiede 1996; Hoole and Kroos 1998; Demolin et al. 2000; Engwall and Badin 2000;
Kim, Honda and Maeda 2005). Research to date suggests lingual gestures may affect
larynx height (Hoole and Kroos 1998; Story, Titze and Hoffman. 2001; Kim, Honda and
Maeda 2005). This project will build on previous research by using larger numbers of
speakers, and investigating lingual settings as well as consonant and vowel articulation.
The study will contribute to better understanding of the integrated activity of the vocal
apparatus, the covariance of lingual and laryngeal activity, and the potential impact of
lingual adjustments on laryngeal function.
Approaching the vocal tract as an interconnected whole, this research aims to contribute to
a more holistic picture of motor behaviour during speech. If larynx height is affected by
tongue position, there are theoretical reasons to develop hypotheses relating to speech
production modelling, sociophonetics, cross-linguistic research and clinical management
of voice disorder. A tendency to adjust the larynx according to tongue setting has potential
implications for learning, for example, in second language (L2) acquisition and voice
disorder. Articulatory setting is relevant for sounding ‘native-like’ for L2 speakers (Wilson
and Gick 2014), and a number of languages have laryngeal involvement in phonological
contrasts; for example, larynx height facilitating pitch alterations in Mandarin (Moisik, Lin
and Esling 2014). As such, greater understanding of tongue-larynx interaction across
segmental articulations and tongue settings may inform instruction in second language
teaching. Additionally, altered larynx position is a feature of Muscle Tension Dysphonia
(MTD) (Mathieson 2001). A better understanding of the relationship may inform clinical
assessment and more targeted voice therapy teaching, using tongue position to facilitate
effective laryngeal function.
References
Beck, J.M., 2005. Perceptual analysis of voice quality: the place of vocal profile analysis. In: W.J.
Hardcastle and J.M. Beck eds. A Figure of Speech: A Festschrift for John Laver, Mahwah, NJ:
Lawrence Erlbaum Associates, pp.285-322.
Demolin, D., Metens, T. and Soquet, A., 2000. Real time MRI and articulatory coordinations in
vowels. In Proceedings of the 5th Seminar on Speech Production. May, pp. 86-93.
Engwall, O. and Badin, P., 2000, May. An MRI study of Swedish fricatives: coarticulatory effects. In
Proceedings of the 5th Seminar on Speech Production: Models and Data, May, pp. 297-300.
Ewan, W.G. and Krones, R., 1973. A Study of Larynx Height in Speech Using the
Thyroumbrometer. The Journal of the Acoustical Society of America, vol. 53, issue 1, pp. 345-345.
Honda, K., 1983. Relationship between pitch control and vowel articulation. Haskins Laboratories
Status Report on Speech Research, vol. 73, pp.269-282.
Hoole, P. and Kroos, C., 1998. Control of larynx height in vowel production. Proc. 5th Int. Conf.
Spoken Lang. Processing, 2, pp. 531-534.
Kim, H., Honda, K., and Maeda, S. 2005. Stroboscopic-cine MRI study on the phasing between the
tongue and the larynx in Korean three-way phonation contrast. Journal of Phonetics 33, pp.
1-26.
Laver, J. 1980. The Phonetic Description of Voice Quality, Cambridge University Press: Cambridge.
Mathieson L. 2001. Greene and Mathieson's: The Voice and its Disorders. Sixth Edition. Whurr
Publishers. London and Philadelphia.
Moisik, S. R., Lin, H., & Esling, J. H. 2014. A study of laryngeal gestures in Mandarin citation tones
using simultaneous laryngoscopy and laryngeal ultrasound (SLLUS). Journal of the
International Phonetic Association, 44 (1), pp. 21–58.
Story BH, Titze IR, Hoffman EA. 2001. The relationship of vocal tract shape to three voice qualities.
Journal of the Acoustical Society of America. vol. 109, pp.1651–1667.
Tiede, M.K., 1996. An MRI-based study of pharyngeal volume contrasts in Akan and English.
Journal of Phonetics, vol. 24, issue 4, pp. 399-421.
Wilson, I. & B. Gick. 2014. Bilinguals use language-specific articulatory settings. Journal of Speech,
Language, and Hearing Research (JSLHR), Vol.57, pp. 361–373.
Sophia Wulfert
Consonant clusters as units of processing: the roles of frequency and
sonority
Albert-Ludwigs-University, Freiburg
Although investigated for decades, the question of which linguistic units are stored in the
mental lexicon and how they are retrieved during speech production and perception is far
from definitively answered. Units demonstrated to be relevant for speech processing
include words, syllables, segments and features (Frisch & Wright 2002, Goldrick 2002,
Massaro & Cohen 1983). But MacKay (1972) also showed that consonant clusters
(hereafter CCs) behave like a robust group within the syllable: they are broken up by slips
of the tongue less often than would be expected by chance. This leads to the assumption
that CCs too might act as relevant units in speech processing. From a usage-based point
of view, the consonants of a cluster can be seen as entrenched by their frequent use
together. In languages like German, which allow a large number of CCs both in
syllable-initial and in syllable-final position, this would reduce the processing effort of
assembling the consonants of a cluster anew every time and would increase efficiency in
speech production and perception. If CCs are stored as separate units, then we would
expect there to be an effect of cluster frequency, just as word, syllable and phoneme
frequency effects have been observed (cf. also the idea of a Mental Syllabary, Levelt &
Wheeldon 1994).
In my dissertation project I investigate whether the frequency of a consonant cluster plays
a role in its processing. My hypothesis is that there is a processing advantage for
high-frequency CCs both in speech production and speech perception, i.e. high-frequency
CCs should be produced and perceived both faster and more accurately than
low-frequency CCs. It is striking that both speech production errors and speech perception
errors hardly ever result in phonotactically illegal structures, and many studies on
production and perception errors have indeed focused on the absolute wellformedness of
phonological structures (e.g. Berent et al. 2007). In analyzing frequency effects in CCs I
intend to supplement this work by investigating the role that the gradient wellformedness
(i.e. the phonotactic probability) of structures plays in their processing.
I contrast the effect of cluster frequency with that of sonority as a measure of phonological
wellformedness, which has also been found to influence speech processing in aphasics
and healthy speakers (e.g. Aichert & Ziegler 2004, Hartley & Houghton 1996). In most
cases, phonological wellformedness and frequency of phonological patterns are
correlated, but in German the single most frequent CC violates the Sonority Sequencing
Principle (e.g. Selkirk 1984). This provides an excellent test case for the question of
whether it is universal phonological rules of wellformedness or the speaker's/hearer's
sensitivity to the statistics of a particular language that determines the difficulty of
phonotactic structures.
Very few studies compare the influences of frequency and sonority or similar structural
measures directly, and the ones that do arrive at contrasting results (e.g. Levelt &
Wheeldon 1994, Levitt & Healy 1985, Goldrick 2002). Moreover, the units of investigation
in these studies are segments and syllables rather than biphones. The one study that
specifically examines CCs finds that the position of a consonant within the cluster is the
most important aspect for processing: most speech production errors occur on the second
consonant of a cluster (Stemberger & Treiman 1986). These results are attributed to
different syllable positions for C1 and C2, which receive different amounts of activation. In
order to decorrelate all three variables, I assembled pairs of structurally similar onset CCs
(e.g. /fl/ and /sl/) and reverse CCs (e.g. /sk/ and /ks/) which differ in frequency but not in
adherence to the Sonority Sequencing Principle, and vice versa. These cluster pairs are
compared directly in a speech production and a speech perception experiment. As the
frequency and sonority principles as well as the position account make contrasting
predictions concerning processing load for many of the CC pairs, their relative influences
can be evaluated.
Speech production experiment
For the speech production experiment, the tongue twister paradigm (Dell et al. 2000) has
been adopted. Here, subjects have to repeat pseudoword stimuli at a high speech rate,
which easily elicits speech errors. In my adaptation, stimuli consisting of two syllables
beginning with the respective CCs to be compared (e.g. [ʃtɪŋ ʃnʊk]) are presented
auditorily and have to be repeated four times in close succession at a rate of 144 beats
per minute in time with a metronome. The stimuli are balanced with respect to a number
of phonological, phonetic and neighbourhood factors. Approximately 50 adult speakers of
Standard German will be tested. The number of slips on each CC is used as a measure of
processing difficulty, while the number of times a CC is erroneously produced is taken as
an indicator of a processing advantage for that cluster. The results will be analyzed with
the frequency of a CC in the German language, its adherence to the Sonority Sequencing
Principle and the position of the consonants within the cluster as the main predictor
variables, and several other frequency measures as well as positional, articulatory and
lexical properties as further predictor variables. They will be discussed within the
framework of a connectionist model of speech processing which ascribes different levels
of activation to nodes for elements differing in frequency of use. Results from pretests
support the hypothesis in that frequency was the most important predictor variable for
speech errors.
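As an illustration of how such predictor variables might enter a model, here is a minimal
sketch using plain logistic regression in Python; the column names and toy data are
invented, and a full analysis would likely add further predictors and random effects for
participants and items:

import pandas as pd
import statsmodels.formula.api as smf

# Assumed layout: one row per produced CC token (all values invented).
data = pd.DataFrame({
    "error":         [0, 1, 0, 0, 1, 0, 1, 0],  # 1 = slip occurred
    "log_freq":      [3.2, 1.1, 1.0, 2.8, 2.9, 2.8, 0.9, 3.0],  # cluster frequency
    "ssp_violation": [0, 0, 0, 1, 0, 1, 1, 0],  # violates sonority sequencing?
})

# Frequency and SSP adherence as joint predictors of error probability.
model = smf.logit("error ~ log_freq + ssp_violation", data=data).fit(disp=False)
print(model.params)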
Speech perception experiment
Concerning speech perception, two kinds of studies can be differentiated: studies
assessing perception quality, operationalized as the number of misperceptions (called
“qualitative studies” here), usually present stimuli in noise and mostly find
phonetic-acoustic reasons for perception asymmetries (e.g. Chang et al. 2001, Davidson
& Shaw 2012), while studies assessing processing speed in terms of reaction times
(“quantitative studies”) use tasks such as lexical decision, AX discrimination, and naming
with priming, and mostly find lexical characteristics (frequencies, phonotactic probability,
neighbourhood properties and familiarity) to be relevant (e.g. Luce & Large 2001,
Vitevitch & Luce 1999). The subtle effects of lexical properties demonstrated in these
studies seem to be obscured by a stronger effect of acoustic characteristics that shows up
in perception studies which apply masking (especially white noise) to stimuli. In order to
uncover subtle frequency effects in speech perception, it is therefore important to choose
a kind of noise that does not mask the acoustic characteristics of some consonants more
than those of others (as white noise does, cf. Silbert & Zadeh 2015) and to combine
“quantitative” and “qualitative” methods in a coherent manner, using the same stimuli for a
good comparison of results.
Testing the frequency hypothesis will contribute to answering the question of to what
extent the human brain makes use of probabilities of phonological patterns during speech
processing, and whether the learning of these patterns involves a process of abstraction
(as in the sonority account) or rather a sensitivity to the statistical properties of a particular
language.
References:
Aichert, I., & Ziegler, W. (2004). Syllable frequency and syllable structure in apraxia of speech.
Brain and Language, 88(1), 148-159.
Berent, I., Steriade, D., Lennertz, T., & Vaknin, V. (2007). What we know about what we have never
heard: Evidence from perceptual illusions. Cognition, 104(3), 591-630.
Chang, S., Plauché, M. C., & Ohala, J. J. (2001). Markedness and consonant confusion
asymmetries. The role of speech perception in phonology, 79-101.
Davidson, L., & Shaw, J. A. (2012). Sources of illusion in consonant cluster perception. Journal of
Phonetics, 40(2), 234-248.
Dell, G. S., Reed, K. D., Adams, D. R., & Meyer, A. S. (2000). Speech errors, phonotactic
constraints, and implicit learning: a study of the role of experience in language production.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(6), 1355.
Frisch, S. A., & Wright, R. (2002). The phonetics of phonological speech errors: An acoustic
analysis of slips of the tongue. Journal of Phonetics, 30(2), 139-162.
Goldrick, M. (2002). Patterns of sound, patterns in mind: Phonological regularities in speech
production. Unpublished doctoral dissertation, Johns Hopkins University, Baltimore.
Hartley, T., & Houghton, G. (1996). A linguistically constrained model of short-term memory for
nonwords. Journal of Memory and Language, 35(1), 131.
Levelt, W. J., & Wheeldon, L. (1994). Do speakers have access to a mental syllabary? Cognition,
50(1-3), 239-269.
Levitt, A. G., & Healy, A. F. (1985). The roles of phoneme frequency, similarity, and availability in the
experimental elicitation of speech errors. Journal of Memory and Language, 24(6), 717-733.
Luce, P. A., & Large, N. R. (2001). Phonotactics, density, and entropy in spoken word recognition.
Language and Cognitive Processes, 16(5-6), 565-581.
MacKay, D. G. (1972). The structure of words and syllables: Evidence from errors in speech.
Cognitive Psychology, 3(2), 210-227.
Massaro, D. W., & Cohen, M. M. (1983). Phonological context in speech perception. Perception &
Psychophysics, 34(4), 338-348.
Selkirk, E. O. (1984). On the major class features and syllable theory. In M. Aronoff & R. Oehrle
(eds.), Language Sound Structure. Cambridge, Mass.: MIT Press. 107-136.
Silbert, N. H., & Zadeh, L. M. (2015). Variability in noise-masked consonant identification. In The
Scottish Consortium for ICPhS 2015 (Ed.), 18th International Congress of Phonetic Sciences.
Stemberger, J. P., & Treiman, R. (1986). The internal structure of word-initial consonant clusters.
Journal of Memory and Language, 25(2), 163-180.
Vitevitch, M. S., & Luce, P. A. (1999). Probabilistic phonotactics and neighborhood activation in
spoken word recognition. Journal of Memory and Language, 40(3), 374-408.
Johann Philipp Zöllner
Assessing transient disruptions in human functional speech networks
following tumor surgery of the dominant hemisphere
Goethe University, Frankfurt
Transient aphasia following tumor surgery on the language-dominant hemisphere is a
well-known phenomenon and is associated with a high psychological and functional
burden for the affected patient. The cause of this transient aphasia is not well understood.
Postoperative brain oedema as well as disruptions of functionally important speech and
language networks have been proposed as explanations.
In this study, 15 patients undergoing awake surgery of tumors on the language dominant
hemisphere were included. Postoperative aphasia was present in all but one of the
patients, lasting from 2 days to one week after surgery. Pre- and postoperative language
evaluation employing a structured interview based on the Aachen aphasia test and general
neuropsychological testing was used to reveal language and other cognitive deficits. Pre-
and postoperative functional MRI during wakeful rest was performed to assess relevant
resting state speech and language networks.
Our preliminary results from 5 patients show significant reductions in inter- and
intrahemispheric functional connectivity when comparing preoperative to postoperative
fMRI recordings at the time of maximal functional deficit (first postoperative testing for all
patients). Intrahemispheric connectivity between the left inferior frontal gyrus and the
supramarginal gyrus decreased during transient aphasia compared to the pre-operative
scan. In parallel, intrahemispheric connectivity decreased between the left superior
parietotemporal area and ipsilateral inferior frontal gyrus, superior temporal sulcus, and
primary auditory cortex. An aphasia-related reduction of interhemispheric connectivity was
apparent between Broca’s area and its homologue as well as between the left superior
parietotemporal area and the contralateral primary motor and secondary somatosensory
cortex.
Our preliminary results suggest that transient aphasia is related to disturbances in the
functional connectivity of speech perception and production networks.
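As a minimal illustration of the kind of ROI-to-ROI functional connectivity measure
underlying such results (a sketch with simulated signals; real analyses involve
preprocessing and statistical testing across patients):

import numpy as np

def functional_connectivity(ts_a, ts_b):
    """Pearson correlation between two ROI time series (e.g. BOLD signal)."""
    return np.corrcoef(ts_a, ts_b)[0, 1]

# Toy pre-/postoperative time series for two ROIs (e.g. left IFG and the
# supramarginal gyrus), 200 volumes each; values here are simulated.
rng = np.random.default_rng(1)
shared_pre = rng.normal(size=200)
ifg_pre = shared_pre + 0.5 * rng.normal(size=200)
smg_pre = shared_pre + 0.5 * rng.normal(size=200)
ifg_post = rng.normal(size=200)   # coupling lost postoperatively
smg_post = rng.normal(size=200)

delta = (functional_connectivity(ifg_pre, smg_pre)
         - functional_connectivity(ifg_post, smg_post))
print(f"connectivity decrease: {delta:.2f}")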
Marion Dohen
Manual gestures in communication, language and speech development
Gipsa-Lab, Grenoble INP and Grenoble Alpes University, France
Communication is often conceived of as language and speech whereas it can be achieved
through many other means. Manual gestures are one of these means. Babies actually
spontaneously use them before speech and language to communicate with adults (e.g.
through manual pointing). Their use by babies for communicative purposes even predicts
speech onset and the production of the first words, and later the first word combinations.
Some theories suggest that, in evolution, the hands would have been used
by our distant ancestors to communicate before the mouth. Gestures are, however, not only
used when language and speech are not yet available. They are used throughout the
lifespan by all human beings in all communicative settings and all cultures. They also
support various types of acquisition apart from that of language and speech. Manual gestures are
even more crucial in cases in which language and speech are delayed or impaired. Using
manual gestures as an alternative to speech actually promotes speech development.
Motor control of the hands and of the mouth is in fact tightly coupled. A number of research
studies have analyzed the coordination between speech and gestures assessing this tight
coupling and showing that it is constrained not only by motor issues but also by
communicative ones.
This presentation will provide an overview of the research in the wide domain of manual
gestures and communication with a special focus on the effects of manual gestures on
learning in general and in speech and language in particular. It will especially question
lexical learning and address memorization issues. It will also provide information on the
beneficial role manual gestures play when language is impaired and how they can be used
in speech therapy. This will be illustrated more thoroughly with the description of a study
on the role of manual gestures in novel word learning in children with Down Syndrome.
Thursday, 12th of January 2017
Simon Hanslmayr
Searching for memory in brain waves – The synchronization /
desynchronization Conundrum
School of Psychology, University of Birmingham, UK
Brain oscillations have been proposed to be one of the core mechanisms underlying
episodic memory. But how do they operate in the service of memory? Reviewing the
literature a conundrum emerges as some studies highlight the role of synchronized
oscillatory activity, whereas others highlight the role of desynchronized activity. In this talk I
will describe a framework that potentially resolves this conundrum and integrates these two
opposing oscillatory behaviours. I will present results from EEG/MEG, EEG-fMRI,
intracranial EEG and brain stimulation studies and argue, based on these findings, that
synchronization and desynchronization reflect a division of labour between a hippocampal
and a neocortical system, respectively. Specifically, whereas desynchronization is key for
the neocortex to represent information, synchronization in the hippocampus is key to bind
information. This novel oscillatory framework integrates synchronization and
desynchronization mechanisms in order to explain how the two systems (i.e. neocortex and
hippocampus) interact in the service of episodic memory. Finally, I will discuss open
questions, specific predictions and challenges that follow from this framework.
Research Background: My research is focused on the role of brain oscillations for human
cognition. Specifically I am interested in how brain oscillations mediate our ability to form
and retrieve memories, or to attend to a particular aspect in our environment. In order to
address these very complex questions I use an array of electrophysiological and imaging
methods, such as EEG/MEG, fMRI, combined EEG-fMRI, and intracranial EEG.
Orsolya Beatrix Kolozsvari
Audio-visual perception of familiar and unfamiliar syllables: a MEG
study
University of Jyväskylä, Finland
During speech perception listeners rely on multi-modal input and make use of both visual
and auditory information. When presented with contrasts of syllables, however, the
differences in brain responses are not caused merely by the acoustic or visual differences.
The familiarity of the syllable, i.e. whether it appears in the viewer-listener's native
language or not, may also cause distinct brain responses.
We investigated how the familiarity of the presented stimuli affects brain responses to
audio-visual speech. 12 Finnish native speakers (right-handed, with an average age of
23.92 years (SD: 1.98)) and 13 Chinese native speakers (right-handed, with an average
age of 24.31 years (SD: 3.61)) watched videos of a Chinese speaker pronouncing
syllables (/pa/, /pha/, /ta/, /tha/, /fa/) during a MEG measurement. The stimuli presented
were either audio-visual (moving pictures with simultaneous sound), auditory (still image
of the speaker with simultaneous sound) or visual (moving pictures of the mouth alone).
The cover task was to press a button when the /fa/ stimulus was presented in visual,
auditory, or audio-visual form.
For Finnish participants, only /pa/ and /ta/ are familiar because they are part of their native
phonology; for Chinese participants all four syllables are familiar. Comparisons were made
for familiarity, where the familiarity of stimuli refers to whether or not they were familiar to
Finnish participants, for auditory-only and audio-visual stimuli.
We found significant differences between responses to syllables familiar and unfamiliar
to
the Finnish for both the Finnish and Chinese speakers. These results suggest
that long
term memory representations for speech sounds are manifested in the
brain activity at
the 80 – 140 ms time window, calculated from the start of sound
in the syllable stimuli.
Leonardo Lancia
Balancing stability and flexibility in speech perception: paradigms, data
and models
CNRS & Laboratoire de Phonétique et Phonologie, Paris, France
One of the most studied aspects of speech signals is their variability. The same word,
syllable, phoneme, etc., is produced differently depending on its linguistic context and on
multiple extra-linguistic factors. To retrieve the linguistic content of complex and variable
speech signals, the perceptual system must balance flexibility and stability in its behaviour.
Stability guarantees robustness to the idiosyncratic features of the speech utterances.
Flexibility permits adaptation to the many factors affecting speech production and
perception. Currently there are two different theoretical frameworks explaining how
listeners resolve the tension between flexibility and stability in speech perception. One is
based on distributional learning; the other is based on concepts coming from dynamical
systems theory. In the first framework, categories are associated with distributions of values
of the perceptual cues, and learning is implemented as an update rule affecting these
distributions. In the second framework, processes such as perceptual competition,
perceptual habituation and associative learning interact while unfolding over different time
scales. Despite their differences, both approaches ground the core functioning of speech
perception in general properties of the perceptual system, rather than in speech-specific
processing principles. We will first review the experimental paradigms that permit studying
perceptual flexibility and stability in a systematic fashion. We will then describe the
different models and discuss how they are supported by available data. We will also
discuss the possibility that these approaches account for different aspects of the same
mechanism and are not necessarily incompatible.
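As a concrete toy version of the distributional-learning framework mentioned above (our own
sketch in Python, not a model presented in the talk), a category can be represented as a
Gaussian over one perceptual cue whose parameters are updated toward each incoming token:

    # Incremental update of a Gaussian cue distribution for one category.
    # eta sets how fast the category tracks new input (flexibility vs. stability).
    def update_category(mu, var, x, eta=0.05):
        mu_new = mu + eta * (x - mu)                  # shift mean toward the token
        var_new = var + eta * ((x - mu) ** 2 - var)   # update spread analogously
        return mu_new, var_new

    # Example: a category centred at a VOT of 60 ms adapting to input at 45 ms.
    mu, var = 60.0, 100.0
    for _ in range(100):
        mu, var = update_category(mu, var, 45.0)
    print(round(mu, 1))  # the mean has drifted close to 45.0

A small eta yields stability (robustness to idiosyncratic tokens); a large eta yields flexibility
(fast adaptation) – the trade-off described in the abstract.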
Martijn Wieling
Using generalized additive modeling for analyzing articulography data
(and other time series data)
University of Groningen, Department of Computational Linguistics, the Netherlands
This tutorial will provide a hands-on introduction to generalized additive models (GAMs).
Generalized additive modeling is a flexible regression method which is able to model non-
linear patterns while simultaneously taking into account fixed and random effect factors.
GAMs are especially suitable to model time-series data and I will illustrate their use by
analyzing articulatory data (i.e. tongue movement patterns compared between different
dialectal groups). Besides being suitable for analyzing articulatory data, GAMs are also
suitable for analyzing other types of time series data such as EEG data, eyetracking data,
or reaction time data.
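As a rough illustration of the model structure (the tutorial itself typically uses the mgcv
package in R; this sketch merely mirrors the idea in Python with the pygam library, and all
variable names and data below are invented for the example):

    import numpy as np
    from pygam import LinearGAM, s, f

    # Hypothetical articulography data: tongue height sampled along
    # normalized time, for speakers from two dialect groups.
    rng = np.random.default_rng(1)
    time = np.tile(np.linspace(0, 1, 50), 20)        # 20 trajectories
    group = np.repeat(rng.integers(0, 2, 20), 50)    # dialect group per trajectory
    height = np.sin(np.pi * time) + 0.3 * group + rng.normal(0, 0.1, time.size)

    # s(0): smooth (non-linear) effect of time; f(1): factor effect of group.
    X = np.column_stack([time, group])
    gam = LinearGAM(s(0, n_splines=15) + f(1)).fit(X, height)
    gam.summary()

A full analysis would additionally include per-speaker random smooths, which is where
mgcv's factor smooths come in; that part does not translate directly to this sketch.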
Friday, 13th of January 2017
Malte Viebahn1, Sophie Dufour2, & Audrey Bürki1,3
Imitation of sublexical speech segments
1 Faculty of Psychology and Educational Sciences, University of Geneva
2 CNRS & Aix-Marseille Université
3 Linguistics Department, University of Potsdam
Several studies reported evidence showing that speakers adapt their pronunciation to that
of their interlocutor. This phenomenon is termed phonetic imitation or phonetic conver-
gence and has been observed in conversations (e.g., Pardo, 2006; Pardo, Jay, & Krauss,
2010) and word repetition tasks. Goldinger (1998), for instance, asked participants to pro-
duce isolated words and non-words and then to repeat the same words (shadowing task).
He then asked a second group of participants to judge the degree of imitation between the
stimuli presented to the first group of participants during the shadowing task, and tokens of
the same words produced by these same participants before and during the shadowing
task. He found that the tokens produced during the shadowing task were judged more sim-
ilar to the stimuli heard during this task than the tokens produced before the shadowing
task. Moreover the imitation was stronger for low frequency words (but see Pardo, Jordan,
Mallari, Scanlon, & Lewandowski, 2013).
The present study builds on and extends this work. It examines whether imitation is re-
stricted to lexical units (words) or whether speakers also imitate sub-lexical units, such as
diphones or syllables. French speakers were first asked to read aloud disyllabic target
words in a baseline condition (without auditory primes) and preceded by an auditory prime.
The phonological overlap between the target and prime was manipulated. In the identical
condition, the auditory prime corresponded to the target word. In the syllable overlap con-
dition, the prime and target words shared the first syllable (e.g., carotte ‘carrot’ – café ‘cof-
fee’). In the diphone overlap condition, target and prime shared a diphone across the syl-
lable boundary (e.g., tofu-sofa).
Phonetic imitation was measured in an AXB perceptual judgment task and in an analysis
based on amplitude envelopes. For the perceptual judgement task, the tokens that had
served as primes during the reading task were compared with tokens produced in the
baseline condition (without a prime) and tokens produced in the identical, syllable overlap
and diphone overlap conditions. Tokens from the identical and diphone overlap conditions
were judged more similar to the primes than baseline tokens.
Results from the amplitude envelope analysis revealed that all tokens produced after a
prime (identical, syllable overlap and diphone overlap conditions) were more similar to the
prime than the baseline tokens were. These results show that phonetic imitation is not lim-
ited to words but also occurs when speakers repeat parts of words from their interlocutor's
speech.
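For readers curious how an amplitude-envelope comparison of this kind can be set up, here
is a generic sketch (a common envelope-similarity recipe, assumed rather than taken from
the authors' actual pipeline):

    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt

    def amplitude_envelope(x, sr, cutoff_hz=10.0):
        # Hilbert envelope, low-pass filtered to keep the slow amplitude contour.
        env = np.abs(hilbert(x))
        b, a = butter(2, cutoff_hz / (sr / 2.0))
        return filtfilt(b, a, env)

    def envelope_similarity(x, y, sr):
        # Correlation of two envelopes after linear time-normalization.
        ex, ey = amplitude_envelope(x, sr), amplitude_envelope(y, sr)
        n = min(len(ex), len(ey))
        ex = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(ex)), ex)
        ey = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(ey)), ey)
        return np.corrcoef(ex, ey)[0, 1]

Under this kind of measure, higher similarity of a repetition to its prime than of the baseline
token to that prime would count as imitation.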
Margarethe McDonald
Effect of cross-linguistic overlap and bilinguals' language dominance
on lexical co-activation
University of Wisconsin-Madison
Auditory perception of lexical items in monolinguals begins with the activation of all items
that are in the cohort of that word (McClelland & Elman, 1986). For example, when hearing
“candle” English speakers also activate “candy” since it has a similar onset (Tanenhaus et
al., 1995). There is evidence that bilinguals, who have lexical representations from both of
their languages, activate lexical items in both their languages regardless of which
language the word was presented in (Spivey & Marian, 1999). Studies investigating this
phenomenon of cross-linguistic activation are beneficial for deducing the structure of the
speech perception network, and for examining how it differs from that of monolinguals. Research has
revealed that bilinguals are more likely to experience activation of their native language
(L1) when listening in their second language (L2) than activation of the L2 when listening
in their L1 (Weber & Cutler, 2004). In addition, some studies find that co-activation of the
L2 is only possible when the production of the L2 items is modified to match the
phonetics of the L1 (Ju & Luce, 2004). It is still not clear to what degree phonological
similarity of items between a bilingual's two languages could affect the amount of co-
activation they experience or how this interacts with the language dominance of the
bilinguals.
In my current research, we examine the effect of type of cross-linguistic overlap of lexical
items on degree of co-activation in bilinguals. We tested English native speakers who
acquired Spanish as a second language. Using an eye-tracking visual world paradigm, we
presented four images on a screen followed by the auditory presentation of a target word
in English. Participants clicked on an image corresponding to the target word (e.g. moon),
and co-activation was indexed by eye movements to an image that was a cross-linguistic
competitor at the onset in Spanish (e.g. muñeca “doll”). We tested whether words with
sonorant, obstruent, or no cross-linguistic overlap at the onset would cause different
degrees of co-activation. Since sonorants are realized more similarly across Spanish and
English than obstruents, we expected more co-activation for sonorant than obstruent onset
words. We found significant co-activation of sonorants in these populations, but not of
obstruent onsets compared to filler trials.
Data collection is ongoing, testing Spanish native speakers who acquired English as a
second language on the same task. We expect Spanish native speakers to show stronger
co-activation overall than Spanish L2 speakers—perhaps for obstruent onset words in
addition to sonorant onset words.
Future directions include examining cross-linguistic activation effects using the same tasks
in children who are enrolled in dual-language immersion schools and are therefore in the
process of acquiring their L2. I am also interested in linking cross-linguistic activation in
perception to degree of accent in production.
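As an illustration of how co-activation is typically indexed in this paradigm (a generic sketch
with invented structure, not the study's analysis code), fixation proportions to the competitor
picture within an early time window are compared against the same pictures on filler trials:

    import numpy as np

    def fixation_proportion(samples, roi, t0_ms, t1_ms, hz=250):
        # Share of eye-tracking samples in a time window that fall on a given image.
        i0, i1 = int(t0_ms / 1000 * hz), int(t1_ms / 1000 * hz)
        window = samples[i0:i1]
        return np.mean([s == roi for s in window])

    # samples: per-trial sequence of looked-at regions ("target", "competitor", ...).
    trial = ["target"] * 100 + ["competitor"] * 50 + ["other"] * 100
    print(fixation_proportion(trial, "competitor", 200, 700))
    # Co-activation: competitor looks on critical trials exceed looks to the
    # same pictures on filler trials in such an early window.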
References
Ju, M., & Luce, P. A. (2004). Falling on sensitive ears: Constraints on bilingual lexical activation.
Psychological Science, 15(5), 314-318.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive
Psychology, 18(1), 1-86.
Spivey, M. J., & Marian, V. (1999). Cross talk between native and second languages: Partial
activation of an irrelevant lexicon. Psychological Science, 10(3), 281-284.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of
visual and linguistic information in spoken language comprehension. Science, 268(5217),
1632-1634.
Weber, A., & Cutler, A. (2004). Lexical competition in non-native spoken-word recognition. Journal
of Memory and Language, 50(1), 1-25.
Pamela Fuhrmeister
The role of native language interference in perceptual learning of non-
native speech sounds
University of Connecticut
Acquiring speech sounds in a second language can be an extreme challenge for adults.
Although factors such as the perceptual similarity to native language speech sounds have
been explored extensively1, factors such as learning and memory have received little
attention until
recently. A nascent literature addressing the contributions of sleep-mediated
memory consolidation to speech sound category learning may provide a link between
systems of learning and memory and the difficult task of learning non-native speech
sounds and help further elucidate this challenge that many adult language learners face.
Memory consolidation through sleep has been shown to be beneficial to language in two
main ways. Specifically, it can help solidify learned information by protecting it from
interference, and it can also aid in generalization to novel instances.2,3 A recent study by
Earle and Myers4 suggests a role of sleep in non-native speech sound learning by means
of protecting learned, non-native phonetic categories from native language interference. In
this study, participants (native
speakers of English) were trained to identify the Hindi
dental/retroflex contrast (a difficult
contrast for native speakers of English1) either in the
morning or in the evening and were
reassessed 12 and 24 hours later. Of note is that both
groups’ training and testing schedules contained an overnight interval, but only the group
trained in the evening showed improvement
following a period of sleep. The authors
concluded that the divergence in performance could be a
result of native language
exposure before sleep: the evening group had minimal exposure to their
native language
prior to sleep, while the morning group had a day’s worth of exposure that may have
interfered with consolidation. To test this empirically, a subsequent study was carried out, in
which all participants were trained on the Hindi dental/retroflex contrast in the evening and
then assigned to one of two interference conditions. For the interference task, participants
listened to 15 minutes of consonant-vowel utterances beginning with either /b/ or /d/.
Because
the alveolar /d/ sound is perceptually similar to the learned contrast while /b/ is
not, they predicted that the group bombarded with /d/-initial tokens before sleep would
experience an interference effect on their learning of the contrast, while the group that
heard /b/-initial tokens would not. This prediction was borne out: participants who heard /b/-
initial tokens improved after sleep, while those who heard /d/-initial tokens showed no such
improvement on their discrimination of the Hindi sounds. Thus, it appears that native
language exposure to sounds that
are similar to the learned non-native sounds before
sleep may interfere with consolidation.
One question raised by these results is the level at which the native language interferes.
Based on the results from Earle and Myers4, one prediction is that the native language
interferes at the
acoustic level. However, activating abstract phonological representations
or representations of articulatory gestures during speech perception could also produce
the observed interference.
This question is addressed in two experiments following the same procedure as in Earle
and Myers.4 In both experiments, participants (native speakers of English) were trained to
identify the Hindi dental/retroflex contrast in the evening (17:00-21:00) by identifying
minimal pair ‘words’, each paired with a visual stimulus. Participants were then assigned to
an interference
condition (B group or D group) or a control group that did not complete the
interference task. Participants were assessed on their discrimination and identification of
the contrast following the
interference task and again the next morning (8:00-10:00) to
assess overnight improvement.
Experiment 1 tested the prediction that native language input activates an abstract
phonological representation that interferes with sleep-mediated consolidation of a learned
phonetic contrast. In this experiment, the interference task was a text-based pseudo-
homophone judgment task, in which participants were shown a string of letters on the
screen (/d/-initial tokens for D group, /b/-initial for B group, and no task for Control group, n
= 31 per group) and were asked if the string of letters would represent a real word if read
aloud (e.g. 'drane' = drain, blaff = nonword). To test for group differences in discrimination,
d' scores were submitted to a repeated measures ANOVA, which revealed a significant
effect of time (F(2,180) = 42.34, p < 0.001), indicating improvement for all groups at each
time point. No group differences or interactions were found. To test for overnight
improvements in identification, we used a mixed logit model with the dependent variable as
the log odds of selecting the correct visual stimulus that was paired with the auditory
stimulus played on each trial. Time and group were fixed factors, and random effects
included by-participant random intercepts and random slopes for session. A likelihood ratio
test revealed a significant effect of time (χ2(1) = 8.49, p = 0.004), indicating overnight
improvement for all groups. No group differences or interactions emerged. Thus, it appears
that activation of an abstract phonological representation in the native language does not
interfere with consolidation of a learned, non-native phonetic contrast.
Experiment 2 explored the prediction that native language interference is linked to
articulatory gestures. For the interference task in this experiment, 62 participants were
assigned to either a B or D group who silently articulated /b/- or /d/-initial syllables (e.g.,
bah, beh, dah, doh, etc.). The same analyses were run as in Experiment 1, and data from
the 62 participants in the B and D groups in this experiment as well as the 31 from the
Control group in Experiment 1 were included.
Results of the ANOVA revealed a main effect of time, again indicating that all groups
improved in their discrimination of the contrast at each time point. A likelihood ratio test in
the mixed logit model revealed a significant interaction between Time and Group (χ2(2) =
8.42, p = 0.01) for identification performance. Post hoc Wald z tests indicate the Control
group improved overnight (z = -2.73, p = 0.02), while the B and D groups did not
significantly improve. These results suggest that articulation of any sound, regardless of
perceptual similarity, may interfere with consolidation of a learned contrast.5
A third experiment currently in progress tests the prediction of acoustic interference. If
interference occurs at the acoustic level, we expect sine wave versions of the auditory
stimuli used in Earle & Myers4 to produce a similar interference effect.
Results of the present experiments taken together with findings from Earle and Myers4
suggest that native language input may interfere with sleep-mediated memory consolidation
of learned, non-native speech categories at the articulatory level, and that this level may
be most important for acquiring novel speech categories. These findings add to the
growing literature
linking learning and memory to different aspects of language processing
and acquisition.
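To clarify the mixed logit analysis referred to above, its structure can be written out
schematically as follows (our plain-notation rendering of the model description, not the
authors' own specification):

    logit P(correct) = b0 + b1*Time + b2*Group + b3*(Time x Group)
                       + by-participant random intercepts
                       + by-participant random slopes for session

The likelihood ratio test then compares the model's fit with and without the term of interest;
the reported chi-squared statistics (e.g., χ2(1) = 8.49 for the effect of time) have degrees of
freedom equal to the number of parameters dropped.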
References
1. Best, C. T., McRoberts, G. W. & Goodell, E. (2001). Discrimination of non-native consonant
contrasts varying in perceptual assimilation to the listener's native phonological system. J.
Acoust. Soc. Am. 109, 775-794.
2. Davis, M. H., Di Betta, A. M., Macdonald, M. J. E. & Gaskell, M. G. (2009). Learning and
consolidation of novel spoken words. J. Cogn. Neurosci. 21, 803-820.
3. Earle, F. S. & Myers, E. B. (2015). Overnight consolidation promotes generalization across
talkers in the identification of nonnative speech sounds. J. Acoust. Soc. Am. 137, EL91.
4. Earle, F. S. & Myers, E. B. (2015). Sleep and native language interference affect non-native
speech sound learning. J. Exp. Psychol. Hum. Percept. Perform.
5. Baese-Berk, M. M., & Samuel, A. G. (2016). Listeners beware: Speech production may be
bad for learning speech sounds. Journal of Memory and Language, 89, 23-36.
Maria Dokovova
The effect of language mode on stop voice contrast cue weighting for
Bulgarian-English bilinguals
Queen Margaret University, Edinburgh
Background
According to Grosjean (1997), who introduced the idea of language modes, the two
languages of a bilingual exist on a continuum of activation, shifting between a
monolingual mode (when one of the languages is mostly inhibited) and a bilingual
mode (when both languages are activated). Indeed, research has shown that bilinguals
might differ in what cues they use depending on their language dominance and the
language they think they are listening to (Kong and Edwards 2015, Hazan and Boulakia
1993). However, it has also been demonstrated by Holt and Lotto (2006) that it is the
high variance of a cue, and not its decreased informativeness, that decreases its
weighting. Since, between English (an aspirating language) and Bulgarian (a voicing
language), voice onset time (VOT) is a more variable cue than post-release
fundamental frequency (f0), which behaves the same way in both languages, bilinguals
can be expected to have learned to rely more on f0, regardless of whether
they are in a Bulgarian or English monolingual mode. Alternatively, bilinguals might
transfer VOT as a preferred cue to their L2 and not differ from the monolinguals in either
group, in accordance with previous results in the literature (Kong and Edwards 2015).
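Cue weighting in this sense is commonly quantified by regressing listeners' voicing
responses on the two cues and comparing standardized coefficients. A minimal sketch in
Python (our illustration with made-up responses, not the study's analysis):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # One row per response; columns: VOT in ms, f0 contour (0 = low-rising, 1 = high-falling).
    X = np.array([[-25, 0], [-12, 0], [12, 1], [25, 1], [-25, 1], [12, 0]], dtype=float)
    y = np.array([0, 0, 1, 1, 0, 1])  # 1 = "voiceless" response (invented data)

    # Standardize the cues so the two coefficients are comparable as relative weights.
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    weights = LogisticRegression().fit(Xz, y).coef_[0]
    print(weights)  # larger |coefficient| = heavier reliance on that cue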
Purpose
This study investigates if highly proficient bilingual speakers (Bulgarian-native, English-L2)
will differ from functionally monolingual speakers (Bulgarian and English) and from each
other (depending on what language they think they are listening to) in the way they
weigh VOT and f0 to distinguish between voiced and voiceless stops.
Participants
There were 40 participants in total, of which 24 were native Bulgarian speakers who
had worked or studied in a native-English-speaking environment for at least a year. 8
Bulgarian and 8 British English functional monolinguals were used as control subjects.
Stimuli
The stimuli were non-words in both Bulgarian and English, based on the productions of
a Bulgarian speaker saying /buvu/, /duvu/, /guvu/, /puvu/, /tuvu/, /kuvu/. Prevoicing from
the voiced tokens was cross-spliced onto the words starting with short-lag stops (the
VOT = 12 ms tokens). There were two levels of f0 (high-falling, which is usually associated
with voiceless stops, and low-rising, associated with voiced stops) combined with four
levels of VOT (-25 ms, -12 ms, 12 ms, 25 ms) and three places of articulation (PoA),
resulting in 24 tokens in total.
Procedure
The perception experiment was sent to the participants online. In order to induce a
particular language mode, they received their instructions, language experience
questionnaire and listening comprehension task either in Bulgarian or in English. This was
followed by listening to the 24 randomised stimuli with each stimulus being followed by a
question such as: "What did you hear? a/ buvu b/ puvu."
Results
All language groups perceived the tokens similarly, using a mix of both cues. In
ambiguous tokens (VOT = ±12 ms), more weight was given to f0. The only significant
difference between language conditions was shown by the bilinguals in the English
condition, and only for the tokens with VOT = 25 ms and low-rising f0. The bilinguals in English
mode relied on f0 significantly more than the bilinguals in Bulgarian mode and the English
control group.
Conclusion
Although a bigger sample is needed, it appears that all participants rely on f0 more
heavily when the primary cue (VOT) is less informative and contradicts the
secondary cue, which does not entirely support Holt and Lotto (2006). However, bilinguals might
experience more ambiguity in different tokens when primed with their L2 compared to the
L1 priming condition. Hence, changing the language mode of the bilinguals might affect
their cue weighting but only for some tokens.
References
Grosjean, François. 1997. Processing mixed languages: Issues, findings, and models. In Tutorials
in Bilingualism: Psycholinguistic Perspectives, eds. Annette M. B. de Groot, Judith F. Kroll,
225-254. Mahwah, NJ: LEA.
Hazan, Valerie L., and Georges Boulakia. 1993. Perception and production of a voicing contrast by
French-English bilinguals. Language and Speech 36 (1): 17-38.
Holt, Lori L., and Andrew J. Lotto. 2006. Cue weighting in auditory categorization: Implications for
first and second language acquisition. The Journal of the Acoustical Society of America 119
(5): 3059.
Kong, Eun Jong, and Jan Edwards. 2015. Individual differences in L2 learners' perceptual cue
weighting patterns. 18th International Congress of Phonetic Sciences.