Speak and unSpeak with PRAAT
By Paul Boersma and Vincent van Heuven

Paul Boersma, Institute of Phonetic Sciences, University of Amsterdam, Herengracht 338, 1016 CG Amsterdam, The Netherlands, Paul.Boersma@hum.uva.nl

Vincent van Heuven, Universiteit Leiden Centre for Linguistics (ULCL), P.O. Box 9515, 2300 RA Leiden, The Netherlands, V.J.J.P.van.Heuven@let.leidenuniv.nl
Introduction
By the Goodies Editor, Rob Goedemans
Many linguists use recorded speech in their research.
In descriptive work, visual representations of such
recordings (mostly oscillograms) are often annotated
with IPA symbols and other labels, and then used to
illustrate a phenomenon or defend a certain position
regarding the nature of some phonetic or phonological property of the language in question. In phonetic
and psychophysical research some parameter of the
recorded speech (like tempo or intensity) is often
altered, after which the new sound thus obtained is
used in an experiment to test the sensitivity of the
human ear, or brain, to certain speech properties.
The introduction of the computer has brought
about a virtual revolution in the linguistic sciences
with respect to the usage of speech recordings. A lab
full of cumbersome machinery has now been replaced
by one PC, Mac or workstation, on which anyone who
puts his mind to it can record, annotate and modify
speech with some simple commands or a few
mouseclicks. Even the calculation of some speech parameters that were rather complicated to obtain in the past (like pitch and spectral analysis), but frequently used in phonetic research nonetheless, is now often just one or two mouseclicks away.
As a result, a growing number of colleagues find use for files with speech sounds in their linguistic explorations. The needs of this group are served by a rather small number of software packages designed for the representation, annotation and analysis of speech (and much more in many cases). In my opinion, one of these stands out in many ways. It is called "PRAAT", the imperative form of to speak in Dutch. Since this package is rapidly gaining in popularity, we have decided to devote some attention to it in this issue. First, one of the authors, Paul Boersma, introduces the package and outlines its impressive functionality. Then, an experienced user, Vincent van Heuven, highlights some of the advantages and disadvantages of using PRAAT in everyday phonetic research.
It is my sincere hope that these two goodies will convince even more linguists to download PRAAT and experiment with it a little. They will see that incorporating, for example, some oscillograms of minimal pairs in their work is as easy as ABC. Their publications will undoubtedly be the better (and the livelier) for it.
PRAAT, a system for doing phonetics by computer
By Paul Boersma

PRAAT is a computer program for analysing, synthesizing, and manipulating speech. It has been developed since 1992 by Paul Boersma and David Weenink at the Institute of Phonetic Sciences of the University of Amsterdam. There are versions for most of the common operating systems: Macintosh, Windows, Linux, and several Unix workstations (Solaris, Silicon Graphics, Hewlett-Packard). By September 2001, there were more than 5,000 registered users in 99 countries.
1. Analysing speech with PRAAT
PRAAT allows you to record a sound with your microphone or any other audio input device, or to read a sound from a sound file on disk. You will then be able to have a look 'inside' this sound. The upper half of the sound window (see figure 1) will show you a visible representation of the sound (the wave form). The lower half will show you several acoustic analyses: the spectrogram (a representation of the amount of high and low frequencies available in the signal) is painted in shades of grey; the pitch contour (the frequency of periodicity) is drawn as a cyan curve; and formant contours (the main constituents of the spectrogram) are plotted as red dots.
PRAAT is most often used with speech sounds, in which case the pitch contour is associated with the vibration of the vocal folds and the formant contours are associated with resonances in the vocal tract. But the use of PRAAT is certainly not limited to speech sounds: musicians and bio-acousticians use it for the analysis of sounds produced by flutes, drums, crickets, or whales, and the interpretation of the three analyses will change accordingly.
The Sound window allows you to zoom in for
more detail, to scroll to the places that you are
interested in, to set a time cursor or select a time stretch, and to listen to the parts of the sound that you are viewing or selecting. You can easily query all the important properties of the analyses, e.g. obtain the average pitch value inside the selected time stretch. You can turn the analyses into separate objects (independent from the original sound), which is handy for further processing: the pitch contour, for example, can be saved, printed, or converted into something else.

Figure 1. Praat's sound window.
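Such queries can also be issued from a script rather than from the Sound window; a minimal sketch, assuming a sound file on disk (the path and the pitch settings, automatic time step with a 75-600 Hz range, are illustrative):

# read a sound, compute its pitch, and query the mean (sketch)
Read from file... /data/utterance.wav
To Pitch... 0 75 600
mean = Get mean... 0 0 Hertz
printline Mean pitch: 'mean:0' Hz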
2. Annotating speech with PRAAT
PRAAT is used by many linguists (phoneticians, phonologists, syntacticians) to label and segment their speech recordings. You can make transcriptions and annotations on multiple levels simultaneously (see the three levels in figure 2), in a window that typically also shows visible representations of the sound, the spectrogram, and perhaps the pitch contour. PRAAT supports an easy use of special symbols in annotations, including nearly all symbols defined by the International Phonetic Association (such as the θ symbol, typed as "\te", in figure 2).

Figure 2. Praat's annotation window.
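Annotations can be created and filled by script as well; a minimal sketch, with hypothetical tier names, boundary time, and label:

# add a three-tier TextGrid to a selected Sound and label an interval (sketch)
select Sound utterance
To TextGrid... "phoneme word remark" ""
Insert boundary... 1 0.25
Set interval text... 1 1 \te

The trigraph \te in the last line uses the same special-symbol mechanism as the editor: it is rendered as the corresponding IPA character.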
3. Synthesizing speech with PRAAT
PRAAT is not a text-to-speech system: you cannot type in an English sentence and have the program read it aloud. But you can generate many types of sounds with PRAAT. First, you can use formulas to generate simple sounds like sine waves or white noise from scratch, or to generate more complicated sounds from other sounds. Second, you can create sounds from other types of data, e.g. you can turn a pitch contour into a pulse train. Third, you can do source-filter synthesis: from stylized pitch, intensity, and formant contours that you can build from scratch, you can create speech-like sounds. Fourth, you can perform articulatory synthesis: from a specification of timed muscle contractions, Praat will compute the resulting sound. Fifth, you can create sounds from other sounds by a variety of filtering and enhancement techniques.
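The second route, pitch contour to pulse train, takes only a few object commands; a minimal sketch (the object name and all parameter values are illustrative):

# derive a pulse train from the pitch contour of a sound (sketch)
select Sound utterance
To Pitch... 0 75 600
To PointProcess
To Sound (pulse train)... 22050 1 0.05 30
Play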
4. Manipulating speech with PRAAT
A specialized manipulation window allows you to stylize and modify the pitch contour of an utterance. In figure 3, the impatient-sounding question "can you time it?", with an original final high rise, has been converted into a slightly whining command. The same window allows you to modify relative durations within this utterance. In this way, you can change the intonation and stress patterns of the utterance, which is useful when creating stimuli for research into the perception of prosody.

Figure 3. Praat's manipulation window.
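Manipulations of this sort can be scripted too; a minimal sketch that lowers the whole pitch contour by 20 percent (the object name and settings are illustrative, and the overlap-add resynthesis command carried the PSOLA name in versions of this period):

# lower the pitch contour of an utterance and resynthesize (sketch)
select Sound utterance
To Manipulation... 0.01 75 600
Extract pitch tier
Formula... self * 0.8
plus Manipulation utterance
Replace pitch tier
select Manipulation utterance
Get resynthesis (PSOLA)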
5. Graphical capabilities of PRAAT
PRAAT comes with a separate Picture window into which you can draw your sounds, pitch contours, spectrograms, and any other data types. You can add text (several fonts, many special symbols, several sizes, any rotation), lines (several colours, any widths, several styles), circles/ellipses/rectangles (filled or outlined), and several types of markers along and inside your drawings. Figure 4, for instance, shows the modified pitch contour of figure 3, with appropriate vertical text to its left, and a comment added above it.
The Picture window is designed for producing publication-quality graphics for your articles and dissertations. From this window, you can print to any printer (PostScript, Macintosh, Windows) and save your drawings as EPS files (best quality, but works with PostScript printers and PDF creators only), WMF files (Windows), or PICT files (Macintosh). All of these can be easily imported into your word processor. The Macintosh and Windows versions support the graphical clipboard as well, so that you can use simple copy-and-paste to move PRAAT pictures to your word processor, if you have no use for PostScript quality.
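The Picture window is fully scriptable as well; a minimal sketch (the object name and title text are illustrative):

# draw a wave form with axes and add a title in the Picture window (sketch)
select Sound utterance
Draw... 0 0 0 0 yes
Text top... yes Can you time it?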
6. The PRAAT scripting language
In most parts of the world, slavery was abolished in the 19th century, but it is not unusual to see phoneticians measure the pitch values of 1500 vowels by hand. You would probably want to replace such work by an automated procedure that, say, loops over all the sound files that reside in a certain directory or over all the segments marked "u" in a vowel annotation. Such things can easily be performed by the PRAAT scripting language, which is a general-purpose programming language with special capabilities for simulating menu choices and button presses in the PRAAT program. Many people use this language for all their analyses, tabulations, statistics (there are special functions for computing levels of significance in t, χ², or F tests), and complicated pictures. In fact, you can use PRAAT as a general drawing program: figure 4 shows a PRAAT script that draws the complicated figure at the top of the Picture window.
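A batch procedure of that kind might look as follows; a minimal sketch, assuming a directory of WAV files (the path and analysis settings are illustrative):

# measure the mean pitch of every sound file in a directory (sketch)
Create Strings as file list... list /data/vowels/*.wav
n = Get number of strings
for i to n
   select Strings list
   file$ = Get string... i
   Read from file... /data/vowels/'file$'
   To Pitch... 0 75 600
   mean = Get mean... 0 0 Hertz
   printline 'file$' 'mean:0'
   select all
   minus Strings list
   Remove
endfor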
7. Other features of PRAAT
The PRAAT program contains several possibilities in areas that are only remotely connected to phonetics. Phonologists and syntacticians like its implementation of Optimality-Theoretic learning (constraint demotion, gradual learning algorithm, robust interpretive parsing), which you can apply to your own cases. Other possibilities include neural-net modelling and extensive high-level statistics (principal-component analysis, discriminant analysis, multidimensional scaling).
Figure 4. Praat's script and picture windows.
Figure 5. Praat's manual window.
8. The PRAAT manual
PRAAT comes with an extensive tutorial, which you can start by choosing "Introduction to PRAAT" from the "Help" menu. The entire reference manual is contained in the program as well and consists of about 800 pages that are connected via hyperlinks (see figure 5). Help buttons are available in most windows and dialog boxes, and clicking them will take you into the part of the manual that is most appropriate in the current context.
9. Why PRAAT?
You will want to choose PRAAT for most of your phonetic research not only because it is the most complete program available (it contains much more than could be discussed here), or because it is distributed for free, but also because it comes with the finest algorithms. The pitch analysis algorithm is the most accurate in the world; the articulatory synthesis is the only one that can handle dynamic length changes (ejectives), non-glottal myo-elastics (trills), and sucking effects (clicks, implosives); and the gradual learning algorithm is the only linguistically-oriented learning algorithm that can handle free variation. But of course, there will always be things related to phonetics that other programs are better at. For your convenience, PRAAT has therefore been designed to interface reasonably well with Matlab, SPSS, Excel, and the Klatt synthesizer.
10. How to get the PRAAT program
You can get the PRAAT program through its web site, www.praat.org. By writing an e-mail message to the first author, you obtain a free licence to download all current and future versions of the program, install as many copies as you like on as many computers as you like, and use the program for any legal purpose at your work, at home, and in the field. You will also be informed about major updates of the program, which appear approximately twice a year. The source code of the PRAAT program is distributed under the General Public Licence.
A user's comments on PRAAT
By Vincent J. van Heuven
Introduction
PRAAT is probably the most comprehensive toolbox for phonetic research available worldwide, and it is certainly the most affordable; it actually costs no money at all. In fact, it is so diverse that I have never met anyone, apart from its authors, who could claim to have experience with all the modules that the program contains. I for one will have to limit the present appraisal to just those few modules that my co-workers and I have used in our laboratory. Moreover, PRAAT rejuvenates at an alarming rate. The release that I am currently using is version 3.9.36 running on the Windows NT platform.
PRAAT started out as a collection of programs that were specifically designed to produce top-quality graphic representations of speech, i.e. oscillograms, spectra, spectrograms, fundamental frequency and intensity plots, etc. However, the flexible and well-planned structure of the program allowed its maker(s) to extend PRAAT's functionality almost indefinitely. Often, the same tasks can be done by PRAAT using different modules with different algorithms. Pitch extraction, for example, can be done with the aid of at least four different algorithms: autocorrelation, cross-correlation, SPINET, and subharmonic summation. Help files are available for each of the algorithms, explaining the meaning of the many parameter values that can be specified for each algorithm and providing references to the literature. Each algorithm comes with a set of default parameter settings that can be overridden by the user. Also, there is an unmarked algorithm (which turns out to be an autocorrelation technique) that allows no special tuning.
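At the object level these alternatives appear as separate commands; a minimal sketch (the argument lists follow the documented default patterns but are abbreviated here, and should be checked against the help files):

# one sound, four pitch-extraction algorithms (sketch)
select Sound utterance
To Pitch (ac)... 0 75 15 no 0.03 0.45 0.01 0.35 0.14 600
select Sound utterance
To Pitch (cc)... 0 75 15 no 0.03 0.45 0.01 0.35 0.14 600
select Sound utterance
To Pitch (shs)... 0.01 50 15 1250 15 0.84 600 48
select Sound utterance
To Pitch (SPINET)... 0.005 0.04 70 5000 250 500 15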
In all, there would seem to be very little that PRAAT cannot do for you. However, some things can be done instantaneously, while other tasks can be performed only in non-obvious ways that the novice user will never discover by himself. Fortunately, the makers of PRAAT take great pride in their product, and are willing to answer queries from the floor 24 hours a day, or so it seems, again at no cost.
It should be pointed out that PRAAT is not a self-study course in experimental/instrumental phonetics. True, a detailed on-line technical reference manual is included with the program, but it generally does not discuss the pros and cons of alternative approaches/solutions to speech analysis problems. The user must decide on his own which algorithm will suit his purposes best. In this respect PRAAT is not unlike the magic broom that takes off with the sorcerer's apprentice. The general advice would be: do not try this at home, and always consult your local phonetician.
Multipanel editors
A recent development seems to have been toward
providing smorgasbord-like complex presentations
which display speech parameters as a function of
time in multiple synchronized panels. Two such
complex editors are provided.
1. The first is the basic waveform editor (which is invoked by a Sound object), which can be tailored to the user's taste. It allows for simultaneous display of the waveform, spectrogram, formant tracks (in red), a pitch curve (blue) and an intensity curve (yellow), all superimposed on the spectrogram. Each of the five displays can be switched on/off, scales can be adjusted for optimal visual resolution, there is a (limited) choice of algorithms that can be invoked for each display, and parameter settings can be chosen independently for each display. Values can be eyeballed and read out under cursor control; digital readouts can be obtained through data queries. The edit functions allow cut, copy and paste, zero, and time-reverse. The parameter tracks can be extracted from each display and stored separately.
2. The second is the editor that is used for Manipulation objects. The waveform is displayed together with a pitch track (default pitch determination algorithm) and a relative duration parameter. In the waveform the moments of glottal closure are indicated by vertical blue lines. The corresponding pitch-synchronous frequency value is displayed in light gray in the pitch manipulation display. Presence/absence and location of glottal pulses can be manipulated. Also, the user can stylize the pitch curve and/or change the pitch curve in any way he wants. Similarly, time intervals can be selected and given different relative durations. This allows portions of the utterance to be stretched or compressed in time. After manipulation the sound can be resynthesized using two different analysis-resynthesis schemes (see the sketch after this list):
a. PSOLA resynthesis: a relatively simple waveform manipulation technique that affords the manipulation of pitch and duration but detracts very little from the original sound quality.
b. LPC resynthesis: a statistical data reduction technique that generally leads to considerable loss of sound quality but affords, in principle, the manipulation not only of prosodic parameters (pitch and duration) but also of spectral parameters (sound quality or timbre). Unfortunately, the display and manipulation (smoothing, stylization, frequency shift) of spectral parameters (formant tracks) is not implemented in the manipulation editor, nor are these functions easily available elsewhere in the package.
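At the object level the two schemes correspond roughly to the following commands; a minimal sketch (object names and analysis settings are illustrative; the overlap-add resynthesis command carried the PSOLA name in versions of this period):

# PSOLA-style resynthesis from a Manipulation object (sketch)
select Sound utterance
To Manipulation... 0.01 75 600
Get resynthesis (PSOLA)

# LPC analysis, inverse filtering, and resynthesis (sketch)
select Sound utterance
To LPC (burg)... 16 0.025 0.005 50
plus Sound utterance
Filter (inverse)
plus LPC utterance
Filter... no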
It should be doable, in principle, to reduce the two editors to just one generalized editor that allows the display, interactive measurement and manipulation of all the relevant properties of the speech signal. The manipulable properties should include the intensity curve. This parameter is currently displayed in the waveform editor (optionally) but cannot be manipulated.
Additional displays
Cochleagrams. Hidden further down the hierarchy of PRAAT functions are the possibilities to create auditory spectrograms (or cochleagrams). As an option with the cochleagram the loudness (expressed in sones) of a time-slice can be queried. It is not possible, in its present state, to instruct PRAAT to produce a loudness trace as a function of time (although the user can generate and print such a contour, using the built-in programming language).
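One way to build such a contour with the scripting language; a minimal sketch (the analysis parameters and the fixed 100-point sampling grid are illustrative):

# sample loudness at regular intervals from a cochleagram (sketch)
select Sound utterance
To Cochleagram... 0.01 0.1 0.03 0.03
for i to 100
   time = i / 100
   select Cochleagram utterance
   To Excitation (slice)... 'time'
   loudness = Get loudness
   printline 'time:2' 'loudness:2'
   Remove
endfor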
Vowel diagrams. It is also possible to plot a vowel, or even a series of vowels, as points in a vowel diagram, i.e. a two-dimensional graph plotting the first formant frequency F1 against the second formant frequency F2. Optionally, dispersion ellipses can be drawn around the scatter clouds of vowel points in the F1-by-F2 display. Such plotting facilities are also provided by the (expensive) Kay Computerized Speech Lab (CSL) package. Using the annotation tools incorporated in PRAAT, beautiful print-quality vowel diagrams can be produced. For teaching purposes it would be attractive if this display could also be used as part of a user interface to generate vowel sounds by moving the cursor around in the display (interactively or from pre-defined custom-made trajectories), using LPC synthesis. The authors at one time promised that this facility would be made available but I have not seen it (yet). As far as I know, there is no interactive software around that can do this sort of vowel synthesis (although the Vowel Hunter program developed at the Phonetics Laboratory of Bonn University, Germany, comes close). Also, there is the talking vowel diagram provided on the Speech Production and Perception I CD-ROM issued by Sensimetrics. However, this product does not provide for on-line vowel synthesis; it just plays a fairly small number of pre-stored vowel waveforms.
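The F1/F2 values behind such a diagram are easily harvested by script; a minimal sketch (the object name and analysis settings are illustrative):

# measure F1 and F2 at the midpoint of a vowel token (sketch)
select Sound vowel
To Formant (burg)... 0 5 5500 0.025 50
duration = Get total duration
mid = duration / 2
f1 = Get value at time... 1 'mid' Hertz Linear
f2 = Get value at time... 2 'mid' Hertz Linear
printline F1 = 'f1:0' Hz, F2 = 'f2:0' Hz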
Scripting language
PRAAT comes with a full programming language which can be used to create scripts that can be run in batch mode, allowing the user to analyze large quantities of data automatically, with or without user intervention, and to store measurements in a database for off-line statistical data analysis using such packages as SPSS. PRAAT scripts can be programmed from scratch, or the user can build upon a basic script that is generated by the PRAAT macro-recorder. PRAAT keeps a log of any button pressed or keystroke entered during the interactive session. At any moment the session's history can be loaded into a text editor and used as a starting point for a program.
Using the programming tool, the user can extend PRAAT any way he likes, defining new functions and making these easily accessible in the PRAAT user interface as optional buttons. Any user with a basic grasp of computer programming will be able to construct PRAAT scripts. The PRAAT interactive manual provides lots of sample scripts to give the novice a basic feel of how to go about generating scripts.
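Hooking a script into the user interface as a new button takes a single command; a minimal sketch (the menu position, command name, and script path are hypothetical):

# register a script as a new command in the Objects window (sketch)
Add menu command... Objects New "Create demo tone..." "" 0 /home/me/demoTone.praat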
PRAAT as a sound generator and teaching tool
For the teaching of basic acoustics, often a tough subject for undergraduate language students with a non-technical background, PRAAT provides a complex tone generator with very limited possibilities. True, PRAAT also allows the user to define any waveform by typing in and/or editing full formulae such as

1/2 * sin(2*pi*377*x) + randomGauss(0,0.1)

which generates a 377-Hz sine wave with some white noise superimposed, but this is not an option for the beginner.
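For reference, the formula above would be entered in full as something like the following (the object name, duration, and sampling frequency are illustrative):

# a 377-Hz tone with gaussian white noise, from a formula (sketch)
Create Sound... noisy377 0 1 22050 1/2 * sin(2*pi*377*x) + randomGauss(0,0.1)
Play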
It would therefore be more fun if the simple tone generator could be extended such that the user could interactively set and adjust the fundamental and the intensities of a number of harmonics in a spectral display, observe the effects of the spectral adjustments in a waveform display, and listen to it, all at the same time. Conversely, it would be ideal if some pre-stored or external sound (either from a tape recording or from a live microphone) could be simultaneously displayed in an on-line fashion as a waveform and as a spectrum. Older (UNIX) versions of the GIPOS speech processing package developed at the former Institute of Perception Research at the Technical University of Eindhoven, The Netherlands, contained such a facility as a goody, but it is no longer included with the Windows edition of GIPOS.
Especially amusing and instructive is the function
for generating Shepard tone spirals. This is a complex
tone signal with a pitch that seems to be continually
rising, without getting anywhere.
Conclusion
In summary, PRAAT is a formidable research and teaching tool for phonetics. This report has not done justice to its makers in two respects: first, it singled out only a small part of PRAAT's many possibilities, and second, it put undue emphasis on things PRAAT cannot (yet) do. I end this review by restating that PRAAT is unrivalled as a general-purpose speech analysis tool.