Speak and unSpeak with Praat
By Paul Boersma and Vincent van Heuven
By the Goodies Editor, Rob Goedemans
Many linguists use recorded speech in their research.
In descriptive work, visual representations of such
recordings (mostly oscillograms) are often annotated
with IPA symbols and other labels, and then used to
illustrate a phenomenon or defend a certain position
regarding the nature of some phonetic or phonological
property of the language in question. In phonetic
and psychophysical research some parameter of the
recorded speech (like tempo or intensity) is often
altered, after which the new sound thus obtained is
used in an experiment to test the sensitivity of the
human ear, or brain, to certain speech properties.
The introduction of the computer has brought
about a virtual revolution in the linguistic sciences
with respect to the usage of speech recordings. A lab
full of cumbersome machinery has now been replaced
by one PC, Mac or workstation, on which anyone who
puts his mind to it can record, annotate and modify
speech with some simple commands or a few
mouseclicks. Even the calculation of some speech
parameters that were rather complicated to obtain in
the past (like pitch and spectral analysis), but
frequently used in phonetic research nonetheless, is
now often just one or two mouseclicks away.
As a result, a growing number of colleagues find
use for files with speech sounds in their linguistic
explorations. The needs of this group are served by a
rather small number of software packages designed
for the representation, annotation and analysis of
speech (and much more in many cases). In my
opinion, one of these stands out in many ways. It is
called ``Praat'', the imperative form of `to speak' in
Dutch. Since this package is rapidly gaining in
popularity, we have decided to devote some attention
to it in this issue. First, one of the authors, Paul
Boersma, introduces the package and outlines its
impressive functionality. Then, an experienced user,
Vincent van Heuven, highlights some of the advantages
and disadvantages of using Praat in everyday
phonetic research.
It is my sincere hope that these two goodies will
convince even more linguists to download Praat
and experiment with it a little. They will see that
incorporating, for example, some oscillograms of
minimal pairs in their work is as easy as ABC. Their
publications will undoubtedly be the better (and the
livelier) for it.
Praat, a system for doing phonetics by computer
By Paul Boersma
Praat is a computer program for analysing,
synthesizing, and manipulating speech. It has been
developed since 1992 by Paul Boersma and David
Weenink at the Institute of Phonetic Sciences of the
University of Amsterdam. There are versions for most
of the common operating systems: Macintosh, Windows,
Linux, and several Unix workstations (Solaris,
Silicon Graphics, Hewlett-Packard). By September
2001, there were more than 5,000 registered users in
99 countries.
1. Analysing speech with Praat
Praat allows you to record a sound with your
microphone or any other audio input device, or to
read a sound from a sound file on disk. You will then
be able to have a look `inside' this sound. The upper
half of the sound window (see figure 1) will show you
a visible representation of the sound (the wave form).
The lower half will show you several acoustic analyses:
the spectrogram (a representation of the amount of
high and low frequencies available in the signal) is
painted in shades of grey; the pitch contour (the
frequency of periodicity) is drawn as a cyan curve;
and formant contours (the main constituents of the
spectrogram) are plotted as red dots.
Praat is most often used with speech sounds, in
which case the pitch contour is associated with the
vibration of the vocal folds and the formant contours
are associated with resonances in the vocal tract. But
the use of Praat is certainly not limited to speech
sounds: musicians and bio-acousticians use it for the
analysis of sounds produced by flutes, drums, crickets,
or whales, and the interpretation of the three
analyses will change accordingly.
Paul Boersma, Institute of Phonetic Sciences, University of
Amsterdam, Herengracht 338, 1016 CG Amsterdam,
The Netherlands
Vincent van Heuven, Universiteit Leiden Centre for Linguistics
(ULCL), P.O. Box 9515, 2300 RA Leiden, The Netherlands
The Sound window allows you to zoom in for
more detail, to scroll to the places that you are
interested in, to set a time cursor or select a time
stretch, and to listen to the parts of the sound that
you are viewing or selecting. You can easily query all
the important properties of the analyses, e.g. obtain
the average pitch value inside the selected time
stretch. You can turn the analyses into separate
objects (independent from the original sound), which
is handy for further processing, e.g. it allows the
pitch contour to be saved, printed, or converted into
something else.
2. Annotating speech with Praat
Praat is used by many linguists (phoneticians,
phonologists, syntacticians) to label and segment their
speech recordings. You can make transcriptions and
annotations on multiple levels simultaneously (see the
three levels in figure 2), in a window that typically
also shows visible representations of the sound, the
spectrogram, and perhaps the pitch contour. Praat
makes it easy to use special symbols in annotations,
including nearly all symbols defined by the
International Phonetic Association (such as the h
symbol, typed as ``\te'', in figure 2).
Figure 1. Praat's sound window.
Figure 2. Praat's annotation window.
3. Synthesizing speech with Praat
Praat is not a text-to-speech system: you cannot
type in an English sentence and have the program
read it aloud. But you can generate many types of
sounds with Praat. First, you can use formulas to
generate simple sounds like sine waves or white
noise from scratch, or to generate more complicated
sounds from other sounds. Second, you can create
sounds from other types of data, e.g. you can turn a
pitch contour into a pulse train. Third, you can do
source-filter synthesis: from stylized pitch, intensity,
and formant contours that you can build from
scratch, you can create speech-like sounds. Fourth,
you can perform articulatory synthesis: from a
specification of timed muscle contractions, Praat will
compute the resulting sound. Fifth, you can create
sounds from other sounds by a variety of filtering
and enhancement techniques.
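The first two of these routes can be illustrated outside Praat as well. The following Python sketch is not Praat code and makes no claim about how Praat works internally; it only mimics the idea of evaluating a formula at every sample time, and of placing a pulse each time a pitch contour has completed one more cycle. The names `synthesize` and `pulse_times`, the 440-Hz tone, the sampling rate, and the step size are all assumptions chosen for illustration.

```python
import math
import random

def synthesize(formula, duration=0.5, sample_rate=44100):
    """Evaluate formula(t) at every sample time t (in seconds)."""
    n_samples = int(round(duration * sample_rate))
    return [formula(i / sample_rate) for i in range(n_samples)]

def sine(t):
    """A 440-Hz sine wave at half of full amplitude."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

rng = random.Random(1)  # fixed seed so that the noise is reproducible

def noisy_sine(t):
    """The same sine wave with Gaussian white noise added."""
    return sine(t) + rng.gauss(0.0, 0.1)

def pulse_times(pitch, duration, step=1e-4):
    """Turn a pitch contour (a function of time, in Hz) into pulse times.

    The phase is accumulated in small time steps; a pulse is placed
    whenever another full cycle has been completed.
    """
    times = []
    phase = 0.0
    for i in range(int(round(duration / step))):
        phase += pitch(i * step) * step
        if phase >= 1.0:
            phase -= 1.0
            times.append((i + 1) * step)
    return times

tone = synthesize(sine)
noisy = synthesize(noisy_sine)
pulses = pulse_times(lambda t: 100.0, duration=1.0)  # flat 100-Hz contour
```

With a flat 100-Hz contour the pulses come out roughly 10 ms apart; a rising or falling contour would simply be passed in as a different function.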
4. Manipulating speech with Praat
A specialized manipulation window allows you to
stylize and modify the pitch contour of an utterance.
In figure 3, the impatient-sounding question ``can you
time it?'', with an original final high rise, has been
converted into a slightly whining command. The
same window allows you to modify relative durations
within this utterance. In this way, you can change the
intonation and stress patterns of the utterance, which
is useful when creating stimuli for research into the
perception of prosody.
5. Graphical capabilities of Praat
Praat comes with a separate Picture window into
which you can draw your sounds, pitch contours,
spectrograms, and any other data types. You can add
text (several fonts, many special symbols, several
sizes, any rotation), lines (several colours, any widths,
several styles), circles/ellipses/rectangles (filled or
outlined), and several types of markers along and
inside your drawings. Figure 4, for instance, shows
the modified pitch contour of figure 3, with appropriate
vertical text to its left, and a comment added
above it.
The Picture window is designed for producing
publication-quality graphics for your articles and
dissertations. From this window, you can print to
any printer (PostScript, Macintosh, Windows) and
save your drawings as EPS files (best quality, but
works with PostScript printers and PDF creators
only), WMF files (Windows), or PICT files (Macintosh).
All of these can be easily imported into your
word processor. The Macintosh and Windows versions
support the graphical clipboard as well, so that
you can use simple copy-and-paste to move Praat
pictures to your word processor, if you have no use
for PostScript quality.
Figure 3. Praat's manipulation window.
6. The Praat scripting language
In most parts of the world, slavery was abolished in
the 19th century, but it is not unusual to see
phoneticians measure the pitch values of 1500
vowels by hand. You would probably want to
replace such work with an automated procedure that,
say, loops over all the sound files that reside in a
certain directory or over all the segments marked
``u'' in a vowel annotation. Such things can easily be
performed with the Praat scripting language, which
is a general-purpose programming language with
special capabilities for simulating menu choices and
button presses in the Praat program. Many people
use this language for all their analyses, tabulations,
statistics (there are special functions for computing
levels of significance in t, χ², or F tests), and
complicated pictures. In fact, you can use Praat
as a general drawing program: figure 4 shows a
script that draws the complicated figure at
the top of the Picture window.
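The kind of batch procedure described here, looping over all the sound files in a directory and tabulating one measurement per file, can be sketched in a few lines. The example below is written in Python rather than in the Praat scripting language, and it measures only duration and RMS amplitude rather than pitch; the function names, the restriction to 16-bit mono WAV files, and the synthetic demonstration file are assumptions made for the sake of a self-contained illustration.

```python
import math
import os
import struct
import tempfile
import wave

def measure_file(path):
    """Return (duration in seconds, RMS amplitude 0..1) of a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono files")
        n_frames = wav.getnframes()
        rate = wav.getframerate()
        raw = wav.readframes(n_frames)
    samples = struct.unpack("<%dh" % n_frames, raw)
    if n_frames == 0:
        return 0.0, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / n_frames)
    return n_frames / rate, rms / 32768.0

def measure_directory(directory):
    """Loop over all .wav files in a directory and collect one row per file."""
    rows = []
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(".wav"):
            duration, rms = measure_file(os.path.join(directory, name))
            rows.append((name, duration, rms))
    return rows

# Demonstration on a throw-away directory holding one synthetic recording:
# one second of a constant sample value at half of full scale.
demo_dir = tempfile.mkdtemp()
with wave.open(os.path.join(demo_dir, "vowel_u.wav"), "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(struct.pack("<8000h", *([16384] * 8000)))

rows = measure_directory(demo_dir)
for name, duration, rms in rows:
    # Tab-separated output can be read directly by a statistics package.
    print("%s\t%.3f\t%.3f" % (name, duration, rms))
```

Replacing the body of `measure_file` with a pitch measurement would give the automated counterpart of the 1500 hand measurements mentioned above.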
7. Other features of Praat
The Praat program contains several possibilities in
areas that are only remotely connected to phonetics.
Phonologists and syntacticians like its implementation
of Optimality-Theoretic learning (constraint
demotion, gradual learning algorithm, robust
interpretive parsing), which you can apply to your
own cases. Other possibilities include neural-net
modelling and extensive high-level statistics
(principal-component analysis, discriminant analysis,
multidimensional scaling).
Figure 4. Praat's script and picture windows.
Figure 5. Praat's manual window.
8. The Praat manual
Praat comes with an extensive tutorial, which you
can start by choosing ``Introduction to Praat'' from
the ``Help'' menu. The entire reference manual is
contained in the program as well and consists of
about 800 pages that are connected via hyperlinks (see
figure 5). Help buttons are available in most windows
and dialog boxes, and clicking them will take you into
the part of the manual that is most appropriate in the
current context.
9. Why Praat?
You will want to choose Praat for most of your
phonetic research not only because it is the most
complete program available (it contains much more
than could be discussed here), or because it is
distributed for free, but also because it comes with
the finest algorithms. The pitch analysis algorithm is
the most accurate in the world; the articulatory
synthesis is the only one that can handle dynamic
length changes (ejectives), non-glottal myoelastics
(trills), and sucking effects (clicks, implosives); and
the gradual learning algorithm is the only
linguistically-oriented learning algorithm that can handle free
variation. But of course, there will always be things
related to phonetics that other programs are better at.
For your convenience, Praat has therefore been
designed to interface reasonably well with Matlab,
SPSS, Excel, and the Klatt synthesizer.
10. How to get the Praat program
You can get the Praat program through its web site.
By writing an e-mail message to the
first author, you obtain a free licence to download all
current and future versions of the program, install as
many copies as you like on as many computers as you
like, and use the program for any legal purpose at
your work, at home, and in the field. You will also be
informed about major updates of the program, which
appear approximately twice a year. The source code
of the Praat program is distributed under the
General Public Licence.
A user's comments on Praat
By Vincent J. van Heuven
Praat is probably the most comprehensive toolbox
for phonetic research available worldwide, and it is
certainly the most affordable; it actually costs no
money at all. In fact, it is so diverse that I have never
met anyone, apart from its authors, who could
claim to have experience with all the modules that the
program contains. I for one will have to limit the
present appraisal to just those few modules that my
co-workers and I have used in our laboratory. Moreover,
Praat rejuvenates at an alarming rate. The
release that I am currently using is version 3.9.36
running on the Windows NT platform.
Praat started out as a collection of programs that
were specifically designed to produce top-quality
graphic representations of speech, i.e. oscillograms,
spectra, spectrograms, fundamental frequency and
intensity plots, etc. However, the flexible and
well-planned structure of the program allowed its maker(s)
to extend Praat's functionality almost indefinitely.
Often, the same tasks can be done by Praat in
different modules with different algorithms. Pitch
extraction, for example, can be done with the aid of at
least four different algorithms: autocorrelation,
cross-correlation, SPINET, and subharmonic summation.
Help files are available for each of the algorithms,
explaining the meaning of the many parameter values
that can be specified for each algorithm and
providing references to the literature. Each algorithm
comes with a set of default parameter settings that can
be overridden by the user. Also, there is an unmarked
algorithm (which turns out to be an autocorrelation
technique) that allows no special tuning.
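To give the reader an idea of what an autocorrelation technique amounts to, here is a bare-bones sketch in Python. It is emphatically not Praat's algorithm, which is considerably more refined; it merely searches for the time lag at which a signal best resembles a shifted copy of itself and reports the corresponding frequency. The function name, the default pitch range of 75 to 600 Hz, and the synthetic test signal are assumptions made for illustration.

```python
import math

def estimate_pitch(samples, sample_rate, min_pitch=75.0, max_pitch=600.0):
    """Return the pitch (Hz) at the lag with the highest autocorrelation.

    Returns None for silence or when no positive correlation is found.
    """
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]
    energy = sum(v * v for v in centred)
    if energy == 0.0:
        return None  # silence: there is no periodicity to find
    # Candidate lags (in samples) that correspond to the allowed pitch range.
    min_lag = max(1, int(sample_rate / max_pitch))
    max_lag = min(int(sample_rate / min_pitch), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # The sum runs over fewer terms at longer lags, which mildly
        # favours the shortest period over its multiples.
        r = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag is not None else None

# A 50-ms test signal: a 200-Hz fundamental plus a weaker second harmonic.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * i / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 400 * i / sample_rate)
          for i in range(400)]
pitch = estimate_pitch(signal, sample_rate)
```

The pitch floor and ceiling in this sketch play the same role as the parameter settings mentioned above: an ill-chosen range can make even a clean signal come out an octave too high or too low.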
In all, there would seem to be very little that Praat
cannot do for you. However, while some things can be done
instantaneously, other tasks can be performed only in
non-obvious ways that the novice user will never
discover by himself. Fortunately, the makers of
Praat take great pride in their product, and are
willing to answer queries from the floor 24 hours a
day, or so it seems, again at no cost.
It should be pointed out that Praat is not a
self-study course in experimental/instrumental phonetics.
To be sure, a detailed on-line technical reference
manual is included with the program, but it generally
does not discuss the pros and cons of alternative
approaches/solutions to speech analysis problems.
The user must decide on his own which algorithm
will suit his purposes best. In this respect Praat is
not unlike the magic broom that takes off with the
sorcerer's apprentice. The general advice would be:
do not try this at home, and always consult your local
Multipanel editors
A recent development seems to have been toward
providing smorgasbord-like complex presentations
which display speech parameters as a function of
time in multiple synchronized panels. Two such
complex editors are provided.
1. The first is the basic waveform editor (which is
invoked by a Sound object), which can be tailored to
the user's taste. It allows for simultaneous display of
the waveform, spectrogram, formant tracks (in red), a
pitch curve (blue) and an intensity curve (yellow), all
superimposed on the spectrogram. Each of the five
displays can be switched on/off, scales can be
adjusted for optimal visual resolution, there is a
(limited) choice of algorithms that can be invoked for
each display, and parameter settings can be chosen
independently for each display. Values can be
eyeballed and read out under cursor control; digital
readouts can be obtained through data queries. The
edit functions allow cut, copy and paste, zero, and
time-reverse. The parameter tracks can be extracted
from each display and stored separately.
2. The second is the editor that is used for
Manipulation objects. The waveform is displayed
together with a pitch track (default pitch
determination algorithm) and a relative duration parameter. In
the waveform the moments of glottal closure are
indicated by vertical blue lines. The corresponding
pitch-synchronous frequency value is displayed in
light gray in the pitch manipulation display.
Presence/absence and location of glottal pulses can be
manipulated. Also the user can stylize the pitch curve
and/or change the pitch curve in any way he wants.
Similarly, time intervals can be selected and given
different relative durations. This allows portions of
the utterance to be stretched or compressed in time.
After manipulation the sound can be resynthesized
using either of two analysis-resynthesis schemes:
a. PSOLA resynthesis: a relatively simple waveform
manipulation technique that affords the manipulation
of pitch and duration but detracts very little from the
original sound quality.
b. LPC resynthesis: a statistical data reduction
technique that generally leads to considerable loss of
sound quality but affords, in principle, the
manipulation not only of prosodic parameters (pitch
and duration) but also of spectral parameters (sound
quality or timbre). Unfortunately, the display and
manipulation (smoothing, stylization, frequency
shift) of spectral parameters (formant tracks) is not
implemented in the manipulation editor, nor are
these functions easily available elsewhere in the
Praat program.
It should be doable, in principle, to reduce the two
editors to just one generalized editor that allows the
display, interactive measurement and manipulation
of all the relevant properties of the speech signal. The
manipulable properties should include the intensity
curve. This parameter is currently displayed in the
waveform editor (optionally) but cannot be manipulated.
Additional displays
Cochleagrams. Hidden further down the hierarchy of
Praat functions are the possibilities to create
auditory spectrograms (or cochleagrams). As an option
with the cochleagram the loudness (expressed in
sones) of a time-slice can be queried. It is not possible,
in its present state, to instruct Praat to produce a
loudness trace as a function of time (although the user
can generate and print such a contour, using the
built-in programming language).
Vowel diagrams. It is also possible to plot a vowel,
or even a series of vowels, as points in a vowel
diagram, i.e. a two-dimensional graph plotting the
first formant frequency F1 against the second formant
frequency F2. Optionally, dispersion ellipses can
be drawn around the scatter clouds of vowel points
in the F1-by-F2 display. Such plotting facilities are
also provided by the (expensive) Kay Computerized
Speech Lab (CSL) package. Using the
annotation tools incorporated in Praat, beautiful
print-quality vowel diagrams can be produced. For
teaching purposes it would be attractive if this
display could also be used as part of a user interface
to generate vowel sounds by moving the cursor
around in the display (interactively or from
predefined custom-made trajectories), using LPC synthesis.
The authors at one time promised that this
facility would be made available but I have not seen
it (yet). As far as I know, there is no interactive
software around that can do this sort of vowel
synthesis (although the Vowel Hunter program
developed at the Phonetics Laboratory of Bonn
University, Germany, comes close). Also, there is
the talking vowel diagram provided on the Speech
Production and Perception I CD-ROM issued by
Sensimetrics. However, this product does not provide
for on-line vowel synthesis; it just plays a fairly
small number of pre-stored vowel waveforms.
Scripting language
Praat comes with a full programming language
which can be used to create scripts that can be run in
batch mode, allowing the user to analyze large
quantities of data automatically, with or without
user intervention, and to store measurements in a
database for off-line statistical data analysis using such
packages as SPSS. Praat scripts can be programmed
from scratch or the user can build upon a basic script
that is generated by Praat itself, which
keeps a log of any button pressed or keystroke
entered during the interactive session. At any moment
the session's history can be loaded into a text editor
and used as a starting point for a program.
Using the programming tool, the user can extend
Praat in any way he likes, defining new functions and
making these easily accessible in the Praat
interface as optional buttons. Any user with a basic
grasp of computer programming will be able to
construct Praat scripts. The Praat interactive manual
provides lots of sample scripts to give the novice a
basic feel of how to go about generating scripts.
Praat as a sound generator and teaching tool
For the teaching of basic acoustics (often a tough
subject for undergraduate language students with a
non-technical background) Praat provides a complex
tone generator with very limited possibilities. To
be sure, Praat also allows the user to define any
waveform by typing in and/or editing full formulae
such as:
1/2 * sin (2*pi*377*x) + randomGauss (0, 0.1),
which generates a 377-Hz sine wave with some white
noise superimposed, but this is not an option for the
beginner. It would therefore be more fun if the simple
tone generator could be extended such that the user
could interactively set and adjust the fundamental
and the intensities of a number of harmonics in a
spectral display, observe the effects of the spectral
adjustments in a waveform display and listen to it, all
at the same time. Conversely, it would be ideal if
some pre-stored or external sound (either from a tape
recording or from a live microphone) could be
simultaneously displayed in an on-line fashion as a
waveform and as a spectrum. Older (UNIX) versions
of the GIPOS speech processing package developed at
the former Institute of Perception Research at the
Technical University of Eindhoven, The Netherlands,
contained such a facility as a goody, but it is no longer
included with the Windows edition of GIPOS.
Especially amusing and instructive is the function
for generating Shepard tone spirals. This is a complex
tone signal with a pitch that seems to be continually
rising, without getting anywhere.
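For readers who wonder how such a tone can keep rising without getting anywhere, the principle can be sketched as follows. The Python code below is not taken from Praat and is only one common way of building the illusion: a stack of sine components spaced an octave apart all glide upward together, while a fixed bell-shaped loudness curve fades each component in at the bottom of the range and out at the top. After one octave of glide the stack is indistinguishable from where it started, so the cycle can be repeated endlessly. The function name, the 55-Hz base frequency, and the number of components are assumptions made for illustration.

```python
import math

def shepard_cycle(duration=1.0, sample_rate=8000, base=55.0, octaves=6):
    """Return one cycle of a rising Shepard tone as a list of samples."""
    n = int(duration * sample_rate)
    phases = [0.0] * octaves
    samples = []
    for i in range(n):
        shift = i / n  # fraction of an octave climbed so far (0 <= shift < 1)
        value = 0.0
        for k in range(octaves):
            # Position of this component within the whole range (0..1).
            position = (k + shift) / octaves
            freq = base * 2.0 ** (k + shift)
            # Raised-cosine loudness curve: silent at both edges of the
            # range, loudest in the middle.
            amp = 0.5 * (1.0 - math.cos(2.0 * math.pi * position))
            # Accumulate phase so that the glide stays free of clicks.
            phases[k] += 2.0 * math.pi * freq / sample_rate
            value += amp * math.sin(phases[k])
        samples.append(value / octaves)
    return samples

cycle = shepard_cycle()
```

Playing several such cycles back to back gives the impression of an endlessly rising pitch; with these settings the highest component stays below half the sampling rate.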
In summary, Praat is a formidable research and
teaching tool for phonetics. This report has not done
justice to its makers in two respects: first, it singled out
only a small part of Praat's many possibilities, and
second, it put undue emphasis on things Praat cannot
(yet) do. I end this review by reiterating that Praat is
unrivalled as a general purpose speech analysis tool.
