Acoustic evidence for the emergence of tonal contrast in contemporary Korean


Abstract

Acoustic evidence suggests that contemporary Seoul Korean may be developing a tonal system, which is arising in the context of a nearly completed change in how speakers use voice onset time (VOT) to mark the language's distinction among tense, lax and aspirated stops. Data from 36 native speakers of varying ages indicate that while VOT for tense stops has not changed since the 1960s, VOT differences between lax and aspirated stops have decreased, in some cases to the point of complete overlap. Concurrently, the mean F0 for words beginning with lax stops is significantly lower than the mean F0 for comparable words beginning with tense or aspirated stops. Hence the underlying contrast between lax and aspirated stops is maintained by younger speakers, but is phonetically manifested in terms of differentiated tonal melodies: laryngeally unmarked (lax) stops trigger the introduction of a default L tone, while laryngeally marked stops (aspirated and tense) introduce H, triggered by a feature specification for [stiff].
1 Introduction
Despite long-standing descriptions of standard Korean as a non-tonal and
non-accentual language (Martin 1992: 60, Sohn 1999: 48, Lee & Ramsey
2000: 315), the data analysed for this study indicate that some self-identified
speakers of standard (Seoul) Korean employ a speech-production strategy
that includes a tonal contrast. Moreover, this tonal system has arisen in the
context of a diachronic shift in Korean’s three-member contrast among
tense, lax and aspirated obstruents: as reported by several researchers (Jun
1993, Kim 2000, Kim & Park 2001), many speakers of the standard
language – more specifically, younger speakers (Silva 2006) – no longer
mark the contrast between lax and aspirated stops in phrase-initial po-
sition with differences in voice onset time (VOT). Rather, some speakers
of Korean appear to be implementing surface-level phonation-to-tone
shifts akin to those reported for languages such as Tibetan (Duanmu 1992)
and Kammu (Svantesson & House 2006), languages in which historical/
underlying laryngeal distinctions are reinterpreted as systematic differ-
ences in fundamental frequency (F0).
As some younger speakers have
neutralised VOT differences between lax and aspirated stop phonemes
and mark this contrast tonally, formal accounts of the Korean obstruent
system need revision in the direction of feature representations that ad-
equately account for the phonetic behaviour of all speakers of the standard
variety, young and old alike. The modification proposed here involves
replacing underlying glottal aperture features (specifically [spread glottis]
and [constricted glottis]) with a more abstract laryngeal ‘tensity’ feature
(à la Kim 1965), [stiff]. This single feature has the advantage of
generalisability across the speaker pool: [stiff] may be phonetically
implemented in either a more traditional way, i.e. maintaining VOT dis-
tinctions between lax and aspirated stops, or a more innovative manner,
i.e. backgrounding aspiration differences in favour of a tone-based
strategy for marking the underlying lax vs. aspirated distinction.
2 Background
Korean is unusual among the major languages of East Asia in at least two
ways. First, the consonant system of Korean includes a typologically rare
(if not unique) distinction among three types of voiceless obstruents: lax
(or plain), which Sohn (1999: 154) describes as ‘basically voiceless, with
only a minor degree of aspiration and no tenseness’; tense (or reinforced,
geminated), characterised by ‘building up air pressure behind the closed
place of articulation’ and an unaspirated release (Sohn 1999: 154); and
aspirated, produced ‘with strong aspiration lasting about 100 ms’ (Lee &
Ramsey 2000: 62).
Second, most varieties of Korean do not employ F0
lexically. While both earlier stages of Korean and contemporary varieties of
the language have been shown to employ pitch-based accentual systems,
288 David J. Silva
the majority of Koreans speak a variety in which F0 plays no phonologically
contrastive role at the word level (Sohn 1999: 60–62, Lee & Ramsey
2000: 280, 315).
As documented in the current study, however, the phonetic im-
plementation of the phonemic contrast among lax, tense and aspirated
segments has been changing over the past two generations. According to
widely cited phonetic accounts of Korean written in the 1960s and 1970s
(Han & Weitzman 1965, 1967, Kim 1965, Hardcastle 1973, Kagaya 1974),
a key acoustic correlate associated with each stop type is voice onset time:
tense stops manifest short VOTs (in the range of 6 to 18 ms), aspirated
stops manifest long VOTs (~100–115 ms), and lax stops manifest inter-
mediate VOT values (~20–60 ms). Yet more recent acoustic studies of
the language (Silva 1992, Kim 1995, Cho 1996, Han 1996, Flege et al.
1999) have reported noticeably different VOT values for the lax and
aspirated stops; specifically, lax stops are more aspirated (~40–70 ms),
while aspirated stops are less so (~85–105 ms).
The first published mention of a possible diachronic change in VOT
values is a passing mention in Silva (1992: 157). Indirect evidence for
such a change for VOT was presented in Silva (2002), a meta-analysis of
relevant phonetic studies published between 1965 and 2000. The con-
clusions drawn from the meta-analysis led directly to the development of
the study being reported here. Preliminary accounts of the acoustic data
reported in this work first appeared in Silva et al. (2004), an initial de-
scription of observed age-related VOT changes for 14 female speakers in
the current subject pool, which verified the existence of a robust age-
related reduction of VOT values associated with the aspirated stops of the
language (VOT_asp). Silva (2006) provides an expanded phonetic account
of VOT behaviours for an additional 20 speakers, in which it is reported
that an observed apparent-time decrease in VOT_asp, coupled with a less
clear-cut rise in VOT_lax and a lack of any change for VOT_tense, points to a
pattern whereby the VOT difference between lax and aspirated stops
has decreased over time. Indeed, for many younger speakers, particularly
those born during the 1970s and 1980s, any VOT-based distinction between
lax and aspirated stops has been completely neutralised. If VOT
no longer serves to distinguish lax from aspirated stops (at least for
some speakers), is there evidence of a phonemic merger? As the data
analysed here suggest, the answer is ‘no’: rather, contrasts previously
marked primarily by differences in VOT have now been complicated
by the addition of a tonal dimension, which, for some speakers, has
assumed a primary role in distinguishing lax stops from their aspirated
counterparts.
3 Methodology
3.1 Materials
The materials developed for this study consist of experimentally con-
trolled frame sentences containing nine different target forms, each of
which is a three-syllable lexical item beginning with a ‘C-a’ sequence
(Table I). The frame took the shape i ken ___-i-la-ko ha-cio ‘This thing is
called (a) ___’, where the left edge of the target corresponds to the left edge
of a phonological phrase, as discussed in Silva (1992) and Yu-Cho (1990)
(and roughly consistent with an accentual phrase as characterised by Jun
1993). This frame is consistent with that used in previous research (e.g.
Silva 1992), particularly in that it locates the target forms in ‘initial po-
sition’ (a term used without further elaboration by the earliest researchers
on this topic, e.g. Han & Weitzman 1965, 1967). The use of a topic-marked
pronominal immediately to the left of the target form, i ken (< i kes-eun < i
‘this’, kes ‘thing’, -eun TOPIC), further ensures the presence of a phrasal
boundary immediately before the target. As has been amply demonstrated
in the literature on Korean phonetics and phonology, the prosodic po-
sition of a consonant segment plays a critical role in how that segment will
be realised on the surface. In phrase-initial position, for example, the
word-initial stops of the target items are expected to be realised in their
strongest (i.e. least lenited) variants: tense segments should manifest
noticeably long closures with very short VOTs, lax segments should manifest
moderate VOTs and aspirated segments should manifest very long
VOTs (Silva 1992).
Each sentence (e.g. i ken panulcililako hacio ‘this is called needlework’)
was printed in Korean script on an individual index card. Subjects were
instructed to read the cards in random order, along with other cards
functioning as distractors (items to be used in research not related to the
current study), at a self-selected rate of speech, described as ‘normal,
comfortable reading speed’. After the subject had read each of the
randomly presented sentences, the cards were shuffled and the reading process
repeated. For this study, data from either three or four rounds have been
analysed, thereby yielding a total of 27 to 36 tokens for each subject.
Table I
Target words employed in the study. All forms are given in Yale Romanisation.
Orthographic syllable boundaries are indicated by a period.
[Table body not recoverable from the source. Columns: place of articulation
(labial, alveolar, velar); rows: phonation type (lax (plain), tense, aspirated).
Surviving glosses: ‘sewing, needlework’, ‘multiple cropping’, ‘grinding sound’,
‘far away time/place’, ‘scallion salad’, ‘alien place’.]
3.2 Subjects
The data for this study were elicited from 36 adult native speakers of
Korean residing in the area of Dallas-Fort Worth, Texas, 21 females and
15 males, born between 1943 and 1982. In response to a demographic
questionnaire, all subjects reported that they were born, raised and edu-
cated in the capital region of Korea (i.e. Seoul city or Gyeonggi province),
and that they speak the standard variety. All subjects entered the United
States after the age of 18 and use Korean at home, speaking English as a
second language (primarily for business or educational purposes). No
subjects reported any difficulties in speech or hearing.
Subjects were recruited through a social networking approach and in-
vited to participate in a 20–30 minute recording session in a quiet location
of the subject’s choosing. Each session was recorded by one of two female
native speakers of Korean using a standard cassette audiotape and a lapel
microphone. The recordings were subsequently digitised (22,050 Hz,
16 bit) for acoustic analysis in Praat (version 4.2.14).
3.3 Data measurement
For each target word in the corpus, seven measurements were taken: VOT
of the word-initial stop, the F0 of the vowel in the first syllable at three
points (onset, midpoint and offset), and the averaged F0 of the vowels in
each of the target form’s three syllables. Averaged F0 values were obtained
by selecting the span of the vowel in acoustic display and allowing Praat to
automatically calculate the desired value. While employing this sort of
summary measure obscures F0 contours that might be associated with a
particular vowel, it successfully captures a more general sense of the vowel’s
pitch in relation to (a) the other two vowels in the same target word and (b)
corresponding syllables (i.e. first–second–third) in other target forms.
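As a rough illustration of the summary measure just described, the sketch below averages F0 over the voiced frames that fall inside a selected vowel span. It is not Praat's own algorithm: the study obtained its values directly from Praat, and the pitch-track format, the function name `mean_f0` and the sample values here are assumptions made for illustration only.

```python
from statistics import mean

def mean_f0(pitch_track, start, end):
    """Average F0 (Hz) over voiced frames whose time lies in [start, end].

    pitch_track is a list of (time_s, f0_hz) pairs; an F0 of 0.0 marks an
    unvoiced frame (an assumed convention, not something the paper specifies).
    Returns None when the span contains no voiced frames.
    """
    voiced = [f0 for t, f0 in pitch_track if start <= t <= end and f0 > 0]
    return mean(voiced) if voiced else None

# Hypothetical pitch track for one vowel: three voiced frames flanked by silence.
track = [(0.00, 0.0), (0.01, 0.0), (0.02, 180.0),
         (0.03, 184.0), (0.04, 188.0), (0.05, 0.0)]
vowel_f0 = mean_f0(track, 0.02, 0.04)
```

Because the function returns a single value per vowel, any rise or fall within the vowel is flattened, which is the trade-off the text acknowledges.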
4 Results
4.1 VOT patterns
As predicted on the basis of previous research, while VOT values for the
tense stops have remained stable over time, there are clearly discernable
age-based differences in VOT values for lax and aspirated stops. In Fig. 1,
which (for the sake of clarity) aggregates subjects into a series of nine
five-year bands (following Labov 1994: 60ff), we find that for younger speakers
the distances between mean VOT_lax and mean VOT_asp are smaller than
those observed for older speakers. This apparent change in behaviour is
most evident in the VOT data associated with the cohort of speakers born
in 1965 and after: for each of these bands, the difference between VOT_asp
and VOT_lax diminishes considerably, with corresponding overlap in the
VOT data ranges (presented here as one standard deviation on either side
of the mean).
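The aggregation into five-year bands can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the choice to summarise per-speaker means (rather than individual tokens) and the sample values are all hypothetical, and the paper does not say exactly how the standard deviations in Fig. 1 were computed.

```python
from collections import defaultdict
from statistics import mean, stdev

def band_start(year_of_birth, width=5):
    """Return the first year of the band containing year_of_birth (1967 -> 1965)."""
    return year_of_birth - year_of_birth % width

def summarise_by_band(speaker_means):
    """Map band start year -> (mean, SD) of per-speaker mean VOT values in ms.

    A band with a single speaker gets an SD of 0.0, since a sample standard
    deviation is undefined for one observation.
    """
    bands = defaultdict(list)
    for year, vot in speaker_means:
        bands[band_start(year)].append(vot)
    return {
        band: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for band, values in sorted(bands.items())
    }

# Hypothetical per-speaker mean VOT for aspirated stops: (year of birth, ms).
data = [(1946, 100.0), (1948, 110.0), (1967, 70.0), (1969, 80.0), (1981, 65.0)]
summary = summarise_by_band(data)
```

Running the same summary separately for tense, lax and aspirated stops would give the three series of band means that a plot of this kind compares.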
Figure 1
Mean VOT values and range (±1 standard deviation) over time. Each marker
represents the mean VOT value for speakers aggregated into five-year bands,
based on year of birth.
[x-axis: year of birth aggregated in five-year bands (1945–49 to 1980–);
y-axis: mean VOT ±1 SD (ms)]
The changing role of VOT as a means of distinguishing lax and aspirated
stops in Korean becomes all the more evident when one disaggregates
the data and calculates the mathematical difference between the
mean VOT values for lax and aspirated stops for each of the 36 subjects.
Since there is no natural pairing of individual tokens in the corpus,
extracting VOT differences may only be done by determining the mean
VOT_lax and mean VOT_asp for each speaker and then calculating
‘delta-VOT’: ΔVOT = mean VOT_asp − mean VOT_lax. In calculating ΔVOT,
we can better assess the relative (and putatively phonemically driven)
differences in VOT employed by each speaker, while accounting for the
fact that there are individual speaker differences in overall degree of
aspiration (Silva et al. 2004). As illustrated in Fig. 2, older speakers
consistently produce clear differences between mean VOT_lax and mean
VOT_asp (ΔVOT > 0 ms), while younger speakers produce noticeably
smaller differences between mean VOT_lax and mean VOT_asp (ΔVOT <
20 ms). In fitting a curve to the data, the best fit proved to be that for a
quadratic equation (R² = 0.686), indicating that the relationship between
ΔVOT and year of birth is non-linear (cf. the R² of 0.656 for a linear
regression). Perhaps most surprising is the finding that for a handful of
speakers ΔVOT is negative: lax stops appear to manifest longer voice
onset times than aspirated stops, a finding certainly at odds with widely
accepted descriptions of the language.
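The per-speaker difference and the comparison of linear and quadratic fits described above can be sketched in a few lines. This is an illustration only: the helper names (`delta_vot`, `solve`, `polyfit_r2`), the token labels and the year/ΔVOT values are hypothetical, the fit uses ordinary least squares via the normal equations, and nothing here reproduces the study's data or the software it used.

```python
from statistics import mean

def delta_vot(tokens):
    """tokens: list of (phonation, vot_ms). Returns mean VOT_asp - mean VOT_lax."""
    asp = [v for p, v in tokens if p == "asp"]
    lax = [v for p, v in tokens if p == "lax"]
    return mean(asp) - mean(lax)

def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def polyfit_r2(xs, ys, degree):
    """Least-squares polynomial fit of the given degree.

    Returns (coefficients, R^2). Coefficients run from low to high order and
    refer to x centred on its mean, which keeps the system well conditioned.
    """
    x0 = mean(xs)
    cx = [x - x0 for x in xs]
    k = degree + 1
    a = [[sum(x ** (i + j) for x in cx) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(cx, ys)) for i in range(k)]
    coef = solve(a, b)
    pred = [sum(c * x ** i for i, c in enumerate(coef)) for x in cx]
    ybar = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return coef, 1 - ss_res / ss_tot

# Hypothetical speakers: year of birth and delta-VOT in ms.
years = [1945, 1950, 1955, 1960, 1965, 1970, 1975, 1980]
dvots = [55.0, 50.0, 48.0, 40.0, 20.0, 10.0, 5.0, 2.0]
_, r2_linear = polyfit_r2(years, dvots, 1)
_, r2_quadratic = polyfit_r2(years, dvots, 2)
```

Since the linear model is nested inside the quadratic one, the quadratic R² can never be lower; what matters is how much it improves, which is why the text reports both values side by side.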
A repeated-measures ANOVA performed on the mean VOT values
for the 36 subjects further revealed that both place of articulation
(F(2, 66)=5.2, p=0.008) and phonation type (F(2, 66)=57.4, p<0.001)
are significant within-speaker effects, with year of birth functioning as a
significant covariate (F(1, 33)=10.9, p=0.002); speaker sex was not
significant (F(1, 33)=0.0, p=0.992). Moreover, there was a single significant
interaction: phonation × year of birth (F(2, 66)=20.9, p<0.001).
Figure 2
ΔVOT (VOT_asp − VOT_lax) as a function of subjects’ year of birth. The best-fit
curve is that representing a quadratic function (R² = 0.686).
[x-axis: subjects’ year of birth (1940–1980); y-axis: ΔVOT (ms)]
With respect to place of articulation effects, we find little new to report:
as the point of articulation moves toward the posterior region of the oral
cavity, VOT tends to increase (Laver 1994: 352, Ladefoged 2003: 98).
Although robust for the tense and the lax stops, this pattern is confounded
for the aspirated stops: while the mean VOT for /kʰ/ is the longest
(80.4 ms), the mean VOT for /pʰ/ (79.5 ms) is greater than that for /tʰ/
(73.5 ms). Scheffé post hoc tests reveal two homogeneous subsets, the first
of which includes the alveolar and labial places of articulation and the
second of which includes labial and velar.
With regard to phonation effects, VOT values associated with the tense
stops (/pp tt kk/) were significantly (and unsurprisingly) lower than those
associated with lax /p t k/ or aspirated /pʰ tʰ kʰ/ for all speakers in the
corpus, a result consistent with Silva’s (2002) meta-analysis of the VOT
literature. As concerns the mean VOT for lax and aspirated stops, how-
ever, the data revealed more complex relationships involving the speakers’
year of birth. A hierarchical cluster analysis on the VOT data revealed an
important divide among the subjects: speakers born before 1965 belonged to
one cluster, while those born in 1965 and after belonged to another. One
subject, a male born in 1970, patterned with the pre-1965 group.
Among the older speakers, the ‘traditionalists’, the mean VOTs associated
with tense, lax and aspirated stops were all significantly different
(p<0.05), with mean VOT_tense ranging from 3 to 18 ms, mean VOT_lax
ranging from 36 to 90 ms and mean VOT_asp ranging from 51 to 117 ms
(Fig. 3). For younger speakers, the ‘innovators’, the mean VOT associated
with tense and lax stops is nearly equal to that for the traditionalists, but
the mean VOT for aspirated stops is markedly lower (69.7 ms vs. 94.0 ms).
Moreover, a repeated-measures ANOVA run only on the innovators
(using place of articulation and phonation type as the independent factors
and year of birth as a covariate) indicates that despite the relatively lower
mean VOT_asp, phonation type remains a significant factor (p<0.001): as a
group, these younger speakers appear to use VOT differences to mark the
relevant phonemic distinction, but without the same degree of separation
in evidence for older speakers of the community.
In considering further the production of these younger speakers, however,
one cannot help but be drawn back to the data in Fig. 2 and wonder if
the story to be told is, in fact, more complicated still. For the innovator
cohort alone, a correlation between year of birth and ΔVOT yields an R² of
0.173, indicating no meaningful age-related pattern. Moreover, there
are five speakers for whom ΔVOT is negative, suggesting that for these
subjects lax stops are produced with more aspiration than phonemic
‘aspirated’ stops, a departure from established norms. Can these speakers
be differentiated from the others on extralinguistic grounds other than age?
Given the current corpus, with its limited degree of social stratification,
the answer at this point is a qualified ‘no’. With ΔVOT as the dependent
variable, a series of one-way ANOVAs for each of the demographic and
self-evaluation questions posed on a survey used in the fieldwork revealed
no differences in means that even approximated α=0.05; the smallest
p-value obtained was 0.303 (for the question ‘How important is good
pronunciation?’). At this point in our understanding of VOT variation,
then, we are left to explore elsewhere other factors that might possibly
differentiate those subjects who present positive ΔVOTs from those who
do not.
Figure 3
Mean VOT values (ms) of tense, lax and aspirated stops for traditionalists vs.
innovators. Error bars: ±1.00 SD.
[x-axis: generation group; y-axis: VOT (ms)]
Tonal contrast in contemporary Korean 295
The need for more carefully constructed and socially informed socio-
linguistic inquiry notwithstanding, a fundamental phonological question
remains to be addressed: if it is true that at least some of the younger
speakers in the community no longer use VOT to differentiate lax and
aspirated stops, how do they signal the underlying contrast? A potential
answer lies in an analysis of the corresponding F0 patterns.
4.2 Tonal melodies
Across the corpus, independent of a speaker’s age or sex, F0 patterns for
the target items are consistent: for words beginning with a lax stop, the F0
of the vowel in the first syllable is consistently lower than both (a) the
second and third syllables of the same target word and (b) the first syllable
of analogous words beginning with either a tense or aspirated stop (Fig. 4).
The robustness of these patterns is confirmed by a repeated-measures
ANOVA in which the dependent variable was the mean F0 for the vowel
in each syllable of the trisyllabic target forms. In this analysis, three
independent variables were found to be statistically significant: phonation
type of the word-initial stop (F(2, 33)=121.5, p<0.001), location of the
vowel in the target word (first, second or third) (F(2, 33)=11.2, p<0.001)
and speaker’s sex (F(1, 34)=260.0, p<0.001). In addition, three interactions
proved significant (each at p≤0.001): Phonation × Sex, Phonation ×
Syllable and Phonation × Syllable × Sex. Two factors not significant in the
analysis (p>0.05) were the place of articulation of the word-initial stop
and the speaker’s year of birth. These findings suggest that, unlike the
VOT behaviours reported above, F0 behaviours differ in two
critical ways: place of articulation for the word-initial stop has no bearing
on the rate of vocal fold vibration of the targeted vowels, and there are no
clearly discernable diachronic effects on the same. This latter point is
particularly noteworthy, as it suggests that the tonal patterns presented in
Fig. 4 have been stable over time.
Figure 4
Comparison of averaged tonal melodies of male vs. female speakers. The phonation
types indicated refer to the first C in each target word. The bars represent 1 SD
above and below the mean value.
[y-axis: mean F0 (Hz)]
While one might expect phonation effects on the F0 values for the vowels
appearing in the target words’ initial syllables (Han & Weitzman 1965,
Silva 1998), differences among F0 values are also statistically significant
across syllables 2 and 3, but only with respect to a distinction between lax
and non-lax word-initial stops. In words beginning with aspirated and
tense stops, mean F0 values are not statistically significantly different in
either syllable 2 (p=0.318) or syllable 3 (p=0.059). Further probing by
means of Scheffé post hoc tests reveals that words beginning with a lax stop
exhibit a sequence of progressively higher-pitched syllables, words beginning
with an aspirated stop exhibit a sequence of progressively lower-pitched
syllables, and words beginning with a tense stop exhibit a pattern
whereby the F0 of syllable 2 is higher than those of syllables 1 and 3, but
the F0s of syllables 1 and 3 are not significantly different from each other.
These patterns are schematised using Chao numbers in (1):
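Separately from the schematisation in (1), the three melody shapes described in this section can be expressed as a simple labelling of a word's three per-syllable mean F0 values. The sketch is a deliberate simplification: it compares raw values rather than testing for statistical significance (so it does not capture the finding that syllables 1 and 3 are not reliably different after tense stops), and the function name, labels and F0 values are hypothetical.

```python
def melody_shape(f0_syllables):
    """Label a three-syllable mean-F0 sequence (Hz) by its overall shape."""
    s1, s2, s3 = f0_syllables
    if s1 < s2 < s3:
        return "rising"      # pattern reported after lax stops
    if s1 > s2 > s3:
        return "falling"     # pattern reported after aspirated stops
    if s2 > s1 and s2 > s3:
        return "rise-fall"   # pattern reported after tense stops
    return "other"

# Hypothetical per-syllable mean F0 values for one speaker.
lax_word = melody_shape((180.0, 205.0, 215.0))
aspirated_word = melody_shape((240.0, 225.0, 210.0))
tense_word = melody_shape((225.0, 235.0, 224.0))
```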
5 Discussion
5.1 Variation and language change
The results of the acoustic study point to a change in how some Korean
speakers signal differences between lax and aspirated stops. The clearest
indicator of such a change is the presence of age-related shifts in VOT
values: as shown in Figs 1 and 2, older speakers are more likely to main-
tain clear VOT distinctions between lax and aspirated stops, while
younger speakers tend to minimise (or neutralise) such differences. The
non-linear pattern suggested by the data in Fig. 2 is arguably the first and
second components of a so-called s- (or z-) shaped curve, a pattern widely
documented in the sociolinguistic literature as representative of a language
change in apparent time (Labov 1972, 1994, Bailey et al. 1991, inter alia).
As Guy writes, ‘although such data actually constitute a synchronic
snapshot of a single point in time, the progress of the change is reflected in
the differential use by age’ (2003: 384–385). In the specific case of the
VOT data under analysis here, we can see the shift from a wide, statistically
significant differentiation between the VOTs of lax and aspirated stops
among older speakers, via a narrower (but still statistically significant)
differentiation of the same among middle-aged speakers, to the apparent
neutralisation of the VOT distinction for some of the youngest members
of the subject pool. As noted above, this younger cohort of speakers does
not appear to present a consistent set of behaviours, with some subjects
producing statistically significant differences in VOT for lax vs. aspirated
word-initial stops and other subjects appearing to neutralise VOT differ-
ences for these two phoneme types. How this situation plays out for a
larger pool of increasingly younger subjects remains to be documented.
The distribution of ΔVOT in Fig. 2, coupled with the fact that the best-fit
curve is a quadratic (and not a linear) relationship, provides support for
the claim that the sound change evidenced here has recently concluded. As
such, it is predicted that VOT data collected from speakers born after
1982 will yield an extension of points hugging the X-axis of the graph
in Fig. 2 (i.e. ΔVOT=0), thereby providing a more clearly discernible
(reversed) s-shaped curve.
Evidence further supporting the existence of a recent change lies in
the quasi-longitudinal data brought to bear in Silva (2002). Here we find
that speakers in their twenties during the 1960s and 1970s produced
‘traditional’ VOT patterns, i.e. clear differences in the mean VOTs of lax
vs. aspirated stops. Moreover, as Han & Weitzman report in their 1970
perception experiment, systematically manipulated differences in VOT
yielded responses that reflected a divide between the phonemic categories
of lax (shorter VOTs) and aspirated (longer VOTs). In contrast, speakers
in their twenties some 40 years later produce more ‘innovative’ VOT
patterns, i.e. ΔVOT<20 ms, often approximating 0. Were the data from
these two time periods representative of some other sociolinguistic
phenomenon (e.g. age-grading), there should have been more consistency
in the behaviour of subjects of similar ages. This is not the case.
5.2 Phonological implications
Despite this change in VOT patterns, the underlying contrast between lax
and aspirated stops in Korean is still preserved in the speech of younger
speakers, apparently manifested in terms of differentiated F0 patterns: lax
stops are associated with a melody that begins with a low pitch and then
rises through the target word, while aspirated stops are associated with a
melody that begins with a relatively higher pitch followed by a falling F0.
Although phonation-associated F0 behaviours are not new to Korean
(Han & Weitzman 1965, Silva 1998), they have typically been relegated
to secondary status: while VOT values were taken as primary markers
of phonation type, F0 values were viewed as redundant.
Long-distance tonal associations correlated with lax vs. non-lax phonation
types have been documented in other varieties of Korean (e.g. Jun 1993);
but in these cases, F0 melodies have coexisted with robust VOT differences
(Kim 2000). The current study, however, presents a new situation:
for younger speakers of the standard variety, VOT differences have been
neutralised and F0 appears to have assumed a primary role in marking the
lax vs. aspirated distinction.
Footnote: Systematic consonant effects on the F0 trajectory of /a/ following
word-initial target stops could not be discerned in this corpus, either within
or across subjects. This finding is consistent with Han & Weitzman’s
assessment of the issue: ‘there is mutual overlapping between the onset
values of fundamental frequency following all three stops. This extreme
overlapping suggests that the onset value of fundamental frequency cannot
be too significant a cue in the distinction of stop consonants’ (1967: 22).
It is surprising, however, given the local phonation effects reported in
Silva (1998). This lack of local effects neither undermines nor diminishes
the more global melodic patterns reported here.
These acoustic data corroborate a perception study by Kim et al. (2002),
who claim that their native Korean listeners made a critical distinction
in how they processed vocalic information edited from source stimuli
originally drawn from CV sequences containing lax, tense and aspirated
onsets. When presented with only the vowel portions of the source
syllables, subjects were consistently able to identify those source syllables
beginning with lax onsets: ‘for both vowel-only stimuli and cross-spliced
stimuli with conflicting consonant and vowel portions, listeners heard lax
initial stops if and only if the vowel had an L tone’ (2002: 97). Vowel
portions from syllables with tense and aspirated onsets, however, were not
so readily differentiated. Indeed, vowels drawn from syllables with both
tense and aspirated onsets were largely identified as coming from tense
stops; F0 cues were not sufficiently robust differentiators between the two
stop types. It was only when significant periods of aspiration noise were
included in the stimuli that subjects included aspirated onsets in their
reactions. Thus we find perceptual evidence in support of the acoustic
study reported here: that underlying phonation types for consonants are
correlated with the fundamental frequency characteristics of the
immediately following vowel. The present study further generalises this claim by
arguing that these C-to-V correlations are not only local, but may be broader
in scope. Given that the presence of a word-initial lax stop yields a
trisyllabic tonal melody significantly lower than that associated with forms
beginning with either an aspirated or tense stop, we have reason to believe
that among these speakers of Korean, word-initial consonantal effects are
not restricted to the initial syllable, but extend into the higher prosodic
domain of the phonological word.
This shift in the relative weighting of phonetic events from the CV
transition (most critically, the post-release aspirated region) to the F0 of
the following vowel(s) provides evidence that standard Korean has
undergone a sound change analogous to that reported for various languages,
including Vietnamese (Haudricourt 1954, Thurgood 2002, among others),
Zaiwa (Wannemacher 1996) and Tibetan (Duanmu 1992). In Tibetan, for
example, historical differences between aspirated and unaspirated stops
have been neutralised and replaced by differences in tone in the
contemporary Lhasa variety.
khó (H level)
kh† (L rising)
Svantesson & House (2006) make a similar case in their comparison of
two varieties of Kammu (Laos). Where Eastern Kammu presents a voiced
vs. voiceless contrast (e.g. buuc ‘wine’ vs. puuc ‘to undress’), Northern
Kammu manifests a corresponding L vs. H contrast, with no voicing
distinction (pùuc ‘wine’ vs. púuc ‘to undress’). Svantesson & House argue
that Northern Kammu is not truly tonal, despite the phonetic evidence.
Rather, they argue that the Eastern and Northern varieties share a com-
mon set of underlying representations, which are phonetically specified in
dialect-specific ways: speakers of Eastern Kammu realise the underlying
voicing contrast in initial consonants as such phonetically, while speakers
of Northern Kammu realise the same underlying contrast on the syllable
rhyme (i.e. higher vs. lower F0).
An analogous set of arguments has, in fact, been advanced for Korean
by Kim & Duanmu (2004). In their account, underlying lax stops are
argued to bear the features of a voiced obstruent ([+voiced, −aspirated]),
while the tense series is treated simply as voiceless ([−voiced, −aspirated])
and the aspirated series is assigned the features [−voiced, +aspirated].
Moreover, they write that ‘in this analysis, the main phonetic difference
in words pairs such as [t*al] ‘daughter’ and [tal] ‘moon’ [with tense and
lax initial stops, respectively] does not lie in the stops themselves but
in the tone of the vowel ([tál] vs. [dàl/tàl])’ (2004: 96). In phrase-initial
position, lax stops are devoiced, thereby leaving only tonal information
to differentiate them from their tense counterparts. Under this analysis,
one further presumes that in word-initial position, the lax~aspirated
contrast is maintained by the opposition between [−aspirated] and
[+aspirated].
Aside from the critical insight that phonation and tone should be phono-
logically integrated in Korean, Kim & Duanmu’s accounting evokes both
empirical and theoretical concerns. Empirically, their analysis provides no
clear accounting for the phonetic facts presented in §4 above: it is not clear
how they would capture the innovation manifested by younger speakers,
whereby lax and aspirated stops are no longer distinguished by differential
VOT values. Theoretically, their account raises two questions of considerable
import. First, it renders the tense series as the least marked in
the phonological system, as they are reanalysed as voiceless unaspirated
segments. Second, it suggests that in Korean, intervocalic position is the
primary locus for faithfulness (at least with respect to the feature
[+voiced]), an implication at odds with widely accepted accounts that give
primacy to domain-initial positions and prosodic heads in assessing fea-
tural faithfulness (Beckman 1998).
5.3 Toward a revised model of Korean stop phonology
In this section, we propose a theoretical explanation that accounts for
the ‘traditional’ and ‘innovative’ patterns reported above without
the need to introduce radical changes to our existing understanding of the
Korean feature inventory. The analysis advanced draws substantially
from that proposed by Ahn & Iverson (2004), which was predicated on a
fully ‘traditional’ understanding of the phonetic facts.
We begin by returning to the Tibetan and Kammu situations, noting
that when they are compared to more traditional treatments of Korean, a
key difference emerges: in Korean, there is no underlying voicing contrast
in play. As would be predicted, both Tibetan and Kammu voiced stops are
associated with a low tone on the following vowel, a relatively common
‘depressor effect’ triggered by the voicing, high sonorance or nasality of
an immediately preceding syllable onset (Goldsmith 1990, Koehler 1995).
In Korean, however, standard accounts of the consonantal system deny a
contrastive role for the feature [voiced], under the general redundancy
statement that links sonorance with voicing: [αsonorant] → [αvoiced] (see
Kim 1965, Kim-Renaud 1974, Lee & Ramsey 2000, Sohn 1999). This is
no problem, however, as support for an aspirated segment taking pre-
cedence over a corresponding unaspirated segment when it comes to tone
raising (the issue of voicing per se aside) can be found as far back as the
1970s, when researchers such as Hyman & Schuh (1974) advanced
the following hierarchy of consonant-induced tonal effects (reported in
Lee 1978: 218):
(3) implosive               ↑ tone raising
    voiceless aspirated
    voiceless unaspirated
    voiced obstruent
    breathy voiced          ↓ tone lowering
With the hierarchy in (3) as background, the absence of phonemic voicing
in Korean poses little theoretical problem for the analysis advanced here,
which rests primarily on the fact that in most languages (including
Korean) voiceless unaspirated segments are the least marked elements in
the consonant inventory. Interpreting markedness in a structural sense,
the current account treats the presence of a privative feature [spread
glottis] (or an analogous feature marking aspiration) as the driving force
behind the shift from aspiration to tone to mark the underlying lax vs.
aspirated distinction. More specifically, the ‘innovators’ in the subject
pool employ a redundancy rule whereby a laryngeal node dominating any
content is interpreted not on the segmental plane but tonally, formalised
in (4) (inspired by Yip 1995).
(4) [Diagram of the redundancy rule; surviving labels: X, Glottal Aperture]
Under the analysis in (4), aspirated stops, by virtue of their underlying
marking for [spread glottis], acquire a tonal value of H; similarly, tense
stops, with their surface-level marking for [constricted glottis], likewise
acquire an H tone.
In contrast, lax stops do not receive any tonal marking,
leaving them to be realised by a default L tone. Appealing to a default L is
consistent with tonal analyses of other languages, e.g. Geman, Haya,
Kimatuumbi, Luganda and Somali (X-Tone 2005), as well as Lee’s (1987:
104) account of Gyeongsang Korean (but may be at odds with Kim’s 1997
account of the Northern Gyeongsang variety).
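The redundancy rule in (4), together with the default L, can be illustrated with a minimal Python sketch. The dictionary and function names are hypothetical, introduced only for illustration; the feature assignments simply restate the description of (4) above, and nothing here is intended as a formal implementation of the analysis.

```python
# Hypothetical encoding of the Laryngeal-node content assumed under (4).
# Lax stops are laryngeally unmarked; aspirated and tense stops carry content.
LARYNGEAL_CONTENT = {
    "lax": frozenset(),
    "aspirated": frozenset({"spread glottis"}),
    "tense": frozenset({"constricted glottis"}),
}


def tone_for(stop_type: str) -> str:
    """Return 'H' if the Laryngeal node dominates any content, else default 'L'."""
    content = LARYNGEAL_CONTENT[stop_type]
    return "H" if content else "L"


for stop in ("lax", "aspirated", "tense"):
    print(stop, tone_for(stop))
```

The point of the sketch is that the rule refers only to whether the Laryngeal node is empty, not to which feature it dominates.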
What, then, might motivate this association of an H tone with a non-null
Laryngeal node? To address this question, let us turn to the theory of
glottal features originally advocated in Halle & Stevens (1971). We begin
by assuming that underlying lax stops are represented by a single laryn-
geally unspecified C. Additionally, phonemic tense stops are represented
as geminated Cs (linked to a single syllable node), which are subsequently
modified by the introduction of a phonetic-level laryngeal specification
[constricted glottis] (Han 1996: 191, Silva 1992: 63–67). In the current
account, however, the element interpolated to the C-C structure is a
more abstract ‘tensity feature’, privative [stiff] (Halle & Stevens 1971, Bao
1990). Here we adopt Kim’s (1965) perspective on the notion that tense
and aspirated stops do, in fact, form a natural class of tense (or fortis)
segments, a view later adopted by Kim-Renaud in her underlying feature
specification of tense and aspirated phonemes as [+tense] (1974: 5).
Finally, under this account, aspirated segments are laryngeally marked in
underlying representation, the relevant feature again being privative
[stiff]; they are the only segments that bear any laryngeal specification
in the lexicon.
As consonantal representations are interpreted by the phonetic com-
ponent of the grammar, C nodes marked by the tensity feature (i.e. tense
and aspirated) are realised with H tone, the primary reflex of the stiff
vocal folds. At the same time, any non-sonorant singleton consonant
(i.e. non-geminate and aspirated) in phrase-initial position is realised with
aspiration, indicated by the insertion of the feature [spread]. Such an
account, with its explicit reference to prosodic constituency, is a reflection
of what Kim-Renaud refers to as a ‘boundary-sensitive strengthening
phenomen[on]’ (1974: 3).
[Footnote: This account operates independently of any assumptions regarding the underlying
status of the tense stops in Korean, be they phonemically geminates (which
ultimately receive a language-specific marking for [constricted glottis]) or singletons
(marked for [constricted glottis] from the outset).]
(5) [Diagram of stop representations: a. lax, b. aspirated, c. tense; surviving labels: X, [+cons, −son], [stiff]]
Under this analysis, the degree of aspiration associated with these phrase-
initial singleton Cs is assumed to be consistent, regardless of the con-
sonant’s marking for the tensity feature: [stiff] and [spread] is no more
aspirated than just [spread]. What is different about [stiff] and [spread] is
the fact that the stiffness leads to a higher F0 on the following vowel than a
structure marked by [spread] alone. This transfer of laryngeal character-
istics from C to V is formally accomplished by following Ahn & Iverson
(2004), who invoke the principle of ‘bipositionality’:
(6) [Diagram of the bipositionality principle; surviving labels: H, [son]]
Applying the principle in (6) to the structures in (5), we can begin to
account for the tonal patterns schematised in (1). What remains to be
accounted for is the apparent long-distance nature of the C-to-V spreading.
It is suggested here that no further phonological spreading is required to
account for the observed tonal melodies; rather, phonetic interpolation
working across a higher prosodic domain, such as the minor phrase (Silva
1992) or the accentual phrase (Jun 1993), gives rise to the observed up-
ward trajectory for the lax/L items and the downward trajectory for the
non-lax/H items.
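As a rough illustration of what such phonetic interpolation might look like, the sketch below linearly interpolates F0 from a word-initial tonal anchor to a later target. The F0 values, the shared later target, the choice of linear interpolation and the function name are all hypothetical assumptions made for the example; they are not measurements or claims from this study.

```python
def interpolate_f0(anchor_hz: float, target_hz: float, n_syllables: int) -> list:
    """Linearly interpolate F0 from an initial anchor to a later target.

    One value is returned per syllable; the first syllable carries the
    anchor and the last carries the target.
    """
    if n_syllables < 2:
        raise ValueError("need at least two syllables to interpolate")
    step = (target_hz - anchor_hz) / (n_syllables - 1)
    return [anchor_hz + i * step for i in range(n_syllables)]


# Purely illustrative values (Hz): a low anchor for a lax-initial word,
# a high anchor for a non-lax-initial word, and a shared later target.
L_ANCHOR, H_ANCHOR, TARGET = 180.0, 240.0, 210.0

lax_contour = interpolate_f0(L_ANCHOR, TARGET, 3)      # rising: 180, 195, 210
non_lax_contour = interpolate_f0(H_ANCHOR, TARGET, 3)  # falling: 240, 225, 210
print(lax_contour, non_lax_contour)
```

Under these assumptions a single interpolation procedure yields a rising contour from an L anchor and a falling contour from an H anchor, without any additional phonological spreading.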
When compared to more traditional accounts of the lax~aspirated~
tense contrast in Korean, the analysis above differs in several ways. First,
it suggests that for Korean language innovators, the aspiration feature
(here [spread]) no longer functions phonemically; rather, it is now a
prosodically conditioned redundant property. More specifically, the insertion
of [spread] occurs only in phrase-initial position, an example of a prosodi-
cally driven strengthening process. In word-internal intervocalic position,
by contrast, lax stops remain unmarked for laryngeal features, thereby
allowing for the phonetic interpolation of voicing (Silva 1992: 142).
Aspirated stops in word-internal position also fail to acquire any new
features; their underlying marking for [stiff] prevents intervocalic voicing
(be the process a matter of phonological feature spreading or one of pho-
netic interpolation), leaving these segments voiceless in this position. The
extent to which the presence of a word-internal instance of [stiff] yields a
raised F0 on the following vowel is a matter left for future research.
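The positional behaviour described in this section can be summarised in a short Python sketch. All names are hypothetical and introduced only for illustration. Two simplifications are assumptions of the sketch rather than claims of the analysis: `phrase_initial=False` stands in for word-internal intervocalic position, and tense stops are assumed to resist intervocalic voicing in the same way as aspirated stops, by virtue of [stiff]. Tone is left unspecified word-internally, since that question is explicitly left open above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical encoding of the representations assumed in the text:
# lax = unmarked singleton, aspirated = singleton with [stiff],
# tense = geminate that receives [stiff].
STOPS = {
    "lax": {"geminate": False, "stiff": False},
    "aspirated": {"geminate": False, "stiff": True},
    "tense": {"geminate": True, "stiff": True},
}


@dataclass(frozen=True)
class Realisation:
    features: frozenset
    voiced: bool
    tone: Optional[str]  # None = not addressed by the analysis


def realise(stop_type: str, phrase_initial: bool) -> Realisation:
    spec = STOPS[stop_type]
    features = set()
    if spec["stiff"]:
        features.add("stiff")
    # [spread] is inserted only on phrase-initial singletons.
    if phrase_initial and not spec["geminate"]:
        features.add("spread")
    # Voicing is interpolated only onto laryngeally unmarked,
    # non-phrase-initial stops (assumption: [stiff] always blocks it).
    voiced = (not phrase_initial) and not features
    # Phrase-initially, [stiff] yields H and its absence yields default L.
    tone = ("H" if spec["stiff"] else "L") if phrase_initial else None
    return Realisation(frozenset(features), voiced, tone)


for stop in STOPS:
    print(stop, realise(stop, True), realise(stop, False))
```

In this sketch aspiration carries no contrastive load in phrase-initial position: lax and aspirated stops both surface with [spread] and differ only in [stiff], and hence in tone.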
Second, the current analysis assumes that the spreading effects attrib-
uted to bipositionality can be – if not must be – more than simply local. In
the spirit of Pierrehumbert & Beckman (1988), we argue that this phonologically
spread instantiation of [stiff] functions as an anchor for subsequent
F0 interpolation across tonally unspecified syllables within a
larger prosodic unit (most likely a phonological word, but perhaps even
a phrasal constituent), thereby yielding the F0 patterns displayed in Fig. 4.
Further consideration of [stiff] as a more general manifestation of ‘tonal
prominence’ will likely dictate whether the developing tonal contrast
might eventually be characterised as a pitch-accent system. Such an ac-
count would certainly place contemporary Seoul Korean in a familiar
context, given the existence of other pitch-accented dialects. All the same,
it would be foolhardy to assume that the phenomenon under discussion
here is unequivocally pitch-accentual, solely on the basis of what has been
claimed in other varieties of the language. In the absence of the necessary
corroborating data, no position is taken on the matter.
Finally, this account takes the explicit position that the relationship
between phonation type and the fundamental frequency of the following
vowel(s) is by-and-large ‘automatic’, as opposed to ‘controlled’. In con-
trast to the position put forth by Kingston & Diehl (1994: 423), the cur-
rent analysis claims that the phonological feature [stiff] actually serves to
predict phonetic behaviour, as opposed to simply limiting it. Moreover, it
is implied that this relationship predates the onset of the sound change
whereby VOT differences between lax and aspirated stops have become
neutralised, suggesting further that the relevant relationship is not one between
glottal aperture and F0 (as argued by Kingston & Diehl, who focus their
discussion on the feature [voice]), but rather one between glottal tension
and F0.
6 Conclusions
As the data from this study indicate, the obstruent system of Korean appears
to have undergone a process of sound change: while the underlying
tripartite distinction among tense, lax and aspirated stops persists, the
basic phonetic manifestation of the latter two stop types has shifted from
one of a clear voice onset time distinction to one whereby fundamental
frequency plays the primary role. While other researchers have documented
this sort of relationship between phonation type and F0 in the speech
of contemporary Korean speakers (Jun 1993, Kim 2000, Kim & Park
2001), the current study makes clear the diachronic status of this situation,
namely, that the change appears to be in its final stages, if not actually com-
pleted. This type of shift is much like the voicing-to-tone changes attested
in other languages (such as Tibetan), thereby providing further support
for a phonological theory that allows for unified functionality of
the laryngeal mechanism, a single set of features that account for both
tonal and phonation events; the relevant feature here is [stiff], analogous
to Kim’s (1965) ‘tensity’ feature. The acoustic results of the current
research further corroborate perceptual experiments conducted by Kim
et al. (2002), by supplementing their observations and analysis of related
data with a larger-scale, age-differentiated acoustic study. Indeed, the
Kim et al. methodology merits replication with a larger pool of listeners,
one that includes subjects older than the twelve 26–32-year-old ‘pho-
netically untrained native speakers of the Seoul dialect of Korean’ (2002:
84). If the acoustic data reported here have any bearing on the outcome of
such a perception experiment, we might expect that the responses of
older speakers (i.e. those born before 1965) would differ from those of their
younger counterparts, with older speakers perhaps less likely to consistently
differentiate among all three stop types solely on the basis of F0 infor-
mation. In addition, adding sonorant- and zero-initial three-syllable
words to the corpus (e.g. mapali ‘packhorse’, nameci ‘remainder’, ladio
‘radio’, apeji ‘father’) will reveal the extent to which sonorance and/or
voicing influence the tonal melody of the entire word.
Finally, this study suggests that standard Korean is coming into align-
ment with other varieties of the language (most of which employ either
lexical or phrasal pitch accent), as well as with other East Asian languages,
which use F0 in phonemically relevant ways.