Some Recent Proposals Concerning the Classification of the Austronesian Languages

Abstract

The comparative method is a relatively well-defined tool that has been employed successfully in the classification of languages for two centuries. In recent years, there have been several proposals about the classification of the Austronesian languages that violate basic principles of method. Because some of these have been advanced by scholars who are well established in other branches of linguistics, they have acquired an influence that is out of proportion to their scientific merit. This paper addresses three of these proposals: the Austronesian-Ongan hypothesis of Juliette Blevins, the Quechua-Austronesian hypothesis of E. M. Kempler-Cohen, and the higher phylogeny of Austronesian and the position of Tai-Kadai by Laurent Sagart. By carefully delimiting the analytic operations that belong to the comparative method and those that do not, it is shown that each of these scholars makes use of illicit operations to justify inferences about the classification of Austronesian languages, whether this involves claims about relationships that are external to the family or internal to it.
SOME RECENT PROPOSALS CONCERNING
THE CLASSIFICATION OF THE AUSTRONESIAN LANGUAGES
Robert Blust
University of Hawai’i
ABSTRACT. The comparative method is a relatively well-defined tool that has been
employed successfully in the classification of languages for two centuries. In recent
years there have been several proposals about the classification of the Austronesian
languages that violate basic principles of method. Because some of these have been
advanced by scholars who are well-established in other branches of linguistics they have
acquired an influence that is out of proportion to their scientific merit. This paper
addresses two of these proposals: the ‘Austronesian-Ongan’ hypothesis of Juliette
Blevins, and the ‘higher phylogeny of Austronesian’ by Laurent Sagart. By carefully
delimiting the kinds of analytic operations that belong to the comparative method and
those that do not, it is shown that both of these scholars make use of illicit operations to
justify inferences about the external relationships of Austronesian on the one hand, and
the internal classification of the languages on the other.
1. The comparative method of linguistics: what it is and what it isn’t. The set of
procedures used to justify inferences in historical linguistics emerged gradually during
the nineteenth century through the work of a number of linguists located primarily in
Germany and Denmark. Initially the aim of these scholars was not to develop a body of
theory or method, so much as to demonstrate to their own satisfaction that the similarities
they observed between most of the languages of Europe and various others in the Middle
East and India could not be products of accident or diffusion. But, by thoroughly
pursuing issues of empirical substance they eventually found themselves dealing heavily
with matters of form, or what is more commonly called ‘method’. By the late nineteenth
century questions of method had crystallized around the Neogrammarian dictum that
there are no exceptions to sound change, and in modified form this remains the central
principle of the comparative method of linguistics to this day. Despite the fact that the
comparative method is a rather well-defined tool with a large amount of intersubjective
agreement about how it can or cannot be bent to account for particular cases, there has
been a surge of claims in recent years about distant genetic relationship which has led to a
split between what is sometimes informally called ‘mainstream’ vs. ‘divergent’ historical
linguistics (the latter type sometimes going by less flattering names). This in turn has led
some mainstream historical linguists to reassert the basic principles of the field in order to
draw a clear line between what is and is not acceptable method in reaching inferences
about language history (e.g. Campbell 2008). At the risk of belaboring what many will
consider the obvious, I think it may be worthwhile to briefly review how the comparative
method works, and under what conditions other types of explanations are to be preferred.
The starting point for any investigation in historical linguistics is the recognition of
similarity between languages. Whether there is also dissimilarity is irrelevant, since
change can eradicate elements that were once shared. For reasons that are well described
by Greenberg (1957:35-36), but which need not be elaborated here, the quickest and in
many ways most reliable way to establish that languages are related is through lexical
comparison. So far as anyone has been able to determine over the past two centuries of
comparative linguistic research, there are four logically possible causes of lexical
similarity between languages, each of which will be briefly discussed and illustrated here.
1.1. Chance similarity. One fairly obvious cause of similarity is chance, which can also
be called ‘chance convergence’. Since any fully functional language contains thousands
of lexical items, most of which are formed by an arbitrary association of sound and
meaning, it is not surprising that striking similarities may occasionally occur as a result of
chance convergence. Such similarities have convinced some individuals that two or more
languages are related, and this has led in some cases to programs aimed at producing
more matches even where this is only possible through violations of good method. It was
similarities between a few words in Sanskrit and some of the languages of Indonesia that
led the well-known German Indo-Europeanist Franz Bopp in the early 1840s to propose
that Indo-European and ‘Malayo-Polynesian’ languages share a common ancestor apart
from other languages of the world (Bopp 1841). While Bopp knew that languages such
as Old Javanese were laden with Sanskrit loanwords he nonetheless concluded, against
the views of virtually every other scholar before or since, that this amounted to borrowing
between related languages. Many similar examples could be cited involving various of
the world’s known language families or isolates. Ruhlen (1987:11), for example, cites
German nass, Zuni nas ‘wet’ as an example of similarity due to chance, and Table 1
shows a set of random similarities, mostly between Austronesian languages and
languages belonging to various other families. To underscore how easy it is to find such
examples, these were assembled over a period of some 4-5 hours in the reference section
of a graduate library. The language families involved are Arawakan (Parauhano, Piapoco,
Yavitero), Austronesian (Agta, Erai, Fijian, Gane, Iban, Isneg, Kanakanabu, Kayan,
Lindrou, Malagasy, Malay, Mussau, Rotinese, Toba Batak), Eskaleutian (Yupik),
Indo-European (Sanskrit, English), Isolate (Ainu), Niger-Congo (Swahili), Tacanan (Chama),
and Tupi-Guarani (Tupinambá):
TABLE 1: Examples of chance similarity between languages
1.1. Sanskrit dva, Malay dua ‘two’
1.2. Parauhano we, English we
1.3. Yavitero axi, Iban ari ‘day’
1.4. Yavitero ani ‘wasp’, Erai ani ‘honeybee’
1.5. Piapoco ina ‘woman’, Toba Batak ina ‘mother’
1.6. Chama bao ‘carry’, Mussau bao ‘carry pick-a-back’
1.7. Chama eme, Isneg íma ‘hand’
1.8. Chama etawa ‘blue, green’, Fijian kara-karawa ‘blue’
1.9. Tupinambá kaŋ ‘dry’, Kayan gaŋ ‘dry branch; be dry’
1.10. Swahili kaka ‘elder brother’, Malay kaka ‘elder sibling’
1.11. Swahili babu, Lindrou babu ‘grandfather’, Agta bábo ‘grandparent’
1.12. Swahili bua ‘stalk, stem of larger grasses’, Kayan bua ‘fruit’
1.13. Yupik manik ‘bird’s egg’, Gane manik ‘bird’
1.14. Yupik nani, Kanakanabu nanu ‘where?’
1.15. Yupik tamu ‘chew once’, Rotinese tamu ‘smack the lips while eating’
1.16. Ainu abe, api, Malay api ‘fire’
1.17. Ainu nunnu ‘suck the breast’, Malagasy nunu ‘breast’
1.18. Ainu wakka, English water
How do we know when similarities such as these are due to chance? If such similarities
between phonetically identical or nearly identical forms with the same or nearly the same
meaning are not matched by other examples of the same quality it must be concluded that
they are products of random convergence. In other words, chance similarity cannot
produce recurrent patterns of correspondence without abandoning either tight controls on
meaning, or full accountability of form, or both. Moreover, chance similarity is generally
limited to comparisons between two witnesses, whether these are attested languages or
proto-languages. By contrast, as is well known, genuine historical relationship can be
demonstrated through an entire population of languages, and the sound correspondences
that underlie this demonstration need not involve segments that are phonetically identical,
or even similar, so long as they exhibit recurrence without the need to resort to ad hoc
hypotheses.
1.2. Similarity due to language universals. The distributional pattern of similarity due
to universals differs from that of chance convergence in one important respect: although
it may exhibit a random similarity, a lexical item of similar form and meaning will be
distributed across many of the world’s languages. Perhaps the most striking example of
this phenomenon is seen in the words mama and papa/baba, first discussed in the general
linguistics literature by Jakobson (1960). Table 2 lists a few of the many languages in the
world in which the words for ‘mother’ and ‘father’, particularly infantile words, are
identical or very similar to the canonical types mama and papa:
TABLE 2: Widespread lexical similarity due to language universals
Spanish        mamá ‘mother’           papá ‘father’
Mandarin       māma ‘mother’           bàba ‘father’
Swahili        mama ‘mother’           baba ‘father’
Samoan         mama ‘mother’           papa ‘father’
Sierra Miwok   ʔamá.- ‘grandmother’    pá.pa- ‘grandfather’
In this case we can be reasonably sure why such widespread similarity exists in the form
of these words despite their historical independence. Human infants lack control of the
tongue for the first half year or so of their lives, but have full control of lip closure, since
mammals depend on nursing behavior in order to survive their infancy. As a result,
infant vocalization is restricted largely to consonants that can be produced with closure of
the lips and vowels that require minimal tongue control, resulting in syllables such as ba,
pa or ma (the vowel may be more fronted, but is generally low). Infants tend to repeat
syllables in vocalization, leading to sequences such as babababababa, papapapapapa or
mamamamamama, and while these mean nothing to the child, proud parents the world
over are quick to conclude that their unusually intelligent child is already calling out for
them. The result of this physiological constraint in the infant and the psychological
tendencies of parents is the kind of globally-distributed pattern of lexical similarity that is
seen in Table 2. It is not clear how common such universally motivated similarity in
lexicon is, but there is reason to believe that it is not restricted to the single example of
mama and papa, even though this may well remain the most striking case.
Language universals naturally also produce non-historical similarity between languages
outside the area of phonology. The Austro-Tai hypothesis first proposed by Benedict
(1942, 1975) would appear to be strengthened by such striking comparisons as Malay
mata hari, Thai taa wan ‘sun’, both formed from the morphemes for ‘eye’ (mata, taa) +
‘day’ (hari < PAN *waRi, wan). Whatever the merits of the Austro-Tai hypothesis may
be, however, they are not strengthened by this type of comparison, since it reflects a
universal tendency for the morpheme for ‘eye’ to be used in the sense of ‘center, most
salient or important part’, as seen in Irish súil an lae (eye-of-day) ‘sunrise’, or in more
disguised form, English daisy < Middle English dayesye, Anglo-Saxon daegesēage
‘day’s eye’, from the fancied resemblance of a daisy to the sun surrounded by rays, much
as with the larger sunflower (Blust 2011). Expressions for a wide range of meanings
that exhibit a parallel structure using the morpheme for ‘eye’ in this more abstract sense
can easily be cited, as with German Hühnerauge, Mandarin jiyen ‘callus’ (both lit.
‘chicken eye’), or Malay mata ikan, Thai taapla ‘callus’ (both lit. ‘fish eye’).
Other examples of cross-linguistic similarity due to the operation of language universals
are seen in structural features such as the order of major sentence constituents, or
canonical shape. Since the order of major sentence constituents and other syntactic
properties often form part of a single ‘typological package’ it is not uncommon for
genetically diverse languages to share many common structural features, as with Japanese
and Korean, which like many other SOV languages, have postpositions, preposed relative
clauses, and the like. Similarly, since open syllables are preferred in many languages a
canonical shape CVCV will result from natural tendencies to ‘erosion from the right’, and
may give rise to superficial similarity in word shape between languages that have no
demonstrable historical connection.
1.3. Similarity due to borrowing/diffusion. Chance and universals share a property that
distinguishes them from borrowing and genetic relationship, namely that the latter two
involve historical contact, while the former two do not. In other words, cross-linguistic
similarities of the type discussed so far can arise independently at a distance with no
foundation in common history, whether that means vertical history (descent) or horizontal
history (borrowing).
Lexical borrowing has, of course, been recognized as a source of irregularity in sound
change since the nineteenth century, and it is also known to sometimes create divergent
viewpoints with regard to language classification, as with the older but now widely
discarded view that Tai-Kadai is related to Chinese. As shown by Thomason and
Kaufman (1991), borrowing is possible on every level, from phonetics to phonology to
morphology to semantics to syntax, and may result in massive typological similarity
between languages of very different origin. Where it affects several languages over a
continuous geographical territory it can create linguistic areas in which diagnostic
features are shared between languages of different families, as in India, or different major
subgroups, as in the Balkans of southeast Europe. Borrowing that affects areas wider
than a single pair of languages or cultures is generally known, both in linguistics and in
cultural anthropology, as diffusion.
1.4. What it is: a tool for explaining similarity due to common origin/genetic
relationship. It is a truism in historical linguistics that non-relationship cannot be
demonstrated by argument, and that common origin or genetic relationship is the default
explanation that is adopted once chance, universals and borrowing have been considered
and found inadequate as explanations for cross-linguistic similarity.
Consider the data in Table 3, drawn from two languages, Malay and Hawaiian, that are
spoken some 5,000 miles apart:
TABLE 3: Evidence for the genetic relationship of Malay and Hawaiian
No.  Malay   Hawaiian  English
1.   mata    maka      eye
2.   kutu    ʔuku      louse
3.   ikan    iʔa       fish
4.   laŋit   lani      sky
5.   taŋis   kani      cry, weep; to sound
It requires no linguistic training to see that there are striking similarities in these five
words (and many others) shared by Malay and Hawaiian (and, of course, many other
languages). However, it does require linguistic training to recognize that this small set of
words is tightly interconnected by recurrent sound correspondences. The only difference
between Malay mata and Hawaiian maka is t vs. k, and without further information this
comparison could naturally be nothing more than a product of chance. What gives
historical significance to the similarity of these words is the observation that the same
sound correspondence recurs in the pair kutu : ʔuku, leading to the working hypothesis
that the correspondence of Malay t to Hawaiian k reflects a common origin for this
speech sound. However, to accept the word for ‘louse’ as exemplifying t : k rather than
chance similarity we must also hypothesize that k : ʔ is a recurrent sound correspondence
between Malay and Hawaiian. This second hypothesis finds further support in the
comparison ikan : iʔa, but again only if we adopt a third hypothesis, namely that final
consonants in Malay correspond to zero in Hawaiian. This third hypothesis is further
supported by examples 4 and 5, but only if we accept a fourth hypothesis, that Malay ŋ in
correspondence with Hawaiian n is not an isolated product of chance. This hypothesis
finds confirmation in the same two examples, one of which (laŋit : lani) does double duty
in illustrating the recurrence of ŋ : n and –C : zero, and the other of which does triple duty
in illustrating the recurrence of these same sound correspondences plus t : k. In short, as
shown in Figure 1, with just five words we have been able to demonstrate the existence
of four recurrent sound correspondences linking Malay and Hawaiian in a network of
chance-defying similarity:
FIGURE 1: Recurrent sound correspondences in the data of Table 3
No.  Malay : Hawaiian  Examples
1.   t : k             1, 2, 5
2.   k : ʔ             2, 3
3.   -C : Ø            3, 4, 5
4.   ŋ : n             4, 5
While a sound correspondence supported by two examples is technically ‘recurrent’ most
linguists would naturally want to find further examples of correspondences 2 and 4, and
this can easily be done by expanding the set of words compared without adopting further
hypotheses about sound correspondences (e.g. Malay kulur : Hawaiian ʔulu ‘breadfruit’,
Malay aŋin ‘wind’ : Hawaiian ani ‘blow softly, of a breeze’).
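The bookkeeping behind Figure 1 can be sketched in a few lines of Python. This is an illustration only: the segment-by-segment alignments are supplied by hand (the sketch does not automate the judgment that the comparative method requires), the names ALIGNED, tally and recurrent are hypothetical, and any lost final consonant is filed under the single label -C : Ø, as in Figure 1.

```python
from collections import defaultdict

# Hand-aligned segments (an assumption of this sketch): each comparison is a
# list of (Malay segment, Hawaiian segment) pairs; "Ø" marks zero.
ALIGNED = {
    1: [("m", "m"), ("a", "a"), ("t", "k"), ("a", "a")],              # mata : maka
    2: [("k", "ʔ"), ("u", "u"), ("t", "k"), ("u", "u")],              # kutu : ʔuku
    3: [("i", "i"), ("k", "ʔ"), ("a", "a"), ("n", "Ø")],              # ikan : iʔa
    4: [("l", "l"), ("a", "a"), ("ŋ", "n"), ("i", "i"), ("t", "Ø")],  # laŋit : lani
    5: [("t", "k"), ("a", "a"), ("ŋ", "n"), ("i", "i"), ("s", "Ø")],  # taŋis : kani
}


def tally(aligned):
    """Map each non-identical correspondence to the comparisons showing it."""
    examples = defaultdict(list)
    for number, pairs in aligned.items():
        for malay, hawaiian in pairs:
            if malay == hawaiian:
                continue  # identical segments are not tallied here
            # Any consonant lost in Hawaiian is grouped as "-C : Ø".
            key = ("-C", "Ø") if hawaiian == "Ø" else (malay, hawaiian)
            examples[key].append(number)
    return dict(examples)


def recurrent(aligned, minimum=2):
    """Keep only correspondences attested in at least `minimum` comparisons."""
    return {k: v for k, v in tally(aligned).items() if len(v) >= minimum}


CORRESPONDENCES = recurrent(ALIGNED)
for (malay, hawaiian), numbers in CORRESPONDENCES.items():
    print(f"{malay} : {hawaiian}   {numbers}")
```

Run on the five comparisons of Table 3, the tally returns exactly the four correspondences listed in Figure 1, each with the same example numbers. Adding further hand-aligned pairs to ALIGNED is how the support for correspondences 2 and 4 would be extended.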
When we consider the other three logically possible alternatives for explaining the cross-
linguistic similarities observed in this case we can easily see that they are unsatisfactory:
if the similarities seen in Table 3 were due to chance we would not expect them to be
recurrent, in other words, to show the pattern of recurrent sound correspondences seen in
Figure 1. If they were due to universals we would expect words of similar shape and
meaning to have a scattered distribution worldwide. Borrowing also fails to provide a
convincing alternative, since the languages are spoken thousands of miles apart, giving
little opportunity for the intensive contact normally needed for extensive borrowing to
take place, and such basic words are borrowed only under very exceptional contact
situations such as those described by Thomason and Kaufman (1991).
The example presented in Table 3 is particularly striking in that it allows the statement of
four recurrent sound correspondences in just five forms, something that experiment will
show to be impossible in languages that are not generally regarded as genetically related.
Two general features of this demonstration are of critical importance. First, the glosses
given are chosen directly from standard sources for these languages without significant
modification. All are identical with one qualification: whereas Wilkinson (1959:1167)
glosses Malay taŋis as ‘Weeping. In contr. to cries (raung) or lamentations (ratap).
Mĕnangis ‘to weep’’, Pukui and Elbert (1971) gloss Hawaiian kani as ‘sound of any kind;
pitch in music; to sound, cry out, ring, peal, tinkle, roar, rumble, crow; to strike, of a
clock; voiced’. The meanings of these two forms are sufficiently different to raise
questions about the likelihood of common origin, but comparison with other Polynesian
languages quickly silences any doubts, as seen by adding Maori taŋi which Williams
(1971) glosses ‘sound, give forth a sound, cry, of things animate or inanimate; weep, utter
a plaintive cry, sing a dirge; fret, cry; salute, weep over; resound; mourn; cry for; sound;
lamentation, mourning, dirge’, or Samoan taŋi ‘cry (as a baby); weep; make a noise, utter
a cry (characteristic of a given animal)’ (Milner 1968). Second, all details of every form
are recurrent, and can therefore be accounted for by reconstructing a proto-language that
explains the development of every segment of every form through recurrent sound
change.
It is, of course, true that meanings also are subject to change over time, as just illustrated
by reflexes of *taŋis, but it is critically important to recognize that in establishing genetic
relationship meanings should be identical, or of a sort that is recurrent cross-linguistically
(as with moon/month, or water/river). It is also true that some comparisons that are
widely accepted by linguists leave part of the form in one language or group of languages
unexplained, as with Malay aŋin : Proto-Polynesian (PPN) *mataŋi ‘wind’ (next to PPN
*aŋi ‘to blow, of breeze’, which is formally impeccable in comparison to Malay aŋin, but
diverges somewhat in meaning). Again, it is critically important that such partial
comparisons be avoided in establishing genetic relationship, since they are redundant in
demonstrating the validity of language families such as Indo-European, Niger-
Kordofanian or Austronesian, where numerous comparisons can be cited that are
identical in meaning and that contain no unexplained phoneme sequences.
1.5. What it isn’t: a tool that can be manipulated by ad hoc extensions to make data
conform to expectation. Scholars who propose disputed genetic relationships between
languages or language families have learned by experience that in most cases such claims
will not be taken seriously unless they are accompanied by evidence of recurrent sound
correspondences such as those shown in limited form in Table 3. As a result, such
correspondences must be manufactured where they do not exist in a natural state. The
ways that this has been done are varied, and range from the naïve to the highly ingenious.
Before looking at the two claims that are the focus of this paper, then, it will be helpful to
be maximally explicit about what types of illicit modifications of the comparative method
are encountered in the literature. The following checklist aims for completeness, but may
miss rare types of deviations from methodological purity. To anchor it in reality I cite
examples from proposals in the literature.
Kawamoto (1977) claims a genetic relationship between Japanese and AN based on 708
comparisons that are divided into 35 semantic fields plus a small set of addenda. He
begins his exposition with the following six examples, which appear here without change,
except that glosses are offset by single quotes.1 Many of Kawamoto’s glosses are in
German, as they were taken from Dempwolff (1938) rather than from more recent
comparative work which would have shown the need for various modifications (MJ = his
Middle Japanese, ModJ = modern Japanese, unmarked forms are Old Japanese (OJ), and
PA = Proto-Austronesian):
1) abara-fone MJ < *ampara-pȯnȯ ‘ribs’ (= ‘thin and spread + bone’) – PA
qampar ‘ausbreiten’.
2) agi (MJ ago) ‘chin, jaw’; agit-ofi ‘move one’s jaws’ < *naŋkut- [n ~ s, 4, 10] –
PA zaŋgut ‘Kinn’.
3) ak-i ‘to open’ – PA aŋap ‘den Mund öffnen’.
4) aruki < *arȯk-i (MJ ariki) ‘walk around’ [2, 6] – PA haliq ‘umziehen’.
5) asi-nafe ‘cripple’ (= leg + cripple) < *napa-i [n ~ s] – PO sambe ‘Misbildung des
Fusses’.
6) ciNko ModJ ‘penis’ < *tin-ko; -ko ‘endearment’ [10] – PA ( )uti[n] ‘Penis’; PA
-ku ‘my’ > maNko.
1 Trivially, Dempwolff (1934-1938) departs from standard German usage in capitalizing the first symbol of
verbs and adjectives in addition to nouns, and this is repeated by Kawamoto; I have reverted to standard
practice here.
In classroom settings I usually select the first 25 examples of such proposed comparisons
to avoid selection bias, and then dissect them one at a time. For reasons of space this
procedure is abbreviated here, but even with these first six examples critical issues of
method will quickly become apparent.
1.5.1. Table-data disconnects. Since he knows that recurrent sound correspondences are the sine qua non of valid
historical linguistics, Kawamoto (1977:23) includes a table of ‘sound correspondences’
relating Proto-Japanese (PJ) to Proto-Austronesian (PA). However, since most of his
comparisons cite Old Japanese or Middle Japanese rather than Proto-Japanese forms a
few adjustments have to be made in order to predict the OJ shape from the PA shape.
Even so, it is a simple matter to apply the rules of correspondence between PJ and PA
that are given in this table, and to see for any given comparison in the database whether
the OJ form matches the shape that is predicted for it by the Proto-Austronesian form
Kawamoto cites, or by a currently accepted variant of the Dempwolff reconstruction.
Given an explicit table of ‘sound correspondences’ and confident statements about their
regularity, an uncritical reader could easily be led to believe that there is no longer any
doubt: Japanese is genetically related to the Austronesian languages, and this relationship
has been demonstrated by application of the comparative method of linguistics. In
reality, however, the data shows nothing of the kind. Table 4 shows the Dempwolff
reconstructions used by Kawamoto, their modern equivalents or replacements, the
predicted Old Japanese shapes of the putatively related forms, the actual Old Japanese
shapes of these forms as given by Kawamoto, and a running tally of formal and semantic
problems (FP, SP) with the comparisons in the two rightmost columns:
TABLE 4: Table-data disconnects in Proto-Austronesian-Old Japanese comparisons
No.  PAN                           Predicted OJ     Actual OJ              FP  SP
1.   *qampar =                     **aba                                   X   X
     PAN *SapaR ‘unroll a mat’     **afa            abara-fone ‘ribs’      X   X
2.   *zaŋgut =                     **sagu                                  X
     PAN *qazay ‘chin, jaw’        **ase            agi ‘chin, jaw’        X
3.   *aŋap =
     PMP *aŋap ‘gape’              **aka            ak-i ‘open’            X   X
4.   *aliq =
     PWMP *aliq ‘change place’     **ari            aruki ‘walk around’    X   X
5.   POC *sampe ‘clubfoot’         **sabe           asi-nafe ‘cripple’     X
6.   PAN *qutiN ‘penis’            **uti, **utigu   ciŋko                  X   X
The first comparison juxtaposes a suggested Middle Japanese (MJ) form with an AN
form that should appear as *SapaR ‘unroll a mat, spread out a mat’ (Blust and Trussel,
ongoing). Given PAN *SapaR the predicted OJ shape should be **afa since Kawamoto’s
table of ‘sound correspondences’ shows ‘PA *p, *b’ corresponding to ‘PJ *p’, which in
turn matches OJ *f/b (the latter reflecting earlier prenasalization, which is not posited for
PAN). Needless to say, on purely formal grounds the match with the attested form leaves
several things to be explained. A similar failure to match the OJ form predicted by
Kawamoto’s table of ‘sound correspondences’ is found in all other examples in Table 4.
Comparison 2) can be dismissed because Dempwolff’s *zaŋgut is attested only in Malay
and a small number of languages in western Indonesia that have a known history of
extensive borrowing from Malay. There is thus no basis for assigning it great antiquity in
the AN family. Rather, the best-supported term for ‘chin, jaw’ in PAN is *qazay (Blust
and Trussel, ongoing), which would predict OJ **ase, which is no closer than **sagu to
the reported agi (both show two discrepancies, which to most comparativists is fatal).
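One way to make the count of ‘discrepancies’ explicit is a plain edit distance between the predicted and the attested shape. The sketch below is an assumption of this illustration rather than a measure stated in the text: it treats each character as one segment and counts single-segment insertions, deletions and substitutions (Levenshtein distance). The names edit_distance, PREDICTED and ACTUAL are hypothetical.

```python
def edit_distance(predicted: str, actual: str) -> int:
    """Levenshtein distance: the minimum number of single-segment insertions,
    deletions, or substitutions needed to turn `predicted` into `actual`."""
    previous = list(range(len(actual) + 1))
    for i, p in enumerate(predicted, start=1):
        current = [i]
        for j, a in enumerate(actual, start=1):
            cost = 0 if p == a else 1
            current.append(min(previous[j] + 1,          # delete p
                               current[j - 1] + 1,       # insert a
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


# Predicted Old Japanese shapes for comparison 2), keyed by the reconstruction
# they are derived from, and the attested form reported by Kawamoto.
PREDICTED = {"*zaŋgut": "sagu", "*qazay": "ase"}
ACTUAL = "agi"

DISCREPANCIES = {source: edit_distance(shape, ACTUAL)
                 for source, shape in PREDICTED.items()}
print(DISCREPANCIES)  # {'*zaŋgut': 2, '*qazay': 2}
```

Under this measure both predicted shapes stand two segments away from agi, which agrees with the count given above. An edit distance is a crude proxy, since it ignores phonetic naturalness and treats every mismatch as equally serious.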
Comparison 5) is peculiar in that the comparison is with Proto-Oceanic *sabe ‘clubfoot’,
and cognates of this form are unknown in other AN languages, leaving its antiquity in
AN unsupported. As in cp. 2) Kawamoto includes a note which says that although OJ n
normally corresponds to an alveolar or palatal nasal in AN, it sometimes corresponds to s.
However, in cp. 2) the variation is with *z, not *s, and it is difficult not to conclude that
when he saw his proposed sound correspondences failing to produce the desired results
Kawamoto resorted to further ad hoc devices to shore them up.
Comparison 6), which by Kawamoto’s table of ‘sound correspondences’ should show OJ
**uti, or **utigu (if it contains a fossilized 1sg genitive suffix *-ŋku) contains so many
layers of speculation that it is perhaps best treated with silence.
To summarize, what one finds in virtually every case is that the predicted OJ shape does
not match the form that is given (a discrepancy that is general to the 708 comparisons
cited), and one must conclude that the table of ‘sound correspondences’ in this paper is
little more than window dressing for a long parade of random similarities.
1.5.2. Semantic laxity. Most linguists seem to agree that meaning is more difficult to
quantify or treat rigorously than sound, or elements of structure. Semantic divergence in
forms that are compared for the purpose of establishing genetic relationship is therefore
something that should be tightly controlled. There are two ways to do this: 1. compare
only forms that are translation equivalents or relatable by “single-step, widely attested
shifts in meaning, e.g. “moon” and “month”” (Greenberg 1957:38), 2. allow semantically
disparate comparata only where parallels to the semantic comparison exist within the
same family of languages in association with a different morpheme (e.g. ‘human being’
and ‘slave’ for reflexes of both PMP *qaRta and *qulun in Austronesian).
Relaxing the conditions on semantic agreement is one of the most common devices
used in long-range comparison to propose a historical explanation for cross-linguistic
similarity that in most cases is better explained by chance. Again, the data in Table 4
show clear examples of this violation of good method.
In comparison 1) PAN *SapaR means ‘unroll a mat, spread out a mat’ (Blust and Trussel,
ongoing). The meanings ‘ribs’ and ‘unroll a mat’ are so radically different that few
scholars would propose to connect them, and Kawamoto does so only through the
fanciful idea that the ribs are conceived in Japanese (or were in Middle Japanese) as
spread out like an unrolled mat.
Comparison 3) raises issues about the need for precise semantic agreement. Kawamoto’s
proposed OJ *ak-i is given as a general verb ‘to open’, but the AN form (assignable on
presently known evidence only to PMP, the hypothetical ancestor of the non-Formosan
AN languages) refers specifically to gaping, a meaning that is conceptually distinct from
other types of opening in AN languages and to my knowledge never confused with them;
cf. the submorpheme *ŋap ‘open, of the mouth’ that appears in PMP *aŋap ‘open the
mouth, gape’ and in other morphemes such as PMP *bəŋap ‘surprised, amazed’,
Proto-Western Malayo-Polynesian *ciŋap ‘catch one’s breath’, Malay cəŋap ‘catch in the
mouth; snap at; panting or catching one’s breath’, Minangkabau ŋaŋap ‘to snap at flies,
of a dog’, etc. (Blust 1988:129).
Comparison 4) presents a similar problem of vague semantic similarity. The OJ meaning
is given ambiguously as ‘walk around’, which could refer either to aimless walking or to
circumambulation. The data that Dempwolff (1938) cites for this reconstruction matches
neither of these (Tagalog aliʔ ‘successor’ --- not found in modern sources, Toba Batak
ali ‘in place of, in exchange for’, Javanese, Malay alih ‘change places; change clothes;
move to’, Ngaju Dayak alih ‘change, metamorphosis’, Fijian yali ‘absent, missing’). Not
only is this a formally and semantically unconvincing comparison between OJ and AN,
then, there are even serious problems accepting it as having any antiquity within AN, as
the Fijian form almost certainly is extraneous, the Tagalog form is not given in most
sources, and the remaining forms are confined to Malay and languages in western
Indonesia that have a long history of borrowing from Malay.
1.5.3. The benign slash. Geraghty (1983:2) has introduced the useful term ‘benign slash’
to refer to the practice in some comparative work of refusing to face the unavoidable
consequences of assuming cognation, namely that this demands complete accountability:
“There are instances ... when a linguist uses a “benign slash” to remove from
consideration the offending portion of a form, although that device is normally taken to
imply the existence of a morpheme boundary.” Among other examples he cites a
proposal by Charles Hockett to relate Bauan Fijian qalo ([ŋgalo]) ‘to swim’ to Proto-
Polynesian *kaloama ‘goatfish, surmullet’, although there is neither synchronic nor
diachronic evidence to support –ama as a meaningful element.
The benign slash is a frequently used device in long-range comparison, because it may
appear innocuous, especially since it is sometimes employed with languages that are
known to be related (recall Malay aŋin, Proto-Polynesian *mataŋi (*mat-aŋi) ‘wind’).
However, a little reflection will show that this is a powerful device in introducing random
similarity into the arena of legitimate comparison, since the probability of finding a
phonologically compatible match with part of a form in another language is far greater
than the probability of finding a match with the whole form.
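The size of this effect can be illustrated with a deliberately crude model. The sketch below is not taken from Geraghty or from any of the works under review; it assumes a hypothetical inventory of 20 equally likely segments, independence between positions, and exact segment-for-segment matching, all of which are simplifications. The function names are illustrative only.

```python
from fractions import Fraction


def p_exact(inventory_size, n_segments):
    """Chance that n_segments independently drawn segments all match a target."""
    return Fraction(1, inventory_size ** n_segments)


def p_partial_upper(inventory_size, form_len, part_len):
    """Union-bound ceiling on matching a part_len target somewhere
    inside a form of form_len segments (one try per possible window)."""
    windows = form_len - part_len + 1
    return min(Fraction(1), windows * p_exact(inventory_size, part_len))


whole = p_exact(20, 6)  # all six segments of the form must match
part = p_exact(20, 4)   # two segments have been slashed away
print(part / whole)     # 400
```

Under these assumptions, discarding two of six segments makes a chance match 400 times more likely, before the freedom to place the slash at different points in the form is even counted.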
1.5.4. Proto-form stuffing. Benedict (1975) did such methodological violence to what
may well be a valid claim of distant genetic relationship that most historical linguists
could no longer take him seriously. In a critique of what he called ‘pseudo-micritizing
devices’, Matisoff (1990) semi-humorously labeled one of Benedict’s strategies for
manufacturing so-called sound correspondences ‘proto-form stuffing’.
Most linguistic reconstructions consist of simple phoneme strings. Occasionally, where
an ambiguity exists one or two segments might appear in parentheses, but such devices
are severely constrained. By contrast, Benedict elevated this marker of alternative
choices to a level never previously seen. As Matisoff (1990:116) put it, “He claims he is
establishing ‘regular’ correspondences, he provides asterisked reconstructions bristling
with parentheses, slashes, and brackets. Sometimes these are so complex and arbitrary
that one feels like calling them ‘pseudo-micritizing devices’ – notational attempts to make
the speculative seem rigorous ... One way of ensuring apparent ‘regularity of
correspondence’ is to reconstruct proto-forms that are so complex canonically (e.g.
containing long consonant clusters, or even several syllables) that no given combination
of proto-entities is likely to recur very often --- thus obviating counterexamples.”
Examples abound, but one should suffice to show how the liberties that Benedict (1975)
took with the comparative method rendered it meaningless as a tool of science. In order
to relate what he called ‘Indonesian’ *[d]a ay=[dʔ]a y ‘forehead’ to Proto-Tai *hna ʔə
‘face’, he posited Proto-Austro-Tai *(q/)(n)dza[q]ai[s] ‘face, forehead’. Even without
specialist knowledge it is obvious that such a reconstruction permits many outcomes
depending upon which parenthetic elements are chosen or suppressed. What Benedict
should have compared is PAN *daqiS ‘forehead’, Proto-Tai *hna ‘face’, and what he
should have concluded is that these forms show no recurrent sound correspondences.
Moreover, ‘forehead’ and ‘face’ are clearly distinguished in most AN languages, with
‘face’ and ‘eye’ much more likely to be represented by a single form than ‘face’ and
‘forehead’.
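The multiplication of outcomes can be counted directly. The following sketch treats each parenthesized or bracketed element of *(q/)(n)dza[q]ai[s] as freely present or absent; this is a simplifying assumption about Benedict's notation (the slash inside the first parenthesis is ignored), and the segmentation in TEMPLATE is adopted here for illustration only.

```python
from itertools import product

# (segment, optional?) pairs; the segmentation is an illustrative assumption.
TEMPLATE = [
    ("q", True),
    ("n", True),
    ("dza", False),
    ("q", True),
    ("ai", False),
    ("s", True),
]


def expansions(template):
    """Return every surface string obtained by keeping or dropping optional parts."""
    choices = [("", seg) if optional else (seg,) for seg, optional in template]
    return sorted({"".join(parts) for parts in product(*choices)})


forms = expansions(TEMPLATE)
print(len(forms))  # 16
```

On this reading a single proto-form licenses 16 distinct segment strings, any one of which may be selected to fit a given daughter form.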
1.5.5. Unique ‘sound changes’. The discussion of ‘proto-form stuffing’ could as easily
be labeled ‘unique sound changes’, since Benedict’s departures from the comparative
method gave him room to propose sound changes for which counterevidence cannot be
given, since the ‘changes’ themselves are unique. Needless to say, in any branch of
science constructs that are proposed solely to remove an empirical obstacle to a
theoretical claim meet the definition of an ad hoc hypothesis (Leplin 1975), and are
disallowed as valid methods of inference.
1.5.6. Split cognates. In the same section of the discussion note in which he proposed the
term ‘proto-form stuffing’ Matisoff (1990:116-117) says that “The height of Benedictine
megalocomparative ingenuity is reached in the concept of SPLIT COGNATES, i.e. cognates
that have reflexes of at most one given proto-phoneme in common, since they derive
from different syllables of a polysyllabic etymon.” This is seen in Proto-Austro-Tai
*[wa]kləwm[a] ‘dog’, a complex etymon intended as an umbrella shape for Proto-Tai
*hma, Proto-Hmong-Mien *klu and PAN *asu (with evidence for *wasu from a few
Formosan languages). While this may appear initially to be the same device as ‘proto-
form stuffing’, it differs in a crucial respect: the Proto-Tai form is thought to derive from
the final syllable, the Proto-Hmong-Mien form from the second syllable, and the PAN
form from the first two syllables together (with e.g. Proto-Austro-Tai *kl > PAN *s a
unique attestation). Needless to say, if modifications of the comparative method allow
entire syllables or sequences of syllables to disappear, such that the first syllable is lost in
Language A and the last two in Language B (a mirror-image parallel to Benedict’s
derivation of Proto-Tai *hma and PAN *wasu from *[wa]kləwm[a]), we would have no
trouble relating, e.g. English dog and Tagalog áso by simply positing Proto-English-
Tagalog *dogaso.
2. Two proposals concerning the classification of the Austronesian languages.
My primary goal in this paper is to examine two recent claims about the classification of
the Austronesian (AN) languages, and to show that these assertions fall outside the
mainstream of historical linguistic research for very specific reasons, namely that they
achieve their results only by appealing to illicit extensions of the comparative method
such as those described under 1.5. As in several other questionable proposals concerning
the classification of the AN languages (e.g. Bopp 1841, Brandstetter 1937) the claims I
will address in this paper have been advanced by highly competent scholars who have
established solid reputations in other branches of linguistics. However, this does not
make them any different from similar claims that have been made by linguistic amateurs
or other less well-known individuals, as in the final analysis all that matters are the
arguments, and the existence (or non-existence) of evidence to support them. The two
proposals are the Austronesian-Ongan hypothesis of Blevins (2007), and the higher
phylogeny of Austronesian and the position of Tai-Kadai by Sagart (2004, to appear).
The first of these is a claim about distant genetic relationship, and so is classed together
with many other similar claims about Austronesian that have been made in the past and
that continue to be made today, including Austronesian-Indo-European (Bopp 1841,
Brandstetter 1937), Austronesian-Beothuk (Campbell 1892), Austric (Schmidt 1906,
Reid 1994, Hayes 1992, 1997, 1999), Austronesian-Semitic (Macdonald 1907),
Austronesian-Japanese (van Hinloopen Labberton 1924, Kawamoto 1977, 1984, Benedict
1990), Austronesian-‘Hokan’ (Rivet 1925), Austro-Tai (Benedict 1942, 1975, 1990),
Austronesian-Quechua (Key 1984, 1998, Kempler-Cohen 2012), Austronesian-Tacanan
(Key 1984, 1998), Austronesian-Uto-Aztecan (Key 1984, 1998), Austronesian-
Mapuche/Mapudungu (Key 1984, 1998), Austronesian-Panoan (Key 1984, 1998), and Sino-
Austronesian (Sagart 1993, 1994, 1995, 2005). The second is a claim about the highest-
level subgroups of Austronesian, and includes Tai-Kadai within one of these subgroups, a
conclusion that departs from the views of every other scholar, including those who accept
the Austro-Tai hypothesis.
2.1. The Austronesian-Ongan hypothesis. Blevins (2007) startled the scholarly world
with a novel proposal that many considered almost unimaginable before her: Jarawa and
Önge of the southern Andaman Islands form a linguistic clade (Öngan) that is coordinate
with the AN language family in a superfamily she calls ‘Austronesian-Ongan’.2 [MAP
HERE]. Although there was universal strong skepticism among Austronesian specialists
about her claim, the writer, who was asked to referee the manuscript, advised the editor
of Oceanic Linguistics that this was a paper from a distinguished phonologist, and that
every claim by a proven scholar, no matter how far it departs from received opinion,
‘deserves its day in court’. Consequently the paper was published, and a new assertion
about the external relationships of the AN languages entered the literature, joining the
thirteen others noted above. Within a short time following the publication of this article
MPI Leipzig, where Blevins was employed at the time, issued a news announcement that
an important new linguistic relationship had been discovered, namely that Önge and
Jarawa are related to the Austronesian languages, an announcement which implied that
the matter was settled and agreed upon by every interested party. Today one finds a
surprising number of references and comments on the Internet referring to this claim,
although few if any of them investigate its merits. The time has therefore come for the
promised day in court.
Blevins (2007:154) begins the abstract for her paper with the statement “This paper
applies the comparative method to two related languages of the southern Andaman
Islands, Jarawa and Onge, leading to the reconstruction of a proto-language termed
“Proto-Ongan” (PON). The same method is used to argue that Proto-Ongan may be
related to Proto-Austronesian (PAN).” Although she qualifies her proposal of the
Austronesian-Ongan hypothesis in her abstract by the wording “may be related”, by the
time she reaches her conclusions 36 pages later this reservation has disappeared, and she
maintains “In this study, the comparative method has been applied to two related
languages of the southern Andaman Islands, Jarawa and Onge. This comparison has
allowed reconstruction of a protolanguage termed “Proto-Ongan”, and the identification
of regular sound changes in each daughter language ...The same method was used in
section 3 to demonstrate that Proto-Ongan is related to Proto-Austronesian.”3
2 The first vowel of Önge is a schwa, and Blevins initially considered calling the Jarawa-Önge group
‘Engan’, and the proposed superfamily ‘Austronesian-Engan’, but when she was informed that ‘Engan’ has
been preempted for a group of Papuan languages in the Eastern Highlands of New Guinea she chose
‘Onge’ and ‘Ongan’ as substitutes. To avoid the problem of the reader assuming that this spelling
represents a mid-back rounded vowel I write ‘Önge’, following the practice of earlier writers such as XXX.
For a number of reasons that will be detailed below, this is an extraordinary claim, and as
the late Carl Sagan was fond of saying, “Extraordinary claims require extraordinary
evidence.” The focus of this section will be to determine, by close examination of a
representative sample of the data given by Blevins (2007), whether such extraordinary
evidence --- or even ordinary evidence --- exists in support of the Austronesian-Ongan
(AO) hypothesis.
2.1.1. The non-linguistic background. Before we consider the linguistic data offered in
support of AO it is necessary to draw attention to several non-linguistic factors that
cannot be ignored in making a claim of this kind. These fall into four groups: 1. biology,
2. culture, 3. geographical separation, and 4. chronology.
2.1.1.1. Biology. Speakers of AN languages generally conform to what has been called a
‘southern Mongoloid’, or Southeast Asian physical type, characterized by light brown
skin, straight black hair, medium stature, and a variable presence of the epicanthic
eyefold. With local variations on a theme this type is found among the aborigines of
Taiwan, the native people of the Philippines and much of Indonesia-Malaysia, and the
Chamic peoples of mainland Southeast Asia (SOURCES). In the Pacific and Madagascar
the picture is more complex, in the former case due to contact with a pre-AN population
in coastal New Guinea and neighboring areas that entered the western Pacific tens of
thousands of years earlier, and in the latter due to the likelihood of settlement along the
Mozambique coast before the settlement of Madagascar itself. However, in Micronesia
and Polynesia, a basically Southeast Asian physical type reappears which differs mainly in a
tendency to larger stature (particularly in Polynesia) and to waviness in hair form.
In striking contrast, all aboriginal groups in the Andamans, both the Önge and Jarawa of
Little Andaman in the south and the now moribund peoples of Great Andaman as
reported by Man (18XX), Radcliffe-Brown (1922) and others, have exceptionally low
stature, with jet black skin, tightly curled hair, and some steatopygy in the female sex. It
is hard to imagine a more striking contrast in physical appearance between peoples who
are thought to be linguistically related. It is true that some AN speakers in the western
Solomons are also extremely black, but here there is clear evidence of admixture with a
preexisting population that was in the region for close to 30,000 years, as well as local
genetic drift (Friedlaender 2007). No such argument can be advanced to account for the
striking physical differences between typical AN speakers and the Andamanese, or at
least the Önge and Jarawa, since these of necessity appeal to differentiation from a
common ancestral form. Rather, even without confirmatory genetic studies these two
groups would appear to belong to distinct branches of humanity that probably evolved
along separate paths from the time Homo sapiens left Africa at least 100,000 years ago
(Endicott book + other sources). The only alternative to this improbable scenario is the
3 Even greater certainty appears in an email message sent jointly to Byron W. Bender, John Lynch, Andrew
Pawley, Lawrence A. Reid, Malcolm D. Ross and myself on November 22, 2006: “If any of you were
closeby, or even in the same country, I imagine I would have arranged a visit, and thrown this all at you in
the very early stages. As it was, there was really no one here to discuss it with, so I gathered up every bit of
data I could on Jarawa, and its sister language, Onge, and worked and worked. I think I have made a major
discovery, and wanted to share it with you all before anyone else. These two languages of the Andaman
islands (i.e. Onge and Jarawa) are clearly related to Proto-Austronesian!”.
equally improbable one that one group of pre-PAN speakers arrived in the southern
Andamans and were physically assimilated to the local population before the AN
expansion out of Taiwan c. 4,000-4,500 BP.
2.1.1.2. Culture. Culturally, the differences between most AN speakers and the Önge,
Jarawa, and Great Andamanese are equally dramatic. There is abundant linguistic
evidence, in most cases supported by the archaeological record, that PAN speakers had
grain agriculture, including both rice and millet, cultivated sugarcane, bananas and taro,
lived in permanent houses and villages, hunted with dogs, raised pigs, probably chickens,
and possibly water buffalos, made pottery, practiced loom weaving, had some knowledge
of metals (hematite was used as a coloring agent for ‘red slip’ pottery), and manufactured
other items of material culture that are not used by hunter-gatherers, including wooden
mortars and pestles for pounding grains, hammers and wooden nails/dowels, as well as
seagoing canoes together with the various paraphernalia associated with them (Blust
1976, 1990, Bellwood XXX). In short, they had what archaeologists commonly call a
neolithic culture, one characterized by permanent settlements, agriculture, domesticated
animals, and pottery, among other traits.
As many researchers have pointed out, and Blevins accepts, all Andamanese at the time
of contact were hunter-gatherers. They used the bow and arrow in hunting, and spears in
fishing, but wore little clothing, had no weaving, and no houses apart from temporary
brush shelters (SOURCE). Although they used rather flimsy outrigger canoes in coastal
waters from the time of British contact in the eighteenth century [CHECK DATE], these
almost certainly are products of contact with Malays or other AN speaking groups who
have trafficked the eastern Indian Ocean for centuries (Waruno Mahdi, Tom Hoogevorst
diss.), and in any case are a far cry from the oceangoing outriggers that led AN speakers
to cross over 1,500 miles of open sea and settle the Marianas by at least 1,500 BC (Blust
2000), and Madagascar by perhaps 700 AD, or the modified double-hulled canoes that
led AN speakers to settle all inhabitable islands of the Pacific as far east as Hawai’i and
Rapanui by 1,100 AD (Hunt et al). In many ways the Andamanese represent an extreme
form of foraging adaptation, as they had no dogs until these were introduced by the
British in the nineteenth century, grew no crops of any kind, and did not know how to
produce a fire (they carried coals with them to new camps to kindle a fire when needed).
Both of these cultural lacunae are rare in global context, and almost certainly indicate
millennia of separation from other human populations. In short, all Andamanese at the
time of first contact had what archaeologists without exception would characterize as a
palaeolithic culture. Virtually the only cultural feature that they shared with speakers of
PAN was the use of spears and the bow, but it is generally accepted that these are cultural
features that probably were invented repeatedly throughout human history.
If there was a community speaking PAN-Ongan at some remote time in the past we have
to assume that the AN branch acquired all of its neolithic traits after separating from the
Ongan branch, or that the Ongan branch lost them. Furthermore, we have to assume that
traces of linguistic relationship have survived a much longer separation time than most
historical linguists are willing to accept for the time-depth of language families (around
6,000 years is a commonly assumed cut-off for time-depths that still allow similarity due
to common origin to be distinguished from similarity due to chance; SOURCE).
Blevins recognizes the major linguistic schism within the Andamans, but rather than
interpreting it as evidence that the Andamanese have been in these islands since they
became isolated from the Asian mainland at the end of the Pleistocene era 11,000 years
ago, she assumes that the Önge and Jarawa formed a single linguistic community with
physically and culturally very distinct AN populations. And, although she never
comments on the matter, she implies that the physical and cultural similarities between
the negritos of Great Andaman and Ongan speakers are due to convergence rather than
common origin.
2.1.1.3. Geographical distance. It is now generally agreed by historical linguists,
archaeologists and many population geneticists, that the AN language family, which
covers 206 degrees of longitude from Madagascar to Rapanui, began its epic expansion
from Taiwan. Both the concentration of linguistic diversity and the radiocarbon
chronology from archaeology support the thesis that Taiwan is the best candidate for the
AN homeland. From there the first movement out appears to have been into the northern
Philippines, followed by a rapid expansion southward into the Malay archipelago, with
one branch moving eastward over the north coast of New Guinea into the Pacific, and
another ultimately settling all of insular Southeast Asia, portions of the Asian mainland
and Madagascar off the east coast of Africa.
Much less is known about the prehistory of the Andamans, but since the nineteenth
century it has been known that languages of Great Andaman show very little resemblance
to those of the Önge and Jarawa. Rather than indicating separate origins for a population
that otherwise shows great physical and cultural similarity, this linguistic observation
suggests that the Andamanese of Great Andaman and those of the southern part of the
archipelago have been isolated from one another for many thousands of years within the
Andamans themselves. The Andamanese are, of course, members of a Southeast Asian
population also found in the Malay peninsula, and in scattered enclaves over much of the
Philippines that has generally been referred to since [the 16th century Spanish literature on
the Philippines; CHECK THIS], as ‘Negritos’ (‘little blacks’). Although scholarly
opinion about the genetic history of Southeast Asian Negritos remains divided (Endicott
2013), one prominent hypothesis about the origin of the Andamanese is that they are a
remnant of a once much more widespread hunting and gathering Negrito population that
occupied “Old Southeast Asia” prior to the southward expansion of Mongoloid
agriculturalists near the end of the Pleistocene. Unlike the Negritos of Malaya or the
Philippines, where contact influence has led to considerable gene flow, the indigenous
population of the Andamans remained physically, culturally and linguistically relatively
intact until British colonial rule brought Indian political prisoners to Great Andaman,
initiating the cascade of changes that has led to the linguistic and cultural extinction of
this northern population, giving the southern Andamans a reprieve into the 21st century.
Whatever conclusion one might reach about where the Andamanese as a whole were
before they reached the Andamans, Proto-Ongan has a shallow time-depth, and basic
considerations of parsimony favor the view that it was located in the southern Andamans.
As the crow flies, the distance between Taiwan and the Andamans is about 2,500 miles
--- a considerable stretch in considering the likely homelands of PAN and Proto-Ongan.
2.1.1.4. Chronology. Geologically the Andamans are a southern extension of the Arakan
range of Burma, and during the last glacial maximum they would have formed a larger
landmass than they do today that would have been closer to the Asian mainland. It was
then that the best opportunity for settlement by a population of hunter-gatherers with little
deep sea navigational skill would have presented itself, and it is most likely that when the
archaeological picture for the Andamans becomes more completely filled out it will be
shown that the Andamanese have been in the Andamans since the end of the Pleistocene
era some 10,000-11,000 years ago. Currently what little archaeology has been done
shows no settlement earlier than XXXX BP (Cooper 2002).
By 5,500 years ago, when PAN speakers began to settle Taiwan, the Andamans would
already have been long isolated from mainland Asia, and some ancestral form of Proto-
Ongan presumably would have arisen in the southern part of this archipelago (based on
the data in Blevins (2007) Proto-Ongan itself appears to have been a unity much more
recently than this). Given the geographical separation of the respective proto-languages
and the physical isolation of the Andamanese, it is difficult to imagine how PAN and
Proto-Ongan could possibly have even been in contact, let alone how they could have
been descendants of an earlier community that gave rise to both.4
To summarize, Blevins is confident that she has made a major discovery overlooked by
everyone before her. As one example of her tone, she says “Shelters made of wood and
woven palm leaves were of three different types, as in other parts of the Austronesian-
speaking world” (Blevins 2007:156; italics added). In fact, the Andamanese shelters are
simple, hastily-constructed lean-tos that give limited protection from rain and wind. The
closest shelter they resemble in the AN world is the temporary field hut used when
spending several nights away from the village in agricultural labor, a poor comparison
with the well-constructed permanent dwellings made of heavy timbers and bamboo floors
raised well above the ground on ironwood houseposts traditionally found over much of
island Southeast Asia. The suggestion that variations in the Andamanese brush hut
exhibit distinctions comparable to that between domicile, public hall and granary found
in the AN-speaking world by at least Proto-Malayo-Polynesian times is little more than
ethnological fantasy (Blust 1987; other SOURCES).
4 Blevins presented her ideas at the 11th International Conference on Austronesian Linguistics at Aussois,
France in June, 2009, and when some in the audience pointed out the difficulty of reconciling these
observations with the Ongan-AN hypothesis, her response was that linguistics is an autonomous discipline,
and non-linguistic matters are of no concern to her. It is clear that this reaction stemmed largely from her
belief that she had already demonstrated a genetic relationship between PAN and Proto-Ongan, but it
reveals a basic misunderstanding of historical linguistics. In synchronic linguistics phenotype and culture
are irrelevant to language analysis, but in diachronic linguistics the matter is fundamentally different. A
proto-language is a theory of a prehistoric language community. Like any language, a proto-language must
have been spoken by an interbreeding population with a particular cultural type that can be localized in
time and space. Given the non-linguistic background that has been sketched here, it appears next to
impossible to make this work for a hypothetical Proto-Austronesian-Ongan.
In short, biological, cultural, geographical and chronological considerations work against
the likelihood of a linguistic relationship between Ongan and AN. To persuade other
scholars that such a relationship in fact exists, the linguistic argument must be thoroughly
convincing by ruling out chance, universals and borrowing as alternative explanations of
similarity, and this is only possible by following sound methodological principles.
2.1.2. The linguistic evidence. The AN-Ongan hypothesis links only Önge and Jarawa
(‘Ongan’) with the AN languages, and explicitly excludes the now moribund or extinct
languages of Great Andaman from this proposed genetic grouping. No doubt to disarm
potential critics, Blevins projects a concern for methodological purity, noting on page
159, fn. 9 that “This is not to say that look-alikes in basic vocabulary between Great
Andamanese and Proto-Austronesian cannot be found,” thereby reassuring the reader that
she is scrupulously distinguishing similarity due to chance from similarity due to
common origin. Again, on page 160 she states “Though descriptions are limited and
preliminary, comparable basic vocabulary ... and grammatical sketches for Jarawa and
Onge, along with detailed reconstructions for Proto-Austronesian and its descendants,
provide a firm basis for application of the comparative method.” And once more, on
page 164, she says “As I show below, this comparison reveals a striking number of
cognate sets that seem to defy chance resemblances, and whose semantic and
morphological characteristics are difficult to explain in terms of borrowing or contact-
induced change alone.”
Given this glowing confidence the reader is prepared to encounter an astonishing fact:
that despite the obvious non-linguistic hurdles that stand in the way of claiming a
historical connection between Önge and Jarawa on the one hand and the AN language
family on the other, there is incontrovertible linguistic evidence that these radically
distinct groups that are also widely separated in space actually represent divergent
continuations of a single ancestral population that was distinct from the negrito
population of Great Andaman.
Before presenting the lexical evidence for her position Blevins (2007:165) supplies two
tables of sound correspondences, labeled ‘Some PAN-PON consonant correspondences’
and ‘Some PAN-PON vowel correspondences in open nonfinal syllables’; this is
followed by a table of ‘Regular sound changes in PON/PAN’. For the convenience of the
reader each of these is reproduced below with changes in the table numbers and other
very minor modifications:
TABLE 4. SOME PAN-PON CONSONANT CORRESPONDENCES5
Proto-Austronesian   *p      *b      *t      *d      *k      *g      *q      *h
Proto-Ongan          *p      *b,Ø    *t      *d      *k      *j,g    *q      *h,y
Proto-Austronesian   *m      *n      *ɲ      *N      *ʔ      *ku     *qu
Proto-Ongan          *m,-ŋ   *n,-ŋ   *ɲ      *l,y    *ʔ      *kw     *kw
Proto-Austronesian   *c,C    *s,S    *j      *z      *l      *r      *R      *w
Proto-Ongan          *c      *c      *j,y    *c      *l      *r      *l,r    *w
5 A Proto-Ongan reflex of PAN *y (palatal glide) is not given, even though a reflex of *-ay is.
TABLE 5. SOME PAN-PON VOWEL CORRESPONDENCES
IN OPEN NONFINAL SYLLABLES
Proto-Austronesian   *i      *u      *a      *ə (written *e)   *ay#
Proto-Ongan          *i      *u,o    *a,e    *e                *e
TABLE 6. REGULAR SOUND CHANGES IN PON/PAN
    Proto-Ongan              Proto-Austronesian
1)  #bu > #u                 #bu
2)  #bi > #i                 #bi
3)  #V                       #qV
4)  q > k                    q
5)  qw > kw (+kw < ku)       qw > q, w; kw > k, w
6)  T > Ø/__#6               final Cs maintained (in unstressed syllables??)
7)  n > ŋ,Ø/__#              final nasals maintained
8)  m, ɲ > ŋ,Ø/__#           final nasals maintained (in progress?)
9)  ay# > e#                 ay#
The first thing to note about Blevins’ argument is that in the opening sentence of her
abstract she claims to have used “the comparative method” to infer the existence of an
Ongan-AN language family (Blevins 2007:154), a claim that is repeated in her
conclusion (Blevins 2007:190). However, attention to her etymologies, and to the
definitions given under 1.5 of this paper, shows that this is not true. As support for her
position Blevins (2007) presents 109 etymologies, three pieces of grammatical evidence,
and two arguments based on morpheme structure restrictions in the PAN lexicon. Given
unavoidable limitations of space, the most objective way to evaluate this evidence is to
examine the first X etymologies that she gives, and then consider the non-lexical part of
her argument separately. Because I have found it adequate in pedagogical situations in
which I have examined other claims of distant genetic relationship, I have chosen to
examine the first 25 etymologies in her list of ‘Austronesian-Ongan cognate sets’
(Blevins 2007:167-170). Following her conventions, PAO = Proto-Austronesian-Ongan,
PON = Proto-Ongan, JAR = Jarawa, ONG = Önge, PAN = Proto-Austronesian; ACD =
Austronesian Comparative Dictionary; where she cites individual AN languages by
abbreviation I write out the full language name:
(1) PAO *aCa ‘high up’. The ACD (www.trussel2.com/ACD) has PAN *aCas ‘high,
tall’, and this is compared with PON *-eca-, which Blevins glosses ‘high up, upper part;
face’. However, the latter form is justified by ONG eja/le ‘face’, eja/tore ‘beard on the
face’, eje/bo ‘eye’, eje/tati ‘skin of forehead’ and similar body-part terms in Jarawa, thus
6 The symbol T represents any oral stop.
supporting PON *-eca- ‘face’, but hardly ‘high up, upper part’, which is simply a freely
added gloss given for no obvious reason other than to bridge a very wide semantic gap.
PAN *aCas is supported by data from 34 languages, and in none of these does the gloss
vary much from examples such as Puyuma a-Tas ‘high’, maka-Tas ‘on’, Malay atas
‘position over or above’, or Motu lata ‘length; tall, long’. Apart from semantic issues
there are formal problems with this etymology, since the rules of correspondence in
Tables 4-6 predict PAO *aCac, not *aCa. Blevins solves this problem handily by lopping
off the final consonant of what was clearly a single morpheme, rewriting the PAN form
as *aCa/s. Note both here and elsewhere throughout her argument that Blevins uses a
hyphen to mark what appear to be uncontroversial morpheme boundaries, as with PON
*eti-a ‘we’, or ONG ubu-daŋe ‘bucket of giant bamboo’, but a slash to dispense with
material that simply does not conform to theoretical expectation.7
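The formal problem can be checked mechanically against Table 4. The sketch below is a minimal illustration that transcribes only the consonant correspondences needed for this etymology; vowels are passed through unchanged, the positional changes of Table 6 are not modelled, and the names TABLE4 and expected_reflexes are hypothetical.

```python
# Subset of Table 4: PAN consonant -> listed Proto-Ongan correspondents.
TABLE4 = {
    "C": ["c"],
    "c": ["c"],
    "s": ["c"],
    "S": ["c"],
}

VOWELS = set("aiueə")


def expected_reflexes(pan_form):
    """Look up each segment of a PAN form; vowels are copied as they stand."""
    result = []
    for segment in pan_form:
        if segment in VOWELS:
            result.append([segment])
        else:
            result.append(TABLE4[segment])
    return result


print(expected_reflexes("aCas"))  # [['a'], ['c'], ['a'], ['c']]
```

Since Table 4 pairs PAN *s with *c rather than with zero, nothing in the lookup licenses the disappearance of the final consonant of *aCas.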
(2) PAO *aCay ‘liver’. PAN *qaCay, rewritten *q/aCay ‘liver’ is compared with PON
*-aceŋ ‘blood’, which Blevins writes as *ace/ŋ (< *ace-iŋ ‘liver water’). Without further
information on Önge and Jarawa word-formation processes the proposed segmentation of
the PON form must be treated with caution. Moreover, the basis for the PON form is
JAR –aceŋ, ONG -aceŋe ‘blood’. Blevins does not explain the initial hyphen, which
suggests that these comparata have been extracted from longer words. The absence of *q
in the PAO form is treated separately for a number of nouns, and will be revisited below.
Finally, words for ‘liver’ and ‘blood’ are distinct in AN languages, and the latter is never
derived from the former; rather, where lexical innovations occur that have led to the
replacement of PAN *daRaq ‘blood; to bleed; menstruate’ with some other form it is very
often a word for ‘sap, juice’ and the like (Blust 2010:541-543).
(3) PAO *-aku ‘self, ego; third person pronoun’. PAN *aku ‘1sg.’ is invariably reflected as
a first-person singular pronoun throughout the AN language family. This is compared
with PON *-aku/i ‘self, ego’ (= ‘self, Ego’), which in turn is based on JAR həwi- ‘3rd
person nominative pronoun’, ONG –ekwi ‘3rd person definite nominative plural pronoun’.
Formally the PAN-PON match is forced by a benign slash (forms of *aku-i do appear in
some of the languages of central Borneo, but this is best treated as a local development).
Semantically 1sg. and 3sg. are different categories that never cross in AN. As a last
effort to save this comparison Blevins cites ONG –akui ‘self, ego’, “added to pronouns to
form reflexives, e.g. m-akui ‘myself’, et-akui ‘ourselves’”, but –akui here is clearly the
reflexive marker, not a pronoun, and has no necessary connection with the category of
7 The addition of PAN *m/aCa ‘eye’ to patch these semantic and formal problems only compounds the
methodological flaws in an argument that appears out of touch with the comparative method as it is
normally employed. Blevins justifies her change of PAN *maCa to *m/aCa by noting that Binongan Itneg
and Guinaang Kalinga, two closely-related Central Cordilleran languages or dialects of the same language
in northern Luzon that are cited in Reid (1971), reflect PAN *maCa with no initial consonant. However,
both languages/dialects also lack the expected initial consonant or syllable in several other forms, as with
PMP *niuR > ItgB iyóg, KlaG iyúg ‘coconut’, PMP *lima > ItgB, KlaG íma ‘hand, arm’, PMP *puki >
ItgB óki, KlaG ú iʔ ‘vagina’, and PMP *taliŋa > ItgB, KlaG íŋa ‘ear’. Although the reason for these
truncations is unclear, Blevins offers no explanation for how aberrant data from a single witness can be
used as comparative evidence for a proposed morpheme division in PAN. Rather, what we see in this
comparison is the first use of a methodologically illicit device that Blevins elevates to a virtual principle of
her modus operandi, namely the ‘benign slash’.
first person singular. Others will surely notice that this forced comparison could be made
about as plausibly (or implausibly) with Indo-European.
(4) PAO *-ala ‘fetch, get, take’. PAN *ala (disjunct *alaq, doublet *alap) ‘take, get,
fetch, obtain’ is compared with PON *-le ‘verbalizing suffix’, which in turn is based on
Jarawa –ipo/le ‘remove the bark’ (ipo ‘flesh, skin’), Önge in/le ‘fetch water’ (iŋe
‘water’). No explanation is given for why *-ala was reduced to a monosyllable, and it is
not at all clear from the evidence presented that Jarawa, Önge –le means ‘fetch, get’.
(5) PAO *aNak ‘child, offspring’. PAN *aNak ‘child, offspring; son, daughter’ is widely
reflected in non-Oceanic AN languages. Blevins compares this with PON *ale ‘child’,
based on JAR –ale ‘child’, ONG ale ‘child (term of address)’, an etymology that is
superficially appealing, since it is basically free from semantic problems, and the forms
show some formal similarity. However, the sound correspondences asserted in support of
this and other examples of PAO *N vanish on close inspection. According to Table 4,
PAN *N corresponds unpredictably to PON *l or *y. The first problem we encounter
with this claim is one that would immediately be obvious to any comparativist: allowing
two reflexes without stateable conditions increases the role of chance in producing
random crosslinguistic lexical similarity. The fact that this holds for *a > a, e as well
clearly multiplies the role of chance as an explanation for the perceived similarity. In a
proposed proto-form with four segments, then, Blevins allows extra freedom of random
association for half of the phonemes. This would already be grounds for most historical
linguists to dismiss the comparison outright, but it is only one of the methodological
problems with this proposed etymology.
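The way unconditioned alternatives widen the net can be made concrete with a minimal sketch. It assumes a toy reflex table in which only the two-way reflexes of *N (l or y) and *a (a or e) are taken from the discussion above; the single reflex given for *k is a placeholder, and the names REFLEXES and predicted_forms are hypothetical. The sketch simply enumerates every shape that such a table would license for one proto-form.

```python
from itertools import product

# Illustrative reflex sets. Only *N -> l/y and *a -> a/e come from the
# discussion above; the single reflex for *k is a placeholder assumption.
REFLEXES = {
    "a": ["a", "e"],
    "N": ["l", "y"],
    "k": ["k"],
}

def predicted_forms(proto):
    """Return every shape that REFLEXES licenses for a proto-form."""
    options = [REFLEXES[segment] for segment in proto]
    return ["".join(combo) for combo in product(*options)]

forms = predicted_forms("aNak")
print(len(forms))  # 2 * 2 * 2 * 1 = 8 licensed shapes instead of 1
print(forms)
```

Each segment with two unconditioned reflexes doubles the number of shapes that would count as a match, which is why such latitude inflates the role of chance.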
In her discussion of the evidence for PAO Blevins (165) says “Preliminary reconstruction
of Proto-Ongan allows us to compare this language with Proto-Austronesian. This
comparison yields cognates and allows regular sound correspondences to be identified,
as shown in tables 4 and 5, with regular sound changes ... summarized in table 6.” As
seen above, table 4 shows PAN *N corresponding to both PON *l and *y. This naturally
implies multiple instances of each reflex for the sound correspondences to even be
recurrent, let alone regular, but the data reveals something quite different: there are four
PAO reconstructions with *N, and four separate correspondences for these between PAN
and PON, making each one unique:
      PAO       PAN       PON
1)    *aNak     *aNak     *ale       child
2)    *aNiC     *qaNiC    *atiy      skin
3)    *uNay     *uNay     [*ulay]    thorn in skin
4)    *uzaN     *quzaN    *ucen      rain
Three of these comparisons require a further comment. First, PON *atiy is said to result
from sporadic metathesis of *ayit, making y rather than t the reflex of *N in this form,
although this still leaves *C > t unexplained (Table 4). Second, although Blevins posits
PON *ulay ‘thorn in skin’ the only evidence she cites for it is ONG ull/uke ‘be pierced by
thorn in the foot’, where 1. the offending –uke is removed by a ‘benign slash’, 2. the
gemination is accounted for by an ad hoc hypothesis, 3. the normal *ay# > e did not take
place (Table 5), and 4. there is no internal comparative evidence for positing a PON form
since no Jarawa cognate is known. Third, it is acknowledged that PAO *uzaN, PON
*ucen ‘rain’ is ‘irregular’ (Blevins 2007:174).
Restating Blevins’ claims more objectively, then, the actual PAN : PON correspondences
for PAO *N are: 1) *N : *l, 2) *N : *t/y, 3) *N : ONG ll, and 4) *N : *n. This obviously
makes the claim that PAO *N is based on ‘regular’ sound correspondences meaningless:
the four examples are each unique. This is an astonishing proposal for someone who has
been trained to look for generalizations in phonology: what phonological generalization
is PAO *N posited to capture?
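The point about recurrence can be checked mechanically. The following is a minimal sketch, assuming that a correspondence counts as recurrent only if it is attested in at least two comparisons (a deliberately weak threshold chosen for illustration); the four pairings are those restated in the preceding paragraph, and the variable names are hypothetical.

```python
from collections import Counter

# PAN : PON pairings for PAO *N, as restated above.
# Each tuple: (PAO form, PAN segment, PON reflex).
pairings = [
    ("aNak", "N", "l"),
    ("aNiC", "N", "t/y"),
    ("uNay", "N", "ll"),
    ("uzaN", "N", "n"),
]

# Count how often each PAN : PON pairing is attested.
counts = Counter((pan, pon) for _, pan, pon in pairings)

# Assumption: "recurrent" means attested in at least two comparisons.
recurrent = {pair for pair, n in counts.items() if n >= 2}

print(counts)
print(recurrent)  # no pairing reaches the threshold
```

With four comparisons and four distinct pairings, no correspondence for *N recurs even once.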
(6) PAO *aNiC ‘skin’. PAN *qaNiC ‘animal skin, hide, leather’, rewritten as *q/aNiC,
is compared with PON *atiy (< *ayit via metathesis), based on JAR –atiye, ONG –ati
‘skin’. Needless to say, appeals to sporadic change in arguments for distant linguistic
relationship are dangerous, as this obviously widens the scope of chance in producing
random similarity. Other problems with this comparison have already been noted in the
discussion of PAO *aNak ‘child’.
(7) PAO *-an ‘locative suffix’. This comparison is based on PAN *-an ‘verbal suffix
marking locative voice; nominal suffix marking location’ : ONG –a ‘nominal locative
suffix’, -aŋ/ka (< *an-ka). Reflexes of PAN *-an are widespread as part of the voice-
marking system of Philippine-type languages, and as ordinary locative suffixes in many
AN languages that have simplified or lost the original voice-marking system, but the
comparison of this affix with ONG –a is surprising. Not only is this form too short to
take seriously in proposing a historical connection with PAN *-an, but without a Jarawa
cognate there is no basis for proposing even PON *-an. Finally, the sound changes that
Blevins gives for relating PON to Önge (164) should have produced Önge ane, not –a
(or –an).
(8) PAO *apa ‘carry on back’, PON *-apa ‘carry on shoulder’. PAN has *apa ‘carry on
the back’, which is compared with Jarawa -apa-, Onge –aba- ‘carry on shoulder’. This
appears straightforward, but we are not told what the hyphens indicate. Given their usual
function this suggests that the phoneme strings compared have been extracted from
longer words. Moreover, AN languages are very specific in distinguishing verbs of
carrying, and carrying on the back (*baba) is quite distinct from carrying directly on the
shoulder (*apa), or carrying on the shoulder with a shoulder pole (*pasaqan). In any
case, without support from other forms this comparison is little different from many of
those noted above in Table 1.
(9) PAO *aqajaw ‘sun, sunlight, day’. The ACD has doublets *qajaw and *qalejaw
‘day’. Blevins compares the first of these with Jarawa eheya ‘sun, sunlight’, Önge ekuwe
‘day, today’. The initial vowel of the Ongan forms is unexplained, and neither of the
other two vowels of the Jarawa and Önge forms correspond with one another or to the
PAN form, nor do the glides of the Jarawa and Önge forms correspond. In short, it is not
at all clear that the Jarawa and Önge forms are related to one another, let alone to the
PAN reconstruction. But Blevins doesn’t give up so easily, pointing to Jarawa ehe, Önge
eke ‘sun’, which cannot be reconciled with either PAN doublet, but can be matched
(more or less) with the non-morphemic phoneme string *-aqa- in PAN *daqaNi ‘day’,
and PAN *banaqaR ‘radiance, as of rising sun’. Needless to say, despite Blevins’ claim
that she is using the “comparative method” this freewheeling use of the ‘benign slash’
departs radically from the normal practice of historical linguistics.
(10) PAO *aRi ‘come, go (movement toward speaker)’. The ACD has variants *ai, *ari
and *aRi ‘come; toward the speaker; let’s go!’. These are compared with Jarawa ale/ma
‘come, go’, and Önge ale/maʔ ‘return’. Again, the ‘benign slash’ is used in defiance of
any kind of scientific method to dispose of unwanted material, in this case in Ongan
languages rather than in PAN. In addition, PAO *aRi is based solely on the PAN
form, since neither Jarawa nor Önge has a high front vowel in this form.
(11) PAO *ati-a ‘our, us; 1pl possessive pronoun’. The ACD has *ata ‘plural inclusive
possessive pronoun, our’, and *ita ‘we (incl.)’. This is compared with Önge eta ‘our, us’,
said to be from *eti-a/eti ‘we’. Since no Jarawa form is given this pronoun cannot be
reconstructed for PON based on internal evidence. In any case a comparison of either
PAN *ata or *ita with *eti-(a) shows a : i, which is not a sound correspondence
recognized in table 4.
(12) PAO *-aya ‘mother’. The ACD has *aya ‘father’s sister’, and this is compared with
Jarawa aya ‘mother’, k/aya ‘mother [address form]’, w/aya ‘mother [reference term]’,
ONG k/aye/ri ‘mother [address form]’. Since table 4 excludes PAN *y there is no basis
for evaluating the proposed sound correspondence. However, it is noteworthy that *aya
‘mother’ has been proposed as ‘Proto-Human’ by Bancel and Matthey de l’Etang (2002),
based on similar forms in a number of genetically disparate languages. Given this wider
context there seem to be two choices in dealing with this comparison: 1. attribute the
resemblance of these forms to convergence motivated by still poorly-understood
language universals, or 2. accept the Proto-Human proposal.
(13) PAO *baqeRuh ‘new’. The ACD gives PAN *baqeRuh ‘new; bachelor’. This is
compared with ONG baro/-i/baro ‘good, nice, beautiful’, and the reader is told that “The
PON form shows syncope of medial *e, and subsequent cluster reduction of *qR > *-R,
closed syllable lowering of *u > *o, and final *h-loss.” As Blevins knows, the syncope
of schwa between consonants that are themselves flanked by vowels is common in AN
languages, but inspection of her data shows no other examples of schwa syncope from
proposed PAO forms, making this a unique change. Since PON allows no consonant
clusters the proposed reduction of *qR > *R is also unique, as is the lowering of closed-
syllable *u. Given three uniquely attested changes, the role of chance in producing this
or any other comparison is obviously too great to instill confidence in the proposed
etymology. Moreover, once again Blevins has posited a Proto-Ongan form with evidence
only from Önge, thus using ‘inverted reconstruction’, or ‘reconstruction from the top
down’ (Anttila 1972, Blust 1972) to posit PON forms from a single Ongan language plus
a PAN reconstruction. While this method is a valuable tool for languages that are known
to be related, it is a dangerous tool to use in proposing distant genetic relationship, since
if one can choose a form from either Jarawa or Onge to compare with AN data it clearly
magnifies the role of chance in producing similarity across language families. Finally,
reflexes of PAN *baqeRuh mean ‘new; fresh; bachelor; before’, but never ‘good, nice,
beautiful’, meanings that are associated with other lexical forms.
(14) PAO *bel ‘smoke’, PON *bel ‘smoke’. The ACD has *qebel ‘smoke’. This is
compared with ONG bele/me ‘smoke of a fire’. Again, no Jarawa form is cited, showing
that there is no internal comparative basis for positing PON *bel. Moreover, the
segmentation of Önge beleme appears to be completely arbitrary, as there is no indication
of an identifiable morpheme boundary. Finally, because it is inconvenient for her
purposes, Blevins removes the first syllable of the PAN form, rewriting the actual proto-
form *qebel as *qe/bel ‘smoke’.
(15) PAO *beRay, PON *bele ‘give’. As a basis for the preceding proto-forms, PAN
*beRay ‘give’ is compared with ONG bele ‘give’. This comparison is one of the best
that Blevins offers: the meanings are identical, and the sound correspondences agree with
her tables 4 and 5. The only qualifications are that PAN *b is paired with PON *b or Ø,
and PAN *R is paired with PON *l or *r without stateable conditions, thus increasing the
role of chance in producing cross-linguistic lexical similarity, and no Jarawa form is
known, raising questions about how the PON form has been justified (again, by ‘inverted
reconstruction’). Given the lack of supporting evidence elsewhere, this comparison is
best treated as a striking chance resemblance, no different in kind or quality than many of
those given above in table 1.
(16) PAO *biaC ‘bow; draw a bow’, PON *iya. Based on apparent cognates in just two
AN languages, the ACD gives PAN *biaC ‘draw a bow to its full extent’. This is
compared with JAR eya/ya ‘make bows and arrows’, iya ‘hunt (with bow and arrow)’,
and ONG iya ‘bow’. The AN term clearly did not mean ‘bow’ (PAN *busuR), or ‘to
hunt with bow and arrow’ (PAN *qaNup). Instead, it is assumed in the ACD that it
referred to drawing a bow to its full extent prior to releasing an arrow. This meaning is
supported by the Ilokano gloss, but in Puyuma (the only other language from which a
cognate is known) the meaning is ambivalent, as seen in the semantics of Tamalakaw
Puyuma v<en>a-viaT ‘draw a bow to its full extent; pull to pieces (e.g. the legs of a
chicken)’, Nanwang Puyuma b<en>iaT ‘stretch something like a bow to its full extent’.
Whereas ‘draw a bow to its full extent’ refers to increasing the curvature of the bow by
an inward forcing movement, pulling the legs of a chicken to pieces refers to an outward
forcing movement, and this second meaning is reinforced by Nanwang Puyuma, where
b<en>iaT is also applied to straightening an iron bar, and thus reversing the curvature
rather than increasing it. In either case the meanings of the PAN and Ongan forms seem
only tangentially related. And again, the role of chance in producing random similarity is
magnified by allowing PAN *b to correspond to either PON *b or zero without stateable
conditions.
(17) PAO *biraŋ ‘anger; angry’, PON *biraŋ ‘angry’. The ACD has *biraŋ ‘anger;
angry’ based on data from two languages, and with the annotation ‘Possibly a chance
resemblance’. This is compared with JAR –ero-, ONG iraŋe/biti ‘angry’, where –biti is
said to mean ‘bad’. It is acknowledged (169) that “JAR /o/ is unexplained.” However,
neither vowel of the Jarawa form agrees with Blevins’ table 3, and Önge is said to have
no fewer than four reflexes of PON *r, namely r/y/l/Ø. Needless to say, any talk of
“regular sound correspondences” here is completely meaningless.
(18) PAO *buaq, PON *wa ‘fruit’. The ACD gives *buaq ‘fruit’, and this is compared
with JAR ele/wa ‘fruit’, ONG wa ‘fruit (pl.)’. It is speculated that ele- in the Jarawa form
< PON *ele ‘leaf’, but this is based only on ONG gele ‘leaf’, where the initial consonant
is unexplained, and no related free-form for ‘leaf’ is cited for Jarawa. At the same time
she proposed this comparison Blevins alternatively compared PAN *buaq to JAR oha,
ONG okw/ottirete ‘flower’, suggesting (169) that “This PAO form appears to have given
rise to a doublet ... perhaps under different prosodic conditions. In both forms, word-
initial *bu > *u in PON; however, in the word for ‘fruit’ the final C is lost, while it is
maintained in the word for ‘flower’.” As a look-alike the first of these two alternatives
(PAN *buaq, PON *wa ‘fruit’) is fairly striking. However, in the absence of clear
evidence of recurrent sound correspondences it has no more value for establishing genetic
relationship than e.g. Chama bao ‘carry’, Mussau bao ‘carry pick-a-back’ in table 1.
Moreover, the observation that the PAN form can be compared with either of the Ongan
sets should be a warning that the resemblance has no historical significance.
(19) PAO *bubu ‘conical bamboo basket trap for fish’. The ACD has *bubu with the
same meaning. This trap consists essentially of a conical basket woven of bamboo (or
other material where bamboo is not available), with a round mouth holding converging
bamboo splints that can be pushed apart by an incoming fish or eel in order to secure the
bait that is placed inside. From the inside, however, the splints cannot be pushed apart to
exit, and the creature that entered cannot escape. Very similar types of traps have a
wide distribution on most continents, and are almost certainly products of independent
invention. This very specific device is compared with PON *u/bubu, *ubu ‘fish-keeping
vessel’, based on JAR ububu ‘fish-keeping vessel’, ONG ubu-daŋe ‘bucket of giant
bamboo’ (-daŋe = ‘trunk, stick’). Neither of these forms appears to refer to a trap of any
kind, although this is invariably the case with reflexes of PAN *bubu distributed from
eastern Taiwan to Fiji.
(20) PAO *buhet ‘squirrel’, PON *uhe. The ACD has *buhet ‘squirrel’. This is
compared with JAR uye ‘squirrel; rat’, which Blevins says “likely refers to the palm civet
or jungle cat (Paradoxurus andamensis), which is common in the Andaman jungles...”.
Apart from the recurrent problem of PAN *b- “corresponding” to PON *b or zero
unconditionally, the medial –y- in the Jarawa form is unexplained, and without an Onge
cognate a PON form cannot be reconstructed on the basis of internal evidence. Note how
Blevins nonetheless posits PON *uhe with a medial –h- which is simply supplied without
the benefit of any evidence at all to make the PAN and “PON” (= Jarawa) forms appear
more similar than they really are.
(21) PAO *bukeS ‘hair’, PON *ukec/ele ‘tail of animal’. The ACD has *bukeS ‘head
hair’. This is compared with ONG ukice/le ‘tail of animal’, where –le is said to be a
plural suffix. The first problem with this comparison, and one that basically removes it
from serious consideration, is the wide divergence in meaning of the PAN and PON
forms. PAN *bukeS referred specifically to the head hair of humans as opposed to body
hair, and this is a far cry from ‘tail of an animal’. Again, no comparative evidence is
supplied to justify a PON reconstruction, and the Onge form shows an irregular change of
*e > i.
(22) PAO *bukij ‘forested area, with many trees’, PON *ukiy ‘area with many trees’.
The ACD has *bukij ‘mountain; forested inland mountain areas’. It is very likely that
early AN speakers on Taiwan maintained coastal territories for some centuries after
settling the island, as this would have enabled them to retain access to valuable marine
resources. The contrast between open coastal zone and forested inland mountain zone
therefore must have been rather fundamental. Reflexes of PAN *bukij generally refer to
‘mountain, hill’, and this evidently was the primary component in the meaning of this
term, which is compared with ONG ukye ‘grove’ < *ukiye. The semantic resemblance
here is thus rather vague and generalized. Moreover, no Jarawa form is given, so the
PON reconstruction is based entirely on a single language. Finally, PAN *j is said to
correspond to PON *j or *y without statable conditions, a problem that is encountered
repeatedly with different PAN phonemes (*b, *g, *h, *N, *j, *R).
(23) PAO *buluq ‘type of plant with slender stem for use as small poles, arrow shafts,
and general construction’, PON *ulukw, *ulu. The ACD gives PAN *buluq ‘type of
slender bamboo: Schizostachyum spp.’. This is compared with JAR uluhe ‘to weave,
make net; skeletal cane structure of conical basket’, ulu/eŋe ‘to weave with cane’, ONG
ulukwe ‘make mat’, ulukw/ene ‘common cane’. Since Blevins derives PON *kw from
PAN *ku or *qu there is no explanation for the final consonant correspondence, which
would require a PAN *buluqu. Moreover, the Ongan terms appear to be primarily verbal,
referring to weaving with cane, while PAN *buluq was simply the name of a variety of
bamboo. Although some reflexes of PAN *buluq refer to a thin bamboo that was used in
weaving house walls, no core verbal meaning can be posited for this term.
(24) PAO *Cekelu ‘beckon to come back’, PON *e/jegulu. The ACD has *Cikel ‘return,
go back’, although this form is based on data from just two languages, and is thus of very
uncertain status. Blevins compares this with ONG –ejegulu ‘beckon to come’. From the
formal side there are multiple problems with this comparison, since 1. the first vowel of
the Önge form is unexplained, 2. PAN *i > ONG e is not part of Blevins’ table 3, 3. PAN
*e > ONG u is also not part of Blevins’ table 3, and 4. the final vowel of the Önge form is
not explained. In an effort to bridge these gaping disparities in the phonology she
proposes a new PAN reconstruction *Cekelu ‘beckon to come’ for which no data is given
other than Paiwan tsikelu ‘come back’, Pazeh ma-sekela ‘to come across’, which do not
agree with one another, nor with the proposed proto-form (Paiwan could reflect a *Cikelu
if there were comparative evidence to support it, and Pazeh could reflect a *Cekela, again
if there were comparative evidence to support it). Once more, no data is cited from
Jarawa, leaving the reconstruction of even a PON form from internal evidence impossible to
achieve.
(25) PAO *Cenek ‘hard exterior; sharp exterior’, PON *cenek ‘shell; hard exterior’. The
ACD has PAN *Cenek ‘thorn’. This is compared first with ONG –uku/cenege, uku/cene
‘heel’, where –uku is said to mean ‘node, protuberance’. Semantically one can only
wonder what Blevins has in mind. In an apparent attempt to close the semantic gap she
suggests an alternative comparison between PAN *Cenek and JAR cana/cew ‘topshell,
Trochus niloticus’ (where cew is said to mean ‘beautiful’), cana/na ‘snail’, cana/hanap
‘hard-soiled, plain ground’, ONG cena/gili ‘conch shell’ (gili = ‘smooth’), cena/dalu
‘dentalium shell’ (dalu = ‘big’), -uku/cenege, uku/cene ‘heel’, now positing PAO *cana
‘hard exterior, sharp exterior’, but still not producing a plausible semantic agreement
(words for ‘thorn’ and ‘fishbone’ in AN languages may be related, but to my knowledge
no connection has ever been reported between the meanings ‘thorn’ and ‘shell’, let alone
‘thorn’ and ‘heel’).
Before considering the grammatical claims for the AN-Ongan hypothesis it might be well
to tabulate the results of the foregoing assessment of comparisons in order to facilitate a
quick overview of the status of the lexical evidence. Table 7 lists the numbers of the 25
proposed etymologies just considered, and in the columns marked ‘FP’ (formal problem)
and ‘SP’ (semantic problem) it inventories the objections that genuine use of the
comparative method must raise against these claims of cognation; BS = benign
slash, ISC = irregular sound change, NIC = no internal comparison (that is, data available
only for Jarawa or only for Önge, but not for both), L = length of form not sufficient to
rule out chance, LU = language universal, MR = multiple unconditioned reflexes of a
proto-phoneme allowed:
TABLE 7: PROBLEMS WITH THE FIRST 25 COMPARISONS IN BLEVINS (2007)
NO.    FP                 SP
1.     BS                 X
2.     BS                 X
3.     BS                 X
4.     ISC                X
5.     MR, ISC
6.     MR, BS, ISC
7.     ISC, NIC, L
8.     BS?                X
9.     BS, ISC
10.    MR, BS, ISC
11.    ISC, NIC
12.    BS, LU
13.    MR, ISC, NIC       X
14.    MR, BS, NIC
15.    MR, NIC
16.    MR                 X
17.    MR, ISC
18.    MR, BS
19.    MR                 X
20.    MR, NIC, ISC
21.    MR, BS, NIC        X
22.    MR, NIC, MR        X
23.    MR, ISC            X
24.    BS, ISC, NIC       X
25.    BS                 X
Table 7 shows an interesting pattern: where a comparison is semantically straightforward
it has many more formal problems, and where there are fewer formal problems it tends to
be semantically incongruous. This is exactly the pattern we would expect chance
similarities to produce, since what the comparative method demands as evidence of
genetic relationship is conformity in both sound and meaning, without ad hoc props like
the benign slash. In short, none of these 25 comparisons --- which were chosen without
selection bias --- is problem-free, suggesting that a similar picture almost certainly would
emerge from an examination of all 109 etymologies. Those comparisons that are most
problem-free are 8 (PAO *apa, PAN *apa ‘carry on back’, PON *-apa ‘carry on
shoulder’) and 15 (PAO *beRay, PAN *beRay, PON *bele ‘give’). However, since the
latter contains two segments (*b, *R) each of which is allowed two unconditioned
reflexes, the similarity of PAN *beRay to Onge bele ‘give’ (with no known Jarawa
cognate) does not appear to be significantly greater than the examples cited in Table 1.
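The pattern described for Table 7 can be verified with a simple tally. The sketch below is only an illustration: it transcribes the table as given, treats the queried ‘BS?’ in comparison 8 as BS, and counts distinct codes per row (so the repeated MR in comparison 22 is counted once); these choices and all names in the code are assumptions made for the sketch, not part of the original table.

```python
# Table 7 transcribed: comparison number -> (FP cell, semantic problem marked?).
TABLE7 = {
    1: ("BS", True), 2: ("BS", True), 3: ("BS", True), 4: ("ISC", True),
    5: ("MR, ISC", False), 6: ("MR, BS, ISC", False), 7: ("ISC, NIC, L", False),
    8: ("BS?", True), 9: ("BS, ISC", False), 10: ("MR, BS, ISC", False),
    11: ("ISC, NIC", False), 12: ("BS, LU", False), 13: ("MR, ISC, NIC", True),
    14: ("MR, BS, NIC", False), 15: ("MR, NIC", False), 16: ("MR", True),
    17: ("MR, ISC", False), 18: ("MR, BS", False), 19: ("MR", True),
    20: ("MR, NIC, ISC", False), 21: ("MR, BS, NIC", True),
    22: ("MR, NIC, MR", True), 23: ("MR, ISC", True),
    24: ("BS, ISC, NIC", True), 25: ("BS", True),
}

def fp_codes(cell):
    """Split an FP cell into distinct codes; 'BS?' is counted as 'BS' (assumption)."""
    return {code.strip().rstrip("?") for code in cell.split(",") if code.strip()}

# Number of distinct formal problems per comparison, split by SP marking.
with_sp = [len(fp_codes(fp)) for fp, sp in TABLE7.values() if sp]
without_sp = [len(fp_codes(fp)) for fp, sp in TABLE7.values() if not sp]

# Comparisons with neither a formal nor a semantic problem.
problem_free = [n for n, (fp, sp) in TABLE7.items() if not fp_codes(fp) and not sp]

mean_with = sum(with_sp) / len(with_sp)
mean_without = sum(without_sp) / len(without_sp)

print(problem_free)
print(round(mean_with, 2), round(mean_without, 2))
```

On this tally the comparisons without a semantic problem carry more formal problems on average than those with one, and no comparison is free of both.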
The reader will have noticed by now that Blevins often removes an initial *q from a PAN
form by use of the benign slash. Her argument for doing this is that Jarawa and Önge
nouns fall into two classes which exhibit a previously unrecognized structural correlation
with PAN nouns: “There are two types of nouns in Jarawa and Onge. Independent or
alienable nouns can occur without pronominal prefixes, and may begin with vowels or
consonants. Dependent or inalienable nouns can only occur with pronominal prefixes,
and are vowel-initial. Dependent nouns include body parts and kinship terms, and part-
whole relations.” (Blevins 2007:179). She interprets this as a feature found in PAO, and
claims that *q- in AN body part terms has been added, arguing (183) that “all
reconstructed PAN common nouns beginning with vowels are either body part or kinship
terms, or items that are typically possessed (homes, domestic dogs, and splinters that are
embedded in the body)” (italics added). The list of nouns cited includes six kinship
terms: *aki ‘grandfather’, *ama ‘father’, *aNak ‘child’, *apu ‘grandparent/grandchild’,
*aya ‘father’s sister’, *ina ‘mother, mother’s sister’, five body part terms/excretions:
*ujuŋ/ijuŋ ‘nose’, *ikuR ‘tail’, *iSeq ‘urine’, *huRaC ‘vein, sinew’, *utaq ‘vomit’, and a
catch-all category called ‘other’: *asu/wasu ‘dog’, *aCab ‘a cover’, *ian ‘place of
residence’ (NOT PAN!), and *uNay ‘sliver, splinter’.
Consideration of the evidence for Blevins’ argument and the logic used in constructing it,
however, reveals serious problems. First, it is not true that all PAN common nouns
beginning with vowels are body part or kin terms, or items that are typically possessed.
In the first two of these categories she includes eleven words. However, one of these,
*ijuŋ/ujuŋ ‘nose’, is not PAN (www.trussel2.com/ACD), and a second, *huRaC ‘vein,
sinew’, which Blevins writes in its PMP form as *uRat, did not begin with a vowel.
Second, the category ‘other’ which she considers to be “items that are typically
possessed” is obviously vague, and includes nouns that are not typically possessed in AN
languages in the same way that body parts and kin terms are. To my knowledge, for
example, no AN language that distinguishes alienable from inalienable possession
includes words for ‘house’, ‘dog’ or ‘splinter’ in the inalienable category --- this is
simply a fabrication intended to force the data into a preconceived mold. In reality, then,
Blevins has identified nine PAN common nouns that begin with a vowel and refer to
either body parts or kin. But even cursory inspection of the ACD shows that there is a
larger number of PAN common nouns that begin with a vowel but do not refer to body
parts or kin. This includes the four ‘other’ items that she herself cites: 1. *asu/*wasu
‘dog’, 2. *aCab ‘a cover’, 3. *ian ‘place of residence’, 4. *uNay ‘sliver, splinter’, but also
a number of words that she either overlooked, or that had not yet been entered in the
ACD, including 5. *aCay ‘death’, 6. *alujah ‘ant sp.’, 7. *aNay ‘termite’, 8. *apuR ‘betel
chew’, 9. *eRiq ‘sword grass’, 10. *ibaS ‘companion’, 11. *iluR ‘river channel’, 12.
*iŋsuŋ ‘rice mortar’, 13. *iuk ‘citrus fruit’, 14. *udu ‘grass’, 15. *ulaw ‘confusion’, 16.
*uNuq ‘beads, necklace’, and 17. *uŋay ‘ritually sacrificed monkey’ (?). Three more
PAN common nouns that fit the criteria Blevins uses can be added (*ajem ‘heart, mind’,
*idaS ‘affine of Ego’s generation’, *isi ‘flesh of people, animals or fruit’). An
empirically better grounded account of the correlation she is seeking to establish between
PAN common nouns that begin with a vowel and the semantic categories of body parts
and kin terms would thus show 12 examples in support of the claim and 17 against it,
based on data currently in the ACD (February, 2014).
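The tally just given can be reproduced with a short sketch. It assumes the classification of each form exactly as stated in the two preceding paragraphs (body part or kin term versus other), lists the forms without the asterisk, and uses hypothetical variable names; it is an illustration of the count, not an independent analysis of the ACD.

```python
VOWELS = set("aeiu")

# (PAN form without asterisk, body part or kin term?) as classified above.
nouns = [
    # kin terms
    ("aki", True), ("ama", True), ("aNak", True), ("apu", True),
    ("aya", True), ("ina", True),
    # body parts / excretions remaining after *ijuŋ/ujuŋ and *huRaC are excluded
    ("ikuR", True), ("iSeq", True), ("utaq", True),
    # three further forms that fit the criteria
    ("ajem", True), ("idaS", True), ("isi", True),
    # the four 'other' items
    ("asu", False), ("aCab", False), ("ian", False), ("uNay", False),
    # further vowel-initial nouns (items 5-17 above)
    ("aCay", False), ("alujah", False), ("aNay", False), ("apuR", False),
    ("eRiq", False), ("ibaS", False), ("iluR", False), ("iŋsuŋ", False),
    ("iuk", False), ("udu", False), ("ulaw", False), ("uNuq", False),
    ("uŋay", False),
]

# Sanity check: every listed form is vowel-initial (condition X).
all_vowel_initial = all(form[0] in VOWELS for form, _ in nouns)

supporting = sum(1 for _, body_or_kin in nouns if body_or_kin)
against = sum(1 for _, body_or_kin in nouns if not body_or_kin)

# "X implies Y" holds only if there are no counterexamples.
implication_holds = against == 0

print(all_vowel_initial, supporting, against, implication_holds)
```

A universal claim of the form ‘X implies Y’ is falsified by a single counterexample, so the count against it matters far more than the count in its favor.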
While this is a far cry from the perfect correlation between canonical shape and semantic
class that Blevins thought she had found, it is not the end of the problems with this
feature of her argument. I have quoted her claim above, which clearly asserts a unilateral
implication, namely that where X = vowel-initial base and Y = inalienable possession, X
implies Y, but not vice-versa, since the latter assertion can quickly be falsified by such
common PAN forms as *Caliŋa ‘ear’, *maCa ‘eye’, *(qa)lima ‘hand’, *qulu ‘head’,
*susu ‘female breast’, or *tiaN ‘belly’. Why, then, is she insistent on rewriting PAN
*qaCay ‘liver’ as *q/aCay (167, comparison 3.2), or PAN *qaNiC ‘animal skin, hide,
leather’ as *q/aNiC (168, comparison 3.6), with a reminder to the reader “that forms with
PAN/PON initial *q/Ø correspondences are assumed to result from *q-epenthesis in pre-
PAN” (168) when this is unnecessary unless the implicational relationship she claims is
reversed (Y implies X)?
Needless to say, the claim of *q-epenthesis also produces massive irregularity, since it
clearly didn’t apply to PAN *aki ‘grandfather’, *ama ‘father’, *aNak ‘child’, *apu
‘grandparent/grandchild’, *ikuR ‘tail’, *ina ‘mother’, *iSeq ‘urine’, or any of the other
vowel-initial forms that Blevins wants to relate to the two noun classes in Ongan
languages. I could go on, but at some point in evaluating a proposal of this type one must
simply stop and say ‘Enough!’. The Austronesian-Ongan hypothesis is a castle built on
sand, an elaborate illusion constructed by a skilled linguistic architect who has been
seduced by the siren of long-range comparison, by the thrill of believing she has made a
great discovery that other scholars have somehow overlooked. She is not the first
competent linguist to have convinced herself of the reality of a linguistic relationship that
virtually no one else can see. One need only remember Franz Bopp, one of the leading
Indo-Europeanists of his generation, and a scholar whose memory is still honored, who
believed he had found incontrovertible evidence that Austronesian and Indo-European are
divergent branches of one language family, and the gifted Swiss Indonesianist Renward
Brandstetter, who nearly a century later wandered into the same trap via a rather different
path.
2.2. The higher phylogeny of Austronesian. The second proposal concerning the
classification of the AN languages that I will consider is somewhat different than the
first. Whereas the Austronesian-Ongan hypothesis asserts a genetic relationship between
the Ongan languages of the southern Andamans and the Austronesian family, the French
linguist Laurent Sagart (2004) has proposed a radically new subgrouping of the AN
languages that includes Tai-Kadai within the AN language family. In addition, Sagart
has proposed a genetic relationship between Austro-Tai and Sino-Tibetan, but I have
addressed the early form of the Sino-Austronesian hypothesis elsewhere (Blust 1995), and
will not consider it further here.
xxx
===========================================================
sect. 3: The higher phylogeny of AN
1. Sino-AN (considered e.w.; not treated here)
2. Sagart (2004, 2013)
He claims the following:
        one        two      three    four      five
PAN     *esa/isa   *duSa    *telu    *Sepat    *RaCep
PPT     *esa/isa   *duSa    *telu    *Sepat    *RaCep
PLM     *esa/isa   *duSa    *telu    *Sepat    *lima
PNM     *esa/isa   *duSa    *telu    *Sepat    *lima
PW-S    *esa/isa   *duSa    *telu    *Sepat    *lima
PPL     *esa/isa   *duSa    *telu    *Sepat    *lima
PMP     *esa/isa   *duha    *telu    *epat     *lima

        six      seven            eight            nine              ten
PAN     ?        *RaCep-i-tuSa    *RaCep-a-telu    *RaCep-i-Sepat    ?
PPT     ?        *pitu            *RaCep-a-telu    *RaCep-i-Sepat    ?
PLM     ?        *pitu            *RaCep-a-telu    *RaCep-i-Sepat    ?
PNM     *enem    *pitu            *RaCep-a-telu    *RaCep-i-Sepat    ?
PW-S    *enem    *pitu            *walu            *Siwa             ?
PPL     *enem    *pitu            *walu            *Siwa             *puluq
PMP     *enem    *pitu            *walu            *Siwa             *puluq
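The nesting that the numeral table is meant to display can be checked mechanically. The following is only a minimal sketch in Python: the forms are copied from the table, '?' marks a cell for which no form is given, and the function names (traditional_numerals_at, is_nested) are hypothetical labels introduced for illustration. It tests only whether the set of traditional numerals found at each node is contained in the set found at the next node; it says nothing about whether those numerals are innovations or retentions, which is the point at issue.

```python
# Forms for 'five' to 'ten' at each node, copied from the table above.
# '?' marks a cell for which no form is given.
ORDER = ["PAN", "PPT", "PLM", "PNM", "PW-S", "PPL", "PMP"]
GLOSSES = ["five", "six", "seven", "eight", "nine", "ten"]
TABLE = {
    "PAN":  ["*RaCep", "?", "*RaCep-i-tuSa", "*RaCep-a-telu", "*RaCep-i-Sepat", "?"],
    "PPT":  ["*RaCep", "?", "*pitu", "*RaCep-a-telu", "*RaCep-i-Sepat", "?"],
    "PLM":  ["*lima", "?", "*pitu", "*RaCep-a-telu", "*RaCep-i-Sepat", "?"],
    "PNM":  ["*lima", "*enem", "*pitu", "*RaCep-a-telu", "*RaCep-i-Sepat", "?"],
    "PW-S": ["*lima", "*enem", "*pitu", "*walu", "*Siwa", "?"],
    "PPL":  ["*lima", "*enem", "*pitu", "*walu", "*Siwa", "*puluq"],
    "PMP":  ["*lima", "*enem", "*pitu", "*walu", "*Siwa", "*puluq"],
}
# Traditionally reconstructed PAN numerals 5-10, in the same order as GLOSSES.
TRADITIONAL = ["*lima", "*enem", "*pitu", "*walu", "*Siwa", "*puluq"]


def traditional_numerals_at(node):
    """Return the glosses for which this node shows the traditional numeral."""
    return {
        gloss
        for gloss, form, trad in zip(GLOSSES, TABLE[node], TRADITIONAL)
        if form == trad
    }


def is_nested(order):
    """True if each node's set is contained in the set of the following node."""
    sets = [traditional_numerals_at(node) for node in order]
    return all(a <= b for a, b in zip(sets, sets[1:]))


for node in ORDER:
    print(node, sorted(traditional_numerals_at(node)))
print("nested:", is_nested(ORDER))
```

On the table as given the check succeeds, which shows only that the proposed nodes are internally consistent with one another, not that the tree built on them is correct.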
START with these problems:
1. The derivation of 7-9 is ad hoc (he says no, but counter this). Fudge #1: The
evidence for *tuSa is from Thao and Amis, hence PLM, not PAN (even this is
questionable). Fudge #2: The historical source of the final consonant in the
Taokas, Babuza and Pazeh words for ‘five’ is ambiguous, but Saisiyat clearly
supports *RaCeb, not *RaCep; so does Favorlang as recorded by Ogawa (Li
2003). Also –a- vs. –i- as the linker after *RaCeb.
2. He has no reconstructions for 6, 10; where did they come from?
3. The classification cross-cuts EF and WP; he wants to wave this away, but that
is irresponsible (show why)
4. *maka-Sepat and *tanaCu are clear innovations shared exclusively by Thao
with Taokas and Babuza/Favorlang (Blust 1996: 280). But in S’s phylogeny
Thao is ‘Limaish’ and the other languages are ‘Pituish’, so ‘8’ and ‘9’ should
have been PPT, PLM *RaCep-a-telu and *RaCep-i-Sepat.
5. Reconstruction of ‘20’, ‘30’, etc. by Zeitoun, Teng and Ferrell (2010) implies
decimal system?
6. *RaCus ‘hundred’ must be reconstructed as high as his ‘Proto-Enemish’,
implying a decimal system (true?) that didn’t yet exist.
reconstruction of *ma-puSa-N ‘20’, *ma-telu-N ‘30’, *ma-Sepat-eN ‘40’,
*ma-lima-N ‘50’, *ma-enem-eN ‘60’, *ma-pitu-N ‘70’, *ma-walu-N ‘80’, and
*ma-Siwa-N ‘90’ implies decimal system (true?) in Proto-what?
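The logic of problem 4 in the list above can be made concrete with a small check. This is a sketch under stated assumptions, not a rendering of Sagart's full tree: the only facts taken from the text are that Thao is 'Limaish' while Taokas and Babuza/Favorlang are 'Pituish' but not 'Limaish'. The leaf "other Limaish" is a hypothetical stand-in for the remaining Limaish languages, the flat arrangement of daughters is assumed, and the names CHILDREN, leaves and is_clade are introduced here for illustration.

```python
# Toy fragment of the numeral-based tree (an assumption-laden sketch).
# From the text: Thao is Limaish; Taokas and Babuza/Favorlang are Pituish
# but outside Limaish. "other Limaish" is a hypothetical placeholder leaf.
CHILDREN = {
    "Pituish": ["Taokas", "Babuza/Favorlang", "Limaish"],
    "Limaish": ["Thao", "other Limaish"],
}


def leaves(node):
    """Return the set of languages (leaves) dominated by a node."""
    kids = CHILDREN.get(node)
    if kids is None:  # a leaf, i.e. a language
        return {node}
    return set().union(*(leaves(kid) for kid in kids))


def is_clade(languages):
    """True if some node of the tree dominates exactly these languages."""
    return any(leaves(node) == set(languages) for node in CHILDREN)


# Languages that exclusively share *maka-Sepat and *tanaCu (Blust 1996: 280).
innovators = {"Thao", "Taokas", "Babuza/Favorlang"}
print(is_clade(innovators))
print(is_clade({"Thao", "other Limaish"}))
```

In this fragment the three languages that share the innovations do not form a clade, since the smallest node that contains them all also contains every other Limaish language. That is the sense in which the exclusively shared forms cut across the numeral-based grouping.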
APPENDIX 1: Properties of basic numeral systems in Formosan languages
The term ‘basic numeral system’ is used here to mean the numerals 1-10. An ‘X’
indicates for any given language that it has a reflex of the traditionally reconstructed PAN
numerals 1-10: *esa/isa ‘1’, *duSa ‘2’, *telu ‘3’, *Sepat ‘4’, *lima ‘5’, *enem ‘6’, *pitu
‘7’, *walu ‘8’, *Siwa ‘9’, *puluq ‘10’.
1 2 3 4 5 6 7 8 9 10
Proto-Atayalic X X X X X
Saisiyat X X X X R
Pazeh X X X R
Thao X X X X X X
Kavalan X X X X X X X X X
Amis XXXXXXXX(X)
Bunun XXXXXXXXX
Tsou XXXXXXXX
Kanakanabu X X X X X X X X
Saaroa X X X X X X X X
Taokas X X X R X
Babuza X X X R X
Papora X X X X X X?
Hoanya X X X X X ? X
Siraya X X X X X X
Rukai X X X X X X X X (X)
Puyuma XXXXXXXXXX
Paiwan XXXXXXXXXX
depart markedly from commonly accepted constraints on use
Robert Blust
University of Hawai’i
Introduction. In its broadest sense, ‘linguistic classification’ refers to three distinct
scholarly enterprises: 1. the establishment of genetic relationship between languages, 2.
the subrelationship (subgrouping) of languages that are known to be related, and 3. the
grouping of the world’s languages by structural type. I will not be concerned with the
last of these endeavors, but will focus instead on the first two as they apply to the
Austronesian (AN) language family.
There have been a number of proposals over the past two centuries concerning the
classification of the AN languages, and some of these are noted briefly in Blust (2009).
My concern here is narrower, but also deeper and more focused, as it is a close
examination of two radically different claims about the external relationships of the AN
languages, namely the Sino-Tibetan-Austronesian (STAN) hypothesis of the Sinologist
Laurent Sagart (1990, 1993, 1994, 1995, 2005), and the Austronesian-Ongan (AO)
hypothesis of the theoretical phonologist Juliette Blevins (2007). Both of these scholars
are distinguished in their own fields, yet in the arguments that each presents the seeds of
self-deception are everywhere apparent. Each claims to be using the comparative method
of linguistics, yet each departs in significant ways from normal use of that method. What
is revealing about a careful examination of such arguments is that it shows some striking
parallels in terms of methodological self-blindness that are shared with the authors of
many earlier claims about the external relationships of the AN languages, some made by
obscure writers, but others --- like those of Sagart and Blevins --- made by scholars who
have made significant contributions in other areas of linguistics.
In addition to providing a careful dissection of the arguments supporting the STAN and
AO hypotheses, I offer a critique of Sagart (2004), which is concerned not with the
external relationships of AN languages, but with relationships within the AN family,
which, in his view, includes the Tai-Kadai languages. Since my concern with Sagart’s
work is more complex in that it examines claims about both external and internal
classification, I will deal with it last.
The Austronesian-Ongan hypothesis. Blevins (2007) startled the scholarly world with
a novel proposal that many considered almost unimaginable before her: Jarawa and Önge
of the southern Andaman Islands form a linguistic clade (Öngan) that is coordinate with
the AN language family in a superfamily she calls ‘Austronesian-Ongan’.
0. Linguistic phylogenies: the two levels of classification (GR vs. s.g.ing).
1. GR:
a) Claims of Sagart and Blevins
b) Claims for external relations of ST
1. Tai-Kadai
2. IE
3. N. Caucasian
4. AN
9. Tai-Kadai (Benedict, Ostapirat etc.)
10. ST (Sagart)
11. Papuan (Lynch?)
12. Ongan (Blevins)
d) Examination of sample evidence for STAN
e) Examination of sample evidence for AO
II. Subrelationship
a) Sagart on ‘higher phylogeny of AN’
1. nesting of groups based on numerals
2. conflict with East Formosan
3. ad hoc assumptions of change
4. place of Tai-Kadai: why e.g. MAL-PAI far more similar than either to T-K?
==========================================================
To add:
METHOD
a) What is a ‘method’?
b) What has made the CM of linguistics a valid tool of science?
c) What is the CM designed to teach us about the nature of similarity between lgs?
(chance, universals, borrowing, GR)
d) How can the CM be compromised by scholars who claim to be using it?
THE TOOLKIT OF THE LONG-RANGER, OR HOW TO DEFEAT THE
COMPARATIVE METHOD OF LINGUISTICS
6. ad hoc hypotheses of change (Sagart)
7. other? (Kempler-Cohen?)
REFERENCES
Benedict, Paul K. 1942. Thai, Kadai and Indonesian: a new alignment in southeastern
Asia. American Anthropologist 44: 576-601.
__________. 1975. Austro-Thai: language and culture, with a glossary of roots. New
Haven: Human Relations Area Files.
Blevins, Juliette. 2007. A long lost sister of Proto-Austronesian? Proto-Ongan, mother
of Jarawa and Onge of the Andaman Islands. Oceanic Linguistics 46: 154-198.
Blust, Robert. 1981. Linguistic evidence for some early Austronesian taboos. American
Anthropologist 83.2: 285-319.
__________. 1995. An Austronesianist looks at Sino-Austronesian. In William S-Y.
Wang, ed., The ancestry of the Chinese language: 283-298. Journal of Chinese
Linguistics Monograph Series no. 8.
__________. 2010. Five patterns of semantic change in Austronesian languages. In
John Bowden, Nikolaus P. Himmelmann and Malcolm Ross, eds., A journey
through Austronesian and Papuan linguistic and cultural space: papers in
honour of Andrew Pawley: 525-545. Canberra: Pacific Linguistics (PL 615).
__________. 2013. Terror from the sky: unconventional linguistic clues to the Negrito
past. In Phillip Endicott, ed., Revisiting the ‘Negrito’ Hypothesis, an Inter-
disciplinary Synthesis of the Prehistory of Southeast Asia. Human Biology
85.1: 401-416 [special issue].
__________, and Stephen Trussel. Ongoing. Austronesian comparative dictionary
(online open access site at: www.trussel2.com/ACD).
Bopp, Franz. 1841. Über die Verwandtschaft der malaisch-polynesischen Sprachen mit
den indo-europäischen. Gelesen in der Akademie der Wissenschaften am 10.
Aug. und 10. Dec. 1840. Berlin: Dümmler.
Campbell, Lyle, and William J. Poser. 2008. Language classification: history and
method. Cambridge: Cambridge University Press.
Cooper, Zarine. 2002. Archaeology and history: early settlements in the Andaman
islands. Oxford: Oxford University Press.
Dempwolff, Otto. 1934-1938. 3 vols. Vergleichende Lautlehre des austronesischen
Wortschatzes. Zeitschrift für Eingeborenen-Sprachen, Supplement 1. Induktiver
Aufbau einer indonesischen Ursprache (1934), Supplement 2. Deduktive
Anwendung des Urindonesischen auf austronesische Einzelsprachen (1937),
Supplement 3. Austronesisches Wörterverzeichnis (1938). Berlin: Reimer.
Friedlaender, Jonathan Scott, ed. 2007. Genes, language, and culture history in the
southwest Pacific. Oxford University Press.
Geraghty, Paul A. 1983. The history of the Fijian languages. Oceanic Linguistics
Special Publication 19. Honolulu: University of Hawaii Press.
Greenberg, Joseph. 1957. Essays in linguistics. Chicago: The University of Chicago
Press.
Jakobson, R. 1960. Why mama and papa? In B. Kaplan and S. Wapner, eds.,
Perspectives in psychological theory dedicated to Heinz Werner: 124-134. New
York: International Universities Press, Inc.
Kempler-Cohen, E.M. 2012. Austronesian cognates in Quechua – Part 1. The
Philippine Journal of Linguistics 43: 1-46.
Leplin, Jarrett. 1975. The concept of an ad hoc hypothesis. Studies in the history and
philosophy of science 5: 309-345.
Matthey de l’Etang, Alain, and Pierre Bancel. 2002. Tracing the ancestral kinship
system: the global etymon KAKA. Mother Tongue VII: 209-222.
Reid, Lawrence A., ed. 1971. Philippine minor languages: word lists and phonologies.
Oceanic Linguistics Special Publication 8. Honolulu: University of Hawaii Press.
Ruhlen, Merritt. 1987. A guide to the world’s languages, vol. 1: classification. Stanford,
California: Stanford University Press.
Sagart, Laurent. 2004. The higher phylogeny of Austronesian and the position of
Tai-Kadai. Oceanic Linguistics 43: 411-444.
__________. To appear. In defense of the numeral-based model of Austronesian
phylogeny, and of Tsouic. Ms. 25 pp. Language and Linguistics 15.
Thomason, Sarah Grey, and Terrence Kaufman. 1991 [1988]. Language contact,
creolization, and genetic linguistics. Berkeley: University of California Press.
==================
Although the Negritos of Malaya and the Philippines have experienced considerable gene
flow from neighboring non-Negrito populations, and are consequently quite varied in
physical type, they tend to be noticeably darker than other populations of the region, with
curly to kinky hair. In the Andamans, where admixture with other groups was virtually
non-existent until the nineteenth century British colonial administration of India began to
export Indian political prisoners to Port Blair in Great Andaman, the people are
physically short, and jet black, with tightly curled hair, and
=======================
Negrito populations have a scattered distribution, occupying a solid block in the
Andaman Islands, pockets within the interior of the Malay peninsula, and isolated
fragments in the Philippine Islands that include a number of remote areas of both eastern
and western Luzon, the mountains of Palawan Island, the islands of Panay and Negros in
the central Philippines and northeast Mindanao in the southern Philippines.
Genetic studies have shown that even within the Philippines, Negrito groups have been
separated for at least 10,000 [CHECK DATE] years (Omoto et al. 1981, Delfin et al.
2011). Despite this separation time there is intriguing evidence of a common culture
trait, called the ‘thunder complex’, that is shared by at least the Negritos of the Malay
peninsula and those of the Philippines (Blust 1981, 2013).