In this paper, advantages and disadvantages of using parallel texts in typological studies are considered according to the criteria of diversity, domains, analysis, perspective, quality, representativity, and comparability. It is shown in a case study of multi-verb constructions (including serial verb constructions, converb constructions, etc.) in two motion event domains (BRING and RUN) how typology can profit from parallel texts especially in the investigation of quantitative variables. A method is introduced to transform features with continuous distributions into ternary features with low, intermediate, and high values which can then be tested for correlations.
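The transformation of a continuous feature into a ternary one can be illustrated with a minimal sketch. The abstract does not say how the cut points are chosen, so the tercile-based binning below is an assumption, and the names `to_ternary` and `cross_tabulate` are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def to_ternary(values):
    """Label each value 'low', 'intermediate', or 'high'.

    Assumption: cut points are the terciles of the observed values;
    the original method may choose its thresholds differently.
    """
    lower, upper = quantiles(values, n=3)  # two cut points for three bins
    labels = []
    for v in values:
        if v <= lower:
            labels.append("low")
        elif v <= upper:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels


def cross_tabulate(labels_a, labels_b):
    """Count co-occurring label pairs of two ternary features.

    The resulting 3x3 contingency counts are the kind of input a
    correlation test between the two features would need.
    """
    return Counter(zip(labels_a, labels_b))
```

Each language contributes one value per feature, for instance the proportion of multi-verb constructions in a motion event domain; the cross-tabulated labels of two such features can then be passed to whichever association test is preferred.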
MICHAEL CYSOUW (Leipzig), CHRISTIAN BIEMANN (Leipzig), MATTHIAS ONGYERTH (Leipzig): Using Strong's Numbers in the Bible to test an automatic alignment of parallel texts. We describe a method for the automatic alignment of parallel texts using co-occurrence statistics. The assumption behind this approach is that words which are often found together are linked in some way. We employ this assumption to automatically suggest links between words in different languages, using Bible verses as information units. The result is a word-by-word alignment between different translations of the Bible. The accuracy of our method is evaluated by using Strong's numbers as a benchmark. Overall, the performance is high, indicating that this approach can be used to give an approximate gloss of Bible verses.
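The idea of linking words through verse-level co-occurrence can be sketched as follows. The abstract does not name the association measure, so the Dice coefficient used here is a stand-in, and the function name `suggest_links` and the data layout are hypothetical.

```python
from collections import Counter
from itertools import product


def suggest_links(verses_a, verses_b):
    """Suggest, for each word of translation A, one word of translation B.

    verses_a, verses_b: dicts mapping a verse identifier to a list of tokens.
    Assumption: association is scored with the Dice coefficient over the
    verses in which the two words occur; this is not necessarily the
    measure used in the paper.
    """
    freq_a, freq_b, joint = Counter(), Counter(), Counter()
    for verse_id in verses_a.keys() & verses_b.keys():
        types_a, types_b = set(verses_a[verse_id]), set(verses_b[verse_id])
        freq_a.update(types_a)                   # verses containing each A word
        freq_b.update(types_b)                   # verses containing each B word
        joint.update(product(types_a, types_b))  # verses containing both

    best = {}
    # Sorting makes the choice between equally scored candidates deterministic.
    for (word_a, word_b), n in sorted(joint.items()):
        score = 2 * n / (freq_a[word_a] + freq_b[word_b])
        if word_a not in best or score > best[word_a][1]:
            best[word_a] = (word_b, score)
    return best
```

Given a benchmark such as Strong's numbers, the suggested links could then be scored by counting how often the two linked words carry the same number.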
An approach to the classification of languages through automated lexical comparison is described. This method produces near-expert classifications. At the core of the approach is the Automated Similarity Judgment Program (ASJP). ASJP is applied to 100-item lists of core vocabulary from 245 globally distributed languages. The output is 29,890 lexical similarity percentages, one for each pair of languages. These percentages serve as input to a program originally designed for generating phylogenetic trees in biology. This program yields branching structures (ASJP trees) reflecting the lexical similarity of languages. ASJP trees for languages of the sample spoken in Middle America and South America show that the method is capable of grouping together on distinct branches languages of non-controversial genetic groups. In addition, ASJP sub-branching for each of nine genetic groups (Mayan, Mixe-Zoque, Otomanguean, Huitotoan-Ocaina, Tacanan, Chocoan, Muskogean, Indo-European, and Austro-Asiatic) agrees substantially with the subgrouping proposed for those groups by expert historical linguists. ASJP can be applied, among many other uses, to search for possible relationships among languages that have not previously been observed or are only provisionally recognized. Preliminary ASJP analysis reveals several such possible relationships for languages of Middle America and South America. Expanding the ASJP database to all of the world's languages for which 100-word lists can be assembled is a realistic goal that could be achieved in a relatively short period of time, perhaps a year or less.
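The figure of 29,890 follows from pairing each of the 245 languages with every other one: 245 × 244 / 2 = 29,890. The sketch below checks this arithmetic and shows what a lexical similarity percentage for one pair might look like. The abstract does not describe ASJP's similarity judgments, so the `same_initial` rule is a placeholder assumption, and all names here are hypothetical.

```python
from math import comb


def number_of_pairs(n_languages):
    """Number of unordered language pairs: n * (n - 1) / 2."""
    return comb(n_languages, 2)


def same_initial(word_a, word_b):
    """Placeholder judgment (assumption): similar if the first segment matches.

    This is NOT the ASJP judgment procedure, only a stand-in.
    """
    return word_a[:1] == word_b[:1]


def similarity_percentage(list_a, list_b, judge=same_initial):
    """Percentage of shared meanings whose word forms are judged similar.

    list_a, list_b: dicts mapping a meaning (e.g. 'water') to a word form.
    Only meanings attested in both lists are compared.
    """
    shared = list_a.keys() & list_b.keys()
    if not shared:
        raise ValueError("the two lists share no meanings")
    similar = sum(judge(list_a[m], list_b[m]) for m in shared)
    return 100 * similar / len(shared)
```

Computing this percentage for every pair yields a similarity matrix, which is the kind of input a tree-building program takes.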
The semantics of natural gender in animate nouns is modeled in the framework of bidirectional Optimality Theory (OT). This allows for the interaction of lexical, conceptual and contextual constraints and for a straightforward treatment of the effect of blocking in this domain. Two versions of bidirectional OT are discussed and related to each other in terms of Blutner’s (2007) notion of fossilization.
In this contribution we seek support for the hypothesis that Otomí, a language from Mexico, which is in intense contact with Spanish, is developing a specialized set of adjectives, a category that is lacking from the classical language. Arguments are derived from two sources, a corpus of translations into Otomí of around 750 Spanish adjectives, and a corpus of 59 short texts in spoken Otomí, in which a number of Spanish adjectives were found as loanwords. Pointing out several changes in the contemporary language, mainly among younger, often bilingual speakers, we provide support for our hypothesis, which suggests that Otomí is undergoing a typological change.
In the past 20 years, a new class of verbs has come into existence: 'unaccusative' or 'ergative' verbs. These verbs are intransitive, but differ from the traditional notion of intransitivity in that their subject behaves distributionally like a direct object. Ever since this new grammatical notion was introduced for (typologically non-ergative, i.e., accusative) languages like English, a vast body of literature on the topic has appeared. The present article takes issue with this mainly Anglophile notion of unaccusativity/ergativity. The claim is that this notion does not make sense in languages which provide aspectual or aktionsart distinctions of perfectivity. 'Unaccusatives' are intransitive perfectives. This argument is developed primarily on the empirical basis of German.
Parallel texts are texts in different languages that can be considered translationally equivalent. We introduce the notion 'massively parallel text' for texts that have translations into very many languages. In this introduction we discuss some massively parallel texts that might be used for the investigation of linguistic diversity. Further, a short summary of the articles in this issue is provided, finishing with an outlook on where the investigation of parallel texts might lead us. This issue grew out of a workshop with the same title held on April Fools' Day 2005 at the Max Planck Institute for Evolutionary Anthropology in Leipzig. Besides the present contributors, there was also a presentation by JOHAN VAN DER AUWERA on his work with parallel texts, which has already been published elsewhere (VAN DER AUWERA et al. 2005). The main goal of the workshop, and of this issue, was to bring together typologists who have been working with translated texts. The articles in this issue give a survey of past experiences, some words of caution for future aspirants in this line of research, but also various bold attempts to employ this rich source of data in spite of all possible problems.
Various typologically recurrent properties of reference-tracking systems can be given a coherent explanatory account in terms of functional, rather than formal criteria. Two principles in particular are proposed, namely that coreference is more likely to be marked than non-coreference in local domains (e.g. the arguments of a single predicate), whereas non-coreference is more likely to be marked than coreference in extended domains (e.g. across clause boundaries). For the former principle a cognitive explanation is proposed, while it is suggested that the latter principle has a discourse basis.
This paper investigates different means of expressing natural gender in personal terms, namely derivational suffixes, different inflectional classes, and the inflection of pronouns, adjectives, and determiners for grammatical gender in the history of German, English, and Swedish. These three languages were chosen as representatives of different routes of typological development among the Germanic languages. The major typological division, which can only in part be described in the classical terms of syntheticity vs. analyticity, separates English and Swedish on the one hand from German on the other. The differences in the expression of natural gender are related to these typological differences.