About
21
Publications
28,847
Reads
808
Citations
Introduction
I am currently a PhD student and research assistant at the University of Zurich, working on language contact in Northern Myanmar.
Current institution
Additional affiliations
May 2017 - present
Klubschule Migros
Position
- Instructor
Description
- Klingon language course
April 2013 - March 2014
September 2006 - February 2014
Education
March 2014 - May 2020
August 2009 - August 2010
Yunnan University for Nationalities
Field of study
- Chinese Language
October 2004 - March 2013
Publications (21)
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, Web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatri...
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. However, to date there has been no systematic analysis of the quality of these publicly available datasets, or whether the datasets actu...
The Kachin are a multilingual ethnic group, inhabiting the northern part of Myanmar and parts of Yunnan Province in China, where they are subsumed under the name Jingpo. In this paper, I argue that they do not only share many cultural peculiarities due to intense contact between the subgroups, but also a number of linguistic features, indicating th...
In the “Greater Burma Zone”, an area that includes Myanmar and adjacent regions of the neighboring countries, there are two different systems of personal pronouns that occur predominantly: a grammatical one and one that we call “hierarchical system”. The aim of this paper is to explain the two systems and their development. A sample of 42 languages...
Speakers of the various Kachin languages often use the expression ‘Kachin’ or ‘Kachin language’ when speaking in English or Burmese to refer to the Jinghpaw language. There is, however, no single ‘Kachin’ language. The languages included in the super-ethnic category ‘Kachin’ include Jinghpaw itself, also spoken in China and Northeast India, where i...
Multilingualism among the Kachin: Who speaks what?
The Northeast Caucasian language Tsez has a system of four grammatical genders, partly based on the natural gender. Inanimate nouns, however, can appear in either of three of these genders. This magister thesis explores how loanwords from various donor languages such as Russian, Avar, Arabic, Georgian, and Persian are integrated into Tsez and analy...
A systematic, computer-automated tool for narrowing down the homelands of linguistic families is presented and applied to 82 of the world's larger families. The approach is inspired by the well-known idea that the geographical area of maximal diversity within a language family corresponds to the original homeland. This is implemented in an algorith...
This paper describes a computerized alternative to glottochronology for estimating elapsed time since parent languages diverged into daughter languages. The method, developed by the Automated Similarity Judgment Program (ASJP) consortium, is different from glottochronology in four major respects: (1) it is automated and thus is more objective, (2)...
This paper applies a computerized method related to that of glottochronology and addresses the question whether such a method is useful as a heuristic for identifying deep genealogical relations among languages. We first measure lexical similarities for pairs of language families that are normally assumed to be unrelated, using a modification of th...
This paper explores how similar the Automated Similarity Judgment Program (ASJP) estimates a number of artificial languages (Esperanto, Ido, Interlingua, Lojban, Slovio, Toki Pona, Volapük, Klingon, Sindarin, Quenya) to be to natural languages, judged by their lexical and phonetic distance. Not surprisingly, a posteriori languages appear in the sim...
The ASJP project aims at establishing relationships between languages on the basis of the Swadesh word list. For this purpose, lists have been collected and phonologically transcribed for almost 3,500 languages. Using a method based on the algorithm proposed by Levenshtein (Cybernetics and Control Theory 10: 707–710, 1966), a custom-made computer p...
The World Language Tree graphically illustrates relative degrees of lexical similarity holding among 3384 of the world's languages and dialects (henceforth, languages) currently found in the ASJP database (ASJP stands for Automated Similarity Judgment Program). Languages branched more closely together on the ASJP tree are lexically more similar tha...
An earlier paper, to which some authors of the present paper have contributed (Brown et al. 2008), describes a method for automating language classification based on the 100-item referent list of Swadesh (1955). Here we discuss a refinement of the method, involving calculation of relative stabilities of list items and reduction of the list to a sho...
This contribution is concerned with alphabets and writing systems – both endangered and already extinct ones – as subsystems of language and their role in language death. It describes various reasons for different kinds of endangerment (entire writing systems vs. single graphemes) and draws examples from languages of different families and differ...
In this presentation I will examine the genetic relationships among nine Tibeto-Burman varieties spoken in the Kinnaur region in the Himachal Pradesh state in India, using a computational approach applied to empirical primary language data. Some older works make brief mention of some Kinnauri varieties (e.g., Gerard 1842; Cunningham 1844). However,...
Questions
Question (1)
I have a diglossic situation in which the main difference between L and H seems to be in the realm of grammar (also vocabulary, but less so). Can you recommend any specific literature on that, especially contemporary theories? I have already covered most of the general literature on diglossia.