Simon J Greenhill

Max Planck Institute for the Science of Human History · Department of Linguistic and Cultural Evolution

PhD

About

112
Publications
63,154
Reads
4,636
Citations
Additional affiliations
May 2016 - present
Max Planck Institute for the Science of Human History
Position
  • Researcher
March 2012 - May 2016
Australian National University
Position
  • DECRA Research Fellow
January 2009 - February 2012
University of Auckland
Position
  • PostDoc Position

Publications

Article
Full-text available
The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computation...
Article
The Chapacuran language family, with three extant members and nine historically attested lects, has yet to be classified following modern standards in historical linguistics. This paper presents an internal classification of these languages by combining both the traditional comparative method (CM) and Bayesian phylogenetic inference (BPI). We ident...
Article
Full-text available
The island of New Guinea has the world's highest linguistic diversity, with more than 900 languages divided into at least 23 distinct language families. This diversity includes the world's third largest language family: Trans-New Guinea. However, the region is one of the world's least well studied, and primary data is scattered across a wide range...
Article
Full-text available
The effect of population size on patterns and rates of language evolution is controversial. Do languages with larger speaker populations change faster due to a greater capacity for innovation, or do smaller populations change faster due to more efficient diffusion of innovations? Do smaller populations suffer greater loss of language elements throu...
Article
Full-text available
The Bantu expansion transformed the linguistic, economic, and cultural composition of sub-Saharan Africa. However, the exact dates and routes taken by the ancestors of the speakers of the more than 500 current Bantu languages remain uncertain. Here, we use the recently developed “break-away” geographical diffusion model, specially designed for mode...
Preprint
Full-text available
The Uto-Aztecan language family is one of the largest language families in the Americas. However, there has been considerable debate about its origin and how it spread. Here we use Bayesian phylogenetic methods to analyze lexical data from 34 Uto-Aztecan varieties and 2 Kiowa-Tanoan languages. We infer the age of Proto-Uto-Aztecan to be around 4,10...
Article
Full-text available
The past decades have seen substantial growth in digital data on the world’s languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, most published datasets lack standardizatio...
Article
Full-text available
Language diversity is under threat. While each language is subject to specific social, demographic and political pressures, there may also be common threatening processes. We use an analysis of 6,511 spoken languages with 51 predictor variables spanning aspects of population, documentation, legal recognition, education policy, socioeconomic indicat...
Article
Full-text available
While most animals play, only humans play games. As animal play serves to teach offspring important life-skills in a safe scenario, human games might, in similar ways, teach important culturally relevant skills. Humans in all cultures play games; however, it is not clear whether variation in the characteristics of games across cultural groups is re...
Article
Full-text available
Since Darwin’s “On the Origin of Species,” linguists and geneticists have implicitly or explicitly connected languages with genetics. A recent paper by Matsumae et al. (1) provides a new method for assessing whether different lines of evidence consistently tell similar stories about the past and applies this approach to genetic, linguistic, and m...
Article
Full-text available
Bayesian phylogenetic methods provide a set of tools to efficiently evaluate large linguistic datasets by reconstructing phylogenies—family trees—that represent the history of language families. These methods provide a powerful way to test hypotheses about prehistory, regarding the subgrouping, origins, expansion, and timing of the languages and th...
Preprint
Full-text available
For over a century, the phoneme has played a central role in linguistic research. In recent years, collections of phoneme inventories, originally designed for cross-linguistic purposes, have increasingly been used in comparative studies involving neighbouring disciplines. Despite the extended application of this type of data, there has been no rese...
Preprint
Full-text available
Humans currently collectively use thousands of languages (1, 2). The number of languages in a given region (i.e. language “richness”) varies widely (3–7). Understanding the processes of diversification and homogenization that produce these patterns has been a fundamental aim of linguistics and anthropology. Empirical research to date has identified variou...
Article
Full-text available
Modern phylogenetic methods are increasingly being used to address questions about macro-level patterns in cultural evolution. These methods can illuminate the unobservable histories of cultural traits and identify the evolutionary drivers of trait change over time, but their application is not without pitfalls. Here, we outline the current scope o...
Article
Full-text available
In this paper, past plant knowledge serves as a case study to highlight the promise and challenges of interdisciplinary data collection and interpretation in cultural evolution. Plants are central to human life and yet, apart from the role of major crops, people–plant relations have been marginal to the study of culture. Archaeological, linguistic,...
Article
Full-text available
Language documentation faces a persistent and pervasive problem: How much material is enough to represent a language fully? How much text would we need to sample the full phoneme inventory of a language? In the phonetic/phonemic domain, what proportion of the phoneme inventory can we expect to sample in a text of a given length? Answering these que...
Article
Full-text available
Across the world people in different societies structure their family relationships in many different ways. These relationships become encoded in their languages as kinship terminology, a word set that maps variably onto a vast genealogical grid of kinship categories, each of which could in principle vary independently. But the observed diversity o...
Article
Full-text available
Humans in most cultures around the world play rule-based games, yet research on the content and structure of these games is limited. Previous studies investigating rule-based games across cultures have either focused on a small handful of cultures, thus limiting the generalizability of findings, or used cross-cultural databases from which the raw d...
Preprint
Full-text available
Modern phylogenetic methods are increasingly being used to address questions about macro-level patterns in cultural evolution. These methods can illuminate the unobservable histories of cultural traits and identify the evolutionary drivers of trait-change over time, but their application is not without pitfalls. Here we outline the current scope of...
Article
Full-text available
Historical linguistics has long dabbled in computational and quantitative approaches. More recently, new Bayesian phylogenetic methods from evolutionary biology – which do not share the fatal shortcomings of lexicostatistics and glottochronology – have been applied to linguistic questions. This chapter reviews the history of quantitative approaches...
Article
Full-text available
Language is one of the most complex of human traits. There are many hypotheses about how it originated, what factors shaped its diversity, and what ongoing processes drive how it changes. We present the Causal Hypotheses in Evolutionary Linguistics Database (CHIELD, https://chield.excd.org/), a tool for expressing, exploring, and evaluating hypothe...
Article
Full-text available
Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigorousness...
Article
Full-text available
The diverse way that languages convey emotion
It is unclear whether emotion terms have the same meaning across cultures. Jackson et al. examined nearly 2500 languages to determine the degree of similarity in linguistic networks of 24 emotion terms across cultures (see the Perspective by Majid). There were low levels of similarity, and thus high var...
Preprint
Full-text available
Social inequality is now pervasive in human societies, despite the fact that humans lived in relatively egalitarian, small-scale societies across most of our history. Prior literature highlights the importance of environmental conditions, economic defensibility, and wealth transmission for shaping early Holocene origins of social inequality. Howeve...
Article
Full-text available
Significance: Given its size and geographical extension, Sino-Tibetan is of the highest importance for understanding the prehistory of East Asia, and of neighboring language families. Based on a dataset of 50 Sino-Tibetan languages, we infer phylogenies that date the origin of the language family to around 7200 B.P., linking the origin of the langua...
Article
Full-text available
Language diversity is distributed unevenly over the globe. Intriguingly, patterns of language diversity resemble biodiversity patterns, leading to suggestions that similar mechanisms may underlie both linguistic and biological diversification. Here we present the first global analysis of language diversity that compares the relative importance of t...
Article
Full-text available
Although many hypotheses have been proposed to explain why humans speak so many languages and why languages are unevenly distributed across the globe, the factors that shape geographical patterns of cultural and linguistic diversity remain poorly understood. Prior research has tended to focus on identifying universal predictors of language diversit...
Article
Full-text available
treemaker is a Python library to convert a text-based classification schema into a Newick file for use in phylogenetic and bioinformatic programs.
Preprint
Full-text available
Change is coming to historical linguistics. Big, or at least “bigish data” (Gray and Watts 2017), are now becoming increasingly available in the form of large web accessible lexical, typological and phonological databases (e.g. ABVD (Greenhill et al 2008), Chirila (Bowern 2016), Phoible (Moran 2014), WALS (Haspelmath 2014), Autotyp (Bickel et al 2...
Article
Full-text available
The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and therefore not amenable for comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typol...
Preprint
Full-text available
Language diversity is distributed unevenly over the globe. Why do some areas have so many different languages and other areas so few? Intriguingly, patterns of language diversity resemble biodiversity patterns, leading to suggestions that similar mechanisms may underlie both linguistic and biological diversification. Here we present the first globa...
Article
Full-text available
A growing number of studies seek to identify predictors of broad-scale patterns in human cultural diversity, but three sources of non-independence in human cultural variables can bias the results of cross-cultural studies. First, related cultures tend to have many traits in common, regardless of whether those traits are functionally linked. Second,...
Article
Full-text available
With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multi-lingual word lists becomes more and more time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed up the process of cognate detection. Furthermo...
Article
Full-text available
The Database of Cross-Linguistic Colexifications (CLICS) has established a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns. In its current form, it has proven to be a useful tool for various kinds of investigation into cross-linguistic semantic associations, ranging from studies on sema...
Article
Full-text available
Unlike a standard online experiment, a gaming app lets participants interact freely with a vast number of partners, as many times as they wish. The gain is not merely one of statistical power. Cultural evolutionists can use gaming apps to allow large numbers of participants to communicate synchronously; to build realistic transmission chains that a...
Article
Full-text available
Where a newly-married couple lives, termed post-marital residence, varies cross-culturally and changes over time. While many factors have been proposed as drivers of this change, among them general features of human societies like warfare, migration and gendered division of subsistence labour, little is known about whether changes in residence patt...
Article
Full-text available
With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multilingual word lists becomes more and more time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed up the process of cognate detection. Furthermor...
Article
Full-text available
What role does speaker population size play in shaping rates of language evolution? There has been little consensus on the expected relationship between rates and patterns of language change and speaker population size, with some predicting faster rates of change in smaller populations, and others expecting greater change in larger populations. The...
Article
Full-text available
The Dravidian language family consists of about 80 varieties (Hammarström H. 2016 Glottolog 2.7) spoken by 220 million people across southern and central India and surrounding countries (Steever SB. 1998 In The Dravidian languages (ed. SB Steever), pp. 1–39: 1). Neither the geographical origin of the Dravidian language homeland nor its exa...
Article
Full-text available
Understanding how and why language subsystems differ in their evolutionary dynamics is a fundamental question for historical and comparative linguistics. One key dynamic is the rate of language change. While it is commonly thought that the rapid rate of change hampers the reconstruction of deep language relationships beyond 6,000–10,000 y, there ar...
Article
Full-text available
From the foods we eat and the houses we construct, to our religious practices and political organization, to who we can marry and the types of games we teach our children, the diversity of cultural practices in the world is astounding. Yet, our ability to visualize and understand this diversity is limited by the ways it has been documented and shar...
Data
D-PLACE societies per language family. Currently, D-PLACE contains cultural data for over 1400 societies, drawn from two major cross-cultural datasets (the Ethnographic Atlas and Binford Hunter-Gatherer datasets). The societies are associated with 1202 unique languages and approximately 1315 dialects. Linguistic information for each society is avai...
Article
Full-text available
Phylogemetric is a Python library for calculating the δ-score (Holland et al. 2002) and Q-Residual (Gray, Bryant, and Greenhill 2010) for phylogenetic data. These methods are used in studies of linguistic and cultural evolution to quantify reticulation in data. This Python library provides a command-line script interface for use on Nexus-formatted...
Article
Full-text available
The varied islands of the Pacific provide an ideal natural experiment for studying the factors shaping human impact on the environment. Previous research into pre-European deforestation across the Pacific indicated a major effect of environment but did not account for cultural variation or control for dependencies in the data due to shared cultural...