Damián E. Blasi’s research while affiliated with Pompeu Fabra University and other places


Publications (75)


Fig. 4. Meta-analyses of contact effects under different types of contact scenarios. (A-D) Across domains of language of the TLI and GBI dataset; (E-H) across phonological features (TLI dataset only, the GBI dataset does not include phonological data). (A,E) Genetic contact, all pairs; (B,F) genetic contact: different area pairs only; (C,G) genetic contact: same area pairs
Global patterns of genetic admixture reveal effects of language contact
  • Preprint

December 2024 · 64 Reads

Anna Graff · Damian E Blasi · Erik J Ringen · [...]

When speakers of different languages are in contact, they often borrow features like sounds, words, or syntactic patterns from one language to the other, but the lack of historical data has hampered estimation of this effect at a global scale. We break out of this impasse by using genetic admixture as a proxy for population contact. We find that language pairs whose speaker populations underwent genetic admixture or that are located in the same geo-historical area share more features than others, suggesting borrowing. The effect varies strongly across features, partly following expectations from differences in lifelong learnability, partly responding to differences in social imbalances during contact. Additionally, we find that for some features, admixture decreases sharing. This likely reflects signals of divergence (schismogenesis) under contact.



Author Correction: The evolutionary dynamics of how languages signal who does what to whom

July 2024 · 80 Reads


Figure 1. Cartoon overview of the experimental design demonstrating an example of the experimental conditions. Here, only two participants are shown singing simultaneously, or speaking simultaneously (recitation) or sequentially (conversation), but the actual number of participants will be between 5-10 per experiment. Text columns #1 and #2 represent the first and second phrase of singing/speaking, such that when participant #1's text appears directly above participant #2's it indicates simultaneous singing/speaking, while when only one participant's text appears at a time this represents sequential speaking (conversation). This example shows lyrics for "Happy Birthday" but note that the actual song will be different (and generally in a different language) at each site. See Methods below for additional details regarding the experimental procedure.
Registered Report design planner (adapted from https://osf.io/sbmx9)
Does singing enhance cooperation more than speaking does? A global experimental Stage 1 Registered Report

June 2024 · 331 Reads

The evolution of music, language, and cooperation has been debated since before Darwin. The social bonding hypothesis proposes that these phenomena may be interlinked: musicality may have facilitated the evolution of group cooperation beyond the possibilities of spoken language. Although dozens of experimental studies have shown that synchronised rhythms can promote cooperation, it is unclear whether synchronous singing enhances cooperation relative to spoken language, particularly across diverse societies that differ in their musical/linguistic rhythms and social organisation. Here, we propose a Registered Report to test this hypothesis through a global experiment in diverse languages aiming to collect data from 1500 participants across 50 sites. The social bonding hypothesis predicts that cooperation will increase more after synchronous singing than after spoken (sequential) conversation or (simultaneous) recitation, while alternative hypotheses predict that song will not increase cooperation relative to speech. Regardless of outcome, these results will provide an unprecedented understanding of cross-cultural relationships between music, language, and cooperation.


12 competing causal models representing the potential relationships between case and two word order features, verb-final word order and flexible word order. Model i represents the diachronic scenarios inferred from the processing hypothesis, the noisy-channel hypothesis, and the word order universal. Model j stems from a theory of licensing and structural case and the results of empirical corpus studies. Other models represent the inverted directions of these models (k and l) and the possible combinations of causal paths between three or two features (a-h) that additionally test for the potential indirect relationship of one of the word order features on case.
The averaged standardized regression coefficients (ranging from 0.25 to 1.58) of the best-fitting models with their confidence intervals. The coefficient values range from 0.1 to 0.41 for Flexible → Case, from 1.34 to 1.83 for Case → Flexible, from 0.58 to 1.16 for Verb-final → Case, and from 0.24 to 0.59 for Case → Verb-final, where values indicate how strongly the features are correlated. All identified causal links are positive and robust.
Maps showing the distribution of case and word order patterns in the languages of the world (represented by colored dots); yellow: case without the word order type indicated in the map (11% of languages without verb-final and 17% of languages without flexible word order); blue: word order type indicated in the map without case (15% of verb-final languages and 22% of flexible word order languages); green: presence of case and word order type indicated in the map (22% of languages have case and verb-final word order and 16% of languages have case and flexible word order); gray: absence of case and word order type indicated in the map (in 52% of languages verb-final and case are absent, and in 46% flexible word order and case are absent). The top map depicts the combination of case with verb-final word order, while the bottom map illustrates case with flexible word order. The Indian subcontinent and the Caucasus region contain languages that almost exclusively combine case and verb-final word order. Many languages that combine case and flexible word order are located in South America, Eurasia, the Indian subcontinent, and Australia.
Phylogeny (global tree on the left) and the distribution of case and word order patterns (colored blocks on the right); yellow: case without the word order type indicated in the column header; blue: word order type indicated in the column header without case; green: presence of case and word order type indicated in the column header; gray: absence of case and word order type indicated in the column header. Verb-final word order is a more stable feature than flexible word order, which is visible from larger blocks of color sequences across languages representing its presence (blue and green) or absence (yellow and gray). By contrast, flexible word order can be present or absent in groups of closely related languages.
The evolutionary dynamics of how languages signal who does what to whom

March 2024 · 214 Reads · 2 Citations

Languages vary in how they signal “who does what to whom”. Three main strategies to indicate the participant roles of “who” and “whom” are case, verbal indexing, and rigid word order. Languages that disambiguate these roles with case tend to have either verb-final or flexible word order. Most previous studies that found these patterns used limited language samples and overlooked the causal mechanisms that could jointly explain the association between all three features. Here we analyze grammatical data from a Grambank sample of 1705 languages with phylogenetic causal graph methods. Our results corroborate the claims that verb-final word order generally gives rise to case and, strikingly, establish that case tends to lead to the development of flexible word order. The combination of novel statistical methods and the Grambank database provides a model for the rigorous testing of causal claims about the factors that shape patterns of linguistic diversity.


Approximate location of the languages included in the first release of GATA, based on Glottolog¹.
Feature coverage for each typological domain in GATA.
Time intervals between states (published grammars) in the languages coded in GATA.
An example analysis of change across the different domains. The amount of change within each domain is plotted against the difference in years between both states of documentation.
Grammars Across Time Analyzed (GATA): a dataset of 52 languages

November 2023 · 147 Reads · 1 Citation

Scientific Data

Grammars Across Time Analyzed (GATA) is a resource capturing two snapshots of the grammatical structure of a diverse range of languages separated in time, aimed at furthering research on historical linguistics, language evolution, and cultural change. GATA comprises grammatical information on 52 diverse languages across all continents, featuring morphological, syntactic, and phonological information based on published grammars of the same language at two different time points. Here we introduce the coding scheme and design features of GATA, and we describe some salient patterns related to language change and the coverage of grammatical descriptions over time.


Which Humans?

September 2023 · 129 Reads · 48 Citations

Large language models (LLMs) have recently made vast advances in both generating and analyzing textual data. Technical reports often compare LLMs’ outputs with “human” performance on various tests. Here, we ask, “Which humans?” Much of the existing literature largely ignores the fact that humans are a cultural species with substantial psychological diversity around the globe that is not fully captured by the textual data on which current LLMs have been trained. We show that LLMs’ responses to psychological measures are an outlier compared with large-scale cross-cultural data, and that their performance on cognitive psychological tasks most resembles that of people from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies but declines rapidly as we move away from these populations (r = -.70). Ignoring cross-cultural diversity in both human and machine psychology raises numerous scientific and ethical issues. We close by discussing ways to mitigate the WEIRD bias in future generations of generative language models.


Fig. 1. The global distribution of fusion and informativity scores. The scores with a minimum of 0 (absence of all metric features) and a maximum of 1 (presence of all metric features) have been standardized to a mean of 0 and a variance of 1. The hotspots of low fusion are located in West Africa and Southeast Asia. Many Austronesian languages also rank low on fusion. The geographic patterns of informativity scores are less clear compared to fusion. Among lower-scoring languages are those spoken in West Africa, Southeast Asia, many Uralic languages, and languages spoken in India (Indo-Aryan and Dravidian).
Fig. 3. The scores of fusion and informativity on the global tree. The scores with a minimum of 0 (absence of all metric features) and a maximum of 1 (presence of all metric features) have been standardized to a mean of 0 and a variance of 1. We detect many patterns of closely related languages scoring similarly, which might indicate the faithful transmission of grammatical complexity from ancestor languages to their descendants rather than large-scale adaptations of grammatical complexity to changes in sociodemographic factors. Similar to geographic distribution, we see that fusion scores follow a more defined pattern of phylogenetic clustering compared to informativity scores.
Societies of strangers do not speak less complex languages

August 2023 · 352 Reads · 17 Citations

Science Advances

Many recent proposals claim that languages adapt to their environments. The linguistic niche hypothesis claims that languages with numerous native speakers and substantial proportions of nonnative speakers (societies of strangers) tend to lose grammatical distinctions. In contrast, languages in small, isolated communities should maintain or expand their grammatical markers. Here, we test these claims using a global dataset of grammatical structures, Grambank. We model the impact of the number of native speakers, the proportion of nonnative speakers, the number of linguistic neighbors, and the status of a language on grammatical complexity while controlling for spatial and phylogenetic autocorrelation. We deconstruct "grammatical complexity" into two separate dimensions: how much morphology a language has ("fusion") and the amount of information obligatorily encoded in the grammar ("informativity"). We find several instances of weak positive associations but no inverse correlations between grammatical complexity and sociodemographic factors. Our findings cast doubt on the widespread claim that grammatical complexity is shaped by the sociolinguistic environment.



Speech and language markers of neurodegeneration: a call for global equity

July 2023 · 182 Reads · 27 Citations

Brain

In the field of neurodegeneration, speech and language assessments are useful for diagnosing aphasic syndromes and for characterizing other disorders. As a complement to classic tests, scalable and low-cost digital tools can capture relevant anomalies automatically, potentially supporting the quest for globally equitable markers of brain health. However, this promise remains unfulfilled due to limited linguistic diversity in scientific works and clinical instruments. Here we argue for cross-linguistic research as a core strategy to counter this problem. First, we survey the contributions of linguistic assessments in the study of primary progressive aphasia and the three most prevalent neurodegenerative disorders worldwide: Alzheimer’s disease, Parkinson’s disease, and behavioral variant frontotemporal dementia. Second, we address two forms of linguistic unfairness in the literature: the neglect of most of the world’s 7,000 languages and the preponderance of English-speaking cohorts. Third, we review studies showing that linguistic dysfunctions in a given disorder may vary depending on the patient’s language and that English speakers offer a suboptimal benchmark for other language groups. Finally, we highlight different approaches, tools, and initiatives for cross-linguistic research, identifying core challenges for their deployment. Overall, we seek to inspire timely actions to counter a looming source of inequity in behavioral neurology.


Citations (48)


... On a basic level, ELCC is somewhat analogous to any multi-lingual dataset (where "human language" is the phenomenon). Taking the notion of "phenomenon" more narrowly (i.e., of more direct scientific interest), it could be compared to Blum et al. [2023], which presents a collection of grammar snapshot pairs for 52 different languages as instances of diachronic language change. Zheng et al. [2024] present a dataset of conversations from Chatbot Arena where "text generated by different LLMs" is the phenomenon of interest. ...

Reference:

ELCC: the Emergent Language Corpus Collection
Grammars Across Time Analyzed (GATA): a dataset of 52 languages

Scientific Data

... First, the present studies were limited to United States representative samples. LLMs have been shown to be less morally aligned with non-Western populations and to exhibit biases in their outputs [14,74]. Promisingly, some studies have found methods for improving LLM cultural representation [75][76][77], indicating that more widely representative training data could enhance their performance. ...

Which Humans?
  • Citing Preprint
  • September 2023

... Additionally, languages originating from the same regions often tend to be influenced by common factors, further complicating the analysis [49][50][51]. While we have included language family, macro-area and country as factors to account for the genealogical and geographic relatedness of languages in our prior paper, this approach ignores variation within language families and geographical units as pointed out in several recent studies [49][50][51][52][53]. To address this issue, we develop two quantitative approaches: (i) a semiparametric machine learning estimation method capable of simultaneously controlling for document- and language-specific characteristics while directly modelling potential effects due to phylogenetic relatedness and geographic proximity; (ii) a multi-model multilevel inference approach designed to test whether cross-linguistic outcomes are statistically associated with sociodemographic factors, while accounting for phylogenetic and spatial autocorrelation via the inclusion of random effects and slopes. ...

Societies of strangers do not speak less complex languages

Science Advances

... Gender bias in large language models can be evaluated through the Winograd Schema and Wino-Bias benchmarks [43,44,68], which test pronoun resolution by presenting the model with ambiguous pronouns in stereotypical (e.g., a sentence implying a "nurse" is female) and anti-stereotypical (e.g., a male "nurse") contexts, analyzing its ability to correctly resolve the pronouns without reinforcing biases; Occupational pronoun resolution [6,51], where sentences containing two or more professions (e.g., "doctor" and "nurse") and a gendered pronoun (e.g., "he" or "she") evaluate whether the model relies on gender stereotypes to associate professions with pronouns (e.g., incorrectly associating "he" with "doctor" due to stereotypes); Lexicon-based evaluation [5,52,63], which leverages predefined gendered lexicons across various languages and cultural settings to identify whether the model disproportionately links male or female terms (e.g., "man", "woman") with specific roles (e.g., "leader", "caregiver"), highlighting the model's alignment with societal stereotypes; and quantitative scoring metrics [28,59,66], which apply custom datasets and multiple bias-scoring metrics to measure the model's bias levels, evaluate improvements after debiasing strategies, and analyze how technical factors like model precision, quantization, and language variance impact gender bias mitigation. ...

On the Interpretability and Significance of Bias Metrics in Texts: a PMI-based Approach
  • Citing Conference Paper
  • January 2023

... Most knowledge on diagnostic features and naming impairment in lvPPA is based on monolingual English speakers (García et al., 2023; Gorno-Tempini et al., 2011). Research on lvPPA in bilinguals remains limited, mainly comprising single case and case series studies (for a review, see Grasso et al., 2023). ...

Speech and language markers of neurodegeneration: a call for global equity
  • Citing Article
  • July 2023

Brain

... The languages are evaluated as more similar if their phoneme distributions are alike. In the typology-based similarity assessment, we examine how similar the typological features are using the Grambank dataset [20], which numerically records the typological characteristics of languages. In this study, we primarily utilize corpus-based similarity assessment, while typology-based similarity evaluation is employed as a supplementary method to examine how well it aligns with the similarity evaluation of data within the same language family. ...

Grambank reveals the importance of genealogical constraints on linguistic diversity and highlights the impact of language loss

Science Advances

... Also other domains of language, such as phonology (Blaxter, 2017), phonotactics (Baumann & Matzinger, 2021; Napoleão de Souza & Sinnemäki, 2022) and syntax (Benítez-Burraco, S. Chen, Gil, Gaponov, et al., 2024) have been shown to be influenced by societal factors, such as language contact. At the same time, there is still discussion if and how language contact causes morphological simplification, from both experimental (Cuskley et al., 2015; De Smet, Rosseel, & Van De Velde, 2022) and quantitative cross-linguistic studies (Kauhanen, Einhaus, & Walkden, 2023; Koplenig, 2019; Lupyan & Raviv, 2024; Shcherbakova et al., 2023). In any case, instead of just looking at correlations between proportions of L2 speakers and morphological complexity, it is worthwhile to study which specific language-internal and sociodemographic factors mediate the relationship between social and language structure (Sinnemäki, 2020; Sinnemäki & Di Garbo, 2018). ...

Societies of strangers do not speak grammatically simpler languages

... Studies by Kato et al. (2020), Lu (2017), and Zhang & Lu (2019) demonstrate the effectiveness of quantitative analysis in assessing language proficiency, evaluating the impact of instructional interventions, and exploring the relationship between language and cognition. Additionally, research by Rietveld & van Hout (2010) and Shcherbakova et al. (2023) highlights the growing use of computational methods for large-scale language analysis, opening new avenues for understanding language patterns and variation. ...

A quantitative global test of the complexity trade-off hypothesis: the case of nominal and verbal grammatical marking

Linguistics Vanguard

... Pinhasi and von Cramon-Taubadel 2009), to linguistic variability (e.g. Barbieri et al. 2022) and to material and immaterial culture. A particularly relevant example in this context is a study on the spread of 596 traditional folktales in Eurasia (Bortolini et al. 2017), whose results showed that, at a small scale (within 4,000 km), population migration had a significant impact on the distribution of the traditional folktales we still observe today, whereas at a large scale the signal is lost. ...

A global analysis of matches and mismatches between human genetic and linguistic histories

Proceedings of the National Academy of Sciences

... In general, there has been an overreliance on English in the language development literature (Kidd & Garcia, 2022). This does not only hinder cognitive science broadly (Blasi et al., 2022) but also severely limits our understanding of what language can be and the different paths children take toward mastering it. Only a few studies in other languages have looked at how cognitive skills may impact morphology development. ...

Over-reliance on English hinders cognitive science

Trends in Cognitive Sciences