Terry Regier’s research while affiliated with University of California, Berkeley and other places


Publications (97)


A computational analysis of lexical elaboration across languages
  • Article

April 2025 · 44 Reads
Proceedings of the National Academy of Sciences
Terry Regier · Charles Kemp

Claims about lexical elaboration (e.g. Mongolian has many horse-related terms) are widespread in the scholarly and popular literature. Here, we show that computational analyses of bilingual dictionaries can be used to test claims about lexical elaboration at scale. We validate our approach by introducing BILA, a dataset including 1,574 bilingual dictionaries, and showing that it confirms 147 out of 163 previous claims from the literature. We then identify previously unreported examples of lexical elaboration, and analyze how lexical elaboration is influenced by ecological and cultural variables. Claims about lexical elaboration are sometimes dismissed as either obvious or fanciful, but our work suggests that large-scale computational approaches to the topic can produce nonobvious and well-grounded insights into language and culture.


A computational analysis of lexical elaboration across languages

March 2025 · 30 Reads · 1 Citation

Claims about lexical elaboration (e.g. Mongolian has many horse-related terms) are widespread in the scholarly and popular literature. Here we show that computational analyses of bilingual dictionaries can be used to test claims about lexical elaboration at scale. We validate our approach by introducing BILA, a data set including 1574 bilingual dictionaries, and showing that it confirms 147 out of 163 previous claims from the literature. We then identify previously unreported examples of lexical elaboration, and analyze how lexical elaboration is influenced by ecological and cultural variables. Claims about lexical elaboration are sometimes dismissed as either obvious or fanciful, but our work suggests that large-scale computational approaches to the topic can produce non-obvious and well-grounded insights about language and culture.


Cultural evolution via iterated learning and communication explains efficient color naming systems

November 2024 · 3 Reads · 3 Citations
Journal of Language Evolution

It has been argued that semantic systems reflect pressure for efficiency, and a current debate concerns the cultural evolutionary process that produces this pattern. We consider efficiency as instantiated in the Information Bottleneck (IB) principle, and a model of cultural evolution that combines iterated learning and communication. We show that this model, instantiated in neural networks, converges to color naming systems that are efficient in the IB sense and similar to human color naming systems. We also show that some other proposals, such as iterated learning alone, communication alone, or the greater learnability of convex categories, do not yield the same outcome as clearly. We conclude that the combination of iterated learning and communication provides a plausible means by which human semantic systems become efficient.


A computational analysis of lexical elaboration across languages

August 2024 · 23 Reads

Claims about lexical elaboration (e.g. Mongolian has many horse-related terms) are widespread in the scholarly and popular literature. Here we introduce BILA, a dataset including 1606 bilingual dictionaries, and show how it can be used to study lexical elaboration at scale. We first validate our approach by showing that it confirms 146 out of 163 previous claims from the literature. We then identify previously unreported examples of lexical elaboration, and analyze how lexical elaboration relates to ecological and cultural variables. Claims about lexical elaboration are sometimes dismissed as either obvious or fanciful, but our work suggests that large-scale computational approaches to the topic can produce non-obvious and well-grounded insights about language and culture.


A computational analysis of lexical elaboration across languages

August 2024 · 6 Reads

Claims about lexical elaboration (e.g. Mongolian has many horse-related terms) are widespread in the scholarly and popular literature. Here we show that computational analyses of bilingual dictionaries can be used to test claims about lexical elaboration at scale. We validate our approach by introducing BILA, a data set including 1574 bilingual dictionaries, and showing that it confirms 147 out of 163 previous claims from the literature. We then identify previously unreported examples of lexical elaboration, and analyze how lexical elaboration is influenced by ecological and cultural variables. Claims about lexical elaboration are sometimes dismissed as either obvious or fanciful, but our work suggests that large-scale computational approaches to the topic can produce non-obvious and well-grounded insights about language and culture.


American Sign Language Handshapes Reflect Pressures for Communicative Efficiency

June 2024 · 6 Reads

Communicative efficiency is a prominent theory in linguistics and cognitive science. While numerous studies have shown how the pressure to save energy is reflected in the form of spoken languages, few have explored this phenomenon in signed languages. In this paper, we show how handshapes in American Sign Language (ASL) reflect these efficiency pressures and we present new evidence of communicative efficiency in the visual-gestural modality. We focus on handshapes that are used in both native ASL signs and signs borrowed from English to compare efficiency pressures from both ASL and English. First, we design new methodologies to quantify the articulatory effort required to produce handshapes as well as the perceptual effort needed to recognize them. Then, we compare correlations between communicative effort and usage statistics in ASL and English. Our findings reveal that frequent ASL handshapes are easier to produce and that pressures for communicative efficiency mostly come from ASL usage, not from English lexical borrowing.


A Computational Approach to Identifying Cultural Keywords Across Languages

January 2024 · 47 Reads · 4 Citations
Cognitive Science A Multidisciplinary Journal
Zheng Wei Lim · Harry Stuart · [...] · Charles Kemp

Distinctive aspects of a culture are often reflected in the meaning and usage of words in the language spoken by bearers of that culture. Keywords such as душа (soul) in Russian, hati (heart) in Indonesian and Malay, and gezellig (convivial/cosy/fun) in Dutch are held to be especially culturally revealing, and scholars have identified a number of such keywords using careful linguistic analyses (Peeters, 2020b; Wierzbicka, 1990). Because keywords are expected to have different statistical properties than related words in other languages, we argue that a quantitative comparison of word usage across languages can help to identify cultural keywords. To support this claim, we describe a computational method that compares word frequencies across languages, and apply it to both linguistic corpora and word association data. The method identifies culturally specific words that range from “obvious” examples, such as Amsterdam in Dutch, to non‐obvious yet independently proposed examples, such as hati (heart) in Indonesian. We show in addition that linguistic corpora and word association data provide converging evidence about culturally specific words. Our results therefore show how computational analyses and behavioral experiments can supplement the methods previously used by linguists to identify culturally salient words across languages.



Figure 1: Top: Color naming stimulus grid. Bottom: 9 color naming systems displayed relative to this grid. The left column contains color naming systems from 3 languages in the WCS (from top to bottom: Bete, Colorado, Dyimini). Colored regions indicate category extensions, and the color code used for each category is the mean of that category in CIELAB color space. The named color categories are distributions, and for each category we highlight the level sets between 0.75 − 1.0 (unfaded area) and 0.3 − 0.75 (faded area). The middle and right columns contain randomly-generated systems of complexity comparable to that of the WCS system in the same row. The middle column shows random systems that are similar to the WCS system in the same row. The right column shows random systems that are dissimilar to the WCS system in the same row; at the same time, there is no other WCS system that is more similar to this random system.
Figure 2: Efficiency of color naming, following Zaslavsky et al., 2018. The dashed line is the IB theoretical limit of efficiency for color naming, indicating the greatest possible accuracy for each level of complexity. The color naming systems of the WCS are shown in orange, replicating the findings of Zaslavsky et al., 2018. Our RM systems are shown in blue. It can be seen that the RM systems are often closer to the IB curve than the WCS systems are. The inset shows the 9 color systems of Figure 1, with the dissimilar random systems shown as +.
Figure 3: Efficiency of the (top) IL+C, (middle) IL, and (bottom) C evolved color naming systems, in each case compared with the natural systems of the WCS. The black triangle indicates the end state of one run, shown in the inset color map.
Iterated learning and communication jointly explain efficient color naming systems
  • Preprint
  • File available

May 2023 · 67 Reads

It has been argued that semantic systems reflect pressure for efficiency, and a current debate concerns the cultural evolutionary process that produces this pattern. We consider efficiency as instantiated in the Information Bottleneck (IB) principle, and a model of cultural evolution that combines iterated learning and communication. We show that this model, instantiated in neural networks, converges to color naming systems that are efficient in the IB sense and similar to human color naming systems. We also show that iterated learning alone, and communication alone, do not yield the same outcome as clearly.

Sunlight exposure cannot explain “grue” languages

February 2023 · 143 Reads · 3 Citations


Citations (73)


... While these computational studies suggest that IL may not be crucial for the emergence of near-optimal semantic systems, their adaptive dynamics are nearly impossible to validate on actual human data. Carlsson et al. (2024) proposed a potential resolution by demonstrating that combining IL with communication can lead to efficient color naming systems. However, their results are based only on model simulations, rather than on IL with humans, and focus on the converged IL systems rather than on the full IL trajectories. ...

Reference:

Iterated language learning is shaped by a drive for optimizing lossy compression
Cultural evolution via iterated learning and communication explains efficient color naming systems
  • Citing Article
  • November 2024

Journal of Language Evolution

... Existing literature shows that humans are inclined to reduce the effort needed to convey their intended information and for their audience to comprehend it, leading to efficient communication (e.g., Zipf, 1949; Gibson et al., 2019; Yin et al., 2024). When human individuals interact through dialogue, this is manifested by developing and using ad-hoc linguistic conventions. ...

American Sign Language Handshapes Reflect Pressures for Communicative Efficiency
  • Citing Conference Paper
  • January 2024

... By contrast, adults generally possess these and a wide range of other concepts. Often, these concepts support specialized uses: this can be seen in both cross-cultural variation in words (Kemp et al., 2018; Lim et al., 2024; Regier et al., 2016) and lexical items shared only within specific speech communities, i.e. jargon (Clark, 1998). ...

A Computational Approach to Identifying Cultural Keywords Across Languages
  • Citing Article
  • January 2024

Cognitive Science A Multidisciplinary Journal

... Nevertheless, as highlighted in Josserand et al. (2021), we must keep in mind that this is very likely a multi-factorial complex causal process involving multiple temporal and organizational scales, ranging from the intra-individual physiological lens brunescence and the associated perceptual and cognitive mechanisms of compensating and adapting to it to the large-scale presumably cross-generational and inter-individual language change in structured communities reflecting the decreased perception of "blue" among its most affected (older) members. While many of these components are still in need of thorough study and require inter-disciplinary and methodologically diverse approaches, the conversation has already started (see, for example, Josserand et al., 2021, the recent technical comment to it in Hardy et al., 2023 and our response in Josserand et al., 2023, touching on these aspects). ...

Sunlight exposure cannot explain “grue” languages

... To specify the space of grounded meaning representations, we first define the set of world states $U \subset \mathbb{R}^2$, such that each element in the domain $u \in U$ can be represented as $u = (a_u, r_u)$ where $r_u$ and $a_u$ are the values of its radius and angle respectively. Following prior work (e.g., Xu et al., 2016; Zaslavsky et al., 2022), we assume that each meaning takes the form of a similarity-based distribution, i.e., $m_t(u) \propto \exp(-\gamma \cdot d(u, u_t))$. While it may seem natural at first to take $d(u, u_t)$ to be the Euclidean distance, Shepard (1964) argued in his foundational work that human perceptual similarities in this domain are inconsistent with a Euclidean distance. ...

The evolution of color naming reflects pressure for efficiency: Evidence from the recent past
  • Citing Article
  • April 2022

Journal of Language Evolution

... For example, there are formal and informal registers of communication, where the choice of appropriate linguistic elements is important to convey the level of respect and distance between the participants of communication. The richness of grammatical forms in language also allows complex concepts and ideas to be expressed with a high degree of precision and emotional intensity (Britain, 2020; Haspelmath, 2021; Mollica et al., 2021). For example, different temporal forms and modal constructions can be used to convey shades of meaning and emotional colouring of an utterance. ...

The forms and meanings of grammatical markers support efficient communication
  • Citing Article
  • December 2021

Proceedings of the National Academy of Sciences

... Typometrics is a subfield of quantitative typology. Some research in the field of quantitative typology aims to introduce general models of language, such as probabilistic and information-theoretic models (Perfors et al. 2010, Ferrer-i-Cancho 2015 & 2018, among others). Yet most work in the field of quantitative typology is based on categorical statements. ...

9. How recursive is language? A Bayesian exploration
  • Citing Chapter
  • January 2010

... The same observation can be gleaned from Schneider et al. (2020), one of a very small number of crosslinguistic studies on numeral term acquisition including children learning an Indo-Aryan language: like learners of other languages, children acquiring Indo-Aryan languages make use of the successor function when learning productive counting, but show lower successor function performance than children speaking other languages; nonetheless, successor function performance remains a strong predictor of mastery of productive counting. In a similar vein, Xu et al. (2020) demonstrate that languages provide near-optimal solutions to tradeoffs between informativeness (whether numeral terms refer to individual quantities, weighted according to their need probability) and complexity (operationalized according to a rule-based framework), occupying a region below a Pareto frontier representing the optimal balance between these two variables. Although considerably higher in complexity than the most complex language included in analyses (Georgian), an Indo-Aryan language such as Hindi/Urdu would likely still be found in this region representing near-optimal tradeoffs, but would be an extreme outlier. ...

Numeral Systems Across Languages Support Efficient Communication

Open Mind

... Probing of language models has focused on semantics (Yaghoobzadeh et al., 2019; Zhao et al., 2020b), morphosyntactic structure (Linzen et al., 2016; Bacon & Regier, 2018; Jawahar et al., 2019; McCoy et al., 2019; Lepori & McCoy, 2020), linguistic capability (Liu et al., 2019), and knowledge of context and surface properties (Adi et al., 2017; Khandelwal et al., 2018). Several studies have explored a combination of these areas (Conneau et al., 2018; Tenney et al., 2019; Wieting & Kiela, 2019; Mosbach et al., 2020; Puccetti et al., 2021). ...

Probing sentence embeddings for structure-dependent tense
  • Citing Conference Paper
  • January 2018

... Color naming is traditionally regarded as a reflection of how people perceive colors. However, studies suggest that the names given to colors are also based on communicative need (Zaslavsky et al., 2020 ...

Communicative need in colour naming

Cognitive Neuropsychology