Article

Correlations of valency alternations and morphological types: A typological perspective


Abstract

This article explores the connection between expression of valency alternations and the overall morphological typology of a language from a cross-linguistic perspective. On the basis of a typological survey of empirical data in 40 geographically and genealogically diverse languages, it finds two universal tendencies relating to the scale of morphological types from most to least bound: fusional – agglutinative – isolating. First, the compatibility of morphological techniques used to express valency alternations does not extend further left than the overall morphological typology of the language. Second, it may extend further right, with extensive attestation of this possibility. The morphological expression of valency alternations is thus constrained by the overall morphology of the language, but tends to be pushed further towards the right.
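The two tendencies can be read as an ordering constraint on the scale given in the abstract. The following is a minimal sketch under that reading, not the authors' method: the names `SCALE`, `is_compatible`, and `allowed_techniques` are hypothetical, and the abstract describes tendencies rather than exceptionless rules.

```python
# Scale from most to least bound, left to right, as stated in the abstract.
SCALE = ["fusional", "agglutinative", "isolating"]


def is_compatible(language_type: str, technique_type: str) -> bool:
    """Return True if a valency-alternation technique lies at the same
    position as, or further right than, the language's overall type."""
    return SCALE.index(technique_type) >= SCALE.index(language_type)


def allowed_techniques(language_type: str) -> list[str]:
    """List the technique types predicted for a language of the given type."""
    return SCALE[SCALE.index(language_type):]
```

Under this sketch, a language that is agglutinative overall is predicted to use agglutinative or isolating techniques for valency alternations, but not fusional ones.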


... Our dataset comprises morphologically segmented words from four morphologically diverse languages (Ge and Comrie, 2022): English, Russian, Hungarian, and Arabic. The segmentation data for English, Russian, and Hungarian is sourced from the SIGMORPHON 2022 Shared Task on Morpheme Segmentation (Batsuren et al., 2022), which provides high-quality morpheme segmentations. ...
Preprint
Tokenization is fundamental to Natural Language Processing (NLP), directly impacting model efficiency and linguistic fidelity. While Byte Pair Encoding (BPE) is widely used in Large Language Models (LLMs), it often disregards morpheme boundaries, leading to suboptimal segmentation, particularly in morphologically rich languages. We introduce MorphBPE, a morphology-aware extension of BPE that integrates linguistic structure into subword tokenization while preserving statistical efficiency. Additionally, we propose two morphology-based evaluation metrics: (i) Morphological Consistency F1-Score, which quantifies the consistency between morpheme sharing and token sharing, contributing to LLM training convergence, and (ii) Morphological Edit Distance, which measures alignment between morphemes and tokens concerning interpretability. Experiments on English, Russian, Hungarian, and Arabic across 300M and 1B parameter LLMs demonstrate that MorphBPE consistently reduces cross-entropy loss, accelerates convergence, and improves morphological alignment scores. Fully compatible with existing LLM pipelines, MorphBPE requires minimal modifications for integration. The MorphBPE codebase and tokenizer playground will be available at: https://github.com/llm-lab-org/MorphBPE and https://tokenizer.llm-lab.org
Article
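Using a sample balanced for genealogical affiliation and geographical distribution, this article selects 44 languages, 11 each of the isolating, agglutinative, fusional, and polysynthetic types, and examines the morphological types, productivity, and linguistic and areal distribution of valency-increasing causatives, deriving several preference hierarchies by quantitative methods. It then discusses the syntactic status and coding types of the causee in valency-increasing causatives and identifies four realization patterns: (a) core object only, with coding types including zero case marking, accusative, and absolutive; (b) non-core object only, with coding types including an indirect object marker, dative, instrumental, and directional case; (c) either core or non-core object, with coding types including zero marking/dative, accusative/dative, and absolutive/oblique; (d) syntactically unexpressed. Finally, the article analyzes the factors that affect the syntactic realization of the causee in valency-increasing causatives: the syntactic constraint of the "double object restriction" and the semantic constraint of the causee's degree of agentivity.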
Article
As a language of the agglutinative morphological type, Indonesian has not yet received a firm treatment of various morphological issues, especially the valency mechanisms that result from morphological processes applied to verbs. This is evident from previous studies, which have focused only on verb behaviour and transitivity. To examine these valency mechanisms in depth, especially in inflected verbs in Indonesian, a descriptive qualitative approach is used, applying the agih (distributional) method and the immediate-constituent division technique to analyze the data. The results show the existence of valency mechanisms in inflected verbs in Indonesian, namely (1) valency increase in the causative morphological structure with the {meN-kan} marker, (2) valency increase in the benefactive applicative structure with the {di-kan} marker, (3) valency increase in the locative applicative structure with the {meN-i} marker, and (4) valency decrease in the ergative sentence structure with the morphological cases {ter-}, {ke-an}, and {∅}. The study shows that various valency mechanisms are dominant in inflected verbs, even though in some cases they appear in ergative forms that have no marker.
Keywords: Indonesian; inflection; morphology; valency
INTRODUCTION The way a sentence is constructed to build its arguments is influenced by many factors. Looking at the pattern in which arguments appear, we can see that the tendency for arguments to occur before and after the verb (core) in a sentence is determined by valency. The term valency has been discussed widely by linguists, for example by Kulikov et al. (2006) in their book Case, Valency and Transitivity. According to this book, the three notions of case, valency, and transitivity are among the most debated in modern linguistics. On the one hand, they are closely connected with morphological characteristics of the clause such as case marking, person agreement, and voice. On the other hand, they are also relevant to a number of semantic issues, including the meaning of cases, semantic-syntactic word classes, and semantic correlates of transitivity. The book brings together papers written in various theoretical frameworks and representing various approaches (Optimality Theory, Government and Binding, various versions of the Functional approach, cross-linguistic analysis and typology), which contain various new findings
Article
This handbook offers an extensive cross-linguistic and cross-theoretical survey of polysynthetic languages, in which single multi-morpheme verb forms can express what would be whole sentences in English. These languages and the problems they raise for linguistic analyses have long featured prominently in language descriptions, and yet the essence of polysynthesis remains under discussion, right down to whether it delineates a distinct, coherent type, rather than an assortment of frequently co-occurring traits. Chapters in the first part of the handbook relate polysynthesis to other issues central to linguistics, such as complexity, the definition of the word, the nature of the lexicon, idiomaticity, and to typological features such as argument structure and head marking. Part II contains areal studies of those geographical regions of the world where polysynthesis is particularly common, such as the Arctic and Sub-Arctic and northern Australia. The third part examines diachronic topics such as language contact and language obsolescence, while Part IV looks at acquisition issues in different polysynthetic languages. Finally, Part V contains detailed grammatical descriptions of over twenty languages which have been characterized as polysynthetic, with special attention given to the presence or absence of potentially criterial features.
Book
Understanding Morphology offers students an introduction to the study of word structure that starts at the very beginning. Assuming no knowledge of the field of morphology, the book presents a broad range of morphological phenomena from a variety of languages. The goal is to shed light on major issues of analysis, so chapters are structured around essential questions: What are the basic units of the lexicon -- words or morphemes? Is there a categorical difference between inflection and derivation? Do the same principles apply to both word formation and sentence formation? What makes one morphological rule more productive than another? Are inflectional paradigms part of the morphological architecture? To answer these questions, the authors draw on the best research available, discussing a variety of theoretical approaches. This second edition also expands the discussion of several topics, including frequency effects, the structure of the lexicon, and productivity. Each chapter includes a summary, suggestions for further reading, and comprehension exercises (with answers). New to this second edition are exploratory exercises which allow students to put what they have read into practice and extend their knowledge.
Chapter
Full description of valency classes of Yucatec Maya in the framework of the Leipzig valency database
Chapter
The present chapter gives an overview of valency classes in Icelandic and the most common, noticeable, or productive alternations found in the language. The overview is based on my own native-speaker knowledge of the language, on my earlier research and on the existing literature on Icelandic. Most of the examples are attested, taken from real texts found online, supplemented with some constructed examples. The chapter is structured as follows: Section 2 presents the basics of Icelandic by placing it into its genealogical, linguistic and social context. Section 3 deals with basic valency, focusing particularly on two- and three-place predicates in Icelandic. There I present an overview of which predicates may instantiate the different argument structure constructions: Nominative Subject Construction, Accusative Subject Construction, Dative Subject Construction, and the different sub-constructions of ditransitives. Section 4 deals with uncoded alternations, i.e. alternations not coded on the verb. These are divided into three types, case variations, case and structure changing alternations, and structure changing alternations. Section 5 deals with coded alternations, i.e. alternations that are coded on the verb, such as the Active–Passive Alternation, the Impersonal Passive, the Transitive–Inchoative, the Reflexive and the Mediopassive. In Section 6, additional alternations are discussed, namely the Oblique Ambitransitive, which is found with accusative, dative and genitive subjects, and the Actional Passive, which is an extension of the Impersonal Passive, found with transitive and ditransitive predicates. Section 7 concludes the present discussion on alternations and valency classes in Icelandic.
Chapter
This monograph constitutes the first comprehensive investigation of reciprocal constructions and related phenomena in the world’s languages. Reciprocal constructions (of the type The two boys hit each other, The poets admire each other’s poems) have often been the subject of language-particular studies, but it is only in this work that a truly global comparative picture emerges. Nine stage-setting chapters dealing with general and theoretical matters are followed by 40 chapters containing in-depth descriptions of reciprocals in individual languages by renowned specialists. The introductory papers provide a conceptual and terminological framework that allows the authors of the individual chapters to characterize their languages in comparable terms, making it easy for the reader to see points of commonality between languages and constructions that have never been compared before. This set of volumes is an indispensable starting point and will be a lasting reference work for any future studies of reciprocals.
Chapter
We analyse morphological causative verbs in Lithuanian on the basis of an annotated corpus, studying the distribution of different causative suffixes across the valency types of base verbs, as well as the argument structure of the causatives themselves. We show that different causative suffixes are unevenly distributed with respect to the transitivity and agentivity of the base verbs and that morphological causatives in Lithuanian, no longer being productive, tend to pattern in their argument structure and interpretation together with ordinary transitive verbs. The relatively few causatives based on transitive verbs are investigated, and it is shown that causatives based on “ingestive” verbs like ‘eat’ or ‘drink’ behave differently from causatives formed from other semantic types of bases, in particular in that they allow the expression of both participants of the caused event. The non-ingestive transitive verbs derive the so-called “curative” causatives, which are peculiar in that they never allow an overt regular expression of the agent of the caused situation and are therefore not valency-increasing in the strict sense of the term. Such causatives are also shown to undergo meaning shifts that render them partly synonymous with their base verbs, thus losing the original causative semantics.
Article
The general distinction between morphology and syntax is widely taken for granted, but it crucially depends on the notion of a cross-linguistically valid concept of "(morphosyntactic) word". I show that there are no good criteria for defining such a concept. I examine ten criteria in some detail (potential pauses, free occurrence, mobility, uninterruptibility, non-selectivity, non-coordinatability, anaphoric islandhood, nonextractability, morphophonological idiosyncrasies, and deviations from biuniqueness), and I show that none of them is necessary and sufficient on its own, and no combination of them gives a definition of "word" that accords with linguists' orthographic practice. "Word" can be defined as a language-specific concept, but this is not relevant to the general question pursued here. "Word" can be defined as a fuzzy concept, but this is theoretically meaningful only if the continuum between affixes and words, or words and phrases, shows some clustering, for which there is no systematic evidence at present. Thus, I conclude that we do not currently have a good basis for dividing the domain of morphosyntax into "morphology" and "syntax", and that linguists should be very careful with cross-linguistic claims that make crucial reference to a cross-linguistic "word" notion.
Article
This study investigates the relationship between the lexicon and language use through the lens of type and token frequency. Type frequency is taken to reflect the lexicon, token frequency language use. Type and token frequencies were compared for a total of 10 basic distinctions at the phonological, morphological, lexical and lexico-syntactic levels in English. These include consonants vs. vowels, prefixes vs. suffixes, count vs. mass nouns and transitive vs. intransitive verbs. The empirical analysis reveals that type frequencies may be more, or less, extreme than token frequencies. Non-phonological distinctions evince a higher discrepancy between type and token frequency than phonological ones. In addition, complexity is more strongly discouraged in token than in type frequency, suggesting that it is more of an issue in processing than in storage. It is concluded that the lexicon constrains language use, though only to a limited extent.
Article
The purpose of this note is to address some of the criticisms made by Alexandra Georgakopoulou and Dionysis Goutsos in their review of our Greek: a comprehensive grammar of the modern language in BMGS 23 (1999), pp. 337-40. We consider it necessary to clarify a number of issues because some of the points raised in the review are either unsubstantiated or inappropriate, stemming as they apparently do from the reviewers' misunderstanding of our aims. We will examine the larger issues first.
Chapter
This chapter discusses the valency properties of verbs in Modern Standard Arabic.
Book
This is a comprehensive handbook on all aspects of linguistic morphology.
Chapter
The volume's central concern is grammatical voice, traditionally known as diathesis, and its classical manifestations as Active, Middle, and Passive. While numerous problems in the meaning, syntax, and morphology of these categories in Indo-European remain unsolved, their counterparts in more exotic languages have raised still further questions. What discourse functions and diachronic events unite 'voice' as a recognizable phenomenon across languages? How are they typically grammaticalized? What stages do children go through in learning them? How does 'voice' link up with ergativity and with other categories and constructions such as the Inverse and the Antipassive? The authors in this volume have different perspectives on these problems: they discuss voice, e.g., from a typological-universal view, in relation to language acquisition and to ergativity, and from diachronic and cross-linguistic perspectives.
Article
This paper presents a unifying analysis of the uses of the Russian marker -sja in passives, anticausatives and antipassives. The analysis is couched in terms of Distributed Morphology and does not assume any argument structure on roots, i.e. all arguments are licensed by functional heads. The main proposal is that -sja is a head that fulfils two tasks in a derivation: in syntax, it saturates a selectional feature on an argument-introducing head; in semantics, it existentially quantifies over an unsaturated argument variable.