Article

The Menzerath-Altmann law as the relation between lengths of words and morphemes in Czech

Authors:
  • Mathematical Institute Slovak Academy of Sciences & Department of Mathematics Constantine the Philosopher University in Nitra

Abstract

It is shown that the mean morpheme length (measured in phonemes) decreases with increasing length of word types (measured in morphemes) in Czech texts, i.e. these language units behave according to the Menzerath-Altmann law. The law is not valid in general for word tokens. Some hints toward an interpretation of the parameters are presented.
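The Menzerath-Altmann law is commonly written as
$$y(x) = a\,x^{b}\,e^{cx},$$
where $x$ is the length of a construct (here a word type, measured in morphemes), $y(x)$ is the mean length of its constituents (here morphemes, measured in phonemes), and $a$, $b$ and $c$ are parameters. Setting $c = 0$ gives the truncated form
$$y(x) = a\,x^{b}.$$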
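As a minimal sketch of how the parameters of the truncated form y(x) = a x^b can be estimated from pairs of word length x and mean morpheme length y, the Python below uses ordinary least squares on the log-log scale (ln y = ln a + b ln x) and reports R² on the original scale. This is a swapped-in simplification, not necessarily the procedure used in the paper; work in this area often fits the curve by nonlinear least squares on the original scale, which can yield somewhat different values of a and b. The function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on ln y = ln a + b ln x.

    Simplification: log-log OLS, not nonlinear least squares.
    Returns (a, b, r2), where r2 is the determination coefficient
    computed on the original (untransformed) scale.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                    # slope on the log-log scale
    a = math.exp(my - b * mx)        # back-transformed intercept
    # R^2 = 1 - SS_res / SS_tot, evaluated for y itself, not ln y
    mean_y = sum(ys) / n
    ss_res = sum((y - a * x ** b) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot
```

For points lying exactly on y = 3x^(-0.2), the sketch recovers a ≈ 3, b ≈ -0.2 and R² ≈ 1; with real data, a negative b corresponds to morphemes getting shorter as words get longer.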
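[Examples from the original text, only partly recoverable from the extraction: the digraph ch, which denotes a single phoneme (hoch is transcribed as hox); the segments -ech, -ich, -i-, -ov-, n-, v- and vy-; the segmented forms Prah-a and Prah-y.]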
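Measuring morpheme length in phonemes rather than letters matters for Czech because the digraph ch spells a single phoneme /x/: hoch has four letters but three phonemes. The following is a minimal Python sketch, assuming that hyphens mark morpheme boundaries (as in Prah-a) and that every letter other than ch counts as one phoneme. That assumption is our simplification; other grapheme-phoneme questions (e.g. whether ou counts as one unit) are not handled. Both function names are hypothetical.

```python
def phoneme_length(morph: str) -> int:
    """Count phonemes in an orthographic Czech morpheme.

    Assumption: the digraph 'ch' is one phoneme /x/ and every other
    letter is one phoneme. Hyphens (segmentation marks) are ignored.
    """
    s = morph.lower().replace("-", "")
    count, i = 0, 0
    while i < len(s):
        i += 2 if s.startswith("ch", i) else 1  # 'ch' consumes two letters
        count += 1
    return count


def mean_morpheme_length(segmented_word: str) -> float:
    """Mean morpheme length in phonemes of a hyphen-segmented word."""
    morphs = [m for m in segmented_word.split("-") if m]
    if not morphs:
        raise ValueError("word contains no morphemes")
    return sum(phoneme_length(m) for m in morphs) / len(morphs)
```

Under these assumptions, `phoneme_length("hoch")` is 3 and `mean_morpheme_length("Prah-a")` is (4 + 1) / 2 = 2.5.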
[Tables: word length (WL, in morphemes) with mean morpheme length (MML, in phonemes) and frequency (fr) in several column groups, plus the fitted parameters a and b and the determination coefficient R²; the table values are not recoverable from the extraction.]
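The distinction between word types and word tokens changes the data the law is fitted to: word length (WL, in morphemes), mean morpheme length (MML) and frequency (fr) per WL class. Counted over types, each distinct word form contributes once; counted over tokens, frequent words contribute many times. The Python below is a minimal sketch under stated assumptions: the input is a hypothetical list of hyphen-segmented word forms, and length defaults to letters (`len`) so the example stays self-contained; a phoneme counter can be passed instead. The function name `mml_by_word_length` is hypothetical.

```python
from collections import Counter, defaultdict


def mml_by_word_length(tokens, use_types=True, length=len):
    """Return {WL: (MML, fr)} for hyphen-segmented word forms.

    use_types=True counts each distinct form once (word types);
    use_types=False counts every occurrence (word tokens).
    `length` measures one morpheme (letters by default).
    """
    items = list(Counter(tokens)) if use_types else list(tokens)
    total = defaultdict(float)  # sum of per-word mean morpheme lengths
    freq = Counter()            # number of words per WL class
    for word in items:
        morphs = [m for m in word.split("-") if m]
        if not morphs:
            continue  # skip empty or malformed entries
        wl = len(morphs)
        total[wl] += sum(length(m) for m in morphs) / wl
        freq[wl] += 1
    return {wl: (total[wl] / freq[wl], freq[wl]) for wl in sorted(freq)}
```

On the toy input ["a", "a", "a", "pes"], all one-morpheme words, types give MML = (1 + 3) / 2 = 2.0 with fr = 2, while tokens give (1 + 1 + 1 + 3) / 4 = 1.5 with fr = 4, so token frequencies alone can shift the MML values the curve is fitted to.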
Article
The paper deals with two important questions in linguistic research: 1) What do we actually model when we model language usage? and 2) What is an appropriate sample or ‘text unit’ for the analysis of language usage? First, we critically discuss several approaches to the analysis of language behaviour. Then we introduce the most important characteristics of both Zipf’s linguistic theory and synergetic linguistics, focusing in particular on the aspects of these theories that are connected to the above-mentioned questions. Specifically, we emphasize that one of the fundamental features of these theories is the assumption that there are linguistic laws which govern human language behaviour and which can best be detected by observing the language behaviour of an individual (in a particular context). Consequently, if the goal of the research is to examine laws of this kind, the individual text is used as the basic unit of analysis. Mixing texts can, in some cases, “conceal” the laws, as an example shows. We also offer another example which shows how the characteristics of the same law (in this case, the Menzerath-Altmann law) differ across texts. Finally, we emphasize that using individual texts in linguistic research is but one possible approach to analysis, i.e. we do not attempt to make it a dogma of linguistic research.
Article
According to the Menzerath-Altmann law, there is a relation between the size of the whole and the mean size of its parts. The validity of the law has been demonstrated for relations between several language units; e.g., the longer a word, the shorter the syllables it consists of. In this paper it is shown that the law also holds in the syntactic dependency structure of Czech. In particular, longer clauses tend to be composed of shorter phrases (the size of a phrase being measured by the number of words it consists of).
Conference Paper
Words that are used more frequently tend to be shorter. This statement is known as Zipf's law of abbreviation. Here we perform the most extensive investigation of the presence of the law to date. In a sample of 1262 texts in 986 different languages (about 13% of the world's language diversity), a negative correlation between word frequency and word length is found in all cases. In line with Zipf's original proposal, we argue that this universal trend is likely to derive from fundamental principles of information processing and transfer.
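A negative frequency-length correlation of this kind can be measured with a rank correlation over word types. The Python below is a minimal sketch using Kendall's tau-b (an O(n²) pairwise version that handles ties); this is one common choice of coefficient, and we do not claim it reproduces the exact procedure of the study. Length is measured in characters here, and both function names are hypothetical.

```python
import math
from collections import Counter


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation, computed over all pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (xs[i] > xs[j]) - (xs[i] < xs[j])  # sign of x difference
            dy = (ys[i] > ys[j]) - (ys[i] < ys[j])  # sign of y difference
            ties_x += dx == 0
            ties_y += dy == 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b undefined: one variable is constant")
    return (concordant - discordant) / denom


def abbreviation_tau(tokens):
    """Correlate frequency with length (in characters) over word types."""
    freq = Counter(tokens)
    types = list(freq)
    return kendall_tau_b([freq[w] for w in types], [len(w) for w in types])
```

For a toy text where "a" occurs five times, "to" three times and "word" once, every pair of types is discordant, so the sketch returns -1.0, the strongest possible form of the law of abbreviation.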
Article
Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf’s law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this “law” of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora since the late 1990s have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and noncore lexica. Here we present and defend an alternative hypothesis that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages in the Project Gutenberg eBooks collection, we find emphatic empirical support for the universality of our claim.
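Zipf's rank-frequency law states that the frequency f(r) of the word at rank r is roughly C/r, so the product r·f(r) should stay roughly constant. Below is a minimal Python sketch of building the rank-frequency list that such analyses start from; ties are broken alphabetically, an arbitrary choice made here only for deterministic ranks, and the helper name `rank_frequency` is hypothetical.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return [(rank, word, frequency)] by decreasing frequency.

    Ties in frequency are broken alphabetically (arbitrary choice).
    """
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(r, w, f) for r, (w, f) in enumerate(ordered, start=1)]
```

With toy counts 12, 6, 4 and 3 for four word types, r·f(r) = 12 at every rank, the idealized Zipfian case; real corpora only approximate this, and the abstract above argues that aggregating texts produces a second scaling regime.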
Article
The paper questions the use of Pearson's chi-square goodness-of-fit test for discrete models in linguistics. It is argued that stochastic independence, one of the necessary conditions for a correct application of the test, is not realistic for linguistic data. Several alternatives (computational and empirical approaches) are suggested, and their advantages and drawbacks are discussed.
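For reference, Pearson's statistic is X² = Σ (O_i - E_i)² / E_i, summed over the frequency classes of the fitted discrete model. The minimal Python sketch below only computes the statistic; the objection raised in the abstract concerns the p-value, whose usual chi-square approximation presupposes independent observations. The function name is hypothetical.

```python
def pearson_chi_square(observed, expected):
    """Pearson's X^2 = sum((O - E)^2 / E) over frequency classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have equal length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For example, observed frequencies [10, 20, 30] against expected [20, 20, 20] give X² = 5 + 0 + 5 = 10.0.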
Article
The law-like relation between word and syllable length, as part of the Menzerath law, has been corroborated empirically in many different languages. For the South Slavic languages, there are the studies by Gaji (1950) and Grzybek (1999) on Croatian and by Grzybek (2000) on Slovene. The aim of the present paper is, first, to provide empirical evidence of the Menzerath law for another South Slavic language, namely Serbian, distinguishing different text sorts in the analysis. Second, a linguistic interpretation of the parameters of the Menzerath law, which are usually derived iteratively, is offered. Furthermore, it will be shown that some parameters of the Menzerath law can be replaced by empirically obtainable quantitative features.
2 Word and syllable length: Theoretical background
The Menzerath law is one of the most important insights of quantitative linguistics; among recent works, cf. Altmann (1980), Altmann and Schwibbe (1989) and Hřebíček (1990). It contains several law-like statements about the interrelations between language constituents and their components, such as the relation between sound duration and syllable length, between word and syllable length, and between word and sentence length. In this paper, special attention is paid to the relation between word and syllable length. According to the Menzerath law, the mean syllable length (SyL), measured in graphemes, phonemes or sounds, is expected to decrease as word length (WoL), measured in syllables, increases. Mathematically, this can be expressed as $\mathrm{SyL} = a \cdot \mathrm{WoL}^{-b}$. The parameters $a$ and $b$ are usually derived iteratively by means of statistical software. Parameter $a$ determines the shift along the y-axis and can be understood as the "starting value" of the fitted curve, while parameter $b$ is responsible for the steepness and "speed" of the curve's decrease.
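Two consequences of $\mathrm{SyL} = a \cdot \mathrm{WoL}^{-b}$ make this reading of the parameters concrete: at one-syllable words the curve equals $a$, and the log-log slope is constant, so each doubling of word length multiplies the predicted syllable length by $2^{-b}$. A short derivation:

$$
\begin{aligned}
\mathrm{SyL}(1) &= a \cdot 1^{-b} = a, \\
\frac{\mathrm{SyL}(2\,\mathrm{WoL})}{\mathrm{SyL}(\mathrm{WoL})} &= \frac{a\,(2\,\mathrm{WoL})^{-b}}{a\,\mathrm{WoL}^{-b}} = 2^{-b}, \\
\frac{d \ln \mathrm{SyL}}{d \ln \mathrm{WoL}} &= \frac{d}{d \ln \mathrm{WoL}}\bigl(\ln a - b \ln \mathrm{WoL}\bigr) = -b.
\end{aligned}
$$

For instance, $b = 0.2$ gives $2^{-0.2} \approx 0.87$, i.e. a predicted decrease of about 13% per doubling of word length. Note that $a$ is the value of the fitted curve at $\mathrm{WoL} = 1$, which need not equal the observed mean syllable length of monosyllables.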
Before a more detailed analysis of the parameters of the Menzerath law can be carried out, the Serbian texts used and the behaviour of word and syllable length in Serbian first have to be discussed.