## About

44 publications, 3,885 reads, 210 citations

## Publications

Publications (44)

For every discrete probability distribution, there is one and only one partial summation which leaves the distribution unchanged. This invariance property is reconsidered for distributions with one parameter. We show that if we change the parameter value in the function which defines the summation, two families of distributions can be observed. The...

The paper focuses on dynamics of changes of several linguistic and text properties in diachronic development of Czech. Specifically, we analyze the proportion of identical word-forms (types), the average type length, text length, the proportion of hapax legomena, the moving average type-token ratio, and entropy. For the analysis, seven translations...
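As a rough illustration of one measure named in this abstract, the moving-average type-token ratio can be sketched as below. The function name `mattr`, the default window of 100 tokens, and the fallback to a plain type-token ratio for texts shorter than the window are assumptions made for this sketch, not details taken from the paper.

```python
from collections import Counter


def mattr(tokens, window=100):
    """Moving-average type-token ratio (sketch).

    Averages the type-token ratio over every window of `window`
    consecutive tokens. Window size and short-text fallback are
    illustrative assumptions.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(tokens)
    if n == 0:
        raise ValueError("tokens must not be empty")
    if n < window:
        # Assumed fallback: plain type-token ratio of the whole text.
        return len(set(tokens)) / n

    counts = Counter(tokens[:window])      # type counts in the current window
    total = len(counts) / window           # TTR of the first window
    for i in range(window, n):
        leaving = tokens[i - window]       # token that slides out of the window
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        counts[tokens[i]] += 1             # token that slides into the window
        total += len(counts) / window
    return total / (n - window + 1)        # mean over all window positions
```

Because every window has the same size, the value is comparable across texts of different lengths, which is presumably why the measure suits a comparison of translations.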

Rewriting books was a widespread phenomenon during the Baroque period of Czech literature. The manuscripts were not always "honest copies"; on the contrary, scribes often compiled several sources or added their own texts to the original. The famous book Golden Key of Heaven by Martin of Cochem is compared with a manuscript Key of Heaven from a...

The paper deals with two important questions in linguistic research: 1) What do we actually model when we model language usage? and 2) What is an appropriate sample or ‘text unit’ for the analysis of language usage? First, we critically discuss several approaches to the analysis of language behaviour. Then, we introduce the most importan...

The Zipf-Mandelbrot distribution serves as a mathematical model for ranked frequencies in many areas of scientific research, including linguistics. Many linguistic units, such as words or word n-grams, follow this distribution. However, in some cases, such as for graphemes in linguistics or species abundance and diversity data in biology, the pa...

In Tackling the Toolkit, we focus on the methodological innovations, challenges, obstacles and even shortcomings associated with applying quantitative methods to poetry specifically and poetics more broadly. Using tools including natural language processing, web ontologies, similarity detection devices and machine learning, our contributors explore...

The problem of iterated partial summations is solved for some discrete distributions defined on finite supports. The power method, usually used as a computational approach to the problem of finding matrix eigenvalues and eigenvectors, is in some cases an effective tool to prove the existence of the limit distribution, which is then expressed as a s...

Annual speeches of Czech and Czechoslovak presidents on the occasion of the end of the year are analyzed in this study. Several stylometric methods are used, namely, vocabulary richness expressed by the moving-average type–token ratio, an index of text activity, mean word length, mean verb distance, and cluster analysis of the most frequent words....

The paper analyzes the relationship between the word order positions of pronominal enclitics in the history of Czech. Specifically, we look at Wackernagel’s position and the contact position and try to decide whether these two positions compete, as is usually taken for granted, or whether there is a certain kind of cooperation between the...

Bivariate partial-sums discrete probability distributions are defined. The question of the existence of a limit distribution for iterated partial summations is solved for finite-support bivariate distributions which satisfy conditions under which the power method (known from matrix theory) can be used. An oscillating sequence of distributions, a ph...

The problem of iterated partial summations is solved for some discrete distributions defined on discrete supports. The power method, usually used as a computational approach to finding matrix eigenvalues and eigenvectors, is in some cases an effective tool to prove the existence of the limit distribution, which is then expressed as a solution of a...

The paper analyzes the relationship between the full valency of the predicate and the position of enclitics in the clause. Some of the oldest Old Czech prose texts were used for this analysis. We hypothesize that the higher the full valency of the predicate, the lower the probability of the occurrence of the enclitic af...

Lengths (in words) of projective and non-projective sentences from a Czech UD dependency treebank are compared. It is shown that non-projective sentences are significantly longer (the same result was also obtained in this study for Arabic, Polish, Russian, and Slovak). The hyperpascal distribution, which was suggested as the model for...

The study deals with the historical development of Czech (en)clitics (AuxP). Based on data from previous research (Kosek 2015a,b, 2017), it focuses on the development of one group of Czech (en)clitics: the preterite auxiliary forms. In the article, three hypotheses are formulated and then tested on the data gained from select...

The paper deals with the word order of reflexive sě, which is an item on the boundary between a pronominal form and a discrete morpheme. In the first part of the study, we investigate the (en)clitic status of sě in eight books of the oldest complete Czech Bible translation. The analysis focuses only on sě that is dependent on a finite verb: it iden...

In this part of the paper, the distribution of clause positions of the reflexive pronoun sě is analyzed statistically. Specifically, the impact of both stylistic factors and the length of the element in the initial position is investigated. The authors also discuss the possible influence of the word order of the Latin pretext (the Vulgate) on the...

The article presents a quantitative analysis of some syntactic dependency properties in Czech. A dependency frame is introduced as a linguistic unit and its characteristics are investigated. In particular, ranked frequencies of dependency frames are observed and modelled and a relationship between particular syntactic functions and the number of...

This paper first discusses the Menzerath-Altmann law in general and then shows that the law is valid in spoken Czech. In particular, the relation between word length (measured in the number of syllables) and the mean syllable length (measured in the number of phonemes) is investigated. In addition, we model the relation between the relative o...

According to the Menzerath-Altmann law, there is a relation between the size of the whole and the mean size of its parts. The validity of the law was demonstrated on relations between several language units, e.g., the longer a word, the shorter the syllables the word consists of. In this paper it is shown that the law is valid also in syntactic dep...
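For orientation, the law is commonly written in the form below. The symbols are notation assumed here rather than taken from the abstract: $y$ is the mean size of the parts, $x$ is the size of the whole measured in the number of parts, and $a$, $b$, $c$ are parameters estimated from data.

```latex
y = a \, x^{b} \, e^{-c x}
```

With $c = 0$ and $b < 0$ this reduces to the power law $y = a x^{b}$, which captures the example above: the longer the word, the shorter its syllables.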

The paper analyzes negation from the perspective of a general linguistic law known as the Menzerath-Altmann law (Altmann 1980; Crammer 2005). This law expresses the relationship between the length of a language construct (in our case the so-called segment, see below) and the mean length of the immediate units of that construct, the so-called constituents (in our...

A new type of mixture of discrete probability distributions is presented. A family of discrete averaged mixed distributions is introduced. Its subclass of averaged mixed logarithmic distributions is analyzed. Probabilistic characterizations and connections with other types of mixing are derived. We also show some examples of the analyzed distribut...

We present a review of the development and the state of the art of syntactic complex network analysis. Some characteristics of such networks and problems connected with their construction are mentioned. Relations between global network indicators and specific language properties are discussed. Applications of syntactic networks (language acquisitio...

The relationship between two important semantic properties of language (polysemy and synonymy) and one of the most fundamental syntactic network properties (the degree of a node) is observed. Based on the synergetic theory of language, it is hypothesized that a word which occurs in more syntactic contexts, i.e. it has a higher degree, should be mo...

Ord's graph is a simple graphical method for displaying frequency distributions of data or theoretical distributions in the two-dimensional plane. Its coordinates are proportions of the first three moments, either empirical or theoretical ones. A modification of Ord's graph based on proportions of indices of qualitative variation is present...

Menzerath's law, the tendency of Z, the mean size of the parts, to decrease as X, the number of parts, increases, is found in language, music and genomes. Recently, it has been argued that the presence of the law in genomes is an inevitable consequence of the fact that Z = Y/X, which would imply that Z scales with X as Z ~ 1/X. That scaling is a ver...

At the very beginning I want to emphasize that this comment is meant neither as a criticism of complex networks in general, nor of the work by Cong and Liu [4] in particular. I consider complex networks a useful tool in language modelling, and the presented review is, in fact, more than a review - not only does it sum up a considerable volume of previo...

The paper questions the use of the Pearson chi-square goodness-of-fit test for discrete models in linguistics. It is argued that the stochastic independence, one of necessary conditions for a correct application of the test, is not realistic for linguistic data. Several alternative possibilities (computational and empirical approaches) are suggeste...

Partial-sums discrete probability distributions occur in the description of many stochastic models. They have also been used as a tool for creating new distributions, or as a link between known distributions. It is shown in this paper that every discrete distribution with only non-zero probabilities is a partial-sums distribution, and, moreover, that it...

The syntax of natural language has been the focus of linguistics for decades. Complex network theory, one of the new research tools, opens new perspectives on the syntactic properties of language. Despite numerous partial achievements, some fundamental problems remain unsolved. Specifically, although statistical properties typical for complex networ...

The aim of this article is to find fixed points and regularities in musical texts, set up statistical tests for their comparison and observe their development. The analysis is based on rank-frequency distributions of pitches. The following indicators are described: the h-point and its angle, the a-indicator, the H-point and the H-coverage having an...

A new discrete distribution which is a generalization of the right truncated geometric distribution is presented. Its basic properties are studied. The distribution is applied to modelling rank frequencies of graphemes.

A generalization of the STER summation is presented. Relations between probability generating functions and moments of the generating and generated distributions are analyzed. It is shown that the Yule distribution is invariant with respect to the considered summation.

A generalization of the partial summation given by N. L. Johnson, S. Kotz and A. W. Kemp [“Univariate discrete distributions” (1992; Zbl 0773.62007), p. 448] and by G. Wimmer and G. Altmann [Acta Univ. Palacki. Olomuc., Fac. Rerum Nat., Math. 39, 215–247 (2000; Zbl 1041.62009)] is presented. Relations between probability generating functions and mo...

## Projects

Project (1)

The project focuses on the development of the word order of the Czech pronominal (en)clitics mi "to me", si "REFLdat", ti "to you"; ho "him", mu "to him", sě "REFLacc", tě "you". The analysis is based on representative sample passages from Old and Middle Czech Bible translations (created in the 14th to 18th centuries). The word order of pronominal (en)clitics is investigated 1. in finite verb phrases and 2. in infinitive, participle, (deverbative) adjective and (deverbative) substantive phrases. The research deals especially with the competition between the second position and the contact (verb-adjacent) position of the (en)clitics, with the (en)clitic cluster, with the change of the originally orthotonic pronominal forms ho, mu, sě, tě into “constant” (en)clitics, and with the proclitization of pronominal (en)clitics. The project methodology follows the tradition of Czech dependency and functional syntax. Because the analysis of the historical development of (en)clitics also draws on frequency characteristics of the observed phenomena, methods of quantitative linguistics are used for further interpretation of the data.