Radek Čech

University of Ostrava · Department of Czech Language

PhD

66
Publications
16,027
Reads
268
Citations
Introduction
The Menzerath-Altmann Law, development of lexical properties, syntactic dependency

Publications (66)
Chapter
This study is devoted to the word order of the short pronominal forms mi, sě, tě ‘me.dat, refl.acc, you.acc’ dependent on a finite verb in the 1st edition of the Old Czech Bible. The forms studied, permanent enclitics in modern Czech, are numerous enough for analysis (unlike other pronominal enclitic forms, i.e., si, ti, ho...
Article
Full-text available
It is shown that the mean morpheme length (measured in phonemes) decreases with the increasing length of word types (in morphemes) in Czech texts, i.e. these language units behave according to the Menzerath-Altmann law. The law is not valid in general for word tokens. Some hints towards an interpretation of parameters are presented.
Article
The paper focuses on the dynamics of changes in several linguistic and text properties in the diachronic development of Czech. Specifically, we analyze the proportion of identical word-forms (types), the average type length, text length, the proportion of hapax legomena, the moving average type-token ratio, and entropy. For the analysis, seven translations...
Article
Full-text available
Rewriting books was a widespread phenomenon during the Baroque period of Czech literature. The manuscripts were not always “honest copies”; on the contrary, scribes often compiled several sources or added their own texts to the original. The famous book Golden Key of Heaven by Martin of Cochem is compared with a manuscript Key of Heaven from a...
Article
Full-text available
A review of the book Veselovská, K. (2017). Sentiment Analysis in Czech. Praha: Ústav formální a aplikované lingvistiky.
Article
Full-text available
The paper deals with two important questions in linguistic research: 1) What do we actually model when we model language usage? and 2) What is an appropriate sample or ‘text unit’ for the analysis of language usage? In the beginning, we critically discuss several approaches to the analysis of language behaviour. Then, we introduce the most importan...
Article
Full-text available
The article deals with the word order of the Czech reflexive pronoun/morpheme se dependent on a finite verb in eight selected books of the first edition of the Kralice Bible (1579–1594). The study is a follow-up to the previous research on the word order of the pronominal enclitics mi, ho, mu in the Kralice Bible and of the reflexive pronoun/morph...
Article
Full-text available
The study deals with the word order of the Czech pronominal enclitics mi, si, ti, ho, mu dependent on a finite verb in eight selected books of the first edition of the Kralice Bible (1579–1594). The results are tested using the tools of quantitative linguistics. First, only the forms mi, ho and mu are documented in the analyzed Biblical books, whil...
Book
Full-text available
In Tackling the Toolkit, we focus on the methodological innovations, challenges, obstacles and even shortcomings associated with applying quantitative methods to poetry specifically and poetics more broadly. Using tools including natural language processing, web ontologies, similarity detection devices and machine learning, our contributors explore...
Article
Full-text available
This study describes the word order of the pronominal enclitics mi, ti, si, ho, mu in selected books of the Old and New Testament of the Kralice Bible. It focuses mainly on the distribution of the individual forms across two competing positions, i.e., the post-initial position and the contact position. The paper focuses on the non-post-initial positions of enclitics in the Kralice Bible, which...
Article
Annual speeches of Czech and Czechoslovak presidents on the occasion of the end of the year are analyzed in this study. Several stylometric methods are used, namely, vocabulary richness expressed by the moving-average type–token ratio, an index of text activity, mean word length, mean verb distance, and cluster analysis of the most frequent words....
Article
Full-text available
The paper focuses on analyzing the relationship among word order positions of pronominal enclitics in the history of Czech. Specifically, we look at Wackernagel’s position and the contact position and try to decide whether these two positions compete, as is usually taken for granted, or whether there is a certain kind of cooperation between the...
Article
The paper presents the results of an analysis of the lemma mateřství (motherhood). The authors applied methods of corpus linguistics and discourse analysis (the corpus-assisted discourse studies approach) in order to survey representations of the lemma in Czech journalistic texts published from 2010 to 2014, sorted the results into discourse categories on t...
Article
Full-text available
This study deals with the recently proposed concept of so-called Context Specificity of Lemma (CSL). CSL is based on the word embedding technique called Word2vec which enables measuring lexical context similarity between lemmas. Specifically, a recently proposed method Closest Context Specificity (CCS) is applied to a diachronic analysis of Czech t...
Article
Full-text available
The study examines selected samples of Czech 1830s poetry production through the prism of a quantitative conception of euphony. Stemming from Jan Mukařovský’s reflections on the topic, it tries to strengthen the notion through the creation of exact figures with intersubjective validity. To this end, the count of this property devised by Gabriel Alt...
Conference Paper
Full-text available
The paper is focused on the analysis of the relationship between the full valency of the predicate and the position of enclitics in the clause. For this analysis, some of the oldest Old Czech prose texts were used. We set up the hypothesis that the higher the full valency of the predicate, the lower the probability of the occurrence of the enclitic af...
Conference Paper
Full-text available
Lengths (in words) of projective and non-projective sentences from a Czech UD dependency treebank are compared. It is shown that non-projective sentences are significantly longer (in addition, the same result was obtained in this study also for Arabic, Polish, Russian, and Slovak). The hyperpascal distribution, which was suggested as the model for...
Article
Full-text available
The paper is focused on the short pronominal forms that have the status of so-called stálá enklitika (‘permanent enclitics’ or enclitica tantum) in Modern Czech: mi ‘me’, ti ‘to you’, si ‘to myself / to yourself etc.’, sě (> se) ‘myself / yourself etc.’, tě ‘you’, ho ‘him’, mu ‘to him’. The analysis is based on the material gained from the selected boo...
Chapter
Full-text available
The article presents a quantitative analysis of selected books of the Bible svatováclavská (St. Wenceslas Bible, 1677–1715) and their commentaries collected by Bible translators. A cluster analysis based on the one hundred most frequent words is used to compare the texts; with this method, it is possible to differentiate genres. Mor...
Chapter
Full-text available
The aim of this paper is to analyze the word order of the short dative pronominal forms mi “to me”, ti “to you”, si “REFLdat” that are dependent on a finite verbal form in selected parts of the oldest Czech Bible translation, e.g. in the Olomouc Bible (Olomoucká bible). It focuses on those pronominal forms that had the status of “permanent enclitics (...
Article
The presented study deals with the historical development of Czech (en)clitics (AuxP). Based on the data from previous research (Kosek 2015a,b, 2017), it focuses on the development of one group of Czech (en)clitics: the preterite auxiliary forms. In the article, three hypotheses are formulated and then tested on the data gained from select...
Article
This is a pilot study of the usability of the Context Specificity measure for stylometric purposes. Specifically, the word embedding Word2vec approach based on measuring lexical context similarity between lemmas is applied to the analysis of texts that belong to different styles. Three types of Czech texts are investigated: fiction, non-fiction, and journa...
Article
Full-text available
The paper deals with the word order of reflexive sě, which is an item on the boundary between a pronominal form and a discrete morpheme. In the first part of the study, we investigate the (en)clitic status of sě in eight books of the oldest complete Czech Bible translation. The analysis focuses only on sě that is dependent on a finite verb: it iden...
Article
Full-text available
In this part of the paper, the distribution of clause positions of the reflexive pronoun sě is analyzed statistically. Specifically, the impact of both stylistic factors and the length of the element in the initial position is investigated. The authors also discuss the possible influence of the word order of the Latin source text (the Vulgate) on the...
Chapter
Full-text available
The article presents a quantitative analysis of some syntactic dependency properties in Czech. A dependency frame is introduced as a linguistic unit and its characteristics are investigated. In particular, ranked frequencies of dependency frames are observed and modelled and a relationship between particular syntactic functions and the number of...
Article
Full-text available
The study deals with the application of neural networks in linguistic research on word semantics. A new method of measuring the context specificity of a lemma, based on the Word2vec word embedding technique, is proposed in the first part of the article. The method is then illustrated by an analysis of Czech political discourse in the second part....
Article
Full-text available
The presented study deals with the historical development of Czech (en)clitics (AuxP). Based on the data from previous research (Kosek, 2015a,b, 2017), it focuses on the development of one group of Czech (en)clitics: the preterite auxiliary forms. In the article, three hypotheses are formulated and then tested on the data gained from selecte...
Article
Full-text available
Linguistics Without Langue: A reply to Martin Beneš. The present paper is a reply to the article Should We Abandon the Dichotomous Approach to Language? by Martin Beneš (2015). In the paper, I try to show that Beneš’s criticism is based on a misunderstanding of the theoretical reasons that lead me to reject the langue-parole dichotomy. Next, I re...
Chapter
This study proposes a method for measuring the morphological richness of text. The method enables us to characterize the morphological complexity of a text (or a corpus). It is based on a computation of the difference between two measurements — the vocabulary richness of lemmas and the vocabulary richness of word forms. The greater the difference,...
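The difference-based idea in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which vocabulary richness index is used, so the moving-average type-token ratio (MATTR) is swapped in here, and the function names, the window size, and the sign convention (word forms minus lemmas) are all hypothetical.

```python
def mattr(items, window=100):
    """Moving-average type-token ratio: mean share of distinct items per window."""
    if not items:
        raise ValueError("items must not be empty")
    window = min(window, len(items))  # shrink the window for very short texts
    ratios = [
        len(set(items[i:i + window])) / window
        for i in range(len(items) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def morphological_richness(word_forms, lemmas, window=100):
    """Richness of word forms minus richness of lemmas (assumed sign convention)."""
    return mattr(word_forms, window) - mattr(lemmas, window)


# Toy input: four inflected forms that all share a single lemma.
forms = ["dog", "dogs", "dog's", "dogs"]
lemmas = ["dog", "dog", "dog", "dog"]
print(morphological_richness(forms, lemmas, window=4))
```

In this toy input the word forms vary while the lemma stays constant, so the difference is positive; a text with little inflection would give a value near zero.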
Article
Full-text available
According to the Menzerath-Altmann law, there is a relation between the size of the whole and the mean size of its parts. The validity of the law was demonstrated on relations between several language units, e.g., the longer a word, the shorter the syllables the word consists of. In this paper it is shown that the law is valid also in syntactic dep...
Chapter
Full-text available
A variously understood concept of measuring the degree of concentration of a text with respect to the vocabulary used, i.e., the relation of the number of different words (types) to all words (↗tokens) in the text. Also called richness of vocabulary or range of lexicon (vocabulary richness // lexical richness). There are many different ways of computing s.b.t.; as a rule, however, they are various modifications of the basic index...
Chapter
Full-text available
An expression of the degree to which a text focuses on its central theme or themes. The principle of measuring t.k.t. is based on the properties of the rank-frequency distribution of the chosen units (usually word forms or lemmas) and the so-called h-point, which is defined as the point where the rank of a word equals its frequency, i.e., r = f(r), where r is the rank of the unit and f(r) is its frequency in the...
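The h-point defined in the entry above, the rank r at which r = f(r), can be located with a short sketch like the one below. It rests on a simplifying assumption that is not taken from the entry: when no rank satisfies r = f(r) exactly, the function returns the largest rank whose frequency is still at least as high as the rank, rather than interpolating between neighbouring ranks. The function name is hypothetical.

```python
from collections import Counter


def h_point(tokens):
    """Return the rank r where r == f(r) in the rank-frequency distribution.

    Assumption: if no exact match exists, fall back to the largest rank r
    with f(r) >= r (no interpolation).
    """
    # Frequencies sorted from most to least frequent; rank 1 is the top item.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    h = 0
    for rank, freq in enumerate(freqs, start=1):
        if freq >= rank:
            h = rank
        else:
            break
    return h


# Frequencies here are a=5, b=3, c=3, d=1, so rank 3 has frequency 3.
print(h_point("a a a a a b b b c c c d".split()))
```

Units ranked above the h-point are typically the high-frequency function words; thematic concentration measures then look at which content words appear in that region.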
Article
Full-text available
1. INTRODUCTION. In KGA 12/2015, František Štícha published the article Perspektivy korpusové lingvistiky: deskripce, nebo explanace? (Štícha, 2015), in which he responds to our two texts from the corpus double issue of Naše řeč. One was the text Jen popis s čísly? Perspektivy korpusové lingvistiky (Čech, 2014), the other the text Korpus a reprezentativnost (Chromý, 2...
Chapter
The present study mainly scrutinizes the question of motif-like characters of verb valency. Do they behave as other linguistic units? Tests are performed using the Czech text Šlépej (Footprint) written by Karel Čapek and the Hungarian translation of G. Orwell's 1984. The rank-frequency, the spectrum of motifs and the relation between length and fre...
Presentation
Full-text available
The paper analyzes negation from the perspective of a general linguistic law known as the Menzerath-Altmann law (Altmann 1980; Crammer 2005). This law expresses the relationship between the length of a language construct (in our case the so-called segment, see below) and the mean length of the immediate units of that construct, the so-called constituents (in our...
Presentation
Full-text available
Any text is a sequence of various units such as phonemes, words, sentences, etc. These units can be used for a measurement of many properties of text, e.g. vocabulary richness, thematic concentration, entropy, descriptivity. These text features are usually expressed by one resulting value, mostly in the interval <0; 1> (cf. Popescu et al. 2009; Tuz...
Book
The purpose of this book is to present a systematic analysis of a method to measure a thematic text property, termed thematic concentration, and to introduce ways of applying this method in textology. The method is based on frequency characteristics of a text. Select properties of rank frequency distribution of words are used to detect thematic wor...
Article
Full-text available
We present a review of the development and the state of the art of syntactic complex network analysis. Some characteristics of such networks and problems connected with their construction are mentioned. Relations between global network indicators and specific language properties are discussed. Applications of syntactic networks (language acquisitio...
Article
Full-text available
The impact of text length very often biases results of stylometric indices which are based on rank-frequency distribution (e.g. type-token ratio, repeat rate, entropy). The aim of the article is to observe the relation between text size and thematic concentration indicators (TC, STC). The corpus consists of 1471 English texts of various genres. The...
Article
Full-text available
The research aims to investigate several features of inaugural addresses of the presidents of the United States. The goal of the paper is to observe the presidential speeches from a viewpoint of stylometry indices and to discover whether political and historical circumstances (wars, financial crisis, ideology, etc.) influence the style of inaugural...
Chapter
Full-text available
The contribution investigates a relation between two stylometric features with promising results in text classification: thematic concentration and vocabulary richness. Namely secondary thematic concentration (STC), moving average type-token ratio (MATTR), and repeat rate (RRMC) are analysed. The main aim is to test the hypothesis that vocabulary r...
Article
Full-text available
The relationship between two important semantic properties (polysemy and synonymy) of language and one of the most fundamental syntactic network properties (a degree of the node) is observed. Based on the synergetic theory of language, it is hypothesized that a word which occurs in more syntactic contexts, i.e. it has a higher degree, should be mo...
Article
Full-text available
The aim of the article is to evaluate and address the limits of an existing approach to the analysis of the thematic concentration of text. To overcome these limits, the article proposes and applies both a modification of the measurement of thematic concentration – known as secondary thematic concentration and proportional thematic concentration –...
Article
Full-text available
The relationship between ideology and language is analyzed by using quantitative linguistic methods to measure the thematic concentration of texts. The assumption is that totalitarianism and democracy represent radically different types of ideology and that this difference will be reflected in different levels of thematic concentration in texts of...
Book
Full-text available
Quantitative experimental methods have been increasingly used in the humanities in recent years. We can hardly imagine the disciplines of social science, such as psychology, sociology or economics, without a quantitative approach. On the other hand, the majority of linguists, historians and especially literary critics are still refusing to use quan...
Article
Full-text available
The aim of the article is to introduce the measurement of “activity” and “descriptivity” of a text based on the proportions of adjectives and verbs. Functions based on interaction of forces, tests for significance of a property and for comparison of two texts are introduced. The methods are applied to the poetry of the Slovak poetess Eva Bachletová...
Article
Full-text available
This study analyzes the thematic characteristics of journalistic texts written by the Czech Catholic intellectual and journalist Ladislav Jehlička (1916-1996) and the writer, journalist and representative of pre-war democracy Karel Čapek (1890-1938). The main aim of the article is to illustrate how Jehlička's pre-war journalism does not correspond...
Article
The article is focused on the analysis of the frequency structure of texts. Specifically, a geometric characterization of the rank-frequency sequence, which is determined by relationships among the highest word frequency, a number of particular word forms in a text, and the so-called h-point, is analysed. We observe that the geometric characterizat...
Article
The article is concerned with the critical analysis of some aspects of the methodology and language theory on which Mluvnice současné češtiny [Grammar of Contemporary Czech] (Cvrček et al. 2010) is based. First, the statement by the authors of Mluvnice současné češtiny concerning the character of the description of language and its relationship to...
Article
The syntax of natural language has been a focus of linguistics for decades. Complex network theory, one of the new research tools, opens new perspectives on the syntactic properties of language. Despite numerous partial achievements, some fundamental problems remain unsolved. Specifically, although statistical properties typical for complex networ...
Article
Full-text available
The aim of the article is to introduce a new approach to verb valency analysis. This approach – full valency – observes properties of verbs which occur solely in actual language usage. The term “full valency” means that all arguments, without distinguishing complements (obligatory arguments governed by the verb) and adjuncts (optional arguments dir...
Article
The aim of the article is to test empirically predictions formulated in the Transitivity Hypothesis framework. Methodological problems of the original approach are discussed and some solutions are offered. For the testing of the hypotheses two corpora of Czech were used (Prague Spoken Corpus and Prague Dependency Treebank). The results question bot...
Article
This article is a reaction to M. Komárek's essay Communication versus system? (1999) and is primarily concerned with the critical analysis of the dichotomic concept of natural language. In particular, the absence of empirical evidence for a language system (langue) is pointed out, which creates serious issues for the entire structuralist approach....
Article
Full-text available
This paper deals with structuralism, its roots, general principles and limitations. It follows the evolution of the main structuralist notions (structure, system) in Schleiermacher's and Humboldt's theories of language and tries to explain the causes of the Saussurean langue-parole dichotomy. It argues that the ambiguous Saussurean concept of the s...
Article
Full-text available
It is probably no coincidence that J. Monod refers in his essay to the linguist N. Chomsky, albeit only briefly. The brevity of the reference does not necessarily mean that the parallels between the two scholars are not deeper than they might seem at first glance. In linguistics, N. Chomsky's name is inseparably linked with the origin and development of generative grammar, which during the 1960s...
Article
This paper deals with structuralism, its roots, general principles and limitations. It follows the evolution of the main structuralist notions (structure, system) in Schleiermacher’s and Humboldt’s theories of language and tries to explain the causes of the Saussurean langue-parole dichotomy. It argues that the ambiguous Saussurean concept of the s...
Article
Full-text available
This article is a reaction to M. Komárek’s essay Communication versus system? (1999) and is primarily concerned with the critical analysis of the dichotomic concept of natural language. In particular, the absence of empirical evidence for a language system (langue) is pointed out, which creates serious issues for the entire structuralist approach....

Projects (3)
Project
The project focuses on the development of the word order of the Czech pronominal (en)clitics mi "to me", si "REFLdat", ti "to you", ho "him", mu "to him", sě "REFLacc", tě "you". The analysis is based on representative sample parts of Old and Middle Czech Bibles (created in the 14th‒18th centuries). The word order of pronominal (en)clitics is investigated 1. in the finite verb phrase and 2. in infinitive, participle, (deverbative) adjective and (deverbative) substantive phrases. The research deals especially with the competition between the second position and the contact (verb-adjacent) position of the (en)clitics, with the (en)clitic cluster, with the change of the originally orthotonic pronominal forms ho, mu, sě, tě into “constant” (en)clitics, and with the proclitization of pronominal (en)clitics. The project methodology follows the tradition of Czech dependency and functional syntax. As the analysis of the historical development of (en)clitics is also based on frequency characteristics of the observed phenomena, methods of quantitative linguistics are used for further interpretation of the data.
Archived project
Quantitative Index Text Analyzer (QUITA) covers the most common indicators, especially those connected with the frequency structure of a text. In addition to computing the indicators, QUITA also provides statistical testing and graphical visualization of the obtained data. QUITA is a versatile tool designed for researchers from various disciplines (linguistics, literary criticism, history, sociology, psychology, politics, biology, etc.). The program offers basic text processing functions, such as creating word lists, lemmatizing texts, or creating n-grams. It also provides more advanced tools, such as a random text creator or a binary file translator. However, the core of the software is the computation of indicators. Although the authors focused mainly on indicators connected to the frequency structure of a text (e.g., h-point, entropy, repeat rate, adjusted modulus, Gini’s coefficient, lambda), there are also several other characteristics, such as thematic concentration, activity & descriptivity, or writer’s view. More information about the software can be found in the book QUITA – Quantitative Index Text Analyzer and in the diploma thesis Kvantitativně lingvistický software.
Project
Context specificity of lemma (CSL) is a new method, based on the Word2Vec technique, that measures the similarity of the contexts of lemmas. More specifically, each lemma is represented by a vector. The size and orientation of the vector express the position of the lemma in a multi-dimensional contextual space. Thus, it is possible to measure lexical context similarities among lemmas. If two lemmas appeared in exactly the same contexts in the corpus, their vectors would be identical and the similarity of the two lemmas would therefore be 1. The goal of the project is to develop this method and to test its applicability in various linguistic fields: semantic change from a diachronic viewpoint; discourse analysis; stylometry (genre analysis); and parts-of-speech analysis.
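The claim that identical vectors yield a similarity of 1 can be illustrated with a minimal sketch. It assumes the similarity in question is cosine similarity, the measure commonly used with Word2Vec vectors; the project description does not name the measure, and the function name and the toy three-dimensional vectors below are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical lemma vectors; real Word2Vec vectors have far more dimensions.
lemma_a = [0.2, 0.7, 0.1]
lemma_b = [0.2, 0.7, 0.1]   # same contexts as lemma_a
lemma_c = [0.9, 0.0, 0.3]   # different contexts

print(cosine_similarity(lemma_a, lemma_b))  # close to 1 for identical vectors
print(cosine_similarity(lemma_a, lemma_c))
```

Note that cosine similarity depends only on the orientation of the vectors, not on their size, so a measure that also uses vector length would need a different formula.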