Václav Cvrček

Charles University in Prague | CUNI · Institute of the Czech National Corpus

About

36
Publications
5,711
Reads
191
Citations
Citations since 2017
18 Research Items
164 Citations
Chart: citations per year, 2017 to 2023
Introduction


Publications (36)
Article
Full-text available
Vladimir V. Putin has banned the use of the word ‘war’ to refer to the conflict in Ukraine. While one’s choice of words is deliberate and conscious, grammatical categories are obligatory and pivotal to signaling the roles notions have in a discourse. Over- and underrepresentation of grammatical cases can be identified by Keymorph Analysis, which me...
Chapter
Full-text available
This paper focuses on lexical intertextuality, namely the following three intertextual properties: 1) the number of word-types shared by two texts; 2) the number of word-types shared by all texts in a collection; 3) the number of word-types shared by equal-sized segments of a collection. We have observed that the relation between the number of texts...
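The first two properties reduce to set intersections over word types, and the third can be approximated by cutting the collection into equal-sized token segments. The sketch below is a minimal illustration under stated assumptions, not the paper's procedure: whitespace tokenization, lowercasing, the segmentation scheme, and the helper names `word_types`, `shared_by_pair`, `shared_by_all` and `shared_by_segments` are all hypothetical.

```python
from functools import reduce


def word_types(text: str) -> set[str]:
    """Distinct lowercased tokens; naive whitespace tokenization (an assumption)."""
    return set(text.lower().split())


def shared_by_pair(a: str, b: str) -> int:
    """Property 1: number of word types shared by two texts."""
    return len(word_types(a) & word_types(b))


def shared_by_all(texts: list[str]) -> int:
    """Property 2: number of word types shared by every text in a collection."""
    return len(reduce(set.intersection, (word_types(t) for t in texts)))


def shared_by_segments(texts: list[str], size: int) -> int:
    """Property 3 (approximation): types shared by all full segments of `size`
    tokens cut from the concatenated collection; a trailing partial segment is dropped."""
    tokens = [tok for t in texts for tok in t.lower().split()]
    segments = [set(tokens[i:i + size]) for i in range(0, len(tokens) - size + 1, size)]
    if not segments:
        return 0
    return len(reduce(set.intersection, segments))


texts = ["the cat sat", "the dog sat", "a cat and the dog"]
print(shared_by_pair(texts[0], texts[1]))  # 2 ("the", "sat")
print(shared_by_all(texts))                # 1 ("the")
```

Segmenting by tokens rather than by texts removes the effect of unequal text lengths, which is presumably why the third property uses equal-sized segments.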
Article
This paper describes how corpus-assisted discourse analysis based on keyword identification and interpretation can benefit from employing Market Basket Analysis (MBA) after keyword extraction. MBA is a data mining technique used originally in marketing that can reveal consistent associations between items in a shopping cart, but also between keywor...
Preprint
Full-text available
This paper describes how corpus-assisted discourse analysis based on keyword (KW) identification and interpretation can benefit from employing Market basket analysis (MBA) after KW extraction. MBA is a data mining technique used originally in marketing that can reveal consistent associations between items in a shopping cart, but also between keywor...
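In MBA, each text's keyword list can be treated as a basket, and associations between keywords are scored with the standard measures support, confidence and lift. The sketch below is a simplified, pairwise-only illustration assuming keyword sets have already been extracted per text; it is not the paper's implementation, and a full analysis would typically use an algorithm such as Apriori to cover larger itemsets. The function name `pairwise_rules`, the threshold and the keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(baskets: list[set[str]], min_support: float = 0.5) -> list[dict]:
    """Score rules X -> Y for keyword pairs co-occurring in at least
    `min_support` of the baskets (texts)."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of texts containing both keywords
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # estimate of P(y | x)
            lift = confidence / (item_counts[y] / n)  # > 1: co-occur more than chance
            rules.append({"if": x, "then": y, "support": support,
                          "confidence": confidence, "lift": lift})
    return rules


# Hypothetical keyword sets, one per text.
baskets = [
    {"migration", "security"},
    {"migration", "security", "eu"},
    {"eu", "economy"},
    {"migration", "eu"},
]
for rule in pairwise_rules(baskets):
    print(rule)
```

Lift is usually the most informative of the three for discourse work, since it discounts pairs that co-occur only because both keywords are frequent.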
Article
N-gram analysis (popularized e.g. by Biber et al., 1999) has become a widely used method for the identification of recurrent language patterns. Although the extraction of n-grams from a corpus may seem straightforward, it proves to be very challenging when applied cross-linguistically (cf. e.g. Ebeling and Ebeling, 2013; Granger and Lefer, 2013; Če...
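The "straightforward" baseline the abstract refers to can be sketched as sliding a window of n tokens over the text and keeping sequences above a frequency threshold. This is a minimal illustration only: it counts surface word forms, and the function names and the `min_freq` threshold are hypothetical. Choices such as counting forms versus lemmas are exactly the kind of decision that makes cross-linguistic comparison harder.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token sequences, in text order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurrent_ngrams(tokens: list[str], n: int, min_freq: int = 2) -> dict[tuple[str, ...], int]:
    """n-grams occurring at least `min_freq` times."""
    counts = Counter(ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_freq}


tokens = "in the middle of the night in the middle of nowhere".split()
print(recurrent_ngrams(tokens, 3))
# {('in', 'the', 'middle'): 2, ('the', 'middle', 'of'): 2}
```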
Article
This paper investigates the contribution of author/idiolect vs. register/type-of-text (as the most salient factors influencing the final shape of a text) towards explaining the variation observed in Czech texts. Since it is almost impossible to explore the effect of these factors on authentic data, we used elicited letters collected in a fully crosse...
Chapter
Full-text available
Our paper introduces Morfio, a corpus-based online tool for the study of derivation and morphological productivity. Morfio was originally created for Czech; in this paper, however, we would like to introduce its Latvian implementation. Apart from the tool description, we want to showcase its possibilities for describing Latvian morphology by way o...
Article
Full-text available
Using a multi-dimensional (MD) analysis of register variability, the study compares two corpora of Czech: Koditex, a “traditional” corpus carefully designed using various sources with rich metadata, and Araneum Bohemicum Maximum, a web-crawled corpus with an opportunistic composition representative of the “searchable” web. Both types of corpora are...
Book
When looking at texts and their diversity, we can adopt two perspectives: we either focus on the external characteristics of a text, such as its cover, communication medium, graphic design, etc., or we examine the linguistic means used, e.g. how often questions occur in the text or whether it uses the conditional mood. Here we will be concerned mainly with the latter, intern...
Article
When compiling a list of headwords, every lexicographer comes across words with an unattested representative dictionary form in the data. This study focuses on how to distinguish between the cases when this form is missing due to a lack of data and when there are some systemic or linguistic reasons. We have formulated lexicographic recommendations...
Article
The article summarizes the theoretical foundations and results of a corpus-driven study of register variability in contemporary Czech. The descriptive framework is based on the methodology of multidimensional analysis, as previously applied to various other languages (see Biber 1995). The starting point is a quantitative analysis of a custom-bui...
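In Biber-style multidimensional analysis, once factor analysis has grouped co-occurring features into dimensions, each text receives a dimension score: the standardized (z-scored) frequencies of features loading positively on a dimension are summed and those of negatively loading features are subtracted. The sketch below covers only that scoring step, assuming per-text normalized rates are already available and the feature-to-dimension assignment is known; the factor analysis itself is omitted, the use of the population standard deviation is an assumption, and the feature names and rates are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rates: dict[str, list[float]]) -> dict[str, list[float]]:
    """Convert each feature's per-text rates to z-scores across the corpus."""
    z = {}
    for feature, values in rates.items():
        m, s = mean(values), pstdev(values)
        z[feature] = [(v - m) / s if s else 0.0 for v in values]
    return z


def dimension_scores(rates: dict[str, list[float]],
                     positive: list[str], negative: list[str]) -> list[float]:
    """Per text: sum of z-scores of positively loading features minus
    the sum for negatively loading features."""
    z = standardize(rates)
    n_texts = len(next(iter(rates.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]


# Hypothetical rates per 1,000 words for three texts.
rates = {
    "first_person_pronouns": [40.0, 25.0, 5.0],
    "questions": [12.0, 6.0, 1.0],
    "nouns": [150.0, 200.0, 260.0],
}
print(dimension_scores(rates, positive=["first_person_pronouns", "questions"],
                       negative=["nouns"]))
```

Standardizing first keeps very frequent features (such as nouns) from dominating the score simply because of their raw magnitude.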
Article
This paper is part of a larger research effort on language variability aimed at uncovering the relations between extra- and intratextual characteristics of Czech texts by means of multi-dimensional analysis. The palpable lack of prior art on quantitative register analysis of Czech led to several distinctive methodological decisions, concerning name...
Chapter
This paper is an attempt to unpack the “alternativeness” of Sputnik Czech Republic, an online news-opinion portal that targets the Czech-speaking audience. The overarching principle used in the analysis is prominence, a concept used in the corpus linguistic method of keyword analysis. The use of Multi-level Discourse Prominence Analysis (MLDPA), wh...
Article
This paper introduces keymorph analysis (KMA), a new extension of the discourse-probing technique of keyword analysis (KWA). While KWA focuses on lexicon and provides a key predominantly to textual topics and their semantic associations, KMA focuses on morphosyntactic features and captures more general characteristics of texts as wholes. Speeches b...
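KMA applies keyword-style keyness testing to morphological values (for instance, grammatical cases) instead of words. The abstract does not state which statistic is used, so the sketch below assumes Dunning's log-likelihood (G2), a common keyness measure, and compares each value's share among all category-bearing forms in a target corpus with its share in a reference corpus. The function names and the case counts are hypothetical.

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G2 for count `a` in a target of size `c`
    versus count `b` in a reference of size `d`."""
    e1 = c * (a + b) / (c + d)  # expected count in target
    e2 = d * (a + b) / (c + d)  # expected count in reference
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keymorphs(target: dict[str, int], reference: dict[str, int]) -> dict[str, float]:
    """Signed G2 per morphological value: positive = overrepresented in the
    target, negative = underrepresented. Corpus sizes are taken as the totals
    of the given counts (an assumption)."""
    c, d = sum(target.values()), sum(reference.values())
    scores = {}
    for tag in sorted(set(target) | set(reference)):
        a, b = target.get(tag, 0), reference.get(tag, 0)
        g2 = log_likelihood(a, b, c, d)
        scores[tag] = g2 if a / c >= b / d else -g2
    return scores


# Hypothetical counts of two Czech case values in a target and a reference corpus.
target = {"instrumental": 30, "nominative": 70}
reference = {"instrumental": 10, "nominative": 90}
print(keymorphs(target, reference))
```

Normalizing by category-bearing forms rather than all tokens is one possible design choice; it asks whether a case is over- or underused relative to the other cases, not relative to the text's length.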
Article
The paper describes the new corpus SYN2015, the most recent 100-million-word corpus of contemporary written Czech. General notions of corpus representativeness and balance are discussed in this context, with a focus on the new design of representativeness adopted for SYN2015. Unlike the previous synchronic corpora SYN2000, SYN2005 and SYN2010, which...
Article
Full-text available
The main objective of the paper is to examine whether simplification can be demonstrated to exist in Czech translated texts. In general, simplification, as one of the so-called translation universals, is defined as translators’ tendency to create simpler texts. According to research on English texts, simplification may be manifested e.g. by a lowe...
Article
Full-text available
The exploitation of hapax legomena, i.e. word or lemma types which occur in a corpus only once, is usually overlooked in language description. These types cannot be systematically used for the vast majority of analyses as they do not provide a basis for any type of generalization. On the other hand, the overall number of hapaxes can be used as an ind...
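Counting hapaxes is simple once a frequency list exists: they are the types with frequency 1. The sketch below is a minimal illustration; the abstract does not say which ratio is used as an indicator, so both hapaxes per type and hapaxes per token are shown as assumptions, and the function name `hapax_profile` is hypothetical. Passing word forms or lemmas yields word or lemma hapaxes respectively.

```python
from collections import Counter


def hapax_profile(tokens: list[str]) -> dict[str, float]:
    """Token, type and hapax counts plus two candidate hapax ratios."""
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "hapaxes": hapaxes,
        "hapaxes_per_type": hapaxes / len(counts),
        "hapaxes_per_token": hapaxes / len(tokens),
    }


print(hapax_profile("a b b c c c d".split()))
```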
Article
After briefly discussing the heterogeneities inherent to language production and how they influence corpus evidence, we describe a scale for the classification of individual morphological variants by their relative frequencies that has recently been independently proposed in Mluvnice současné češtiny (2010) (A Grammar of Contemporary Czech, hereaft...
Article
Since the publication of the Concept of Minimal Intervention (Cvrček 2008a, Cvrček 2008b), three critical reactions have been published (Adam 2009, Beneš & Prošek 2011, Homoláč & Mrázková 2011) defending the current language policy (based on the Theory of Language Cultivation). This paper discusses the most important points of their criticism: axio...
Conference Paper
We present a new method for automatic term extraction which is based on training datasets created to build inductive models for term identification. Existing approaches employ simple statistical and linguistic rules designed merely ad hoc and are unable to utilize complex relations of linguistic units. In contrast to those approaches, our method do...
Article
Full-text available
The Concept of Minimal Intervention (CMI) is a "methodological bill" concerning linguists and their approach toward the language and its speakers. CMI represents one possible approach to language, one that is programmatic in character. CMI prerequisites are: 1) There is no reason why linguistics should infringe upon language development through its inter...
Article
Full-text available
Although "literary language", i.e. standard language or spisovná čeština, was the central notion of the Prague Linguistic Circle's Theory of the Cultivation of Language, it has never been defined. This article deals with the problem of definition of "literariness", a concept which forms the base for the codification criterion of "correspondence wit...
Article
Although “literary language”, i.e. standard language or spisovná čeština, was the central notion of the Prague Linguistic Circle’s Theory of the Cultivation of Language, it has never been defined. This article deals with the problem of definition of “literariness”, a concept which forms the base for the codification criterion of “correspondence wit...

Projects (3)
Project
Corpus-based multidimensional analysis (MDA) of register has proven its worth in the empirical study of English and a typologically varied handful of other languages. However, it has never been extensively applied to Slavic languages, which are known for their rich inflection, distinctive morphology and a fairly long literary tradition shaping the styles of different genres. This project aims to describe register variation in Czech, a language with a sociolinguistic situation bordering on diglossia.