
Zuzana Komrskova
Charles University in Prague | CUNI · Institute of the Czech National Corpus
Master of Arts
About
18 Publications
2,912 Reads
52 Citations
Publications (18)
This paper investigates the contribution of author/idiolect vs. register/type of text, as the most salient factors influencing the final shape of a text, towards explaining the variation observed in Czech texts. Since it is almost impossible to explore the effect of these factors on authentic data, we used elicited letters collected in a fully crosse...
When looking at texts and their diversity, we can adopt two perspectives: either we focus on the external characteristics of a text, such as its cover, communication medium, graphic layout, etc., or we examine the linguistic means used, e.g. how often questions occur in the text or whether it contains the conditional mood. Here we will be concerned primarily with the latter, intern...
Using a multi-dimensional (MD) analysis of register variability, the study compares two corpora of Czech: Koditex, a “traditional” corpus carefully designed using various sources with rich metadata, and Araneum Bohemicum Maximum, a web-crawled corpus with an opportunistic composition representative of the “searchable” web. Both types of corpora are...
The present paper seeks to review relevant criteria used in classifying speech events (SEs) from the perspective of spoken corpus design. The primary goal is to survey the landscape of possible types of spoken language, so as to assess in which directions the coverage of spoken Czech offered by Czech National Corpus corpora can be expanded in the f...
This paper deals with the position of three Czech subordinating conjunctions že 'that', když 'when', and až 'when' within the prosodic word, using the phonetic annotation in the ORTOFON corpus. The position of subordinating conjunctions is traditionally described as initial within the subordinate clause, but the situation in spontaneous speech is n...
The article describes Koditex, a new representative and reference 9-million-word corpus of contemporary Czech. Koditex was designed to be as diverse as possible for the purpose of conducting a multidimensional analysis (MDA) of Czech. At the topmost level, it is divided into three modes of communication: written language, spoken language, and web-b...
The article summarizes the theoretical foundations and results of a corpus-driven study of register variability in contemporary Czech. The descriptive framework is based on the methodology of multidimensional analysis, as previously applied to various other languages (see Biber 1995). The starting point is a quantitative analysis of a custom-bui...
This paper is part of a larger research effort on language variability aimed at uncovering the relations between extra- and intratextual characteristics of Czech texts by means of multi-dimensional analysis. The palpable lack of prior art on quantitative register analysis of Czech led to several distinctive methodological decisions, concerning name...
Particles are known to be typical characteristics of spontaneous speech. Their function is to modify the content of the statement, to structure the sentence and to ensure contact with the listener. Both "chápeš" ('you comprehend') and "rozumíš" ('you understand') belong to this group: their form suggests a request for attention or reaction. An anal...
The aim of this paper is to present some remarks concerning the phenomenon of overlaps in spoken language. Simultaneous speech has been studied since the 1970s, when Sacks, Schegloff and Jefferson described their theory of turn-taking organisation in conversation. In the literature, overlaps are sometimes considered as going against the principle o...
The goal of this paper is to examine the role of two collocations (že jo and že ne) in spoken dialogue. Both are said to be typical of spontaneous conversation and express a wide range of pragmatic functions, e.g. uncertainty of the speaker or a request for a backchannel. The examination of their positioning within the utterance in relation to the...
The paper introduces the ORTOFON corpus of spontaneous spoken Czech and the DIALEKT corpus of Czech dialects, their design principles and practical solutions adopted during data collection.
Research into causal conjunctions suggests that there are various degrees of causality and that causality is better situated on a cline between strong and weak. Some studies of English because/’cause/cos suggest a diachronic change in the spoken language, where the use of because is shifting from prototypical subordinator to discourse marker (Stens...
This article deals with emoticons and their meaning in computer-mediated communication. The first part focuses on general information about emoticons, their various classifications, functions, and the meaning of the ten most popular Western or Eastern emoticons. The second part presents several conclusions of my research. The issue is a comparison bet...
The ORAL series corpora of spontaneous spoken Czech currently contain neither lemmatization nor part-of-speech tagging. The main reason for this is that readily available NLP tools, designed primarily with written texts in mind, underperform when applied directly to speech transcripts, due to various morphological and syntactic specificities of in...
The word form to (that.SG.N) traditionally tops frequency tables in corpora of spoken Czech: as a universal (gender- and number-neutral) exophoric (deictic) and endophoric (co-referential) device, it is crucial for spontaneous, unplanned discourse which requires reinforcing references to the context and co-text. Our estimate based on the ORAL serie...
The article deals with emoticons (or smileys), structures that are specific to Internet texts and SMS. It summarizes the findings of the scientific literature on the use of emoticons as carriers of emotions in written texts. The first part focuses on their origin, then several different concepts (or definitions) of emoticon...
Project (1)
Corpus-based multidimensional analysis (MDA) of register has proven its worth in the empirical study of English and a typologically varied handful of other languages. However, it has never been extensively applied to Slavic languages, which are known for their rich inflection, distinctive morphology and a fairly long literary tradition shaping the styles of different genres. This project aims to describe register variation in Czech, a language with a sociolinguistic situation bordering on diglossia.