
Silvie Cinková
Charles University in Prague | CUNI · Institute of Formal and Applied Linguistics
Doctor of Philosophy
41 Publications
3,417 Reads
465 Citations
Publications (41)
We present a data story, the central concept of our international one-semester data analytics course for students of social sciences and humanities. Namely, we demonstrate the four stages of the data lifecycle – gathering, analyzing, annotating, licensing & sharing – using the multilingual correspondence collection of French Slavist André Mazon and...
In our experiment with comprehension testing of Czech legal texts and their paraphrases by two lawyers committed to plain writing, we observed that readers achieved different reading comprehension scores, the main factor being the text author. This effect was particularly strong on weaker readers. In this chapter, we seek to find textual features t...
This article presents a set of standardised corpora of poetry comprising over 330,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian, and Spanish). Each corpus has been deduplicated, enriched with Universal Dependencies, provided with additional metadata, and converted into a unified json...
To help realise its potential as the research infrastructure for language as social and cultural data, CLARIN is supporting the training of students and scholars in using its language data, tools and services. Lecturers and teachers in the CLARIN network have integrated CLARIN language resources into higher education programmes and other training a...
A linguistically informed distant reading presupposes an adequate performance of Natural Language Processing tools. This article describes our evaluation of the UDPipe parser on a manually annotated sample of nineteenth-century Czech poetry in the following steps: (1) creation of a documented data set for this domain (poetry, nineteenth century, Cz...
We have fitted four classic readability metrics to Czech, using InterCorp (a parallel corpus with manual sentence alignment), CzEng 2.0 (a large parallel corpus of crawled web texts), and the optimize.curve_fit algorithm from the SciPy library. The adapted metrics are: Flesch Reading Ease, Flesch-Kincaid Grade Level, Coleman-Liau Index, and Automat...
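As a rough illustration of what refitting a readability metric involves, the sketch below exposes the three coefficients of the Flesch Reading Ease form (score = a - b * average sentence length - c * average syllables per word) and recovers them from sample data. It is not the authors' procedure: it swaps SciPy's curve_fit for a plain least-squares solve in pure Python (equivalent here because the model is linear in its coefficients), and the sample points are synthetic values generated from the standard English coefficients (206.835, 1.015, 84.6), not measurements from InterCorp or CzEng 2.0. The names flesch_form, solve3 and fit_coefficients are hypothetical.

```python
def flesch_form(asl, asw, a, b, c):
    """Flesch-style score from average sentence length (asl, words per
    sentence) and average syllables per word (asw)."""
    return a - b * asl - c * asw


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with
    partial pivoting."""
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_coefficients(samples):
    """Least-squares fit of (a, b, c) from (asl, asw, target_score) triples,
    using the normal equations X^T X beta = X^T y."""
    rows = [(1.0, -asl, -asw) for asl, asw, _ in samples]
    targets = [score for _, _, score in samples]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return tuple(solve3(xtx, xty))


# Synthetic data (an assumption for illustration only): scores produced by
# the standard English coefficients at five made-up text profiles.
ENGLISH = (206.835, 1.015, 84.6)
profiles = [(10, 1.3), (15, 1.5), (20, 1.4), (25, 1.8), (12, 1.6)]
samples = [(asl, asw, flesch_form(asl, asw, *ENGLISH)) for asl, asw in profiles]

fitted = fit_coefficients(samples)
print(fitted)  # approximately the English coefficients
```

With real data, the target scores would come from a reference text (for example the English side of a parallel corpus) and the predictors from the aligned text in the other language, so that the fitted coefficients place both languages on one scale.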
Readability of professional and administrative texts in Czech: why study it and how to measure it? Text comprehension is one of the key skills that are learned during the school years. On the one hand, reading literacy is fundamental for successful comprehension, on the other hand, comprehension success is determined by textual features, i.e. its...
We explore human judgments on how well individual patterns of 29 target verbs from the Pattern Dictionary of English Verbs describe their random KWICs. We focus on cases where more than one pattern is judged as highly appropriate for a given KWIC and seek to estimate the effect of event participants (arguments) being denotatively similar in two pat...
Human judgments of lexical similarity/relatedness are used as evaluation data for Vector Space Models, helping to judge how the distributional similarity captured by a given Vector Space Model correlates with human intuitions. A well established data set for the evaluation of lexical similarity/relatedness is WordSim353, along with its translations...
Low interannotator agreement (IAA) is a well-known issue in manual semantic tagging (sense tagging). IAA correlates with the granularity of word senses and they both correlate with the amount of information they give as well as with its reliability. We compare different approaches to semantic tagging in WordNet, FrameNet, PropBank and OntoNotes wit...
Experiments with semantic annotation based on Corpus Pattern Analysis and the lexical resource PDEV (Hanks and Pustejovsky, 2005) revealed a need for an evaluation measure that would identify the optimum relation between the semantic granularity of the semantic categories in the description of a verb and the reliability of the annotation expres...
Corpus Pattern Analysis (CPA) [1], coined and implemented by Hanks as the Pattern Dictionary of English Verbs (PDEV) [2], appears to be the only deliberate and consistent implementation of Sinclair’s concept of Lexical Item [3]. In his theoretical inquiries [4] Hanks hypothesizes that the pattern repository produced by CPA can also support the word...
This paper presents a real-time implementation of an automatic dialogue system called ‘Senior Companion’, which is not strictly task-oriented, but instead it is designed to ‘chat’ with elderly users about their family photographs. To a large extent, this task has lost the usual restriction of dialogue systems to a particular (narrow) domain, and th...
We present a description of a new resource (Prague Dependency Treebank of Spoken Language) being created for English and Czech to be used for the task of speech understanding, broad natural language analysis for dialog systems and other speech-related tasks, including speech editing. The resources we have created so far contain audio and a standard...
This paper aims at a lexical description of frequent uses of frequent lexical verbs in Swedish against the background of Czech, with some implications for the lexical description of such verb uses in general. It results in a draft of a production lexicon of Swedish frequent verbs for advanced Czech learners of Swedish, with focus on their uses as...
Being confronted with spontaneous speech, our current annotation scheme requires alterations that would reflect the abundant use of non-sentential fragments with clausal meaning tightly connected to their context, which do not systematically occur in written texts. The purpose of this paper is to list the common patterns of non-sentential fragments...
This paper gives an overview of the current state of the Prague English Dependency Treebank project. It is an updated version of a draft text that was released along with a CD presenting the first 25% of the PDT-like version of the Penn Treebank - WSJ section (PEDT 1.0). Before the January 2009 release, the conversion from the original phrase s...
Two Languages - One Annotation Scenario? Experience from the Prague Dependency Treebank
This paper compares the two FGD-based annotation scenarios for Czech and for English, with Czech as the basis. We discuss the secondary predication expressed by the infinitive and its functions in Czech and English, respectively. We give a few examples of Englis...
Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories. Editors: Koenraad De Smedt, Jan Hajič and Sandra Kübler. NEALT Proceedings Series, Vol. 1 (2007), 7-18. © 2007 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically publ...
This paper presents the English valency lexicon EngValLex, built within the Functional Generative Description framework. The form of the lexicon, as well as the process of its semi-automatic creation is described. The lexicon describes valency for verbs and also includes links to other lexical sources, namely PropBank. Basic statistics about the l...
EngValLex is the name of an FGD-compliant valency lexicon of English verbs, built from the PropBank-Lexicon and following the structure of Vallex, the FGD-based lexicon of Czech verbs. EngValLex is interlinked with the PropBank-Lexicon, thus preserving the original links between the PropBank-Lexicon and the PropBank-Corpus. Therefore it is also sup...
This work focuses on semi-automatic extraction of verb-noun collocations from a corpus, performed to provide lexical evidence for the manual lexicographical processing of Support Verb Constructions (SVCs) in the Swedish-Czech Combinatorial Valency Lexicon of Predicate Nouns. The efficiency of a purely manual extraction procedure is significantly improved b...
We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This pa...
We introduce a bilingual MR lexicon of Swedish support verb constructions that lemmatizes their noun components (predicate nouns). The lexicon is meant to be part of a valency lexicon of common Swedish verbs. It is based on the valency theory developed within the Functional Generative Description and it is enriched with Lexical Functions. In order...
We are presenting VPS-30-En, a small lexical resource that contains the following 30 English verbs: access, ally, arrive, breathe, claim, cool, crush, cry, deny, enlarge, enlist, forge, furnish, hail, halt, part, plough, plug, pour, say, smash, smell, steer, submit, swell, tell, throw, trouble, wake and yield. We have created and have been using VP...
Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories. Editors: Koenraad De Smedt, Jan Hajič and Sandra Kübler. NEALT Proceedings Series, Vol. 1 (2007), 163-174. © 2007 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically p...