Irina Pavlova’s scientific contributions

Publications (2)


the statistics for relations expressed overtly via markers. The most frequent marker for this relation is given.
Rhetorical Relations Markers in Russian RST Treebank
Conference Paper · Full-text available · September 2017 · 186 Reads · 18 Citations · [...] · Alexey Shelepov

The paper deals with the pilot version of the first RST discourse treebank for Russian. The project started in 2016. At present, the treebank consists of sixty news texts annotated for rhetorical relations according to the RST scheme. However, this scheme was slightly modified in order to achieve a higher inter-annotator agreement score. During the annotation procedure, we also registered the discourse connectives of different types and mapped them onto the corresponding rhetorical relations. In the present paper, we discuss our experience of adapting the RST scheme for Russian news texts. In addition, we report on the distribution of the most frequent discourse connectives in our corpus. The PDF version of the paper is available here: https://drive.google.com/file/d/0B2w9OsTtG6QMYkYtd21XYjZVSGM/view


Towards building a Discourse-annotated corpus of Russian

June 2017 · 174 Reads · 9 Citations

For many natural language processing tasks (machine translation evaluation, anaphora resolution, information retrieval, etc.), a corpus of texts annotated for discourse structure is essential. As of now, there are no such corpora of written Russian, which stands in the way of developing a range of applications. This paper presents the first steps of constructing a Rhetorical Structure Corpus of the Russian language. The main annotation principles are discussed, as well as the problems that arise and the ways to solve them. Since annotation consistency is often an issue when texts are manually annotated for something as subjective as discourse structure, we specifically focus on the measurement of inter-annotator agreement. We also propose a new set of rhetorical relations (modified from the classic Mann & Thompson set), which is more suitable for Russian. We aim to use the corpus for experiments on discourse parsing and believe that the corpus will be of great help to other researchers. The corpus will be made available for public use.

Citations (2)


... We have 6 corpora for English (Prasad et al., 2019; Zeldes, 2017; Carlson et al., 2001; Asher et al., 2016; Yang and Li, 2018; Nishida and Matsumoto, 2022), 4 for Chinese (Zhou et al., 2014; Cao et al., 2018; Cheng and Li, 2019; Yi et al., 2021), 2 for Spanish (da Cunha et al., 2011; Cao et al., 2018), 2 for Portuguese (Cardoso et al., 2011; Mendes and Lejeune, 2022), 1 for German (Stede and Neumann, 2014), 1 for Basque (Iruskieta et al., 2013), 1 for Farsi (Shahmohammadi et al., 2021), 1 for French, 1 for Dutch (Redeker et al., 2012), 1 for Russian (Toldova et al., 2017), 1 for Turkish (Zeyrek and Webber, 2008; Zeyrek and Kurfalı, 2017), 1 for Italian (Tonelli et al., 2010), and 1 for Thai. In addition, OOD datasets come from the multilingual TED Discourse Bank with data for English, Portuguese and Turkish (Zeyrek et al., 2018, 2020). Identifying EDU boundaries and connectives (Tasks 1 and 2) corresponds to different corpora: PDTB-based datasets have connectives annotated, but not segmentation, while the others have no connectives. ...

Reference:

DisCut and DiscReT: MELODI at DISRPT 2023
Rhetorical Relations Markers in Russian RST Treebank

... Due to the availability of the large annotated discourse corpora for many languages, especially English, discourse parsers [19,21,17] provide reliable and correct DT for the text. Manually annotated Ru-RSTreebank corpus [26] has been recently introduced which resulted in the creation of discourse parser for Russian [9]. The availability of state-of-the-art discourse parsers for different languages makes the discourse-based models universal, so they could be applied to different texts without modifications. ...

Towards building a Discourse-annotated corpus of Russian