Elena Yagunova

Saint Petersburg State University | SPBU · Department of Informational Systems in Arts and Humanities

PhD hab. Philology (Dr. Sci.)

34
Publications
5,143
Reads
104
Citations

Publications (34)
Chapter
This paper is the first part of an investigation of contextual predictability models for Russian. It focuses on the linguistic and psychological interpretation of models, features, metrics and sets of features. The aim of this paper is to identify the dependence of the implementation of contextual predictability procedures on the genre characteristics of the...
Article
Full-text available
In this paper, we construct paraphrase graphs for news text collections (clusters). Our aims are, first, to prove that the paraphrase graph construction method can be used for news cluster identification and, second, to analyze and compare stylistically different news collections. Our news collections include dynamic, static and combined (dynamic and...
Chapter
In this paper we propose a simple but powerful method of extracting key client requests from bank chat logs. Many companies nowadays are interested in building a chat bot to optimize their business, and are ready to provide chat bot developers with large amounts of data, but such data often need special preparation to be successfully used for a cha...
Chapter
In this paper we analyze news text collections (clusters) by extracting their paraphrase headlines into a paraphrase graph and working with this graph. Our aim is to test whether a news headline is an appropriate form of news text compression. Different types of news collections: dynamic, static and combined (both dynamic and static) clusters are an...
Conference Paper
The paper describes the results of the First Russian Paraphrase Detection Shared Task held in St. Petersburg, Russia, in October 2016. Research in the area of paraphrase extraction, detection and generation has been developing successfully for a long time, while there has been only a recent surge of interest in the problem in the Russian commun...
Chapter
In this paper we present a new Russian paraphrase corpus derived from the news feed of a social network and conduct its primary analysis. Most media agencies post their news reports on their pages in social networks, and the headlines of the messages are often the same as those of the corresponding news articles from the official websites of the...
Conference Paper
As part of our project ParaPhraser on the identification and classification of Russian paraphrase, we have collected a corpus of more than 8000 sentence pairs annotated as precise, loose or non-paraphrases. The corpus is annotated via crowdsourcing by naïve native Russian speakers, but from the point of view of the expert, our complex paraphrase de...
Conference Paper
This paper considers an information extraction task for a restaurant recommendation system. We develop an information extraction system intended to gather restaurant aspects from users’ reviews and output them to the recommendation module. As many of the restaurant aspects are subjective, our task can also be called sentiment analys...
Chapter
This paper presents a crowdsourcing project on the creation of a publicly available corpus of sentential paraphrases for Russian. Collected from news headlines, such a corpus could be applied to information extraction and text summarization. We collect news headlines from different agencies in real time; paraphrase candidates are extracted from...
Article
Full-text available
Among the various methods of studying child speech, no attempt has yet been made at a quantitative statistical analysis of the distribution of grammatical categories in texts produced by children with typical development and by children with primary speech underdevelopment. There is much evidence that, in the course of language and speech development, a child begins...
Conference Paper
Full-text available
In this paper we analyze and compare different types of sentence similarity measures applied to the problem of sentential paraphrase identification. We work with Russian, and all the experiments are conducted on the Russian paraphrase corpus we have collected from the news headlines (and are collecting at the moment). Apart from the similarity meas...
Conference Paper
Full-text available
This paper deals with the task of sentential paraphrase identification. We work with Russian but our approach can be applied to any other language with rich morphology and free word order. As part of our ParaPhraser.ru project, we construct a paraphrase corpus and then experiment with supervised methods of paraphrase identification. In this paper w...
Conference Paper
Full-text available
Automatic text summarization is a text compression problem with many applications in natural language processing. In this paper we focus on the problem of evaluating text summarization systems. We propose an unsupervised approach based on keywords: it does not require a large amount of manual processing and can be implemented as a fully automatic...
Conference Paper
This paper proposes an information extraction method for a restaurant recommendation system. We aim to develop an information extraction (IE) system intended to be a module of the recommendation system. The IE system is to gather information about different aspects of restaurants from online reviews, structure it and feed t...
Conference Paper
This paper proposes a corpus-based information extraction and opinion mining method. Our domain is restaurant reviews, and our information extraction and opinion mining module is part of a Russian knowledge-based recommendation system. Our method is based on thorough corpus analysis and automatic selection of machine learning models and featu...
Conference Paper
This paper is devoted to the analysis of the Russian media during the "Snow Revolution": the period of political crisis in Russia between December 2011 and March 2012. The falsification during the parliamentary elections caused numerous demonstrations and street actions, which continued until the presidential elections. The social networks played...
Conference Paper
A compactified horizontal visibility graph for the language network is proposed, together with a method for identifying the words that define the informational structure of a text. It was found that networks constructed in this way are scale-free, and that among the nodes with the largest degrees there are words that determine not only communicativ...
Conference Paper
Full-text available
We present a study of the terminology, subdomain interaction, and information structure of the text and the corpus. The goal of this research is to determine the distributional features of the lexis that can distinguish common and subdomain terminology. Our objective is to identify keyword features as the most informative structural elements, descr...
Book
Full-text available
This textbook covers the basic issues of computational linguistics, from the theory of linguistic and mathematical modeling to variants of technological solutions. It gives a linguistic interpretation of the main linguistic objects and units of analysis. It provides the information needed to build individual subsystems responsible for the analysis of...
Book
Full-text available
We are delighted to hereby present the proceedings of CHAT 2011. Altogether, 11 papers were accepted for the presentation: 3 regular papers, 6 short papers, and 2 demonstration papers. The workshop papers cover various topics on automated approaches to terminology extraction and creation of terminology resources, compiling multilingual terminology,...
Article
Full-text available
Collocations are understood in this work as nonrandom combinations of two or more lexical units that are typical both for a language as a whole (texts of any type) and for a particular type of text. A text is a structured sequence of units of different levels; collocations, as complex text substructures, act as an important object when investigating te...


Projects (3)
Project
The project aims at (a) studying the patterns and mechanisms of discourse development in preschool children and (b) elaborating a multifactorial and multidimensional model of discourse skills development. The essential objective of the project is to carry out a longitudinal study from the psycholinguistic perspective and to analyze the development of oral discourse skills in different communicative contexts. The main attention is paid to three genres: 1) daily conversational dialogue, 2) personal narrative, and 3) fictional story. The project is supported by the Russian Science Foundation, research grant No. 18-18-00114.