
Ekaterina Pronoza
Saint Petersburg State University | SPBU
Publications: 20
Reads: 1,731
Citations: 150
Publications (20)
Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up,...
This paper is the first part of an investigation of contextual predictability models for Russian; it focuses on the linguistic and psychological interpretation of models, features, metrics and sets of features. The aim of this paper is to identify the dependence of the implementation of contextual predictability procedures on the genre characteristics of the...
In this paper, we construct paraphrase graphs for news text collections (clusters). Our aims are, first, to prove that the paraphrase graph construction method can be used for news cluster identification and, second, to analyze and compare stylistically different news collections. Our news collections include dynamic, static and combined (dynamic and...
In this paper we propose a simple but powerful method of extracting key client requests from bank chat logs. Many companies nowadays are interested in building a chat bot to optimize their business, and are ready to provide chat bot developers with large amounts of data, but such data often need special preparation to be successfully used for a cha...
In this paper we analyze news text collections (clusters) by extracting their paraphrase headlines into a paraphrase graph and working with this graph. Our aim is to test whether a news headline is an appropriate form of news text compression. Different types of news collections: dynamic, static and combined (both dynamic and static) clusters are an...
The paper describes the results of the First Russian Paraphrase Detection Shared Task held in St. Petersburg, Russia, in October 2016. Research in the area of paraphrase extraction, detection and generation has been developing successfully for a long time, while there has been only a recent surge of interest in the problem in the Russian commun...
In this paper we present a new Russian paraphrase corpus derived from the news feed of the social network and conduct its primary analysis. Most media agencies post their news reports on their pages in social networks, and the headlines of the messages are often the same as those of the corresponding news articles from the official websites of the...
As part of our project ParaPhraser on the identification and classification of Russian paraphrase, we have collected a corpus of more than 8000 sentence pairs annotated as precise, loose or non-paraphrases. The corpus is annotated via crowdsourcing by naïve native Russian speakers, but from the point of view of the expert, our complex paraphrase de...
In this paper, we consider an information extraction task for a restaurant recommendation system. We develop an information extraction system which is intended to gather restaurant aspects from users’ reviews and output them to the recommendation module. As many of the restaurant aspects are subjective, our task can also be called sentiment analys...
This paper presents a crowdsourcing project on the creation of a publicly available corpus of sentential paraphrases for Russian. Collected from news headlines, such a corpus could be applied to information extraction and text summarization. We collect news headlines from different agencies in real time; paraphrase candidates are extracted from...
In this paper we analyze and compare different types of sentence similarity measures applied to the problem of sentential paraphrase identification. We work with Russian, and all the experiments are conducted on the Russian paraphrase corpus we have collected from news headlines (and are still collecting). Apart from the similarity meas...
This paper deals with the task of sentential paraphrase identification. We work with Russian but our approach can be applied to any other language with rich morphology and free word order. As part of our ParaPhraser.ru project, we construct a paraphrase corpus and then experiment with supervised methods of paraphrase identification. In this paper w...
Automatic text summarization is a text compression problem with many applications in natural language processing. In this paper we focus on the problem of evaluating text summarization systems. We propose an unsupervised approach based on keywords: it does not require a large amount of manual processing and can be implemented as a fully automatic...
In this paper, an information extraction method for a restaurant recommendation system is proposed. We aim at the development of an information extraction (IE) system which is intended to be a module of the recommendation system. The IE system is to gather information about different aspects of restaurants from online reviews, structure it and feed t...
In this paper, a corpus-based information extraction and opinion mining method is proposed. Our domain is restaurant reviews, and our information extraction and opinion mining module is part of a Russian knowledge-based recommendation system. Our method is based on thorough corpus analysis and automatic selection of machine learning models and featu...
A compactified horizontal visibility graph is proposed for the language network and for identifying the words that define the informational structure of a text. It was found that the networks constructed in this way are scale-free and have the property that among the nodes with the largest degrees there are words that determine not only communicativ...