Nataliya Kochetkova's research while affiliated with National Research University Higher School of Economics and other places

Publications (3)

Article
In this paper, we construct paraphrase graphs for news text collections (clusters). Our aims are, first, to prove that the paraphrase graph construction method can be used for news cluster identification and, second, to analyze and compare stylistically different news collections. Our news collections include dynamic, static and combined (dynamic and...
Chapter
In this paper we analyze news text collections (clusters) by extracting their paraphrase headlines into a paraphrase graph and working with this graph. Our aim is to test whether a news headline is an appropriate form of news text compression. Different types of news collections: dynamic, static and combined (both dynamic and static) clusters are an...
Conference Paper
As part of our project ParaPhraser on the identification and classification of Russian paraphrase, we have collected a corpus of more than 8000 sentence pairs annotated as precise, loose or non-paraphrases. The corpus is annotated via crowdsourcing by naïve native Russian speakers, but from the point of view of the expert, our complex paraphrase de...

Citations

... Extracting more significant keywords is important for many different tasks in big data, such as classification [7], clustering [28], indexing [9] and data analysis [6]. For instance, when more significant keywords are extracted, the classification algorithms applied afterwards could potentially place the documents into more relevant categories. ...
... In this paper we test the hypothesis that the paraphrase construction method allows us to identify thematically homogeneous news clusters. A paraphrase graph is a graph where news headlines are vertices, and two vertices are connected by an edge if the corresponding headlines are paraphrases [4]. Such a graph reflects the structure of the corresponding news cluster: for example, similar headlines tend to group into subgraphs which refer to the subtopics in the news cluster. ...
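
The paraphrase graph described in the excerpt above can be illustrated with a minimal Python sketch. It is not the authors' method: the paraphrase test here is a hypothetical stand-in (word-set Jaccard overlap with an assumed threshold of 0.5), whereas the cited work relies on an actual paraphrase classification. The function names `token_jaccard`, `build_paraphrase_graph` and `connected_components`, as well as the sample headlines, are invented for illustration. Connected components serve as a simple proxy for the subgraphs that correspond to subtopics.

```python
from itertools import combinations


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets.

    Assumption: a crude stand-in for a real paraphrase classifier.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_paraphrase_graph(headlines, threshold=0.5):
    """Return adjacency sets: vertex i is headlines[i].

    An edge joins i and j when the stand-in test scores them at or
    above the (assumed) threshold.
    """
    graph = {i: set() for i in range(len(headlines))}
    for i, j in combinations(range(len(headlines)), 2):
        if token_jaccard(headlines[i], headlines[j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def connected_components(graph):
    """Group vertices into connected components with an iterative DFS."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Invented sample headlines covering two subtopics.
headlines = [
    "Central bank raises key rate",
    "Central bank raises key interest rate",
    "Storm hits coastal towns",
    "Storm hits coastal towns overnight",
]
graph = build_paraphrase_graph(headlines)
print(connected_components(graph))  # [[0, 1], [2, 3]]
```

With these sample headlines, the two rate headlines form one component and the two storm headlines form another, which mirrors the idea that similar headlines group into subgraphs. A stricter or looser threshold changes how readily headlines are linked.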