Anna Glazkova

University of Tyumen · School of Computer Science

PhD

About

56
Publications
7,457
Reads
177
Citations
Additional affiliations
May 2023 - present
The Institute for Information Transmission Problems (Kharkevich Institute)
Position
  • Senior Researcher
May 2023 - present
University of Tyumen
Position
  • Researcher
Education
September 2007 - June 2012
University of Tyumen
Field of study
  • Applied Informatics

Publications (56)
Article
Full-text available
Text complexity assessment is a challenging task requiring various linguistic aspects to be taken into consideration. The complexity level of the text should correspond to the reader's competence. An overly complicated text could be incomprehensible, whereas an overly simple one could be boring. For many years, simple features were used to assess readabili...
Preprint
Full-text available
Keyphrases are crucial for searching and systematizing scholarly documents. Most current methods for keyphrase extraction are aimed at the extraction of the most significant words in the text. But in practice, the list of keyphrases often includes words that do not appear in the text explicitly. In this case, the list of keyphrases represents an ab...
Article
Full-text available
Repair is recognized as an important part of the circular economy and leads to fewer resources being used, less waste, and fewer emissions. The crucial condition for scaling repairs is people’s perception of repairs as a significant social practice harmonizing the relationship between society and nature. This paper aims to analyze the key...
Article
Full-text available
Preprocessing is a crucial step for each task related to text classification. Preprocessing can have a significant impact on classification performance, but at present there are few large-scale studies evaluating the effectiveness of preprocessing techniques and their combinations. In this work, we explore the impact of 26 widely used text preproce...
Article
Full-text available
Green practices are social practices that aim to harmonize the relations between people and the natural environment. They may involve minimizing the use of resources and the generation of waste and emissions. Detecting green practices in social media posts helps to understand which green practices are currently common and to develop recommendations...
Preprint
Full-text available
Large language models (LLMs) play a crucial role in natural language processing (NLP) tasks, improving the understanding, generation, and manipulation of human language across domains such as translating, summarizing, and classifying text. Previous studies have demonstrated that instruction-based LLMs can be effectively utilized for data augmentati...
Preprint
Full-text available
Keyphrase selection is a challenging task in natural language processing that has a wide range of applications. Adapting existing supervised and unsupervised solutions for the Russian language faces several limitations due to the rich morphology of Russian and the limited number of training datasets available. Recent studies conducted on English te...
Chapter
Full-text available
Modern models for text generation show state-of-the-art results in many natural language processing tasks. In this work, we explore the effectiveness of abstractive text summarization models for keyphrase selection. A list of keyphrases is an important element of a text in databases and repositories of electronic documents. In our experiments, abst...
Preprint
Full-text available
Keyphrase selection plays a pivotal role within the domain of scholarly texts, facilitating efficient information retrieval, summarization, and indexing. In this work, we explored how to apply fine-tuned generative transformer-based models to the specific task of keyphrase selection within Russian scientific texts. We experimented with four distinc...
Article
Full-text available
The text complexity assessment is an applied problem of current interest with potential application in the drafting of legal documents, editing textbooks, and selecting books for extracurricular reading. The methods for generating a feature vector when automatically assessing the text complexity are quite diverse. Early approaches relied on easily...
Article
Full-text available
In this work, we applied the multilingual text-to-text transformer (mT5) to the task of keyphrase generation for Russian scientific texts using the Keyphrases CS&Math Russian corpus. The automatic selection of keyphrases is a relevant task of natural language processing since keyphrases help readers find the article easily and facilitate the system...
Article
Full-text available
This article provides a review of publications on the analysis of students’ satisfaction with the educational process based on natural language processing methods. A total of 197 pieces of student feedback on 129 elective disciplines at the University of Tyumen were collected. A comparative analysis of keyword extraction methods was conducted: statistical TF-IDF, RAKE and Y...
Conference Paper
Determining the morphemic structure of a word is a problem that is particularly relevant in teaching the Russian language. Automatic evaluation of this structure is complicated by the lack of agreement among linguists in some complex cases. At the same time, several papers have been published in recent years, whose authors use various machine learn...
Conference Paper
Full-text available
The paper presents an approach to named entity oriented sentiment analysis of Russian news texts proposed during the RuSentNE evaluation. The approach is based on RuRoBERTa-large, a pre-trained RoBERTa model for Russian. We compared several types of entity representation in the input text, and evaluated strategies for handling class imbalance and r...
Article
Full-text available
In this paper, we attempted to adapt various well-known algorithms for keyword selection to a very specific text corpus containing abstracts of Russian academic papers from the mathematical and computer science domain. We faced several challenges including the lack of research in the field of keyword extraction for Russian, the absence of large tex...
Article
Keyphrases are crucial for searching and systematizing scholarly documents. Most current methods for keyphrase extraction are aimed at the extraction of the most significant words in the text. But in practice, the list of keyphrases often includes words that do not appear in the text explicitly. In this case, the list of keyphrases represents an ab...
Conference Paper
Automatic selection of keyphrases (keywords) is a major challenge to finding and systematizing scholarly documents. This paper investigates the efficiency of using titles of scientific papers as additional information for keyphrase generation. We propose an approach to multi-task fine-tuning the BART model using control codes. It is shown that the...
Preprint
Full-text available
The paper describes a transformer-based system designed for SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. The purpose of the task was to predict the intimacy of tweets in a range from 1 (not intimate at all) to 5 (very intimate). The official training set for the competition consisted of tweets in six languages (English, Spanish, Itali...
Conference Paper
Full-text available
Automatic text classification is a common task of natural language processing and machine learning. A typical limitation of machine learning models that are trained on data from a specific genre is their lower performance on other genres. In this work, we evaluated the performance of several in-genre and cross-genre age-based text classification m...
Conference Paper
Full-text available
The paper describes a transformer-based system designed for SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. The purpose of the task was to predict the intimacy of tweets in a range from 1 (not intimate at all) to 5 (very intimate). The official training set for the competition consisted of tweets in six languages (English, Spanish, Itali...
Conference Paper
Full-text available
State-of-the-art data augmentation methods help improve the generalization of deep learning models. However, these methods often generate examples that contradict the preserving class labels. This is crucial for some natural language processing tasks, such as fake news detection. In this work, we combine sequence-to-sequence and natural language in...
Article
Full-text available
In modern societies, growing consumption rates lead to the depletion of the planet's resources and to waste generation. This paper studies green practices aimed at reducing consumption, which were published in social media communities covering separate waste collection. First, we selected nine green practices regarding separate waste collection an...
Article
Full-text available
The paper is devoted to the task of searching for mentions of green practices in social media texts. The relevance of this task is dictated by the need to expand existing knowledge about the use of green practices in society and the spread of existing green practices. This paper uses a text corpus consisting of the texts published on the environmen...
Conference Paper
Full-text available
The paper describes neural models developed for the DAGPap22 shared task hosted at the Third Workshop on Scholarly Document Processing. This shared task targets the automatic detection of generated scientific papers. Our work focuses on comparing different transformer-based models as well as using additional datasets and techniques to deal with imb...
Preprint
Full-text available
The paper describes neural models developed for the DAGPap22 shared task hosted at the Third Workshop on Scholarly Document Processing. This shared task targets the automatic detection of generated scientific papers. Our work focuses on comparing different transformer-based models as well as using additional datasets and techniques to deal with imb...
Article
Full-text available
We propose an approach to improving the quality of headline generation based on ranking training examples by the ROUGE-1 scores computed for texts and their headlines, filtering the data, and generating artificial training examples. The proposed approach, tested on the BART neural network model, showed...
Article
Full-text available
The authors hypothesize that the textual information posted on personal pages on social media reflects the political views of users to some extent. Therefore, this textual information can be used to predict political views on social media. The authors conduct experiments on textual data from user pages and test two machine learning methods to class...
Article
Full-text available
The authors hypothesize that the textual information posted on personal pages on social media reflects the political views of users to some extent. Therefore, this textual information can be used to predict political views on social media. The authors conduct experiments on textual data from user pages and test two machine learning methods to class...
Chapter
Insulting speech acts have become the subject of public discussion in the media, social media, the basis for speculation in political communication, and a working concept in the legal environment. The present research article explores insulting speech acts on the social network site “VKontakte” aiming to develop an algorithm for automatic classific...
Conference Paper
Full-text available
This paper describes neural models developed for the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages Shared Task 2021. Our team called neuro-utmn-thales participated in two tasks on binary and fine-grained classification of English tweets that contain hate, offensive, and profane content (English Subtasks A & B)...
Preprint
Full-text available
This paper describes neural models developed for the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages Shared Task 2021. Our team called neuro-utmn-thales participated in two tasks on binary and fine-grained classification of English tweets that contain hate, offensive, and profane content (English Subtasks A & B)...
Article
Full-text available
The article examines the correlation of two indices characterizing the level of linguistic or semantic complexity of the book content. The first index is the age rating in accordance with the Russian Age Rating System for information products. The second index is the ease of understanding of the text, calculated based on the common readability metr...
Chapter
This article is devoted to the main possibilities of using natural language text processing technologies for historical research. The authors present a taxonomy of biographical facts and use text mining technologies (extraction of information from texts) to obtain biographical information from texts in Russian in accordance with the proposed types...
Chapter
This paper describes neural models developed for the First Workshop on Scope Detection of the Peer Review Articles shared task collocated with PAKDD 2021. The aim of the task is to identify topics or category of scientific abstracts. We investigate the use of several fine-tuned language representation models pretrained on different large-scale corp...
Preprint
Full-text available
This paper describes our system for SemEval-2021 Task 5 on Toxic Spans Detection. We developed ensemble models using BERT-based neural architectures and post-processing to combine tokens into spans. We evaluated several pre-trained language models using various ensemble techniques for toxic span identification and achieved sizable improvements over...
Chapter
The COVID-19 pandemic has had a huge impact on various areas of human life. Hence, the coronavirus pandemic and its consequences are being actively discussed on social media. However, not all social media posts are truthful. Many of them spread fake news that cause panic among readers, misinform people and thus exacerbate the effect of the pandemic...
Chapter
The ability to automatically determine the age audience of a novel provides many opportunities for the development of information retrieval tools. Firstly, developers of book recommendation systems and electronic libraries may be interested in filtering texts by the age of the most likely readers. Further, parents may want to select literature for...
Preprint
Full-text available
The COVID-19 pandemic has had a huge impact on various areas of human life. Hence, the coronavirus pandemic and its consequences are being actively discussed on social media. However, not all social media posts are truthful. Many of them spread fake news that cause panic among readers, misinform people and thus exacerbate the effect of the pandemic...
Preprint
Full-text available
We present an approach to topical classification of biographical text fragments that takes into account the nearest context of classified fragments using a neural network with several inputs. The choice of the model architecture is based on the assumption that since texts written in a natural language differ in consistency and coherence, the contex...
Preprint
Full-text available
The authors hypothesize that the textual information posted on personal pages on social media reflects the political views of users to some extent. Therefore, this textual information can be used to predict political views on social media. The authors conduct experiments on textual data from user pages and test two machine learning methods to class...
Preprint
Full-text available
The ability to automatically determine the age audience of a novel provides many opportunities for the development of information retrieval tools. Firstly, developers of book recommendation systems and electronic libraries may be interested in filtering texts by the age of the most likely readers. Further, parents may want to select literature for...
Preprint
Full-text available
The article describes a fast solution to propaganda detection at SemEval-2020 Task 11, based on feature adjustment. We use per-token vectorization of features and a simple Logistic Regression classifier to quickly test different hypotheses about our data. We come up with what seems to us the best solution, however, we are unable to align it with the r...
Preprint
Full-text available
The authors compared oversampling methods for the problem of multi-class topic classification. The SMOTE algorithm underlies one of the most popular oversampling methods. It consists in choosing two examples of a minority class and generating a new example based on them. In the paper, the authors compared the basic SMOTE method with its two modific...
Conference Paper
Full-text available
The addressee plays a major role in communication. Creating a text involves taking into account the features of the target audience that the author addresses. In this article, text addressee detection is considered from the point of view of natural language processing. The task of age classification deserves special attention. Its relevance...
Article
The paper presents the results of evaluating the informative value of quantitative and binary features for solving the problem of finding semantically close sentences (paraphrases). Three types of features are considered in the article: those built on vector representations of words (according to the Word2Vec model), based on the extraction of numbers and s...
Chapter
In recent years, there has been an increasing interest in digital humanities. This interest is justified by the development of natural language processing tools and the emergence of digitized text collections of documents in different fields of knowledge, for example, literature, art, philosophy, and history. In this paper, we applied unsupervised...
Chapter
Figurative speech is an umbrella term for metaphor, irony, sarcasm, puns and some other speech genres and figures of speech. In research and competitions like SemEval, each of them is usually processed separately with a task-specific model. However, being altogether called “figurative speech”, they should share some property: “figurativeness”. If s...
Article
Full-text available
The search and classification of text documents are used in many practical applications. These are the key tasks of information retrieval. Methods of text searching and classifying are used in search engines, electronic libraries and catalogs, systems for collecting and processing information, online education and many others. There are a large num...
Article
Full-text available
Objectives: The aim is to compare the efficiency of using the Euclidean and Mahalanobis metrics to solve the problem of determining the category of potential text recipients. The relevance of the task is determined by the need to develop a means of identifying the recipients of electronic documents. This has been complicated with the introduction of...
Article
Full-text available
The article deals with a new approach to text classification that accounts for different types of classification features (binary, nominal, ordinal and interval). The distinctive feature of the approach is a phased classification process, which makes it possible to avoid reducing different types of classification features to a single range. The author...
