Ivan Shchitov’s research while affiliated with Yaroslavl State University and other places


Publications (6)


Analysis of Influence of Different Relations Types on the Quality of Thesaurus Application to Text Classification Problems
  • Article

December 2019 · 13 Reads

Automatic Control and Computer Sciences

N. S. Lagutina · K. V. Lagutina · I. A. Shchitov · I. V. Paramonov




Table 1. Topical classification of the BBCSport corpus
Analysis of Influence of Different Relations Types on the Quality of Thesaurus Application to Text Classification Problems
  • Article
  • Full-text available

January 2017 · 76 Reads · 4 Citations

Modeling and Analysis of Information Systems

The main purpose of the article is to analyze how effectively different types of thesaurus relations can be used to solve text classification tasks. The study is based on an automatically generated thesaurus of a subject area that contains three types of relations: synonymous, hierarchical, and associative. To generate the thesaurus, the authors use a hybrid method based on several linguistic and statistical algorithms for extracting semantic relations. The method makes it possible to create a thesaurus with a sufficiently large number of terms and relations among them. The authors consider two problems: topical text classification and sentiment classification of large newspaper articles. To solve them, the authors developed two approaches that complement standard algorithms with a procedure that takes thesaurus relations into account to determine semantic features of texts. The approach to topical classification combines the standard unsupervised BM25 algorithm with a procedure that takes into account the synonymous and hierarchical relations of the subject-area thesaurus. The approach to sentiment classification consists of two steps. In the first step, a thesaurus is created in which the polarity weight of each term is calculated either from the term's occurrences in the training set or from the weights of related thesaurus terms. In the second step, the thesaurus is used to compute the features of words in the texts and to classify the texts with the SVM or Naive Bayes algorithm. In experiments with the BBCSport, Reuters, and PubMed text corpora and a corpus of articles about American immigrants, the authors varied the types of thesaurus relations involved in the classification and the degree of their use. The results of the experiments make it possible to evaluate how efficiently thesaurus relations can be applied to the classification of raw texts and to determine under what conditions particular relations have a greater or lesser effect. In particular, the most useful thesaurus relations are synonymous and hierarchical, as they provide better classification quality.
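The topical classification idea in this abstract (unsupervised BM25 complemented by thesaurus relations) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary layout of the thesaurus, the topic seed terms, and the BM25 parameters `k1` and `b` are all assumptions chosen for illustration. The sketch expands each topic's seed terms with related thesaurus terms and assigns a document to the topic with the highest BM25 score.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against query_terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    terms = set(query_terms)
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in terms:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def expand_with_thesaurus(terms, thesaurus, relations):
    """Add terms linked to the seed terms by the selected relation types.

    `thesaurus` is a hypothetical layout: term -> {relation type -> [terms]}.
    """
    expanded = set(terms)
    for term in terms:
        for relation in relations:
            expanded.update(thesaurus.get(term, {}).get(relation, []))
    return sorted(expanded)


def classify(docs, topic_terms, thesaurus, relations=("synonym", "hierarchical")):
    """Assign each document the best-scoring topic, or None if nothing matches."""
    topics = list(topic_terms)
    per_topic = [
        bm25_scores(expand_with_thesaurus(topic_terms[t], thesaurus, relations), docs)
        for t in topics
    ]
    labels = []
    for j in range(len(docs)):
        best = max(range(len(topics)), key=lambda i: per_topic[i][j])
        labels.append(topics[best] if per_topic[best][j] > 0 else None)
    return labels


# Toy data, invented for illustration only.
docs = [
    ["striker", "scored", "goal"],
    ["bowler", "took", "wicket"],
    ["keeper", "saved", "penalty"],
]
topic_terms = {"football": ["goal"], "cricket": ["wicket"]}
thesaurus = {"goal": {"synonym": [], "hierarchical": ["penalty"]}}

print(classify(docs, topic_terms, thesaurus, relations=()))
print(classify(docs, topic_terms, thesaurus))
```

In this toy setup, the third document matches no seed term on its own and is only labeled once the hierarchical relation from "goal" to "penalty" is used, which is the kind of effect the experiments vary by switching relation types on and off.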


Analysis of relation extraction methods for automatic generation of specialized thesauri: Prospect of hybrid methods

November 2016 · 30 Reads · 5 Citations

The paper analyzes methods that can be used for automatic generation of specialized thesauri. The authors developed a test bench for evaluating the most popular relation extraction methods, which form the main part of such generation. Based on experiments conducted on the test bench, the authors propose hybrid thesaurus generation methods that combine the algorithms that showed the best performance. The efficiency of this idea is illustrated by creating a thesaurus for the medical domain and then evaluating it on the test bench.

Citations (4)


... Sentiments can be broadly characterized as positive, negative, and neutral. Studies [17], [26] have shown that core sentiment classes are apt for short textual data, such as user tweets. The analysis of user emotions is another subcategory that comprises classes of distinct people's emotions, such as fear, anxiety, joy, happiness, and disgust. ...

Reference:

Automated Classification of Societal Sentiments on Twitter With Machine Learning
Sentiment Classification into Three Classes Applying Multinomial Bayes Algorithm, N-Grams, and Thesaurus
  • Citing Conference Paper
  • April 2019

... Thesauri are typically used, for example, for literature searches where they can support search extensions or suggest more restricted search terms. They are also used for various machine learning tasks, such as automatic indexing, sentiment classification, and coordination between datasets and models [15,16]. ...

A survey on thesauri application in automatic natural language processing
  • Citing Conference Paper
  • November 2017

... It is of interest to study texts tonality classification methods based on machine learning and tonality dictionaries using support vectors and Bayesian classifier [14,15]. Various statistical characteristics are used: TF-IDF, mutual information, Gini coefficient, Kullback-Leibler distance, χ²-criterion, etc. [16]. Associative connectivity measures and their effectiveness are investigated when calculating the strength of connectivity of the word combinations components within bigrams and trigrams. ...

Analysis of Influence of Different Relations Types on the Quality of Thesaurus Application to Text Classification Problems

Modeling and Analysis of Information Systems

... The (semi-)automatic creation of thesauri has been actively investigated in recent years. Some researchers aimed to generate thesauri from textual resources using a range of techniques, such as natural language processing algorithms (Lagutina et al., 2016), Siamese Networks (Dhaliwal et al., 2021), statistical methods (Liebeskind et al., 2019), and other mathematical models (Volkovskiy et al., 2019). Others exploited (semi-)structured sources for this task, such as websites (Chen et al., 2003) and Wikipedia data (Nakayama et al., 2007). ...

Analysis of relation extraction methods for automatic generation of specialized thesauri: Prospect of hybrid methods