December 2019
·
13 Reads
Automatic Control and Computer Sciences
April 2019
·
33 Reads
·
3 Citations
November 2017
·
12 Reads
·
4 Citations
November 2017
·
25 Reads
·
1 Citation
January 2017
·
76 Reads
·
4 Citations
Modeling and Analysis of Information Systems
The main purpose of the article is to analyze how effectively different types of thesaurus relations can be used to solve text classification tasks. The study is based on an automatically generated thesaurus of a subject area that contains three types of relations: synonymous, hierarchical, and associative. To generate the thesaurus, the authors use a hybrid method based on several linguistic and statistical algorithms for extracting semantic relations. The method makes it possible to create a thesaurus with a sufficiently large number of terms and relations among them. The authors consider two problems: topical text classification and sentiment classification of large newspaper articles. To solve them, the authors developed two approaches that complement standard algorithms with a procedure that takes thesaurus relations into account to determine semantic features of texts. The approach to topical classification combines the standard unsupervised BM25 algorithm with a procedure that takes into account the synonymous and hierarchical relations of the subject-area thesaurus. The approach to sentiment classification consists of two steps. In the first step, a thesaurus is created whose term polarity weights are calculated from the term occurrences in the training set or from the weights of related thesaurus terms. In the second step, the thesaurus is used to compute the features of words from texts and to classify the texts with the SVM or Naive Bayes algorithm. In experiments with the text corpora BBCSport, Reuters, PubMed, and a corpus of articles about American immigrants, the authors varied the types of thesaurus relations involved in the classification and the degree of their use. The results of the experiments make it possible to evaluate how effectively thesaurus relations can be applied to the classification of raw texts and to determine the conditions under which particular relations have a greater or lesser effect. In particular, the most useful thesaurus relations are synonymous and hierarchical, as they provide better classification quality.
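The topical classification idea in the abstract (BM25 complemented by synonymous and hierarchical thesaurus relations) can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the relation weights `syn_weight` and `hier_weight`, the dictionary layout of the thesaurus, and the toy data are all assumptions made for illustration. Each topic is described by a few seed terms, the seed terms are expanded through the thesaurus, and each document is assigned to the topic whose expanded term set gives it the highest BM25 score.

```python
import math
from collections import Counter


def expand_with_thesaurus(terms, synonyms, broader, syn_weight=0.8, hier_weight=0.5):
    """Return {term: weight}; seed terms get 1.0, related terms get assumed weights."""
    expanded = {t: 1.0 for t in terms}
    for t in terms:
        for s in synonyms.get(t, ()):   # synonymous relations
            expanded[s] = max(expanded.get(s, 0.0), syn_weight)
        for h in broader.get(t, ()):    # hierarchical relations
            expanded[h] = max(expanded.get(h, 0.0), hier_weight)
    return expanded


def bm25_scores(weighted_terms, docs, k1=1.5, b=0.75):
    """Score every tokenized document against weighted query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    df = Counter()                      # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term, weight in weighted_terms.items():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avg_len))
            score += weight * idf * norm
        scores.append(score)
    return scores


def classify(docs, topics, synonyms, broader):
    """Assign each document to the topic with the highest thesaurus-expanded BM25 score."""
    per_topic = {
        name: bm25_scores(expand_with_thesaurus(terms, synonyms, broader), docs)
        for name, terms in topics.items()
    }
    return [max(per_topic, key=lambda name: per_topic[name][i]) for i in range(len(docs))]


# Toy data (hypothetical): the first document never mentions the seed term "soccer",
# so it only matches its topic through the synonym relation.
docs = [["football", "match", "goal"], ["tennis", "racket", "serve"]]
topics = {"football": ["soccer"], "tennis": ["tennis"]}
synonyms = {"soccer": ["football"]}
broader = {}
labels = classify(docs, topics, synonyms, broader)
```

Setting `syn_weight` or `hier_weight` to zero, or passing empty relation dictionaries, is one simple way to vary which relation types are involved and how strongly they are used, in the spirit of the experiments described above.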
November 2016
·
30 Reads
·
5 Citations
The paper is devoted to the analysis of methods that can be used for the automatic generation of specialized thesauri. The authors developed a test bench for evaluating the most popular relation extraction methods, which constitute the main part of such generation. Based on experiments conducted on the test bench, the authors propose hybrid thesaurus generation methods that combine the algorithms that showed the best performance. The efficiency of this approach is illustrated by creating a thesaurus for the medical domain and then evaluating it on the test bench.
... Sentiments can be broadly characterized as positive, negative, and neutral. Studies [17], [26] have shown that core sentiment classes are apt for short textual data, such as user tweets. The analysis of user emotions is another subcategory that comprises classes of distinct people's emotions, such as fear, anxiety, joy, happiness, and disgust. ...
April 2019
... Thesauri are typically used, for example, for literature searches where they can support search extensions or suggest more restricted search terms. They are also used for various machine learning tasks, such as automatic indexing, sentiment classification, and coordination between datasets and models [15,16]. ...
November 2017
... It is of interest to study texts tonality classification methods based on machine learning and tonality dictionaries using support vectors and Bayesian classifier [14,15]. Various statistical characteristics are used: TF-IDF, mutual information, Gini coefficient, Kullback-Leibler distance, χ²-criterion, etc. [16]. Associative connectivity measures and their effectiveness are investigated when calculating the strength of connectivity of the word combinations components within bigrams and trigrams. ...
January 2017
Modeling and Analysis of Information Systems
... The (semi-)automatic creation of thesauri has been actively investigated in recent years. Some researchers aimed to generate thesauri from textual resources using a range of techniques, such as natural language processing algorithms (Lagutina et al., 2016), Siamese Networks (Dhaliwal et al., 2021), statistical methods (Liebeskind et al., 2019), and other mathematical models (Volkovskiy et al., 2019). Others exploited (semi-)structured sources for this task, such as websites (Chen et al., 2003) and Wikipedia data (Nakayama et al., 2007). ...
November 2016