Valery D Solovyev

Kazan (Volga Region) Federal University · Institute of Computer Mathematics and Information Technologies

professor

About

101
Publications
22,473
Reads
484
Citations


Publications (101)
Article
Concrete/abstract words are used in a growing number of psychological and neurophysiological studies. For a few languages, large dictionaries have been created manually. This is a very time-consuming and costly process. To generate large high-quality dictionaries of concrete/abstract words automatically, one needs to extrapolate the expert assessmen...
Article
Full-text available
The problem of determining semantic similarity between words affects the understanding of synonymy and creates obstacles to the work of lexicographers. The study was carried out as a part of a larger research project on expert assessment of synonymic rows in RuWordNet thesaurus (a WordNet-like thesaurus for the Russian language). The aim of this s...
Chapter
The article presents an analysis of analogues in the RuWordNet thesaurus with regard to their semantic distance. Consideration of analogues is by nature a comprehensive analysis for identifying possible routes for each pair of words, applying both traditional linguistic methods and approaches of cognitive and corpus linguistics. The lack o...
Chapter
Full-text available
In recent years, several scientific disciplines – linguistics, psychology, psycholinguistics, neurophysiology, medicine, philosophy, education – have paid considerable attention to studying the concreteness/abstractness concept. Existing reviews on this topic cover only one area of research. This article provides an interdisciplinary overview from...
Conference Paper
The article presents findings on distribution patterns of Russian grammatical categories computed with the help of MyStem.3 tagger and a proprietary Russian language processor, ETAP-3. The corpus of over 1.1 million tokens compiled for the study comprises two types of academic texts used in Russian schools: Science textbooks and Humanities textbooks. W...
Article
Full-text available
Large dictionaries of abstract/concrete words were compiled for several languages by interviewing native speakers. The Russian dictionary contains only one thousand words. This article proposes a new method for automatic generation of a large (tens of thousands of words) Russian dictionary of abstract/concrete words by using neural networks trained...
Article
The purpose of this study is to survey the correlation and association coefficients introduced previously on the set of binary n-tuples and to determine coefficients satisfying the properties of correlation functions. These functions were recently introduced on the sets with involutive operation as functions generalizing classical correlation coeff...
Chapter
The study explores the problem of assessing complexity of Russian educational texts. In this paper, we focus on measuring conceptual complexity which is rarely selected as a research question and propose to use a thesaurus (or a linguistic ontology) to this end. We also compiled an original corpus of school textbooks on Social Studies, History used...
Chapter
The article presents a new method implemented by the authors to generate dictionaries of concrete/abstract words for Russian. The method, based on pretrained word embeddings, computes concreteness ranking defined as a function of similarity between word vectors and the distance between a word in question and the ‘seed’ of concrete/abstract words. Imple...
Chapter
The article discusses representativeness of Google Books Ngram as a multi-purpose corpus. Criticism of the corpus is analysed and discussed. A comparative study of the GBN data and the data obtained using the Russian National Corpus and the General Internet Corpus of Russian is performed to show that the Google Books Ngram corpus can be successfull...
Article
Full-text available
Determination of markers of systemic inflammation is one of the important directions in the study of pathogenesis and improvement of diagnosis of chronic obstructive pulmonary disease (COPD), asthma-COPD overlap (ACO), and bronchial asthma (BA). The aim of our work was a comparative study of the features of changes in serum levels of IL-17, IL-18,...
Article
Creation of dictionaries of abstract and concrete words is a well-known task. Such dictionaries are important in several applications of text analysis and computational linguistics. Usually, the process of assembling concreteness scores for words begins with a lot of manual work. However, the process can be automated significantly using informat...
Article
Full-text available
The recently created RuWordNet thesaurus reflects hierarchical, primarily synonymous, relations in the vocabulary so that it is built from a set of synsets. In this paper, we applied the method of a comparative analysis: we compared the data of classical dictionaries of Russian language synonyms, RuWordNet thesaurus and the results of a students' p...
Chapter
The article proposes a method for detecting semantic change using diachronic corpora data. The method is based on the distributional hypothesis. The analysis is performed using frequencies of syntactic bigrams from the English and Russian sub-corpora of Google Books Ngram. To obtain the word co-occurrence profile in its new meaning, syntactic bigra...
Chapter
This article explores the principles of synsets in the RuWordNet thesaurus and synonyms in the classical dictionaries of Russian synonyms (N = 10) to identify discrepancies and improve the principles of organising synsets in RuWordNet. The relevance of the study is determined by the demand for WordNet resources in natural language processing tasks....
Article
Full-text available
It was previously shown that the WordNet thesaurus has a small-world structure. We obtained a similar result for RuWordNet, a recently created thesaurus of the Russian language, and determined the main characteristics of the network of semantic relations in the Russian language. They are the average length of the path between the vertices, the maxi...
Chapter
The article describes general regularities in the frequency dynamics of syntactic bigrams and the method used to analyse them. The objective of the work is to quantitatively estimate the typical rate of change in frequency of syntactic bigrams in English and Russian. Both changes in frequency of words contained in syntactic bigrams and changes in the co-occur...
Article
Education policy makers view measuring the readability of academic texts and profiling classroom textbooks as a primary task of education management aimed at sustaining the quality of reading programs. As Russian readability metrics, i.e. “objective” features of texts determining their complexity for readers, are still a research niche, we undertook a comparati...
Chapter
Traditionally, it is believed in linguistics that the center of any semantic field is more stable than the periphery. Quantitative testing of this hypothesis has become possible due to creation of large diachronic text corpora. The article describes the results of quantitative analysis of “central elements” (semantic dominants) of 82 synonymic sets...
Article
Full-text available
Soluble molecules of the major histocompatibility complex play an important role in the development of various immune-mediated diseases. However, there is not much information on the participation of these proteins in the pathogenesis of chronic obstructive pulmonary disease (COPD). The aim of our work was to determine the content of soluble molecu...
Chapter
Full-text available
The authors of the article offer new readability formulas for academic texts which provide a comparatively higher degree of accuracy than other Russian readability formulas. The results achieved are due to using original syntactic, lexical and frequency metrics ignored in previous research on Russian readability. The methods applied by the authors...
Conference Paper
We introduce new correlation measures for measuring similarity and association of rating profiles obtained from bipolar rating scales. Instead of the measurement-based approach, in which the user’s rating is considered as a number measured on ordinal, interval or ratio scales, we use a model-based approach, in which the user’s rating is modeled by bipolar score fun...
Chapter
Full-text available
In this paper, the problem of fake accounts in online social networks is addressed through the lens of resulting misstatements of the structure of network interactions between users. The study of a network as a social space becomes difficult because of additional noise created by fakes.
Article
Full-text available
In this paper we explore to what extent text parameters, such as average number of words per sentence, syllables per word, nouns per sentence, frequency of content words, etc. can successfully rank Russian academic texts for different age and grade levels. We provide a brief overview of previous research on readability of Russian texts and describe...
Article
Full-text available
Bronchial asthma (BA) is often associated with chronic inflammatory processes in the nasal mucosa; these processes give rise to allergic rhinitis, chronic rhinosinusitis, adenoiditis, and polypous rhinosinusitis. Due to their multiple symptoms, these diseases of the upper respiratory tract, especially allergic rhinitis, are often difficult to verif...
Article
Google Books Ngram was used to assess changes in frequency of usage in words corresponding to collectivistic and individualistic values in Russia during the time of economic changes. It was found that in many domains transition to market economy was associated with a rise in the use of words corresponding to individualistic values and a decrease in...
Article
Full-text available
A bipolar rating scale is a linearly ordered set with symmetry between elements considered as negative and positive categories. First, we present a survey of bipolar rating scales used in psychology, sociology, medicine, recommender systems, opinion mining, and sentiment analysis. We discuss different particular cases of bipolar scales and, in part...
Article
The necessity to transfer from the paper-based Dialectological Atlas of the Russian Language, published 30 years ago, to an electronic database is substantiated. This would make the information contained in the atlas available to a large number of users and ensure on-the-fly information retrieval, isogloss plotting, and finding dialects with a spec...
Article
Full-text available
We propose methods of 3D visualization of the main similarity measures for binary data and 2 × 2 tables. We present the shapes of Jaccard, Dice, Sokal & Sneath, Rogers & Tanimoto and other similarity measures. Such visualization of the similarity measures provides a direct, visual method of comparing these measures and helps to understand the si...
Article
Full-text available
Studies of the overall structure of vocabulary and its dynamics became possible due to creation of diachronic text corpora, especially Google Books Ngram. This article discusses the question of core change rate and the degree to which the core words cover the texts. Different periods of the last three centuries and six main European languages prese...
Article
Full-text available
Background/Objectives: The article regards the areal community of the Caucasian languages aiming to reveal relevant features of each family and suggest hypotheses on the development of separate representatives of each family. Methods/Statistical Analysis: The main tool for the research is the database “Languages of the World” of Institute of Lingu...
Article
Full-text available
Background/Objectives: The article regards the areal community of the Caucasian languages aiming to reveal relevant features of each family and suggest hypotheses on the development of separate representatives of each family. Methods/Statistical Analysis: The main tool for the research is the database "Languages of the World" of Institute of Lingui...
Article
Full-text available
Automatic event extraction from text is an important step in knowledge acquisition and knowledge base population. Manual work in the development of an extraction system is indispensable either in corpus annotation or in vocabularies and pattern creation for a knowledge-based system. Recent works have been focused on adaptation of existing system (for extr...
Article
Full-text available
One of the major issues in time-series classification is the choice of similarity measure. This article presents a comparative analysis of the similarity measure for time series based on moving approximations transform (MAP transforms) with the two other most widely used measures: Algorithm Dynamic Transformation and Euclidean distance for...
Conference Paper
The study compares the dynamics of a number of important socio-demographic parameters with the frequency of the use of keywords for Russian society. We have shown that the frequency of words typical for the patriarchal way of life of Russian society in the 19th century decreases with the reduction of rural population. The paper discusses the intera...
Article
The article observes the evolution of lexical meanings for such words as 'knyazych / knyazhna', 'boyarich / boyaryshnya', 'prince / princess' and 'crown prince / crown princess' through the prism of historical change in different periods of Russian language world picture development. In the introduction, we note that the study of social semantics l...
Article
The intensive development of computational linguistics in recent years shows that linguistic theories can be quite useful in solving various problems associated with computer technology. In particular, we mean information retrieval and the recognition of scanned text or spam. In turn, such technologies can be useful for solving purely linguistic tasks. This article exem...
Article
Full-text available
The frequency with which we use different words changes all the time, and every so often, a new lexical item is invented or another one ceases to be used. Beyond a small sample of lexical items whose properties are well studied, little is known about the dynamics of lexical evolution. How do the lexical inventories of languages, viewed as entire sy...
Conference Paper
Phylogenetic algorithms are a tool that is frequently used in biology and linguistics for reconstruction of the evolution trees for species or languages. However, there is no definitively superior algorithm: various algorithms have shown the best results in various studies. In this paper we test the four most popular algorithms. We make recommendations...
Conference Paper
Full-text available
This paper describes a system for problem phrase extraction from texts that contain users' reviews of products. In contrast to recent works, this system is based on dictionaries and heuristics, not machine learning algorithms. We explored two approaches to dictionary construction: manual and automatic. We evaluated the system on a dataset constru...
Article
Full-text available
The paper provides a survey of semantic methods for solution of fundamental tasks in mathematical knowledge management. Ontological models and formalisms are discussed. We propose an ontology of mathematical knowledge, covering a wide range of fields of mathematics. We demonstrate applications of this representation in mathematical formula search,...
Conference Paper
The paper introduces new time series shape association measures based on Euclidean distance. The method of analysis of associations between time series based on separate analysis of positively and negatively associated local trends is discussed. The examples of application of the proposed measures and methods to analysis of associations between his...
Article
A survey of the key approaches to the semantic processing of mathematical texts is presented. A software platform prototype for the electronic storage of mathematical documents, which is based on the linked open-data (LOD) model and uses semantic information for data management, including formula-fragment searching, is proposed. The analysis of mat...
Conference Paper
Full-text available
The article contains the results of the first stage of a research and development project aimed at creating a new generation of intelligent systems for semantic text analysis. The main principles, system architecture, and task list are described. The features of the cloud and cluster architecture implementation are considered as well.
Conference Paper
Full-text available
Using event indicators is a well-known approach for event extraction. However, in most cases, event indicators are represented as single isolated words. In this paper, we deal with composite event indicators consisting of two or more words. Composite indicators are crucial to track modality when extracting events (e.g., possible events, desirable...
Conference Paper
Full-text available
Download database "Languages of the World" of IL RAS (version from spring 2013): https://cloud.mail.ru/public/KEhw/paQ35qYAi The article is dedicated to the largest digital resource in the world that contains a uniform description of language grammars – typological database "Languages of the World" ("Jazyki Mira"). There is information on the co...
Conference Paper
Full-text available
Current research efforts in Named Entity Recognition deal mostly with the English language. Even though the interest in multi-language Information Extraction is growing, there are only a few works reporting results for the Russian language. This paper introduces quality baselines for the Russian NER task. We propose a corpus which was manually annota...
Technical Report
This paper describes a part of the event extraction system which has been developed in collaboration with HP Labs Russia. The domain of input texts is business news feeds. One of the most important event participant types is 'Organization'. This paper is focused on the problem of organization names recognition in Russian news texts. Two approaches...
Article
Full-text available
The article describes the original method of creating a dictionary of abbreviations based on the Google Books Ngram Corpus. The dictionary of abbreviations is designed for Russian, yet as its methodology is universal it can be applied to any language. The dictionary can be used to define the function of the period during text segmentation in variou...
Conference Paper
Full-text available
In this paper we describe a methodology for building information extraction (IE) rules. Rules are usually developed by experts and are widely used in knowledge-based IE systems. They consist of two parts: the left-hand side (LHS) of a rule is a template that matches a certain syntactico-semantic structure (SSS) and the right-hand side is an action th...
Article
Full-text available
The dynamics of average word length in Russian and English are analysed in the article. Words belonging to the diachronic text corpus Google Books Ngram and dating back to the last two centuries are studied. It was found that average word length slightly increased in the 19th century, and then it was growing rapidly most of the 20th century and s...
Conference Paper
Full-text available
Even though the Linking Open Data cloud is constantly growing, there is a serious lack of published data sets related to the domain of academic mathematics. At the same time, since most scholarly publications in mathematics are well-structured and conventional, it's promising to get their helpful detailed representation. The paper describes an appr...
Article
Full-text available
We test two hypotheses relevant to the form-meaning relationship and offer a methodological contribution to the empirical study of near-synonymy within the framework of cognitive linguistics. In addition, we challenge implicit assumptions about the nature of the paradigm, which we show is skewed in favor of a few forms that are prototypical for a g...
Article
Full-text available
The paper's primary concern is to address the usage of WALS through comparing it with another typological database of similar scope, Jazyki Mira. Such a comparison is carried out based on a set of criteria. In Section 2, the scope of the databases is compared, as well as their differences and similarities in structure, the number of errors, and in...
Book
Full-text available
Computer models and methods in Typology and Comparative Linguistics. Vladimir Polyakov and Valery Solovyev. The book by V.N. Polyakov and V.D. Solovyev «Computer models and methods in Typology and Comparative Linguistics» is devoted to the detailed description of the database «Languages of the World» and the ways of its usage. The database is one...
Article
Full-text available
A new predicate-argument relation is introduced in this paper. Some arguments of the verb are distinguished as central to the base of surface marking. Information transfers from arguments to the verb are critical items of distinguishment; central arguments can be found in every language. While offering information transfers and centers classificati...
Conference Paper
Spaces of word attributes are investigated for different semantic classes of verbs. These spaces are designated to collate the semantics of verbs with close meaning. It is an alternative to dynamic maps proposed in [1].
Article
We pose the problem of studying the structure of distribution of information in infinite sequences. To solve this problem, we suggest an approach based on restoring the whole sequence from its subsequence. To realize this approach, we introduce the needed apparatus, in particular, the notions of rigid and densely packed sequences which characterize the...