About
Publications: 16
Reads: 5,007
Citations: 194
Introduction
Additional affiliations
September 2007 - present
January 2003 - February 2015
September 1994 - present
Publications (16)
The detection of human values, beliefs or tonality in large text collections, e.g. publications in social networks, requires ML algorithms and interdisciplinary expertise. Narratives and worldviews can be uncovered via a context-dependent information markup. The formalization is achieved by representing the markup as a hypergraph model. Here...
This article proposes a new approach to building topic models on unbalanced collections, based on existing methods and our experiments with them. Real-world data collections contain topics in various proportions, and often documents from a relatively small topic become distributed all over the larger topics instead o...
Topic modeling is an actively developing field in the statistical analysis of texts [1]. A probabilistic topic model identifies the topics of a text collection, describing each topic by a discrete distribution over a set of words and each document by a discrete distribution over a set of topics. Topic models are used for information search, classificati...
In this paper we introduce a generalized learning algorithm for probabilistic topic models (PTM). Many known and new algorithms for PLSA, LDA, and SWB models can be obtained as its special cases by choosing a subset of the following "options": regularization, sampling, update frequency, sparsing and robustness. We show that a robust topic model, wh...
We propose a combinatorial technique for obtaining tight data-dependent generalization bounds based on a splitting and connectivity graph (SC-graph) of the set of classifiers. We apply this approach to a parametric set of conjunctive rules and propose an algorithm for effective SC-bound computation. Experiments on 6 data sets from the UCI ML Reposi...
Over the last decade learning to rank (L2R) has gained a lot of attention and many algorithms have been proposed. One of the most successful approaches is to build an algorithm following the ensemble principle. Boosting is the key representative of this approach. However, even boosting isn't effective when used to increase the performance of individu...
Three general methods for obtaining exact bounds on the probability of overfitting are proposed within statistical learning theory: a method of generating and destroying sets, a recurrent method, and a blockwise method. Six particular cases are considered to illustrate the application of these methods. These are the following model sets of predicto...
A combinatorial approach is developed that leads to tight bounds for the probability of overfitting in a number of special cases. The classical Vapnik–Chervonenkis bound is easy to restate under the weak probability assumptions where △ is the diversity coefficient of A, which is equal to the number of different error vectors generated by all possi...
It is shown that computationally tight bounds for the probability of overfitting can be obtained only by simultaneous consideration of the following two properties of classifier sets: splitting into error levels and similarity of classifiers. For a set consisting of only two classifiers, an exact bound is obtained for the probability of overfitting...
Accurate prediction of the generalization ability of a learning algorithm is an important problem in computational learning theory. The classical Vapnik-Chervonenkis (VC) generalization bounds are too general and therefore overestimate the expected error. Recently obtained data-dependent bounds are still overestimated. To find out why the bounds ar...
2 CC RAS, Moscow, voron@ccas.ru; 3 MIPT, Moscow, vleksin@mail.ru. The symmetric EM algorithm is proposed for probabilistic latent semantic analysis in collaborative filtering. The algorithm makes it possible to reveal the latent interest profiles of both users and items, and then to easily construct high-quality similarity measures of all required types: user–user,...
A new optimization technique is proposed for classifier fusion: Cooperative Coevolutionary Ensemble Learning (CCEL). It is based on a specific multipopulation evolutionary algorithm, cooperative coevolution. It can be used as a wrapper over any kind of weak algorithms, learning procedures and fusion functions, for both classification and...
Cross-validation functionals and their upper bounds are considered that characterize the generalization performance of learning algorithms. The initial data are not assumed to be independent, identically distributed (i.i.d.) or even to be random. The effect of localization of an algorithm family is described, and the concept of a local growth fun...
This paper describes special-purpose optimization methods for constructing correct algorithms based on the algebraic approach to the recognition problem. A general method for synthesizing problem-oriented bases for recognition and prediction problems is suggested; this method reduces constructing such bases to a sequence of classical optimization p...
Combinatorial cross-validation functionals that characterize the generalization performance of learning algorithms are considered. Upper bounds are derived that are tighter than those in the Vapnik–Chervonenkis statistical theory. The initial data set is not assumed to be independent, identically distributed, or even random. The effect of localiz...
Preliminary data processing for recognition problems in which the values of one of the indices are separable functions of two variables, given on a rectangular mesh, is considered. Algorithms by means of which structural descriptions of these functions, satisfying uniqueness and stability conditions with respect to errors of the initial data, can b...
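
Illustrative note on the topic modeling abstract in the list above: it describes each topic as a discrete distribution over words and each document as a discrete distribution over topics. A minimal sketch of that structure follows. It is not the authors' algorithm: the names phi, theta and word_distribution and all numeric values are hypothetical, chosen only to show how the two families of distributions combine into a per-document word distribution, p(w | d) = sum over t of p(w | t) p(t | d).

```python
# Toy illustration only: all names and numbers are assumptions, not from the source.
# phi[t][w]   = p(w | t): each topic is a discrete distribution over words.
# theta[d][t] = p(t | d): each document is a discrete distribution over topics.
vocab = ["model", "topic", "bound", "error"]

phi = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0 puts its mass on "model" and "topic"
    [0.0, 0.0, 0.5, 0.5],  # topic 1 puts its mass on "bound" and "error"
]

theta = [
    [1.0, 0.0],    # document 0 is entirely about topic 0
    [0.25, 0.75],  # document 1 mixes both topics
]


def word_distribution(doc_topics, topics):
    """Return p(w | d) = sum_t p(w | t) * p(t | d) for one document."""
    n_words = len(topics[0])
    return [
        sum(doc_topics[t] * topics[t][w] for t in range(len(topics)))
        for w in range(n_words)
    ]


# One word distribution per document, each summing to 1.
p_w_d = [word_distribution(doc, phi) for doc in theta]

for d, dist in enumerate(p_w_d):
    print(d, dict(zip(vocab, dist)))
```

Fitting phi and theta to an actual collection (for example with an EM procedure, as in PLSA) is a separate step that this sketch does not attempt.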