Gleb Gusev

  • Yandex

About

47
Publications
13,794
Reads
799
Citations
Current institution
Yandex

Publications (47)
Article
We study information dissemination in Twitter. We present an analysis of two important characteristics of so-called retweet cascades: retweet count and show count, i.e., the number of users that receive the tweet in their feed. We show that these two measures behave differently. We describe three models that aim to predict the audience size of a...
Conference Paper
We present a program that balances an overview of academic achievements in the field of online evaluation with unique practical industrial experience shared by leading researchers and engineers from global Internet companies. First, we give basic knowledge from mathematical statistics. This is followed by founda...
Preprint
We study the problem of aggregating noisy labels. Usually, it is solved by proposing a stochastic model for the process of generating noisy labels and then estimating the model parameters using the observed noisy labels. A traditional assumption underlying previously introduced generative models is that each object has one latent true label. In con...
Preprint
Full-text available
We study the problem of ranking from crowdsourced pairwise comparisons. Answers to pairwise tasks are known to be affected by the position of items on the screen; however, previous models for aggregation of pairwise comparisons do not focus on modeling this kind of bias. We introduce a new aggregation model factorBT for pairwise comparisons, whic...
Preprint
Development of the majority of the leading web services and software products today is generally guided by data-driven decisions based on evaluation that ensures a steady stream of updates, both in terms of quality and quantity. Large internet companies use online evaluation on a day-to-day basis and at a large scale. The number of smaller companie...
Thesis
Full-text available
Development of the majority of the leading web services and software products today is generally guided by data-driven decisions based on evaluation that ensures a steady stream of updates, both in terms of quality and quantity. Large Internet companies use online evaluation on a day-to-day basis and at a large scale. The number of smaller companie...
Article
Full-text available
While gradient boosting algorithms are the workhorse of modern industrial machine learning and data science, all current implementations are susceptible to a non-trivial but damaging form of label leakage. It results in a systematic bias in pointwise gradient estimates that leads to reduced accuracy. This paper formally analyzes the issue and presen...
Article
Nowadays, billions of people use the Web in connection with their daily needs. A significant part of these needs consists of search tasks that are usually addressed by search engines. Thus, daily search needs result in regular user engagement with a search engine. User engagement with web services has been studied in various aspects, but there ap...
Article
The Skip-Gram Negative Sampling (SGNS) word embedding model, well known for its implementation in the "word2vec" software, is usually optimized by stochastic gradient descent. However, the optimization of the SGNS objective can be viewed as a problem of searching for a good matrix under a low-rank constraint. The most standard way to solve this type of problem...
Conference Paper
State-of-the-art user engagement metrics (such as session-per-user) are widely used by modern Internet companies to evaluate ongoing updates of their web services via A/B testing. These metrics are predictive of companies' long-term goals, but suffer from this property due to slow user learning of an evaluated treatment, which causes a delay in the...
Article
Despite the growing importance of the multilingual aspect of web search, no appropriate offline metrics to evaluate its quality have been proposed so far. At the same time, personal language preferences can be regarded as intents of a query. This approach translates the multilingual search problem into a particular task of search diversification. Furthermore...
Article
Full-text available
In this paper, we consider a non-convex loss-minimization problem of learning Supervised PageRank models, which can account for some properties not considered by classical approaches such as the classical PageRank model. We propose gradient-based and random gradient-free methods to solve this problem. Our algorithms are based on the concept of an i...
Article
With the growth of user-generated content, we observe a constant rise in the number of companies, such as search engines and content aggregators, that operate on tremendous amounts of web content without being the services that host it. Thus, aiming to locate the most important content and promote it to users, they face the need of estimating...
Article
The cold start problem in Collaborative Filtering can be solved by asking new users to rate a small seed set of representative items or by asking representative users to rate a new item. The question is how to build a seed set that can give enough preference information for making good recommendations. One of the most successful approaches, called Repr...
Conference Paper
Nowadays, the development of most leading web services is controlled by online experiments that qualify and quantify the steady stream of their updates, achieving more than a thousand concurrent experiments per day. Despite the increasing need for running more experiments, these services are limited in their user traffic. This situation leads to the...
Article
Full-text available
We generalize the Abel–Ruffini theorem to arbitrary dimension, i.e. classify general square systems of polynomial equations solvable by radicals. In most cases, they reduce to systems whose tuples of Newton polytopes have mixed volume not exceeding 4. The proof is based on topological Galois theory, which ensures non-solvability by any formula invo...
Conference Paper
Full-text available
Relevance labels are an essential part of any learning-to-rank framework. The rapid development of crowdsourcing platforms has led to a significant reduction in the cost of manual labeling. This makes it possible to collect very large sets of labeled documents to train a ranking algorithm. However, relevance labels acquired via crowdsourcing are typica...
Conference Paper
This study introduces a novel feature selection approach CMICOT, which is a further evolution of filter methods with sequential forward selection (SFS) whose scoring functions are based on conditional mutual information (MI). We state and study a novel saddle point (max-min) optimization problem to build a scoring function that is able to identify...
Conference Paper
Full-text available
Online controlled experiments, e.g., A/B tests, are the state-of-the-art approach used by modern Internet companies to improve their services based on data-driven decisions. The most challenging problem is to define an appropriate online metric of user behavior, the so-called Overall Evaluation Criterion (OEC), which is both interpretable and sensitiv...
Conference Paper
Full-text available
It is well known that a great number of query–document features that significantly improve the quality of ranking for popular queries do not provide any benefit for new or rare queries, since there is typically not enough data associated with those queries to reliably compute the values of those features. It is a common p...
Conference Paper
Full-text available
Nowadays, the development of most leading web services is controlled by online experiments that qualify and quantify the steady stream of their updates. The challenging problem is to define an appropriate online metric of user behavior, so-called Overall Evaluation Criterion (OEC), which is both interpretable and sensitive. The state-of-the-art app...
Article
We study the stochastic multi-armed bandit problem with non-equivalent multiple plays where, at each step, an agent chooses not only a set of arms, but also their order, which influences reward distribution. In several problem formulations with different assumptions, we provide lower bounds for regret with standard asymptotics $O(\log{t})$ but nove...
Conference Paper
In this paper, we suggest a novel approach to studying user browsing behavior, i.e., the ways users get to different pages on the Web. Namely, we classified all user browsing paths leading to web pages into several types or browsing patterns. In order to define browsing patterns, we consider several important points of the browsing path: its origin...
Conference Paper
Modern Internet companies improve their services by means of data-driven decisions that are based on online controlled experiments (also known as A/B tests). Running more online controlled experiments and getting statistically significant results faster are emerging needs for these companies. The main way to achieve these goals is to improve the...
Conference Paper
Implicit feedback from users of a web search engine is an essential source providing consistent personal relevance labels from the actual population of users. However, previous studies on personalized search employ this source in a rather straightforward manner. Basically, documents that were clicked on get maximal gain, and the rest of the documen...
Conference Paper
Given a repeatedly issued query and a document with a not-yet-confirmed potential to satisfy the users' needs, a search system should place this document on a high position in order to gather user feedback and obtain a more confident estimate of the document utility. On the other hand, the main objective of the search system is to maximize expected...
Conference Paper
Full-text available
Nowadays, billions of people use the Web in connection with their daily needs. A significant part of these needs consists of search tasks that are usually addressed by search engines. Thus, daily search needs result in regular user engagement with a search engine. User engagement with web sites and services has been studied in various aspects, but...
Conference Paper
Full-text available
The task of discovering places of interest is a key step for many location-based recommendation tasks. In this paper we propose a fully unsupervised and parameter-free approach to deal with this problem based on the collection of geotagged photos. While previous papers are mostly devoted to discovering points (POI), we focus on areas of interest (A...
Conference Paper
Graph-based ranking plays a key role in many applications, such as web search and social computing. Pioneering methods of ranking on graphs (e.g., PageRank and HITS) computed ranking scores relying only on the graph structure. Recently proposed methods, such as Semi-Supervised PageRank, take into account both the graph structure and the metadata as...
Conference Paper
In this paper, we focus on crawling strategies for newly discovered URLs. Since it is impossible to crawl all the new pages right after they appear, the most important (or popular) pages should be crawled with a higher priority. One natural measure of page importance is the number of user visits. However, the popularity of newly discovered URLs can...
Conference Paper
Our work is devoted to Web revisitation patterns of individual users. Everybody revisits Web pages, but their reasons for doing so can differ. We analyzed Web interaction logs of millions of users to characterize how people revisit Web content. We revealed that each user has their own distribution of revisitation times. This distribution follows Power...
Conference Paper
With the increasing popularity of browser toolbars, the challenge of employing user behavior data stored in their logs grows in importance. The analysis of post-click search trails was shown to provide important knowledge about user experience, helpful for improving existing search systems. However, the utility of different trail properties for imp...
Conference Paper
With the ever-increasing speed of content turnover on the web, it is particularly important to understand the patterns that pages' popularity follows. This paper focuses on the dynamical part of the web, i.e. pages that have a limited lifespan and experience a short popularity outburst within it. We classify these pages into five patterns based on...
Conference Paper
In recent years, the problem of page authority computation based on user browsing behavior has attracted a lot of attention. However, the proposed methods have a number of limitations. In particular, they run on a single snapshot of a user browsing graph, ignoring the substantially dynamic nature of user browsing activity, which makes such methods...
Conference Paper
The BrowseRank algorithm and its modifications are based on analyzing users' browsing trails. Our paper proposes a new method for computing page importance using a more realistic and effective search-aware model of user browsing behavior than the one used in BrowseRank.
Conference Paper
Traditional link-based web ranking algorithms are applied to web snapshots in the form of webgraphs consisting of pages as vertices and links as edges. When constructing a webgraph, researchers do not pay attention to the particular method by which links are taken into account, while certain details may significantly affect the contribution of link-based fac...
Article
Full-text available
For a generic (polynomial) one-parameter deformation of a complete intersection, its monodromy zeta-function is defined. We provide explicit formulae for this zeta-function in terms of the corresponding Newton polyhedra in the case where the deformation is non-degenerate with respect to its Newton polyhedra. Using this result we obtain the formula...
Article
We classify general systems of polynomial equations with a single solution, or, equivalently, collections of lattice polytopes of minimal positive mixed volume. As a byproduct, this classification provides an algorithm to evaluate the single solution of such a system.
Conference Paper
Full-text available
Retweet cascades play an essential role in information diffusion in Twitter. Popular tweets reflect the current trends in Twitter, while Twitter itself is one of the most important online media. Thus, understanding the reasons why a tweet becomes popular is of great interest for sociologists, marketers and social media researchers. What is even more...
Article
Traditional link-based web ranking algorithms run on a single web snapshot without regard for the dynamics of web pages and links. In particular, the correlation between the freshness of web pages and their classic PageRank is negative (see [11]). For this reason, in recent years a number of authors have introduced algorithms of PageRank actualization. We intro...
Article
Full-text available
We consider the Buckley-Osthus implementation of preferential attachment and its ability to model the web host graph in two aspects. One is the degree distribution, which we observe to follow a power law, as is often the case for real-world graphs. The other is the two-dimensional edge distribution, the number of edges between vertices of give...
Article
Full-text available
For a one-parameter deformation of an analytic complex function germ of several variables, its monodromy zeta-function is defined. We give a Varchenko-type formula for this zeta-function if the deformation is non-degenerate with respect to its Newton diagram.
Article
Full-text available
Assume that the coefficients of a polynomial in a complex variable are Laurent polynomials in some complex parameters. The parameter space (a complex torus) splits into strata corresponding to different combinations of coincidence of the roots of the polynomial. For generic Laurent polynomials with fixed Newton polyhedra the Euler characteristics o...
