Article (PDF available)

Modeling item--item similarities for personalized recommendations on Yahoo! front page

Abstract and Figures

We consider the problem of algorithmically recommending items to users on a Yahoo! front page module. Our approach is based on a novel multilevel hierarchical model that we refer to as a User Profile Model with Graphical Lasso (UPG). The UPG provides personalized recommendations by simultaneously incorporating both user covariates and historical user interactions with items in a model-based way. Specifically, we build a per-item regression model based on a rich set of user covariates and estimate individual user affinity to items by introducing a latent random vector for each user. The vector random effects are assumed to be drawn from a prior with a precision matrix that measures residual partial associations among items. To ensure better estimates of the precision matrix in high dimensions, its elements are constrained through a Lasso penalty. Our model is fitted through a penalized quasi-likelihood procedure coupled with a scalable EM algorithm. We employ several computational strategies, such as multi-threading and conjugate gradients, and heavily exploit problem structure to scale our computations in the E-step. For the M-step we take recourse to a scalable variant of the Graphical Lasso algorithm for covariance selection. Through extensive experiments on a new data set obtained from the Yahoo! front page and a benchmark data set from a movie recommender application, we show that our UPG model significantly improves performance compared to several state-of-the-art methods in the literature, especially those based on a bilinear random effects model (BIRE). In particular, we show that the gains of UPG over BIRE are significant when the number of users is large and the number of items to select from is small. For large item sets and relatively small user sets, the results of UPG and BIRE are comparable. The UPG also leads to faster model building and produces interpretable outputs.
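The M-step's covariance selection can be illustrated with an off-the-shelf Graphical Lasso. The following is a minimal sketch, not the paper's implementation: the dimensions, the penalty `alpha`, and the simulated random effects standing in for the E-step posterior means are all illustrative, and scikit-learn's estimator replaces the scalable variant used in the paper.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Simulate per-user latent item-affinity vectors (stand-ins for the
# E-step posterior means of the vector random effects).
n_users, n_items = 500, 10
true_prec = np.eye(n_items)
true_prec[0, 1] = true_prec[1, 0] = 0.4  # one residual partial association
cov = np.linalg.inv(true_prec)
effects = rng.multivariate_normal(np.zeros(n_items), cov, size=n_users)

# M-step-style covariance selection: the Lasso penalty (alpha) zeroes out
# weak partial associations, giving a sparse item-item precision matrix.
model = GraphicalLasso(alpha=0.05).fit(effects)
precision = model.precision_
sparsity = np.mean(np.abs(precision) < 1e-8)  # fraction of zeroed entries
```

The estimated `precision` plays the role of the prior precision matrix over item affinities; its zero pattern is what makes the output interpretable as an item-item association graph.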
... More generally speaking, collective matrix completion finds a natural application in the problem of recommender systems with side information. In this problem, in addition to the conventional user-item matrix, it is assumed that we have side information about each user (Chiang et al., 2015; Jain and Dhillon, 2013; Fithian and Mazumder, 2018; Agarwal et al., 2011). For example, in a blog recommendation task, we may have access to user-generated content (images, tags and text) or user activity (e.g., likes and reblogs). ...
... Based on the type of available side information, various methods for recommender systems with side information have been proposed. The side information can be user-generated content (Armentano et al., 2013; Hannon et al., 2010), user/item profiles or attributes (Agarwal et al., 2011), social networks (Jamali and Ester, 2010; Ma et al., 2011) or context information (Natarajan et al., 2013). Thorough surveys of state-of-the-art methods can be found in (Fithian and Mazumder, 2018; Natarajan et al., 2013). ...
... They use a generalized weighted nuclear norm penalty in which the matrix is multiplied by positive semidefinite matrices P and Q that depend on the matrix of features. In Agarwal et al. (2011), the authors introduce a per-item user-covariate logistic regression model augmented with user-specific random effects. Their approach is based on a multilevel hierarchical model. ...
Preprint
Full-text available
Matrix completion aims to reconstruct a data matrix based on observations of a small number of its entries. Usually in matrix completion a single matrix is considered, which can be, for example, a rating matrix in a recommendation system. However, in practical situations, data is often obtained from multiple sources, which results in a collection of matrices rather than a single one. In this work, we consider the problem of collective matrix completion with multiple and heterogeneous matrices, which can be count, binary, continuous, etc. We first investigate the setting where, for each source, the matrix entries are sampled from an exponential family distribution. Then, we relax the assumption of an exponential family distribution for the noise and investigate the distribution-free case. In this setting, we do not assume any specific model for the observations. The estimation procedures are based on minimizing the sum of a goodness-of-fit term and the nuclear norm penalization of the whole collective matrix. We prove that the proposed estimators achieve fast rates of convergence under the two considered settings and we corroborate our results with numerical experiments.
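The nuclear-norm-penalized estimator can be sketched in the single-matrix case with the classic soft-impute iteration (singular-value soft-thresholding). This is an illustrative stand-in under assumed sizes and penalty `lam`, not the paper's estimator; the collective setting would apply the same penalty to the concatenated heterogeneous matrices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Low-rank ground truth with entries observed at random.
U, V = rng.normal(size=(30, 2)), rng.normal(size=(2, 20))
M = U @ V
mask = rng.random(M.shape) < 0.5  # observed entries

def soft_impute(M, mask, lam=1.0, iters=200):
    """Proximal iteration for squared loss + nuclear norm: fill the
    unobserved entries with the current estimate, then soft-threshold
    the singular values by lam."""
    X = np.zeros_like(M)
    for _ in range(iters):
        filled = np.where(mask, M, X)
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        X = u @ np.diag(np.maximum(s - lam, 0)) @ vt
    return X

X = soft_impute(M, mask)
rmse_obs = np.sqrt(np.mean((X - M)[mask] ** 2))  # fit on observed entries
```

The soft-thresholding step is exactly the proximal operator of the nuclear norm, which is why this simple loop solves the convex penalized problem.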
... Hence, using this information as user covariates helps in improving predictions for explicit ratings. Further, one can derive an item graph whose edge weights represent movie similarities based on the global "who-rated-what" matrix (Kouki et al., 2015; Wang et al., 2015; Agarwal et al., 2011; Mazumder and Agarwal, 2011). Imposing sparsity on such a graph and finding its fair communities is attractive, since it is intuitive that an item is generally related to only a few other items. ...
... The clustering balance and root mean squared error (RMSE) have been used to evaluate different modeling methods on this dataset. Since reducing RMSE is the goal, statistical models assume the response (ratings) to be Gaussian for this data (Kouki et al., 2015; Wang et al., 2015; Agarwal et al., 2011). Experimental results are shown in Figure 1. ...
Preprint
Full-text available
Inference of community structure in probabilistic graphical models may not be consistent with fairness constraints when nodes have demographic attributes. Certain demographics may be over-represented in some detected communities and under-represented in others. This paper defines a novel $\ell_1$-regularized pseudo-likelihood approach for fair graphical model selection. In particular, we assume there is some community or clustering structure in the true underlying graph, and we seek to learn a sparse undirected graph and its communities from the data such that demographic groups are fairly represented within the communities. Our optimization approach uses the demographic parity definition of fairness, but the framework is easily extended to other definitions of fairness. We establish statistical consistency of the proposed method for both a Gaussian graphical model and an Ising model for, respectively, continuous and binary data, proving that our method can recover the graphs and their fair communities with high probability.
... MDF appear in numerous applications including abundance data with sites and species traits in ecology (Legendre et al., 1997), patient records in health care (quantitative gene expression values and categorical clinical features like gender, disease stage, see e.g. Murdoch and Detsky (2013)), recommender systems (Agarwal et al., 2011) and of course survey data (Heeringa et al., 2010, Chapters 5 and 6). ...
... Row and column effects may be further generalized to include additional covariates. For instance, in recommender systems, in addition to interactions, user and item characteristics are known to influence the observed ratings (Agarwal et al., 2011). Another example comes from hierarchically structured data where examples (patients, students, etc.) are nested within groups (hospitals, schools, etc.). ...
Preprint
Full-text available
A mixed data frame (MDF) is a table collecting categorical, numerical and count observations. The use of MDF is widespread in statistics and the applications are numerous from abundance data in ecology to recommender systems. In many cases, an MDF exhibits simultaneously main effects, such as row, column or group effects and interactions, for which a low-rank model has often been suggested. Although the literature on low-rank approximations is very substantial, with few exceptions, existing methods do not allow to incorporate main effects and interactions while providing statistical guarantees. The present work fills this gap. We propose an estimation method which allows to recover simultaneously the main effects and the interactions. We show that our method is near optimal under conditions which are met in our targeted applications. Numerical experiments using both simulated and survey data are provided to support our claims.
... Note that r_{ui} in (2) is often replaced by the residual r_{ui} − µ − x_{ui}^T β, where (µ, β) is a vector of regression coefficients to be minimized in (2). Alternatively, p_u = s_u − x_u^T α and q_i = t_i − x_i^T β, where (s_u, t_i) are latent factors and (α, β) are regression coefficients that incorporate the covariate effects (Agarwal et al., 2011). ...
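The covariate-adjusted factorization in this excerpt can be made concrete numerically. Everything below (sizes, coefficient matrices, latent factors) is a randomly generated stand-in for fitted values; the point is only the algebra of p_u = s_u − x_u^T α and q_i = t_i − x_i^T β.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, k, d = 4, 3, 2, 2  # small illustrative sizes

x_u = rng.normal(size=(n_users, d))   # user covariates
x_i = rng.normal(size=(n_items, d))   # item covariates
alpha = rng.normal(size=(d, k))       # maps user covariates to factor space
beta = rng.normal(size=(d, k))        # maps item covariates to factor space
s = rng.normal(size=(n_users, k))     # user latent factors
t = rng.normal(size=(n_items, k))     # item latent factors
mu = 3.0                              # global mean rating

# Covariate-adjusted factors, as in the excerpt.
p = s - x_u @ alpha
q = t - x_i @ beta

# Predicted rating matrix: global mean plus factor interaction p_u^T q_i.
r_hat = mu + p @ q.T
```

Because the covariate terms are subtracted inside the factors, the latent part of the model only has to explain what the covariates cannot, which is what mitigates the cold-start problem for users or items with few observations.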
Article
Recommender systems predict users' preferences over a large number of items by pooling similar information from other users and/or items in the presence of sparse observations. One major challenge is how to utilize user-item specific covariates and networks describing user-item interactions in a high-dimensional situation, for accurate personalized prediction. In this article, we propose a smooth neighborhood recommender in the framework of the latent factor models. A similarity kernel is utilized to borrow neighborhood information from continuous covariates over a user-item specific network, such as a user's social network, where the grouping information defined by discrete covariates is also integrated through the network. Consequently, user-item specific information is built into the recommender to battle the "cold-start" issue in the absence of observations in collaborative and content-based filtering. Moreover, we utilize a "divide-and-conquer" version of the alternating least squares algorithm to achieve scalable computation, and establish asymptotic results for the proposed method, demonstrating that it achieves superior prediction accuracy. Finally, we illustrate that the proposed method improves substantially over its competitors in simulated examples and real benchmark data (Last.fm music data).
... A distinctive characteristic of MDF is that column entries may be of different types and most often many entries are missing. MDF appear in numerous applications including patient records in health care (survival values at different time points, quantitative and categorical clinical features like blood pressure, gender, disease stage, see, e.g., Murdoch and Detsky (2013)), survey data (Heeringa et al., 2010, Chapters 5 and 6), abundance tables in ecology (Legendre et al., 1997), and recommendation systems (Agarwal et al., 2011). ...
Article
A mixed data frame (MDF) is a table collecting categorical, numerical and count observations. The use of MDF is widespread in statistics and the applications are numerous from abundance data in ecology to recommender systems. In many cases, an MDF exhibits simultaneously main effects, such as row, column or group effects and interactions, for which a low-rank model has often been suggested. Although the literature on low-rank approximations is very substantial, with few exceptions, existing methods do not allow to incorporate main effects and interactions while providing statistical guarantees. The present work fills this gap. We propose an estimation method which allows to recover simultaneously the main effects and the interactions. We show that our method is near optimal under conditions which are met in our targeted applications. We also propose an optimization algorithm which provably converges to an optimal solution. Numerical experiments reveal that our method, mimi, performs well when the main effects are sparse and the interaction matrix has low-rank. We also show that mimi compares favorably to existing methods, in particular when the main effects are significantly large compared to the interactions, and when the proportion of missing entries is large. The method is available as an R package on the Comprehensive R Archive Network.
... Aggarwal and Chen (2009) also propose a more general Bayesian modeling framework which we revisit later in Section 2.2. Agarwal, Zhang and Mazumder (2011) study an example where these covariance matrices are unknown and they are estimated via inverse covariance matrix estimation. Todeschini, Caron and Chavent (2013) place a prior on the singular values of the matrix and propose an EM-stylized algorithm for the task. ...
Article
We explore a general statistical framework for low-rank modeling of matrix-valued data, based on convex optimization with a generalized nuclear norm penalty. We study several related problems: the usual low-rank matrix completion problem with flexible loss functions arising from generalized linear models; reduced-rank regression and multi-task learning; and generalizations of both problems where side information about rows and columns is available, in the form of features or smoothing kernels. We show that our approach encompasses maximum a posteriori estimation arising from Bayesian hierarchical modeling with latent factors, and discuss ramifications of the missing-data mechanism in the context of matrix completion. While the above problems can be naturally posed as rank-constrained optimization problems, which are nonconvex and computationally difficult, we show how to relax them via generalized nuclear norm regularization to obtain convex optimization problems. We discuss algorithms drawing inspiration from modern convex optimization methods to address these large scale convex optimization computational tasks. Finally, we illustrate our flexible approach in problems arising in functional data reconstruction and ecological species distribution modeling.
Article
Full-text available
Recommender systems recommend items to users based on their interests and have seen tremendous growth due to the spread of the internet and web services. Recommendation systems have seen an escalating growth rate since the late 1990s. A query on Google Scholar (a widely used scholarly search engine) returns 175,000 articles for the query "recommender system". With such a large database of research/application articles, there arises a need to analyze the data in order to effectively understand the potential of the quantum of literature available so far. The study focuses on the topic of recommender systems with various soft computing techniques such as fuzzy logic, neural networks and genetic algorithms. The major contribution of this work is the demonstration of progressive knowledge for domain visualization and analysis of recommender systems with soft computing techniques. The analysis is supported by various scientometric indicators such as Relative Growth Rate (RGR), Doubling Time (DT), Co-Authorship Index (CAI), Author Productivity, Degree of Collaboration, Research Priority Index (RPI), Half Life, Country-wise Productivity, Citation Analysis, Page Length Distribution and Source Contributors. This research presents a first-of-its-kind scientometric analysis of "recommender systems with soft computing techniques". The present work provides useful parameters for establishing relationships between quantifiable data and intangible contributions in the field of recommender systems.
Preprint
Graph link prediction is an important task in cyber-security: relationships between entities within a computer network, such as users interacting with computers, or system libraries and the corresponding processes that use them, can provide key insights into adversary behaviour. Poisson matrix factorisation (PMF) is a popular model for link prediction in large networks, particularly useful for its scalability. In this article, PMF is extended to include scenarios that are commonly encountered in cyber-security applications. Specifically, an extension is proposed to explicitly handle binary adjacency matrices and include known covariates associated with the graph nodes. A seasonal PMF model is also presented to handle dynamic networks. To allow the methods to scale to large graphs, variational methods are discussed for performing fast inference. The results show an improved performance over the standard PMF model and other common link prediction techniques.
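As a non-Bayesian stand-in for the variational PMF described above, maximum-likelihood Poisson factorisation via the classical multiplicative (KL-NMF) updates gives the flavour of the model. The counts, sizes and rank `k` below are illustrative, and the sketch omits the binary-adjacency, covariate and seasonal extensions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Counts on a small bipartite graph (e.g. user x computer interaction counts).
A = rng.poisson(3.0, size=(15, 12)).astype(float)
k = 3  # number of latent factors

# Fit A ~ Poisson(W @ H) by the classical multiplicative updates, which
# monotonically decrease the Poisson (generalized KL) objective.
W = rng.random((15, k)) + 0.1
H = rng.random((k, 12)) + 0.1
for _ in range(200):
    WH = W @ H
    W *= (A / WH) @ H.T / H.sum(axis=1)
    WH = W @ H
    H *= W.T @ (A / WH) / W.sum(axis=0)[:, None]

recon = W @ H  # expected interaction counts; large entries suggest links
```

Link prediction then amounts to ranking unobserved pairs by their entries in `recon`, with the low rank k providing the pooling across similar nodes that makes the method scale.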
Article
Obtaining estimates that are nearly unbiased has proven to be difficult when random effects are incorporated into a generalized linear model. In this paper, we propose a general method of adjusting any conveniently defined initial estimates to obtain estimates which are asymptotically unbiased and consistent. The method is motivated by iterative bias correction and can be applied in principle to any parametric model. A simulation-based approach to implementing the method is described and the relationship of the proposed method with other sampling-based methods is discussed. Results from a small-scale simulation study show that the proposed method can lead to estimates which are nearly unbiased even for the variance components, while the standard errors are only slightly inflated. A new analysis of the famous salamander mating data is described, which reveals previously undetected between-animal variation among the male salamanders and results in better prediction of mating outcomes.
Code
R package for Data Analysis using multilevel/hierarchical model
Article
Statistical approaches to overdispersion, correlated errors, shrinkage estimation, and smoothing of regression relationships may be encompassed within the framework of the generalized linear mixed model (GLMM). Given an unobserved vector of random effects, observations are assumed to be conditionally independent with means that depend on the linear predictor through a specified link function and conditional variances that are specified by a variance function, known prior weights and a scale factor. The random effects are assumed to be normally distributed with mean zero and dispersion matrix depending on unknown variance components. For problems involving time series, spatial aggregation and smoothing, the dispersion may be specified in terms of a rank deficient inverse covariance matrix. Approximation of the marginal quasi-likelihood using Laplace's method leads eventually to estimating equations based on penalized quasilikelihood or PQL for the mean parameters and pseudo-likelihood for the variances. Im...
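The PQL iteration can be sketched for the simplest case, a random-intercept logistic GLMM with the variance component held fixed: linearize the link to form a working response, then solve a penalized weighted least squares in the fixed and random effects. The simulated data and the assumption sigma2 = 1 (fixed rather than estimated) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate a random-intercept logistic model: groups share a Gaussian effect.
n_groups, per_group = 20, 15
g = np.repeat(np.arange(n_groups), per_group)
x = rng.normal(size=g.size)
b_true = rng.normal(scale=1.0, size=n_groups)
lin_true = 0.5 + 1.2 * x + b_true[g]
y = (rng.random(g.size) < 1 / (1 + np.exp(-lin_true))).astype(float)

X = np.column_stack([np.ones_like(x), x])  # fixed-effect design
Z = np.eye(n_groups)[g]                    # random-intercept design
C = np.hstack([X, Z])
theta = np.zeros(C.shape[1])               # (beta, b) jointly
sigma2 = 1.0                               # variance component held fixed
P = np.zeros((C.shape[1], C.shape[1]))
P[2:, 2:] = np.eye(n_groups) / sigma2      # ridge penalty on b only

for _ in range(25):
    lin = C @ theta
    mu = 1 / (1 + np.exp(-lin))
    w = mu * (1 - mu)                      # GLM weights
    z = lin + (y - mu) / w                 # working response
    # Penalized weighted least squares: the penalty on b is the
    # normal random-effects prior in disguise.
    theta = np.linalg.solve(C.T @ (w[:, None] * C) + P, C.T @ (w * z))

beta_hat = theta[:2]  # intercept and slope estimates
```

In a full PQL fit, the variance component sigma2 would be re-estimated between these inner loops (e.g. via pseudo-likelihood, as in the article above) rather than held at 1.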
Article
This paper presents an overview of the field of recommender systems and describes the current generation of recommendation methods that are usually classified into the following three main categories: content-based, collaborative, and hybrid recommendation approaches. This paper also describes various limitations of current recommendation methods and discusses possible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader range of applications. These extensions include, among others, an improvement of understanding of users and items, incorporation of the contextual information into the recommendation process, support for multicriteria ratings, and a provision of more flexible and less intrusive types of recommendations.