When Personalization Meets Conformity: Collective Similarity based Multi-Domain Recommendation
Xi Zhang¹, Jian Cheng¹, Shuang Qiu¹, Zhenfeng Zhu², Hanqing Lu¹
¹ National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
² Institute of Information Science, Beijing Jiaotong University
{xi.zhang, jcheng, shuang.qiu, luhq}@nlpr.ia.ac.cn, zhfzhu@bjtu.edu.cn
ABSTRACT
Existing recommender systems emphasize personalization to achieve promising accuracy. However, in the context of multiple domains, users are likely to adopt the same behaviors as domain authorities. This conformity effect provides a wealth of prior knowledge for multi-domain recommendation, but it has not been fully exploited. In particular, users whose behaviors are significantly similar to the public tastes can be viewed as domain authorities. To detect these users while embedding conformity into recommendation, a domain-specific similarity matrix is employed. A collective similarity is then obtained to combine conformity with personalization. In this paper, we establish a Collective Structured Sparse Representation (CSSR) method for multi-domain recommendation. Based on an adaptive $k$-Nearest-Neighbor framework, we impose lasso and group lasso penalties together with a least squares loss to jointly optimize the collective similarity. Experimental results on real-world data confirm the effectiveness of the proposed method.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information Filtering
Keywords
Multiple Domains; Recommendation; Conformity
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
SIGIR'15, August 09 - 13, 2015, Santiago, Chile.
© 2015 ACM. ISBN 978-1-4503-3621-5/15/08 ...$15.00.
DOI: http://dx.doi.org/10.1145/2766462.2767810.
1. INTRODUCTION
The fast growth of Web 2.0 technologies facilitates and encourages various online user behaviors, and at the same time brings a tremendous amount of information to the public. Personalized recommendation, a critical means of pushing the right information to the right users, has attracted a large amount of research in both industry and academia. Among these methods, Collaborative Filtering (CF) performs better than other methods by relying on an underlying assumption about entity similarity: users with common interests in the past will behave similarly on items in the future. Traditionally, CF approaches regard mining distinct types of user behavior as a task within a single domain. Nevertheless, in many real applications, multiple behaviors, ranging from reading books and favoring music to watching videos, jointly reflect the characteristics of users, and thus training a group of predictors in a unified manner would enhance accuracy for all the domains.
To cope with multi-domain recommendation, a straightforward scheme is to extend CF algorithms into a multi-task learning problem solved by propagating similarities among domains [4, 2]. However, somewhat surprisingly, the multi-domain context often introduces conformity¹ into online user behaviors, which cannot be ignored. In particular, the public is inclined to conform to the choices or beliefs of domain authorities [6]. For example, comments from well-known movie critics are likely to affect box-office trends, while the reading lists of celebrated writers are likely to receive admiration. That is to say, compared with ordinary users, the opinions of these domain elites are more representative and thus have considerable influence on recommendation results. Unfortunately, this effect has not been fully exploited in existing multi-domain approaches.
The key problem becomes how to detect the domain authorities and integrate the conformity effect into recommendation. To simplify this problem, we assume that the elites are users whose past behaviors largely reflect the public tastes. In fact, if a user behaves significantly similarly to a large number of other users, she/he is probably an elite user who can represent others. To this end, a domain-specific user similarity matrix based on observed behaviors is built to embody conformity. On the other hand, since users' intrinsic preferences over domains are also decisive, a domain-shared user similarity matrix is used to profile users globally. A collective similarity that combines the effect of conformity with the original personalization is thereby established.
In this paper, we propose a Collective Structured Sparse Representation (CSSR) method to optimize the collective similarity for a top-$N$ multi-domain recommendation task. CSSR is built on the adaptive $k$-Nearest-Neighbor framework. To learn ranked item lists under each domain for users, a least squares regression loss on the feedback representation is employed. To encode personalization and conformity, lasso and group lasso regularization are adopted. We also show how to use the Alternating Direction Method of Multipliers (ADMM) [1] to efficiently optimize the parameters. Experiments show that our method consistently achieves more accurate results than other state-of-the-art methods.
¹ The general conformity is a social phenomenon of matching attitudes, beliefs, and behaviors to group norms. Here we narrow the concept to adopting the same behaviors as elites.
2. THE PROPOSED APPROACH
2.1 Notation and Background
Suppose that there is a set of users $\mathcal{U} = \{u_1, \dots, u_n\}$ and $D$ types of domains. Let the matrix $X_d \in \mathbb{R}^{m_d \times n}$ record the interactions between the overall user set and the item set $\mathcal{V}_d$ of the $d$-th domain, where $d = 1, \dots, D$ and $m_d$ is the size of the item set. If a user has any behavior on an item, the corresponding entry of $X_d$ is 1 or a positive value; otherwise the entry is set to 0. Note that implicit feedback such as clicks, purchases and posts is easier to obtain than rating records in real-world scenarios. Thus we assume that the feedback matrices are binary.
For multi-domain recommendation, given a series of user-item feedback matrices $\{X_1, \dots, X_D\}$, our goal is to recommend to each user a personalized ranked list of items she/he is likely to enjoy in each domain.
Here we introduce a general similarity-based framework for single-domain recommendation. Normally, preference scores are estimated by a function $f(X, \Theta)$, where $X$ is the feedback matrix and $\Theta$ is the model parameter. The adaptive $k$-Nearest-Neighbor ($k$NN) method, which is very popular for collaborative filtering, has been shown to be simple yet effective for top-$N$ recommendation [3]. We build our model on the adaptive $k$NN technique in this paper. Consider a user-based $k$NN problem: the parameter $\Theta$ is a matrix $W \in \mathbb{R}^{n \times n}$, and the prediction function $f(\cdot)$ is an aggregation of the feedback values of the $k$ nearest neighbors. Mathematically, $\hat{x}_{ij} = x_i^T w_j$, where $\hat{x}_{ij}$ is the predicted score, $x_i \in \mathbb{R}^n$ denotes the feedback indicator of item $v_i$, and $w_j \in \mathbb{R}^n$ denotes the similarity coefficients of user $u_j$. The parameter $W$ can be regarded as the similarity between user pairs, and it can be optimized by solving

$$W = \arg\min_{W} \; L(W) + \Omega(W) \qquad (1)$$

where $L$ is a loss function, usually least squares or logistic regression, and $\Omega$ is a regularization penalty that enforces a specific structure on the parameter. Ridge, lasso, or elastic net regularization has been used in previous methods.
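The prediction rule $\hat{x}_{ij} = x_i^T w_j$ can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy matrices, the function names `predict_scores` and `top_n_items`, and the choice to rank only items the user has not yet interacted with are hypothetical and not taken from the paper.

```python
def predict_scores(X, W):
    """Return the score matrix X W for an items-by-users feedback matrix X.

    W[k][j] is the weight of neighbor user k when predicting for user j,
    so scores[i][j] equals x_i^T w_j.
    """
    n_users = len(W)
    return [[sum(row[k] * W[k][j] for k in range(n_users))
             for j in range(n_users)]
            for row in X]


def top_n_items(X, W, user, n):
    """Rank the items `user` has not interacted with by predicted score."""
    scores = predict_scores(X, W)
    candidates = [i for i in range(len(X)) if X[i][user] == 0]
    candidates.sort(key=lambda i: scores[i][user], reverse=True)
    return candidates[:n]


# Toy binary feedback (hypothetical): 4 items (rows) x 3 users (columns).
X = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 0],
     [0, 0, 1]]
# Hypothetical user-user similarity weights with a zero diagonal.
W = [[0.0, 0.6, 0.1],
     [0.7, 0.0, 0.4],
     [0.2, 0.3, 0.0]]

print(top_n_items(X, W, user=0, n=2))
```

For user 0, only items 1 and 3 are unseen, and item 1 scores higher because both neighbors with non-zero weight interacted with it.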
2.2 Collective Similarity
Now we consider the problem of multi-domain recommendation. As the user set is common across domains, user-based similarity is exploited. The idea is simple: learn a set of nearest neighbors who are sufficiently similar to the user in terms of multiple behaviors over domains. In other words, a user's interests can be represented by the set of her/his neighbors. The usual approach assumes that all users are ordinary, with comparable representation ability, and thus mines the personalized neighbors of each user independently. Nevertheless, conformity always occurs in the multi-domain context. Within a specific domain, elite users' favorites are more acceptable to the public during recommendation. This indicates that the similarity coefficients between these representative elite users and others tend to be significant. Hence, we propose a collective similarity as

$$W_d = V + V_d \qquad (2)$$

where $d = 1, \dots, D$. The parameter matrix $V \in \mathbb{R}^{n \times n}$ indicates the personalized neighbors with intrinsically similar interests across domains; it is the domain-shared component. The parameter matrix $V_d \in \mathbb{R}^{n \times n}$ embodies the representativeness of elite neighbors in the domain; it is the domain-specific component.

[Figure 1 diagram: feedback matrices $X_d$; collective similarity $W_d$, composed of the domain-shared similarity $V$ and the domain-specific similarity $V_d$; least squares regression loss, lasso regularization, group lasso regularization.]
Figure 1: Illustration of the CSSR Model
Next, we seek to explore conformity by regularizing the structure of the parameter set $\{V_1, \dots, V_D\}$ with group lasso. On the one hand, the domain-specific elite users can largely represent other users, which makes the corresponding rows of $V_d$ dense. On the other hand, elite users are always a minority, which leads to sparsity in the columns. Hence, the above structure serves to define a group lasso regularization term

$$\Omega_{glas}(V_d) = \beta_d \|V_d\|_{2,1} = \sum_{i=1}^{n} \beta_d \|v_d^i\|_2 \qquad (3)$$

where $\|v_d^i\|_2$ is the $\ell_2$-norm of the $i$-th row of $V_d$. The $\ell_{2,1}$-norm encourages row sparsity by jointly selecting the significant rows as elite users. $\beta_d$ encodes the contribution of group lasso for each domain. Without loss of generality, we assume $\forall d, \beta_d = \beta$.
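To make the row-sparsity reading of the $\ell_{2,1}$-norm concrete, the following sketch computes the row norms of a domain-specific matrix and lists the users whose rows are non-zero. The helper names (`row_norms`, `l21_norm`, `elite_users`), the tolerance, and the toy matrix are hypothetical; treating every non-zero row as an elite user is a simplifying assumption for illustration.

```python
import math


def row_norms(Vd):
    """Euclidean norm of each row of V_d."""
    return [math.sqrt(sum(x * x for x in row)) for row in Vd]


def l21_norm(Vd):
    """The l2,1-norm: sum of the row-wise Euclidean norms."""
    return sum(row_norms(Vd))


def elite_users(Vd, tol=1e-8):
    """Indices of users whose row in V_d is not (numerically) zero."""
    return [i for i, r in enumerate(row_norms(Vd)) if r > tol]


# Hypothetical domain-specific matrix: only user 1 has a non-zero row.
Vd = [[0.0, 0.0, 0.0],
      [0.3, 0.0, 0.4],
      [0.0, 0.0, 0.0]]

print(l21_norm(Vd), elite_users(Vd))
```

In this toy case the whole penalty comes from a single row, which is the pattern the group lasso term favors: a few dense rows and many rows that are exactly zero.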
Furthermore, users' personalized neighbors are also sparse in the overall user set. A lasso constraint is used to filter out noisy coefficients in $V$:

$$\Omega_{las}(V) = \lambda \|V\|_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda |v_{ij}| \qquad (4)$$
With the above discussion, the collective similarity can be incorporated into a least squares loss $L(W_d) = \frac{1}{2} \|X_d - X_d W_d\|_F^2$. Following the general framework in Eq. (1), we propose the CSSR model for the adaptive $k$NN problem in multiple domains as

$$\min_{\Theta} \; \sum_{d=1}^{D} \alpha_d \big( L(W_d) + \Omega_{glas}(V_d) \big) + \Omega_{las}(V)$$
$$\text{s.t.} \quad W_d = V + V_d, \quad W_d \geq 0, \quad \mathrm{diag}(W_d) = 0, \quad \forall d, \; d = 1, \dots, D \qquad (5)$$

The parameter set $\Theta$ is written as $\{W_d, V_d, V\}$ for brevity, and $\alpha_d$ balances the effect of the different domains. In addition, the non-negativity constraint on the similarity makes the model more interpretable, while $\mathrm{diag}(W_d) = 0$ ensures that the trivial solution of the identity matrix is avoided. In the unified objective function, information from the feedback matrices is transferred among domains by $V$, and domain distinctions are captured by $V_d$. The entire model is depicted in Figure 1.
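As a reading aid for Eq. (5), the objective value can be evaluated directly for given $V$ and $\{V_d\}$. The sketch below is a minimal plain-Python version under stated assumptions: a single shared $\beta$, matrices stored as lists of rows, hypothetical function names, and toy data. It only evaluates the objective and does not check the constraints $W_d \geq 0$ and $\mathrm{diag}(W_d) = 0$.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def l21_norm(M):
    """Sum of the Euclidean norms of the rows (group lasso term)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in M)


def l1_norm(M):
    """Sum of the absolute values of all entries (lasso term)."""
    return sum(abs(x) for row in M for x in row)


def cssr_objective(Xs, V, Vds, alphas, beta, lam):
    """Evaluate the objective of Eq. (5) with W_d = V + V_d."""
    total = lam * l1_norm(V)
    for X, Vd, alpha in zip(Xs, Vds, alphas):
        W = [[v + vd for v, vd in zip(v_row, vd_row)]
             for v_row, vd_row in zip(V, Vd)]
        XW = matmul(X, W)
        loss = 0.5 * sum((x - p) ** 2
                         for x_row, p_row in zip(X, XW)
                         for x, p in zip(x_row, p_row))
        total += alpha * (loss + beta * l21_norm(Vd))
    return total


# Toy data (hypothetical): two domains that share the same two users.
Xs = [[[1, 0], [0, 1]],   # domain 1: 2 items x 2 users
      [[1, 1]]]           # domain 2: 1 item x 2 users
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(cssr_objective(Xs, zeros, [zeros, zeros],
                     alphas=[0.5, 0.5], beta=0.5, lam=0.01))
```

With all similarities set to zero, only the reconstruction loss remains, so the value is the weighted half squared Frobenius norm of the feedback matrices.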
2.3 Optimization
Although non-smooth terms are introduced, our approach maintains the convexity of the objective function. We propose to apply ADMM [1] to split the original problem into several subproblems that can be handled in alternating directions.
To solve the problem in Eq. (5), we first form the augmented Lagrangian

$$\min_{\Theta, Y_d} \; \sum_{d=1}^{D} \alpha_d \big[ L(W_d) + I_+(W_d) + \Omega_{glas}(V_d) \big] + \Omega_{las}(V) + \sum_{d=1}^{D} \Big[ \mathrm{tr}\big( Y_d^T (W_d - V - V_d) \big) + \frac{\rho}{2} \|W_d - V - V_d\|_F^2 \Big] \qquad (6)$$

where $Y_d$ is the Lagrangian multiplier of the $d$-th domain, $I_+(\cdot)$ is the indicator function of the non-negativity constraint, and $\rho > 0$ is a penalty parameter. For a given domain $d$, optimizing $\{W_d, V_d, Y_d\}$ is independent of optimizing the variables of the other domains, so ADMM proceeds by solving the following problems alternately until convergence:

$$\min_{W_d \geq 0} \; \alpha_d L(W_d) + \frac{\rho}{2} \|W_d - V - V_d + U_d\|_F^2 \qquad (7a)$$
$$\min_{V_d} \; \alpha_d \Omega_{glas}(V_d) + \frac{\rho}{2} \|W_d - V - V_d + U_d\|_F^2 \qquad (7b)$$
$$\min_{V} \; \Omega_{las}(V) + \frac{\rho}{2} \sum_{d=1}^{D} \|W_d - V - V_d + U_d\|_F^2 \qquad (7c)$$
$$U_d = U_d + W_d - V - V_d \qquad (7d)$$

where $U_d = \frac{1}{\rho} Y_d$ gives the scaled form. We then provide solutions of the subproblems in Eq. (7a), Eq. (7b) and Eq. (7c) below.
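Update for $W_d$. We first rewrite the problem as an unconstrained problem

$$\min_{W_d} \; \alpha_d L(W_d) + \frac{\rho}{2} \|W_d - V - V_d + U_d\|_F^2 - \mathrm{tr}(\Phi_d^T W_d) \qquad (8)$$

where $\Phi_d$ is the multiplier of the non-negativity constraint. Since the objective of Eq. (7a) is smooth, the solution can be found by taking the derivative and setting it to zero:

$$(X_d^T X_d + \rho I) W_d + \rho (U_d - V - V_d) - X_d^T X_d - \Phi_d = 0 \qquad (9)$$

Note that $\alpha_d$ does not appear explicitly in Eq. (9) and Eq. (10); with a general weight it would scale the $X_d^T X_d$ terms. Using the Karush-Kuhn-Tucker complementarity condition for the non-negativity of $W_d$, $[\Phi_d]_{ij} [W_d]_{ij} = 0$, we get

$$[W_d]_{ij} = [W_d]_{ij} \, \frac{\big[ X_d^T X_d + \rho (U_d - V - V_d)^- \big]_{ij}}{\big[ (X_d^T X_d + \rho I) W_d + \rho (U_d - V - V_d)^+ \big]_{ij}} \qquad (10)$$

To guarantee non-negativity, we decompose a matrix with entries of any sign as $A = A^+ - A^-$, where $A^+_{ij} = (|A_{ij}| + A_{ij})/2$ and $A^-_{ij} = (|A_{ij}| - A_{ij})/2$.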
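One step of the multiplicative rule in Eq. (10) can be sketched as follows. This is a hedged illustration, not the authors' implementation: it follows Eq. (10) as printed (no explicit $\alpha_d$), uses plain Python lists, and adds a small constant `eps` to the denominator to avoid division by zero, which is an assumption of this sketch. The function name `update_W` and the toy data are hypothetical.

```python
def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def update_W(X, W, V, Vd, U, rho, eps=1e-12):
    """Apply one multiplicative step of Eq. (10) to a non-negative W."""
    n = len(W)
    G = matmul(transpose(X), X)   # X^T X, entrywise non-negative for binary X
    GW = matmul(G, W)
    new_W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a = U[i][j] - V[i][j] - Vd[i][j]
            a_pos = (abs(a) + a) / 2      # positive part A^+
            a_neg = (abs(a) - a) / 2      # negative part A^-
            numer = G[i][j] + rho * a_neg
            denom = GW[i][j] + rho * W[i][j] + rho * a_pos + eps
            new_W[i][j] = W[i][j] * numer / denom
    return new_W


# Toy data (hypothetical): 2 items x 3 users, W initialised with a zero diagonal.
X = [[1, 1, 0],
     [0, 1, 1]]
W0 = [[0.0, 0.5, 0.5],
      [0.5, 0.0, 0.5],
      [0.5, 0.5, 0.0]]
zeros = [[0.0] * 3 for _ in range(3)]
W1 = update_W(X, W0, zeros, zeros, zeros, rho=1.0)
print(W1)
```

Because each entry is multiplied by a non-negative ratio, an initial $W_d$ that is non-negative with a zero diagonal keeps both properties after the step in this sketch.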
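Update for $V_d$. For each row of $V_d$, the optimization problem in Eq. (7b) becomes

$$\min_{v_d^i} \; \beta \|v_d^i\|_2 + \frac{\rho}{2} \|v_d^i - (w_d^i - v^i + u_d^i)\|_2^2 \qquad (11)$$

where $i$ is the row index of the parameter matrices. Let $z_d^i = w_d^i - v^i + u_d^i$. This problem can be solved by applying the proximal operator of the group lasso to each row vector:

$$v_d^i = S_{glas, \beta/\rho}(z_d^i) = \begin{cases} 0, & \text{if } \|z_d^i\|_2 \leq \frac{\beta}{\rho} \\ \dfrac{\|z_d^i\|_2 - \frac{\beta}{\rho}}{\|z_d^i\|_2} \, z_d^i, & \text{otherwise.} \end{cases} \qquad (12)$$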
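The row-wise proximal operator of Eq. (12) is short enough to write out. The sketch below assumes matrices stored as lists of rows; the names `group_soft_threshold` and `update_Vd` are hypothetical, and the threshold $\beta/\rho$ follows Eq. (12) as stated.

```python
import math


def group_soft_threshold(z, tau):
    """Shrink the vector z toward zero by tau in Euclidean norm (Eq. (12))."""
    norm = math.sqrt(sum(x * x for x in z))
    if norm <= tau:
        return [0.0] * len(z)
    scale = (norm - tau) / norm
    return [scale * x for x in z]


def update_Vd(W, V, U, beta, rho):
    """Row-wise update of V_d with z = w - v + u for each row."""
    tau = beta / rho
    return [group_soft_threshold([w - v + u for w, v, u in zip(w_row, v_row, u_row)], tau)
            for w_row, v_row, u_row in zip(W, V, U)]


# Toy rows (hypothetical): the first row survives, the second is zeroed out.
W = [[3.0, 4.0], [0.1, 0.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(update_Vd(W, zeros, zeros, beta=1.0, rho=1.0))
```

A row is either removed entirely or kept with its direction unchanged and its length reduced by $\beta/\rho$, which is how whole rows of $V_d$ become exactly zero.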
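Update for $V$. Optimizing the second term of Eq. (7c) with respect to $V$ is equivalent to optimizing $\frac{\rho}{2} \big\| V - \frac{1}{D} \sum_{d=1}^{D} (W_d - V_d + U_d) \big\|_F^2$, which can be substituted into Eq. (7c). We then have the element-wise optimal solution

$$[V]_{ij} = S_{las, \lambda/\rho}\Big( \frac{1}{D} \sum_{d=1}^{D} (W_d - V_d + U_d)_{ij} \Big) \qquad (13)$$

where $S_{las,\varepsilon}(\cdot)$ is the soft-thresholding operator

$$S_{las,\varepsilon}(x) = \begin{cases} x - \varepsilon, & \text{if } x > \varepsilon \\ x + \varepsilon, & \text{if } x < -\varepsilon \\ 0, & \text{otherwise.} \end{cases} \qquad (14)$$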
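The soft-thresholding operator of Eq. (14) and the element-wise update of Eq. (13) can be sketched as below. The function names and toy matrices are hypothetical, and the threshold $\lambda/\rho$ is taken from Eq. (13) as stated rather than re-derived.

```python
def soft_threshold(x, eps):
    """Scalar soft-thresholding operator of Eq. (14)."""
    if x > eps:
        return x - eps
    if x < -eps:
        return x + eps
    return 0.0


def update_V(Ws, Vds, Us, lam, rho):
    """Element-wise update of V following Eq. (13)."""
    D = len(Ws)
    eps = lam / rho
    n_rows, n_cols = len(Ws[0]), len(Ws[0][0])
    return [[soft_threshold(
                sum(Ws[d][i][j] - Vds[d][i][j] + Us[d][i][j] for d in range(D)) / D,
                eps)
             for j in range(n_cols)]
            for i in range(n_rows)]


# Toy data (hypothetical): two domains, two users.
Ws = [[[0.0, 2.0], [1.0, 0.0]],
      [[0.0, 4.0], [0.2, 0.0]]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(update_V(Ws, [zeros, zeros], [zeros, zeros], lam=1.0, rho=2.0))
```

Entries whose domain average falls below the threshold are set exactly to zero, which is what keeps the shared component $V$ sparse.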
3. EXPERIMENTS
3.1 Experiment Settings
To empirically study the effectiveness of our method, we perform experiments on a multi-domain dataset crawled from the publicly available site Douban². It is a well-known Web 2.0 website containing users' ratings, on a scale from 1 to 5 stars, of books, music and movies. As we face a top-$N$ recommendation task, we use implicit feedback instead of rating scores. For a sufficient evaluation of each user, we filter out users with fewer than 10 feedback records in the three domains and obtain a dataset of 5,916 users. The detailed description is presented in Table 1.
Table 1: Description of Douban Dataset
Domain   #Items   %Sparsity   #Ratings per User
Book     14,155   99.85       22
Music    15,492   99.75       38
Movie     7,845   98.87       88
For personalized item recommendation, we analyze the performance of the model by comparing the top suggestions with the true behaviors of a user. We adopt two widely used evaluation metrics for top-$N$ recommendation, MAP (Mean Average Precision) and NDCG (Normalized Discounted Cumulative Gain), with $N = 5$. Higher values of these metrics imply better recommendation results.
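For reference, one common way to compute these two metrics for a single user is sketched below. The paper does not spell out its exact definitions, so the normalization choices here are assumptions: average precision is divided by `min(N, number of relevant items)`, and NDCG uses binary gains with a base-2 logarithmic discount. MAP is then the mean of the per-user average precision. Function names and the toy ranking are hypothetical.

```python
import math


def average_precision_at_n(ranked, relevant, n):
    """Average precision over the top-n ranked items for one user."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    denom = min(n, len(relevant))
    return score / denom if denom else 0.0


def ndcg_at_n(ranked, relevant, n):
    """Binary-gain NDCG over the top-n ranked items for one user."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked[:n], start=1)
              if item in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(n, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0


# Toy ranking (hypothetical): two relevant test items, found at ranks 1 and 3.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c"}
print(average_precision_at_n(ranked, relevant, 5), ndcg_at_n(ranked, relevant, 5))
```

Both functions return 1.0 when all relevant items are ranked first, and 0.0 when none of them appears in the top $N$.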
We compare the proposed CSSR with several popular baseline methods: PopRank, NCDCF_U, NCDCF_I [5], mrBPR [2], and mrSLIM. Here we extend SLIM [3] to mrSLIM by constructing a feedback matrix that takes all items in the three domains as rows. For a comprehensive comparison, the performance of CSSR_sh and CSSR_sp is also shown. They consider only the domain-shared or only the domain-specific similarity in Eq. (2), respectively, and use lasso constraints.
In our experiments, we randomly pick 80% of the observed feedback of each user to form the training set in each domain, and the remaining 20% forms the test set. The random sampling is repeated 10 times and the average performance is reported. We set the domain weights in CSSR to $\{\alpha_1 = 0.2, \alpha_2 = 0.3, \alpha_3 = 0.5\}$ based on the MAP performance on the test set. The impact of the group lasso parameter $\beta$ and the lasso parameter $\lambda$ is studied further in Section 3.2.
² http://www.douban.com

Table 2: Prediction performance (mean ± std.) of three variants of CSSR and PopRank, NCDCF_U, NCDCF_I, mrSLIM, mrBPR on three domains of the Douban dataset. Results in bold indicate the best ones.
Methods   Params        Book MAP        Book NDCG       Music MAP       Music NDCG      Movie MAP       Movie NDCG
PopRank   N/A           0.1370±0.0039   0.0619±0.0014   0.1879±0.0029   0.0922±0.0014   0.3784±0.0031   0.2249±0.0017
NCDCF_U   -     100     0.1826±0.0033   0.0813±0.0013   0.2647±0.0035   0.1325±0.0015   0.4982±0.0038   0.3159±0.0024
NCDCF_I   -     50      0.1864±0.0029   0.0835±0.0011   0.2668±0.0016   0.1341±0.0007   0.5049±0.0047   0.3230±0.0027
mrSLIM    5     0.1     0.2014±0.0035   0.0906±0.0013   0.2861±0.0028   0.1453±0.0022   0.5115±0.0040   0.3241±0.0023
mrBPR     0.01  1e-3    0.2333±0.0035   0.1071±0.0019   0.3180±0.0037   0.1554±0.0021   0.5206±0.0039   0.3377±0.0025
CSSR_sh   -     0.01    0.2217±0.0023   0.1013±0.0010   0.2949±0.0028   0.1453±0.0011   0.4952±0.0055   0.3106±0.0026
CSSR_sp   -     0.01    0.2348±0.0041   0.1070±0.0019   0.3020±0.0046   0.1539±0.0022   0.5052±0.0029   0.3241±0.0020
CSSR      0.5   0.01    0.2746±0.0032   0.1241±0.0013   0.3345±0.0037   0.1703±0.0014   0.5341±0.0030   0.3471±0.0015

[Figure 2 plots: three panels (Book, Music, Movie) showing MAP against $\lambda \in \{10^{-3}, 0.01, 0.1, 1, 10\}$, with one curve for each $\beta \in \{0.05, 0.1, 0.5, 1, 10\}$.]
Figure 2: Performance variation of CSSR with respect to $\beta$ and $\lambda$ on three domains.
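The evaluation protocol described above (a random 80/20 split per user within a domain, repeated 10 times) could be organized as in the following sketch. The function names, the dictionary layout of the feedback, the rounding of the cut point, and the seeding scheme are hypothetical choices made for illustration.

```python
import random


def split_user_feedback(items, train_fraction=0.8, rng=None):
    """Randomly split one user's observed items into train and test lists."""
    rng = rng or random.Random()
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def repeated_splits(feedback_by_user, repeats=10, seed=0):
    """Yield `repeats` independent per-user splits for one domain."""
    for r in range(repeats):
        rng = random.Random(seed + r)   # a different seed for every repetition
        yield {user: split_user_feedback(items, rng=rng)
               for user, items in feedback_by_user.items()}


# Toy feedback (hypothetical): item ids observed for two users in one domain.
feedback = {"u1": list(range(10)), "u2": list(range(20))}
splits = list(repeated_splits(feedback))
train, test = splits[0]["u1"]
print(len(splits), len(train), len(test))
```

The reported numbers would then be the mean and standard deviation of each metric over the repetitions, as in Table 2.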
3.2 Results and Analysis
Experimental results for the above baselines and the three variants of the proposed CSSR are shown in Table 2. The optimal parameters we obtained for CSSR are $\beta = 0.5$ and $\lambda = 0.01$. From Table 2 we observe that our full model CSSR consistently outperforms all the baselines. Specifically, because PopRank is not a personalized algorithm and only recommends items based on their popularity, all other baselines beat it. This demonstrates the importance of personalization in recommender systems. Comparing the results across the three domains, we find that recommending movies is a relatively easier task, as all the approaches except PopRank perform well in this domain. Therefore, transferring knowledge from the movie domain would benefit the other two tasks. Lacking a properly designed mechanism for multi-task learning, NCDCF_U and NCDCF_I (with the number of nearest neighbors $k = 100$ and $k = 50$, respectively) cannot achieve satisfactory results in the book and music domains. We implement mrSLIM with the elastic net weights set to $\beta = 5$ and $\lambda = 0.1$, which gave its best performance. With the learned item similarity matrix, mrSLIM behaves better than its simpler version NCDCF_I. Moreover, CSSR surpasses the state-of-the-art ranking method mrBPR, run with a learning rate of 0.01 and an $\ell_2$-norm weight of 1e-3. mrBPR focuses on domain consistency to model a shared component, but ignores the heterogeneous part that may differ across domains, in particular conformity.
To evaluate the effectiveness of the collective similarity, we compare the three variants of CSSR. CSSR_sh and CSSR_sp encode consistency and heterogeneity among domains, respectively, under the adaptive $k$NN setting. However, the results show that they obtain improvements in the Book and Music domains but fail to outperform previous methods in the Movie domain. By integrating the different aspects of the two variants, our method yields the best performance in all three domains.
Finally, to understand the influence of the regularization terms, we analyze the performance variations with respect to $\lambda$ and $\beta$. As illustrated in Fig. 2, with $\lambda$ fixed, the performance first increases as $\beta$ gets larger. This supports our conformity assumption that some users are representative and can be detected by enforcing a group lasso structure. An overwhelmingly large $\beta$, however, causes information loss in the domains and makes the accuracy decline. With $\beta$ fixed, we can see that the performance is less sensitive to $\lambda$. Another interesting observation is that when $\beta$ is improperly large ($\beta = 10$), decreasing $\lambda$ can improve the results to some extent. This indicates that some missing information of the domain-specific component can be compensated by the domain-shared component.
4. CONCLUSIONS
In this paper, we propose a novel CSSR method for multi-domain recommendation, which integrates personalization with the conformity effect to construct a collective similarity parameter. By applying the least squares loss, ranking scores can be predicted from the optimized neighbors. To model the different kinds of neighbors, lasso and group lasso constraints are used. Experiments on a multi-domain dataset show that our method outperforms the baselines for top-$N$ recommendation.
5. REFERENCES
[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 2011.
[2] A. Krohn-Grimberghe, L. Drumond, C. Freudenthaler, and L. Schmidt-Thieme. Multi-relational matrix factorization using Bayesian personalized ranking for social network data. In WSDM, 2012.
[3] X. Ning and G. Karypis. SLIM: Sparse linear models for top-N recommender systems. In ICDM, 2011.
[4] A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. In KDD, 2008.
[5] T. Yuan, J. Cheng, X. Zhang, S. Qiu, and H. Lu. Recommendation by mining multiple user behaviors with group sparsity. In AAAI, 2014.
[6] X. Zhang, J. Cheng, T. Yuan, B. Niu, and H. Lu. TopRec: Domain-specific recommendation through community topic mining in social network. 2013.