All content in this area was uploaded by Xi Sheryl Zhang on Aug 06, 2018

Content may be subject to copyright.

When Personalization Meets Conformity: Collective Similarity based Multi-Domain Recommendation

Xi Zhang1, Jian Cheng1, Shuang Qiu1, Zhenfeng Zhu2, Hanqing Lu1

1National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

2Institute of information Science, Beijing Jiaotong University

{xi.zhang, jcheng, shuang.qiu, luhq}@nlpr.ia.ac.cn, zhfzhu@bjtu.edu.cn

ABSTRACT

Existing recommender systems emphasize personalization to achieve promising accuracy. However, in the context of multiple domains, users are likely to seek the same behaviors as domain authorities. This conformity effect provides a wealth of prior knowledge for multi-domain recommendation, but it has not been fully exploited. In particular, users whose behaviors are significantly similar to the public tastes can be viewed as domain authorities. To detect these users while embedding conformity into recommendation, a domain-specific similarity matrix is employed. A collective similarity is thereby obtained that combines conformity with personalization. In this paper, we establish a Collective Structured Sparse Representation (CSSR) method for multi-domain recommendation. Based on the adaptive k-Nearest-Neighbor framework, we impose lasso and group lasso penalties together with a least squares loss to jointly optimize the collective similarity. Experimental results on real-world data confirm the effectiveness of the proposed method.

Categories and Subject Descriptors

H.3.3 [Information Search and Retrieval]: Information Filtering

Keywords

Multiple Domains; Recommendation; Conformity

1. INTRODUCTION

The fast growth of Web 2.0 technologies facilitates and encourages various online user behaviors, and at the same time brings a tremendous amount of information to the public. Personalized recommendation methods, which are critical for pushing the right information to the right users, have attracted a large amount of research in both industry and academia. Among these methods, Collaborative Filtering (CF) performs better than other methods by relying on an underlying assumption about entity similarity: users with common interests in the past will behave similarly on items in the future. Traditionally, CF approaches regard mining distinct types of user behavior as a task within a single domain. Nevertheless, in many real applications, multiple behaviors, ranging from reading books and favoring music to watching videos, jointly reflect the characteristics of users, and thus training a group of predictors in a unified manner would enhance accuracy for all the domains.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

SIGIR'15, August 09 - 13, 2015, Santiago, Chile.

© 2015 ACM. ISBN 978-1-4503-3621-5/15/08 ...$15.00.

DOI: http://dx.doi.org/10.1145/2766462.2767810.

To cope with multi-domain recommendation, a straightforward scheme is to extend CF algorithms into a multi-task learning problem solved by propagating similarities among domains [4, 2]. However, the multi-domain context often introduces conformity¹ into online user behaviors, which cannot be ignored. In particular, the public is inclined to conform to the choices or beliefs of domain authorities [6]. For example, comments from well-known movie critics are likely to affect box-office trends, while the reading lists of celebrated writers tend to attract admiration. That is to say, compared with ordinary users, the opinions of these domain elites are more representative and thus have considerable influence on recommendation results. Unfortunately, this effect has not been fully exploited in existing multi-domain approaches.

The key problem becomes how to detect the domain authorities and integrate the conformity effect into recommendation. To simplify this problem, we regard the elites as users whose past behaviors largely reflect the public tastes. In fact, if a user behaves significantly similarly to a large number of other users, she/he is probably an elite user who can represent others. To this end, a domain-specific user similarity matrix based on observed behaviors is built to embody conformity. On the other hand, since users' intrinsic preferences over domains are also decisive, a domain-shared user similarity matrix is utilized to profile users globally. A collective similarity combining the effect of conformity with the original personalization is thereby established.

In this paper, we propose a Collective Structured Sparse Representation (CSSR) method to optimize the collective similarity for a top-$N$ multi-domain recommendation task. CSSR is built on the adaptive k-Nearest-Neighbor framework. To learn ranked item lists under each domain for users, a least squares regression loss on the feedback representation is employed. To encode personalization and conformity, regularization with lasso and group lasso is adopted. We also show how to use the Alternating Direction Method of Multipliers (ADMM) [1] to efficiently optimize the parameters. Experiments show that our method consistently achieves more accurate results than other state-of-the-art methods.

¹ Conformity in general is the social phenomenon of matching attitudes, beliefs, and behaviors to group norms. Here we narrow the concept to adopting the same behaviors as elites.

2. THE PROPOSED APPROACH

2.1 Notation and Background

Suppose that there is a set of users $\mathcal{U} = \{u_1, \dots, u_n\}$ and $D$ domains. Let the matrix $X_d \in \mathbb{R}^{m_d \times n}$ record the interactions between the overall user set and the item set $\mathcal{V}_d$ of the $d$-th domain, where $d = 1, \dots, D$ and $m_d$ is the size of the item set. If a user has any behavior on an item, the corresponding entry of $X_d$ is 1 or a positive value; otherwise the entry is set to 0. Note that implicit feedback such as clicks, purchases and posts is easier to obtain than rating records in real-world scenarios. Thus we assume that the feedback matrices are binary.

For multi-domain recommendation, given a series of user-item feedback matrices $\{X_1, \dots, X_D\}$, our goal is to recommend to each user a personalized ranked list of potentially enjoyable items under each domain.

Here we introduce a general similarity-based framework for recommendation in a single domain. Normally, preference scores are estimated by a function $f(X, \Theta)$, where $X$ is the feedback matrix and $\Theta$ is the model parameter. The adaptive k-Nearest-Neighbor (kNN) method, which is very popular for collaborative filtering, has been shown to be simple yet effective in top-$N$ recommendation [3]. We build our model on the adaptive kNN technique in this paper. For a user-based kNN problem, the parameter $\Theta$ is a matrix $W \in \mathbb{R}^{n \times n}$, and the prediction function $f(\cdot)$ can be expressed as an aggregation of the feedback values of the $k$ nearest neighbors. Mathematically, $\tilde{x}_{ij} = x_i^T w_j$, where $\tilde{x}_{ij}$ is the predicted score, $x_i \in \mathbb{R}^n$ denotes the feedback indicator of item $v_i$ (the $i$-th row of $X$), and $w_j \in \mathbb{R}^n$ denotes the similarity coefficients of user $u_j$ (the $j$-th column of $W$). The parameter $W$ can be regarded as the similarity between user pairs, and can be optimized by solving

$$W = \arg\min_{W} \; L(W) + \Omega(W) \tag{1}$$

where $L$ is a loss function, usually least squares or logistic regression, and $\Omega$ is a regularization penalty that enforces specific structures on the parameter. Ridge, lasso, and elastic net regularization have been used in previous methods.
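As a minimal sketch of the prediction step, assuming NumPy and a toy feedback matrix, the scores $x_i^T w_j$ for all item-user pairs are the entries of the product $XW$. The function names, the toy data, and the choice to exclude already-observed items from the ranking are illustrative assumptions, not part of the method description.

```python
import numpy as np

def predict_scores(X, W):
    """Predicted scores for all (item, user) pairs: entry (i, j) is x_i^T w_j."""
    return X @ W

def top_n_items(X, W, user, n):
    """Indices of the n best-scoring items for `user`.

    Assumption: items the user already interacted with are excluded.
    """
    scores = predict_scores(X, W)[:, user].astype(float)
    scores[X[:, user] > 0] = -np.inf           # mask observed items
    order = np.argsort(-scores, kind="stable")  # descending, ties by index
    return order[:n].tolist()

# Toy data: 4 items (rows) x 3 users (columns), binary implicit feedback.
X = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)

# Hypothetical user-user similarity with zero diagonal; column j holds w_j.
W = np.array([[0.0, 0.5, 0.0],
              [0.6, 0.0, 0.7],
              [0.0, 0.2, 0.0]])

print(top_n_items(X, W, user=0, n=2))  # [1, 3]
```

User 0 is similar only to user 1 here, so the two items that user 1 consumed and user 0 has not seen are ranked first.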

2.2 Collective Similarity

Now we consider the problem of multi-domain recommendation. As the user set is common across domains, user-based similarity is exploited. The idea is simple: learn a set of nearest neighbors who are sufficiently similar to the user in terms of multiple behaviors over domains. In other words, a user's interests can be represented by the set of her/his neighbors. The usual approach assumes that all users are ordinary, with comparable ability to represent others, and thus personalized neighbors are mined for each user independently. Nevertheless, conformity typically arises in the multi-domain context. Under a specific domain, elite users' favorites are more acceptable to the public during recommendation. This indicates that the similarity coefficients between these representative elite users and other users tend to be significant. Hence, we propose a collective similarity as

$$W_d = V + V_d \tag{2}$$

where $d = 1, \dots, D$. The parameter matrix $V \in \mathbb{R}^{n \times n}$ indicates the personalized neighbors with intrinsically similar interests across domains, and is the domain-shared component. The parameter matrix $V_d \in \mathbb{R}^{n \times n}$ embodies the representativeness of elite neighbors in the $d$-th domain, and is the domain-specific component.

Figure 1: Illustration of the CSSR Model. (Labels in the figure: feedback matrices $X_d$; collective similarity $W_d$, composed of the domain-shared similarity $V$ and the domain-specific similarity $V_d$; least squares regression loss; lasso regularization; group lasso regularization.)

Next, we explore conformity by regularizing the structure of the parameter set $\{V_1, \dots, V_D\}$ with group lasso. On the one hand, domain-specific elite users can largely represent other users, which makes the corresponding rows of $V_d$ dense. On the other hand, the fact that elite users are always a minority leads to sparsity along the columns, so that only a few rows are non-zero. This structure defines the group lasso regularization term

$$\Omega_{glas}(V_d) = \beta_d \|V_d\|_{2,1} = \sum_{i=1}^{n} \beta_d \|v_d^i\|_2 \tag{3}$$

where $\|v_d^i\|_2$ is the $\ell_2$-norm of the $i$-th row of $V_d$. The $\ell_{2,1}$-norm encourages row sparsity by jointly selecting the significant rows as elite users. $\beta_d$ encodes the contribution of group lasso for each domain. Without loss of generality, we assume $\beta_d = \beta$ for all $d$.
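To make the row-sparsity reading concrete, the following sketch shows one way elite users could be read off a learned domain-specific matrix: the users whose rows have a non-negligible $\ell_2$-norm. The function name and the tolerance `tol` are assumptions introduced for illustration only.

```python
import numpy as np

def elite_user_indices(V_d, tol=1e-8):
    """Indices of users whose row in V_d is not (numerically) zero.

    Under the l2,1 penalty most rows are driven to zero, so the surviving
    rows can be interpreted as the elite users of the domain.
    `tol` is a hypothetical numerical tolerance.
    """
    row_norms = np.linalg.norm(V_d, axis=1)
    return np.flatnonzero(row_norms > tol).tolist()

# Toy domain-specific matrix: only user 1 has a non-zero row.
V_d = np.array([[0.0, 0.0, 0.0],
                [3.0, 0.0, 4.0],
                [0.0, 0.0, 0.0]])

print(elite_user_indices(V_d))  # [1]
```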

Furthermore, users' personalized neighbors are also sparse in the overall user set. A lasso constraint is utilized to filter out noisy coefficients in $V$:

$$\Omega_{las}(V) = \lambda \|V\|_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda |v_{ij}| \tag{4}$$

With the above discussion, the collective similarity can be incorporated into a least squares loss as $L(W_d) = \frac{1}{2}\|X_d - X_d W_d\|_F^2$. Following the general framework in Eq. (1), we propose the CSSR model for the adaptive kNN problem in multiple domains as

$$\begin{aligned} \min_{\Theta} \;& \sum_{d=1}^{D} \alpha_d \big( L(W_d) + \Omega_{glas}(V_d) \big) + \Omega_{las}(V) \\ \text{s.t.} \;& W_d = V + V_d, \; W_d \ge 0, \; \mathrm{diag}(W_d) = 0, \quad \forall d, \; d = 1, \dots, D \end{aligned} \tag{5}$$

The parameter set $\Theta$ is written as $\{W_d, V_d, V\}$ for brevity, and $\alpha_d$ balances the effect of the different domains. In addition, the non-negativity constraint is imposed on the similarity for interpretability, while $\mathrm{diag}(W_d) = 0$ ensures that the trivial solution of the identity matrix is avoided. In the unified objective function, information from the feedback matrices is transferred among domains through $V$, and domain distinctions are captured by $V_d$. The entire model is depicted in Figure 1.
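The objective in Eq. (5) can be evaluated directly once $V$ and the $V_d$ are given. The sketch below, assuming NumPy, only computes the objective value with $W_d = V + V_d$ substituted in; it does not check the non-negativity or zero-diagonal constraints, and all function names are illustrative.

```python
import numpy as np

def least_squares_loss(X_d, W_d):
    """L(W_d) = 0.5 * ||X_d - X_d W_d||_F^2."""
    return 0.5 * np.linalg.norm(X_d - X_d @ W_d, "fro") ** 2

def group_lasso(V_d, beta):
    """beta * ||V_d||_{2,1}: beta times the sum of row-wise l2-norms."""
    return beta * np.sum(np.linalg.norm(V_d, axis=1))

def lasso(V, lam):
    """lam * ||V||_1: lam times the sum of absolute entries."""
    return lam * np.sum(np.abs(V))

def cssr_objective(Xs, V, Vds, alphas, beta, lam):
    """Value of the objective in Eq. (5) with W_d = V + V_d substituted."""
    total = lasso(V, lam)
    for X_d, V_d, alpha_d in zip(Xs, Vds, alphas):
        W_d = V + V_d
        total += alpha_d * (least_squares_loss(X_d, W_d) + group_lasso(V_d, beta))
    return total

# Toy check: with all similarities at zero, only the data term remains.
X1 = np.array([[1.0, 0.0], [0.0, 1.0]])
X2 = np.array([[1.0, 1.0]])
Z = np.zeros((2, 2))
value = cssr_objective([X1, X2], Z, [Z, Z], alphas=[0.5, 0.5], beta=0.5, lam=0.01)
print(round(value, 6))  # 1.0
```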

2.3 Optimization

Although non-smooth terms are introduced, our approach maintains the convexity of the objective function. We apply ADMM [1] to split the original problem into several subproblems that can be handled in alternating directions.

To solve the problem in Eq. (5), we first obtain the augmented Lagrangian as

$$\begin{aligned} \min_{\Theta, Y_d} \;& \sum_{d=1}^{D} \alpha_d \big( L(W_d) + I_+(W_d) + \Omega_{glas}(V_d) \big) + \Omega_{las}(V) \\ & + \sum_{d=1}^{D} \Big( \mathrm{tr}\big(Y_d^T (W_d - V - V_d)\big) + \frac{\rho}{2} \|W_d - V - V_d\|_F^2 \Big) \end{aligned} \tag{6}$$

where $Y_d$ is the Lagrangian multiplier of the $d$-th domain, $I_+(\cdot)$ is the indicator function of the non-negativity constraint, and $\rho > 0$ is a penalty parameter. For a given domain $d$, optimizing $\{W_d, V_d, Y_d\}$ is independent of optimizing the variables of the other domains. ADMM therefore proceeds by solving the following problems alternately until convergence:

$$\min_{W_d \ge 0} \; \alpha_d L(W_d) + \frac{\rho}{2} \|W_d - V - V_d + U_d\|_F^2 \tag{7a}$$

$$\min_{V_d} \; \alpha_d \Omega_{glas}(V_d) + \frac{\rho}{2} \|W_d - V - V_d + U_d\|_F^2 \tag{7b}$$

$$\min_{V} \; \Omega_{las}(V) + \frac{\rho}{2} \sum_{d=1}^{D} \|W_d - V - V_d + U_d\|_F^2 \tag{7c}$$

$$U_d = U_d + W_d - V - V_d \tag{7d}$$

where $U_d = \frac{1}{\rho} Y_d$ leads to a scaled form. We then provide closed-form solutions of the subproblems in Eq. (7a), Eq. (7b) and Eq. (7c) below.
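The scaled form can be checked by completing the square. As a short worked step, writing $R_d = W_d - V - V_d$ for the constraint residual (a shorthand introduced here only for illustration) and using $Y_d = \rho U_d$:

```latex
\begin{aligned}
\frac{\rho}{2}\,\|R_d + U_d\|_F^2
  &= \frac{\rho}{2}\,\|R_d\|_F^2 + \rho\,\mathrm{tr}(U_d^T R_d) + \frac{\rho}{2}\,\|U_d\|_F^2 \\
  &= \frac{\rho}{2}\,\|R_d\|_F^2 + \mathrm{tr}(Y_d^T R_d) + \frac{\rho}{2}\,\|U_d\|_F^2 ,
\end{aligned}
\qquad\text{so}\qquad
\mathrm{tr}(Y_d^T R_d) + \frac{\rho}{2}\,\|R_d\|_F^2
  = \frac{\rho}{2}\,\|R_d + U_d\|_F^2 - \frac{\rho}{2}\,\|U_d\|_F^2 .
```

The last term does not depend on $W_d$, $V_d$ or $V$, so it can be dropped when minimizing over any of them, which leaves the quadratic term $\frac{\rho}{2}\|W_d - V - V_d + U_d\|_F^2$ that appears in each subproblem.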

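Update for $W_d$. We first rewrite the problem as an unconstrained problem

$$\min_{W_d} \; \alpha_d L(W_d) + \frac{\rho}{2}\|W_d - V - V_d + U_d\|_F^2 - \mathrm{tr}(\Phi_d^T W_d) \tag{8}$$

where $\Phi_d$ is the Lagrangian multiplier for the non-negativity constraint. Since the objective in Eq. (8) is smooth, the solution can be found by taking its derivative and setting it to zero:

$$(X_d^T X_d + \rho I) W_d + \rho (U_d - V - V_d) - X_d^T X_d - \Phi_d = 0 \tag{9}$$

Using the Karush-Kuhn-Tucker complementarity condition for the non-negativity of $W_d$, $[\Phi_d]_{ij} [W_d]_{ij} = 0$, we get

$$[W_d]_{ij} = [W_d]_{ij} \frac{\big[X_d^T X_d + \rho (U_d - V - V_d)^{-}\big]_{ij}}{\big[(X_d^T X_d + \rho I) W_d + \rho (U_d - V - V_d)^{+}\big]_{ij}} \tag{10}$$

To guarantee non-negativity, we decompose a matrix with mixed signs as $A = A^+ - A^-$, where $A^+_{ij} = (|A_{ij}| + A_{ij})/2$ and $A^-_{ij} = (|A_{ij}| - A_{ij})/2$.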
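A minimal NumPy sketch of one multiplicative step in the form of Eq. (10) is given below. The function names and the small constant `eps`, added to the denominator to avoid division by zero, are assumptions for illustration. Because the step is a multiplication by a non-negative ratio, entries that start at zero (for example the diagonal) stay at zero, and non-negative entries stay non-negative.

```python
import numpy as np

def split_pos_neg(A):
    """Decompose A = A_pos - A_neg with both parts elementwise non-negative."""
    return (np.abs(A) + A) / 2.0, (np.abs(A) - A) / 2.0

def update_W(W, X, V, V_d, U_d, rho, eps=1e-12):
    """One multiplicative update of W_d in the form of Eq. (10).

    X is the non-negative feedback matrix of the domain (items x users).
    `eps` is a hypothetical safeguard against a zero denominator.
    """
    G = X.T @ X                                   # X_d^T X_d, non-negative
    B_pos, B_neg = split_pos_neg(U_d - V - V_d)   # (U_d - V - V_d)^+ and ^-
    numer = G + rho * B_neg
    denom = (G + rho * np.eye(G.shape[0])) @ W + rho * B_pos
    return W * numer / (denom + eps)

# Usage: start from a non-negative W with zero diagonal and iterate.
rng = np.random.default_rng(0)
X = (rng.random((6, 4)) > 0.5).astype(float)
Z = np.zeros((4, 4))
W = rng.random((4, 4))
np.fill_diagonal(W, 0.0)
for _ in range(20):
    W = update_W(W, X, V=Z, V_d=Z, U_d=Z, rho=1.0)
```

A point at which the numerator and denominator of Eq. (10) coincide is left unchanged by the step, which matches the stationarity condition in Eq. (9) with $\Phi_d = 0$.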

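Update for $V_d$. For each row of $V_d$, the optimization problem in Eq. (7b) becomes

$$\min_{v_d^i} \; \beta \|v_d^i\|_2 + \frac{\rho}{2} \|v_d^i - (w_d^i - v^i + u_d^i)\|_2^2 \tag{11}$$

where $i$ is the row index of the parameter matrices. Let $z_d^i = w_d^i - v^i + u_d^i$. This problem can be solved by applying the proximal operator of group lasso to each row vector:

$$v_d^i = S_{glas, \frac{\beta}{\rho}}(z_d^i) = \begin{cases} 0, & \text{if } \|z_d^i\|_2 \le \frac{\beta}{\rho} \\ \dfrac{\|z_d^i\|_2 - \frac{\beta}{\rho}}{\|z_d^i\|_2} \, z_d^i, & \text{otherwise.} \end{cases} \tag{12}$$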
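The row-wise operator in Eq. (12) can be sketched in a vectorized form as follows, assuming NumPy. The function name and the argument `tau` (standing for the threshold $\beta/\rho$) are illustrative.

```python
import numpy as np

def group_soft_threshold_rows(Z, tau):
    """Apply the group-lasso proximal operator of Eq. (12) to every row of Z.

    Rows with l2-norm <= tau become zero; the others are shrunk towards
    zero by the factor (||z|| - tau) / ||z||.
    """
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    # max(norm - tau, 0) is zero exactly for the rows that must vanish;
    # the lower bound on the divisor only protects all-zero rows.
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return scale * Z

Z = np.array([[3.0, 4.0],    # norm 5   -> scaled by 0.8
              [0.3, 0.4],    # norm 0.5 -> set to zero
              [0.0, 0.0]])   # already zero
print(group_soft_threshold_rows(Z, tau=1.0))
```

With `tau = 1.0`, the first row becomes `[2.4, 3.2]` and the other two rows are zero, which is the row-selection behavior the group lasso term is meant to produce.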

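Update for $V$. Optimizing the second term of Eq. (7c) with respect to $V$ is equivalent to optimizing $\frac{\rho}{2}\big\|V - \frac{1}{D}\sum_{d=1}^{D}(W_d - V_d + U_d)\big\|_F^2$, which can be substituted into Eq. (7c). We then have the element-wise optimal solution

$$[V]_{ij} = S_{las, \frac{\lambda}{\rho}}\Big( \frac{1}{D} \sum_{d=1}^{D} [W_d - V_d + U_d]_{ij} \Big) \tag{13}$$

where $S_{las,\varepsilon}(\cdot)$ is the soft-thresholding operator

$$S_{las,\varepsilon}(x) = \begin{cases} x - \varepsilon, & \text{if } x > \varepsilon \\ x + \varepsilon, & \text{if } x < -\varepsilon \\ 0, & \text{otherwise.} \end{cases} \tag{14}$$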
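A short NumPy sketch of the soft-thresholding operator of Eq. (14) and of the update in Eq. (13) follows. It mirrors Eq. (13) as printed, with threshold $\lambda/\rho$ applied to the average over domains. The function names are illustrative.

```python
import numpy as np

def soft_threshold(x, eps):
    """Element-wise soft-thresholding operator of Eq. (14)."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def update_V(Ws, Vds, Us, lam, rho):
    """Update of the shared component V in the form of Eq. (13).

    Ws, Vds, Us are lists with one n x n matrix per domain.
    """
    D = len(Ws)
    mean = sum(W_d - V_d + U_d for W_d, V_d, U_d in zip(Ws, Vds, Us)) / D
    return soft_threshold(mean, lam / rho)

print(soft_threshold(np.array([3.0, -3.0, 0.5]), 1.0))  # values 2, -2 and 0
```

Entries whose averaged value lies within the threshold are set exactly to zero, which is how the lasso term removes weak shared similarities.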

3. EXPERIMENTS

3.1 Experiment Settings

To empirically study the effectiveness of our method, we perform experiments on a multi-domain dataset crawled from the publicly available site Douban². Douban is a well-known Web 2.0 website containing users' ratings, on a scale of 1 to 5 stars, of books, music and movies. As we address a top-$N$ recommendation task, we use implicit feedback instead of rating scores. To allow a sufficient evaluation of each user, we filter out users with fewer than 10 feedback records on the three domains, obtaining a dataset of 5,916 users. The detailed description is presented in Table 1.

Table 1: Description of Douban Dataset

| Domain | #Items | %Sparsity | #Ratings per User |
|---|---|---|---|
| Book | 14,155 | 99.85 | 22 |
| Music | 15,492 | 99.75 | 38 |
| Movie | 7,845 | 98.87 | 88 |

For personalized item recommendation, we analyze the performance of the model by comparing the top suggestions with the true behaviors of a user. We adopt two widely used evaluation metrics for top-$N$ recommendation, MAP (Mean Average Precision) and NDCG (Normalized Discounted Cumulative Gain), with the setting $N = 5$. Higher values of these metrics imply better recommendation results.
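Both metrics have several variants in the literature, and the exact variant used in the experiments is not spelled out here. The sketch below shows one common binary-relevance definition of each for a single user, assuming that average precision is normalized by min(number of relevant items, N). MAP and NDCG are then the means of these values over users. Function names are illustrative.

```python
import math

def average_precision_at_n(ranked, relevant, n=5):
    """AP@n for one user: precision at each hit position, averaged.

    Assumption: normalized by min(len(relevant), n).
    """
    hits, score = 0, 0.0
    for k, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            hits += 1
            score += hits / k
    denom = min(len(relevant), n)
    return score / denom if denom else 0.0

def ndcg_at_n(ranked, relevant, n=5):
    """NDCG@n for one user with binary gains and a log2 position discount."""
    dcg = sum(1.0 / math.log2(k + 1)
              for k, item in enumerate(ranked[:n], start=1) if item in relevant)
    ideal = sum(1.0 / math.log2(k + 1)
                for k in range(1, min(len(relevant), n) + 1))
    return dcg / ideal if ideal else 0.0

ranked = [10, 20, 30, 40, 50]   # recommended item ids, best first
relevant = {10, 30}             # held-out items of the user
ap = average_precision_at_n(ranked, relevant)   # (1/1 + 2/3) / 2
ndcg = ndcg_at_n(ranked, relevant)              # 1.5 / (1 + 1/log2(3))
```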

We compare the proposed CSSR with several popular baseline methods: PopRank, NCDCF_U, NCDCF_I [5], mrBPR [2], and mrSLIM. Here we extend SLIM [3] to mrSLIM by constructing a feedback matrix that takes all items in the three domains as rows. For a comprehensive comparison, the performance of CSSRsh and CSSRsp is also shown. They consider only the domain-shared or only the domain-specific similarity in Eq. (2), respectively, and use lasso constraints.

In our experiments, we randomly pick 80% of the observed feedback of each user to form the training set in each domain, and the remaining 20% forms the test set. The random sampling is repeated 10 times and the average performance is reported. We set the domain weights in CSSR as $\{\alpha_1 = 0.2, \alpha_2 = 0.3, \alpha_3 = 0.5\}$ based on the MAP performance on the test set. The impact of the group lasso parameter $\beta$ and the lasso parameter $\lambda$ is studied further in Section 3.2.

² http://www.douban.com

Table 2: Prediction performance (mean ± std.) of three variants of CSSR and PopRank, NCDCF_U, NCDCF_I, mrSLIM, mrBPR on three domains of the Douban dataset. Results in bold indicate the best ones.

| Methods | Params | Book MAP | Book NDCG | Music MAP | Music NDCG | Movie MAP | Movie NDCG |
|---|---|---|---|---|---|---|---|
| PopRank | N/A | 0.1370±0.0039 | 0.0619±0.0014 | 0.1879±0.0029 | 0.0922±0.0014 | 0.3784±0.0031 | 0.2249±0.0017 |
| NCDCF_U | -, 100 | 0.1826±0.0033 | 0.0813±0.0013 | 0.2647±0.0035 | 0.1325±0.0015 | 0.4982±0.0038 | 0.3159±0.0024 |
| NCDCF_I | -, 50 | 0.1864±0.0029 | 0.0835±0.0011 | 0.2668±0.0016 | 0.1341±0.0007 | 0.5049±0.0047 | 0.3230±0.0027 |
| mrSLIM | 5, 0.1 | 0.2014±0.0035 | 0.0906±0.0013 | 0.2861±0.0028 | 0.1453±0.0022 | 0.5115±0.0040 | 0.3241±0.0023 |
| mrBPR | 0.01, 1e-3 | 0.2333±0.0035 | 0.1071±0.0019 | 0.3180±0.0037 | 0.1554±0.0021 | 0.5206±0.0039 | 0.3377±0.0025 |
| CSSRsh | -, 0.01 | 0.2217±0.0023 | 0.1013±0.0010 | 0.2949±0.0028 | 0.1453±0.0011 | 0.4952±0.0055 | 0.3106±0.0026 |
| CSSRsp | -, 0.01 | 0.2348±0.0041 | 0.1070±0.0019 | 0.3020±0.0046 | 0.1539±0.0022 | 0.5052±0.0029 | 0.3241±0.0020 |
| CSSR | 0.5, 0.01 | **0.2746±0.0032** | **0.1241±0.0013** | **0.3345±0.0037** | **0.1703±0.0014** | **0.5341±0.0030** | **0.3471±0.0015** |

Figure 2: Performance variation of CSSR with respect to $\beta$ and $\lambda$ on three domains. (Panels: Book, Music, Movie. Horizontal axis: $\lambda \in \{10^{-3}, 0.01, 0.1, 1, 10\}$. Vertical axis: MAP. One curve per $\beta \in \{0.05, 0.1, 0.5, 1, 10\}$.)
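The random per-user 80/20 split described in the experiment settings can be sketched as follows for one domain, assuming NumPy and an items-by-users feedback matrix. The function name, the rounding rule for the training share, and the seed handling are assumptions for illustration; repeating the call with different seeds gives the repeated random samplings.

```python
import numpy as np

def split_feedback(X, train_frac=0.8, seed=0):
    """Split the observed entries of each user (column) into train and test.

    Returns two matrices of the same shape as X whose non-zero entries
    partition the non-zero entries of X.
    """
    rng = np.random.default_rng(seed)
    train = np.zeros_like(X)
    test = np.zeros_like(X)
    for user in range(X.shape[1]):
        items = np.flatnonzero(X[:, user])   # observed items of this user
        rng.shuffle(items)
        n_train = int(round(train_frac * len(items)))
        train[items[:n_train], user] = X[items[:n_train], user]
        test[items[n_train:], user] = X[items[n_train:], user]
    return train, test

# Toy example: 3 users with 10 observed items each.
X = np.ones((10, 3))
train, test = split_feedback(X, train_frac=0.8, seed=42)
print(train.sum(axis=0), test.sum(axis=0))  # 8 training and 2 test items per user
```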

3.2 Results and Analysis

Experimental results of the above baselines and the three variants of the proposed CSSR are shown in Table 2. The optimal parameters we obtained for CSSR are $\beta = 0.5$ and $\lambda = 0.01$. From Table 2 we observe that our full model CSSR consistently outperforms all the baselines. Specifically, because PopRank is not a personalized algorithm and only recommends items based on their popularity, all other baselines beat it. This confirms the importance of personalization in recommender systems. Comparing the results across the three domains, we find that recommending movies is a relatively easier task, as all the approaches except PopRank perform well on this domain. Transferring knowledge from the movie domain would therefore benefit the other two tasks. Lacking a properly designed mechanism for multi-task learning, NCDCF_U and NCDCF_I (with the number of nearest neighbors $k = 100$ and $k = 50$, respectively) cannot achieve satisfactory results in the book and music domains. We implement mrSLIM with the elastic net weights set to $\beta = 5$ and $\lambda = 0.1$, which gave its best performance. With the learned item similarity matrix, mrSLIM performs better than its simpler version NCDCF_I. Moreover, CSSR surpasses the state-of-the-art ranking method mrBPR, which is run with a learning rate of 0.01 and an $\ell_2$-norm weight of 1e-3. mrBPR focuses on domain consistency to model a shared component, but ignores the heterogeneous part that may differ across domains, in particular conformity.

To evaluate the effectiveness of the collective similarity, we compare the three variants of CSSR. CSSRsh and CSSRsp encode consistency and heterogeneity among domains, respectively, under the adaptive kNN setting. However, the results show that they obtain improvements on the Book and Music domains but fail to outperform previous methods in the Movie domain. By integrating the different aspects of the two variants, our method yields the best performance in all three domains.

Finally, to understand the influence of the regularization terms, we analyze the performance variations with respect to $\lambda$ and $\beta$. As illustrated in Fig. 2, with $\lambda$ fixed, the performance first increases as $\beta$ gets larger. This supports our conformity assumption that some users are representative and can be detected by enforcing a group lasso structure. However, an overwhelmingly large $\beta$ causes information loss in the domains and makes the accuracy decline. With $\beta$ fixed, we can see that the performance is less sensitive to $\lambda$. Another interesting observation is that when $\beta$ is improperly large ($\beta = 10$), decreasing $\lambda$ improves the results to some extent. This indicates that some missing information of the domain-specific component can be compensated by the domain-shared component.

4. CONCLUSIONS

In this paper, we propose a novel CSSR method for multi-domain recommendation, which integrates personalization with the conformity effect to construct a collective similarity parameter. With the least squares loss, ranking scores are predicted from the optimized neighbors. To model the different kinds of neighbors, lasso and group lasso constraints are used. Experiments on a multi-domain dataset show that our method outperforms the baselines for top-$N$ recommendation.

5. REFERENCES

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 2011.

[2] A. Krohn-Grimberghe, L. Drumond, C. Freudenthaler, and L. Schmidt-Thieme. Multi-relational matrix factorization using bayesian personalized ranking for social network data. In WSDM, 2012.

[3] X. Ning and G. Karypis. SLIM: Sparse linear models for top-n recommender systems. In ICDM, 2011.

[4] A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. In KDD, 2008.

[5] T. Yuan, J. Cheng, X. Zhang, S. Qiu, and H. Lu. Recommendation by mining multiple user behaviors with group sparsity. In AAAI, 2014.

[6] X. Zhang, J. Cheng, T. Yuan, B. Niu, and H. Lu. Toprec: domain-specific recommendation through community topic mining in social network. 2013.