Indexable Bayesian Personalized Ranking for Efficient Top-k Recommendation

Dung D. Le
Singapore Management University
80 Stamford Road
Singapore 178902
ddle.2015@phdis.smu.edu.sg

Hady W. Lauw
Singapore Management University
80 Stamford Road
Singapore 178902
hadywlauw@smu.edu.sg
ABSTRACT
Top-k recommendation seeks to deliver a personalized recommendation list of k items to a user. The dual objectives are (1) accuracy in identifying the items a user is likely to prefer, and (2) efficiency in constructing the recommendation list in real time. One direction towards retrieval efficiency is to formulate retrieval as approximate k-nearest neighbor (kNN) search, aided by indexing schemes such as locality-sensitive hashing, spatial trees, and inverted index. These schemes, applied on the output representations of recommendation algorithms, speed up the retrieval process by automatically discarding a large number of potentially irrelevant items when given a user query vector. However, many previous recommendation algorithms produce representations that may not necessarily align well with the structural properties of these indexing schemes, eventually resulting in a significant loss of accuracy post-indexing. In this paper, we introduce Indexable Bayesian Personalized Ranking (Indexable BPR), which learns from ordinal preferences to produce representations that are inherently compatible with the aforesaid indices. Experiments on publicly available datasets show superior performance of the proposed model compared to state-of-the-art methods on the top-k recommendation retrieval task, achieving significant speedup while maintaining high accuracy.
1 INTRODUCTION
Today, we face a multitude of options in various spheres of life, e.g., deciding which product to buy at Amazon, selecting which movie to watch on Netflix, choosing which article to read on social media, etc. The number of possibilities is immense. Driven by necessity, service providers rely on recommendation algorithms to identify a manageable number k of the most preferred options to be presented to each user. Due to the limited screen real estate of devices (increasingly likely to be ever smaller mobile devices), the value of k may be relatively small (e.g., k = 10), yet the selection of items to be recommended is personalized to each individual.
To construct such personalized recommendation lists, we learn from users' historical feedback, which may be explicit (e.g., ratings) [12] or implicit (e.g., click behaviors) [22]. An established
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CIKM'17, Singapore, Singapore
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM.
978-1-4503-4918-5/17/11...$15.00
DOI: 10.1145/3132847.3132913
methodology in the literature based on matrix factorization [23, 26] derives a latent vector $x_u \in \mathbb{R}^D$ for each user $u$, and a latent vector $y_i \in \mathbb{R}^D$ for each item $i$, where $D$ is the dimensionality. The degree of preference of user $u$ for item $i$ is modeled as the inner product $x_u^T y_i$. To arrive at the recommendation for $u$, we need to identify the top-k items with the maximum inner product with $x_u$.
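For reference, the exhaustive-search baseline that indexing competes against simply scores every item by its inner product with the query and keeps the $k$ largest. A minimal sketch (the function name and toy data are illustrative, not from the paper):

```python
import numpy as np

def topk_inner_product(x_u, Y, k=10):
    """Exhaustive top-k: one inner product per item, O(N * D) per query."""
    scores = Y @ x_u
    top = np.argsort(-scores)[:k]   # indices of the k largest scores
    return top, scores[top]

# Toy example: 5 items in D = 2 dimensions.
Y = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, 0.5], [-1.0, 3.0]])
x_u = np.array([1.0, 1.0])
items, scores = topk_inner_product(x_u, Y, k=2)   # items 2 and 4 score highest
```

The linear scan over all $N$ items is exactly the cost that the indexing schemes discussed below aim to avoid.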
There are two overriding goals for such top-k recommendations. One is accuracy, to maximize the correctness in placing items that user $u$ prefers most into $u$'s recommendation list. Another is efficiency; in particular, we are primarily concerned with retrieval efficiency, to minimize the time taken to deliver the recommendation list upon request. Faster retrieval helps the system to cope with a large number of consumers, and minimizes their waiting time to receive recommendations. In contrast, learning efficiency, or minimizing model learning time, while useful, is arguably less mission-critical, as it can be done offline and involves mainly machine time, rather than human time. Therefore, we seek to keep the learning time manageable, while improving retrieval efficiency.
Many previous recommendation algorithms focus mainly on accuracy. One challenge in practice is the need for exhaustive search over all candidate items to identify the top-k, which is time-consuming when the number of items $N$ is extremely large [11].
Problem. In this paper, we pay equal attention to both goals, i.e., optimizing retrieval efficiency of top-k recommendation without losing sight of accuracy. An effective approach to improve efficiency is to use indexing structures such as locality-sensitive hashing (LSH) [24], spatial trees (e.g., KD-tree [3]), and inverted index [4]. By indexing items' latent vectors, we can quickly retrieve a small candidate set of $k$ "most relevant" items to the user query vector, probably in sublinear time w.r.t. the number of items $N$. This avoids an exhaustive search, and saves on the computation for the large number of items that the index considers irrelevant.
Here, we focus on indexing as an alternative to exhaustive search in real time. Indexing is preferred over precomputation of recommendation lists for all users, which is impractical [4, 11]. User interests change over time. New items appear. By indexing, we avoid the storage requirement of dealing with all possible user-item pairs. Index storage scales only with the number of items $N$, while the number of queries/users could be larger. Indexing flexibly allows the value of $k$ to be specified at runtime.
However, most of the previous recommendation algorithms based on matrix factorization [12] are not designed with indexing in mind. The objective of recommending to user $u$ those items with maximum inner product $x_u^T y_i$ is not geometrically compatible with the aforementioned index structures. For one thing, it has been established that there cannot exist any LSH family for maximum inner product search [24]. For another, retrieval on a spatial tree index finds the nearest neighbors based on the Euclidean distance, which are not equivalent to those with maximum inner product [2]. In turn, [4] describes an inverted index scheme based on cosine similarity, which again is not equivalent to inner product search.
Approach. The key reason behind the incompatibility between inner product search, which matrix factorization relies on, and the aforesaid index structures is that a user $u$'s degree of preference for an item $i$, expressed as the inner product $x_u^T y_i$, is sensitive to the respective magnitudes of the latent vectors $\|x_u\|, \|y_i\|$. Therefore, one insight towards achieving geometric compatibility is to desensitize the effect of vector magnitudes. The challenge is how to do so while still preserving the accuracy of the top-k retrieval.
There are a couple of recent approaches in this direction. One approach [2] is a post-processing transformation that expands the latent vectors learnt from matrix factorization with an extra dimension to equalize the magnitude of all item vectors. Because the transformation is a separate process from learning the vectors, such a workaround would not be as effective as working with natively indexable vectors in the first place. Another approach [7] extends the Bayesian Probabilistic Matrix Factorization [23], by making the item latent vectors natively of fixed length. Fitting inner products to absolute rating values may not be suitable when only implicit feedback (not ratings) is available. Moreover, we note that top-k recommendation is inherently an expression of "relative" rather than "absolute" preferences, i.e., the ranking among items is more important than the exact scores.
We propose to work with ordinal expressions of preferences. Ordinal preferences can be expressed as a triple $(u, i, j)$, indicating that a user $u$ prefers an item $i$ to a different item $j$. Ordinal representation is prevalent in modeling preferences [22], and also accommodates both explicit (e.g., ratings) and implicit feedback.
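Deriving ordinal triples from explicit ratings can be sketched as follows (the function name and toy ratings are illustrative; tied ratings yield no ordinal preference):

```python
from itertools import combinations

def triples_from_ratings(ratings):
    """ratings: dict user -> dict item -> rating.
    Emit (u, i, j) whenever u rates i strictly higher than j."""
    triples = []
    for u, item_ratings in ratings.items():
        for i, j in combinations(item_ratings, 2):
            if item_ratings[i] > item_ratings[j]:
                triples.append((u, i, j))
            elif item_ratings[j] > item_ratings[i]:
                triples.append((u, j, i))
            # equal ratings: no triple
    return triples

ratings = {"u1": {"a": 5, "b": 3, "c": 3}}
triples = triples_from_ratings(ratings)   # (u1, a, b) and (u1, a, c); b vs. c tied
```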
Contributions. This paper makes the following contributions:
First, we propose the Indexable Bayesian Personalized Ranking model, or Indexable BPR in short, which produces natively geometrically indexable latent vectors for accurate and efficient top-k recommendation. BPR [22] is a generic framework modeling ordinal triples. Each instantiation is based on a specific kernel [8, 13, 16, 20]. [22] had a matrix factorization kernel, which is not well suited to indexing structures. In contrast, our Indexable BPR is formulated with a kernel based on angular distances (see Section 3). In addition to requiring a different learning algorithm, we will show how this engenders native compatibility with various index structures.
Second, we describe how the resulting vectors are used with LSH, spatial tree, and inverted index for top-k recommendation in Section 4. We conduct experiments with available datasets to compare Indexable BPR with baselines. Empirically, we observe that Indexable BPR achieves a balance of accuracy and runtime efficiency, achieving higher accuracy than the baselines at the same speedup level, and higher speedup at the same accuracy level.
Third, to support the observation on the robustness of Indexable BPR, we provide a theoretical analysis in the context of LSH, further bolstered with empirical evidence, on why our reliance on angular distances results in more index-friendly vectors, smaller loss of accuracy post-indexing, and balanced all-round performance.
2 RELATED WORK
We review the literature related to the problem of efficiently retrieving top-k recommendations using indexing schemes.
Matrix Factorization. Matrix factorization is the basis of many recommendation algorithms [12]. For such models, top-k retrieval is essentially reduced to maximum inner product search, with complexity proportional to the number of items in the (huge) collection. This motivates approaches to improve the retrieval efficiency of top-k recommendation. Of interest to us are those that yield user and item latent vectors to be used with geometric index structures. This engenders compatibility with both spatial tree index and inverted index, as well as with hashing schemes, and transforms the problem into k-nearest neighbor (kNN) search.
One approach is the transformation scheme applied to matrix factorization output. [2, 19] propose a post-processing step that extends the output latent vectors by one dimension to equalize the magnitude of item vectors. Theoretical comparisons show that this Euclidean transformation achieves better hashing quality as compared to the two previous methods in [24] and [25]. However, the Euclidean transformation results in a high concentration of new item points, affecting the retrieval accuracy of the approximate kNN. As Indexable BPR relies on ordinal triples, one appropriate baseline is to use the transformation scheme above on a comparable algorithm that also relies on triples. We identify BPR [22] with inner product or matrix factorization (MF) kernel, whose implementation is available¹, and refer to the composite as BPR(MF)+.
Another approach is to learn indexable vectors that fit ratings, which would not work with implicit feedback, e.g., ordinal triples. Indexable Probabilistic Matrix Factorization, or IPMF [7], is a rating-based model with constraints to place item vectors on a hypersphere. We will see that IPMF does not optimize for a high mean of the normal distribution in Eq. 15 (see Section 5), and ordinal-based Indexable BPR potentially performs better.
Others may not involve the standard index structures we study. [11] used representative queries identified by clustering. [21] invented another data structure (cone tree). [27-29] learnt binary codes, which are incompatible with the $l_2$ distance used by spatial trees.
Euclidean Embedding. Euclidean embedding takes as input distances (or their ordinal relationships), and outputs low-dimensional latent coordinates for each point that preserve the input as much as possible [14]. Because they operate in the Euclidean space, the coordinates support nearest neighbor search using geometric index structures such as spatial trees.
There exist recent works on using Euclidean embedding to model user preferences over items, which we include as experimental baselines. The first method, Collaborative Filtering via Euclidean Embedding or CFEE [10], fits a rating $\hat{r}_{ui}$ by user $u$ on item $i$ in terms of the squared Euclidean distance between $x_u$ and $y_i$. Fitting ratings directly does not preserve the pairwise comparisons. The second method, Collaborative Ordinal Embedding or COE [15], is based on ordinal triples. It expresses a triple $t_{uij}$ through the Euclidean distance difference $\|x_u - y_j\| - \|x_u - y_i\|$. COE's objective is to maximize this difference for each observation $t_{uij}$.

¹ http://www.librec.net
3 INDEXABLE BPR
Problem. We consider a set of users $\mathcal{U}$ and a set of items $\mathcal{I}$. We consider as input a set of triples $\mathcal{T} \subset \mathcal{U} \times \mathcal{I} \times \mathcal{I}$. A triple $t_{uij} \in \mathcal{T}$ relates one user $u \in \mathcal{U}$ and two different items $i, j \in \mathcal{I}$, indicating $u$'s preferring item $i$ to item $j$. Such ordinal preference is prevalent, encompassing explicit and implicit feedback scenarios. When ratings are available, we can induce an ordinal triple for each instance when user $u$ rates item $i$ higher than she rates item $j$. Triples can also model implicit feedback [22]. E.g., when searching on the Web, one may click on website $i$ and ignore $j$. When browsing products, one may choose to click or buy product $i$ and skip $j$.
The goal is to derive a $D$-dimensional latent vector $x_u \in \mathbb{R}^D$ for each user $u \in \mathcal{U}$, and a latent vector $y_i \in \mathbb{R}^D$ for each item $i \in \mathcal{I}$, such that the relative preference of a user $u$ over two items $i$ and $j$ can be expressed as a function (to be defined) of their corresponding latent vectors $x_u$, $y_i$, and $y_j$. We denote the collections of all user latent vectors and item latent vectors as $X$ and $Y$ respectively.
Framework. Given the input triples $\mathcal{T}$, we seek to learn the user and item vectors $X, Y$ with the highest posterior probability.

$$\arg\max_{X,Y} P(X, Y | \mathcal{T}) \quad (1)$$

The Bayesian formulation for modeling this posterior probability is to decompose it into the likelihood of the triples $P(\mathcal{T} | X, Y)$ and the prior $P(X, Y)$, as shown in Eq. 2.

$$P(X, Y | \mathcal{T}) \propto P(\mathcal{T} | X, Y) \, P(X, Y) \quad (2)$$

We will define the prior later when we discuss the generative process. For now, we focus on defining the likelihood, which can be decomposed into the probability for individual triples $t_{uij} \in \mathcal{T}$.

$$P(\mathcal{T} | X, Y) = \prod_{t_{uij} \in \mathcal{T}} P(t_{uij} | x_u, y_i, y_j) \quad (3)$$
Weakness of Inner Product Kernel for Top-k Retrieval. To determine the probability for an individual triple, we need to define a kernel function. The kernel proposed by the matrix factorization-based (not natively indexable) BPR [22] is shown in Eq. 4 ($\sigma$ is the sigmoid function). This assumes that if $x_u^T y_i$ is higher than $x_u^T y_j$, then user $u$ is more likely to prefer item $i$ to $j$.

$$P(t_{uij} | x_u, y_i, y_j) = \sigma(x_u^T y_i - x_u^T y_j) \quad (4)$$

Since our intended application is top-k recommendation, once we learn the user and item latent vectors, the top-k recommendation task is reduced to searching for the $k$ nearest neighbors to the query (user vector) among the potential answers (item vectors). A naive solution is to conduct an exhaustive search over all the items.
An indexing-based approach could reduce the retrieval time significantly, by prioritizing or narrowing the search to a smaller search space. For the nearest neighbors identified by an index to be as accurate as possible, the notion of similarity (or distance) used by the index should be compatible with the notion of similarity of the underlying model that yields the user and item vectors.
Therein lies the issue with the inner product kernel described in Eq. 4. It is not necessarily compatible with geometric index structures that rely on similarity functions other than inner products.
First, we examine its incompatibility with the spatial tree index. Suppose that all item latent vectors $y_i$'s are inserted into the index. To derive the recommendation for $u$, we use $x_u$ as the query. Nearest

Figure 1: An illustration of the incompatibility of the inner product kernel with the spatial tree index (Euclidean distance) and the inverted index (cosine similarity).

neighbor search on a spatial tree index is expected to return items that are closest in terms of Euclidean distance. The relationship between Euclidean distance and inner product is expressed in Eq. 5. It implies that items with the closest Euclidean distances may not have the highest inner products, due to the magnitudes $\|x_u\|$ and $\|y_i\|$. Spatial tree index retrieval may be inconsistent with Eq. 4.

$$\|x_u - y_i\|^2 = \|x_u\|^2 + \|y_i\|^2 - 2 x_u^T y_i \quad (5)$$

Second, we examine its incompatibility with the inverted index that relies on cosine similarity (Eq. 6). Similarly, the pertinence of the magnitudes $\|x_u\|$ and $\|y_i\|$ implies that inverted index retrieval may be inconsistent with maximum inner product search.

$$\cos(x_u, y_i) = \frac{x_u^T y_i}{\|x_u\| \cdot \|y_i\|} \quad (6)$$
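The inconsistency implied by Eqs. 5 and 6 is easy to reproduce numerically; the vectors below are fabricated purely for illustration:

```python
import numpy as np

x_u = np.array([1.0, 0.0])
y_i = np.array([2.0, 2.0])   # larger magnitude, larger inner product
y_j = np.array([1.0, 0.5])   # closer to x_u, higher cosine similarity

def cos_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Inner product ranks i above j ...
assert x_u @ y_i > x_u @ y_j
# ... but Euclidean distance and cosine similarity both rank j above i.
assert np.linalg.norm(x_u - y_j) < np.linalg.norm(x_u - y_i)
assert cos_sim(x_u, y_j) > cos_sim(x_u, y_i)
```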
Fig. 1 shows an example to illustrate the above analysis. In Fig. 1, the inner product $x_u^T y_i$ is greater than $x_u^T y_j$, implying that $u$ prefers $i$ to $j$. However, the Euclidean distance computation shows that $y_j$ is closer to $x_u$ than $y_i$ is to $x_u$. Also, the cosine similarity between $x_u$ and $y_i$ is smaller than that between $x_u$ and $y_j$. This means that the inner product kernel of the model is not compatible with the operations of a spatial tree index relying on Euclidean distance, or an inverted index relying on cosine similarity.
Third, in terms of its incompatibility with LSH, we note that it has been established that there cannot exist any LSH family for maximum inner product search [24], while there exist LSH families for Euclidean distance and cosine similarity respectively.
Proposed Angular Distance Kernel. To circumvent the limitation of the inner product kernel, we propose a new kernel to express the probability for a triple $t_{uij}$ in a way that is insensitive to vector magnitudes. A different kernel is a non-trivial, even significant, change as it requires a different learning algorithm.
Our proposed kernel is based on angular distance. Let $\theta_{xy}$ denote the angular distance between vectors $x$ and $y$, evaluated as the arccos of the inner product between the normalized vectors.

$$\theta_{xy} = \cos^{-1}\left(\frac{x^T y}{\|x\| \cdot \|y\|}\right) \quad (7)$$

Proposing the angular distance, i.e., the arccos of the cosine similarity, to formulate the user-item association is a novel and appropriate design choice for the following reasons.
• Firstly, since arccos is a monotone function, the closest point according to the angular distance is the same as the point with the highest cosine similarity, resulting in its compatibility with the inverted index structure.
• Secondly, since angular distances are not affected by magnitudes, the kernel preserves all the information learnt by the model. Before indexing, the learnt vectors can be normalized to unit length for compatibility with indexing that relies on either Euclidean distance or cosine similarity.
• Lastly, the angular distance is also compatible with LSH indexing. A theoretical analysis and empirical evidence on this compatibility are provided in Section 5.
While the user $x_u$ and item $y_i$ vectors we learn could be of varying lengths, the magnitudes are uninformative as far as the user preferences encoded by the triples are concerned. This advantageously allows greater flexibility in parameter learning, while still controlling the vectors via the regularization terms, as opposed to constraining vectors to fixed length during learning (as in [7]).
We formulate the probability of a triple $t_{uij}$ for Indexable BPR as in Eq. 8. The probability is higher when the difference $\theta_{x_u y_j} - \theta_{x_u y_i}$ is larger. If $u$ prefers $i$ to $j$, the angular distance between $x_u$ and $y_i$ is expected to be smaller than that between $x_u$ and $y_j$.

$$P(t_{uij} | x_u, y_i, y_j) = \sigma(\theta_{x_u y_j} - \theta_{x_u y_i}) \quad (8)$$
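Eqs. 7 and 8 translate directly into code. A minimal sketch (the clip guards against floating-point drift outside [-1, 1]; the toy vectors are illustrative):

```python
import numpy as np

def angular_distance(x, y):
    """Eq. 7: arccos of the cosine similarity, insensitive to magnitudes."""
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

def triple_probability(x_u, y_i, y_j):
    """Eq. 8: sigmoid of the angular-distance difference."""
    delta = angular_distance(x_u, y_j) - angular_distance(x_u, y_i)
    return 1.0 / (1.0 + np.exp(-delta))

x_u = np.array([1.0, 0.0])
y_i = np.array([5.0, 0.0])    # same direction as x_u, large magnitude
y_j = np.array([0.0, 0.1])    # orthogonal to x_u, small magnitude
p = triple_probability(x_u, y_i, y_j)   # well above 0.5: u prefers i
```

Note that scaling any of the three vectors leaves `p` unchanged, which is exactly the magnitude-insensitivity the kernel is designed for.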
Generative Process. The proposed model Indexable BPR as a whole can be expressed by the following generative process:
(1) For each user $u \in \mathcal{U}$: Draw $x_u \sim \text{Normal}(0, \eta^2 I)$,
(2) For each item $i \in \mathcal{I}$: Draw $y_i \sim \text{Normal}(0, \eta^2 I)$,
(3) For each triple of one user $u \in \mathcal{U}$ and two items $i, j \in \mathcal{I}$:
• Draw a trial from $\text{Bernoulli}(P(t_{uij} | x_u, y_i, y_j))$,
• If "success", generate a triple instance $t_{uij}$,
• Otherwise, generate a triple instance $t_{uji}$.
The first two steps place zero-mean multivariate spherical Gaussian priors on the user and item latent vectors. $\eta^2$ denotes the variance of the Normal distributions; for simplicity we use the same variance for users and items. $I$ denotes the identity matrix. This acts as a regularizer for the vectors, and defines the prior $P(X, Y)$.

$$P(X, Y) = (2\pi\eta^2)^{-\frac{D}{2}} \prod_{u \in \mathcal{U}} e^{-\frac{1}{2\eta^2}\|x_u\|^2} \prod_{i \in \mathcal{I}} e^{-\frac{1}{2\eta^2}\|y_i\|^2} \quad (9)$$

Triples in $\mathcal{T}$ are generated from the users' and items' latent vectors according to the probability $P(t_{uij} | x_u, y_i, y_j)$ as defined in Eq. 8.
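The generative process above can be sketched as follows (the value of η, the sizes, and the enumeration of candidate triples are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
D, eta = 20, 1.0
n_users, n_items = 3, 5

# Steps (1)-(2): zero-mean spherical Gaussian priors.
X = rng.normal(0.0, eta, size=(n_users, D))   # user vectors x_u
Y = rng.normal(0.0, eta, size=(n_items, D))   # item vectors y_i

def angular_distance(a, b):
    c = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(c, -1.0, 1.0))

# Step (3): one Bernoulli trial per candidate (u, i, j).
triples = []
for u in range(n_users):
    for i in range(n_items):
        for j in range(i + 1, n_items):
            delta = angular_distance(X[u], Y[j]) - angular_distance(X[u], Y[i])
            p = 1.0 / (1.0 + np.exp(-delta))    # Eq. 8
            if rng.random() < p:                # "success"
                triples.append((u, i, j))
            else:
                triples.append((u, j, i))
```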
Parameter Learning. Maximizing the posterior as outlined in Eq. 2 is equivalent to maximizing its logarithm, shown below.

$$L = \ln P(\mathcal{T} | X, Y) + \ln P(X, Y) \propto \ln P(\mathcal{T} | X, Y) - \frac{1}{\eta^2} \sum_{u \in \mathcal{U}} \|x_u\|^2 - \frac{1}{\eta^2} \sum_{i \in \mathcal{I}} \|y_i\|^2 \quad (10)$$

Let us denote $\Delta_{uij} = \theta_{x_u y_j} - \theta_{x_u y_i}$, $\tilde{x}_u = \frac{x_u}{\|x_u\|} \; \forall u \in \mathcal{U}$, and $\tilde{y}_i = \frac{y_i}{\|y_i\|} \; \forall i \in \mathcal{I}$. The gradient of $L$ w.r.t. each user vector $x_u$ is:

$$\frac{\partial L}{\partial x_u} = \sum_{\{i,j : t_{uij} \in \mathcal{T}\}} \frac{1}{\|x_u\|^2} \frac{e^{-\Delta_{uij}}}{1 + e^{-\Delta_{uij}}} \times \left( \frac{-\tilde{y}_j \|x_u\| + \cos(x_u, y_j) \, x_u}{\sqrt{1 - \cos(x_u, y_j)^2}} - \frac{-\tilde{y}_i \|x_u\| + \cos(x_u, y_i) \, x_u}{\sqrt{1 - \cos(x_u, y_i)^2}} \right),$$

in which $\cos(x_u, y_i) = \frac{x_u^T y_i}{\|x_u\| \cdot \|y_i\|} \; \forall u \in \mathcal{U}$ and $\forall i \in \mathcal{I}$.
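A standard sanity check for such closed-form gradients is to compare the per-item term against central finite differences on $\theta$ itself; the code below is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

def theta(x, y):
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

def grad_theta_x(x, y):
    """Analytic d theta / dx, matching the per-item terms above:
    (-y_tilde * ||x|| + cos(x, y) * x) / (||x||^2 * sqrt(1 - cos(x, y)^2))."""
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    y_tilde = y / np.linalg.norm(y)
    return (-y_tilde * np.linalg.norm(x) + c * x) / (
        np.linalg.norm(x) ** 2 * np.sqrt(1.0 - c ** 2))

rng = np.random.default_rng(1)
x, y = rng.normal(size=5), rng.normal(size=5)
g = grad_theta_x(x, y)

# Central finite differences, coordinate by coordinate.
eps = 1e-6
g_fd = np.zeros_like(x)
for d in range(len(x)):
    e = np.zeros_like(x)
    e[d] = eps
    g_fd[d] = (theta(x + e, y) - theta(x - e, y)) / (2 * eps)
# g and g_fd agree to high precision (away from cos = +/-1).
```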
Algorithm 1 Gradient Ascent for Indexable BPR
Input: Ordinal triples set $\mathcal{T} = \{t_{uij}, \forall u \in \mathcal{U}, i, j \in \mathcal{I}\}$.
1: Initialize $x_u$ for $u \in \mathcal{U}$, $y_i$ for $i \in \mathcal{I}$
2: while not converged do
3:   for each $u \in \mathcal{U}$ do
4:     $x_u \leftarrow x_u + \epsilon \cdot \frac{\partial L}{\partial x_u}$
5:   for each $i \in \mathcal{I}$ do
6:     $y_i \leftarrow y_i + \epsilon \cdot \frac{\partial L}{\partial y_i}$
7: Return $\{\tilde{x}_u = \frac{x_u}{\|x_u\|}\}_{u \in \mathcal{U}}$ and $\{\tilde{y}_i = \frac{y_i}{\|y_i\|}\}_{i \in \mathcal{I}}$
The gradient of $L$ w.r.t. each item vector $y_k$ is:

$$\frac{\partial L}{\partial y_k} = \sum_{\{u,j : t_{ukj} \in \mathcal{T}\}} \frac{1}{\|y_k\|^2} \frac{e^{-\Delta_{ukj}}}{1 + e^{-\Delta_{ukj}}} \cdot \frac{\tilde{x}_u \|y_k\| - \cos(x_u, y_k) \, y_k}{\sqrt{1 - \cos(x_u, y_k)^2}} + \sum_{\{u,i : t_{uik} \in \mathcal{T}\}} \frac{1}{\|y_k\|^2} \frac{e^{-\Delta_{uik}}}{1 + e^{-\Delta_{uik}}} \cdot \frac{-\tilde{x}_u \|y_k\| + \cos(x_u, y_k) \, y_k}{\sqrt{1 - \cos(x_u, y_k)^2}}.$$
Algorithm 1 describes the learning algorithm with full gradient ascent. It first initializes the users' and items' latent vectors. In each iteration, the model parameters are updated based on the gradients, with a learning rate $\epsilon$ that decays over time. The output is the set of normalized user vectors $\tilde{x}_u$ and item vectors $\tilde{y}_i$. On one hand, this normalization does not affect the accuracy of the top-k recommendation produced by Indexable BPR, since the magnitude of the latent vectors does not affect the ranking. On the other hand, the normalized vectors can be used for approximate kNN search using various indexing data structures later. The time complexity of the algorithm is linear in the number of triples in $\mathcal{T}$, i.e., $O(|\mathcal{U}| \times |\mathcal{I}|^2)$.
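Putting Algorithm 1 and the gradients together, a compact (and unoptimized) sketch of the full training loop might look as follows; the hyperparameters, the toy sizes, and the small guard inside the square root are illustrative assumptions:

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def theta(x, y):
    """Eq. 7: angular distance."""
    return np.arccos(np.clip(unit(x) @ unit(y), -1.0, 1.0))

def grad_theta(x, y):
    """d theta(x, y) / dx (swap the arguments for d/dy; theta is symmetric)."""
    c = unit(x) @ unit(y)
    denom = np.linalg.norm(x) ** 2 * np.sqrt(max(1.0 - c ** 2, 1e-12))
    return (-unit(y) * np.linalg.norm(x) + c * x) / denom

def fit_indexable_bpr(triples, n_users, n_items, D=8,
                      epochs=50, lr=0.05, reg=0.001, seed=0):
    """Full gradient ascent on the log-posterior (Algorithm 1, sketch)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_users, D))
    Y = rng.normal(scale=0.1, size=(n_items, D))
    for epoch in range(epochs):
        eps = lr / (1 + epoch)                   # decaying learning rate
        gX, gY = -2 * reg * X, -2 * reg * Y       # gradient of the prior term
        for u, i, j in triples:
            delta = theta(X[u], Y[j]) - theta(X[u], Y[i])
            w = 1.0 / (1.0 + np.exp(delta))       # e^{-delta}/(1 + e^{-delta})
            gX[u] += w * (grad_theta(X[u], Y[j]) - grad_theta(X[u], Y[i]))
            gY[i] -= w * grad_theta(Y[i], X[u])
            gY[j] += w * grad_theta(Y[j], X[u])
        X += eps * gX
        Y += eps * gY
    # Line 7 of Algorithm 1: return unit-length vectors, ready for indexing.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn, Yn

# One user who prefers item 0 over items 1 and 2.
Xn, Yn = fit_indexable_bpr([(0, 0, 1), (0, 0, 2)], n_users=1, n_items=3)
```

After training, item 0 should sit at a smaller angular distance from the user vector than items 1 and 2.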
4 EXPERIMENTS ON TOP-K RECOMMENDATION WITH INDEXING
The key idea in this paper is achieving speedup in the retrieval time of top-k recommendation via indexing, while still maintaining high accuracy via better representations that minimize any loss of information post-indexing. Hence, in the following evaluation, we are interested in both the accuracy of the top-k recommendation returned by the index, and the speedup in retrieval time due to indexing as compared to exhaustive search.
To showcase the generality of Indexable BPR in accommodating various index structures, we experiment with three indexing schemes: locality-sensitive hashing, spatial tree index, and inverted index. Note that our focus is on the relative merits of recommendation algorithms, rather than on the relative merits of index structures. It is our objective to investigate the effectiveness of Indexable BPR, as compared to other algorithms, for top-k recommendation when using these index structures. Yet, it is not our objective to compare the index structures among themselves.
Comparative Methods. We compare our proposed Indexable BPR with the following recommendation algorithm baselines:
• BPR(MF): the non-index-friendly BPR with inner product (MF) kernel [22]. This would validate whether our angular distance kernel is more index-friendly.
Table 1: Datasets

              | #users  | #items | #ratings    | #training ordinal triples
MovieLens 20M | 138,493 | 27,278 | 20,000,263  | 5.46 × 10^8
Netflix       | 480,189 | 17,770 | 100,480,507 | 2.29 × 10^10
• BPR(MF)+: a composite of BPR(MF) and the Euclidean transformation described in [2] to make the item vectors indexable as post-processing. This allows validation of our learning inherently indexable vectors in the first place.
• IPMF: matrix factorization that learns fixed-length item vectors but fits rating scores [7]. This allows validation of our modeling of ordinal triples.
• CFEE: Euclidean embedding that fits rating scores [10]. This allows validation of our modeling of ordinal triples.
• COE: Euclidean embedding that fits ordinal triples [15]. Comparison to CFEE and COE allows validation of our compatibility with non-spatial indices such as some LSH families as well as the inverted index.
We tune the hyperparameters of all models for the best performance. For IPMF, we adopt the parameters provided by its authors for the Netflix dataset. For the ordinal-based algorithms (BPR, COE, and Indexable BPR), the learning rate and the regularization are 0.05 and 0.001 respectively. For CFEE, they are 0.1 and 0.001. All models use D = 20 dimensions in their latent representations. Similar trends are observed across other dimensionalities (see Sec. 5).
Datasets. We experiment on two publicly available rating-based datasets and derive ordinal triples accordingly. One is MovieLens 20M², the largest among the MovieLens collection. The other is Netflix³. Table 1 shows a summary of these datasets. By default, MovieLens 20M includes only users with at least 20 ratings. For consistency, we apply the same to Netflix. For each dataset, we randomly keep 60% of the ratings for training and hide 40% for testing. We conduct stratified sampling to maintain the same ratio for each user. We report the average results over five training/testing splits. For training, we generate a triple $t_{uij}$ if user $u$ has a higher rating for item $i$ than for $j$, and triples are formed within the training set.
As mentioned earlier, our focus in this work is on online retrieval speedup. We find that the model learning time, which is offline, is manageable. Our learning times for MovieLens 20M and Netflix are 5.2 and 9.3 hours respectively on a computer with an Intel Xeon E5-2650 v4 2.20GHz CPU and 256GB RAM. Algorithm 1 scales with the number of triples, which in practice grows slower than its theoretical complexity of $O(|\mathcal{U}| \times |\mathcal{I}|^2)$. Figure 2 shows how the average number of triples per user grows with the number of items, showing that the actual growth is closer to linear and lower than the quadratic curve provided as reference.
Recall. We assume that the goal of top-k recommendation is to recommend new items to a user, from among the items not seen in the training set. When retrieval is based on an index, the evaluation of top-k necessarily takes into account the operation of the index. Because we maintain one index for all items to be used with all

² http://grouplens.org/datasets/movielens/20m/
³ http://academictorrents.com/details/9b13183dc4d60676b773c9e2cd6de5e5542cee9a
Figure 2: Number of triples (per user) vs. number of items. Panels: (a) MovieLens 20M; (b) Netflix. Each panel plots the actual triple count against a quadratic reference curve.
users, conceivably items returned by a top-k query may belong to one of three categories: those in the training set (to be excluded for new item recommendation), those in the test set (of interest, as these are the known ground truth of which items users prefer), and those not seen/rated in either set (for which no ground truth of user preference is available). It is important to note that the latter may not necessarily be bad recommendations; they are simply unknown. Precision of the top-k may penalize such items.
We reason that among the rated items in the test set, those that have been assigned the maximum rating possible by a user would be expected to appear in the top-k recommendation list for that user. A suitable metric is the recall of items in the test set with maximum rating. For each user $u$ with at least one highest-rated item in the test set (for the two datasets, the highest possible rating value is 5), we compute the percentage of these items that are returned in the top-k by the index. The higher the percentage, the better the performance of the model at identifying the items a user prefers the most. Eq. 11 presents the formula for Recall@k:

$$\text{Recall@}k = \frac{1}{|\mathcal{U}_{max}|} \sum_{u \in \mathcal{U}_{max}} \frac{|\{i \in \psi_k^u : r_{ui} = \text{max rating}\}|}{|\{i \in \mathcal{I} : r_{ui} = \text{max rating}\}|}, \quad (11)$$

in which $\mathcal{U}_{max}$ is the set of users who have given at least one item a rating of 5, and $\psi_k^u$ is the top-k returned by the index. We exclude training items for $u$ from both numerator and denominator. We normalize Recall@k by the ideal Recall@k that a perfect algorithm can achieve, and denote the metric as nRecall@k.
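Eq. 11 can be computed as in the following sketch (data structures and names are illustrative; the normalization to nRecall@k is omitted):

```python
def recall_at_k(topk_by_user, test_ratings, train_items, max_rating=5):
    """Eq. 11 sketch: average, over users in U_max, of the fraction of their
    max-rated test items that the index returns in the top-k."""
    total, n_users = 0.0, 0
    for u, topk in topk_by_user.items():
        seen = train_items.get(u, set())
        relevant = {i for i, r in test_ratings.get(u, {}).items()
                    if r == max_rating} - seen   # exclude training items
        if not relevant:
            continue                             # u is not in U_max
        hits = [i for i in topk if i in relevant]
        total += len(hits) / len(relevant)
        n_users += 1
    return total / n_users if n_users else 0.0

topk_by_user = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
test_ratings = {"u1": {"a": 5, "z": 5}, "u2": {"x": 4}}   # "z" is missed
train_items = {"u1": set(), "u2": set()}
r = recall_at_k(topk_by_user, test_ratings, train_items)  # u1: 1/2; u2 skipped
```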
Speedup. To investigate the efficacy of using the indexing schemes for top-k recommendation, we introduce a second metric, speedup, which is the ratio of the time taken by exhaustive search to return the top-k to the time taken by an index.

$$\text{Speedup} = \frac{\text{Retrieval time taken by exhaustive search}}{\text{Retrieval time taken by the index}}. \quad (12)$$

We will discuss the results in terms of the trade-off between recall and speedup. There are index parameters that control the degree of approximation, i.e., higher speedup at the expense of lower recall. Among the comparative recommendation algorithms, a better trade-off means higher speedup at the same recall, or higher recall at the same speedup. For each comparison below, we control for the indexing scheme, as different schemes vary in ways of achieving approximation, implementations, and deployment scenarios.
4.1 Top-k Recommendation with LSH Index
We first briefly review LSH and how it is used for top-k recommendation. Let $h = (h_1, h_2, \ldots, h_b)$ be a set of LSH hash functions.
Figure 3: nRecall@k with the hash table lookup strategy (T = 10 hash tables), for code lengths b = 8, 12, 16. (a) MovieLens 20M; (b) Netflix.
Each function assigns a bit to each vector. $h$ will assign each user $u$ a binary code $h(x_u)$, and each item $i$ a binary hashcode $h(y_i)$, all of length $b$. Assuming that user $u$ prefers item $i$ to item $j$, $h$ is expected to produce binary hashcodes with a smaller Hamming distance $\|h(x_u) - h(y_i)\|_H$ than the Hamming distance $\|h(x_u) - h(y_j)\|_H$.
The most frequent indexing strategy for LSH is hash table lookup. We store item codes in hash tables, with items having the same code in the same bucket. Given a query (user) code, we can determine the corresponding bucket in constant time. We search for the top-k only among the items in that bucket, reducing the number of items on which we need to perform exact similarity computations.
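The hash table lookup strategy can be sketched with sign-random-projection (SRP) codes as follows; the sizes and the single-table, single-probe lookup are simplifying assumptions:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
D, b = 16, 8
H = rng.normal(size=(b, D))            # b random hyperplanes = hash functions

def srp_code(v):
    """b-bit code: the sign of each random projection."""
    return tuple((H @ v > 0).astype(int))

# Index construction: bucket item ids by their codes.
items = rng.normal(size=(1000, D))
table = defaultdict(list)
for idx, y in enumerate(items):
    table[srp_code(y)].append(idx)

# Query: look up the user's bucket, then rank only that bucket exactly.
def query_topk(x_u, k=10):
    bucket = table.get(srp_code(x_u), [])
    def cos_sim(i):
        y = items[i]
        return x_u @ y / (np.linalg.norm(x_u) * np.linalg.norm(y))
    return sorted(bucket, key=cos_sim, reverse=True)[:k]

topk = query_topk(rng.normal(size=D))
```

The exact similarity computation is confined to one bucket of roughly $N / 2^b$ items on average, which is the source of the speedup (and of the approximation error when preferred items land in other buckets).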
We use the LSH package developed by [1]. The LSH family used by Indexable BPR for generating hashcodes is SRP-LSH, which is also used for IPMF following [7]. We apply it to BPR(MF) and BPR(MF)+, as [25] and [19] claim it to be the more suitable family for transformed vectors. In turn, the LSH scheme for COE and CFEE is L2-LSH, since both use the $l_2$ distance. In Section 5, we will elaborate with theoretical analysis and empirical evidence on how more compatible representations tend to produce better results.
When using hash tables, one specifies the number of tables $T$ and the code length $b$. We experiment with various $T$, and $T = 10$ returns the best performance (consistent with [7]). We also vary $b$; a larger $b$ is expected to lead to fewer items in each bucket.
Figure 3(a) shows the nRecall@k using hash table lookup with $T = 10$ tables and different values of code length $b = 8, 12, 16$ for MovieLens 20M. Across the $b$'s, the trends are similar. Indexable BPR has the highest nRecall@k values across all $k$. It outperforms BPR(MF)+, which conducts vector transformation as post-processing,
nRecall@10
Speedup (log scale)
(a) MovieLens 20M
nRecall@10
Speedup (log scale)
(b) Netflix
Figure 4: nRecall@10 vs. speedup with Hash table lookup
strategy (T=10 hash tables).
which indicates that learning inherently indexable vectors is helpful.
In turn, BPR(MF)+ outperforms BPR(MF), which indicates that the
inner product kernel is not conducive for indexing. Interestingly,
Indexable BPR also performs beer than models that t ratings
(IPMF, CFEE), suggesting that learning from relative comparisons
may be more suitable for topkrecommendation.
Figure 3(b) shows the results for Netix. Again, Indexable
BPR has the highest nRecall@k values across all
k
. e relative
comparisons among the baselines are as before, except that IPMF
now is more competitive, though still lower than Indexable BPR .
We also investigate the trade-off between the speedup achieved and the accuracy of the top-k returned by the index. Fig. 4 shows the nRecall@10 and the speedup when varying the value of b. Given the same speedup, Indexable BPR achieves significantly higher performance than the baselines. As b increases, the speedup increases and nRecall@10 decreases. This is expected: the longer the hashcodes, the smaller the set of items on which the system needs to perform similarity computation. This reflects the trade-off between speedup and approximation quality.
4.2 Top-k Recommendation with KD-Tree Index
Spatial trees refer to a family of methods that recursively partition the data space towards a balanced binary search tree, in which each node encompasses a subset of the data points [17]. For algorithms that model the user-item association by l2 distance, spatial trees can be used to index the item vectors. Top-k recommendation is thus equivalent to finding the kNN of the query: the tree locates the nodes that the query belongs to, and exact similarity computation is performed only on the points indexed by those nodes.

Figure 5: nRecall@k with KD-tree indexing, for c = 500, 1000, 1500 on (a) MovieLens 20M and (b) Netflix.
For Indexable BPR, Algorithm 1 returns two sets of normalized vectors, x̃_u ∀u ∈ U and ỹ_i ∀i ∈ I. We observe that:

‖x̃_u − ỹ_i‖ < ‖x̃_u − ỹ_j‖ ⇔ x̃_uᵀỹ_i > x̃_uᵀỹ_j ⇔ θ_{x̃_u ỹ_i} < θ_{x̃_u ỹ_j},  (13)

i.e., the ranking of items according to l2 distance on normalized vectors is compatible with the ranking according to angular distance, implying that Indexable BPR's output can support kNN search using spatial trees.
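The equivalence in Eq. 13 is easy to check numerically; the sketch below (with randomly generated toy vectors, not the model's output) verifies that ranking by l2 distance on unit vectors coincides with ranking by angle:

```python
import math
import random

random.seed(0)
D, N = 5, 50

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

user = normalize([random.gauss(0, 1) for _ in range(D)])
items = [normalize([random.gauss(0, 1) for _ in range(D)]) for _ in range(N)]

def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def angle(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding

# Eq. 13: on unit vectors, ranking by l2 distance equals ranking by angle.
rank_l2 = sorted(range(N), key=lambda i: l2(user, items[i]))
rank_angle = sorted(range(N), key=lambda i: angle(user, items[i]))
assert rank_l2 == rank_angle
```

This holds because, on unit vectors, ‖u − v‖² = 2 − 2·uᵀv, a monotone function of the angle.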
In this paper, we consider a well-known tree structure, the KD-tree. Approximate kNN retrieval can be achieved by restricting the search time on the tree [7]. The KD-tree implementation of [18] controls this via c, the number of nodes to explore on the tree. Figure 5 shows the nRecall@k for various c ∈ {500, 1000, 1500}.
We also experimented with c ∈ {50, 150, 300, 750, 2000} and obtained similar trends. Indexable BPR consistently outperforms the baselines at all values of c. Notably, Indexable BPR outperforms BPR(MF)+, which in turn outperforms BPR(MF), validating the point made earlier about native indexability. Figure 6 plots the accuracy in terms of nRecall@10 vs. the retrieval efficiency in terms of speedup. As we increase c, a longer search time on the KD-tree is allowed, resulting in a higher quality of the returned top-k. Here too, Indexable BPR achieves higher accuracy at the same speedup, and higher speedup at the same accuracy, compared to the baselines.
4.3 Top-k Recommendation with Inverted Index
For recommendation retrieval, [4] presents an inverted index scheme, where every user or item is represented with a sparse vector derived from their respective dense real-valued latent vectors via a transformation. Given the user sparse vector as query, the inverted index returns as candidates the items that share at least one common nonzero element with the query. Exact similarity computation is performed only on those candidates to find the top-k.

Figure 6: nRecall@10 vs. speedup with KD-tree indexing on (a) MovieLens 20M and (b) Netflix.
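The candidate-generation step can be sketched with a toy inverted index; the sparse codes below are hypothetical stand-ins for the ones produced by the maps of [4], purely to illustrate the lookup mechanics:

```python
from collections import defaultdict

# Toy sparse codes: item id -> set of nonzero coordinate indices.
item_codes = {0: {1, 4}, 1: {2, 4}, 2: {3, 7}, 3: {0, 5}}

# Inverted index: coordinate -> postings set of items nonzero there.
index = defaultdict(set)
for item, coords in item_codes.items():
    for c in coords:
        index[c].add(item)

def candidates(query_coords):
    """Items sharing at least one nonzero coordinate with the query."""
    out = set()
    for c in query_coords:
        out |= index.get(c, set())
    return out

# A user whose sparse code is nonzero at coordinates {4, 5}:
assert candidates({4, 5}) == {0, 1, 3}  # item 2 is discarded
```

Items with no coordinate in common with the query are never touched, which is the source of the speedup.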
Here, we describe the indexing scheme only briefly; for an extended treatment, please refer to [4]. The sparse representations for users and items are obtained from their dense latent vectors (learnt by the recommendation algorithm, e.g., Indexable BPR) through a set of geometry-aware permutation maps Φ defined on a tessellated unit sphere. The tessellating vectors are generated from a base set B_d = {−1, −(d−1)/d, ..., −1/d, 0, 1/d, ..., (d−1)/d, 1}, characterized by a parameter d. The obtained sparse vectors have sparsity patterns that are related to the angular closeness between the original latent vectors. The angular closeness between user vector x_u and item vector y_i is defined as d_ac(x_u, y_i) = 1 − x_uᵀy_i / (‖x_u‖·‖y_i‖). In the case of ‖x_u‖ = ‖y_i‖ = 1 ∀u ∈ U, i ∈ I, we have (∀i, j ∈ I):

d_ac(x_u, y_i) < d_ac(x_u, y_j) ⇔ x_uᵀy_i / (‖x_u‖·‖y_i‖) > x_uᵀy_j / (‖x_u‖·‖y_j‖), i.e., cos θ_{x_u y_i} > cos θ_{x_u y_j}.  (14)
The item ranking according to d_ac is equivalent to the ranking according to the angular distance θ. We hypothesize that Indexable BPR, being based on angular distance, is compatible with this structure. The parameter d controls the trade-off between the efficiency and the quality of approximation of kNN retrieval: increasing d leads to a higher number of items discarded by the inverted index, which yields a higher speedup of top-k recommendation retrieval.
Figure 7: nRecall@k with inverted indexing, for d = 150, 300, 500 on (a) MovieLens 20M and (b) Netflix.
We run the experiments with different values of the parameter d to explore the trade-off between speed and accuracy. Figure 7 presents the nRecall@k on the two datasets at d ∈ {150, 300, 500}. In all cases, Indexable BPR outperforms the baselines in terms of nRecall@k. This suggests that Indexable BPR produces a representation with a greater degree of compatibility in terms of angular closeness d_ac between users and their preferred items. As a result, the corresponding sparse vectors have highly similar sparsity patterns, which enhances the quality of kNN retrieval using the inverted index. Figure 8 shows the speedup using the inverted index as we vary the parameter d. We observe that the speedup increases as d increases. Indexable BPR shows superior performance compared to the other models at the same speedup.
Overall, Indexable BPR works well with all of the indexing schemes. Effectively, we develop a model that works with multiple indices, leaving the choice of index structure to the respective application based on need. Our focus is on indexable recommendation algorithms, and several consistent observations emerge. Indexable BPR produces representations that are more amenable to indexing than the baselines BPR(MF)+ and BPR(MF), which validates the aim of Indexable BPR in learning natively indexable vectors for users and items. It also outperforms models that fit ratings, as opposed to ordinal triples, for top-k recommendation.
5 ANALYSIS ON LSH-FRIENDLINESS OF INDEXABLE BPR
In an effort to further explain the outperformance of Indexable BPR when used with LSH, we analyze the compatibility between recommendation algorithms and hash functions. Since LSH is inherently an approximate method, the loss of information caused by random hash functions is inevitable. Informally, a representation is LSH-friendly if the loss after hashing is as small as possible. To achieve such a small loss, a user's ranking of items based on the latent vectors should be preserved by the hashcodes.

Figure 8: nRecall@10 vs. speedup with inverted indexing on (a) MovieLens 20M and (b) Netflix, for d ∈ {50, 100, 150, 200, 300, 500, 750}.
Analysis. For x_u, y_i, y_j in R^D, one can estimate the probability that the corresponding hashcodes preserve the correct ordering between them. Let us consider the distribution of the Hamming distance ‖h(x_u) − h(y_i)‖_H. Since the hash functions h_1, h_2, ..., h_b are independent of one another, ‖h(x_u) − h(y_i)‖_H follows a binomial distribution with mean b·p_{x_u y_i} and variance b·p_{x_u y_i}(1 − p_{x_u y_i}), where p_{x_u y_i} is the probability of x_u and y_i having different hash values (this probability depends on the specific family of hash functions). Since a binomial distribution can be approximated by a normal distribution with the same mean and variance, and the difference of two normal random variables is again normally distributed, we have:

‖h(x_u) − h(y_j)‖_H − ‖h(x_u) − h(y_i)‖_H ∼ Normal(b·p_{x_u y_j} − b·p_{x_u y_i}, b·p_{x_u y_j}(1 − p_{x_u y_j}) + b·p_{x_u y_i}(1 − p_{x_u y_i})),  (15)

from which Pr(‖h(x_u) − h(y_j)‖_H − ‖h(x_u) − h(y_i)‖_H > 0) can be estimated.
Due to the shape of the normal distribution, Eq. 15 implies that a higher mean and a smaller variance lead to a higher probability that the hashcode of x_u is more similar to the hashcode of y_i than to that of y_j. Therefore, for a fixed length b, if indeed u prefers i to j, we say that (x_u, y_i, y_j) is a more LSH-friendly representation for u, i, and j if the mean value (p_{x_u y_j} − p_{x_u y_i}) is higher and the variance p_{x_u y_j}(1 − p_{x_u y_j}) + p_{x_u y_i}(1 − p_{x_u y_i}) is smaller.
Hence, the mean and the variance in Eq. 15 could potentially reveal which representation is more LSH-friendly, i.e., preserves information better after hashing. For each user u ∈ U, let τ_u^k be the set of items in the top-k by a method before hashing, and τ̄_u^k be all the other items not returned by the model. We are interested in whether, after hashing, the items in τ_u^k are closer to the user than the items in τ̄_u^k. To quantify this potential, we introduce two measures: MeanNorm@k and VarNorm@k.

Figure 9: LSH-friendliness measurement at D = 20. MovieLens 20M: MeanNorm@10: CFEE 0.137, COE 0.188, IPMF 0.065, BPR(MF) 0.017, BPR(MF)+ 0.023, Indexable BPR 0.219; VarNorm@10: CFEE 0.726, COE 0.576, IPMF 0.484, BPR(MF) 0.171, BPR(MF)+ 0.138, Indexable BPR 0.428. Netflix: MeanNorm@10: CFEE 0.163, COE 0.080, IPMF 0.072, BPR(MF) 0.018, BPR(MF)+ 0.025, Indexable BPR 0.247; VarNorm@10: CFEE 0.699, COE 0.755, IPMF 0.480, BPR(MF) 0.192, BPR(MF)+ 0.146, Indexable BPR 0.424.
MeanNorm@k = (1/|U|) Σ_{u∈U} [ Σ_{i∈τ_u^k} Σ_{j∈τ̄_u^k} (p_{x_u y_j} − p_{x_u y_i}) ] / (|τ_u^k|·|τ̄_u^k|)

VarNorm@k = (1/|U|) Σ_{u∈U} [ Σ_{i∈τ_u^k} Σ_{j∈τ̄_u^k} (p_{x_u y_j}(1 − p_{x_u y_j}) + p_{x_u y_i}(1 − p_{x_u y_i})) ] / (|τ_u^k|·|τ̄_u^k|)
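As a minimal sketch, the two measures can be computed directly from the pairwise probabilities p_{x_u y_i}; the probabilities and top-k sets below are hypothetical, purely to illustrate the computation:

```python
# Hypothetical per-pair probabilities p[u][i] that user u and item i
# receive different hash values (e.g., theta/pi under SRP-LSH).
p = {
    "u1": {"a": 0.10, "b": 0.20, "c": 0.60, "d": 0.70},
    "u2": {"a": 0.30, "b": 0.15, "c": 0.55, "d": 0.80},
}
topk = {"u1": {"a", "b"}, "u2": {"a", "b"}}  # tau_u^k before hashing

def mean_var_norm(p, topk):
    """Average the per-user normalized mean and variance terms."""
    mean_sum = var_sum = 0.0
    for u, pu in p.items():
        tau = topk[u]
        tau_bar = set(pu) - tau
        m = v = 0.0
        for i in tau:
            for j in tau_bar:
                m += pu[j] - pu[i]
                v += pu[j] * (1 - pu[j]) + pu[i] * (1 - pu[i])
        denom = len(tau) * len(tau_bar)
        mean_sum += m / denom
        var_sum += v / denom
    return mean_sum / len(p), var_sum / len(p)

mean_norm, var_norm = mean_var_norm(p, topk)
assert mean_norm > 0  # top-k items are closer in expectation
```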
To achieve an LSH-friendly representation, MeanNorm@k should be high and VarNorm@k should be low. Fig. 9 reports these metrics. Indexable BPR shows the highest MeanNorm@10 (i.e., k = 10) at D = 20 (we observe the same results with other values of D and k). Though BPR(MF) and BPR(MF)+ have smaller variance, their mean values are among the lowest. This result hints that Indexable BPR preserves information after hashing more effectively.
Compatible Hash Function. There is an explanation for the superior numbers of Indexable BPR in Fig. 9. Specifically, the probability p_{x_u y_i} depends on the LSH family. In particular, signed random projections [5, 9], or SRP-LSH, is meant for angular similarity. The angular similarity between x and y is defined as sim∠(x, y) = 1 − cos⁻¹(xᵀy / (‖x‖·‖y‖))/π. The parameter a is a random vector with each component drawn i.i.d. from a normal distribution. The hash function is defined as h_a^srp(x) = sign(aᵀx), and the probability of x, y having different hash values is:

p_{xy} = Pr(h_a^srp(x) ≠ h_a^srp(y)) = cos⁻¹(xᵀy / (‖x‖·‖y‖))/π = θ_{xy}/π,  (16)
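The relation p_{xy} = θ_{xy}/π in Eq. 16 can be checked with a small Monte Carlo simulation (a sketch; the vectors, angle, and trial count are arbitrary illustrative choices):

```python
import math
import random

random.seed(42)

# Two unit vectors in the plane separated by a known angle of 60 degrees.
theta = math.pi / 3
x = (1.0, 0.0)
y = (math.cos(theta), math.sin(theta))

def srp_bit(a, v):
    """Sign bit of the projection of v onto the random direction a."""
    return 1 if a[0] * v[0] + a[1] * v[1] >= 0 else 0

# Fraction of random hyperplanes a ~ N(0, I) separating x from y.
trials = 50_000
diff = sum(
    srp_bit(a, x) != srp_bit(a, y)
    for a in ((random.gauss(0, 1), random.gauss(0, 1)) for _ in range(trials))
)
p_emp = diff / trials
assert abs(p_emp - theta / math.pi) < 0.01  # Eq. 16: p_xy = theta/pi
```

The two vectors disagree on a hash bit exactly when the random hyperplane falls inside the angle between them, which happens with probability θ/π.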
For Indexable BPR, as shown in Eq. 8, for each observation "u prefers i to j" we would like to maximize the difference θ_{x_u y_j} − θ_{x_u y_i}. From Eq. 16, we observe that the probability p_{x_u y_i} is a linear function of the angular distance θ_{x_u y_i}. Thus, Indexable BPR's objective corresponds to maximizing p_{x_u y_j} − p_{x_u y_i}.
Figure 10: nDCG@10 at D ∈ {5, 10, 20, 30, 50, 75, 100} on MovieLens 20M and Netflix.
According to Eq. 15, this increases the probability that the Hamming distance between u and i is smaller than that between u and j. In other words, the hashcodes are likely to preserve the ranking order. This alignment between the objective of Indexable BPR and the structural property of SRP-LSH implies that Indexable BPR is more LSH-friendly, which helps the model minimize information loss and achieve better post-indexing performance.

Also, the appropriate LSH family for methods based on l2 distance, which includes COE, is L2-LSH [6]. However, there is a question as to how compatible the objective of COE is with these hash functions. The hash function of L2-LSH is defined as follows:
h_{a,b}^{L2}(x) = ⌊(aᵀx + b) / r⌋,  (17)

where r is the window size, a is a random vector with each component drawn i.i.d. from a normal distribution, and b ∼ Uniform(0, r) is a scalar. The probability of two points x, y having different hash values under an L2-LSH function is:

F_r^{L2}(d_{xy}) = Pr(h_{a,b}^{L2}(x) ≠ h_{a,b}^{L2}(y)) = 2ϕ(−r/d_{xy}) + (2 / (√(2π)·(r/d_{xy})))·(1 − exp(−(r/d_{xy})²/2)),  (18)
where ϕ(x) is the cumulative distribution function of the standard normal distribution and d_{xy} = ‖x − y‖ is the l2 distance between x and y. From Eq. 18, we see that F_r^{L2}(d_{xy}) is a nonlinear, monotonically increasing function of d_{xy}. COE's objective of maximizing d_{x_u y_j} − d_{x_u y_i} does not directly maximize the corresponding mean of the normal distribution in Eq. 15, i.e., F_r^{L2}(d_{x_u y_j}) − F_r^{L2}(d_{x_u y_i}), since F_r^{L2} is not a linear function of the l2 distance. Our hypothesis is that, though both rely on ordinal triples, COE may not be as compatible with LSH as Indexable BPR.
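The nonlinearity can be seen concretely by evaluating F_r^{L2} numerically; the sketch below implements the closed form of Eq. 18 via the standard error function (the distances and window size are arbitrary illustrative values):

```python
import math

def normal_cdf(x):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f_l2(d, r=1.0):
    """Pr(different L2-LSH buckets) for two points at l2 distance d,
    window size r (closed form of Eq. 18, following [6])."""
    t = r / d
    return (2 * normal_cdf(-t)
            + (2 / (math.sqrt(2 * math.pi) * t)) * (1 - math.exp(-t * t / 2)))

ds = [0.5, 1.0, 2.0, 4.0]
ps = [f_l2(d) for d in ds]

# Monotonically increasing in d, but plainly nonlinear: doubling d
# does not change F by equal increments.
assert all(a < b for a, b in zip(ps, ps[1:]))
assert abs((ps[1] - ps[0]) - (ps[2] - ps[1])) > 1e-3
```

Because equal gains in l2 distance translate into unequal gains in collision probability, maximizing distance gaps does not directly maximize the mean in Eq. 15.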
Empirical Evidence. For each user u, we rank the items that u has rated in the test set, and measure how closely the ranked list matches the ordering by ground-truth ratings. As metric, we turn to the well-established ranking metric nDCG@k, where k is the cutoff point of the ranked list; its definition can be found in [26]. Fig. 10 shows the nDCG@10 values for MovieLens 20M and Netflix, respectively, at various dimensionalities D of the latent vectors. We observe that Indexable BPR is among the best, with the most competitive baseline being IPMF (which fits ratings). More important is whether the models still perform well when used with index structures. As similar trends are observed with other values of D, we subsequently show results based on D = 20.

Here, the objective is to investigate the effectiveness of the LSH hashcodes in preserving the ranking among the rated items in the test set. We use Hamming ranking, repeating the same experiment
Table 2: Absolute nDCG@10 and Relative nDCG@10 of all models as the length of LSH codes (b) varies.
MovieLens 20M Netflix
Absolute nDCG@10 Relative nDCG@10 Absolute nDCG@10 Relative nDCG@10
b 8 12 16 8 12 16 8 12 16 8 12 16
CFEE 0.582 0.582 0.585 0.805 0.806 0.809 0.559 0.561 0.562 0.834 0.836 0.838
COE 0.605 0.609 0.608 0.886 0.891 0.890 0.570 0.565 0.575 0.906 0.898 0.914
IPMF 0.702 0.728 0.704 0.920 0.955 0.923 0.705 0.737 0.747 0.896 0.936 0.949
BPR(MF) 0.599 0.603 0.605 0.831 0.837 0.840 0.560 0.551 0.553 0.863 0.849 0.853
BPR(MF)+ 0.603 0.604 0.606 0.837 0.840 0.841 0.569 0.569 0.566 0.877 0.877 0.873
Indexable BPR 0.743 0.745 0.754 0.977 0.980 0.991 0.732 0.761 0.756 0.924 0.960 0.954
in Fig. 10, but using Hamming distances over hashcodes. This is to investigate how well Indexable BPR preserves the ranking compared to the baselines. As hashing relies on random hash functions, we average results over 10 different sets of functions.

Table 2 shows the performance of all models. The two metrics are: Absolute nDCG@10, the nDCG@10 of the LSH hashcodes; and Relative nDCG@10, the ratio between the Absolute nDCG@10 and that of the original real-valued latent vectors. Indexable BPR consistently shows better Absolute nDCG@10 values than the baselines when using LSH indexing. This implies that Indexable BPR coupled with SRP-LSH produces more compact and informative hashcodes. Also, the Relative nDCG@10 values of Indexable BPR are close to 1 and higher than those of the baselines. These observations validate our hypotheses that not only is Indexable BPR competitively effective pre-indexing, but it is also more LSH-friendly, resulting in less loss of ranking accuracy post-indexing.
6 CONCLUSION
We propose a probabilistic method for modeling user preferences based on ordinal triples, geared towards top-k recommendation via approximate kNN search using indexing. The proposed model, Indexable BPR, produces an indexing-friendly representation, which results in significant speedups in top-k retrieval while maintaining high accuracy, owing to its compatibility with indexing structures such as LSH, spatial trees, and inverted indices. As future work, a potential direction is to go beyond learning representations compatible with existing indexing schemes, towards designing novel data structures or indexing schemes that better support efficient and accurate recommendation retrieval.
ACKNOWLEDGMENTS
This research is supported by the National Research Foundation, Prime Minister's Office, Singapore under its NRF Fellowship Programme (Award No. NRF-NRFF2016-07).
REFERENCES
[1] Mohamed Aly, Mario Munich, and Pietro Perona. 2011. Indexing in large scale image collections: Scaling properties and benchmark. In IEEE Workshop on Applications of Computer Vision (WACV). 418-425.
[2] Yoram Bachrach, Yehuda Finkelstein, Ran Gilad-Bachrach, Liran Katzir, Noam Koenigstein, Nir Nice, and Ulrich Paquet. 2014. Speeding up the Xbox recommender system using a Euclidean transformation for inner-product spaces. In RecSys. ACM, 257-264.
[3] Jon Louis Bentley. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9 (1975), 509-517.
[4] Avradeep Bhowmik, Nathan Liu, Erheng Zhong, Badri Narayan Bhaskar, and Suju Rajan. 2016. Geometry Aware Mappings for High Dimensional Sparse Factors. In AISTATS.
[5] Moses S. Charikar. 2002. Similarity estimation techniques from rounding algorithms. In STOC. ACM, 380-388.
[6] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In SoCG. ACM, 253-262.
[7] Marco Fraccaro, Ulrich Paquet, and Ole Winther. 2016. Indexable Probabilistic Matrix Factorization for Maximum Inner Product Search. In AAAI.
[8] Ruining He and Julian McAuley. 2016. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In AAAI.
[9] Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, and Qi Tian. 2012. Super-bit locality-sensitive hashing. In NIPS. 108-116.
[10] Mohammad Khoshneshin and W. Nick Street. 2010. Collaborative filtering via Euclidean embedding. In RecSys. ACM, 87-94.
[11] Noam Koenigstein, Parikshit Ram, and Yuval Shavitt. 2012. Efficient retrieval of recommendations in a matrix factorization framework. In CIKM. ACM, 535-544.
[12] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009).
[13] Artus Krohn-Grimberghe, Lucas Drumond, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2012. Multi-relational matrix factorization using Bayesian personalized ranking for social network data. In WSDM. 173-182.
[14] J. B. Kruskal. 1964. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29, 1 (1964).
[15] Dung D. Le and Hady W. Lauw. 2016. Euclidean Co-Embedding of Ordinal Data for Multi-Type Visualization. In SDM. SIAM, 396-404.
[16] Lukas Lerche and Dietmar Jannach. 2014. Using graded implicit feedback for Bayesian personalized ranking. In RecSys. 353-356.
[17] Brian McFee and Gert R. G. Lanckriet. 2011. Large-scale music similarity search with spatial trees. In ISMIR.
[18] Marius Muja and David G. Lowe. 2009. Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration. In International Conference on Computer Vision Theory and Applications (VISAPP'09). INSTICC Press, 331-340.
[19] Behnam Neyshabur and Nathan Srebro. 2015. On Symmetric and Asymmetric LSHs for Inner Product Search. In ICML.
[20] Weike Pan and Li Chen. 2013. GBPR: Group Preference Based Bayesian Personalized Ranking for One-Class Collaborative Filtering. In IJCAI, Vol. 13. 2691-2697.
[21] Parikshit Ram and Alexander G. Gray. 2012. Maximum inner-product search using cone trees. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 931-939.
[22] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI. AUAI Press, 452-461.
[23] Ruslan Salakhutdinov and Andriy Mnih. 2008. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML. ACM, 880-887.
[24] Anshumali Shrivastava and Ping Li. 2014. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). In Advances in Neural Information Processing Systems. 2321-2329.
[25] Anshumali Shrivastava and Ping Li. 2015. Improved Asymmetric Locality Sensitive Hashing (ALSH) for Maximum Inner Product Search (MIPS). In UAI.
[26] Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, and Alexander J. Smola. 2007. COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking. In NIPS.
[27] Hanwang Zhang, Fumin Shen, Wei Liu, Xiangnan He, Huanbo Luan, and Tat-Seng Chua. 2016. Discrete collaborative filtering. In Proc. of SIGIR, Vol. 16.
[28] Zhiwei Zhang, Qifan Wang, Lingyun Ruan, and Luo Si. 2014. Preference preserving hashing for efficient recommendation. In SIGIR. ACM, 183-192.
[29] Ke Zhou and Hongyuan Zha. 2012. Learning binary codes for collaborative filtering. In KDD. ACM, 498-506.