How to Combine Visual Features with Tags to
Improve Movie Recommendation Accuracy?
Yashar Deldjoo, Mehdi Elahi, Paolo Cremonesi, Farshad Bakhshandegan
Moghaddam and Andrea Luigi Edoardo Caielli
Politecnico di Milano, Milan, Italy
{yashar.deldjoo,mehdi.elahi,paolo.cremonesi}@polimi.it
moghaddam@okit.de,andrea.caielli@mail.polimi.it
http://www.polimi.it
Abstract. Previous works have shown the effectiveness of using stylistic visual features, indicative of the movie style, in content-based movie recommendation. However, they have mainly focused on a particular recommendation scenario, i.e., when a new movie is added to the catalogue and no information is available for that movie (New Item scenario). The stylistic visual features can also be used when other sources of information are available (Existing Item scenario).
In this work, we address the second scenario and propose a hybrid technique that exploits not only the typical content available for the movies (e.g., tags), but also the stylistic visual content extracted from the movie files, and fuses them by applying a fusion method called Canonical Correlation Analysis (CCA). Our experiments on a large catalogue of 13K movies have shown very promising results, which indicate a considerable improvement of the recommendation quality achieved by a proper fusion of the stylistic visual features with other types of features.
1 Introduction
Classical approaches to multimedia recommendation are of unimodal nature [35,
19, 7, 14]. Recommendations are typically generated based on two different types
of item features (or attributes): metadata containing High-Level (or semantic)
information and media entailing Low-Level (or stylistic) aspects.
The high-level features can be collected both from structured sources, such
as databases, lexicons and ontologies, and from unstructured sources, such as
reviews, news articles, item descriptions and social tags [6, 27, 29, 10, 28, 11, 1, 2]. The low-level features, on the other hand, can be extracted directly from the media itself. For example, in music recommendation many acoustic features, e.g. rhythm and timbre, can be extracted and used to find perceptually similar tracks [3, 4].
In this paper, we extend our previous works on movie recommendation [15, 14, 13, 16, 12], where a set of low-level visual features were used mainly to address the new item cold start scenario [17, 34, 18]. In such a scenario, no information is available about newly added movies (e.g. user-generated movies), and the low-level visual features are used to recommend those new movies. While this is an effective way of solving the new item problem, visual features can also be used when other sources of information (e.g., tags added by users) are available for the movies. Accordingly, a fusion method can be used to combine the two types of features, i.e., the low-level stylistic features defined in our previous works [15, 14] and user-generated tags, into a joint representation that improves the quality of recommendation. Hence, we can formulate the research hypothesis as follows: combining the low-level visual features (extracted from movies) with tag features by means of a proper fusion method can lead to more accurate recommendations than recommendations based on these features used in isolation.
More specifically, we propose a multimodal fusion paradigm aimed at building a content model that exploits the low-level correlation between the visual and metadata modalities¹. The method is based on Canonical Correlation Analysis (CCA), which belongs to a wider family of multimodal subspace learning methods known as correlation matching [23, 30]. Unlike the very few multimodal video recommender systems available to date [36, 26], which treat the fusion problem as a basic linear modeling problem without studying the underlying spanned feature spaces, the proposed method learns the correlation between modalities and maximizes their pairwise correlation.
The main contributions of this work are listed below:
– We propose a novel technique that combines a set of automatically extracted stylistic visual features with other sources of information in order to improve the quality of recommendation.
– We employ a data fusion method called Canonical Correlation Analysis (CCA) [20, 21] that, unlike traditional fusion methods, which do not exploit the relationship between two sets of features coming from two different sources, maximizes the pairwise correlation between the two sets.
– We evaluate our proposed technique on a large dataset with more than 13K movies that has been thoroughly analyzed in order to extract the stylistic visual features².
The rest of the paper is organized as follows. Section 2 briefly reviews the re-
lated work. Section 3 discusses the proposed method by presenting a description
of the visual features and introducing a mathematical model for the recommen-
dation problem and the proposed fusion method. Section 4 presents the evalu-
ation methodology. Results and discussions are presented in Section 5. Finally,
in Section 6 we present the conclusion and future work.
¹ Note that, though textual in nature, we treat metadata as a separate modality which is added to a video by a community user (tag) or an expert (genre). Refer to Table 1 for further illustration.
² The dataset is called the Mise-en-Scene Dataset and it is publicly available at: http://recsys.deib.polimi.it
2 Related work
Up to the present, the exploitation of low-level features has been only marginally explored in the recommender systems community. Meanwhile, such features have been extensively studied in other fields, such as computer vision and content-based video retrieval [32, 24]. Although pursuing different objectives, these communities share with the recommender systems community the research problems of defining the “best” representation of video content and of classifying videos according to features of different nature. Hence they offer results and insights that are also of interest in the movie recommender systems context.
The works presented in [24, 5] provide comprehensive surveys of the relevant state of the art related to video content analysis and classification, and discuss a large body of low-level features (visual, auditory or textual) that can be considered for these purposes. In [32] Rasheed et al. propose a practical movie genre classification scheme based on computable visual cues. [31] discusses a similar approach that also considers audio features. Finally, in [37] Zhou et al. propose a framework for automatic classification, using temporally-structured features, based on an intermediate level of scene representation.
While the low-level features have been of interest for the goal of video retrieval, this paper addresses a different scenario, i.e., when the low-level features are used in a recommender system to effectively generate relevant recommendations for users.
3 Method Descriptions
In this section, we present the proposed method.
3.1 Visual Features
Multimedia content in a video can be classified into three hierarchical levels:
Level 1 “High-level (semantic) features”: At this level, we have semantic features that deal with the concepts and events happening in a video. For example, the plot of the movie “The Good, the Bad and the Ugly” revolves around three gunslingers competing to find a buried cache of gold during the American Civil War.
Level 2 “Mid-level (syntactic) features”: At the intermediate level we have syntactic features that deal with which objects exist in a video and their interactions. As an example, in the same movie there are Clint Eastwood, Lee Van Cleef and Eli Wallach, plus several horses and guns.
Level 3 “Low-level (stylistic) features”: At the lowest level we have stylistic features which define the mise-en-scène characteristics of the movie, i.e., the design aspects that characterize the aesthetics and style of a movie. As an example, in the same movie the predominant colors are yellow and brown, and camera shots use extreme close-ups on the actors’ eyes.
The examples above are presented for the visual modality as it forms the focus of
our recent works [14, 15, 13]. In order to allow a fair comparison between different
modalities, we present the hierarchical comparison of multimedia content across
different modalities [7] in Table 1. Note that while the visual, aural and textual
modalities are elements of the multimedia data itself, metadata is added to the
movie after production.
Table 1: Hierarchical and modality-wise classification of multimedia features

Level | Visual                                | Aural                               | Text                      | Metadata
High  | events, concepts                      | events, concepts                    | semantic similarity       | summary, tag
Mid   | objects/people, objects’ interaction  | objects/people, source              | sentences, keywords       | genre, tag, cast
Low   | motion, color, shape, lighting        | timbre, pitch, spectral frequency   | nouns, verbs, adjectives  | genre, tag
Recommender systems in the movie domain typically use high-level or mid-level features, such as genre or tag, which appear in the form of metadata [25, 35, 19]. These features usually cover a wide range of the hierarchical classification of content: for example, tags most often contain words about events and incidents (high-level) and people and places (mid-level), while the visual features extracted and studied in our previous works (presented in Table 2) cover the low-level aspects. By properly combining the high-level metadata and the low-level visual features, we aim to maximize the informativeness of the joint feature representation.
3.2 Multimedia Recommendation Problem
A multimedia document $D$ (e.g. a video) can be represented by the quadruple

$$D = (d_V, d_A, d_T, d_M)$$

in which $d_V$, $d_A$, $d_T$ are the visual, aural and textual documents constituting a multimedia document, and $d_M$ is the metadata added to the multimedia document by a human (e.g. tag, genre or year of production). In a similar manner, a user’s profile $U$ can be projected over each of the above modalities and be symbolically represented as

$$U = (u_V, u_A, u_T, u_M)$$

The multimedia components are represented as vectors in the feature spaces $\mathbb{R}^{|V|}$, $\mathbb{R}^{|A|}$, $\mathbb{R}^{|T|}$ and $\mathbb{R}^{|M|}$. For instance, $f_V = \{f_1, \ldots, f_{|V|}\}^T \in \mathbb{R}^{|V|}$ is the feature vector representing the visual component.
Table 2: The list of low-level stylistic features representative of movie style presented in our previous works [14, 16]

Camera Motion. Equation: $L_{sh} = n_f / n_{sh}$.
Camera shots are used as the representative measure of camera movement. A shot is a single camera action. The average shot length $L_{sh}$ (number of frames $n_f$ over number of shots $n_{sh}$) and the number of shots $n_{sh}$ are used as two distinctive features.

Color Variance. Equation: $\rho = \begin{pmatrix} \sigma_L^2 & \sigma_{Lu}^2 & \sigma_{Lv}^2 \\ \sigma_{Lu}^2 & \sigma_u^2 & \sigma_{uv}^2 \\ \sigma_{Lv}^2 & \sigma_{uv}^2 & \sigma_v^2 \end{pmatrix}$.
For each keyframe in the Luv colorspace, the covariance matrix $\rho$ is computed, where $\sigma_L, \sigma_u, \sigma_v, \sigma_{Lu}, \sigma_{Lv}, \sigma_{uv}$ are the standard deviations over the three channels $L, u, v$ and their mutual covariances. $\Sigma = \det(\rho)$ is the measure of color variance. The mean and standard deviation of $\Sigma$ over keyframes are used as the representative features of color variance.

Object Motion. Equation: $\nabla I(x, t) \cdot v + I_t(x, t) = 0$.
Object motion is calculated based on optical flow estimation, which provides a robust estimate of the object motion in video frames based on pixel velocities. The mean and standard deviation of pixel motion are calculated in each frame and averaged across the whole video as the two representative features for object motion.

Lighting Key. Equation: $\xi = \mu \cdot \sigma$.
After transforming pixels to the HSV colorspace, the mean $\mu$ and standard deviation $\sigma$ of the value component, which corresponds to brightness, are computed. $\xi$, the product of the two, is computed and averaged across keyframes as the measure of the average lighting key in a video.
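For illustration, the following Python sketch (using OpenCV and NumPy; a minimal sketch, not the implementation used in the paper) shows how two of the static features in Table 2, the lighting key and the color variance, could be computed from a list of keyframes:

```python
# Minimal sketch (not the authors' implementation) of the lighting key
# and color variance features from Table 2, using OpenCV and NumPy.
import cv2
import numpy as np

def lighting_key(keyframes):
    """xi = mu * sigma of the HSV value (brightness) channel, averaged over keyframes."""
    xis = []
    for frame in keyframes:  # each frame: BGR uint8 array
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        v = hsv[:, :, 2].astype(np.float64)
        xis.append(v.mean() * v.std())
    return float(np.mean(xis))

def color_variance(keyframes):
    """Sigma = det(covariance of L, u, v channels) per keyframe; return mean and std."""
    dets = []
    for frame in keyframes:
        luv = cv2.cvtColor(frame, cv2.COLOR_BGR2Luv).astype(np.float64)
        channels = luv.reshape(-1, 3).T      # 3 x (number of pixels)
        rho = np.cov(channels)               # 3x3 covariance matrix of L, u, v
        dets.append(np.linalg.det(rho))
    return float(np.mean(dets)), float(np.std(dets))
```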
The relevancy between the target user profile $U$ and the item profile $D$ is of interest for recommenders and is denoted by $R(U, D)$. In this work, we focus our attention on the visual features defined in Table 2 and the rich metadata (tags).
Given a user profile $U$, either directly provided by the user (direct profile) or estimated by the system (indirect profile), and a database of videos $\mathcal{D} = \{D_1, D_2, \ldots, D_{|\mathcal{D}|}\}$, the task of video recommendation is to seek the video $D_i^*$ that satisfies

$$D_i^* = \arg\max_{D_i \in \mathcal{D}} R(U, D_i) \quad (1)$$
where $R(U, D_i)$ can be computed as

$$R(U, D_i) = R(F(u_V, u_M), F(d_V, d_M)) \quad (2)$$

where $F$ is a function whose role is to combine different modalities into a joint representation. This function is known as the inter-modal fusion function in multimedia information retrieval (MMIR). It belongs to the family of fusion methods known as early fusion methods, which integrate unimodal features before passing them to a recommender. The effectiveness of early fusion methods has been demonstrated in a number of multimedia retrieval papers [30, 33].
3.3 Fusion Method
The fusion method aims to combine information obtained from two sources of features: (1) LL features: the stylistic visual features extracted by our system, and (2) HL features: the tag features. We employ a fusion method known as Canonical Correlation Analysis (CCA) [30, 20, 21] for fusing the two sources of features. CCA is a popular method in multi-data processing and is mainly used to analyse the relationships between two sets of features originating from different sources of information.
Given two sets of features $X \in \mathbb{R}^{p \times n}$ and $Y \in \mathbb{R}^{q \times n}$, where $p$ and $q$ are the dimensions of the features extracted from the $n$ items, let $S_{xx} \in \mathbb{R}^{p \times p}$ and $S_{yy} \in \mathbb{R}^{q \times q}$ be the within-set covariance matrices and $S_{xy} \in \mathbb{R}^{p \times q}$ be the between-set covariance matrix. Also let us define $S \in \mathbb{R}^{(p+q) \times (p+q)}$ to be the overall covariance matrix, a complete matrix which contains information about the associations between pairs of features, represented as follows:

$$S = \begin{pmatrix} \mathrm{cov}(x) & \mathrm{cov}(x, y) \\ \mathrm{cov}(y, x) & \mathrm{cov}(y) \end{pmatrix} = \begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix} \quad (3)$$
CCA then aims to find linear transformations $X^* = W_x^T X$ and $Y^* = W_y^T Y$ that maximize the pairwise correlation across the two feature sets, as given by Eq. 4. This ensures that the relationship between the two sets of features follows a consistent pattern, which leads to the creation of a discriminative and informative fused feature vector:

$$\arg\max_{W_x, W_y} \mathrm{corr}(X^*, Y^*) = \frac{\mathrm{cov}(X^*, Y^*)}{\sqrt{\mathrm{var}(X^*)\,\mathrm{var}(Y^*)}} \quad (4)$$
where $\mathrm{cov}(X^*, Y^*) = W_x^T S_{xy} W_y$ and, for the variances, $\mathrm{var}(X^*) = W_x^T S_{xx} W_x$ and $\mathrm{var}(Y^*) = W_y^T S_{yy} W_y$. We adopt the maximization procedure described in [20, 21], solving the eigenvalue equations

$$S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx} \hat{W}_x = \Lambda^2 \hat{W}_x$$
$$S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy} \hat{W}_y = \Lambda^2 \hat{W}_y \quad (5)$$
where $\hat{W}_x \in \mathbb{R}^{p \times d}$ and $\hat{W}_y \in \mathbb{R}^{q \times d}$ are the eigenvector matrices and $\Lambda^2$ is the diagonal matrix of eigenvalues, i.e., the squares of the canonical correlations. Finally, $d = \mathrm{rank}(S_{xy}) \le \min(n, p, q)$ is the number of non-zero eigenvalues in each equation. After calculating $X^*, Y^* \in \mathbb{R}^{d \times n}$, feature-level fusion can be performed in two manners: (1) concatenation or (2) summation of the transformed features:

$$Z_{ccat} = \begin{pmatrix} X^* \\ Y^* \end{pmatrix} = \begin{pmatrix} W_x^T X \\ W_y^T Y \end{pmatrix} \quad (6)$$

and

$$Z_{sum} = X^* + Y^* = W_x^T X + W_y^T Y \quad (7)$$
[Fig. 1: Illustration of the proposed video recommender system based on stylistic low-level visual features and user-generated tags, fused with a CCA-based method. The pipeline: videos undergo shot detection and keyframe extraction; static and dynamic features are extracted and fused with video tags via CCA; the resulting content model, together with user profiling learned from user feedback, drives the recommendation.]
Figure 1 illustrates the building blocks of the developed video recommender system. Color variance and lighting key are the extracted static features, while camera motion and object motion are the dynamic features.
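For concreteness, the following sketch shows one way the CCA fusion of Eqs. 3-7 could be implemented with NumPy. It is a minimal sketch, not the authors' code: the ridge term added for invertibility and the recovery of $W_y$ from $W_x$ (which keeps the canonical pairs aligned) are implementation choices not specified above.

```python
# Minimal sketch of CCA-based feature fusion (Eqs. 3-7); not the authors' code.
import numpy as np

def cca_fusion(X, Y, mode="ccat", ridge=1e-6):
    """Fuse X (p x n visual features) and Y (q x n tag features) for n items."""
    p, n = X.shape
    q = Y.shape[0]
    Xc = X - X.mean(axis=1, keepdims=True)          # center each feature dimension
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Sxx = Xc @ Xc.T / (n - 1) + ridge * np.eye(p)   # within-set covariances
    Syy = Yc @ Yc.T / (n - 1) + ridge * np.eye(q)   # (ridge for invertibility)
    Sxy = Xc @ Yc.T / (n - 1)                       # between-set covariance

    # Eq. 5: eigenproblem for Wx; Wy follows from Wx so column pairs stay aligned.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    lam2, Wx = np.linalg.eig(M)
    lam2, Wx = lam2.real, Wx.real
    order = np.argsort(lam2)[::-1]                  # strongest correlations first
    d = min(np.linalg.matrix_rank(Sxy), n, p, q)    # number of canonical pairs
    lam2, Wx = lam2[order[:d]], Wx[:, order[:d]]
    Wy = np.linalg.solve(Syy, Sxy.T @ Wx) / np.sqrt(np.maximum(lam2, 1e-12))

    Xs, Ys = Wx.T @ Xc, Wy.T @ Yc                   # transformed d x n features
    return np.vstack([Xs, Ys]) if mode == "ccat" else Xs + Ys  # Eq. 6 / Eq. 7
```

With mode="ccat" the function returns $Z_{ccat}$ (Eq. 6), and with mode="sum" it returns $Z_{sum}$ (Eq. 7).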
3.4 Recommendation Algorithm
To generate recommendations, we adopted a classical “k-nearest neighbor” content-based algorithm. Given a set of users $U$ and a catalogue of items $I$, a set of preference scores $r_{ui}$ has been collected. Moreover, each item $i \in I$ is associated with its feature vector $f_i$. For each pair of items $i$ and $j$, a similarity score $s_{ij}$ is computed using cosine similarity as follows:

$$s_{ij} = \frac{f_i^T f_j}{\|f_i\|\,\|f_j\|} \quad (8)$$
For each item $i$, the set of its nearest neighbors $NN_i$ is built, with $|NN_i| < K$. Then, for each user $u \in U$, the predicted preference score $\hat{r}_{ui}$ for an unseen item $i$ is computed as follows:

$$\hat{r}_{ui} = \frac{\sum_{j \in NN_i,\, r_{uj} > 0} r_{uj}\, s_{ij}}{\sum_{j \in NN_i,\, r_{uj} > 0} s_{ij}} \quad (9)$$
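As an illustration of Eqs. 8-9, a compact NumPy sketch of the item-based kNN scorer follows; the dense matrices F (item features, e.g. the CCA-fused vectors) and R (user-item ratings) and the neighborhood size K are assumed inputs, and a dense implementation is an illustrative choice, not the paper's implementation.

```python
# Minimal sketch of the item-based kNN content-based scorer (Eqs. 8-9).
import numpy as np

def knn_cbf_scores(F, R, K=50):
    """F: items x dim feature matrix; R: users x items ratings; returns r_hat."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    S = (F @ F.T) / (norms @ norms.T + 1e-12)   # cosine similarities (Eq. 8)
    np.fill_diagonal(S, 0.0)                    # an item is not its own neighbor
    for i in range(S.shape[0]):                 # keep only the K nearest neighbors
        drop = np.argsort(S[i])[:-K]
        S[i, drop] = 0.0
    pos = (R > 0).astype(float)                 # indicator of rated items
    num = R @ S.T                               # sum over rated j of r_uj * s_ij
    den = pos @ S.T + 1e-12                     # sum over rated j of s_ij
    return num / den                            # users x items matrix (Eq. 9)
```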
4 Evaluation Methodology
4.1 Dataset
We have used the latest version of the MovieLens dataset [22], which contains 22'884'377 ratings and 586'994 tags provided by 247'753 users on 34'208 movies (sparsity 99.72%). For each movie in the dataset, the title has been automatically queried on YouTube to search for the trailer; when available, the trailer has been downloaded. We have found the trailers for 13'373 movies.
Low-level features have been automatically extracted from trailers. We have
used trailers and not full videos in order to have a scalable recommender system.
Previous works have shown that low-level features extracted from trailers of
movies are equivalent to the low-level features extracted from full-length videos,
both in terms of feature vectors and quality of recommendations [14].
We have used Latent Semantic Analysis (LSA) to better exploit the implicit
structure in the association between tags and items. The technique consists in
decomposing the tag-item matrix into a set of orthogonal factors whose linear
combination approximates the original matrix [8].
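A minimal sketch of this step, using scikit-learn's TruncatedSVD (a standard way to compute LSA), is shown below; the number of latent factors is an assumed hyperparameter, as it is not reported above.

```python
# Minimal sketch of the LSA step on the tag-item matrix; n_factors is assumed.
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

def lsa_tag_features(tag_item: csr_matrix, n_factors: int = 100):
    """Project items into an LSA latent space approximating the tag-item matrix.

    tag_item: sparse matrix, rows = items, cols = tags (e.g. binary or tf-idf).
    Returns an items x n_factors dense matrix (the TagLSA features).
    """
    svd = TruncatedSVD(n_components=n_factors, random_state=0)
    return svd.fit_transform(tag_item)
```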
4.2 Methodology
We have evaluated the Top-N recommendation quality by adopting a procedure similar to the one described in [9].

– We split the dataset into two random subsets: one contains 80% of the ratings and is used for training the system (train set), and the other contains 20% of the ratings and is used for evaluation (test set).
– For each relevant item i rated by user u in the test set, we form a list containing the item i and all the items not rated by the user u, which we assume to be irrelevant to her. Then, we form a top-N recommendation list by picking the top N ranked items from the list. Let r be the rank of i: we have a hit if r < N, otherwise we have a miss. Hence, if a user u has N_u relevant items, the precision and recall of her recommendation list of size N can be computed.
– We measure the quality of the recommendation in terms of recall, precision and mean average precision (MAP) for different cutoff values N = 5, 10, 20 (a sketch of this protocol is given below).
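The following sketch makes the hit/miss protocol concrete for recall@N; the data structures, and the assumption that every item unrated at training time is a candidate, are illustrative choices rather than the exact evaluation code.

```python
# Minimal sketch of the hit/miss evaluation protocol for recall@N.
import numpy as np

def recall_at_n(scores, test_relevant, rated_mask, N=10):
    """scores: users x items predicted scores; test_relevant: list of (u, i)
    held-out relevant pairs; rated_mask: users x items booleans (train ratings)."""
    hits = 0
    for u, i in test_relevant:
        candidates = np.where(~rated_mask[u])[0]   # item i plus all unrated items
        ranked = candidates[np.argsort(-scores[u, candidates])]
        rank = int(np.where(ranked == i)[0][0])    # position of the test item
        hits += rank < N                           # hit if it enters the top-N
    return hits / len(test_relevant)
```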
5 Results
Table 3 presents the results obtained from the conducted experiments. As can be seen, both methods for the fusion of LL visual features with TagLSA features outperform using either TagLSA or LL visual features alone, with respect to all considered evaluation metrics.
In terms of recall, the fusion of LL visual with TagLSA based on concatenation of these features (ccat) obtains scores of 0.0115, 0.0166, and 0.0209 for recommendation at 5, 10, and 20, respectively. The alternative fusion method based on summation (sum) also scores better than the other baselines, i.e., LL visual features and TagLSA, with recall values of 0.0055, 0.0085, and 0.0112 for the different recommendation sizes (cutoff values). These values are 0.0038, 0.0046, and 0.0053 for recommendation using LL visual features, and 0.0028, 0.0049, and 0.0068 for recommendation using TagLSA. These scores indicate that recommendation based on the fusion of LL visual and TagLSA features is considerably better than recommendation based on these content features individually.
In terms of precision, again, the best results are obtained by the fusion of LL visual features with TagLSA features based on concatenation, with scores of 0.0140, 0.0115, and 0.0079 for recommendation at 5, 10, and 20, respectively. The alternative fusion method (sum) obtains precision scores of 0.0081, 0.0069, and 0.0048, which are better than those of the two individual baselines. Indeed, LL visual features achieve precision scores of 0.0051, 0.0037, and 0.0023, while TagLSA achieves precision scores of 0.0045, 0.0041, and 0.0031. These results also indicate the superior quality of recommendation based on the fusion of LL visual features and TagLSA in comparison to recommendation based on each of these sets of features alone.
Similar results have been obtained in terms of the MAP metric. The fusion method based on concatenation (ccat) performs best in comparison to the other baselines, obtaining MAP scores of 0.0091, 0.0080, and 0.0076 for recommendation at 5, 10, and 20. The MAP scores are 0.0045, 0.0038, and 0.0035 for fusion based on summation (sum); 0.0035, 0.0028, and 0.0026 for LL visual features; and 0.0025, 0.0021, and 0.0019 for TagLSA. Accordingly, the fusion of the LL visual features and TagLSA clearly performs best in terms of the MAP metric as well.
Overall, the results validate our hypothesis and show that combining the low-level visual features (LL visual) extracted from movies with tag content, by adopting a proper fusion method, can lead to a significant improvement in the quality of recommendations. This is a promising outcome and shows the great potential of exploiting LL visual features together with other sources of content information, such as tags, in the generation of relevant personalised recommendations in the multimedia domain.
6 Conclusion and Future Work
In this paper, we proposed the fusion of visual features extracted from the movie files with other types of content (i.e., tags) in order to improve the quality of the recommendation. In previous works, the visual features were used mainly to solve the cold start problem, i.e., when a new movie is added to the catalogue and no information is available for that movie.
Table 3: Quality of recommendation w.r.t. Recall, Precision and MAP when using low-level visual features and high-level metadata features in isolation, compared with features fused using our proposed method based on Canonical Correlation Analysis.

Features    | Fusion   | Recall                 | Precision              | MAP
            | Method   | @5     @10    @20      | @5     @10    @20      | @5     @10    @20
TagLSA      | -        | 0.0028 0.0049 0.0068   | 0.0045 0.0041 0.0031   | 0.0025 0.0021 0.0019
LL          | -        | 0.0038 0.0046 0.0053   | 0.0051 0.0037 0.0023   | 0.0035 0.0028 0.0026
LL + TagLSA | CCA-Sum  | 0.0055 0.0085 0.0112   | 0.0081 0.0069 0.0048   | 0.0045 0.0038 0.0035
LL + TagLSA | CCA-Ccat | 0.0115 0.0166 0.0209   | 0.0140 0.0115 0.0079   | 0.0091 0.0080 0.0076
In this work, however, we used the stylistic visual features in combination with other sources of information. Hence, our research hypothesis is that a proper fusion of the visual features of movies can lead to a higher accuracy of movie recommendation w.r.t. using these sets of features individually.
Based on the experiments we conducted on a large dataset of 13K movies, we successfully verified the hypothesis and showed that the recommendation accuracy is considerably improved when the (low-level) visual features are combined with user-generated tags.
In the future, we will consider the fusion of additional sources of information, such as audio features, in order to further improve the quality of the content-based recommendation system. Moreover, we will investigate the effect of different feature aggregation methods on the quality of the extracted information and on the quality of the generated recommendations.
References
1. C. C. Aggarwal. Content-based recommender systems. In Recommender Systems,
pages 139–166. Springer, 2016.
2. C. C. Aggarwal. Recommender Systems: The Textbook. Springer, 2016.
3. D. Bogdanov and P. Herrera. How much metadata do we need in music recom-
mendation? a subjective evaluation using preference sets. In ISMIR, pages 97–102,
2011.
4. D. Bogdanov, J. Serrà, N. Wack, P. Herrera, and X. Serra. Unifying low-level
and high-level music similarity measures. Multimedia, IEEE Transactions on,
13(4):687–701, 2011.
5. D. Brezeale and D. J. Cook. Automatic video classification: A survey of the liter-
ature. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE
Transactions on, 38(3):416–430, 2008.
6. I. Cantador, M. Szomszor, H. Alani, M. Fernández, and P. Castells. Enriching
ontological user profiles with tagging history for multi-domain recommendations.
2008.
7. O. Celma. Music recommendation. In Music Recommendation and Discovery,
pages 43–85. Springer, 2010.
8. P. Cremonesi, F. Garzotto, S. Negro, A. V. Papadopoulos, and R. Turrin. Looking
for good recommendations: A comparative evaluation of recommender systems. In
Human-Computer Interaction–INTERACT 2011, pages 152–168. Springer, 2011.
9. P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms
on top-n recommendation tasks. In Proceedings of the 2010 ACM Conference on
Recommender Systems, RecSys 2010, Barcelona, Spain, September 26-30, 2010,
pages 39–46, 2010.
10. M. de Gemmis, P. Lops, C. Musto, F. Narducci, and G. Semeraro. Semantics-aware
content-based recommender systems. In Recommender Systems Handbook, pages
119–159. Springer, 2015.
11. M. Degemmis, P. Lops, and G. Semeraro. A content-collaborative recommender
that exploits wordnet-based user profiles for neighborhood formation. User Mod-
eling and User-Adapted Interaction, 17(3):217–255, 2007.
12. Y. Deldjoo, M. Elahi, and P. Cremonesi. Using visual features and latent factors
for movie recommendation. In Workshop on New Trends in Content-Based Rec-
ommender Systems (CBRecSys), in conjugation with ACM Recommender Systems
conference (RecSys), 2016.
13. Y. Deldjoo, M. Elahi, P. Cremonesi, F. Garzotto, and P. Piazzolla. Recommending
movies based on mise-en-scene design. In Proceedings of the 2016 CHI Conference
Extended Abstracts on Human Factors in Computing Systems, pages 1540–1547.
ACM, 2016.
14. Y. Deldjoo, M. Elahi, P. Cremonesi, F. Garzotto, P. Piazzolla, and M. Quadrana.
Content-based video recommendation system based on stylistic visual features.
Journal on Data Semantics, pages 1–15, 2016.
15. Y. Deldjoo, M. Elahi, M. Quadrana, and P. Cremonesi. Toward building a content-
based video recommendation system based on low-level features. In E-Commerce
and Web Technologies. Springer, 2015.
16. Y. Deldjoo, M. Elahi, M. Quadrana, P. Cremonesi, and F. Garzotto. Toward
effective movie recommendations based on mise-en-scène film styles. In Proceedings
of the 11th Biannual Conference on Italian SIGCHI Chapter, pages 162–165. ACM,
2015.
17. M. Elahi, F. Ricci, and N. Rubens. A survey of active learning in collaborative
filtering recommender systems. Computer Science Review, 20:29 – 50, 2016.
18. I. Fernández-Tobías, M. Braunhofer, M. Elahi, F. Ricci, and I. Cantador. Alle-
viating the new user problem in collaborative filtering by exploiting personality
information. User Modeling and User-Adapted Interaction, 26(2):221–255, 2016.
19. I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel. Social media recommen-
dation based on people and tags. In Proceedings of the 33rd international ACM
SIGIR conference on Research and development in information retrieval, pages
194–201. ACM, 2010.
20. M. Haghighat, M. Abdel-Mottaleb, and W. Alhalabi. Fully automatic face normal-
ization and single sample face recognition in unconstrained environments. Expert
Systems with Applications, 47:23–34, 2016.
21. D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation anal-
ysis: An overview with application to learning methods. Neural computation,
16(12):2639–2664, 2004.
22. F. M. Harper and J. A. Konstan. The movielens datasets: History and context.
ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):19, 2015.
23. H. Hotelling. Relations between two sets of variates. Biometrika, 28(3/4):321–377,
1936.
24. W. Hu, N. Xie, L. Li, X. Zeng, and S. Maybank. A survey on visual content-based
video indexing and retrieval. Systems, Man, and Cybernetics, Part C: Applications
and Reviews, IEEE Transactions on, 41(6):797–819, 2011.
25. X. Li, L. Guo, and Y. E. Zhao. Tag-based social interest discovery. In Proceedings
of the 17th international conference on World Wide Web, pages 675–684. ACM,
2008.
26. T. Mei, B. Yang, X.-S. Hua, and S. Li. Contextual video recommendation by mul-
timodal relevance and user feedback. ACM Transactions on Information Systems
(TOIS), 29(2):10, 2011.
27. R. J. Mooney and L. Roy. Content-based book recommending using learning for
text categorization. In Proceedings of the fifth ACM conference on Digital libraries,
pages 195–204. ACM, 2000.
28. M. Nasery, M. Braunhofer, and F. Ricci. Recommendations with optimal combina-
tion of feature-based and item-based preferences. In Proceedings of the 2016 Con-
ference on User Modeling Adaptation and Personalization, pages 269–273. ACM,
2016.
29. M. Nasery, M. Elahi, and P. Cremonesi. Polimovie: a feature-based dataset for
recommender systems. In ACM RecSys Workshop on Crowdsourcing and Human
Computation for Recommender Systems (CrawdRec), volume 3, pages 25–30. ACM,
2015.
30. J. C. Pereira, E. Coviello, G. Doyle, N. Rasiwasia, G. R. Lanckriet, R. Levy, and
N. Vasconcelos. On the role of correlation and abstraction in cross-modal multi-
media retrieval. IEEE transactions on pattern analysis and machine intelligence,
36(3):521–535, 2014.
31. Z. Rasheed and M. Shah. Video categorization using semantics and semiotics. In
Video mining, pages 185–217. Springer, 2003.
32. Z. Rasheed, Y. Sheikh, and M. Shah. On the use of computable features for film
classification. Circuits and Systems for Video Technology, IEEE Transactions on,
15(1):52–64, 2005.
33. N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. R. Lanckriet, R. Levy,
and N. Vasconcelos. A new approach to cross-modal multimedia retrieval. In
Proceedings of the 18th ACM international conference on Multimedia, pages 251–
260. ACM, 2010.
34. N. Rubens, M. Elahi, M. Sugiyama, and D. Kaplan. Active learning in recom-
mender systems. In Recommender Systems Handbook, pages 809–846. Springer,
2015.
35. M. Szomszor, C. Cattuto, H. Alani, K. O'Hara, A. Baldassarri, V. Loreto, and V. D.
Servedio. Folksonomies, the semantic web, and movie recommendation. 2007.
36. B. Yang, T. Mei, X.-S. Hua, L. Yang, S.-Q. Yang, and M. Li. Online video recom-
mendation based on multimodal fusion and relevance feedback. In Proceedings of
the 6th ACM international conference on Image and video retrieval, pages 73–80.
ACM, 2007.
37. H. Zhou, T. Hermans, A. V. Karandikar, and J. M. Rehg. Movie genre classifi-
cation via scene categorization. In Proceedings of the international conference on
Multimedia, pages 747–750. ACM, 2010.
This book comprehensively covers the topic of recommender systems, which provide personalized recommendations of products or services to users based on their previous searches or purchases. Recommender system methods have been adapted to diverse applications including query log mining, social networking, news recommendations, and computational advertising. This book synthesizes both fundamental and advanced topics of a research area that has now reached maturity. The chapters of this book are organized into three categories: - Algorithms and evaluation: These chapters discuss the fundamental algorithms in recommender systems, including collaborative filtering methods, content-based methods, knowledge-based methods, ensemble-based methods, and evaluation. - Recommendations in specific domains and contexts: the context of a recommendation can be viewed as important side information that affects the recommendation goals. Different types of context such as temporal data, spatial data, social data, tagging data, and trustworthiness are explored. - Advanced topics and applications: Various robustness aspects of recommender systems, such as shilling systems, attack models, and their defenses are discussed. In addition, recent topics, such as learning to rank, multi-armed bandits, group systems, multi-criteria systems, and active learning systems, are introduced together with applications. Although this book primarily serves as a textbook, it will also appeal to industrial practitioners and researchers due to its focus on applications and references. Numerous examples and exercises have been provided, and a solution manual is available for instructors. About the Author: Charu C. Aggarwal is a Distinguished Research Staff Member (DRSM) at the IBM T.J. Watson Research Center in Yorktown Heights, New York. He completed his B.S. from IIT Kanpur in 1993 and his Ph.D. from the Massachusetts Institute of Technology in 1996. He has published more than 300 papers in refereed conferences and journals, and has applied for or been granted more than 80 patents. He is author or editor of 15 books, including a textbook on data mining and a comprehensive book on outlier analysis. Because of the commercial value of his patents, he has thrice been designated a Master Inventor at IBM. He has received several internal and external awards, including the EDBT Test-of-Time Award (2014) and the IEEE ICDM Research Contributions Award (2015). He has also served as program or general chair of many major conferences in data mining. He is a fellow of the SIAM, ACM, and the IEEE, for “contributions to knowledge discovery and data mining algorithms.”