Recommending Movies Based on
Mise-en-Scène Design
Yashar Deldjoo
Politecnico di Milano
yashar.deldjoo@polimi.it
Franca Garzotto
Politecnico di Milano
franca.garzotto@polimi.it
Mehdi Elahi
Politecnico di Milano
mehdi.elahi@polimi.it
Pietro Piazzolla
Politecnico di Milano
pietro.piazzolla@polimi.it
Paolo Cremonesi
Politecnico di Milano
paolo.cremonesi@polimi.it
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s). Copyright is held by the
author/owner(s).
CHI’16 Extended Abstracts, May 7–12, 2016, San Jose, CA, USA.
ACM 978-1-4503-4082-3/16/05.
http://dx.doi.org/10.1145/2851581.2892551
Abstract
In this paper, we present ongoing work that will ultimately
result in a movie recommender system based on the
mise-en-scène characteristics of movies. We believe that
users' preferences for movies can be well described in terms
of the mise-en-scène, i.e., the design aspects of movie
making that influence aesthetics and style. Examples of
mise-en-scène characteristics are lighting, colors,
background, and movements. Our recommender system opens new
opportunities for the design of user interfaces that offer a
personalized way to search for interesting movies through
the analysis of film styles, rather than through the
traditional classification of movies based on explicit
attributes such as genre and cast.
Author Keywords
movie recommendation, film making, video processing
Introduction
Recommender Systems (RSs) are applications that are capable
of filtering large amounts of information and selecting the
items that are likely to be attractive to users [18]. In
particular, they play an important role in video-on-demand
web applications (e.g., YouTube and Netflix) characterized
by huge catalogs of movies: the ultimate aim of RSs is to
find and recommend to users the movies that are most likely
to be attractive to them. However, RSs cannot make relevant
recommendations of movies before some information is
available on these movies.
Late-Breaking Work: Engineering of Interactive Systems
#chi4good, CHI 2016, San Jose, CA, USA
1540
Recommendations are typically made using implicit and
explicit preferences of users on movies’ attributes (e.g.,
genre, director, and actors) [6]. However, users’
preferences can also be described by the mise-en-scène
characteristics of movies [7, 8], i.e., the design aspects
of a movie production used to classify aesthetics and style.
Lighting, colors, background, and movements in a movie are
all examples of mise-en-scène features. Although viewers may
not consciously notice movie style, it still influences
their experience of the movie. The mise-en-scène highlights
similarities in the narratives, as movie makers typically
shape the overall movie style to reflect the story, and it
can be used to categorize movies at a finer level than the
traditional movie features [7].
In this research work, we propose the exploitation of
automatically extracted visual features of movies, based on
mise-en-scène design characteristics, in the context of
recommender systems. We propose a novel recommender system
that automatically analyzes video contents and extracts a
set of representative stylistic visual features grounded on
Applied Media Aesthetics [22], i.e., the theory concerned
with the relation between aesthetic media attributes (e.g.,
light, camera movements, and colors) and the perceptual
reactions they are able to evoke in consumers of media
communication, particularly movies.
Our results pose new challenges in the design of user
interfaces able to integrate the stylistic features of
movies into a comprehensive and practical recommender
system, as the perceived quality of a recommender system is
determined by its algorithm as well as by its usability
[3, 4, 5]. This is a novel and multidisciplinary approach,
from both design and engineering perspectives, toward video
recommendation systems. It could influence both the research
area and industry applications, e.g., social video sharing.
More specifically, in this work we have conducted a
preliminary data analysis in order to investigate two
conjectures: (i) whether stylistic visual features extracted
from trailers are a good representation of the corresponding
features extracted from the original full-length movies, and
(ii) whether the stylistic visual features are informative
indicators of the movies. We briefly present the results of
the analysis and discuss the ultimate goals we pursue. An
extended version of this work has been published in [7].
Technical Background
A prerequisite for RSs is the availability of information
about “explicit” content features of the items. In movie
RSs, such features are associated with the items as
structured meta-information (e.g., movie genre, director,
and cast) or unstructured meta-information (e.g., plot,
tags, and textual reviews). In contrast, we propose a
stylistic-based movie recommendation technique that exploits
“implicit” content characteristics of items, i.e., features
that are “encapsulated” in the items and must be
computationally “extracted” from them.
For example, two movies may belong to the same genre and
still differ in style. “The Fifth Element” and “War of the
Worlds” are both sci-fi movies about an alien invasion.
However, they are shot completely differently: Luc Besson
(The Fifth Element) uses bright colors, while Steven
Spielberg (War of the Worlds) prefers dark scenes. Although
a viewer may not consciously notice the two different movie
styles, they still affect the viewer’s
experience of the movie. There are countless ways to cre-
ate a movie based on the same script simply by changing
the mise-en-scène [11].
Furthermore, mise-en-scène characteristics of movies can
bring additional benefits to RSs. For example, they can be
used to tackle the cold-start problem, which occurs when the
system is unable to accurately recommend a new item to the
existing users [10]. This situation typically occurs in
social video-sharing web applications (e.g., YouTube), where
hundreds of millions of hours of video are uploaded by users
every day and may come with no metadata and no user
preferences. Traditional techniques would neglect these new
items even if they may be relevant for recommendation
purposes, as the recommender has no content to analyze other
than the video files. To the best of our knowledge, this
problem has not yet been effectively solved [19].
Figure 1: Above: Out of the Past (1947), an example of
highly contrasted lighting. Below: The Wizard of Oz (1939),
an example of flat lighting.
Figure 2: Above: An image from Django Unchained (2012). The
red hue is used to increase the sense of violence in the
scene. Below: An image from Lincoln (2012). A blue tone is
used to produce the sense of coldness and fatigue
experienced by the characters.
Artistic Motivation
In this section, we describe the artistic background to the
idea of stylistic visual features for movie recommendation.
We do this by describing the stylistic visual features from
an artistic point of view and explaining the relation
between these visual features and the corresponding
aesthetic variables in the movie-making domain.
The study of aesthetic elements, and of how their
combination contributes to establishing the meaning conveyed
by an artistic work, is the subject of different disciplines
such as semiotics and traditional aesthetic studies. The
shared notion is that humans respond to certain stimuli in
ways that are predictable, up to a given extent. One
consequence of this notion is that similar stimuli are
expected to provoke similar reactions, which in turn may
allow similar works of art to be grouped by the reaction
they are expected to provoke.
Among these disciplines, Applied Media Aesthetics [22] in
particular is concerned with the relation between a number
of media elements, such as light, camera movements, and
colors, and the perceptual reactions they are able to evoke
in consumers of media communication, mainly videos and
films. Such media elements, which together build the visual
images composing the media, are investigated following a
rather formalistic approach that suits the purposes of this
paper. By analyzing cameras, lenses, lighting, etc., as
production tools, as well as their aesthetic characteristics
and uses, Applied Media Aesthetics tries to identify
patterns in how such elements operate to produce the desired
effect in communicating emotions and meanings.
The image elements that are usually addressed as fundamental
in the literature, e.g., in [9], even if with slight
differences due to the specific context, are lights and
shadows, colors, space representation, and motion. It has
been shown, e.g., in [1, 17], that some aspects of these
elements can be computed from the video data stream as
statistical values. We call these computable aspects
features.
We now look more closely at the features investigated for
content-based video recommendation in this paper, to provide
an overview of how they are used to produce perceptual
reactions in the audience.
Lighting
There are at least two different approaches to lighting in
movies: chiaroscuro and flat lighting. The first is a
lighting technique characterized by high contrast between
light and shadow areas that puts the emphasis on an
unnatural effect; the latter is instead a neutral, realistic
way of illuminating, whose purpose is to enable recognition
of stage objects. Figure 1 illustrates the difference between
these two alternatives.
Colors
The expressive quality of colors is closely related to that
of lighting, sharing the same ability to set or magnify the
feeling derived from a given situation. Even if an exact
correlation between colors and the feelings they may evoke
is not currently supported by enough scientific data, colors
nonetheless have an expressive impact that has been
investigated thoroughly, e.g., in [20]. An interesting
metric to quantify this impact has been proposed in [21] as
perceived color energy, a quantity that depends on a color’s
saturation, brightness, and the size of the area the color
covers in an image. The hue also plays a role: if it tends
toward reds, the energy is higher, while if it tends toward
blues, it is lower. These tendencies are shown in the
examples of Figure 2.
Motion
The illusion of movement given by screening a sequence of
still frames in rapid succession is the very reason for
cinema’s existence. In a video or movie, there are different
types of motion to consider:
• Profilmic movements: Every movement that concerns
elements shot by the camera falls in this category, e.g.,
the motion of performers or vehicles. The movement can be
real or perceived. By deciding the type and quantity of
motion an ‘actor’ has (considering as actor any possible
protagonist of a scene), the director defines, among other
things, the level of attention to, or expectations from,
the scene. Examples are the hero walking slowly in a dark
alley, or a fast car chase.
• Camera movements: These are the movements that alter the
point of view on the narrated events. Camera movements,
such as the pan, truck, pedestal, or dolly, can be used for
different purposes. Some usages are descriptive (to
introduce landscapes or actors, or to follow performers’
actions), while others concern the narration (to relate two
or more different elements, e.g., anticipating a car’s
route to show an unseen obstacle, or to move toward or away
from events).
• Sequence movements: As shots change, using cuts or other
transitions, the rhythm of the movie changes accordingly.
Generally, a faster rhythm is associated with excitement,
and a slower rhythm suggests a more relaxed pace [2].
In this paper, we followed the approach in [17], considering
the motion content of a scene as a feature that aggregates
and generalizes both profilmic and camera movements.
Research Objectives
There are a number of objectives that we expect to achieve
at the end of this ongoing research work:
• development and evaluation of a novel movie
recommendation system, based on the automatic extraction of
visual stylistic features from the multimedia content; the
extracted features represent mise-en-scène characteristics
of the movies;
• as a broader goal, design and development of a novel
video retrieval platform, including the HCI, that improves
searching and recommendation capabilities based on
aesthetic attributes (i.e., visual features) that are
derived from movie styles as determined by professional
movie makers and that accurately match viewers’
perceptions;
• extraction of audio features that can effectively de-
scribe the movies in the audio feature space and will
be used together with visual features to improve the
representation model.
Visual features
While stylistic visual features have been only marginally
explored in the recommender systems community, they have
been extensively studied in other fields such as Computer
Vision and Video Retrieval [14, 17]. By reviewing the
state-of-the-art works in these disciplines, we have
identified and selected five visual features that have shown
promising results in representing movie content and that
appear to be the most informative and distinctive:
(1) Average Shot Length, (2) Color Variance, (3) Average
Motion, (4) Motion Variation, and (5) Lighting Key.
Average Shot Length: a single camera action is called a
shot, and the total number of shots in a video can be
indicative of the pace of the movie. For example, action
movies typically contain quicker camera changes, and
therefore more and shorter shots, than drama movies. Hence,
the average shot length is expected to be low in action
movies and high in dramas.
Color Variance: the variance of colors in a movie is known
to be highly correlated with its genre. Indeed, directors
tend to use a large variety of bright colors for comedy
movies and darker combinations of colors for horror movies.
For each key frame represented in the LUV color space, we
compute the generalized color variance [17], which is
indicative of the color variation in that key frame.
Average Motion and Motion Variation: motion in a video can
be caused either by camera movements (camera motion) or by
movements of the objects being filmed (object motion). While
the average shot length mainly reflects the former, it is
also desirable to capture the latter accurately. For this
purpose, motion features are extracted. We used optical flow
[13], which provides a robust estimate of the pixel
velocities over a sequence of images.
Lighting Key: lighting is considered a discriminating factor
among movie genres, as it is a key factor in controlling the
type of emotion induced in a movie consumer. For example,
comedy movies often contain an abundance of light with a low
key-to-fill ratio, i.e., a low ratio between the brightest
and dimmest light. This concept is known in cinematography
as high-key lighting. On the other hand, horror or noir
movies exploit low-key lighting, i.e., a low amount of light
and a high key-to-fill ratio.
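As a rough illustration of the shot and motion features described above, the following minimal sketch assumes that shot boundaries and per-frame optical-flow magnitudes have already been computed by some other tool. The function names, the input format, and the choice of the standard deviation as "motion variation" are assumptions made here for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def average_shot_length(cut_frames, total_frames):
    """Average number of frames per shot.

    cut_frames: frame indices at which a new shot starts (frame 0 excluded).
    total_frames: total number of frames in the video.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    num_shots = len(cut_frames) + 1  # n cuts split the video into n + 1 shots
    return total_frames / num_shots


def motion_statistics(flow_magnitudes):
    """Average motion and motion variation of a video.

    flow_magnitudes: one non-negative value per frame pair, e.g. the mean
    optical-flow magnitude over all pixels (assumed to be precomputed).
    Motion variation is taken here as the population standard deviation,
    which is an assumption of this sketch.
    """
    return mean(flow_magnitudes), pstdev(flow_magnitudes)


# Hypothetical toy input: a 240-frame clip with cuts at frames 60, 120, 180.
asl = average_shot_length([60, 120, 180], 240)
avg_motion, motion_var = motion_statistics([1.0, 3.0, 1.0, 3.0])
print(asl, avg_motion, motion_var)
```

With this definition, a clip with many cuts yields a small average shot length, while a clip with few cuts yields a large one.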
We have extracted these visual features automatically from
each video and used them to generate recommendations. We
conducted a preliminary analysis, which is described in the
next section.
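The color and lighting features can be sketched in a similar way. The example below assumes that the pixels of a key frame are already available as LUV triples and as brightness values in [0, 1]. It takes the generalized variance to be the determinant of the color covariance matrix, and it uses the product of the mean and the standard deviation of brightness as a lighting-key proxy. Both choices, as well as all function names, are assumptions of this sketch rather than a statement of the exact procedure used in this work.

```python
from statistics import mean, pstdev


def covariance_matrix(pixels):
    """Population covariance matrix of a list of 3-component pixels."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    return [
        [sum((p[i] - means[i]) * (p[j] - means[j]) for p in pixels) / n
         for j in range(3)]
        for i in range(3)
    ]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def generalized_color_variance(luv_pixels):
    """Determinant of the L, U, V covariance matrix of one key frame."""
    return det3(covariance_matrix(luv_pixels))


def lighting_key(brightness_values):
    """Assumed proxy: mean brightness times its standard deviation."""
    return mean(brightness_values) * pstdev(brightness_values)


# Hypothetical toy key frame with four pixels.
toy_luv = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)]
print(generalized_color_variance(toy_luv))
print(lighting_key([0.2, 0.8]))
```

Under this proxy, a uniformly lit frame (no brightness spread) gets a lighting key of zero regardless of how bright it is, so the measure mainly reacts to frames that are both bright and high in contrast.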
Preliminary Analysis
We have conducted a preliminary experiment using a dataset
of 167 movies sampled randomly from 4 main genres, i.e.,
Action, Comedy, Drama, and Horror. Some of the movies
belonged to mixed genres. The dataset consisted of both
full-length movies and their trailers. Almost 95% of the
movies are recent (year of production between 1990 and
2015). Only 5% of the movies were produced before the 90s.
In this preliminary experiment, we are interested in
investigating (i) whether visual features extracted from
trailers are, in general, a good approximation of the
corresponding features extracted from the original
full-length movies, and (ii) whether the visual features are
informative indicators of the movies.
We have computed the similarity between the visual features
extracted from the full-length movies and the trailers. The
similarity values have been computed using the well-known
cosine similarity metric [15, 16]. The average similarity is
0.78 out of 1 (median is 0.80). More than 75% of the movies
have a similarity greater than 0.7 between the full-length
movie and trailer. Moreover, less than 3% of the movies have
a similarity below 0.5.
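For reference, the cosine similarity between the feature vector of a full-length movie and that of its trailer can be computed as in the minimal sketch below. The five-element vectors are hypothetical values chosen only to show the call. How the features were scaled before comparison is not specified here, so no normalization step is included.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors in the order: shot length, color variance,
# average motion, motion variation, lighting key.
movie = [0.60, 0.30, 0.45, 0.20, 0.70]
trailer = [0.40, 0.35, 0.50, 0.30, 0.65]
print(round(cosine_similarity(movie, trailer), 3))
```

For non-negative features the value lies between 0 and 1, which matches the "out of 1" scale used in the text.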
Overall, the cosine similarity shows a substantial
correlation between the full-length movies and trailers.
This outcome indicates that the trailers can be considered
good representatives of the corresponding full-length
movies.
We have obtained high similarity for all visual features
except feature 2 (color variance) and feature 4 (motion
variation): the average similarity values are 0.71, 0.57,
0.76, 0.56, and 0.92 for the first to fifth visual feature,
respectively. Features 2 and 4 show less similarity between
the full-length movies and trailers, suggesting that their
adoption, if extracted from trailers, may yield less
accurate recommendations.
We have also performed a Wilcoxon significance test
comparing features extracted from the full-length movies and
trailers. The results show no significant difference for the
features average motion and lighting key, which is
consistent with the full-length movies and trailers being
closely matched on these two features. For the other
features, significant differences have been obtained. This
suggests that some of the extracted features may be either
less correlated or not very informative.
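Since each movie contributes one value from the full-length video and one from the trailer, a paired test is a natural reading of the comparison above. The sketch below therefore assumes the Wilcoxon signed-rank variant and uses a plain normal approximation without tie or continuity correction. The text does not state which variant or implementation was used, so this is an illustration of the technique only, with made-up input values.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (w, p), where w is the smaller of the positive and negative
    rank sums. No tie or continuity correction is applied, so p-values
    for small samples are only rough.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w  # z <= 0 because w is the smaller rank sum
    p = 1 + math.erf(z / math.sqrt(2))  # equals 2 * Phi(z)
    return w, min(p, 1.0)


# Hypothetical per-movie values of one feature (movie vs. trailer).
movie_vals = [2, 4, 6, 8, 10, 12, 14, 16]
trailer_vals = [1, 2, 3, 4, 5, 6, 7, 8]
w_stat, p_value = wilcoxon_signed_rank(movie_vals, trailer_vals)
print(w_stat, p_value)
```

A small p-value indicates a systematic shift between the two sets of values, while a large p-value only means that no such shift was detected.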
In order to identify the visual features that are more
useful in terms of recommendation quality, we have computed
entropy as a measure [12] of the informativeness of the
data. Our results show that the entropy scores of almost all
visual stylistic features are large, meaning that the
informative content is rich: the entropy values are 0.83,
0.61, 0.70, 0.76, and 0.93 for the first to fifth visual
feature, respectively. The most informative feature in terms
of entropy score is feature 5, i.e., lighting key, and the
least informative is feature 2, i.e., color variance. This
observation is consistent with the other findings obtained
from, e.g., the Wilcoxon test and the similarity analysis
between features.
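The reported entropy scores lie between 0 and 1, which suggests some form of normalization. One possible way to obtain such a score, sketched below, is to bin the values of a feature into equal-width bins and divide the Shannon entropy of the bin frequencies by its maximum, log(number of bins). The binning scheme, the number of bins, and the function name are assumptions of this sketch, as the exact computation is not described in the text.

```python
import math


def normalized_entropy(values, num_bins=10):
    """Shannon entropy of a histogram of `values`, scaled to [0, 1]."""
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    counts = [0] * num_bins
    for v in values:
        # Map v to an equal-width bin; the maximum goes into the last bin.
        idx = min(int((v - lo) / (hi - lo) * num_bins), num_bins - 1)
        counts[idx] += 1
    n = len(values)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return entropy / math.log(num_bins)


# Hypothetical feature values for a handful of movies.
print(normalized_entropy([0.1, 0.4, 0.5, 0.9, 0.95], num_bins=4))
```

A score near 1 means the feature values are spread evenly over the bins, while a score near 0 means almost all movies share the same value.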
Conclusion
In this paper, we present ongoing work on building a video
recommender system that uses mise-en-scène characteristics
of movies in order to generate recommendations. Our
recommender system will encompass a technique to
automatically analyze video contents and to extract a set of
representative stylistic features, i.e., Average Shot
Length, Color Variance, Average Motion, Motion Variation,
and Lighting Key. We present preliminary results of an
analysis showing that (i) the trailers are well
representative of the full-length movies, and (ii) the
stylistic visual features are informative of the movie
content.
For future work, we plan to design and develop an online
web application with a novel HCI that will provide a per-
sonalized way to search for interesting movies through the
analysis of film styles rather than using the traditional clas-
sifications of movies based on explicit attributes such as
genre and cast. We plan to design and conduct a real user
study in order to evaluate the quality of the recommenda-
tion as well as the usability of the system.
Acknowledgements
This work is supported by Telecom Italia S.p.A., Open Inno-
vation Department, Joint Open Lab S-Cube, Milan.
REFERENCES
1. Warren Buckland. 2008. What Does the Statistical
Style Analysis of Film Involve? A Review of Moving into
Pictures. More on Film History, Style, and Analysis.
Literary and Linguistic Computing 23, 2 (2008),
219–230. DOI:
http://dx.doi.org/10.1093/llc/fqm046
2. Kazimierz Choroś. 2009. Video Shot Selection and
Content-Based Scene Detection for Automatic
Classification of TV Sports News. In Internet –
Technical Development and Applications, Ewaryst
Tkacz and Adrian Kapczynski (Eds.). Advances in
Intelligent and Soft Computing, Vol. 64. Springer Berlin
Heidelberg, 73–80.
3. Dan Cosley, Shyong K Lam, Istvan Albert, Joseph A
Konstan, and John Riedl. 2003. Is seeing believing?:
how recommender system interfaces affect users’
opinions. In Proceedings of the SIGCHI conference on
Human factors in computing systems. ACM, 585–592.
4. Paolo Cremonesi, Franca Garzotto, and Roberto Turrin.
2012a. Investigating the persuasion potential of
recommender systems from a quality perspective: An
empirical study. ACM Transactions on Interactive
Intelligent Systems (TiiS) 2, 2 (2012), 11.
5. Paolo Cremonesi, Franca Garzotto, and Roberto
Turrin. 2012b. User effort vs. accuracy in rating-based
elicitation. In Proceedings of the sixth ACM conference
on Recommender systems. ACM, 27–34.
6. Marco de Gemmis, Pasquale Lops, Cataldo Musto,
Fedelucio Narducci, and Giovanni Semeraro. 2015.
Semantics-Aware Content-Based Recommender
Systems. In Recommender Systems Handbook.
Springer, 119–159.
7. Yashar Deldjoo, Mehdi Elahi, Paolo Cremonesi, Franca
Garzotto, Pietro Piazzolla, and Massimo Quadrana.
2016. Content-based Video Recommendation System
based on Stylistic Visual Features. Journal on Data
Semantics Special Issue on Recommender Systems
(2016).
8. Yashar Deldjoo, Mehdi Elahi, Massimo Quadrana,
Paolo Cremonesi, and Franca Garzotto. 2015. Toward
Effective Movie Recommendations Based on
Mise-en-Scène Film Styles. In Proceedings of the 11th
Biannual Conference on Italian SIGCHI Chapter. ACM,
162–165.
9. Chitra Dorai and Svetha Venkatesh. 2001.
Computational Media Aesthetics: Finding Meaning
Beautiful. IEEE MultiMedia 8, 4 (Oct. 2001), 10–12.
DOI:http://dx.doi.org/10.1109/93.959093
10. Mehdi Elahi, Francesco Ricci, and Neil Rubens. 2013.
Active learning strategies for rating elicitation in
collaborative filtering: a system-wide perspective. ACM
Transactions on Intelligent Systems and Technology
(TIST) 5, 1 (2013), 13.
11. J. Gibbs. 2002. Mise-en-scène: Film Style and
Interpretation. Wallflower.
https://books.google.it/books?id=j4dqY_phZlEC
12. Isabelle Guyon, Nada Matic, Vladimir Vapnik, and
others. 1996. Discovering Informative Patterns and
Data Cleaning. (1996).
13. Berthold K Horn and Brian G Schunck. 1981.
Determining optical flow. In 1981 Technical Symposium
East. International Society for Optics and Photonics,
319–331.
14. Weiming Hu, Nianhua Xie, Li Li, Xianglin Zeng, and
Stephen Maybank. 2011. A survey on visual
content-based video indexing and retrieval. Systems,
Man, and Cybernetics, Part C: Applications and
Reviews, IEEE Transactions on 41, 6 (2011), 797–819.
15. Pasquale Lops, Marco De Gemmis, and Giovanni
Semeraro. 2011. Content-based recommender
systems: State of the art and trends. In Recommender
systems handbook. Springer, 73–105.
16. Michael J. Pazzani and Daniel Billsus. 2007. The
Adaptive Web. Springer-Verlag, Berlin, Heidelberg,
Chapter Content-based Recommendation Systems,
325–341.
http://dl.acm.org/citation.cfm?id=1768197.1768209
17. Zeeshan Rasheed, Yaser Sheikh, and Mubarak Shah.
2005. On the use of computable features for film
classification. Circuits and Systems for Video
Technology, IEEE Transactions on 15, 1 (2005), 52–64.
18. Francesco Ricci, Lior Rokach, and Bracha Shapira.
2011. Introduction to recommender systems handbook.
In Recommender Systems Handbook, Francesco Ricci,
Lior Rokach, Bracha Shapira, and Paul Kantor (Eds.).
Springer Verlag, 1–35.
19. Neil Rubens, Mehdi Elahi, Masashi Sugiyama, and
Dain Kaplan. 2015. Active Learning in Recommender
Systems. In Recommender Systems Handbook -
chapter 24: Recommending Active Learning. Springer
US, 809–846.
20. Patricia Valdez and Albert Mehrabian. 1994. Effects of
color on emotions. Journal of Experimental
Psychology: General 123, 4 (1994), 394.
21. Hee Lin Wang and Loong-Fah Cheong. 2006. Affective
understanding in film. Circuits and Systems for Video
Technology, IEEE Transactions on 16, 6 (June 2006),
689–704. DOI:
http://dx.doi.org/10.1109/TCSVT.2006.873781
22. Herbert Zettl. 2002. Essentials of Applied Media
Aesthetics. In Media Computing, Chitra Dorai and
Svetha Venkatesh (Eds.). The Springer International
Series in Video Computing, Vol. 4. Springer US, 11–38.