2nd International Conference of the European Association for Digital Humanities (EADH 2021)
Krasnoyarsk, Russia
September 21-25, 2021
Exploring Computer Vision for Film Analysis:
A Case Study for Five Canonical Movies
Thomas Schmidt, Alina El-Keilany, Johannes Eger &
Sarah Kurek
Media Informatics Group, University of Regensburg, Germany
{firstname.lastname@ur.de}
Keywords: film studies, film analysis, computer vision, object detection, emotion recognition, gender, age
Abstract.
We present an exploratory study in the context of digital film analysis, inspecting and comparing five
canonical movies by applying methods of computer vision. We extract one frame per second of each
movie, which we regard as our sample. As computer vision methods, we explore image-based object
detection, emotion recognition, and gender and age detection with state-of-the-art models. We were able
to identify significant differences between the movies for all methods. We present our results and discuss
the limitations and benefits of each method. We close by formulating future research questions that we
plan to answer by applying and optimizing the methods.
1. Introduction
Quantitative methods have a long tradition in film analysis going back to the predigital era (Salt
1974; Vonderau 2020). Nowadays, multiple projects explore movies via computational
methods to investigate colors (Burghardt et al. 2016; 2018; Flueckiger 2017; Kurzhals et al.
2016; Masson et al. 2020; Pause / Walkowski 2018), shot lengths (Baxter et al. 2017; DeLong
2015) or annotation possibilities (Halter et al. 2019; Kuhn et al. 2015; Schmidt / Halbhuber
2020; Schmidt et al. 2020a). Recent research has also led to the definition of the term Distant
Viewing (Arnold / Tilton 2019) to describe large-scale digital movie analysis. Much of the
current research focuses on the analysis of text via scripts or subtitles (Byszuk 2020; Holobut
et al. 2016; Hołobut / Rybicki 2020; Hoyt et al. 2014). However, developments in computer
vision have led to novel methods for the image channel of movies and are already applied in
computer science to develop recommender systems (Deldjoo et al. 2016; Wei et al. 2004) but
also in Digital Humanities (DH) to analyze movies (Howanitz et al. 2019; Pustu-Iren et al.
2020; Zaharieva et al. 2012) and other visual media (Schmidt et al. 2020e). We argue that these
methods are beneficial for digital film studies and offer new perspectives.
We present an exploratory study of the following methods: object detection, emotion recognition,
and gender and age prediction. We apply state-of-the-art models to a subset of frames of five
movies from varied decades and genres. We transfer the exploratory research approach that
Wulff (1998) defined for traditional film analysis to computational approaches.
Our goals are (1) to inspect the benefits and problems of the methods, (2) to explore whether the
methods uncover specific characteristics of the movies and (3) to identify which research questions
seem promising to follow in further large-scale studies.
2. Material
We limited the analysis to five movies. Table 1 presents the movies and their metadata. For all
movies except Avengers, we use a digitally restored version. All movies have a 720x576
resolution, 25 frames per second and 32 bits per sample. We focus on canonical works and
Hollywood productions.
Title                  | Release Date | Running time (s) | Important Attributes
Metropolis             | 1927         | 8,636            | Silent film, black and white
Wizard of Oz           | 1939         | 5,856            | Color
Some Like It Hot       | 1959         | 6,990            | Black and white
Breakfast at Tiffany's | 1961         | 6,406            | Color
Marvel's The Avengers  | 2012         | 8,224            | Color, including CGI effects

Table 1. Movies and metadata.
3. Methods
All analysis was performed in Python 3. Since all of the applied methods are image-based, we
extracted frames from every movie. We take one frame per second of a movie and regard this
as the sample of the movie. We chose this approach because processing all frames is very
performance- and resource-intensive, and we argue that one frame per second offers sufficient
information for our first explorations.
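The sampling step can be sketched in a few lines. The function below is our own illustration (the paper does not publish its extraction code); it computes which frame indices to keep so that any decoder, e.g. OpenCV's VideoCapture, can grab exactly those frames.

```python
# One-frame-per-second sampling as described above (our own sketch; the
# function name and rounding behaviour are assumptions, not from the paper).
from typing import List

def one_per_second_indices(total_frames: int, fps: float) -> List[int]:
    """Return the index of one frame per second of footage."""
    step = round(fps)  # e.g. 25 for the movies in this study
    return list(range(0, total_frames, step))

# Metropolis: 8,636 seconds at 25 fps -> 8,636 sample frames
indices = one_per_second_indices(total_frames=8636 * 25, fps=25.0)
print(len(indices))  # 8636
```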
To perform the object detection, we use Detectron2 (Wu et al. 2019), which offers state-of-the-art
object detection models by Facebook AI Research (more information:
https://github.com/facebookresearch/detectron2). We use a pretrained Mask R-CNN model
trained on the well-known COCO dataset (Lin et al. 2015), which can predict 80 object
classes including vehicles, animals, and sports objects. Applying this prediction model to an
image, we receive the number of predicted objects, their locations, and the prediction confidence
(0-100%). As threshold for the detection, we select 50%, which is usually considered very low
but fits our exploratory approach.
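The thresholding step can be illustrated as follows. The `filter_detections` helper is our own sketch, not code from the paper; the commented Detectron2 calls follow the library's model-zoo API as an assumed setup.

```python
# Confidence thresholding as applied above: keep only predictions scoring at
# least 0.5. The helper below is our own illustration, not code from the paper.
from typing import List, Tuple

def filter_detections(labels: List[str], scores: List[float],
                      threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Drop predictions below the confidence threshold."""
    return [(label, score) for label, score in zip(labels, scores)
            if score >= threshold]

# With Detectron2, per-frame predictions would be obtained roughly like this
# (assumed setup following the Detectron2 model-zoo API):
#   from detectron2 import model_zoo
#   from detectron2.config import get_cfg
#   from detectron2.engine import DefaultPredictor
#   cfg = get_cfg()
#   cfg.merge_from_file(model_zoo.get_config_file(
#       "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
#   cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
#       "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
#   cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # the 50% threshold
#   predictor = DefaultPredictor(cfg)

kept = filter_detections(["person", "tie", "dog"], [0.92, 0.55, 0.31])
print(kept)  # [('person', 0.92), ('tie', 0.55)]
```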
Emotion recognition is a sub-field of affective computing (cf. Halbhuber et al. 2019; Hartl et
al. 2019; Ortloff et al. 2019; Schmidt et al. 2020c) and is often applied in DH to predict
sentiment and emotions from written text (Moßburger et al. 2020; Schmidt / Burghardt 2018;
Schmidt 2019; Schmidt et al. 2019a; Schmidt et al. 2020b). We focus on the image channel of
movies and use the Python module FER (more information: https://pypi.org/project/fer/;
Goodfellow et al. 2013) for the emotion prediction. The module first performs face detection
via an MTCNN face detector (more information: https://github.com/ipazc/mtcnn; Zhang et al.
2016) and then predicts the emotion via a convolutional neural network (CNN) trained on over
35,000 images. The model predicts the seven classes anger, disgust, fear, happiness, sadness,
surprise and neutral on a scale from 0 to 1. All values sum up to 1 for one face.
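Working with these per-face scores can be sketched as follows; `dominant_emotion` is our own helper (not part of FER), and the example scores are invented.

```python
# Illustration of handling per-face emotion scores as produced by FER:
# seven classes on a 0-1 scale that sum to 1 per face.
from typing import Dict

def dominant_emotion(scores: Dict[str, float]) -> str:
    """Return the emotion class with the highest score for one face."""
    return max(scores, key=scores.get)

face = {"angry": 0.02, "disgust": 0.0, "fear": 0.05, "happy": 0.71,
        "sad": 0.04, "surprise": 0.08, "neutral": 0.10}
assert abs(sum(face.values()) - 1.0) < 1e-9  # scores sum to 1 for one face
print(dominant_emotion(face))  # happy
```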
We perform gender and age prediction via the module py-agender (more information:
https://github.com/yu4u/age-gender-estimation), which is also a CNN, trained on the
IMDB-WIKI dataset (Rothe et al. 2018) consisting of over 500,000 faces. The model achieves
a mean absolute error of 4.08 on standardized datasets (Agustsson et al. 2017). For the gender
prediction, the model produces a value between 0 and 1, with values below 0.5 indicating male
and values above indicating female faces.
4. Results
4.1 Object detection
We summarize the results of the object detection by looking at the 10 most frequent objects
overall and per movie. Tables 2 and 3 show the objects starting with the most frequent per unit.
Freq is the absolute number of detected instances, while % is the percentage of frames in which
at least one instance of the specific object was detected.
Metropolis               | Wizard of Oz             | Some Like It Hot
person: 26,844 (73.9%)   | person: 17,587 (87.0%)   | person: 23,027 (95.0%)
tie: 1,704 (14.8%)       | dog: 1,049 (17.0%)       | tie: 3,280 (27.6%)
book: 1,616 (2.6%)       | handbag: 842 (13.1%)     | chair: 1,138 (12.1%)
chair: 382 (3.5%)        | chair: 722 (10.9%)       | wine glass: 708 (6.0%)
clock: 366 (3.0%)        | potted plant: 566 (7.2%) | handbag: 417 (5.6%)
umbrella: 265 (1.2%)     | tie: 541 (7.5%)          | cup: 405 (4.9%)
horse: 124 (1.0%)        | vase: 354 (5.1%)         | bottle: 351 (3.7%)
dog: 113 (1.3%)          | horse: 344 (4.8%)        | cell phone: 348 (4.4%)
dining table: 108 (1.2%) | cat: 306 (5.0%)          | suitcase: 330 (3.6%)
cup: 103 (1.0%)          | bottle: 226 (2.9%)       | vase: 286 (3.2%)

Table 2. Detected objects per movie and overall (part 1). Cells show object: freq (%).
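The freq and % measures of Tables 2 and 3 can be derived from per-frame detection lists as follows. This is our own sketch; the paper does not publish its aggregation code, and all names are illustrative.

```python
# Derive {object: (freq, % of frames with at least one instance)} from a list
# of per-frame label lists, as in Tables 2 and 3 (our own illustration).
from collections import Counter
from typing import Dict, List, Tuple

def object_stats(frames: List[List[str]]) -> Dict[str, Tuple[int, float]]:
    """Return per-object instance count and frame coverage in percent."""
    freq = Counter(label for frame in frames for label in frame)
    coverage = Counter(label for frame in frames for label in set(frame))
    n = len(frames)
    return {obj: (freq[obj], 100.0 * coverage[obj] / n) for obj in freq}

frames = [["person", "person", "tie"], ["person"], ["chair"], []]
stats = object_stats(frames)
print(stats["person"])  # (3, 50.0): 3 instances, in 2 of 4 frames
```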
Breakfast at Tiffany's   | Avengers                | Overall
person: 17,357 (96.2%)   | person: 15,556 (77.7%)  | person: 100,371 (84.9%)
tie: 4,057 (44.4%)       | chair: 1,476 (11.9%)    | tie: 10,184 (19.3%)
book: 2,248 (5.4%)       | car: 917 (6.1%)         | chair: 4,985 (10.1%)
chair: 1,217 (13.6%)     | tie: 602 (5.8%)         | book: 4,237 (2.3%)
wine glass: 731 (7.6%)   | tv: 476 (4.6%)          | handbag: 1,876 (4.7%)
car: 720 (4.6%)          | bottle: 286 (1.9%)      | car: 1,809 (2.5%)
bottle: 708 (6.9%)       | airplane: 208 (1.6%)    | wine glass: 1,662 (3.0%)
cup: 536 (6.9%)          | cell phone: 205 (2.3%)  | bottle: 1,653 (3.0%)
dining table: 413 (5.5%) | book: 189 (1.3%)        | dog: 1,460 (3.8%)
handbag: 367 (4.7%)      | backpack: 184 (2.0%)    | cup: 1,320 (3.1%)

Table 3. Detected objects per movie and overall (part 2). Cells show object: freq (%).
Persons are the most frequently detected "objects" (figure 1). Other frequent objects are mostly
furniture and household items (book, chair), clothing accessories (tie, handbag) and drinking
vessels (cup, wine glass).
Figure 1. Frame with the most detected persons (Metropolis).
Comparing the movies, we identified that the movies with persons in fewer than 90% of frames
are indeed the more action-oriented movies (Avengers, Metropolis) or include fantasy/animal-like
characters (Wizard of Oz). Many modern objects (e.g. cell phones and airplanes) are more
frequent in the contemporary movie Avengers (figure 2). One outlier we identified is the clock
object in Metropolis, which is not a frequent object in the other movies but represents a well-
studied recurring motif of this specific movie (figure 3; cf. Cowan 2007).
Figure 2. Detected airplanes in Avengers.
Figure 3. Clocks as a reoccurring motif in Metropolis.
While we did not perform a systematic evaluation, we identified many mistakes in the
prediction: e.g., guns were predicted as handbags, and the character "Cowardly Lion" in Wizard
of Oz was oftentimes predicted as a dog (figure 4).
Figure 4. The "Cowardly Lion" in Wizard of Oz detected as "dog".
Nevertheless, we see potential in object detection for exploring specifics of the mise-en-scène
as well as motif-like recurring objects in movies (Zaharieva / Breiteneder 2012). Furthermore,
as the object classes of the COCO dataset do not necessarily fit movies, we recommend
exploring the possibilities of fine-tuning via Detectron2 to analyze objects that are not part of
the pretrained models.
4.2 Emotion recognition
For the emotion recognition, we compute an average over all detected faces in a frame if
multiple faces are detected. If no face is detected, we mark the frame with missing values.
Table 4 summarizes the results.
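The frame-level aggregation described above can be sketched as follows. This is our own illustration; the function and variable names are not from the paper.

```python
# Average each emotion class over all faces in a frame; frames without faces
# get missing values (None), as described above (our own sketch).
from typing import Dict, List, Optional

EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def frame_emotions(faces: List[Dict[str, float]]) -> Optional[Dict[str, float]]:
    """Average per-face emotion scores, or None if no face was detected."""
    if not faces:
        return None  # mark frame with missing values
    return {e: sum(face[e] for face in faces) / len(faces) for e in EMOTIONS}

faces = [{"angry": 0.2, "disgust": 0.0, "fear": 0.1, "happy": 0.4,
          "neutral": 0.1, "sad": 0.1, "surprise": 0.1},
         {"angry": 0.0, "disgust": 0.0, "fear": 0.1, "happy": 0.6,
          "neutral": 0.2, "sad": 0.1, "surprise": 0.0}]
print(frame_emotions(faces)["happy"])  # 0.5
print(frame_emotions([]))  # None
```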
Emotion  | Stat | Metropolis | Wizard of Oz | Some Like it Hot | Breakfast at Tiffany's | Avengers | Overall
Angry    | M    | 0.22       | 0.23         | 0.17             | 0.13                   | 0.21     | 0.19
Angry    | Max  | 0.98       | 0.97         | 0.95             | 0.85                   | 0.93     | 0.98
Angry    | Sd   | 0.17       | 0.17         | 0.13             | 0.12                   | 0.15     | 0.16
Disgust  | M    | 0.00       | 0.00         | 0.00             | 0.00                   | 0.00     | 0.00
Disgust  | Max  | 0.12       | 0.18         | 0.18             | 0.19                   | 0.22     | 0.22
Disgust  | Sd   | 0.01       | 0.01         | 0.01             | 0.01                   | 0.01     | 0.01
Fear     | M    | 0.16       | 0.11         | 0.11             | 0.08                   | 0.12     | 0.11
Fear     | Max  | 0.88       | 0.94         | 0.73             | 0.80                   | 0.65     | 0.94
Fear     | Sd   | 0.13       | 0.09         | 0.09             | 0.08                   | 0.09     | 0.10
Happy    | M    | 0.10       | 0.13         | 0.13             | 0.09                   | 0.07     | 0.11
Happy    | Max  | 1.00       | 1.00         | 1.00             | 1.00                   | 1.00     | 1.00
Happy    | Sd   | 0.16       | 0.17         | 0.18             | 0.18                   | 0.14     | 0.17
Neutral  | M    | 0.22       | 0.16         | 0.23             | 0.37                   | 0.23     | 0.24
Neutral  | Max  | 0.96       | 0.96         | 0.90             | 0.99                   | 0.93     | 0.99
Neutral  | Sd   | 0.19       | 0.15         | 0.18             | 0.25                   | 0.20     | 0.21
Sad      | M    | 0.29       | 0.32         | 0.27             | 0.28                   | 0.28     | 0.29
Sad      | Max  | 0.93       | 0.95         | 0.88             | 0.91                   | 0.90     | 0.95
Sad      | Sd   | 0.18       | 0.19         | 0.17             | 0.18                   | 0.17     | 0.18
Surprise | M    | 0.04       | 0.04         | 0.09             | 0.04                   | 0.08     | 0.06
Surprise | Max  | 0.87       | 0.93         | 0.95             | 0.83                   | 0.95     | 0.95
Surprise | Sd   | 0.10       | 0.09         | 0.14             | 0.08                   | 0.13     | 0.11

Table 4. Emotion values per movie and overall (M=mean, Max=maximum, Sd=standard
deviation).
Overall, the highest averages are reached for the sad (M=0.29) and the neutral class (M=0.24).
Surprise (M=0.06) and disgust (M=0.00) are rather rare among the movies. The two comedies
in the movie corpus (Wizard of Oz, Some Like it Hot) do indeed have the highest happy
averages (both M=0.13) (figure 5).
Figure 5. Frame with maximum happy value (Some Like it Hot).
However, the results are rather inconsistent since Wizard of Oz also has the highest sad and
angry averages and is therefore the movie with the strongest emotional expressions overall.
Breakfast at Tiffany's, in contrast, is the most neutral movie (M=0.37; figure 6).
Figure 6. Frame with highest neutrality value in the corpus (Breakfast at Tiffany’s).
Additionally, we performed a Welch-ANOVA to investigate whether the movies differ from
each other significantly (all requirements for the test are met according to Field (2009)). Indeed,
we find significant differences (p<0.05) for all emotion categories but rather small effects
according to Cohen (1988), who defines η² around 0.01 as a weak, around 0.06 as a moderate
and around 0.14 as a strong effect. We report the p-, F- and η²-values (table 5).
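For illustration, classical eta squared is the ratio SS_between / SS_total. The sketch below with invented toy data is our own; the η² values in table 5 come from the paper's Welch-ANOVA analysis.

```python
# Classical eta squared: share of total variance explained by group membership
# (our own illustration with toy data, not the paper's computation).
from typing import List

def eta_squared(groups: List[List[float]]) -> float:
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total

# Toy example: three "movies" with clearly different neutral-score levels.
groups = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
print(round(eta_squared(groups), 3))  # 0.9
```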
Emotion  | p-value | F-value | η²
angry    | <0.001  | 302.11  | 0.06
disgust  | <0.001  | 46.38   | 0.02
fear     | <0.001  | 124.75  | 0.03
happy    | <0.001  | 75.00   | 0.02
neutral  | <0.001  | 431.98  | 0.12
sad      | <0.001  | 32.21   | 0.01
surprise | <0.001  | 112.78  | 0.03

Table 5. Results of Welch-ANOVA tests for all emotion categories.
The strongest effect can be seen for neutral. Performing post-hoc tests and inspecting a box-
plot graph (figure 7), we identified Breakfast at Tiffany's as an interesting outlier. This might be
due to the fact that the main characters of the movie try to stay rather "unaffected" up until the
ending, while Wizard of Oz, as a musical, consists of strong emotional outbursts.
Figure 7. Box-plots graph for the emotion class neutral.
4.3 Gender and age recognition
Table 6 shows the descriptive statistics for the gender and age detection.
Attribute | Stat | Metropolis | Wizard of Oz | Some Like it Hot | Breakfast at Tiffany's | Avengers | Overall
Age       | M    | 37.28      | 35.73        | 40.55            | 40.26                  | 39.31    | 38.67
Age       | Min  | 23.04      | 13.65        | 27.35            | 26.16                  | 22.60    | 13.65
Age       | Max  | 58.64      | 71.28        | 65.90            | 66.27                  | 64.82    | 71.28
Age       | Sd   | 3.81       | 7.67         | 5.77             | 5.88                   | 4.96     | 6.01
Gender    | M    | 0.35       | 0.49         | 0.41             | 0.38                   | 0.33     | 0.39
Gender    | Min  | 0.01       | 0.02         | 0.02             | 0.01                   | 0.01     | 0.01
Gender    | Max  | 0.91       | 0.96         | 0.94             | 0.97                   | 0.98     | 0.98
Gender    | Sd   | 0.17       | 0.20         | 0.22             | 0.26                   | 0.25     | 0.23

Table 6. Descriptive statistics for age and average gender.
The average age for most movies is around 40, which is a rather consistent over-estimation
since most leading actors in the selected movies are around 30. A Welch-ANOVA shows that
the difference between the movies is significant (p<0.001, F=336.07, η²=0.09) with a moderate
effect. The strongest outlier movie, as shown by post-hoc tests, is Wizard of Oz, with a
child/teenager as leading actor who is correctly detected as around 14-16 years old (figure 8).
Figure 8. Lowest age in the corpus (Wizard of Oz).
An average gender score below 0.5 points to more male detections, and it is striking that all
movies score below 0.5, indicating a more frequent representation of males, which is in line
with the reality of the movies. There is a significant difference considering gender but with a
smaller effect compared to age (p<0.001, F=251.36, η²=0.06) and with the strongest differences
concerning Wizard of Oz. The differences become apparent regarding the distribution of
gender classes (table 7). Following the model's convention that lower scores indicate male
faces, we assigned a frame as male if its average gender score was below 0.4 and as female if it
was above 0.6. We decided to include a class androgynous for in-between values, pointing to
either multiple genders on screen or uncertainty of the model.
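This frame-level classification can be sketched with the 0.4/0.6 cut-offs described above; the helper itself is our own illustration, not code from the paper, and follows the model's convention that lower scores indicate male faces (section 3).

```python
# Map an average per-frame gender score to a class label using the 0.4/0.6
# cut-offs from above (our own sketch; low scores = male per section 3).
from typing import Optional

def classify_frame_gender(avg_score: Optional[float]) -> Optional[str]:
    """Return male / female / androgynous, or None for frames without faces."""
    if avg_score is None:
        return None  # no face detected in the frame
    if avg_score < 0.4:
        return "male"
    if avg_score > 0.6:
        return "female"
    return "androgynous"  # mixed genders on screen or model uncertainty

print(classify_frame_gender(0.35))  # male
print(classify_frame_gender(0.49))  # androgynous
print(classify_frame_gender(0.97))  # female
```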
Class       | Measure             | Metropolis | Wizard of Oz | Some Like it Hot | Breakfast at Tiffany's | Avengers
androgynous | # frames            | 689        | 1,024        | 838              | 638                    | 395
androgynous | % frames            | 7.98       | 17.49        | 11.99            | 9.96                   | 4.80
androgynous | % frames with faces | 22.37      | 40.54        | 28.43            | 21.35                  | 18.30
female      | # frames            | 290        | 694          | 623              | 669                    | 314
female      | % frames            | 3.36       | 11.85        | 8.91             | 10.44                  | 3.82
female      | % frames with faces | 9.42       | 27.47        | 21.14            | 22.39                  | 14.60
male        | # frames            | 2,101      | 808          | 1,486            | 1,681                  | 1,441
male        | % frames            | 24.33      | 13.80        | 21.26            | 26.24                  | 17.52
male        | % frames with faces | 68.21      | 31.99        | 21.26            | 56.26                  | 67.02

Table 7. Frequency distributions of gender classes.
Wizard of Oz has the most frames classified as androgynous. In general, this class means that
female and male characters appear equally in the frame, but in this case the classification is due
to the high number of human-like fantasy creatures for which the model is unsure which gender
to pick (figure 9).
Figure 9. An “androgynous” face (Wizard of Oz).
5. Discussion
While this study was rather small and exploratory in its approach, we did gain important first insights
for our future research. Overall, we find it promising that we were able to find significant results, even
for this small set of movies. For object detection, we see the most potential in adjusting pretrained models
to objects that are of interest for a specific research question. We see a lot of potential for interesting
diachronic but also genre-based emotion and gender analysis with larger corpora. For this case study,
we did not find striking differences in method performance considering the technical differences between
the movies. We are planning systematic evaluations on a cross-section of movies of different decades to
get a better understanding of the performance of the methods before we move on to explore more
concrete research questions. Modern cultural artefacts have been shown to be of interest for gender
studies in the DH context (Schmidt et al. 2020d). We see potential concerning research at the intersection
of gender and film studies. We plan to explore the relationship of gender representations with expressed
emotions over time to investigate how the representation of gender roles has developed. Furthermore, we
want to explore multimodal approaches combining the various modality channels of movies (similar to
Schmidt et al. 2019b).
References
Agustsson, E., Timofte, R., Escalera, S., Baro, X., Guyon, I., & Rothe, R. (2017, May). Apparent and real age
estimation in still images with deep residual regressors on appa-real database. In 2017 12th IEEE International
Conference on Automatic Face & Gesture Recognition (FG 2017) (pp. 87-94). IEEE.
https://doi.org/10.1109/FG.2017.20
Arnold, T., & Tilton, L. (2019). Distant viewing: Analyzing large visual corpora. Digital Scholarship in the
Humanities. https://doi.org/10.1093/digitalsh/fqz013
Baxter, M., Khitrova, D., & Tsivian, Y. (2017). Exploring cutting structure in film, with applications to the films of D.
W. Griffith, Mack Sennett, and Charlie Chaplin. Digital Scholarship in the Humanities, 32(1), 116.
https://doi.org/10.1093/llc/fqv035
Burghardt, M., Kao, M., & Walkowski, N. O. (2018). Scalable MovieBarcodes – An Exploratory Interface for the
Analysis of Movies. In IEEE VIS Workshop on Visualization for the Digital Humanities (Vol. 2).
Burghardt, M., Kao, M., & Wolff, C. (2016). Beyond Shot Lengths – Using Language Data and Color Information as
Additional Parameters for Quantitative Movie Analysis. In Digital Humanities 2016: Conference Abstracts.
Jagiellonian University & Pedagogical University, Kraków, pp. 753-755.
Byszuk, J. (2020). The Voices of Doctor Who – How Stylometry Can be Useful in Revealing New Information About
TV Series. Digital Humanities Quarterly, 014(4).
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Academic press.
Cowan, M. (2007). The Heart Machine: "Rhythm" and Body in Weimar Film and Fritz Lang's Metropolis.
Modernism/Modernity, 14(2), 225-248. https://doi.org/10.1353/mod.2007.0030
Deldjoo, Y., Elahi, M., Cremonesi, P., Garzotto, F., & Piazzolla, P. (2016, May). Recommending movies based on
mise-en-scene design. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in
Computing Systems (pp. 1540-1547). https://doi.org/10.1145/2851581.2892551
DeLong, J. (2015). Horseshoes, handgrenades, and model fitting: The lognormal distribution is a pretty good model for
shot-length distribution of Hollywood films. Literary and Linguistic Computing, 30(1), 129-136.
https://doi.org/10.1093/llc/fqt030
Field, A. P. (2009). Discovering statistics using SPSS: And sex, drugs and rock 'n' roll (3rd ed.). SAGE Publications.
Flueckiger, B. (2017). A Digital Humanities Approach to Film Colors. The Moving Image: The Journal of the
Association of Moving Image Archivists, 17(2), 71-94. JSTOR. https://doi.org/10.5749/movingimage.17.2.0071
Goodfellow, I. J., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang, Y., Thaler, D.,
Lee, D.-H., Zhou, Y., Ramaiah, C., Feng, F., Li, R., Wang, X., Athanasakis, D., Shawe-Taylor, J., Milakov, M., Park,
J., … Bengio, Y. (2013). Challenges in Representation Learning: A report on three machine learning contests.
arXiv:1307.0414 [cs, stat]. http://arxiv.org/abs/1307.0414
Halbhuber, D., Fehle, J., Kalus, A., Seitz, K., Kocur, M., Schmidt, T., & Wolff, C. (2019). The Mood Game-How to
Use the Player's Affective State in a Shoot'em up Avoiding Frustration and Boredom. In Proceedings of Mensch und
Computer 2019 (pp. 867-870). https://doi.org/10.1145/3340764.3345369
Halter, G., Ballester-Ripoll, R., Flueckiger, B., & Pajarola, R. (2019). VIAN: A Visual Annotation Tool for Film
Analysis. Computer Graphics Forum, 38(3), 119-129. https://doi.org/10.1111/cgf.13676
Hartl, P., Fischer, T., Hilzenthaler, A., Kocur, M., & Schmidt, T. (2019). AudienceAR-Utilising Augmented Reality
and Emotion Tracking to Address Fear of Speech. In Proceedings of Mensch und Computer 2019 (pp. 913-916).
https://doi.org/10.1145/3340764.3345380
Holobut, A., Rybicki, J., & Wozniak, M. (2016). Stylometry on the Silver Screen: Authorial and Translatorial Signals
in Film Dialogue. In DH (pp. 561-565).
Hołobut, A., & Rybicki, J. (2020). The Stylometry of Film Dialogue: Pros and Pitfalls. Digital Humanities Quarterly,
014(4).
Howanitz, G., Bermeitinger, B., Radisch, E., Sebastian G., Rehbein, M. & Handschuh, S. (2019). Deep Watching -
Towards New Methods of Analyzing Visual Media in Cultural Studies. In Book of Abstracts of the International
Digital Humanities Conference (DH 2019).
Hoyt, E., Ponto, K., & Roy, C. (2014). Visualizing and Analyzing the Hollywood Screenplay with ScripThreads.
Digital Humanities Quarterly, 008(4).
Kuhn, V., Craig, A., Simeone, M., Satheesan, S. P., & Marini, L. (2015, July). The VAT: enhanced video analysis. In
Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure (pp.
1-4). https://doi.org/10.1145/2792745.2792756
Kurzhals, K., John, M., Heimerl, F., Kuznecov, P., & Weiskopf, D. (2016). Visual Movie Analytics. IEEE
Transactions on Multimedia, 18(11), 2149-2160. https://doi.org/10.1109/TMM.2016.2614184
Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., &
Dollár, P. (2015). Microsoft COCO: Common Objects in Context. arXiv:1405.0312 [cs].
http://arxiv.org/abs/1405.0312
Masson, E., Olesen, C. G., Noord, N. van, & Fossati, G. (2020). Exploring Digitised Moving Image Collections: The
SEMIA Project, Visual Analysis and the Turn to Abstraction. Digital Humanities Quarterly, 014(4).
Moßburger, L., Wende, F., Brinkmann, K., & Schmidt, T. (2020, December). Exploring Online Depression Forums via
Text Mining: A Comparison of Reddit and a Curated Online Forum. In Proceedings of the Fifth Social Media Mining
for Health Applications Workshop & Shared Task (pp. 70-81).
Ortloff, A. M., Güntner, L., Windl, M., Schmidt, T., Kocur, M., & Wolff, C. (2019). Sentibooks: Enhancing
audiobooks via affective computing and smart light bulbs. In Proceedings of Mensch und Computer 2019 (pp. 863-
866). https://doi.org/10.1145/3340764.3345368
Pause, J., & Walkowski, N. O. (2018). Everything is illuminated. Zur numerischen Analyse von Farbigkeit in Filmen.
Zeitschrift für digitale Geisteswissenschaften.
Pustu-Iren, K., Sittel, J., Mauer, R., Bulgakowa, O., & Ewerth, R. (2020). Automated Visual Content Analysis for Film
Studies: Current Status and Challenges. Digital Humanities Quarterly, 014(4).
Rothe, R., Timofte, R., & Van Gool, L. (2018). Deep Expectation of Real and Apparent Age from a Single Image
Without Facial Landmarks. International Journal of Computer Vision, 126(2), 144-157.
https://doi.org/10.1007/s11263-016-0940-3
Salt, B. (1974). Statistical style analysis of motion pictures. Film Quarterly, 28(1), 13-22.
Schmidt, T. (2019). Distant Reading Sentiments and Emotions in Historic German Plays. In Abstract Booklet,
DH_Budapest_2019 (pp. 57-60). Budapest, Hungary. https://doi.org/10.5283/epub.43592
Schmidt, T. & Burghardt, M. (2018). An Evaluation of Lexicon-based Sentiment Analysis Techniques for the Plays
of Gotthold Ephraim Lessing. In Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics
for Cultural Heritage, Social Sciences, Humanities and Literature (pp. 139-149). Santa Fe, New Mexico: Association
for Computational Linguistics.
Schmidt, T. & Halbhuber, D. (2020). Live Sentiment Annotation of Movies via Arduino and a Slider. In Digital
Humanities in the Nordic Countries 5th Conference (DHN 2020). Late Breaking Poster.
Schmidt, T., Burghardt, M., Dennerlein, K. & Wolff, C. (2019a). Sentiment Annotation in Lessing’s Plays: Towards a
Language Resource for Sentiment Analysis on German Literary Texts. In 2nd Conference on Language, Data and
Knowledge (LDK 2019). LDK Posters. Leipzig, Germany.
Schmidt, T., Burghardt, M. & Wolff, C. (2019b). Toward Multimodal Sentiment Analysis of Historic Plays: A Case
Study with Text and Audio for Lessing’s Emilia Galotti. In Proceedings of the Digital Humanities in the Nordic
Countries 4th Conference (DHN 2019) (pp. 405-414). Copenhagen, Denmark.
Schmidt, T., Engl, I., Halbhuber, D., & Wolff, C. (2020a). Comparing Live Sentiment Annotation of Movies via
Arduino and a Slider with Textual Annotation of Subtitles. In Post-Proceedings of the 5th Conference Digital
Humanities in the Nordic Countries (DHN 2020) (pp. 212-223). CEUR Workshop Proceedings.
Schmidt, T., Kaindl, F. & Wolff, C. (2020b). Distant Reading of Religious Online Communities: A Case Study for
Three Religious Forums on Reddit. In Proceedings of the Digital Humanities in the Nordic Countries 5th Conference
(DHN 2020) (pp. 157-172). Riga, Latvia.
Schmidt, T., Schlindwein, M., Lichtner, K., & Wolff, C. (2020c). Investigating the Relationship Between Emotion
Recognition Software and Usability Metrics. i-com, 19(2), 139-151. https://doi.org/10.1515/icom-2020-0009
Schmidt, T., Engl, I., Herzog. J. & Judisch, L. (2020d). Towards an Analysis of Gender in Video Game Culture:
Exploring Gender-specific Vocabulary in Video Game Magazines. In Proceedings of the Digital Humanities in the
Nordic Countries 5th Conference (DHN 2020) (pp. 333-341). Riga, Latvia.
Schmidt, T., Mosiienko, A., Faber, R., Herzog, J., & Wolff, C. (2020e). Utilizing HTML‐analysis and computer vision
on a corpus of website screenshots to investigate design developments on the web. In Proceedings of the Association
for Information Science and Technology, 57(1), e392. https://doi.org/10.1002/pra2.392
Vonderau, P. (2020). Quantitative Werkzeuge. In M. Hagener & V. Pantenburg (Eds.), Handbuch Filmanalyse (pp.
399-413). Springer Fachmedien. https://doi.org/10.1007/978-3-658-13339-9_28
Wei, C. Y., Dimitrova, N., & Chang, S. F. (2004). Color-mood analysis of films based on syntactic and psychological
models. In 2004 IEEE international conference on multimedia and expo (ICME)(IEEE Cat. No. 04TH8763) (Vol. 2,
pp. 831-834). IEEE.
Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y., & Girshick, R. (2019). Detectron2.
https://github.com/facebookresearch/detectron2
Wulff, H. J. (1998). Semiotik der Filmanalyse: Ein Beitrag zur Methodologie und Kritik filmischer Werkanalyse.
Kodikas/Code, 21(1-2), 19-36.
Zaharieva, M., & Breiteneder, C. (2012). Recurring Element Detection in Movies. In K. Schoeffmann, B. Merialdo, A.
G. Hauptmann, C.-W. Ngo, Y. Andreopoulos, & C. Breiteneder (Eds.), Advances in Multimedia Modeling (pp. 222-
232). Springer. https://doi.org/10.1007/978-3-642-27355-1_22
Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint Face Detection and Alignment Using Multitask Cascaded
Convolutional Networks. IEEE Signal Processing Letters, 23(10), 1499-1503.
https://doi.org/10.1109/LSP.2016.2603342
Conference Paper
Full-text available
We present first results of an ongoing research project on sentiment annotation of historical plays by German playwright G. E. Lessing (1729-1781). For a subset of speeches from six of his most famous plays, we gathered sentiment annotations by two independent annotators for each play. The annotators were nine students from a Master's program of German Literature. Overall, we gathered annotations for 1,183 speeches. We report sentiment distributions and agreement metrics and put the results in the context of current research. A preliminary version of the annotated corpus of speeches is publicly available online and can be used for further investigations, evaluations and computational sentiment analysis approaches.
Conference Paper
Full-text available
We present a study employing various techniques of text mining to explore and compare two different online forums focusing on depression: (1) the subreddit r/depression (over 60 million tokens), a large, open social media platform and (2) Beyond Blue (almost 5 million tokens), a professionally curated and moderated depression forum from Australia. We are interested in how the language and the content on these platforms differ from each other. We scrape both forums for a specific period. Next to general methods of computational text analysis, we focus on sentiment analysis, topic modeling and the distribution of word categories to analyze these forums. Our results indicate that Beyond Blue is generally more positive and that the users are more supportive to each other. Topic modeling shows that Beyond Blue's users talk more about adult topics like finance and work while topics shaped by school or college terms are more prevalent on r/depression. Based on our findings we hypothesize that the professional curation and moderation of a depression forum is beneficial for the discussion in it.
Conference Paper
Full-text available
Sentiment and emotions are important parts of the analysis and interpretation of literary texts, especially of plays. Therefore, the computational method to analyze sentiments and emotions in written text, sentiment analysis, has found its way into computational literary studies. However, recent research in computational literary studies is focused on annotation and the evaluation of different approaches. We present a tool to investigate the possibilities of Distant Reading the sentiments and emotions expressed in the plays of Lessing. Researchers can explore polarity and emotion distributions and progression on concerning structural and character based levels but also character relations. We present various use cases to highlight the visualizations and functionalities of our tool and discuss how Distant Reading of sentiments can add value to research in literary studies.
Article
Full-text available
We present preliminary results of a project investigating the design development of popular websites between 1996 and 2020 via HTML analysis and basic computer vision methods. We acquired a corpus of website screenshots of the current top 47 popular websites. We crawled a snapshot of every month of these websites via the wayback machine of the Internet Archive platform since the time snapshots are stored to gather 7,953 screenshots and HTML pages. We report upon quantitative analysis results concerning HTML elements, color distributions and visual complexity throughout the years.
Article
Full-text available
Due to progress in affective computing, various forms of general purpose sentiment/emotion recognition software have become available. However, the application of such tools in usability engineering (UE) for measuring the emotional state of participants is rarely employed. We investigate if the application of sentiment/emotion recognition software is beneficial for gathering objective and intuitive data that can predict usability similar to traditional usability metrics. We present the results of a UE project examining this question for the three modalities text, speech and face. We perform a large scale usability test (N = 125) with a counterbalanced within-subject design with two websites of varying usability. We have identified a weak but significant correlation between text-based sentiment analysis on the text acquired via thinking aloud and SUS scores as well as a weak positive correlation between the proportion of neutrality in users’ voice and SUS scores. However, for the majority of the output of emotion recognition software, we could not find any significant results. Emotion metrics could not be used to successfully differentiate between two websites of varying usability. Regression models, either unimodal or multimodal could not predict usability metrics. We discuss reasons for these results and how to continue research with more sophisticated methods.
Conference Paper
Full-text available
With Augmented Reality (AR) we can enhance the reality by computer-generated information about real entities projected in the user's field of view. Hence, the user's perception of a real environment is altered by adding (or subtracting) information by means of digital augmentations. In this demo paper we present an application where we utilise AR technology to show visual information about the audience's mood in a scenario where the user is giving a presentation. In everyday life we have to talk to and in front of people as a fundamental aspect of human communication. However, this situation poses a major challenge for many people and may even go so far as to lead to fear and and avoidance behaviour. Based on findings in previous work about fear of speech, a major cause of anxiety is that we do not know how the audience judges us. To eliminate this feeling of uncertainty, we created an AR solution to support the speaker while giving a speech by tracking the audience's current mood and displaying this information in real time to the speaker's view: AudienceAR. By doing so we hypothesise to reduce the speaker's tension before and during presentation. Furthermore, we implemented a small web interface to analyse the presentation based on the audience mood after the speech is given. Effects will be tested in future work.
Chapter
Der Text widmet sich der Funktion digitaler Methoden in der filmwissenschaftlichen Analyse von Filmen. Was wird unter „digital tools“ verstanden, seit wann gibt es und wie verbreitet sind sie? Mithilfe eines historischen Überblicks zeigt das Kapitel zunächst, aus welchen Gründen Ansätze der computergestützten Filmanalyse im Zeitraum 1985–2005 in der Filmwissenschaft entwickelt worden sind. In einem zweiten Schritt wird vor dem Hintergrund dieser Historisierung die Gemengelage der gegenwärtig unter dem Schlagwort Big Data versammelten Phänomene kritisch in den Blick genommen. Dabei wird deutlich, dass Verfahren zur Quantifizierung der Filmanalyse heute paradoxerweise eine geringere Rolle spielen, als sie dies noch vor zwanzig Jahren taten.