All content in this area was uploaded by Víctor Yeste on Sep 21, 2023
ANOVA of the rating average of movie premieres in the USA using Twitter hashtag data
Víctor Yeste
Universitat Politècnica de València
Ángeles Calduch Losa
Universitat Politècnica de València
Index
Introduction
Objectives
Methodology
Results
Discussion & Conclusions
Introduction – ANOVA
ANalysis Of VAriance (ANOVA)
Analytical model that studies the statistical significance of differences, assessing the reliability
of the association between an independent variable and the dependent variables (Tabachnick,
2007).
Main advantage
It accounts for all sources of variation and their mutual influence, which would otherwise be
impossible to estimate (Melović et al., 2020).
Introduction – Movies
Examples of the use of ANOVA with movies
To study the influence of the MPAA’s film rating system on the viewers' attendance (Austin,
1980).
To find out associations between movie genre and movie ratings (Mehnaz Khan et al., 2020).
Other studies that use movie hashtags
To study trend analysis (Hossain & Shams, 2019).
To predict movie ratings and box office hits (Sanguinet, 2016).
Objectives
An exploratory, quantitative, non-experimental study that uses inductive inference and a
longitudinal follow-up.
Scope: movies that premiered in February 2022 in the USA.
Object of the study: the movies and the tweets published about them the week before, the
same week and the week immediately after each premiere date.
Main objective
To explore the use of ANOVA to look for relationships between average rating and usage,
conversation, engagement and amplification rates of official movie hashtags from movie
premieres.
Methodology – Design
2 Tables
Movies
id
name
hashtag
mpaa
release_date
opening_grosses
opening_theaters
rating_avg
Tweets
id
status_id
movie_id
timepoint
author_id
created_at
quote_count
reply_count
retweet_count
like_count
sentiment
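The two tables above could be created in SQLite along these lines. This is only a sketch: the slides give the column names but not their types, constraints or encodings, so every type, the foreign key, the `timepoint` coding and the sentiment range comment are assumptions, and `create_database` is a hypothetical helper name.

```python
import sqlite3

# Column names come from the slide; all types and constraints are assumed.
SCHEMA = """
CREATE TABLE movies (
    id               INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    hashtag          TEXT NOT NULL,
    mpaa             TEXT,
    release_date     TEXT,     -- assumed to be stored as an ISO date string
    opening_grosses  REAL,
    opening_theaters INTEGER,
    rating_avg       REAL
);
CREATE TABLE tweets (
    id            INTEGER PRIMARY KEY,
    status_id     TEXT NOT NULL,
    movie_id      INTEGER NOT NULL REFERENCES movies(id),
    timepoint     INTEGER,     -- assumed coding: 1 = week before, 2 = same week, 3 = week after
    author_id     TEXT,
    created_at    TEXT,
    quote_count   INTEGER,
    reply_count   INTEGER,
    retweet_count INTEGER,
    like_count    INTEGER,
    sentiment     REAL         -- assumed to lie between -1 and 1
);
"""


def create_database(path=":memory:"):
    """Create an SQLite database with the two tables and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```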
Methodology – Data Collection
Movies data
From Box Office Mojo, a website that is part of IMDbPro and one of the best-known cinema
websites.
Twitter data
Collected on the 21st of March 2022 using Twitter’s API with Academic Research access.
Search queries consisted of a “#” character followed by each movie’s official hashtag.
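As an illustration of the query construction and of the three observation weeks, the sketch below builds a hashtag query and the date windows around a premiere. The slides do not say how the week boundaries were defined, so the half-open 7-day windows are an assumption. The function names, the hashtag `ExampleMovie` and the release date are hypothetical, and no Twitter API call is made.

```python
from datetime import date, timedelta


def build_query(hashtag):
    """Return the search query: a '#' character followed by the hashtag."""
    return "#" + hashtag.lstrip("#")


def week_windows(release_date):
    """Return assumed [start, end) windows for the three weeks.

    t1 = week before, t2 = week starting on the premiere date,
    t3 = week immediately after. The 7-day definition is an assumption.
    """
    week = timedelta(days=7)
    return {
        "t1": (release_date - week, release_date),
        "t2": (release_date, release_date + week),
        "t3": (release_date + week, release_date + 2 * week),
    }


# Hypothetical movie, for illustration only.
query = build_query("ExampleMovie")
windows = week_windows(date(2022, 2, 4))

print(query)
for label, (start, end) in windows.items():
    print(label, start.isoformat(), end.isoformat())
```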
Methodology – Statistical Analysis
Main tools
Jupyter Notebook, including Python libraries such as Pandas and Seaborn, and a database
based on SQLite.
Initial step
Retrieve all data from the database into a general summary of each movie and its
Twitter-related data.
Create data frames to analyze the data from each week's perspective.
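The initial step could look roughly like the following sketch, which loads both tables with pandas, aggregates the tweet metrics per movie, and repeats the aggregation for each week. The toy rows, the integer coding of `timepoint` and the `summarize` helper are assumptions made for illustration. The count and mean column names mirror the variables reported in the results, but the authors' actual notebook code is not shown in the slides.

```python
import sqlite3

import pandas as pd

# Tiny in-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT, rating_avg REAL);
CREATE TABLE tweets (id INTEGER PRIMARY KEY, movie_id INTEGER, timepoint INTEGER,
                     quote_count INTEGER, reply_count INTEGER,
                     retweet_count INTEGER, like_count INTEGER, sentiment REAL);
INSERT INTO movies VALUES (1, 'Movie A', 7.0), (2, 'Movie B', 5.5);
INSERT INTO tweets VALUES
    (1, 1, 1, 0, 2, 5, 10, 0.5),
    (2, 1, 2, 1, 0, 3, 4, 0.1),
    (3, 2, 1, 0, 1, 0, 2, -0.2);
""")

movies = pd.read_sql_query("SELECT * FROM movies", conn)
tweets = pd.read_sql_query("SELECT * FROM tweets", conn)


def summarize(tweets_df):
    """Aggregate tweet metrics per movie: counts (sums) and per-tweet means."""
    summary = tweets_df.groupby("movie_id").agg(
        tweet_count=("id", "count"),
        quote_count=("quote_count", "sum"),
        quote_mean=("quote_count", "mean"),
        reply_count=("reply_count", "sum"),
        reply_mean=("reply_count", "mean"),
        retweet_count=("retweet_count", "sum"),
        retweet_mean=("retweet_count", "mean"),
        like_count=("like_count", "sum"),
        like_mean=("like_count", "mean"),
        sentiment_mean=("sentiment", "mean"),
    )
    return summary.reset_index()


def with_movies(tweets_df):
    """Attach the per-movie tweet summary to the movie table."""
    return movies.merge(
        summarize(tweets_df), left_on="id", right_on="movie_id", how="left"
    )


# One summary over all tweets, plus one data frame per week present in the data.
summary_all = with_movies(tweets)
summary_by_week = {
    int(t): with_movies(tweets[tweets["timepoint"] == t])
    for t in sorted(tweets["timepoint"].unique())
}

print(summary_all[["name", "rating_avg", "tweet_count", "like_count"]])
```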
ANOVA
Every movie and Twitter variable was tested with ANOVA, first considering all the data and
then limiting the information to one week at a time.
Dependent variable: the rating average.
Because all the independent variables are continuous, they were converted to categorical
variables using their quartiles.
Results – Data Summary
36 movies were released in the USA in February 2022.
Average opening gross: $4,882,695.
A mean of 1,240.52 theaters in the week of their premiere.
Mean rating average across all the movies: 6.14 / 10.
389,639 tweets related to the movies were retrieved.
Per movie, an average of 10,823.3 tweets, 1,579,464 retweets, 31,489.67 likes, 2,382.53 replies
and 819.89 quotes, and a mean sentiment score of 0.39 (-1 is fully negative and 1 is fully
positive).
Results – ANOVA (Movie data)
Opening grosses
                          df     sum_sq    mean_sq         F     PR(>F)
C(opening_grosses_cat)     3   3.241695   1.080565     1.366   0.280454
Residual                  21  16.611905   0.791043

Opening theaters
                          df     sum_sq    mean_sq         F     PR(>F)
C(opening_theaters_cat)    3   9.388362   3.129454  6.279698   0.003275*
Residual                  21  10.465238   0.498345
*The factor is significant at an alpha level of 0.05.
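The columns of these tables are linked by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and F is the ratio of the factor's mean square to the residual mean square. The short check below recomputes these quantities from the reported `df` and `sum_sq` values. The function and variable names are illustrative.

```python
def anova_row(df_factor, ss_factor, df_resid, ss_resid):
    """Return (factor mean square, residual mean square, F) for a one-way ANOVA."""
    ms_factor = ss_factor / df_factor
    ms_resid = ss_resid / df_resid
    return ms_factor, ms_resid, ms_factor / ms_resid


# df and sum_sq values as reported in the two tables.
grosses = anova_row(3, 3.241695, 21, 16.611905)
theaters = anova_row(3, 9.388362, 21, 10.465238)

for name, (ms_factor, ms_resid, f_value) in (
    ("opening_grosses", grosses),
    ("opening_theaters", theaters),
):
    print(f"{name}: mean_sq={ms_factor:.6f} resid_mean_sq={ms_resid:.6f} F={f_value:.4f}")
```

The recomputed values agree with the reported mean squares and F statistics to the precision shown in the tables.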
Results – ANOVA (Twitter data)
Variable         PR(>F) (all)  PR(>F) (t1)  PR(>F) (t2)  PR(>F) (t3)
Tweet count      0.025126*     0.001204*    0.016394*    0.001921*
Quote count      0.080732      0.00018*     0.042324*    0.271232
Quote mean       0.408379      0.268275     0.616853     0.745262
Reply count      0.00811*      0.000096*    0.039745*    0.133894
Reply mean       0.184873      0.11641      0.773312     0.459757
Retweet count    0.122793      0.036248*    0.142973     0.008187*
Retweet mean     0.110123      0.04399*     0.460968     0.124569
Like count       0.06142       0.000257*    0.112427     0.128893
Like mean        0.846622      0.007058*    0.072036     0.016488*
Sentiment mean   0.000504*     0.376202     0.00406*     0.064979
all = all data | t1 = week before | t2 = same week | t3 = week after
*The factor is significant at an alpha level of 0.05.
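To make the pattern in this table easier to read, the sketch below lists, for each period, the variables whose p-value falls below the alpha level of 0.05. The p-values are copied from the table. The data structure and function name are illustrative.

```python
ALPHA = 0.05
PERIODS = ("all", "t1", "t2", "t3")

# p-values copied from the table, in the order (all, t1, t2, t3).
P_VALUES = {
    "Tweet count":    (0.025126, 0.001204, 0.016394, 0.001921),
    "Quote count":    (0.080732, 0.00018, 0.042324, 0.271232),
    "Quote mean":     (0.408379, 0.268275, 0.616853, 0.745262),
    "Reply count":    (0.00811, 0.000096, 0.039745, 0.133894),
    "Reply mean":     (0.184873, 0.11641, 0.773312, 0.459757),
    "Retweet count":  (0.122793, 0.036248, 0.142973, 0.008187),
    "Retweet mean":   (0.110123, 0.04399, 0.460968, 0.124569),
    "Like count":     (0.06142, 0.000257, 0.112427, 0.128893),
    "Like mean":      (0.846622, 0.007058, 0.072036, 0.016488),
    "Sentiment mean": (0.000504, 0.376202, 0.00406, 0.064979),
}


def significant_by_period(p_values, alpha=ALPHA):
    """Map each period to the variables with p < alpha, in table order."""
    return {
        period: [name for name, ps in p_values.items() if ps[i] < alpha]
        for i, period in enumerate(PERIODS)
    }


significant = significant_by_period(P_VALUES)
for period, names in significant.items():
    print(f"{period}: {', '.join(names)}")
```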
Discussion & Conclusions (1)
All data considered
The variable most reliably linked to the rating average is the sentiment mean, followed by the
reply count and the tweet count.
All three are measures of the online conversation about the movies.
The week immediately before the movie premiere (t1)
This week shows the most reliable relationship with the rating average.
All count variables, and the means of retweets and likes, are related to the rating average.
The sentiment mean is not significant in this week.
Most characteristics of the Twitter conversation about a movie before its release are associated
with the rating that users post after they have watched it.
Discussion & Conclusions (2)
The week before (t1) vs the week of the premiere (t2)
The latter does not include retweets and likes, but it does include the sentiment mean.
Only the conversation itself about the movies is linked to the rating average, not the
amplification and appreciation of the tweets involved.
The week after the premiere (t3)
It only shows relationships with the tweet count, retweet count and like mean.
Amplification and overall appreciation of the movie become more important over time.
Discussion & Conclusions (3)
Opening theaters is related to the rating average, but not opening grosses
If the number of theaters where a movie can be seen is related to the rating users post, but
how many users have paid to watch it is not, the rating may be more related to visibility and
fame than to the actual number of movie viewers.
ANOVA
An interesting methodology to find significant differences in movie and Twitter data, and to
study the reliability of the relationships.
This study is intended as a cornerstone for broader and longer studies that could confirm these
relationships, identify new ones and consider other variables.
Other possibilities: trend analysis, other social media platforms...
Any questions?
Thank you very much!