
ANOVA of the rating average of movie premieres in the USA using Twitter hashtag data

Authors:

Abstract

This paper explores whether the statistical methodology ANOVA (ANalysis Of VAriance) can detect interesting relationships between the characteristics of tweets and the average rating of a movie. The main objective of an ANOVA test is to look for statistically significant differences in a response variable based on the information provided by one or more independent variables. In this case, the selected response variable is the average rating of each movie, whose variability is studied across the different values of the movie and Twitter variables. Thus, if an ANOVA finds a significant difference, the movie or Twitter variable analyzed could affect the mean rating of the movie. There are multiple websites where users can browse a database of movies and rate the ones they watch. One of the most famous is IMDb, short for Internet Movie Database, with millions of titles and user records. But the movie experience doesn’t start with the rating, or even the viewing, but with the conversation about the movie even before its release. This study focuses on the reliability of the relationship between the conversation around movies on Twitter and the rating a movie gets a few weeks after its premiere. Specifically, it uses ANOVA to search for relationships between the average rating and the usage, conversation, engagement and amplification rates of official movie hashtags. Considering the 36 movies that premiered in February 2022 on four different dates, all tweets posted the week before, the week after and two weeks after the release of each film were extracted from the Twitter API. The total was 389,649 tweets, all posted between 28 January 2022 and 10 March 2022.
This extraction provided several characteristics of the tweets, such as tweet count, quotes, replies, retweets, likes and sentiment analysis of the content. An interesting conclusion of this study is that ANOVA detected relationships such as the one between tweet sentiment and the average rating of movies on IMDb. Since this work analyzed each week separately, in different ANOVAs, the methodology has shown that the tweets posted in the week before the movie release are especially related to the average rating. This work paves the way for the use of ANOVA in trend analysis of movie hashtags over a longer time and for a deeper and broader study of sentiment analysis of the digital conversation produced on Twitter about movie premieres.
Víctor Yeste
Universitat Politècnica de València
Ángeles Calduch Losa
Universitat Politècnica de València
Index
Introduction
Objectives
Methodology
Results
Discussion & Conclusions
Introduction ANOVA
ANalysis Of VAriance (ANOVA)
Analytical model that studies the statistical significance of differences, assessing the reliability
of the association between an independent variable and the dependent variables (Tabachnick,
2007).
Main advantage
All sources of variation and their mutual influence are accounted for, which would otherwise be
impossible to estimate (Melović et al., 2020).
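For reference, the test statistic of a one-way ANOVA can be written as below. The notation is introduced here for illustration and does not come from the slides: k groups, n_j observations in group j, N observations in total, group means \bar{y}_j and overall mean \bar{y}.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{\displaystyle \sum_{j=1}^{k} n_j \left(\bar{y}_j - \bar{y}\right)^2 \Big/ (k - 1)}
         {\displaystyle \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}_j\right)^2 \Big/ (N - k)}
```

Under the null hypothesis that all group means are equal, F follows an F distribution with k - 1 and N - k degrees of freedom, and a small p-value suggests that at least one group mean differs.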
Introduction Movies
Examples of the use of ANOVA with movies
To study the influence of the MPAA’s film rating system on the viewers' attendance (Austin,
1980).
To find out associations between movie genre and movie ratings (Mehnaz Khan et al., 2020).
Other studies that use movie hashtags
To study trend analysis (Hossain & Shams, 2019).
To predict movie ratings and box office hits (Sanguinet, 2016).
Objectives
An exploratory, quantitative, non-experimental study that uses inductive inference and a
longitudinal follow-up.
Scope: movies that premiered in February 2022 in the USA.
Object of the study: the movies and the tweets published about them the week before, the
same week and the week immediately after each premiere date.
Main objective
To explore the use of ANOVA to look for relationships between average rating and usage,
conversation, engagement and amplification rates of official movie hashtags from movie
premieres.
Methodology Design
2 Tables
Movies
id
name
hashtag
mpaa
release_date
opening_grosses
opening_theaters
rating_avg
Tweets
id
status_id
movie_id
timepoint
author_id
created_at
quote_count
reply_count
retweet_count
like_count
sentiment
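The two tables could be declared in SQLite roughly as follows. This is a minimal sketch: the column names come from the slide, while the SQL types, the keys, the constraints, the coding of `timepoint` and the helper names `create_database` and `column_names` are assumptions made for illustration.

```python
import sqlite3

# Column names are taken from the slide; types and constraints are assumed.
SCHEMA = """
CREATE TABLE movies (
    id               INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    hashtag          TEXT NOT NULL,
    mpaa             TEXT,
    release_date     TEXT,      -- assumed ISO format, YYYY-MM-DD
    opening_grosses  REAL,
    opening_theaters INTEGER,
    rating_avg       REAL
);
CREATE TABLE tweets (
    id            INTEGER PRIMARY KEY,
    status_id     TEXT NOT NULL,
    movie_id      INTEGER NOT NULL REFERENCES movies(id),
    timepoint     INTEGER,      -- assumed coding: 1 = t1, 2 = t2, 3 = t3
    author_id     TEXT,
    created_at    TEXT,
    quote_count   INTEGER,
    reply_count   INTEGER,
    retweet_count INTEGER,
    like_count    INTEGER,
    sentiment     REAL          -- from -1 (negative) to 1 (positive)
);
"""


def create_database(path=":memory:"):
    """Create the two tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def column_names(conn, table):
    """List the column names of a table, in declaration order."""
    # PRAGMA table_info returns one row per column; index 1 holds the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

Linking each tweet to its movie through `movie_id` keeps the movie-level variables in one place, so they can be joined to per-week tweet summaries later.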
Methodology Data Collection
Movies data
From Box Office Mojo, a website that is part of IMDbPro, one of the most well-known cinema
websites.
Twitter data
Collected on the 21st of March 2022 using Twitter’s API with Academic Research access.
Search queries consisted of a “#” character followed by each movie’s official hashtag.
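The query string and the three weekly windows could be built as sketched below. The slides do not state the exact window boundaries, so the seven-day windows anchored on the release date are an assumption, and `build_query` and `weekly_windows` are hypothetical helper names. No API call is made here.

```python
from datetime import date, timedelta


def build_query(hashtag):
    """Return the search query: a '#' character followed by the hashtag."""
    return "#" + hashtag.lstrip("#")


def weekly_windows(release_date):
    """Return (start, end) dates for t1, t2 and t3, with an exclusive end.

    Assumption: every window lasts seven days and t2 starts on the
    release date itself.
    """
    week = timedelta(days=7)
    return {
        "t1": (release_date - week, release_date),
        "t2": (release_date, release_date + week),
        "t3": (release_date + week, release_date + 2 * week),
    }
```

With a hypothetical release date of 4 February 2022, `weekly_windows(date(2022, 2, 4))["t1"]` starts on 28 January 2022, which happens to match the earliest tweet date reported in the abstract.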
Methodology Statistical Analysis
Main tools
Jupyter Notebook, including Python libraries such as Pandas and Seaborn, and a database
based on SQLite.
Initial step
Retrieve all data from the database into a generic data summary for each movie and their
Twitter-related data.
Create data frames to analyze it from each week's perspective.
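The initial step could look like the following pandas sketch. Column names follow the Tweets and Movies tables from the design slide. Reading each "count" variable as a sum over tweets and each "mean" variable as a per-tweet average is an assumption, as are the function names `summarize_tweets` and `week_frame`.

```python
import pandas as pd


def summarize_tweets(tweets):
    """Aggregate tweet-level rows into one row per movie and timepoint."""
    grouped = tweets.groupby(["movie_id", "timepoint"])
    summary = grouped.agg(
        tweet_count=("status_id", "count"),
        quote_count=("quote_count", "sum"),
        quote_mean=("quote_count", "mean"),
        reply_count=("reply_count", "sum"),
        reply_mean=("reply_count", "mean"),
        retweet_count=("retweet_count", "sum"),
        retweet_mean=("retweet_count", "mean"),
        like_count=("like_count", "sum"),
        like_mean=("like_count", "mean"),
        sentiment_mean=("sentiment", "mean"),
    )
    return summary.reset_index()


def week_frame(summary, movies, timepoint):
    """Return the summary for one week joined with the movie-level data."""
    week = summary[summary["timepoint"] == timepoint]
    return movies.merge(week, left_on="id", right_on="movie_id", how="inner")
```

Calling `week_frame` once per timepoint yields one data frame per week, each with one row per movie, which is the shape a per-week ANOVA needs.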
ANOVA
Every movie and Twitter variable has been tested with ANOVA, considering all the data and
limiting the information to just one week at a time.
Dependent variable: the rating average.
Because all the independent variables are continuous, they were converted to categorical
variables using their quartiles.
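The quartile conversion and a one-way ANOVA could be sketched as follows. The slides name only Pandas and Seaborn, so the use of SciPy here is an assumption and a stand-in. The layout of the results tables (df, sum_sq, mean_sq, F, PR(>F)) resembles the output of statsmodels' `anova_lm`, but the library is not stated. The function names and the `Q1` to `Q4` labels are hypothetical.

```python
import pandas as pd
from scipy.stats import f_oneway


def quartile_categories(values):
    """Bin a continuous variable into four quartile-based categories."""
    # pd.qcut raises an error if heavy ties make the quartile edges coincide.
    return pd.qcut(values, q=4, labels=["Q1", "Q2", "Q3", "Q4"])


def anova_by_quartile(df, factor, response="rating_avg"):
    """One-way ANOVA of `response` across the quartile groups of `factor`."""
    data = df[[factor, response]].dropna()
    categories = quartile_categories(data[factor])
    groups = [
        group[response].to_numpy()
        for _, group in data.groupby(categories, observed=True)
    ]
    result = f_oneway(*groups)
    return float(result.statistic), float(result.pvalue)
```

Running `anova_by_quartile` on each variable of a per-week data frame would give one p-value per variable and week, which can then be compared with the chosen alpha level.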
Results Data Summary
36 movies were released in the USA in February 2022.
Average opening gross of $4,882,695.
A mean of 1,240.52 theaters in the week of their premiere.
Mean rating average across all the movies: 6.14 / 10.
389,639 tweets related to the movies retrieved.
A per-movie average of 10,823.3 tweets, 1,579,464 retweets, 31,489.67 likes, 2,382.53 replies,
819.89 quotes and a mean sentiment score of 0.39 (where -1 is entirely negative and 1 is
entirely positive).
Results ANOVA (Movie data)
Opening grosses
                         df  sum_sq     mean_sq   F         PR(>F)
C(opening_grosses_cat)   3   3.241695   1.080565  1.366     0.280454
Residual                 21  16.611905  0.791043

Opening theaters
                         df  sum_sq     mean_sq   F         PR(>F)
C(opening_theaters_cat)  3   9.388362   3.129454  6.279698  0.003275*
Residual                 21  10.465238  0.498345
*The factor is significant at an alpha level of 0.05.
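As a worked check, the mean squares and the F value in the opening theaters table can be recomputed from the sums of squares and degrees of freedom reported on the slide. The helper name `anova_row` is hypothetical, and the p-value is not recomputed here.

```python
def anova_row(sum_sq_factor, df_factor, sum_sq_resid, df_resid):
    """Recompute the mean squares and F from sums of squares and df."""
    ms_factor = sum_sq_factor / df_factor
    ms_resid = sum_sq_resid / df_resid
    return ms_factor, ms_resid, ms_factor / ms_resid


# Values copied from the opening theaters table on the slide.
ms_factor, ms_resid, f_value = anova_row(9.388362, 3, 10.465238, 21)
print(ms_factor, ms_resid, f_value)
```

The recomputed values agree with the reported mean_sq (3.129454 and 0.498345) and F (6.279698) to the precision shown on the slide.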
Results ANOVA (Twitter data)
*The factor is significant at an alpha level of 0.05.
Variable PR(>F) (all) PR(>F) (t1) PR(>F) (t2) PR(>F) (t3)
Tweet count 0.025126* 0.001204* 0.016394* 0.001921*
Quote count 0.080732 0.00018* 0.042324* 0.271232
Quote mean 0.408379 0.268275 0.616853 0.745262
Reply count 0.00811* 0.000096* 0.039745* 0.133894
Reply mean 0.184873 0.11641 0.773312 0.459757
Retweet count 0.122793 0.036248* 0.142973 0.008187*
Retweet mean 0.110123 0.04399* 0.460968 0.124569
Like count 0.06142 0.000257* 0.112427 0.128893
Like mean 0.846622 0.007058* 0.072036 0.016488*
Sentiment mean 0.000504* 0.376202 0.00406* 0.064979
all = all data | t1 = week before | t2 = same week | t3 = week after
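The asterisks in the table can be reproduced by comparing each p-value with the alpha level of 0.05. The sketch below copies the p-values from the table and lists the significant variables for a chosen column. The names `P_VALUES`, `COLUMNS` and `significant_variables` are hypothetical.

```python
ALPHA = 0.05
COLUMNS = ("all", "t1", "t2", "t3")

# p-values copied from the table, in the order given by COLUMNS.
P_VALUES = {
    "Tweet count":    (0.025126, 0.001204, 0.016394, 0.001921),
    "Quote count":    (0.080732, 0.00018, 0.042324, 0.271232),
    "Quote mean":     (0.408379, 0.268275, 0.616853, 0.745262),
    "Reply count":    (0.00811, 0.000096, 0.039745, 0.133894),
    "Reply mean":     (0.184873, 0.11641, 0.773312, 0.459757),
    "Retweet count":  (0.122793, 0.036248, 0.142973, 0.008187),
    "Retweet mean":   (0.110123, 0.04399, 0.460968, 0.124569),
    "Like count":     (0.06142, 0.000257, 0.112427, 0.128893),
    "Like mean":      (0.846622, 0.007058, 0.072036, 0.016488),
    "Sentiment mean": (0.000504, 0.376202, 0.00406, 0.064979),
}


def significant_variables(p_values, column, alpha=ALPHA):
    """Return the variables whose p-value in `column` is below `alpha`."""
    index = COLUMNS.index(column)
    return [name for name, row in p_values.items() if row[index] < alpha]
```

For example, `significant_variables(P_VALUES, "t3")` returns the tweet count, the retweet count and the like mean, the three variables marked with an asterisk in the t3 column.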
Discussion & Conclusions (1)
All data considered
The variable most reliably linked to the average rating is the sentiment mean, followed
by the reply count and the tweet count.
All three are measures of the online conversation about the movies.
The week immediately before the movie premiere (t1)
This week shows the most reliable relationship with the rating average.
All count variables, and the retweet and like means, are related to the rating average; the
sentiment mean is not significant in this week.
Most characteristics of the Twitter conversation about a movie before its release are linked to
the rating that users post after they have watched it.
Discussion & Conclusions (2)
The week before (t1) vs the week of the premiere (t2)
The latter does not include retweets and likes, but it does include the sentiment mean.
Only the conversation itself about the movies is linked to the rating average, not the
amplification and appreciation of the tweets involved.
The week after the premiere (t3)
It only shows relationships with tweet count, retweet count and like mean.
Amplification and overall appreciation of the movie become more important over time.
Discussion & Conclusions (3)
Opening theaters is related to the rating average, but opening grosses is not
If the number of theaters where a movie can be seen affects the rating users post, but the
number of users who paid to watch it does not, the rating may be more related to visibility
and fame than to the actual number of movie viewers.
ANOVA
An interesting methodology to find significant differences in movie and Twitter data, and to
study the reliability of the relationships.
This study is meant as a cornerstone for broader and longer studies that could confirm these
relationships, find new ones and consider other variables.
Other possibilities: trend analysis, other social media platforms...
Any questions?
Thank you very much!