Exploratory Twitter hashtag analysis of movie premieres in the USA

Víctor Yeste
Universitat Politècnica de València, Valencia, Spain
Ángeles Calduch-Losa
Universitat Politècnica de València, Valencia, Spain
Abstract
This paper presents an exploratory study of the use, performance, and main characteristics of the official Twitter hashtags of movie premieres in the USA. Almost weekly, a different group of movies is released for the first time in the USA, trying to attract the attention and interest of the audience to watch them in theaters or via Video-on-Demand (VoD). For four consecutive movie release dates, data about the official Twitter hashtags of the movies were gathered at three time points: one week before, the same week, and one week after every movie release. The objective is to study how participation in the Twitter conversation fluctuates around movie premieres and the characteristics of the users involved. The metrics chosen for the study are the volume of tweets, the ratio of retweets and favorites, and sentiment analysis. From an exploratory point of view, some conclusions about the use, performance, and trend analysis have been gathered to justify a further study involving a much more extended period.
Keywords
social media analytics; Twitter analytics; hashtag analysis; trend analysis; sentiment analysis
1. Introduction
Twitter, a well-known microblogging social network, has played a major role in research on communication in social networks. Users can publish their opinions on multiple subjects and comment on and interact with the opinions of others, which lets researchers perform “opinion mining” to analyze how certain products are behaving in the market, according to Jain (2013).
That is the case of movie success, where past research has focused on conversational tagging in the form of hashtags. A hashtag is the name for a tag on Twitter, easily recognized because it is preceded by the symbol “#” (Huang et al., 2010), e.g., #thegodfather. This tagging lets Twitter users search and classify the content published in real time on the social network. It helps Twitter research too, because it makes communicative exchanges much easier to track and analyze (Bruns & Stieglitz, 2013). These characteristics motivate trend analysis based on hashtags, and Wang & Zheng (2014) classified them by their temporal patterns into single-spike, multi-spike, and fluctuation patterns.
Hashtags have other important roles in trend analysis. Trends are usually described as events or topics that appear within a certain period, and as Hossain & Shams (2019) point out, there is an entire research field focused on identifying emerging topics, analyzing their characteristics, and even trying to predict them using sentiment analysis and diverse machine learning methods. This has led to papers such as Kesharwani et al. (2017), which predicts movie ratings; Sanguinet (2016), which predicts box office hits by tracking Twitter hashtags; and Ahmad et al. (2020), which predicts sequel movie revenues. This predictive ability has been applied to similar cultural sectors; for example, Crisci et al. (2018) used Twitter-based metrics to predict the expected audience of television programs.
Scientists have applied sentiment analysis using natural language processing to identify and classify the sentiments and opinions of tweets published on the social network, and among the first to use it to predict box office revenues of movies were Asur & Huberman (2010). As Verma et al. (2015) explained, movies usually divide public opinion sharply, which provides two advantages: the potentially large volume of tweets published by users talking about them and the high variability of public opinion about the movies. That is why studying not only hashtags but also the content itself through sentiment analysis helps to study the valence and volume of the popularity of movies.
This study is a preliminary exploratory analysis of the Twitter hashtags of movie premieres in February 2022 in the USA, using specific characteristics of the movies released that month to see whether there are interesting relationships between the variables. Such relationships could motivate a further and much deeper study, and the analysis also checks whether there is a point in separating the weeks close to each premiere.
2. Objectives and methodology
2.1. Design and methodology
This work is an exploratory, quantitative, non-experimental study with inductive inference and a longitudinal follow-up. It analyzes movie data and tweets published by users using the official Twitter hashtags of movie premieres in the week before, the same week, and the week after each release date.
The scope of the study is the collection of movies released in February 2022 in the USA, and the object of the study includes them and the tweets referring to those movies in the 3 weeks closest to their premiere dates. The collected tweets were classified by the week in which they were published, a time dimension called timepoint. The week before the release date is timepoint 1, the week of the release date is timepoint 2, and the week immediately afterward is timepoint 3. Another dimension considered is whether the movie has domestic production: if one of its countries of origin is the United States, the movie is designated as domestic.
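The timepoint and domestic classifications described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual code: the function names are hypothetical, each week is assumed to be a 7-day block with timepoint 2 starting on the release date, and the country is assumed to be spelled "United States" in the semicolon-separated countries field.

```python
from datetime import date
from typing import Optional


def timepoint_for(tweet_day: date, release_day: date) -> Optional[int]:
    """Return 1, 2 or 3 for the week a tweet falls in, or None outside the window.

    Assumption: weeks are 7-day blocks and timepoint 2 starts on the release day.
    """
    offset_days = (tweet_day - release_day).days
    if -7 <= offset_days < 0:
        return 1  # week before the release
    if 0 <= offset_days < 7:
        return 2  # week of the release
    if 7 <= offset_days < 14:
        return 3  # week immediately afterward
    return None


def is_domestic(countries: str) -> bool:
    """True if the semicolon-separated countries field lists the United States."""
    names = {name.strip().lower() for name in countries.split(";")}
    return "united states" in names  # assumed spelling of the country name
```

Under these assumptions, a tweet published on 2022-01-28 for a movie released on 2022-02-04 falls in timepoint 1, and a tweet published two weeks or more after the release is left out.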
The chosen variables are organized in two data tables, one for the movies and one for the
collected tweets.
Variables related to the movies:
id: Internal id of the movie
name: Title of the movie
hashtag: Official hashtag of the movie
countries: List of countries of the movie, separated by a semicolon
mpaa: Rating under the film ratings system of the Motion Picture Association of America. It is a completely voluntary rating system and ratings have no legal standing. The current ratings are G (general audiences), PG (parental guidance suggested), PG-13 (parents strongly cautioned), R (restricted, under 17 requires accompanying parent or adult guardian) and NC-17 (no one 17 and under admitted) (Film Ratings - Motion Picture Association, n.d.)
genres: List of genres of the movie, e.g., Action or Thriller, separated by a semicolon
release_date: Release date of the movie in the format YYYY-MM-DD
opening_grosses: Amount in US dollars that the movie earned at its opening (the first week after the release date)
opening_theaters: Number of USA theaters that showed the movie at its opening (the first week after the release date)
rating_avg: Average rating of the movie
Variables related to the tweets:
id: Internal id of the tweet
status_id: Twitter id of the tweet
movie_id: Internal id of the movie
timepoint: Week, relative to the movie premiere, in which the tweet was published. “1” is the week before the movie release, “2” is the week of the release (the first week after it) and “3” is the second week after the movie release.
author_id: Twitter id of the author of the tweet
created_at: Date and time of the tweet, with format “YYYY-MM-DD HH:MM:SS”
quote_count: Number of the tweet’s quotes
reply_count: Number of the tweet’s replies
retweet_count: Number of the tweet’s retweets
like_count: Number of the tweet’s likes
sentiment: Sentiment analysis of the tweet’s content with a range from -1 (negative) to
1 (positive)
This correlational research explores relationships between the variables to justify a broader and deeper study.
2.2. Data collection
Data on the movies premiered in February 2022 was taken from the website Box Office Mojo¹, part of IMDbPro. IMDb² is one of the most well-known and trusted websites where users publish legitimate reviews about movies and find more information about them (Rahul Shah, 2021). The average rating of each movie was collected from the IMDb website on the 2nd of May 2022, more than a month after the movies were released.
Twitter’s data comprises tweets related to the movies premiered in February 2022 that were published in the week before, the week after and the second week after each release. It was obtained on the 21st of March using Twitter’s API with Academic Research access. The endpoint used to get the tweet data was “GET /2/tweets/search/all” (Twitter Developer Platform, n.d.), a full-archive search that returns the complete history of public Tweets matching a search query. Search queries were strings constructed as “#” + each movie’s hashtag.
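As an illustration of the query construction just described, the sketch below builds one set of request parameters per timepoint for the full-archive search endpoint. It makes no network call. The function name, the UTC midnight week boundaries and the selection of tweet fields are assumptions made for this example and are not taken from the study; `query`, `start_time`, `end_time` and `tweet.fields` are parameters of the Twitter API v2 search endpoints.

```python
from datetime import datetime, timedelta, timezone


def build_search_params(hashtag: str, release_date: str) -> list:
    """Build one parameter dict per timepoint for a full-archive search request."""
    release = datetime.strptime(release_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    params = []
    for timepoint in (1, 2, 3):
        # Assumption: 7-day windows, with timepoint 2 starting on the release date.
        start = release + timedelta(days=7 * (timepoint - 2))
        end = start + timedelta(days=7)
        params.append({
            "timepoint": timepoint,  # bookkeeping only, not an API parameter
            "query": "#" + hashtag,  # "#" + the movie's official hashtag
            "start_time": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "end_time": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "tweet.fields": "author_id,created_at,public_metrics",
        })
    return params
```

Each dictionary (minus the `timepoint` key) could then be sent as the query string of an authenticated request; the `public_metrics` field is where the quote, reply, retweet and like counts are returned.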
¹ Box Office Mojo, February 2022: https://www.boxofficemojo.com/calendar/2022-02-01/
² IMDb website: https://www.imdb.com/
2.3. Statistical analysis
This study’s analysis was carried out with Python’s libraries Pandas and Seaborn in a Jupyter Notebook. First, a data summary was extracted, describing the main parameters that give a big picture of the movies and their related data. To deepen this view, movies were grouped as domestic (USA production) or foreign and their main parameters were compared through one-way analysis of variance (ANOVA), with the goal of verifying whether the classification into domestic and foreign movies, as a qualitative factor, makes a significant difference for the selected variables. ANOVA evaluates the impact of the factor on a variable and determines whether there is a significant difference between the means of the groups.
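To make the one-way ANOVA concrete, the following sketch computes the degrees of freedom, sums of squares and F statistic from scratch using only the standard library. It is an illustration of the technique, not the code used in the study, and the function name is hypothetical. It stops at the F statistic, because the p-value requires the F distribution, which the standard library does not provide.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (df_between, df_within, ss_between, ss_within, f_stat).

    `groups` is a list of numeric samples, one per level of the factor
    (here, one for domestic movies and one for foreign movies).
    """
    all_values = [value for group in groups for value in group]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((value - fmean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_stat
```

With two groups the factor has 1 degree of freedom, and with 36 movies the residual has 34, or 23 when only the 25 movies with opening data are used. If SciPy is available, the p-value can be obtained as `scipy.stats.f.sf(f_stat, df_between, df_within)`.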
Another objective of this study was to establish whether it is necessary to group tweet data by the week in which the tweets were published. Twitter’s data was therefore grouped by timepoint, and summary data was collected for each movie.
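The grouping step can be sketched as follows, assuming each tweet is a dictionary with the tweet variables listed in Section 2.1. The function name and the exact column naming scheme (for example `tw_t1_like_count`) are hypothetical. With Pandas, the same result would normally be obtained with a group-by on `movie_id` and `timepoint`; plain Python is used here to keep the example self-contained.

```python
from collections import defaultdict
from statistics import fmean

METRICS = ("quote_count", "reply_count", "retweet_count", "like_count")


def summarize_by_timepoint(tweets):
    """Aggregate tweet dicts into one summary dict per movie.

    Column names such as 'tw_t1_like_count' are a hypothetical naming scheme.
    """
    buckets = defaultdict(list)
    for tweet in tweets:
        buckets[(tweet["movie_id"], tweet["timepoint"])].append(tweet)

    summary = defaultdict(dict)
    for (movie_id, timepoint), rows in buckets.items():
        prefix = f"tw_t{timepoint}_"
        summary[movie_id][prefix + "tweet_count"] = len(rows)
        for metric in METRICS:
            values = [row[metric] for row in rows]
            mean_name = metric.replace("_count", "_mean")
            summary[movie_id][prefix + metric] = sum(values)   # total per week
            summary[movie_id][prefix + mean_name] = fmean(values)  # mean per tweet
        sentiments = [row["sentiment"] for row in rows]
        summary[movie_id][prefix + "sentiment_mean"] = fmean(sentiments)
    return dict(summary)
```

A movie with no tweets in a given week simply has no columns for that timepoint in this sketch; a fuller version would need to decide how to fill those gaps.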
As this paper's aim is to look for connections between the variables, a correlation matrix was generated between all of them, keeping only strong correlations (an absolute Pearson correlation coefficient greater than 0.7) between different variables. This led to an even more specific correlation matrix that gives a clearer view of the variables’ relationships.
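The filtering of strong correlations can be illustrated with the sketch below, which computes Pearson's coefficient for every pair of distinct variables and keeps the pairs whose absolute value exceeds the 0.7 threshold. The function names are hypothetical and the code is a plain-Python stand-in for what a Pandas correlation matrix followed by a mask would do; it assumes no variable is constant, since a constant column would make the coefficient undefined.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strong_correlations(columns, threshold=0.7):
    """Return {(name_a, name_b): r} for distinct variable pairs with |r| > threshold.

    `columns` maps each variable name to its list of values (one per movie).
    """
    strong = {}
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) > threshold:
            strong[(name_a, name_b)] = r
    return strong
```

Because only distinct pairs are compared, the trivial correlation of each variable with itself never appears in the result.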
3. Results
3.1. Movies data summary
This study involves the 36 movies that were released in February 2022 in the USA, according to Box Office Mojo’s website. The complete list of the movies is as follows: Jackass Forever,
Moonfall, The Wolf and the Lion, Only Fools Rush In, The Worst Person in the World, Breaking
Bread, Lingui, The Long Night, Last Survivors, Marry Me, Death on the Nile, Blacklight, Catch
the Fair One, Water Gate Bridge, Fabian: Going to the Dogs, Supercool, Ronnie's, Cosmic
Dawn, Give or Take, Uncharted, Dog, The Cursed, A Banquet, Ted K, Strawberry Mansion,
Too Cool to Kill, The Automat, Finding Carlos, A Fairy Tale After All, Studio 666, Cyrano,
Butter, The Burning Sea, Let Me Be Me, The Desperate Hour, and Moon Manor.
There were 4 different release dates: 9 movies premiered on the 4th of February, 10 movies on the 11th of February, 10 movies on the 18th of February, and 7 movies on the 25th of February. Among the movies with opening grosses and theaters data, the mean opening gross was $4,882,695 and the movies were released in a mean of 1,240.52 theaters in the opening week. Uncharted was the movie with the maximum opening grosses ($44,010,155) and theaters (4,275). The mean of the average ratings was 6.14 out of 10, with The Automat scoring best at 8.4 and The Long Night worst at only 3.5.
3.2. Domestic and foreign movies comparison of movies data
Of the 36 movies, 23 have domestic production, while the other 13 do not. When movie variables are compared, marked differences appear between domestic and foreign movies for some of them.
Only 25 movies have published their opening grosses. The distribution of opening grosses for each type can be seen in the boxplots below:
Figure 1 – Comparison of opening grosses between domestic and foreign movies
A big difference in opening grosses (axis in units of 1e7) can be seen, as the 15 domestic movies earned a mean of $7,982,231 and the 10 foreign ones a mean of $233,392. When the groups are compared with an ANOVA, the p-value is 0.058, so the difference between domestic and foreign movies falls just short of significance at the 95% confidence level.
Table 1 – ANOVA of opening_grosses variable by domestic/foreign grouping
             df    sum_sq        mean_sq       F         PR(>F)
C(domestic)  1.0   3.602669e+14  3.602669e+14  3.973767  0.058206
Residual     23.0  2.085210e+15  9.066131e+13  NaN       NaN
Likewise, only 25 movies have published their opening theaters. The distribution of opening theaters for each type can be seen in the boxplots below:
Figure 2 – Comparison of opening theaters between domestic and foreign movies
A big difference in opening theaters can be seen, as the 15 domestic movies opened in a mean of 1,939 theaters and the 10 foreign ones in a mean of 192.8 theaters. When the groups are compared with an ANOVA, the p-value is 0.004, a significant difference between domestic and foreign movies at the 95% confidence level.
Table 2 – ANOVA of opening_theaters variable by domestic/foreign grouping
             df    sum_sq        mean_sq       F          PR(>F)
C(domestic)  1.0   18295286.64   1.829529e+07  10.137494  0.004134
Residual     23.0  41508441.60   1.804715e+06  NaN        NaN
The distribution of the average rating for each type can be seen in the boxplots below:
Figure 3 – Comparison of average rating between domestic and foreign movies
The boxplots of the average ratings overlap, suggesting that there may be no significant difference between domestic and foreign movies. The mean of the average rating is 6.16 for domestic movies and 6.1 for foreign ones. When the groups are compared with an ANOVA, the p-value is 0.895, so no significant difference is found between domestic and foreign movies in average rating at the 95% confidence level.
Table 3 – ANOVA of average rating variable by domestic/foreign grouping
             df    mean_sq   F         PR(>F)
C(domestic)  1.0   0.026534  0.017635  0.895137
Residual     34.0  1.504604  NaN       NaN
3.3. Twitter data summary
These movie premieres led to the extraction of 389,639 tweets in total, published between the 28th of January 2022 and the 10th of March 2022. Per movie, the means are 10,823.3 tweets, 819.89 quotes, 2,382.53 replies, 1,579,464 retweets and 31,489.67 likes, with a sentiment analysis mean of 0.39.
The movie with the most tweets was Marry Me (hashtag: #marrymemovie, different from the title to avoid other meanings) with 122,103 tweets, and the movie with the fewest tweets was Finding Carlos with only 1. The movie with the most quotes was Uncharted with 11,235 quotes, the movie with the most replies was Jackass Forever with 29,220 replies, the movie with the most retweets was Moonfall with 22,628,590 retweets, and the one with the most likes was Uncharted with 416,208 likes. Finally, the movie with the most positive sentiment was Moon Manor with 0.77 and the one with the most negative sentiment was The Desperate Hour with -0.11.
3.4. Domestic and foreign movies comparison of Twitter data
As the movies data showed a significant difference in opening theaters and a nearly significant one in opening grosses, a similar approach was taken for the Twitter summary data, comparing domestic and foreign movies.
The distribution of the number of tweets for each type can be seen in the boxplots below:
Figure 4 – Comparison of number of tweets between domestic and foreign movies
There seems to be a difference between the two types. The mean number of tweets is 16,599 for domestic movies and 604.77 for foreign ones. When the groups are compared with an ANOVA, the p-value is 0.1, so there is no significant difference between domestic and foreign movies in the number of tweets at the 95% confidence level.
Table 4 – ANOVA of number of tweets variable by domestic/foreign grouping
             df    sum_sq        mean_sq       F         PR(>F)
C(domestic)  1.0   2.124689e+09  2.124689e+09  2.814052  0.102613
Residual     34.0  2.567096e+10  7.550283e+08  NaN       NaN
The distribution of the number of quotes for each type can be seen in the boxplots below:
Figure 5 – Comparison of number of quotes between domestic and foreign movies
There seems to be a difference between the two types. The mean number of quotes is 1,257.65 for domestic movies and 45.38 for foreign ones. When the groups are compared with an ANOVA, the p-value is 0.12, so there is no significant difference between domestic and foreign movies in the number of quotes at the 95% confidence level.
Table 5 – ANOVA of number of quotes variable by domestic/foreign grouping
             df    sum_sq        mean_sq       F         PR(>F)
C(domestic)  1.0   1.220578e+07  1.220578e+07  2.572649  0.117976
Residual     34.0  1.613110e+08  4.744442e+06  NaN       NaN
The distribution of the number of replies for each type can be seen in the boxplots below:
Figure 6 – Comparison of number of replies between domestic and foreign movies
There seems to be a difference between the two types. The mean number of replies is 3,544.35 for domestic movies and 130.62 for foreign ones. When the groups are compared with an ANOVA, the p-value is 0.12, so there is no significant difference between domestic and foreign movies in the number of replies at the 95% confidence level.
Table 6 – ANOVA of number of replies variable by domestic/foreign grouping
             df    sum_sq        mean_sq       F         PR(>F)
C(domestic)  1.0   1.031861e+08  1.031861e+08  2.599472  0.116145
Residual     34.0  1.349630e+09  3.969500e+07  NaN       NaN
The distribution of the number of retweets for each type can be seen in the boxplots below:
Figure 7 – Comparison of number of retweets between domestic and foreign movies
At the plotted scale (axis in units of 1e7), there does not seem to be a difference between the two types. The mean number of retweets is 2,432,215 for domestic movies and 70,749 for foreign ones. When the groups are compared with an ANOVA, the p-value is 0.2, so there is no significant difference between domestic and foreign movies in the number of retweets at the 95% confidence level.
Table 7 – ANOVA of number of retweets variable by domestic/foreign grouping
             df    sum_sq        mean_sq       F         PR(>F)
C(domestic)  1.0   4.631611e+13  4.631611e+13  1.699033  0.201172
Residual     34.0  9.268492e+14  2.726027e+13  NaN       NaN
The distribution of the number of likes for each type can be seen in the boxplots below:
Figure 8 – Comparison of number of likes between domestic and foreign movies
There seems to be a difference between the two types. The mean number of likes is 48,377.3 for domestic movies and 1,611.54 for foreign ones. When the groups are compared with an ANOVA, the p-value is 0.11, so there is no significant difference between domestic and foreign movies in the number of likes at the 95% confidence level.
Table 8 – ANOVA of number of likes variable by domestic/foreign grouping
             df    sum_sq        mean_sq       F         PR(>F)
C(domestic)  1.0   1.816456e+10  1.816456e+10  2.626984  0.1143
Residual     34.0  2.350966e+11  6.914604e+09  NaN       NaN
The distribution of the average sentiment analysis for each type can be seen in the boxplots below:
Figure 9 – Comparison of average sentiment analysis between domestic and foreign movies
There does not seem to be a difference between the two types, as the boxplots overlap. The mean of the average sentiment analysis is 0.396 for domestic movies and 0.391 for foreign ones. When the groups are compared with an ANOVA, the p-value is 0.95, so there is no significant difference between domestic and foreign movies in the average sentiment analysis at the 95% confidence level.
Table 9 – ANOVA of average sentiment analysis variable by domestic/foreign grouping
             df    sum_sq    mean_sq   F         PR(>F)
C(domestic)  1.0   0.000190  0.000190  0.003896  0.950595
Residual     34.0  1.655438  0.048689  NaN       NaN
3.5. Correlation analysis between time points
Another of this study’s aims is to see whether there is a point in differentiating Twitter’s data between the week before the release date, the week after and the second week after the release date. This week number was designated as the time point: 1 for the first analyzed week of each movie, 2 for the second and 3 for the third.
To see the relationships between them, this paper focuses on the analysis of Pearson correlation coefficients. The complete correlation matrix can be seen below, keeping only the strong correlations (absolute value above 0.7) between different variables:
Figure 10 – Correlation matrix of Twitter data variables per timepoint
Each variable name starts with tw_t and the timepoint number, to differentiate and compare time points visually. As can be seen, the count variables show many strong correlations between the same variable at different time points: tweet_count, quote_count, reply_count, retweet_count and like_count. Among the mean variables, only retweet_mean is strongly correlated between t1 and t2 (and not t3). The sentiment mean is not strongly correlated with its counterparts.
Figure 11 – Correlation matrix of Twitter data variables that are not strongly correlated between the three time points
If the Twitter data variables that have strong correlations between the three time points are filtered out, only two strong correlations remain: the retweets mean between t1 and t2, and the replies and likes means of t3. These variables therefore cannot be collapsed across time points.
3.6. Correlation analysis between movies data and Twitter data in general
Another interesting point of view is to compare movies data with Twitter data, so this relationship was examined by creating a correlation matrix. It considers only Twitter data in general (the total and mean of all three time points together), and it can be seen below:
Figure 12 – Correlation matrix of movies data and Twitter data variables in general
In the correlation matrix some interesting strong correlations can be seen:
Opening grosses is strongly correlated to opening theaters
Opening grosses is strongly correlated to the quote, reply, retweet and like counts
Opening theaters is strongly correlated to the tweet, quote, reply and like counts
Tweet count is strongly correlated to quote, reply and like count
Quote mean and retweet mean are not strongly correlated to any other variable
Reply mean is strongly correlated to like mean
Rating average and sentiment mean are not strongly correlated to any other variable
4. Discussion and conclusions
This paper has taken the approach of grouping the movies released in February 2022 into domestic and foreign, depending on whether they have USA production. After studying the difference between groups for the chosen variables, the only variable that showed a clearly significant difference was the number of opening theaters, and because the domestic movies have a higher mean, it suggests that movie theaters are choosing domestic movies over foreign ones. That has probably driven the difference in opening grosses, which is almost statistically significant. Both variables are strongly correlated, which is not entirely surprising because more theaters mean more opportunities to see the movies. Because the Twitter data variables did not show a significant difference under this grouping, the results suggest that it is not necessary to differentiate movies by domestic or foreign origin to study their Twitter repercussion.
On the other hand, another grouping of the data has been considered: splitting the Twitter data of each movie’s hashtag into three different weeks, namely the week before, the week after and the second week after each release. After comparing them, all the count variables showed strong correlations, so it could be concluded that it would be better not to differentiate them between weeks and just analyze the total sum. That is not the case for the means and the sentiment analysis: only the retweets mean between the first two weeks and the replies and likes means of the third week have strong correlations, so if it is necessary to study the change of the means over time, it would be advisable to separate them by week.
A final approach has been taken in this paper: comparing movies data with their Twitter hashtag data in general. Some strong correlations have appeared, raising interesting conclusions. First, opening grosses are strongly and positively correlated with quote, reply, retweet and like counts, but not with tweet count, so it seems that a larger audience has motivated higher engagement and amplification rates. Opening theaters are strongly and positively correlated with tweet, quote, reply and like counts, but not with retweets, so it appears that more locations lead to higher conversation and engagement rates. Tweet count is strongly and positively correlated with quote, reply and like counts, but not with retweets, so a high conversation rate could be leading to a high engagement rate, and/or vice versa, as each can feed back into the other. Reply and like means are strongly correlated, which could indicate that when writing about movies, users usually use both Twitter tools together. Finally, the average rating of the movie and the sentiment analysis of the tweets are not strongly correlated with any other variable, nor with each other, a relationship that would have been quite interesting.
Although they have not been discussed in this study, there are outliers in most of the variables, and because they are mostly desirable from a marketing perspective (e.g., a high number of retweets), they could be separated into a different sample and motivate a separate analysis. That could lead to studying their characteristics and seeing whether it is possible to predict them.
This paper has analyzed movies data in light of the behavior of their Twitter hashtags, and it suggests that a much broader and deeper study could reveal new relationships between these and possibly other new variables. It would be interesting to carry out the trend analysis of movie hashtags over a broader time range and see whether most of them show a peak and how it evolves with time. Another interesting approach could be to include other social media platforms such as Facebook or Instagram in the analysis and see whether they differ from Twitter.
5. References
Ahmad, I. S., Abu Bakar, A., Yaakub, M. R., & Darwich, M. (2020). Sequel movie revenue prediction model based on sentiment analysis. Data Technologies and Applications, 54(5), 665–683. https://doi.org/10.1108/DTA-10-2019-0180/FULL/HTML
Asur, S., & Huberman, B. A. (2010). Predicting the Future With Social Media. International Conference on Web Intelligence and Intelligent Agent Technology, 492–499. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5616710&casa_token=-Td4D70eKFkAAAAA:UkLko6gomqgW7XHfL44kLeSD7lX7xqTDMj_Oe23UAlIs603RMRMwNYkRypcQ_WZeMcV4vqNsMQ&tag=1
Bruns, A., & Stieglitz, S. (2013). Towards more systematic Twitter analysis: metrics for tweeting activities. International Journal of Social Research Methodology: Theory and Practice, 16(2), 91–108.
Crisci, A., Grasso, V., Nesi, P., Pantaleo, G., Paoli, I., & Zaza, I. (2018). Predicting TV programme audience by using twitter based metrics. Multimedia Tools and Applications, 77(10), 12203–12232. https://doi.org/10.1007/S11042-017-4880-X/TABLES/9
Film Ratings - Motion Picture Association. (n.d.). Retrieved May 6, 2022, from https://www.motionpictures.org/film-ratings/
Twitter Developer Platform. (n.d.). GET /2/tweets/search/all. Docs. Retrieved May 9, 2022, from https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-all
Hossain, J., & Shams, M. bin. (2019). Trend Analysis with Twitter Hashtags.
Huang, J., Thornton, K. M., & Efthimiadis, E. N. (2010). Conversational Tagging in Twitter. Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, 173–178. http://www.tweetdeck.com/
Jain, V. (2013). Prediction of Movie Success using Sentiment Analysis of Tweets. The International Journal of Soft Computing and Software Engineering, 3(3). https://doi.org/10.7321/jscse.v3.n3.46
Kesharwani, M. A., Rakesh, M., Tech, B. M., Abdul, A. P. J., & Prof, A. (2017). Movie Rating Prediction based on Twitter Sentiment Analysis. Journal of Advanced Computing and Communication Technologies, 5(1), 2347–2804. https://apps.twitter.com/app/new.
Rahul Shah, D. (2021). Movie Stats: Sentiment Analysis of IMDB Reviews and Tweets of a Movie Using Naïve Bayes Classifier. International Journal of Scientific Research & Engineering Trends, 7(1), 363–365.
Sanguinet, M. E. (2016). Hashtags, tweets and movie receipts: Social media analytics in predicting box office hits.
Verma, A., Abhay, K., Singh, P., & Kanjilal, K. (2015). Knowledge Discovery and Twitter Sentiment Analysis: Mining Public Opinion and Studying its Correlation with Popularity of Indian Movies. International Journal of Management, 6(1), 686–696. http://www.iaeme.com/IJM.asp
Wang, Y., & Zheng, B. (2014). On macro and micro exploration of hashtag diffusion in Twitter. Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining: ASONAM 2014, 285–288. https://doi.org/10.1109/ASONAM.2014.6921598