Big Social Data Analytics in Football: Predicting Spectators and TV Ratings from Facebook Data

Nicolai H. Egebjerg1, Niklas Hedegaard1, Gerda Kuum1, Raghava Rao Mukkamala1 and Ravi Vatrapu1,2
1Centre for Business Data Analytics, Copenhagen Business School, Denmark
2Westerdals Oslo School of Arts, Comm & Tech, Norway
{rrm.itm, rv.itm}
Abstract—This paper explores the predictive power of big
social data with regard to football fans’ off-line and on-line
behaviours. We address the research question of to what extent
big social data from Facebook can predict the number
of spectators and TV ratings in the case of the Danish National
Football Association (DBU). The predictive model was built from
Facebook, match attendance, and TV ratings data sets from 2014-
2016. The best fit was a linear regression model with GLM coding.
Ultimately, the model did best when predicting the number of
spectators based on the Facebook activity during a match as well
as the activity from the last two weeks leading up to the match.
Furthermore, the data reveals that photos generate the most
activity on the national team’s page and, with videos running
at higher production costs, there might be some unexploited
potential for DBU to improve its social media marketing strategy.
Although data limitations are present, this research concludes
that predictive models based on big social data can indeed offer
important insights for companies to understand their customer
base and how to improve marketing strategies.
Index Terms—Big data, Big social media data, Danish National
Team, DBU, Facebook data, Football fans, Spectators, TV ratings.
A football match is quite an emotional event and being
a football team fan evokes a shared sense of emotional
attachment to the club, city, and/or country [1]. Fans exhibit
social and cultural attachment to clubs1. Dansk Boldspil Union
(The Danish Football Association) was formed in 1889 with
a purpose to promote ball sports, primarily cricket. The As-
sociation has, however, since then shifted its focus from other
ball sports to primarily focus on football. The organization
initially consisted of 86 clubs, including around 4,000 playing
members, but has since grown to represent 1,653 clubs and
335,459 members. DBU has thus been one of the main forces
in making football the most popular sport in Denmark. The
way the association promotes football on a national level is by
having the Danish Men’s National Football Team play matches
against other national teams. These matches can be differen-
tiated into three categories: World Cup qualifiers, European
Championship qualifiers, and friendly matches. The Danish
public can purchase tickets to watch those
matches at the stadium, or watch them live on television.
Matches since 2005 have been broadcast on different national
media channels, including Kanal 5, 6’eren, Kanal 9, TV 2,
TV3, TV3 Puls, and TV3+.
1 The Social And Community Value Of Football
A. Problem Formulation and Research Question
Since 2010 the Danish National Men’s Football Team has
faced serious branding issues. Its popularity among Danish
citizens has declined and ticket sales have decreased through-
out the last 6 years2. One of the main reasons for the low
popularity has been the football team’s unprofessional usage
of online and traditional media as a means to create a socio-
cultural connection between the team and its fans. As a result,
in 2014-16 the association went through radical management
changes: the previous men’s team head coach Morten Olsen
was replaced by Åge Hareide, and Claus Bretton-Meyer was
announced as the new CEO of DBU. The new CEO decided to
solve the low-popularity issue by changing the organizational
structure of the association and redefining the national team’s
brand values3. The new CEO of the DBU, Claus Bretton-
Meyer (2016) argues that the Danish fans have fallen asleep
to a point where only 16% of the Danish population consider
themselves either a ’big fan’ or a ’very big fan’ of the National
Team. In 2014, following the CEO succession, DBU changed
its marketing strategy by creating a new slogan: A Part of
Something Bigger. In addition, new initiatives were created to
increase the football teams’ (Men, Women, Under-21) presence
in online and traditional media. Our paper seeks to explore the
efficacy of these new initiatives with regard to social media
on spectators and TV ratings. Towards this end, this paper
addresses the general research question using the specific case
of DBU:
To what extent can big social data from Facebook
predict fan engagement in terms of spectators and
TV Ratings?
The remainder of the paper is organized as follows. Section
2 explains the conceptual framework. Section 3 presents re-
lated work and discusses relevant theories. Section 4 provides
a detailed description of the dataset and provides an overview
of the process and methods adopted for empirical analysis.
Section 5 presents the core empirical findings from the DBU
case. Section 6 provides an answer to the research question and
a discussion on limitations and implications for future research
and practice. Finally, Section 7 provides a short conclusion.
2 National Team has lost more than a third of its television viewership
3 Dansk fodbold er en del af noget større. Berlingske.
In order to generate demand and produce fan interest,
sports leagues justify a range of restrictions that resemble
cartels. Szymanski [2, p. 1153] argues that the justification
for restrictions can be reduced to three core claims: 1) In-
equality of resources leads to unequal competition, 2) fan
interest declines when outcomes become less uncertain, and
3) specific redistribution mechanisms produce more outcome
uncertainty. The second proposition is of particular interest to
this paper. There has been substantial research work in the
direction of predicting game attendance. Rottenberg [3] looked
at American baseball and argued that uncertainty of outcome is
necessary if the fan/consumer is to be willing to pay admission
to the game (p. 246). Schreyer, Schmidt and Torgler [4]
explored the role of Game Outcome Uncertainty (GOU) in
season ticket holders’ stadium attendance demand and found
a positive relationship. Szymanski [2] summarizes research
in this area and argues that there seems to be an emerging
consensus that demand for match tickets is highest when the
home team’s probability of winning is about twice that of the
visiting team, i.e., a probability of around 0.66 [5] [6].
However, Buraimo and Simmons [7] used TV viewing
figures to show that uncertainty of outcome does not have a
positive effect on television audience demand. Instead they
argue that “there has been a transition of preference for
uncertainty of outcome towards a preference for increased
talent” [7, p. 466]. What attracts spectators and TV viewers
is then sporting entertainment performed by superstars. This
paper will not dispute that GOU has an effect on the number
of spectators. However, it can be mediated by the importance
of the particular match. If uncertainty is said to produce
interesting matches then it can be argued that matches where
the stakes are low (i.e. friendlies) will have less interest and
fewer followers. The issue of whether the type of match has
an impact on the number of spectators and TV viewers leads
us to our first hypothesis:
H1: Matches with high importance (qualifiers) will result in
higher TV ratings and number of spectators than matches
with low importance (friendlies).
Other related research focuses on the relationship between
broadcasting and attendance. Forrest, Simmons and Szyman-
ski [8] studied the English Premier League, which is a cartel
of soccer teams that collectively sells the rights to broadcast
its matches. Despite considerable demand, the clubs agreed
to sell only a fraction of the broadcast rights (60 out of 380
matches played each season between 1992 and 2001). The
clubs argued that increased broadcasting would reduce the
number of spectators at matches and therefore reduce cartel
income. However, the authors found that broadcasting had “a
negligible effect on attendance and that additional broadcast
fees would be likely to exceed any plausible opportunity
cost” [8, p. 243]. If there is a positive correlation between
the number of spectators and TV ratings for this data sample
it becomes possible to use the two variables interchangeably
when answering the research question. Other relevant work
studied the relationship between TV ratings and Facebook data
with regard to events such as sports broadcasting [9] and talk
shows [10].
H2: There is a positive correlation between the number of
spectators and TV ratings.
A systematic review of predictive analytics with social
media data was conducted by [11]. Researchers have already
utilized big social data to predict stock market movements
(e.g. [12], [13]), to anticipate announcements of flu outbreaks [14], to forecast
revenues for movies ([15], [16]) and to predict election
outcomes [17]. Lee, Kim and Cha [16] used a generalized
Bass Model (GBM) that reflected both daily seasonality and
herd behavior to predict the sales patterns of motion pictures.
This is also an interesting model for this paper since football
matches might also experience daily seasonality (with higher
attendance at matches played on weekends and holidays) and
herd behavior.
H3: The match played after a match with a positive result
will experience herd behavior and thus have higher
attendance than a match played after a negative result.
H4: Matches played on weekdays will have fewer spectators
and lower TV ratings than matches played on weekends.
The underlying assumption for this research stream is
that social media actions such as tweeting, posting, liking,
commenting etc. are proxies for consumers’ attention to a
particular topic/brand/product and that “the shared digital
artefact that is persistent can create social influence” [18, p.
1]. Most related research relies on Twitter data instead of
Facebook data. The goal of this paper is to predict ticket
sales and TV ratings, which can also be understood as event
prediction. Owens and Shah [19] partnered with an events
company to demonstrate how levels of social media activity
can predict ticket sales. They found a 53% correlation between
social media activity and ticket sales and, furthermore, that
Facebook had the highest correlation with ticket sales (52%),
higher than Twitter (38%). There might also be a
difference between the different types of posts on social media
in general, and Facebook in particular, and the level of activity
they generate. Pletikosa and Michahelles [20] found that
different post characteristics had an effect on the interaction on
Facebook. They did not include videos, but found that photos
had the greatest level of engagement followed by statuses and
links. Since production value and narrative scope are higher for
videos, it would be fair to assume that videos will produce
more social media activity than other content. This leads to
the fifth hypothesis:
H5: Videos will generate more activity than pictures, which
in turn will generate more activity than status updates,
links and events.
Lassen et al. [18] demonstrated how Twitter data could
be used to predict iPhone sales. They developed a linear
regression model that transformed iPhone tweets into a prediction
of the quarterly iPhone sales. They built their analysis
on the AIDA (Attention, Interest, Desire and Action) and
Hierarchy of Effects models ([21], [22]) in order to understand
the relationship between users’ propensity to tweet and the
probability to purchase the product. This paper will follow
the same line of argument. Social media activity surrounding
Landsholdet is associated with all four stages of the AIDA
model and all six stages of the Hierarchy of Effects model.
Drawing on Asur and Huberman [15] as well as Lassen et
al. [18], this paper treats social data from Facebook as a proxy
for a user’s attention towards the object of analysis, which in
this case are matches played by the Danish National Team.
Facebook activity is not seen as belonging to a particular stage
of the AIDA or HoE models. Instead it is treated as “social
media manifestations of real-world activities” [18, p. 83] of
fans/consumers with respect to football matches. This leads to
our sixth and main hypothesis for this paper:
H6: Matches played during periods with high Facebook ac-
tivity will have more spectators and higher TV ratings
than matches played during periods with less Facebook activity.
A. Dataset Description
Two data sets were used in this paper: Facebook data
and match data. The first data set contained data from the
Danish National Team’s official Facebook page. The raw data
consisted of a little more than 2.1M data points where each
row is equivalent to an action on the Facebook page. The
data contains information on action type (whether it is a
post, comment or like), actor name and ID, timestamp, type
of post, and if relevant, links and text value for posts and
comments. The social data available ranges from 10/30/2014
to 11/10/2016 which covers 11 matches played during that
time period. The aggregated Facebook data was organized
into total posts, comments and likes for each match
over two-week, one-week and two-day windows before the match and during
the event, as shown in table II.
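
The windowed aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' SODATO/Tableau pipeline: the `(timestamp, action_type)` record layout, the `count_actions` helper, the sample actions and the kickoff time are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

def count_actions(actions, kickoff, window):
    """Count actions per type in the `window` that ends at `kickoff`."""
    start = kickoff - window
    counts = Counter()
    for timestamp, action_type in actions:
        if start <= timestamp < kickoff:
            counts[action_type] += 1
    return counts

# Hypothetical records; the real data set holds about 2.1M rows with more fields.
actions = [
    (datetime(2015, 11, 1, 12, 0), "post"),      # older than two weeks
    (datetime(2015, 11, 5, 9, 30), "like"),      # inside the two-week window only
    (datetime(2015, 11, 12, 18, 0), "comment"),  # inside the one-week window
    (datetime(2015, 11, 16, 20, 0), "like"),     # inside the two-day window
]
kickoff = datetime(2015, 11, 17, 20, 45)  # assumed kickoff time

two_weeks = count_actions(actions, kickoff, timedelta(weeks=2))
one_week = count_actions(actions, kickoff, timedelta(weeks=1))
two_days = count_actions(actions, kickoff, timedelta(days=2))
```

Each window ends at kickoff, so the shorter windows are subsets of the longer ones, which matches the way the totals in table II shrink from the two-week to the two-day rows.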
The second data set about matches contained information
such as date of the match, number of spectators, TV viewer rat-
ings and other control variables needed to test the hypotheses
such as the result, type of match and the broadcasting channel
as shown in table I. The data was collected for all home
games played between 2005 and 2016. In order to answer the
research question only data from 2014-2016 was necessary,
but the additional data provides some interesting insights and
is required to test the secondary hypotheses.
B. Data Analysis Process
The data analysis process is illustrated in figure 1. Al-
together 2,132,003 data entries from the Facebook page of
Landsholdet were collected using the tool SODATO [23]
and TV ratings were collected from TNS Gallup. The two
data sources were then combined using Tableau and SAS
Studio, tools that were later used for descriptive, visual and
predictive analytics in order to answer the research question
by hypothesis testing.

Date        Country  Type      Spectators  Delta spectators  Result  TV ratings  Week  Channel
2016/11/11  KZ       WC Q      18901       -1681             4-1                 45    CH5
2016/10/11  ME       WC Q      20582       -1213             0-1     650         41    CH5
2016/09/04  AM       WC Q      21795       13791             1-0     620         35    CH5
2016/08/31  LI       Friendly  8004        -1190             5-0     302         35    CH5
2016/03/24  IS       Friendly  9194        -26857            2-1     452         12    CH5
2015/11/17  SE       Euro Q    36051       17906             2-2     900         47    CH5
2015/10/11  FR       Friendly  18145       -17503            1-2     305         41    CH5
2015/09/04  AL       Euro Q    35648       4761              0-0     810         36    CH5
2015/06/13  RS       Euro Q    30887       21707             2-0     651         24    CH5
2015/06/08  ME       Friendly  9180        -1325             2-1     328         24    CH5
2015/03/25  USA      Friendly  10505       10505             3-2     458         13    CH5
Table I

4 WC Q: World Cup qualifier, Euro Q: Euro Cup Qualifier
5 Country codes KZ: Kazakhstan, ME: Montenegro, AM: Armenia, LI: Liechtenstein, IS: Iceland, SE: Sweden, FR: France, AL: Albania,
Figure 1. Data Analysis Process Diagram
For the prediction model in H6 different statistical models
were evaluated. The final choice was to use a multiple regression
model with GLM coding in SAS Studio. Control variables
were included based on findings from the previous hypotheses
(H1 and H4). Including match type and day of the match
improved the fit, resulting in an RMSE of 1,762 for
the number of spectators (compared to 8,230 without control
variables). Additionally, we tested whether past results could
work as a predictor of herd behaviour or a trend. However,
when this variable of past results was added to the model
it was no longer statistically significant. This is probably an
effect of too few observations in the sample.

                            Nov 16   Oct 16   Sep 16   Aug 16   Mar 16   Nov 15   Oct 15   Sep 15   Jun 15   Jun 15   Mar 15     Total
2 Weeks Before
  Posts                        N/A      256      304      148      124      398      280      220      216       90      172     2,208
  Comments                     N/A    6,913    1,481      308      765   24,230    5,529   14,133    9,617    2,438    4,933    70,347
  Likes                        N/A   42,154   80,822   31,414   46,876  147,073  103,893   55,779  100,610   44,116   34,536   687,273
1 Week Before
  Posts                        N/A      200      246      104      100      296      224      174      168       66      112     1,690
  Comments                     N/A    6,739    1,327      141      679   22,625    5,003   13,070    8,113    1,660    4,460    63,817
  Likes                        N/A   35,158   71,011   23,399   45,218  117,615   92,843   42,192   67,726   36,406   26,246   557,814
2 Days Before
  Posts                        N/A      116       76       58       42       84       44      102       54       24       60       660
  Comments                     N/A    6,363      139      138      404   14,568      789   12,743    6,538      218    3,822    45,722
  Likes                        N/A    6,921   18,712   16,827   18,726   35,783   12,986   28,957   20,222    7,708   15,080   181,922
3 Hours During The Match
  Posts                        N/A       74       38       34       30      102       32       50       40       54       52       506
  Comments                     N/A      595       72       18      169    1,961      496    1,280      750      236    1,064     6,641
  Likes                        N/A    2,980   25,487   13,384   20,281   11,098    5,822    8,050   68,064   23,584   35,674   214,424
Table II

                   Spectators                       TV Ratings
Model              Root MSE   p-value   R-square    Root MSE   p-value   R-square
2 Week Act         2746.46    0.0008    0.965       113423     0.0017    0.712
1 Week Act         2494.15    0.0005    0.971       113927     0.0018    0.710
2 Day Act          2974.37    0.0011    0.959       99278      0.0003    0.780
2 Week + During    1776.01    0.0006    0.989       118041     0.0054    0.712
1 Week + During    1762.82    0.0006    0.988       118575     0.0057    0.710
2 Day + During     2715.78    0.0031    0.973       101662     0.001     0.787
During             2567.79    0.0005    0.969       118588     0.0029    0.686
Table III

Finally, the inputs for the prediction model were:
Fw: total Facebook activity leading up to the matches, continuous variable (different windows were used: 2 weeks, 1 week and 2 days)
Fm: total Facebook activity during the match, continuous variable (3.5 hours before, 2 hours during and 0.5 hours after the match)
Mt: match type (categorical variable: WC Qualifier, Euro Qualifier and Friendly)
Sd: daily seasonality based on the day of the match (categorical variable: weekday or weekend)
Y: number of spectators/TV viewers.
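
The variable definitions above suggest a linear specification. The model equation itself is not reproduced in this text, so the form below is only a plausible reading under stated assumptions: the coefficient names, the error term and the absence of an explicit intercept are all illustrative, and the categorical variables Mt and Sd would enter through dummy coding.

```latex
Y = \beta_0 F_w + \beta_1 F_m + \beta_2 M_t + \beta_3 S_d + \varepsilon
```

In this reading the first two coefficients carry the contribution of Facebook activity before and during the match, while the remaining terms act as controls.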
The primary coefficients of interest are β0 and β1, which can
be interpreted as the contribution of social media activity
to the number of spectators or TV viewers that will watch
the match. However, due to the introduction of the control
variables these coefficients may be negative although they
correlate positively with the dependent variable when standing
alone. All the different test combinations of the model effects
are presented in table III. Based on the RMSE, the best model
for predicting the number of spectators used Facebook data
from the 1 week leading up to the match, as well as the activity during
the match. Alternatively, in order to predict TV ratings, using
Facebook data from 2 days prior to the match proved to be
the best fit as shown in table III.
In this section, we present the results of our data analysis
and discuss whether the hypotheses were confirmed or
rejected. One of the important findings is that most
of the activity on the national team’s Facebook page is generated around
matches. In particular, the temporal distribution of various Facebook
actions (such as posts, post likes, comments and so on)
showed many peaks before and during the match
events. Moreover, there is also clear visual coherence
between the Facebook actions and the number of spectators and
TV ratings. Figure 2 shows one such distribution, where the
distribution of Facebook posts by DBU versus spectators and
TV ratings is plotted.
Figure 2. Distribution of Facebook posts vs. spectators and TV ratings
(a) Spectators (b) TV Ratings
Figure 3. Spectators and TV-ratings for each match type
1) Hypothesis H1: Hypothesis 1 was tested on both the
match data gathered from 2005-2016 and the sample
data. The visual analytics clearly showed that qualifiers had
both higher TV ratings and more spectators, as shown in figure 3.
When tested in SAS, there is a significant correlation for both TV
ratings and number of spectators for the entirety of the match
data, with P-values of 0.0126 and 0.0001 respectively. Here
we only distinguished between qualifiers and friendlies. In the
sample data, the types of qualifier were distinguished from each
other. The correlation still holds, with Euro qualifiers
compared to friendlies having P-values of 0.0001 for spectators
and 0.0001 for TV ratings. World Cup (WC) qualifiers
show the same pattern with a P-value of 0.0113 for spectators.
The correlation between WC qualifiers and friendlies has a
P-value of 0.0525, just above the 0.05 threshold. However, seen together with the entire
match data this would be statistically significant. Therefore,
H1 is confirmed, and matches with higher importance will
have higher TV ratings and number of spectators.
2) Hypothesis H2: A correlation analysis of the number
of TV viewers and number of spectators was done in SAS
Studio. A visual representation of this can be seen in figure 4.
It is difficult to see with the naked eye whether there is a
correlation or not, so a calculation was needed. The
analysis returned Total spectators = 0.0170159 * TV rating
+ 6486.7 at a significance level of p < 0.0001 and therefore
there exists a clear correlation between the two. Thus H2 is
confirmed. This means that when the number of spectators
grows, so does the number of people watching on TV, and
vice versa.
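
As an illustration of this kind of correlation analysis, the sketch below computes a Pearson correlation and a least-squares line from the spectator and TV figures listed in Table I. It is not the authors' SAS analysis: it uses only the ten Table I rows that have a TV value, in the units printed there, so its coefficients should not be expected to match the equation reported above. The helper name `pearson_and_line` is hypothetical.

```python
from math import sqrt

# (spectators, TV rating) pairs copied from Table I; the 2016/11/11 row has no TV value.
matches = [
    (20582, 650), (21795, 620), (8004, 302), (9194, 452), (36051, 900),
    (18145, 305), (35648, 810), (30887, 651), (9180, 328), (10505, 458),
]

def pearson_and_line(pairs):
    """Return (r, slope, intercept) for spectators = slope * TV rating + intercept."""
    n = len(pairs)
    ys = [spectators for spectators, _ in pairs]
    xs = [tv for _, tv in pairs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return r, slope, intercept

r, slope, intercept = pearson_and_line(matches)
print(f"r = {r:.3f}, spectators = {slope:.2f} * TV rating {intercept:+.1f}")
```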
Figure 4. TV ratings versus spectators
3) Hypothesis H3: In order to test whether matches
played after a positive result experienced herd behaviour, the
authors had to calculate a delta spectators, i.e. the change in
spectators from match to match, and a fixed result. The fixed
result is calculated by taking the difference in goals, e.g. a 3-1
defeat is calculated as a -2. As shown in figure 5, there seems
to be an outlier in the upper left corner. This match, where the
fixed result is -4, resulted in an increase in spectators for the
next home game of more than 20,000. The correlation between
the two is however still significant with p = 0.049 < 0.05
and with the fitted equation
$\Delta\mathrm{Spectators}_{\mathrm{no\ friendly}} = 1896.79 \cdot \mathrm{Result}_{\mathrm{fixed}} - 1963.86$.
When the outlier is removed
the correlation becomes stronger with p = 0.0006 < 0.01.
Thus H3 is confirmed. This means that the result of the game played before
a home match will have an impact on the number of spectators
for the next home game.
Figure 5. Delta spectators vs previous match’s result
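
The two derived variables used for H3 can be sketched as below. This is an illustrative reading, not the authors' code: it assumes that scores are written with the Danish team's goals first, uses the five most recent rows of Table I, and keeps friendlies in, whereas the subscript in the fitted equation suggests friendlies were excluded there. The helper names are hypothetical.

```python
def fixed_result(score):
    """Goal difference from a 'for-against' score string (goal ordering is an assumption)."""
    goals_for, goals_against = (int(part) for part in score.split("-"))
    return goals_for - goals_against

def delta_spectators(attendance):
    """Change in attendance relative to the previous home match (input is newest first)."""
    return [newer - older for newer, older in zip(attendance, attendance[1:])]

# Five most recent home matches from Table I, newest first.
attendance = [18901, 20582, 21795, 8004, 9194]
results = ["4-1", "0-1", "1-0", "5-0", "2-1"]

deltas = delta_spectators(attendance)
# Each delta is paired with the result of the match played before it.
previous_results = [fixed_result(score) for score in results[1:]]
pairs = list(zip(previous_results, deltas))
```

The computed deltas reproduce the delta column of Table I for these rows, which is a quick check that the column is a match-to-match difference.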
4) Hypothesis H4: The overall day-wise distribution of TV
ratings and total spectators is shown in figure 6. In order
to investigate H4 the days of the week were binary coded.
Monday through Thursday were coded as 0 for weekday, and
Friday to Sunday as 1 for weekend. Tableau was used to
analyze visually whether H4 was true. The weekend matches
have a much higher average number of spectators (25,102 vs. 19,777).
However, weekday matches seem to have a higher number
of viewers on TV, with 901,725 vs. 865,048 on average. This
suggests that people either watch something else during the
weekend or spend their time on things other than watching TV.
Thus H4 is only partially confirmed.
Figure 6. Day wise distribution of TV ratings and total spectators
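
The binary weekday/weekend coding described for H4 can be expressed in a few lines. This is a minimal sketch with a hypothetical helper name, applied to three match dates from Table I.

```python
from datetime import date

def is_weekend(match_date):
    """Return 1 for Friday to Sunday and 0 for Monday to Thursday."""
    return 1 if match_date.weekday() >= 4 else 0  # Monday is 0, Friday is 4

match_dates = [date(2016, 11, 11), date(2016, 10, 11), date(2016, 9, 4)]
codes = [is_weekend(d) for d in match_dates]
```

Note that Friday counts as weekend under this coding, which differs from the usual Saturday/Sunday definition.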
5) Hypothesis H5: H5 was tested by taking all activity on
the different kinds of posts and then comparing the average
activity, as shown in figure 7. It shows that the post types
experiencing the most activity are photos, followed by statuses
and videos. Our result is in accordance with the findings of
[24], which showed that photos have high engagement potential
among all Facebook post types. We further analyzed
the correlation between the different post types, which
shows that photos will always receive more activity on average
than all other post types except for statuses. There were no
other correlations between the post types. Thus H5 is only
partially confirmed. The results of H5 suggest that consumers
do not take the time to watch videos on Facebook. The extra
time and cost it takes for Landsholdet to produce videos is
thus not worthwhile and it is suggested that they decrease
the number of videos on their Facebook wall. In any case
this result suggests that it could be useful to change the way
that Landsholdet does videos on Facebook. That is, they might
have to change the content of the videos or the length of them.
At the same time, since photos are vastly superior compared to
other post types, it is suggested that they increase the number
of photo posts on their Facebook wall in order to create extra activity.
The suggestions here raise a couple of questions. When is
enough? When will photos stop creating extra activity, and
are they only superior at the moment because they enter into
a mix of different post types? The overall marketing strategy
was not studied in this paper - and the mix of post types that
Landsholdet uses might be a deliberate move in their branding
Figure 7. Day wise distribution of TV ratings and total spectators
6) Hypothesis H6: Hypothesis H6 is split into two sub-hypotheses
(H6a and H6b) to predict the number of spectators
and the TV ratings respectively. As shown in table III, different
models were tested in order to find the ones most accurately
predicting the number of spectators and the TV ratings. The
most accurate model for spectators was the one with all the
Facebook activity from 1 week leading up to and the activity
during a match. The one best predicting TV ratings included
Facebook activity from the two days before a match. In both
cases increased Facebook activity has a positive effect on the
dependent variables. Thus both H6a and H6b are confirmed.
Figure 8. Predictive model for spectators with 1 week + during forecasting
The results of the predictive model of total spectators and
TV ratings can be seen in figures 8 and 9 respectively, where
the red lines indicate predicted values and dark blue lines
indicate actual values. Moreover, the multiple linear regression
model results for spectators and TV ratings are presented in
table IV. For spectators, the multiple linear regression
model has a high adjusted R-square (0.97),
which indicates a good fit, as also seen in figure 8.
For TV ratings, the model results are reasonably satisfactory
(adjusted R-square 0.71) with a fair fit, as can
also be seen in figure 9.
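
The relation between R-square and adjusted R-square can be checked with the standard formula. The sketch below is an illustration under assumptions that the text does not state explicitly: 10 matches in the sample and 5 estimated slope terms for the spectators model (two activity variables, two match-type dummies and one weekend dummy). Under those assumptions the R-square of 0.9887 reported for spectators in table IV gives a value close to the reported adjusted figure.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Standard adjusted R-square for a model with an intercept."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

# n_obs and n_predictors are assumptions, not values stated in the paper.
adj = adjusted_r_square(0.9887, n_obs=10, n_predictors=5)
print(round(adj, 4))
```

With so few observations per estimated parameter, the adjustment is large relative to the unexplained variance, which is one reason the small sample is listed as a limitation.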
Figure 9. Predictive model for TV ratings with 2-day forecasting model

                  Spectators    TV Ratings
Root MSE          1762.81627    99278
Dependent Mean    19999         577444
R-Square          0.9887        0.78
Adj R-Sq          0.9745        0.7123
AIC               164.33045     438.34699
AICC              220.33045     445.98335
SBC               154.14596     422.79885
Table IV

This paper investigated the consequences of the specific case
of DBU’s new digital media strategy in terms of total number
of spectators and TV ratings based on user engagement on
DBU’s official Facebook wall.
First, we found that DBU can improve their digital media
strategy by making fewer video posts on Facebook and instead
posting more photos, as they carry more engagement potential than
videos. Our finding is contrary to the conventional wisdom that,
since football is an active field, game videos must
be more appealing to people than photos. That said, our
finding also confirms the result of [24] that the Facebook post type of
photo carries high engagement potential.
Second, we also found that social media data is indeed able
to predict the number of spectators and the TV ratings of
football matches fairly accurately. Unlike previous work, this
was done using neither the Game Outcome Uncertainty (GOU)
nor the quality of the players. This suggests that GOU, as
Buraimo & Simmons [7] observed, is not the only variable
affecting spectator attendance and TV ratings. However, both
of the variables mentioned here could very likely strengthen the
predictive models of this paper. There are many variables
influencing demand for football tickets and only a handful of them
were included in this study. A few that were not included
here are weather, GOU and the star quality of the players.
In addition, it would have been useful to include business
data showing continuous sales and information about season
ticket holders. This would have allowed for more accurate
analytics of how Facebook data influence sales of companies.
However, this would also raise ethical issues as to how closely
people purchasing tickets and season ticket holders should
be monitored. The models applied here only used the total
activity without ever including information about actors on
the social medium. Thus one could argue that the privacy of
the individual is more secure in this version. Third, until now
only limited research has focused on whether Facebook social
data can predict sales patterns with previous research mostly
focusing on Twitter data. Future research would have to look
into other areas. However, as it stands, it seems likely that
companies can influence their own sales by posting content
about their products on social media.
Fourth, this study could have included textual analysis in
order to investigate the sentiment towards Landsholdet. This
would have provided a more precise indication of the mood
of the posts and given additional information for predicting
the number of spectators. This would also in some cases
have indicated who actually attends or watches the games
from their sofa. However, once more the ethical issue of how
closely these individuals should be monitored resurfaces. In
any case, future research into the predictive capabilities of
Facebook data would benefit from including some sort of
textual analysis.
Overall, the findings in this paper indicate that football clubs
should increase their social media presence and make sure that
they post content on a continuous basis since it creates demand
for tickets irrespective of how likely an outcome is.
A. Recommendations for Case Company
The outcome of this study indicates that DBU at the moment
does not utilize big data in their marketing strategy. If they
did, they would have known about the diminishing returns on the
time invested in creating videos. Using big data could also help
them recognize which actors are the biggest fans and thus aid
them in their communication towards those fans. However, it might
be that at present they do not have the resources to do so.
In the short-term DBU should investigate the connection
between fan activity on their Facebook page and ticket sales.
Analyzing continuous sales and social media activity together
might provide them with even better tools to understand which
types of posts and content drive sales and fan interest. In
the mid- to long-term DBU should continue to work on their
brand image. It is now clear that social media is, and should
be, part of their marketing strategy. Generating content at the
right time, targeted towards the right fan base, will eventually
help them to increase the number of loyal fans. As the sample
data illustrates, most of the activity on their Facebook page
is generated around matches as shown in figure 2. Since
the national team only plays 5-6 matches a year it becomes
necessary to focus on the season breaks and silent periods
between matches.
However, there might be diminishing returns to sharing
content online, which is something to be cautious about. An
organisation like DBU must be careful not to create posts
that could be understood as clickbait, since the goal of clickbait
sites is often high traffic and low engagement6, while selling
tickets requires high engagement. Instead, DBU should follow
other companies that use machine learning and sophisticated
recommendation algorithms that identify potential customers
and send them messages such as “other fans of Danish football
bought tickets for this game” at key points along the decision
journey. A study by McKinsey found that these algorithms
are highly effective at converting customers, though with an
important limitation: “the influence ... can be as much as
75 percent lower if messages aren’t highly personalized and
targeted” [25].
6 The dirty secrets of clickbait. This post will blow your mind!
B. Limitations of the Study
This paper has three limitations. First, social data was only
available for a little more than two years. Having data for more
years could have made it possible to see the effects of the new
marketing strategy launched after the appointment of Bretton-
Meyer as CEO. Second, data on ongoing sales and season
ticket holders was not available. The latter has been the primary
focus of DBU for the past two years. Third and last, the TV
ratings predictive model could have been improved had there
been a control variable for the pull towards other channels
during match time.
This paper illustrates the increasing value of big social data.
By using data fetched from the Danish national football team’s
Facebook page, it was possible to set up a predictive model
for the number of spectators and TV viewers. It is a fairly
simple model relying on only two other control variables:
match type and day of the match. The spectators model in
particular performed well, as illustrated by the nearly identical
graphed lines (figure 8). However, since there were various
data and resource limitations, the models could be improved
even further. These limitations include the fact that very few
matches were played during the sample period, that no distinction
was made between positive, neutral, and negative posts, that no data was
available for ongoing sales or season ticket holders, and that
other channels were not taken into consideration for the TV ratings
model.
Assuming increased activity leads to more spectators and
higher TV ratings (the sample shows mixed results), DBU can
improve their social media marketing strategy by getting
better returns on their video posts. Although production costs
for videos are higher, it is currently their photo posts that
generate the most activity among fans. By investigating the
relationship between the activity on their Facebook page and
ticket sales, they should be able to verify whether increased activity
in fact leads to increased sales. Furthermore, by posting
improved content more often, also between matches, while
avoiding clickbait, they should see an increase in season
ticket holders, which is their primary concern for the future.
Future related work should gather more data and apply sentiment
analysis to see how this would affect the predictive model.
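The kind of model summarized in this conclusion (Facebook activity plus match type and day of the match as control variables, fitted by linear regression) can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification used in the study: the data are synthetic and made up, the variable names, the two-level categories, and the baseline levels are hypothetical, and the small least-squares solver is hand-written only to keep the sketch free of dependencies.

```python
# Synthetic, made-up data for illustration only (NOT the DBU data set).
# One entry per match; activity is in thousands of Facebook interactions.
activity = [1.2, 3.4, 0.8, 5.1, 2.6, 4.3, 1.5, 3.9]
match_type = ["friendly", "qualifier", "friendly", "qualifier",
              "friendly", "qualifier", "friendly", "qualifier"]
match_day = ["weekday", "weekend", "weekday", "weekend",
             "weekend", "weekday", "weekend", "weekday"]
spectators = [14000.0, 31000.0, 12500.0, 36000.0,
              19000.0, 33000.0, 16000.0, 30500.0]


def dummy(values, baseline):
    """Indicator coding: 0 for the baseline level, 1 for the other level."""
    return [0.0 if v == baseline else 1.0 for v in values]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Design matrix rows: intercept, activity, match-type indicator, match-day indicator.
type_ind = dummy(match_type, baseline="friendly")
day_ind = dummy(match_day, baseline="weekday")
rows = [[1.0, a, t, d] for a, t, d in zip(activity, type_ind, day_ind)]

coef = fit_ols(rows, spectators)
predicted = [sum(b * x for b, x in zip(coef, row)) for row in rows]
residuals = [y - p for y, p in zip(spectators, predicted)]

# R^2: share of the variance in spectators explained by the fitted model.
mean_y = sum(spectators) / len(spectators)
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in spectators)
r_squared = 1.0 - ss_res / ss_tot

print("coefficients:", [round(b, 2) for b in coef])
print("R^2:", round(r_squared, 3))
```

With real match data, the same design matrix could be passed to an established statistics package instead of the hand-written solver. With as few matches as in the sample period, any fit of this kind should be read with caution.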
[1] I. Abosag, S. Roper, and D. Hind, “Examining the relationship between
brand emotion and brand extension among supporters of professional
football clubs,” European Journal of Marketing, vol. 46, no. 9,
pp. 1233–1251, 2012.
[2] S. Szymanski, “The economic design of sporting contests,” Journal of
Economic Literature, vol. 41, no. 4, pp. 1137–1187, 2003.
[3] S. Rottenberg, “The baseball players’ labor market,” Journal of Political
Economy, vol. 64, no. 3, pp. 242–258, 1956.
[4] D. Schreyer, S. L. Schmidt, and B. Torgler, “Against all odds? Exploring
the role of game outcome uncertainty in season ticket holders’ stadium
attendance demand,” Journal of Economic Psychology, vol. 56,
pp. 192–217, 2016.
[5] G. Knowles, K. Sherony, and M. Haupert, “The demand for major league
baseball: A test of the uncertainty of outcome hypothesis,” The American
Economist, vol. 36, no. 2, pp. 72–80, 1992.
[6] D. Forrest and R. Simmons, “Outcome uncertainty and attendance
demand in sport: the case of English soccer,” Journal of the Royal
Statistical Society: Series D (The Statistician), vol. 51, no. 2,
pp. 229–241, 2002.
[7] B. Buraimo and R. Simmons, “Uncertainty of outcome or star quality?
Television audience demand for English Premier League football,”
International Journal of the Economics of Business, vol. 22, no. 3,
pp. 449–469, 2015.
[8] D. Forrest, R. Simmons, and S. Szymanski, “Broadcasting, attendance
and the inefficiency of cartels,” Review of Industrial Organization,
vol. 24, no. 3, pp. 243–265, 2004.
[9] A. Hennig, A.-S. Åmodt, H. Hernes, H. M. Nygårdsmoen, P. A. Larsen,
R. R. Mukkamala, B. Flesch, A. Hussain, and R. Vatrapu, “Big social
data analytics of changes in consumer behaviour and opinion of a
TV broadcaster,” in Big Data (Big Data), 2016 IEEE International
Conference on. IEEE, 2016, pp. 3839–3848.
[10] H. H. Larsen, J. M. Forsberg, S. V. Hemstad, R. R. Mukkamala,
A. Hussain, and R. Vatrapu, “TV ratings vs. social media engagement:
Big social data analytics of the Scandinavian TV talk show Skavlan,” in
Big Data (Big Data), 2016 IEEE International Conference on. IEEE,
2016, pp. 3849–3858.
[11] N. B. Lassen, L. la Cour, and R. Vatrapu, “Predictive analytics with
social media data,” The SAGE Handbook of Social Media Research
Methods, p. 328, 2017.
[12] Y. Karabulut, “Can Facebook predict stock market activity?” AFA 2013
San Diego Meetings, 2013.
[13] J. Bollen, H. Mao, and X. Zeng, “Twitter mood predicts the stock
market,” Journal of Computational Science, vol. 2, no. 1, pp. 1–8, 2011.
[14] V. Lampos and N. Cristianini, “Tracking the flu pandemic by monitoring
the social web,” in Cognitive Information Processing (CIP), 2010 2nd
International Workshop on. IEEE, 2010, pp. 411–416.
[15] S. Asur and B. Huberman, “Predicting the future with social media,”
in Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010
IEEE/WIC/ACM International Conference on, vol. 1, 2010,
pp. 492–499.
[16] Y. Lee, S.-H. Kim, and K. C. Cha, “A generalized Bass model for
predicting the sales patterns of motion pictures having seasonality
and herd behavior,” Journal of Global Scholars of Marketing Science,
vol. 22, no. 4, pp. 310–326, 2012.
[17] D. Gayo-Avello, “A meta-analysis of state-of-the-art electoral prediction
from Twitter data,” Soc. Sci. Comput. Rev., vol. 31, no. 6, pp. 649–679,
Dec. 2013.
[18] N. B. Lassen, R. Madsen, and R. Vatrapu, “Predicting iPhone sales from
iPhone tweets,” in Enterprise Distributed Object Computing Conference
(EDOC), 2014 IEEE 18th International. IEEE, 2014, pp. 81–90.
[19] J. Owens and S. Shah, “Webinar: How social media activity
predicts concert ticket sales,”
face-webinar-social-to-sales-sept-2014, 2014.
[20] I. P. Cvijikj and F. Michahelles, “A case study of the effects of moderator
posts within a Facebook brand page,” in International Conference on
Social Informatics. Springer, 2011, pp. 161–170.
[21] H. Li and J. D. Leckenby, “Examining the effectiveness of internet
advertising formats,” Internet Advertising: Theory and Research,
pp. 203–224, 2007.
[22] R. J. Lavidge and G. A. Steiner, “A model for predictive measurements
of advertising effectiveness,” Journal of Marketing, vol. 25, no. 6,
pp. 59–62, 1961. [Online]. Available:
[23] A. Hussain and R. Vatrapu, “Social data analytics tool (SODATO),” in
DESRIST-2014 Conference (in press), ser. Lecture Notes in Computer
Science (LNCS). Springer, 2014.
[24] N. Straton, K. Hansen, R. R. Mukkamala, A. Hussain, T.-M. Gronli,
H. Langberg, and R. Vatrapu, “Big social data analytics for public
health: Facebook engagement and performance,” in 2016 IEEE 18th
International Conference on e-Health Networking, Applications and
Services (Healthcom). IEEE, 2016, pp. 1–6.
[25] J. Bughin, “Getting a sharper picture of social media’s influence,”
McKinsey Quarterly, July, 2015.