TrendLearner: Early Prediction of Popularity Trends of User Generated Content
Flavio Figueiredo (corresponding author, flaviov@dcc.ufmg.br), Jussara M. Almeida (jussara@dcc.ufmg.br), Marcos A. Gonçalves (mgoncalv@dcc.ufmg.br), Fabricio Benevenuto (fabricio@dcc.ufmg.br)
Department of Computer Science, Universidade Federal de Minas Gerais
Av. Antônio Carlos 6627, CEP 31270-010, Belo Horizonte - MG, Brazil. Phone: +55 (31) 3409-7541, Fax: +55 (31) 3409-5858
Abstract
Predicting the popularity of user generated content (UGC) is a valuable task to content providers, advertisers, as well as social media researchers. However, it is also a challenging task due to the plethora of factors that affect content popularity in social systems. Here, we focus on the problem of predicting the popularity trend of a piece of UGC (object) as early as possible. Unlike previous work, we explicitly address the inherent tradeoff between prediction accuracy and remaining interest in the object after prediction, since, to be useful, accurate predictions should be made before interest has been exhausted. Given the heterogeneity in popularity dynamics across objects, this tradeoff has to be solved on a per-object basis, making the prediction task harder. We tackle this problem with a novel two-step learning approach in which we: (1) extract popularity trends from previously uploaded objects, and then (2) predict trends for newly uploaded content. Our results for YouTube datasets show that our classification effectiveness, captured by F1 scores, is 38% better than the baseline approaches. Moreover, we achieve these results with up to 68% of the views still remaining for 50% or 21% of the videos, depending on the dataset.
Keywords: popularity, trends, classification, social media, ugc, prediction
1. Introduction
The success of Internet applications based on user generated content (UGC), such as YouTube, Flickr, and Twitter, has motivated questions such as: How does content popularity evolve over time? What is the potential popularity a piece of content will achieve after a given time period? How can we predict the popularity evolution of a particular piece of UGC? For example, from a system perspective, accurate popularity predictions can be exploited to build more cost-effective content organization and delivery platforms (e.g., caching systems, CDNs). They can also drive the design of better analytic tools, a major segment nowadays [20, 34], while online advertisers may benefit from them to more effectively place contextual advertisements. From a social perspective, understanding issues related to popularity prediction can be used to better understand the human dynamics of consumption. Moreover, being able to predict popularity in an automated way is crucial for marketing campaigns (e.g., those created by activists or politicians), which increasingly often use the Web to influence public opinion.
Challenges: However, predicting the popularity of a piece of content, here referred to as an object, in a social system is a very challenging task. This is mostly due to the various phenomena affecting the popularity of social media content, which were observed on the datasets we use (as well as others) [11, 22, 33], and to the diminishing interest in objects over time, which implies that popularity predictions must be timely to capture user interest and be useful in real-world settings. Both challenges can be summarized as follows:
1. Due to the ease with which UGC can be created, many factors can affect an object's popularity. Such factors include, for instance, the object's content, the social context in which it is inserted (e.g., the social neighborhood or influence zone of the object's creator), the mechanisms used to access the content (e.g., searching, recommendation, top lists), or even an external factor, such as a hyperlink to the content in a popular blog or website. These factors can cause spikes in the surge of interest in objects, as well as information propagation cascades which affect the popularity trends of objects.
2. To be useful in a real scenario, a popularity prediction approach must identify popularity trends before the user interest in the object has severely diminished. To illustrate this point, Figure 1 shows the popularity evolution of two YouTube videos: the video on the left receives more than 80% (shaded region) of all views received during its lifespan in the first 300 days since upload, whereas the other video receives only about half of its total views in the same time frame. If we were to monitor each video for 300 days, most potential views of the first video would be lost. In other words, not all objects require the same monitoring period, as assumed by previous work, to produce accurate predictions: for some objects, the prediction can be made earlier. Thus, the tradeoff should be solved on a per-object basis, which implies that determining the duration of the monitoring period that leads to a good solution of the tradeoff for each object is part of the problem.
[Figure 1: Popularity Evolution of Two YouTube Videos. Two panels plot views (in thousands) per time window (days); the shaded regions cover 87.27% of the left video's total views and 51.14% of the right video's.]
These challenges set UGC objects apart from more traditional web content. For instance, news media [3] tends to have clear definitions of monitoring periods, say predicting the popularity of news after one day using information from the first hour after upload. This is mostly due to the timely nature of the content, which is reflected in the popularity trends usually followed by news media [11]: interest is usually concentrated in a peak window (e.g., a day) and dies out rather quickly. Thus, mindful of the challenges above, we here tackle the problem of UGC popularity trend prediction. That is, we focus on the (hard) task of predicting popularity trends. Trend prediction can help determine, for example, whether an object will follow a viral pattern (e.g., Internet memes) or will continue to gain attention over time (e.g., music videos of popular artists). Moreover, we shall also show that, by knowing popularity trends beforehand, we can improve the accuracy of models for predicting popularity measures (e.g., hits). Thus, by focusing on predicting trends, we fill a gap in current research, since no previous effort has effectively predicted the popularity trend of UGC taking into account challenges (1) and (2).
We should stress that one key aspect distinguishes our work from previous efforts to predict popularity [1, 3, 19, 25, 27, 32]: we explicitly address the inherent tradeoff between prediction accuracy and how early the prediction is made, assessed in terms of the remaining interest in the content after prediction. All previous popularity prediction efforts considered a fixed monitoring period for all objects, given as input. We refer to this problem as early prediction. (We also point out that an earlier, much simpler variant of our approach, which did not focus on early predictions, placed first on two out of three prediction tasks of the 2014 ECML/PKDD Predictive Analytics Challenge for News Content [10], reflecting the quality and effectiveness of our proposal.)
In terms of applications, knowing that an object will be popular early on can help advertisers plan out specific revenue models [13]. Such knowledge can also help with geographic content sharding [8] for better content delivery. On the other hand, being aware, as early as possible, that an object will not be popular at all allows low access content to be tiered down to lower latency servers/geographic regions, whereas advertisers can use this knowledge to avoid bidding for ads in such content (since it will not generate revenue). Another example is search engine rankings based on predictions [26]. Knowing that a piece of content is becoming popular can help in generating better rankings for user queries. However, if we have evidence (based on the trend and remaining interest) that such content is losing popularity (e.g., timely content in which users lose interest over time), it may be of less interest to the user. Finally, early prediction is of utmost importance to content producers: knowing whether a piece of content will follow a certain trend can help in their promotion strategies and in the creation of new content.
TrendLearner: We tackle this problem with a novel two-step combined learning approach. First, we identify popularity trends, expressed by popularity time series, from previously uploaded objects. Then, we combine novel time series classification algorithms with object features to predict the trends of new objects. This approach is motivated by the intuition that it might be easier to identify the popularity trend of an object if one has a set of possible trends as a basis for comparison. More importantly, we propose a new trend classification approach, namely TrendLearner, that tackles the aforementioned tradeoff between prediction accuracy and remaining interest after prediction on a per-object basis. The idea here is to monitor newly uploaded content on an online basis to determine, for each monitored object, the earliest point in time when prediction confidence is deemed to be good enough (defined by input parameters), producing, as output, the probabilities of each object belonging to each class (trend). Moreover, unlike previous work, TrendLearner also combines the results from this classifier (i.e., the probabilities) with a set of object related features [11], such as category and incoming links, building an ensemble learner.
To evaluate our method, we use, in addition to traditional classification metrics (e.g., Micro/Macro F1), two newly defined metrics specific to the problem: (1) remaining interest (RI), defined as the fraction of all views (up to a certain date) that remain after the prediction, and (2) the correlation between the total views and the remaining interest. While the first metric measures the potential future viewership of the objects, the second one estimates whether there is any bias towards more/less popular objects.
In sum, our main contributions include a novel popularity trend classification method that considers multiple trends, called TrendLearner. The use of TrendLearner can improve the prediction of popularity metrics (e.g., number of views). Improvements over the state-of-the-art method are significant, being at least around 33%.
The rest of this article is organized as follows. The next section discusses related work. We state our target problem in Section 3, and present our approach to solve it in Section 4. We introduce the metrics and datasets used to evaluate our approach in Section 5. Our main experimental results are discussed in Section 6. Section 7 offers conclusions and directions for future work.
2. Related Work
Popularity evolution of online content has been the target of several studies. Several previous efforts aimed at developing models to predict the popularity of a piece of content at a given future date. In [19], the authors developed stochastic user behavior models to predict the popularity of Digg's stories based on early user reactions to new content and aspects of the website design. Such models are very specific to Digg's features, and are not general enough for different kinds of UGC. Szabo and Huberman proposed a linear regression method to predict the popularity of YouTube and Digg content from early measures of user accesses [27]. This method has been recently extended and improved with the use of multiple features [25]. Castillo et al. [3] used an approach similar to [27] to predict the popularity of news content.
Out of these previous efforts, most authors focused on variations of linear regression based methods to predict UGC popularity [3, 18, 25]. In the context of search engines, Radinsky et al. proposed Holt-Winters linear models to predict the future popularity, seasonality and bursty behavior of queries [26]. The models capture the behavior of a population of users searching the Web for a specific query, and are trained for each individual time series. We note that none of these prior efforts focused on the problem of predicting popularity trends. In particular, those focused on UGC popularity prediction assumed a fixed monitoring period for all objects, given as input, and did not explore the tradeoff between prediction accuracy and remaining views after prediction.
Other methods exploit epidemic modeling of UGC popularity evolution. Focusing on content propagation within an OSN, Li et al. addressed video popularity prediction within a single (external) OSN (e.g., Facebook) [21]. Similarly, Matsubara et al. [22] created a unifying epidemic model for the trends usually found in UGC. Such a model can be used for tail forecasting, that is, predictions after the peak time window. Again, none of these methods focuses on trend prediction or on early prediction as we do. Also, tail-part forecasting is very limited when the popularity of an object may exhibit multiple peaks [15, 33]. By focusing on a two-step trend identification and prediction approach, combined with a non-parametric distance function, TrendLearner can overcome these challenges.
Chen et al. [5] propose to predict whether a tweet will become a trending topic by applying a binary classification model (trending versus non-trending), learned from a set of objects from each class. We here propose a more general approach to detect multiple trends (classes), where trends are first automatically learned from a training set. It is also important to note that our solution complements the one by Jiang et al. [17], which focused on predicting when a video will peak in popularity. Finally, our solution also exploits the concept of shapelets [31] to reduce the classification time complexity, as we show in Section 4.
We also mention some other efforts to detect trending topics in various domains. Vakali et al. proposed a cloud-based framework for detecting trending topics on Twitter and blogging systems [28], focusing particularly on implementing the framework on the cloud, which is complementary to our goal. Golbandi et al. [14] tackled trending topic detection for search engines. Despite the similar goal, their solution applies to a very different domain, and thus focuses on different elements (query terms) and uses different techniques (language models) for prediction.
Table 1 summarizes the key functionalities of the aforementioned approaches as well as of our new TrendLearner method. In sum, to our knowledge, we are the first to tackle the inherent challenges of predicting UGC popularity (trends and metrics) as early and accurately as possible, on a per-object basis, recognizing that different objects may require different monitoring periods for accurate predictions. More importantly, the challenges we approach with TrendLearner (i.e., predicting trends while also tackling the tradeoff between prediction accuracy and remaining interest after prediction on a per-object basis) are key to leveraging popularity prediction towards practical scenarios and deployment in real systems.
3. Problem Statement
The early popularity trend prediction problem can be defined as follows. Given a training set of previously monitored user generated objects (e.g., YouTube videos or tweets), D_train, and a test set of newly uploaded objects D_test, do: (1) extract popularity trends from D_train; and (2) predict a trend for each object in D_test as early and accurately as possible, particularly before user interest in such content has significantly decayed. User interest can be expressed as the fraction of all potential views a new piece of content will receive until a given point in time (e.g., the day when the object was collected). Thus, by predicting the popularity trend of an object as early as possible, we aim at maximizing the fraction of views that still remain to be received after prediction. Determining the earliest point in time when the prediction can be made with reasonable accuracy is an inherent challenge of the early popularity prediction problem, given that it must be addressed on a per-object basis. That is, while later predictions can be more accurate, they imply a reduction of the remaining interest in the content.
In particular, we here treat the above problem as a trend-extraction task combined with a multi-class classification task. The popularity trends automatically extracted from D_train (step 1) represent the classes into which objects in D_test should be grouped (step 2). Trend extraction is performed using a time series clustering algorithm [30], whereas prediction is a classification task. For the sake of clarity, we shall use the term "class" to refer to both clusters and classes.
Table 1: Comparison of TrendLearner with other approaches

Approach                               Trend Identification   Trend Prediction   Views Prediction   Early Prediction
Trending Topics Prediction [5, 28]     -                      X (binary only)    -                  -
Linear Regression [3, 25, 27]          -                      -                  X                  -
Holt-Winters [26]                      -                      -                  X                  -
Epidemic Models [21, 22]               -                      -                  X                  -
TrendLearner                           X                      X                  X                  X
Table 2: Notation. Vectors (x) and matrices (X), in bold, are differentiated by lower and upper cases. Streams (x̂) are differentiated by the hat accent. Sets (D) and variables (d) are shown in regular upper and lower case letters, respectively.

Symbol       Meaning                             Example
D            dataset of UGC content              YouTube videos
D_train      training set                        -
D_test       testing set                         -
d            a piece of content or object        video
D_i          class/trend i                       -
c_{D_i}      centroid of class i                 -
s_d          time series vector for object d     s_d = <p_{d,1}, ..., p_{d,n}>
ŝ_d          time series stream for object d     ŝ_d = <p_{d,1}, p_{d,2}, ...>
p_{d,i}      popularity of d at i-th window      number of views
s_d[i]       index operator                      <7,8,9>[2] = 8
s_d[i:j]     slicing operator                    <7,8,9>[2:3] = <8,9>
S            matrix with set of time series      all time series
Table 2 summarizes the notation used throughout the paper. Each object d ∈ D_train is represented by an n-dimensional time series vector s_d = <p_{d,1}, p_{d,2}, ..., p_{d,n}>, where p_{d,i} is the popularity (i.e., number of views) acquired by d during the i-th time window after its upload. Intuitively, the duration of a time window w could be a few hours, days, weeks, or even months. Thus, vector s_d represents a time series of the popularity of a piece of content measured at time intervals of duration w (fixed for each vector). New objects in D_test are represented by streams, ŝ_d, of potentially infinite length (ŝ_d = <p_{d,1}, p_{d,2}, ...>). This captures the fact that our trend prediction/classification method is based on monitoring each test object on an online basis, determining when a prediction with acceptable confidence can be made (see Section 4.2). Note that a vector can be seen as a contiguous subsequence of a stream. Note also that the complete dataset is referred to as D = D_train ∪ D_test.
4. Our Approach
We here present our solution to the early popularity trend prediction problem. We introduce our trend extraction approach (Section 4.1), present our novel trend classification method, TrendLearner (Section 4.2), and discuss practical issues related to the joint use of both techniques (Section 4.3).
4.1. Trend Extraction
To extract temporal patterns of popularity evolution (or
trends) from objects in Dtrain , we employ a time series cluster-
ing algorithm called K-Spectral Clustering (KSC) [30]4, which
4We have implemented a parallel version of the KSC algorithm which is
available at http://github.com/flaviovdf/pyksc. The repository also
contains the TrendLearner code
4
groups time series based on the shape of the curve. To group
the time series, KSC defines the following distance metric to
capture the similarity between two time series sdand sd0with
scale and shifting invariants:
    dist(s_d, s_d') = min_{α,q} ||s_d − α s_d'(q)|| / ||s_d||,        (1)

where s_d'(q) is the operation of shifting the time series s_d' by q units, and ||·|| is the l2 norm (||x|| = sqrt(Σ_i x_i²)). For a fixed q, there exists an exact solution for α, obtained by computing the minimum of dist: α = s_d^T s_d'(q) / ||s_d'(q)||². In contrast, there is no simple way to compute the shifting parameter q. Thus, in our implementation of KSC, whenever we measure the distance between two series, we search for the optimal value of q considering all integers in the range (−n, n). Shifts are performed in a rolling manner, where elements at the end of the vector return to the beginning; this maintains the symmetric nature of dist(s_d, s_d').
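As a concrete illustration, the following is a minimal NumPy sketch of this distance (function and variable names are ours; rolling shifts via np.roll, with α computed in closed form for each candidate q):

```python
import numpy as np

def ksc_dist(sd, sd2):
    """Scale- and shift-invariant distance of Equation 1 (a sketch).

    sd and sd2 are 1-D arrays of equal length n. All integer rolling
    shifts q in (-n, n) are tried; for each q, the optimal scaling
    alpha has the closed-form solution given above.
    """
    n = len(sd)
    norm_sd = np.linalg.norm(sd)
    best = np.inf
    for q in range(-n + 1, n):
        shifted = np.roll(sd2, q)            # rolling shift by q units
        denom = float(np.dot(shifted, shifted))
        alpha = np.dot(sd, shifted) / denom if denom > 0 else 0.0
        best = min(best, np.linalg.norm(sd - alpha * shifted) / norm_sd)
    return best
```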
Having defined a distance metric, KSC is mostly a direct translation of the K-Means algorithm [6]. Given a number of trends k to extract and the set of time series, it works as follows (see the sketch after this list):
1. The time series are uniformly distributed to k random classes;
2. Cluster centroids are computed based on their members. In K-Means based algorithms, the goal is to find the centroid c_{D_i} = argmin_c Σ_{s_d ∈ D_i} dist(s_d, c)². We refer the reader to the original KSC paper for more details on how to find c_{D_i} [30];
3. For each time series vector s_d, object d is assigned to the nearest centroid based on metric dist;
4. Return to step 2 until convergence, i.e., until all objects remain within the same class in step 3.
Each centroid defines the trend that objects in the class (mostly) follow.
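A sketch of that loop, under the simplifying assumption that centroids are plain averages of class members (the actual KSC centroid is the solution of a spectral eigenproblem [30]) and with no handling of empty classes:

```python
def ksc_sketch(series, k, n_iters=100, seed=0):
    """K-Means-style clustering over ksc_dist (a simplified sketch)."""
    series = np.asarray(series, dtype=float)       # one row per time series
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(series))  # step 1: random classes
    for _ in range(n_iters):
        # step 2: recompute centroids (here: per-class means, not KSC's solution)
        centroids = [series[labels == i].mean(axis=0) for i in range(k)]
        # step 3: reassign each series to its nearest centroid
        new = np.array([np.argmin([ksc_dist(s, c) for c in centroids])
                        for s in series])
        if np.array_equal(new, labels):            # step 4: converged
            break
        labels = new
    return labels, centroids
```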
Before introducing our trend classification method, we make the following observation that is key to support the design of the proposed approach: each trend, as defined by a centroid, is conceptually equivalent to the notion of time series shapelets [31]. A shapelet is informally defined as a time series subsequence that is, in a sense, maximally representative of a class. As argued in [31], the distance to the shapelet can be used to classify objects more accurately and much faster than state-of-the-art classifiers. Thus, by showing that a centroid is a shapelet, we can choose to classify a new object based only on the distances between the object's popularity time series up to a monitored time and each trend.
This is one of the points where our approach differs from the method proposed in [5], which uses the complete D_train as reference series, classifying an object based on the distances between its time series and all elements of each class. Given |D_train| objects in the training set and k trends (with k << |D_train|), our approach is faster by a factor of |D_train|/k.
Definition: For a given class D_i, a shapelet c_{D_i} is a time series subsequence such that: (1) dist(c_{D_i}, s_d) ≤ β for all s_d ∈ D_i; and (2) dist(c_{D_i}, s_d') > β for all s_d' ∉ D_i, where β is an optimal distance for the given class. With this definition, a shapelet can be shown to maximize the information gain of a given class [31], being thus the most representative time series of that class.

We argue that, by construction, a centroid produced by KSC is a shapelet, with β being the distance from the centroid to the time series within the class that is furthest away from it. Otherwise, that furthest time series would belong to a different class, which contradicts the KSC algorithm. This is an intuitive observation. Note that a centroid is a shapelet only when using K-Means based approaches, such as KSC, to define class labels. In the case of learning from already labeled data, a shapelet finding algorithm [31] should be employed.
4.2. Trend Prediction
Let D_i represent class i, previously learned from D_train. Our task now is to create a classifier that correctly determines the class of a new object as early as possible. We do so by monitoring the popularity acquired by each object d (d ∈ D_test) since its upload, over successive time windows. As soon as we can state that d belongs to a class with acceptable confidence, we stop monitoring it and report the prediction. The heart of this approach is in detecting when such a statement can be made.
4.2.1. Probability of an Object Belonging to a Class
Given a monitoring period defined by t_r time windows, our trend prediction is fundamentally based on the distances between ŝ_d[1:t_r], the subsequence of the stream ŝ_d representing d's popularity curve from its upload until t_r, and the centroid of each class. To respect shifting invariants, we consider all possible starting windows t_s in each centroid time series when computing distances. That is, given a centroid c_{D_i}, we consider all values from 1 to |c_{D_i}| − t_r, where |c_{D_i}| is the number of time windows in c_{D_i}. Specifically, the probability that a new object d belongs to class D_i, given D_i's centroid, the monitoring period t_r and a starting window t_s, is:

    p(ŝ_d ∈ D_i | c_{D_i}; t_r, t_s) ∝ exp(−dist(ŝ_d[1:t_r], c_{D_i}[t_s : t_s+t_r−1]))        (2)
where [x:y] (x ≤ y) is a moving window slicing operator (see Table 2). As in [5, 6, 25], we assume that probabilities are inversely proportional to the exponential function of the distance between both series, given by function dist (Equation 1), normalizing them afterwards to fall in the 0 to 1 range (normalization omitted for simplicity). Figure 2 shows an illustrative example of how both time series would be aligned for probability computation; in case |c_{D_i}| < |ŝ_d[1:t_r]|, we try all possible alignments of c_{D_i} with ŝ_d[1:t_r]. That is, for time series of different lengths, we slice a consecutive range of the larger time series so that it has the size of the smaller one. Every possible slice is considered (starting from 1 to |ŝ_d[1:t_r]|) and we keep the slice with the smallest distance when computing probabilities.
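A sketch of this computation, reusing ksc_dist from above (helper names are ours; we assume t_r never exceeds the centroid length):

```python
def class_probability(s_hat, centroid, tr):
    """Unnormalized p(s_hat in class | centroid; tr): best alignment
    over all starting windows ts (Equation 2, a sketch)."""
    prefix = np.asarray(s_hat[:tr], dtype=float)   # observed views up to tr
    best = 0.0
    for ts in range(len(centroid) - tr + 1):       # all alignments
        d = ksc_dist(prefix, np.asarray(centroid[ts:ts + tr], dtype=float))
        best = max(best, float(np.exp(-d)))        # keep the best alignment
    return best

def class_probabilities(s_hat, centroids, tr):
    """Normalized per-class probabilities for one object."""
    p = np.array([class_probability(s_hat, c, tr) for c in centroids])
    return p / p.sum()
```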
[Figure 2: Example of alignment of time series (dashed lines) for probability computation: the observed prefix ŝ_d is matched against the centroid slice c_{D_i}[t_s : t_s+t_r−1].]

With Equation 2, we could build a classifier that simply picks the class with the highest probability. But this would require t_s and t_r to be fixed. As shown in Figure 1, different time series may need different monitoring periods (different values of t_s and t_r), depending on the required confidence.
Instead, our approach is to monitor an object over successive time windows (increasing t_r), computing the probability of it belonging to each class at the end of each window. We stop when the class with maximum probability exceeds a class-specific threshold, representing the required minimum confidence on predictions for that class. We detail our approach next, focusing first on a single class (Algorithm 1), and then generalizing it to multiple classes (Algorithm 2).
Algorithm 1 shows how we define when to stop computing the probability for a given class D_i. The algorithm takes as input the object stream ŝ_d, the class centroid c_{D_i}, the minimum confidence θ_i required to state that a new object belongs to D_i, as well as γ_i and γ_max, the minimum and maximum thresholds for the monitoring period. The former is used to avoid computing distances with too few windows, which may lead to very high (but unrealistic) probabilities. The latter is used to guarantee that the algorithm ends. We allow different values of γ_i and θ_i for each class, as different popularity trends have overall different dynamics, requiring different thresholds (indeed, initial experiments showed that using the same values of γ_i and θ_i for all classes produces worse results). The algorithm outputs the number of monitored windows t_r and the estimated probability p. The loop in line 4 updates the stream with new observations (increases t_r), and function AlignComputeProb computes the probability for a given t_r by trying all possible alignments (i.e., all possible values of t_s). For a fixed alignment (i.e., fixed t_r and t_s), AlignComputeProb computes the distance between both time series (line 15) and the probability of ŝ_d belonging to D_i (line 16). It returns the largest probability, representing the best alignment between ŝ_d and c_{D_i}, for the given t_r (lines 17 and 20). Both loops that iterate over t_r (line 4) and t_s (line 15) stop when the probability exceeds the minimum confidence θ_i. The algorithm also stops when the monitoring period t_r exceeds γ_max (line 7), returning a probability equal to 0 to indicate that it was not possible to state that ŝ_d belongs to D_i within the maximum monitoring period allowed (γ_max).
We now extend Algorithm 1 to compute probabilities and monitoring periods for all object streams in D_test, considering all classes extracted from D_train. Algorithm 2 takes as input the test set D_test, a matrix C_D with the class centroids, vectors θ and γ with per-class parameters, and γ_max. It outputs a vector t with the required monitoring period for each object, and a matrix P with the probability estimates for each object (row) and class (column), both initialized with 0 in all elements. Given a valid monitoring period t_r (line 6), the algorithm monitors each object d in D_test (line 7) by first computing the probability of d belonging to each class (line 9). It then takes, for each object d, the largest of the computed probabilities (line 11) and the associated class (line 12), and tests whether it is possible to state that d belongs to that class with enough confidence at t_r, i.e., whether: (1) the probability exceeds the minimum confidence for the class, and (2) t_r exceeds the per-class minimum threshold (line 13). If the test succeeds, the algorithm stops monitoring the object (line 16), saving the current t_r and the per-class probabilities computed at this window in t and P (lines 14-15). After exhausting all possible monitoring periods (t_r > γ_max) or whenever the number of objects being monitored n_objs reaches 0, the algorithm returns. At this point, entries with 0 in P indicate objects for which no prediction was possible within the maximum monitoring period allowed (γ_max).

Having P, a simple classifier can be built by choosing, for each object (row), the class (column) with maximum probability. The value in t determines how early this classification can be done. However, we here employ a different strategy, using matrix P as input features to another classifier, as discussed below. We compare our proposed approach against the aforementioned simpler strategy in Section 6.
4.2.2. Probabilities as Input Features to a Classifier
Instead of directly extracting classes from P, we choose to use this matrix as input features to another classification algorithm, motivated by previous results on the effectiveness of using distances as features for learning methods [6]. Specifically, we employ an extremely randomized trees classifier [12], as it has been shown to be effective on different datasets [12], requiring little or no pre-processing, besides producing models that can be more easily interpreted, compared to other techniques like Support Vector Machines (we also experimented with SVM learners, achieving similar results). Extremely randomized trees tackle the overfitting problem of more common decision tree algorithms by training a large ensemble of trees. They work as follows: (1) for each node in a tree, the algorithm selects the best features for splitting based on a random subset of all features; (2) split values are chosen at random. The decisions of these trees are then averaged to perform the final classification. Although feature search and split values are based on randomization, tree nodes are still chosen based on the maximization of some measure of discriminative power, such as Information Gain, with the goal of improving classification effectiveness.
We extend the set of probability features taken from P with other features associated with the objects. The set of object features used depends on the type of UGC under study and characteristics of the datasets (D). We here use the features shown in Table 3, which are further discussed in Section 5.2, combining them with the probabilities in P. We refer to this approach as TrendLearner.
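For illustration, scikit-learn's ExtraTreesClassifier implements such a learner; the following is a hedged sketch of the combination step (all variable names are ours, not from the paper's code, and the parameter values echo the choices reported in Section 6):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# P: per-object class-membership probabilities (output of Algorithm 2);
# obj_feats: per-object features of Table 3; y: trend labels from KSC.
X = np.hstack([P, obj_feats])
clf = ExtraTreesClassifier(n_estimators=20,      # 20 trees (Section 6)
                           max_features="sqrt",  # sqrt feature selection
                           min_samples_leaf=4,   # illustrative smoothing value
                           random_state=0)
clf.fit(X[train_idx], y[train_idx])
pred = clf.predict(X[test_idx])
```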
Before continuing, we briefly discuss other strategies to combine classifiers as we have done. We experimented with such methods, finding them to be unsuitable to our dataset for various reasons. For instance, we implemented Co-Training [24], a traditional semi-supervised label propagation approach. However, it failed to achieve better results than simply combining the features, most likely because it depends on feature independence, which may not hold in our case. We also experimented with Stacking [9], which yielded results similar to the proposed approach. Nevertheless, either strategy might be more effective on different datasets or types of UGC, an analysis that we leave for future work.
Algorithm 1 Define when to stop computing the probability of object ŝ_d belonging to class D_i, based on minimum confidence θ_i, and minimum and maximum monitoring periods γ_i and γ_max.

 1: function PerClassProb(ŝ_d, c_{D_i}, θ_i, γ_i, γ_max)
 2:   p ← 0
 3:   t_r ← γ_i − 1                         ▷ Start at previous window
 4:   while p < θ_i do                      ▷ Extend monitoring period
 5:     t_r ← t_r + 1                       ▷ Move to next window
 6:     if t_r > γ_max then                 ▷ Monitoring period ended
 7:       return γ_max, 0
 8:     end if
 9:     p ← AlignComputeProb(ŝ_d, c_{D_i}, θ_i, t_r)
10:   end while
11:   return t_r, p                         ▷ Return monitoring period and probability
12: end function

13: function AlignComputeProb(ŝ_d, c_{D_i}, θ_i, t_r)
14:   t_s ← 1; p ← 0
15:   while (t_s ≤ |c_{D_i}| − t_r) and (p < θ_i) do   ▷ Iterate over possible values of t_s, aligning both series
16:     p' ← exp(−dist(ŝ_d[1:t_r], c_{D_i}[t_s : t_s+t_r−1]))
17:     p ← max(p, p')
18:     t_s ← t_s + 1
19:   end while
20:   return p
21: end function
Algorithm 2 Define when to stop computing probabilities for each object in D_test, considering the centroids of all classes (C_D), per-class minimum confidence (θ) and monitoring period (γ), and maximum monitoring period (γ_max).

 1: function MultiClassProbs(D_test, C_D, θ, γ, γ_max)
 2:   t ← [0]                               ▷ Per-object monitoring period vector
 3:   P ← [[0]]                             ▷ Per-object, per-class probability matrix
 4:   n_objs ← |D_test|                     ▷ Number of objects to be monitored
 5:   t_r ← min(γ)                          ▷ Init t_r with minimum γ_i
 6:   while (t_r ≤ γ_max) and (n_objs > 0) do
 7:     for all ŝ_d ∈ D_test do             ▷ Predict class for each object
 8:       for all c_{D_i} ∈ C_D do          ▷ Get centroid of each class
 9:         p[i] ← AlignComputeProb(ŝ_d, c_{D_i}, θ_i, t_r)
10:       end for
11:       maxp ← max(p)                     ▷ Get max. probability and corresponding class for t_r
12:       maxc ← argmax(p)
13:       if (maxp > θ[maxc]) and (t_r ≥ γ[maxc]) then   ▷ Stop if maxp and t_r exceed per-class thresholds
14:         t[d] ← t_r                      ▷ Save current t_r
15:         P[d] ← p                        ▷ Save current p in row d
16:         n_objs ← n_objs − 1
17:         D_test ← D_test − {ŝ_d}
18:       end if
19:     end for
20:     t_r ← t_r + 1
21:   end while
22:   return t, P                           ▷ Return monitoring periods and probabilities
23: end function
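In Python terms, Algorithms 1 and 2 collapse into a single loop over candidate monitoring periods; the following sketch reuses class_probability from above (the array-based bookkeeping is ours):

```python
def multi_class_probs(streams, centroids, theta, gamma, gamma_max):
    """Per-object early stopping (Algorithms 1-2, a sketch).

    streams: list of 1-D arrays of observed popularity; theta, gamma:
    per-class confidence and minimum-period thresholds. Rows of P left
    at zero mean no confident prediction was possible by gamma_max.
    """
    k = len(centroids)
    t = np.zeros(len(streams), dtype=int)
    P = np.zeros((len(streams), k))
    active = set(range(len(streams)))
    for tr in range(int(min(gamma)), gamma_max + 1):
        for d in list(active):
            p = np.array([class_probability(streams[d], centroids[i], tr)
                          for i in range(k)])
            if p.sum() == 0:
                continue                      # no usable alignment yet
            p /= p.sum()                      # normalize across classes
            c = int(np.argmax(p))
            if p[c] > theta[c] and tr >= gamma[c]:
                t[d], P[d] = tr, p            # confident: stop monitoring d
                active.discard(d)
        if not active:
            break
    return t, P
```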
4.3. Putting It All Together
A key point that remains to be discussed is how to define the input parameters of the trend extraction approach, that is, the number of trends k, as well as the parameters of TrendLearner, namely the vectors θ and γ, the scalar γ_max, and the parameters of the adopted classifier.

We choose the number of trends k based primarily on the βCV quality metric [23]. Let the intraclass distance be the distance between a time series and its centroid (the trend), and the interclass distance be the distance between different trends. The general purpose of the trend extraction is to minimize the variance of the intraclass distances while maximizing the variance of the interclass distances. The βCV is defined as the ratio of the coefficient of variation (CV, the ratio of the standard deviation to the mean) of the intraclass distances to the CV of the interclass distances. The value of βCV should be computed for increasing values of k. The smallest k after which the βCV remains roughly stable should be chosen [23], as a stable βCV indicates that new splits affect only marginally the variations of intra and interclass distances, implying that a well formed trend has been split.
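A sketch of this selection criterion (function names are ours; interclass distances taken here as pairwise centroid distances):

```python
def beta_cv(series, labels, centroids):
    """beta-CV: CV of intraclass distances over CV of interclass distances."""
    cv = lambda x: np.std(x) / np.mean(x)
    intra = [ksc_dist(s, centroids[l]) for s, l in zip(series, labels)]
    inter = [ksc_dist(ci, cj)
             for i, ci in enumerate(centroids)
             for j, cj in enumerate(centroids) if i < j]
    return cv(intra) / cv(inter)
```

One would then run KSC for k = 2, 3, 4, ... and pick the smallest k after which beta_cv stabilizes.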
Regarding the TrendLearner parameters, we here choose to constrain γ_max by the maximum number of points in our time series (100 in our case, as discussed in Section 5.2). As for the vector parameters θ and γ, a traditional cross-validation in the training set to determine their optimal values would imply a search over an exponential space of values. Moreover, note that it is fairly simple to achieve the best classification results by setting θ to all zeros and γ to large values, but this would lead to very late predictions (and possibly low remaining interest in the content after prediction). Instead, we suggest an alternative approach. Considering each class i separately, we run a one-against-all classification for objects of i in D_train for values of γ_i varying from 1 to γ_max. We select the smallest value of γ_i for which the performance exceeds a minimum target (e.g., classification above random choice, meaning Micro-F1 greater than 0.5), and set θ_i to the average probability computed for all class i objects for the selected γ_i. We repeat the same process for all classes. Depending on the required tradeoff between prediction accuracy and remaining fraction of views, different performance targets could be used. Finally, we use cross-validation in the training set to choose the parameter values for the extremely randomized trees classifier, as further discussed in Section 6.
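A sketch of this per-class search; one_vs_all_micro_f1 is a hypothetical helper (not shown) that trains and scores a one-against-all classifier on D_train for class i with monitoring period g:

```python
def select_params(train_series, labels, centroids, gamma_max, target=0.5):
    """Smallest gamma_i meeting the target, then theta_i = mean class-i
    probability at that gamma_i (a sketch of the heuristic above)."""
    k = len(centroids)
    gamma = np.zeros(k, dtype=int)
    theta = np.zeros(k)
    for i in range(k):
        for g in range(1, gamma_max + 1):
            if one_vs_all_micro_f1(train_series, labels, i, g) > target:
                probs = [class_probability(s, centroids[i], g)
                         for s, l in zip(train_series, labels) if l == i]
                gamma[i], theta[i] = g, float(np.mean(probs))
                break
    return gamma, theta
```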
We summarize our solution to the early trend prediction problem in Algorithm 3. In particular, TrendLearner works by first learning the best parameter values and the classification model from the training set (LearnParams and TrainERTree), and then applying the learned model to classify test objects (PredictERTree), taking the class membership probabilities (MultiClassProbs) and other object features as inputs. A pictorial representation is shown in Figure 3. Compared to previous efforts [5], our method incorporates multiple classes, uses only centroids to compute class membership probabilities (which reduces time complexity), and combines these probabilities with other object features as inputs to a classifier, which, as shown in Section 6, leads to better results.
5. Evaluation Methodology
This section presents the metrics (Section 5.1) and datasets
(Section 5.2) used in our evaluation.
5.1. Metrics
As discussed in Section 3, an inherent challenge of the early popularity trend prediction problem is to properly address the tradeoff between prediction accuracy and how early the prediction is made. Thus, we evaluate our method with respect to these two aspects.
We estimate prediction accuracy using the standard Micro and Macro F1 metrics, which are computed from precision and recall. The precision of class c, P(c), is the fraction of correctly classified videos out of those assigned to c by the classifier, whereas the recall of class c, R(c), is the fraction of correctly classified objects out of those that actually belong to that class. The F1 of class c is given by F1(c) = 2·P(c)·R(c) / (P(c) + R(c)). Macro F1 is the average F1 across all classes, whereas Micro F1 is computed from global precision and recall, calculated over all classes.
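Both flavors are available off the shelf; for instance, with scikit-learn (y_true and y_pred are our placeholder arrays of true and predicted classes):

```python
from sklearn.metrics import f1_score

micro_f1 = f1_score(y_true, y_pred, average="micro")  # global precision/recall
macro_f1 = f1_score(y_true, y_pred, average="macro")  # mean of per-class F1
```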
To complement the standard metrics above, we propose novel metrics that we define to measure the effectiveness of the early predictions produced by TrendLearner. These metrics are by no means replacements for standard classification evaluation metrics (such as the F1 defined above). That is, given that TrendLearner aims to capture the tradeoff between accuracy and early predictions, our proposed novel metrics need to be evaluated together with the traditional ones. Recall that our objectives are to evaluate both: (1) the accuracy of the classification; and (2) the possible loss of user interest in objects over time.
We evaluate how early our correct predictions are made by computing the remaining interest (RI) in the content after prediction. The RI for an object s_d is defined as the fraction of all views up to a certain point in time (e.g., the day when the object was collected) that are received after the prediction. That is,

    RI(s_d, t) = sum(s_d[t[d]+1 : n]) / sum(s_d[1:n]),

where n is the number of points in d's time series, t[d] is the prediction time (i.e., monitoring period) produced by our method for d, and function sum adds up the elements of the input vector. In essence, this metric captures the future potential audience of s_d after prediction.
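In code, this is a one-liner (0-indexed arrays, so the 1-indexed slice [t[d]+1 : n] becomes sd[td:]):

```python
def remaining_interest(sd, td):
    """Fraction of sd's views still to come after the prediction at td."""
    sd = np.asarray(sd, dtype=float)
    return sd[td:].sum() / sd.sum()
```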
We also assess whether there is any bias in our correct predictions towards more (less) popular objects by computing the correlation between the total popularity and the remaining interest after prediction for each object. A low correlation implies no bias, while a strong positive (negative) correlation implies a bias towards earlier predictions for more (less) popular objects. We argue that, if any bias exists, a bias towards more popular objects is preferred, as it implies larger remaining interest for those objects. We use both the Pearson linear correlation coefficient (ρ_p) and Spearman's rank correlation coefficient (ρ_s) [16], as the latter does not assume linear relationships, taking the logarithm of the total popularity first due to the great skew in its distribution [4, 7, 11].
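For reference, both coefficients are available in SciPy (total_views and ri are our placeholder arrays; the log transform applies to the Pearson case, as above):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rho_p, _ = pearsonr(np.log10(total_views), ri)  # linear correlation
rho_s, _ = spearmanr(total_views, ri)           # rank-based, no linearity assumed
```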
5.2. Datasets
As a case study, we focus on YouTube videos and use two datasets, analyzed in [11] and publicly available at http://vod.dcc.ufmg.br/traces/youtime/.
Algorithm 3 Our Solution: Trend Extraction and Prediction

 1: function TrendExtraction(D_train)
 2:   k ← 1
 3:   while βCV is not stable do
 4:     k ← k + 1
 5:     C_D ← KSC(D_train, k)
 6:   end while
 7:   Store centroids in C_D
 8: end function

 9: function TrendLearner(C_D, D_train, D_test)
10:   θ, γ, P_train ← LearnParams(D_train, C_D)
11:   TrainERTree(D_train, P_train ∪ obj. feats)
12:   t, P ← MultiClassProbs(D_test, C_D, θ, γ)
13:   return t, PredictERTree(D_test, P ∪ obj. feats)
14: end function
[Figure 3: Pictorial representation of our solution. On the training side, the popularity time series and object features of D_train feed TrendExtraction (KSC), LearnParams, and TrainClassifier (ERTree); on the test side, the popularity streams and object features of D_test feed MultiClassProbs and UseClassifier (ERTree), producing the prediction results.]
The Top dataset consists of 27,212 videos from the various top lists maintained by YouTube (e.g., most viewed and most commented videos), and the Random topics dataset includes 24,482 videos collected as results of random queries submitted to YouTube's API. We do not claim the latter is a random sample of YouTube videos; nevertheless, for the sake of simplicity, we use the term Random videos to refer to videos from this dataset.
For each video, the datasets contain the following features (shown in Table 3): the time series of the numbers of views, comments and favorites, as well as the ten most important referrers (incoming links), along with the date each referrer was first encountered, the video's upload date and its category. The original datasets contain videos of various ages, ranging from days to years. We choose to study only videos with more than 100 days of age for two reasons. First, these videos tend to have more stable long term popularity time series. Second, the KSC algorithm requires that all time series vectors s_d have the same dimension n. Moreover, the popularity time series provided by YouTube contains at most 100 points, independently of the video's age. Thus, by focusing only on videos with at least 100 days of age, we can use n equal to 100 for all videos. After filtering younger videos out, we were left with 4,527 and 19,562 videos in the Top and Random datasets, respectively.
Table 4 summarizes our two datasets, providing the mean μ and standard deviation σ of the number of views, age (in days), and time window duration w. (The window w equals the video age divided by 99, as the first point in the time series corresponds to the day before the upload day.) Note that both average and median window durations are around or below one week. This is important, as previous work [2] pointed out that effective popularity growth models can be built based on weekly views.
Table 3: Summary of Features

Class        Feature Name               Type
Video        Video category             Categorical
             Upload date                Numerical
             Video age                  Numerical
             Time window size (w)       Numerical
Referrer     Referrer first date        Numerical
             Referrer # of views        Numerical
Popularity   # of views                 Numerical
             # of comments              Numerical
             # of favorites             Numerical
             Change rate of views       Numerical
             Change rate of comments    Numerical
             Change rate of favorites   Numerical
             Peak fraction              Numerical
Table 4: Summary of analyzed datasets

                            Top                     Random
                       μ          σ            μ          σ
# of Views         4,022,634  9,305,996     141,413   1,828,887
Video Age (days)       632        402           583        339
Window w (days)       6.38       4.06          5.89       3.42
6. Experimental Results
In this section, we present the results of our trend extraction (Section 6.1) and trend prediction (Section 6.2) approaches. We also show how TrendLearner can be used to improve the accuracy of state-of-the-art popularity prediction models (Section 6.3). These results were computed using 5-fold cross validation, i.e., splitting the dataset D into 5 folds, where 4 are used as the training set D_train and one as the test set D_test, and rotating the folds such that each fold is used for testing once. As discussed in Section 4, trends are extracted from D_train and predicted for videos in D_test.
Since we are dealing with time series, one might argue that a temporal split of the dataset into folds would be preferred to a random split, as we do here. However, we chose a random split for the following reasons. Regarding the object features used as input to the prediction models, no temporal precedence is violated, as the features are computed only during the monitoring period t_r, before prediction. All remaining features are based on the distances between the popularity curve of the object until t_r and the class centroids (or trends). As we argue below, the same trends/centroids found in our experiments were consistently found in various subsets of each dataset, covering various periods of time. Thus, we expect the results to remain similar if a temporal split were done. However, a temporal split of our dataset would require interpolations in the time series, as all of them have exactly 100 points regardless of video age. Such interpolations, which are not required in a random split, could introduce serious inaccuracies and compromise our analyses.
6.1. Trend Extraction
Recall that we used the βCV metric to determine the number of trends k used by the KSC algorithm. In both datasets, we found βCV to be stable after 4 trends. We also checked centroids and class members for larger values of k, both visually and using other metrics (as in [30]), finding no reason to choose a different value; a possible reason would have been the appearance of a new distinct class, which did not happen. Thus, we set k = 4. We also analyzed the centroids in all training sets, finding that the same 4 shapes appeared in every set. Thus, we manually aligned classes based on their centroid shapes in different training sets so that class i is the same in every set. We also found that, in 95% of the cases, a video was always assigned to the same class in different sets.
Figure 4 shows the popularity trends discovered in the Random dataset. Similar trends were also extracted from the Top dataset. Each graph shows the number of views as a function of time, omitting scales as centroids are shape and volume invariant. The y-axes are in log scale to highlight the importance of the peak. We note that the KSC algorithm consistently produced the same popularity trends for various randomly selected samples of the data, which are also consistent with similar shapes identified in other datasets [7, 30]. We also note that the 4 identified trends might not perfectly match the popularity curves of all videos, as there might be variations within each class. However, our goal is not to perfectly model the popularity evolution of all videos. Instead, we aim at capturing the most prevalent trends, respecting time shift and volume invariants, and using them to improve popularity prediction. As we show in Section 6.3, the identified trends can greatly improve state-of-the-art prediction models.
Table 5 presents, for each class, the percentage of videos belonging to it, as well as the average number of views, average change rate (defined as the average of p_{d,i+1} − p_{d,i} for each video d represented by vector s_d = <p_{d,1}, p_{d,2}, ..., p_{d,n}>), and average fraction of views at the peak time window of these videos. Note that class D0 consists of videos that remain popular over time, as indicated by the large positive change rates shown in Table 5. This behavior is especially strong in the Top dataset, with an average change rate of 1,112 views per window, which corresponds to roughly a week (Table 4). Those videos also have no significant popularity peak, as the average fraction of views in the peak window is very small (Table 5). The other three classes are predominantly defined by a single popularity peak, and are distinguished by the rate of decline after the peak: it is slower in D1, faster in D2, and very sharp in D3. These classes also exhibit very small change rates, indicating stability after the peak.
We also measured the distribution of the different types of referrers and video categories across classes in each dataset. Under a Chi-square test with a significance level of .01, we found that the distribution differs from that computed for the aggregation of all classes, implying that these features are somewhat correlated with the class, motivating their use to improve trend classification.
6.2. Trend Prediction
We now discuss our trend prediction results, which are averages over the 5 test sets, along with corresponding 95% confidence intervals. We start by showing results that support our approach of computing class membership probabilities using only the centroids, as opposed to all class members as in [5] (Section 6.2.1).
[Figure 4: Popularity Trends in YouTube Datasets (one panel per extracted trend; views vs. time, with log-scaled y-axes).]
Table 5: Summary of popularity trends (classes)

Top Dataset                      D0          D1          D2          D3
% of Videos                      22%         29%         24%         25%
Avg. # of Views               711,868   6,133,348   1,440,469   1,279,506
Avg. Change Rate in # Views     1112         395          51          67
Avg. Peak Fraction              0.03        0.04        0.19        0.40

Random Dataset                   D0          D1          D2          D3
% of Videos                      21%         34%         26%         19%
Avg. # of Views               305,130     108,844      64,274     127,768
Avg. Change Rate in # Views       47           7           4           4
Avg. Peak Fraction              0.03        0.03        0.08        0.28
Table 6: Classification using only centroids vs. using all class members: averages and 95% confidence intervals.

Monitoring        Centroid                  Whole Training Set
period t_r    Micro F1    Macro F1       Micro F1    Macro F1
1 window     .24 ± .01   .09 ± .00      .29 ± .04   .11 ± .01
25 windows   .56 ± .02   .52 ± .01      .53 ± .04   .44 ± .08
50 windows   .67 ± .03   .65 ± .03      .64 ± .05   .57 ± .09
75 windows   .70 ± .02   .68 ± .02      .69 ± .08   .61 ± .12
We then evaluate our TrendLearner method, comparing it with
three alternative approaches (Section 6.2.2).
6.2.1. Are shapelets better than a reference dataset?
We here discuss how the use of centroids to compute class membership probabilities (Equation 2) compares to using all class members [5]. For the latter, the probability of an object belonging to a class is proportional to a summation, over every member of the given class, of the exponential of the (negative) distance between the object and that member.

An important benefit of our approach is a reduction in running time: for a given object, it requires computing the distances to only k time series, as opposed to the complete training set |D_train|, leading to a reduction in running time by a factor of |D_train|/k, as discussed in Section 4.1. We here focus on the classification effectiveness of the probability matrix P produced by both approaches. To that end, we consider a classifier that assigns the class with the largest probability to each object, for both matrices.

Table 6 shows Micro and Macro F1 results for both approaches, computed for fixed monitoring periods t_r (in number of windows) to facilitate comparison. We show results only for the Top dataset, as they are similar for the Random dataset. Note that, unless the monitoring period is very short (t_r = 1), both strategies produce statistically tied results, with 95% confidence. Thus, given the reduced time complexity, using centroids only is more cost-effective. When using a single window, both approaches are worse than random guessing (Macro F1 = 0.25), and thus are not interesting.
Table 7: Best values for the vector parameters γ and θ (averages and 95% confidence intervals) for the Top dataset

Top Dataset        D0             D1             D2             D3
θ              .250 ± .015    .257 ± .001    .272 ± .003    .303 ± .006
γ                28 ± 16        89 ± 8          5 ± 0.9        3 ± 0.5

Table 8: Best values for the vector parameters γ and θ (averages and 95% confidence intervals) for the Random dataset

Random Dataset     D0             D1             D2             D3
θ              .250 ± .001    .251 ± .001    .269 ± .001    .317 ± .001
γ                33 ± 0.6       74 ± 2         45 ± 9         17 ± 3
6.2.2. TrendLearner Results
We now compare our TrendLearner method with three other trend prediction methods, namely: (1) P only, which assigns to an object the class with the largest probability in P; (2) P + ERTree, which trains an extremely randomized trees learner using P only as features; and (3) ERTree, which trains an extremely randomized trees learner using only the object features in Table 3. Note that TrendLearner combines ERTree and P + ERTree. Thus, a comparison of these four methods allows us to assess the benefits of combining both sets of features.

For all methods, when classifying a video d, we only consider features of that video available up until t[d], the time window when TrendLearner stopped monitoring d. We also use the same best values for parameters shared by the methods, chosen as discussed in Section 4.3. Tables 7 (for the Top dataset) and 8 (for the Random dataset) show the best values of the vector parameters γ and θ, selected considering a Macro-F1 of at least 0.5 as the performance target (see Section 4.3). These results are averages across all training sets, along with 95% confidence intervals. The variability is low in most cases, particularly for θ. Recall that γ_max is set to 100. Regarding the extremely randomized trees classifier, we set the size of the ensemble to 20 trees, and the feature selection strength equal to the square root of the total number of features, common choices for this classifier [12]. We then apply cross-validation within the training set to choose the smoothing length parameter (n_min), considering values in {1, 2, 4, 8, 16, 32}. We refer to [12] for more details on the parametrization of extremely randomized trees.
Still analyzing Tables 7 and 8, we note that classes with smaller peaks (D0 and D1) need longer minimum monitoring periods γ_i, likely because even small fluctuations may be confused with peaks due to the scale invariance of the distance metric used (Equation 1); indeed, most of these videos are wrongly classified into either D2 or D3 for shorter monitoring periods. However, after this period, it is somewhat easier to determine whether the object belongs to one of those classes (smaller values of θ_i). In contrast, classes with higher peaks (D2 and D3) usually require shorter monitoring periods, particularly in the Top dataset, where videos have popularity peaks with larger fractions of views (Table 5). Indeed, by cross-checking the results in Tables 5, 7 and 8, we find that classes with smaller fractions of views in the peak window (D0 and D1 in Top, and D0, D1 and D2 in Random) tend to require longer minimum monitoring periods so as to avoid confusing small fluctuations with peaks from the other classes.
Table 9: Comparison of trend prediction methods (averages and 95% confidence intervals) for the Top dataset

Top Dataset      P only        P + ERTree    ERTree        TrendLearner
Micro F1       .48 ± .06     .48 ± .06     .58 ± .01     .62 ± .01
Macro F1       .44 ± .06     .44 ± .06     .57 ± .01     .61 ± .01

Table 10: Comparison of trend prediction methods (averages and 95% confidence intervals) for the Random dataset

Random Dataset   P only        P + ERTree    ERTree        TrendLearner
Micro F1       .67 ± .02     .62 ± .01     .65 ± .01     .71 ± .01
Macro F1       .69 ± .02     .63 ± .01     .63 ± .01     .70 ± .01
We now discuss our classification results, focusing first on the
Micro and Macro F1 results, shown in Tables 9 and 10 for the Top
and Random datasets, respectively. From both tables we can see
that TrendLearner consistently outperforms all other methods in
both datasets and on both metrics, except for Macro F1 in the
Random dataset, where it is statistically tied with the second
best approach (P only). In contrast, there is no clear winner
among the other three methods across both datasets. Thus,
combining probabilities and object features brings clear benefits
over using either set of features separately. For example, in the
Top dataset, the gains over the alternatives in average Macro F1
vary from 7% to 38%, whereas the average improvements in Micro F1
vary from 7% to 29%. Similarly, in the Random dataset, gains in
average Micro and Macro F1 reach up to 14% and 11%, respectively.
Note that TrendLearner performs somewhat better in the Random
dataset, mostly because videos in that dataset are monitored for
longer, on average (larger values of γi). However, this superior
result comes with a reduction in remaining interest after
prediction, as we discuss below.
We note that the joint use of both probabilities and object
features makes TrendLearner more robust to some (hard-to-predict)
videos. Recall that, as discussed in Section 4.2.1,
Algorithm 2 may, in some cases, return a probability equal to 0
to indicate that a prediction was not possible within the maximum
monitoring period allowed. Indeed, this happened for 1% and 10%
of the videos in the Top and Random datasets, respectively, whose
popularity curves do not closely follow any of the extracted
trends. The results for the P only and P+ERTree methods shown in
Tables 9 and 10 do not include such videos, as these methods are
unable to make predictions for them (since they rely only on the
probabilities). However, both ERTree and TrendLearner are able to
perform predictions for such videos by exploiting the object
features, since at least the video category and upload date are
readily available as soon as the video is posted. Thus, the
results of these two methods in Tables 9 and 10 include the
predictions for all videos17.
We now turn to the other side of the tradeoff and discuss how
early the predictions are made. These results are the same for
all four aforementioned methods, as all of them use the prediction
time returned by TrendLearner. For all correctly classified
videos, we report the remaining interest RI after prediction, as
well as the Pearson (ρp) and Spearman (ρs) correlation coefficients
between the remaining interest and the (logarithm of the) total
popularity (i.e., total number of views), as informed in our datasets.
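For illustration, these quantities could be computed as in the sketch below; the stand-in data and variable names are ours.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Stand-in data: total views of each correctly classified video and the
# fraction of those views already received when the prediction was made.
rng = np.random.RandomState(1)
total_views = rng.lognormal(8, 2, size=500).astype(int) + 10
frac_seen_at_prediction = rng.rand(500)

# Remaining interest: fraction of a video's views received after prediction.
ri = 1.0 - frac_seen_at_prediction

# Correlations between RI and the (log of the) total popularity.
rho_p, _ = pearsonr(ri, np.log10(total_views))
rho_s, _ = spearmanr(ri, total_views)  # rank-based, so the log is immaterial
print(rho_p, rho_s)
```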
Figure 5(a) shows the complementary cumulative distribution of RI
after prediction for both datasets, while
Figures 5(b) and 5(c) (log scale on the y-axis) show the total
number of views and the RI for each video in the Top and Ran-
dom datasets, respectively. All three graphs were produced for
the union of the videos in all test sets. Note that, for 50% of the
videos, our predictions are made before at least 68% and 32%
of the views are received, for Top and Random videos, respec-
tively. The same RI of at least 68% of views is achieved for
21% of videos in the Random dataset. In general, for a signifi-
cant number of videos in both datasets, our correct predictions
are made before a large fraction of their views are received, par-
ticularly in the Top dataset.
We also point out the great variability in the duration of the
monitoring periods produced by our solution: while only a few
windows are required for some videos, others have to be monitored
for a longer period. Indeed, the coefficients of variation of
these monitoring periods are 0.54 and 1.57 for the Random and Top
datasets, respectively. This result emphasizes the need for
choosing a monitoring period on a per-object basis, a novel
aspect of our approach, rather than using the same fixed value
for all objects.
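For reference, the coefficient of variation is simply the ratio of the standard deviation to the mean of the per-video monitoring periods; the example values below are hypothetical.

```python
import numpy as np

# Stand-in monitoring periods t[d] (in windows) for a handful of videos.
monitoring_windows = np.array([2, 3, 3, 5, 8, 40, 90], dtype=float)
cv = monitoring_windows.std() / monitoring_windows.mean()
print(cv)
```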
Moreover, the scatter plots in Figures 5(b-c) show that moderately
positive correlations exist between the total number of views and
RI. Indeed, ρp and ρs are equal to 0.42 and 0.48, respectively,
in the Top dataset, while both metrics are equal to 0.39 in the
Random dataset. Such results imply that our solution is somewhat
biased towards more popular objects, although the bias is not
very strong. In other words, for more popular videos,
TrendLearner is able to produce accurate predictions by
potentially observing a smaller fraction of their total views,
compared with less popular videos. This is a desirable property,
given that such predictions can drive advertisement placement and
content replication/organization decisions, which are concerned
mainly with the most popular objects.

17 For the cases with probability equal to 0, the predictions of TrendLearner and ERTree were made with tr = γmax, when Algorithm 2 stops. Since we set γmax = 100, those predictions were made at the last time window, using all available information to compute object features. Nevertheless, note that, in those cases, the remaining interest (RI) after prediction is equal to 0.
6.3. Applicability to Regression Models
Motivated by the results in [25, 29], which showed that knowing
popularity trends beforehand can improve the accuracy of
regression-based popularity prediction models, we here assess
whether our trend predictions are good enough for that purpose.
To that end, we use the state-of-the-art ML and MRBF regression
models proposed in [25]. The former is a multivariate linear
regression model that uses the popularity acquired by the object
d on each time window up to a reference date tr (i.e., pd,i,
i = 1...tr) to predict its popularity at a target date
tt = tr + δ. The latter extends the former by including features
based on Radial Basis Functions (RBFs) to measure the similarity
between d and specific examples, previously selected from the
training set.
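A minimal sketch of the two baselines, under our reading of [25]: ML is ordinary multivariate linear regression on the early view counts, and MRBF appends Gaussian RBF similarities to selected training examples. The RBF width, the example selection, and the synthetic data below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def rbf_features(X, centers, sigma=1.0):
    # Gaussian RBF similarity between each early-view series and each of the
    # selected training examples (the "centers").
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.RandomState(2)
tr, delta, n = 7, 7, 400
views = np.cumsum(rng.poisson(50, size=(n, tr + delta)), axis=1)  # synthetic
X, y = views[:, :tr], views[:, -1]  # p_{d,1..tr} -> popularity at tr + delta

ml = LinearRegression().fit(X, y)  # the ML model: plain linear regression

centers = X[rng.choice(n, size=50, replace=False)]  # 50 examples (Sec. 6.3)
mrbf = LinearRegression().fit(np.hstack([X, rbf_features(X, centers)]), y)
```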
Our goal is to evaluate whether our trend prediction results
can improve these models. Thus, as in [25], we use the mean
Relative Squared Error (mRSE) to assess the prediction accuracy
of the ML and MRBF models in two settings: (1) a general model,
trained using the whole dataset (as in [25]); (2) a specialized
model, trained for each predicted class. For the latter, we first
use our solution to predict the trend of a video. We then train
ML and MRBF models considering as reference date each value of
t[d] produced by TrendLearner for each video d. Considering a
prediction lag δ equal to 1, 7, and 15, we measure the mRSE of
the predictions for target date tt = t[d] + δ.
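For reference, mRSE could be computed as below; the exact definition of the relative squared error is our reading of [25] and should be treated as an assumption here.

```python
import numpy as np

def mrse(predicted, actual):
    # mean Relative Squared Error: average over objects of
    # (predicted / actual - 1)^2, following our reading of [25].
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((predicted / actual - 1.0) ** 2))
```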
We also compare our specialized models against the state-space
models (SSMs) proposed in [26]. These models are variations
of a basic state-space Holt-Winters model that represent query
and click frequency in Web search, capturing various aspects
of popularity dynamics (e.g., periodicity, bursty behavior, in-
creasing trend). All of them take as input the popularity time
series during the monitoring period tr. Thus, though originally
proposed for the Web search domain, they can be directly ap-
plied to our context. Both regression and state-space models
are parametrized as originally proposed18.
Table 11 shows the average mRSE for each model, along with 95%
confidence intervals, for all datasets and prediction lags.
Comparing our specialized models with the original ones they
build upon, we find that using our solution to build
trend-specific models greatly improves prediction accuracy,
particularly for larger values of δ. The reductions in mRSE vary
from 10% to 77% (39%, on average) in the Random dataset, and from
11% to 64% (33%, on average) in the Top dataset19. The
specialized models also greatly outperform the state-space
models: the reductions in mRSE over the best state-space model
are at least 89% and 27% in the Random and Top datasets,
respectively (94% and 59%, on average). These results offer
strong indications of the usefulness of our trend predictions for
predicting popularity measures.
18 The only exception is the number of examples used to compute similarities in the MRBF model: we used 50 examples, as opposed to the suggested 100 [25], as this led to better results in our datasets.
19 The only exception is the MRBF model for δ = 1 in the Top dataset, where the general and specialized models produce tied results.
[Figure 5: Remaining Interest (RI) and Correlations Between Popularity and RI for Correctly Classified Videos. (a) CCDF of the remaining interest after prediction for the Top and Random datasets; (b) total views vs. RI for the Top dataset (ρp = 0.42, ρs = 0.48); (c) total views vs. RI for the Random dataset (ρp = 0.39, ρs = 0.39).]
Table 11: Mean Relative Squared Error for Various Prediction Models and Lags δ (averages and 95% confidence intervals)

                      Top Dataset                            Random Dataset
Prediction Model      δ=1          δ=7         δ=15          δ=1           δ=7           δ=15
general ML            .09 ± .005   .42 ± .02   .75 ± .04     .01 ± .001    .06 ± .005    .11 ± .01
general MRBF          .08 ± .005   .52 ± .05   1.29 ± .17    .01 ± .001    .10 ± .01     .26 ± .03
best SSM              .76 ± .01    .63 ± .02   .64 ± .03     .90 ± .002    .69 ± .005    .54 ± .006
specialized ML        .08 ± .005   .27 ± .01   .38 ± .02     .009 ± .001   .04 ± .0003   .06 ± .003
specialized MRBF      .08 ± .005   .32 ± .04   .47 ± .08     .009 ± .001   .04 ± .0004   .06 ± .008
Finally, it is important to discuss why the state-space models
did not work well in our context. The main reason we found is
that Holt-Winters based models can only capture linear trends in
a time series, that is, linear growth and decay. By using the KSC
distance function, we can identify and group UGC time series with
non-linear trends [22, 30], and create specific prediction models
for these cases. Moreover, the state-space models are trained
independently for each target object, using only early points of
the time series. Another possible reason for their low
performance in our context is that, unlike in [26], where the
models were trained with hundreds of points of each time series,
we here use much less data (only points up to t[d]).
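To illustrate the first point, the sketch below fits an additive-trend Holt-Winters model (via statsmodels, as a simplified stand-in for the SSMs of [26]) to the early points of a peaked series; the series itself is synthetic. The fitted linear decay keeps falling steadily, while real UGC popularity curves typically flatten after the peak.

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# A peaked series: sharp rise, then a decay that gradually flattens out.
series = np.array([5., 8., 20., 90., 60., 35., 22., 15., 11., 9., 8., 7., 7., 6.])

early = series[:10]  # only the points up to t[d] are available for training
fit = ExponentialSmoothing(early, trend="add").fit()
print(fit.forecast(4))  # the fitted linear decay keeps falling steadily
print(series[10:])      # whereas the actual tail flattens out
```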
7. Conclusions
In this article, we have identified and formalized a new research
problem. To the best of our knowledge, ours is the first work to
tackle the problem of early prediction of popularity trends in
UGC. Our motivation for studying this problem stems from our
previous findings on the complex patterns and causes of
popularity in UGC [11]. Unlike other kinds of content, e.g.,
news, which have clear definitions of monitoring periods and of
target and prediction dates for popularity, the complex nature of
UGC calls for a popularity prediction solution that is able to
determine these dates automatically. We here provided such a
solution – TrendLearner.
We have also proposed a novel two-step learning approach for
early prediction of popularity trends of UGC. Moreover, we
defined a new metric for measuring the effectiveness of UGC
popularity predictions, the remaining interest, which
TrendLearner optimizes so as to provide not only accurate, but
also timely, predictions. Thus, unlike previous work, we address
the tradeoff between prediction accuracy and remaining interest
in the content after prediction on a per-object basis.
We performed an extensive experimental evaluation of our method,
comparing it with representative state-of-the-art solutions from
the literature. Our experimental results on two YouTube datasets
showed that our method not only outperforms other approaches for
trend prediction (a gain of up to 38%) but also achieves such
results before 50% or 21% of the videos (depending on the
dataset) accumulate more than 32% of their views, with a slight
bias towards earlier predictions for more popular videos.
Moreover, when applied jointly with recently proposed
regression-based models to predict the popularity of a video at a
future date, our method outperforms state-of-the-art regression
and state-space based models, with average gains in accuracy of
at least 33% and 59%, respectively.
As future work, we plan to further investigate how different
types of UGC (e.g., blogs and Flickr photos) differ in their
popularity evolution, as well as which factors (e.g., referrers,
content quality) impact this evolution.
Acknowledgments
This research is partially funded by the Brazilian Na-
tional Institute of Science and Technology for Web Research
(MCT/CNPq/INCT Web Grant Number 573871/2008-6), and
by the authors’ individual grants from Google, CNPq, CAPES
and Fapemig.
References
[1] M. Ahmed, S. Spagna, F. Huici, and S. Niccolini. A Peek Into the Future:
Predicting the Evolution of Popularity in User Generated Content. In
Proc. WSDM, 2013.
[2] Y. Borghol, S. Mitra, S. Ardon, N. Carlsson, D. Eager, and A. Mahanti.
Characterizing and Modeling Popularity of User-Generated Videos. Per-
formance Evaluation, 68(11):1037–1055, 2011.
[3] C. Castillo, M. El-Haddad, J. Pfeffer, and M. Stempeck. Characterizing
the Life Cycle of Online News Stories Using Social Media Reactions. In Proc.
CSCW, 2014.
[4] M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, and S. Moon. Analyzing
the Video Popularity Characteristics of Large-Scale User Generated Con-
tent Systems. IEEE/ACM Transactions on Networking, 17(5):1357–1370,
2009.
[5] G. H. Chen, S. Nikolov, and D. Shah. A Latent Source Model for Non-
parametric Time Series Classification. In Proc. NIPS, 2013.
[6] A. Coates and A. Ng. Learning Feature Representations with K-Means.
Neural Networks: Tricks of the Trade, pages 561–580, 2012.
[7] R. Crane and D. Sornette. Robust Dynamic Classes Revealed by Mea-
suring the Response Function of a Social System. Proceedings of the
National Academy of Sciences, 105(41):15649–53, 2008.
[8] Q. Duong, S. Goel, J. Hofman, and S. Vassilvitskii. Sharding social net-
works. In Proc. WSDM, Feb. 2013.
[9] S. Džeroski and B. Ženko. Is Combining Classifiers with Stacking Better
than Selecting the Best One? Machine Learning, 54(3):255–273, 2004.
[10] F. Figueiredo, J. Almeida, and M. Gonçalves. Improving the Effectiveness
of Content Popularity Prediction Methods using Time Series Trends. In
Proc. ECML/PKDD Predictive Analytics Challenge Workshop, 2014.
[11] F. Figueiredo, F. Benevenuto, M. Gonçalves, and J. Almeida. On the
Dynamics of Social Media Popularity: A YouTube Case Study. ACM
Trans. Internet Technol., 14(4):24:1–24:23, 2014.
[12] P. Geurts, D. Ernst, and L. Wehenkel. Extremely Randomized Trees.
Machine Learning, 63(1):3–42, 2006.
[13] P. Gill, V. Erramilli, A. Chaintreau, B. Krishnamurthy, D. Papagiannaki,
and P. Rodriguez. Follow the Money: Understanding Economics of On-
line Aggregation and Advertising. In Proc. IMC, 2013.
[14] N. Golbandi, L. Katzir, Y. Koren, and R. Lempel. Expediting
Search Trend Detection via Prediction of Query Counts. In Proc. WSDM,
2013.
[15] Q. Hu, G. Wang, and P. S. Yu. Deriving Latent Social Impulses to Deter-
mine Longevous Videos. In Proc. WWW, 2014.
[16] R. Jain. The Art of Computer Systems Performance Analysis: Techniques
for Experimental Design, Measurement, Simulation, and Modeling. Wi-
ley, 1991.
[17] L. Jiang, Y. Miao, Y. Yang, Z. Lan, and A. G. Hauptmann. Viral Video
Style: A Closer Look at Viral Videos on YouTube. In Proc. ICMR, 2014.
[18] J. G. Lee, S. Moon, and K. Salamatian. An Approach to Model and
Predict the Popularity of Online Contents with Explanatory Factors. In
Proc. WIC, volume 1, 2010.
[19] K. Lerman and T. Hogg. Using a Model of Social Dynamics to Predict
Popularity of News. In Proc. WWW, 2010.
[20] J. Leskovec. Social Media Analytics. In Proc. WWW, 2011.
[21] H. Li, X. Ma, F. Wang, J. Liu, and K. Xu. On Popularity Prediction of
Videos Shared in Online Social Networks. In Proc. CIKM, 2013.
[22] Y. Matsubara, Y. Sakurai, B. A. Prakash, L. Li, and C. Faloutsos. Rise
and Fall Patterns of Information Diffusion. In Proc. KDD, 2012.
[23] D. Menascé and V. Almeida. Capacity Planning for Web Services: Metrics,
Models, and Methods. Prentice Hall, 2002.
[24] K. Nigam and R. Ghani. Analyzing the Effectiveness and Applicability
of Co-training. In Proc. CIKM, 2000.
[25] H. Pinto, J. Almeida, and M. Gonçalves. Using Early View Patterns to
Predict the Popularity of YouTube Videos. In Proc. WSDM, 2013.
[26] K. Radinsky, K. Svore, S. Dumais, J. Teevan, A. Bocharov, and
E. Horvitz. Behavioral Dynamics on the Web: Learning, Modeling, and
Prediction. ACM Transactions on Information Systems, 32(3):1–37, 2013.
[27] G. Szabo and B. A. Huberman. Predicting the Popularity of Online Con-
tent. Communications of the ACM, 53(8):80–88, 2010.
[28] A. Vakali, M. Giatsoglou, and S. Antaris. Social Networking Trends
and Dynamics Detection via a Cloud-Based Framework Design. In Proc.
WWW, 2012.
[29] J. Yang and J. Leskovec. Modeling Information Diffusion in Implicit
Networks. In Proc. ICDM, 2010.
[30] J. Yang and J. Leskovec. Patterns of Temporal Variation in Online Media.
In Proc. WSDM, 2011.
[31] L. Ye and E. Keogh. Time Series Shapelets: A Novel Technique that
Allows Accurate, Interpretable and Fast Classification. Data Mining and
Knowledge Discovery, 22(1-2):149–182, 2011.
[32] P. Yin, P. Luo, M. Wang, and W.-C. Lee. A Straw Shows Which Way
the Wind Blows: Ranking Potentially Popular Items from Early Votes. In
Proc. WSDM, 2012.
[33] H. Yu, L. Xie, and S. Sanner. Exploring the Popularity Phases of YouTube
Videos: Observations, Insights, and Prediction. In Proc. ICWSM, 2015.
[34] D. Zeng, H. Chen, R. Lusch, and S.-H. Li. Social Media Analytics and
Intelligence. IEEE Intelligent Systems, 25(6):13–16, 2010.