The Good, the Bad and the Bait: Detecting and
Characterizing Clickbait on YouTube
Savvas Zannettou, Sotirios Chatzis, Kostantinos Papadamou, Michael Sirivianos
Department of Electrical Engineering, Computer Engineering and Informatics
Cyprus University of Technology
Limassol, Cyprus, {sotirios.chatzis,michael.sirivianos},
Abstract—The use of deceptive techniques in user-generated
video portals is ubiquitous. Unscrupulous uploaders deliberately
mislabel video descriptors aiming at increasing their views and
subsequently their ad revenue. This problem, usually referred to
as "clickbait," may severely undermine user experience. In this
work, we study the clickbait problem on YouTube by collecting
metadata for 206k videos. To address it, we devise a deep
learning model based on variational autoencoders that supports
the diverse modalities of data that videos include. The proposed
model relies on a limited amount of manually labeled data to
classify a large corpus of unlabeled data. Our evaluation indicates
that the proposed model offers improved performance when
compared to other conventional models. Our analysis of the
collected data indicates that YouTube's recommendation engine
does not take clickbait into account and is thus susceptible to
recommending misleading videos to users.
Recently, YouTube surpassed cable TV in popularity among
teenagers [28], largely because it offers a vast number of
videos that are always available on demand. However, because
videos are generated by the platform's users, known as
YouTubers, a plethora of them
are of dubious quality. The ultimate goal of YouTubers is to
increase their ad revenue by ensuring that their content will get
viewed by millions of users. Several YouTubers deliberately
employ techniques that aim to deceive viewers into clicking
their videos. These techniques include: (i) using eye-catching
thumbnails, such as depictions of bizarre content or attractive
adults, which are often irrelevant to the video content; (ii) using
headlines that aim to intrigue viewers; and (iii) embedding
false information in the headline, the thumbnail, or the
video content. We refer to videos that employ such techniques
as clickbaits. Continuous exposure to clickbaits
causes frustration and a degraded user experience (see Fig. 1).
The clickbait problem is essentially a peculiar form of the
well-known spam problem [38], [37], [36], [17], [20]. In spam,
malicious actors deceive users with misleading messages,
mainly to advertise websites or to mount attacks (e.g., phishing)
by redirecting them to malicious websites.
Nowadays, spam is not as prevalent as it was a few
years ago, owing to the deployment of systems that mitigate
it. Furthermore, users have become more aware of typical
spam content (e.g., emails) and can effortlessly
discern it. However, this is not the case for clickbait, which
Fig. 1: Comments that were found in clickbait videos. The users’
frustration is apparent (we omit users’ names for ethical reasons).
usually contains hidden false or ambiguous information that
users or systems might not be able to perceive.
Recently, the aggravation of the fake news problem has
induced broader public attention to the clickbait problem.
For instance, Facebook aims at removing clickbaits from its
newsfeed [27], [14]. In this work, we focus on YouTube
for two reasons: (i) anecdotal evidence suggests that the
problem exists on YouTube [8]; and (ii) to the best of our
knowledge, YouTube relies on users to flag suspicious videos
and then reviews them manually. Such an approach is
inherently inefficient. Hence, the need for an automated
approach that minimizes human intervention is indisputable.
To attain this goal, we leverage some recent advances in the
field of Deep Learning [22] by devising a novel formulation of
variational autoencoders (VAEs) [21], [23] that fuses different
modalities of a YouTube video, and infers latent correlations
between them.
The proposed model infers a latent variable vector for
each video that encodes a high-level representation of the
content and the correlations between the various modalities.
The significance of learning to compute such concise repre-
sentations is that: (i) this learning procedure can be robustly
performed by leveraging large unlabeled data corpora; and (ii)
the obtained representations can be subsequently utilized to
drive the classification process, with very limited requirements
in labeled data.
To this end, we formulate the encoder part of the devised
VAE model as a 2-component finite mixture model [10]. That
is, we consider a set of alternative encoder models that may
generate the data pertaining to each video. The decision of
which specific encoder (each corresponding to one possible class)
generates a given video is obtained via a trainable probabilistic gating
network [29]; this constitutes an integral part of the developed
autoencoder. The whole model is trained in an end-to-end
fashion, using the available training data, both the unlabeled
and the few labeled ones. The latter are specifically useful
for appropriately fitting the postulated gating network that
infers the posterior distribution of mixture component (and
corresponding class) allocation.
Contributions. We propose a deep generative model that
allows for combining data from as diverse modalities as video
headline text, thumbnail image and tags text, as well as various
numerical statistics, including statistics from comments. Most
importantly, the proposed model allows for successfully ad-
dressing the problem of learning from limited labeled samples
and numerous unlabeled ones (semi-supervised learning). This
is achieved by postulating a deep variational autoencoder that
employs a finite mixture model as its encoder. In this context,
mixture component assignment is regulated via an appropriate
gating network; this also constitutes the eventually obtained
classification mechanism of our deep learning system. We
provide a large-scale analysis of YouTube; we show that, with
respect to the collected data, its recommendation engine does
not consider how misleading a recommended video is. Hence,
it is susceptible to recommending clickbait videos to its users.
By leveraging YouTube’s Data API, between August and
November of 2016, we collect metadata of videos published
between 2005 and 2016. Specifically, we collect the following
data descriptors for 206k videos: (i) basic details like
headline, tags, etc.; (ii) thumbnail; (iii) comments from users;
(iv) statistics (e.g., views, likes, etc.); and (v) related videos
based on YouTube’s recommendation system. We started our
retrieval from a popular (108M views) clickbait video [1]
and iteratively collected all the related videos as were rec-
ommended by YouTube. Note that this approach enables us
to study interesting aspects of the problem, by constructing a
graph that captures the relations (recommendations) between
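The iterative retrieval described above amounts to a breadth-first traversal of YouTube's recommendation graph. The sketch below illustrates this under the assumption that a `get_related` callback wraps the actual Data API request for a video's related videos; the function and its names are illustrative, not the authors' code:

```python
from collections import deque

def crawl_related(seed_id, get_related, limit=206_000):
    """Breadth-first crawl of the related-videos graph, starting from a
    seed video and following YouTube's recommendations."""
    seen = {seed_id}          # video IDs whose metadata we have fetched
    queue = deque([seed_id])
    edges = []                # (source, recommended) recommendation pairs
    while queue and len(seen) < limit:
        vid = queue.popleft()
        for rel in get_related(vid):
            edges.append((vid, rel))
            if rel not in seen:
                seen.add(rel)
                queue.append(rel)
    return seen, edges
```

The collected `edges` list directly yields the directed recommendation graph used in the graph analysis below.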
To get labeled data, we opted for two different approaches.
First, we manually reviewed a small subset of the collected
data by inspecting the headline, the thumbnail, comments from
users, and video content. Specifically, we watched the whole
video and compared it to the thumbnail and headline. A video
is considered clickbait only if the thumbnail and headline
deviate substantially from its content. However, this task is
both cumbersome and time-consuming; thus, we elected to
obtain additional labeled data. To this end, we compiled
a list of channels (available at [2]) that habitually employ
clickbait techniques, as well as channels that do not. To compile
the list, we used a pragmatic approach: we identified channels
that had been called out by other users as clickbait channels. For each
channel, we retrieved up to 500 videos, hence creating a larger
labeled dataset. The overall labeled dataset consists of (i)
1,071 clickbaits and 955 non-clickbaits obtained from the
manual review process and (ii) 8,999 clickbaits and 8,470 non-
clickbaits obtained from the curated list of channels. The
importance of this dataset is two-fold: it allows us to study
the problem, and it is instrumental for training our deep learning model.
A. Manually Reviewed Ground Truth Dataset Analysis
In order to better grasp the problem, we perform a compar-
ative analysis of the manually reviewed ground truth.
Category. Table I reports the categories found in the videos.
In total, we find 15 categories, but for brevity we show only
the top five by count. We observe that most
clickbaits fall in the Entertainment and Comedy categories,
whereas non-clickbaits are prevalent in the Sports category.
This indicates that, within this dataset, YouTubers employ
clickbait techniques mostly on entertainment videos.
Headline. YouTubers often employ deceptive techniques
in the headline, such as the use of exaggerated phrases. To verify
that this applies to our ground truth dataset, we perform
stemming of the words found in clickbait and non-clickbait
headlines. Fig. 2(a) depicts the ratio of the top 20
stems found in our ground truth clickbait videos (e.g.,
95% of the videos that contain the stem "sexi" are clickbait).
We observe that alluring stems like "sexi" and
"hot" are frequently used in clickbait videos, whereas their
use in non-clickbaits is low. The same applies to words used
for exaggeration, like "viral" and "epic".
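The per-stem ratios of Fig. 2(a) can be computed with a few lines of Python. The tokenization below (lowercased whitespace splitting) and the pluggable `stem` function (e.g., NLTK's `PorterStemmer().stem`) are assumptions, since the exact pipeline is not specified:

```python
from collections import Counter

def stem_ratios(clickbait_titles, other_titles, stem):
    """For each stem, the fraction of headlines containing it that
    belong to clickbait videos (cf. Fig. 2(a))."""
    def stem_counts(titles):
        counts = Counter()
        for title in titles:
            # count each stem at most once per headline
            counts.update({stem(w) for w in title.lower().split()})
        return counts

    cb = stem_counts(clickbait_titles)
    non = stem_counts(other_titles)
    return {s: cb[s] / (cb[s] + non[s]) for s in cb}
```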
Thumbnail. To study the thumbnails, we use Imagga [19],
a service that produces descriptive tags for an image. We
tag all the thumbnails in our ground truth dataset. Fig. 2(b)
shows the ratio of the top 10 Imagga tags found in the manually
reviewed ground truth. We observe that clickbait videos
typically use sexually appealing thumbnails to attract viewers.
For instance, 81% of the videos whose thumbnail contains the
"pretty" tag are clickbaits.
Tags. Tags are words defined by YouTubers before
publishing; they can dictate whether a video appears in
users' search results. We notice that clickbaits use specific
words as tags, whereas non-clickbaits do not. Fig. 2(c) depicts
the ratio of the top 20 stems found in clickbaits.
We observe that many clickbait videos use tags like "try not
to laugh", "viral", "hot" and "impossible"; phrases that are
usually used for exaggeration.
Statistics. Fig. 2(d) shows the normalized score of the video
statistics for both classes of videos. Interestingly, clickbait
and non-clickbait videos have similar view counts, suggesting that
viewers cannot easily discern clickbait videos and hence
click on them. Also, non-clickbait videos have more likes
and fewer dislikes than clickbaits. This is reasonable, as many
users feel frustrated after watching clickbaits.
Comments. We notice that users on YouTube implicitly flag
suspicious videos by commenting on them. For instance,
we note several comments like the following: “the title is
misleading”, “i clicked because of the thumbnail”, “where
is the thumbnail?" and "clickbait". Hence, we argue that
viewers' comments are a valuable resource for assessing
videos. To this end, we analyze the ground truth dataset to
extract the mean number of occurrences of words widely used
Category Clickbaits (%) Non-clickbaits (%)
Entertainment 406 (38%) 308 (32%)
Comedy 318 (29%) 228 (24%)
People & Blogs 155 (14%) 115 (12%)
Autos & Vehicles 33 (3%) 49 (5%)
Sports 29 (3%) 114 (12%)
TABLE I: Top five categories (and their respective percentages) in our ground truth dataset.
Fig. 2: Analysis of the manually reviewed ground truth dataset. Normalized mean scores for: (a) stems from headline text; (b) tags derived
from thumbnails; (c) stems from tags defined by uploaders; (d) video statistics; and (e) comments that contain words for flagging
suspicious videos.
Source Destination Norm. Mean
clickbait clickbait 4.1
clickbait non-clickbait 2.73
non-clickbait clickbait 2.75
non-clickbait non-clickbait 3.57
TABLE II: Normalized mean of related videos for clickbait and
non-clickbait videos in the ground truth dataset
for flagging clickbait videos. Fig. 2 (e) depicts the normalized
mean scores for the identified words. We observe that these
words are used heavily in clickbait comments but rarely in
non-clickbait ones. It is also particularly interesting that comments
referring to the video's thumbnail were found 2.5 times more
often in clickbaits than in non-clickbaits.
Graph Analysis. Users often watch videos according to
YouTube’s recommendations. From manual inspections, we
have noted that when watching a clickbait video, YouTube is
more likely to recommend another clickbait video. To confirm
this against our data, we create a directed graph $G = (V, E)$,
where $V$ is the set of videos and $E$ the set of edges connecting
videos that point to one another via a recommendation. Then, for all the
videos, we select their immediate neighbors in the graph, and
count the videos that are clickbaits and non-clickbaits. Table II
depicts the normalized mean of the number of connected
videos for each class. We apply a normalization factor to
mitigate the bias towards clickbaits, which are slightly
more numerous in our ground truth. We observe that, when
a user watches a clickbait video, they are recommended 4.1
clickbait videos on average, as opposed to 2.73 non-clickbait
recommendations. A similar pattern holds for non-clickbaits:
a user is less likely to be served a clickbait when watching a
non-clickbait video.
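The Table II statistic can be reproduced from the recommendation graph. The exact normalization the paper applies is not spelled out; dividing by the number of source videos in each class is one plausible reading, so treat this as a sketch:

```python
from collections import Counter

def mean_recommended(edges, label):
    """Mean number of recommended videos per (source class, destination
    class) pair, normalized by the size of the source class.
    `edges` holds (src, dst) video-ID pairs; `label` maps IDs to classes."""
    pair_counts = Counter((label[s], label[d]) for s, d in edges)
    class_sizes = Counter(label.values())
    return {pair: n / class_sizes[pair[0]]
            for pair, n in pair_counts.items()}
```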
YouTube’s countermeasures. To get an insight on whether
YouTube employs any countermeasures, we calculate the num-
ber of offline (either deleted by YouTube or removed by the
uploader) videos in our manually reviewed ground truth, as
of January 10, 2017 and April 30, 2017. We found that only
3% (January 10th) and 10% (April 30th) of the clickbaits are
offline. Similarly, only 1% (January 10th) and 5% (April 30th)
of the non-clickbaits are offline. To verify that the ground
truth dataset does not consist of only recent videos (thus, just
published and not yet detected) we calculate the mean number
of days that passed from the publication date up to January 10,
2017. We find that the mean number of days for the clickbaits
is 700, while it is 917 days for the non-clickbaits. The very
low offline ratio, as well as the high mean number of days,
indicate that YouTube is not able to tackle the problem in a
timely manner.
A. Processed Modalities
Our model processes the following modalities: (i) Headline:
For the headline, we consider both the content and the style
of the text. For the content of the headline, we use sent2vec
embeddings [26] trained on Twitter data. For the style of the
text, we use the features proposed in [6]; (ii) Thumbnail: We
scale the images down to 28x28 pixels and convert them to
grayscale. This way, we decrease the number of trainable
parameters of our deep network, speeding up training
without compromising achievable performance; (iii) Comments:
We preprocess user comments to count the occurrences
of words used for flagging videos. We consider
the following words: “misleading, bait, thumbnail, clickbait,
deceptive, deceiving, clicked, flagged, title”; (iv) Tags: We
encode the tags' text as a binary representation over the top 100
words found in the whole corpus; and (v) Statistics
(e.g., likes, views, etc.).
Fig. 3: Overview of the proposed model. The dotted rectangle
represents the encoding component of our model.
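The comment and tag featurizations above can be sketched as follows. The exact tokenization and the composition of the full input vector are not fully specified, so this is an illustration using simple word matching:

```python
import numpy as np

FLAG_WORDS = ["misleading", "bait", "thumbnail", "clickbait",
              "deceptive", "deceiving", "clicked", "flagged", "title"]

def comment_features(comments):
    """Counts of each flag word over all of a video's comments."""
    words = " ".join(comments).lower().split()
    return np.array([words.count(w) for w in FLAG_WORDS])

def tag_features(video_tags, vocab):
    """Binary indicator over the top-100 corpus words (`vocab`)."""
    tags = {t.lower() for t in video_tags}
    return np.array([1 if w in tags else 0 for w in vocab])
```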
B. Model Formulation
In Fig. 3, we provide an overview of the proposed model.
The thumbnail is initially processed, at the encoding part of
the proposed model, by a CNN [40]. We use a CNN that
comprises four convolutional layers, with 64 filters each, and
ReLU activations. The first three of these layers are followed
by max-pooling layers. The fourth is followed by a simple
densely connected layer, which comprises 32 units with ReLU
activations. This initial processing stage allows the network to
learn to extract a high-level, 32-dimensional representation of the
thumbnail, which retains the information most useful for driving
the subsequent encoding and classification stages.
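This thumbnail branch can be sketched in Keras, the framework the model is implemented in. The text fixes the layer counts, the 64 filters, and the 32-unit dense head, but not the kernel sizes, padding, or pooling windows; those below are assumptions:

```python
from tensorflow.keras import layers, models

def thumbnail_cnn():
    """28x28 grayscale thumbnail -> 32-dimensional representation:
    four 64-filter conv layers (the first three followed by
    max-pooling) and a 32-unit ReLU dense head."""
    return models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(32, activation="relu"),
    ])
```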
The overarching goal of the devised model is to limit
the required availability of labeled data, while making the
most out of large corpora of (unlabeled) examples. To this
end, after this first processing stage, we split the encoding
part into two distinct subencoders that work in tandem. Both
these subencoders are presented with the aforementioned 32-
dimensional thumbnail representation, fused with the data
stemming from all the other available modalities. This results
in an 855-dimensional input vector, which is first processed by a
dense-layer network (Fusing Network) comprising 300 ReLU
units. Due to its large number of parameters, this dense layer
network may become prone to overfitting; hence, we regularize
using the prominent Dropout technique [34]. We use a Dropout
level of $d = 0.5$; this means that, at each iteration of the
training algorithm, 50% of the units are randomly omitted
from updating their associated parameters. The obtained 300-dimensional
vector, say $h(\cdot)$, is the one eventually presented to
both postulated subencoders.
The rationale behind this novel configuration is motivated
by a key observation: the two modeled classes are expected
to entail significantly different patterns of correlations and
latent underlying dynamics between the modalities. Hence,
it is plausible that each class can be adequately and effectively
modeled by means of distinct encoder
distributions, inferred by the two subencoders. Each of these
subencoders is a dense-layer network comprising a hidden
layer with 20 ReLU units and an output layer with 10 units.
Since the devised model constitutes a VAE, the output units
of the subencoders are of a stochastic nature; specifically,
we consider stochastic outputs, say $\tilde{z}$ and $\hat{z}$, with Gaussian
(posterior) densities, as usual in the VAE literature [21], [23].
Hence, what the postulated subencoders actually compute
are the means, $\tilde{\mu}$ and $\hat{\mu}$, and the (diagonal) covariance
matrices, $\tilde{\sigma}^2$ and $\hat{\sigma}^2$, of these Gaussian posteriors. On this
basis, the actual subencoder output vectors, $\tilde{z}$ and $\hat{z}$, are
sampled each time from the corresponding (inferred) Gaussian
posteriors. Note that our choice of sharing the initial
CNN-based processing stage between the two subencoders
significantly reduces the number of trainable
parameters, without limiting the eventually obtained modeling capacity.
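Each subencoder can be sketched as follows; parameterizing the diagonal covariance through a log-variance output is a standard VAE convention and an assumption here, as the text only states the layer sizes:

```python
from tensorflow.keras import layers

def subencoder(h):
    """One mixture-component encoder: maps the shared 300-d vector h
    to the mean and (diagonal, log-)variance of a 10-d Gaussian q(z|x)."""
    hidden = layers.Dense(20, activation="relu")(h)
    mu = layers.Dense(10)(hidden)          # Gaussian mean
    log_var = layers.Dense(10)(hidden)     # log of diagonal covariance
    return mu, log_var
```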
Under this mixture model formulation, we need to establish
an effective mechanism for inferring which observations (i.e.,
videos) are more likely to match the learned distribution of
each component subencoder. This is crucial for effectively
selecting between the samples of $\tilde{z}$ or $\hat{z}$ at the output of the
encoding stage of the devised model. In layman's terms, this
can be considered to be analogous to a (soft) classification
mechanism. This mechanism can be obtained by computation
of the posterior distribution of mixture component membership
of each video (also known as "responsibility" in the literature
of finite mixture models [25]). To allow for inferring this
posterior distribution, in this work we postulate a gating
network. This is a dense-layer network comprising one
hidden layer with 100 ReLU units, and it is presented with
the same vector, $h(\cdot)$, as the two postulated subencoders. It
is trained alongside the rest of the model, and it is the only
part of the model that requires availability of labeled data for
its training.
Note that this gating network entails only a modest number
of trainable parameters, since both the size of its input as
well as of its single hidden layer are rather small. As such,
it can be effectively trained even with limited availability of
labeled data. This is a key merit of our approach, which fully
differentiates it from conventional classifiers that are presented
with raw observed data, which are typically prohibitively high-dimensional.
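A minimal sketch of the gating network, matching the stated sizes (100 ReLU hidden units); a single sigmoid output parameterizing the Bernoulli posterior is a standard choice and an assumption here:

```python
from tensorflow.keras import layers

def gating_network(h):
    """q(c|x): the probability that a video belongs to the clickbait
    class, computed from the shared 300-d representation h."""
    hidden = layers.Dense(100, activation="relu")(h)
    return layers.Dense(1, activation="sigmoid")(hidden)
```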
To conclude the formulation of the proposed VAE, we
need to postulate an appropriate decoder distribution, and a
corresponding network that infers it. In this work, we opt
for a simple dense-layer neural network, which is fed with
the (sampled) output of the postulated finite mixture model
encoder, and attempts to reconstruct the original modalities.
Specifically, we postulate a network comprising one hidden
layer with 300 ReLU units, and a set of 823 output units,
that attempt to reconstruct the original modalities, with the
exception of the thumbnail. We exclude the thumbnail
modality from the decoding process because handling it
appropriately would require deconvolutional network layers,
which considerably complicates the model. Hence, we essentially
treat the thumbnail modality as side-information that regulates
the inferred posterior distribution of the latent variables $z$
(encoder) in the sense of a covariate, rather than as an observed
modality in the conventional sense. It is empirically known
that such a setup, if appropriately implemented, does not
undermine modeling effectiveness [30].
Let us denote as $x_n$ the set of observable data pertaining
to the $n$th available video, comprising five distinct modalities:
headline, thumbnail, tags, comments, and statistics.
Then, based on the above description, the encoder
distribution of the postulated model reads:

$$q(z_n \mid x_n, c_n) = q(\tilde{z}_n \mid x_n)^{c_n}\, q(\hat{z}_n \mid x_n)^{1 - c_n} \qquad (1)$$

Here, $z_n$ is the output of the encoding stage of the proposed
model that corresponds to $x_n$, $\tilde{z}_n$ is the output of the first
subencoder, corresponding to the clickbait class, $\hat{z}_n$ is the
output of the second subencoder, corresponding to the non-clickbait
class, and $c_n$ is a latent variable indicator of whether
$x_n$ belongs to the clickbait class or not. We also postulate

$$q(\tilde{z}_n \mid x_n) = \mathcal{N}\big(\tilde{z}_n \mid \tilde{\mu}(x_n; \tilde{\theta}),\, \mathrm{diag}\,\tilde{\sigma}^2(x_n; \tilde{\theta})\big) \qquad (2)$$
$$q(\hat{z}_n \mid x_n) = \mathcal{N}\big(\hat{z}_n \mid \hat{\mu}(x_n; \hat{\theta}),\, \mathrm{diag}\,\hat{\sigma}^2(x_n; \hat{\theta})\big) \qquad (3)$$
Here, $\tilde{\mu}(x_n; \tilde{\theta})$ and $\tilde{\sigma}^2(x_n; \tilde{\theta})$ are outputs of a deep
neural network, with parameter set $\tilde{\theta}$, that corresponds to the
clickbait-class subencoder; it comprises a first CNN-type part
that processes the observed thumbnails, and a further densely
connected part that fuses and processes the rest of
the observed modalities, as described previously. Similarly,
$\hat{\mu}(x_n; \hat{\theta})$ and $\hat{\sigma}^2(x_n; \hat{\theta})$ are outputs of a deep neural network
with parameter set $\hat{\theta}$, that corresponds to the non-clickbait-class
subencoder.
The posterior distribution of mixture component allocation,
$q(c_n \mid x_n)$, which is parameterized by the aforementioned gating
network, is a simple Bernoulli distribution that reads

$$q(c_n \mid x_n) = \mathrm{Bernoulli}\big(\varpi(h(x_n); \phi)\big) \qquad (4)$$

Here, $\varpi(h(x_n); \phi) \in [0, 1]$ is the output of the gating network,
with trainable parameter set $\phi$. The gating network is presented with an
intermediate encoding of the input modalities (shared with
the subencoder networks), $h(x_n)$, as described previously, and
infers the probability of $x_n$ belonging to the clickbait class.
Lastly, the postulated decoder distribution reads

$$p(x_n \mid z_n) = \mathcal{N}\big(\bar{x}_n \mid \mu(z_n; \varphi),\, \mathrm{diag}\,\sigma^2(z_n; \varphi)\big) \qquad (5)$$

where $\bar{x}_n$ collects all the modalities of $x_n$ except the
thumbnail, and the means and diagonal covariances, $\mu(z_n; \varphi)$ and
$\sigma^2(z_n; \varphi)$, are outputs of a deep network with trainable
parameter set $\varphi$, configured as described previously.
C. Model Training
Let us consider a training dataset $X = \{x_n\}_{n=1}^{N}$ that
consists of $N$ video samples. A small subset, $X_l$, of size $M$ of
these samples is considered to be labeled, with a corresponding
label set $Y = \{y_m\}_{m=1}^{M}$. Then, following the VAE literature
[21], model training is performed by maximizing the evidence
lower bound (ELBO) of the model over the parameter sets
$\{\tilde{\theta}, \hat{\theta}, \phi, \varphi\}$. The ELBO of our model reads:

$$\log p(X) \geq \mathcal{L}(\tilde{\theta}, \hat{\theta}, \phi, \varphi \mid X) = \sum_{n=1}^{N} \Big( \gamma\, \mathbb{E}\big[\log p(x_n \mid z_n)\big] - \mathrm{KL}\big[q(z_n \mid x_n) \,\|\, p(z_n)\big] \Big) + \sum_{m=1}^{M} \log q(c_m = y_m \mid x_m) \qquad (6)$$
Here, $\mathrm{KL}[q \,\|\, p]$ is the KL divergence between the distributions
$q(\cdot)$ and $p(\cdot)$, while $\mathbb{E}[\cdot]$ is the (posterior)
expectation of a function w.r.t. its entailed random (latent)
variables. Note also that, in the ELBO expression (6), the
introduced hyperparameter γis a simple regularization con-
stant, employed to ameliorate the overfitting tendency of
the postulated decoder networks, p(xn|zn). We have noticed
that this simple trick yields a significant improvement in
generalization capacity.
In Eq. (6), the posterior expectation of the log-likelihood
term $p(x_n \mid z_n)$ cannot be computed analytically, due to the
nonlinear form of the decoder. Hence, we must approximate
it by drawing Monte-Carlo (MC) samples from the posterior
(encoder) distributions (2)-(3). However, MC gradients are
well-known to suffer from high variance. To resolve this
issue, we utilize a smart re-parameterization of the drawn MC
samples. Specifically, following the related derivations in [21],
we express these samples as a differentiable transformation
of an (auxiliary) random noise variable $\epsilon$; this is the
variable we actually draw MC samples from:

$$z_n = \mu(x_n) + \sigma(x_n) \odot \epsilon_n, \quad \epsilon_n \sim \mathcal{N}(0, I) \qquad (7)$$

Hence, such a re-parameterization reduces the computed
expectations into averages over samples from a random variable,
$\epsilon$, with low (unitary) variance. This way, by maximizing
the obtained ELBO expression, one can yield low-variance
estimators of the sought (trainable) parameters, under some
mild conditions [21]. Turning to the maximization process of
$\mathcal{L}(\tilde{\theta}, \hat{\theta}, \phi, \varphi \mid X)$, this can be effected using modern stochastic
optimization algorithms, such as AdaGrad [16] or RmsProp [39].
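The re-parameterization of Eq. (7) can be sketched as follows; the log-variance parameterization is an assumption carried over from the encoder sketch:

```python
import tensorflow as tf

def reparameterize(mu, log_var, n_samples=10):
    """Draw S Monte-Carlo samples z = mu + sigma * eps with
    eps ~ N(0, I); gradients flow through mu and log_var only."""
    eps = tf.random.normal((n_samples,) + tuple(mu.shape))
    return mu + tf.exp(0.5 * log_var) * eps
```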
A. Experimental Setup
Our model is implemented with Keras [12] and TensorFlow [24].
MC Samples. To perform model training, we used $S = 10$
drawn MC samples, $\epsilon^{(s)}$; we found that increasing this value
does not yield any statistically significant accuracy improvement,
despite the associated increase in computational cost.
Initialization. We employ Glorot Uniform initialization [18].
This scheme allows us to train deep network architectures
without the need for layerwise pretraining. This is effected by
initializing the network weights in a way that ensures the
signal propagated across the network layers remains in a
reasonable range of values, irrespective of the network depth
(i.e., it does not explode to infinity or vanish to zero).
Stochastic optimization. We found that RmsProp works better
than the AdaGrad algorithm suggested in [21]. The RmsProp
hyperparameter values we used comprise an initial learning
rate of 0.001, $\rho = 0.9$, and $\epsilon = 10^{-8}$.
Prediction generation. To predict the class of a video,
$x_n$, we compute the mixture assignment posterior distribution
$q(c_n \mid x_n)$, inferred via the postulated gating network output
$\varpi(h(x_n); \phi)$. On this basis, the video is assigned to the
clickbait class if $\varpi(h(x_n); \phi) > 0.5$.
Baselines. To evaluate our model, we compare it against
two baseline models. First, a simple Support Vector Machine
(SVM) with parameters $\gamma = 0.001$ and $C = 100$. Second, a
supervised deep network (SDN) that comprises (i) the same
CNN as the proposed model; and (ii) a 2-layer fully-connected
neural network with Dropout level of d= 0.5.
Evaluation. We train our proposed model using the entirety of
the available unlabeled dataset, as well as a randomly selected
80% of the available labeled dataset, comprising an equal
number of clickbait and non-clickbait examples. Subsequently,
the trained model is used to perform out-of-sample evaluation;
that is, we compute the classification performance of our
approach on the fraction of the available labeled examples
that were not used for model training.
Model Accuracy Precision Recall F1 Score
SVM 0.882 0.909 0.884 0.896
SDN 0.908 0.920 0.907 0.917
Proposed Model (U = 25%) 0.915 0.918 0.926 0.923
Proposed Model (U = 50%) 0.918 0.918 0.934 0.926
Proposed Model (U = 100%) 0.924 0.921 0.942 0.931
TABLE III: Performance metrics for the evaluated methods. We also
report the performance of our model when using only 25% or 50%
of the available unlabeled data.
Table III reports the performance of the proposed model as
well as the two considered baselines. We observe that the neural
network-based approaches, i.e., the SDN and the proposed
model, outperform the SVM across all considered metrics.
Specifically, the best performance is
obtained by the proposed model, which outperforms SVMs
by 3.8%, 1.2%, 5.8% and 3.5% on accuracy, precision, recall,
and F1 score, respectively. Further, to assess the importance
of using unlabeled data, we also report results with reduced
unlabeled data. We observe that, using only 25% of the
available unlabeled data, the proposed model undergoes a
substantial performance decrease, as measured by all the em-
ployed performance metrics. This performance deterioration
only slightly improves when we elect to retain 50% of the
available unlabeled data.
Corpus-Level Inference Insights. Having demonstrated the
performance of our model, we now provide some insights into
the obtained inferential outcomes on the whole corpus of data.
From the 206k examples, our model predicts that 84k (41%)
of them are clickbaits whereas 122k (59%) are non-clickbaits.
The considerable percentage of clickbaits in the corpus, in
conjunction with the data collection procedure, suggests that,
with respect to the collected data, YouTube does not account
for misleading videos in its recommendations. Note also that
we have performed an analysis of the whole corpus akin to
the ground truth dataset analysis of Section II.A. The obtained
results follow the same pattern as for the ground truth dataset.
Specifically, the normalized mean values reported in Table II
for the labeled data become, for the whole corpus, 11.18 for
clickbait-to-clickbait pairs, whereas for clickbait-to-non-clickbait
pairs the mean is 2.62. This validates our deduction that a user
viewing a clickbait video on YouTube is more likely to be
recommended another clickbait video.
The clickbait problem is also identified by prior work that
proposes tools for alleviating the problem in various web
portals. Specifically, Chen et al. [11] provide useful informa-
tion regarding the clickbait problem and future directions for
tackling the problem using SVM and Naive Bayes approaches.
Rony et al. [32] analyze 1.67M posts on Facebook in order to
understand the extent and impact of the clickbait problem as
well as users’ engagement. For detecting clickbaits, they pro-
pose the use of sub-word embeddings with a linear classifier.
Potthast et al. [31] focus on the Twitter platform where they
suggest the use of Random Forests for distinguishing tweets
that contain clickbait content. Furthermore, Chakraborty et
al. [9] propose the use of SVMs in conjunction with a browser
add-on for offering a detection system to end-users of news
articles. Moreover, Biyani et al. [6] recommend the use of
Gradient Boosted Decision Trees for clickbait detection in
news articles. They also demonstrate that the degree of infor-
mality in the content of the landing page can help in discerning
clickbait news articles. To the best of our knowledge, Anand
et al. [4] is the first work that suggests the use of deep learning
techniques for mitigating the clickbait problem. Specifically,
they propose the use of Recurrent Neural Networks in conjunc-
tion with word2vec embeddings for identifying clickbait news
articles. Agrawal [3] proposes the use of CNNs in conjunction
with word2vec embeddings for discerning clickbait headlines
on Reddit, Facebook, and Twitter. Other efforts include browser
add-ons [15], [13], [7] and manually associating user accounts
with clickbait content [33], [35], [5].
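As a minimal illustration of the classical feature-based approaches surveyed above (e.g., the Naive Bayes baseline of Chen et al. [11]), the following sketch trains a toy multinomial Naive Bayes classifier on bag-of-words title features. The titles and labels are invented, and real systems use far richer cues (punctuation, embeddings, thumbnails, metadata).

```python
import math
from collections import Counter, defaultdict

def train_nb(titles, labels):
    """Fit a multinomial Naive Bayes model on bag-of-words title features."""
    word_counts = defaultdict(Counter)  # class -> word frequencies
    class_counts = Counter(labels)      # class -> #training titles
    vocab = set()
    for title, y in zip(titles, labels):
        words = title.lower().split()
        word_counts[y].update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict_nb(model, title):
    """Return the class with the highest log-posterior under the model."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for y, cy in class_counts.items():
        lp = math.log(cy / total)  # class prior
        denom = sum(word_counts[y].values()) + len(vocab)
        for w in title.lower().split():
            # Laplace (add-one) smoothing handles unseen words.
            lp += math.log((word_counts[y][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = y, lp
    return best

# Invented toy training data.
titles = ["you won't believe this trick", "official lecture on calculus",
          "shocking secret revealed", "introduction to linear algebra"]
labels = ["clickbait", "legit", "clickbait", "legit"]
model = train_nb(titles, labels)
print(predict_nb(model, "shocking trick revealed"))  # -> clickbait
```

Such purely supervised baselines need a sizeable labeled set to work well, which is the limitation the semi-supervised approach below is designed to address.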
Remarks. In contrast to the aforementioned works, we focus
on the YouTube platform and propose a deep learning model
that: (i) successfully and properly fuses and correlates the
diverse set of modalities related to a video; and (ii) leverages
large unlabeled datasets, while imposing much more limited
requirements on labeled data availability.
In this work, we have explored the use of variational
autoencoders for tackling the clickbait problem on YouTube.
Our approach constitutes the first proposed semi-supervised
deep learning technique in the field of clickbait detection. This
way, it enables more effective automated detection of clickbait
videos in the absence of large-scale labeled data. Our analysis
indicates that the YouTube recommendation engine does not
take the clickbait problem into account.
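Our semi-supervised formulation builds on the variational autoencoder [21] and its auxiliary-variable extensions [23]. As a sketch of the underlying idea only (our actual model additionally fuses multiple video modalities, which is omitted here), the standard semi-supervised VAE objective treats the label y as observed for labeled data and as a latent variable for unlabeled data:

```latex
% Labeled pair (x, y): standard evidence lower bound (ELBO).
\mathcal{L}(x,y) = \mathbb{E}_{q_\phi(z \mid x,y)}\big[\log p_\theta(x \mid y,z)
                 + \log p(y) + \log p(z) - \log q_\phi(z \mid x,y)\big]

% Unlabeled x: marginalize the classifier q_\phi(y \mid x) over labels.
\mathcal{U}(x) = \sum_{y} q_\phi(y \mid x)\,\mathcal{L}(x,y)
               + \mathcal{H}\big(q_\phi(y \mid x)\big)

% Overall objective: both bounds plus an explicit classification term.
\mathcal{J} = \sum_{(x,y)\ \mathrm{labeled}} \mathcal{L}(x,y)
            + \sum_{x\ \mathrm{unlabeled}} \mathcal{U}(x)
            + \alpha\,\mathbb{E}_{(x,y)}\big[\log q_\phi(y \mid x)\big]
```

Maximizing this objective lets the small labeled set shape the classifier q(y|x) while the large unlabeled corpus regularizes the shared latent space, which is why the approach needs far less labeled data than a purely supervised classifier.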
This project has received funding from the European
Union’s Horizon 2020 Research and Innovation program,
under the Marie Skłodowska-Curie ENCASE project (Grant
Agreement No. 691025). We also gratefully acknowledge the
support of NVIDIA Corporation, with the donation of the Titan
Xp GPU used for this research.
[1] Clickbait video.
[2] List of ground truth channels, 2018.
[3] A. Agrawal. Clickbait detection using deep learning. In NGCT, 2016.
[4] A. Anand, T. Chakraborty, and N. Park. We used Neural Networks
to Detect Clickbaits: You won’t believe what happened Next! arXiv
preprint arXiv:1612.01340, 2016.
[5] Anti-Clickbait, 2015.
[6] P. Biyani, K. Tsioutsiouliklis, and J. Blackmer. "8 Amazing Secrets
for Getting More Clicks": Detecting Clickbaits in News Streams Using
Article Informality. 2016.
[7] B.s detector browser extension.
[8] R. Campbell. You Won't Believe How Clickbait is Destroying YouTube!
[9] A. Chakraborty, B. Paranjape, S. Kakarla, and N. Ganguly. Stop
Clickbait: Detecting and Preventing Clickbaits in Online News Media.
[10] S. P. Chatzis, D. I. Kosmopoulos, and T. A. Varvarigou. Signal Modeling
and Classification Using a Robust Latent Space Model Based on t
Distributions. TOSP, 2008.
[11] Y. Chen, N. J. Conroy, and V. L. Rubin. Misleading Online Content:
Recognizing Clickbait as False News. In ACM MDD, 2015.
[12] F. Chollet. Keras, 2015.
[13] Clickbait remover for Facebook.
[14] J. Constine. Facebook feed change fights clickbait post by post in 9 more languages, 2017.
[15] Downworthy browser extension.
[16] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 2010.
[17] H. Gao, Y. Chen, K. Lee, D. Palsetia, et al. Towards Online Spam Filtering in Social Networks. In NDSS, 2012.
[18] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 2010.
[19] Imagga. Tagging Service, 2016.
[20] N. Jindal and B. Liu. Review spam detection. In WWW, 2007.
[21] D. Kingma and M. Welling. Auto-Encoding Variational Bayes. In ICLR, 2014.
[22] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 2015.
[23] L. Maaløe, C. K. Sønderby, S. K. Sønderby, and O. Winther. Auxiliary Deep Generative Models. In ICML, 2016.
[24] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
[25] G. McLachlan and D. Peel. Finite Mixture Models. 2000.
[26] M. Pagliardini, P. Gupta, and M. Jaggi. Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features. arXiv.
[27] A. Peysakhovich. Reducing clickbait in Facebook feed, 2016.
[28] Piperjaffray. Survey, 2016.
[29] E. A. Platanios and S. P. Chatzis. Gaussian Process-Mixture Conditional
Heteroscedasticity. TPAMI, 2014.
[30] I. Porteous, A. Asuncion, and M. Welling. Bayesian Matrix Factorization
with Side Information. In AAAI, 2010.
[31] M. Potthast, S. Köpsel, B. Stein, and M. Hagen. Clickbait Detection.
In ECIR, 2016.
[32] M. M. U. Rony, N. Hassan, and M. Yousuf. Diving Deep into Clickbaits: Who Use Them to What Extents in Which Topics with What Effects? arXiv preprint arXiv:1703.09400, 2017.
[33] SavedYouAClick, 2014.
[34] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. R.
Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks
from Overfitting. JMLR, 2014.
[35] StopClickBait, 2016.
[36] G. Stringhini, M. Egele, A. Zarras, T. Holz, C. Kruegel, and G. Vigna.
B@bel: Leveraging Email Delivery for Spam Mitigation. In USENIX
Security, 2012.
[37] G. Stringhini, T. Holz, B. Stone-Gross, C. Kruegel, and G. Vigna.
BOTMAGNIFIER: Locating Spambots on the Internet. In USENIX
Security, 2011.
[38] G. Stringhini, C. Kruegel, and G. Vigna. Detecting spammers on social
networks. In CSA, 2010.
[39] T. Tieleman and G. Hinton. Lecture 6.5-RMSprop: Divide the gradient
by a running average of its recent magnitude. 2012.
[40] M. D. Zeiler and R. Fergus. Visualizing and Understanding Convolu-
tional Networks. In ECCV, 2014.