A CNN-BASED MODEL FOR DETECTING WEBSITE DEFACEMENTS

Hoang Xuan Dau*, Nguyen Trong Hung+
* Posts and Telecommunications Institute of Technology
+ Academy of People's Security
Abstract: Over the last decade, defacement attacks on websites and web applications have been considered a critical threat by many private and public organizations. A defacement attack can have a severe effect on the owner's website, such as the immediate interruption of website operations and damage to the owner's reputation, which in turn may lead to large financial losses. Many solutions have been studied and deployed for monitoring and detecting defacement attacks, ranging from simple comparison methods to more sophisticated ones. However, some solutions only work on static web-pages, and others can work on dynamic web-pages but generate a high level of false alarms. This paper proposes a Convolutional Neural Network (CNN)-based detection model for website defacements. The model is an extension of previous models based on traditional supervised machine learning techniques, and its aims are to improve the detection rate and reduce the false alarm rate. Experiments conducted on a dataset of 100,000 web-pages show that the proposed model performs significantly better than models based on traditional supervised machine learning.
Keywords: CNN-based Model for Defacement Detection, Defacement Attacks on Websites, Detection of Website Defacements.
I. INTRODUCTION
Defacements of websites and web applications are a class of web attacks that amend the web content and thus change its appearance [1][2]. Fig. 1 shows the website of the UK National Health Service (NHS), which was defaced in 2018 with the message "Hacked by AnoaGhost Typical Idiot Security" left on the website [1]. It is reported that the NHS website may have been defaced for as long as 5 days. In 2019, 15,000 websites of government organizations, banks, press agencies and television broadcasters in Georgia, a small European country, were defaced and taken offline [1]. According to a recent report, the number of defacement attacks on websites globally rose sharply during the coronavirus lockdown, with increases of about 51% and 65% in April and May of 2020 compared to the figures for the same months of
2019, respectively [3]. Fig. 2 shows the website of a UK-based canoe and kayak club that was defaced in 2020 [3].
A number of reasons why websites and web applications are defaced have been pointed out. Chief among them are severe security vulnerabilities in websites, web applications, or hosting servers that allow attackers to upload files to the servers or to gain access to the websites' administrative pages. Common website vulnerabilities include XSS (Cross-Site Scripting), SQLi (SQL injection), file inclusion, poor account and password administration, and unpatched software [1][2].
Fig. 1. The UK NHS website was defaced in 2018 [1]
Fig. 2. The website of a UK-based canoe and kayak club, defaced in 2020 [3]
Defacement attacks on websites can cause serious damage to their owners. The attacks can cause immediate interruption of the normal operations of websites, harm the reputation of website owners and lead to possible data leakage, which in turn may result in large financial losses [1][2]. Because of the wide spread of defacement attacks and their serious consequences, many defensive measures have been researched and deployed in practice. Current defensive measures against defacement attacks can be classified into three main groups: (1) scanning and fixing website security vulnerabilities; (2) using website defacement monitoring and detection tools, such as Nagios Web Application Monitoring Software [4], Site24x7 Website Defacement Monitoring [5] and WebOrion Defacement Monitor [6]; and (3) using various methods to detect website defacement attacks.
This paper proposes a detection model for website defacements that is based on the Convolutional Neural Network (CNN). The proposed CNN-based model is an alternative to the traditional machine learning-based model proposed in [11], in which we exploit the power of the CNN-based text classification scheme to solve the problem of website defacement detection. In the proposed model, CNN learning is used to construct the model from the training data, and the model is then used to classify monitored web-pages into either the Normal or the Defaced class.
The remainder of this paper is structured as follows: Section II discusses some closely related works; Section III describes our proposed model; Section IV presents experiments and results; and Section V concludes the paper.
II. RELATED WORKS
A number of techniques and tools have been proposed for monitoring and detecting defacement attacks on websites and web applications. Due to the paper's scope, however, this section reviews only some typical approaches that belong to group (3) mentioned in Section I. These approaches comprise traditional methods and more sophisticated, or advanced, methods, which are discussed in the following sub-sections.
A. Traditional Methods for Detecting Defacements
Traditional methods for website defacement detection include checksum comparison, diff comparison and DOM tree analysis. Checksum comparison is the simplest technique for detecting changes in web-pages. First, the checksum of the web-page content is computed using a hashing algorithm, such as MD5 or SHA-1, and saved to the detection profile. During monitoring, the page is fetched again, a new checksum is computed and compared with the checksum stored in the detection profile. If the two checksums differ, a defacement alarm is raised. This technique works well for static web-pages. It is not applicable to dynamic web-pages, such as e-commerce or forum pages, because their content changes frequently [11][12][13].
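As an illustration, the checksum technique can be sketched in a few lines of Python; the URL, the use of SHA-1 (the paper also mentions MD5) and the profile layout are our assumptions:

    import hashlib
    import urllib.request

    def page_checksum(url: str) -> str:
        """Fetch a page and return the SHA-1 hex digest of its raw content."""
        with urllib.request.urlopen(url) as resp:
            return hashlib.sha1(resp.read()).hexdigest()

    # Build the detection profile once, while the page is known to be clean.
    profile = {"https://example.com/": page_checksum("https://example.com/")}

    # Later, during monitoring, any change in the digest raises an alarm.
    for url, saved_digest in profile.items():
        if page_checksum(url) != saved_digest:
            print(f"ALARM: possible defacement of {url}")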
In the diff comparison method, the DIFF tool, which is widely available on Linux and UNIX systems, is used to find the difference between two versions of a web-page's content. The most difficult task is to determine an anomaly threshold as the input for the monitoring process of each web-page. This technique is relatively effective and works well for dynamic websites if the anomaly detection threshold is set properly [11][12][13].
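The same idea can be sketched with Python's difflib in place of the UNIX diff tool; the sample pages and the 0.30 threshold below are made-up placeholders, since choosing a proper per-page threshold is precisely the hard part noted above:

    import difflib

    def change_ratio(old_html: str, new_html: str) -> float:
        """Fraction of the page content that changed between two snapshots."""
        return 1.0 - difflib.SequenceMatcher(None, old_html, new_html).ratio()

    old_html = "<html><body><h1>Welcome to our club</h1></body></html>"
    new_html = "<html><body><h1>Hacked by ...</h1></body></html>"

    THRESHOLD = 0.30  # hypothetical per-page anomaly threshold
    if change_ratio(old_html, new_html) > THRESHOLD:
        print("ALARM: change exceeds the anomaly threshold")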
The Document Object Model (DOM) is an Application Programming Interface (API) that defines the logical structure of HTML documents, or web-pages. The DOM can be used to scan and analyze the structure of a web-page. The DOM tree analysis technique detects changes in the page structure rather than changes in the page content. First, the page structure is extracted from the page content under normal working conditions and stored in the detection profile. Then, the structure of the monitored page is extracted and compared with the structure saved in the detection profile. If a significant difference between the two structures is found, a defacement alarm is raised. Generally, this method works well for web-pages with stable structures. However, it is not able to detect unauthorized modifications to the web-page content [11][12][13].
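A minimal sketch of the idea, using Python's standard html.parser to reduce a page to a tag-sequence signature (a real DOM tree comparison would be considerably richer), might look as follows:

    from html.parser import HTMLParser

    class TagCollector(HTMLParser):
        """Record the sequence of opening tags as a crude structure signature."""
        def __init__(self):
            super().__init__()
            self.tags = []

        def handle_starttag(self, tag, attrs):
            self.tags.append(tag)

    def structure_signature(html_text: str) -> tuple:
        collector = TagCollector()
        collector.feed(html_text)
        return tuple(collector.tags)

    # A structural change (e.g. an injected <iframe>) alters the signature,
    # while pure text edits inside existing tags do not.
    clean = "<html><body><h1>News</h1><p>story</p></body></html>"
    defaced = "<html><body><h1>News</h1><iframe src='x'></iframe></body></html>"
    print(structure_signature(clean) == structure_signature(defaced))  # False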
B. Complicated Methods for Detecting Defacements
Complicated methods for detecting website defacements include those based on statistics [7], genetic programming [8][9], page screenshot analysis [10] and supervised machine learning [11][12][13]. Kim et al. [7] proposed a statistical model based on the 2-gram technique to construct a profile from normal web-pages for monitoring and detecting defacements, as shown in Fig. 3. Each normal web-page of the training set is converted to a vector, in which the page's HTML content is split into substrings using the 2-gram method and the occurrence frequencies of the substrings are counted. The detection profile is composed of the vectors of all normal pages of the training set. Each monitored web-page is then also converted to a vector, which is compared with the page's vector stored in the profile using the cosine distance. If the difference is greater than a threshold, an alarm is raised. The paper also proposed an algorithm that generates a dynamic threshold for each web-page to reduce false alarms. The major shortcoming of this method is that, for monitored pages whose content changes frequently, the periodically adjusted thresholds are not appropriate, so the method still generates a high level of false alarms.
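For illustration, the core of this scheme (character 2-gram frequency vectors compared by cosine distance) can be sketched as follows; the sample pages and the fixed threshold are placeholders, whereas [7] adjusts the threshold dynamically per page:

    from collections import Counter
    from math import sqrt

    def bigram_vector(text: str) -> Counter:
        """Count the overlapping character 2-grams of the page content."""
        return Counter(text[i:i + 2] for i in range(len(text) - 1))

    def cosine_distance(a: Counter, b: Counter) -> float:
        dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
        norm = sqrt(sum(v * v for v in a.values())) * \
               sqrt(sum(v * v for v in b.values()))
        return 1.0 - dot / norm if norm else 1.0

    profile_vec = bigram_vector("<html><body>Daily news ...</body></html>")
    current_vec = bigram_vector("<html><body>Hacked by ...</body></html>")

    THRESHOLD = 0.2  # placeholder; [7] generates a dynamic per-page threshold
    if cosine_distance(profile_vec, current_vec) > THRESHOLD:
        print("ALARM: page deviates from its 2-gram profile")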
Bartoli et al. [8] and Davanzo et al. [9] proposed using genetic programming to construct the detection profile for defacement attacks. In their approaches, information
from monitored web-pages is collected and extracted using 43 sensors embedded in the web-pages. Each web-page's information is then converted into a 1466-element vector. In the training stage, normal web-pages are collected and vectorized to construct the profile based on genetic programming. In the detection stage, the monitored page is collected, vectorized and compared with the profile to look for differences. The major drawback of their approaches is that they require extensive computing resources for the construction of the detection profile, since very large page vectors and slowly-converging genetic programming are used.
Fig. 3. Detection process for web-page defacements proposed by Kim et al. [7]
Fig. 4. Meerkat's architecture based on deep neural network [10]
Borgolte et al. [10] proposed Meerkat, shown in Fig. 4, a system that detects defacement attacks through image-based object recognition of web-page screenshots using computer vision techniques. The system first builds a profile of screenshots of normal web-pages. It then takes a screenshot of the monitored web-page and analyzes the difference between the page's current screenshot and its normal screenshots stored in the profile, based on high-level screenshot features learned with advanced methods such as stacked auto-encoders and deep neural networks. Experiments conducted on 10 million defaced web-pages and 2.5 million normal web-pages show that the system achieves a high detection accuracy of 97.422% to 98.816% and a low false positive rate of 0.547% to 1.528%. Meerkat's advantages are that the profile can be constructed automatically and that the system was tested on a large dataset. However, its major disadvantage is that it requires extensive computational resources for highly complex image processing and recognition.
Hoang et al. [11][12][13] proposed several models for detecting website defacements, including a machine learning-based model, a hybrid model and a multi-layer model. The main idea behind these models is that they use traditional supervised machine learning algorithms, such as naive Bayes, decision tree and random forest, to construct the detection models. Specifically, the problem of defacement detection is transformed into a text classification problem over the web-pages' HTML content. The dataset used for training the detection model is a combination of normal web-pages and defaced web-pages. The detection model is then used to classify monitored web-pages into either the Normal or the Attacked class. The approach's strong points are that (1) the detection model can be built automatically from the training data and (2) the overall detection accuracy is high. However, its main drawbacks are that (1) the false positive and false negative rates are still relatively high and (2) the experimental datasets of only about 1,000 to 3,000 web-pages are too small to give a high level of confidence in the reported results.
In this paper, we extend the defacement detection model proposed in [11][12][13] by using the CNN, a deep machine learning method, instead of traditional supervised machine learning algorithms to build our detection model, in order to increase the detection rate as well as to reduce the false alarm rate. Furthermore, we prepare a much larger dataset for our experiments in order to validate our proposed model comprehensively.
III. PROPOSED MODEL FOR DETECTING
DEFACEMENT ATTACKS
A. The Proposed Detection Model
The proposed detection model for defacement attacks is composed of two stages: the training stage and the detection stage. The training stage, as shown in Fig. 5, consists of the following three steps:

- Collection of the training dataset: The dataset for training is a combination of normal web-pages and defaced web-pages. Normal web-pages are downloaded from various websites under normal working conditions. Defaced web-pages are downloaded from Zone-H.org [17].
- Pre-processing: In this step, we use the n-gram technique to extract the training features from each web-page's full content, including HTML code and pure text. Based on the analysis in previous works [11][12][13], we select 2-grams and 3-grams to extract the page features and then use TF-IDF (Term Frequency - Inverse Document Frequency) [16] to compute the value of each feature. As a result, each web-page is converted to a vector and the training dataset is transformed into the training array (see the sketch after Fig. 5).
- Training: The CNN is used as the training algorithm to construct the Classifier (Model) from the training array.
Fig. 5. Proposed detection model for defacement attacks: Training stage
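A minimal sketch of the pre-processing step, using scikit-learn's TfidfVectorizer: the 2-gram/3-gram range, the 8,000-feature limit of Section IV.B and the label names come from the paper, while the character-level analyzer, the toy corpus and the variable names are our assumptions:

    from sklearn.feature_extraction.text import TfidfVectorizer

    pages = ["<html>... normal page content ...</html>",
             "<html>... defaced page content ...</html>"]  # full HTML + text
    labels = ["normal", "defaced"]

    # 2-grams and 3-grams weighted by TF-IDF, capped at 8,000 features.
    # Character-level n-grams are assumed here.
    vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(2, 3),
                                 max_features=8000)
    X = vectorizer.fit_transform(pages).toarray()  # the training array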
The detection stage, as illustrated in Fig. 6, also includes three steps:

- Collection of the monitored web-page: The HTML code of the monitored page is downloaded for pre-processing.
- Pre-processing: The monitored web-page's content is processed to extract features and form the page vector, using the same method as applied to each page of the training dataset.
- Classification: The page vector is classified using the Classifier built in the training stage. The result of this step is the page status: either Normal or Defaced.
Fig. 6. Proposed detection model for defacement attacks: Detection
stage
B. Training the Detection Model Using CNN
As previously mentioned, we use the CNN algorithm to construct our detection model for website defacements from the training data. The CNN algorithm is selected because it is fast and has been widely used with good performance in many areas of computer science, such as image processing and recognition, and natural language processing [14][15]. Fig. 7 describes the CNN structure used in the proposed model, in which a Conv1D layer is followed by a Flatten layer and four fully-connected layers (Dense 1, 2, 3 and 4) that generate the output. The ELU activation function and the Softmax output function are used in the layers.
Fig. 7. The CNN structure used in the proposed detection model for
website defacements
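The structure of Fig. 7 can be sketched in Keras as follows; the 8,000-feature input length follows Section IV.B, while the number of filters, the kernel size and the widths of the Dense layers are our assumptions, since the paper does not list them:

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(8000, 1)),          # 8,000 TF-IDF features, 1 channel
        layers.Conv1D(64, kernel_size=3, activation="elu"),
        layers.Flatten(),
        layers.Dense(128, activation="elu"),    # Dense 1
        layers.Dense(64, activation="elu"),     # Dense 2
        layers.Dense(32, activation="elu"),     # Dense 3
        layers.Dense(2, activation="softmax"),  # Dense 4: Normal vs. Defaced
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])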
The ELU (Exponential Linear Unit) [15] function is defined as follows:

\[
f(x) =
\begin{cases}
x, & x > 0 \\
\alpha (e^{x} - 1), & x \le 0
\end{cases}
\tag{1}
\]

where α = 1, as recommended in [15]. We select the ELU function because it produces relatively low error rates and short average training times.
C. Performance Measurement
We use six measurements, namely TPR (True Positive Rate, or Recall), FPR (False Positive Rate), FNR (False Negative Rate), PPV (Positive Predictive Value, or Precision), F1 (F1-Score) and ACC (Overall Accuracy), to measure the proposed model's performance, as follows:

\[ TPR = \frac{TP}{TP + FN} \tag{2} \]

\[ FPR = \frac{FP}{FP + TN} \tag{3} \]

\[ FNR = \frac{FN}{FN + TP} \tag{4} \]

\[ PPV = \frac{TP}{TP + FP} \tag{5} \]

\[ F1 = \frac{2 \cdot PPV \cdot TPR}{PPV + TPR} \tag{6} \]

\[ ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{7} \]
where TP, FP, FN and TN are elements of the
confusion matrix given in Table I.
TABLE I. TP, FP, FN AND TN IN THE CONFUSION MATRIX

  Predicted Class     Actual Class: Defaced     Actual Class: Normal
  Defaced             TP (True Positives)       FP (False Positives)
  Normal              FN (False Negatives)      TN (True Negatives)
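A small sketch of how the six measures (2)-(7) are computed from the confusion matrix of Table I; the four counts below are made-up placeholders:

    # Made-up counts standing in for a real confusion matrix.
    TP, FP, FN, TN = 14800, 97, 200, 9903

    TPR = TP / (TP + FN)                   # Recall, Eq. (2)
    FPR = FP / (FP + TN)                   # Eq. (3)
    FNR = FN / (FN + TP)                   # Eq. (4)
    PPV = TP / (TP + FP)                   # Precision, Eq. (5)
    F1 = 2 * PPV * TPR / (PPV + TPR)       # Eq. (6)
    ACC = (TP + TN) / (TP + TN + FP + FN)  # Eq. (7)

    print(f"TPR={TPR:.4f} FPR={FPR:.4f} FNR={FNR:.4f} "
          f"PPV={PPV:.4f} F1={F1:.4f} ACC={ACC:.4f}")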
IV. EXPERIMENTS AND RESULTS
A. Experimental Dataset
The experimental dataset used in this paper consists of a subset of normal web-pages and a subset of defaced web-pages. We developed a small tool, written in JavaScript and running on a NodeJS server with the Puppeteer library, to download and process the HTML code of web-pages. Specifically, the two subsets of the 100,000-page dataset are as follows:

- The normal subset is composed of 40,000 web-pages collected under normal working conditions. These are home pages of well-known websites in Vietnam and around the world, including news portals, e-commerce sites, online service sites and forums, selected from the top 1 million websites listed by Alexa [18].
- The defaced subset consists of 60,000 web-pages collected from Zone-H.org [17]. Downloaded defaced web-pages are checked and any duplicate pages are removed.
B. Pre-processing, Training and Validation Testing
The collected dataset is pre-processed using the n-gram and TF-IDF techniques to convert the web-pages into the training array of web-page vectors. Based on previous works [11][12][13], we select a set of 8,000 n-gram features to create the web-page vectors. The vectors from normal web-pages are labelled "normal" and those from defaced web-pages are labelled "defaced". The training array is then ready for the training stage, in which the detection model is constructed and validated.

We use two traditional machine learning algorithms, the decision tree and the random forest proposed in [11][12][13] for defacement detection, as well as the CNN algorithm, in the training stage to build different defacement detection models for performance comparison. For the models based on the decision tree and the 50-tree random forest, 10-fold cross-validation is used. For the CNN-based model, epochs = 64 and batch_size = 32 are used in training and validation. For each run, 75% of the dataset is used for training and 25% for validation testing. The final performance measurements are computed as the average of the measured values over all runs.
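A sketch of this protocol for the CNN-based model; the random placeholder arrays stand in for the real 100,000-page TF-IDF data, and the cut-down network stands in for the full structure of Section III.B:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from tensorflow.keras import layers, models

    # Random placeholders: X is the TF-IDF array (one row of 8,000
    # features per page), y holds the labels (0 = normal, 1 = defaced).
    X = np.random.rand(1000, 8000).astype("float32")
    y = np.random.randint(0, 2, size=1000)

    # 75% of the dataset for training, 25% for validation testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y)

    # Conv1D expects a trailing channel dimension.
    X_train, X_test = X_train[..., None], X_test[..., None]

    model = models.Sequential([
        layers.Input(shape=(8000, 1)),
        layers.Conv1D(64, 3, activation="elu"),
        layers.Flatten(),
        layers.Dense(2, activation="softmax"),
    ])  # a cut-down stand-in for the full network of Section III.B
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    model.fit(X_train, y_train, epochs=64, batch_size=32,
              validation_data=(X_test, y_test))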
C. Experimental Results and Comments
Table II compares the detection performance of our CNN-based model with that of the decision tree-based model [11] and the random forest (RF)-based model [12][13]. From the experimental results given in Table II, we can draw the following observations:

- Our CNN-based model performs better than the previous models based on the traditional supervised machine learning methods of decision tree [11] and random forest [12][13]. Specifically, our model's measurements are considerably higher than those of the decision tree-based model [11], although its ACC and F1 are only slightly better than those of the random forest-based model [12][13].
- Although the proposed model's ACC and F1 are only slightly better than those of the random forest-based model [12][13], its false alarm rates (FPR and FNR) are significantly lower than those of both the decision tree-based model [11] and the random forest-based model [12][13]. Low false alarm rates are very important for any practical solution.
TABLE II. THE PROPOSED MODEL'S DETECTION PERFORMANCE VERSUS THE PERFORMANCE OF [11][12][13] (ALL VALUES IN %)

  Detection model                   PPV      TPR      FPR     FNR     ACC      F1
  Decision tree-based model [11]    97.47    97.85    3.82    2.15    97.18    97.66
  RF-based model [12][13]           98.91    98.15    1.63    1.85    98.24    98.53
  Our CNN-based model               98.55    98.61    0.97    1.39    98.86    98.61
V. CONCLUSION
This paper proposes a CNN-based model for detecting website defacement attacks. In our model, we exploit the CNN's superior classification capability to solve the problem of website defacement detection. Experiments conducted on the dataset of 100,000 web-pages show that the proposed CNN-based detection model outperforms previous models based on the traditional supervised machine learning methods of decision tree [11] and random forest [12][13]. In particular, the false alarm rates, including the false positive and false negative rates, are reduced significantly compared to those of the previous models.

One shortcoming of our model is that it requires more computing resources, because the CNN is generally more computationally intensive than traditional supervised machine learning counterparts, such as the decision tree and the random forest. For future work, we will carry out an extensive assessment of all execution steps of the model and find a solution to lower its computational requirements.
REFERENCES
[1] Imperva. Website Defacement Attack. https://www.imperva.com/learn/application-security/website-defacement-attack/, last accessed 2020/11/10.
[2] Trend Micro. The Motivations and Methods of Web Defacement. https://www.trendmicro.com/en_us/research/18/a/hacktivism-web-defacement.html, last accessed 2020/11/10.
[3] Government Technology. The Coronavirus Pandemic Moved Life Online - a Surge in Website Defacing Followed. https://www.govtech.com/security/The-Coronavirus-Pandemic-Moved-Life-Online--a-Surge-in-Website-Defacing-Followed.html, last accessed 2020/11/10.
[4] Nagios Enterprises, LLC. Web Application Monitoring Software with Nagios. https://www.nagios.com/solutions/web-application-monitoring/, last accessed 2020/11/10.
[5] Site24x7. Website Defacement Monitoring. https://www.site24x7.com/monitor-webpage-defacement.html, last accessed 2020/11/10.
[6] Banff Cyber Technologies. WebOrion Defacement Monitor. https://www.weborion.io/website-defacement-monitor/, last accessed 2020/11/10.
[7] W. Kim, J. Lee, E. Park and S. Kim. 2006. Advanced Mechanism for Reducing False Alarm Rate in Web Page Defacement Detection. National Security Research Institute, Korea.
[8] A. Bartoli, G. Davanzo and E. Medvet. 2010. A Framework for Large-Scale Detection of Web Site Defacements. ACM Transactions on Internet Technology, Vol. 10, No. 3, Art. 10.
[9] G. Davanzo, E. Medvet and A. Bartoli. 2011. Anomaly detection techniques for a web defacement monitoring service. Expert Systems with Applications, 38 (2011) 12521-12530, doi:10.1016/j.eswa.2011.04.038, Elsevier.
[10] K. Borgolte, C. Kruegel and G. Vigna. 2015. Meerkat: Detecting Website Defacements through Image-based Object Recognition. In: Proceedings of the 24th USENIX Security Symposium (USENIX Security).
[11] X.D. Hoang. 2018. A Website Defacement Detection Method Based on Machine Learning Techniques. In SoICT '18: Ninth International Symposium on Information and Communication Technology, December 6-7, 2018, Da Nang City, Viet Nam. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3287921.3287975.
[12] X.D. Hoang and N.T. Nguyen. 2019. Detecting Website Defacements Based on Machine Learning Techniques and Attack Signatures. Computers 2019, 8, 35; doi:10.3390/computers8020035.
[13] X.D. Hoang and N.T. Nguyen. 2019. A Multi-layer Model for Website Defacement Detection. In SoICT '19: Tenth International Symposium on Information and Communication Technology, December 4-6, 2019, Hanoi - Ha Long Bay, Vietnam. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3368926.3369730.
[14] D.-A. Clevert, T. Unterthiner and S. Hochreiter. 2015. Fast and accurate deep network learning by exponential linear units (ELUs). Available online: https://arxiv.org/abs/1511.07289.
[15] N.K. Sangani and H. Zarger. 2017. Machine Learning in Application Security. Book chapter in Advances in Security in Computing and Communications, IntechOpen.
[16] X.D. Hoang. 2021. Detecting Common Web Attacks Based on Machine Learning Using Web Log. In: Sattler K.U., Nguyen D.C., Vu N.P., Long B.T., Puta H. (eds) Advances in Engineering Research and Application. ICERA 2020. Lecture Notes in Networks and Systems, vol 178. Springer, Cham. https://doi.org/10.1007/978-3-030-64719-3_35.
[17] Zone-H.org. http://zone-h.org/?hz=1, last accessed 2020/11/10.
[18] DN Pedia. Top Alexa one million domains. Available online: https://dnpedia.com/tlds/topm.php, last accessed 2020/11/10.
Hoang Xuan Dau received the bachelor degree in informatics in 1994 from the Hanoi University of Science and Technology. He then received the master degree and PhD degree in computer science from RMIT University, Australia, in 2000 and 2006, respectively. He is currently a senior lecturer in the Faculty of Information Technology, Posts and Telecommunications Institute of Technology. His research interests include attack and intrusion detection, malware detection, system and software security, web security, and machine learning-based applications for information security.

Nguyen Trong Hung received the bachelor degree in information technology in 2013 from the Academy of People's Security. He then received the master degree in information security from the Academy of Cryptographic Techniques in 2018. He is currently a lecturer in the Faculty of Information Technology and Security, Academy of People's Security. His research interests include attack and intrusion detection, malware detection, and web security.