Conference Paper

Hit-List Worm Detection and Bot Identification in Large Networks Using Protocol Graphs

Authors: Collins and Reiter

Abstract

We present a novel method for detecting hit-list worms using protocol graphs. In a protocol graph, a vertex represents a single IP address, and an edge represents communications between those addresses using a specific protocol (e.g., HTTP). We show that the protocol graphs of four diverse and representative protocols (HTTP, FTP, SMTP, and Oracle), as constructed from monitoring for fixed durations on a large intercontinental network, exhibit stable graph sizes and largest connected component sizes. Moreover, we demonstrate that worm propagations, even of a sophisticated hit-list variety in which the attacker has advance knowledge of his targets and always connects successfully, perturb these properties. We demonstrate that these properties can be monitored very efficiently even in very large networks, giving rise to a viable and novel approach for worm detection. We also demonstrate extensions by which the attacking hosts (bots) can be identified with high accuracy.
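The paper's core signal, per-protocol graph size and largest-connected-component size over a fixed window, can be maintained incrementally with a union-find structure. The sketch below is a minimal illustration of that bookkeeping, assuming flows arrive as (source IP, destination IP) pairs for one protocol; the sample addresses are hypothetical.

```python
# Sketch: track protocol-graph vertex count and largest connected
# component (LCC) size for one protocol over a fixed observation window.
# Flow records and addresses below are illustrative, not from the paper.

class DisjointSet:
    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra          # union by size
        self.size[ra] += self.size[rb]

def graph_stats(flows):
    """Return (number of vertices, LCC size) for one window's flows."""
    ds = DisjointSet()
    hosts = set()
    for src, dst in flows:
        hosts.update((src, dst))
        ds.union(src, dst)
    lcc = max((ds.size[ds.find(h)] for h in hosts), default=0)
    return len(hosts), lcc

# Two components: {.1, .2, .3} and {.4, .5}
flows = [("10.0.0.1", "10.0.0.2"), ("10.0.0.2", "10.0.0.3"),
         ("10.0.0.4", "10.0.0.5")]
print(graph_stats(flows))  # (5, 3)
```

In practice each protocol (HTTP, FTP, SMTP, Oracle) would get its own structure per window, with alarms raised when either statistic deviates from its historical distribution.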
... There have also been many works that leverage specific topology features of botnets, such as mixing rates (Nagaraja et al., 2010) and the number and size of connected graph components (Collins & Reiter, 2007; Iliofotou et al., 2008). For example, P2P botnets often have fast mixing rates because botnets form a topology that is most efficient for diffusing information and launching attacks. ...
... There are also topology-based approaches (Collins & Reiter, 2007; Iliofotou et al., 2008; Nagaraja et al., 2010; Jaikumar & Kak, 2015; Zhou et al., 2018). Nagaraja et al. (2010) utilize the unique overlay topology patterns and localize botnets through prefiltering, clustering and validation. ...
... However, these approaches involve multiple manual steps of filtering and clustering, and elaborate threshold tuning to identify the embedded botnet subgraph. Collins & Reiter (2007) observe the number of connected graph components, since communication inside botnets will suddenly increase that number. Iliofotou et al. (2009) use a graph-level metric for the size of the largest connected component as well as spatial and temporal metrics at the node and edge level. ...
Preprint
Full-text available
Botnets are now a major source for many network attacks, such as DDoS attacks and spam. However, most traditional detection methods heavily rely on heuristically designed multi-stage detection criteria. In this paper, we consider the neural network design challenges of using modern deep learning techniques to learn policies for botnet detection automatically. To generate training data, we synthesize botnet connections with different underlying communication patterns overlaid on large-scale real networks as datasets. To capture the important hierarchical structure of centralized botnets and the fast-mixing structure for decentralized botnets, we tailor graph neural networks (GNN) to detect the properties of these structures. Experimental results show that GNNs are better able to capture botnet structure than previous non-learning methods when trained with appropriate data, and that deeper GNNs are crucial for learning difficult botnet topologies. We believe our data and studies can be useful for both the network security and graph learning communities.
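To make the abstract's depth intuition concrete, here is a minimal, untrained message-passing step in NumPy: each layer averages neighbour features, so stacking k layers lets a node's representation reflect its k-hop neighbourhood, which is why deeper GNNs can capture larger botnet structures. The star graph and one-hot features are toy assumptions, not the paper's datasets or architecture.

```python
import numpy as np

# Minimal message-passing sketch (untrained): k propagation layers
# aggregate k-hop structure. Graph and features are illustrative.

def propagate(adj, feats, layers=3):
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                   # avoid division by zero
    norm = adj / deg                      # row-normalized adjacency
    h = feats
    for _ in range(layers):
        h = norm @ h                      # each node averages its neighbours
    return h

# Star graph: centre node 0 connected to nodes 1..4
adj = np.zeros((5, 5))
adj[0, 1:] = adj[1:, 0] = 1.0
feats = np.eye(5)                         # one-hot node features
out = propagate(adj, feats, layers=2)
# After two layers, every node's feature mixes information from
# nodes two hops away -- the whole star, in this toy case.
```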
... This virus encrypted a system's data and demanded ransom payments in order to unlock the infected system. This attack was widespread and affected energy, transportation, shipping, telecommunications, and health care systems [24]. ...
... A honeynet is used to collect bot-binaries which penetrate the botnets. However, intruders developed novel methods to overcome honeynet traps [5]. ...
Article
This paper proposes a three-stage botnet detection technique based on anomaly and community detection. The first stage is a pragmatic, node-based distributed approach over sparse graph sequences. The second stage detects bots from the sparse matrix and from correlations of interactions among nodes. In the third stage, a random graph evaluates the performance of the bots, verified with both odd and even types of nodes. The same is extended and verified through Obrazom triple connected graphs. This verification helps identify aggressive bots through optimized pivotal nodes. Machine-learning-based botnet detection techniques are implemented at various levels of the network, both centralized and distributed. This three-stage bot detection can be applied to large-scale data.
... The progress in digital technology has brought substantial processing power to hardware such as CPUs and GPUs, along with greater bandwidth capacity [7]. These digital techniques make both good and bad actors more powerful [13,16]. ...
Article
Large networks are prone to attacks by bots, and this leads to complexity. When such complexity occurs, it is difficult to overcome vulnerabilities in the network connections. In such a case, the complex network can be dealt with using probability theory and graph theory concepts such as Erdős–Rényi random graphs, scale-free graphs, highly connected graph sequences, and so on. In this paper, botnet detection using Erdős–Rényi random graphs is proposed, where patterns are recognized from the number of connections that the vertices and edges make in the network. This paper also presents botnet detection concepts based on machine learning.
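A minimal sketch of the comparison this abstract suggests: generate an Erdős–Rényi G(n, p) baseline, count vertex degrees, and flag hosts whose degree far exceeds the random-graph expectation. The parameters and the 3x threshold are illustrative assumptions, not the paper's method.

```python
import random

# Sketch: flag hosts whose degree far exceeds the Erdos-Renyi G(n, p)
# expectation. All parameters below are illustrative.

def er_graph(n, p, seed=0):
    """Sample an undirected G(n, p) graph as a list of edges."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

def degrees(n, edges):
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

n, p = 200, 0.05
expected = p * (n - 1)                    # mean degree of G(n, p)
deg = degrees(n, er_graph(n, p))
# Flag vertices whose degree is more than 3x the ER expectation
suspicious = [v for v, d in enumerate(deg) if d > 3 * expected]
```

On a real network the observed communication graph would replace the sampled one, with the ER model serving only as the null hypothesis for "normal" connectivity.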
... Nodes represent actors sending and receiving data, and edges represent communications between nodes. Anomalies can be detected in the changes to the structure of the graph (Noble and Cook, 2003; Collins and Reiter, 2007). ...
Book
The contents of this volume are contributions from invited speakers at a workshop entitled “Data Analysis for Cyber-Security”, hosted by the University of Bristol in March 2013. We are grateful for the generous support of the Heilbronn Institute for Mathematical Research, an academic research unit of the University of Bristol with interests related to cyber-security.
... Anomaly-based bot detection solutions that do not focus on detecting C2 per se, but rather identify bots by observing and analyzing their activities and behaviour, address some of the aforementioned issues. Graph-based approaches, where host network activities are represented by communication graphs, extracted from network flows and host-to-host communication patterns, have been proposed in this regard [5], [9], [29]- [40]. Le et al. [37] present a strong case for leveraging Self-Organizing Maps (SOMs) in the context of bot detection with recall rates beyond 90%. ...
Article
Full-text available
Bot detection using machine learning (ML), with network flow-level features, has been extensively studied in the literature. However, existing flow-based approaches typically incur a high computational overhead and do not completely capture the network communication patterns, which can expose additional aspects of malicious hosts. Recently, bot detection systems that leverage communication graph analysis using ML have gained attention to overcome these limitations. A graph-based approach is rather intuitive, as graphs are true representations of network communications. In this paper, we propose BotChase, a two-phased graph-based bot detection system that leverages both unsupervised and supervised ML. The first phase prunes presumable benign hosts, while the second phase achieves bot detection with high precision. Our prototype implementation of BotChase detects multiple types of bots and exhibits robustness to zero-day attacks. It also accommodates different network topologies and is suitable for large-scale data. Compared to the state-of-the-art, BotChase outperforms an end-to-end system that employs flow-based features and performs particularly well in an online setting.
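The two-phase structure can be sketched in a few lines. The feature (out-degree), the pruning threshold, and the stand-in nearest-neighbour classifier below are illustrative assumptions, not BotChase's actual features or models.

```python
# Sketch of a two-phased pipeline in the spirit of BotChase: an
# unsupervised phase prunes clearly benign hosts, then a supervised
# phase classifies the remainder. All features, thresholds, and the
# tiny labelled set are illustrative assumptions.

def phase1_prune(hosts, threshold=10):
    """Unsupervised prune: drop low-out-degree hosts, keeping only
    presumable suspicious ones for the expensive second phase."""
    return {h: f for h, f in hosts.items() if f["out_degree"] >= threshold}

def phase2_classify(candidates, labelled):
    """Supervised step: 1-nearest-neighbour on one feature, standing in
    for a trained classifier."""
    verdicts = {}
    for h, f in candidates.items():
        nearest = min(labelled, key=lambda ex: abs(ex[0] - f["out_degree"]))
        verdicts[h] = nearest[1]
    return verdicts

hosts = {
    "10.0.0.1": {"out_degree": 2},      # quiet host, pruned in phase 1
    "10.0.0.2": {"out_degree": 50},     # bot-like fan-out
    "10.0.0.3": {"out_degree": 12},     # busy but server-like
}
labelled = [(60, "bot"), (14, "benign")]  # (out_degree, label) examples
print(phase2_classify(phase1_prune(hosts), labelled))
```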
... Several papers presented flow-based intrusion detection schemes for specific network attacks, including botnets [21,15,16,34,12,38], and others such as port scans [32,17], worms [6,10,9], and denial of service [25,19,13]. Each of these approaches was designed to detect only one of these attack types, but they are related to our work and useful to explain how flows can be used to detect a certain attack. ...
... In addition, they are unreliable, as they can be evaded with encryption and by tweaking flow characteristics [8]. Graph-based approaches, where graphs are extracted from network flows and host-to-host communication patterns, overcome these limitations [8], [24]- [32]. Other approaches [33], [34] rely on rule-based host classification and botnet detection methods, where pre-defined thresholds are used to discriminate between benign and suspicious hosts. ...
Preprint
Full-text available
Bot detection using machine learning (ML), with network flow-level features, has been extensively studied in the literature. However, existing flow-based approaches typically incur a high computational overhead and do not completely capture the network communication patterns, which can expose additional aspects of malicious hosts. Recently, bot detection systems which leverage communication graph analysis using ML have gained attention to overcome these limitations. A graph-based approach is rather intuitive, as graphs are true representations of network communications. In this paper, we propose a two-phased, graph-based bot detection system which leverages both unsupervised and supervised ML. The first phase prunes presumable benign hosts, while the second phase achieves bot detection with high precision. Our system detects multiple types of bots and is robust to zero-day attacks. It also accommodates different network topologies and is suitable for large-scale data.
Chapter
Botnet attacks have now become a major source of cyberattacks. How to detect botnet traffic quickly and efficiently is a current problem for most enterprises. To solve this, we have built a plug-and-play botnet detection system using graph neural network algorithms. The system detects botnets by identifying the network topology and is very good at detecting botnets with different structures. Moreover, the system helps researchers to visualise which nodes in the network are at risk of botnets through a graphical interface.
Article
Full-text available
The Email Mining Toolkit (EMT) is a data mining system that computes behavior profiles or models of user email accounts. These models may be used for a multitude of tasks, including forensic analyses and detection tasks of value to law enforcement and intelligence agencies, as well as for other typical tasks such as virus and spam detection. To demonstrate the power of the methods, we focus on the application of these models to detect the early onset of a viral propagation without the “content-based” (or signature-based) analysis in common use in virus scanners. We present several experiments using real email from 15 users with injected simulated viral emails and describe how the combination of different behavior models improves overall detection rates. The performance results vary depending upon parameter settings, approaching 99% true positives (TP) (percentage of viral emails caught) in general cases with 0.38% false positives (FP) (percentage of emails with attachments that are mislabeled as viral). The models used for this study are based upon volume and velocity statistics of a user's email rate and an analysis of the user's (social) cliques revealed in the person's email behavior. We show by way of simulation that virus propagations are detectable, since viruses may emit emails at rates different from what human behavior suggests is normal, and email is directed to groups of recipients in ways that violate the users' typical communications with their social groups.
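A toy version of the volume/velocity idea: flag a window whose outgoing-email count deviates sharply from the account's historical per-window statistics. The counts and the 3-sigma rule below are illustrative assumptions, not EMT's actual models.

```python
from statistics import mean, stdev

# Sketch of a volume/velocity check in the spirit of EMT: flag a window
# whose send count deviates sharply from the account's history.

def is_burst(history, current, k=3.0):
    """True if the current window's send count exceeds
    mean + k*std of historical per-window counts."""
    mu, sigma = mean(history), stdev(history)
    return current > mu + k * max(sigma, 1.0)  # floor avoids zero-variance

hourly_counts = [4, 6, 5, 7, 5, 6, 4, 5]       # normal behaviour
print(is_burst(hourly_counts, 6))   # False
print(is_burst(hourly_counts, 40))  # True: viral-propagation-like burst
```

A full system would combine this with the clique analysis the abstract describes, flagging emails sent to recipient groups that never co-occur in the user's history.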
Article
Full-text available
Worms are self-replicating malicious programs that represent a major security threat for the Internet, as they can infect and damage a large number of vulnerable hosts at timescales where human responses are unlikely to be effective. Sophisticated worms that use precomputed hitlists of vulnerable targets are especially hard to contain, since they are harder to detect and spread at rates where even automated defenses may not be able to react in a timely fashion. This paper examines a new proactive defense mechanism called Network Address Space Randomization (NASR), whose objective is to harden networks specifically against hitlist worms. The idea behind NASR is that hitlist information could be rendered stale if nodes are forced to frequently change their IP addresses. NASR limits or slows down hitlist worms and forces them to exhibit features that make them easier to contain at the perimeter. We explore the design space for NASR and present a prototype implementation as well as preliminary experiments examining the effectiveness and limitations of the approach.
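A toy simulation of the staleness argument: if hosts are re-assigned addresses before the attack fires, a hitlist recorded earlier mostly misses. Host counts, address-space size, and lease times below are arbitrary assumptions, not the paper's parameters.

```python
import random

# Toy NASR simulation: a hitlist gathered at time 0 goes stale once
# hosts' address leases expire and addresses are re-randomized.
# All parameters are illustrative.

def stale_fraction(n_hosts, addr_space, lease, attack_time, seed=0):
    rng = random.Random(seed)
    assign = lambda: rng.sample(range(addr_space), n_hosts)
    hitlist = assign()                    # addresses recorded at time 0
    if attack_time < lease:
        current = hitlist                 # no re-randomization yet
    else:
        current = assign()                # fresh addresses after re-lease
    live = set(current)
    hits = sum(a in live for a in hitlist)
    return 1.0 - hits / n_hosts          # fraction of stale hitlist entries

print(stale_fraction(100, 65536, lease=10, attack_time=5))   # 0.0
print(stale_fraction(100, 65536, lease=10, attack_time=50))  # near 1.0
```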
Conference Paper
Worm detection and response systems must act quickly to identify and quarantine scanning worms, as when left unchecked such worms have been able to infect the majority of vulnerable hosts on the Internet in a matter of minutes [9]. We present a hybrid approach to detecting scanning worms that integrates significant improvements we have made to two existing techniques: sequential hypothesis testing and connection rate limiting. Our results show that this two-pronged approach successfully restricts the number of scans that a worm can complete, is highly effective, and has a low false alarm rate.
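The sequential hypothesis testing half of this hybrid can be sketched as a log-likelihood-ratio walk over connection outcomes, in the spirit of Threshold Random Walk; the success probabilities and decision thresholds below are illustrative, not the paper's tuned values.

```python
from math import log

# Sketch of sequential hypothesis testing for scan detection: update a
# log-likelihood ratio per connection outcome and decide once it crosses
# a threshold. Probabilities and thresholds are illustrative.

P_SUCCESS_BENIGN = 0.8    # benign hosts mostly connect successfully
P_SUCCESS_SCANNER = 0.2   # scanners mostly fail

def classify(outcomes, eta0=-2.0, eta1=2.0):
    """outcomes: iterable of booleans (True = connection succeeded).
    Returns 'scanner', 'benign', or 'pending'."""
    llr = 0.0
    for ok in outcomes:
        if ok:
            llr += log(P_SUCCESS_SCANNER / P_SUCCESS_BENIGN)
        else:
            llr += log((1 - P_SUCCESS_SCANNER) / (1 - P_SUCCESS_BENIGN))
        if llr >= eta1:
            return "scanner"
        if llr <= eta0:
            return "benign"
    return "pending"

print(classify([False] * 4))  # scanner
print(classify([True] * 4))   # benign
```

The connection-rate-limiting half would cap how many first-contact connections a still-"pending" host may attempt per time unit, bounding the scans a worm completes before the test concludes.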
Article
In this paper we build on previous theoretical work and describe the implementation and testing of a virus throttle - a program, based on a new approach, that is able to substantially reduce the spread of, and hence the damage caused by, mobile code such as worms and viruses. Our approach is different from current, signature-based anti-virus paradigms in that it identifies potential viruses based on their network behaviour and, instead of preventing such programs from entering a system, seeks to prevent them from leaving. The results presented here show that such an approach is effective in stopping the spread of a real worm, W32/Nimda-D, in under a second, as well as several different configurations of a test worm.
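The throttle's mechanism, a small working set of recent destinations plus a delay queue drained at a fixed rate, can be sketched as follows; the working-set size and one-release-per-tick rate are illustrative, not the paper's configuration.

```python
from collections import deque

# Sketch of a Williamson-style virus throttle: connections to hosts in
# the recent "working set" pass immediately; connections to new hosts
# are queued and released at a fixed rate (here, one per tick).

class Throttle:
    def __init__(self, ws_size=4):
        self.working_set = deque(maxlen=ws_size)
        self.delay_queue = deque()

    def request(self, dst):
        """True if the connection is sent now, False if queued."""
        if dst in self.working_set:
            return True
        self.delay_queue.append(dst)
        return False

    def tick(self):
        """Once per time unit: release one queued connection."""
        if self.delay_queue:
            dst = self.delay_queue.popleft()
            self.working_set.append(dst)
            return dst
        return None

t = Throttle()
# Normal traffic (repeated destinations) drains after warm-up;
# worm-like traffic (many new destinations) backs up in the queue.
for dst in ["a", "b", "a", "c", "a"]:
    t.request(dst)
    t.tick()
print(len(t.delay_queue))  # 0: normal traffic drains
for i in range(20):
    t.request(f"host{i}")   # burst of 20 new destinations, no ticks
print(len(t.delay_queue))  # 20: worm burst backs up
```

A growing delay queue is itself the detection signal: normal hosts rarely contact many new destinations per second, so the queue length separates worm traffic from benign traffic.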
Conference Paper
The study of the Web as a graph is not only fascinating in its own right, but also yields valuable insight into Web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution. We report on experiments on local and global properties of the Web graph using two AltaVista crawls each with over 200 million pages and 1.5 billion links. Our study indicates that the macroscopic structure of the Web is considerably more intricate than suggested by earlier experiments on a smaller scale.
Conference Paper
After the Code Red incident in 2001 and the SQL Slammer in January 2003, it is clear that a simple self-propagating worm can quickly spread across the Internet, infecting most vulnerable computers before people can take effective countermeasures. The fast spreading nature of worms calls for a worm monitoring and early warning system. In this paper, we propose effective algorithms for early detection of the presence of a worm and the corresponding monitoring system. Based on an epidemic model and observation data from the monitoring system, and using the idea of "detecting the trend, not the rate" of monitored illegitimate scan traffic, we propose to use a Kalman filter to detect a worm's propagation at its early stage in real time. In addition, we can effectively predict the overall vulnerable population size, and correct the bias in the observed number of infected hosts. Our simulation experiments for Code Red and SQL Slammer show that with observation data from a small fraction of IP addresses, we can detect the presence of a worm when it infects only 1% to 2% of the vulnerable computers on the Internet.
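The "trend, not the rate" idea can be illustrated with a scalar Kalman filter that estimates the early exponential growth rate from noisy per-window scan counts; the synthetic counts and noise parameters below are assumptions for illustration, not the paper's epidemic-model formulation.

```python
from math import exp, log
import random

# Sketch: scalar Kalman filter estimating a worm's early exponential
# growth rate ("the trend, not the rate") from noisy monitored scan
# counts. Toy data and noise parameters are illustrative.

def estimate_growth(counts, q=1e-4, r=0.05):
    """Kalman-filter the per-step log-growth z_t = ln(c_t / c_{t-1}).
    State: a slowly varying growth rate alpha. Returns final estimate."""
    alpha, p = 0.0, 1.0                   # initial state and variance
    for prev, cur in zip(counts, counts[1:]):
        z = log(cur / prev)               # noisy observation of alpha
        p += q                            # predict (random-walk state model)
        k = p / (p + r)                   # Kalman gain
        alpha += k * (z - alpha)          # update
        p *= (1 - k)
    return alpha

rng = random.Random(1)
true_alpha = 0.3
counts = [100 * exp(true_alpha * t) * (1 + 0.05 * rng.uniform(-1, 1))
          for t in range(40)]
est = estimate_growth(counts)
# A sustained positive trend estimate signals epidemic-style growth,
# regardless of the absolute scan rate.
print(round(est, 2))
```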