Article

Wide-scale botnet detection and characterization

Authors:

Abstract

Malicious botnets are networks of compromised computers that are controlled remotely to perform large-scale distributed denial-of-service (DDoS) attacks, send spam, trojan and phishing emails, distribute pirated media or conduct other usually illegitimate activities. This paper describes a methodology to detect, track and characterize botnets on a large Tier-1 ISP network. The approach presented here differs from previous attempts to detect botnets by employing scalable non-intrusive algorithms that analyze vast amounts of summary traffic data collected on selected network links. Our botnet analysis is performed mostly on transport layer data and thus does not depend on particular application layer information. Our algorithms produce alerts with information about controllers. Alerts are followed up with analysis of application layer data, which indicates a false positive rate of less than 2%.

... Developers of botnet detection programs have focused on this characteristic, some employing methods that require reading packet data and others basing their detection methods on flow record data. [49] used packets per IP address, packets per flow, and bytes per packet metrics to compare suspected traffic to known bot models, and searched for periodic patterns by measuring the inter-flow arrival times between a client and a server, using the mean values as a fundamental period T. These values were used as inputs to train either a hierarchical Bayesian model or a modified K-means algorithm to detect probable bot traffic. ...
... Programs behave less randomly than humans. Karasaridis et al. [49] used repeating inter-flow arrival periods as an indicator of bot traffic. Bilge et al. [50] also used inter-flow arrival periods and flow-size distributions as features for detecting bot-generated flows. ...
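The periodicity idea in the two excerpts above can be illustrated with a minimal sketch. It assumes flow start times (in seconds) are already grouped per client-server pair; the function name `interflow_periodicity` and the use of the coefficient of variation as a regularity score are illustrative assumptions, not the cited authors' method.

```python
from statistics import mean, pstdev


def interflow_periodicity(start_times):
    """Return (period, cv) for the flows of one client-server pair.

    period: mean inter-flow arrival time, used here as the fundamental period T.
    cv: coefficient of variation of the gaps (assumed regularity score;
        values near 0 suggest machine-like, periodic traffic).
    Returns None when there are too few flows to judge.
    """
    times = sorted(start_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        return None
    period = mean(gaps)
    if period == 0:
        return None
    return period, pstdev(gaps) / period


# Hypothetical data: one host polling roughly every 60 s, one browsing irregularly.
bot_like = [0, 60, 120, 181, 240, 300]
human_like = [0, 5, 90, 95, 400, 410]

print(interflow_periodicity(bot_like))
print(interflow_periodicity(human_like))
```

A low coefficient of variation alone does not prove bot activity (legitimate software updaters also poll periodically), so a score like this would be one input feature among several rather than a verdict.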
... The botnet detection method based on traffic statistical features aims to summarize quantifiable behavioral attributes in network traffic (Karasaridis et al. 2007; Kwon et al. 2016), such as network delay, packet size, and number of packets. Representative statistical features are typically extracted using prior knowledge, and machine learning classification algorithms are employed to identify attack types and sources. The botnet detection method based on DNS traffic detects anomalies by monitoring DNS traffic in the network (Alieyan et al. 2017; Kwon et al. 2016). ...
Article
Full-text available
A botnet, a group of hijacked devices that conducts various cyberattacks, is one of the most dangerous threats on the internet. Organizations or individuals use network traffic to mine botnet communication behavior features. Network traffic often contains individual users’ private information, such as website passwords, personally identifiable information, and communication content. Existing botnet detection methods, whether they extract deterministic traffic interaction features, use DNS traffic, or operate on raw traffic bytes, focus on the detection performance of the model and ignore possible privacy leaks. Most methods are also combined with machine learning and deep learning technologies, which require a large amount of training data to obtain high-precision detection models. Therefore, preventing malicious persons from stealing data to infer private information during the botnet detection process has become an issue worth pondering. To address this problem, this article proposes a privacy-enhanced framework with deep learning for botnet detection. The goal of this framework is to learn a feature extractor that hides the private information the attack model tries to infer from the intermediate anonymity features, while maximally retaining the interactive behavior features contained in the original traffic for botnet detection. We design a privacy confrontation algorithm based on a mutual information calculation mechanism. This algorithm simulates the game between the attacker, who tries to infer private information through the attack model, and the data processor, who retains the original content of the traffic to the maximum extent. To further ensure the privacy protection of the feature extractor during the training process, we train the feature extractor in the federated learning training mode. We extensively evaluate our approach, validating it on two public datasets and comparing it with existing methods. The results show that our method can effectively ensure detection accuracy while removing private information. For the CTU-13 dataset, the detection framework achieves the best detection performance; for the ISCX-2014 dataset, the accuracy of the framework is less than 1% lower than the best result.
... Bullard et al. [38] collected flow-based information, including but not limited to source IP address, destination IP address, source port, destination port, and duration, which was subsequently used to detect individuals and host groups exhibiting similar behavior. Karasaridis et al. [39] devised a methodology based on k-means that utilizes a scalable, non-intrusive algorithm to scrutinize substantial volumes of network traffic data. Gu et al. [40] introduced BotHunter, an anomaly-based botnet detection system that is immune to the influence of botnet protocols and topology. ...
Article
Full-text available
In recent years, with the rapid development of the Internet of Things, large-scale botnet attacks have occurred frequently and have become an important challenge to network security. As artificial intelligence technology continues to evolve, intelligent detection solutions for botnets are constantly emerging. Although graph neural networks are widely used for botnet detection, directly handling large-scale botnet data becomes inefficient and challenging as the number of infected hosts increases and the network scale expands. Especially in the process of node level learning and inference, a large number of nodes and edges need to be processed, leading to a significant increase in computational complexity and posing new challenges to network security. This paper presents a novel approach that can accurately identify diverse intricate botnet architectures in extensive IoT networks based on the aforementioned circumstance. By utilizing GraphSAINT to process large-scale IoT botnet graph data, efficient and unbiased subgraph sampling has been achieved. In addition, a solution with enhanced information representation capability has been developed based on the Graph Isomorphism Network (GIN) for botnet detection. Compared with the five currently popular graph neural network (GNN) models, our approach has been tested on C2, P2P, and Chord datasets, and higher accuracy has been achieved.
... Scanning the literature for methodology, recent papers have used supervised and unsupervised ML techniques for botnet detection. For example, Tegeler et al. [4] used flow-based methods to detect botnets, Choi et al. [5] detected botnet traffic by capturing group activities in network traffic, and Karasaridis et al. [6] developed a K-means based method that employs scalable non-intrusive algorithms to analyze vast amounts of summary traffic data. However, a number of structural challenges increase the complexity of NetFlow data, calling for sophisticated ML techniques, based on statistical insights, for the analysis and modeling of such datasets. ...
Preprint
Full-text available
We investigate the detection of botnet command and control (C2) hosts in massive IP traffic using machine learning methods. To this end, we use NetFlow data, the industry standard for monitoring of IP traffic, and ML models using two sets of features: conventional NetFlow variables and distributional features based on NetFlow variables. In addition to using static summaries of NetFlow features, we use quantiles of their IP-level distributions as input features in predictive models to predict whether an IP belongs to known botnet families. These models are used to develop intrusion detection systems to predict traffic traces identified with malicious attacks. The results are validated by matching predictions to existing denylists of published malicious IP addresses and deep packet inspection. The usage of our proposed novel distributional features, combined with techniques that enable modelling complex input feature spaces, results in highly accurate predictions by our trained models.
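As a rough illustration of what "quantiles of IP-level distributions" could look like as model inputs, the sketch below groups flow records by source IP and computes quartile cut points of one flow variable. The record layout (`src_ip`, `bytes`), the function name, and the choice of quartiles are assumptions made for the example; the abstract does not specify them.

```python
from collections import defaultdict
from statistics import quantiles


def ip_quantile_features(flows, field="bytes", n=4):
    """Map each source IP to the quantile cut points of one flow variable.

    flows: iterable of dicts with a 'src_ip' key and a numeric `field`
           (hypothetical record layout).
    n: number of quantile groups (4 gives the three quartile cut points).
    """
    per_ip = defaultdict(list)
    for flow in flows:
        per_ip[flow["src_ip"]].append(flow[field])

    features = {}
    for ip, values in per_ip.items():
        if len(values) < 2:
            continue  # skip IPs with a single flow: no distribution to summarize
        features[ip] = quantiles(values, n=n, method="inclusive")
    return features


# Hypothetical flow records.
flows = [{"src_ip": "10.0.0.1", "bytes": b} for b in (100, 200, 300, 400, 500)]
flows.append({"src_ip": "10.0.0.2", "bytes": 42})

print(ip_quantile_features(flows))
```

Each per-IP quantile vector could then be concatenated with conventional summaries (counts, means) to form one feature row per IP for a classifier.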
... A honeynet is used to collect information from bots for further analysis to measure the intensity and vulnerability of the attack [9]. Moreover, the information collected from bots is used to discover the C&C system, unknown susceptibilities, techniques and tools used by the attacker, and the motivation of the attacker [10]. A honeynet is used to collect bot-binaries which penetrate the botnets. ...
Article
Full-text available
This paper proposes a three-stage botnet detection technique based on anomaly and community detection. The first stage is a pragmatic node-based distributed approach of sparse graph sequences. The second stage detects the bot from the sparse matrix and correlations of interactions among the nodes. In the third stage, a random graph is used to evaluate the performance of the bots and is verified with both odd and even types of nodes. The same is extended and verified through Obrazom triple connected graphs. This verification helps identify aggressive bots through the optimized pivotal nodes. Machine learning based botnet detection techniques are implemented at various levels, such as the centralized and distributed levels of networks. This three-stage bot detection can be applied to large-scale data.
... The progress in digital technology shows in the maximum processing power of hardware equipment such as CPUs and GPUs and in bandwidth capacity [7]. These digital techniques make both good and bad things more powerful [13,16]. When these powerful processors enter computers and the internet, the vibration of botnets becomes more and more powerful [12]. ...
Article
Full-text available
Large networks are prone to attacks by bots, which leads to complexity. When such complexity occurs, it is difficult to overcome the vulnerability in the network connections. In such a case, the complex network can be dealt with using probability theory and graph theory concepts such as Erdős–Rényi random graphs, scale-free graphs, highly connected graph sequences and so on. This paper proposes botnet detection using Erdős–Rényi random graphs, whose patterns are recognized from the number of connections that the vertices and edges make in the network. This paper also presents botnet detection concepts based on machine learning.
... Botnets pose severe threats to Internet security since they provide the platforms for most large-scale and harmonized cyber-attacks and for deploying malicious activities on the internet, such as hosting phishing web sites [2], search engine abuse [3], spamming activities [4], facilitating distributed denial of service (DDoS) attacks [5][6][7][8][9], click fraud [10][11][12][13] and information theft [14], costing companies, businesses, and governments billions of dollars in losses. More than 100 billion spam messages are believed to be sent daily [15], and botnets are believed to account for 85% of all spam messages [16][17][18][19][20][21]. Botnets have expanded rapidly in recent years both in diversity and population, and in 2011 it was estimated that about 40% of the world's end computers were part of botnets [22]. ...
Article
Full-text available
Over the years, there has been rapid advancement in internet technologies, such as email, the world wide web, VOIP, social networks, etc. Networks of compromised individual and corporate computers called botnets have been used to deploy malware, such as viruses, worms, trojans, spyware, etc., to vulnerable computers on a global scale. Botnets are used for various kinds of malicious activities on the internet, including distributed denial of service (DDOS) attacks, massive spam email messages, distributing other malware, click fraud attacks and information theft. Better security decisions are usually associated with experience in cyber security, advanced technologies, and rich data and information; as such, an earnest and determined collaborative approach to botnet detection is likely to have a significant positive outcome in tackling the menace of botnets. In this paper, we propose a novel botnet detection approach that leverages the expertise and experience of several research collaborators, as well as the abundant data and information at each collaborator’s disposal, to detect botnets irrespective of command and control protocol, type of architecture, or infection behaviour. We use Python scripts to broadcast diagnosis requests to peer collaborators and then use supervised machine learning to learn, through False Positives (FP), False Negatives (FN), True Positives (TP), and True Negatives (TN), the detection accuracy of peer collaborators, detect malicious collaborators, and finally, detect costly and unreliable collaborators.
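The four counts named in the abstract above (true/false positives and negatives) are the usual ingredients of per-collaborator quality scores. The following minimal sketch shows the standard formulas; the function name and the idea of scoring each collaborator independently are assumptions for illustration, since the abstract does not describe how the counts are combined.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard rates from one collaborator's confusion counts."""
    total = tp + fp + tn + fn
    return {
        # share of all diagnoses that were correct
        "accuracy": (tp + tn) / total if total else 0.0,
        # share of benign hosts wrongly flagged as bots
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # share of real bots that were missed
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical counts for one peer collaborator.
print(detection_metrics(tp=90, fp=5, tn=95, fn=10))
```

A collaborator whose false positive or false negative rate stays high over many diagnosis requests would be a natural candidate for the "unreliable" label the abstract mentions, although the threshold for that decision is not given in the source.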
... Clustering is another approach taken by researchers to detect botnets using flow-based features. Karasaridis et al. [4] developed a K-means based method that employs scalable non-intrusive algorithms that analyze vast amounts of summary traffic data. Statisticians have a lot of potential to offer new advanced analytic frameworks and techniques for botnet detection in network security related problems. ...
Preprint
Cybersecurity, security monitoring of malicious events in IP traffic, is an important field largely unexplored by statisticians. Computer scientists have made significant contributions in this area using statistical anomaly detection and other supervised learning methods to detect specific malicious events. In this research, we investigate the detection of botnet command and control (C&C) hosts in massive IP traffic. We use the NetFlow data, the industry standard for monitoring of IP traffic for exploratory analysis and extracting new features. Using statistical as well as deep learning models, we develop a statistical intrusion detection system (SIDS) to predict traffic traces identified with malicious attacks. Employing interpretative machine learning techniques, botnet traffic signatures are derived. These models successfully detected botnet C&C hosts and compromised devices. The results were validated by matching predictions to existing blacklists of published malicious IP addresses.
... Extensive research has been done on bot detection using anomalies in network traffic. Karasaridis et al. [11] presented an algorithm to detect and characterize botnets by passive analysis of flow data. Their work is scalable and has a very low false positive rate. ...
Preprint
Full-text available
The traditional reactive approach of blacklisting botnets fails to adapt to the rapidly evolving landscape of cyberattacks. An automated and proactive approach to detect and block botnet hosts will immensely benefit the industry. Behavioral analysis of attackers is shown to be effective against a wide variety of attack types. Previous works, however, focus solely on anomalies in network traffic to detect bots and botnets. In this work we take a more robust approach of analyzing heterogeneous events including network traffic, file download events, SSH logins and the chain of commands input by attackers in a compromised host. We have deployed several honeypots to simulate Linux shells and allowed attackers access to the shells. We have collected a large dataset of heterogeneous threat events from the honeypots. We have then combined and modeled the heterogeneous threat data to analyze attacker behavior. Then we have used a deep learning architecture called a Temporal Convolutional Network (TCN) to do sequential and predictive analysis on the data. A prediction accuracy of 85–97% validates our data model as well as our analysis methodology. In this work, we have also developed an automated mechanism to collect and analyze these data. For the automation we have used CYbersecurity information Exchange (CYBEX). Finally, we have compared TCN with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) and have shown that TCN outperforms LSTM and GRU for the task at hand.
... Another popular approach is using anomalies in network traffic [7]- [9] to detect bots. Some works take it further to detect anomalies in DNS traffic [10]- [13]. ...
Preprint
Full-text available
Traditional reactive approach of blacklisting botnets fails to adapt to the rapidly evolving landscape of cyberattacks. An automated and proactive approach to detect and block botnet hosts will immensely benefit the industry. Behavioral analysis of botnet is shown to be effective against a wide variety of attack types. Current works, however, focus solely on analyzing network traffic from and to the bots. In this work we take a different approach of analyzing the chain of commands input by attackers in a compromised host. We have deployed several honeypots to simulate Linux shells and allowed attackers access to the shells to collect a large dataset of commands. We have further developed an automated mechanism to analyze these data. For the automation we have developed a system called CYbersecurity information Exchange with Privacy (CYBEX-P). Finally, we have done a sequential analysis on the dataset to show that we can successfully predict attacker behavior from the shell commands without analyzing network traffic like previous works.
... 9) Internet access becomes unavailable for no reason. Web pages couldn't be accessed [5,7,14,16]. ...
... The relative ease to build botnets and the disruptive power they possess rallied researchers to create detection and mitigation methods for this threat. Although many methods have been proposed [11,22,27,32,41,42], the continuous evolution of botnets remains a constant challenge for the disruption of these networks [9,23,30,36,38]. ...
... WnD is a deep learning-based approach proposed by Google [55] that, to the best of our knowledge, has never been evaluated for botnet detection. We consider it due to its appreciable results in other classification contexts [56]. ...
Article
Full-text available
As cybersecurity detectors increasingly rely on machine learning mechanisms, attacks to these defenses escalate as well. Supervised classifiers are prone to adversarial evasion, and existing countermeasures suffer from many limitations. Most solutions degrade performance in the absence of adversarial perturbations; they are unable to face novel attack variants; they are applicable only to specific machine learning algorithms. We propose the first framework that can protect botnet detectors from adversarial attacks through Deep Reinforcement Learning mechanisms. It automatically generates realistic attack samples that can evade detection, and it uses these samples to produce an augmented training set for producing hardened detectors. In such a way, we obtain more resilient detectors that can work even against unforeseen evasion attacks with the great merit of not penalizing their performance in the absence of specific attacks. We validate our proposal through an extensive experimental campaign that considers multiple machine learning algorithms and public datasets. The results highlight the improvements of the proposed solution over the state-of-the-art. Our method paves the way to novel and more robust cybersecurity detectors based on machine learning applied to network traffic analytics.
... This bot had very simple tasks: to welcome new participants and warn them about the actions of other users [13]. Shortly thereafter, the use of bots in IRC became very popular due to the simplicity of their implementation and their ability to scale IRCs [14]. Bots evolved over time and the tasks these bots were assigned became more complicated and sophisticated. ...
Chapter
Social media provides a fertile ground for any user to find or share information about various events with others. At the same time, social media is not always used for benign purposes. With the availability of inexpensive and ubiquitous mass communication tools, disseminating false information and propaganda is both convenient and effective. In this research, we studied Online Deviant Groups (ODGs) that conduct cyber propaganda campaigns in order to achieve strategic and political goals, influence mass thinking, and steer behaviors or perspectives about an event. We provide case studies in which various disinformation and propaganda swamped social media during two NATO exercises in 2015. We demonstrate ODGs’ capability to spread anti-NATO propaganda using a highly sophisticated and well-coordinated social media campaign. In particular, blogs were used as virtual spaces where narratives are framed. And, to generate discourse, web traffic was driven to these virtual spaces via other social media platforms such as Twitter, Facebook, and VKontakte. By further examining the information flows within the social media networks, we identify sources of mis/disinformation and their reach, i.e., how far and how quickly the mis/disinformation could travel and consequently detect manipulation. The chapter presents an in-depth examination of the information networks using social network analysis (SNA) and social cyber forensics (SCF) based methodologies to identify prominent information brokers, leading coordinators, and information competitors who seek to further their own agenda. Through SCF tools, e.g., Maltego, we extract metadata associated with disinformation-riddled websites. The extracted metadata helps in uncovering the implicit relations among various ODGs. 
We further collected the social network of various ODGs (i.e., their friends and followers) and their communication network (i.e., network depicting the flow of information such as tweets, retweets, mentions, and hyperlinks). SNA helped us identify influential users and powerful groups responsible for coordinating the various disinformation campaigns. One of the key research findings is the vitality of the link between blogs and other social media platforms to examine disinformation campaigns.
... Multiple methods have been proposed in the literature to identify botnets. One study (Karasaridis et al. 2007) used an anomaly-based botnet detection method to identify botnet controllers using transport layer data, thus enabling the detection of IRC botnet controllers without known signatures or captured binaries and making it a passive method that is invisible to operators, scales to large networks, and protects end users. This method also determines a botnet's size and activities from outside of compromised networks, making it capable of identifying botnets using encrypted and obfuscated protocols. ...
Article
Full-text available
Botnets are vectors through which hackers can seize control of multiple systems and conduct malicious activities. Researchers have proposed multiple solutions to detect and identify botnets in real time. However, these proposed solutions have difficulties in keeping pace with the rapid evolution of botnets. This paper proposes a model for detecting botnets using deep learning to identify zero-day botnet attacks in real time. The proposed model is trained and evaluated on a CTU-13 dataset with multiple neural network designs and hidden layers. Results demonstrate that the deep-learning artificial neural network model can accurately and efficiently identify botnets.
... Nevertheless, knee/elbow estimation methods have been used in several areas. In fact, the concept of knee/elbow point in error curves is used in many fields like fatigue damage theories [2], detecting the number of clusters [3], botnet detection [4], and system behaviour [5]. With the advent of the Internet of Things (IoT) [6], Machine to Machine (M2M) [7] communications and Machine Learning (ML) [8] several scenarios require a system that works autonomously with minimal human intervention. ...
Conference Paper
With the advent of smart IoT and M2M scenarios it becomes necessary to develop autonomous systems that optimize themselves with minimal human intervention. One possible method to achieve this is through knee/elbow point estimation. Most of the time these points represent ideal compromises for parameters, methods and algorithms. However, estimating the knee/elbow point in curves is a challenging task. Our focus is on determining the ideal number of clusters autonomously. We analyse and discuss well-known knee/elbow estimators and two extensions based on the theoretical definition. The proposed methods (named AL and S methods) were evaluated against state-of-the-art estimators. The proposed methods are a viable stable solution for knee/elbow estimation.
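To make the knee/elbow idea concrete, here is a minimal sketch of one common heuristic: pick the point farthest from the straight line joining the first and last points of the curve. This is not the AL or S method proposed in the paper above (the abstract does not describe them); the function name and the sample error curve are illustrative assumptions.

```python
def knee_index(xs, ys):
    """Return the index of the point farthest from the chord between the endpoints.

    xs, ys: coordinates of a monotone curve, e.g. number of clusters vs. error.
    """
    x0, y0 = xs[0], ys[0]
    dx, dy = xs[-1] - x0, ys[-1] - y0
    norm = (dx * dx + dy * dy) ** 0.5
    if norm == 0:
        raise ValueError("first and last points coincide")
    # Perpendicular distance of each point to the line through the endpoints.
    distances = [abs(dy * (x - x0) - dx * (y - y0)) / norm for x, y in zip(xs, ys)]
    return max(range(len(distances)), key=distances.__getitem__)


# Hypothetical clustering error curve: error drops sharply, then flattens.
ks = [1, 2, 3, 4, 5, 6, 7, 8]
errors = [1000, 400, 150, 120, 100, 90, 85, 80]

best = knee_index(ks, errors)
print("suggested number of clusters:", ks[best])
```

Because the distance depends on the scale of both axes, results can shift when the axes have very different ranges; normalizing both to [0, 1] before calling the function is a frequent precaution.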
... Most existing anomaly-based C2 detection techniques are based on the statistical features of packets and flows [7], [12], [17]- [27]. Works like [17], [18] are focused on specific communication protocols, such as IRC, providing narrow-scoped solutions. By contrast, BotMiner [7] is a protocol-independent solution, which assumes that bots within the same botnet are characterized by similar malicious activities and communication patterns. ...
Article
Full-text available
Bot detection using machine learning (ML), with network flow-level features, has been extensively studied in the literature. However, existing flow-based approaches typically incur a high computational overhead and do not completely capture the network communication patterns, which can expose additional aspects of malicious hosts. Recently, bot detection systems that leverage communication graph analysis using ML have gained attention to overcome these limitations. A graph-based approach is rather intuitive, as graphs are true representation of network communications. In this paper, we propose BotChase, a two-phased graph-based bot detection system that leverages both unsupervised and supervised ML. The first phase prunes presumable benign hosts, while the second phase achieves bot detection with high precision. Our prototype implementation of BotChase detects multiple types of bots and exhibits robustness to zero-day attacks. It also accommodates different network topologies and is suitable for large-scale data. Compared to the state-of-the-art, BotChase outperforms an end-to-end system that employs flow-based features and performs particularly well in an online setting.
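The graph-based view described above can be sketched by turning flow records into a directed host graph and deriving simple per-host features. The abstract does not list the features BotChase actually uses, so the in-degree and out-degree features, the `(src, dst)` flow layout, and the function name below are assumptions for illustration only.

```python
from collections import defaultdict


def host_graph_features(flows):
    """Build a directed host graph from (src, dst) pairs and return degree features.

    in_degree: number of distinct hosts that contacted this host.
    out_degree: number of distinct hosts this host contacted.
    """
    out_neighbors = defaultdict(set)
    in_neighbors = defaultdict(set)
    for src, dst in flows:
        out_neighbors[src].add(dst)
        in_neighbors[dst].add(src)

    hosts = set(out_neighbors) | set(in_neighbors)
    return {
        host: {
            "in_degree": len(in_neighbors[host]),
            "out_degree": len(out_neighbors[host]),
        }
        for host in hosts
    }


# Hypothetical flows: three clients contact the same server; repeats collapse.
flows = [("a", "c2"), ("b", "c2"), ("c", "c2"), ("a", "web"), ("a", "c2")]
print(host_graph_features(flows)["c2"])
```

Features like these could feed an unsupervised first phase that prunes hosts resembling the benign majority, followed by a supervised classifier on the remainder, mirroring the two-phase structure the abstract outlines.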
... 2. Karasaridis [29] proposed botnet detection using mostly transport layer data (passive analysis using flow) for IRC botnet controller detection in Tier-1 ISP networks. This method is used to detect, track and characterize IRC botnets. ...
Thesis
The botnet is one of the most widespread malware in the wild aimed at coordinated compromise, control, and misutilization of vulnerable machines. These compromised machines are known as zombies/bots and the controller is often called botmaster. These machines are further used as launching platform to carry out coordinated attack on target system(s)/network [1]. In general, depending on the characteristics of bots, different techniques have been proposed to carry out their detection and bring down such coordinated attack by Security Researchers/Vendors. To counter this, bot writers have come up with detection evasion techniques such as Encrypted Communication, Fast Flux, Domain Generation Algorithm (DGA), migration from earlier IRC based communication to recent Peer to Peer (P2P) techniques etc. It is a catch and run game between bot writers and bot detectors and so far there seems to be no clear winner! My research will concentrate mostly on Analysis and Implementation of selected bot detection methods that are independent of common detection evasive techniques. I will be using Flow based bot detection methodologies that generally consider OSI Layer 3 and 4 metadata without payload profiling for IRC/HTTP/P2P/DNS based botnets. Different existing botnet detection solutions will be considered and some of them will be implemented in experimental setup and tested with live bots and various other Datasets. The performance of such solutions will be evaluated. This will help to figure out optimum botnet detection parameters and techniques for different botnet types.
... Depending upon the study, the features can be deployed anywhere in the network topology. Possible locations include the proxy server and the Internet Service Provider [2]. However, some network-based bot detection methods lead to degradation of resources and loss of accuracy. ...
Article
Full-text available
The motivation of our project is to defend against DDOS attacks without degradation of network resources and bandwidth. To defend against a DDOS attack, black hole filters are placed in the network, where the inbound or outbound traffic is dropped silently. The black hole filters are invisible when studying the network topology and can be found only by monitoring the lost packets. There is a technique called the remote triggered black hole filter, which has the ability to discard undesirable traffic before it enters the protected network. Its drawback is that it is located within the premises of the victim, so the trigger itself sometimes becomes nonresponsive because of the packet flooding caused by the DDOS attack. Even if the remote triggers are relocated to avoid the effects of the DDOS attack, they remain vulnerable because the direction of flow of the attack packets cannot be predicted. It is therefore necessary to defend against the DDOS attack before the internal network bandwidth degrades. The triggers that are placed in the network to defend against a DDOS attack should withstand the effects and impacts caused by the attack. To defend against a DDOS attack, a self-triggered black hole filter should be placed within the control of the internet service provider, as it has the power to block anything and everything. The self-triggered filters in the network are placed after the proxy server. If there is an anomaly in network behavior, such as packet flooding, the triggers placed after the proxy server will be self-triggered. Keywords: DDOS attack, black hole filter, defending DDOS attack, self-triggered black hole filter, ISP.
... This bot or ASA had very simple tasks, which were basically welcoming and greeting the new participants and warning them about the other users' actions [64]. Shortly after that, the usage of bots in IRC became very popular due to the simplicity of implementing them [65]. Bots evolved over time, i.e., they gained more functionalities, and the tasks these bots were assigned became more complicated and sophisticated. ...
Chapter
In this chapter, we explain what we mean by deviance in social media. We give examples of four types of deviance as observed on social media, viz., deviant acts, deviant events, deviant tactics, and deviant groups. We provide historical information, definitions, and examples that will be studied/explained in more details throughout the book. This chapter would help the readers understand the scope of the problem of deviance in social media, familiarize with definitions and examples of deviant events, groups, acts, and tactics, and peek at the social science theories that can explain such emergent deviant behaviors on social media.
... Livadas et al. [20] proposed a machine learning based approach for botnet detection using some general network-level traffic features. Karasaridis et al. [18] studied network flow level detection of IRC botnet controllers for backbone networks. SpamTracker [19] is a spam filtering system using behavioral blacklisting to classify email senders based on their sending behavior rather than their identities. ...
Conference Paper
Full-text available
With the advent of anomaly-based intrusion detection systems, numerous approaches and techniques have been developed to track novel attacks on systems. A high detection rate of 98% at a low alarm rate of 1% can be achieved using these techniques. Although anomaly-based approaches are effective, signature-based detection is preferred for mainstream deployment of intrusion detection systems. Because a variety of anomaly detection techniques have been proposed, it is hard to compare their strengths and weaknesses. The reason why industry does not favor anomaly-based intrusion detection techniques can be well understood by validating the efficiency of each of the techniques. To investigate this issue, the current state of experimental practice in the field of anomaly-based intrusion detection is also reviewed. In this paper, we use deep learning techniques to implement an anomaly-based Novel-IDS. These techniques show the sensitive power of generative models with good classification, the ability to infer part of their knowledge from incomplete data, and adaptability.
... Most existing bot detection techniques employ methods for detecting C2 channels based on the statistical features of packets and flows [10]- [22]. Solutions like [10], [11] are focused on specific communication protocols, such as IRC, providing narrow-scoped solutions. On the other hand, Botminer [15] is a protocol-independent solution, which assumes that bots within the same botnet are characterized by similar malicious activities and communication patterns. ...
Preprint
Full-text available
Bot detection using machine learning (ML), with network flow-level features, has been extensively studied in the literature. However, existing flow-based approaches typically incur a high computational overhead and do not completely capture the network communication patterns, which can expose additional aspects of malicious hosts. Recently, bot detection systems which leverage communication graph analysis using ML have gained attention to overcome these limitations. A graph-based approach is rather intuitive, as graphs are true representations of network communications. In this paper, we propose a two-phased, graph-based bot detection system which leverages both unsupervised and supervised ML. The first phase prunes presumable benign hosts, while the second phase achieves bot detection with high precision. Our system detects multiple types of bots and is robust to zero-day attacks. It also accommodates different network topologies and is suitable for large-scale data.
... Botnet detection systems. Different botnet detection systems have been proposed in the literature, such as those of Gu et al. (2007); Karasaridis et al. (2007); Gu et al. (2008); Zhao et al. (2013); Bou-Harb et al. (2016); Meidan et al. (2018). Some investigate specific protocol channels, others might require deep packet inspection or training periods, while the majority depends on malware infections and/or attack life-cycles. ...
Article
Full-text available
The resource-constrained and heterogeneous nature of Internet-of-Things (IoT) devices coupled with the placement of such devices in publicly accessible venues complicate efforts to secure these devices and the networks they are connected to. The Internet-wide deployment of IoT devices also makes it challenging to operate security solutions at strategic locations within the network or to identify orchestrated activities from seemingly independent malicious events from such devices. Therefore, in this paper, we initially seek to determine the magnitude of IoT exploitations by examining more than 1 TB of passive measurement data collected from a /8 network telescope and by correlating it with 400 GB of information from the Shodan service. In the second phase of the study, we conduct in-depth discussions with Internet Service Providers (ISPs) and backbone network operators, as well as leverage geolocation databases to not only attribute such exploitations to their hosting environment (ISPs, countries, etc.) but also to classify such inferred IoT devices based on their hosting sector type (financial, education, manufacturing, etc.) and most abused IoT manufacturers. In the third phase, we automate the task of alerting realms that are determined to be hosting exploited IoT devices. Additionally, to address the problem of inferring orchestrated IoT campaigns by solely observing their activities targeting the network telescope, we further introduce a theoretically sound technique based on L1-norm PCA, and validate the utility of the proposed data dimensionality reduction technique against the conventional L2-norm PCA. Specifically, we identify "in the wild" IoT coordinated probing campaigns that are targeting generic ports and campaigns specifically searching for open resolvers (for amplification purposes). 
The results reveal more than 120,000 Internet-scale exploited IoT devices, some of which are operating in critical infrastructure sectors such as health and manufacturing. We also infer 140 large-scale IoT-centric probing campaigns; a sample of which includes a worldwide distributed campaign where close to 40% of its population includes video surveillance cameras from Dahua, and another very large inferred coordinated campaign consisting of more than 50,000 IoT devices. The reported findings highlight the insecurity of the IoT paradigm at large and thus demonstrate the importance of understanding such evolving threat landscape.
... Nevertheless, knee/elbow estimation methods have been used in several areas. In fact, the concept of knee/elbow point in error curves is used in many fields like fatigue damage theories [1]- [3], detecting the number of clusters [4], botnet detection [5], and system behaviour [6]. With the advent of the Internet of Things (IoT) [7], Machine to Machine (M2M) [8] communications and Machine Learning (ML) [9] several scenarios require a system that works autonomously with minimal human intervention. ...
Conference Paper
Estimating the knee/elbow point in curves is a challenging task. However, most of the time these points represent ideal compromises for parameters, methods and algorithms. Nowadays several IoT and M2M scenarios require autonomous systems that optimize themselves with minimal human intervention. Thus, knee/elbow estimation has become an important research area. Our focus is determining the ideal number of clusters autonomously. In this paper, we formalize the notion of knee/elbow point based on continuous curvature function and propose two theoretical methods based on the same function. We analyse and discuss well-known knee/elbow estimators and propose our own method. Contrary to most methods, ours is resilient to long tails in the curve. We also propose an iterative refinement method to increase the resilience to long heads. All the previously mentioned methods were implemented (and are publicly available) and evaluated against eight datasets. The proposed method is a viable stable solution for knee/elbow estimation.
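The curvature-based notion of a knee mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic discrete-curvature estimate, not the method proposed in the cited paper; the function name `knee_index`, the unit spacing between samples, and the example curve are assumptions made for illustration.

```python
def knee_index(y):
    """Return the index of maximum discrete curvature in a curve.

    Assumes (for illustration) that the samples in `y` are evenly spaced
    with unit distance on the x axis. Curvature is approximated as
    |y''| / (1 + y'^2)^(3/2) using central finite differences.
    """
    if len(y) < 3:
        raise ValueError("need at least three points")
    best_i, best_k = 1, -1.0
    for i in range(1, len(y) - 1):
        d1 = (y[i + 1] - y[i - 1]) / 2.0          # first derivative
        d2 = y[i + 1] - 2.0 * y[i] + y[i - 1]     # second derivative
        k = abs(d2) / (1.0 + d1 * d1) ** 1.5      # curvature magnitude
        if k > best_k:
            best_i, best_k = i, k
    return best_i


# Hypothetical error curve: steep linear drop, then a nearly flat tail.
errors = [10, 8, 6, 4, 2, 1.9, 1.8]
knee = knee_index(errors)
```

Because curvature depends on how the axes are scaled, a practical estimator would normalize both axes first; that step is omitted here.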
... Our model can also be seen as a classification problem and is related to literature on (hierarchical) classification [11]. We are also not the first to observe the great potential of machine learning and big data analytics in networking, which is currently a very active field, see e.g., [1,2,7,12,14,15], to just name a few. ...
Conference Paper
Companies often have very limited information about the applications running in their datacenter or public/private cloud environments. As this can harm efficiency, performance, and security, many network administrators work hard to manually assign actionable description to (virtual) machines. This paper presents and evaluates NetSlicer, a machine-learning approach that enables an automated grouping of nodes into applications and their tiers. Our solution is based solely on the available network layer data which is used as part of a novel graph clustering algorithm, tailored toward the datacenter use case and accounting also for observed port numbers. For the sake of this paper, we also performed an extensive empirical measurement study, collecting actual workloads from different production datacenters (data to be released together with this paper). We find that our approach features a high accuracy.
... Incorporating Endhost Defense: MiddlePolice can cooperate with the DDoS defense mechanism deployed, if any, on the victim. For instance, via botnet identification [53], [54], the victim can instruct the mboxes to block botnet traffic early at upstream so as to save more downstream bandwidth for clients. Such benefits are possible because the traffic control policies enforced by MiddlePolice are completely driven by designation servers. ...
... These efforts have mainly been done via infiltration [18], as done by Bacher et al. [32,33] or passive measurement, as done by Rajab et al. [34]. Many early studies looked at the most common IRC-based bots relying on a centralized control, as shown by Karasaridis et al. [35] and Barford et al. [36]. Later on, numerous new botnets began to use HTTP-based C&C channels and leverage the more stable P2P-based communication architecture, per Wang et al. [17] and Holz et al. [37], to mitigate failure due to centralization. ...
... Gao and Chen [10] designed and developed a flow-based intrusion detection system. Karasaridis et al. [17], Shahrestani et al. [38]. A sound evaluation of a neural network based IDS requires high-quality training and testing datasets. ...
Article
Full-text available
With the rapid expansion of computer networks during the past decade, security has become a crucial issue for computer systems. To keep security at the highest level, there is an increasing need for effective security monitors, such as Network Intrusion Detection Systems, to prevent illicit activity. In recent years, many researchers have focused their work on this field, using different approaches to build dependable intrusion detection systems. One of these approaches is flow-based intrusion detection, which relies on aggregated network traffic flows. In this paper, a multistage neural network intrusion detection system based on aggregated flow data is proposed for detecting and classifying attacks in network traffic. In the first stage, the proposed system detects significant changes in the traffic that could indicate an attack. The second stage can recognize an attack, differentiate one attack from another (i.e., classify the attack) and, most importantly, detect new attacks with a high detection rate and a low false negative rate. Two different neural network structures with different training algorithms have been used in the proposed intrusion detection system. The experimental results show that the designed system is promising in terms of accuracy and low probability of false alarms, with an average overall classification accuracy of 99.25%.
Article
Full-text available
The traditional reactive approach of blacklisting botnets fails to adapt to the rapidly evolving landscape of cyberattacks. An automated and proactive approach to detect and block botnet hosts will immensely benefit the industry. Behavioral analysis of attackers is shown to be effective against a wide variety of attack types. Previous works, however, focus solely on anomalies in network traffic to detect bots and botnets. In this work we take a more robust approach of analyzing heterogeneous events, including network traffic, file download events, SSH logins and the chain of commands input by attackers in a compromised host. We have deployed several honeypots to simulate Linux shells and allowed attackers access to the shells. We have collected a large dataset of heterogeneous threat events from the honeypots. We have then combined and modeled the heterogeneous threat data to analyze attacker behavior. Then we have used a deep learning architecture called a Temporal Convolutional Network (TCN) to do sequential and predictive analysis on the data. A prediction accuracy of 85−97% validates our data model as well as our analysis methodology. In this work, we have also developed an automated mechanism to collect and analyze these data. For the automation we have used CYbersecurity information Exchange (CYBEX). Finally, we have compared TCN with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) and have shown that TCN outperforms LSTM and GRU for the task at hand.
Article
In this work, we present an IoT botnet detection solution, EDIMA, consisting of a set of lightweight modules designed to be deployed at the edge gateway installed in home networks with the remaining modules expected to be implemented on cloud servers. EDIMA targets early detection of IoT botnets prior to the launch of an attack and includes a novel two-stage Machine Learning (ML)-based detector developed specifically for IoT bot detection at the edge gateway. The ML-based bot detector first employs supervised ML algorithms for aggregate traffic classification and subsequently Autocorrelation Function (ACF)-based tests to detect individual bots. The EDIMA architecture also comprises a malware traffic database, a policy engine, a feature extractor and a traffic parser. Performance evaluation results using our testbed setup with real-world IoT malware traffic as well as other public IoT datasets show that EDIMA achieves high bot scanning and bot-CnC traffic detection accuracies with very low false positive rates. The detection performance is also shown to be robust to an increase in the number of IoT devices connected to the edge gateway where EDIMA is deployed. Further, the runtime performance analysis of a Python implementation of EDIMA deployed on a Raspberry Pi reveals low bot detection delays and low RAM consumption. EDIMA is also shown to outperform existing detection techniques for bot scanning traffic and bot-CnC server communication.
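To illustrate why an autocorrelation function (ACF) can expose bot traffic, the following minimal sketch computes the sample autocorrelation of a traffic time series at a given lag. It is not EDIMA's implementation; the function name `autocorrelation` and the synthetic per-interval packet counts are assumptions made for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # constant series: no periodic structure to measure
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Hypothetical packet counts per time slot: a burst every fourth slot,
# as a periodically beaconing bot might produce.
periodic_counts = [10, 0, 0, 0] * 10

acf_at_period = autocorrelation(periodic_counts, 4)   # lag equals the period
acf_off_period = autocorrelation(periodic_counts, 2)  # lag between bursts
```

A strongly periodic series yields an ACF value close to 1 at lags that are multiples of its period and low or negative values elsewhere, which is the kind of signature a periodicity test can threshold on.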
Article
Full-text available
This paper presents a comprehensive survey on intrusion and extrusion phenomena and their existing detection and prevention techniques. Intrusion and extrusion events, which are breaches of a security system, hamper the protection of devices or systems. Security threats are flourishing with a new level of complexity, which makes them difficult to recognize. Security is therefore a key issue at the core of developing a boundless, constant and reliable web. In this paper, our purpose is to unveil and categorize all possible intrusion and extrusion events, bring out issues related to these events and explore solutions associated with them. In addition, we suggest further recommendations to improve security in these areas. We strongly believe that this survey may help in understanding intrusion and extrusion phenomena and pave the way for a better design to protect against security threats. © 2021. International Journal of Advanced Computer Science and Applications. All Rights Reserved.
Chapter
It is undeniable that technology is developing and growing at an unstoppable pace. Technology has become a part of people's daily lives. It has been used for many purposes but mainly to make human life easier. In addition to being useful, these advancements in technology have some bad consequences. A new malware called botnet has recently emerged. It is considered to be one of the most important and dangerous cyber security problems as it is not well understood and evolves quickly. Communication of bots between each other and their botmaster results in the formation of botnet; this is also known as a zombie army. As botnets become popular among cybercriminals, more studies have been done in botnet detection area. Researchers have developed new detection mechanisms in order to understand and tackle this growing botnet issue. This chapter aims to review working principles of botnets and botnet detection mechanisms in order to increase general knowledge about botnets.
Article
Full-text available
Over the decades, as Internet technology has developed rapidly, more and more kinds of cyber-attacks have emerged around the world. Among them, the botnet is one of the most harmful, and it has always been challenging to overcome. The difficulty of botnet detection stems from the various forms of attack, since the malware keeps evolving to avoid being found. Rule-based botnet detection has the shortcoming that it struggles to detect dynamically changing features. On the other hand, the more Internet functionality is developed, the more severe the impact botnets may cause. In recent years, as Internet of Things technology has prospered, many network devices have suffered from botnet attacks, which has caused great damage in many industries. Consequently, botnet detection has always been a critical issue in the computer security field. In this paper, we introduce a method to detect potential botnets by inspecting the behavior of network traffic from network packets. First, we sample the given packets over a period of time and extract behavioral features from a series of packets. By analyzing these features with the proposed deep learning models, we can detect the threat of botnets and classify them into different categories.
Chapter
In rivalry with the Mirai botnet, the second-to-last week of December 2016 saw a massive 650 Gbps DDoS attack by an IoT botnet named Leet. These attacks used large payloads to jam network pipes and thereby bring down network switches (Seals 2017). The Windigo botnet in 2014 infected 10,000 Linux servers and made them send 35 million spam emails per day, which affected almost five lakh computers. Along the same lines, the Grum botnet in 2012 was found to be responsible for up to 26% of the world’s spam email traffic (Thomas 2015).
Chapter
Botnets represent a critical threat to computer networks because their behavior allows hackers to take control of many computers simultaneously. Botnets take over the devices of their victims and perform malicious activities on their systems. Although many solutions have been developed to address the detection of botnets in real time, these solutions are still prone to several problems that may critically affect the efficiency and capability of identifying and preventing botnet attacks. The current work proposes a technique to detect botnet attacks using a feed-forward backpropagation artificial neural network. The proposed technique aims to detect zero-day botnet attacks in real time. This technique applies a backpropagation algorithm to the CTU-13 dataset to train and evaluate the botnet detection classifier. It is implemented and tested in various neural network designs with different hidden layers. Results demonstrate that the proposed technique is promising in terms of accuracy and efficiency of botnet detection.
Article
Cloud computing has gained an important role in providing high quality and cost-effective IT services by outsourcing part of their operations to dedicated cloud providers. If intrinsic security issues of this architecture have been extensively studied, it has recently been considered as a ready-to-use platform able to perform malicious activities, thus offering new targets for indirect threats. However, its large scale, the heterogeneous and dynamic nature of the activities it executes, as well as multi-tenancy and privacy-related issues, make the security operation complex. Consequently, cloud providers can hardly detect and mitigate malicious activities they unknowingly host. Leveraging the autonomic paradigm represents a promising solution to face such a complexity, but it requires efficient grounded monitoring and analysis functions to efficiently detect malicious activities hidden within the large number of legitimate ones. In this effort, this paper presents a robust and cost-effective solution to detect malicious activities in a public virtualized environment. Its contribution is twofold: (1) a scalable and robust workload estimation of the virtual host activities in a cloud and (2) a detection algorithm able to discriminate infected hosts with low malicious activities hidden within their legitimate workload and potentially scattered across several tenants. For both of these contributions, we establish their theoretical performance, which demonstrates their optimality, and we evaluate their efficiency on a dataset made of real data collected on PlanetLab. Finally, we study the scalability on a large dataset that consists of simulated data resulting from the real dataset modeling. This demonstrates to what extent the proposal exhibits an excellent sharpness and a reasonable cost, even at a very large scale.
Conference Paper
The rapid rise of cloud computing technology marks the next wave of enterprise information technology, catering to the market demand of a digitized economy to deliver computing in the manner of traditional utilities such as electricity, gas, and water. It also, however, paves a secure and cheap way of forming a so-called botnet in the cloud. A botnet consists of a network of compromised machines controlled by an attacker (a.k.a. botmaster). Traditionally, botnets have been built from computers and have been the primary cause of many malicious Internet attacks. However, emerging technologies such as cloud computing have presented new challenges in simulating what a modern botnet could look like and how effectively it can be executed with the easily accessible resources provided by such technologies. In this paper we implement a novel cloud-based botnet and then propose a new method for detecting it. It is our belief that each cloud-based botnet has a unique level of entropy in its networking exchanges, and thus determining the randomness of the communications between the command and control server and the bots could be applied to discriminate bot behaviors from those of normal cloud users. The proposed approach is evaluated in a closed networking environment, and the preliminary experimental results are promising and show significant potential for using entropy to detect the command and control channel of botnets in the cloud.
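The entropy idea in the abstract above can be sketched as follows. This is only an illustrative Shannon entropy computation, not the cited paper's detector; the function name `shannon_entropy`, the choice of packet sizes as the observed symbol, and the sample values are all assumptions.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits per symbol, of a sequence of hashable values."""
    total = len(symbols)
    if total == 0:
        return 0.0
    counts = Counter(symbols)
    # H = sum over distinct symbols of p * log2(1 / p)
    return sum(
        (c / total) * math.log2(total / c) for c in counts.values()
    )


# Hypothetical packet-size sequences (bytes) for two hosts.
bot_like_sizes = [64] * 20                       # identical keep-alive packets
user_like_sizes = [64, 1500, 320, 48, 980, 64, 1200, 512]

bot_entropy = shannon_entropy(bot_like_sizes)
user_entropy = shannon_entropy(user_like_sizes)
```

In this toy data the repetitive host has zero entropy while the varied host has a higher value; whether real command and control traffic is lower or higher in entropy than normal traffic depends on the botnet and on which feature is measured.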