Detection of Botnets Using Combined Host- and Network-Level Information
Yuanyuan Zeng, Xin Hu, Kang G. Shin
University of Michigan, Ann Arbor, MI 48109-2121, USA
Bots are coordinated by a command and control (C&C) in-
frastructure to launch such attacks as Distributed-Denial-of-
Service (DDoS), spamming, identity theft and phishing, all
of which seriously threaten the Internet services and users.
Most contemporary botnet-detection approaches have been
designed to function at the network level, requiring the anal-
ysis of packets’ payloads. However, analyzing packets’ pay-
loads raises privacy concerns and incurs large computational
overheads. Moreover, network traffic analysis alone can sel-
dom provide a complete picture of botnets’ behavior. By con-
trast, general in-host detection approaches are useful to iden-
tify each bot’s host-wide behavior, but are susceptible to the
host-resident malware if used alone. To address these limi-
tations, we account for both the coordination within a botnet
and the malicious behavior each bot exhibits at the host level,
and propose a C&C protocol-independent detection frame-
work that combines both host- and network-level informa-
tion for making detection decisions. This framework clusters
similarly-behaving hosts into groups based on network-flow
analysis without accessing packets’ payloads, and then cor-
relates the clusters with each individual’s in-host behavior for
validation. The framework is shown to be effective and incurs
low false-alarm rates in detecting various types of botnets.
Botnets have now become one of the most serious security
threats to Internet services and applications. A bot is a com-
puter compromised by worms, Trojan horses, or backdoors,
under a remote command and control (C&C) infrastructure. A
group of coordinated bots is called a botnet, and can cooper-
atively mount Distributed-Denial-of-Service (DDoS) attacks,
spamming, phishing, identity theft, and other cyber crimes.
To control a botnet, a botmaster needs to use a C&C chan-
nel to issue commands, and coordinate bots’ actions. Tradi-
tional botnets utilize the IRC protocol as their C&C infrastruc-
ture. Attackers set up an IRC server and specify a channel via
which bots connect and listen in order to receive commands
from botmasters. HTTP-based botnets are similar to the
IRC-based ones, but after infection, bots contact a web-based
C&C server and notify the server with their system-identifying
information via HTTP. This server sends back commands via
HTTP responses. Although IRC- and HTTP-based C&C have
been adopted by many past and current botnets, both of them
are vulnerable to a central-point-of-failure. That is, once the
central IRC or HTTP server is identified and removed, the en-
tire botnet will be disabled.
To counter this weakness, attackers have recently shifted toward
a new generation of botnets utilizing decentralized C&C
protocols such as P2P. This C&C infrastructure makes detec-
tion and mitigation much harder. A well-known example is
the Storm worm (a.k.a. Nuwar, W32.Peacomm, and Zhelatin),
which spreads via email spam and is known to be the first
malware to seed a botnet in a hybrid P2P fashion. Storm uses
peers as HTTP proxies to relay C&C traffic and hides the bot-
masters well behind the P2P network. Storm was estimated
to run on between 250,000 and 1 million compromised systems
in 2007. The Storm botnet has been used in some criminal
activities, primarily for sending spam emails. A more recent
spambot, Waledac, which appeared in the wild at the end of 2008,
also spreads via spam emails and forms its botnet using a C&C
out that Waledac is the new and improved version of the Storm
To date, most botnet-detection approaches operate at the
network level; a majority of them target traditional IRC- or
HTTP-based botnets [12, 5, 10, 14, 17, 22] by looking for traffic
signatures or flow patterns. We are aware of only one approach
designed for protocol- and structure-independent
botnet detection. This approach requires packet-level inspection
and depends solely on network traffic analysis, which is unlikely to
provide a complete view of botnets’ behavior. We thus need the
finer-grained host-by-host behavior inspection to complement
the network analysis. On the other hand, since bots behave
maliciously system-wide, general host-based detection can be
useful. One such way is to match malware signatures, but it is
effective in detecting known bots only. To deal with unknown
bot infiltration, in-host behavior analysis [6, 15, 8, 21, 20] is
needed. However, since some in-host malicious behavior is
not exclusive to bots and in-host mechanisms are vulnerable to
provide reliable detection results and thus we need external,
hard-to-compromise (i.e., network-level) information for more
accurate detection of bots’ malicious behavior.
Figure 1. System architecture
Considering the required coordination within each botnet at
the network level and the malicious behavior each bot exhibits
at the host level, we propose a C&C protocol-independent
detection framework that incorporates information collected at
both the host and the network levels. The two sources of information
complement each other in making detection decisions.
Our framework first identifies suspicious hosts by discovering
similar behaviors among different hosts using network-flow
analysis, and then validates whether the identified suspects are
malicious by scrutinizing their in-host behavior. Since bots within
the same botnet are likely to receive the same input from the
botmaster and take similar actions, whereas benign hosts rarely
demonstrate such correlated behavior, our framework looks
for flows with similar patterns and labels them as triggering
flows. It then associates all subsequent flows with each trig-
gering flow on a host-by-host basis, checking the similarity
among those associated groups. If multiple hosts behave sim-
ilarly in the trigger-action patterns, they are grouped into the
same suspicious cluster as likely to belong to the same botnet.
Whenever a group of hosts is identified as suspicious by the
network analysis, the host-behavior analysis results based on a
history of monitored host behaviors are reported. A correlation
algorithm finally assigns a detection score to each host under
inspection by considering both network and host behaviors.
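The trigger-action association described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Flow record, the pattern similarity key, and the window, min_hosts, and action_window parameters are all assumptions introduced here for illustration. Flows that share a coarse pattern across several hosts within a short time window are labeled as triggering flows, and each host's subsequent flows are then associated with its trigger.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str       # monitored host
    dst: str       # remote endpoint
    dport: int     # destination port
    start: float   # flow start time in seconds
    pkts: int      # packets in the flow
    nbytes: int    # bytes in the flow


def pattern(flow):
    # Placeholder similarity key (assumption): same port, same packet
    # count, and byte count falling in the same 100-byte bucket.
    return (flow.dport, flow.pkts, flow.nbytes // 100)


def find_triggers(flows, min_hosts=2, window=60.0):
    """Return {host: earliest flow whose pattern is shared by at least
    min_hosts distinct hosts within `window` seconds}."""
    groups = defaultdict(list)
    for f in flows:
        groups[pattern(f)].append(f)
    triggers = {}
    for group in groups.values():
        group.sort(key=lambda f: f.start)
        near = [f for f in group if f.start - group[0].start <= window]
        if len({f.src for f in near}) < min_hosts:
            continue  # pattern not shared by enough hosts
        for f in near:
            if f.src not in triggers or f.start < triggers[f.src].start:
                triggers[f.src] = f
    return triggers


def associate_actions(flows, triggers, action_window=300.0):
    """Map each triggered host to the flows it starts after its trigger."""
    return {
        host: [f for f in flows
               if f.src == host
               and t.start < f.start <= t.start + action_window]
        for host, t in triggers.items()
    }
```

Under these assumptions, the per-host groups returned by associate_actions are what a later step would summarize into feature vectors and compare across hosts.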
Our contributions are three-fold. First, to the best of our
knowledge, this is the first framework that combines both
network- and host-level information to detect botnets. The
benefit is that it completes a detection picture by considering
not only the coordination behavior intrinsic to each botnet but
also each bot’s in-host behavior. For example, it can detect
botnets that appear stealthy in network activities with the as-
sistance of host-level information. Moreover, we extract fea-
tures from NetFlow data to analyze the similarity or dissim-
ilarity of network behavior without inspecting each packet’s
payload, thus preserving privacy. Second, our detection relies
on the invariant properties of botnets’ network and host behaviors,
which are independent of the underlying C&C protocol. It
can detect both traditional IRC and HTTP, as well as recent hybrid
P2P botnets. Third, our approach was evaluated by using
several days of real-world NetFlow data from a core router of a
major campus network containing benign and botnet traces, as
well as multiple benign and botnet data sets collected from virtual
machines and regular hosts. Our evaluation results show
that the proposed framework can detect different types of bot-
nets with low false-alarm rates.
The remainder of the paper is organized as follows. Section
2 provides an overview of our system architecture. Section
3 details the proposed detection methodology. Implementation
and evaluation results are presented in Sections 4 and 5. Limitations
are discussed in Section 6. Section 7 describes the related
work. The paper concludes with Section 8.
2 System Architecture
Figure 1 shows the architecture of our system, which pri-
marily consists of three components: host analyzer, network
analyzer, and correlation engine.
As almost all current botnets target Windows machines,
our host analyzer is designed and implemented for Windows
platforms. The host analyzer is deployed at each host and con-
tains two modules: in-host monitor and suspicion-level gen-
erator. The former monitors run-time system-wide behavior
taking place in the Registry, file system, and network stack
on a host. The latter generates a suspicion-level by applying
a machine-learning algorithm based on the behavior reported
at each time window and computes the overall suspicion-level
using a moving average algorithm. The host analyzer sends
the average suspicion-level along with a few network feature
statistics to the correlation engine, if required. The network
analyzer also contains two modules: flow analyzer and clustering.
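The moving-average step of the suspicion-level generator can be sketched as follows. The text above does not say which moving average is used, so this example assumes an exponentially weighted one; the function name ewma_suspicion and the smoothing factor alpha are hypothetical, and the per-window suspicion levels are taken to be numbers in [0, 1] produced by some classifier.

```python
def ewma_suspicion(levels, alpha=0.3):
    """Return the running average after each time window.

    levels: per-window suspicion levels, assumed to lie in [0, 1].
    alpha:  weight of the newest window (hypothetical default).
    """
    history = []
    avg = None
    for level in levels:
        # The first window initializes the average; later windows blend in.
        avg = level if avg is None else alpha * level + (1 - alpha) * avg
        history.append(avg)
    return history
```

A larger alpha makes the overall suspicion-level react faster to a sudden burst of malicious behavior, while a smaller alpha smooths out isolated spikes.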
The flow analyzer takes the flow data from a router as input
and searches for trigger-action botnet-like flow patterns among
different hosts. It then extracts a set of features that can best
represent those associated flows and transforms them into fea-
ture vectors. Those vectors are then fed to the clustering mod-
ule that groups similarly-behaving hosts into the same cluster,
assuming them likely to be part of a botnet. Whenever a suspicious
group of hosts is identified by the network analyzer,
their host analyzers are required to provide the suspicion-level
and network statistics to the correlation engine, which veri-
fies the validity of the host information by comparing the net-
work statistics collected from the network and those received
from the host. The correlation engine finally assigns a detec-
tion score to each host and produces a detection result.
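The role of the correlation engine can be illustrated with a small sketch. The scoring rule is not given in this section, so everything below is an assumption made for illustration: the function names, the relative tolerance used to compare statistics, the equal weighting of the two sources, and the choice to treat a host whose report cannot be verified as maximally suspicious.

```python
def stats_consistent(host_stats, net_stats, tolerance=0.1):
    """True if every network-observed statistic is matched by the
    host-reported value within a relative tolerance (hypothetical rule)."""
    for name, observed in net_stats.items():
        reported = host_stats.get(name)
        if reported is None:
            return False  # host failed to report a required statistic
        scale = max(abs(observed), 1e-9)
        if abs(reported - observed) / scale > tolerance:
            return False
    return True


def detection_score(suspicion, in_suspicious_cluster,
                    host_stats, net_stats, weight=0.5):
    """Combine host and network evidence into a score in [0, 1]."""
    if not stats_consistent(host_stats, net_stats):
        # Assumption: an unverifiable host report is not trusted, so the
        # host-level term falls back to the maximum suspicion level.
        suspicion = 1.0
    cluster_term = 1.0 if in_suspicious_cluster else 0.0
    return weight * suspicion + (1 - weight) * cluster_term
```

The point of the consistency check is that a bot which tampers with its host analyzer to report a low suspicion-level would also have to forge network statistics that agree with what the router observed.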
ysis, whereas our framework combines both network- and
host-level information to complete a detection picture. More-
over, both TAMD and BotMiner require packet-level inspec-
tion in the traffic analysis, while our network analyzer only
looks into NetFlow data, avoiding privacy issues and large computational
costs. Another difference worth noting is that our
ing clusters, which was not used by previous work. This tech-
nique significantly reduces the number of benign hosts for
clustering, making the network-level analysis more efficient.
As for host-based solutions, there are many general detec-
tion approaches [6, 15, 8, 21, 20]. They either use signature
matching or behavior analysis by system or API call sequence
modeling. Unless a virtual machine monitor is integrated into
those techniques, they can be disabled or compromised if there
is malware sitting below the detection framework. Our ap-
proach does not rely on one single source of information; it
incorporates both network- and host-level behavior. Also, our
host analyzer sends a few network metrics to the correlation
engine for validation, which adds an additional layer of security.
Considering the coordination of bots within a botnet and
each bot’s malicious behavior at the host level, we proposed
a C&C protocol-independent botnet detection framework that
combines both host- and network-level information. Our net-
work flow analyzer searches for trigger-action traffic patterns
among different hosts without accessing the packets’ payloads,
and clusters similarly-behaving hosts into suspicious groups.
Our host analyzer then obtains suspicion-level information
along with a few network statistics on a host-by-host basis for
verification. Finally, our correlation engine generates a detection
result for each host by taking into account both suspicion-level
and clustering results. Our experimental evaluation based
on real-world data has shown the following results. The net-
work analyzer can be effective in forming suspicious clusters
of aggressive bots but may fail to separate benign hosts from
bot-infected hosts if the latter are stealthy at the network level.
When the stealthy bots are present, it is the host analyzer that
provides correct detection results by generating distinguishing
suspicion levels. By using combined host- and network-level
ent types of botnets with low false-positive and false-negative
[1] Netflow. http://www.manageengine.com/products/netflow/
[2] Passmark. http://www.passmark.com/.
[3] The pvclust package. http://www.is.titech.ac.jp/shimo/prog/pvclust/.
[4] Storm worm. http://en.wikipedia.org/wiki/Storm-Worm.
[5] J. R. Binkley and S. Singh. An algorithm for anomaly-based botnet
detection. In Proceedings of the 2nd conference on Steps to
Reducing Unwanted Traffic on the Internet, Berkeley, CA, USA,
2006. USENIX Association.
[6] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E.
Bryant. Semantics-aware malware detection. In Proceedings
of IEEE Symposium on Security and Privacy, 2005.
[7] E. F. Glynn. Correlation "distances" and hierarchical clustering.
cluster/index.htm, Dec 2005.
[8] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff. A
sense of self for unix processes. IEEE Symposium on Security
and Privacy, 1996.
[9] Shadowserver Foundation. Waledac is storm is waledac? peer-to-peer over
http.. http2p? http://www.shadowserver.org/wiki/pmwiki.php/Calendar/20081231,
Dec 2008.
[10] J. Goebel and T. Holz. Rishi: Identify bot contaminated hosts by
irc nickname evaluation. In HotBots’07: Proceedings of the first
conference on First Workshop on Hot Topics in Understanding
Botnets, Berkeley, CA, USA, 2007. USENIX Association.
[11] G. Gu, R. Perdisci, J. Zhang, and W. Lee. BotMiner: Clustering
analysis of network traffic for protocol- and structure-independent
botnet detection. In Proceedings of the 17th
USENIX Security Symposium, 2008.
[12] G. Gu, J. Zhang, and W. Lee. BotSniffer: Detecting botnet command
and control channels in network traffic. In Proceedings of
the 15th Annual Network and Distributed System Security Symposium
(NDSS’08), February 2008.
[13] T. Holz, M. Steiner, F. Dahl, E. Biersack, and F. Freiling. Measurements
and mitigation of peer-to-peer-based botnets: A case
study on storm worm. In Proc. First USENIX Workshop on
Large-scale Exploits and Emergent Threats (LEET), 2008.
[14] A. Karasaridis, B. Rexroad, and D. Hoeflin. Wide-scale botnet
detection and characterization. In HotBots’07, Berkeley, CA,
USA, 2007. USENIX Association.
[15] E. Kirda, C. Kruegel, G. Banks, G. Vigna, and R. Kemmerer.
Behavior-based spyware detection. In Proceedings of the 15th
USENIX Security Symposium, 2006.
[16] H. Lin, C. Lin, and R. Weng. A note on platt’s probabilistic
outputs for support vector machines, 2003.
[17] C. Livadas, R. Walsh, D. Lapsley, and W. Strayer. Using machine
learning techniques to identify botnet traffic. In Proceedings
of the 2nd IEEE LCN Workshop, Nov, 2006.
[18] walking-waledec/, January 2009.
[19] P. Porras, H. Saidi, and V. Yegneswaran. A multi-perspective
analysis of the storm (peacomm) worm. Technical report, SRI,
[20] R. Sekar, M. Bendre, P. Bollineni, and D. Dhurjati.
automaton-based method for detecting anomalous program behaviors.
In Proceedings of the IEEE Symposium on Security and
[21] A. Somayaji and S. Forrest. Automated response using system-call
delays. In Proceedings of the USENIX Security, 2000.
[22] W. T. Strayer, R. Walsh, C. Livadas, and D. Lapsley. Detecting
botnets with tight command and control. In Proceedings of the
31st IEEE LCN, November, 2006.
[23] T. Yen and M. K. Reiter. Traffic aggregation for malware detection.
In Proceedings of the 5th GI International Conference on
Detection of Intrusions and Malware, and Vulnerability Assessment
(DIMVA), 2008.
[24] Y. Zeng, X. Hu, A. Bose, H. Wang, and K. G. Shin. Containment
of network worms via per-process rate-limiting. In Proceedings
of 4th International Conference on Security and Privacy in Communication
Networks (SecureComm), 2008.