Distributed scheduling in large scale monitoring infrastructures.
ABSTRACT: Network monitoring is becoming a necessity for network operators, who usually deploy several monitoring applications that aid in tasks such as traffic engineering, capacity planning and the detection of attacks or other anomalies. There is also an increasing interest in large-scale network monitoring infrastructures that can run multiple applications in several network viewpoints.
Distributed Scheduling in Large Scale Monitoring Infrastructures
Josep Sanjuàs-Cuxart*, Pere Barlet-Ros*, Gianluca Iannaccone†, Josep Solé-Pareta*
*Universitat Politècnica de Catalunya
Network monitoring is becoming a necessity for network operators, who usually deploy several monitoring applications that aid in tasks such as traffic engineering, capacity planning and the detection of attacks or other anomalies. There is also an increasing interest in large-scale network monitoring infrastructures that can run multiple applications in several network viewpoints [4].
Network monitoring infrastructures are extremely resource constrained, due to the continuous growth of network link speeds and computational complexity of monitoring applications. It is therefore highly desirable for such systems to be scalable, e.g., to provide their operators with the ability to incrementally add more computing nodes to the system, in order to support more applications and to sustain a higher volume of traffic. However, providing network monitoring applications with the ability to migrate across nodes is not trivial. The solution of simply replicating the incoming traffic to other nodes is prohibitively expensive, since it requires additional bandwidth and may involve traffic replication devices and packet capture hardware.
Furthermore, such infrastructures must efficiently deal with hot spots that monitoring applications naturally create, since the events of interest are usually localized (e.g., intrusion and anomaly detection). Such events can cause load to be distributed unevenly across the monitoring infrastructure. This scenario differs from the ones traditionally explored in distributed systems research [2] in that monitoring applications are continuous and never finish. Therefore, in such an environment, the principal scheduling mechanism is task migration.
In this work, we propose an architecture for network monitoring applications that enables task migration across nodes. We then propose a distributed scheduling scheme that dynamically balances the load across all the nodes of the monitoring infrastructure.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ACM CoNEXT 2008 Student Workshop, December 9, 2008, Madrid,
Copyright 2008 ACM 978-1-60558-264-1/08/0012 ...$5.00.
We propose a two-stage architecture for the applications to provide network monitoring infrastructures with migration capabilities. The first stage of each application performs those computations with severe real-time constraints that require access to the raw packet stream. Therefore, the first stage must run in the nodes equipped with the specialized packet capture hardware (i.e., capture nodes). The main goal of the first stage is to perform traffic filtering and short-term aggregation to enable the second stage to be easily migrated to remote nodes. The second stage continuously receives the results from the first stage and performs stateful, potentially more complex and longer-term computations.
Our architecture provides the following migration procedure: 1) the application is paused, 2) it is asked to serialize its state, 3) its code, state and queued input are transferred to the destination node, 4) it is loaded and asked to deserialize its state, and 5) it is resumed.
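The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CounterApp`, `Node` and `migrate` are hypothetical names that do not come from the paper, state is serialized as JSON for simplicity, and the network transfer is replaced by passing objects in memory.

```python
import json


class CounterApp:
    """Hypothetical second-stage application that counts items per key."""

    def __init__(self):
        self.counts = {}
        self.paused = False

    def process(self, item):
        if self.paused:
            raise RuntimeError("application is paused")
        self.counts[item] = self.counts.get(item, 0) + 1

    def serialize_state(self):
        return json.dumps(self.counts).encode("utf-8")

    def deserialize_state(self, blob):
        self.counts = json.loads(blob.decode("utf-8"))


class Node:
    """Hypothetical computing node: running applications plus their input queues."""

    def __init__(self, name):
        self.name = name
        self.apps = {}    # application name -> application object
        self.queues = {}  # application name -> list of queued first-stage items


def migrate(app_name, src, dst):
    """Move one application from src to dst following the five steps."""
    app = src.apps.pop(app_name)
    app.paused = True                      # 1) pause the application
    state = app.serialize_state()          # 2) ask it to serialize its state
    code = type(app)                       # 3) transfer code, state and queued input
    queued = src.queues.pop(app_name, [])  #    (in memory here; a real system uses the network)
    new_app = code()                       # 4) load it on the destination node ...
    new_app.paused = True
    new_app.deserialize_state(state)       #    ... and ask it to deserialize its state
    dst.apps[app_name] = new_app
    dst.queues[app_name] = list(queued)
    new_app.paused = False                 # 5) resume
    return new_app
```

Transferring the queued input together with the state means that, in this sketch, no first-stage results are lost while the application is paused.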
Applications that do not support migration can still run on the capture node. We have in the past addressed the resource management problem in the capture node by developing a load shedding subsystem [1]. In this work, we explore the complementary problem: deciding how to (re)assign the second stage of each application to the available computing nodes.
The goal of our scheduling algorithm is to introduce the smallest possible processing delays in the applications, while balancing the load across nodes in order to provide fairness of service (i.e., to allow all applications to experience similar delays). These objectives can be summarized as a single one: to minimize the maximum processing delay across applications.
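This single objective can be written compactly as a min-max problem. The notation is introduced here only for illustration and does not appear in the paper: $A$ is the set of applications, $N$ the set of nodes, $\pi$ an assignment of second stages to nodes, and $d_a(\pi)$ the processing delay that application $a$ experiences under $\pi$.

```latex
\pi^{*} \;=\; \arg\min_{\pi : A \to N} \; \max_{a \in A} \; d_a(\pi)
```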
Traditional distributed scheduling approaches are designed for finite tasks and are not appropriate in our environment, given the continuous nature of monitoring applications. Our thesis is that only a predictive approach can consider the long-term impact of scheduling decisions. We also argue that the migration costs and the associated benefits form the fundamental tradeoff that should guide the scheduling algorithm. However, previous proposals in similar environments have not considered the migration costs explicitly.
Our proposal is to consider estimates of the future workloads of applications to balance the load across nodes, while only paying the migration overheads when they can be amortized in the future. We propose the following distributed algorithm, based on the well-known technique of pairwise load balancing.
Periodically, each node of the infrastructure randomly pairs with another node. Both nodes predict the delay that applications will experience under different migration plans, including the migration overheads, and select the one that yields the lowest maximum delay.
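A minimal sketch of one pairing round follows. It rests on several assumptions that are not taken from the paper: only plans that move at most one application are considered, every application on a node waits for all work queued on that node, and a migrated application additionally waits for its state to cross the link. The names `max_delay` and `best_plan` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Per application: (predicted CPU seconds of pending work, state size in bytes).
Apps = Dict[str, Tuple[float, float]]
# None keeps the current placement; (app, src) moves app away from node index src.
Plan = Optional[Tuple[str, int]]


def max_delay(nodes: List[Apps], plan: Plan, bandwidth: float) -> float:
    """Maximum predicted application delay (s) across both nodes for one plan."""
    placed = [dict(nodes[0]), dict(nodes[1])]
    moved, transfer = None, 0.0
    if plan is not None:
        moved, src = plan
        work, size = placed[src].pop(moved)
        placed[1 - src][moved] = (work, size)
        transfer = size / bandwidth  # migration overhead paid by the moved application
    worst = 0.0
    for apps in placed:
        busy = sum(work for work, _ in apps.values())
        for name in apps:
            delay = busy + (transfer if name == moved else 0.0)
            worst = max(worst, delay)
    return worst


def best_plan(nodes: List[Apps], bandwidth: float = 1_000_000.0) -> Plan:
    """Pick the plan with the lowest maximum delay; ties favor not migrating."""
    plans: List[Plan] = [None]
    for src in (0, 1):
        plans += [(name, src) for name in nodes[src]]
    return min(plans, key=lambda p: max_delay(nodes, p, bandwidth))
```

Listing the "no migration" plan first makes `min` keep the current placement whenever a migration does not strictly reduce the maximum delay, which is one simple way to avoid paying overheads that bring no benefit.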
Our prediction relies on two different models based on Holt-Winters forecasting. The first maintains the cost per item (received from the first stage) for each application, while the second models the future item arrivals. Both parameters exhibit temporal patterns that Holt-Winters can capture. Using these models and the current amount of queued items, each node can estimate the future delays of each application. Other solutions that consider these parameters as constant or slowly changing over time would result in less stable plans.
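The sketch below shows additive Holt-Winters (triple exponential smoothing) and one possible way of combining the two models with the queue length. The smoothing constants, the crude initialization, the seasonal period and the `predicted_work` formula are all assumptions made for illustration; the paper does not specify them.

```python
class HoltWinters:
    """Additive Holt-Winters forecaster with level, trend and seasonal terms."""

    def __init__(self, period, alpha=0.5, beta=0.1, gamma=0.3):
        self.m = period               # length of the seasonal cycle, in samples
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.level = None
        self.trend = 0.0
        self.season = [0.0] * period  # assumed initialization: no seasonality
        self.t = 0                    # number of observations seen so far

    def update(self, y):
        i = self.t % self.m
        if self.level is None:
            self.level = y            # assumed initialization: first sample
        else:
            prev = self.level
            self.level = (self.alpha * (y - self.season[i])
                          + (1 - self.alpha) * (prev + self.trend))
            self.trend = (self.beta * (self.level - prev)
                          + (1 - self.beta) * self.trend)
            self.season[i] = (self.gamma * (y - self.level)
                              + (1 - self.gamma) * self.season[i])
        self.t += 1

    def forecast(self, h=1):
        """Forecast the value h steps after the last observation."""
        if self.level is None:
            raise ValueError("no observations yet")
        i = (self.t + h - 1) % self.m
        return self.level + h * self.trend + self.season[i]


def predicted_work(queued_items, arrivals, cost):
    """CPU seconds for queued items plus next-interval arrivals (assumed formula)."""
    items = queued_items + max(arrivals.forecast(1), 0.0)
    return items * max(cost.forecast(1), 0.0)
```

For example, one `HoltWinters` instance per application could track the per-item cost and another the arrivals per interval; dividing `predicted_work` by the node's processing speed would then give a rough delay estimate.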
As nodes keep pairing, the load moves from more loaded to less loaded nodes, continuously adapting to the varying CPU usage of applications.
4. PRELIMINARY RESULTS
We implement the two-stage architecture described in Section 2 on eight representative network monitoring applications. For the sake of reproducibility, we use the CoMo system [3] to record the arrivals from stage 1 and the CPU time required to process them in stage 2 using a 15-minute real-world packet trace collected at the Catalan RREN. A description of the applications and the packet trace can be found in [1]. Each application is run with different configuration parameters resulting in 100 application workload traces.
Our current prototype only considers the costs of queued (unprocessed) items to perform the prediction, and does not yet model future arrivals. We also implement a reactive strategy to compare against. In this scheme, when the queue of a node is empty, the application that suffers the highest delay across the infrastructure is moved to the idle node.
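The reactive baseline can be sketched as follows. The data layout and the function name `reactive_step` are hypothetical; the only rule taken from the text is that an idle node receives the application with the highest delay in the infrastructure.

```python
def reactive_step(placement, delays, queue_len):
    """Apply one step of the reactive strategy.

    placement: application name -> node name (updated in place)
    delays:    application name -> current delay in seconds
    queue_len: node name -> number of queued items
    Returns (application, idle_node) if a migration is triggered, else None.
    """
    idle = [node for node, queued in queue_len.items() if queued == 0]
    if not idle or not delays:
        return None
    worst = max(delays, key=delays.get)  # application with the highest delay
    target = idle[0]
    if placement[worst] == target:
        return None                      # already on the idle node, nothing to do
    placement[worst] = target
    return worst, target
```

Unlike the predictive scheme, this rule ignores the migration cost and the future workload, so it can trigger a migration every time any node becomes idle.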
We perform several simulations with varying number of nodes and a fixed available bandwidth between nodes of 1000 KB/s. As we increase the number of nodes, we reduce their processor speed to maintain a constant aggregate system capacity.
Figure 1: Average and average maximum delay (left) and number of migrations (right). Axes: application delay (s) versus number of nodes (left); number of migrations versus number of nodes (right).
Figure 1 (left) shows similar average application delays for both strategies, but a significant difference in the average maximum application delay, indicating that our strategy is fairer to the applications. Note that the delays increase with the number of nodes, since it is easier to correctly handle an equal amount of load with fewer, more capable processors.
Figure 1 (right) plots the number of migrations performed by each strategy as a function of the number of nodes. As expected, the number of migrations in the reactive strategy increases linearly with the number of nodes, while our strategy scales better.
5. SUMMARY AND FUTURE WORK
We have presented an early prototype of a distributed network monitoring architecture that can balance the load across nodes, moving tasks across the system as necessary. The proposed scheduling algorithm uses a prediction of the CPU requirements of tasks and the migration costs to govern its decisions.
The most important piece of future work is to incorporate a sense of the long-term consequences of migrations. Our current prototype algorithm only predicts the cost of the unprocessed queued items for each application, but does not yet anticipate future arrivals.
We are also working on conducting a study of both the accuracy of our prediction techniques and the computational overhead of our algorithm.
6. REFERENCES
[1] P. Barlet-Ros et al. Load shedding in network monitoring applications. In Proc. of USENIX Annual Technical Conf., June 2007.
[2] T. Casavant and J. Kuhl. A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Transactions on Software Engineering, 14(2):141–154, 1988.
[3] G. Iannaccone. Fast prototyping of network data mining applications. In Proc. of PAM, Mar. 2006.
[4] kc claffy et al. Community-oriented network measurement infrastructure (CONMI) workshop report. SIGCOMM CCR, 36(2), 2006.