A Case Study in Network Architecture Tradeoffs
Nikolai Matni
California Institute of
Technology
1200 E California Blvd
Pasadena, California
nmatni@caltech.edu
Ao Tang
Cornell University
337 Frank H. T. Rhodes Hall
Ithaca, NY
atang@ece.cornell.edu
John C. Doyle
California Institute of
Technology
1200 E California Blvd
Pasadena, California
doyle@caltech.edu
ABSTRACT
Software defined networking (SDN) establishes a separation
between the control plane and the data plane, allowing net-
work intelligence and state to be centralized – in this way the
underlying network infrastructure is hidden from the appli-
cations. This is in stark contrast to existing distributed net-
working architectures, in which the control and data planes
are vertically combined, and network intelligence and state,
as well as applications, are distributed throughout the net-
work. It is also conceivable that some elements of network
functionality be implemented in a centralized manner via
SDN, and that other components be implemented in a dis-
tributed manner. Further, distributed implementations can
have varying levels of decentralization, ranging from myopic
(in which local algorithms use only local information) to
coordinated (in which local algorithms use both local and
shared information). In this way, myopic distributed archi-
tectures and fully centralized architectures lie at the two
extremes of a broader hybrid software defined networking
(HySDN) design space.
Using admission control as a case study, we leverage re-
cent developments in distributed optimal control to provide
network designers with tools to quantitatively compare dif-
ferent architectures, allowing them to explore the relevant
HySDN design space in a principled manner. In particu-
lar, we assume that routing is done at a slower timescale,
and seek to stabilize the network around a desirable operat-
ing point despite physical communication delays imposed by
the network and rapidly varying traffic demand. We show
that there exist scenarios for which one architecture allows
for fundamentally better performance than another, thus
highlighting the usefulness of the approach proposed in this
paper.
N. Matni & J. C. Doyle were in part supported by the AFOSR and the Institute for Collaborative Biotechnologies through grant W911NF-09-0001 from the U.S. Army Research Office. A. Tang was in part supported by the ONR under grant N00014-12-1-1055.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for components
of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee. Request
permissions from Permissions@acm.org.
SOSR 2015, June 17–18, 2015, Santa Clara, CA, USA
© 2015 ACM ISBN 978-1-4503-3451-8/15/06 ...$15.00
DOI: http://dx.doi.org/10.1145/2774993.2775011.
Categories and Subject Descriptors
C.2.1 [Network Architecture and Design]: Miscellaneous; C.4 [Performance of Systems]: Miscellaneous.
General Terms
Design, Theory, Performance
1. INTRODUCTION
A common challenge that arises in the design of networks
is that of achieving globally optimal behavior subject to the
latency, scalability and implementation requirements of the
system. Many system properties and protocols – such as net-
work throughput, resource allocation and congestion avoid-
ance – are inherently global in scope, and hence benefit from
centralized solutions implemented through Software Defined
Networking (SDN) (e.g., [3, 4, 9]). However, such central-
ized solutions are not always desirable due to latency and
scalability constraints – in particular the delay inherent in
communicating the global state of the network, solving a
global optimization problem and redistributing the solution
to the network, can often negate the benefits achieved from
this holistic strategy. In these cases, distributed networking
solutions can be preferable (e.g., [1]).
Traditionally, there has been very little in the way of the-
oretical tools at the disposal of network designers to quan-
titatively compare different architectures. Because of this,
the appropriateness of an architectural decision could only
be confirmed once a suitable algorithm had been prototyped
and validated via experiment. This process is time consum-
ing and expensive, and even worse, can be inconclusive. In
particular, if an algorithm performs poorly, it is not clear
if this poor performance is due to an inherent limitation of
the chosen architecture, or simply due to a poorly designed
algorithm, making it very difficult to quantitatively compare
network architectures.
In this paper, we argue that due to both the increased
flexibility afforded to network designers by SDN and to re-
cent advances in distributed optimal control, the gap be-
tween theory and practice has closed significantly. Although
the gap has not yet completely closed, we show that in the
context of certain network applications, existing theory can
indeed be used to quantitatively compare the fundamen-
tal performance limits of different network architectures. In
particular, we leverage the fact that new theory can now
explicitly take into account the effect of communication de-
lays inherent to coordinating control actions in a networked
setting.
Towards this end, we define the broader Hybrid Software
Defined Networking (HySDN) design space (§2) of network
architectures, and argue that an important metric in deter-
mining the appropriateness of an architecture is the optimal
performance achievable by any algorithm implemented on
that architecture. This additional metric can then be used
by a network designer to quantify the performance tradeoffs
associated with using a simpler or more complex architec-
ture, allowing for more informed decisions about architec-
ture early in the design process.
We further explain how recent advances in distributed
optimal control theory allow us to apply this approach to
a class of network control problems in which the objec-
tive is to regulate network state around a pre-specified de-
sired set-point. In §3, we illustrate the usefulness of this
approach with an admission control case study. Perhaps
surprisingly, we show that for two nearly identical routing
topologies, there can be significant differences in the per-
formance achievable by centralized and distributed network
architectures.
We emphasize that we are not arguing that a specific im-
plementation, algorithm or architecture is best suited for a
given application – rather, we are illustrating the usefulness
of a methodology that allows network designers to make
quantitative decisions about architectural choices early in
the design process. We are also not proposing that this method replace the important steps of simulation, prototyping and experiments, but rather that it serve as a complement to them.
Indeed, in order to achieve theoretical tractability, simpli-
fying assumptions such as linearized flow models, constant
delays, reliable control packet communication and negligible
computation time are made – the validity of these assump-
tions can only be confirmed through experiments. We do
however believe that the proposed tools can help network de-
signers streamline the prototyping process by allowing them
to narrow down the range of potential architectures that
need to be explored.
2. ARCHITECTURAL TRADEOFFS
Completely distributed network architectures, in
which local algorithms take local actions using only locally
available information, and centralized network architectures,
in which a centralized algorithm takes global action using
global information, can be viewed as lying at the extremes
of a much richer design space. It is possible to build a net-
work architecture in which certain network logic elements
are implemented in a centralized fashion via SDN, and in
which other network logic elements are implemented in a
distributed fashion. Further, distributed architectures can
have varying levels of decentralization, ranging from com-
pletely distributed (as described above), or myopic, archi-
tectures to coordinated distributed architectures, in which
local algorithms take local actions using both locally avail-
able information and shared subsets of global state informa-
tion. We call this broad space of architectures the Hybrid
Software Defined Networking (HySDN) design space (illus-
trated in Figure 1), as its constituent architectures are nat-
urally viewed as hybrids of distributed and software defined
networks.
The question then becomes how to explore this even larger
design space in a systematic way. As we have already al-
luded to, there are inherent tradeoffs associated with any ar-
chitecture: algorithms running on centralized architectures
typically achieve better steady state performance, but often
react with higher latency than those implemented on a dis-
tributed architecture – conversely distributed algorithms are
often simpler to implement but can lead to less predictable
steady state performance.
We pause here to recognize that network architectures and
algorithms are judged by many different metrics, including
but not limited to performance, scalability, ease of deploy-
ment and troubleshooting, and flexibility. We argue that
as much as possible, each of these different metrics should
be traded off against each other in a quantitative way – for
example, it is important for a network designer to be able
to quantify the performance degradation suffered by using a
network architecture or algorithm that is simpler to deploy
and troubleshoot. Our approach to exploring these tradeoffs
is simple: we compare network architectures by comparing
the optimal performance achievable by any algorithm im-
plemented using them. This approach allows for a network
designer to compare fundamental limits of achievable per-
formance across different architectures. The approach also
allows for the comparison of the performance of more scal-
able, flexible, or easier to deploy algorithms implemented
using a given network architecture to that network architec-
ture’s best possible performance, allowing for a quantifiable
tradeoff between these metrics of interest.
In order to make the discussion concrete, we focus on algo-
rithms that can be viewed as controllers that aim to keep the
state of the network as close to a nominal operating point as
possible, while exchanging and collecting information sub-
ject to the communication delays imposed by the network.
For example, in §3, we consider admission control algorithms
that aim to keep the link flow rates at a user-specified set-
point while minimizing the admission buffer size, despite
physically imposed communication delays and rapidly vary-
ing source rates. In particular, we are not addressing the
problem of determining what nominal operating point the
controllers should attempt to bring the network state to –
we aim to extend our analysis to such problems in future
work. We recognize that this measure of performance is not
standard in the networking community, but note that it is a
natural one to consider when explicitly taking into account
rapidly varying and unpredictable source rates.
By restricting ourselves to problems of this nature, we
can leverage recent results in distributed optimal control
theory to classify those architectures for which the opti-
mal algorithm and achievable performance can be computed
efficiently.1 It is well known that the optimal centralized
controller, delayed or not, can be computed efficiently via
convex optimization [13]. Further, it is known that myopic
distributed optimal controllers are in general NP-hard to
compute [12, 8]. Up until recently however, it was unclear if
and when coordinated distributed optimal controllers could
be specified as the solution to a convex optimization prob-
lem.
The challenge inherent in optimizing distributed control
algorithms is that control actions (e.g., local admission con-
trol decisions) can potentially serve two purposes: actions
taken by local controllers can be used to both control the
state of the system in a manner consistent with performance
1 Throughout this discussion, we assume that the dynamics of the network are linear in a neighborhood of the nominal operating point. This assumption holds true for many commonly used network flow models [5, 2].
[Figure 1 comprises three panels. (a) Distributed Networking: each switch stacks Data, Forwarding, OS and Applications. (b) Hybrid Software Defined Networking: switches run a Local OS and distributed applications in the Data Plane, beneath a Control Plane containing the Network Operating System, Data and Application Controller Plane Interfaces, and a Centralized Application Plane. (c) Centralized Software Defined Networking: Data/Forwarding elements in the Data Plane, with all applications centralized in the Application Plane above the Network Operating System.]
Figure 1: The Hybrid Software Defined Networking design space, ranging from distributed (Fig 1a) to centralized (Fig 1c) network protocols.
objectives, and to signal to other local controllers, allowing
for implicit communication and coordination. Intuitively, it
is this attempt to both control the system and to implic-
itly communicate that makes the problem difficult to solve
computationally. However, if local controllers are able to
coordinate their actions via explicit communication, rather
than by implicit signaling through the system, then the op-
timal controller synthesis problem becomes computationally
tractable [11, 10]. Further, it is not difficult to argue that distributed controllers using explicit communication to coordinate will outperform those relying on implicit signaling through the system. In a network control setting, the incentive to signal through the system can be removed by giving control-dedicated packets, i.e., packets containing the information exchanged between local algorithms, priority in the network.2
These theoretical developments thus provide the neces-
sary tools to explore a much larger section of the HySDN
design space in a principled manner. We propose leverag-
ing these results to compare the performance achievable by
algorithms implemented on four different classes of architec-
tures, described below:
1. The GOD architecture: in order to quantify the
fundamental limits on achievable performance, we pro-
pose computing the optimal controller implemented
using the Globally Optimal Delay-free (GOD) archi-
tecture. This architecture assumes instantaneous com-
munication to and from a central decision maker –
although not possible to implement, the performance
achieved by this architecture cannot be beaten, and
as such represents the standard against which other
architectures should be compared.
2. The centralized architecture: this architecture cor-
responds to the SDN approach, in which a centralized
decision maker collects global information, computes
2 Specifically, if local controllers can communicate with each other as quickly as the effects of their actions propagate through the network, then the resulting optimal control problem is convex. By giving such communication packets priority in the network and ensuring that they are routed along suitably defined shortest paths, this property is guaranteed to be satisfied [10].
a global control action to be taken, and broadcasts it
to the network. Although global in scope, the latency
of algorithms implemented using this architecture is
determined by the communication delays inherent in
collecting the global network state and broadcasting
global actions.
3. The coordinated architecture: this architecture is
distributed, but allows for sufficient coordination be-
tween local controllers so that the optimal control law
can be computed efficiently [11, 10]. This architecture
takes both rapid action based on timely local infor-
mation, and slower scale action based on global but
delayed shared information, and can thus be viewed as
an intermediate between centralized and myopic archi-
tectures.
4. The myopic architecture: this architecture is one
in which local controllers take action based on local
information. Although the optimal controller cannot
be computed, the performance achieved by any my-
opic controller can be compared with the performance
achieved by an optimal coordinated controller, thus
providing a bound on the performance difference be-
tween the two architectures.
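The convexity condition underlying the coordinated architecture (footnote 2) amounts to a simple delay comparison: every pair of local controllers must be able to exchange information at least as fast as their actions propagate through the network to each other. A minimal sketch of that check — the helper name and dict-of-dicts layout are ours, not from the paper:

```python
def convexity_condition_holds(comm_delay, prop_delay):
    """Check the delay-dominance condition from footnote 2.

    comm_delay[i][j]: delay for controller j's information to reach controller i.
    prop_delay[i][j]: delay for j's control actions to affect i's local state
                      through the network.
    The coordinated synthesis problem is convex when communication is at
    least as fast as propagation for every ordered pair.
    """
    return all(comm_delay[i][j] <= prop_delay[i][j]
               for i in comm_delay for j in comm_delay[i])

# Prioritized control packets routed along shortest paths make the
# communication delay match the propagation delay, so the condition holds.
comm = {"AC1": {"AC2": 3.0}, "AC2": {"AC1": 3.0}}
prop = {"AC1": {"AC2": 3.0}, "AC2": {"AC1": 3.0}}
ok = convexity_condition_holds(comm, prop)
```

If control packets were instead queued behind data traffic (say 5 ms against a 3 ms propagation delay), the condition would fail and the synthesis problem would no longer be known to be convex.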
It should be noted that the coordinated architecture will
always perform at least as well as the myopic and centralized
architectures, as any algorithm implemented on the latter
architectures can also be implemented on the former. It is
clear why this holds for myopic algorithms, but it is worth
emphasizing why this holds for centralized algorithms. This
is true because the delays faced by a coordinated distributed
algorithm in collecting and sharing state information are also
faced by a centralized algorithm. Whereas a centralized al-
gorithm waits until the global state has been collected and
processed to react to make changes to the system, a coordi-
nated distributed algorithm takes both local timely actions
based on local information and delayed actions based on
shared information.
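The delay ordering described above can be made concrete with a toy scalar example (ours, not the paper's model): the same buffer is driven by the same disturbance and regulated by the same feedback law, but one controller sees the state several steps late, as a remote centralized decision maker would.

```python
def regulation_cost(delay, w, gain=0.4):
    """Accumulated squared backlog of A(t+1) = A(t) + w(t) - u(t)
    when the controller acts on a `delay`-step-old state measurement."""
    A, cost = 0.0, 0.0
    stale = [0.0] * (delay + 1)      # pipeline of state measurements in flight
    for wt in w:
        u = gain * stale[0]          # act on the oldest (stale) measurement
        A = A + wt - u
        stale = stale[1:] + [A]
        cost += A * A
    return cost

w = [1.0] + [0.0] * 60               # a single burst of excess traffic
c_instant = regulation_cost(0, w)    # GOD-like: no measurement delay
c_delayed = regulation_cost(2, w)    # centralized-like: two steps of latency
```

With identical feedback, the delayed controller lets the backlog persist for the full latency before reacting, so `c_delayed` strictly exceeds `c_instant` — the same mechanism that makes the delay-free GOD architecture a lower bound on all the others.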
What we seek to understand is how large of a gap in per-
formance exists between these different architectures. By
computing the performance of each of these architectures,
the network designer can then quantify tradeoffs in imple-
mentation complexity and performance in a computationally
efficient and inexpensive manner. We demonstrate the use-
fulness of this approach on an admission control case study
in the next section.
3. ADMISSION CONTROL DESIGN
In this section we pose an admission control problem, de-
fine the relevant HySDN design space and show that it can
be explored in a principled and quantitative manner using
tools from distributed optimal control theory. We discuss
the problem at a conceptual level in this section, and refer
the interested reader to [7] for the technical details.
3.1 Problem
We consider the following admission control task: given a set of source-destination pairs (s, d), a set of desired flow rates f*_{ℓ,(s,d)} on each link ℓ for said source-destination pairs, and a fixed routing strategy that achieves these flow rates, design an admission control policy that maintains the link flow rates f_{ℓ,(s,d)}(t) as close as possible to f*_{ℓ,(s,d)} while minimizing the amount of data stored in each of the admission control buffers, despite fluctuations in the source rates x_s(t).
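To make the flow-based dynamics concrete, here is a minimal linearized model of a single admission buffer, a sketch under the paper's linearity assumption; the symbols mirror Figure 3, and the time step `dt` is our stand-in, not a value from the paper:

```python
def buffer_step(A, x, a, dt=0.001):
    """One Euler step of the linear buffer dynamics dA/dt = x - a:
    the backlog A integrates the gap between the noisy source rate x
    and the admitted flow rate a."""
    return A + dt * (x - a)

# If the admission controller keeps releasing the nominal rate while the
# source runs 50% above nominal, the backlog grows linearly with time.
A = 0.0
for _ in range(100):
    A = buffer_step(A, x=1.5, a=1.0)
```

An admission control policy chooses `a` as a function of available information (local, shared, or global) precisely to keep both the flow-rate deviation and this backlog small.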
The architectural decision that the network designer is
faced with is whether to implement the admission control
policy in a myopic, coordinated, or centralized manner –
representative examples of these possible architectures are
illustrated in Figure 2 for the case of three sources. In the
myopic distributed architecture, local admission controllers
AC1, AC 2 and AC 3 have policies that depend solely on their
local information – in what follows, we define the local in-
formation available to a local algorithm for the specific case
studies that we consider. In the coordinated distributed ar-
chitecture, the local admission controllers take action based
on both locally available information and on information
shared amongst themselves – this shared information is de-
layed, as it must be communicated across the network and is
therefore subject to propagation delays. Finally, in the cen-
tralized architecture, a central decision maker collects the
admission control buffer and link flow rate states subject to
appropriate delays, determines a global admission control
strategy to be implemented and broadcasts it to the local
AC controllers – this strategy also suffers from delay due to
the need to collect global state information and to broadcast
the global policy to each AC controller.
As mentioned in the previous section, we know a priori that the coordinated distributed architecture will perform no worse than either the myopic or centralized architecture. Conversely, the myopic and centralized schemes are significantly easier to deploy, and the centralized scheme is significantly easier to troubleshoot. Thus, in quantifying the gaps in performance between the centralized, myopic and coordinated architectures, the network designer is able to make an informed decision as to whether the added complexity of the coordinated distributed scheme is warranted.
As described in §2, our approach to exploring the HySDN
design space is to compute optimal admission controllers im-
plemented on each of these different architectures. In par-
ticular we compute admission controllers that minimize a
performance metric of the form

$\sum_{t=1}^{N} \sum_{\ell,(s,d)} \left( f_{\ell,(s,d)}(t) - f^*_{\ell,(s,d)} \right)^2 + \lambda \, \| A(t) \|_2^2$,   (1)
[Figure 3 shows an edge admission controller: a noisy source x_s(t), equal to the nominal rate x*_s(t) plus a fluctuation, feeds an admission control buffer of size A_s(t), which releases the admitted flow a_s(t).]
Figure 3: Diagram of an edge admission controller.
[Figure 4 shows an internal admission controller: ingoing flows f_1(t), f_2(t), f_3(t) enter a buffer of size A_s(t), which releases admitted flows a_1(t), a_2(t), a_3(t).]
Figure 4: Diagram of an internal admission controller.
where A(t) is a vector containing the sizes of the admission control buffers and N is the optimization horizon. Thus the
controllers aim to minimize a weighted sum of flow rate de-
viations and admission queue lengths over time, where λ > 0
determines the relative weighting assigned to each of these
two terms in the final cost. By solving the corresponding
optimal control problems, we obtain two quantities: an
optimal cost and an admission control policy that achieves
it. These optimal costs thus serve as a quantitative measure
of the performance of a given architecture, as by definition,
they correspond to the best performance achievable by any
admission control policy implemented on that architecture.
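As a concrete reading of cost (1), the following sketch evaluates it for given state trajectories; the data-structure choices (dicts keyed by link, lists of buffer sizes) are ours, not the paper's:

```python
def admission_cost(f_traj, f_star, A_traj, lam=50.0):
    """Evaluate cost (1): squared flow-rate deviations plus lam times the
    squared 2-norm of the admission buffer sizes, summed over the horizon.

    f_traj: list over time of {link: flow rate}
    f_star: {link: desired flow rate}
    A_traj: list over time of admission buffer sizes
    """
    cost = 0.0
    for f_t, A_t in zip(f_traj, A_traj):
        cost += sum((f_t[link] - f_star[link]) ** 2 for link in f_star)
        cost += lam * sum(a * a for a in A_t)
    return cost

# Two time steps on a single link with desired rate f* = 1.
f_star = {"l1": 1.0}
f_traj = [{"l1": 1.1}, {"l1": 0.9}]
A_traj = [[0.01], [0.02]]
J = admission_cost(f_traj, f_star, A_traj)
```

The optimal control machinery of [13, 6] minimizes the expectation of exactly this kind of quadratic cost over all policies implementable on a given architecture.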
3.2 Case Study
We consider a simple routing topology overlaid onto the Abilene network, and two different admission control scenarios: one in which only edge admission control is allowed (cf.
ios: one in which only edge admission control is allowed (cf.
Figure 3) and one in which edge and internal admission con-
trol is allowed (cf. Figure 4). We model the dynamics of the
system using a flow based model and solve the optimal con-
trol problem with the cost (1) taken to be the infinite horizon
LQG cost using the methods described in [13] and [6] – we
refer the reader to [7] for the technical details. Intuitively,
this cost measures the amount of “energy” transferred from
the source rate deviations to the flow rate deviations and
buffer sizes. We assume that the nominal source rates x*_s(t) and the nominal flow rates f*_{ℓ,(s,d)} are all equal to 1 (this is without loss of generality through appropriate normalization of units), and empirically choose λ = 50 based on the observed responses of the synthesized controllers.
We compute three optimal controllers for each of the sce-
narios considered: a coordinated distributed optimal con-
troller in which local admission controllers are able to ex-
change information via the network in order to coordinate
their actions, as illustrated in the middle pane of Figure
2, a centralized optimal controller subject to the delays in-
duced by collecting global state information and broadcast-
ing a global control action, and the GOD controller. We also
compare the performance of these controllers with the per-
formance achieved by the best myopic distributed controller
we are able to compute via non-linear optimization (recall
that optimal myopic controllers are in general computation-
ally intractable to compute).
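The myopic search can be caricatured as follows: restrict each local controller to a static gain on its own buffer and pick the best gain by direct search. This one-buffer toy with a grid search stands in for the actual nonlinear optimization used in the paper, and the disturbance sequence is made up for illustration:

```python
def myopic_gain_cost(k, w, lam=50.0):
    """Closed-loop cost of the myopic law a(t) = k * A(t): squared
    admitted-flow deviation plus lam times the squared backlog,
    summed over time, for buffer dynamics A(t+1) = A(t) + w(t) - a(t)."""
    A, J = 0.0, 0.0
    for wt in w:
        a = k * A                 # uses only local information
        A = A + wt - a
        J += a * a + lam * A * A
    return J

w = [0.2, -0.1, 0.15, -0.05] * 25             # hypothetical rate fluctuations
grid = [k / 100 for k in range(101)]          # static gains in [0, 1]
best_k = min(grid, key=lambda k: myopic_gain_cost(k, w))
```

Because the search is over a restricted policy class, the resulting cost only upper-bounds the myopic architecture's true optimum — which is exactly why, in the paper, the myopic architecture is bracketed between this bound and the optimal coordinated controller.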
[Figure 2 comprises three panes labeled Myopic Distributed, Coordinated Distributed, and Centralized. In each pane, Noisy Sources 1-3 feed admission controllers AC1-AC3, whose Flows 1-3 enter the Network; the panes differ in where the admission control information Info 1-3 goes: kept local to each controller, shared among the local controllers, or exchanged with a single centralized Admission Controller.]
Figure 2: The HySDN design space for an admission control problem with three sources. Blue arrows denote the flow of traffic,
whereas dashed black lines denote the flow of admission control related information. Note that the dotted lines correspond to
virtual connections, and can be implemented using either control dedicated communication links, or using control dedicated
tunnel overlays on the network.
[Figure 5 shows Sources 1-3 with admission controllers AC1-AC3 routing to the destination Dst 1,2,3, with one-way communication delays of approximately 6ms, 3ms, 9ms and 3ms on the links.]
Figure 5: Routing topology used for case study: each source-
destination path is denoted by a dashed line. Sources 1 and
2 have edge admission controllers as depicted in Figure 3,
whereas Source 3 either has an edge admission controller
or an internal admission controller, as depicted in Figure 4,
depending on the scenario considered.
We present two different settings to illustrate the benefit
of our approach to exploring the HySDN design space: one
in which using a coordinated distributed algorithm leads to
a significant improvement over a centralized algorithm and
a slight improvement over a myopic distributed algorithm,
and one for which the optimal GOD algorithm is inherently
myopic in nature. In the former case, the significant im-
provement over a centralized implementation justifies the
use of a coordinated or myopic architecture, whereas in the
latter case, the myopic distributed architecture is the clear
choice as it achieves the same performance as a controller
implemented using the GOD architecture.
The topology that we consider is illustrated in Figure 5
– source-destination pairs are illustrated with dashed lines,
and each source has an admission controller. We first con-
sider the edge only admission control scenario, where each
admission controller is as depicted in Figure 3, and compute
the optimal controller implemented using the GOD archi-
tecture. Perhaps surprisingly, the optimal control policy is
naturally myopic, i.e., the admitted flow as(t) at admission
control buffer sis strictly a function of As(t). In other words,
there is no loss in performance in using a myopic distributed
architecture so long as the local control actions are appro-
priately specified.
We next consider a scenario where Sources 1 and 2 have
edge admission controllers, but now Source 3 has an internal
admission controller as depicted in Figure 4. In particular,
the internal admission controller takes as inputs both the incoming flows due to Sources 1 and 2 arriving at the Denver switch and the contribution from Source 3. We
once again begin by computing the optimal controller im-
plemented using the GOD architecture. In this case, the
GOD policy requires instantaneous sharing of information
between different admission controllers and hence cannot be
implemented. We then compute a centralized optimal con-
troller, in which we assume that the central decision maker
is located at the Denver switch. At this location, the largest
round trip time between Denver and an admission controller
is 18ms: we assume that computation time is negligible, and
thus, it takes 18ms for the centralized decision maker to re-
act to local changes.
We also compute the optimal coordinated distributed con-
troller, in which we assume that admission controllers have
access to flow rate and admission control buffer information
with delay specified by the routing topology. For example,
the admission control buffer at Source 1 has access to the
admission control buffer state at Source 2 with a delay of
3ms and to the admission control buffer state at the Den-
ver switch with a delay of 6ms. This information sharing
protocol is sufficient for the optimal coordinated distributed
controller to be specified by the solution to a convex op-
timization problem, and can be computed efficiently using
the methods in [6]. Finally, we bound the performance of
the myopic architecture using the best myopic distributed
algorithm that we are able to generate via non-linear opti-
mization.
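The delay bookkeeping above can be reproduced with an all-pairs shortest-path computation. The link delays and node labels below are our approximate reading of Figure 5, not exact values from the paper:

```python
# Approximate one-way link delays (ms) loosely read off the Figure 5 topology.
links = [("AC1", "AC2", 3), ("AC1", "Denver", 6),
         ("AC2", "Denver", 9), ("AC3", "Denver", 3)]
nodes = ["AC1", "AC2", "AC3", "Denver"]

INF = float("inf")
delay = {u: {v: (0 if u == v else INF) for v in nodes} for u in nodes}
for u, v, d in links:
    delay[u][v] = delay[v][u] = d      # symmetric propagation delays

# Floyd-Warshall: all-pairs shortest one-way delays.
for k in nodes:
    for i in nodes:
        for j in nodes:
            delay[i][j] = min(delay[i][j], delay[i][k] + delay[k][j])

# Worst-case round-trip time to a central decision maker at Denver.
worst_rtt = max(2 * delay["Denver"][v] for v in nodes)
```

With these numbers, AC1 sees AC2's buffer with a 3 ms delay and Denver's with a 6 ms delay, and the worst-case round trip to Denver is 18 ms — the latency that handicaps the centralized controller while leaving the coordinated controller's local loop delay-free.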
We normalize the costs achieved by each of the architec-
tures by the performance achieved by the GOD architecture:
in this way, a ratio of 1 corresponds to the best possible per-
formance achievable. The optimal algorithm implemented
using the centralized architecture achieved a cost ratio of
1.13 – thus the delay needed to make centralized decisions leads to a 13% performance degradation relative to the GOD architecture. We then computed the optimal co-
[Figure 6 comprises four panels plotting, against time (sec), the flow rates f1, f2, f3,1, f3,2, f3,3 (Data/sec) and the admission control buffer sizes AC1, AC2, AC3,1, AC3,2, AC3,3 (Data), under the GOD controller and the coordinated controller.]
Figure 6: Sample flow deviations and admission buffer
length evolution under the GOD and coordinated dis-
tributed controllers when the system is subject to the ran-
dom source rate fluctuations, illustrated in Figure 7.
[Figure 7 plots the source rate fluctuations x1, x2, x3 (flow rate against time in sec).]
Figure 7: Sample source rate fluctuations – these are lower
bounded by -1 as the nominal source rates are all assumed
to be 1.
ordinated distributed controller and obtained a cost ratio of
1.01 – thus, with a mild amount of coordination, a realistic
controller implementation can achieve performance nearly
identical to that of a controller using the GOD architecture.
Finally, the best myopic distributed controller we were able
to synthesize achieved a cost ratio of 1.04 – these results are
summarized in Table 1.
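This normalization is a simple benchmark ratio; a minimal sketch (the function name and the placeholder cost values are ours, chosen only to reproduce the reported ratios):

```python
# Normalize each architecture's achieved cost by the GOD benchmark,
# so that a ratio of 1.0 is the best achievable performance.
def cost_ratios(achieved_costs, benchmark="GOD"):
    base = achieved_costs[benchmark]
    return {arch: round(cost / base, 2) for arch, cost in achieved_costs.items()}

# Illustrative cost values, consistent with the case study's reported ratios.
achieved = {"GOD": 100.0, "Myopic": 104.0, "Coordinated": 101.0, "Centralized": 113.0}
ratios = cost_ratios(achieved)
# → {"GOD": 1.0, "Myopic": 1.04, "Coordinated": 1.01, "Centralized": 1.13}
```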
A representative example of flow deviation and admission
buffer length evolution under the GOD and coordinated dis-
tributed controllers can be found in Figure 6 – the driving
source rate deviations are illustrated in Figure 7. As there
is very little quantitative difference in the performance of
these two controllers, one does not expect to see a qualita-
tive difference in the state trajectories.
Thus, in this scenario there is a significant quantifiable advantage in performance when adopting either a myopic or coordinated distributed architecture over a centralized architecture, indicating that the increased complexity in deployment and troubleshooting may be worthwhile. The difference in performance between the two distributed architectures considered is much less significant – in this case, it is up to the network designer to choose whether the additional complexity of implementing a coordinated algorithm is worth the 3% performance gain over the proposed myopic algorithm.

             GOD    Myopic    Coordinated    Centralized
Ratio         1      1.04        1.01           1.13

Table 1: Summary of admission control case study results for the edge and internal admission control scenario.
Although the differences in performance in this case study
ranged from 1% to 13%, in general, the gap between centralized and distributed architectures can become arbitrarily large as the network size (as measured in terms of end-to-end communication delays) increases. In particular, as the network size increases, the centralized approach requires an increasingly large delay to collect global state and take global actions, leading to a corresponding degradation in performance. However, this is not true for the coordinated distributed architecture: local actions are taken without delay, and the delay needed for two controllers to exchange information to coordinate their control actions is independent of the rest of the network.
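This scaling argument can be made concrete with a toy latency model. The functions and delay values below are our own illustration, not the paper's model: the centralized controller pays the worst-case round-trip delay in the network, while the coordinated architecture waits only on the delays between the controller pairs that actually share information.

```python
# Toy latency model (our own illustration, not from the paper): decision
# delay for a centralized controller versus a coordinated distributed one.

def centralized_decision_delay(delays_to_controller_ms):
    # The central controller must gather state from every node and push
    # actions back out, so it pays the worst-case round-trip delay.
    return 2 * max(delays_to_controller_ms)

def coordinated_decision_delay(pairwise_sharing_delays_ms):
    # Local actions are taken without delay; coordination waits only on the
    # delays between the specific controller pairs that share information.
    return max(pairwise_sharing_delays_ms, default=0)

small_net = [3, 6]                    # delays reminiscent of the case study (ms)
large_net = small_net + [20, 45, 80]  # hypothetical wide-area growth

# Centralized latency grows with the network diameter (12 ms -> 160 ms here),
# while coordinated latency stays at 6 ms as long as the information
# sharing pattern between neighboring controllers is unchanged.
```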
4. SUMMARY
In this paper, we proposed a methodology for quantifying the achievable performance of network architectures. We focused on problems in which the task is to keep the network state near a nominal operating point despite physically imposed delays and varying operating conditions. We showed
how recent results in distributed optimal control theory al-
low for the efficient computation of optimal algorithms im-
plemented using the GOD, centralized and coordinated dis-
tributed architectures, and proposed using the performance
achieved by these controllers, as well as the performance
achieved by a candidate myopic distributed algorithm, as
a means of comparing the respective network architectures.
We applied this approach to an admission control case study,
and in particular, discovered that for the topology consid-
ered, the GOD policy can be implemented using a myopic
distributed architecture when only edge admission control
is allowed. However, if internal admission control is used, a
coordinated distributed architecture leads to the best per-
formance.
We believe that our proposed approach could play an important and complementary role to the traditional experimental validation of network architectures and algorithms, by allowing network designers to narrow the range of promising architectures earlier in the design process, and by letting them quantifiably measure the performance impact of other design considerations such as scalability, ease of deployment and troubleshooting, and flexibility.
5. REFERENCES
[1] M. Alizadeh, T. Edsall, S. Dharmapurikar,
R. Vaidyanathan, K. Chu, A. Fingerhut, F. Matus,
R. Pan, N. Yadav, G. Varghese, et al. CONGA:
Distributed congestion-aware load balancing for
datacenters. In Proceedings of the 2014 ACM
conference on SIGCOMM, pages 503–514. ACM, 2014.
[2] M. Chiang, S. H. Low, A. R. Calderbank, and J. C.
Doyle. Layering as optimization decomposition: A
mathematical theory of network architectures.
Proceedings of the IEEE, 95(1):255–312, 2007.
[3] C.-Y. Hong, S. Kandula, R. Mahajan, M. Zhang,
V. Gill, M. Nanduri, and R. Wattenhofer. Achieving
high utilization with software-driven WAN. In ACM
SIGCOMM Computer Communication Review,
volume 43, pages 15–26. ACM, 2013.
[4] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski,
A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu,
et al. B4: Experience with a globally-deployed
software defined WAN. In ACM SIGCOMM Computer
Communication Review, volume 43, pages 3–14. ACM,
2013.
[5] F. Kelly and R. Williams. Fluid model for a network
operating under a fair bandwidth-sharing policy.
Annals of Applied Probability, pages 1055–1083, 2004.
[6] A. Lamperski and L. Lessard. Optimal state-feedback
control under sparsity and delay constraints. In 3rd
IFAC Workshop on Distributed Estimation and
Control in Networked Systems, pages 204–209, 2012.
[7] N. Matni, A. Tang, and J. C. Doyle. Technical report:
A case study in network architecture tradeoffs.
Technical Report, 2015.
[8] C. H. Papadimitriou and J. Tsitsiklis. Intractable
problems in control theory. SIAM Journal on Control
and Optimization, 24(4):639–654, 1986.
[9] J. Perry, A. Ousterhout, H. Balakrishnan, D. Shah,
and H. Fugal. Fastpass: a centralized zero-queue
datacenter network. In Proceedings of the 2014 ACM
conference on SIGCOMM, pages 307–318. ACM, 2014.
[10] M. Rotkowitz, R. Cogill, and S. Lall. Convexity of
optimal control over networks with delays and
arbitrary topology. Int. J. Syst., Control Commun.,
2(1/2/3):30–54, Jan. 2010.
[11] M. Rotkowitz and S. Lall. A characterization of convex problems in decentralized control. IEEE Transactions on Automatic Control, 51(2):274–286, 2006.
[12] H. S. Witsenhausen. A counterexample in stochastic
optimum control. SIAM Journal on Control,
6(1):131–147, 1968.
[13] K. Zhou, J. C. Doyle, K. Glover, et al. Robust and
optimal control, volume 40. Prentice Hall New Jersey,
1996.