Conference Paper

Virtual topology partitioning towards an efficient failure recovery of software defined networks


Abstract

Software Defined Networking (SDN) is a networking paradigm that has emerged recently as a promising solution for tackling the inflexibility of classical IP networks. The centralized approach of SDN opens up broad scope for intelligence to optimise the network at various levels. Fault tolerance is considered one of the most pressing research challenges facing SDN; hence, in this paper we introduce a new method that computes alternative paths reactively for centrally controlled networks like SDN. The proposed method aims to reduce the update operation cost that the SDN network controller incurs in order to recover from a single link failure. By utilising the principle of community detection, we define a new network model for the sake of improving the network's fault tolerance capability. An experimental study is reported showing the performance of the proposed method. Based on the results, some further directions are suggested in the context of machine learning towards achieving further advances in this research area.
1 School of Computing, University of Portsmouth, Portsmouth PO1 3HE, UK
2 Department of Computer Science and Information Engineering, National Quemoy University, Taiwan
E-MAIL: {, benjamin.aziz, han.liu, mo.adda},
Keywords: Network Topology, Community Detection, Graph Theory, Software Defined Networking
1. Introduction
The Internet, and networking systems in general, play an essential role in changing our lifestyle by producing various types of technologies that are involved in our daily activities such as social media, economics and business. Networking devices exchange data in a variety of networks (e.g. wireless sensor networks, the Internet-of-Things and Cloud networks); currently about 9 billion devices are connected to the Internet, and this number is expected to more than double by 2020. However, today's IP network infrastructures do not have the ability to accommodate such a huge number of devices, hence Internet ossification is highly expected [1]. One effective solution to tackle the ossification issue is to replace the complex and rigid networking systems with programmable networking.
Software-defined networking (SDN) resulted from a long history of efforts to simplify the management and control of computer networks [2]. In SDN the control plane has been moved out of the data plane and placed in a central location usually called the network controller or the network operating system. Due to this decoupling, the network forwarding elements (e.g. routers and switches) become dumb devices that are typically dictated to by the network controller. In practice, the SDN controller has either a direct or an indirect connection to the data plane forwarding elements, through which it is able to instruct those devices. OpenFlow [3] is the protocol most commonly used to establish the connection between the data plane forwarding elements and the controller, so that the controller is able to send the forwarding rules to each device that lies in its domain. So far, SDN has gained much attention from both academia and industry and has been adopted by some well-known pioneering companies like Deutsche Telekom, Google, Microsoft, Verizon, and Yahoo [4]. Although SDN has brought significant benefits to networking, some new challenges have accompanied this innovation, such as faults, configuration, security and performance [5]. Thus, our goal in this paper is to reduce the drawbacks of SDN by mitigating one of the most pressing of these challenges, namely data plane fault tolerance.
The remainder of this paper is organised as follows. In Section 2, various techniques to solve the issue of SDN data plane failures are presented. Sections 3 and 4 illustrate our model and the proposed framework. The experimental results of the evaluation are presented in Section 5. Finally, the conclusion of this paper and future work directions are provided in Section 6.
2. Related work
The topic of SDN faults and recovery has already been studied, so we discuss the relevant literature in this section. Since SDN has two separate planes (i.e. control and data), each plane is susceptible to failure. Apart from control plane failure, which is not the main focus of this work, the recovery mechanisms of the data plane can be classified into protection and restoration techniques [6]. In the protection method, also known as proactive, the flow entries of the backup paths are installed before any failure occurs, hence the affected data packets are forwarded over the backup path directly at the moment of failure. On the other hand, in the restoration mechanism, also known as reactive, the backup path is determined by the controller after the occurrence of the failure, thus extra time is required to set up the discovered alternative path. The authors in [7] and [8] have shown how fast data plane recovery can be achieved with the protection approach. However, the cost of proactive mechanisms is high as they consume the capacity of the Ternary Content Addressable Memory (TCAM) [6], where the forwarding rules are stored. In addition, there is no guarantee that the pre-planned paths will still be healthy when the primary paths fail.
In contrast, efforts have been made to investigate the effectiveness of the reactive approach. In this context, the authors in [9] and [10] have shown how fast restoration can be accomplished; however, in both works the experimental topologies were small scale (i.e. 6 and 14 nodes respectively). Furthermore, the processing time for setting up the chosen path was ignored, although in SDNs this step is required in order to re-route traffic from the affected primary path to the backup one.
Unlike the previous studies, the authors in [11] proposed a new method for fast restoration that reduces the processing time typically spent by the controller by finding the end-to-end alternative path that has the minimum operation requirement. One main drawback of this work is that it does not guarantee that the healthy nodes of the affected path appear in the same order on the alternative one. Additionally, there is a lack of information regarding the simulation tool that was used. We also noted a mistake in the first network topology used in their experiments: the authors state that it contains 37 nodes and 57 links, while in reality their network contains 56 links, as it is missing the edge that connects Hamburg and Frankfurt according to the SNDlib library [12]; hence their results could deviate significantly from the expected ones.
All the above issues motivated us to make further efforts to investigate more possible solutions to the problem of SDN fault tolerance. The next section presents the proposed system model of this work.
3. System model
The most frequently used notations throughout this paper are
listed in Table 1.
TABLE 1. List of notations
Symbol      Description
$C$         The set of cliques
$c$         The failed clique
$r_s$       Source router
$r_d$       Destination router
$r_{ic}$    Any arbitrary router in the failed clique
$r_{jc}$    Any arbitrary router in the failed clique
$P_{min}$   Dijkstra's shortest path in terms of number of hops
$D_G$       Dijkstra for finding the shortest path based on the graph $G$
$D_c$       Dijkstra for finding the shortest path based on the clique $c$
We use undirected graph theory as a basis to model the computer network topology. Generally, each simple graph $G = (V, E)$ consists of a set of vertices (i.e. nodes), $V$, as well as a set of edges (i.e. links), $E$, which connect the nodes to one another. The set of all links in $G$ can be defined as a set of 2-element subsets of nodes, $E \subseteq V \times V$. We define a path $P$, from source to destination, as a sequence of consecutive vertices representing nodes or routers in the network¹. The path starts at the source router, $r_s$, and ends with a destination router, $r_d$, with $r_{ic}$ and $r_{jc}$ being any two adjacent routers along $P$:
$$P = (r_s, \ldots, r_{ic}, r_{jc}, \ldots, r_d)$$
We define the set of all possible paths, $\mathcal{P}_{r_s,r_d}$, between any source router $r_s$ and destination router $r_d$, as the following set:
$$\mathcal{P}_{r_s,r_d} = \{P \mid (\mathit{first}(P) = r_s) \wedge (\mathit{last}(P) = r_d)\}$$
and the definition of $\mathit{first}$ and $\mathit{last}$ is given as functions on any general sequence $(a_1, \ldots, a_n)$:
$$\mathit{first}((a_1, \ldots, a_n)) = a_1$$
$$\mathit{last}((a_1, \ldots, a_n)) = a_n$$
¹ We use the terms router and node interchangeably.
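To make the notation concrete, the following is a minimal sketch of the graph and path definitions in plain Python. The adjacency-set representation, the helper names (`make_graph`, `is_path`, `in_path_set`) and the 4-router ring are illustrative assumptions and are not taken from the framework itself, which represents $G$ with NetworkX.

```python
from typing import Dict, Sequence, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected simple graph


def make_graph(edges: Sequence[Tuple[int, int]]) -> Graph:
    """Build G = (V, E) from a list of undirected links."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def first(path: Sequence[int]) -> int:
    """first((a_1, ..., a_n)) = a_1"""
    return path[0]


def last(path: Sequence[int]) -> int:
    """last((a_1, ..., a_n)) = a_n"""
    return path[-1]


def is_path(graph: Graph, path: Sequence[int]) -> bool:
    """True if every pair of consecutive routers is joined by a link."""
    return all(b in graph.get(a, set()) for a, b in zip(path, path[1:]))


def in_path_set(graph: Graph, path: Sequence[int], r_s: int, r_d: int) -> bool:
    """Membership test for the set of paths between r_s and r_d."""
    return is_path(graph, path) and first(path) == r_s and last(path) == r_d


# Hypothetical 4-router ring 1-2-3-4-1 (not a topology from the paper).
G = make_graph([(1, 2), (2, 3), (3, 4), (4, 1)])
P = (1, 2, 3)
```

Here `in_path_set(G, P, 1, 3)` holds, whereas the sequence `(1, 3)` is rejected because routers 1 and 3 are not adjacent in the ring.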
FIGURE 1. Community detection
Community detection (sometimes known as clique identification) has been proposed as a solution to various kinds of network-related problems, including the problem of network path optimisation. In this context, we use the concept of non-overlapping cliques as an approach to optimise SDN restoration by accelerating the process of failure recovery. By dividing the network's graph $G$ into a certain number of cliques, we assume that when a link failure occurs only one clique will suffer from the failure, while the other cliques keep working normally. It is highly likely that the links of a path will be distributed over various cliques; hence, at the moment of link failure, only the clique $c$ that includes the affected link is treated, rather than searching the whole graph $G$ for an alternative end-to-end path.
Informally, Figure 1 depicts how the virtual cliques can be extracted from the original network topology graph by applying any community detection algorithm, where the number of extracted cliques varies and typically depends on the network topology and on the algorithm used. In our model, we define the set of cliques dividing the network graph $G$ as $C \subseteq G$, where individual cliques $c \in C$ are defined as
$$c = (V', E') \mid V' \subseteq V \wedge E' \subseteq E$$
The definition of cliques is mutually exclusive, in other words:
$$\forall c_1, c_2 \in C \mid c_1 = (V_1, E_1) \wedge c_2 = (V_2, E_2) \wedge c_1 \neq c_2 \Rightarrow (V_1 \cap V_2 = \emptyset)$$
We define a failed link (i.e. a 2-router sub-path), $F$, in a path $P$ as follows:
$$F = (r_{ic}, r_{jc}) \mid \exists c : c = (V', E') \wedge F \in E'$$
This definition of $F$ covers only those cases where failed links are of an intra-clique type, i.e. it will never be the case that the failed link connects two cliques:
$$\nexists F = (r_{ic}, r_{jc}), c_1 = (V_1, E_1), c_2 = (V_2, E_2) \mid c_1 \neq c_2 \wedge r_{ic} \in V_1 \wedge r_{jc} \in V_2$$
Inter-clique failures will be the focus of future work.
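The partitioning and the intra-clique assumption can be illustrated with a small, self-contained sketch. It is a plain-Python rendering of the Girvan and Newman idea (repeatedly removing the link with the highest edge betweenness), written here only for illustration; the framework itself relies on the igraph implementation. Stopping at a requested number of communities `k`, representing each clique by its vertex set only, and the two-triangle example topology are all assumptions of this sketch.

```python
from collections import deque
from itertools import combinations


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:  # BFS that counts shortest paths starting at s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score


def components(adj):
    """Connected components, each returned as a set of routers."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman(adj, k):
    """Remove highest-betweenness links until k communities remain."""
    work = {v: set(nbrs) for v, nbrs in adj.items()}  # leave G untouched
    while len(components(work)) < k:
        score = edge_betweenness(work)
        if not score:  # no links left to remove
            break
        u, v = max(score, key=score.get)
        work[u].discard(v)
        work[v].discard(u)
    return components(work)


def is_non_overlapping(cliques):
    """Mutual exclusivity: no router belongs to two cliques."""
    return all(a.isdisjoint(b) for a, b in combinations(cliques, 2))


def failed_clique(cliques, link):
    """Clique holding both ends of the failed link, or None if inter-clique."""
    r_i, r_j = link
    for clique in cliques:
        if r_i in clique and r_j in clique:
            return clique
    return None


# Hypothetical topology: two triangles joined by the bridge (3, 4).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
G = {}
for a, b in EDGES:
    G.setdefault(a, set()).add(b)
    G.setdefault(b, set()).add(a)
cliques = girvan_newman(G, 2)
```

In this example the bridge carries the most shortest paths, so it is removed first and the two triangles become the cliques. A failure of link (1, 2) is then intra-clique, while a failure of (3, 4) would be an inter-clique failure, which the model above leaves to future work.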
We use the term longest-shortest path for the path that has the maximum number of hops amongst the set of solutions returned by applying Dijkstra's algorithm [13] to our network topology to find the shortest path between every possible pair of nodes in that network. To find this longest-shortest path, we define the special function $LS$ as follows:
$$LS(\mathcal{P}_{Dset}) = x, \text{ such that } x \in \mathcal{P}_{Dset} \text{ and } \forall y \in \mathcal{P}_{Dset} : len(y) \le len(x)$$
If there is more than one longest-shortest path in $\mathcal{P}_{Dset}$, we pick one randomly. $\mathcal{P}_{Dset}$ itself represents the set of all Dijkstra-based solutions for some network topology $(V, E)$:
$$\mathcal{P}_{Dset} = \{P \mid \forall r_s, r_d \in V : P = D(\mathcal{P}_{r_s,r_d})\}$$
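The $LS$ function can be sketched as follows. Because the metric is the hop count, a breadth-first search is used here as a stand-in for Dijkstra's algorithm with unit weights; that substitution, the helper names and the small example graph are assumptions of this sketch. Ties are broken by taking the first candidate found, whereas the text above picks one at random.

```python
from collections import deque
from itertools import combinations


def shortest_path(adj, src, dst):
    """Hop-count shortest path via BFS (Dijkstra with unit link weights)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return tuple(reversed(path))
        for w in sorted(adj[v]):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None  # dst is unreachable from src


def longest_shortest(adj):
    """LS: a member of the all-pairs shortest-path set with the most hops."""
    p_dset = [shortest_path(adj, s, d) for s, d in combinations(sorted(adj), 2)]
    p_dset = [p for p in p_dset if p is not None]
    return max(p_dset, key=len)  # first maximal path on ties


# Hypothetical topology: line 1-2-3-4 with an extra chord (1, 3).
G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
ls_path = longest_shortest(G)
```

For this graph no pair of routers is more than two hops apart, so the returned path has two hops.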
4. The Proposed Framework
From a high-level view, Figure 2 illustrates the main components of our proposed framework, in which the Fault Tolerance Enhancer component is the primary contribution. We next discuss in more detail the developed component as well as the tools that we used in this framework.
FIGURE 2. Proposed framework components
4.1 SDN controller
The SDN controller represents the network's brain and its most vital part, as it is the place where the intelligence and decision making reside. Presently, there are more than 30 different SDN controllers on offer from both academia (e.g. for research purposes) and industry (e.g. for commercial use). Our framework currently supports the POX controller [14], which is a Python-based open source SDN controller. We selected POX as it is more suitable for research purposes and more convenient for fast prototyping than the other available ones [15]. The OpenFlow [3] protocol is used for communication between the control and data planes in order to gather statistics from the data plane and carry the controller's instructions that program the data plane elements, whereas the set of POX APIs can be utilised for developing various network applications.
4.2 Fault tolerance enhancer component
Currently, this component consists of three main parts, as follows:
A. Topology parser: is responsible for fetching the underlying network topology characteristics and building a topological view with the aid of POX openflow.discovery, which is an already developed component. In order to represent the obtained network topology as a graph $G$, we utilised the NetworkX [16] tool, which is a pure Python package with a set of functions that can be used to manipulate and simplify network graphs.
B. Cliques producer: is responsible for virtually partitioning the network topology graph $G$ into $C$ (i.e. sub-graphs) by incorporating the well-known community detection algorithm of Girvan and Newman [17] to produce the possible cliques (of any size) on the basis of the network graph acquired from the topology parser. The dense connectivity between the vertices of each resulting clique is the main feature of Girvan and Newman that is of interest to us; in other words, the strong connection among the nodes in each clique could provide multiple alternative paths that can be utilised at failure events. To do so, we have incorporated the igraph tool [18] into our framework, which is an open source Python library for analysing and manipulating graphs in which the Girvan and Newman algorithm is already implemented.
C. Route finder: is used to identify the paths for the network's flows on the basis of the global view of the underlying topology, which can be obtained from the topology parser. In fact, the route finder is called on two occasions: first when a new packet arrives at the network, and second when a failure occurs so that a new path must be computed. For this purpose, two algorithms have been developed to obtain the shortest path based on the well-known Dijkstra's algorithm. We adopt the hop count as an additive metric with which Dijkstra's algorithm forms the shortest path for any incoming request. The first algorithm reflects the default action performed by the SDN controller at failure moments, which is to erase the flow entries of the affected path and then install the rules of the backup one from the source router $r_s$ to the destination $r_d$; the pseudocode is given in Algorithm 1.
Algorithm 1: First algorithm to find the shortest path with Dijkstra from end-to-end based on the graph $G$
On Normal: Set primary path as $P_{min} \in \mathcal{P}_{r_s,r_d}$
On Failure: Do the following procedure
    $\mathcal{P}_{r_s,r_d} := \mathcal{P}_{r_s,r_d} - \{P_{min}\}$
    $P_{min} := D_G(\mathcal{P}_{r_s,r_d})$
On the other hand, the second algorithm tackles the failure based on the affected clique $c \in C$ rather than searching end-to-end over the whole graph $G$; the pseudocode of this algorithm is given in Algorithm 2. Currently, Algorithm 2 is capable of managing intra-clique link failures, and in future we will extend the framework to include inter-clique link failures.
Algorithm 2: Second algorithm to find the shortest path with Dijkstra on the basis of either $G$ or the affected $c$
On Normal: Set primary path as $P_{min} \in \mathcal{P}_{r_s,r_d}$
On Failure: Do the following procedure
    if $F = (r_{ic}, r_{jc})$ then
        $P_{min} := D_c(\mathcal{P}_{r_{ic},r_{jc}})$
The complexity of the two algorithms above is the same as that of Dijkstra's algorithm, which is $O((|V| + |E|) \log |V|)$ with a binary heap. It is also worth noting that both algorithms use the same strategy to obtain the shortest path for newly arriving packets, namely running Dijkstra on the network graph $G$. However, they differ in how they find the alternative path to recover from the failure, where Algorithm 2 applies Dijkstra on the clique $c$ rather than on $G$.
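The difference between the two recovery strategies can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the framework's code: BFS replaces Dijkstra because the metric is the hop count, the clique membership is supplied by hand, the failed link is given in path order, and the 7-router topology is hypothetical. The detour search also skips the other routers of the primary path so that the spliced result stays loop-free.

```python
from collections import deque


def bfs_path(adj, src, dst, allowed=None):
    """Hop-count shortest path; `allowed` optionally limits the search to a clique."""
    parent, queue = {src: None}, deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in sorted(adj[v]):
            if w not in parent and (allowed is None or w in allowed):
                parent[w] = v
                queue.append(w)
    return None


def without_link(adj, link):
    """Copy of the topology with the failed link removed."""
    u, v = link
    copy = {n: set(nbrs) for n, nbrs in adj.items()}
    copy[u].discard(v)
    copy[v].discard(u)
    return copy


def recover_end_to_end(adj, primary, failed):
    """Algorithm 1 style: recompute the whole path from r_s to r_d on G."""
    return bfs_path(without_link(adj, failed), primary[0], primary[-1])


def recover_in_clique(adj, primary, failed, clique):
    """Algorithm 2 style: detour between the two ends of F inside clique c."""
    r_i, r_j = failed  # assumed to appear in this order on the primary path
    usable = set(clique) - (set(primary) - {r_i, r_j})
    detour = bfs_path(without_link(adj, failed), r_i, r_j, allowed=usable)
    if detour is None:
        return None
    i = primary.index(r_i)
    return primary[:i] + detour + primary[i + 2:]


# Hypothetical topology: primary 1-2-3-4, clique {2, 3, 5}, side route 1-6-7-4.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (5, 3), (1, 6), (6, 7), (7, 4)]
G = {}
for a, b in EDGES:
    G.setdefault(a, set()).add(b)
    G.setdefault(b, set()).add(a)

primary = bfs_path(G, 1, 4)
alt_global = recover_end_to_end(G, primary, (2, 3))
alt_local = recover_in_clique(G, primary, (2, 3), {2, 3, 5})
```

In this example the end-to-end recovery replaces every intermediate router of the primary path, while the clique-based recovery keeps routers 1, 2, 3 and 4 in place and only adds router 5. The clique-based result is one hop longer, which matches the observation that Algorithm 2 does not guarantee the shortest end-to-end path.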
5. Simulations and experimental results
In order to test the performance of the proposed framework,
we first build it on top of POX controller as illustrated in Figure
2, then we used the popular SDN emulator Mininet [19] to eval-
uate the prototype of our proposed framework. As mentioned
earlier that SDN has splitted the network architecture into con-
trol plane, the brain, and data plane, the muscles. So, we have
FIGURE 3. Cliques-based network topology of Germany50, [12]
modelled the Germany50 from SNDlib [12] as an instance of
a real world network topology to act our data plane structure,
which contains 50 nodes and 88 edges. The black links in
Figure 3 represent the original Germany50 topology and we
added the colored layer (i.e. from c1 to c6) to demonstrate the
cliques that achieved after applying the Girvan and Newman
algorithm. According to the Germany50 topology, the longest-
shortest path lies between 1 and 13 through the route:(1, 5, 27,
22, 32, 42, 48, 50, 26, 13), there might be several longest-
shortest paths in the network topology and in such a case we
choose randomly. We attached two virtual hosts H1and H2
to the rsand rdof the longest-shortest path respectively in or-
der to simulate the scenario of packet injection. For now let’s
assume that the link (27, 22) fails. Consequently, Algorithm
1 will response to the failure by removing the failed-primary
path and install (1, 25, 47, 3, 39, 7, 41, 17, 40, 13) as an alter-
native shortest path to mask the failure. Since our selected path
passes through 4 cliques, which are c1,c2,c3 and c6, this
means only one clique (e.g. sub-graph) will be involved at the
moment of the intra link failure in the scenario of Algorithm
2, which will react first by detecting the effected clique c(i.e.
c2 in our case) and thence find a loop-free shortest path be-
tween the routers on both sides of the failed link, i.e. between
the two nodes of F. According to the given example, the sub-
path (27, 45, 22) will be returned by the second algorithm as
a quick solution based on c2 without taking into consideration
the other clique’s nodes as they will remain on the same settings
(untouched). Hence, it can be clearly seen that the second algo-
rithm does not guarantee the shortest path from end-to-end as
it works over a clique and not a whole graph. We are expecting
a significant reduction in the duration of the recovery process
for two reasons, (I) because the proposed algorithm will con-
sider only one clique but not the whole graph and (II) due to
minimising the number of rules modification. Figure 4 shows
the behaviour of the two algorithms, which is based on the se-
lected path (i.e. from 1 to 13), in both failure and non-failure
scenarios. We note that the two algorithms have nearly similar
FIGURE 4. Before and after failure measurements
durations for setting up the path (i.e. before failure scenario).
However unlike the Algorithm 1, Algorithm 2 has an extremely
positive impact in terms of enhancing the time cost of fault tol-
erance. The reduction rate can be calculated through compar-
ing the consumed time before and after the failure, which can
be arrived through the following formula:
$$R = \frac{Time_{BF} - Time_{AF}}{Time_{BF}} \times 100 \qquad (1)$$
where $R$ represents the reduction rate, $Time_{BF}$ is the time for setting up the primary path before the failure and $Time_{AF}$ is the time for setting up the alternative path after the failure. Based on (1) and according to the results in Figure 4, the $R$ of Algorithm 2 is 84.333%. Let us assume that $len(r_s, \ldots, r_d) = n$; then the utilisation rate of Algorithm 2 can be measured through $\left(\frac{n-2}{n}\right) \times 100\%$. Hence, the pre-installed rules utilisation percentage (based on our example case, where $n = 10$) is 80%; this ratio explains the high reduction rate achieved by the algorithm. As a result, the larger the value of $n$, the more utilisation we obtain out of the selected primary path.
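The two measures can be checked with a short calculation. The function names are illustrative, and the timings passed to `reduction_rate` are hypothetical values chosen only to make the arithmetic concrete; they are not the measurements behind Figure 4. The primary path is the one from the example above.

```python
def reduction_rate(time_bf: float, time_af: float) -> float:
    """Equation (1): percentage reduction of the set-up time after a failure."""
    return (time_bf - time_af) * 100.0 / time_bf


def utilisation_rate(n: int) -> float:
    """Share of pre-installed rules that stay in use: (n - 2) / n * 100."""
    return (n - 2) * 100.0 / n


# Primary path of the Germany50 example, so n = 10 routers.
primary = (1, 5, 27, 22, 32, 42, 48, 50, 26, 13)
utilisation = utilisation_rate(len(primary))

# Hypothetical timings in arbitrary units, not measured values.
example_reduction = reduction_rate(100.0, 25.0)
```

With $n = 10$ the utilisation evaluates to 80%, in agreement with the text, and the hypothetical timings give a reduction rate of 75%.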
6. Conclusions
This paper provides a new approach for improving the fault tolerance mechanism of software-defined networks. We have defined a new network model based on set theory and graph theory to represent the theoretical part. We have also developed a new framework on the basis of this model to represent the practical side of this work. The experimental results, which were obtained using well-known tools and emulation, demonstrate how the proposed method improves the performance of reactive failure recovery by segmenting the network topology into a certain number of non-overlapping cliques. To the best of our knowledge, the proposed approach has not been explored so far, which makes this paper the first to suggest partitioning as a technique for enhancing the restoration mode of SDN fault tolerance.
In future, we will position the study in the setting of machine learning, so that the routing solution can be dynamically adjusted according to updates of the statistical details of the topological data. In other words, we aim to involve learning strategies in routing to achieve a globally optimal search for the shortest path.
Acknowledgements
This work is supported by the College of Computer Science and Information Technology at the University of Al-Qadisiyah under the scholarship program of the Iraqi Ministry of Higher Education and Scientific Research.
References
[1] Lin, P., Bi, J., Hu, H., Feng, T., & Jiang, X. (2011, November). A quick survey on selected approaches for preparing programmable networks. In Proceedings of the 7th Asian Internet Engineering Conference (pp. 160-163). ACM.
[2] Feamster, N., Rexford, J., & Zegura, E. (2014). The road to SDN: an intellectual history of programmable networks. ACM SIGCOMM Computer Communication Review, 44(2), 87-98.
[3] McKeown, N., Anderson, T., Balakrishnan, H., Parulkar, G., Peterson, L., Rexford, J., ... & Turner, J. (2008). OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2), 69-74.
[4] Kreutz, D., Ramos, F. M., Esteves Verissimo, P., Esteve Rothenberg, C., Azodolmolky, S., & Uhlig, S. (2015). Software-defined networking: A comprehensive survey. Proceedings of the IEEE, 103(1), 14-76.
[5] Wickboldt, J. A., De Jesus, W. P., Isolani, P. H., Both, C. B., Rochol, J., & Granville, L. Z. (2015). Software-defined networking: management requirements and challenges. IEEE Communications Magazine, 53(1), 278-285.
[6] Akyildiz, I. F., Lee, A., Wang, P., Luo, M., & Chou, W. (2014). A roadmap for traffic engineering in SDN-OpenFlow networks. Computer Networks, 71, 1-30.
[7] Kempf, J., Bellagamba, E., Kern, A., Jocha, D., Takács, A., & Sköldström, P. (2012, June). Scalable fault management for OpenFlow. In Communications (ICC), 2012 IEEE International Conference on (pp. 6606-6610). IEEE.
[8] Sgambelluri, A., Giorgetti, A., Cugini, F., Paolucci, F., & Castoldi, P. (2013). OpenFlow-based segment protection in Ethernet networks. Journal of Optical Communications and Networking, 5(9), 1066-1075.
[9] Sharma, S., Staessens, D., Colle, D., Pickavet, M., & Demeester, P. (2011, October). Enabling fast failure recovery in OpenFlow networks. In Design of Reliable Communication Networks (DRCN), 2011 8th International Workshop on the (pp. 164-171). IEEE.
[10] Staessens, D., Sharma, S., Colle, D., Pickavet, M., & Demeester, P. (2011, October). Software defined networking: Meeting carrier grade requirements. In Local & Metropolitan Area Networks (LANMAN), 2011 18th IEEE Workshop on (pp. 1-6). IEEE.
[11] Astaneh, S. A., & Heydari, S. S. (2016). Optimization of SDN flow operations in multi-failure restoration scenarios. IEEE Transactions on Network and Service Management, 13(3), 421-432.
[12] SNDlib library,
[13] Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269-271.
[14] POX ,
[15] Shalimov, A., Zuikov, D., Zimarina, D., Pashkov, V., & Smeliansky, R. (2013, October). Advanced study of SDN/OpenFlow controllers. In Proceedings of the 9th Central & Eastern European Software Engineering Conference in Russia (p. 1). ACM.
[16] Schult, D. A., & Swart, P. (2008, August). Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conferences (SciPy 2008) (Vol. 2008, pp. 11-16).
[17] Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821-7826.
[18] Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5), 1-9.
[19] Lantz, B., Heller, B., & McKeown, N. (2010, October). A network in a laptop: rapid prototyping for software-defined networks. In Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks (p. 19). ACM.
... Among the existing work, the use of a community detection scheme for the failure recovery process of SDN is one of the reactive approaches. The existing work [6], [7] shown good performance, however, they have not addressed the multiple link failure and inter-community link failure problem. This paper presents a reactive failure recovery approach based CDRA scheme for link failure recovery in SDN. ...
... Unlike previous studies, the approach developed by [6], [7] used the community detection technique for failure recovery in SDN. In their study, the authors first divide the whole network into cliques. ...
... Does not ensure the health nodes in the impacted path, and there is a scarcity of information on the simulation technology. Malik [6] 2017 Reactive Single Link Failure ...
Full-text available
The increase in size and complexity of the Internet has led to the introduction of Software Defined Networking (SDN). SDN is a new networking paradigm that breaks the limitations of traditional IP networks and upgrades the current network infrastructures. However, like traditional IP networks, network failures may also occur in SDN. Multiple research studies have discussed this problem by using a variety of techniques. Among them is the use of the community detection method is one of the failure recovery technique for SDN. However, this technique have not considered the specific problem of multiple link multi-community failure and inter-community link failure scenarios. This paper presents a community detection-based routing algorithm (CDRA) for link failure recovery in SDN. The proposed CDRA scheme is efficient to deal with single link intra-community failure scenarios and multiple link multi-community failure scenarios and is also able to handle the inter-community link failure scenarios in SDN. Extensive simulations are performed to evaluate the performance of the proposed CDRA scheme. The simulation results depicts that the proposed CDRA scheme have better simulations results and reduce average round trip time by 35.73%, avg data packet loss by 1.26% and average end to end delay 49.3% than the Dijkstra based general recovery algorithm and also can be used on a large scale network platform.
... Link failure recovery is also considered in this model to provide network resiliency. As our recovery model, we are inspired from the study of Malik et al. [16]. In their study, the authors first divide the whole network into cliques. ...
... Accordingly, the topologies are generated to well fit into the proposed network model. In addition to the scenarios in the study [16], path recovery analysis is performed for an additional scenario. Further, different metrics, such as throughput and transmission overhead, are included to measure the routing performance of fog-enabled smart grids. ...
... In addition, in this study, the path recovery method in [16] is adapted to our generated topologies. In case of any path loss between two nodes in the graph, the controller with its global view notices this failure immediately and ensures a new shortest path from the source to the destination points. ...
... Therefore, the time to reconfigure the networks includes new path calculation and switch update time. To reduce these issues, [138] introduced the principle of a community detection scheme. If a failure occurs, the affected community is detected, and a backup path is established within the affected community without tampering with packet forwarding in other communities. ...
... However, the processing of removing the whole set of old flow entries in the affected path thereby re-installing the new entries for the alternative can be costly especially when the path length is long. This concern has been addressed in their follow-up works [138] [27]. When failure occurs, the scheme only searches the new path from the point of failure down to the destination switch and removed the old flow entries of the affected switches only; the remaining flow entries on the path are preserved. ...
Full-text available
Software Defined Networks (SDN) is a new network paradigm that emerged to offer better network management through the separation of network control logic and data forwarding element. This separation speed up network innovation without the need to rely on the vendor-proprietary interface for network element configuration to forward packets. However, SDN is flow driven network, for each arrived flow, a feasible path is computed to forward the flow to its destination. Afterward, the SDN control logic process the corresponding routing and instruct the set of data forwarding elements to install them on their Flowtable to guide the routing process. Unfortunately, the network changes more frequently in dynamic large-scale networks and the Flowtable is a constraint with limited space. These challenges require the SDN controller to compute paths more often which may also need a large number of flow routing rules placement. In addition, the frequency of communication link failures increases lately. The successful deployment of SDN heavily depends on how it satisfies the reliability requirement with uninterrupted services. Several studies were conducted to compute the optimal path for data forward to meet their Quality of Service demand. Other studies focus on reducing the frequency of link failure. Some studies were conducted to manage the constraint Flowtable resources. This survey focuses on Routing rules placement, unoptimized routing, link, and switch load balancing, failure detection, and recovery. The paper extensively discusses each issue and analyzes the weakness of the current solutions. Finally, it highlights potential challenges that need future research attention.
... Within the same context, authors in [34] produced new algorithms for minimising the required time to update through reducing the solution search space from source to destination in the affected path. Similarly, in [35] an approach to divide the network topology into non-overlapping cliques has been produced to tackle the failure issue in local-based manner rather than global. Both [34,35] took into account the time required to compute the alternative route in order to speed up the operation of update. ...
... However, the main issue with the last three works is that they do not guarantee the shortest path from source to destination. ...
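The local-versus-global idea in these excerpts can be illustrated with a minimal sketch. It is not the algorithm of [34] or [35]: it assumes the topology has already been split into non-overlapping partitions (the hypothetical `partition_of` map), searches for a detour inside the failed link's partition first, and falls back to a network-wide search only when no local detour exists. The names `bfs_path`, `local_detour`, `adj` and `partition_of`, as well as the six-switch topology, are illustrative assumptions.

```python
from collections import deque

def bfs_path(adj, src, dst, allowed=None, failed=None):
    """Fewest-hop path from src to dst, or None if unreachable.

    allowed: optional set of nodes the search may visit.
    failed:  optional (u, v) link that must not be used.
    """
    banned = {frozenset(failed)} if failed else set()
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt in parent:
                continue
            if allowed is not None and nxt not in allowed:
                continue
            if frozenset((node, nxt)) in banned:
                continue
            parent[nxt] = node
            queue.append(nxt)
    return None

def local_detour(adj, partition_of, failed):
    """Try to bypass the failed link inside its own partition first."""
    u, v = failed
    if partition_of[u] == partition_of[v]:
        members = {n for n, p in partition_of.items() if p == partition_of[u]}
        path = bfs_path(adj, u, v, allowed=members, failed=failed)
        if path is not None:
            return path, "local"
    # Inter-partition link, or no detour inside the partition.
    return bfs_path(adj, u, v, failed=failed), "global"

# Hypothetical topology: two triangles joined by links 3-4 and 1-6.
adj = {1: [2, 3, 6], 2: [1, 3], 3: [1, 2, 4],
       4: [3, 5, 6], 5: [4, 6], 6: [4, 5, 1]}
partition_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

print(local_detour(adj, partition_of, (1, 2)))
print(local_detour(adj, partition_of, (3, 4)))
```

Restricting the search to one partition shrinks the search space, but the detour it returns is only the shortest one inside that partition. The end-to-end path that results is therefore not guaranteed to be the shortest, which is the limitation the excerpt above points out.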
In recent years, the emerging paradigm of software-defined networking has become a hot and thriving topic in both the industrial and academic sectors. Software-defined networking offers numerous benefits over legacy networking systems by simplifying network management through reduced network configuration cost. Currently, data plane fault management is limited to two mechanisms: proactive and reactive. These fault management and recovery techniques are activated only after a failure has occurred, and hence packet loss is highly likely. This is due to the convergence time, during which new network paths must be allocated in order to forward the affected traffic rather than drop it. Such convergence leads to temporary service disruption and unavailability. In practice, convergence is affected not only by the speed of the recovery mechanisms but also by the delay of failure detection. In this paper, we define a new approach for data plane fault management in software-defined networks where the goal is to eliminate the convergence process altogether rather than accelerate failure detection and recovery. We propose a new framework, called Smart Routing, which allows the network controller to receive forewarning signs of failures and hence avoid risky paths before the failure incidents occur. The proposed approach aims to decrease service disruption, which in turn increases network service availability. We validate our framework through a set of experiments that demonstrate how the underlying model runs and its impact on improving service availability. To illustrate the applicability of the new framework, we use three types of topologies covering real and simulated networks.
Software-defined networking (SDN) is a new paradigm that decouples the control plane from the data plane, offering a more flexible way to manage the network efficiently. However, the growing volume of traffic caused by the proliferation of Internet of Things (IoT) devices also increases the number of flow arrivals, which in turn causes flow rules to change more often and path setup requests to increase. These events require route path computation to take place immediately to cope with the new network changes. Searching for an optimal route might be costly in terms of the time required to calculate a new path and update the corresponding switches. Moreover, current path selection schemes consider only a single routing metric, based on either link or switch operation; link quality and the switch's role have not been considered jointly during path selection. This paper proposes Route Path Selection Optimization (RPSO) with multiple constraints. RPSO introduces joint link- and switch-based parameters such as Link Latency (LL), Link Delivery Ratio (LDR), and Critical Switch Frequency Score (CWFscore). These metrics encourage the selection of paths with better link quality and a minimal number of critical switches. The experimental results show that the proposed scheme reduced path stretch by 37% and path setup latency by 73%, thereby improving throughput by 55.73% and packet delivery ratio by 12.5% compared to the baseline work.
Software-Defined Networking (SDN) is an emerging paradigm that promises to change the state of affairs of current networks, by breaking vertical integration, separating the network's control logic from the underlying routers and switches, promoting (logical) centralization of network control, and introducing the ability to program the network. The separation of concerns introduced between the definition of network policies, their implementation in switching hardware, and the forwarding of traffic, is key to the desired flexibility: by breaking the network control problem into tractable pieces, SDN makes it easier to create and introduce new abstractions in networking, simplifying network management and facilitating network evolution. Today, SDN is both a hot research topic and a concept gaining wide acceptance in industry, which justifies the comprehensive survey presented in this paper. We start by introducing the motivation for SDN, explain its main concepts and how it differs from traditional networking. Next, we present the key building blocks of an SDN infrastructure using a bottom-up, layered approach. We provide an in-depth analysis of the hardware infrastructure, southbound and northbound APIs, network virtualization layers, network operating systems, network programming languages, and management applications. We also look at cross-layer problems such as debugging and troubleshooting. In an effort to anticipate the future evolution of this new paradigm, we discuss the main ongoing research efforts and challenges of SDN. In particular, we address the design of switches and control platforms -- with a focus on aspects such as resiliency, scalability, performance, security and dependability -- as well as new opportunities for carrier transport networks and cloud providers. Last but not least, we analyze the position of SDN as a key enabler of a software-defined environment.
This paper presents an independent, comprehensive analysis of the efficiency indexes of popular open source SDN/OpenFlow controllers (NOX, POX, Beacon, Floodlight, MuL, Maestro, Ryu). The analysed indexes include performance, scalability, reliability, and security. For testing purposes we developed a new framework called hcprobe. The test bed and the methodology we used are discussed in detail so that anyone can reproduce our experiments. The results of the evaluation show that modern SDN/OpenFlow controllers are not ready to be used in production and have to be improved in all of the above-mentioned characteristics.
Software Defined Networking (SDN) is an exciting technology that enables innovation in how we design and manage networks. Although this technology seems to have appeared suddenly, SDN is part of a long history of efforts to make computer networks more programmable. In this paper, we trace the intellectual history of programmable networks, including active networks, early efforts to separate the control and data plane, and more recent work on OpenFlow and network operating systems. We highlight key concepts, as well as the technology pushes and application pulls that spurred each innovation. Along the way, we debunk common myths and misconceptions about the technologies and clarify the relationship between SDN and related technologies such as network virtualization.
In the OpenFlow-based split architecture, data-plane forwarding is separated from control and management functions. Forwarding elements make only simple forwarding decisions based on flow table entries populated by the controller. While OpenFlow does not specify how topology monitoring is performed, the centralized controller can use Link-Layer Discovery Protocol (LLDP) messages to discover link and node failures and trigger restoration actions. This monitoring and recovery model has serious scalability limitations because the controller has to be involved in the processing of all of the LLDP monitoring messages. For fast recovery, monitoring messages must be sent at millisecond intervals over each link in the network. This places a significant load on the controller. In this paper we propose to implement a monitoring function on OpenFlow switches, which can emit monitoring messages without imposing a processing load on the controller. We describe how the OpenFlow 1.1 protocol should be extended to support the monitoring function. Our experimental results show that data plane fault recovery can be achieved in a scalable way within 50 milliseconds using this function.
Software Defined Networking is a networking paradigm which allows network operators to manage networking elements using software running on an external server. This is accomplished by a split in the architecture between the forwarding element and the control element. Two technologies which allow this split for packet networks are ForCES and OpenFlow. We present energy efficiency and resilience aspects of carrier grade networks which can be met by OpenFlow. We implement flow restoration and run extensive experiments in an emulated carrier grade network. We show that OpenFlow can restore traffic quite fast, but its dependency on a centralized controller means that it will be hard to achieve 50 ms restoration in large networks serving many flows. In order to achieve 50 ms recovery, protection will be required in carrier grade networks.
Metro and carrier-grade Ethernet networks, as well as industrial area networks and specific local area networks (LANs), have to guarantee fast resiliency upon network failure. However, the current OpenFlow architecture, originally designed for LANs, does not include effective mechanisms for fast resiliency. In this paper, the OpenFlow architecture is enhanced to support segment protection in Ethernet-based networks. Novel mechanisms have been specifically introduced to maintain working and backup flows at different priorities and to guarantee effective network resource utilization when the failed link is recovered. Emulation and experimental demonstration implementation results show that the proposed architecture avoids both the utilization of a full-state controller and the intervention of the controller upon failure, thus guaranteeing a recovery time only due to the failure detection time, i.e., a few tens of milliseconds within the considered scenario.
Flexible network configuration in software-defined networks makes it possible to dynamically restore flows. To this end, network devices carry out flow operations (i.e., adding or removing flow-entries to/from the flow-tables) to re-route the disrupted flows. Current flow restoration techniques do not consider the number of operations, and hence are inefficient in disaster scenarios. We aim to minimize the number of operations in such cases and formulate integer programs to find a path: 1) with the lowest path cost requiring up to a given number of operations; 2) requiring the fewest possible operations; and 3) with a Dijkstra-like path cost requiring minimum operations. We study the tradeoff between path cost and the number of operations and prove that the second and third problems are polynomial-time solvable. We propose optimal/suboptimal algorithms with Dijkstra-like complexity that find nearly optimal solutions. The simulation results show that our methods reduce the number of operations by up to 50%, and the best performance is achieved when the number of failed links is small.
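To make the notion of "number of operations" concrete, the sketch below counts the flow-table changes needed to move one flow from a failed path to a candidate replacement. It is a simplified illustration, not the integer programs or algorithms of the abstract above: it assumes one forwarding entry per switch per flow (switch to next hop), counts an entry whose next hop changes as a single "modify" operation, and ignores the destination switch. The function names and the switch labels `s1` to `s7` are hypothetical.

```python
def next_hops(path):
    """Map each switch on the path to its next hop (assumed one entry per switch)."""
    return {a: b for a, b in zip(path, path[1:])}

def flow_operations(old_path, new_path):
    """List the flow-table operations needed to move a flow between two paths."""
    old, new = next_hops(old_path), next_hops(new_path)
    ops = []
    for switch, hop in new.items():
        if switch not in old:
            ops.append(("add", switch, hop))        # new switch needs an entry
        elif old[switch] != hop:
            ops.append(("modify", switch, hop))     # existing entry is redirected
    for switch, hop in old.items():
        if switch not in new:
            ops.append(("remove", switch, hop))     # stale entry is cleaned up
    return ops

# Hypothetical scenario: link s2-s3 on the working path fails.
old_path = ["s1", "s2", "s3", "s4"]
detour = ["s1", "s2", "s5", "s3", "s4"]   # 4 hops, reuses most old entries
disjoint = ["s1", "s6", "s7", "s4"]       # 3 hops, shares only s1 and s4

for name, path in [("detour", detour), ("disjoint", disjoint)]:
    print(name, len(path) - 1, "hops,", len(flow_operations(old_path, path)), "operations")
```

In this example the shorter replacement path needs more flow-table operations than the longer detour, which is the kind of tradeoff between path cost and number of operations that the abstract studies.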
SDN is an emerging paradigm currently evidenced as a new driving force in the general area of computer networks. Many investigations have been carried out in the last few years about the benefits and drawbacks in adopting SDN. However, there are few discussions on how to manage networks based on this new paradigm. This article contributes to this discussion by identifying some of the main management requirements of SDN. Moreover, we describe current proposals and highlight major challenges that need to be addressed to allow wide adoption of the paradigm and related technology.
Software Defined Networking (SDN) is an emerging networking paradigm that separates the network control plane from the data forwarding plane with the promise to dramatically improve network resource utilization, simplify network management, reduce operating cost, and promote innovation and evolution. Although traffic engineering techniques have been widely exploited in the past and current data networks, such as ATM networks and IP/MPLS networks, to optimize the performance of communication networks by dynamically analyzing, predicting, and regulating the behavior of the transmitted data, the unique features of SDN require new traffic engineering techniques that exploit the global network view, status, and flow patterns/characteristics available for better traffic control and management. This paper surveys the state-of-the-art in traffic engineering for SDNs, and mainly focuses on four thrusts including flow management, fault tolerance, topology update, and traffic analysis/characterization. In addition, some existing and representative traffic engineering tools from both industry and academia are explained. Moreover, open research issues for the realization of SDN traffic engineering solutions are discussed in detail.