Comparative Performability Assessment of SFCs:
The case of Containerized IP Multimedia Subsystem
Mario Di Mauro, Member, IEEE, Giovanni Galatro, Maurizio Longo, Member, IEEE, Fabio Postiglione, Marco
Abstract—The failure of a single network element composing
a Service Function Chain (SFC) unavoidably leads to some
degradation in terms of availability (ability of guaranteeing
working conditions), and/or performance (ability of sustaining
a certain workload) for the whole SFC. By considering both
of these aspects, we propose, as a case study, a joint analysis
of availability and performance (a.k.a. performability) of IP
Multimedia Subsystem, an SFC infrastructure which plays a
key role in the all-IP convergence of telecommunication services,
especially in view of 5G. We refer to an implementation
of IMS based on container technology (containerized IMS, or
cIMS), which decouples the application layer from the
underlying hardware infrastructure more efficiently than classic
virtualization schemes. We model the probabilistic behavior of
a cIMS by means of Stochastic Reward Networks (SRN) and
Reliability Block Diagram (RBD) formalisms to take into account
failure and repair events. Then, with the assistance of a
designed-from-scratch algorithm (OptChains+), we carry out a
performability analysis: i) to evaluate and compare series/parallel
cIMS configurations (or settings), and ii) to find settings with minimum
cost and maximum availability, given a performance level. The
proposed assessment lends itself to a sensitivity analysis, here
demonstrated by examples, useful for robustness evaluation.
Index Terms—IP Multimedia Subsystem, 5G, Containers,
Availability, Performability, Stochastic Reward Networks, Redun-
dancy Optimization, SFC.
(c)IMS (Containerized) IP Multimedia Subsystem
CNF Containerized Network Function
CNR Containerized Network Replica
CNT Container
CSCF Call Session Control Function
CTMC Continuous-Time Markov Chain
DCK Docker
HPV Hypervisor
HW Hardware
I-CSCF, I Interrogating CSCF
M. Di Mauro, G. Galatro, M. Longo, F. Postiglione are with the
Department of Information and Electrical Engineering and Applied
Mathematics (DIEM), University of Salerno, 84084, Fisciano, Italy. E-mails:
M. Tambasco is with Research Consortium on Telecommunications
(CoRiTeL), University of Salerno, 84084, Fisciano, Italy. E-mail:
MAA Message Authentication Answer
MAR Message Authentication Request
RBD Reliability Block Diagram
S-CSCF, S Serving CSCF
SFC Service Function Chain
SIP Session Initiation Protocol
SRN Stochastic Reward Networks
UAA User Authentication Answer
UAR User Authentication Request
VM Virtual Machine
VNF Virtual Network Function
$A(t)$, $A$ Instantaneous, Steady-State Availability
$c_j$ Capacity for node $j$
$\gamma_j$ Normalized performance level for node $j$
$E_j$ Cost (expenditure) for node $j$
$\lambda$ ($\mu$) Failure (Repair) rate
$p_{ij}$ Probability of node $j$ being in marking $i$
$P_{up}$ ($P_{dn}$) Place denoting an up (down) condition
$r_i(j)$ Reward rate in marking $i$ for node $j$
$T_f$ ($T_r$) Timed transition of failures (repairs)
$t$ Immediate transition
Today, network and telco providers are benefiting from
the Service Function Chaining (SFC) paradigm, which
makes it easy to create and deploy novel services through a
series (namely, a chain) of concatenated network components
[1], [2]. These components are often designed by exploiting softwarized
technologies such as virtualization, microservices, and
containerized environments, which provide a flexible and handy habitat
for diverse telco frameworks [3], [4]. Among such frameworks
we focus on the IP Multimedia Subsystem (IMS), which,
embracing the service chaining concept, has been elected both
by standardization groups [5] and by the industry world [6],
[7] as the ideal intermediary between legacy networks and
5G-based solutions.
In this regard, we highlight that, being mainly focused on an
architectural problem, we consider, for the sake of simplicity,
a high-level perspective of the IMS service chain, as often
contemplated in the technical literature on SFC infrastructures
(e.g., [8], [9]).
Moreover, a recent project named Clearwater [10] has been
specifically conceived to offer an open source implementation
of IMS frameworks within virtualized and containerized en-
vironments, and to be exploited as a benchmark testbed for
assorted performance analyses [11], [12], [13].
Inspired by this last trend, we draw up a technique for
managing availability and performance of service chains,
where a containerized version of IMS (cIMS for brevity) is
considered as the pivotal use case. Basically, a container is
a lightweight process [14], [15] that, unlike classic
virtual machines (VMs), does not include a whole operating
system. Container technology is normally coupled with the
Network Function Virtualization (NFV) paradigm, which
encapsulates the network logic (e.g. routing, firewalling,
load balancing) in virtualized elements referred to as Virtual
Network Functions (VNFs). A valuable example is provided
in [16] where the authors present Glasgow Network Function
(GNF), a container-based NFV framework aimed at running
and orchestrating VNFs encapsulated in Linux containers, with
the advantage of allowing a fast deployment on commodity
devices that do not need hardware-accelerated virtualization.
In the specific case considered here, containers fulfill
functionalities encountered in an IMS domain. Moreover, since
failure and repair events can occur at various layers of the
cIMS system (container, VM, hypervisor, etc.), we first model
the probabilistic behavior of the whole cIMS chain, and
then solve an optimization problem to achieve cIMS
configurations guaranteeing, simultaneously, high availability¹ and
minimal costs at a given performance level. More generally,
the methodology can be adapted to other instances in the field
of telecommunications and hence can be useful to management
organizations involved in planning/deploying/maintaining sys-
tems and services, where trade-offs among cost, availability,
and performance are critical.
The paper is organized as follows. In Section II we advance
a general perspective of the problem and state the contributions
offered herein. In Section III we review the related literature,
highlighting the main differences w.r.t. our proposal. Section
IV starts with a brief overview of IMS, providing more details
on its deployment in a containerized setting, which may give
rise to different configurations. In Section V, we present the
probabilistic model of cIMS which accounts for failure and
repair events by means of two formalisms: Reliability Block
Diagram (RBD) and Stochastic Reward Networks (SRN). In
Section VI, after the formalization of our optimization
problem, we outline the procedure for performability assessment
which exploits an algorithm (OptChains+) designed to manage
the cIMS probabilistic models and to expedite the search for
optimal (minimum cost) deployments. In Section VII we report
the outcomes of a numerical performance analysis across the
various possible configurations a cIMS may assume. An
assessment of the robustness of the system in critical conditions
is also provided through a sensitivity analysis. Finally, Section
VIII draws some conclusions and hints for further research.
¹ High or “five nines” availability requirement indicates a steady-state
availability greater than 0.99999, corresponding to a maximum tolerated
downtime of 5.26 minutes per year.
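The downtime figure quoted in footnote 1 follows from simple arithmetic on the steady-state availability. The sketch below is only illustrative: the function name and the choice of a 365-day year are assumptions made here, not part of the paper.

```python
# Illustrative helper (hypothetical name); a non-leap year is assumed.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def max_downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime budget (in minutes) implied by a steady-state availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, value in [("three nines", 0.999), ("five nines", 0.99999)]:
        minutes = max_downtime_minutes_per_year(value)
        print(f"{label}: {minutes:.2f} minutes/year")
```

With an availability of 0.99999 the helper returns roughly 5.26 minutes per year, matching the footnote.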
Service availability is considered a crucial parameter of
Quality of Service (QoS) as specified by ITU-T E.800 [17].
The severe requirements described by this recommendation
imply a very careful network design of 5G infrastructures,
where virtualized and containerized modules interact not only
with one another, but also with the underlying hardware
infrastructure. For instance, multiple containers deployed on a single
virtual machine can be adversely affected by a malfunctioning
operating system running on the VM, or, similarly, multiple
virtualized network functions sharing the same physical layer
can be adversely influenced by a misconfiguration of the
resource isolation mechanism [18]. Further issues can arise
when the network elements have to be traversed in a specific
order to provide a service. This is the case of SFCs, of which IMS
can be considered a particular realization. Other examples
include: virtualized Evolved Packet Core (vEPC) solutions
able to interoperate in a chained fashion with SDN components
[19], service chain deployments across virtual data centers
[20], and virtual Mobility Management Elements (vMMEs)
organized as SFCs traversed by signaling processing flows [21].
Across such a chained scheme, the availability analysis must
consider the features of each single node which, if affected by
a failure event, disrupts the whole chain network flow.
Accordingly, we advance the following main contribu-
tions: i) We propose a detailed availability characterization
of softwarized network chains by exploiting two techniques:
Reliability Block Diagrams (RBD) to describe the high-level
interconnections among concatenated nodes, and Stochastic
Reward Networks (SRN) to model the internal structure
of each node from a probabilistic point of view; ii) We
conduct a performability (availability plus performance) as-
sessment of an exemplary softwarized chain constituted by
an IP Multimedia Subsystem deployed in a container-based
setting, namely, a containerized IMS (cIMS), where three
containerized schemes have been compared and discussed in
detail; moreover, through a sensitivity analysis, we evaluate the
robustness of cIMS with respect to deviations of some external
parameters from their nominal values due, e.g., to designer’s
uncertainty on failure/repair mean times; iii) We devise an
algorithm (nicknamed OptChains+) in charge of first evaluating
the RBD/SRN models associated with cIMS deployments (or
settings) and then pinpointing the settings which achieve, at the
same time, minimum cost and high availability under specific
performance criteria. Finally, an experimental testbed based
on the Clearwater project has been deployed to execute stress
tests aimed at deriving the relevant workload parameters.
Over the last few years, industry and academia have shown a notable
interest in refining techniques and methods for evaluating the
performance, reliability, and availability of novel communication
systems based on cloud models [22]. In particular, the
attention paid to Service Level Agreements (namely, the
performance indicators that operators must guarantee) encourages
ever deeper analyses of the performability of the systems,
where performance and availability are considered in a unified
manner ([23], [24]).
To perform a broad-range comparison with the related
literature, we select assorted criteria to highlight the
differences and novelties introduced along several directions. The first
criterion pertains to the adoption of the state-space model, namely
the SRN. With respect to classic Continuous-Time Markov
Chain (CTMC) models (adopted for example in [25], [26], [27]
to characterize the availability of cloud-based infrastructures),
SRN helps to overcome the uncontrollable state-space growth
issue arising when modeling real-world complex systems.
A second criterion involves the completeness of the
availability modeling, whereas many works focus only on
selected aspects (e.g. only failures). Examples include [28],
where some open-source and commercial systems have been
considered for the comparison, but without proposing a
failure/repair mathematical model. Similarly, the stochastic
model proposed in [29], dealing with aspects of an IaaS
infrastructure by adopting the SRN methodology, does not include
a failure/repair characterization. Other works are focused on
availability analyses of compositions of virtualized elements that
realize an SFC infrastructure. The authors in [30], for instance,
analyze availability issues of an SFC with respect to the
minimum number of backup VNFs to deploy. The proposed
model addresses (for the sake of simplicity) only failure
events but not repair actions. A repair model is also missing
in [31], where a framework for the reliability evaluation of a
virtualized deployment has been proposed. Similarly, in [32]
the problem of distributing VNF replicas between primary and
backup paths so as to maximize the SFC’s availability has
been tackled via a heuristic algorithm, without taking into account
failure/repair models.
A third criterion regards the broader vision offered by the
performability analysis w.r.t. classic availability assessments
which neglect performance aspects. Along this line we cite
[33], where the authors carry out an availability analysis
of micro-service oriented architectures inspired by the Google
Kubernetes service, and where performability is left
as future work. Performability analysis is also lacking in [34],
where the Stochastic Petri Net framework is adopted, and
where a cloud server, a load balancer, and a database distribute
different requests across VMs to realize a
Disaster-Recovery-as-a-Service paradigm.
The present work, which follows the lines traced in recent
works by the same authors ([35], [36]), advances with respect
to the existing literature along two directions: first, it offers a
high-availability modeling of a cIMS architecture that, being
the current state of the art in the telco world, has not
been characterized yet at a sufficient level of detail (to the best
of the authors’ knowledge); second, it presents a performability
assessment, not to be found in most related literature, that
may be crucial when coping with service management
trade-offs. Moreover, besides the current application to cIMS, the
tool developed here can be more generally adapted to novel
infrastructures implemented via series-parallel arrangements,
as is the case of redundant SFCs.
The IP Multimedia Subsystem was originally conceived
as a framework for accessing a large number of multimedia
IP-based facilities with guaranteed quality of service [37].
Nowadays, IMS is becoming crucial in revitalizing telco
infrastructures, by enabling, for example, strong interoperation
of Voice over LTE (VoLTE) across network operators’
boundaries, and supporting advanced services such as video calling,
HD voice, web messaging and enriched communications.
Within the IMS domain, the signaling flows are regulated
by Call Session Control Function (CSCF) servers by means
of the SIP protocol. The CSCF functionalities are shared
among three servers. The Proxy CSCF (P-CSCF) is a SIP
proxy, and basically acts as an interface between the user
equipment and the IMS domain. The Interrogating CSCF
(I-CSCF) forwards SIP requests or responses within the domain.
The Serving CSCF (S-CSCF) is in charge of performing
some core functions such as session and routing control and user
registration management. Another important node is the Home
Subscriber Server (HSS), an evolved database accommodating
users’ profiles that can be retrieved through a specific protocol
called Diameter.
Usually, the SIP flow across an IMS framework follows a
predefined path. A classic example is the Registration
procedure shown in Figure 1: the user device sends a REGISTER
message to P-CSCF (1) in order to get access to the IMS
domain; the REGISTER message is transferred to I-CSCF
(2) that, in turn, retrieves from the HSS the address of the
appropriate S-CSCF in charge of managing the SIP session. Such
a query/response procedure is managed by means of two
messages: User Authentication Request (UAR) (3), and User
Authentication Answer (UAA) (4). Once the REGISTER message
arrives at the S-CSCF (5), the latter queries the HSS for the user
profile through another pair of messages: Message
Authentication Request (MAR) (6) and Message Authentication
Answer (MAA) (7). If the procedure finishes correctly, the
S-CSCF transmits a 200 OK message to the user device (8), (9),
(10), and the registration is completed.
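The simplified Registration procedure can be written down as an ordered list of hops, which makes explicit which nodes a request has to traverse. The following is a minimal sketch: the `Step` type and helper names are hypothetical, and two details are assumptions not stated in the text, namely that step (5) is sent by the I-CSCF and that the 200 OK of steps (8) to (10) retraces the request path hop by hop.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    number: int
    sender: str
    receiver: str
    message: str


def registration_flow() -> List[Step]:
    """Simplified IMS Registration procedure, steps (1) to (10)."""
    return [
        Step(1, "UE", "P-CSCF", "REGISTER"),
        Step(2, "P-CSCF", "I-CSCF", "REGISTER"),
        Step(3, "I-CSCF", "HSS", "UAR"),
        Step(4, "HSS", "I-CSCF", "UAA"),
        # Assumption: the I-CSCF forwards REGISTER to the selected S-CSCF.
        Step(5, "I-CSCF", "S-CSCF", "REGISTER"),
        Step(6, "S-CSCF", "HSS", "MAR"),
        Step(7, "HSS", "S-CSCF", "MAA"),
        # Assumption: the 200 OK travels back along the request path.
        Step(8, "S-CSCF", "I-CSCF", "200 OK"),
        Step(9, "I-CSCF", "P-CSCF", "200 OK"),
        Step(10, "P-CSCF", "UE", "200 OK"),
    ]


def nodes_traversed(flow: List[Step]) -> List[str]:
    """Distinct IMS nodes touched by the flow, in order of first appearance."""
    seen: List[str] = []
    for step in flow:
        for node in (step.sender, step.receiver):
            if node != "UE" and node not in seen:
                seen.append(node)
    return seen


if __name__ == "__main__":
    for s in registration_flow():
        print(f"({s.number}) {s.sender} -> {s.receiver}: {s.message}")
    print("Nodes required:", nodes_traversed(registration_flow()))
```

Every node returned by `nodes_traversed` must be working for the registration to complete, which is why the chain is later treated as a series connection.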
A. Containerized IMS infrastructure
We consider a deployment of an IMS framework in
a container-based domain such as the recently introduced
Docker [38], RKT [39], OpenVZ [40].
With respect to VMs, containers exhibit some differences.
First, containers share the host operating system, whereas a
VM has its own guest operating system, resulting in a heavier
structure (in terms of disk/memory utilization, start-up time,
etc.). Second, a VM exhibits strong isolation at the host
kernel level (since the operating system is not shared), thus
offering stronger security than its container counterpart.
Finally, containers are more flexible in terms of portability
(since they do not have a separate operating system), whereas
VMs require additional effort during porting operations,
especially when the hosting platforms are different.
Fig. 1: Registration procedure in IMS domain (simplified).
Market leaders such as Amazon Web Services and Google
Container Engine typically take advantage of both
virtualization and containerization by designing infrastructures where
containers run on top of VMs [41]. Actually, the possibilities
of combining VMs with containers strongly depend on the
specific policy adopted by the cloud provider. A useful
taxonomy is provided in [33], where two common schemes
stand out: the first is the homogeneous one, where several
instances of a single kind of container run on top of one and
the same VM. Such a scheme is well suited for public cloud
environments where, for security reasons, it is preferable not to
share VMs among different users or tenants. The second one is
the heterogeneous scheme, where different kinds of containers
are allowed to share the same VM. It is the case of private
cloud environments (see for example [42]), where there are no
strict security requirements, and where it is possible to design
scalable redundant policies (by replicating the entire VM) to
cope with failure events.
Specifically, we consider Docker-based implementations of
both schemes that share a common five-layer arrangement
referred to as a Containerized Network Replica (CNR),
consisting of (see Fig. 2): i) an infrastructure layer that generically
embodies hardware components (HW) such as CPU, RAM,
power supplies; ii) a hypervisor (HPV) acting as an interface
between hardware and upper layers; iii) a virtual machine
(VM) layer that provides a wrapper for the Docker
environment; iv) a Docker daemon² (DCK) that offers a runtime
environment to handle containers; v) a Container (CNT) which
embeds the specific software functionality to be provided (e.g.
Proxy, Serving, etc.), and represents the basic element
delivering/managing IMS sessions. Since a DCK can handle more
than one CNT, and since an HPV can handle different VMs,
different CNR schemes are possible. As illustrated in Fig.
2, scheme (a) accounts for a homogeneous implementation
of a CNR hosting only one kind of instance (e.g. P-CSCF
or P instance); scheme (b) represents a co-located
homogeneous implementation where different container instances
(e.g. I-CSCF (I) and HSS (H) instances) are deployed over
different dockers and virtual machines, although they share
the same underlying infrastructure; scheme (c) accounts for
a heterogeneous implementation with a co-located
deployment of different containers; finally, scheme (d)
represents a mixed heterogeneous deployment with different
dockers and virtual machines that is not really used in practice,
but is included for the sake of completeness. We remark that
schemes (b) and (c) represent different kinds of co-location
(that in real IMS settings typically involves HSS and I-CSCF;
see [43]): the former implements co-location at the infrastructure
level, whereas the latter implements co-location at the container
level. It is useful to define a Containerized Network Function
(CNF) as an ensemble of CNRs deployed to provide a
specific IMS functionality. Otherwise stated, a CNF is a logical
abstraction of a cIMS node composed of one or more CNRs.
Thus, in the sequel, the terms CNF and cIMS node may be
used interchangeably.
² Hereafter, for the sake of brevity, docker daemon will be simply referred
to as docker.
Fig. 2: Different schemes of CNRs. Schemes (a) and (b)
realize pure and co-located homogeneous deployments,
respectively. Schemes (c) and (d) realize co-located and
mixed heterogeneous deployments, respectively. Note that
co-location in (b) is intended at infrastructure level, whereas
in (c) is intended at container level.
In Fig. 3 we outline an exemplary mapping between
functionalities implemented through CNFs (P/I/S-CSCF and
HSS), and the corresponding physical deployments realized
via CNRs. The dashed rectangle surrounding I-CSCF and HSS
indicates that such nodes are typically co-located, meaning that
the corresponding CNRs are implemented through schemes
2(b) and/or 2(c).
The introduction of the CNF representation provides some
degrees of freedom. First, a single CNF can be realized by
means of multiple CNRs that, in principle, can be distributed
geographically. Second, different CNRs belonging to the same
CNF can have a different number of containers, since a CNR
has limited resources and can support only up to a certain number of
cIMS sessions. Finally, different CNRs belonging to the same
CNF can be deployed according to different schemes (see Fig.
2), as occurs for HSS that, in this example, is deployed by
means of two CNRs: a co-located homogeneous one (CNR 3)
shared with I-CSCF, and a pure homogeneous one (CNR 4).
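The CNF/CNR mapping just described can be captured with a small data structure. The sketch below is illustrative only: the class and field names are hypothetical, and the container counts in the usage example are made-up values, not figures from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Scheme(Enum):
    """CNR deployment schemes, labeled as in Fig. 2."""
    HOMOGENEOUS = "a"
    CO_LOCATED_HOMOGENEOUS = "b"
    CO_LOCATED_HETEROGENEOUS = "c"
    MIXED_HETEROGENEOUS = "d"


@dataclass
class CNR:
    """One physical replica: HW, HPV, VM(s), DCK(s) and the hosted containers."""
    name: str
    scheme: Scheme
    containers: Dict[str, int]  # IMS function -> number of container replicas


@dataclass
class CNF:
    """Logical cIMS node realized by one or more CNRs."""
    function: str
    replicas: List[CNR] = field(default_factory=list)

    def total_containers(self) -> int:
        """Containers of this function summed over all CNRs of the CNF."""
        return sum(r.containers.get(self.function, 0) for r in self.replicas)


if __name__ == "__main__":
    # Hypothetical container counts, chosen only for illustration.
    cnr3 = CNR("CNR 3", Scheme.CO_LOCATED_HOMOGENEOUS, {"I-CSCF": 2, "HSS": 1})
    cnr4 = CNR("CNR 4", Scheme.HOMOGENEOUS, {"HSS": 3})
    hss = CNF("HSS", [cnr3, cnr4])
    print(hss.function, "containers:", hss.total_containers())
```

Keeping the container count per CNR, rather than per CNF, reflects the fact that each CNR can host a different number of containers and follow a different scheme.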
Fig. 3: A Containerized Network Function (CNF) represents
a logical abstraction of an IMS functionality (P/I/S-CSCF,
HSS) and can be deployed through one or more CNRs.
Fig. 4: Interconnections among nodes in a (containerized) IMS
infrastructure (homogeneous deployment case).
We now demonstrate how a combination of RBD and SRN
formalisms can help the availability analysis of a cIMS. The
former makes it possible to interpret a cIMS infrastructure in terms of
high-level interconnections among nodes, as illustrated in Fig.
4, which reflects the sequential connection of nodes depicted in
Fig. 1. As reported in Fig. 4, each CNR can host a different
number of containers, thus realizing an effective redundancy
only up to the DCK layer. This is due to the fact that specific
availability requirements can be met through a variable number
of containers per CNR.
On the other hand, the SRN methodology, stemming from
Markov Reward Models [44], [45], makes it possible to describe the
interactions occurring among the various layers of a CNR
which composes a generic cIMS node. More specifically, we
adopt its graphical description in terms of bipartite directed
graphs where places (depicted as circles) account for specific
conditions (e.g. nodes up/down), and transitions (depicted
as rectangles) represent the actions (e.g. a node fails or
is repaired). Inside a place, tokens (represented by dots or
numbers) characterize holding conditions. In case of a CNT
layer, one (or more) tokens lying in the “up” place indicate
one or more working containers, whereas for the remaining
layers (DCK, VM, HPV, HW) one token in the “up” place
indicates a working layer. When a failure/repair event related
to a specific CNR layer occurs (namely, a transition is fired),
a token (or more than one token in case of a CNT layer) is
transferred from the source place to a destination place.
In an SRN, transition times are assumed to be exponential
random variables (a common assumption in reliability and
availability analyses), with $\lambda$ denoting the failure rate, and $\mu$
the repair rate. Solving an SRN amounts to evaluating the reward
function, defined as a non-negative random process associated
with some dependability metrics (among them, the availability).
Let $Y(t)$ be the reward function that is equal to 1 when
the system is working at time $t$, and to 0 otherwise. The
instantaneous availability can be expressed as [44]
$$A(t) = P\{Y(t) = 1\} = E[Y(t)] = \sum_{i \in I} r_i \, p_i(t), \tag{1}$$
where $I$ is the set of markings, namely, the set of feasible
token distributions, $r_i$ (commonly referred to as reward rate)
is the value of $Y(t)$ in marking $i$, and $p_i(t)$ is the corresponding
probability. The set $I$ can be split into a subset of “up” states
($r_i = 1$), and a subset of “down” states ($r_i = 0$).
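As a worked instance of the reward-weighted sum, consider the simplest possible case: a single component with one “up” and one “down” marking, exponential failure rate λ and repair rate µ, starting in the up state. This two-state chain is a textbook example chosen here for illustration, not the cIMS model itself, and the function name is hypothetical.

```python
import math


def instantaneous_availability(lam: float, mu: float, t: float) -> float:
    """A(t) for a two-state (up/down) component that is up at t = 0.

    Uses the closed-form transient solution of the two-state CTMC and
    then evaluates the reward-weighted sum over the two markings.
    """
    total = lam + mu
    p_up = mu / total + (lam / total) * math.exp(-total * t)
    probabilities = {"up": p_up, "down": 1.0 - p_up}
    rewards = {"up": 1.0, "down": 0.0}  # r_i = 1 in "up", 0 in "down"
    return sum(rewards[m] * probabilities[m] for m in rewards)


if __name__ == "__main__":
    lam, mu = 0.001, 0.5  # hypothetical rates, per hour
    for t in (0.0, 1.0, 10.0, 1000.0):
        print(f"A({t}) = {instantaneous_availability(lam, mu, t):.6f}")
```

As t grows, the value tends to µ/(λ+µ), which is the steady-state availability of this elementary component.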
A. Availability model of homogeneous scheme
Figure 5 (part included in dashed rectangle A) describes
the SRN model of the homogeneous scheme depicted in Fig.
2(a) implementing a generic CNR. Places $P_{upCNT}$ [$P_{dnCNT}$],
$P_{upDCK}$ [$P_{dnDCK}$], $P_{upVM}$ [$P_{dnVM}$], $P_{upHPV}$ [$P_{dnHPV}$],
and $P_{upHW}$ [$P_{dnHW}$] take into account the working [failure]
conditions of container instance, docker daemon, virtual
machine, hypervisor, and hardware, respectively. Note that each
place contains only one token (indicated by number 1), except
for the place $P_{upCNT}$ that contains $n_k$ tokens, which denotes
the possibility of having more replicas of a single container
instance.
Transitions $T_{fCNT}$ [$T_{rCNT}$], $T_{fDCK}$ [$T_{rDCK}$], $T_{fVM}$
[$T_{rVM}$], $T_{fHPV}$ [$T_{rHPV}$], and $T_{fHW}$ [$T_{rHW}$] denote failure
[repair] activities characterizing containers, docker, virtual
machine, hypervisor, and hardware, respectively. Such transitions
(depicted as unfilled rectangles) are called “timed” transitions
and, as previously said, are characterized by exponentially
distributed times. If a timed transition is “marking-dependent” (in
Fig. 5 we insert the “#” symbol to denote such condition), its
effective rate is multiplied by the number of tokens available in
the pertinent place. Conversely, transitions $t_{CNT}$, $t_{DCK}$, $t_{VM}$,
and $t_{HPV}$ (represented by filled and thin rectangles) are called
“immediate” transitions and account for actions occurring in
a zero-length time interval.
The time-evolution of the SRN in Fig. 5 (part A) can
be analyzed by starting from the initial working condition,
where $n_k$ tokens are located in place $P_{upCNT}$, while a single
token is present in all the remaining “up” places. When a
single container failure occurs (e.g. an uncontrolled reboot of a
container instance), transition $T_{fCNT}$ is fired and one token in
$P_{upCNT}$ is moved to place $P_{dnCNT}$. As a consequence, $n_k - 1$
tokens remain in $P_{upCNT}$. Conversely, once the container
has been repaired, $T_{rCNT}$ is fired and the token comes
back to $P_{upCNT}$.
Fig. 5: SRN-based model representative of a generic CNR deployed according to: the homogeneous scheme (part A); the
homogeneous co-located scheme (parts A+B); the heterogeneous co-located scheme (parts A+D); the heterogeneous mixed
scheme (parts A+B+C+D).
Now, let us consider the case of a docker layer failure.
The transition $T_{fDCK}$ is fired and the token is moved from
$P_{upDCK}$ to $P_{dnDCK}$. Notice that, when the docker layer fails,
all container instances that need the underlying docker layer
to be up and running become inactive. Such an issue is taken
into account through an inhibitory arc (depicted as a segment
between $P_{upDCK}$ and $t_{CNT}$ with a little circle close to the
latter) that, in case of a docker failure, forces $t_{CNT}$ to be fired.
When the docker gets repaired, $T_{rDCK}$ is fired, and two actions
occur: first, the token passes from $P_{dnDCK}$ to $P_{upDCK}$; then,
the inhibitory arc between $P_{dnDCK}$ and $T_{rCNT}$ is disabled
and, consequently, the $n_k$ tokens are ready to be transferred
from $P_{dnCNT}$ to $P_{upCNT}$.
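The token movements described for the container and docker layers can be replayed with a toy marking. This is a sketch of our reading of the text rather than the authors' tool: class and function names are hypothetical, timing is ignored, the immediate transition is assumed to flush all working containers at once, and container repairs are assumed to move one token at a time.

```python
from dataclasses import dataclass


@dataclass
class Marking:
    up_cnt: int      # tokens in P_upCNT (working containers)
    dn_cnt: int = 0  # tokens in P_dnCNT (failed or inactive containers)
    up_dck: int = 1  # token in P_upDCK
    dn_dck: int = 0  # token in P_dnDCK


def fire_container_failure(m: Marking) -> None:
    """T_fCNT: one container token moves from the up place to the down place."""
    if m.up_cnt == 0:
        raise ValueError("T_fCNT not enabled: no working container")
    m.up_cnt -= 1
    m.dn_cnt += 1


def fire_docker_failure(m: Marking) -> None:
    """T_fDCK, followed by the immediate transition t_CNT."""
    if m.up_dck == 0:
        raise ValueError("T_fDCK not enabled: docker already down")
    m.up_dck, m.dn_dck = 0, 1
    # P_upDCK is now empty, so its inhibitory arc no longer blocks t_CNT.
    # Assumption: t_CNT moves every remaining working container down.
    m.dn_cnt += m.up_cnt
    m.up_cnt = 0


def container_repair_enabled(m: Marking) -> bool:
    """T_rCNT is inhibited while a token sits in P_dnDCK."""
    return m.dn_cnt > 0 and m.dn_dck == 0


def fire_container_repair(m: Marking) -> None:
    """T_rCNT: one container token returns to the up place (assumed one at a time)."""
    if not container_repair_enabled(m):
        raise ValueError("T_rCNT not enabled")
    m.dn_cnt -= 1
    m.up_cnt += 1


def fire_docker_repair(m: Marking) -> None:
    """T_rDCK: the docker token returns to P_upDCK."""
    if m.dn_dck == 0:
        raise ValueError("T_rDCK not enabled: docker is up")
    m.up_dck, m.dn_dck = 1, 0
```

Starting from `Marking(up_cnt=3)`, a docker failure empties the up place of the container layer, and container repairs stay disabled until `fire_docker_repair` has been called.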
Similar behaviors occur in case of virtual machine,
hypervisor, and hardware failures/repairs. It is worth noting that the
only layer without an immediate transition is the hardware
layer. Indeed, since hardware is the lowest layer of a generic
cIMS node structure, no further underlying dependencies have
to be taken into account. Let us now define two quantities
useful for the forthcoming performability analysis: the demand $W$,
that is, the required system performance in terms of concurrent
IMS sessions; and the capacity $c_j$, namely the maximum number
of concurrent IMS sessions a container belonging to CNF $j$
can manage. Therefore, the reward rate in marking $i$ is
$$r_i(j) = \begin{cases} 1 & \text{if } \sum_{k=1}^{h} \#P^{(k)}_{upCNT} \cdot c_j \geq W, \\ 0 & \text{otherwise,} \end{cases} \tag{2}$$
where $h$ is the number of CNRs forming CNF $j$, and “#”
refers to the number of tokens³. By defining $\gamma_j = W/c_j$ as the
normalized performance level, (2) can be recast as
$$r_i(j) = \begin{cases} 1 & \text{if } \sum_{k=1}^{h} \#P^{(k)}_{upCNT} \geq \gamma_j, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$
Finally, in the limit for $t \to \infty$, we get the steady-state
availability for cIMS node $j$:
$$A_j = \sum_{i \in I} r_i(j) \cdot p_{ij}, \tag{4}$$
where $r_i(j)$ is derived from (3), and $p_{ij}$ is the steady-state
probability given by $p_{ij} = \lim_{t \to +\infty} p_{ij}(t)$ (where $p_{ij}(t)$ is
the instantaneous probability of node $j$ being in marking $i$).
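To make the reward-weighted sum concrete, the sketch below evaluates it for a deliberately reduced model: only the container layer is considered (all lower layers are assumed always up), and the n containers of a node are assumed to fail and be repaired independently with rates λ and µ. Under these assumptions, which are ours and much simpler than the full SRN, the steady-state number of working containers is binomial. The function name is hypothetical.

```python
from math import ceil, comb


def node_availability(n: int, lam: float, mu: float, gamma: float) -> float:
    """Steady-state availability of one node, container layer only.

    n      total number of containers of the node
    lam    failure rate of a single container
    mu     repair rate of a single container
    gamma  normalized performance level W / c_j
    """
    a = mu / (lam + mu)            # steady-state availability of one container
    k_min = max(0, ceil(gamma))    # fewest working containers that give reward 1
    # Sum the marking probabilities over all markings whose reward rate is 1.
    return sum(
        comb(n, k) * a**k * (1.0 - a) ** (n - k)
        for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    lam, mu = 1.0, 9.0  # hypothetical rates giving a = 0.9 per container
    for n in (1, 2, 3):
        print(n, "containers:", round(node_availability(n, lam, mu, gamma=1.0), 6))
```

With a per-container availability of 0.9 and γ = 1, one container yields 0.9 and two containers yield 0.99, which illustrates how redundancy raises availability at a fixed performance level.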
From the single cIMS node availability in (4), it is possible to derive the overall steady-state availability for the homogeneous scheme as
$$ A_{cIMS} = \prod_{j \in \{P,S,I,H\}} A_j. \tag{5} $$
The product in (5) stems from the RBD-like modeling of Fig. 4, representative of a series connection among cIMS nodes: the overall cIMS is available only if every node is available.
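The chain from (3) to (5) can be illustrated with a minimal Python sketch. It assumes that the steady-state probabilities of the markings are already available (in the paper they come from the SRN evaluation); the function names and the toy distribution below are hypothetical and serve only as an illustration.

```python
from math import prod


def reward_rate(up_tokens, gamma):
    """Reward of one marking as in (3): 1 if the working containers,
    summed over the CNRs of the node, reach gamma; 0 otherwise."""
    return 1 if sum(up_tokens) >= gamma else 0


def node_availability(marking_probs, gamma):
    """Steady-state availability of one node as in (4): sum over the
    markings of reward rate times steady-state probability."""
    return sum(reward_rate(m, gamma) * p for m, p in marking_probs.items())


def chain_availability(node_availabilities):
    """Series (RBD) composition as in (5): product of node availabilities."""
    return prod(node_availabilities)


# Hypothetical steady-state distribution for a node made of two CNRs:
# each key holds the number of working containers per CNR.
toy_probs = {(2, 2): 0.90, (2, 1): 0.08, (1, 0): 0.02}
a_node = node_availability(toy_probs, gamma=3)

# Four identical nodes in series (an assumption made for brevity).
a_chain = chain_availability([a_node] * 4)
```

Only the markings with at least $\gamma$ working containers contribute to the sum, which is why the marking (1, 0) is ignored in the toy example.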
B. Availability model of homogeneous co-located scheme
Let us now consider the availability model of a CNR deployed according to a homogeneous co-located scheme, as shown in Fig. 2(b). We recall that such a co-location is realized by sharing the infrastructural level (hypervisor/hardware). The corresponding SRN is given by parts A and B in Fig. 5, where, for the sake of simplicity, we consider only two different co-located containers and two different docker and virtual machine layers. Comparing this co-located scheme against the homogeneous scheme of Fig. 5 (part A), we can observe that the main structure remains unaltered, whereas a new piece (B) typifies the presence of co-located elements. Places and transitions characterizing such a new piece are distinguished by means of a prime superscript (e.g. $P'_{upCNT}$, $T'_{fCNT}$, etc.).
³ In the standard Petri Net terminology there is a slight abuse of notation, as the symbol # denotes both the number of tokens and the marking-dependent transitions in SRN graphical representations.
Moreover, we can notice the presence of two further inhibitory arcs connecting the two parts of the graph. The first one, between $P_{upHPV}$ and $t'_{VM}$, accounts for the fact that, if the hypervisor fails, the co-located VM (and, in turn, the co-located DCK and CNT) cannot be operative; thus, $t'_{VM}$ is fired and the only token in $P'_{upVM}$ is moved to $P'_{dnVM}$.
The second inhibitory arc, between $P_{dnHPV}$ and $T'_{rVM}$, accounts for the fact that the token cannot be moved from $P'_{dnVM}$ to $P'_{upVM}$ until the token in $P_{dnHPV}$ is transferred to $P_{upHPV}$, since the co-located virtual machine cannot be restored until the hypervisor gets repaired.
Similarly, given marking $i$, it is possible to define a new reward rate as
$$ r'_i(j_1,j_2) = \begin{cases} 1 & \text{if } \sum_{k=1}^{h} \#P^{(k)}_{upCNT} \ge \gamma_{j_1} \,\wedge\, \sum_{k=1}^{h} \#P'^{(k)}_{upCNT} \ge \gamma_{j_2}, \\ 0 & \text{otherwise,} \end{cases} \tag{6} $$
where the symbol $\wedge$ denotes a logical AND operator between the two conditions. The above expression can be interpreted as a generalization of (3) to the case of two co-located containers with performance capacities $c_{j_1}$ and $c_{j_2}$ belonging to co-located nodes $j_1$ and $j_2$ that, in our case, are I-CSCF and HSS.
Accordingly, the steady-state availability pertinent to a pair of co-located nodes can be expressed as
$$ A'_{j_1 j_2} = \sum_{i} r'_i(j_1,j_2) \cdot p'_{ij}, \tag{7} $$
where $r'_i(j_1,j_2)$ is given by (6), and $p'_{ij}$ is the corresponding steady-state probability. Considering that, in a homogeneous co-located scheme, the overall cIMS infrastructure is composed of two non-colocated nodes (typically P-CSCF and S-CSCF) and two co-located nodes (typically I-CSCF and HSS), the overall steady-state availability is:
$$ A'_{cIMS} = A'_{j_1 j_2} \cdot \prod_{j} A_j, \tag{8} $$
with $(j, j_1, j_2) \in \{P,S,I,H\}$. The product of $A_j$ (derived from (4)) spans across the two non-colocated nodes, whereas $A'_{j_1 j_2}$ (from (7)) takes into account the remaining co-located nodes.
C. Availability model of heterogeneous co-located scheme
The next model refers to a heterogeneous co-located scheme of a CNR, and is depicted in Fig. 5 (parts included in dashed rectangles A and D). Such a scheme represents a lightweight co-location, since the whole infrastructure from docker to hardware can host different kinds of containers. In this case, just one new element, represented by another container, has been introduced (part D in the pertinent SRN). Similar to the previous case, the inhibitory arc between $P_{upDCK}$ and $t''_{CNT}$ forces the co-located container to fail in case of a docker failure, whereas the inhibitory arc between $P_{dnDCK}$ and $T''_{rCNT}$ prevents the co-located containers from working again until the docker gets repaired.
For a given marking $i$, the reward rate admits the following expression:
$$ r''_i(j_1,j_2) = \begin{cases} 1 & \text{if } \sum_{k=1}^{h} \#P^{(k)}_{upCNT} \ge \gamma_{j_1} \,\wedge\, \sum_{k=1}^{h} \#P''^{(k)}_{upCNT} \ge \gamma_{j_2}, \\ 0 & \text{otherwise.} \end{cases} \tag{9} $$
Notice that such a reward rate is similar to (6), except for the fact that, now, the co-location is realized at the container level; thus, $P''_{upCNT}$ intervenes in (9). Accordingly, the steady-state availability expression turns out akin to the one derived for the previous homogeneous case, viz.
$$ A''_{cIMS} = A''_{j_1 j_2} \cdot \prod_{j} A_j, \tag{10} $$
with $(j, j_1, j_2) \in \{P,S,I,H\}$. Again, the product of $A_j$ (derived from (4)) spans across the two non-colocated nodes (P-CSCF and S-CSCF), whereas $A''_{j_1 j_2}$ (built by starting from (9) as for (7)) accounts for the remaining co-located nodes $j_1$ and $j_2$, namely, I-CSCF and HSS.
D. Availability model of heterogeneous mixed scheme
This last SRN model pertains to a heterogeneous mixed case, that is, a combination of homogeneous co-located and heterogeneous co-located schemes; referring to Fig. 5, it comprises parts A, B, C, and D. The introduction of part C (logically connected to part B) guarantees the symmetry with part D (logically connected to part A) and allows modeling the behavior of two separated sub-structures (each composed of container(s), docker, and VM) which share the same underlying infrastructure. The two inhibitory arcs (one between $P'_{upDCK}$ and $t'''_{CNT}$, and another between $P'_{dnDCK}$ and $T'''_{rCNT}$) admit the same interpretation, mutatis mutandis, offered for the heterogeneous case. Let us now derive expressions for the reward rate and the steady-state availability. Given marking $i$, we express the reward rate as
$$ r'''_i(j_1,j_2,j_3,j_4) = \begin{cases} 1 & \text{if } \left\{ \sum_{k} \#P^{(k)}_{upCNT} \ge \gamma_{j_1} \,\wedge\, \sum_{k} \#P''^{(k)}_{upCNT} \ge \gamma_{j_2} \right\} \,\wedge\, \left\{ \sum_{k} \#P'^{(k)}_{upCNT} \ge \gamma_{j_3} \,\wedge\, \sum_{k} \#P'''^{(k)}_{upCNT} \ge \gamma_{j_4} \right\}, \\ 0 & \text{otherwise.} \end{cases} \tag{11} $$
Notice that the above expression can be derived from (3) considering two pairs of co-located containers with capacities $c_{j_1}$ (with $j_1$ representing I-CSCF), $c_{j_2}$ (with $j_2$ representing HSS), $c_{j_3}$ (with $j_3$ representing P-CSCF), and $c_{j_4}$ (with $j_4$ representing S-CSCF), in accordance with Fig. 2(d). Hence, the pertinent steady-state availability can be expressed as
$$ A'''_{cIMS} = \lim_{t \to \infty} A'''_{cIMS}(t) = \sum_{i} r'''_i(j_1,j_2,j_3,j_4) \cdot p'''_{ij}, \tag{12} $$
with $(j_1, j_2, j_3, j_4) \in \{P,S,I,H\}$. The reward function $r'''_i(j_1,j_2,j_3,j_4)$ is given by (11) and $p'''_{ij}$ is the corresponding steady-state probability.
In this section, after a formal statement of the problem
of searching for the optimal cIMS configurations, we present
an automated procedure designed to render the performability
analysis more efficient.
A. Problem Formalization
Let setting $S$ be a generic deployment of a cIMS infrastructure composed of a certain number of CNRs. Our goal is to find the settings satisfying high availability requirements at minimal cost (since CNRs can be variously combined among them and with different schemes, the optimum could be achieved by more than one setting). This optimization problem can be formalized as follows.
Letting $E_j$ be the cost (expenditure) of node $j$, composed of $h$ CNRs, the overall cost of a cIMS setting is
$$ E_{cIMS}(S) = \sum_{j} E_j. \tag{13} $$
Letting $R = \{S : A_{cIMS}(S) \ge A_0\}$ be the ensemble of settings satisfying a steady-state availability requirement $A_0$, the formal solution of the problem amounts to:
$$ S^{*} = \arg\min_{S \in R} E_{cIMS}(S). \tag{14} $$
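The selection rule above can be sketched in a few lines of Python. The function name and the tuple layout are hypothetical; the two co-located settings reuse the availability and cost values quoted later in the text for the case γ=3, while the third entry is an invented infeasible setting added only to exercise the constraint.

```python
def cheapest_feasible(settings, a0):
    """Return every setting of minimal cost among those whose
    availability is at least a0.

    settings: iterable of (name, availability, cost) tuples.
    """
    feasible = [s for s in settings if s[1] >= a0]
    if not feasible:
        return []
    best_cost = min(cost for _, _, cost in feasible)
    return [s for s in feasible if s[2] == best_cost]


candidates = [
    ("COL-S2", 0.9999925, 44),   # values quoted in the text (case gamma = 3)
    ("COL-S3", 0.9999911, 44),   # values quoted in the text (case gamma = 3)
    ("cheap-but-weak", 0.9999, 30),  # hypothetical, violates the constraint
]
optimal = cheapest_feasible(candidates, a0=0.99999)
```

Returning a list rather than a single element reflects the remark that the optimum could be achieved by more than one setting; a tie on cost can then be broken on availability or on other design preferences.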
We shall work under two assumptions. The first one concerns the cost computation/assignment of a cIMS system. Since a single cIMS node is composed of one or more CNRs, we assume that the cost of a single CNR is the sum of three dimensionless contributions: i) cost per container (CNT), embodying the software logic and licenses; ii) cost per docker and virtual machine (DCK+VM), which includes the operating system; and iii) cost per hypervisor and hardware (HPV+HW), representing the infrastructure cost. Each contribution is supposed to be equally priced, with a normalized cost amounting to 1. Such an assumption, in line with the pricing policy of top-player services such as Amazon AWS or Microsoft Azure, reflects the fact that software parts have a cost comparable with physical parts, since an extra amount due to licenses must be considered. Needless to say, the proposed analysis can be generalized by customizing (13) and by choosing different costs.
Fig. 6: Big picture of the procedure implementing OptChains+ to support the performability assessment.
The second assumption concerns the diversity between CSCF containers (P, S, I) and the HSS, as the latter implies an additional criticality due to the underlying database structure. This issue is taken into account by imposing that the HSS container provides one extra replica w.r.t. CSCF containers. In other words, we impose that $\gamma_{HSS} = \gamma_{CSCF} + 1$. To avoid overburdened notation, we will use $\gamma$ in place of $\gamma_{CSCF}$.
B. Performability analysis through OptChains+ algorithm
In our analysis, we face a combinatorial search problem across a huge number of possible redundancy schemes obtained by variously combining CNRs and pertinent containers. Our analysis is assisted by TimeNET [46], a powerful tool for SRN model evaluation, whose functionalities have been further enriched by means of a purposely designed Python-based external module⁴ implementing a multi-stage procedure (sketched in Fig. 6) which:
• automatically builds, replicates (to achieve redundancy), and evaluates SRN models per cIMS node on the basis of some parameters such as: desired scheme (homogeneous, heterogeneous, etc.), Mean Time to Failure (MTTF) $1/\lambda$ and Mean Time to Repair (MTTR) $1/\mu$ for various layers, desired steady-state availability target $A_0$ (0.99999 in our case), cost per layer, value of $\gamma$;
• automatically composes the series/parallel structures (settings) through the RBD formalism, along with the overall availability evaluation; at the same time, an extraction of feasible settings satisfying the desired constraints is performed.
⁴ Available on request.
Algorithm 1: OptChains+
 1  Initialize the vector R' containing all possible CNFs with various parameters (schemes, λ, µ, A0, costs, γ);
 2  for CNF ∈ R' do
 3    if #Container ≥ γ then
 4      SRN model evaluation (CNF)
 5      if A_CNF ≥ A0 then
 6        add CNF to R
 7      end
 8    end
 9  end
    Intermediate Input: R, g1, g2, g3, G
10  minCost ← inf
11  for p ∈ R_pcscf do
12    calculate E_p
13    if E_p > g1 · minCost then
14      continue;
15    end
16    for s ∈ R_scscf do
17      calculate E_s
18      if E_p + E_s > g2 · minCost
19         OR (A_p · A_s) < A0 then
20        continue;
21      end
22      if homogeneous then
23        for i ∈ R_icscf do
24          calculate E_i
25          if Σ_{k=p,s,i} E_k > g3 · minCost
26             OR Π_{k=p,s,i} A_k < A0 then
27            continue;
28          end
29          for h ∈ R_hss do
30            calculate E_h
31            if Σ_{k=p,s,i,h} E_k > G · minCost
32               OR Π_{k=p,s,i,h} A_k < A0 then
33              continue;
34            end
35            minCost ← min{minCost, E_cIMS}
36            save [cIMS, A_cIMS, E_cIMS]
37          end
38        end
39      end
40      else if co-located or heterog. then
41        for c ∈ R_colhet do
42          calculate E_c
43          if Σ_{k=p,s,c} E_k > G · minCost
44             OR Π_{k=p,s,c} A_k < A0 then
45            continue;
46          end
47          minCost ← min{minCost, E_cIMS}
48          save [cIMS, A_cIMS, E_cIMS]
49        end
50      end
51    end
52  end
The described procedure has been embedded into an algorithm dubbed OptChains+, whose pseudo-code is reported in Algorithm 1. The first part (lines 1–9) embodies the external call to TimeNET to build and evaluate SRN models for single CNFs (made of one or more CNRs, see Fig. 3), retaining only the (feasible) CNFs that satisfy a given availability constraint ($A_{CNF} \ge A_0$, line 5). The rationale behind this choice is to save computational resources for the evaluation of the final availability $A_{cIMS}$, which is obtained as the product of $A_{CNF}$ terms, one per node. We remark that such an external call (line 4) is just preparatory to obtaining the vector $R$ of feasible CNFs; thus, in case a different tool is used, the rest of OptChains+ remains unaltered. The second part of the algorithm aims at achieving a reduced number of cIMS settings (matching specific cost $E_{cIMS}$ and availability $A_{cIMS}$ criteria at the same time) for different schemes (homogeneous (lines 22–39), co-located/heterogeneous (lines 40–50)). The intermediate inputs for such a second part include: $R$, $\gamma$, and four weight factors ($g_1$, $g_2$, $g_3$, $G$) adopted to tune the pruning/searching process. The idea is to perform an exhaustive search with pruning, starting to cycle on the sub-vector $R_{pcscf}$, which includes all feasible Proxy-type CNFs (line 11). The algorithm prunes all the settings with a cost exceeding $g_1$ times the cost of cIMS. The variable minCost represents the whole cIMS cost calculated/updated within a cycle, and is initialized at line 10. Then, when analyzing the sub-vector $R_{scscf}$ (line 16), OptChains+ prunes all the settings whose total cost of Proxy and Serving-type CNFs exceeds $g_2$ times the cIMS cost, or whose availability product $A_p \cdot A_s$ is less than $A_0$. Similar logic holds for I-type containers (line 23) and H-type containers (line 29). At line 31, the weight factor $G$ indicates that an extra amount of settings (with a cost up to $G$ times the current minimum cost) is retained for backup. The final output is a vector gathering all the feasible cIMS settings along with their availability $A_{cIMS}$ and cost $E_{cIMS}$ (line 36 for homogeneous schemes, and line 48 for co-located/heterogeneous schemes). We remark that any reasonable criterion can be pursued to select the weight factors. The practical rule we adopted is based on the assumption that each of the four nodes is worth 1/4 of the whole cIMS deployment. Hence, the algorithm starts to prune all settings whose P-CSCF cost exceeds twice this share, thus $g_1 = 1/2$. Such a “conservative” reasoning is repeated further ahead in OptChains+, so as to obtain the rescaled weight factors $g_2 = 3/4$ and $g_3 = 1$. Ultimately, the value of $G$ is set to 1.15, implying that we keep more settings than needed (precisely, settings whose cost exceeds the current minimum by up to 15%), with the aim of providing a broader set of cIMS combinations. Intuitively, greater values of the weight factors result in a more conservative strategy, since more settings are kept, but at the cost of a higher computation time.
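The pruning logic of the homogeneous branch (lines 10–39 of the pseudo-code) can be rendered as the following Python sketch. It is not the authors' module, which is available only on request: the `CNF` record, the function name, and the toy input values are assumptions, and the availability of each CNF is taken as already computed (in the paper, by the SRN evaluation of lines 1–9).

```python
import math
from collections import namedtuple

# Hypothetical record for a feasible CNF: label, availability, cost.
CNF = namedtuple("CNF", "label availability cost")


def prune_and_search(r_p, r_s, r_i, r_h, a0, g1=0.5, g2=0.75, g3=1.0, big_g=1.15):
    """Exhaustive search with cost/availability pruning (homogeneous case)."""
    by_cost = lambda cnf: cnf.cost
    min_cost = math.inf
    saved = []
    for p in sorted(r_p, key=by_cost):
        if p.cost > g1 * min_cost:
            continue
        for s in sorted(r_s, key=by_cost):
            if p.cost + s.cost > g2 * min_cost or p.availability * s.availability < a0:
                continue
            for i in sorted(r_i, key=by_cost):
                cost3 = p.cost + s.cost + i.cost
                avail3 = p.availability * s.availability * i.availability
                if cost3 > g3 * min_cost or avail3 < a0:
                    continue
                for h in sorted(r_h, key=by_cost):
                    cost = cost3 + h.cost
                    avail = avail3 * h.availability
                    if cost > big_g * min_cost or avail < a0:
                        continue
                    min_cost = min(min_cost, cost)
                    saved.append(((p.label, s.label, i.label, h.label), avail, cost))
    return saved, min_cost


# Toy inputs (hypothetical values, chosen only to trigger each pruning rule).
A0 = 0.99999
settings, best = prune_and_search(
    r_p=[CNF("P-a", 0.999998, 10), CNF("P-b", 0.999999, 25)],
    r_s=[CNF("S-a", 0.999998, 10), CNF("S-low", 0.9999, 5)],
    r_i=[CNF("I-a", 0.999998, 10)],
    r_h=[CNF("H-a", 0.999998, 10), CNF("H-b", 0.999999, 12)],
    a0=A0,
)
```

In this toy run, `S-low` is discarded on availability and `P-b` on cost (25 exceeds half of the minimum cost found so far), whereas the slightly more expensive `H-b` variant is kept because it stays within the 15% margin set by `big_g`. Since minCost starts at infinity, no cost-based pruning takes place until the first feasible setting is found.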
In our case study, which assumes a maximum of 6 containers to deploy per CNR (an assumption in keeping with the resource constraint of the experimental testbed), the variety of redundancy schemes to analyze produces a number of settings in the order of $10^{13}$ (consider combining 7 container counts (0 to 6) deployed across 4 nodes and, then, composing a setting of 4 elements: $(7^4)^4 = 7^{16} \approx 3.3 \times 10^{13}$).
Fig. 7: Sketch of the experimental testbed relying on the Clearwater architecture.
It is worth noting that, since OptChains+ is a heuristic algorithm, its time complexity is related to the choice of weight factors. If they are high (conservative policy with few pruned settings), the complexity could reach $O(n^4)$. Conversely, for low weight factors (relaxed policy with many pruned settings, the typical case), the complexity decays to $O(n \log n)$ due to an embedded ascending sorting operation within cost vectors.
With the proposed OptChains+ tuning, the number of settings to analyze decreases to $10^5$, and, on a standard PC (Intel Core i5-3230 CPU @ 2.60 GHz, with 8 GB of RAM), the whole procedure requires about 450 seconds to run (neglecting the call to the external tool TimeNET).
In Fig. 7 we sketch the deployed experimental testbed based
on the Clearwater platform that we exploit to support the load
assumption about the number of cIMS sessions representing
the adopted performance capacity indicator.
On a laptop with an Intel Core i7-3630QM CPU @ 2.40 GHz and 8 GB of RAM, we deploy two Linux-based virtual machines (1 virtual core and 2 GB of RAM per VM): the first one hosts a containerized deployment of the whole cIMS architecture, including P-CSCF (Bono), S/I-CSCF (Sprout), and HSS (Homestead); the second one is a stress node that executes some routines to perform a load stress against the containerized platform. The test scenario considers 1000 concurrent IMS sessions with a BHCA (Busy Hour Call Attempts) equal to 2.6 per user (in line with values provided for VoLTE, see [47]). The resulting average call setup delay is 80 ms, a quite reasonable value since the infrastructure is deployed on the same node, so that interconnection delays are negligible.
On the other hand, due to the lack of measured data concerning MTTF and MTTR of containerized components, we refer in part to the technical literature (see, e.g., [33]), and in part to expert hints. Such parameters are shown in Table I.
The experiment allows for a performance demand $W$ ranging from 2000 to 5000, assuming a performance capacity $c = 1000$ (both in terms of IMS sessions) and assuming, for simplicity, $c_j = c$. This basically means that, if a provider needs to guarantee up to, say, 4000 concurrent IMS sessions, we get $\gamma = 4$, indicating the need for at least 4 containers. Hence, according to the $W$ value, $\gamma$ ranges from 2 to 5 (in case of a non-integer result, we consider the next integer value for $\gamma$).

TABLE I: Parameter values. CNT and DCK repair times must be interpreted as times spent to perform a software reboot.

Parameter   Description                              Value
1/λ_CNT     container MTTF (hour)                    500
1/λ_DCK     docker daemon MTTF (hour)                1000
1/λ_VM      virtual machine MTTF (hour)              2880
1/λ_HPV     hypervisor MTTF (hour)                   2880
1/λ_HW      hardware MTTF (hour)                     60000
1/µ_CNT     container MTTR (sec)                     2
1/µ_DCK     docker daemon MTTR (sec)                 5
1/µ_VM      virtual machine MTTR (hour)              1
1/µ_HPV     hypervisor MTTR (hour)                   2
1/µ_HW      hardware MTTR (hour)                     8
W           performance demand (IMS sessions)        (2000, 5000)
c           performance capacity (IMS sessions)      1000
A0          steady-state availability requirement    0.99999
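As a quick numerical companion to Table I, the following Python sketch computes the normalized performance level $\gamma$ (rounded up to the next integer) and, for orientation only, the steady-state availability MTTF/(MTTF+MTTR) that each layer would exhibit if taken in isolation as a two-state failure/repair component. The latter is an assumption of this sketch: it ignores the inter-layer dependencies that the SRN model captures, so it is not the node availability of (4). All names are hypothetical.

```python
def gamma_for(demand, capacity):
    """Smallest integer number of containers covering the demand (ceil(W/c))."""
    return -(-demand // capacity)


def isolated_availability(mttf_hours, mttr_hours):
    """Availability of a single two-state component taken in isolation."""
    return mttf_hours / (mttf_hours + mttr_hours)


# (MTTF, MTTR) per layer from Table I, both converted to hours.
TABLE_I = {
    "CNT": (500.0, 2 / 3600),
    "DCK": (1000.0, 5 / 3600),
    "VM": (2880.0, 1.0),
    "HPV": (2880.0, 2.0),
    "HW": (60000.0, 8.0),
}

gamma_cscf = gamma_for(4000, 1000)   # CSCF containers for W = 4000, c = 1000
gamma_hss = gamma_cscf + 1           # one extra replica for the HSS

layer_availability = {
    layer: isolated_availability(mttf, mttr)
    for layer, (mttf, mttr) in TABLE_I.items()
}
```

With these nominal values, the software layers that recover through a reboot (CNT, DCK) come out more available in isolation than the VM and hypervisor layers, whose repair times are in the order of hours.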
Aiming at a practical case, we analyze some relevant settings (extracted among about 1000 produced by the procedure), as reported in Table II. The column Scheme indicates the type of cIMS deployment along with the different values of $\gamma$. With a slight abuse of notation, homogeneous (HOM.) scheme refers to a cIMS setting where all nodes are composed of homogeneous CNRs, whereas in co-located (COL.) and heterogeneous (HET.) schemes, I-CSCF and HSS share the same CNR(s) of co-located and heterogeneous type, respectively. For each scheme, we consider four exemplary settings ($S_1, \ldots, S_4$) where a maximum of 4 CNRs per node is allowed. Let us clarify the notation adopted in Table II by considering, for instance, setting $S_1$ in the co-located scheme with $\gamma = 2$. The notation [2 2 0 0] used for P-CSCF indicates that 2 out of 4 (homogeneous) CNRs are exploited and 2 containers per CNR are used. Similarly, for the S-CSCF ([1 1 1 0]), 3 out of 4 (homogeneous) CNRs are exploited and 1 container per CNR is used.
A slightly different notation is used for I-CSCF and HSS, which share the same CNRs. In such a case, [2,3 2,3 0,0 0,0] indicates that 2 out of 4 (co-located) CNRs are exploited, each hosting 2 I-type containers and 3 H-type containers.
Such a concise notation is also helpful to quickly compute the cost $E$ for each setting. As regards the previous example, the deployment cost $E_P$ for P-CSCF amounts to 1·4 (CNT) + 1·2 (DCK+VM) + 1·2 (HPV+HW) = 8; the cost $E_S$ for S-CSCF amounts to 1·3 (CNT) + 1·3 (DCK+VM) + 1·3 (HPV+HW) = 9; the cost $E_{I,H}$ for the co-located I-CSCF and HSS amounts to 1·10 (CNT) + 1·4 (DCK+VM) + 1·2 (HPV+HW) = 16. The total cost amounts to $E = E_P + E_S + E_{I,H} = 33$.
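The cost bookkeeping of this example can be sketched in Python as follows. The function and its parameters are hypothetical; in particular, the rule that a co-located CNR pays one DCK+VM unit per container type actually present on it is an assumption of this sketch, chosen because it reproduces the worked example above and the costs E = 42 and E = 44 quoted for the case γ=3. The `share_dck_vm` flag is a further assumption for the heterogeneous scheme, where containers of different types share the docker layer.

```python
def node_cost(cnrs, share_dck_vm=False, c_cnt=1, c_dckvm=1, c_hpvhw=1):
    """Cost of one node (or of a pair of co-located nodes).

    cnrs: one tuple per CNR slot with the container count of each
          container type hosted there, e.g. (2,) or (2, 3).
    """
    total = 0
    for counts in cnrs:
        active_types = sum(1 for n in counts if n > 0)
        if active_types == 0:
            continue  # unused CNR slot: no cost
        total += c_cnt * sum(counts)
        total += c_dckvm * (1 if share_dck_vm else active_types)
        total += c_hpvhw
    return total


# Setting S1, co-located scheme, gamma = 2 (worked example in the text).
e_p = node_cost([(2,), (2,), (0,), (0,)])             # P-CSCF [2 2 0 0]
e_s = node_cost([(1,), (1,), (1,), (0,)])             # S-CSCF [1 1 1 0]
e_ih = node_cost([(2, 3), (2, 3), (0, 0), (0, 0)])    # I,H [2,3 2,3 0,0 0,0]
e_total = e_p + e_s + e_ih
```

Different prices per layer can be explored by changing `c_cnt`, `c_dckvm`, and `c_hpvhw`, in the spirit of the generalization mentioned for the cost assumption.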
Let us now explore the results in terms of availability and costs for various settings through the panels of Fig. 8, where we report the steady-state availability $A_{cIMS}$ for different values of $\gamma$. Let us consider, for instance, Fig. 8(b), showing the case $\gamma = 3$, where the four exemplary settings have been grouped per scheme. Each bar indicates the availability value, whereas the number inside reports the cost associated with that particular setting. The horizontal dashed line represents the “five nines” threshold that, if crossed, means that the pertinent setting does not match the availability requirement.

TABLE II: A selection of 4 exemplary settings (S1, S2, S3, S4) with different distributions of CNRs, grouped per scheme and for values of γ ranging from 2 to 5.

Scheme  γ  Setting  P-CSCF     S-CSCF     I-CSCF     HSS
HOM.    2  S1       [2 2 0 0]  [2 2 0 0]  [2 2 0 0]  [2 2 1 0]
HOM.    2  S2       [2 2 0 0]  [2 2 0 0]  [2 2 0 0]  [3 3 0 0]
HOM.    2  S3       [2 2 0 0]  [2 2 0 0]  [2 2 0 0]  [3 4 0 0]
HOM.    2  S4       [2 2 0 0]  [2 2 0 0]  [2 3 0 0]  [3 3 0 0]
HOM.    3  S1       [3 3 0 0]  [3 3 0 0]  [3 3 0 0]  [2 2 2 0]
HOM.    3  S2       [3 3 0 0]  [3 3 0 0]  [3 3 0 0]  [4 4 0 0]
HOM.    3  S3       [3 3 0 0]  [3 3 0 0]  [3 3 0 0]  [4 5 0 0]
HOM.    3  S4       [3 3 0 0]  [3 3 0 0]  [3 4 0 0]  [4 4 0 0]
HOM.    4  S1       [4 4 0 0]  [4 4 0 0]  [4 4 0 0]  [3 3 2 0]
HOM.    4  S2       [4 4 0 0]  [4 4 0 0]  [4 4 0 0]  [5 5 0 0]
HOM.    4  S3       [4 4 0 0]  [4 4 0 0]  [4 4 0 0]  [5 6 0 0]
HOM.    4  S4       [4 4 0 0]  [4 4 0 0]  [4 5 0 0]  [5 5 0 0]
HOM.    5  S1       [5 5 0 0]  [5 5 0 0]  [5 5 0 0]  [3 3 3 0]
HOM.    5  S2       [5 5 0 0]  [5 5 0 0]  [5 5 0 0]  [6 6 0 0]
HOM.    5  S3       [5 5 0 0]  [5 5 0 0]  [5 6 0 0]  [6 6 0 0]
HOM.    5  S4       [5 6 0 0]  [5 5 0 0]  [5 5 0 0]  [6 6 0 0]

Scheme  γ  Setting  P-CSCF     S-CSCF     I,H (CNR sharing)
COL.    2  S1       [2 2 0 0]  [1 1 1 0]  [2,3 2,3 0,0 0,0]
COL.    2  S2       [2 2 0 0]  [2 2 0 0]  [2,3 2,3 0,0 0,0]
COL.    2  S3       [2 2 0 0]  [2 3 0 0]  [2,3 2,3 0,0 0,0]
COL.    2  S4       [2 2 0 0]  [2 2 0 0]  [2,3 2,3 0,0 0,0]
COL.    3  S1       [3 3 0 0]  [3 3 0 0]  [3,2 3,2 0,2 0,0]
COL.    3  S2       [3 3 0 0]  [3 3 0 0]  [3,3 3,3 0,1 0,1]
COL.    3  S3       [3 3 0 0]  [3 3 0 0]  [3,2 3,2 0,2 0,2]
COL.    3  S4       [3 4 0 0]  [3 3 0 0]  [3,3 3,3 0,1 0,1]
COL.    4  S1       [4 4 0 0]  [4 4 0 0]  [2,3 2,3 2,2 0,0]
COL.    4  S2       [4 4 0 0]  [4 4 0 0]  [3,3 3,3 1,2 1,2]
COL.    4  S3       [4 4 0 0]  [4 4 0 0]  [2,3 2,3 2,2 2,2]
COL.    4  S4       [4 5 0 0]  [4 4 0 0]  [3,3 3,3 1,2 1,2]
COL.    5  S1       [5 5 0 0]  [5 5 0 0]  [3,3 3,3 2,3 0,0]
COL.    5  S2       [5 5 0 0]  [2 3 3 3]  [2,3 3,3 3,3 3,0]
COL.    5  S3       [2 3 3 3]  [5 5 0 0]  [2,3 3,3 3,3 3,0]
COL.    5  S4       [5 5 0 0]  [2 3 3 4]  [2,3 3,3 3,3 3,0]
HET.    2  S1       [1 1 1 0]  [2 2 0 0]  [2,3 2,3 0,0 0,0]
HET.    2  S2       [2 2 0 0]  [2 2 0 0]  [2,3 2,3 0,0 0,0]
HET.    2  S3       [2 2 0 0]  [2 3 0 0]  [2,3 2,3 0,0 0,0]
HET.    2  S4       [2 3 0 0]  [2 2 0 0]  [2,3 2,3 0,0 0,0]
HET.    3  S1       [3 3 0 0]  [3 3 0 0]  [2,2 2,2 1,2 0,0]
HET.    3  S2       [3 3 0 0]  [3 3 0 0]  [3,3 3,3 0,1 0,1]
HET.    3  S3       [3 3 0 0]  [3 3 0 0]  [3,1 3,2 0,2 0,3]
HET.    3  S4       [3 3 0 0]  [3 3 3 0]  [2,2 2,2 1,2 0,0]
HET.    4  S1       [4 4 0 0]  [4 4 0 0]  [2,3 2,3 2,2 0,0]
HET.    4  S2       [4 4 0 0]  [2 2 2 2]  [2,3 2,3 2,2 0,0]
HET.    4  S3       [2 2 2 2]  [4 4 0 0]  [2,3 2,3 2,2 0,0]
HET.    4  S4       [4 4 0 0]  [2 2 2 2]  [3,3 1,3 1,2 0,0]
HET.    5  S1       [5 5 0 0]  [5 6 2 0]  [3,3 3,3 2,3 0,0]
HET.    5  S2       [5 5 0 0]  [2 3 3 3]  [3,3 3,3 2,3 0,0]
HET.    5  S3       [2 3 3 3]  [5 5 0 0]  [3,3 3,3 2,3 0,0]
HET.    5  S4       [5 5 0 0]  [2 3 3 4]  [3,3 3,3 2,3 0,0]

(a) Availability for the case γ=2 (panel annotation: E = 31).
(b) Availability for the case γ=3 (panel annotation: E = 37).
(c) Availability for the case γ=4 (panel annotation: E = 44).
(d) Availability for the case γ=5 (panel annotation: E = 56).
Fig. 8: Steady-state availability considering 4 exemplary settings (S1, S2, S3, S4) per scheme for: γ = 2, 3, 4, 5.
In order to put forth some unexpected and interesting
behaviors, for each case we report also a setting satisfying the
“four nines” but not the “five nines” condition. This is the case
of S1, whose availability values are 0.999985, 0.999986, and
0.999987, for homogeneous, co-located, and heterogeneous
schemes, respectively.
Let us now focus on the homogeneous scheme: among settings $S_2$, $S_3$, and $S_4$, which barely satisfy the availability constraint, $S_2$ has the lowest cost ($E = 42$), so we elect this setting as the best one. Notice that $S_1$ achieves the same cost as $S_2$, but with a different distribution of containers in the HSS node (see Table II). For the co-located scheme, we also consider $S_2$ as optimal, although the same cost ($E = 44$) is achieved by $S_3$ but at a lower availability level (0.9999911 for $S_3$ vs. 0.9999925 for $S_2$). This notwithstanding, a network designer could more comfortably choose $S_3$, should the uniformity of replica distribution be at a premium (related, perhaps, to deployment flexibility). In such a case, in fact, all CSCF nodes being equal, HSS exhibits distributions [2 2 2 2] for $S_3$ and [3 3 1 1] for $S_2$.
Similar considerations hold true for the heterogeneous scheme where, again, $S_2$ emerges as the setting satisfying the best trade-off between availability and cost.
Now, consider the case $\gamma = 5$, whose availability results are shown in Fig. 8(d). In comparison to the case $\gamma = 3$, two facts emerge. First, the availability values for settings $S_2$, $S_3$, and $S_4$ (those able to guarantee the “five nines”) are almost equal (in each scheme), and are very close to the 0.99999 threshold. Basically, this is related to the need to achieve high availability requirements with a more challenging performance level that, in turn, implies more redundancy at the container level for all considered settings and schemes. Second, the difference in terms of costs between homogeneous and co-located settings becomes more pronounced. This behavior can be explained as follows. For $\gamma = 5$, the system has to manage a greater number of cIMS sessions w.r.t. the case $\gamma = 3$, which in turn implies that we need more containers. When the number of containers grows, the co-located setting is filled more “quickly” than the homogeneous one; thus, more DCK and VM layers are needed, resulting in additional costs.
In conclusion, looking at the availability results as a whole, two aspects should be highlighted. The first aspect concerns the monotonic increase of the cost with $\gamma$, since more resources are needed (in terms of CNRs and/or containers). The second aspect pertains to the choice of a particular scheme among the three considered: according to the performed analysis, in fact, co-located and heterogeneous schemes offer the best trade-off in terms of cost and availability when the performance level is not so high. Basically, this is due to the possibility of arranging containers in a more flexible way, since the homogeneous scheme forces the introduction of a new CNR in case different types of containers have to be deployed.
On the other hand, when a high performance level is required, redundancy at the CNR level is needed also for co-located and heterogeneous schemes; thus, the homogeneous arrangement becomes more attractive in terms of cost reduction.
A. Sensitivity analysis
We carry out a sensitivity analysis useful, from the designer's perspective, to cope with parameter uncertainty. Precisely, we evaluate the effects of drifts from nominal values (see Table I) for six critical parameters: failure and repair times pertaining to container, docker, and virtual machine layers. The results are reported in the panels of Fig. 9, starting from the best settings ($S_2$) derived in the previous analysis for the case $\gamma = 5$. Let us first analyze Fig. 9(a), showing the sensitivity analysis for the container failure time ($1/\lambda_{CNT}$). It is possible to observe that, for the co-located scheme, the failure time can be reduced from its nominal value (500 hours, circled in red) to about 370 hours with no side effects on the availability, since the corresponding curve remains above the horizontal dashed line (five nines limit). For homogeneous and heterogeneous schemes, such analysis reveals a similar behavior (see the zoomed inset), but with more stringent margins, since nominal values can be reduced from 500 hours to not less than 480 hours. However, the improved robustness of the co-located scheme is paid for with a higher cost (see Fig. 8(d)).
Similar arguments hold for the container repair time sensitivity, as shown in Fig. 9(b). In fact, the nominal value of $1/\mu_{CNT}$ can be relaxed from 2 seconds to about 2.5 seconds for the co-located scheme, and to about 2.15 seconds for homogeneous and heterogeneous schemes.
Similar results come from Figs. 9(c) and 9(d) for docker failure and repair times, respectively. On one hand, we can see that the nominal value of $1/\lambda_{DCK}$ can be decreased from 1000 hours to about 640 hours (co-located) or to about 900 hours (homogeneous and heterogeneous cases). On the other hand, the nominal value of $1/\mu_{DCK}$ can be relaxed from 5 seconds to about 9 seconds for the co-located scheme, and to about 5.6 seconds in case of homogeneous and heterogeneous schemes. Also in this case, the improved robustness to deviations of docker parameters is paid for with a higher cost.
Finally, we analyze the sensitivity for VM failure and repair times as reported in Figs. 9(e) and 9(f), respectively. The nominal value of $1/\lambda_{VM}$ can be diminished from 2880 hours to 2870 hours (co-located) and to 2878 hours (homogeneous/heterogeneous) with no side effects on the high availability requirement. On the contrary, the margin for the parameter $1/\mu_{VM}$ is even more stringent: it can be relaxed from 1 hour to 1 hour and 1 second (co-located) and to 1 hour and 7 seconds (homogeneous/heterogeneous).
In summary, the sensitivity analysis reveals that the robust-
ness of the whole cIMS is influenced by two factors: i) the
robustness of individual layers, that, for some cases (CNT,
DCK) exhibits a reasonable margin, whereas in other cases
(VM) is practically not amenable to any significant deviation;
ii) the type of deployment where, typically, the co-located
scheme offers more room for manoeuvre.
Fig. 9: Influence on the overall cIMS infrastructure (case γ=5) of: container failure time (a), container repair time (b), docker failure time (c), docker repair time (d), virtual machine failure time (e), virtual machine repair time (f). Nominal values (reported in Table I) are circled in red.
This work represents, to our knowledge, the first attempt at a performability assessment of a container-based IP Multimedia Subsystem (cIMS), a particular realization of a Service Function Chain. We adopt a two-level hierarchical approach to model cIMS by invoking two frameworks: Reliability Block Diagram (RBD) and Stochastic Reward Networks (SRN). The former captures high-level interconnections among cIMS nodes; the latter characterizes the probabilistic behavior of each node in terms of failure and repair events. First, we set up an experimental testbed relying on a containerized IMS platform (Clearwater) to derive a performance benchmark in terms of the maximum number of IMS sessions simultaneously supported.
Then, we derive an optimal set of cIMS deployments (organized in a taxonomy including homogeneous/co-located/heterogeneous schemes) satisfying the “five nines” availability requirement at minimum cost, for a given performance demand. The analysis is supported by TimeNET (a well-established tool for SRN evaluation), and by an algorithm (OptChains+) which makes it possible to: i) automatically build and evaluate SRN models by starting from specs of single cIMS nodes, ii) automatically build and evaluate RBD models for the high-level composition of cIMS settings, iii) assign/compute costs and extract feasible settings.
Numerical results suggest that, when the performance level is not particularly demanding, heterogeneous and co-located schemes offer a better availability/cost trade-off than the homogeneous setting. The latter, instead, is more suitable for high performance levels. The assessment is enriched by a sensitivity analysis to evaluate the robustness of the cIMS architecture when deviations of some critical design parameters from their nominal values take place. We plan to extend the analysis to other cIMS deployments that, for instance, might include: co-location of multiple network nodes, more sophisticated interconnections among involved elements, and time-varying load requirements as demanded by contemporary Service Level Agreements. Other hints for future research stem from the consideration that the method presented here can be adapted, for the benefit of service management organizations, to other infrastructures exhibiting a chained arrangement. Among modern networking systems, examples include: virtualized EPC nodes able to intervene in a service chain thanks to the SDN paradigm; dedicated SFCs (virtualized or containerized) composed, for instance, of firewalls, load balancers, and IDSs built as chained resources in virtual data centers; virtualized Mobility Management Entities (vMMEs) whose signaling flow is organized in a chained fashion.
[1] G. Davoli, W. Cerroni, C. Contoli, F. Foresta, F. Callegati, “Implemen-
tation of service function chaining control plane through OpenFlow,” in
2017 IEEE Conference on Network Function Virtualization and Software
Defined Networks , pp. 1–4, 2017.
[2] D. Borsatti, W. Cerroni, G. Davoli, F. Callegati, “Intent-based Service Function Chaining on ETSI NFV Platforms,” in 2019 IEEE Conference on Networks of the Future, pp. 144–146, 2019.
[3] Google cloud platform - container engine. Available online: https://, accessed: 2020-04-28.
[4] Amazon EC2. Available online:, accessed: 2020-04-28.
[5] ETSI Tech. Spec. 124 173 V15.2.0 (2018-09). Available on-
line: ts/124100 124199/124173/15.02.
00 60/ts 124173v150200p.pdf,accessed:2020- 04-28.
[6] Ericsson Tech. Rep., “Real-time interaction in 5G –
A use case example from the health care industry,
2019 [Online]. Available:
health-care- case-real-time-interaction-in- 5g-with- ims-data-channel.
pdf, accessed: 2019-08-01.
[7] Huawei Tech. Rep., “Vo5G Technical White Paper ,” 2018
[Online]. Available: insights/
technology/vo5g-technical-white-paper, accessed: 2019-08-01.
[8] J. Sun, G. Zhu, G. Sun, D. Liao, Y. Li, A.K. Sangaiah, M. Ra-
machandran, and V. Chang, “A Reliability-Aware Approach for Resource
Efficient Virtual Network Function Deployment,” IEEE Access, vol. 6,
pp. 18238–18250, 2018.
[9] A.S. Sendi, Y. Jarraya, M. Pourzandi, and M. Cheriet, “Efficient Provi-
sioning of Security Service Function Chaining Using Network Security
Defense Patterns,” IEEE Trans. Serv. Comput., vol. PP, no. 99, pp. 1–1,
[10] The Clearwater Project. Available online: http://www.projectclearwater.
[11] D. Cotroneo, R. Natella, and S. Rosiello, “NFV-throttle: An overload
control framework for network function virtualization,IEEE Trans.
Netw. Service Manag., vol. 14, no. 4, pp. 949–963, 2017.
[12] D. Cotroneo, L. De Simone and R. Natella, “NFV-Bench: A Depend-
ability Benchmark for Network Function Virtualization Systems,IEEE
Trans. Netw. Service Manag., vol. 14, no. 4, pp. 934–948, 2017.
[13] M. Di Mauro, A. Liotta, “Statistical Assessment of IP Multimedia Sub-
system in a Softwarized Environment: A Queueing Networks Approach,
IEEE Trans. Netw. Service Manag., vol. 16, no. 4, pp. 1493–1506, 2019.
[14] C. Negus, W. Henry, Docker Containers. Prentice-Hall, 1 ed., 2015.
[15] Y. Zhang, Network Function Virtualization: Concepts and Applicability
in 5G Networks (cap.2, par. 2.2.3). Hoboken (NJ), Wiley-IEEE Press,
Inc., 1 ed., 2018.
[16] R. Cziva, and D.P. Pezaros, “Container Network Functions: Bringing
NFV to the Network Edge,” IEEE Commun. Mag., vol. 55, no. 6, pp. 24–
31, 2017.
[17] Recommendation E.800. Available online:
[18] H. Chantre, and N.L.S. Fonseca, “Reliable Broadcasting in 5G NFV-
Based Networks,” IEEE Commun. Mag., vol. 56, no. 3, pp. 218–224,
[19] NEC Corporation, “NEC Virtualized Evolved Packet Core - vEPC,”
2014 [Online]. Available:
white paper w.cover final.pdf, accessed: 2019-08-01.
[20] Ericsson Review, “Virtualizing network services - the telecom cloud,”
2014 [Online]. Available:
publications/ericsson-technology- review/docs/2014/er-telecom-cloud.
pdf, accessed: 2019-08-01.
[21] H. Jin, Y. Jin, H. Lu, C. Zhao and M. Peng, “NFV and SFC: A Case
Study of Optimization for Virtual Mobility Management,IEEE J. Sel.
Area Comm., vol. 36, no. 10, pp. 2318–2332, 2018.
[22] R. Ghosh, F. Longo, F. Frattini, S. Russo, and K. S. Trivedi, “Scalable
analytics for IaaS cloud availability,” IEEE Trans. Cloud Comput., vol. 2,
no. 1, pp. 57–70, 2014.
[23] B.R. Haverkort, R. Marie, G. Rubino, and K.S. Trivedi, Performability
modelling techniques and tools. Chichester(UK), John Wiley and Sons,
Ltd., 2001.
[24] K. Nagaraja, G. Gama, R. Bianchini, R.P. Martin, W. Meira, and
T.D. Nguyen “Quantifying the performability of cluster-based services,
IEEE Trans. Parallel Distrib. Syst, vol. 16, no. 5, pp. 456–467, 2005.
[25] R. Matos, J. Dantas, J. Araujo, K.S. Trivedi, and P. Maciel, “Redundant
Eucalyptus private clouds: Availability modeling and sensitivity analy-
sis,” Journal of Grid Computing, vol. 15, no. 1, pp. 1–22, 2017.
[26] M. C. Bezerra, R. Melo, J. Dantas, P. Maciel and F. Vieira, “Availability
modeling and analysis of a VoD service for eucalyptus platform,” in
2014 IEEE International Conference on Systems, Man, and Cybernetics,
pp. 3779–3784, 2014.
[27] Z. Hong, M. Shi, Y. Wang“CTMC-Based Availability Analysis of Multi-
ple Cluster Systems with Common Mode Failure” in 2016 IEEE Confer-
ence on Applied Computing and Information Technology, pp. 394–396,
[28] W. Li, A. Kanso, “Comparing Containers versus Virtual Machines for
Achieving High Availability,” in 2015 IEEE International Conference
on Cloud Engineering, pp. 353–358, 2015.
[29] D. Bruneo, “A Stochastic Model to Investigate Data Center Performance
and QoS in IaaS Cloud Computing Systems,” IEEE Trans. Parallel
Distrib. Syst, vol. 25, no. 3, pp. 560–569, 2014.
[30] J. Fan, C. Guan, Y. Zhao, and C. Qiao, “Availability-aware mapping of
service function chains,” in IEEE INFOCOM 2017 - IEEE Conference
on Computer Communications, pp. 1–9, 2017.
[31] J. Liu, Z. Jiang, N. Kato, O. Akashi, and A. Takahara, “Reliability
evaluation for NFV deployment of future mobile broadband networks,
IEEE Wireless Commun., vol. 23, no. 3, pp. 90–96, 2016.
[32] J. Kong, I. Kim, X. Wang, Q. Zhang, H. C. Cankaya, W. Xie, T. Ikeuchi,
and J. P. Jue, “Guaranteed-availability Network Function Virtualization
with Network Protection and VNF replication,” in GLOBECOM 2017,
pp. 1–6, 2017.
[33] S. Sebastio, R. Ghosh, and T. Mukherjee, “An availability analysis
approach for deployment configurations of containers,” IEEE Trans.
Serv. Comput., vol. PP, no. 99, pp. 1–1, 2018.
[34] E. Andrade, B. Nogueira, R. Matos, G. Callou, and P. Maciel, “Availabil-
ity modeling and analysis of a disaster-recovery-as-a-service solution,
Computing, vol. 99, no. 10, pp. 929–954, 2017.
[35] M. Di Mauro, M. Longo, and F. Postiglione, “Availability Evaluation
of Multi-tenant Service Function Chaining Infrastructures by Multidi-
mensional Universal Generating Function,IEEE Trans. Serv. Comput.,
DOI: 10.1109/TSC.2018.2885748, 2018.
[36] M. Di Mauro, G. Galatro, M. Longo, F. Postiglione, and M. Tambasco,
“IP Multimedia Subsystem in an NFV environment: availability evalu-
ation and sensitivity analysis”, in IEEE NFV-SDN Conference, (Verona,
Italy, Nov. 2018).
[37] G. Camarillo, and M.A. Garcia-Martin The 3G IP Multimedia Subsys-
tem. New York, John Wiley and Sons, Inc., 3rd ed., 2008.
[38] Docker. Available online:,accessed:
2020-04- 28.
[39] CoreOS. Available online:,accessed:2020-04- 28.
[40] OpenVZ. Available online:,accessed:2020-04- 28.
[41] T. Combe, A. Martin, and R. Di Pietro, “To Docker or Not to Docker: A
Security Perspective,IEEE Cloud Computing, vol. 3, no. 5, pp. 54–62,
[42] Amazon AWS Lambda. Available online:
lambda/,accessed:2020- 04-28.
[43] S.I. Ahson, IP Multimedia Subsystem (IMS) Handbook. Broken Sound
Parkway (NW), CRC Press, 2008.
[44] J.K. Muppala, G. Ciardo, and K.S. Trivedi, “Stochastic Reward Nets for
Reliability Prediction,” in Communications in Reliability, Maintainabil-
ity and Serviceability, pp. 9–20, 1994.
[45] A. Reibman and R. Smith, and K.S. Trivedi, “Markov and Markov
reward model transient analysis: An overview of numerical approaches,
Europ. Journ. of Oper. Res., vol. 40, no. 2, pp. 257–267, 1989.
[46] R. German, C. Kelling, A. Zimmermann, and G. Hommel, “TimeNET: a
toolkit for evaluating non-Markovian stochastic Petri nets,Performance
Evaluation, vol. 24, no. 1-2, pp. 69–87, 1995.
[47] Tonse Telecom, “The LTE Data Storm in the Core of Your Network”,
White Paper, Jan. 2013.
Mario Di Mauro (Laurea in Electronics Engineering, Univ. of Salerno (Italy), 2005; MS in Networking, Telecom Italia Learning Centre, 2006; PhD degree in Information Engineering, Univ. of Salerno, 2018). He was a Research Engineer with CoRiTeL (Research Consortium on Telecommunications, led by Ericsson Italy) and then a Research Fellow with Univ. of Salerno. His main fields of interest include: network availability, network security, data analysis for telecommunication infrastructures.
Giovanni Galatro received the Laurea degree (summa cum laude) in information engineering from the University of Salerno (Italy) in 2018, and has been a visiting student at the Dept. of Computer Science at Groningen University (Netherlands). In 2017 he was awarded a scholarship with the Telecommunication and Applied Statistics groups, focused on the availability analysis of modern telco infrastructures.
Maurizio Longo (Laurea in Electronics Engineering, Univ. of Napoli (Italy), 1972; MSEE, Stanford Univ., CA, 1977) retired in 2018 from the Univ. of Salerno (Italy) as Full Professor of Telecommunications. At this university he also served as Department Dean, as the Chairman of the Graduate School of Information Engineering, and as the Director of the CoRiTeL (Research Consortium on Telecommunications) Lab. He also held academic positions with the Univ. Federico II (Napoli), the Parthenope Univ. (Napoli), the Univ. of Lecce and the Aeronautical Academy. He has authored over 180 papers in international journals and conference proceedings, mainly in the field of telecommunication networks.
Fabio Postiglione is currently an Assistant Professor of Applied Statistics at the Univ. of Salerno (Italy). He received his Laurea degree (summa cum laude) in Electrical Engineering and his Ph.D. degree in Information Engineering from Univ. of Salerno in 1999 and 2005, respectively. His main research interests include degradation analysis, lifetime estimation, reliability and availability evaluation of complex systems (telecommunication networks, fuel cells), Bayesian statistics and data analysis.
Marco Tambasco received his Master's degree in Electronic Engineering from Univ. of Salerno in 2010. He then joined CoRiTeL (Research Consortium on Telecommunications). His research interests include network analysis and design, availability and security of cloud-based telecommunication systems, and Network Function Virtualization (NFV) and Software Defined Networking (SDN) prototyping.