Next Generation Platform as a Service: Towards Virtualized
DVB-RCS2 Decoding System
Riwal KERHERVE 1, Julien LALLET 2, Laurent BEAULIEU 1, Ilhem FAJJARI 3,
Paul VEITCH 4, Jean DION 1, Bessem SAYADI 2, Laurent ROULLET 2
1b<>com, Rennes, France
2Nokia Bell Labs, Nozay, France
3Orange Labs, Châtillon, France
4British Telecom, Ipswich, United Kingdom
Platform-as-a-Service (PaaS) cloud computing services have emerged in the last few years, offering an abstraction framework to
enable the development, execution and management of applications, which is de-coupled from complex infrastructure considerations.
The next generation of PaaS solutions (NGPaaS) will have to address more stringent performance requirements compared to
traditional PaaSes. They will have to support Telco-specific requirements in terms of resource efficiency, availability and resilience.
In line with the 5G promise of convergence between vertical and Telco markets, they will also have to address new use case scenarios, one of which is the combination of Telco components with vertical components such as Digital Video Broadcasting (DVB).
This paper demonstrates how NGPaaS features enable the implementation of 5G-oriented connectivity services in cloud data centers. Regarding the convergence aspect, it aims to demonstrate how NGPaaS features facilitate building a combined Telco-Broadcasting PaaS system in the cloud, where a DVB-RCS2 system can bring its own components and integrate them with the connectivity services deployed on the PaaS by a Telco operator. This is achieved by applying specific customizations and enhancements to Kubernetes.
Index Terms—Broadcasting, DVB-S, FPGA, Cloud, Microservice, Kubernetes, Acceleration
I. INTRODUCTION
Platform-as-a-Service (PaaS) cloud computing services
have emerged in the last few years, offering an abstraction
framework to enable the development, execution and man-
agement of applications, which is de-coupled from complex
infrastructure considerations. PaaS users share hardware and software resources (operating systems, development environments, databases, network access, etc.) to develop and launch
their own applications. The next generation of PaaS solutions
(NGPaaS) will have to address more stringent performance
requirements compared to traditional PaaSes. They will have
to support Telco-specific requirements in terms of resource
efficiency, availability and resilience [1]. According to the 5G
promise of convergence between vertical and Telco market
segments, they will also have to be able to address cloud use
case scenarios combining Telco and vertical components, such
as Digital Video Broadcasting (DVB) applications.
A. Introduction to NGPaaS
In recent years, cloud computing has rapidly become an
unavoidable and prevalent tool used by every major industry.
On the one hand, computing virtualization is a maturing environment dedicated to application usage, increasingly tailored to the needs of software providers. On the other hand, PaaS systems offer software providers a very rich environment in which to build, deploy, and run applications, which eases the management of a service.
Manuscript received November 9, 2018. Corresponding author: J. Lallet
(email: julien.lallet@nokia.com).
This paper aims to demonstrate how NGPaaS [2] enables
the implementation of a 5G-oriented connectivity service.
Regarding the convergence aspect, it aims to demonstrate
how NGPaaS features facilitate the build of a combined
Telco-Broadcasting PaaS platform where vertical players are
able to bring their own components and integrate them with
the connectivity services deployed on the PaaS by a Telco
operator. To support Telco-grade features, this paper mainly
focuses on cloud hardware acceleration solutions used in the
context of NGPaaS. Efficiency and performance results are reported to illustrate the improvements brought by our solution.
The H2020 5GPPP Phase 2 Next Generation Platform-as-
a-Service project (NGPaaS) started in June 2017 and will
complete in June 2019. The project addresses several challenges, among them (i) the adoption of a cloud stack architecture based on a layered approach (business, dev-for-operations, platform and infrastructure), (ii) the use of component modularity to implement the “build-to-order” principle for services and platforms with recursion support, and (iii) Telco-grade enhancements implemented in the control, orchestration, virtualization and operational frameworks. Three prototypes have been identified to illustrate the aforementioned challenges, covering the Telco, IoT and 5G domains.
Legacy and other monolithic platforms only provide
complex ways to interpret high level business requirements
and try to reduce this complexity by moving it to lower
levels. This often comes at the expense of flexibility.
NGPaaS, in contrast, has defined an architecture with an open
collaboration between all stakeholders involved in network
provisioning (vendors, service providers, etc.). As depicted
in Figure 1, NGPaaS offers a cloud stack architecture based
on layers: (i) Business, (ii) Dev-for-operations, (iii) Platform
and (iv) Infrastructure. It moves away from a hierarchical
cloud stack with a fixed set of features, to a modular and
distributed stack. Furthermore, the NGPaaS architecture relies
on two phases of orchestration: deployment of a specialized
PaaS, and deployment of service workloads on top of
already deployed PaaS(es). The way PaaSes and Services are deployed is based on respective blueprints (PaaS and Service Blueprints). Both are decomposed into Reusable Functional Blocks (RFBs), a concept initially defined in the EU-funded Superfluidity consortium [3] and extended
in NGPaaS. The RFB concept is fully leveraged in NGPaaS
to maximise flexibility and re-use of platform and service
components, thus enabling build-to-order customization.
To introduce and illustrate the RFB concept at a very high
level, Figure 2 shows the PaaS graph editor of a “Telco-grade
flavor1” variant of Kubernetes (a more detailed rationale for using Kubernetes as the baseline PaaS subject to customizations will be provided in the next section). The figure depicts
how the RFB concept has been adapted to the components in
Kubernetes. Some internal dependencies in Kubernetes were
broken down to be “RFBized” into stand-alone components.
As conveyed in Figure 2, distinct categories of RFBs are used
in the 5G Telco PaaS. While some RFBs relate to a logical
grouping (e.g. network fb), others relate to the so-called leaf
RFBs (e.g. Calico[4]), which are at the lowest level since
they are the actual components that will be deployed. For
simplification, only the Kubernetes cluster deployed on the
Edge Cloud has been illustrated in Figure 2. The Central cloud
deployment would be depicted in a similar way. It is assumed
that in practical scenarios, the end-to-end service would be
comprised of both Edge and Central cloud components.
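To make the decomposition tangible, the following sketch models a blueprint as a graph of RFBs in Go; the type and field names are illustrative assumptions only, since this paper does not define the actual NGPaaS blueprint schema.

package blueprint

// RFB models a Reusable Functional Block: either a logical grouping
// (non-empty Children) or a deployable leaf component (non-empty Artifact).
type RFB struct {
    Name     string // e.g. "network fb" for a logical grouping
    Children []RFB  // sub-blocks of a grouping
    Artifact string // deployable component of a leaf RFB, e.g. "calico"
}

// Blueprint is the named RFB graph assembled in the PaaS graph editor.
type Blueprint struct {
    Name  string // e.g. "telco-grade-flavor1"
    Roots []RFB
}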
B. NGPaaS applied to Broadcasting Verticals
The second-generation DVB interaction channel for satellite distribution systems (also known as DVB-RCS2: Return Channel via Satellite) is specified in [5]. DVB-RCS2 has been defined to provide a standardized broadband interactive connection as an extension of the Digital Video Broadcasting satellite systems. It specifies the MAC and physical layer protocols of the air interface used between the satellite operator hub and the interactive user terminals. It also specifies the network layer and the essential functions of the terminal management and control planes. As application domain, a typical cloud deployment scenario for Rural, Remote, and Difficult-to-access (RRD) areas has been chosen [6]. As
depicted in Figure 3, a DVB-S2 RCS hub connects devices
from a rural area to the 5G broadband core through a DVB-
S2 RCS terminal and a 5G hypercell retransmitter. The Cloud
Accelerated DVB-RCS2 physical layer deployment would take
place in the DVB-RCS Hub depicted in Figure 3. The physical
layer receiver related to [7] is implemented with a DVB-RCS2-
like turbo decoding feature and is described in section IV.
The remainder of the article is organized as follows: Section II introduces the 5G PaaS objectives and discusses background and related work in the field. Section III provides a detailed description of the prototype, in terms of use case coverage, PaaS requirements, service and PaaS components to be implemented, a high-level view of its architecture and monitoring aspects. Section IV describes the broadcasting case study chosen to demonstrate the efficiency of NGPaaS, together with the improved performance results achieved by the hardware virtualization solutions provided in the context of the project. Finally, the last section concludes the paper and outlines some perspectives of our work.
II. RELATED WORK
A. Introduction to PaaS service deployment
From a 5G perspective, PaaSes must meet new requirements
compared to traditional PaaSes associated with the IT domain.
First, they have to be Telco-Grade which means they have
to support Telco specifications in terms of performance,
availability and resilience [1]. A 5G PaaS should not only facilitate building, shipping and running classical virtual network functions (VNFs) and applications with “Telco-Grade” quality; it should also combine all sorts of third-party applications using those VNFs to create new, more versatile and powerful cloud objects, breaking the silos between connectivity and computing. In this section, we
will provide an overview of solutions for the automatic
deployment and execution of services in cloud environments.
1) PaaS for managing application lifecycle agility
With PaaS systems, customers can build, deploy, and run
applications. PaaS provides infrastructure, storage, database,
information, and process as a service, along with well-defined
Application Programming Interfaces (APIs), and services to
manage running applications, such as dashboards for mon-
itoring and service composition. PaaS also helps software
developers and service providers to migrate their solutions
to the Cloud as platform software maintenance is practically
eliminated.
PaaS systems rely on the emerging paradigm of microser-
vices [8]. The essence of microservices is to decompose
complex applications into independent self-contained services.
This is commonly realized by means of container images
that encapsulate all dependencies required by an application
[9]. Other microservices approaches exist, Amazon Lambda
for example [10] executes applications that consist of single
functions, while unikernels allow applications to be built
and integrated within a minimal operating system so that
application execution on a virtual machine is much more
efficient than with traditional virtualisation approaches [11].
Since the PaaS architecture is usually based on containers,
container managers/orchestrators are core components
of any PaaS systems [9]. Well-known orchestrators are
Docker Swarm [12] and Google Kubernetes [13]. The
former is designed to run applications embedded in Docker
containers and provides fast start up time. The latter is more
configurable, and provides additional services with a strong
focus on persistency and network management [14][15].
Fig. 1: NGPaaS Architecture and Key Roles
Fig. 2: Telco Grade Kubernetes PaaS RFBs
2) Toward 5G PaaS
Looking at the PaaS portfolio, Kubernetes is a prominent
candidate for efficient container-based VNF management and
orchestration. However, since it is biased towards IT applications, it lacks a number of features that are mandatory for the management and performance guarantees of VNFs. To close the gap with regard to Network Function Virtualization (NFV) requirements, several features need to be added.
First, it is necessary to add Enhanced Platform Awareness (EPA) features (a pod-specification sketch in Go follows the list):
•CPU pinning to avoid unpredictable latency and host
CPU overcommitment by dedicating CPUs to the Telco
containers,
•Non Uniform Memory Access (NUMA) awareness to
improve the utilization of compute resources for Telco
containers that need to avoid cross NUMA node memory
access,
•Huge pages to decrease the overhead of memory man-
agement by using larger page sizes.
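As a concrete illustration of how these EPA features surface to a workload, the following Go sketch (using the standard Kubernetes client-go API types) declares a container whose equal, integer CPU requests and limits yield the Guaranteed QoS class, which the kubelet's static CPU manager policy turns into pinned, exclusive CPUs; the hugepages-2Mi resource reserves pre-allocated huge pages. Image name and resource sizes are placeholders.

package epa

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
)

// AcceleratedContainer sketches a Telco container exercising CPU pinning
// and huge pages: requests == limits with integer CPUs gives Guaranteed
// QoS and hence exclusive CPUs under the static CPU manager policy.
func AcceleratedContainer() corev1.Container {
    limits := corev1.ResourceList{
        corev1.ResourceCPU:                   resource.MustParse("4"),
        corev1.ResourceMemory:                resource.MustParse("2Gi"),
        corev1.ResourceName("hugepages-2Mi"): resource.MustParse("1Gi"),
    }
    return corev1.Container{
        Name:      "telco-vnf",           // placeholder
        Image:     "example.org/vnf:1.0", // placeholder
        Resources: corev1.ResourceRequirements{Limits: limits, Requests: limits},
    }
}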
Second, it is necessary to support multiple network interfaces, working around the single-interface-per-pod limitation by resorting to a Container Networking Interface (CNI) multiplexing approach based on the open-source Multus CNI plugin [16].
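With Multus, a pod selects its additional networks through an annotation referencing NetworkAttachmentDefinition objects. The sketch below shows this annotation using client-go types; the network names are placeholders.

package multinet

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MultiHomedPod sketches a pod requesting two extra interfaces: Multus
// parses the annotation and attaches the listed networks in addition to
// the default cluster network.
func MultiHomedPod() corev1.Pod {
    return corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{
            Name: "multi-homed-vnf",
            Annotations: map[string]string{
                "k8s.v1.cni.cncf.io/networks": "sriov-net,macvlan-net", // placeholder networks
            },
        },
    }
}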
Third, data plane acceleration using the Data Plane Development Kit (DPDK) and Single Root I/O Virtualization (SR-IOV) has to be supported. DPDK can be supported by bypassing the operating system kernel and moving all packet processing into user space, while SR-IOV can be supported by bypassing the hypervisor and virtual switch, giving multiple VNFs direct access to the same physical NIC.
A fourth requirement is to support the Stream Control Transmission Protocol (SCTP) in Kubernetes as a transport layer protocol, which would be useful for handling the signaling between some components of LTE networks.
Fig. 3: Illustration of the deployment of a DVB-S2 terminal decoder PaaS with DVB-RCS2 return channel decoding services [6]
A fifth requirement is to enable distributed multi-cluster
deployment to have clusters in different regions to be nearer to
the users, and to tolerate failures and/or invasive maintenance.
Multi-cluster orchestration facilitates the shift from a single
container host to clusters of container hosts to run container-
ized applications over multiple clusters in multiple clouds.
Finally, it is necessary to support application acceleration using Field-Programmable Gate Arrays (FPGAs), by adding an FPGA device plugin to Kubernetes that allows application functions to be offloaded to FPGAs (as discussed in later sections).
The so-called Telco-Grade flavor1 extends the default Kubernetes deployment with NUMA-aware CPU pinning, SCTP capability and multiple network interfaces, whereas the Telco-Grade flavor2 extends it only with the SCTP capability. It is worth noting that NUMA-aware CPU pinning corresponds to a new CPU management policy. The main idea is to make it possible to select specific logical CPUs while considering the NUMA topology. When such a policy is activated, a pod specification can provide a hint indicating on which NUMA node its CPUs should be placed and exclusively allocated. This makes it possible to avoid costly cross-NUMA communication between a container and a network interface.
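Since this policy is an NGPaaS extension rather than an upstream Kubernetes feature, the hint format is not standardized; the sketch below merely illustrates the idea with a hypothetical pod annotation.

package numahint

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NUMAPinnedPod carries a NUMA placement hint for the extended CPU
// management policy. The annotation key is purely illustrative: no such
// key exists in upstream Kubernetes.
func NUMAPinnedPod() corev1.Pod {
    return corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{
            Name: "dvb-decoder",
            Annotations: map[string]string{
                "ngpaas.example/numa-node": "0", // hypothetical: exclusive CPUs on NUMA node 0
            },
        },
    }
}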
B. FPGA acceleration in the context of virtualization
It can be argued that virtualization improves the flexibility
in how the processing resources are accessed by the supported
workloads, but at the cost of overall efficiency. As presented
previously, some solutions for achieving more efficient processing are emerging in PaaSes. We have observed that most of the biggest IT providers have started to use Field-Programmable Gate Arrays (FPGAs) within their servers. In this section, we present some of the solutions which exist today for FPGA processing in cloud computing environments. Two main application domains are targeted when FPGAs are used: one is to accelerate infrastructure functionality, while the other is to offer FPGA as a service to end-users for their own acceleration needs.
1) FPGA acceleration for IT infrastructure purpose
To the best of our knowledge, only one example can be found where FPGAs are used for the IT infrastructure directly rather than for end-user purposes. Catapult is the technology behind Microsoft's hyperscale acceleration fabric [17]. The project started in 2010 with the purpose of building a supercomputing substrate able to provide computing acceleration in many domains such as networking, security, cloud services or artificial intelligence. Each data center server has one connected FPGA, making the Catapult acceleration architecture distributed. In parallel, the interconnection of all FPGAs constitutes an elastic reconfigurable acceleration fabric, providing the flexibility to harness a single FPGA or up to thousands of FPGAs for one service, with an energy efficiency of 40 Gigaops/W for accelerators deployed at scale.
2) FPGA acceleration for the end-user usage
In a contrasting set of use cases which leverage FPGA
acceleration for cloud computing, we can find Amazon [18],
Accelize [19], [20], SuperVessel [21] or NetFPGA [22]. In
these cases, the proposed architectures are very similar. An FPGA board is connected to the server through PCIe and can be accessed by an end-user for their own purposes. Amazon and Accelize offer proprietary sets of development, simulation, debugging and compilation tools. End-users can register their accelerators and deploy them to their virtual machine instances. NetFPGA and SuperVessel can be thought of as open-source variants of Amazon.
For all of these solutions, FPGAs are considered as
offloading resources and are not used by the IT infrastructures
themselves, nor concurrently shared among microservices
running on the same host.
III. HARDWARE ACCELERATION FOR VIRTUALIZED FUNCTIONS
The increased use of virtualization in data centers has led
to the use of new processing systems to achieve maximal
processing efficiency. As presented in the related work section,
FPGAs are a representative class of computing engine that can
help obtain such performance gains. In this section, we will
present enhancements developed in the context of NGPaaS that
facilitate FPGA virtualisation within cloud infrastructures.
A. Kubernetes and Docker FPGA plugins
Container-based microservices architectures have shown
great benefits in relation to software, as they are flexible,
scalable, modular and easy to deploy. As FPGA acceleration
starts to be available on cloud infrastructures, it is important
to find a way to efficiently use those FPGAs.
In an acceleration context, FPGAs are usually dedicated to one specific task, with no flexibility, no dynamic deployment, and inefficient FPGA usage (the device is neither fully occupied nor used all the time).
Thus, a novel plugin architecture is proposed in this paper
to enable deployment of FPGA accelerated applications into
Docker containers. As an illustrative example, Figure 4 shows
two containerized virtual functions sharing the same FPGA.
Containers can access the reconfigurable accelerators VF1 and
VF2 through a shared PCIe interface. To manage accelerator
units, a Kubernetes device plugin has been developed. It is worth noting that the device plugin is a framework provided by Kubernetes that enables vendors to advertise their resources without modifying Kubernetes core code. The
Kubernetes device plugin exposes the number of FPGA accel-
erator units available on a Kubernetes node. In this context,
a container can be automatically deployed as orchestrated by
Kubernetes, loading a Virtual Function (VF) in the FPGA.
As FPGA partial reconfiguration is not yet available on
public cloud offerings such as Amazon Web Services (AWS) F1
or Intel DCP, this section will focus on FPGA orchestration,
considering one virtual function per FPGA. Partial reconfigu-
ration aspects will be discussed in the next section.
1) Docker FPGA Container plugin
The Dockerfile contains environment variables defining the acceleration resources needed by the container (a Go sketch of the pairing logic follows the list):
•ACCELERATOR_DEVICES: comma-separated list of accelerator devices to be made accessible inside the container.
•ACCELERATOR_FUNCTIONS: expected acceleration function(s) provided by the device(s). If only one function name is provided, all devices should contain this function. If a list of function names is provided, the first function is associated with the first device, the second function with the second device, and so on.
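This pairing rule can be captured in a few lines. The following Go sketch (function name hypothetical) shows how the runtime tool could derive the device-to-function mapping from the two environment variables.

package accelenv

import (
    "fmt"
    "os"
    "strings"
)

// PairDevicesWithFunctions implements the rule stated above: a single
// function name applies to every device, while a list of names is matched
// to the device list index by index.
func PairDevicesWithFunctions() (map[string]string, error) {
    devices := strings.Split(os.Getenv("ACCELERATOR_DEVICES"), ",")
    functions := strings.Split(os.Getenv("ACCELERATOR_FUNCTIONS"), ",")

    pairs := make(map[string]string, len(devices))
    switch {
    case len(functions) == 1:
        for _, d := range devices {
            pairs[strings.TrimSpace(d)] = strings.TrimSpace(functions[0])
        }
    case len(functions) == len(devices):
        for i, d := range devices {
            pairs[strings.TrimSpace(d)] = strings.TrimSpace(functions[i])
        }
    default:
        return nil, fmt.Errorf("%d devices but %d functions", len(devices), len(functions))
    }
    return pairs, nil
}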
To instantiate a new container, the Docker client forwards the request to the Docker daemon (dockerd). The daemon then relies on the runc runtime process to spawn the container. The runc tool, maintained by the Open Container Initiative (OCI), is used for spawning and running containers according to the OCI specification. This specification defines the runc interface, including the root file system path and two JavaScript Object Notation (JSON) files for configuration and runtime settings.
Fig. 4: Docker FPGA Container
The accelerator-container plugin needs to customize the container before it starts, so the plugin should be interfaced to Docker through a prestart hook. As the Docker daemon does not yet manage the OCI spec hooks (i.e. there is no way to configure these hooks), the solution is to patch the runc program by statically adding the accelerator-container hook to the list of prestart hooks.
The plugin is therefore made of three components:
•accelerator-container-runtime: the patched runc program, used as the Docker runtime instead of runc.
•accelerator-container-runtime-hook: the hook program called by the runtime each time it starts a container.
•accelerator-container-runtime-tool: the C program called by the hook, which implements all the container customization.
For each requested device, the runtime tool checks whether the accelerator already contains the expected function; otherwise, it automatically loads the function bitstream onto the accelerator device based on the acceleration JSON file. The runtime tool customizes the container through the following steps (a hook skeleton in Go is sketched after the list):
•attach device nodes,
•bind-mount the device node special files from the host FS to the container FS,
•limit memory usage resources,
•mount host directories,
•attach host libraries.
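For illustration, the OCI runtime specification delivers the container state as JSON on the hook's standard input. The minimal Go skeleton below (the body is an assumption, not the project's actual tool) shows where the customization steps above would be implemented.

package main

import (
    "encoding/json"
    "log"
    "os"

    specs "github.com/opencontainers/runtime-spec/specs-go"
)

// A prestart hook is executed by the runtime with the container state on
// stdin, before the container process starts.
func main() {
    var state specs.State
    if err := json.NewDecoder(os.Stdin).Decode(&state); err != nil {
        log.Fatalf("cannot decode OCI state: %v", err)
    }
    // state.Bundle locates the container bundle; its config.json holds the
    // process environment, from which the ACCELERATOR_* variables are read.
    log.Printf("customizing container %s (pid %d, bundle %s)",
        state.ID, state.Pid, state.Bundle)
    // Device attachment, bind mounts and bitstream loading would follow,
    // as enumerated in the steps above.
}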
2) Kubernetes FPGA Device Plugin
Deployed as a daemonset, the Kubernetes FPGA device
plugin manages AWS FPGA accelerator units. It exposes the
number of FPGA accelerator units available on a Kubernetes
node. The pod needs to know the identifiers of the resources to which it has been granted access, since Kubernetes manages only generic countable resources by default. Whenever Kubernetes
allocates an accelerator resource to a pod, the device plugin
passes the resource identifier to the pod container through an
environment variable.
On startup, the plugin instantiates its own gRPC (remote procedure call) server to serve kubelet requests. It then registers with the kubelet gRPC server, providing its gRPC socket and the resource to be exposed.
The kubelet asks the plugin for the list of available accelerator resources by calling the ‘ListAndWatch‘ method. The plugin replies with the list of available devices, each entry containing a device identifier and its state (healthy or unhealthy).
Once the plugin has sent the list of devices it manages, the
kubelet advertises those resources to the API server so the
node status is updated to advertise the accelerator resources.
Users can request accelerator devices in a container specification as they would request other types of resources. When the pod is created on a node, the node’s kubelet notifies the plugin of the accelerator resource(s) allocated to the pod by calling the ‘Allocate‘ method.
On Allocate, the plugin returns an AllocateResponse with a container environment variable ‘ACCELERATOR_UNITS‘ to be passed by the kubelet to the container runtime. This environment variable contains a comma-separated list of the accelerator unit identifier(s) allocated to the pod, which can be used by the container application to access the corresponding FPGA resources.
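The following Go sketch condenses the two methods described above, using the upstream device plugin gRPC API (the v1beta1 import path is an assumption, as it has moved across Kubernetes versions; kubelet registration and error handling are omitted).

package fpgaplugin

import (
    "context"
    "strings"

    pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// Plugin advertises FPGA accelerator units as a countable node resource.
type Plugin struct {
    units []string // accelerator unit identifiers discovered on this node
}

// ListAndWatch streams the current device list to the kubelet.
func (p *Plugin) ListAndWatch(_ *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
    devs := make([]*pluginapi.Device, 0, len(p.units))
    for _, id := range p.units {
        devs = append(devs, &pluginapi.Device{ID: id, Health: pluginapi.Healthy})
    }
    if err := s.Send(&pluginapi.ListAndWatchResponse{Devices: devs}); err != nil {
        return err
    }
    select {} // block; a real plugin would resend on health changes
}

// Allocate passes the granted unit identifiers to the container environment.
func (p *Plugin) Allocate(_ context.Context, r *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
    resp := &pluginapi.AllocateResponse{}
    for _, req := range r.ContainerRequests {
        resp.ContainerResponses = append(resp.ContainerResponses,
            &pluginapi.ContainerAllocateResponse{
                Envs: map[string]string{
                    "ACCELERATOR_UNITS": strings.Join(req.DevicesIDs, ","),
                },
            })
    }
    return resp, nil
}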
B. Partial Reconfiguration for FPGA virtualization
A novel approach that breaks the silos between FPGA-based acceleration infrastructure and software microservices running on cloud servers has previously been introduced [23]. For this purpose, a computing architecture has been defined in which microservices are able to access and share the same FPGA for function offloading and acceleration. Multiple accesses and sharing of the acceleration can take place without service disruption under specific operating conditions. These capabilities are detailed and evaluated in the following subsections.
Our computing node consists of a server connected to
an FPGA as depicted in Figure 5. The server is used for
the processing of virtual functions in the form of containers
whereas the FPGA board is connected to the server through a
PCIe interface (PCIeItfce), and will be used for the function
offloading. To provide flexible and shared acceleration re-
sources, the FPGA is split into several acceleration slots which
are directly and independently accessible from the containers.
In order to guarantee maximum throughput and minimal latency for the accelerated microservices, data-plane and control-plane
communications are routed through different interfaces to the
FPGA.
1) The data-plane
The data-plane is an abstraction layer for the data processed by a microservice. When a microservice needs to be accelerated, its data have to be transmitted as fast as possible to the FPGA slots to limit the processing latencies. To get the most out of the FPGA’s flexibility, partial reconfiguration [24] mechanisms managed in the control-plane are used.
Once an FPGA acceleration slot has been configured for a
given Docker container, access will be granted to the container.
The latter can thus send data packets to the FPGA through the
PCIe interface for efficient data processing. At the same time,
the use of the other acceleration slots is possible. Each such
slot can be managed and used independently from any other
Docker instance or any other kind of microservice running on
the server.
2) The control-plane and the partial reconfiguration con-
troller
The control-plane in our architecture offers the following
features:
•control of the container instantiations,
•partial reconfiguration management of the FPGA that is
associated to a container.
C. Container instantiation
The control-plane is the communication layer dedicated to the control messages exchanged between the server and the FPGA. As specified before, the control-plane is entirely implemented in software, on both the server and the FPGA side. For this purpose, ARM processors are used on the FPGA side. An accelerated microservice is launched at the start of a Docker instance. This step is done through the standard Docker instantiation, without specific requirements or add-ons. During the instantiation, the Docker instance requests a partial reconfiguration from the FPGA controller to prepare the dedicated logic used for the accelerated function. The bitstream used to configure the target slot is provided by the Docker instance and is specific to the slot being used. The number of these acceleration slots can be parameterized when specifying the FPGA architecture.
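Since the control-plane runs over plain TCP (see the next subsection), the container side of such a request can be as simple as the Go sketch below; the controller address, the one-byte opcode/slot framing and the function name are assumptions, as the paper does not specify the wire format.

package fpgactl

import (
    "io"
    "net"
    "os"
)

// SendBitstream opens a TCP connection to the on-FPGA controller and
// streams the partial bitstream for the requested acceleration slot.
func SendBitstream(addr string, slot byte, bitstreamPath string) error {
    conn, err := net.Dial("tcp", addr)
    if err != nil {
        return err
    }
    defer conn.Close()

    f, err := os.Open(bitstreamPath)
    if err != nil {
        return err
    }
    defer f.Close()

    // Hypothetical framing: reconfiguration opcode followed by the slot id.
    if _, err := conn.Write([]byte{0x01, slot}); err != nil {
        return err
    }
    _, err = io.Copy(conn, f) // stream the bitstream body
    return err
}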
D. Partial reconfiguration controller
The need for flexible management of the FPGA resources
and the separation of the data-plane and the control-plane has
motivated the development of a new partial reconfiguration
controller. The latter enables remote management of the partial
reconfiguration. The remote communication stack is depicted
in Figure 6. On one side, the controller manages command
messages received from the Ethernet interface. On the other side, it directly interfaces with the Processor Configuration Access Port (PCAP), the Xilinx internal port for the configuration of the FPGA. The communication coming from the Ethernet interface uses the TCP protocol. The state machine representing the controller behavior is depicted in Figure 7. In state q0, the controller is idle, waiting for either a reconfiguration request (a) or a slot release request (d).
Fig. 5: Overview of the proposed FPGA-based architecture.
In case of a reconfiguration
request (a), the controller checks the availability of the acceleration slot requested by the container (q1). If the slot is available (b), the controller performs the reconfiguration (q2) with the bitstream sent by the container. The controller returns to the idle state (q0) when the last bytes of bitstream data have been received (c). A watchdog running on the server checks that the acceleration slots are actually in use: if a slot has not been used for a configured period of time, a release request (d) is sent to the controller to make the slot available again (q3) for the next reconfiguration request. A release request can also be sent by a container itself when the acceleration slot is no longer needed. Typical partial bitstream sizes are a few megabytes. The reconfiguration throughput on the PCAP interface is around 124 MB/s, giving a reconfiguration time of 24.2 ms for the partial bitstream used for one acceleration slot.
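The transition function of Figure 7 can be written down directly; the Go sketch below mirrors states q0-q3 and events a-d (routing the !b rejection back to idle is our assumption, as the paper does not state that path explicitly). Note also that, at 124 MB/s, the quoted 24.2 ms corresponds to a partial bitstream of roughly 3 MB.

package pcapctl

// States of the controller, mirroring Figure 7.
type state int

const (
    qIdle        state = iota // q0: waiting for a request
    qCheckSlot                // q1: checking slot availability
    qReconfigure              // q2: streaming the bitstream through the PCAP
    qRelease                  // q3: releasing a slot
)

// Events of the controller: a, b, c, d in Figure 7.
type event int

const (
    evReconfigRequest event = iota // a: reconfiguration request
    evSlotAvailable                // b: requested slot is free
    evBitstreamDone                // c: last bitstream bytes received
    evReleaseRequest               // d: slot release request
)

// Step encodes the controller's transitions.
func Step(s state, e event) state {
    switch {
    case s == qIdle && e == evReconfigRequest: // a
        return qCheckSlot
    case s == qIdle && e == evReleaseRequest: // d
        return qRelease
    case s == qCheckSlot && e == evSlotAvailable: // b
        return qReconfigure
    case s == qCheckSlot: // !b: slot busy, back to idle (assumed)
        return qIdle
    case s == qReconfigure && e == evBitstreamDone: // c
        return qIdle
    case s == qRelease: // slot freed, back to idle
        return qIdle
    }
    return s // !a&!d, !c: stay in the current state
}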
Fig. 6: Protocol stacks of the remote partial reconfiguration
controller.
IV. CASE STUDY ON A VIRTUAL DVB-RCS2 DECODER
In this section, implementation results of the deployment
scenario presented in the introduction are presented. It is
demonstrated how NGPaaS could be used by verticals in the
context of 5G performance improvements. Those improve-
ments are obtained thanks to the use of FPGA acceleration
and virtualization solutions developed in the context of the
NGPaaS project.
The experimentation has been defined to highlight the efficiency of FPGA acceleration of virtual microservices compared with a purely software virtualized function in a cloud infrastructure. On the one hand, a software implementation of a turbo decoder has been used, optimized for Intel CPU architectures and taking advantage of Streaming Single Instruction Multiple Data (SIMD) Extensions (SSE) intrinsics for highly efficient compilation and execution. On the other hand, an FPGA implementation of the turbo decoder uses the FPGA framework presented in Section III. Both implementations run on a cloud server based on an Intel® Xeon® CPU E5-1620 running at 3.60 GHz with 16 GB of RAM. The installed operating system is a Linux Ubuntu 16.04.4 Long Term Support distribution based on kernel 4.4.0-119-lowlatency. The chosen FPGA development board is the Xilinx ZC706 with a Zynq Z-7045 [25]. The main advantage of using a Zynq-based FPGA board is its embedded dual-core ARM Cortex-A9 processor running at 800 MHz. Using the ARM cores inside the FPGA for all the control-plane mechanisms lets the full FPGA logic be devoted to microservice acceleration on the data-plane. The Zynq model embeds 350,000 logic cells, 19.3 Mbits of RAM and 900 Digital Signal Processing (DSP) cores.
A. Turbo decoder decoding time
In this section, we study the decoding time of the two turbo decoders. Results are presented in Figure 8 and in Table I. It can be noticed that small code blocks are decoded slightly faster by the software implementation than by the hardware implementation. This is explained by the fact that the FPGA is connected to the server through a PCIe interface, which introduces a latency before the samples are processed. The actual processing time is faster and more predictable on the FPGA. For bigger code blocks, the PCIe latency is largely compensated by the FPGA processing efficiency. In the best case, processing on the FPGA is 53% faster than on the CPU.
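For reference, the Gain(-)/Loss(+) column of Table I is consistent with the relative decoding-time difference; for instance, for 4096-bit blocks:

\[ \frac{t_{\mathrm{FPGA}} - t_{\mathrm{SSE}}}{t_{\mathrm{SSE}}} = \frac{92.04\,\mu\mathrm{s} - 183.11\,\mu\mathrm{s}}{183.11\,\mu\mathrm{s}} \approx -49.73\,\% , \]

i.e. the FPGA path roughly halves the decoding time for the largest blocks.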
B. Turbo decoder decoding power consumption
In this section, the power consumption needed for the turbo decoding on both targets is analyzed. Results are presented in Figure 9 and in Table II. For code blocks smaller than
Fig. 7: The remote partial reconfiguration controller’s state machine.
Fig. 8: Comparison of the decoding time/latency needed for
the turbo decoding of code blocks on FPGA against SSE
optimized CPU code
Code Block Size | Decoding Time (FPGA) | Decoding Time (SSE) | Gain(-)/Loss(+)
128 bits | 28.08 µs | 24.90 µs | +12.77%
256 bits | 31.01 µs | 31.98 µs | -3.03%
512 bits | 33.94 µs | 49.07 µs | -30.83%
1024 bits | 43.95 µs | 61.04 µs | -27.99%
2048 bits | 59.08 µs | 101.81 µs | -41.97%
4096 bits | 92.04 µs | 183.11 µs | -49.73%
TABLE I: Timing results for the turbo decoding of code blocks on FPGA against SSE-optimized CPU code
1856 bits, the power consumption of turbo decoding on FPGA is mainly due to the data transfer through the PCIe interface. Indeed, the actual decoding time is insignificant compared to the data transfer time through PCIe. When the code blocks are bigger than 1856 bits, the proportion of decoding time becomes higher, which reduces the power consumption of the complete decoding process. For the biggest code blocks, the power consumption of decoding on FPGA is 10% lower than that of processing on a CPU.
C. Turbo decoder decoding throughput
Finally, the data throughput of both turbo decoder implementations has been studied. As for the previous results, the
Fig. 9: Comparison of the power consumption needed for the turbo decoding of code blocks on FPGA against SSE-optimized CPU code
Code Block Size | Power (FPGA) | Power (SSE) | Gain(-)/Loss(+)
128 bits | 85.9 W | 79.3 W | +8.32%
256 bits | 85.9 W | 79.3 W | +8.32%
512 bits | 86.2 W | 79.3 W | +8.70%
1024 bits | 86.2 W | 79.5 W | +8.42%
1248 bits | 85.9 W | - | -
1536 bits | 84.9 W | - | -
1728 bits | 83.7 W | - | -
1792 bits | 82.0 W | - | -
1824 bits | 80.0 W | - | -
1856 bits | 77.5 W | - | -
2048 bits | 75.4 W | 79.3 W | -1.32%
3072 bits | 72.9 W | - | -
4096 bits | 72.1 W | 79.5 W | -9.30%
TABLE II: Power consumption results for the turbo decoding of code blocks on FPGA against SSE-optimized CPU code
decoding of smaller code blocks achieves a higher throughput in software than on FPGA, due to the PCIe interface latency. However, as for the decoding time, from 256-bit code blocks onwards the throughput reached on FPGA grows much faster than on CPU, providing an improvement of up to 112% for the biggest code blocks.
Fig. 10: Comparison of the decoding throughput reached for the turbo decoding of code blocks on FPGA against SSE-optimized CPU code
Code Block Size | Throughput (FPGA) | Throughput (SSE) | Gain(-)/Loss(+)
128 bits | 4.35 Gb/s | 4.90 Gb/s | +11.22%
256 bits | 7.87 Gb/s | 7.63 Gb/s | -3.14%
512 bits | 14.39 Gb/s | 9.95 Gb/s | -44.62%
1024 bits | 22.73 Gb/s | 16.00 Gb/s | -42.06%
2048 bits | 33.06 Gb/s | 19.18 Gb/s | -72.36%
4096 bits | 42.44 Gb/s | 21.33 Gb/s | -98.96%
TABLE III: Throughput results for the turbo decoding of code blocks on FPGA against SSE-optimized CPU code
V. CONCLUSION
In this paper, we have presented the Next Generation PaaS project and shown how virtual FPGA resources are managed to accelerate cloud processing and reach higher levels of performance compared to traditional IT PaaSes. First, we have presented NGPaaS, focusing on the features and hardware resources that enable the implementation of a 5G-oriented connectivity service in the cloud. Then, the deployment and building blocks paving the way to a combined Telco-Broadcasting PaaS platform have been shown. Finally, we have addressed a cloud deployment use case scenario combining Telco and vertical digital video broadcasting components. The actual end-to-end implementation of the scenario was out of the scope of this paper; the purpose was to show how a specific vertical scenario may benefit, in terms of performance and efficiency, from adopting the NGPaaS principles. This has been realized thanks to the design of a Kubernetes framework aware of the FPGA-based system on which concurrent microservices can run. We have presented a virtualized hardware implementation of the DVB-RCS2-like turbo decoder and shown that small code blocks remain more efficiently processed on CPU than on FPGA, due to the data transfer latency induced by the PCIe interface. However, the latency impact quickly diminishes for bigger code blocks. We have thus demonstrated that NGPaaS deployment is a good solution for the virtual deployment of services and platforms dedicated to broadcasting-related domains within the context of 5G networks.
ACKNOWLEDGMENT
This work has been carried out with the support of the EU project NGPaaS, funded from the European Union's Horizon 2020 Programme (H2020-ICT-2016-2017) under grant agreement no. 761557.
REFERENCES
[1] D. Edvard, E. K. Ibtissam, Q. Raphaël, W. Einar, and W. Jacky, “Paving the way to telco grade paas,” 2016. [Online]. Available: https://www.ericsson.com/en/ericsson-technology-review/archive/2016/paving-the-way-to-telco-grade-paas
[2] S. V. Rossem, B. Sayadi, L. Roullet, A. Mimidis, M. Paolino, P. Veitch,
B. Berde, I. Labrador, A. Ramos, W. Tavernier, E. Ollora, and J. Soler,
“A vision for the next generation platform-as-a-service,” in 2018 IEEE
5G World Forum (5GWF), July 2018, pp. 14–19.
[3] Superfluidity, “Deliverable D3.1: Final system architecture, programming interfaces and security framework specification,” December 2016.
[4] Calico. [Online]. Available: https://github.com/projectcalico/cni-plugin
[5] ETSI TS 101 545-1, “Digital video broadcasting (dvb); second gener-
ation dvb interactive satellite system (dvb-rcs2); part 1: Overview and
system level specification,” ETSI, Tech. Rep., May 2012.
[6] A. Markhasin, V. Belenky, V. Drozdova, A. Loshkarev, and I. Svinarev,
“Cost-effective ubiquitous iot/m2m/h2h 5g communications for rural and
remote areas,” in 2016 International Conference on Information Science
and Communications Technologies (ICISCT), Nov 2016, pp. 1–8.
[7] ETSI EN 301 545-2, “Digital video broadcasting (dvb); second genera-
tion dvb interactive satellite system (dvb-rcs2); part 2: Lower layers for
satellite standard,” ETSI, Tech. Rep., April 2014.
[8] S. Newman, Building Microservices, 1st ed. O'Reilly Media, Inc., 2015.
[9] B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, “Borg, Omega, and Kubernetes,” ACM Queue, vol. 14, no. 1, 2016.
[10] [Online]. Available: https://aws.amazon.com/it/lambda/
[11] P. Enberg, “A performance evaluation of hypervisor, unikernel, and container network I/O virtualization,” Master's thesis, Helsinki, May 2016.
[12] [Online]. Available: https://www.docker.com/products/docker-swarm
[13] [Online]. Available: http://kubernetes.io/
[14] [Online]. Available: http://www.infoworld.com/article/3042573/application-virtualization/docker-swarmbeats-kubernetes-not-so-fast.html
[15] [Online]. Available: http://containerjournal.com/2016/04/07/kubernetes-vs-swarm-container-orchestratorbest/
[16] [Online]. Available: https://github.com/intel/multus-cni
[17] A. M. Caulfield et al., “Configurable clouds,” IEEE Micro, vol. 37, no. 3,
pp. 52–61, 2017.
[18] [Online]. Available: https://aws.amazon.com/ec2/instance-types/f1/
[19] [Online]. Available: https://www.accelize.com
[20] [Online]. Available: https://github.com/OPAE
[21] IBM. Openpower cloud: Accelerating cloud computing. [Online].
Available: https://www.research.ibm.com/labs/china/supervessel.html
[22] N. Zilberman, Y. Audzevich, G. A. Covington, and A. W. Moore,
“Netfpga sume: Toward 100 gbps as research commodity,” IEEE Micro,
vol. 34, no. 5, pp. 32–41, Sept 2014.
[23] J. Lallet, A. Enrici, and A. Saffar, “Fpga-based system for the accelera-
tion of cloud microservices,” in 2018 IEEE International Symposium on
Broadband Multimedia Systems and Broadcasting (BMSB), June 2018,
pp. 1–5.
[24] [Online]. Available: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_4/ug909-vivado-partial-reconfiguration.pdf
[25] [Online]. Available: https://www.xilinx.com/support/documentation/data_sheets/ds190-Zynq-7000-Overview.pdf
Riwal KERHERVE joined the Institute of Research and Technology b<>com in 2015 as a Research & Innovation Engineer, in charge of Continuous Integration in the Networks & Security domain. He received his master's degree in Telecommunications & Networks from the University of Brest (UBO), France, in 2001. He began his career at Alcatel-Lucent before moving to the Images & Réseaux cluster, where his activities covered the Cloud (OpenStack), 4G LTE, IMS, VoLTE and FTTH.
Julien LALLET joined Alcatel-Lucent in 2011 and has been a research engineer at Nokia Bell Labs since 2016. He received his PhD degree in electrical engineering from the University of Rennes in 2008 and was a post-doctoral fellow at the University of Bielefeld, Germany, from 2009 to 2010. His research interests include efficient processing in the context of cloud computing and hardware acceleration on FPGA. He has published papers on computing architectures and FPGA systems.
Laurent BEAULIEU received the MS in engineering degree from the École Polytechnique de l'Université d'Orléans, France, in 2007. He successively worked in the fields of digital video broadcasting and telecommunication signal processing, carrying out research and development of digital designs on FPGAs and ASICs (Teamcast, Advanten, Renesas). Since 2015, he has been a senior hardware engineer at the Institute of Research and Technology b<>com, Cesson-Sévigné, France. His main research interests include digital design methodology, FPGA acceleration and FPGA in the cloud ecosystem.
Ilhem FAJJARI is a research project leader on cloud-native network function orchestration at Orange Labs. From 2012 to 2014, she worked as a research project leader on network virtualization at the VirtuOR startup. In 2012, she obtained her PhD in computer science with honors from Pierre & Marie Curie University (Paris 6), France. Her main research interests include cloud computing, network function virtualization, and the orchestration and optimization of communication networks.
Paul VEITCH holds M.Eng. and Ph.D. degrees from the University of
Strathclyde, Glasgow. He joined BT in 1996, and worked on core transmission,
multi-service platforms, and 3G mobile infrastructure design, before moving
to Verizon Business (UUNET) in 2000. Paul returned to BT in 2003, and was
infrastructure design authority for IP VPN and BT consumer networks. From
2012 to 2018, Paul led BT's NFV Proof-of-Concept (PoC) validation, trials and business down-streaming. Paul is currently senior manager of Quality-of-Experience converged network research, and also manages BT's contribution to the H2020 Phase 2 project NGPaaS.
Jean DION received the Engineering degree in Telecommunications from TELECOM Bretagne, Brest, France, in 2010. From 2010 to 2013, he was a research engineer at Orange Labs, Cesson-Sévigné, France, working toward a doctoral degree on hardware mutualization for advanced Forward Error Correction decoders on FPGA targets. Since 2013, he has been a research engineer at the Institute of Research and Technology b<>com, Cesson-Sévigné, France. His research interests are focused on advanced FEC decoding and modulation schemes in a multi-RAT context.
Bessem SAYADI is a senior researcher at Nokia Bell Labs, France, where he leads the virtualization and architecture activity in the Cloud RAN and Edge Cloud projects. He received his M.Sc. (2000) and Ph.D. (2003) from SUPELEC, France, with highest distinction. From 2005 to 2006, he was a postdoctoral fellow at the National Centre for Scientific Research (CNRS), funded by SAGEM SA, and also worked as a researcher at Orange Labs. In 2006, he joined Bell Labs as a Senior Researcher in wireless technology, where he has worked on diverse areas ranging from physical and MAC layer design and scheduling to transport protocols, video coding/delivery and Future Internet architectures. He is currently working on building future 5G systems. He acts as Technical Manager of the European H2020 project Superfluidity and also leads the French team contributing to the European H2020 project 5G NORMA. Bessem has been involved in several EU projects, and was project coordinator of the FP7 MEDIEVAL project, short-listed as one of the three finalists for the Future Internet Award 2012. He has authored over 70 publications in journals and conference proceedings and serves as a regular reviewer for several technical journals and conferences. He holds 20 patents and has more than twenty patent applications pending in the areas of video coding and wireless communications.
Laurent ROULLET is head of the Cloud Native Telco Platforms group at Nokia Bell Labs. He leads activities focusing on Cloud RAN architectures, front-haul optimization, Software Defined Networks and Network Function Virtualization applied to wireless access, both 4G and 5G. He joined Bell Labs in 2010 and has worked on small cell optimization and interference coordination. He joined Alcatel-Lucent Mobile Broadcast in 2006 and led the standardization effort for DVB-SH (hybrid satellite/terrestrial mobile broadcast). From 2001 to 2005 he worked at UDcast, a start-up focusing on mobile routers for satellite and DVB networks, and developed the first DVB-H infrastructure in cooperation with Nokia. From 1997 to 2001 he worked at Alcatel Space on onboard processing. Laurent Roullet graduated from École Polytechnique and SupAero.