Conference Paper
Optimizing Cloud Function Configuration via Local Simulations
Johannes Manner, Martin Endreß, Sebastian Böhm and Guido Wirtz
Distributed Systems Group
Bamberg, Germany
{johannes.manner@, martin.endress@stud., sebastian.boehm@, guido.wirtz@}
Abstract—Function as a Service (FaaS) – the reason why
so many practitioners and researchers talk about Serverless
Computing – claims to hide all operational concerns. The promise of FaaS is that users only have to focus on the core business functionality in the form of cloud functions. However, a few configuration options remain within the developer's responsibility. Most of the currently available cloud function offerings force the user to choose a memory or other resource setting and a timeout value. CPU is scaled based on the chosen options. At first glance, this seems like an easy task, but the tradeoff between performance and cost has implications for the quality of service of a cloud function.
Therefore, in this paper we present a local simulation approach for cloud functions that supports developers in choosing a suitable configuration. The methodology we propose simulates the execution behavior of cloud functions locally, makes the cloud and local environments comparable, and maps the local profiling data to a cloud platform. This reduces development time and enables developers to work with their familiar tools. It is especially helpful when implementing multi-threaded cloud functions.
Index Terms—Serverless, Function as a Service, FaaS, Benchmarking, Simulation, Profiling
FaaS is the next evolution in cloud computing [1]. In 2009, researchers at the University of California, Berkeley published their view on cloud computing [2]. Back then, they identified challenges such as unpredictable performance and scaling, and motivated researchers to work on them. Ten years later, they published their new view on cloud computing [3] and claimed that Serverless will dominate the cloud in the future. Since this new service model is still evolving, there is no common terminology yet. As FaaS is the predominant reason for the Serverless hype, many practitioners and researchers use the terms Serverless and FaaS interchangeably. However, we consider this misleading and, like others [4], categorize FaaS as a subset of Serverless technologies. In the following, we use the term FaaS to refer to this cloud function concept.
A core characteristic of FaaS is the abstraction of operational concerns. Cloud providers offer an Application Programming Interface (API) to which developers conform by implementing a handler interface. After the source code artifact is deployed to the FaaS platform, the provider manages the whole cloud function lifecycle. From a developer's point of view, this sounds like the evolution from DevOps to NoOps, but a few configuration options remain on most FaaS platforms. These options are the focus of this paper. More precisely, we investigate possibilities for simulating function executions locally. This enables us to predict the runtime characteristics of functions in the cloud in order to find suitable configuration options.
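To make the handler interface concrete, the following is a minimal sketch of an AWS Lambda-style Python handler. The `(event, context)` signature follows the Lambda Python runtime convention; the function name `handler`, the `name` field in the event, and the response shape (the API Gateway proxy format) are illustrative assumptions. Because the handler is a plain function, it can also be invoked locally, which is the starting point for any local simulation.

```python
import json


def handler(event, context):
    """Entry point invoked by the platform; the provider manages the lifecycle."""
    # Hypothetical input field; real events depend on the trigger.
    name = event.get("name", "world")
    body = {"message": f"Hello, {name}!"}
    # Response shape assumed to follow the API Gateway proxy convention.
    return {"statusCode": 200, "body": json.dumps(body)}


if __name__ == "__main__":
    # Local invocation: no platform and no context object involved.
    print(handler({"name": "FaaS"}, None))
```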
As a cloud function user, one has quite limited possibilities to influence the runtime behavior. Some authors, e.g., [3], state that either developers should receive more configuration options or the provider should determine the optimal setting automatically. There are offerings where providers dynamically link memory and CPU resources to the cloud function container, but there are situations in which a user deliberately over-provisions computing power to speed up the execution for a better user experience.
EIVY [5, p. 9] claims that "real execution is the only valid test" to figure out the most suitable configuration. This is reasonable for obtaining absolute measures, but it is also time-consuming and costly: the cloud function must first be deployed, executed based on a load profile in the testing phase, analyzed, and finally reconfigured. In our approach, we do not state absolute values but relate local and platform measures to each other to support developers in their decision-making process.
Therefore, we state the following research questions and answer them as we proceed through the paper:
RQ1: How can two distinct virtualized execution environments be made comparable? How can the runtime behavior of local function executions be mapped to a FaaS platform in the cloud?
RQ2: How can we measure the CPU demand of a cloud function locally with different resource settings? Can these measures provide an accurate prediction for a FaaS cloud platform?
RQ3: How can we graphically support developers in making a reasonable decision about the configuration of their cloud function?
Answering our research questions helps to identify function characteristics and their implications. We simulate the execution and focus directly on the runtime behavior of the cloud function itself, in contrast to research where the FaaS platform serves as an execution environment to simulate other systems [6] or where the FaaS platform itself is the object under investigation [7]. Based on this assessment, a developer can select a fitting resource setting and timeout value to prioritize cost over performance or vice versa, depending on the business needs.
The agenda is as follows: In the next section, we briefly describe virtualization fundamentals and possible profiling strategies. Section III discusses work related to ours, covering cloud simulation approaches in general, FaaS simulation in particular, and strategies to calibrate different environments. Section IV introduces our methodology and answers RQ1. We evaluate the proposed approach in Section V, which answers RQ2 and RQ3. In our conclusion in Section VI, we discuss the results and list threats to validity. Finally, we conclude the paper with ideas and next steps for future work.
A. Container Technology
Containers are the enabler and runtime environment for executing functions in the cloud. To understand the execution and runtime performance of cloud functions, it is necessary to look at the platform stack and the technologies that bring FaaS into production, and to understand why they are used. Figure 1 shows the FaaS platform stack and its evolution from traditional self-hosted systems via VM abstraction to containers and FaaS platforms.
Fig. 1. FaaS Platform Stack related to Predecessor Architectures [8]–[10]. (Panels: Traditional IT, Virtual Machine, Containers, FaaS Platform; layers include App, Runtime, and Container Runtime.)
ELAIR et al. [11] recently conducted a study comparing configuration-, code-, and rule-based security mechanisms for containers. They stated that in many cases, especially in multi-tenant environments, the security options of container technology are not sufficient. In their discussion, they list hybrid solutions that combine the security advantages of VMs, with their isolated OS, and the fast startup times of containers. One such tool is AWS Firecracker, a so-called micro VM for serverless computing [12]. This architecture allows cloud functions of various users to be executed on a single physical machine. These micro VMs enable FaaS platform providers to offer a secure multi-tenant cloud environment, where each cloud function runs in a container on a user-dedicated VM. This architecture also has performance implications, as LLOYD et al. [13] investigated. In such hybrid environments, this matters in particular for cloud functions whose functionality stresses the CPU. The architecture shown in Figure 1 (right) prevents these functions from influencing other tenants. The shared kernel in the virtualized environment is important when only shares of specific resources are exposed to the function via cgroup properties. For a simulation solution, dev-prod parity therefore starts at the platform stack and is crucial for meaningful results of our local simulation tool.
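The cgroup-based CPU limits mentioned above can be illustrated with Docker's CFS flags. Docker documents `--cpus=<value>` as shorthand for setting `--cpu-period` and `--cpu-quota`, with a default period of 100,000 µs, so a container limited to a share of s cores may run for s × period microseconds per period. The sketch below assumes that default period and a hypothetical image name `linpack-bench`; it only builds the argument list and does not run Docker.

```python
from typing import List

# Docker's default CFS period in microseconds (100 ms).
CFS_PERIOD_US = 100_000
# The kernel rejects CFS quotas below 1 ms.
MIN_QUOTA_US = 1_000


def cfs_quota_us(cpu_share: float, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (µs) a container may consume per CFS period for a given core share."""
    quota = round(cpu_share * period_us)
    if quota < MIN_QUOTA_US:
        raise ValueError(f"CPU share {cpu_share} yields a quota below 1 ms")
    return quota


def docker_run_args(image: str, cpu_share: float) -> List[str]:
    """Build a `docker run` command limiting the container to `cpu_share` cores."""
    return [
        "docker", "run", "--rm",
        f"--cpu-period={CFS_PERIOD_US}",
        f"--cpu-quota={cfs_quota_us(cpu_share)}",
        image,  # hypothetical benchmark image
    ]


if __name__ == "__main__":
    print(" ".join(docker_run_args("linpack-bench", 0.26)))
```

Passing `--cpus=0.26` instead should be equivalent; the explicit form simply makes the CFS quota visible.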
In our case, dev-prod parity means that the development and production environments are as similar as possible. A virtualization solution is a first step towards this parity; therefore, a container runtime is a minimum requirement for being comparable to a provider's platform. ARIF et al. [14] compared physical and virtual environments and concluded that a comparison is only possible when the systems run on a comparable technology stack. They point out that normalizing performance metrics helps to reduce environmental discrepancies when comparing systems on different technology stacks.
B. Profiling Strategies
Profiling is the process of generating a profile of the resource consumption of an application or of a virtual or physical environment over time. The need for profiling [15] also exists in the cloud function domain, as the following three aspects attest: Management of cloud functions means configuring resources properly to avoid performance degradation of the function. Resource considerations should avoid overprovisioning. Finally, the cost perspective of this pay-per-use model balances the two prior aspects. There are two types of approaches: hardware profilers introduce less overhead but provide coarse-grained data, whereas software profilers introduce considerable overhead by instrumenting code but yield fine-grained information [16]. Which approach to choose depends on the use case.
Abstraction      Layer           Approach
intrusive        app layer       Dynamically or statically instrument events inside an application; approaches that alter source code [17]–[21].
non-intrusive    virtual layer   Periodically inspect the state of virtual resources using APIs [22]–[24].
non-intrusive    physical layer  Periodically inspect the state of the system using OS tools [19], [23].
Table I shows a selection of different profiling strategies. Approaches that target the application layer are intrusive, since custom metrics or information can only be exposed at the source code level. Therefore, these approaches typically introduce some overhead, which has to be balanced against the information gained. With their instrumentation tool, MACE et al. [17] enabled the recording of distributed application topologies. They introduced a happened-before join operator that allows users to investigate traces across component or application boundaries. Another intrusive approach used the additional information to generate test cases a posteriori for faulty executions to help developers resolve runtime errors [18]. CUOMO et al. [20] implemented wrappers for frequently used components to derive runtime metrics during benchmarks and used them when executing their simulations. REN et al. [19] described how Google profiles its data centers from an infrastructure point of view; they also enable application profiling through a commonly used library. They collect heap allocation, lock contention, CPU time, and other profiling metrics.
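As a minimal sketch of application-layer (intrusive) instrumentation, the decorator below wraps a function and records wall-clock time and process CPU time per call using Python's standard `time` module. It is not the tooling of any of the cited works; the names `profiled`, `samples`, and `busy` are hypothetical. The overhead it adds to every call is exactly the trade-off discussed above.

```python
import functools
import time


def profiled(func):
    """Record wall-clock and process CPU time of every call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are profiled too.
            wrapper.samples.append({
                "function": func.__name__,
                "wall_s": time.perf_counter() - wall_start,
                "cpu_s": time.process_time() - cpu_start,
            })
    wrapper.samples = []
    return wrapper


@profiled
def busy(n: int) -> int:
    """Hypothetical CPU-bound workload: sum of squares below n."""
    return sum(i * i for i in range(n))
```

Calling `busy(10_000)` and then inspecting `busy.samples` shows one record per invocation; a large gap between `wall_s` and `cpu_s` would hint at waiting rather than computing.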
In contrast to intrusive approaches, non-intrusive profiling approaches focus on the management perspective and observe the current system state. Container technology and hypervisor (VM) based systems are the research areas of our virtual abstraction layer. PI et al. [22] used the container API as a source of metrics to implement a feedback control tool in a distributed environment. Docker, as the de facto container standard, also supplies some metrics via its docker stats API. IBM published a framework to profile and monitor its cloud infrastructure [24], too. It focuses on the whole virtualization layer by collecting the memory and persistent state of containers and VMs. CASALICCHIO and PERCIBALLI [23] used the docker stats API to conduct an experiment comparing different metrics, including CPU and memory, on a native Linux environment and on Docker. Docker stats and cAdvisor were used as profiling sources in the container area.
Their research also considered the physical layer, where they used the Linux tools mpstat and iostat on the native host. As mentioned, Google also uses whole-machine profiles on a hardware basis to investigate the different applications and how they consume the machine's resources. In contrast to the application metrics, these data are hidden from cloud service users. They include CPU cycles, L1 and L2 cache misses, branch mispredictions, and other hardware metrics [19].
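For the physical layer, tools such as mpstat derive CPU utilization from the cumulative counters in /proc/stat. A minimal sketch, assuming the standard field order of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice), computes utilization from two snapshots. Counting iowait as idle and excluding the guest fields (already contained in user and nice) are common conventions, not something the cited works prescribe; the function names are hypothetical.

```python
from typing import Tuple


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    total = sum(values[:8])  # user..steal; guest time is already part of user/nice
    return idle, total


def cpu_utilization(before: str, after: str) -> float:
    """Fraction of non-idle CPU time between two /proc/stat snapshots."""
    idle_0, total_0 = parse_cpu_line(before)
    idle_1, total_1 = parse_cpu_line(after)
    total_delta = total_1 - total_0
    if total_delta <= 0:
        return 0.0
    return 1.0 - (idle_1 - idle_0) / total_delta


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line (aggregate CPU counters) of /proc/stat on Linux."""
    with open(path) as stat_file:
        return stat_file.readline()
```

Sampling `read_cpu_line()` twice with a short `time.sleep` in between and passing both lines to `cpu_utilization` gives the periodic, non-intrusive view described above.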
A. Cloud Simulation
Simulation of the cloud is becoming more important, as a special issue on simulation in and of the cloud demonstrates [25]. Time and execution cost are the driving forces for simulating the cloud infrastructure upfront to estimate the likely Quality of Service. The most notable cloud computing simulators [26] are GridSim [27], a tool for simulating grid environments, SCORE [28], a tool for data center simulation based on Google's Omega lightweight simulator, GreenCloud [29], a tool to investigate the energy consumption in data centers, and CloudSim. Other simulation tools are listed in several literature studies [30]–[33].
CloudSim [34] is one of the first simulation environments for cloud computing and initially focused on VM-based simulation and federated clouds. This framework especially tackled inter-network components and their delays. With the rise of container technology, its authors extended the framework to simulate containers as well [35]. Their focus is on containerized cloud computing environments, i.e., studying the resource management of containers holistically by looking at container scheduling, placement, and consolidation. Their research takes the provider's point of view rather than that of a single container.
To validate simulated results, complementary approaches that combine benchmarking and simulating a system emerged [20], [36]. JOHNG et al. [36], for example, developed an ontology-based methodology in which a mapping function between the different ontologies compares the environments to achieve closer dev-prod parity. Executing benchmarks is necessary for their approach to calibrate the simulation.
B. FaaS Simulation Approaches
FaaS simulation is a subarea of the previously introduced cloud simulation. Approaches in the literature can be divided into two categories: Firstly, approaches that use FaaS platforms as simulation engines to which other systems are deployed and investigated, like [6], [37]. Secondly, approaches that simulate the FaaS platform itself and deploy cloud functions only to validate the simulation in specific experiments. Such simulation systems are still lacking [38]. In this subsection, we discuss current tools tackling this issue, name similarities, and demarcate our approach.
DFaaSCloud [39] introduced a simulation framework for using functions in the continuum of core cloud and edge technologies by extending CloudSim. The executed functions are not described in their research, which makes interpreting the results challenging. MAHMOUDI and KHAZAEI [38] introduced SimFaaS, a simulation tool that focuses on higher-level platform and function characteristics like the average response time, cold starts, and the number of instances serving the functions. Their evaluation covers only a single function at a single memory setting. Therefore, specific function characteristics are not within the scope of their work. Another approach predicts the end-to-end latency of a collection of cloud functions that form an application. LIN et al. [40] proposed profiling the target cloud platform upfront to obtain measurements for their model and algorithms. Furthermore, they suggested how to solve the two optimization problems of minimizing cost and maximizing performance.
HOROVITZ et al. [41] built FaaStest, a self-optimizing Machine Learning (ML) tool that predicts the workload of a function and schedules functions either on VMs or on a cloud function platform. Since the workload is one of the factors determining performance [42] with respect to cold starts and the level of parallelism on the platform, their research is important for simulating cloud function platforms, but it does not include function characteristics either. Another approach to simulate FaaS is FaaSSimulator [43]. In contrast to FaaStest, this tool aims to support hybrid decisions by providing a spectrum between VM and FaaS solutions. Their work includes neither function characteristics nor a description of the technical setup.
Sizeless [44] and SAAF [45] are closely related to our research. EISMANN et al. [44] proposed an approach to predict the best configuration. They rely on monitoring data for a single memory setting and are able to predict the execution time for other memory settings. Their system is also ML-based in the configuration phase and currently limited to AWS Lambda and Node.js. CORDINGLY et al. [45] focused on the multi-tenancy aspect, where various functions are executed on the same VMs. They also stressed the fact that cloud providers use different hardware. Therefore, the predicted execution time depends on the hardware, which also directly influences the price. Linear regression models based on the Linux CPU time were used to calculate means and mean errors when profiling the functions. Concurrency was taken into account at the workload level, but not at the level of the function implementation.
None of the presented approaches includes function characteristics or concurrency within the implementation of a cloud function. We include these two aspects as important points in the concept and evaluation of our work.
C. Experiment Calibration
Calibrating local test-beds to use them for simulations is an established approach for IaaS offerings. ZAKARYA et al. [46] extended CloudSim to enable VM migration to save energy. Based on a small set of executions in the cloud, they built linear regression models and were able to simulate their System under Test (SUT) with an accuracy of 98.6%.
Researchers in the FaaS area have also applied calibration steps to compare measurements and draw conclusions. BACK and ANDRIKOPOULOS [47] compared experiments on a local Apache OpenWhisk deployment running in a VirtualBox VM with experiments on commercial FaaS platforms. JONAS et al. implemented a prototype to run map primitives on top of AWS Lambda [48]. They executed a matrix multiplication benchmark to measure the overall system performance in Giga Floating Point Operations per Second (GFLOPS) and also plotted a histogram of the GFLOPS performance per CPU core, which shows a distribution of CPU core performance. Different CPUs vary in their peak performance, e.g., 16 to 17 GFLOPS versus 30 GFLOPS in the matrix multiplication case mentioned above. This suggests that different CPUs are used. In another experiment [49], this assumption was confirmed by finding five different CPU models on AWS. In addition, the co-location of the VMs in which the containers run causes multi-tenancy issues, which influence the runtime performance and explain slight deviations of measured values. Both aspects are only partly considered in related research. Therefore, questions about their impact on the runtime behavior remain open.
These insights result in a profiling strategy that accommodates the virtualized architecture of the system. The process presented in Figure 2 is a subprocess of the overall benchmarking pipeline presented in prior work [42]. The equations and figures used in this work are annotated at the steps to which they correspond. This workflow is an essential part of performance and cost simulation in FaaS [50]. As already mentioned, the artifact generated at the end of the subprocess is a graphical representation of various local simulation runs, which serves as decision guidance for choosing a suitable resource setting depending on the developer's needs. The following subsections explain the most important parts of this process in detail and give a concrete example of how to implement such a process for integrating developer machines and FaaS platforms.
A. Calibration
We introduce a calibration step to compare the performance offered by cloud infrastructure with the performance of a local experiment machine. ARIF et al. [14] already stated that a scalar factor is not enough to compare different environments. On our machines, we control the resources using container quotas (cgroups), whereas on the provider side, computing performance depends on the selected resource setting. As input for the calibration, users specify the granularity of the local calibration experiment and the set of providers that are in focus for deployment. The calibration has to be executed on the provider platform once per resource setting. If up-to-date results exist, we proceed with executing our functions locally, as explained in the following subsection. If not, the calibration is divided into two tasks, which can be executed in parallel, followed by a mapping step.
SODAN [51] compared different CPU architectures. Specific types of instruction sets were investigated to optimize algorithms, as presented in SPRUNT's paper [52]. He describes frequently used program characterization events, for example the floating point event, to assess the SUT. This research is important for improving CPU architectures and optimizing algorithms in the area of high performance computing, but of limited relevance in the FaaS area, since most providers use commodity hardware in their data centers. SPRUNT emphasizes that processor implementations are mostly abstracted by these program characterization events, which allows an objective comparison of different hardware. On a conceptual level, such a comparison is needed to simulate the performance characteristics of a cloud function on the FaaS platform while executing it offline on the developer's machine. He stated that such a local simulation of the application stack, including the Operating System (OS) and processor information, results in less accurate predictions for specialized algorithms and is ultimately not convincing. We overcome this problem in our approach, since we conduct established and well-controllable experiments locally and on the specified FaaS platforms with respect to the mentioned program characterization events. In a second step, we compute a function that equates the two application stacks and use it for our simulation task and the local execution of the functions.
Fig. 2. Perform Simulation Subprocess of the Overall Simulation and Benchmarking Process. (Flowchart elements: decision "already executed?"; Calibration Step (Eq. 1, Fig. 3); Calibration Step on Provider (Eq. 2, Fig. 3); Execute Cloud Function Locally (Fig. 4, Fig. 5); Execution Data.)
1) Calibration Function/Benchmark: We use the LINPACK benchmark, first introduced in 1979 [53] and still being extended [54], as a hardware-independent experiment on the provider and user side. LINPACK is a package to solve systems of linear equations of different complexity in single or double precision arithmetic. It is a de facto standard for comparing CPU performance. DONGARRA stated that the LINPACK benchmark does not result in a one-size-fits-all performance value, but the problem domain of solving linear equations is very common for any type of application. Therefore, the LINPACK results give a good hint about the peak CPU performance. LINPACK is also included in various micro-benchmark experiments and is part of a workload suite for cloud function comparisons [55].
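LINPACK reports performance by dividing a nominal operation count by the measured solve time. For solving a dense n × n system via LU factorization, the conventional count is 2n³/3 + 2n² floating point operations. The helper below applies that formula to a problem size and a wall-clock duration; the function names are hypothetical, and the timing itself would come from the actual benchmark run.

```python
def linpack_flops(n: int) -> float:
    """Nominal floating point operations for solving a dense n x n system."""
    if n <= 0:
        raise ValueError("problem size must be positive")
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def gflops(n: int, seconds: float) -> float:
    """Achieved GFLOPS for a solve of size n that took `seconds` of wall time."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return linpack_flops(n) / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical measurement: a 1000 x 1000 solve taking 0.05 s.
    print(f"{gflops(1000, 0.05):.2f} GFLOPS")
```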
Most related to our calibration approach is the work of MALAWSKI et al. [56]. They also use the LINPACK benchmark to compare the performance of AWS Lambda and Google Cloud Functions with different memory settings. Their results strengthen the hypothesis from previous work that CPU resources are scaled linearly with the resource setting. AWS Lambda shows consistent linear scaling in GFLOPS performance, but a high variation in the results. Two performance ranges emerge when memory increases beyond 1024MB. In previous research [57], we also found these different performance levels using a CPU-intensive Fibonacci function to compare the different platform offerings, but did not investigate this phenomenon in more detail. LEE et al. [58] used matrix manipulation to obtain the CPU performance of a Lambda function deployed on AWS. They submitted a workload and observed a doubled execution time in concurrent mode compared to sequential execution, which supports the multi-tenancy assumption, with the concurrent invocations acting as their own noisy neighbors. They also found absolute values similar to MALAWSKI's, with 19.63 GFLOPS for the 1.5 GB memory setting and approximately 40 GFLOPS for the 3 GB configuration.
LEE and MALAWSKI found similar values, but the reasons are unclear, since some data needed to interpret the technical infrastructure or other aspects influencing the response time and performance is missing. Therefore, we included the CPU model information and VM identification data. This enables us to relate the hardware used in the experiments to the cloud function execution. We included some metadata in the responses of our cloud functions and used it in our evaluation to shed some light on this ongoing discussion about the causes. Cloud providers may use various commodity hardware in different geographical regions or even in a single data center. To identify the VM and CPU model, we use the /proc/cpuinfo and /proc/stat data from the /proc file system of the Linux host, as in [13] and [49]. We use the CPU model, CPU model name, and btime parameters of the Linux host. As LLOYD already noted, it is not guaranteed that the boot times of two VMs differ, so VMs are not always distinguishable. Since the computed likelihood of a collision is $7.8 \times 10^{-9}$, we nevertheless use btime to identify VMs.
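The VM and CPU identification can be sketched as follows: read the `model` and `model name` entries from /proc/cpuinfo and the `btime` entry (boot time in seconds since the epoch) from /proc/stat, and combine them into one identifier. The function names and the `|`-separated identifier format are hypothetical; as noted above, two VMs booted in the same second with the same CPU model would collide.

```python
from typing import Dict


def parse_cpuinfo(text: str) -> Dict[str, str]:
    """Extract the first 'model' and 'model name' entries from /proc/cpuinfo text."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in ("model", "model name") and key not in info:
            info[key] = value
    return info


def parse_btime(stat_text: str) -> int:
    """Return the boot time (seconds since the epoch) from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("btime "):
            return int(line.split()[1])
    raise ValueError("no btime line found")


def vm_identifier(cpuinfo_text: str, stat_text: str) -> str:
    """Hypothetical identifier combining CPU model name, model number, and boot time."""
    info = parse_cpuinfo(cpuinfo_text)
    return f'{info.get("model name", "?")}|{info.get("model", "?")}|{parse_btime(stat_text)}'


def vm_identifier_from_proc() -> str:
    """Build the identifier inside a running function on a Linux host."""
    with open("/proc/cpuinfo") as cpuinfo, open("/proc/stat") as stat:
        return vm_identifier(cpuinfo.read(), stat.read())
```

Returning `vm_identifier_from_proc()` as part of each function response allows grouping executions by VM and CPU model during the evaluation.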
2) Calibration Mapping: For many FaaS providers, the resource setting directly determines the CPU resources linked to the container in which the cloud function is executed. This is understandable, since a cloud provider wants a high utilization of the machine while ensuring a robust quality of service without interference between the functions running on it. The output of the previous calibration is the input of this mapping process. For the local machine and the FaaS cloud platform, we get two sets of execution data. The local machine data contains the GFLOPS achieved in relation to the CPU core share. This is formalized in Equation 1, $f_{local}(y)$ with $y \in \{y \mid 0 < y \le c\}$, where $c$ is the number of physical cores. The cloud provider side is modeled by Equation 2, $f_{provider}(x)$ with $x \in \{x \mid x \text{ is a cloud resource setting}\}$.
$$f_{local}(y) = m_1 y + t_1 \qquad (1)$$
$$f_{provider}(x) = m_2 x + t_2 \qquad (2)$$
Figure 3 shows exemplary diagrams. To enable an OS-independent calibration, a Docker image is prepared for the local execution of the LINPACK benchmark. To obtain the GFLOPS for different CPU shares, we use the capabilities of the Completely Fair Scheduler (CFS) of the Linux kernel to limit the CPU resources of the executing Docker container. For a solid data basis, we execute the Docker image repeatedly while incrementing the CPU share. We then fit linear regressions to the local and provider data and derive a function that calculates the local CPU share corresponding to each memory setting we want to consider. Our resulting mapping function is presented in Equation 3 and is derived from the assumption that $f_{local}(y) = f_{provider}(x)$:
$$y = \frac{m_2 x + t_2 - t_1}{m_1} \qquad (3)$$
This equation answers RQ1, where we asked how two virtualized environments can be made comparable and mapped to each other.
Fig. 3. Calibration Result of the performed LINPACK Benchmarks on a Cloud Provider Platform and Locally. (Panels: Calibration on Developer Machine (H60); Calibration on Developer Machine (H90); Calibration on AWS Lambda; axis label: CPU quota.)
for 256MB and 512MB memory, we can compute the local
CPU share using Equation 3. For an experimental setting this
resulted in 0.26 cores for 256MB and 0.48 cores for 512MB.
Doubling the memory setting does not result in doubling the
CPU resources for the local container and vice versa. This
sample data gives a first hint about the obvious fact that the
two regression lines have different positive slopes and a direct
conversion from one to the other by using a scalar factor is
not possible.
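Equating Equations 1 and 2 and solving for y gives the local CPU share $y = (m_2 x + t_2 - t_1)/m_1$ for a memory setting x. As a sketch, plugging in the H60 and AWS regression coefficients reported in Table II yields shares close to the 0.26 and 0.48 cores quoted above. The units (GFLOPS per core locally, GFLOPS per MB on AWS) are inferred from that agreement; the function name and the bounds check against the number of physical cores c are illustrative additions.

```python
# Regression coefficients from Table II: H60 machine and AWS Lambda.
M1, T1 = 23.400, -3.081  # local: GFLOPS = M1 * cpu_share + T1
M2, T2 = 0.020, -1.995   # AWS:   GFLOPS = M2 * memory_mb + T2
PHYSICAL_CORES = 4       # c, the number of physical cores of H60


def local_cpu_share(memory_mb: float,
                    m1: float = M1, t1: float = T1,
                    m2: float = M2, t2: float = T2,
                    cores: int = PHYSICAL_CORES) -> float:
    """Solve f_local(y) = f_provider(x) for the local CPU share y."""
    share = (m2 * memory_mb + t2 - t1) / m1
    if not 0 < share <= cores:
        raise ValueError(f"{memory_mb} MB maps to {share:.3f} cores, outside (0, {cores}]")
    return share


if __name__ == "__main__":
    for memory_mb in (256, 512):
        print(memory_mb, round(local_cpu_share(memory_mb), 3))
```

This prints about 0.265 cores for 256MB and 0.484 cores for 512MB. Because the intercepts do not cancel, the 512MB share is less than twice the 256MB share.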
B. Execute Cloud Functions Locally
The next step in our subprocess is the local simulation of the function we want to deploy to the cloud. We overcome shortcomings that KALIBERA and JONES [59] identified in the evaluation of systems research. They categorized experiment dimensions into influencing factors that are random, uncontrolled, or controlled by the experimenter. We specify only a set of memory settings, in our example 256MB and 512MB. Therefore, we have only a single influencing factor, which is controlled within the experiment. Other factors, like the CPU share, are determined by the memory setting and are therefore transitively controlled.
To adhere to the dev-prod parity principle mentioned in Section II-A, we suggest Docker as the container platform for simulating our functions. We use the Docker API to obtain runtime data. One advantage of this approach is that it is non-intrusive (see Section II-B) to the function under investigation, since we only observe runtime values and collect them with a separate tool. Currently, we obtain the total CPU time consumed in nanoseconds, the CPU time per core, the total memory usage, and the received, sent, and dropped network traffic. Via the cgroup metrics, we see some system-level values, such as the amount of time a process has direct control over the CPU. Relevant for the analysis are the overall execution time, computed from the stored start and end times, and the memory consumed on the local system.
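The field names below follow the response of the Docker Engine API stats endpoint (`GET /containers/{id}/stats`), and the CPU percentage uses the formula Docker documents for `docker stats`: the container's CPU delta divided by the host's CPU delta, scaled by the number of online CPUs. The paper does not say how it post-processes these values, so this is an assumption-based sketch with hypothetical function names; `memory_usage_bytes` returns the raw `usage` field, whereas the `docker stats` CLI additionally subtracts cache memory.

```python
def cpu_percent(stats: dict) -> float:
    """CPU usage in percent from one Docker stats sample (cpu_stats vs. precpu_stats)."""
    cpu, pre = stats["cpu_stats"], stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    if cpu_delta < 0 or system_delta <= 0:
        return 0.0
    # Prefer online_cpus; older API versions only report per-core usage lists.
    online = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    return cpu_delta / system_delta * online * 100.0


def memory_usage_bytes(stats: dict) -> int:
    """Raw memory usage reported for the container."""
    return stats["memory_stats"].get("usage", 0)
```

A single sample could be obtained, for instance, through the Docker SDK for Python with a non-streaming stats call and passed to both helpers.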
Concrete execution times from local machines are not directly interpretable on their own, but they are relevant when compared to values from executions in the cloud [14]. To continue our example of simulating a function with 256MB and 512MB, assume that the simulation for 512MB is 1.5 times faster than the simulation for 256MB. Since the cost is proportional to the memory setting multiplied by the execution time, doubling the memory for only a 1.5-fold speedup makes the 256MB configuration more cost-effective. If time is a critical factor, e.g., when a function handles user requests, the 512MB configuration might be preferable. These insights can be drawn without deploying the function to a platform.
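The cost argument can be made explicit with a small calculation. Assuming the price per invocation is proportional to the memory setting multiplied by the execution time, as with AWS Lambda's GB-second duration charge (per-request fees and billing granularity are ignored here), the hypothetical helper below compares two configurations.

```python
def relative_cost(mem_a_mb: float, time_a: float,
                  mem_b_mb: float, time_b: float) -> float:
    """Cost of configuration b relative to a, assuming cost ~ memory x duration."""
    if min(mem_a_mb, time_a, mem_b_mb, time_b) <= 0:
        raise ValueError("memory and durations must be positive")
    return (mem_b_mb * time_b) / (mem_a_mb * time_a)


if __name__ == "__main__":
    t_256 = 1.0            # normalized execution time at 256MB
    t_512 = t_256 / 1.5    # the assumed 1.5x speedup at 512MB
    ratio = relative_cost(256, t_256, 512, t_512)
    print(f"512MB costs {ratio:.2f}x as much as 256MB per invocation")
```

With the assumed 1.5-fold speedup, 512MB costs about 1.33 times as much per invocation; it would only break even with a twofold speedup.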
Ideally, a user of the simulation tool knows the best, worst, and average case for the input of their function as well as the load distribution. These aspects are currently neither controlled by the experimenter nor reflected in the concept, since the average case is sufficient for predicting performance. The load distribution is relevant for the share of cold starts when operating the cloud function, but not for our concept, because the execution time of a single function execution is not influenced by the load distribution.
A. Experimental Setup
In order to ensure repeatable experiments, we first state the
tools and machines we used to evaluate the introduced
calibration and prediction. As local experimenter machines, we
use an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, model 42
with 4 cores (named H60 in the following) and an Intel(R)
Core(TM) i7-7700 CPU @ 3.60GHz, model 158 with 4 cores
(named H90 in the following). We installed Ubuntu 20.04.2 LTS
and Docker on both machines to execute the containers.
Furthermore, we configured the experiments with our research
prototype¹⁴, which stores the execution metrics in a PostgreSQL
database.
We limit the evaluation to AWS Lambda¹⁵ as the target platform
for our simulations due to its dominance in the market.
All benchmarking requests on AWS were executed on an
Intel(R) Xeon(R) Processor @ 2.50GHz, model 63, in the
region eu-central-1. Therefore, our results are not affected by
the heterogeneity of CPU architectures discussed in [45], [60].
B. Calibration Step
Our hypothesis H1 is that the computed GFLOPS grow
proportionally to the memory setting on the provider side and
to the local CPU share, respectively. Table II shows that the
linear regression is statistically significant and thereby confirms
H1. Graphical representations of the local and AWS calibrations
are shown in Figure 3.
TABLE II
              Local (H60)   Local (H90)   AWS
p-value       <2.2e-16      <2.2e-16      <2.2e-16
R²            0.9995        0.9978        0.9973
Intercept     -3.081        -7.052        -1.995
Slope         23.400        54.284        0.020
We created a container with the LINPACK source code
and executed it while increasing the CPU share in steps of 0.1.
Since our machines have 4 cores, we made 40 measurements
per run. After 25 runs, we computed a linear regression where
the coefficient of determination (R²) was 0.9995 (H60) and
0.9978 (H90). The regression line does not pass through the
origin: the intercept is negative, which can be explained by
the inherent overhead of all computations. We also computed
the regression with an intercept fixed at 0, but this worsens R²
and the fit of the regression line to the data points between
0.3 and 3.6 CPUs. The AWS calibration was executed 100
times for the memory settings 128, 256, 512, 768, 1024, 1280,
1536, 1769, 1770¹⁶, 2048, 2560, 3072, 3584, 4096, 4608,
5120, 5632, 6144, 6656, 7168, 7680, 8192, 8704, 9216, 9728
and 10240MB. As for the local calibration, we computed a
linear regression, where R² was 0.9973. This is still a good fit,
but Figure 3 clearly shows that the values for specific memory
settings are more widely distributed.
As mentioned in the conceptual part, only valid provider values
are allowed for the specific configuration variables. In the case
of AWS, the memory setting must be a natural number
between 128 and 10240MB at the time of writing this paper.
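To make the calibration step concrete, the Python sketch below generates the 40 local CPU shares (the kind of values one might pass to `docker run --cpus`) and fits an ordinary least-squares line together with its R². It is a generic least-squares fit for illustration, not the authors' R script; the names `cpu_share_steps` and `linear_fit` are hypothetical, and the demo data are noise-free points on the H60 line from Table II rather than measurements.

```python
from typing import List, Sequence, Tuple


def cpu_share_steps(cores: int = 4, step: float = 0.1) -> List[float]:
    """CPU shares step, 2*step, ..., cores (e.g. values for `docker run --cpus`)."""
    count = round(cores / step)
    return [round((i + 1) * step, 10) for i in range(count)]


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical noise-free data on the H60 line from Table II, for illustration only.
shares = cpu_share_steps()
gflops = [-3.081 + 23.400 * s for s in shares]
print(linear_fit(shares, gflops))
```

In a real calibration, one would pass all measured (CPU share, GFLOPS) pairs of the 25 runs to `linear_fit`; measurement noise is what pushes R² below 1.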
¹⁴ Via Docker Compose, we configured an additional PostgreSQL database,
where we store the data. All relevant data is published together with the
version of the prototype.
¹⁶ AWS Lambda assigns a second vCPU to the function at 1770MB, so we
selected 1769MB and 1770MB as settings in order to determine whether this
has an impact on the GFLOPS (
We showed a close correlation between memory/CPU shares
and GFLOPS for the corresponding environments. Due to the
high correlation coefficients, we eliminated the dependent
variable GFLOPS and used Equation 3 to compute the CPU share
for specific memory settings. This further enables us to predict
the cloud performance of a function when executing it on a local
machine during the development process. Comparing absolute
values between the cloud and the local environment is of limited
value due to the different hardware used, but the execution time
trends are what enables the proposed local simulations.
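Equation 3 itself is not part of this excerpt. Assuming it equates the local and the AWS regression lines from Table II (same GFLOPS on both sides), the mapping between memory setting and local CPU share can be sketched in Python as below, including rounding and clamping to the valid AWS range. Because the AWS slope in Table II is rounded to 0.020, recomputing the one-CPU equivalents from these coefficients gives roughly 1116MB (H60) and 2461MB (H90) instead of the reported 1135MB and 2505MB. All helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """Calibration line GFLOPS = intercept + slope * x."""
    intercept: float
    slope: float


# Coefficients from Table II: x is the CPU share locally and the memory (MB) on AWS.
H60 = Line(intercept=-3.081, slope=23.400)
H90 = Line(intercept=-7.052, slope=54.284)
AWS = Line(intercept=-1.995, slope=0.020)

AWS_MIN_MB, AWS_MAX_MB = 128, 10240  # valid AWS Lambda memory range at time of writing


def cpu_share_for_memory(memory_mb: float, local: Line, cloud: Line = AWS) -> float:
    """Local CPU share delivering the same GFLOPS as `memory_mb` on the provider."""
    gflops = cloud.intercept + cloud.slope * memory_mb
    return (gflops - local.intercept) / local.slope


def memory_for_cpu_share(cpu_share: float, local: Line, cloud: Line = AWS) -> float:
    """Provider memory (MB) delivering the same GFLOPS as `cpu_share` locally."""
    gflops = local.intercept + local.slope * cpu_share
    return (gflops - cloud.intercept) / cloud.slope


def to_valid_aws_memory(memory_mb: float) -> int:
    """Round to whole MB and clamp to the range AWS Lambda accepts."""
    return min(AWS_MAX_MB, max(AWS_MIN_MB, round(memory_mb)))


# Memory equivalents of 1-4 fully used local CPUs (positions of the vertical lines).
for name, local in (("H60", H60), ("H90", H90)):
    print(name, [to_valid_aws_memory(memory_for_cpu_share(k, local)) for k in range(1, 5)])
```

For H90, the memory equivalent of four full CPUs exceeds the AWS maximum, so `to_valid_aws_memory` clamps it to 10240MB.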
C. Simulating Cloud Function Behavior
In this part, we implemented two functions to test different
hypotheses. As a literature study showed [61], most
publications in the FaaS area focus only on CPU-intensive
functions. We followed this approach.
For a first evaluation, we implemented a function to compute
the fibonacci sequence. Hypothesis H2 is that this function will
not profit from multi-core environments and will show the same
execution behavior for all settings equivalent to more than one
CPU. Hypothesis H3 is that parallelized functions will profit
proportionally from a resource increase when assigning memory
settings that exceed one CPU. We tackle H3 by implementing a
multi-threaded function that searches for prime numbers in a
given range. The fibonacci function was executed 100 times for
each memory setting, the prime number function 5 times.
For fibonacci, we implemented our function in Node.js and
used only a single input value for all tests (n=40), since this
eliminates the input as another variable and strengthens the
results for execution times and the proposed methodology. The
fibonacci function is widely used as a CPU-intensive function
(e.g. [57], [62]) in microbenchmark experiments.
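The authors' fibonacci function is written in Node.js; the following Python sketch is only a hypothetical translation that shows why such a function cannot use more than one core: the naive recursion runs on a single call stack in one thread, so any CPU share beyond one core stays idle. CPython is considerably slower than Node.js for this workload, so n=40 would take much longer here than in the original setup; the helper `timed` is likewise hypothetical.

```python
import time
from typing import Tuple


def fib(n: int) -> int:
    """Naive recursive Fibonacci: exponential work, executed on a single thread."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def timed(n: int) -> Tuple[int, float]:
    """Return fib(n) and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = fib(n)
    return result, time.perf_counter() - start
```

Running `timed(30)` inside containers started with different `--cpus` values should show the execution time level off once the share reaches one full CPU, which is the behavior H2 predicts.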
The vertical lines in Figures 3, 4 and 5 indicate the values
at which a single CPU is fully utilized and a portion of the
next CPU is added for further executions. In Figure 3, the
top and middle curves show the number of CPUs used by the
LINPACK calibration on our local machines, where the cpus
setting is known. The bottom curve shows the AWS LINPACK
execution, for which we computed memory equivalents for
fully utilized CPUs. These values were derived from the
documentation for the first CPU equivalent and interpolated
for CPU equivalents 2-6. For Figures 4 and 5, we computed
the CPU-memory equivalents via Equation 3 so that all x-axes
have the same dimension, which supports the interpretation of
our results. On H60, for example, one CPU is fully utilized at
equivalent memory settings above 1135MB (2505MB for H90).
As Figure 4 shows, the fibonacci function we deployed does not
profit from the multi-core environment. In particular, the
execution on H60 (middle of the figure) shows a constant
execution time once the memory is increased beyond the first
CPU-memory equivalent (vertical line at 1135MB). At 2048MB,
for example, the function has access to 1.77 CPUs, but can only
fully utilize a single one, since the function is implemented
single-threaded.
[Figure 4: three panels, "Running Fibonacci computation locally (H90)", "Running Fibonacci computation locally (H60)" and "Running Fibonacci on AWS"; x-axis: computed provider memory setting based on CPU share (local panels), memory setting (AWS panel).]
Fig. 4. Running Fibonacci Cloud Functions Locally and on AWS.
In a production use case without overbooking and with strict
resource allocation policies, this results in wasted CPU
resources and additional costs for the same performance
(execution time) compared to other configurations. This
problem can also be found in other research (e.g., Fig. 2 in
[40] or Fig. 1 in [44]). This observation confirms H2.
Jonas et al. argued that the FaaS programming model simplifies
the deployment and execution of “distributed computing for the
99%” [48, paper title], but it is often neglected that the runtime
behavior of these functions depends on their ability to use
multi-core environments. In addition, awareness of
multi-threaded functions is missing. This is even more
important in light of recent improvements in the resource
allocation for cloud functions¹⁷. Providers generally claim that
doubling the resource allocation halves the execution time, but
this only holds for multi-threaded functions without blocking
calls to third-party services. This is also the scaling behavior
AWS Lambda advertises on its platform.
This promise was used to generate the blue and orange curves
in Figures 4 and 5 dynamically for a memory setting of interest.
Grouping the execution times by memory size, we compute the
average (arithmetic mean) execution time AVG_a for a memory
setting a. Equation 4 gives the formula for computing the
curves, where x is the memory setting:
$f(x) = \frac{AVG_a \cdot a}{x}$   (4)
This enables us to select a memory setting during development
and to assess the function performance graphically based on
performance data. For optimization and estimation, the curves
help in interpreting the results. These curves answer RQ2, where
we asked whether a local execution can predict the runtime
behavior of the cloud function.
¹⁷ AWS Lambda increased their memory and CPU capabilities in
December 2020:
As an example, we used 512MB as our a for the plots in
Figures 4 and 5. Therefore, the average execution time AVG_512
is a point on the curve. All values above the curve do not profit
proportionally from scaling resources, which means spending
more resources on the task than necessary. As mentioned before,
there might be situations where doubling the memory setting,
and therefore the cost, is acceptable for a 1.5-fold decrease in
execution time, but these decisions are use case dependent.
Conversely, all executions below the curve profit
disproportionately from the resource increase. This only occurs
in the situation just mentioned (a 1.5-fold decrease when
doubling resources) if we take the larger memory setting as a:
the smaller memory setting then lies below the curve, i.e., the
function would profit from a downscale. For CPU-intensive
functions like the fibonacci use case, this is rarely the case, since
the ideal case is halving the time by doubling the resources, and
our calibration function LINPACK is optimized for such a CPU
performance use case.
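The curve of Equation 4 and the above/below-curve reading can be sketched in Python as follows. The tolerance band that decides when a point counts as lying on the curve is a hypothetical parameter added for illustration, the function names are hypothetical, and the sample data are invented, not measurements from this paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple


def average_by_memory(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Group (memory_mb, execution_time) samples; return AVG_a for every setting a."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for memory_mb, duration in samples:
        groups[memory_mb].append(duration)
    return {m: mean(durations) for m, durations in groups.items()}


def ideal_curve(avg: Dict[int, float], a: int) -> Callable[[float], float]:
    """Equation 4: f(x) = AVG_a * a / x, i.e. doubling memory halves the time."""
    return lambda x: avg[a] * a / x


def classify(avg: Dict[int, float], a: int, tolerance: float = 0.05) -> Dict[int, str]:
    """'above': less than proportional gain; 'below': more than proportional gain."""
    f = ideal_curve(avg, a)
    labels = {}
    for memory_mb, t in sorted(avg.items()):
        expected = f(memory_mb)
        if t > expected * (1 + tolerance):
            labels[memory_mb] = "above"
        elif t < expected * (1 - tolerance):
            labels[memory_mb] = "below"
        else:
            labels[memory_mb] = "on"
    return labels


# Invented execution times (seconds) of a single-threaded function, a = 512MB.
samples = [(256, 7.9), (256, 8.1), (512, 4.0), (1024, 2.0), (2048, 2.0)]
print(classify(average_by_memory(samples), a=512))
```

In this invented data, the 2048MB setting lands above the curve, which is the pattern a single-threaded function shows once it has more than one CPU available.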
[Figure 5: three panels, "Running Prime Number Computation locally (H90)", "Running Prime Number Computation locally (H60)" and "Run Prime Number Computation on AWS"; x-axis: computed provider memory setting based on CPU share (local panels), memory setting (AWS panel).]
Fig. 5. Running Prime Number Cloud Functions Locally and on AWS.
The second function was implemented in Java. It counts the
prime numbers within the range [2, 500’000]. Like the input of
the fibonacci use case, the range is constant for all simulation
executions. Each memory setting was executed 5 times. We used
the common fork-join pool¹⁸ of the JVM to divide the task
equally among the assigned cores.
Figure 5 shows the simulations on H60 and H90 as well as the
executions in the cloud. The higher memory settings (e.g. 2048
or 3008MB) are slightly above the orange curve (a=512MB),
which is due to the scheduling overhead of coordinating multiple
threads. The CPU-memory equivalents on AWS are interpolated
based on the range and number of cores derived from the
documentation. As for the fibonacci use case, the blue curves
predict how the function will profit from increasing memory on
the provider platform. Our third hypothesis (H3) is also
confirmed, since both the AWS Lambda executions and the local
simulations lie close to the blue and orange curves.
Interestingly, both the simulation runs and AWS Lambda show a
slight performance degradation when reaching the next complete
CPU, e.g. for H90 when running the prime function with 2304,
5120 or 7936MB¹⁹. In this case, scheduling, especially a busy
CPU and other system processes consuming CPU time, is
responsible for this phenomenon.
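The prime number function was implemented in Java using the JVM's common fork-join pool. The Python sketch below only mirrors the idea of splitting [2, 500’000] into parts and counting each part on its own core. It swaps in a static, equal-size partition with `concurrent.futures.ProcessPoolExecutor` (processes rather than threads, because CPython threads do not run CPU-bound Python code in parallel) instead of Java's work-stealing pool. Since trial division gets more expensive for larger numbers, later chunks carry somewhat more work, which is one reason work stealing balances better. All function names are hypothetical.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from typing import List, Tuple


def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n); deliberately CPU-bound."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def count_primes(bounds: Tuple[int, int]) -> int:
    """Count the primes in the inclusive range [lo, hi]."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi + 1) if is_prime(n))


def split_range(lo: int, hi: int, parts: int) -> List[Tuple[int, int]]:
    """Split [lo, hi] into `parts` contiguous, nearly equal inclusive chunks."""
    size, rest = divmod(hi - lo + 1, parts)
    chunks: List[Tuple[int, int]] = []
    start = lo
    for i in range(parts):
        end = start + size + (1 if i < rest else 0) - 1
        chunks.append((start, end))
        start = end + 1
    return chunks


def count_primes_parallel(lo: int, hi: int, workers: int) -> int:
    """Count each chunk in its own worker process and sum the partial counts."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(lo, hi, workers)))
```

A call such as `count_primes_parallel(2, 500_000, os.cpu_count() or 1)` belongs under an `if __name__ == "__main__":` guard, because process pools may re-import the main module on some platforms.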
D. Predicting Cloud Function Execution Time
We have discussed CPU-memory setting equivalents at length.
We used the calibration step to compute CPU shares based on
the memory setting by determining the GFLOPS locally and on
AWS. These shares differ: one CPU on H60 corresponds to
1135MB, whereas on H90 one CPU corresponds to 2505MB. On
AWS, 1770MB is equivalent to one CPU.
[Figure 6: "Trends in Predicting Fibonacci Execution Time"; x-axis: memory setting.]
Fig. 6. Trends in Predicting Provider Execution Time from Local Execution Time for the Fibonacci Cloud Function.
In Figures 6 and 7, we computed the ratios of AWS to local
execution times for the fibonacci and prime number use cases.
Looking at the fibonacci lines first, the ratio is nearly constant
at 1.1 (AWS/H60) up to 1135MB and at 0.8 (AWS/H90), where
an increase after 1770MB is visible. These factors can be used
to predict the execution time on the platform from local
simulation data.
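Turning these ratios into a prediction can be sketched in Python as below: the ratio of averaged AWS and local execution times is computed for every memory setting measured in both environments and then multiplied with a new local measurement. As the following discussion shows, a constant ratio is only a reasonable assumption within memory ranges where both environments use the same number of full CPUs; the function names and sample numbers are hypothetical.

```python
from typing import Dict


def execution_time_ratios(cloud_avg: Dict[int, float],
                          local_avg: Dict[int, float]) -> Dict[int, float]:
    """AWS/local ratio of mean execution times for each shared memory setting."""
    shared = sorted(cloud_avg.keys() & local_avg.keys())
    return {m: cloud_avg[m] / local_avg[m] for m in shared}


def predict_cloud_time(local_time: float, ratio: float) -> float:
    """Predicted platform execution time for a locally simulated execution time."""
    return ratio * local_time


# Hypothetical averaged execution times in seconds.
cloud = {512: 11.0, 1024: 5.5}
local = {512: 10.0, 1024: 5.0, 2048: 5.0}
r = execution_time_ratios(cloud, local)
print(r)
print(predict_cloud_time(8.0, r[512]))
```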
Both the increase and the decrease are caused by the
multi-threading aspect. H60 reaches the one CPU equivalent at
1135MB. After this, the execution time of fibonacci on H60
remains stable. Since the one CPU equivalent on AWS is reached
at 1770MB, the execution time on AWS decreases until this value,
and therefore the ratio also decreases (the numerator decreases
while the denominator stays constant). After 1770MB, both
environments (H60 and AWS) have the same issue and the ratio
is constant again. For the red curve in Figure 6, a slight increase
of the ratio is visible after 1770MB.
¹⁹ All data and R scripts are published with the version of our prototype:
Functions running on H90 profit from the increase in CPU
until 2505MB (the one CPU equivalent on H90). Between 1770MB
and 2505MB, the numerator is therefore constant, but the
denominator decreases, resulting in an increase of the ratio.
Hence, to predict platform execution times from local execution
times, both the multi-threading behavior and the CPU-memory
equivalents must be taken into account.
[Figure 7: "Trends in Predicting Prime Execution Time"; x-axis: memory setting.]
Fig. 7. Trends in Predicting Provider Execution Time from Local Execution Time for the Prime Number Cloud Function.
Figure 7 shows the trends for the prime number cloud
function. The ratio on H90 lies between 0.73 (7936MB) and
1.08 (2560MB). At 2560MB, AWS already profits from a portion
of the second CPU, whereas H90 has only just exceeded the first
CPU equivalent. At 7936MB, three full CPUs compute the results
locally, whereas AWS works with a portion of the fifth CPU
based on our interpolation. For H60 and AWS, the ratios lie
between 1.32 and 1.79. The simulation was only possible up to
4707MB (the four CPU equivalent on H60 and therefore the
system limit). As shown in Figure 5, the executions on AWS
Lambda at 2560MB and 2816MB were slower than the optimal
values given by the curve. These higher values result in a higher
ratio. The reasons for this performance decrease may be
manifold and will be investigated in further research.
A. Discussion of the Results
Most importantly, FaaS enables distributed computing for
the 99% [48]. However, it is important to consider multi-threaded
functions and their performance gain when using more resources
on the provider side. With the methodology and evaluation we
provided, we hope to raise awareness of this aspect. The
calibration and mapping of local and platform environments
enable a simulation of the function behavior. Furthermore, our
trend curves show that it is possible to predict a range of ratios
for the best, worst and average case of the expected execution
times on the provider when running the function locally. As
input, this requires a calibration as well as some sample
executions, which give first hints on how the different
environments compare and how function characteristics
influence the overall simulation.
B. Threats to Validity
Dev-Prod Parity - The more similar the simulation environment
is to the production environment on a provider platform, the
more comparable the results are. In our approach, we use a
virtualized environment, where we execute all calibrations and
simulations in Docker containers. AWS Lambda uses an
additional VM layer to isolate different tenants on the same
physical host [12]. This does not make the environments
incomparable, but it should be kept in mind that the system
stacks differ; this will be addressed in future work.
Minimal Scope - Our evaluation was executed with a
limited scope. We only use CPU-intensive functions to evaluate
our approach, which is quite common in systems research in the
FaaS area [61]. The benefit of this strategy is better control
over settings and functions, which aids the interpretation of
the results. Therefore, we decided to stick to the CPU-bound
functions fibonacci and prime number search.
Another aspect of minimal scope is our sole focus on a
single cloud function in isolation. A typical usage of FaaS,
for example, is storing state in a database, where requests and
communication over the network influence the overall application
Single Provider - Our evaluation is limited to a single
provider in a single region (eu-central-1). Other publications
(e.g. [45], [60]) show that different regions can make a
difference, even for the same provider.
Sample Size - As with every empirical evaluation, the sample
size is debatable. Since a single calibration run on H60 took
roughly 6 hours, we only conducted 25 of them, which we
consider sufficient since the deviation of the results is minimal.
For the evaluation section, especially the prime number run, an
experiment with more data points or a further statistical
evaluation might reveal further insights.
The discussion and presentation of a simulation-based
approach, where the execution on a developer’s machine reveals
insights into the platform behavior of the function, is an
important step in our overall simulation and benchmarking
Our ideas for future research are threefold. Firstly, we plan
to extend the scope of this work to additional providers (including
open source platforms) and to different physical regions within a
single provider’s offerings. In particular, the next step focuses on
predicting execution times in the cloud. Furthermore, we plan to
implement a visualizer for our research prototype to show the
evaluation presented here during the development
Secondly, since we focused solely on the execution time
locally and on the platform, we did not discuss the difference
between cold and warm executions. This aspect, as well as the
load pattern, will be a focus as we continue implementing
our prototype.
Finally, the dev-prod parity and the interaction mechanisms
with other services are especially interesting when using FaaS
and influence the performance and the execution behavior.
We plan to add a database interaction scenario in the
next sprint and also want to look at the provider portfolios to
identify other components which are vital to use with cloud
[1] P. Castro et al., “Serverless Programming (Function as a Service),” in
Proc. of ICDCS, 2017.
[2] M. Armbrust et al., “A View of Cloud Computing,” Communications of
the ACM, vol. 53, no. 4, pp. 50–58, 2010.
[3] E. Jonas et al., “Cloud Programming Simplified: A Berkeley View on
Serverless Computing,” EECS Department, University of California,
Berkeley, Tech. Rep. UCB/EECS-2019-3, 2019.
[4] E. van Eyk et al., “The SPEC Cloud Group’s Research Vision on FaaS
and Serverless Architectures,” in Proc. of WoSC, 2017.
[5] A. Eivy, “Be Wary of the Economics of ”Serverless” Cloud Computing,”
IEEE Cloud Computing, vol. 4, no. 2, pp. 6–12, 2017.
[6] K. Kritikos and P. Skrzypek, “Simulation-as-a-Service with Serverless
Computing,” in Proc. of SERVICES, 2019.
[7] M. Shahrad et al., “Architectural implications of function-as-a-service
computing,” in Proc. of MICRO, 2019.
[8] R. Harms and M. Yamartino, “The economics of the cloud,” Microsoft
Corporation, Tech. Rep., 2010.
[9] S. Hendrickson et al., “Serverless Computation with openLambda,” in
Proc. of HotCloud, 2016.
[10] Z. Kozhirbayev and R. O. Sinnott, “A performance comparison of
container-based technologies for the cloud,” Future Generation Com-
puter Systems, vol. 68, pp. 175–182, 2017.
[11] M. Belair et al., “Leveraging Kernel Security Mechanisms to Improve
Container Security,” in Proc. of ARES, 2019.
[12] A. Agache et al., “Firecracker: Lightweight virtualization for serverless
applications,” in Proc. of NSDI, 2020.
[13] W. Lloyd et al., “Serverless Computing: An Investigation of Factors
Influencing Microservice Performance,” in Proc. of IC2E, 2018.
[14] M. M. Arif et al., “Empirical study on the discrepancy between
performance testing results from virtual and physical environments,”
Empirical Software Engineering, vol. 23, no. 3, pp. 1490–1518, 2017.
[15] R. Weingärtner et al., “Cloud resource management: A survey on
forecasting and profiling models,” Journal of Network and Computer
Applications, vol. 47, pp. 99–106, 2015.
[16] T. Moseley et al., “Shadow profiling: Hiding instrumentation costs with
parallelism,” in Proc. of CGO, 2007.
[17] J. Mace et al., “Pivot tracing,” in Proc. of SOSP, 2015.
[18] J. Manner et al., “Troubleshooting serverless functions: a combined
monitoring and debugging approach,” Software-Intensive Cyber-Physical
Systems, vol. 34, no. 2-3, pp. 99–104, 2019.
[19] G. Ren et al., “Google-wide profiling: A continuous profiling infrastruc-
ture for data centers,” IEEE Micro, vol. 30, no. 4, pp. 65–79, 2010.
[20] A. Cuomo et al., “Simulation-based performance evaluation of cloud
applications,” in Proc. of IDC, 2013.
[21] S. Winzinger and G. Wirtz, “Applicability of coverage criteria for
serverless applications,” in Proc. of SOSE, 2020.
[22] A. Pi et al., “Profiling distributed systems in lightweight virtualized
environments with logs and resource metrics,” in Proc. of HPDC, 2018.
[23] E. Casalicchio and V. Perciballi, “Measuring docker performance,” in
Proc. of ICPE Companion, 2017.
[24] F. A. Oliveira et al., “A cloud-native monitoring and analytics frame-
work,” IBM Research Division Thomas J. Watson Research Center,
Tech. Rep. RC25669 (WAT1710-006), 2017.
[25] G. D’Angelo and R. D. Grande, “Guest editors’ introduction: Special
issue on simulation in (and of) the cloud,” Simulation Modelling Practice
and Theory, vol. 58, pp. 113–114, 2015.
[26] E. Barbierato et al., “Exploiting CloudSim in a multiformalism modeling
approach for cloud based systems,” Simulation Modelling Practice and
Theory, vol. 93, pp. 133–147, 2019.
[27] R. Buyya and M. Murshed, “GridSim: a toolkit for the modeling and
simulation of distributed resource management and scheduling for grid
computing,” Concurrency and Computation: Practice and Experience,
vol. 14, no. 13-15, pp. 1175–1220, 2002.
[28] D. Fernández-Cerero et al., “SCORE: Simulator for cloud optimization
of resources and energy consumption,” Simulation Modelling Practice
and Theory, vol. 82, pp. 160–173, 2018.
[29] D. Kliazovich et al., “GreenCloud: a packet-level simulator of energy-
aware cloud computing data centers,” The Journal of Supercomputing,
vol. 62, no. 3, pp. 1263–1283, 2010.
[30] W. Tian et al., “Open-source simulators for cloud computing: Comparative
study and challenging issues,” Simulation Modelling Practice and
Theory, vol. 58, pp. 239–254, 2015.
[31] F. Fakhfakh et al., “Simulation tools for cloud computing: A survey and
comparative study,” in Proc. of ICIS, 2017.
[32] U. U. Rahman et al., “Nutshell—simulation toolkit for modeling data
center networks and cloud computing,” IEEE Access, vol. 7, pp. 19 922–
19 942, 2019.
[33] A. Ismail, “Energy-driven cloud simulation: existing surveys, simulation
supports, impacts and challenges,” Cluster Computing, vol. 23, no. 4,
pp. 3039–3055, 2020.
[34] R. N. Calheiros et al., “CloudSim: a toolkit for modeling and simulation
of cloud computing environments and evaluation of resource provision-
ing algorithms,” Software: Practice and Experience, vol. 41, no. 1, pp.
23–50, 2010.
[35] S. F. Piraghaj et al., “ContainerCloudSim: An environment for modeling
and simulation of containers in cloud data centers,” Software: Practice
and Experience, vol. 47, no. 4, pp. 505–521, 2016.
[36] H. Johng et al., “Estimating the Performance of Cloud-Based Systems
Using Benchmarking and Simulation in a Complementary Manner,” in
Proc. of ICSOC, 2018.
[37] N. Kratzke and R. Siegfried, “Towards cloud-native simulations –
lessons learned from the front-line of cloud computing,” The Journal
of Defense Modeling and Simulation: Applications, Methodology, Tech-
nology, pp. 39–58, 2020.
[38] N. Mahmoudi and H. Khazaei, “Simfaas: A performance simulator for
serverless computing platforms,” in Proc. of CLOSER 2021 (to appear),
[39] H. Jeon et al., “A CloudSim-extension for simulating distributed
functions-as-a-service,” in Proc. of PDCAT, 2019.
[40] C. Lin and H. Khazaei, “Modeling and optimization of performance
and cost of serverless applications,” IEEE Transactions on Parallel and
Distributed Systems, vol. 32, no. 3, pp. 615–632, 2021.
[41] S. Horovitz et al., “FaaStest - machine learning based cost and perfor-
mance FaaS optimization,” in Proc. of GECON, 2019.
[42] J. Manner and G. Wirtz, “Impact of Application Load in Function as a
Service,” in Proc. of SummerSoC, 2019.
[43] A. Reuter et al., “Cost efficiency under mixed serverless and serverful
deployments,” in Proc. of SEAA, 2020.
[44] S. Eismann et al., “Sizeless: Predicting the optimal size of serverless
functions,” arXiv e-Prints: 2010.15162, 2020.
[45] R. Cordingly et al., “Predicting performance and cost of
serverless computing functions with SAAF,” in Proc. of
DASC/PiCom/CBDCom/CyberSciTech, 2020.
[46] M. Zakarya and L. Gillam, “Modelling resource heterogeneities in
cloud simulations and quantifying their accuracy,” Simulation Modelling
Practice and Theory, vol. 94, pp. 43–65, 2019.
[47] T. Back and V. Andrikopoulos, “Using a Microbenchmark to Compare
Function as a Service Solutions,” in Service-Oriented and Cloud Com-
puting. Springer International Publishing, 2018, pp. 146–160.
[48] E. Jonas et al., “Occupy the Cloud: Distributed Computing for the 99%,”
in Proc. of SoCC, 2017.
[49] L. Wang et al., “Peeking Behind the Curtains of Serverless Platforms,”
in Proc. of USENIX ATC, 2018.
[50] J. Manner, “Towards Performance and Cost Simulation in Function as
a Service,” in Proc. of ZEUS, 2019.
[51] A. Sodan et al., “Parallelism via multithreaded and multicore CPUs,”
Computer, vol. 43, no. 3, pp. 24–32, 2010.
[52] B. Sprunt, “The basics of performance-monitoring hardware,” IEEE
Micro, vol. 22, no. 4, pp. 64–71, 2002.
[53] J. J. Dongarra et al., LINPACK Users’ Guide. Society for Industrial
and Applied Mathematics, 1979.
[54] ——, “The linpack benchmark: past, present and future,” Concurrency
and Computation: Practice and Experience, vol. 15, no. 9, pp. 803–820,
[55] J. Kim and K. Lee, “FunctionBench: A suite of workloads for serverless
cloud function service,” in Proc. of CLOUD, 2019.
[56] M. Malawski et al., “Benchmarking Heterogeneous Cloud Functions,”
in Proc. of Euro-Par, 2018.
[57] J. Manner et al., “Cold Start Influencing Factors in Function as a
Service,” in Proc. of WoSC, 2018.
[58] H. Lee et al., “Evaluation of Production Serverless Computing Environ-
ments,” in Proc. of WoSC, 2018.
[59] T. Kalibera and R. Jones, “Rigorous benchmarking in reasonable time,”
in Proc. of ISMM, 2013.
[60] J. O’Loughlin and L. Gillam, “Performance evaluation for cost-efficient
public infrastructure cloud use,” in Proc. of GECON, 2014.
[61] J. Scheuner and P. Leitner, “Function-as-a-service performance evalua-
tion: A multivocal literature review,” Journal of Systems and Software,
vol. 170, p. 110708, 2020.
[62] P. Vahidinia et al., “Cold start in serverless computing: Current trends
and mitigation strategies,” in Proc. of COINS, 2020.
... It makes a difference if an on-premise hosted platform is deployed on a server with an Intel Xeon Gold processor or on a consumer machine with an Intel i7. Therefore, open source tools cannot guarantee execution time ranges for functions nor abstract performance measures like MIPS [9] or GFLOPS [10] for different function settings. Furthermore, private clusters are limited in their capability to run bursty workloads because of the trade-off between utilization and cost efficiency. ...
... For AWS Lambda the CPU is assigned proportionally as already mentioned but what does this mean for the assigned CPU resources, i.e. linearly, exponentially or an unknown scaling behavior? When looking at CPU intensive tasks, like LINPACK [10], [23]- [25] or recursive Fibonacci implementations [10], [26], [27], we can state that CPU resources are linearly assigned based on the configured memory. This is only the case for functions executed on homogeneous hardware as experiments on AWS Lambda showed where heterogeneous hardware is also present [26], [28]- [31]. ...
... For AWS Lambda the CPU is assigned proportionally as already mentioned but what does this mean for the assigned CPU resources, i.e. linearly, exponentially or an unknown scaling behavior? When looking at CPU intensive tasks, like LINPACK [10], [23]- [25] or recursive Fibonacci implementations [10], [26], [27], we can state that CPU resources are linearly assigned based on the configured memory. This is only the case for functions executed on homogeneous hardware as experiments on AWS Lambda showed where heterogeneous hardware is also present [26], [28]- [31]. ...
Conference Paper
Full-text available
Open-source offerings are often investigated when comparing their features to commercial cloud offerings. However, performance benchmarking is rarely executed for open-source tools hosted on-premise nor is it possible to conduct a fair cost comparison due to a lack of resource settings equivalent to cloud scaling strategies. Therefore, we firstly list implemented resource scaling strategies for public and open-source FaaS platforms. Based on this we propose a methodology to calculate an abstract performance measure to compare two platforms with each other. Since all open-source platforms suggest a Kubernetes deployment, we use this measure for a configuration of open-source FaaS platforms based on Kubernetes limits. We tested our approach with CPU intensive functions, considering the difference between single-threaded and multi-threaded functions to avoid wasting resources. With regard to this, we also address the noisy neighbor problem for open-source FaaS platforms by conducting an instance parallelization experiment. Our approach to limit resources leads to consistent results while avoiding an overbooking of resources.
Conference Paper
Full-text available
Serverless functions are an emerging cloud computing paradigm that is being rapidly adopted by both industry and academia. In this cloud computing model, the provider opaquely handles resource management tasks such as resource provisioning, deployment, and auto-scaling. The only resource management task that developers are still in charge of is selecting how much resources are allocated to each worker instance. However, selecting the optimal size of serverless functions is quite challenging, so developers often neglect it despite its significant cost and performance benefits. Existing approaches aiming to automate serverless functions resource sizing require dedicated performance tests, which are time-consuming to implement and maintain. In this paper, we introduce an approach to predict the optimal resource size of a serverless function using monitoring data from a single resource size. As our approach does not require dedicated performance tests, it enables cloud providers to implement resource sizing on a platform level and automate the last resource management task associated with serverless functions. We evaluate our approach on four different serverless applications on AWS, where it predicts the execution time of the other memory sizes based on monitoring data for a single memory size with an average prediction error of 15.3%. Based on these predictions, it selects the optimal memory size for 79.0% of the serverless functions and the second-best memory size for 12.3% of the serverless functions, which results in an average speedup of 39.7% while also decreasing average costs by 2.6%.
Conference Paper
Developing accurate and extendable performance models for serverless platforms, also known as Function-as-a-Service (FaaS) platforms, is a very challenging task. In addition, implementation and experimentation on real serverless platforms are both costly and time-consuming. However, there is currently no comprehensive simulation tool or framework that can be used in place of a real platform. In this paper, we fill this gap by proposing a simulation platform, called SimFaaS, which assists serverless application developers in developing Function-as-a-Service applications optimized for cost and performance. SimFaaS can also be leveraged by FaaS providers to make their platforms workload-aware so that they can increase profit and quality of service at the same time. Serverless platform providers can also evaluate new designs, implementations, and deployments on SimFaaS in a timely and cost-efficient manner. SimFaaS is open-source, well-documented, and publicly available, making it easy to use and to extend to further use case scenarios in the future. In addition, it provides performance engineers with a set of tools that can calculate several characteristics of the internal states of serverless platforms, which are otherwise hard, and in most cases impossible, to extract from real platforms. Previous studies developed temporal and steady-state performance models for serverless computing platforms; however, those models are limited to Markovian processes. We designed SimFaaS as a tool that helps overcome such limitations for performance and cost prediction in serverless computing. We show how SimFaaS facilitates the prediction of essential performance metrics such as average response time, probability of cold start, and the average number of instances, which reflects the infrastructure cost incurred by the serverless computing provider. We evaluate the accuracy and applicability of SimFaaS by comparing its prediction results with real-world traces from Amazon AWS Lambda.
Function-as-a-Service (FaaS) and serverless applications have proliferated significantly in recent years because of their high scalability, ease of resource management, and pay-as-you-go pricing model. However, cloud users face practical problems when migrating their applications to the serverless pattern, namely the lack of analytical performance and billing models and the trade-off between a limited budget and the desired quality of service of serverless applications. In this paper, we fill this gap by proposing and answering two research questions regarding the prediction and optimization of the performance and cost of serverless applications. We propose a new construct to formally define a serverless application workflow, and then implement analytical models to predict the average end-to-end response time and the cost of the workflow. Building on these models, we propose a heuristic algorithm named Probability Refined Critical Path Greedy algorithm (PRCP) with four greedy strategies to answer two fundamental optimization questions regarding performance and cost. We extensively evaluate the proposed models through experiments on AWS Lambda and Step Functions. Our analytical models predict the performance and cost of serverless applications with more than 98% accuracy, and the PRCP algorithms find the optimal configurations of serverless applications with 97% accuracy on average.
Function-as-a-Service (FaaS) is one form of the serverless cloud computing paradigm and is defined through FaaS platforms (e.g., AWS Lambda) executing event-triggered code snippets (i.e., functions). Many studies that empirically evaluate the performance of such FaaS platforms have started to appear, but we currently lack a comprehensive understanding of the overall domain. To address this gap, we conducted a multivocal literature review (MLR) covering 112 studies from academic (51) and grey (61) literature. We find that existing work mainly studies the AWS Lambda platform and focuses on micro-benchmarks using simple functions to measure CPU speed and FaaS platform overhead (i.e., container cold starts). Further, we discover a mismatch between academic and industrial sources on tested platform configurations, find that function triggers remain insufficiently studied, and identify HTTP API gateways and cloud storage as the most used external service integrations. Following existing guidelines on experimentation in cloud systems, we discover many flaws threatening the reproducibility of experiments presented in the surveyed studies. We conclude with a discussion of gaps in the literature and highlight methodological suggestions that may serve to improve future FaaS performance evaluation studies.
A large-scale cloud data center is needed to provision various applications in different domains. As a result, power consumption is expected to increase due to the huge operations and expansion of cloud data centers, which also intensifies environmental concerns. Various approaches and solutions for energy-driven cloud data centers have been proposed to overcome this challenge. Testing and evaluating these solutions at large scale is costly and time-consuming. Hence, simulation has become the preferred approach to tackle this concern. A few cloud simulators with different features and capabilities have been developed and can be chosen for this purpose, and a survey can serve as a guideline for this choice. A few cloud simulation surveys have been conducted, but few of them address energy-driven cloud simulation. This review complements the existing surveys by considering different aspects of energy-driven cloud simulators. To this end, this paper presents a review of existing cloud simulation surveys with several classifications. Furthermore, it provides insights into the selected cloud simulators, emphasizing their support for energy-driven simulation and their impact on subsequent works. This paper also highlights open and future challenges.