Impact of Application Load in
Function as a Service
Johannes Manner and Guido Wirtz
DSG, University Bamberg, An der Weberei 5, 96047 Bamberg, Germany
{johannes.manner,guido.wirtz}@uni-bamberg.de
Abstract. Function as a Service (FaaS) introduces a different notion of scaling than related paradigms. The unlimited upscaling and the property of downscaling to zero running containers lead to a situation where the application load directly influences the number of running containers. We propose a combined simulation and benchmarking process for cloud functions to provide developers with information on performance and cost at an early development stage. Our focus in this paper is on simulating the concurrently running containers on a FaaS platform based on different function configurations. The experiment performed serves as a proof of concept and emphasizes the importance of this information for design decisions and system requirements. Especially for self-hosted FaaS platforms or resources bound to cloud functions, like database connections, this information is crucial for deployment and maintenance.
Keywords: Serverless Computing, Function as a Service, FaaS, Benchmarking, Simulation, Load Profile
1 Introduction
As in every virtualization technology, a cloud function container faces some performance challenges when it is created and executed for the first time. To estimate the quality of service a system delivers, benchmarking applications is crucial. Huppler [1] stated that a benchmark should be relevant, fair, verifiable, economical and repeatable. A number of experiments, e.g. [2,3,4,5,6], have been conducted in the FaaS domain to assess this new paradigm in the cloud stack.
We investigate these benchmarks based on Huppler's requirements. All of these benchmarks evaluate the performance of one or more FaaS platforms compared to each other or to related technologies like Virtual Machine (VM) based solutions. The biggest issue in general is the repeatability of a benchmark since the targeted field is evolving rapidly. Another problem is the lack of information about the settings and other influential factors of the mentioned experiments. Results are discussed in detail in each of these publications, but only a few of them describe all the necessary steps to repeat the experiment and verify the findings, as Kuhlenkamp and Werner [7] ascertained.
All the benchmarks in their literature study are FaaS related and were conducted since 2015. They gave each experiment a score between 0 and 4 to assess the quality of the presented work. Workload generator, function implementation, platform configuration and other used services are the categories of their systematic literature study. The average score was 2.6, which indicates that a lot of information is missing from the conducted benchmarks. Only 3 out of 26 experiments supplied all preconditions and parameters needed to reproduce the presented results. Therefore, results of different benchmarks are often not comparable to each other. The first category, the generation of load patterns and their topology, is the least discussed item: only every third FaaS benchmarking publication addresses the load pattern aspect.
As the load pattern topology has a major influence on the scaling behavior of a cloud function platform, we focus on this aspect here. This is also important for software architects constructing hybrid architectures, who need information about the incoming request rate in the non-FaaS parts of their systems. Otherwise, the FaaS part of an application can cause Distributed Denial of Service (DDoS) attacks on other parts. Our paper stresses this aspect in particular by (i) discussing different ways to specify load patterns, (ii) proposing a workflow for a combined FaaS simulation and benchmarking process and (iii) presenting a methodology to compute the number of running instances from the respective load trace.
The outline of the paper is as follows. Section 2 discusses related work and addresses the first contribution, namely which load generation tools are suited to specify application workloads. Section 3 proposes a generic workflow for a simulation and benchmarking process of cloud functions and picks a single aspect, the number of concurrently running functions, as a proof of concept. The paper concludes with a discussion in Section 4 and an outlook in Section 5.
2 Related Work
2.1 Benchmarking FaaS
The open challenges Iosup and others [8] mentioned in their publication on Infrastructure as a Service (IaaS) benchmarking are partly open challenges for FaaS as well. There is currently a lack of methodological approaches to benchmark cloud functions consistently. Malawski and others [6,9] conducted scientific workflow benchmarks and built their benchmarking pipeline on top of the serverless framework1. They publish their benchmarking results continuously, but do not include simulations, which would reduce cost and time. Similar to this approach, Scheuner and Leitner [10] introduced a system where micro and application benchmarks are combined. Especially the micro benchmarking aspect is interesting for a consistent FaaS methodology since typically a single cloud function is the starting point.
Three different load patterns are part of their contribution, but they are hidden in the implementation of their system and therefore not directly mentioned, as in many other FaaS publications. These initial benchmarks focusing on a single aspect in isolation are important steps to understand the impact on system design and execution, but they are quite difficult to set up and need a lot of time to execute, as Iosup already noted for IaaS benchmarking. Therefore, [8] proposes a combination of simulating small-sized artificial workloads and conducting real-world experiments as the most promising approach to obtain stable results with the least effort in time and money.
1 https://serverless.com/framework/
2.2 Load Patterns in Conducted Experiments
The "job arrival pattern" [8] is critical for the performance of any System Under Test (SUT), especially in FaaS, where scaling is determined by the given input workload. To perform repeatable benchmarks and to enable a simulation of cloud functions under different external circumstances, the documentation of load patterns is essential. As mentioned in the introduction, this is the least discussed aspect; some authors explained their workload in more detail, as discussed by [7], but even these descriptions are not sufficient:
– McGrath and Brenner [11] performed a concurrency test and a backoff test. The concurrency test featured 15 test executions, each performed at 10-second intervals with an increasing number of concurrent requests: in the first test execution only 1 request was started and in the last execution 15 concurrent requests were submitted. This was repeated 10 times. The backoff test performed single invocations with pause times from 1 to 30 minutes between the invocations to investigate the expiration time of a cloud function container and the impact of cold starts on execution performance.
– Lee and others [4] focused on concurrency tests. First they measured the function throughput per second by invoking the cloud functions 500, 1,000, 2,000, 3,000 and 10,000 times. The time between invocations was not mentioned; therefore, it is not clear whether the second call used the already warm containers from the first execution. Furthermore, they investigated different aspects with 1 request at a time, 100 concurrent requests and a few other settings, but also did not inform the reader about the wait time between calls or the exact distribution.
– Figiela and others [12] conducted two CPU-intensive benchmarks. The first one was executed every 5 minutes and invoked the different functions once. The second experiment used a fork-join model and executed the tasks in parallel for 200, 400 and 800 concurrent tasks. The number of repetitions and the corresponding wait time between them were not mentioned and may not have been considered.
– Back and Andrikopoulos [2] used fast Fourier transformation, matrix manipulation and sleep use cases for their benchmark. They parameterized each function and executed each combination once a day on three consecutive days. It is unclear to the reader whether all of these measurements resulted in a cold start on the respective providers. The results are also prone to outliers since a sample size of 3 executions per combination can distort findings.
– Das and others [13] implemented a sequential benchmark of cloud and edge resources, where the time between two consecutive invocations was between 10 and 15 seconds to avoid concurrent request executions. There is no information on how the authors dealt with the first invocation of a cloud function.
Manner and others [14] focused on the cold start overhead in FaaS. Therefore, they defined a sequential load pattern generating pairs of a single cold and a single warm start to compare the performance on a container basis. Warm starts were executed 1 minute after the cold execution returned. After the pair was executed, the pattern paused for 29 minutes to achieve a shutdown of the container. With respect to the load pattern aspect, the experiments in this publication are reproducible and all information necessary to repeat them is described.
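Such a precisely documented pattern can be expressed directly as a list of request timestamps. The following Java listing is a minimal sketch of the cold/warm pair pattern described above; the nominal response time used to place the warm request is an assumption (in the real experiment the warm call is issued after the cold call has returned), and the class and method names are purely illustrative.

import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: the sequential cold/warm load pattern of [14] expressed as request
 * timestamps in seconds. The nominal response time is an assumption; in the
 * real experiment the warm request follows the actual cold response.
 */
public class ColdWarmPattern {

    public static List<Double> generate(int pairs, double nominalResponseTime) {
        List<Double> timeStamps = new ArrayList<>();
        double t = 0.0;
        for (int i = 0; i < pairs; i++) {
            timeStamps.add(t);                              // cold request
            double warm = t + nominalResponseTime + 60.0;   // warm request 1 minute after the cold call returns
            timeStamps.add(warm);
            t = warm + nominalResponseTime + 29 * 60.0;     // 29 minute pause to force a container shutdown
        }
        return timeStamps;
    }
}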
All the presented workloads are artificial load patterns, where some reductions are made for simplicity and to assess a single detail or use case in FaaS. It is often unclear whether the authors used an established load generation tool or implemented a proprietary interface for submitting the workload. There is currently a lack of experiments that use real-world load traces.
2.3 Load Generation Tools
Before discussing load generation tools, the kind of application load must be considered, as it is important for any benchmark or simulation. Schroeder and others [15] defined three kinds: closed, open and partly-open systems. In closed systems, the number of incoming requests can be predicted based on other parts of the system. In contrast, the workload of an open system is not predictable since users access the service randomly via an interface. Partly-open systems are a combination of both.
We focus only on open systems since a single cloud function is the focus of our work and therefore has no other dependencies. A recent study on workload generators for web-based systems [16] presents a comprehensive collection and compares many generation tools. For benchmarking FaaS, an arrival rate of requests is the needed input. Therefore, we picked two tools to generate workloads as a reference here. Tools like JMeter2 focus on controlled workloads with constant, linear or stepwise increasing loads. This behavior is especially important to create clean and clear experimental setups that isolate the different aspects under investigation. Based on these ideas, we also implemented some benchmarking modes in our prototype3 to control the execution of requests based on our needs and added some instrumentation to compare the execution time on the platform and on the local machine submitting the requests. On the other hand, there are tools like LIMBO [17] that model real-world load traces based on seasonal, bursty, noisy and trend components. LIMBO enables the generation of a load pattern based on an existing trace or via a combination of mathematical functions. In contrast to JMeter, where the load can be submitted directly, LIMBO decouples the load generation from the submission via another tool, as suggested by [16].
2 http://jmeter.apache.org/
3 https://github.com/johannes-manner/SeMoDe
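To make such a controlled open workload concrete, the following sketch generates arrival timestamps for a stepwise increasing load in the style of the JMeter-like controlled workloads mentioned above. The parameters and names are illustrative and do not reflect the actual benchmarking modes of our prototype.

import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: a stepwise increasing open workload expressed as a list of request
 * arrival timestamps in seconds. Parameters are illustrative.
 */
public class StepwiseLoadPattern {

    public static List<Double> generate(int steps, double stepLengthSeconds,
                                        int requestsPerSecondIncrement) {
        List<Double> timeStamps = new ArrayList<>();
        for (int step = 0; step < steps; step++) {
            // the request rate grows by a fixed increment with every step
            int requestsPerSecond = (step + 1) * requestsPerSecondIncrement;
            double stepStart = step * stepLengthSeconds;
            for (int second = 0; second < stepLengthSeconds; second++) {
                for (int i = 0; i < requestsPerSecond; i++) {
                    // spread the requests evenly within the second
                    timeStamps.add(stepStart + second + (double) i / requestsPerSecond);
                }
            }
        }
        return timeStamps;
    }
}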
3 FaaS Benchmarking and Simulation
3.1 Combined Workflow
[Figure: flow chart with the steps Describe Benchmark Settings, Describe Workload Specification, Perform Simulation, Store Simulation Result, the decision "Prediction suited for the use case?" (Yes/No), Deploy Cloud Function, Submit Workload, Analyze Execution Data, Compare Platform and Simulation Data, and Store Results.]
Fig. 1: Generic Pipeline for FaaS Benchmarking
Figure 1 presents a generic pipeline for FaaS benchmarking inspired by Iosup [8]. The SUT is not explicitly mentioned since a single cloud function is the SUT in our approach. The memory setting, the size of the deployment artifact etc. [14] directly influence the execution time and are therefore relevant, in combination with the load pattern, for assessing the concurrently running containers. After providing the cloud function, the load pattern and the mentioned metadata, our prototype starts the simulation. After the simulation is done, the user has to decide whether the simulated values are suited for the use case, e.g. whether the number of concurrently running instances does not exceed a limit, or whether the values have to be adjusted for another simulation run, e.g. by raising the memory setting and thereby reducing the overall execution time. In the latter case, the prototype stores this interim result for a later comparison with the next simulation round, where a developer can assess which setting results in better cost and performance.
If the simulation is satisfactory for the user, our prototype deploys the function using the serverless framework and submits the workload based on the load pattern. Our prototype uses synchronous Representational State Transfer (REST) calls to generate events on the FaaS platform as introduced in [14]. This behavior is similar to the direct executor model proposed by [9]. Subsequently, the user analyzes the platform execution data and compares the results with the values predicted by the simulation. Finally, the results are stored to further improve the simulation framework proposed in a prior paper [18].
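As an illustration of the inputs to the first pipeline steps, the following sketch groups the function configuration and the decision threshold into a single data holder; the field and method names are assumptions derived from the text and do not mirror the actual data model of our prototype.

/**
 * Illustrative data holder for the pipeline inputs of Figure 1. The fields are
 * assumptions, not the prototype's actual data model.
 */
public record SimulationSettings(
        int memorySettingMb,           // provider memory configuration of the cloud function
        int deploymentArtifactSizeMb,  // size of the deployment artifact
        double meanExecutionTime,      // exec, in seconds
        double meanColdStartTime,      // cold, in seconds
        double containerShutdownTime,  // shutdown, idle seconds before a container is removed
        int maxConcurrentContainers    // limit used in the decision step of the pipeline
) {
    /** Decision step: is the predicted peak of running containers acceptable? */
    public boolean predictionSuited(int simulatedPeak) {
        return simulatedPeak <= maxConcurrentContainers;
    }
}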
3.2 Simulating Number of Cloud Function Containers
This section focuses on a single piece of the pipeline presented in Figure 1: Perform Simulation. We investigate only one aspect of this piece: the number of running containers. An important aspect is the execution time with respect to different function configurations. The input of a function also highly influences the runtime performance, e.g. for sorting algorithms. We tackle this problem of varying execution times in future work, when refining the simulation engine.
Further aspects are the associated cost impact and effects on used backend services like cloud databases. If the simulation exposes a high number of concurrently running containers and the concurrency level is problematic, the developer could throttle the incoming requests by using a queue, for example.
Algorithm 1 is implemented by our prototype. A comparison with the scheduling algorithms used in open-source FaaS platforms is still outstanding. Currently, the simulation uses the mean execution time (exec) of the investigated cloud function, the mean cold start time (cold) and the idle time until container shutdown (shutdown). The timeStamps are a list of double values marking the start times of the requests and are created manually. Statistical deviations are not included in this proposed simulation approach, but they are planned for future work. The gateway spawns events and triggers the function under test. Multi-tenancy is not included in our algorithm, but it has an impact on performance and execution time, as Hellerstein and others [19] stated.
Algorithm 1 Basic Simulation - Number of Containers
1: procedure simulate(exec, cold, shutdown, timeStamps)
2:   for time in timeStamps do
3:     checkFinishedContainers(time)
4:     shutdownIdleContainers(time, shutdown)
5:     if idleContainerAvailable() then
6:       pickIdleAndExecute(exec)
7:     else
8:       spinUpAndExecute(cold, exec)
9:     end if
10:  end for
11:  shutdownAllContainers()
12:  generateContainerDistribution()
13: end procedure
In line 3, the program checks whether some of the containers have finished their execution at time and sets these containers to an idle state. The next function shuts down all idle containers which exceed the shutdown time. At this point, the internal state of the simulation is clean and the next request can be executed either by an already warm container (lines 5, 6) or by a new instance (line 8), which is affected by a cold start. When all requests are served, the prototype produces a distribution of how many containers are running on a per-second basis.
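The following Java listing is a minimal, self-contained sketch of Algorithm 1, assuming all times are given in seconds. The class and helper names are illustrative and do not claim to match our prototype's code base; the per-second distribution counts executions in flight, which approximates the number of running containers.

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ContainerSimulation {

    /** Each running container is represented by the end time of its current execution. */
    private final List<Double> busyUntil = new ArrayList<>();
    /** Start and end of every execution, used to derive the per-second distribution. */
    private final List<double[]> executionIntervals = new ArrayList<>();

    public TreeMap<Integer, Integer> simulate(double exec, double cold, double shutdown,
                                              List<Double> timeStamps) {
        for (double time : timeStamps) {
            // checkFinishedContainers is implicit: a container whose end time lies in the
            // past counts as idle. shutdownIdleContainers removes containers that have
            // been idle for at least 'shutdown' seconds.
            busyUntil.removeIf(end -> end <= time && time - end >= shutdown);
            int idle = indexOfIdleContainer(time);
            double end;
            if (idle >= 0) {
                end = time + exec;                 // pickIdleAndExecute: warm execution
                busyUntil.set(idle, end);
            } else {
                end = time + cold + exec;          // spinUpAndExecute: cold start penalty
                busyUntil.add(end);
            }
            executionIntervals.add(new double[]{time, end});
        }
        busyUntil.clear();                         // shutdownAllContainers
        return containerDistribution();            // generateContainerDistribution
    }

    private int indexOfIdleContainer(double time) {
        for (int i = 0; i < busyUntil.size(); i++) {
            if (busyUntil.get(i) <= time) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Counts per second how many executions are in flight. This approximates the
     * number of running containers; intra-second container reuse is counted twice.
     */
    private TreeMap<Integer, Integer> containerDistribution() {
        TreeMap<Integer, Integer> distribution = new TreeMap<>();
        for (double[] interval : executionIntervals) {
            for (int s = (int) interval[0]; s < Math.ceil(interval[1]); s++) {
                distribution.merge(s, 1, Integer::sum);
            }
        }
        return distribution;
    }
}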
4 Discussion
Figure 2 depicts an initial load trace and two corresponding simulations. The colored numbers are counts on a per-second basis and show the number of incoming requests (orange) and the number of concurrently running containers (yellow and gray).
The input trace is artificially created and the values4 for these two simulations are chosen with respect to a prior investigation [14]5.
[Figure: per-second counts of incoming requests and of concurrently running containers for the two simulation inputs. Underlying data:]
Timestamp | InitialDistribution | SimulationInput[10.0,0.3,1800.0] | SimulationInput[5.1,0.25,1800.0]
0 | 3 | 3 | 3
1 | 3 | 6 | 6
2 | 4 | 10 | 10
3 | 7 | 17 | 17
4 | 6 | 23 | 23
5 | 4 | 27 | 25
6 | 3 | 30 | 25
7 | 3 | 33 | 25
8 | 3 | 36 | 25
9 | 3 | 39 | 25
10 | 6 | 43 | 25
11 | 6 | 46 | 25
12 | 6 | 49 | 25
13 | 4 | 49 | 26
14 | 4 | 49 | 27
15 | 4 | 48 | 28
16 | 0 | 43 | 25
17 | 0 | 40 | 18
18 | 0 | 37 | 12
19 | 0 | 34 | 8
20 | 0 | 31 | 4
21 | 0 | 25 | 0
22 | 0 | 19 | 0
23 | 0 | 12 | 0
24 | 0 | 8 | 0
25 | 0 | 0 | 0
Fig. 2: Example Distribution for an Artificial Load Trace
The execution time of the gray run (10.0) is roughly twice the execution time of the yellow run (5.1). As on many FaaS platforms, such as AWS Lambda6 and Google Cloud Functions7, the CPU resources are directly coupled with the memory setting. We suppose that, for example, the yellow cloud function is deployed with a memory limit of 256 MB of RAM, whereas the gray cloud function is restricted to 128 MB. Assuming that the two functions are implemented in Java, the cold start time is not affected to the same extent since the JVM startup is resource-intensive in both cases. The shutdown time (1800.0) has no effect in this example since the considered interval is too short.
The gray and yellow graphs show a start-up, an execution and a tear-down phase. Our artificial distribution simulates a moderate load with a few invocations per second. The start-up phase is similar for the first 5 seconds; after 5 seconds the first containers are reused in the yellow simulation. The execution phase lasts only five seconds for the gray simulation (seconds 11 to 16) compared to ten seconds for the yellow simulation (seconds 6 to 16), but it shows the impact of the supposed runtime configuration on the number of running containers. For self-hosted FaaS platforms or resources bound to cloud functions like database connections, the difference between 28 and 49 concurrently running containers influences system requirements and design decisions. The tear-down in the yellow case happens faster due to the shorter execution time. The output load trace is missing in this simulation.
4 Compare the input values to Algorithm 1 - SimulationInput[exec, cold, shutdown].
5 Source code, parameters and input trace are available on GitHub: https://github.com/johannes-manner/SeMoDe/releases/tag/summersoc13
6 https://aws.amazon.com/lambda
7 https://cloud.google.com/functions/
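As a usage illustration, the two parameter sets of Figure 2 can be fed into the simulation sketch from Section 3.2. The hypothetical driver below rebuilds the arrival trace from the InitialDistribution column with all requests placed at full seconds; since the sub-second arrival times of the original trace are not reproduced here, the output only approximates the figure, and it assumes the ContainerSimulation sketch is available on the classpath.

import java.util.ArrayList;
import java.util.List;

/** Hypothetical driver feeding the Figure 2 trace into the simulation sketch above. */
public class Figure2Example {

    public static void main(String[] args) {
        // incoming requests per second, taken from the InitialDistribution column of Figure 2
        int[] requestsPerSecond = {3, 3, 4, 7, 6, 4, 3, 3, 3, 3, 6, 6, 6, 4, 4, 4};
        List<Double> trace = new ArrayList<>();
        for (int second = 0; second < requestsPerSecond.length; second++) {
            for (int i = 0; i < requestsPerSecond[second]; i++) {
                trace.add((double) second);
            }
        }
        // first parameter set of Figure 2: exec 10.0 s, cold start 0.3 s, shutdown after 1800 s idle
        System.out.println(new ContainerSimulation().simulate(10.0, 0.3, 1800.0, trace));
        // second parameter set of Figure 2: exec 5.1 s, cold start 0.25 s, same shutdown time
        System.out.println(new ContainerSimulation().simulate(5.1, 0.25, 1800.0, trace));
    }
}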
5 Future Work
The aim is to implement the suggested simulation and benchmarking pipeline in our prototype. Therefore, the next step is to include an automated data picking facility as Malawski and others [9] already implemented. Our simulation model is based on a few parameters without statistical deviation to keep the system deterministic. We want to extend the simulation in this direction and also include the output load pattern of our simulation, since this output may be the input for another component of the overall (hybrid) application. Furthermore, we want to conduct a few benchmarks on constant, linear and bursty workloads to refine our simulation model, perform a realistic proof of concept of our work and include the multi-tenancy aspect.
To conclude, the topology of the load pattern has a major influence on the number of running containers on a FaaS platform. Our paper stresses this aspect in particular and puts emphasis on the lack of documentation in the experiments conducted in the literature. The presented simulation is a first step in our overall simulation approach towards the predictability of platform behavior.
References
1. K. Huppler. The art of building a good benchmark. In Raghunath Nambiar and
Meikel Poess, editors, Performance Evaluation and Benchmarking, pages 18–30.
Springer, 2009.
2. T. Back and V. Andrikopoulos. Using a Microbenchmark to Compare Function as
a Service Solutions. In Service-Oriented and Cloud Computing. Springer, 2018.
3. D. Jackson and G. Clynch. An investigation of the impact of language runtime on
the performance and cost of serverless functions. In Proc. WoSC, 2018.
4. H. Lee et al. Evaluation of Production Serverless Computing Environments. In
Proc. WoSC, 2018.
5. W. Lloyd et al. Improving Application Migration to Serverless Computing Platforms: Latency Mitigation with Keep-Alive Workloads. In Proc. WoSC, 2018.
6. M. Malawski et al. Benchmarking Heterogeneous Cloud Functions. In Dora B. Heras and Luc Bougé, editors, Euro-Par 2017: Parallel Processing Workshops, pages 415–426. Springer International Publishing, 2018.
7. J. Kuhlenkamp and S. Werner. Benchmarking FaaS Platforms: Call for Community
Participation. In Proc. WoSC, 2018.
8. A. Iosup et al. IaaS Cloud Benchmarking: Approaches, Challenges, and Experience.
In Proc. MTAGS, 2012.
9. M. Malawski. Towards Serverless Execution of Scientific Workflows – HyperFlow Case Study. In Proc. WORKS, 2016.
10. J. Scheuner and P. Leitner. A Cloud Benchmark Suite Combining Micro and
Applications Benchmarks. In Proc. ICPE, 2018.
11. G. McGrath and P. R. Brenner. Serverless computing: Design, implementation,
and performance. In Proc. ICDCSW, 2017.
12. K. Figiela et al. Performance evaluation of heterogeneous cloud functions. Concurrency and Computation: Practice and Experience, 2018.
13. A. Das et al. EdgeBench: Benchmarking edge computing platforms. In Proc.
WoSC, 2018.
14. J. Manner et al. Cold Start Influencing Factors in Function as a Service. In Proc.
WoSC, 2018.
15. B. Schroeder et al. Open Versus Closed: A Cautionary Tale. In Proc. NSDI, 2006.
16. M. Curiel and A. Pont. Workload generators for web-based systems: Characteristics, current status, and challenges. IEEE Communications Surveys & Tutorials, 20(2):1526–1546, 2018.
17. J. von Kistowski et al. Modeling and extracting load intensity profiles. ACM
Transactions on Autonomous and Adaptive Systems, 11(4):1–28, 2017.
18. J. Manner. Towards Performance and Cost Simulation in Function as a Service.
In Proc. ZEUS (accepted), 2019.
19. J. M. Hellerstein et al. Serverless Computing: One Step Forward, Two Steps Back.
In Proc. CIDR, 2019.