Cold Start Influencing Factors
in Function as a Service
Johannes Manner, Martin Endreß, Tobias Heckel and Guido Wirtz
Distributed Systems Group
University of Bamberg
Bamberg, Germany
{johannes.manner, guido.wirtz}
{martin.endress, tobias-christian-juergen-lukas.heckel}
Abstract—Function as a Service (FaaS) is a young and rapidly
evolving cloud paradigm. Due to its hardware abstraction,
inherent virtualization problems come into play and need an
assessment from the FaaS point of view. Especially avoidance of
idling and scaling on demand cause a lot of container starts and
as a consequence a lot of cold starts for FaaS users. The aim of
this paper is to address the cold start problem in a benchmark
and investigate influential factors on the duration of the perceived
cold start.
We conducted a benchmark on AWS Lambda and Microsoft
Azure Functions with 49500 cloud function executions. Formu-
lated as hypotheses, the influence of the chosen programming
language, platform, memory size for the cloud function, and size
of the deployed artifact are the dimensions of our benchmark.
Cold starts on the platform as well as the cold starts for users
were measured and compared to each other. Our results show
that there is an enormous difference for the overhead the user
perceives compared to the billed duration. In our benchmark,
the average cold start overheads on the user’s side ranged from
300ms to 24s for the chosen configurations.
Index Terms—Serverless Computing, Function as a Service,
FaaS, Cloud Functions, Cold Start, Benchmarking
I. INTRODUCTION
Function as a Service (FaaS) and Serverless Computing
are often used synonymously, but there is a difference [1].
Serverless in general is a more abstract phrase, whereas FaaS
focuses on event driven cloud functions. Therefore, we use the
term FaaS to stick to a precise phrasing of the investigated
cloud service model.
Cold starts are an inherent problem of virtualization tech-
niques. Cloud functions are executed in containers. The first
execution of a cloud function experiences a cold start since
the container must be started prior to the execution. For
performance reasons, FaaS providers do not shut down the
containers immediately. Subsequent executions reuse these
already spawned containers and profit from the warm execution
environments. Avoidance of idling and scaling on demand are
game changers compared to other cloud service models, but
entail more cold starts.
So far, cold starts are perceived as a system-level chal-
lenge [2], [3] and were not directly investigated to our knowl-
edge. Efforts are made to circumvent the cold start problem
by pinging the cloud function on a regular basis [4], [5]. This
ping hack1 sins against the scale to zero principle of FaaS.
Therefore, we are motivated to research factors that influ-
ence the cold start and pose the following research questions:
RQ1: Which factors written down as hypotheses influence
the cold start of cloud functions?
RQ2: How to benchmark cold starts of cloud functions
consistently to get repeatable experiments and results?
Based on these questions, we formulated influencing factors
as hypotheses and provided a pipeline to execute repeatable
benchmarks. Our benchmark is designed to be relevant, re-
peatable, fair, verifiable and economical [6].
The agenda is as follows: In Section II, we present other
benchmarks on FaaS with varying focuses. Section III an-
swers the first research question by forming hypotheses about
influential factors on the cold start, which are partly tested
in the experiment in Section IV. Our results are presented
in Section V and discussed in Section VI. Future Work in
Section VII concludes this paper.
II. RELATED WORK
We present economically interesting benchmarks in the first
paragraph and performance-oriented ones in the second.
Based on the use cases and requirements, avoidance of
idle instances is a game changer. ADZIC and CHATLEY [7]
compared the cost of hosting one hour of operation on PaaS
offerings like AWS EC2 and Google AppEngine with the same
functionality on FaaS. They calculated a cost reduction of up
to 99.95% assuming that the functionality is invoked every five
minutes for 200ms. Another benchmark with similar research
questions was performed by VILLAMIZAR ET AL. [8]. They
compared three different system architectures: A monolithic
approach, a microservice-based system, and a cloud function
architecture. Three scenarios were transformed from monoliths
to loosely coupled cloud functions for a more robust data basis.
The transition of the monolithic application into microservices
caused a first cost reduction. This reduction progressed by run-
ning the microservice as cloud functions on a FaaS platform
and resulted in an overall cost reduction of 70% and more.
1 J. Daly. 15 Key Takeaways from the Serverless Talk at AWS Startup Day.
at-aws-startup-day/. 2018. Last accessed 9/13/2018.
--- PREPRINT ---
LEE ET AL. [4] executed a performance-oriented benchmark
for distributed data processing emphasizing the scaling prop-
erty and resulting throughput of FaaS platforms. To utilize
the cloud functions, they used a compute intensive workload.
They monitored the number of running instances when varying
the workload to assess the time overhead. However, they did
not express the overhead due to cold starts explicitly. All
major, commercial FaaS platforms were part of the study,
namely AWS Lambda2, Microsoft Azure Functions3, Google
Cloud Functions4, and IBM OpenWhisk5. MALAWSKI ET
AL. [9] also conducted a cross-platform study with the same
providers. They used recursive fibonacci as a compute bound
benchmark algorithm. Interaction via API gateways was their
chosen scenario, one of the main use cases for cloud functions.
This allowed a time measurement on client and provider side.
Therefore, they were able to compute the perceived overhead
for the user, which included network latency, platform routing
and scheduling overhead, when following their interpretation
of the results. The cold start was missing in their enumeration
of influencing factors and it is therefore investigated in this
To facilitate an unbiased evaluation, hypotheses about the
implications of different parameters and decisions were for-
mulated prior to the experiment. FaaS users and especially
providers make decisions, which may influence the cold start
behavior of the executed functions.
H1: Programming Language - FaaS platforms offer a large
variety of programming languages [3]. JavaScript (JS)
for example is supported by all major platforms since
it is a perfect fit for small, stateless cloud functions.
Also compiled languages like Java and C# come into
focus due to the engineering benefits if functions get
more complex. Because of the environment overhead,
our hypothesis is that compiled programming languages
impose a significantly higher cold start overhead than
interpreted languages like JS. For instance, the execution
of a cloud function written in Java needs a running Java
Virtual Machine (JVM) which must be set up prior to
function execution.
H2: Deployment Package Size - Our hypothesis is that the
cold start overhead increases with the deployment pack-
age size. We want to measure the time, which is needed to
copy the function image to the container, load the image
into memory, unpack, and execute it.
H3: Memory/CPU Setting - Our hypothesis is that the cold
start overhead decreases with increasing resources, be-
cause the container can be loaded and set up faster. We
assume that this behavior is observable for low memory
settings where the CPU is busy, but is negligible for
high settings since the CPU is underutilized in these
cases. This limitation does not weaken the hypothesis,
because the low memory settings starting at 128MB are
of particular interest. Memory and CPU are used in
combination since most of the mature platforms offer a
linear scaling of CPU power based on the memory setting.
H4: Number of Dependencies - Loading dependencies
takes time when spinning up a cloud function. Our
hypothesis is that the amount and size of dependencies
increases the cold start overhead since they must be
loaded prior to the first execution and can be reused in
subsequent ones. If we can confirm this hypothesis, a
best practice would be the division of required libraries
into sublibraries, where the needed subset of functionality
is extracted into a new artifact.
H5: Concurrency Level - FaaS gets attention especially due
to the scaling property of cloud functions. We hypothesize
that the concurrency level, i. e., the number of concurrent
requests and therefore started containers, neither influ-
ences the cold start overhead nor the execution time.
Functions are started independently of each other in
a separate container for every concurrent execution. If
1000 requests arrive simultaneously, we expect that 1000
containers are started by the middleware of the FaaS
platform.
H6: Prior Executions - Avoidance of idling is a tremendous
improvement of FaaS compared to PaaS. Achieving this
goal comes with the drawback that unused containers are
removed from running machines. Subsequent calls to the
cloud function require a new container. We assume that
the cold start overhead is independent of prior executions.
This hypothesis is of particular interest for the first
execution of the cloud function after deployment.
H7: Container Shutdown - Providers may optimize their
infrastructure by using learning algorithms for identifying
cloud functions, which are used frequently. Due to cost
effects and user satisfaction, we hypothesize that the
duration after which a container shuts down is dependent
on the number of previous executions. According to the
FaaS paradigm, executions are independent of each other
and should not influence the lifespan of a container.
IV. EXPERIMENT
A. Hypotheses Selection
The experiment in this paper evaluates three out of the seven
hypotheses of Section III. Programming Language, Deployment
Package Size, and Memory/CPU Setting are investigated.
Reasons to choose these hypotheses are the ease of testing
and getting stable and reproducible results. Since it is the first
benchmark focusing only on cold starts, our aim is to make
a clear experimental setup and reduce other parameters and
external influences to a minimum. We omitted the hypotheses
with a concurrent notion due to the side effects which are
introduced by concurrency in general. The data basis produced
by our sequential benchmark has a minimal set of external
influences. Without this data basis, an evaluation of the
hypotheses with a concurrent notion in our future work would
not be possible due to missing reference data. Therefore, our
benchmark is of special interest for real world applications,
which are only requested once or twice an hour and thus
benefit from the scaling to zero property.
B. Provider and Language Selection
We selected Java and JS as programming languages. An
important reason for this decision is that Java is a compiled
and JS an interpreted language. This selection emphasizes
the differences in programming languages for the evaluation
of the programming language hypothesis. Furthermore, Java
and JS are widely used in enterprises and the open source
community6. Based on this language selection, we selected
AWS Lambda and Microsoft Azure Functions as platforms
since they were the only platforms supporting Java. This
selection prevents us from comparing the memory setting
hypothesis between providers, because Azure Functions scales
memory and CPU automatically. However, we can still
provide first insights about this hypothesis.
C. Algorithm Selection
We chose the recursive version of fibonacci like SPILLNER
ET AL. [10], which calculates the nth value of the
fibonacci sequence, to test our hypotheses. Recursive fibonacci
is compute bound. As a consequence, the compute time is
determined by the processing power of the machine. Since it
is a recursive algorithm with two recursive calls, the tree grows
exponentially. The calculation is not memory bound, since one
stack is evaluated completely until the next evaluation starts,
e.g. fibonacci(n−1) is evaluated before fibonacci(n−2)
is called. This results in a time complexity of O(2^n) and a
memory complexity of O(n).
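The workload can be sketched as follows (a minimal sketch; class and method names are ours, while the actual SeMoDe reference implementations wrap the computation in provider-specific handlers):

```java
// Recursive fibonacci: compute-bound workload with O(2^n) time
// and O(n) stack depth, as used for the benchmarked cloud functions.
public class Fibonacci {

    static long fibonacci(long n) {
        if (n <= 1) {
            return n;
        }
        // fibonacci(n-1) is fully evaluated before fibonacci(n-2) starts,
        // so only one recursion branch occupies the stack at a time.
        return fibonacci(n - 1) + fibonacci(n - 2);
    }

    public static void main(String[] args) {
        // n = 40 is the input used in the experiment.
        System.out.println(fibonacci(40));
    }
}
```

The deep but narrow recursion is what keeps the memory footprint linear despite the exponential number of calls.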
With these characteristics, the algorithm is well suited for
our benchmark. Assuming that identical hardware is used
within a data center [4], predictability and low variance in
function execution time guarantee stable results. The low
memory usage ensures that we can benchmark the function
with any of the memory settings provided by FaaS platforms.
D. Generic Experiment Pipeline
Using a deployment pipeline, which automates all necessary
steps, makes it possible to reproduce results more easily. The
pipeline is included in our prototype SeMoDe7 as a utility
mechanism for the mentioned fibonacci functions.
The first step is to implement a function that should
be assessed. Reference implementations8 for some provider-
language combinations of the recursive fibonacci algorithm are
available in a separate folder of SeMoDe. To test some of the
hypotheses explained in Section III, we need an interceptor
step, where the source code is altered. In the case of the
deployment package size hypothesis, this step can be per-
formed automatically. In other cases such as the dependency
hypothesis, a manual interception must be performed.
6 Based on the opened pull requests on GitHub, https://octoverse.github.com/, last accessed on 10/10/2018.
We use the Command Line Interface (CLI) of the Serverless
Framework9 to specify configuration settings and for the
deployment of our cloud functions.
SeMoDe offers different benchmarking options, where spe-
cial emphasis is put on the isolation of cold starts when
executing cloud functions. We set up the API gateways of
the FaaS platforms to enable local REST calls on our host
machine. This procedure gives us the opportunity to control
the execution by specifying the input of the respective cloud
function. The specific input prevents the platforms from caching
results. Finally, SeMoDe provides fetchers per provider, which
retrieve the data from the logging services in a controlled
manner.
E. Experimental Setup
While creating the experimental setup, we considered sev-
eral aspects that were independent of the hypotheses. The first
one is the way of invoking functions on the platform. Also
logging information for later analysis is a big issue for data
consistency. Finally, thinking about cold start influencing steps
during a function invocation and execution was of concern.
Due to the specific focus on cold starts, the aim of the
experimental setting is to force a cold start closely followed
by a warm start on the same container instance. A warm start
is defined as the reuse of a container in our setup. Given that
there is only a single cold start per container, having a pair
with a single cold and a single warm execution guarantees
a sound comparison, because mean calculation of several
warm executions is avoided. Such mean calculations could
have distorted our results, because one platform optimizes
the performance of cloud functions after a certain amount of
invocations, as we observed during our initial experiments.
Tests have shown that containers on most platforms were shut
down after at most 20 minutes of idling.
A FaaS platform is a black box. The precise execution
duration, which is used for billing on the platform, includes
the function execution and parts of the start up process. Other
parts of the initialization and start up of the container plus
other needed infrastructural components are not included. To
measure these aspects, we performed a REST-based interaction
with the FaaS platform, where the start and end time is also
logged on the client side.
Logging the time stamps locally enables us to compare the
local execution with the platform duration. After storing the
starting time stamp, a REST call is executed, which sends
the request over the network to the API gateway endpoint.
This endpoint creates a new event which triggers a container
creation or reuse. Finally, the cloud function is executed and
the middleware on the platform logs the start and end time
of the function execution as well as the precise duration,
which is the difference of both time stamps. The result of
the computation is transferred to the API gateway endpoint,
wrapped in a response, and sent to the caller via the network.
The client REST call exits and the local end time stamp is
logged on the host machine of the FaaS user. The two local
time stamps enable an assessment of the perceived execution
duration for the user and, as a consequence, of the difference
between cold and warm starts.

Fig. 1: Sequential Execution of our Experiment (invocations alternate between delays of 1 minute and 29 minutes)
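The client-side measurement can be sketched as follows (the class and the Runnable indirection are ours and not part of SeMoDe; in the experiment the wrapped invocation is the REST call to the API gateway endpoint):

```java
import java.time.Duration;
import java.time.Instant;

public class PerceivedDuration {

    // Wraps an invocation with two local time stamps; their difference
    // is the execution duration as perceived by the FaaS user, including
    // network latency, platform routing, and any cold start overhead.
    static Duration measure(Runnable invocation) {
        Instant start = Instant.now();   // local start time stamp
        invocation.run();                // network + routing + (cold) start + function
        Instant end = Instant.now();     // local end time stamp
        return Duration.between(start, end);
    }

    public static void main(String[] args) {
        Duration d = measure(() -> {
            try {
                Thread.sleep(100);       // stand-in for the remote invocation
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println(d.toMillis() + " ms perceived duration");
    }
}
```

Subtracting the platform-logged duration from this perceived duration yields the overhead that the platform does not bill.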
To force pairs of cold and warm executions, we used the
SeMoDe benchmarking mode sequential with changing inter-
vals. This mode triggers the provided function with a delay
between execution start times. Delays vary and are defined
in a provided array of delays d in a round robin fashion.
The start time of each execution is generalized in Fig. 2. The
platform response includes a container and platform identifier.
These identifiers enable an unambiguous matching between
the local REST data and the platform log data.
start(i, d) = 0, if i = 0
start(i, d) = start(i − 1, d) + d[i mod len(d)], if i ≥ 1

Fig. 2: Start Time of the ith Execution of the Local Benchmark
We set our array d to {1 minute, 29 minutes}. The start time
is the time of the local utility, which calls the API gateway. A
representation of the resulting execution sequence can be seen
in Fig. 1. Once again, it should be noted that the invocation
of the cloud functions is sequential.
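The schedule of Fig. 2, together with the delay array d = {1 minute, 29 minutes}, can be sketched as follows (class and method names are ours):

```java
public class Schedule {

    // start(i, d): start time (in minutes) of the i-th execution, with the
    // delays d applied in a round robin fashion, as defined in Fig. 2.
    static long start(int i, long[] d) {
        if (i == 0) {
            return 0;
        }
        return start(i - 1, d) + d[i % d.length];
    }

    public static void main(String[] args) {
        long[] d = {1, 29};   // the delay array used in our experiment
        for (int i = 0; i < 6; i++) {
            System.out.print(start(i, d) + " ");
        }
        // i = 0..5 yields 0 29 30 59 60 89: each 29-minute idle gap forces
        // a cold start, the following 1-minute gap reuses the warm container.
    }
}
```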
F. Experiment’s Dimension Selection
Finally, we select the memory and package sizes to test our
hypotheses. An initial package size is the size of a package
after the build phase. Initial Java packages are approximately
1.5 MB in size, JS ones are smaller than 1 KB. Package
sizes that differ from the initial ones were artificially
increased by adding a file to the zip or by padding the JS
file with a comment. Deployment packages were sized initial,
3.125 MB, 6.25 MB, 12.5 MB, 25 MB and 50 MB. On Azure,
three additional packages were created for 100 MB, 200 MB
and 400 MB. For AWS, 50 MB is the upper package size limit
for functions at the time of the experiment (June 2018).10
The memory setting was only configured on AWS since
Azure does not support this feature. Memory settings on
AWS were 128 MB, 256 MB, 512 MB, 1024 MB, 2048 MB
and the maximum setting of 3008 MB. The memory setting
linearly determines the compute power of the container. Every
combination of deployment package size, memory, language,
and provider resulted in a cloud function. Therefore, we
deployed 72 cloud functions on AWS and 18 on Azure. The
lower number of Azure functions is explicable by the dynamic
allocation of memory and CPU to the functions.11
Our experimental setup is designed to exclude side effects.
Calculating the execution overhead (cold – warm) as logged
by the client isolates the perceived cold start overhead. The
average execution time of the function (recursive fibonacci
calculation), network latency, and routing within the FaaS
platform is assumed to be equal for cold and warm executions
and therefore irrelevant for the cold start overhead value.
This reduction isolates the additional time consuming steps
which occur during a cold start and answers the research
question of how to benchmark cold starts on FaaS platforms
consistently.
G. Provider Limitations
The function execution time is limited to five minutes for
AWS. However, the AWS API Gateway closes the connection
after 29 seconds.12 In this event, the execution may still
succeed but a timeout occurs locally and matching of local
data and platform executions is no longer possible. Therefore,
we had to make sure that the cloud function executions are
never longer than 29 seconds for any of the configurations.
We tested several n values as input for the recursive fibonacci
function and found that calculating fibonacci(40) results in a
duration below 29 seconds even for the slowest configuration
(128 MB).
H. Experiment Execution and Data Dimensions
The experiment was executed between 6/25/2018 and
7/1/2018. Each cloud function was invoked 550 times to get
275 pairs of cold and warm executions. If a cloud function
returned 500 as HTTP status code, which indicates a server
error, or another error occurred like an API gateway timeout,
we excluded the cold as well as the warm execution. Only
pairwise valid data is processed and included in the results.
To summarize our setting, the result data matrix consists
of seven dimensions: Provider, Programming Language, De-
ployment Package Size, Memory Setting, Specific Invocation
Time, Local Duration, and Platform Duration.
V. RESULTS
A. Hypotheses Independent Results
Before we confirm or reject the selected hypotheses, we
gather general insights from the data in this part. Figures 3
and 4 and Table I are based on the same dimension selection.
The deployment package size is initial, all valid pairs of each
cloud function are used to compute the figures and mean
values. For AWS, if not noted otherwise, the cloud function
with 256MB memory is selected. Due to the different scaling
of memory, the absolute values of cold and warm executions
of AWS and Azure are not comparable. Nevertheless, the data
provides insights into how Azure deals with cold starts in
general and how AWS does with a relatively low memory setting.
The following sections will discuss these aspects in more detail
when reflecting our selected hypotheses.
Figure 3 shows boxplots of the duration in milliseconds,
measured by the start and end time of REST calls on the client.
The bottom of the box is the 25th percentile and the top is the
75th percentile. The center line of the box is the median equal
to the 50th percentile. Upper whisker is the 75th percentile
plus the box length multiplied by 1.5. Corresponding to this
computation, the lower whisker is the 25th percentile minus the
box length multiplied by 1.5. Values that are not between the
two whiskers are outliers and depicted as dots. This procedure
was also chosen for the generation of Fig. 4. The values of
these boxplots were fetched from the logging services of the
respective platforms.
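The whisker computation described above can be reproduced as follows (a sketch using a simple nearest-rank percentile; actual plotting libraries may interpolate percentiles differently and clamp whiskers to observed values):

```java
import java.util.Arrays;

public class BoxplotStats {

    // Nearest-rank percentile on a sorted copy of the values.
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, index)];
    }

    // Whiskers as described in the text: the 25th/75th percentile
    // minus/plus 1.5 times the box length (interquartile range).
    static double[] whiskers(double[] values) {
        double q1 = percentile(values, 25);
        double q3 = percentile(values, 75);
        double box = q3 - q1;
        return new double[] {q1 - 1.5 * box, q3 + 1.5 * box};
    }

    public static void main(String[] args) {
        double[] w = whiskers(new double[]{1, 2, 3, 4, 5, 6, 7, 8});
        System.out.println(w[0] + " .. " + w[1]);
        // Values outside this range would be depicted as outlier dots.
    }
}
```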
When assessing the box and the whiskers, values for the cold
executions are higher on the client than those of the
corresponding warm ones. These values from the client can visually
be related to the platform ones, because the box plots in the
two figures are based on exactly the same raw data and also
the Y axis dimension is the same for both figures and the
box plots. Due to this visual coherence, some outliers are no
longer included in the figures. The raw data, the box plots
as printed here, and the box plots including the outliers are
accessible online13. Sometimes cold executions are faster than
warm ones. Especially the AWS values in Fig. 4, where a huge
duration intersection of cold and warm executions is present,
strengthen this statement. The warm and cold values for AWS
JS seem to be even equal.
To get more insights about absolute values, Table I presents
mean values in milliseconds measured on client and platform
side. The dimensions of the presented data are provider,
language and initial deployment package size. AWS cloud
functions were executed with 256MB memory. The mean
values of these cloud functions can be compared to the
medians in Figs. 3 and 4.
For AWS, we measured an execution overhead for the cold
start of 1,750 ms (cold – warm) on the client instead of 247 ms
on the platform for Java and 644 ms instead of -43 ms for
JS. We observed that some cold executions on the platform
are faster than the warm executions on the same container.
                       Cold    Warm   Cold−Warm
AWS   Java  Client     5961    4211     1750
            Platform   4329    4082      247
      JS    Client    14320   13676      644
            Platform  13496   13539      -43
Azure Java  Client    26681    1809    24872
            Platform  15261    1545    13716
      JS    Client    14369    4547     9822
            Platform   5492    4270     1222

TABLE I: Mean Values in Milliseconds for Cold and Warm
Executions on Client and Platform Side
This is the reason why the value for JS is negative in this
case. For JS, 63 % of the cold executions were faster than
the corresponding warm ones. 16% of AWS cloud functions
written in Java executed faster on the cold start compared
to the warm execution on the same container. These start
and end times were logged on the platform. On the client
side, a cold execution is never faster than its corresponding
warm execution, neither for JS nor for Java. Our conclusion
for AWS is, that typical tasks during container start up etc.
are not included in the logged value on the platform. Our
results strengthen this assumption. Java needs a more resource
intensive environment with an initialization of a JVM during
the cold start, whereas JS uses only an interpreter to execute
the code. We assume, that underusage when executing the cold
request, collocation of various cloud functions on the same
host, and other reasons influence the performance as well.
For Azure, a larger execution overhead can also be identified
using the logged execution times on the client. This indicates
that the platforms do not log the complete execution overhead
of the cold start in the execution time of the function.
In order to evaluate our hypotheses, we need to know the
total execution overhead of cold starts. This is the reason why
for all further analyses we only consider the execution times
logged on the client.
B. Hypotheses Dependent Results
1) Assessment Methodology: To assess the hypotheses-dependent
results, we use mean values as in the previous section,
but more often a correlation metric to make a clear statement
about the degree to which a hypothesis is significant based on our data.
We used Spearman’s correlation coefficient since the nor-
mal distribution test showed that the data is not distributed
normally, which renders Pearson’s correlation coefficient not
applicable. The range of the correlation coefficient ρ is from
negatively correlated (-1) to positively correlated (+1). There
exist different interpretations considering the significance of
correlation. We stick to a widely used interpretation [11],
where 0 indicates no correlation, absolute value of 0.2 weak,
0.5 moderate, 0.8 strong and 1.0 perfect correlation.
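Under the assumption of no tied values, Spearman's ρ reduces to the well-known rank-difference formula, which can be sketched as follows (class and method names are ours; tied values would require averaged ranks and a Pearson correlation on the ranks instead):

```java
import java.util.Arrays;

public class Spearman {

    // Rank of each value (1-based); assumes all values are distinct,
    // so binarySearch on the sorted copy finds the exact position.
    static double[] ranks(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = Arrays.binarySearch(sorted, values[i]) + 1;
        }
        return result;
    }

    // Spearman's rho via rank differences: 1 - 6 * sum(d^2) / (n (n^2 - 1)).
    static double rho(double[] x, double[] y) {
        double[] rx = ranks(x);
        double[] ry = ranks(y);
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double d = rx[i] - ry[i];
            sum += d * d;
        }
        int n = x.length;
        return 1 - 6 * sum / (n * (double) (n * n - 1));
    }
}
```

Because only ranks enter the formula, no assumption about the underlying distribution is needed, which is why it remains applicable to our non-normally distributed measurements.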
Additionally, we constructed a linear regression model to
calculate the slope of the line plus the intersection point with
the Y axis. This enables us to state an equation to compute other
configurations than the investigated ones. Especially for the
deployment hypothesis, this approach can forecast arbitrary
package sizes. Resulting linear models and the correlation
coefficient ρ are presented in Tables III and V.

Fig. 3: Execution Times of Cold and Warm Invocations on Client Side (boxplots; panels include aws-java 256 MB and aws-js 256 MB, Y axis 0 to 25000 ms)

Fig. 4: Execution Times of Cold and Warm Invocations on Provider Side (boxplots; panels include aws-java 256 MB and aws-js 256 MB, Y axis 0 to 25000 ms)
The slope of the line is no indicator for correlation, but
states how strongly the values of Y are influenced by an
increasing or decreasing X value.
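An ordinary least squares fit producing an (intercept, slope) pair of the same shape as the linear models in Tables III and V can be sketched as follows (a sketch; names are ours):

```java
public class LinearRegression {

    // Ordinary least squares fit of y = intercept + slope * x.
    // Returns {intercept, slope}, e.g. x in MB and y in ms, matching the
    // shape of the reported linear models.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n, meanY = sumY / n;
        double cov = 0, var = 0;
        for (int i = 0; i < n; i++) {
            cov += (x[i] - meanX) * (y[i] - meanY);
            var += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = cov / var;               // ms per MB
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        double[] model = fit(new double[]{0, 1, 2}, new double[]{1, 3, 5});
        System.out.println(model[0] + " + " + model[1] + " * x");
    }
}
```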
2) H1: Hypothesis Programming Language: Execution
times for AWS Java functions with different memory settings
are shown in Table II. Cold start times for Java cloud func-
tions are between two and three times higher than those of
respective JS functions. Azure’s data from the previous section
supports our hypothesis. Average cold execution overhead
was 24,872 ms for Java, while the JS function only caused
9,822 ms, which results in a ratio of 2.53. Based on these
ratios, we confirm the hypothesis, as we noticed that the cold
start time was significantly larger for each of the Java functions
compared to JS.
Memory in MB   128   256   512  1024  2048  3008
Java          1980  1750  1292  1113  1257   861
JS             587   644   614   368   589   371
Ratio         3.38  2.72  2.10  3.03  2.14  2.32

TABLE II: Differences of Cold and Warm Executions on
Client Side (AWS Lambda)
3) H2: Hypothesis Deployment Package Size: For AWS and
Azure JS, we can confirm our hypothesis since ρ is positive.
For AWS, the correlation is weak, but present. Azure JS has
the highest value for the slope (43 ms/MB) and also the most
significant correlation.
For Azure Java the correlation coefficient is negative and
therefore we consider this combination in more detail. Table IV
states the cold start overhead for different deployment
package sizes on the client side.

              ρ      Linear Model
AWS   Java   0.29    1510 ms +  9 ms/MB
      JS     0.37     613 ms + 12 ms/MB
Azure Java  -0.15   25580 ms -  7 ms/MB
      JS     0.46    8571 ms + 43 ms/MB

TABLE III: Spearman's Correlation Coefficient ρ and Linear
Regression Model for Hypothesis Deployment Package Size

DPS   0  3.125  6.25  12.5  25  50  100  200  400
CSO  25     27    22    27  24  36   26   26   23

TABLE IV: Deployment Package Size (DPS) in MB and
Cold Start Overhead (CSO) in Seconds for Azure Java Cloud
Functions

The mean values in this table are mainly between 22 and
27 seconds with an outlier at 50 MB deployment package size.
Comparing the initial size and the highest configuration with
400 MB, there are only 2 seconds difference, which is less
than 10 % w.r.t. the absolute value. This, and the fact that
there is no clear tendency, indicates that the hypothesis for
this combination is not significant. This assessment is also
based on the low absolute value of ρ and the small value for
the slope. Therefore, we reject the hypothesis for Azure cloud
functions written in Java.
4) H3: Hypothesis Memory Setting: The hypothesis Mem-
ory Setting states that the cold start overhead decreases with
the size of memory. Only AWS cloud functions are tested
since Azure has no memory setting. Exactly as for the
prior hypothesis, we calculated Spearman's correlation
coefficient ρ as well as the linear regression model.
        ρ      Linear Model
Java  -0.59   1634 ms - 0.25 ms/MB
JS    -0.20    606 ms - 0.07 ms/MB

TABLE V: Spearman's Correlation Coefficient ρ and Linear
Regression Model for Hypothesis Memory Setting
Our hypothesis holds true since the values for the correlation
coefficient ρin Table V are both negative. For Java, we observe
a higher correlation and slope. We assume that this is caused
by a costlier middleware layer. As Java is a compiled language,
the JVM needs to be set up to execute the code. The available
CPU and memory might have a positive influence on how fast
this setup is done. JS is an interpreted language and therefore
the execution environment is not as complex as the one for
Java, but more acquired resources also have a positive effect
on the cold start time.
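To illustrate what the regression models in Table V imply, the following sketch evaluates both linear models at common AWS Lambda memory settings. The assumption here is that the slope is given in ms per MB of configured memory; the intercepts and slopes are the fitted values from Table V.

```python
# Evaluate the Table V linear models: overhead = intercept + slope * memory.
# Assumption: slope unit is ms per MB of configured function memory.
def predicted_overhead_ms(intercept_ms, slope_ms_per_mb, memory_mb):
    return intercept_ms + slope_ms_per_mb * memory_mb

for memory_mb in (128, 1024, 3008):  # typical AWS Lambda memory settings
    java = predicted_overhead_ms(1634, -0.25, memory_mb)
    js = predicted_overhead_ms(606, -0.07, memory_mb)
    print(f"{memory_mb:>5} MB  Java {java:7.1f} ms  JS {js:6.1f} ms")
```

Under this reading, moving Java from 128 MB to 3008 MB cuts the predicted overhead by roughly 720 ms, while JS improves by only about 200 ms, consistent with the costlier JVM setup discussed above.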
A. Discussion of Results
Our methodology of assessing the cold start from a user's point
of view is essential, because platforms report only a fraction
of the cold start overhead in their function duration. Additionally,
they seem to report different fractions of the provisioning and
initialization. Especially for functions written in JS on AWS
our results were surprising. We measured that cold starts on
the platform were faster than the consecutive warm ones in
some cases. This leads to the conclusion that AWS only bills
the users for their function executions without the time to set up
servers, virtual machines and containers.
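This client-side metric can be stated compactly: for each cold-warm pair, the perceived overhead is the cold call's round-trip time minus the warm call's round-trip time. The sketch below uses hypothetical timings, not measured values from the benchmark.

```python
# Sketch of the client-side metric: perceived cold start overhead of a
# cold-warm pair is the cold round-trip time minus the warm one.
def cold_start_overhead_ms(cold_ms, warm_ms):
    return cold_ms - warm_ms

# hypothetical (cold, warm) REST round-trip times in ms
pairs = [(980, 610), (1020, 590), (945, 605)]
overheads = [cold_start_overhead_ms(c, w) for c, w in pairs]
mean_overhead = sum(overheads) / len(overheads)
print(overheads, round(mean_overhead, 1))
```

Because the subtraction happens on the client, it captures provisioning and initialization time that the platform's billed duration omits.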
The gap between compiled and interpreted languages with
a ratio between 2 and 3 was higher than expected. Our
explanation is that complex execution environments, like the JVM
in the case of the compiled language Java, overload the already
busy CPU. This effect is smaller for higher memory settings, but
still present. Especially the performance gain for the compiled
language is worth mentioning: the cold start overhead of Java
functions correlates with the memory setting at ρ = -0.59. Only the deployment
package size hypothesis shows a mixed picture, where the
correlation is lower and varies between positive and negative
values within the same platform.
Our motivation to take the cold starts of cloud functions
into consideration is the currently prevailing strategy of using
pings to pre-warm cloud function instances. The experimental
setup of our benchmark is a REST based interaction via an
API gateway. As noted in the introduction, this ping hack is
opposed to the FaaS principle of scaling to zero. The mean
cold start overhead we measured for different platforms, lan-
guages and without artificially increased deployment package
sizes ranges from 370 ms for JS on AWS with a memory
setting of 3008 MB to 24 s for Java on Azure. The 50 MB
configuration for Azure cloud functions written in Java had
even more overhead, as already shown in Section V-B3, due
to the experimental status of some languages. Therefore, the
ping hack may not always be necessary. Additionally, scaling
also leads to cold starts and the ping hack therefore does
not solve the problem at all. The ping hack only ensures
that a fixed amount of containers is available. Our results,
especially the comparison of cold and warm executions on the
client side, demonstrate that in some use cases there is no need
for this kind of hack, particularly in situations where response
times of a few hundred milliseconds (AWS-JS-3008 MB: 370 ms
overhead) are acceptable. Because of this wide range
of cold start overheads, it is important to assess the impact on
specific applications. For applications requiring a fast response
or involving user interactions, even small cold start overheads
might impose a problem.
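For completeness, the ping hack discussed above can be sketched as a background thread that periodically invokes the function endpoint. The URL and interval here are assumptions, not values from the paper; as argued, this only keeps a fixed number of containers warm and does not prevent cold starts caused by scaling.

```python
# Hypothetical "ping hack" sketch: a background thread invokes the cloud
# function endpoint within the platform's idle window so one container
# instance stays warm. URL and interval are assumptions.
import threading
import urllib.request

PING_INTERVAL_S = 5 * 60  # assumed shorter than the platform's idle timeout

def keep_warm(url, stop_event):
    # wait() returns False on timeout (time to ping) and True once stopped
    while not stop_event.wait(PING_INTERVAL_S):
        try:
            urllib.request.urlopen(url, timeout=10).read()
        except OSError:
            pass  # a failed ping only means the next real call may be cold

stop = threading.Event()
threading.Thread(target=keep_warm,
                 args=("https://example.invalid/fn", stop),
                 daemon=True).start()
# ... application traffic ...
stop.set()  # stop pinging; the instance will eventually go cold again
```

Note that each ping warms at most one instance; a load spike still triggers additional container starts, and with them additional cold starts.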
Further investigations are needed in this area, because cold
start is one of the main issues FaaS has to assess and solve.
B. Threats to Validity
Based on the characteristics HUPPLER [6] mentioned in his
benchmarking publication, we tried to make our benchmark as
robust, self-explanatory, and repeatable as possible. But there
are some factors that could threaten the validity of our data:
Platform Limitations - There is only limited information
available on how containers are initialized and cloud
functions are executed. With the documentation information
alone, the high variety of execution times of a cloud
function is not fully explicable. Also, additional services
like the API gateway on AWS can influence the results by
returning errors, which are not cloud function errors.
Available Metrics - The function execution time that is
logged and used for billing on the platforms provides
only limited information. In AWS, the start up duration
of a container is not included in the logged execution
time. This initialization of a container is crucial for the
perceived cold starts.
Sample Size - We only tested our hypotheses with 275 cold-
warm pairs per function.
Temporal Relevance - Due to the very young and evolving
FaaS paradigm, the updates and changes in the platforms
limit our results to a certain time frame.
We plan to do the same benchmark setting again for
the tested hypotheses and want to integrate additional FaaS
platforms, namely IBM OpenWhisk, Google Cloud Functions,
OpenFaaS and FnProject. The next benchmark will be
executed for a longer time period to assess daily differences
in the execution time and cold start behavior. Testing further
hypotheses, especially the number of dependencies, which is
important during the implementation of cloud functions, is
scheduled for the next experiment.
Also, the experimental design of some hypotheses, e.g. the
deployment package size hypothesis, needs a redesign, especially
from the programming language point of view, to verify whether
this hypothesis is insignificant for some combinations.
This conceptual redesign should avoid ambiguous results and
be part of the next experiment.
This follow up benchmark serves as a data basis for a
concurrency benchmark, which will be executed at the same
time to get comparable executions. The concurrency tests are
quite important since one of the main use cases is the usage
of cloud functions as a reactive component to decouple peak
loads in a web application scenario. Especially peak loads
trigger a huge amount of cold starts on the platform.
To get more insights about some hypotheses, e.g. the
programming language hypothesis, we want to conduct a
study, where our functions are executed locally. Development-
production parity is a key issue when comparing the local
values with the client perceived REST duration and also
with the platforms’ start and end times. A comparison to
benchmarks executed locally will enable us to substantiate our
hypotheses in future work.
To understand the different FaaS use cases, further cloud
function triggers need to be investigated with respect to their cold
start impact. Especially the integration triggers of databases
are widely used, where a cloud function is triggered for every
inserted entry in a database.
REFERENCES
[1] E. van Eyk et al., "The SPEC Cloud Group's Research Vision on FaaS
and Serverless Architectures," in Proc. WoSC, 2017.
[2] I. Baldini et al., "Serverless Computing: Current Trends and Open
Problems," 2017.
[3] T. Lynn et al., "A Preliminary Review of Enterprise Serverless Cloud
Computing (Function-as-a-Service) Platforms," in Proc. CloudCom.
[4] H. Lee et al., "Evaluation of Production Serverless Computing
Environments," in Proc. WoSC, 2018.
[5] E. van Eyk and A. Iosup, "Addressing Performance Challenges in
Serverless Computing," in Proc. ICT.OPEN, 2018.
[6] K. Huppler, "The Art of Building a Good Benchmark," in Performance
Evaluation and Benchmarking, 2009.
[7] G. Adzic and R. Chatley, "Serverless Computing: Economic and
Architectural Impact," in Proc. ESEC/FSE, 2017.
[8] M. Villamizar et al., "Infrastructure Cost Comparison of Running Web
Applications in the Cloud Using AWS Lambda and Monolithic and
Microservice Architectures," in Proc. CCGrid, 2016.
[9] M. Malawski et al., "Benchmarking Heterogeneous Cloud Functions,"
in Proc. Euro-Par, 2018.
[10] J. Spillner et al., "FaaSter, Better, Cheaper: The Prospect of Serverless
Scientific Computing and HPC," in Communications in Computer and
Information Science, 2017.
[11] K. H. Zou et al., "Correlation and Simple Linear Regression," Radiology,
vol. 227, no. 3, pp. 617–628, 2003.