Cold Start Influencing Factors
in Function as a Service
Johannes Manner, Martin Endreß, Tobias Heckel and Guido Wirtz
Distributed Systems Group
University of Bamberg
Bamberg, Germany
{johannes.manner, guido.wirtz}@uni-bamberg.de
{martin.endress, tobias-christian-juergen-lukas.heckel}@stud.uni-bamberg.de
Abstract—Function as a Service (FaaS) is a young and rapidly
evolving cloud paradigm. Due to its hardware abstraction,
inherent virtualization problems come into play and need an
assessment from the FaaS point of view. Especially avoidance of
idling and scaling on demand cause a lot of container starts and
as a consequence a lot of cold starts for FaaS users. The aim of
this paper is to address the cold start problem in a benchmark
and investigate influential factors on the duration of the perceived
cold start.
We conducted a benchmark on AWS Lambda and Microsoft
Azure Functions with 49500 cloud function executions. Formu-
lated as hypotheses, the influence of the chosen programming
language, platform, memory size for the cloud function, and size
of the deployed artifact are the dimensions of our benchmark.
Cold starts on the platform as well as the cold starts for users
were measured and compared to each other. Our results show
that there is an enormous difference for the overhead the user
perceives compared to the billed duration. In our benchmark,
the average cold start overheads on the user’s side ranged from
300ms to 24s for the chosen configurations.
Index Terms—Serverless Computing, Function as a Service,
FaaS, Cloud Functions, Cold Start, Benchmarking
I. INTRODUCTION
Function as a Service (FaaS) and Serverless Computing
are often used in conjunction, but there is a difference [1].
Serverless in general is a more abstract phrase, whereas FaaS
focuses on event driven cloud functions. Therefore, we use the
term FaaS to stick to a precise phrasing of the investigated
cloud service model.
Cold starts are an inherent problem of virtualization tech-
niques. Cloud functions are executed in containers. The first
execution of a cloud function experiences a cold start since
the container must be started prior to the execution. Due to
performance reasons, FaaS providers do not shut down the
containers immediately. Subsequent executions use spawned
but existing containers to profit from the warm execution
environments. Avoidance of idling and scaling on demand are
game changers compared to other cloud service models, but
entail more cold starts.
So far, cold starts are perceived as a system-level chal-
lenge [2], [3] and were not directly investigated to our knowl-
edge. Efforts are made to circumvent the cold start problem
by pinging the cloud function on a regular basis [4], [5]. This
ping hack1 sins against the scale-to-zero principle of FaaS.
Therefore, we are motivated to research factors that influ-
ence the cold start and pose the following research questions:
RQ1: Which factors written down as hypotheses influence
the cold start of cloud functions?
RQ2: How to benchmark cold starts of cloud functions
consistently to get repeatable experiments and results?
Based on these questions, we formulated influencing factors
as hypotheses and provided a pipeline to execute repeatable
benchmarks. Our benchmark is designed to be relevant, re-
peatable, fair, verifiable and economical [6].
The agenda is as follows: In Section II, we present other
benchmarks on FaaS with varying focuses. Section III an-
swers the first research question by forming hypotheses about
influential factors on the cold start, which are partly tested
in the experiment in Section IV. Our results are presented
in Section V and discussed in Section VI. Future Work in
Section VII concludes this paper.
II. RELATED WORK
We present economically oriented benchmarks in the first
paragraph and performance-oriented ones in the second.
Based on the use cases and requirements, avoidance of
idle instances is a game changer. ADZIC and CHATLEY [7]
compared the cost of hosting one hour of operation on PaaS
offerings like AWS EC2 and Google AppEngine with the same
functionality on FaaS. They calculated a cost reduction of up
to 99.95% assuming that the functionality is invoked every five
minutes for 200ms. Another benchmark with similar research
questions was performed by VILLAMIZAR ET AL. [8]. They
compared three different system architectures: A monolithic
approach, a microservice-based system, and a cloud function
architecture. Three scenarios were transformed from monoliths
to loosely coupled cloud functions for a more robust data basis.
The transition of the monolithic application into microservices
caused a first cost reduction. This reduction progressed by run-
ning the microservice as cloud functions on a FaaS platform
and resulted in an overall cost reduction of 70% and more.
1J. Daly. 15 Key Takeaways from the Serverless Talk at AWS Startup Day.
https://www.jeremydaly.com/15-key-takeaways-from-the-serverless-talk-at-aws-startup-day/.
2018. Last accessed 9/13/2018.
LEE ET AL. [4] executed a performance-oriented benchmark
for distributed data processing emphasizing the scaling prop-
erty and resulting throughput of FaaS platforms. To utilize
the cloud functions, they used a compute intensive workload.
They monitored the number of running instances when varying
the workload to assess the time overhead. However, they did
not express the overhead due to cold starts explicitly. All
major, commercial FaaS platforms were part of the study,
namely AWS Lambda2, Microsoft Azure Functions3, Google
Cloud Functions4, and IBM OpenWhisk5. MALAWSKI ET
AL. [9] also conducted a cross-platform study with the same
providers. They used recursive fibonacci as a compute bound
benchmark algorithm. Interaction via API gateways was their
chosen scenario, one of the main use cases for cloud functions.
This allowed a time measurement on client and provider side.
Therefore, they were able to compute the perceived overhead
for the user, which included network latency, platform routing
and scheduling overhead, when following their interpretation
of the results. The cold start was missing in their enumeration
of influencing factors and it is therefore investigated in this
work.
III. HYPOTHESES
To facilitate an unbiased evaluation, hypotheses about the
implications of different parameters and decisions were for-
mulated prior to the experiment. FaaS users and especially
providers make decisions, which may influence the cold start
behavior of the executed functions.
H1: Programming Language - FaaS platforms offer a large
variety of programming languages [3]. JavaScript (JS)
for example is supported by all major platforms since
it is a perfect fit for small, stateless cloud functions.
Also compiled languages like Java and C# come in
focus due to the engineering benefits if functions get
more complex. Because of the environment overhead,
our hypothesis is that compiled programming languages
impose a significantly higher cold start overhead than
interpreted languages like JS. For instance, the execution
of a cloud function written in Java needs a running Java
Virtual Machine (JVM) which must be set up prior to
function execution.
H2: Deployment Package Size - Our hypothesis is that the
cold start overhead increases with the deployment pack-
age size. We want to measure the time, which is needed to
copy the function image to the container, load the image
into memory, unpack, and execute it.
H3: Memory/CPU Setting - Our hypothesis is that the cold
start overhead decreases with increasing resources, be-
cause the container can be loaded and set up faster. We
assume that this behavior is observable for low memory
settings where the CPU is busy, but is negligible for
high settings since the CPU is underutilized in these
2https://aws.amazon.com/lambda
3https://azure.microsoft.com/en-us/services/functions/
4https://cloud.google.com/functions/
5https://www.ibm.com/cloud/functions
cases. This limitation does not weaken the hypothesis,
because the low memory settings starting at 128MB are
of particular interest. Memory and CPU are used in
combination since most of the mature platforms offer a
linear scaling of CPU power based on the memory setting.
H4: Number of Dependencies - Loading dependencies
takes time when spinning up a cloud function. Our
hypothesis is that the amount and size of dependencies
increases the cold start overhead since they must be
loaded prior to the first execution and can be reused in
subsequent ones. If we can confirm this hypothesis, a
best practice would be the division of required libraries
in sublibraries, where the needed subset of functionality
is extracted in a new artifact.
H5: Concurrency Level - FaaS gets attention especially due
to the scaling property of cloud functions. We hypothesize
that the concurrency level, i. e., the number of concurrent
requests and therefore started containers, neither influ-
ences the cold start overhead nor the execution time.
Functions are started independently of each other in
a separate container for every concurrent execution. If
1000 requests arrive simultaneously, we expect that 1000
containers are started by the middleware of the FaaS
platform.
H6: Prior Executions - Avoidance of idling is a tremendous
improvement of FaaS compared to PaaS. Achieving this
goal comes with the drawback that unused containers are
removed from running machines. Subsequent calls to the
cloud function require a new container. We assume that
the cold start overhead is independent of prior executions.
This hypothesis is of particular interest for the first
execution of the cloud function after deployment.
H7: Container Shutdown - Providers may optimize their
infrastructure by using learning algorithms for identifying
cloud functions, which are used frequently. Due to cost
effects and user satisfaction, we hypothesize that the
duration after which a container shuts down is dependent
on the number of previous executions. According to the
FaaS paradigm, executions are independent of each other
and should not influence the lifespan of a container.
IV. EXPERIMENT
A. Hypotheses Selection
The experiment in this paper evaluates three out of the seven
hypotheses of Section III. Programming Language, Deployment
Package Size, and Memory/CPU Setting are investigated.
Reasons to choose these hypotheses are the ease of testing
and getting stable and reproducible results. Since it is the first
benchmark focusing only on cold starts, our aim is to make
a clear experimental setup and reduce other parameters and
external influences to a minimum. We omitted the hypotheses
with a concurrent notion due to the side effects which are
introduced by concurrency in general. The data basis produced
by our sequential benchmark has a minimum set of external
influences. Without this data basis, an evaluation of the
hypotheses with a concurrent notion in our future work would
not be possible due to missing reference data. Therefore, our
benchmark is of special interest for real world applications,
which are only requested once or twice an hour and thus
benefit from the scaling to zero property.
B. Provider and Language Selection
We selected Java and JS as programming languages. An
important reason for this decision is that Java is a compiled
and JS an interpreted language. This selection emphasizes
the differences in programming languages for the evaluation
of the programming language hypothesis. Furthermore, Java
and JS are widely used in enterprises and the open source
community6. Based on this language selection, we selected
AWS Lambda and Microsoft Azure Functions as platforms
since they were the only platforms supporting Java. This
selection prevents us from comparing the memory setting hypothesis
between providers, because Azure Functions scales memory
and CPU automatically. However, we can still
provide first insights about this hypothesis.
C. Algorithm Selection
We chose the recursive version of fibonacci like SPILLNER
ET AL. [10], which calculates the nth value of the
fibonacci sequence to test our hypotheses. Recursive fibonacci
is compute bound. As a consequence, the compute time is
determined by the processing power of the machine. Since it
is a recursive algorithm with two recursive calls, the tree grows
exponentially. The calculation is not memory bound, since one
stack is evaluated completely until the next evaluation starts,
e.g., fibonacci(n-1) is evaluated before fibonacci(n-2)
is called. This results in a time complexity of O(2^n) and a
memory complexity of O(n).
With these characteristics, the algorithm is well suited for
our benchmark. Assuming that identical hardware is used
within a data center [4], predictability and low variance in
function execution time guarantee stable results. The low
memory usage ensures that we can benchmark the function
with any of the memory settings provided by FaaS platforms.
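For illustration, a minimal Java version of this recursive algorithm is sketched below; class and method names are examples and not the handler code actually deployed (cf. the reference implementations in Section IV-D).

```java
public class Fibonacci {

    // Recursive fibonacci: two recursive calls per invocation lead to
    // O(2^n) time, while only one call chain is on the stack at any
    // moment, leading to O(n) memory.
    public static long fibonacci(int n) {
        if (n <= 1) {
            return n;
        }
        return fibonacci(n - 1) + fibonacci(n - 2);
    }

    public static void main(String[] args) {
        // fibonacci(40) is the input used in the experiment (cf. Section IV-G).
        System.out.println(fibonacci(40));
    }
}
```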
D. Generic Experiment Pipeline
Using a deployment pipeline, which automates all necessary
steps, makes it possible to reproduce results more easily. The
pipeline is included in our prototype SeMoDe7 as a utility
mechanism for the mentioned fibonacci functions.
The first step is to implement a function that should
be assessed. Reference implementations8 for some provider-
language combinations of the recursive fibonacci algorithm are
available in a separate folder of SeMoDe. To test some of the
hypotheses explained in Section III, we need an interceptor
step, where the source code is altered. In the case of the
deployment package size hypothesis, this step can be per-
formed automatically. In other cases such as the dependency
hypothesis, a manual interception must be performed.
6Based on the opened pull requests on GitHub, https://octoverse.github.com/,
last accessed on 10/10/2018.
7https://github.com/johannes-manner/SeMoDe
8https://github.com/johannes-manner/SeMoDe/tree/master/fibonacci
We use the Command Line Interface (CLI) of the Serverless
Framework9to specify configuration settings and for the
deployment of our cloud functions.
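For illustration, a Serverless Framework descriptor for one AWS configuration could look like the following sketch; the service name, handler class, and region are placeholders, and memorySize is one of the values varied in Section IV-F.

```yaml
# Sketch of a serverless.yml for one AWS benchmark configuration (placeholder names).
service: fibonacci-benchmark        # placeholder service name

provider:
  name: aws
  runtime: java8                    # Node.js runtime for the JS variant
  memorySize: 256                   # varied between 128 and 3008 MB (Section IV-F)
  region: eu-central-1              # placeholder region

functions:
  fibonacci:
    handler: de.example.FibonacciHandler   # placeholder handler class
    events:
      - http:
          path: fibonacci
          method: post
```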
SeMoDe offers different benchmarking options, where spe-
cial emphasis is put on the isolation of cold starts when
executing cloud functions. We set up the API gateways of
the FaaS platforms to enable local REST calls on our host
machine. This procedure gives us the opportunity to control
the execution by specifying the input of the respective cloud
function. The specific input prevents the platforms from caching
results. Finally, SeMoDe provides fetchers per provider, which
retrieve the data from the logging services in a controlled
manner.
E. Experimental Setup
While creating the experimental setup, we considered sev-
eral aspects that were independent of the hypotheses. The first
one is the way of invoking functions on the platform. Also
logging information for later analysis is a big issue for data
consistency. Finally, thinking about cold start influencing steps
during a function invocation and execution was of concern.
Due to the specific focus on cold starts, the aim of the
experimental setting is to force a cold start closely followed
by a warm start on the same container instance. A warm start
is defined as the reuse of a container in our setup. Given that
there is only a single cold start per container, having a pair
with a single cold and a single warm execution guarantees
a sound comparison, because mean calculation of several
warm executions is avoided. Such mean calculations could
have distorted our results, because one platform optimizes
the performance of cloud functions after a certain amount of
invocations, as we observed during our initial experiments.
Tests have shown that containers on most platforms were shut
down after at most 20 minutes of idling.
A FaaS platform is a black box. The precise execution
duration, which is used for billing on the platform, includes
the function execution and parts of the start up process. Other
parts of the initialization and start up of the container plus
other needed infrastructural components are not included. To
measure these aspects, we performed a REST-based interaction
with the FaaS platform, where the start and end time is also
logged on the client side.
Logging the time stamps locally enables us to compare the
local execution with the platform duration. After storing the
starting time stamp, a REST call is executed, which sends
the request over the network to the API gateway endpoint.
This endpoint creates a new event which triggers a container
creation or reuse. Finally, the cloud function is executed and
the middleware on the platform logs the start and end time
of the function execution as well as the precise duration,
which is the difference of both time stamps. The result of
the computation is transferred to the API gateway endpoint,
wrapped in a response, and sent to the caller via the network.
The client REST call exits and the local end time stamp is
9https://serverless.com/framework
Fig. 1: Sequential Execution of our Experiment (timeline of local REST calls to the FaaS platform with alternating delays of 1 min and 29 min)
logged on the host machine of the FaaS user. The two local
time stamps enable an assessment of the perceived execution
duration for the user and as a consequence the difference
between cold and warm starts.
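A minimal sketch of this client-side measurement, assuming Java 11's HttpClient and a placeholder API gateway endpoint, looks as follows; the platform and container identifiers contained in the response are additionally needed to match local and platform data (cf. below).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClientSideTimer {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Placeholder API gateway endpoint; the real endpoints are created
        // per deployed cloud function.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.execute-api.eu-central-1.amazonaws.com/dev/fibonacci"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"n\": 40}"))
                .build();

        // Local start time stamp, logged before the REST call.
        long localStart = System.currentTimeMillis();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Local end time stamp, logged after the response arrives.
        long localEnd = System.currentTimeMillis();

        // The perceived duration includes network latency, routing, the
        // container start up (on a cold start) and the execution itself.
        System.out.println("perceived duration in ms: " + (localEnd - localStart));
        System.out.println("response body: " + response.body());
    }
}
```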
To force pairs of cold and warm executions, we used the
SeMoDe benchmarking mode sequential with changing inter-
vals. This mode triggers the provided function with a delay
between execution start times. Delays vary and are defined
in a provided array of delays d in a round robin fashion.
The start time of each execution is generalized in Fig. 2. The
platform response includes a container and platform identifier.
These identifiers enable an unambiguous matching between
the local REST data and the platform log data.
$\mathit{start}(i, d) = \begin{cases} 0 & \text{if } i = 0 \\ \mathit{start}(i-1, d) + d[i \bmod \mathit{len}(d)] & \text{if } i \geq 1 \end{cases}$
Fig. 2: Start Time of the ith Execution of the Local Benchmark
Invocation
We set our array d to {1 minute, 29 minutes}. The start time
is the time of the local utility, which calls the API gateway. A
representation of the resulting execution sequence can be seen
in Fig. 1. Once again, it should be noted that the invocation
of the cloud functions is sequential.
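A direct translation of the recurrence in Fig. 2, assuming delays given in minutes, is sketched below.

```java
public class Schedule {

    // start(i, d) from Fig. 2: zero for the first invocation, afterwards
    // the previous start time plus a delay chosen round robin from d.
    static long start(int i, long[] d) {
        if (i == 0) {
            return 0;
        }
        return start(i - 1, d) + d[i % d.length];
    }

    public static void main(String[] args) {
        long[] d = {1, 29}; // delays in minutes, as used in the experiment
        for (int i = 0; i < 6; i++) {
            System.out.println("invocation " + i + " starts at minute " + start(i, d));
        }
        // prints 0, 29, 30, 59, 60, 89: the one minute gaps create the
        // cold/warm pairs, the 29 minute gaps let containers idle out.
    }
}
```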
F. Experiment’s Dimension Selection
Finally, we select the memory and package sizes to test our
hypotheses. An initial package size is the size of a package
after the build phase. Initial Java packages are approximately
1.5 MB in size; JS ones are smaller than 1 KB. The other
package sizes are created artificially by adding a file to the zip
or enlarging the JS file with a comment. Deployment packages were sized initial,
3.125 MB, 6.25 MB, 12.5 MB, 25 MB and 50 MB. On Azure,
three additional packages were created for 100 MB, 200 MB
and 400 MB. For AWS, 50 MB is the upper package size limit
for functions at the time of the experiment (June 2018).10
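One possible way to create such a padding file, sketched with example file names and sizes, is to write incompressible random bytes so that the zipped artifact actually grows by the intended amount:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;

public class PackagePadding {

    // Writes a file of random (hence incompressible) bytes that is bundled
    // into the deployment artifact to inflate the zip to the target size.
    static void createPaddingFile(String path, long targetBytes) throws IOException {
        Random random = new Random(42);
        byte[] chunk = new byte[64 * 1024];
        try (OutputStream out = Files.newOutputStream(Paths.get(path))) {
            long written = 0;
            while (written < targetBytes) {
                random.nextBytes(chunk);
                int toWrite = (int) Math.min(chunk.length, targetBytes - written);
                out.write(chunk, 0, toWrite);
                written += toWrite;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Example: grow an initial ~1.5 MB Java artifact towards the 3.125 MB configuration.
        createPaddingFile("padding.bin", 3_125_000L - 1_500_000L);
    }
}
```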
The memory setting was only configured on AWS since
Azure does not support this feature. Memory settings on
AWS were 128 MB, 256 MB, 512 MB, 1024 MB, 2048 MB
and the maximum setting of 3008 MB. The memory setting
linearly determines the compute power of the container. Every
combination of deployment package size, memory, language,
and provider resulted in a cloud function. Therefore, we
deployed 72 cloud functions on AWS and 18 on Azure. The
10https://docs.aws.amazon.com/lambda/latest/dg/limits.html
lower number of Azure functions is explicable by the dynamic
allocation of memory and CPU to the functions.11
Our experimental setup is designed to exclude side effects.
Calculating the execution overhead (cold – warm) as logged
by the client isolates the perceived cold start overhead. The
average execution time of the function (recursive fibonacci
calculation), network latency, and routing within the FaaS
platform is assumed to be equal for cold and warm executions
and therefore irrelevant for the cold start overhead value.
This reduction isolates the additional time-consuming steps that
occur during a cold start and answers the research question of
how to benchmark cold starts on FaaS platforms consistently.
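Given the matched pairs, this isolation reduces to a per-pair subtraction; a minimal sketch with illustrative type and field names (not SeMoDe's data model):

```java
import java.util.Arrays;
import java.util.List;

public class ColdStartOverhead {

    // One cold/warm pair of client-side durations in ms on the same container.
    static class ExecutionPair {
        final long coldClientDuration;
        final long warmClientDuration;

        ExecutionPair(long cold, long warm) {
            this.coldClientDuration = cold;
            this.warmClientDuration = warm;
        }

        long overhead() {
            // Network latency, routing and the function execution itself are
            // assumed equal for both executions and therefore cancel out.
            return coldClientDuration - warmClientDuration;
        }
    }

    static double meanOverhead(List<ExecutionPair> pairs) {
        return pairs.stream().mapToLong(ExecutionPair::overhead).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Hypothetical example values for two pairs of one cloud function.
        List<ExecutionPair> pairs = Arrays.asList(
                new ExecutionPair(5961, 4211),
                new ExecutionPair(6010, 4300));
        System.out.println("mean perceived cold start overhead in ms: " + meanOverhead(pairs));
    }
}
```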
G. Provider Limitations
The function execution time is limited to five minutes for
AWS. However, the AWS API Gateway closes the connection
after 29 seconds.12 In this event, the execution may still
succeed but a timeout occurs locally and matching of local
data and platform executions is no longer possible. Therefore,
we had to make sure that the cloud function executions are
never longer than 29 seconds for any of the configurations.
We tested several n values as input for the recursive fibonacci
function and found that calculating fibonacci(40) results in a
duration below 29 seconds even for the slowest configuration
(128 MB).
H. Experiment Execution and Data Dimensions
The experiment was executed between 6/25/2018 and
7/1/2018. Each cloud function was invoked 550 times to get
275 pairs of cold and warm executions. If a cloud function
returned 500 as HTTP status code, which indicates a server
error, or another error occurred like an API gateway timeout,
we excluded the cold as well as the warm execution. Only
pairwise valid data is processed and included in the results.
To summarize our setting, the result data matrix consists
of seven dimensions: Provider, Programming Language, De-
ployment Package Size, Memory Setting, Specific Invocation
Time, Local Duration, and Platform Duration.
11https://docs.microsoft.com/en-us/azure/azure-functions/functions-scale#how-the-consumption-plan-works
12https://docs.aws.amazon.com/apigateway/latest/developerguide/limits.html
V. RESULTS
A. Hypotheses Independent Results
Before we confirm or reject the selected hypotheses, we
gather general insights from the data in this part. Figures 3
and 4 and Table I are based on the same dimension selection.
The deployment package size is the initial one; all valid pairs of each
cloud function are used to compute the figures and mean
values. For AWS, if not noted otherwise, the cloud function
with 256MB memory is selected. Due to the different scaling
of memory, the absolute values of cold and warm executions
of AWS and Azure are not comparable. Nevertheless, the data
provides insights into how Azure deals with cold starts in general
and how AWS handles them with a relatively low memory setting.
The following sections will discuss these aspects in more detail
when reflecting our selected hypotheses.
Figure 3 shows boxplots of the duration in milliseconds,
measured by the start and end time of REST calls on the client.
The bottom of the box is the 25th percentile and the top is the
75th percentile. The center line of the box is the median equal
to the 50th percentile. Upper whisker is the 75th percentile
plus the box length multiplied by 1.5. Corresponding to this
computation, the lower whisker is the 25th percentile minus the
box length multiplied by 1.5. Values that are not between the
two whiskers are outliers and depicted as dots. This procedure
was also chosen for the generation of Fig. 4. The values of
these boxplots were fetched from the logging services of the
respective platforms.
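The whisker rule described above is the common 1.5 IQR convention; a small sketch, using a simple nearest-rank percentile instead of the interpolation a plotting library would apply, and hypothetical example durations:

```java
import java.util.Arrays;

public class BoxplotStats {

    // Nearest-rank percentile of a sorted sample; statistics packages use
    // slightly different interpolation rules, this is only an approximation.
    static double percentile(double[] sorted, double p) {
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(index, sorted.length - 1))];
    }

    public static void main(String[] args) {
        double[] durations = {4211, 4382, 4450, 4800, 5100, 5961, 7200, 25000}; // example values in ms
        Arrays.sort(durations);

        double q1 = percentile(durations, 25);
        double median = percentile(durations, 50);
        double q3 = percentile(durations, 75);
        double boxLength = q3 - q1;               // interquartile range
        double upperWhisker = q3 + 1.5 * boxLength;
        double lowerWhisker = q1 - 1.5 * boxLength;

        System.out.printf("Q1=%.0f median=%.0f Q3=%.0f whiskers=[%.0f, %.0f]%n",
                q1, median, q3, lowerWhisker, upperWhisker);
        // Values outside the whiskers would be drawn as outlier dots.
    }
}
```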
Values for the cold executions compared to corresponding
warm ones are higher on the client, when assessing the box
and the whiskers. These values from the client can visually
be related to the platform ones, because the box plots in the
two figures are based on exactly the same raw data and also
the Y axis dimension is the same for both figures and the
box plots. Due to this visual coherence, some outliers are no
longer included in the figures. The raw data, the box plots
as printed here, and the box plots including the outliers are
accessible online13. Sometimes cold executions are faster than
warm ones. Especially the AWS values in Fig. 4, where a huge
duration intersection of cold and warm executions is present,
strengthen this statement. The warm and cold values for AWS
JS even seem to be equal.
To get more insights about absolute values, Table I presents
mean values in milliseconds measured on client and platform
side. The dimensions of the presented data are provider,
language and initial deployment package size. AWS cloud
functions were executed with 256MB memory. The mean
values of these cloud functions can be compared to the
medians in Figs. 3 and 4.
For AWS, we measured an execution overhead for the cold
start of 1,750 ms (cold – warm) on the client instead of 247 ms
on the platform for Java and 644 ms instead of -43 ms for
JS. We observed that some cold executions on the platform
are faster than the warm executions on the same container.
13https://github.com/johannes-manner/SeMoDe/releases/tag/wosc4
Provider  Language  Side      Cold    Warm    Cold–Warm
AWS       Java      Client     5961    4211        1750
AWS       Java      Platform   4329    4082         247
AWS       JS        Client    14320   13676         644
AWS       JS        Platform  13496   13539         -43
Azure     Java      Client    26681    1809       24872
Azure     Java      Platform  15261    1545       13716
Azure     JS        Client    14369    4547        9822
Azure     JS        Platform   5492    4270        1222

TABLE I: Mean Values in Milliseconds for Cold and Warm
Executions on Client and Platform Side
This is the reason why the value for JS is negative in this
case. For JS, 63 % of the cold executions were faster than
the corresponding warm ones. 16% of AWS cloud functions
written in Java executed faster on the cold start compared
to the warm execution on the same container. These start
and end times were logged on the platform. On the client
side, a cold execution is never faster than its corresponding
warm execution, neither for JS nor for Java. Our conclusion
for AWS is that typical tasks during container start-up etc.
are not included in the logged value on the platform. Our
results strengthen this assumption. Java needs a more resource
intensive environment with an initialization of a JVM during
the cold start, whereas JS uses only an interpreter to execute
the code. We assume that underutilization when executing the cold
request, collocation of various cloud functions on the same
host, and other reasons influence the performance as well.
For Azure, a larger execution overhead can also be identified
using the logged execution times on the client. This indicates
that the platforms do not log the complete execution overhead
of the cold start in the execution time of the function.
In order to evaluate our hypotheses, we need to know the
total execution overhead of cold starts. This is the reason why
for all further analyses we only consider the execution times
logged on the client.
B. Hypotheses Dependent Results
1) Assessment Methodology: To assess the hypotheses de-
pendent results, we use mean values as in the previous section,
but more often a correlation metric to make a clear statement
about the degree to which a hypothesis is supported by our data.
We used Spearman’s correlation coefficient since the nor-
mal distribution test showed that the data is not distributed
normally, which renders Pearson’s correlation coefficient not
applicable. The range of the correlation coefficient ρ is from
negatively correlated (-1) to positively correlated (+1). There
exist different interpretations considering the significance of
correlation. We stick to a widely used interpretation [11],
where 0 indicates no correlation, absolute value of 0.2 weak,
0.5 moderate, 0.8 strong and 1.0 perfect correlation.
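Spearman's ρ is Pearson's correlation computed on ranks; the following sketch illustrates the computation on the memory settings and the corresponding Java cold start overheads of Table II (the ρ values in Tables III and V are computed over all individual measurements, so the numbers differ) and ignores tie handling, which a statistics library would provide.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SpearmanCorrelation {

    // Ranks of the values (1 = smallest); ties are not averaged here,
    // which a proper statistics library would do.
    static double[] ranks(double[] values) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < values.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));
        double[] rank = new double[values.length];
        for (int r = 0; r < order.length; r++) {
            rank[order[r]] = r + 1;
        }
        return rank;
    }

    // Pearson's correlation of the two rank vectors equals Spearman's rho.
    static double spearman(double[] x, double[] y) {
        double[] rx = ranks(x);
        double[] ry = ranks(y);
        double meanX = Arrays.stream(rx).average().orElse(0);
        double meanY = Arrays.stream(ry).average().orElse(0);
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < rx.length; i++) {
            cov += (rx[i] - meanX) * (ry[i] - meanY);
            varX += (rx[i] - meanX) * (rx[i] - meanX);
            varY += (ry[i] - meanY) * (ry[i] - meanY);
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] memory = {128, 256, 512, 1024, 2048, 3008};       // MB
        double[] overhead = {1980, 1750, 1292, 1113, 1257, 861};   // ms, Table II Java row
        // Differs from the -0.59 in Table V, which is based on all raw pairs.
        System.out.println("Spearman's rho: " + spearman(memory, overhead));
    }
}
```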
Fig. 3: Execution Times of Cold and Warm Invocations on Client Side (boxplots, duration in ms, panels: aws-java 256 MB, aws-js 256 MB, azure-java, azure-js)

Fig. 4: Execution Times of Cold and Warm Invocations on Provider Side (boxplots, duration in ms, panels: aws-java 256 MB, aws-js 256 MB, azure-java, azure-js)

Additionally, we constructed a linear regression model to
calculate the slope of the line plus the intersection point with the
Y axis. This enables us to state an equation to compute other
configurations than the investigated ones. Especially for the
deployment hypothesis, this approach can forecast arbitrary
package sizes. Resulting linear models and the correlation
coefficient ρ are presented in Tables III and V.
The slope of the line is no indicator of correlation, but states
how strongly the values of Y are influenced by an
increasing or decreasing X value.
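The linear models in Tables III and V result from an ordinary least squares fit of the cold start overhead against the respective dimension; a minimal sketch, again using the Table II means merely as example input rather than the full measurement data:

```java
public class LinearRegression {

    // Ordinary least squares fit y = intercept + slope * x.
    static double[] fit(double[] x, double[] y) {
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double cov = 0, varX = 0;
        for (int i = 0; i < x.length; i++) {
            cov += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = cov / varX;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        // Cold start overhead in ms against memory size in MB (Table II, Java row);
        // the models in the paper are fitted on all individual measurements,
        // so the coefficients differ.
        double[] memory = {128, 256, 512, 1024, 2048, 3008};
        double[] overhead = {1980, 1750, 1292, 1113, 1257, 861};
        double[] model = fit(memory, overhead);
        System.out.printf("overhead ~ %.0f ms %+.2f ms/MB%n", model[0], model[1]);
    }
}
```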
2) H1: Hypothesis Programming Language: Execution
times for AWS Java functions with different memory settings
are shown in Table II. Cold start times for Java cloud func-
tions are between two and three times higher than those of
respective JS functions. Azure’s data from the previous section
supports our hypothesis. Average cold execution overhead
was 24,872 ms for Java, while the JS function only caused
9,822 ms, which results in a ratio of 2.53. Based on these
ratios, we confirm the hypothesis, as we noticed that the cold
start time was significantly larger for each of the Java functions
compared to JS.
Memory in MB 128 256 512 1024 2048 3008
Java 1980 1750 1292 1113 1257 861
JS 587 644 614 368 589 371
Ratio 3.38 2.72 2.10 3.03 2.14 2.32
TABLE II: Differences of Cold and Warm Executions on
Client Side (AWS Lambda)
3) H2: Hypothesis Deployment Package Size: For AWS and
Azure JS, we can confirm our hypothesis since ρ is positive.
For AWS, the correlation is weak, but present. Azure JS has
the highest value for the slope (43 ms/MB) and also the most
significant correlation.
For Azure Java the correlation coefficient is negative and
therefore we consider this combination in more detail. Ta-
ble IV states the cold start overhead for different deployment
package sizes on the client side. The mean values in this table
Provider  Language  ρ      Linear Model
AWS       Java       0.29   1510 ms +  9 ms/MB
AWS       JS         0.37    613 ms + 12 ms/MB
Azure     Java      -0.15  25580 ms -  7 ms/MB
Azure     JS         0.46   8571 ms + 43 ms/MB

TABLE III: Spearman’s Correlation Coefficient ρ and Linear
Regression Model for Hypothesis Deployment Package Size
DPS 0 3.125 6.25 12.5 25 50 100 200 400
CSO 25 27 22 27 24 36 26 26 23
TABLE IV: Deployment Package Size (DPS) in MB and
Cold Start Overhead (CSO) in Seconds for Azure Java Cloud
Function
are mainly between 22 and 27 seconds with an outlier at 50MB
deployment package size. Comparing the initial size and the
highest configuration of 400 MB, there is only a 2 second
difference, which is less than 10% of the absolute value.
This, and the fact that there is no clear tendency, indicates
that the hypothesis for this combination is not significant. This
assessment is also based on the low absolute value of ρ and the
small value for the slope. Therefore, we reject the hypothesis
for Azure cloud functions written in Java.
4) H3: Hypothesis Memory Setting: The hypothesis Mem-
ory Setting states that the cold start overhead decreases with
the size of memory. Only AWS cloud functions are tested
since Azure has no memory setting. Exactly as for the
prior hypothesis, we calculated Spearman’s correlation
coefficient ρ as well as the linear regression model.
Language  ρ      Linear Model
Java      -0.59  1634 ms - 0.25 ms/MB
JS        -0.20   606 ms - 0.07 ms/MB

TABLE V: Spearman’s Correlation Coefficient ρ and Linear
Regression Model for Hypothesis Memory Setting
Our hypothesis holds true since the values for the correlation
coefficient ρ in Table V are both negative. For Java, we observe
a higher correlation and slope. We assume that this is caused
by a costlier middleware layer. As Java is a compiled language,
the JVM needs to be set up to execute the code. The available
CPU and memory might have a positive influence on how fast
this setup is done. JS is an interpreted language and therefore
the execution environment is not as complex as the one for
Java, but more acquired resources also have a positive effect
on the cold start time.
VI. DISCUSSION
A. Discussion of Results
Our methodology to assess the cold start from a user point
of view is indispensable, because platforms report only a fraction
of the cold start overhead in their function duration. Additionally,
they seem to report different fractions of the provisioning and
initialization. Especially for functions written in JS on AWS
our results were surprising. We measured that cold starts on
the platform were faster than the consecutive warm ones in
some cases. This leads to the conclusion that AWS only bills
the users for their function executions without the time to set up
servers, virtual machines, and containers.
The gap between compiled and interpreted languages with
a ratio between 2 and 3 was higher than expected. Our expla-
nation is that complex execution environments, like the JVM
in case of the compiled language Java, overload the already
busy CPU. This effect is smaller for higher memory settings,
but present. Especially the performance gain for compiled
languages is worth mentioning: the cold start overhead of Java
functions correlates with the memory setting with ρ = -0.59. Only the deployment
package size hypothesis shows a mixed picture, where the
correlation is lower and varies between positive and negative
values within the same platform.
Our motivation to take the cold starts of cloud functions
into consideration is the currently prevailing strategy of using
pings to pre-warm cloud function instances. The experimental
setup of our benchmark is a REST based interaction via an
API gateway. As noted in the introduction, this ping hack is
opposed to the FaaS principle of scaling to zero. The mean
cold start overhead we measured for different platforms, lan-
guages and without artificially increased deployment package
sizes ranges from 370 ms for JS on AWS with a memory
setting of 3008 MB to 24 s for Java on Azure. The 50 MB
configuration for Azure Functions written in Java had
even more overhead, as already shown in Section V-B3, due
to the experimental status of some languages. Therefore, the
ping hack may not always be necessary. Additionally, scaling
also leads to cold starts and the ping hack therefore does
not solve the problem at all. The ping hack only ensures
that a fixed amount of containers is available. Our results,
especially the comparison of cold and warm executions on
client side, demonstrate that in some use cases there is no need
for this kind of hack, especially in situations where response
times of a few hundred milliseconds (AWS-JS-3008MB:
370 ms overhead) are acceptable. Because of this wide range
of cold start overheads, it is important to assess the impact on
specific applications. For applications requiring a fast response
or involving user interactions, even small cold start overheads
might impose a problem.
Further investigations are needed in this area, because the cold
start is one of the main issues FaaS has to address and solve.
B. Threats to Validity
Based on the characteristics HUPPLER [6] mentioned in his
benchmarking publication, we tried to make our benchmark as
robust, self-explanatory, and repeatable as possible. But there
are some factors that could threaten the validity of our data:
Platform Limitations - There is only limited information
available on how containers are initialized and cloud
functions are executed. With the documentation informa-
tion only, the high variety of different execution times of
a cloud function is not fully explicable. Also, additional
services like the API gateway on AWS can influence the
results by returning errors, which are not cloud function
related.
Available Metrics - The function execution time that is
logged and used for billing on the platforms provides
only limited information. In AWS, the start up duration
of a container is not included in the logged execution
time. This initialization of a container is crucial for the
perceived cold starts.
Sample Size - We only tested our hypotheses with 275 cold-
warm pairs per function.
Temporal Relevance - Due to the very young and evolving
FaaS paradigm, the updates and changes in the platforms
limit our results to a certain time frame.
VII. FUTURE WORK
We plan to do the same benchmark setting again for
the tested hypotheses and want to integrate additional FaaS
platforms, namely IBM OpenWhisk, Google Cloud Functions,
OpenFaaS14 and FnProject15. The next benchmark will be
executed for a longer time period to assess daily differences
in the execution time and cold start behavior. Testing further
hypotheses, especially the number of dependencies, which is
important during the implementation of cloud functions, is
scheduled for the next experiment.
Also, the experimental design of some hypotheses, i.e. the
deployment package size hypothesis, needs a redesign, especially
from the programming language point of view, to verify
whether this hypothesis is not significant for some combinations.
This conceptual redesign should avoid ambiguous results and
be part of the next experiment.
This follow up benchmark serves as a data basis for a
concurrency benchmark, which will be executed at the same
time to get comparable executions. The concurrency tests are
quite important since one of the main use cases is the usage
of cloud functions as a reactive component to decouple peak
loads in a web application scenario. Especially peak loads
trigger a huge amount of cold starts on the platform.
To get more insights about some hypotheses, e.g. the
programming language hypothesis, we want to conduct a
study, where our functions are executed locally. Development-
production parity is a key issue when comparing the local
values with the client perceived REST duration and also
with the platforms’ start and end times. A comparison to
benchmarks executed locally will help us to substantiate our
hypotheses in future work.
To understand the different FaaS use cases, further cloud
function triggers need to be investigated with respect to their cold
start impact. Especially the integration triggers of databases
are widely used, where a cloud function is triggered for every
inserted entry in a database.
14https://www.openfaas.com/
15https://fnproject.io/
REFERENCES
[1] E. van Eyk et al., “The SPEC Cloud Group’s Research Vision on FaaS and Serverless Architectures,” in Proc. WoSC, 2017.
[2] I. Baldini et al., “Serverless Computing: Current Trends and Open Problems,” 2017.
[3] T. Lynn et al., “A Preliminary Review of Enterprise Serverless Cloud Computing (Function-as-a-Service) Platforms,” in Proc. CloudCom, 2017.
[4] H. Lee et al., “Evaluation of Production Serverless Computing Environments,” in Proc. WoSC, 2018.
[5] E. van Eyk and A. Iosup, “Addressing Performance Challenges in Serverless Computing,” in Proc. ICT.OPEN, 2018.
[6] K. Huppler, “The Art of Building a Good Benchmark,” in Performance Evaluation and Benchmarking, 2009.
[7] G. Adzic and R. Chatley, “Serverless Computing: Economic and Architectural Impact,” in Proc. ESEC/FSE, 2017.
[8] M. Villamizar et al., “Infrastructure Cost Comparison of Running Web Applications in the Cloud Using AWS Lambda and Monolithic and Microservice Architectures,” in Proc. CCGrid, 2016.
[9] M. Malawski et al., “Benchmarking Heterogeneous Cloud Functions,” in Proc. Euro-Par, 2018.
[10] J. Spillner et al., “FaaSter, Better, Cheaper: The Prospect of Serverless Scientific Computing and HPC,” in Communications in Computer and Information Science, 2017.
[11] K. H. Zou et al., “Correlation and Simple Linear Regression,” Radiology, vol. 227, no. 3, pp. 617–628, 2003.