Preprint

Prediction-driven resource provisioning for serverless container runtimes


Abstract

In recent years, serverless computing has emerged as a compelling cloud-based model for the development of a wide range of data-intensive applications. However, rapid container provisioning introduces non-trivial challenges for FaaS cloud providers, as (i) real-world FaaS workloads may exhibit highly dynamic request patterns, (ii) applications have service-level objectives (SLOs) that must be met, and (iii) container provisioning can be a costly process. In this paper, we present SLOPE, a prediction framework for serverless FaaS platforms that addresses the aforementioned challenges. Specifically, it trains a neural network model that utilizes knowledge from past runs in order to estimate the number of instances required to satisfy the invocation rate requirements of the serverless applications. In cases where a priori knowledge is not available, SLOPE makes predictions using a graph edit distance approach to capture the similarities among serverless applications. Our experimental results illustrate the efficiency and benefits of our approach, which can reduce operating costs by 66.25% on average.
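The abstract does not reproduce SLOPE's neural model, so as a hedged illustration of the core mapping it learns (invocation rate of past runs to an instance count), the sketch below substitutes a plain least-squares fit; all function names and the sample history are hypothetical.

```python
# Hypothetical sketch: estimate how many container instances are needed
# for a given invocation rate, from (rate, instances) pairs observed in
# past runs. SLOPE trains a neural network for this; a dependency-free
# least-squares fit stands in for it here.
from math import ceil

def fit_line(samples):
    """Ordinary least squares for y = a*x + b over (x, y) samples."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def predict_instances(model, rate):
    a, b = model
    return max(1, ceil(a * rate + b))  # never provision below one instance

# Past runs: (requests/sec, instances that satisfied the SLO)
history = [(10, 1), (50, 2), (100, 4), (200, 8), (400, 16)]
model = fit_line(history)
needed = predict_instances(model, 300)
```

A real predictor would also weigh provisioning cost against SLO violations, which this sketch ignores.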


Conference Paper
Function-as-a-Service (FaaS) platforms promise a simpler programming model for cloud computing, in which developers concentrate on writing their applications while platform providers take care of resource management and administration. As FaaS users are billed based on the execution of their functions, platform providers have a natural incentive not to keep idle resources running at the platform's expense. However, this strategy may lead to the cold-start issue, in which the execution of a function is delayed because there is no ready resource to host the execution. Cold starts can take hundreds of milliseconds to seconds and have been a prohibitive and painful disadvantage for some applications. This work describes and evaluates a technique to start functions that restores snapshots from previously executed function processes. We developed a prototype of this technique based on the CRIU process checkpoint/restore Linux tool. We evaluate this prototype by running experiments that compare its start-up time against the standard Unix process creation/start-up procedure. We analyze the following three functions: i) a "do-nothing" function, ii) an Image Resizer function, and iii) a function that renders Markdown files. The results attained indicate that the technique can improve the start-up time of function replicas by 40% (in the worst case of a "do-nothing" function) and up to 71% for the Image Resizer one. Further analysis indicates that runtime initialization is a key factor, and we confirmed this by performing a sensitivity analysis based on synthetically generated functions of different code sizes. These experiments demonstrate that it is critical to decide when to create a snapshot of a function.
When one creates the snapshots of warm functions, the speed-up achieved by the prebaking technique is even higher: the speed-up increases from 127.45% to 403.96%, for a small, synthetic function; and for a bigger, synthetic function, this ratio increases from 121.07% to 1932.49%.
Article
New architectural patterns (e.g. microservices), the massive adoption of Linux containers (e.g. Docker containers), and improvements in key features of cloud computing such as auto-scaling have helped developers decouple complex and monolithic systems into smaller stateless services. In turn, cloud providers have introduced serverless computing, where applications can be defined as a workflow of event-triggered functions. However, serverless services, such as AWS Lambda, impose serious restrictions on these applications (e.g. supporting only a predefined set of programming languages or hindering the installation and deployment of external libraries). This paper addresses such issues by introducing a framework and a methodology to create Serverless Container-aware ARchitectures (SCAR). The SCAR framework can be used to create highly parallel event-driven serverless applications that run on customized runtime environments defined as Docker images on top of AWS Lambda. This paper describes the architecture of SCAR together with the cache-based optimizations applied to minimize cost, exemplified on a massive image processing use case. The results show that, by means of SCAR, AWS Lambda becomes a convenient platform for High Throughput Computing, especially for highly parallel, bursty workloads of short stateless jobs.
Conference Paper
Cloud Functions, often called Function-as-a-Service (FaaS), pioneered by AWS Lambda, are an increasingly popular method of running distributed applications. As in other cloud offerings, cloud functions are heterogeneous, due to different underlying hardware, runtime systems, as well as resource management and billing models. In this paper, we focus on performance evaluation of cloud functions, taking into account heterogeneity aspects. We developed a cloud function benchmarking framework, consisting of one suite based on Serverless Framework, and one based on HyperFlow. We deployed the CPU-intensive benchmarks: Mersenne Twister and Linpack, and evaluated all the major cloud function providers: AWS Lambda, Azure Functions, Google Cloud Functions and IBM OpenWhisk. We make our results available online and continuously updated. We report on the initial results of the performance evaluation and we discuss the discovered insights on the resource allocation policies.
Article
Graph data have become ubiquitous, and manipulating them based on similarity is essential for many applications. Graph edit distance is one of the most widely accepted measures to determine similarities between graphs and has extensive applications in the fields of pattern recognition, computer vision, etc. Unfortunately, the problem of graph edit distance computation is NP-hard in general. Accordingly, in this paper we introduce three novel methods to compute the upper and lower bounds for the edit distance between two graphs in polynomial time. Applying these methods, two algorithms, AppFull and AppSub, are introduced to perform different kinds of graph search on graph databases. Comprehensive experimental studies are conducted on both real and synthetic datasets to examine various aspects of the methods for bounding graph edit distance. The results show that these methods achieve good scalability in terms of both the number of graphs and the size of graphs. The effectiveness of these algorithms also confirms the usefulness of using our bounds in filtering and searching of graphs.
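The specific bounds of the paper are not reproduced in this summary; as a hedged illustration of why polynomial-time lower bounds are useful for filtering, the sketch below (function name hypothetical) computes a classic cheap bound: any edit script must at least reconcile the node-label multisets of the two graphs.

```python
# Hypothetical sketch of a polynomial-time lower bound on graph edit
# distance: every node label present in one graph but not matchable in
# the other requires at least one edit (relabel, insert, or delete),
# so the multiset difference bounds GED from below.
from collections import Counter

def label_lower_bound(labels_g1, labels_g2):
    c1, c2 = Counter(labels_g1), Counter(labels_g2)
    only_in_g1 = c1 - c2   # labels of g1 with no counterpart in g2
    only_in_g2 = c2 - c1   # labels of g2 with no counterpart in g1
    return max(sum(only_in_g1.values()), sum(only_in_g2.values()))
```

In a graph-search setting, any database graph whose bound already exceeds the query threshold can be discarded without ever running the NP-hard exact computation.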
Article
Serverless computing is a popular cloud computing paradigm that frees developers from server management. Function-as-a-Service (FaaS) is the most popular implementation of serverless computing, representing applications as event-driven and stateless functions. However, existing studies report that functions of FaaS applications severely suffer from cold-start latency. In this paper, we propose an approach, named FaaSLight, to accelerate the cold start of FaaS applications through application-level optimization. We first conduct a measurement study to investigate the possible root cause of the cold start problem of FaaS. The result shows that application code loading latency is a significant overhead. Therefore, loading only indispensable code from FaaS applications can be an adequate solution. Based on this insight, we identify code related to application functionalities by constructing the function-level call graph, and separate other code (i.e., optional code) from FaaS applications. The separated optional code can be loaded on demand to avoid application failures caused by the inaccurate identification of indispensable code. In particular, a key principle guiding the design of FaaSLight is inherently general, i.e., platform- and language-agnostic. In practice, FaaSLight can be effectively applied to FaaS applications developed in different programming languages (Python and JavaScript), and can be seamlessly deployed on popular serverless platforms such as AWS Lambda and Google Cloud Functions, without modifying the underlying OSes or hypervisors or introducing any additional manual engineering effort for developers. The evaluation results on real-world FaaS applications show that FaaSLight can significantly reduce the code loading latency (up to 78.95%, 28.78% on average), thereby reducing the cold-start latency. As a result, the total response latency of functions can be decreased by up to 42.05% (19.21% on average). Compared with the state-of-the-art, FaaSLight achieves a 21.25× improvement in reducing the average total response latency.
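FaaSLight's actual analysis is considerably more involved than this summary conveys; as a hedged sketch of its first step, the snippet below (names hypothetical) extracts a function-level call graph from Python source with the standard ast module, from which code unreachable from the handler could be classified as optional.

```python
# Hypothetical sketch: build a function-level call graph for a module.
# Only direct calls to plain names are captured; attribute calls,
# dynamic dispatch, and cross-module edges are out of scope here.
import ast

def call_graph(source):
    tree = ast.parse(source)
    graph = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees.add(sub.func.id)
            graph[node.name] = callees
    return graph

src = """
def handler(event):
    return resize(event)

def resize(e):
    return e

def unused_helper():
    return 42
"""
graph = call_graph(src)
```

Here a reachability pass from `handler` would mark `unused_helper` as optional code that can be loaded on demand.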
Article
This study builds a fully deconvolutional neural network (FDNN) and addresses the problem of single image super-resolution (SISR) by using the FDNN. Although SISR using deep neural networks has been a major research focus, the problem of reconstructing a high resolution (HR) image with an FDNN has received little attention. A few recent approaches toward SISR are to embed deconvolution operations into multilayer feedforward neural networks. This paper constructs a deep FDNN for SISR that possesses two remarkable advantages compared to existing SISR approaches. The first improves the network performance without increasing the depth of the network or embedding complex structures. The second replaces all convolution operations with deconvolution operations to implement an effective reconstruction. That is, the proposed FDNN only contains deconvolution layers and learns an end-to-end mapping from low resolution (LR) to HR images. Furthermore, to avoid the oversmoothness of the mean squared error loss, the trained image is treated as a probability distribution, and the Kullback–Leibler divergence is introduced into the final loss function to achieve enhanced recovery. Although the proposed FDNN only has 10 layers, it is successfully evaluated through extensive experiments. Compared with other state-of-the-art methods and deep convolution neural networks with 20 or 30 layers, the proposed FDNN achieves better performance for SISR.
Conference Paper
The microservice architecture has dramatically reduced user effort in adopting and maintaining servers by providing a catalog of functions as services that can be used as building blocks to construct applications. This has enabled datacenter operators to look at managing datacenter hosting microservices quite differently from traditional infrastructures. Such a paradigm shift calls for a need to rethink resource management strategies employed in such execution environments. We observe that the visibility enabled by a microservices execution framework can be exploited to achieve high throughput and resource utilization while still meeting Service Level Agreements, especially in multi-tenant execution scenarios.
Conference Paper
Function as a Service (FaaS) has been gaining popularity as a way to deploy computations to serverless backends in the cloud. This paradigm shifts the complexity of allocating and provisioning resources to the cloud provider, which has to provide the illusion of always-available resources (i.e., fast function invocations without cold starts) at the lowest possible resource cost. Doing so requires the provider to deeply understand the characteristics of the FaaS workload. Unfortunately, there has been little to no public information on these characteristics. Thus, in this paper, we first characterize the entire production FaaS workload of Azure Functions. We show for example that most functions are invoked very infrequently, but there is an 8-order-of-magnitude range of invocation frequencies. Using observations from our characterization, we then propose a practical resource management policy that significantly reduces the number of function cold starts, while spending fewer resources than state-of-the-practice policies.
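The summary above does not detail the proposed policy; as a hedged illustration of the histogram idea behind it (all names and the sample data hypothetical), one can derive a pre-warm delay and a keep-alive window from percentiles of a function's observed idle-time distribution.

```python
# Hypothetical sketch: pick when to pre-warm and how long to keep a
# container alive from the distribution of gaps between invocations.
def percentile(sorted_vals, p):
    idx = min(len(sorted_vals) - 1, int(p * len(sorted_vals)))
    return sorted_vals[idx]

def keep_alive_policy(idle_times, low=0.05, high=0.99):
    """Return (prewarm_after, keep_alive_until), same unit as inputs."""
    s = sorted(idle_times)
    # Load the container shortly before the earliest likely invocation;
    # unload it once the gap exceeds nearly all observed gaps.
    return percentile(s, low), percentile(s, high)

gaps = [3, 4, 4, 5, 5, 5, 6, 7, 60, 120]  # minutes between invocations
prewarm, keepalive = keep_alive_policy(gaps)
```

The appeal of such a policy, given the 8-order-of-magnitude spread in invocation frequencies noted above, is that it adapts per function rather than applying one fixed keep-alive timeout to all.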
Article
Serverless computing has emerged as a new cloud computing execution model that liberates users and application developers from explicitly managing ‘physical’ resources, leaving such a resource management burden to service providers. In this article, we study the problem of resource allocation for multi-tenant serverless computing platforms, explicitly taking into account workload fluctuations, including sudden surges. In particular, we investigate different root causes of performance degradation in these platforms where tenants (their applications) have different workload characteristics. To this end, we develop a fine-grained CPU cap control solution as a resource manager that dynamically adjusts the CPU usage limit (or CPU cap) for applications with same/similar performance requirements, i.e., application groups. The adjustment of CPU caps applies primarily to co-located worker processes of serverless computing platforms to minimize resource contention, which is the major source of performance degradation. The actual adjustment decisions are made based on performance metrics (e.g., throttled time and queue length) using a group-aware scheduling algorithm. The extensive experimental results obtained in our local cluster confirm that the proposed resource manager can effectively eliminate the burden of explicit reservation of computing capacity, even when fluctuations and sudden surges in the incoming workload exist. We measure the robustness of the proposed resource manager by comparing it with several heuristics that are extensively used in practice, including enhanced versions of the round-robin and least-queue-length scheduling policies, under various workload intensities driven by real-world scenarios. Notably, our resource manager outperforms the other heuristics by decreasing skewness and average response time by up to 44 and 94 percent, respectively, while not over-using CPU resources.
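The group-aware algorithm itself is not given in this summary; as a hedged sketch of the feedback idea it describes (names and step sizes hypothetical), a cap controller can grow a group's CPU cap when the group is throttled or queuing, and reclaim capacity when it is idle.

```python
# Hypothetical sketch: one adjustment step of a CPU-cap feedback loop
# driven by the two metrics named above (throttled time, queue length).
def adjust_cap(cap, throttled_time, queue_len,
               step=0.1, max_cap=1.0, min_cap=0.1):
    if throttled_time > 0 or queue_len > 0:
        cap = min(max_cap, cap + step)   # group is starved: grow the cap
    else:
        cap = max(min_cap, cap - step)   # group is idle: shrink the cap
    return round(cap, 2)
```

A production controller would apply such caps via the kernel's CPU bandwidth controls per worker-process group, and would likely damp the step size to avoid oscillation; both concerns are omitted here.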
Conference Paper
Distributed topic-based publish/subscribe systems like Apache Kafka provide a scalable and decentralized approach to data dissemination. However, despite their wide adoption they can suffer from performance degradation due to the uneven load distribution between the nodes that receive and forward the messages (i.e., brokers). This problem occurs due to the lack of effective load balancing mechanisms that consider the impact of (i) the number of topics handled by a specific broker and (ii) changes in the input rate during the course of the system execution. Furthermore, while there have been some previous works that examine the problem, most of them focus on content-based pub/sub systems or require a centralized coordinator for determining the appropriate assignments. In this work we propose a novel decentralized load balancing technique for topic-based publish/subscribe systems. More specifically, we exploit the fact that brokers in systems like Kafka can communicate using inner topics to exchange their load-related information, and propose a novel decentralized algorithm that executes on each individual broker to determine the topics' partitions that should be migrated in order to avoid overloaded conditions. Our detailed experimental evaluation on our local cluster, using different applications that process various data forms from different topics, illustrates the benefits of our approach and shows that we can efficiently balance the load between the brokers without the need for a centralized coordination mechanism.
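The migration algorithm is not reproduced in this summary; as a hedged sketch of the per-broker decision it describes (names hypothetical), each broker can compare its own load against the cluster average learned via the inner topic and offer its hottest partitions for migration until it drops back under the average.

```python
# Hypothetical sketch: a broker's local decision about which of its
# topic partitions to offload, given the cluster-average load it
# learned from the load-exchange inner topic.
def partitions_to_migrate(my_partitions, cluster_avg_load):
    """my_partitions: {partition: load}. Returns partitions to offload."""
    my_load = sum(my_partitions.values())
    offload = []
    # Shed the hottest partitions first until we are no longer overloaded.
    for part, load in sorted(my_partitions.items(), key=lambda kv: -kv[1]):
        if my_load <= cluster_avg_load:
            break
        offload.append(part)
        my_load -= load
    return offload
```

Because every broker runs the same rule on locally available information, no centralized coordinator is needed; which peer actually receives each offered partition would be settled by a separate, likewise decentralized, handshake.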
Conference Paper
In this demonstration we present Dione, a novel framework for automatic profiling and tuning of big data applications. Our system allows a non-expert user to submit Spark or Flink applications to his/her cluster, and Dione automatically determines the impact of different configuration parameters on the application's execution time and monetary cost. Dione is the first framework that exploits similarities in the execution plans of different applications to narrow down the number of profiling runs required for building prediction models that capture the impact of the configuration parameters on the metrics of interest. Dione exploits these prediction models to tune the configuration parameters in a way that minimizes the application's execution time or the user's budget. Finally, Dione's Web-UI visualizes the impact of the configuration parameters on the execution time and the monetary cost, and enables the user to submit the application with the recommended parameter values.
Conference Paper
Recurrent Neural Networks are powerful tools for modeling sequences. They are flexibly extensible and can incorporate various kinds of information, including temporal order. These properties make them well suited for generating sequential recommendations. In this paper, we extend Recurrent Neural Networks by considering unique characteristics of the Recommender Systems domain. One of these characteristics is the explicit notion of the user for whom recommendations are specifically generated. We show how individual users can be represented, in addition to sequences of consumed items, in a new type of Gated Recurrent Unit to effectively produce personalized next-item recommendations. Offline experiments on two real-world datasets indicate that our extensions clearly improve objective performance when compared to state-of-the-art recommender algorithms and to a conventional Recurrent Neural Network.
Article
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
This survey is divided into three major sections. The first concerns mathematical results about the choice axiom and the choice models that devolve from it. For example, its relationship to Thurstonian theory is satisfyingly understood; much is known about how choice and ranking probabilities may relate, although little of this knowledge seems empirically useful; and there are certain interesting statistical facts. The second section describes attempts that have been made to test and apply these models. The testing has been done mostly, though not exclusively, by psychologists; the applications have been mostly in economics and sociology. Although it is clear from many experiments that the conditions under which the choice axiom holds are surely delicate, the need for simple, rational underpinnings in complex theories, as in economics and sociology, leads one to accept assumptions that are at best approximate. And the third section concerns alternative, more general theories which, in spirit, are much like the choice axiom. Perhaps I had best admit at the outset that, as a commentator on this scene, I am qualified no better than many others and rather less well than some who have been working in this area recently, which I have not been. My pursuits have led me along other, somewhat related routes. On the one hand, I have contributed to some of the recent, purely algebraic aspects of fundamental measurement (for a survey of some of this material, see Krantz, Luce, Suppes, & Tversky, 1971). And on the other hand, I have worked in the highly probabilistic area of psychophysical theory; but the empirical materials have led me away from axiomatic structures, such as the choice axiom, to more structural, neural models which are not readily axiomatized at the present time.
After some attempts to apply choice models to psychophysical phenomena (discussed below in its proper place), I was led to conclude that it is not a very promising approach to these data, and so I have not been actively studying any aspect of the choice axiom in over 12 years. With that understood, let us begin.
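For readers unfamiliar with the axiom, its best-known consequence is the ratio-scale representation of choice probabilities: each alternative $x$ carries a positive scale value $v(x)$, and the probability of choosing $x$ from a finite set $A$ is

```latex
P(x \mid A) \;=\; \frac{v(x)}{\sum_{y \in A} v(y)}
```

This is the form that reappears in economics and sociology as the multinomial logit model, which is why the survey stresses its role as a simple, rational underpinning for more complex theories.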
Article
Error-tolerant graph matching is a powerful concept that has various applications in pattern recognition and machine vision. In the present paper, a new distance measure on graphs is proposed. It is based on the maximal common subgraph of two graphs. The new measure is superior to edit-distance-based measures in that no particular edit operations, together with their costs, need to be defined. It is formally shown that the new distance measure is a metric. Potential algorithms for the efficient computation of the new measure are discussed.
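The summary above names the measure without stating it; the distance in question normalizes the size of the maximal common subgraph by the larger graph's size. The sketch below (function name hypothetical) shows only the metric itself, taking the MCS size as given since computing it is NP-hard in general.

```python
# Hypothetical sketch of the maximal-common-subgraph distance:
#   d(g1, g2) = 1 - |mcs(g1, g2)| / max(|g1|, |g2|)
# where |g| denotes the number of nodes of graph g.
def mcs_distance(size_g1, size_g2, size_mcs):
    if max(size_g1, size_g2) == 0:
        return 0.0  # two empty graphs are identical
    return 1.0 - size_mcs / max(size_g1, size_g2)
```

No edit operations or costs appear anywhere in the definition, which is precisely the advantage over edit-distance-based measures claimed above.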
M. Tsenos, A. Peri, and V. Kalogeraki, "Energy efficient scheduling for serverless systems," in ACSOS, Toronto, Canada, 2023.
A. Peri, M. Tsenos, and V. Kalogeraki, "Orchestrating the execution of serverless functions in hybrid cloud," in ACSOS, Toronto, Canada, 2023.
N. Zacheilas, N. Chalvantzis, I. Konstantinou, V. Kalogeraki, and N. Koziris, "Orion: Online resource negotiator for multiple big data analytics frameworks," in ICAC. IEEE, 2018, pp. 11-20.
A. Wang et al., "FaaSNet: Scalable and fast provisioning of custom serverless container runtimes at Alibaba Cloud Function Compute," in USENIX ATC, 2021.
E. Oakes et al., "SOCK: Rapid task provisioning with serverless-optimized containers," in USENIX ATC, 2018, pp. 57-70.
S. Shillaker and P. Pietzuch, "Faasm: Lightweight isolation for efficient stateful serverless computing," in USENIX ATC, 2020, pp. 419-433.
X. Fu et al., "Edgewise: a better stream processing engine for the edge," in USENIX ATC, 2019.
Z. Zhang and M. Sabuncu, "Generalized cross entropy loss for training deep neural networks with noisy labels," NeurIPS, vol. 31, 2018.
E. Gordon-Rodriguez et al., "Uses and abuses of the cross-entropy loss: Case studies in modern deep learning," 2020.
M. Magill et al., "Neural networks trained to solve differential equations learn general representations," NeurIPS, vol. 31, 2018.
H. Yu et al., "FaaSRank: Learning to schedule functions in serverless platforms," in ACSOS, Washington, DC, USA. IEEE, 2021, pp. 31-40.
S. Kornblith et al., "Similarity of neural network representations revisited," in ICML. PMLR, 2019, pp. 3519-3529.
M. Obetz, S. Patterson, and A. L. Milanova, "Static call graph construction in AWS Lambda serverless applications," in HotCloud, 2019.