Conference Paper

Prediction-driven resource provisioning for serverless container runtimes

Authors: Tomaras, Tsenos, and Kalogeraki

... Tomaras, Tsenos, and Kalogeraki (2023) emphasize that automated delivery pipelines reduce the manual handoffs and error-prone processes that historically complicated healthcare system deployments [11]. For healthcare applications where reliability directly impacts patient care, CI/CD pipelines implement comprehensive validation gates, including automated testing, security scanning, and compliance verification, ensuring that changes meet quality thresholds before proceeding to production environments. ...
... API Gateway complements these compute capabilities by providing managed API endpoints that handle authentication, throttling, and request validation, creating secure interfaces for both internal and external consumers. Tomaras, Tsenos, and Kalogeraki (2023) highlight that serverless architectures deliver significant operational benefits through their consumption-based pricing models, which align costs directly with system utilization rather than provisioned capacity [11]. ...
... Core AWS Services for Healthcare Enterprise Systems [7][8][9][10][11][12] ...
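The gate sequence described in these excerpts is easy to picture in code. Below is a minimal, hedged sketch of a fail-closed pipeline stage; the helper names are hypothetical placeholders, not any cited system's API.

```python
# A minimal, fail-closed gate sequence, assuming each check is wrapped in
# a callable returning True on success. The helper names below are
# hypothetical placeholders, not part of any cited pipeline.
import sys

def run_unit_tests() -> bool:
    return True   # placeholder: invoke the test runner, check its exit code

def run_security_scan() -> bool:
    return True   # placeholder: static analysis / dependency CVE scan

def check_compliance() -> bool:
    return True   # placeholder: verify audit-logging and encryption settings

GATES = [
    ("unit tests", run_unit_tests),
    ("security scan", run_security_scan),
    ("compliance verification", check_compliance),
]

def promote_to_production() -> None:
    for name, gate in GATES:
        if not gate():
            sys.exit(f"gate failed: {name}")   # any failure blocks promotion
    print("all gates passed; promoting build")

promote_to_production()
```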
Article
This article examines the architectural frameworks and implementation strategies for modern healthcare enterprise systems built on Amazon Web Services (AWS) cloud infrastructure. The article explores how agile methodologies and containerization enhance development workflows while ensuring consistency across diverse healthcare environments. Through analysis of core AWS services including S3, Elastic Container Service, Cognito, and CloudFormation, we demonstrate optimal patterns for secure data management, infrastructure automation, and continuous delivery pipelines specific to healthcare contexts. The article further investigates how emerging technologies such as serverless computing and AI-assisted development tools augment traditional approaches to healthcare information systems. The findings suggest that properly orchestrated cloud-native architectures significantly improve both operational efficiency and data integrity across organizational boundaries, while maintaining the stringent security and compliance requirements essential to healthcare applications. This article contributes to the growing body of knowledge on enterprise system implementation in highly regulated industries, offering practical insights for healthcare IT architects and administrators.
... These implementations leverage sophisticated prediction models to anticipate resource requirements and optimize container deployment patterns. Research indicates significant potential for AI-driven optimization in improving container platform efficiency [11]. ...
... These platforms implement sophisticated prediction mechanisms that optimize resource allocation based on anticipated workload patterns. Research demonstrates the effectiveness of prediction-driven approaches in improving resource utilization and reducing operational costs in serverless environments [11]. ...
... Modern container deployments must balance the benefits of prediction-driven resource management with the complexity of implementing such systems. Research demonstrates the importance of addressing operational complexity and security considerations in infrastructure design [11]. ...
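As a rough illustration of the prediction-driven provisioning idea these excerpts describe (the cited paper's own algorithm is not reproduced here), the sketch below forecasts demand with an exponentially weighted moving average and sizes a warm-container pool accordingly; all parameters are assumptions.

```python
# Illustrative sketch (not the cited system's actual algorithm): forecast
# the next window's request rate with an exponentially weighted moving
# average and pre-provision enough warm containers to absorb it.
import math

class PredictivePool:
    def __init__(self, alpha: float = 0.5, per_container_rps: float = 10.0):
        self.alpha = alpha                           # EWMA smoothing factor (assumed)
        self.per_container_rps = per_container_rps   # capacity per container (assumed)
        self.forecast = 0.0                          # predicted requests/sec

    def observe(self, rps: float) -> None:
        # Fold the latest observed window into the demand forecast.
        self.forecast = self.alpha * rps + (1 - self.alpha) * self.forecast

    def target_pool_size(self, headroom: float = 1.2) -> int:
        # Provision for forecast demand plus a safety margin so requests
        # land on warm containers instead of triggering cold starts.
        return math.ceil(headroom * self.forecast / self.per_container_rps)

pool = PredictivePool()
for observed_rps in [5, 8, 20, 18, 30]:
    pool.observe(observed_rps)
    print(pool.target_pool_size())
```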
Article
Modern application containerization has emerged as a transformative paradigm in software development and deployment, revolutionizing how organizations build, ship, and run applications across diverse computing environments. This article comprehensively analyzes containerization technologies, examining their architectural components, implementation patterns, and impact on enterprise computing landscapes. Through a systematic review of current industry practices, security frameworks, and cloud integration patterns, the article investigates the fundamental aspects of container-based virtualization and its relationship with microservices architecture. The article explores critical considerations in container security, orchestration strategies, and enterprise adoption patterns while analyzing the convergence of containerization with cloud-native computing. The findings indicate that containerization significantly enhances application portability, resource efficiency, and deployment consistency while introducing new security and operational complexity challenges. This article contributes to the growing knowledge in modern software architecture by providing a structured framework for understanding and implementing containerization technologies in enterprise environments. The paper concludes by identifying emerging trends and future research directions in container-based computing, offering insights for practitioners and researchers.
Conference Paper
Function as a Service (FaaS), the reason why so many practitioners and researchers talk about Serverless Computing, claims to hide all operational concerns. The promise when using FaaS is that users only have to focus on the core business functionality in the form of cloud functions. However, a few configuration options remain within the developer's responsibility. Most of the currently available cloud function offerings force the user to choose a memory or other resource setting and a timeout value; CPU is scaled based on the chosen options. At first glance, this seems like an easy task, but the tradeoff between performance and cost has implications for the quality of service of a cloud function. Therefore, in this paper we present a local simulation approach for cloud functions that supports developers in choosing a suitable configuration. The methodology we propose simulates the execution behavior of cloud functions locally, makes the cloud and local environments comparable, and maps the local profiling data to a cloud platform. This reduces time during development and enables developers to work with their familiar tools, which is especially helpful when implementing multi-threaded cloud functions.
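To make the memory/cost tradeoff concrete, here is a hedged sketch of the kind of what-if computation such a simulation supports. The linear CPU-scaling model and the per-GB-second price are illustrative assumptions, not the paper's calibration.

```python
# Hedged sketch of the memory/cost tradeoff for a cloud function.
# Assumes CPU (and thus speed) scales linearly with the memory setting,
# which is a simplification; the per-GB-second price is illustrative.
PRICE_PER_GB_SECOND = 0.0000166667  # illustrative on-demand rate

def estimated_duration_s(mem_mb: int, base_mem_mb: int, base_duration_s: float) -> float:
    # Linear CPU-scaling model: doubling memory halves compute time,
    # but duration can never drop below some fixed floor.
    floor_s = 0.05
    return max(base_duration_s * base_mem_mb / mem_mb, floor_s)

def estimated_cost(mem_mb: int, duration_s: float) -> float:
    return (mem_mb / 1024) * duration_s * PRICE_PER_GB_SECOND

base_mem, base_dur = 128, 4.0  # profiled locally at the smallest setting
for mem in [128, 256, 512, 1024, 2048]:
    d = estimated_duration_s(mem, base_mem, base_dur)
    print(f"{mem:5d} MB  {d:6.2f} s  ${estimated_cost(mem, d):.8f}")
```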
Conference Paper
Function-as-a-Service (FaaS) platforms promise a simpler programming model for cloud computing, in which developers concentrate on writing their applications while platform providers take care of resource management and administration. As FaaS users are billed based on the execution of the functions, platform providers have a natural incentive not to keep idle resources running at the platform's expense. However, this strategy may lead to the cold start issue, in which the execution of a function is delayed because there is no ready resource to host the execution. Cold starts can take hundreds of milliseconds to seconds and have been a prohibitive and painful disadvantage for some applications. This work describes and evaluates a technique to start functions that restores snapshots from previously executed function processes. We developed a prototype of this technique based on the CRIU process checkpoint/restore Linux tool. We evaluate this prototype by running experiments that compare its start-up time against the standard Unix process creation/start-up procedure. We analyze the following three functions: i) a "do-nothing" function, ii) an Image Resizer function, and iii) a function that renders Markdown files. The results attained indicate that the technique can improve the start-up time of function replicas by 40% (in the worst case of a "do-nothing" function) and up to 71% for the Image Resizer one. Further analysis indicates that runtime initialization is a key factor, and we confirmed this by performing a sensitivity analysis based on synthetically generated functions of different code sizes. These experiments demonstrate that it is critical to decide when to create a snapshot of a function. When one creates the snapshots of warm functions, the speed-up achieved by the prebaking technique is even higher: the speed-up increases from 127.45% to 403.96% for a small, synthetic function; and for a bigger, synthetic function, this ratio increases from 121.07% to 1932.49%.
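The measurement at the heart of this evaluation can be sketched as follows, assuming the function process exits right after initialization (a hypothetical warmup-only mode) so that wall-clock run time approximates start-up cost. It also assumes a snapshot was captured beforehand, e.g. with `criu dump -t <pid> -D ./snap --shell-job`; the command lines and paths are placeholders, not the authors' prototype.

```python
# Sketch of the measurement idea: time a cold process start against a
# CRIU snapshot restore. Assumes the target exits immediately after its
# initialization phase, so run time approximates start-up latency; the
# handler script, its flag, and the snapshot directory are placeholders.
import subprocess
import time

def timed(cmd: list[str]) -> float:
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

cold = timed(["python3", "handler.py", "--warmup-only"])          # cold start path
restored = timed(["criu", "restore", "-D", "./snap", "--shell-job"])  # prebaked path
print(f"cold={cold:.3f}s restore={restored:.3f}s speed-up={cold / restored:.2f}x")
```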
Conference Paper
In recent years we observe the rapid growth of large-scale analytics applications in a wide range of domains, from healthcare infrastructures to traffic management. The high volume of data that needs to be processed has stimulated the development of special-purpose frameworks which handle the data deluge by parallelizing data processing and concurrently using multiple computing nodes. These frameworks differ significantly in terms of the policies they follow to decompose their workloads into multiple tasks and also in the way they exploit the available computing resources. As a result, based on the framework that applications have been implemented in, we observe significant variations in their resource utilization and execution times. Therefore, determining the appropriate framework for executing a big data application is not trivial. In this work we propose Orion, a novel resource negotiator for cloud infrastructures that support multiple big data frameworks such as Apache Spark, Apache Flink and TensorFlow. More specifically, given an application, Orion determines the most appropriate framework to assign it to. Additionally, Orion reserves the required resources so that the application is able to meet its performance requirements. Our negotiator exploits state-of-the-art prediction techniques for estimating the application's execution time when it is assigned to a specific framework with varying configuration parameters and processing resources. Finally, our detailed experimental evaluation, using practical big data workloads on our local cluster, illustrates that our approach outperforms its competitors.
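A toy version of the selection step conveys the idea: fit a per-framework runtime model from profiling runs, then assign the job to the framework with the lowest predicted execution time. The numbers and the linear model are illustrative, not Orion's actual predictors.

```python
# Sketch of prediction-driven framework selection (not Orion's actual
# models): fit a simple curve of runtime vs. input size per framework
# from profiling runs, then pick the framework with the lowest
# predicted runtime for the incoming job.
import numpy as np

profiles = {
    # framework -> (input sizes in GB, measured runtimes in s); made-up numbers
    "spark": ([1, 2, 4, 8], [30, 52, 95, 180]),
    "flink": ([1, 2, 4, 8], [40, 60, 90, 150]),
}

models = {
    fw: np.polyfit(np.array(xs), np.array(ys), deg=1)
    for fw, (xs, ys) in profiles.items()
}

def pick_framework(input_gb: float) -> str:
    predicted = {fw: float(np.polyval(c, input_gb)) for fw, c in models.items()}
    return min(predicted, key=predicted.get)

print(pick_framework(6.0))  # framework with the lowest predicted runtime
```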
Chapter
Cloud Functions, often called Function-as-a-Service (FaaS), pioneered by AWS Lambda, are an increasingly popular method of running distributed applications. As in other cloud offerings, cloud functions are heterogeneous, due to different underlying hardware, runtime systems, as well as resource management and billing models. In this paper, we focus on performance evaluation of cloud functions, taking into account heterogeneity aspects. We developed a cloud function benchmarking framework, consisting of one suite based on Serverless Framework, and one based on HyperFlow. We deployed the CPU-intensive benchmarks: Mersenne Twister and Linpack, and evaluated all the major cloud function providers: AWS Lambda, Azure Functions, Google Cloud Functions and IBM OpenWhisk. We make our results available online and continuously updated. We report on the initial results of the performance evaluation and we discuss the discovered insights on the resource allocation policies.
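The CPU-bound kernel such a suite times can be sketched locally; a Linpack-style benchmark solves a dense linear system. In the cited framework this body would run inside each provider's function runtime; here only the timing harness is shown.

```python
# Local sketch of the CPU-intensive kernel a benchmark suite like this
# times: a Linpack-style dense solve, repeated to take a median.
import time
import numpy as np

def linpack_like(n: int = 1000) -> float:
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    start = time.perf_counter()
    np.linalg.solve(a, b)           # the timed kernel
    return time.perf_counter() - start

samples = sorted(linpack_like() for _ in range(5))
print(f"median solve time: {samples[len(samples) // 2]:.4f}s")
```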
Article
New architectural patterns (e.g. microservices), the massive adoption of Linux containers (e.g. Docker containers), and improvements in key features of Cloud computing such as auto-scaling, have helped developers to decouple complex and monolithic systems into smaller stateless services. In turn, Cloud providers have introduced serverless computing, where applications can be defined as a workflow of event-triggered functions. However, serverless services, such as AWS Lambda, impose serious restrictions on these applications (e.g. using a predefined set of programming languages, or complicating the installation and deployment of external libraries). This paper addresses such issues by introducing a framework and a methodology to create Serverless Container-aware ARchitectures (SCAR). The SCAR framework can be used to create highly-parallel event-driven serverless applications that run on customized runtime environments defined as Docker images on top of AWS Lambda. This paper describes the architecture of SCAR together with the cache-based optimizations applied to minimize cost, exemplified on a massive image processing use case. The results show that, by means of SCAR, AWS Lambda becomes a convenient platform for High Throughput Computing, especially for highly parallel, bursty workloads of short stateless jobs.
Article
Serverless computing is a popular cloud computing paradigm that frees developers from server management. Function-as-a-Service (FaaS) is the most popular implementation of serverless computing, representing applications as event-driven and stateless functions. However, existing studies report that functions of FaaS applications severely suffer from cold-start latency. In this paper, we propose an approach, named FaaSLight, to accelerate the cold start of FaaS applications through application-level optimization. We first conduct a measurement study to investigate the possible root cause of the cold start problem of FaaS. The result shows that application code loading latency is a significant overhead. Therefore, loading only indispensable code from FaaS applications can be an adequate solution. Based on this insight, we identify code related to application functionalities by constructing the function-level call graph, and separate other code (i.e., optional code) from FaaS applications. The separated optional code can be loaded on demand to avoid the inaccurate identification of indispensable code causing application failure. In particular, a key principle guiding the design of FaaSLight is inherently general, i.e., platform- and language-agnostic. In practice, FaaSLight can be effectively applied to FaaS applications developed in different programming languages (Python and JavaScript), and can be seamlessly deployed on popular serverless platforms such as AWS Lambda and Google Cloud Functions, without having to modify the underlying OSes or hypervisors, or introducing any additional manual engineering effort for developers. The evaluation results on real-world FaaS applications show that FaaSLight can significantly reduce code loading latency (up to 78.95%, 28.78% on average), thereby reducing the cold-start latency. As a result, the total response latency of functions can be decreased by up to 42.05% (19.21% on average). Compared with the state of the art, FaaSLight achieves a 21.25× improvement in reducing the average total response latency.
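The call-graph construction underlying this separation can be illustrated with Python's ast module. The sketch below resolves only direct `name()` calls, a deliberate simplification of what a real tool must handle.

```python
# Sketch of the call-graph idea behind separating indispensable code:
# build a function-level call graph with Python's ast module and keep
# only functions reachable from the handler. Resolving just bare
# `name()` calls is a deliberate simplification.
import ast

def reachable_functions(source: str, entry: str) -> set[str]:
    tree = ast.parse(source)
    defs = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    edges = {
        name: {
            c.func.id
            for c in ast.walk(fn)
            if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
        }
        for name, fn in defs.items()
    }
    seen, stack = set(), [entry]
    while stack:
        f = stack.pop()
        if f in seen or f not in defs:
            continue
        seen.add(f)
        stack.extend(edges.get(f, ()))
    return seen  # everything else is a candidate for on-demand loading

src = """
def helper(): pass
def unused(): pass
def handler(event): helper()
"""
print(reachable_functions(src, "handler"))  # {'handler', 'helper'}
```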
Article
This study builds a fully deconvolutional neural network (FDNN) and addresses the problem of single image super-resolution (SISR) by using the FDNN. Although SISR using deep neural networks has been a major research focus, the problem of reconstructing a high resolution (HR) image with an FDNN has received little attention. A few recent approaches toward SISR are to embed deconvolution operations into multilayer feedforward neural networks. This paper constructs a deep FDNN for SISR that possesses two remarkable advantages compared to existing SISR approaches. The first improves the network performance without increasing the depth of the network or embedding complex structures. The second replaces all convolution operations with deconvolution operations to implement an effective reconstruction. That is, the proposed FDNN only contains deconvolution layers and learns an end-to-end mapping from low resolution (LR) to HR images. Furthermore, to avoid the oversmoothness of the mean squared error loss, the trained image is treated as a probability distribution, and the Kullback–Leibler divergence is introduced into the final loss function to achieve enhanced recovery. Although the proposed FDNN only has 10 layers, it is successfully evaluated through extensive experiments. Compared with other state-of-the-art methods and deep convolution neural networks with 20 or 30 layers, the proposed FDNN achieves better performance for SISR.
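A hedged numpy rendering of the loss idea: treat the reference and reconstructed images as normalized distributions and add a KL-divergence term to the MSE. The weighting factor is an assumption; this is not the paper's training code.

```python
# Sketch of the loss idea: treat the target and reconstructed images as
# probability distributions (non-negative pixels normalized to sum to 1)
# and penalize their Kullback-Leibler divergence alongside standard MSE.
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    p = p.ravel() / p.sum()   # assumes non-negative pixel intensities
    q = q.ravel() / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def combined_loss(hr: np.ndarray, sr: np.ndarray, lam: float = 0.1) -> float:
    mse = float(np.mean((hr - sr) ** 2))
    return mse + lam * kl_divergence(hr, sr)  # lam is an assumed weight
```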
Conference Paper
The microservice architecture has dramatically reduced user effort in adopting and maintaining servers by providing a catalog of functions as services that can be used as building blocks to construct applications. This has enabled datacenter operators to look at managing datacenter hosting microservices quite differently from traditional infrastructures. Such a paradigm shift calls for a need to rethink resource management strategies employed in such execution environments. We observe that the visibility enabled by a microservices execution framework can be exploited to achieve high throughput and resource utilization while still meeting Service Level Agreements, especially in multi-tenant execution scenarios.
Conference Paper
Function as a Service (FaaS) has been gaining popularity as a way to deploy computations to serverless backends in the cloud. This paradigm shifts the complexity of allocating and provisioning resources to the cloud provider, which has to provide the illusion of always-available resources (i.e., fast function invocations without cold starts) at the lowest possible resource cost. Doing so requires the provider to deeply understand the characteristics of the FaaS workload. Unfortunately, there has been little to no public information on these characteristics. Thus, in this paper, we first characterize the entire production FaaS workload of Azure Functions. We show for example that most functions are invoked very infrequently, but there is an 8-order-of-magnitude range of invocation frequencies. Using observations from our characterization, we then propose a practical resource management policy that significantly reduces the number of function cold starts, while spending fewer resources than state-of-the-practice policies.
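One way to picture a histogram-driven policy of this kind (a sketch in the spirit of the paper, with assumed percentile choices): track each function's idle gaps and derive keep-alive and pre-warm windows from the distribution's tail and head.

```python
# Illustrative sketch of a histogram-driven keep-alive policy: track the
# idle intervals between consecutive invocations of a function and keep
# its container warm roughly long enough to cover the observed tail.
# Percentile choices are assumptions, not the paper's exact policy.
import numpy as np

class KeepAlivePolicy:
    def __init__(self):
        self.idle_times: list[float] = []   # seconds between invocations

    def record_idle(self, seconds: float) -> None:
        self.idle_times.append(seconds)

    def keep_alive_window(self) -> float:
        # Keep the container warm long enough to cover most observed gaps.
        return float(np.percentile(self.idle_times, 99))

    def prewarm_delay(self) -> float:
        # Alternatively, release the container early and re-create it just
        # before the next invocation is likely (head of the histogram).
        return float(np.percentile(self.idle_times, 5))
```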
Article
Serverless computing has emerged as a new cloud computing execution model that liberates users and application developers from explicitly managing ‘physical’ resources, leaving such resource management burdens to service providers. In this article, we study the problem of resource allocation for multi-tenant serverless computing platforms, explicitly taking into account workload fluctuations including sudden surges. In particular, we investigate different root causes of performance degradation in these platforms where tenants (their applications) have different workload characteristics. To this end, we develop a fine-grained CPU cap control solution as a resource manager that dynamically adjusts the CPU usage limit (or CPU cap) for applications with the same or similar performance requirements, i.e., application groups. The adjustment of CPU caps applies primarily to co-located worker processes of serverless computing platforms to minimize resource contention, which is the major source of performance degradation. The actual adjustment decisions are made based on performance metrics (e.g., throttled time and queue length) using a group-aware scheduling algorithm. The extensive experimental results obtained in our local cluster confirm that the proposed resource manager can effectively eliminate the burden of explicit reservation of computing capacity, even when fluctuations and sudden surges in the incoming workload exist. We measure the robustness of the proposed resource manager by comparing it with several heuristics that are extensively used in practice, including an enhanced version of round robin and the least-length-queue scheduling policy, under various workload intensities driven by real-world scenarios. Notably, our resource manager outperforms these heuristics by decreasing skewness and average response time by up to 44 and 94 percent, respectively, while it does not over-use CPU resources.
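The actuation side of such a CPU-cap controller can be sketched against cgroup v2's cpu.max interface; the group path, thresholds, and step sizes below are assumptions, not the paper's tuned values.

```python
# Sketch of the actuation step only: adjust a worker group's CPU cap via
# the cgroup v2 `cpu.max` file based on observed throttling and queueing.
# The cgroup path, step sizes, and thresholds are assumptions.
CGROUP = "/sys/fs/cgroup/serverless-workers"   # hypothetical group
PERIOD_US = 100_000

def set_cpu_cap(quota_us: int) -> None:
    # cgroup v2 cpu.max takes "<quota> <period>" in microseconds.
    with open(f"{CGROUP}/cpu.max", "w") as f:
        f.write(f"{quota_us} {PERIOD_US}")

def adjust(quota_us: int, throttled_ratio: float, queue_len: int) -> int:
    if throttled_ratio > 0.1 or queue_len > 10:
        quota_us = min(int(quota_us * 1.25), 4 * PERIOD_US)   # grant more CPU
    elif throttled_ratio < 0.01 and queue_len == 0:
        quota_us = max(int(quota_us * 0.9), PERIOD_US // 10)  # reclaim slack
    set_cpu_cap(quota_us)
    return quota_us
```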
Conference Paper
Distributed topic-based publish/subscribe systems like Apache Kafka provide a scalable and decentralized approach to data dissemination. However, despite their wide adoption they can suffer from performance degradation due to uneven load distribution between the nodes that receive and forward the messages (i.e., brokers). This problem occurs due to the lack of effective load balancing mechanisms that consider the impact of (i) the number of topics handled by a specific broker and (ii) changes in the input rate during the course of the system execution. Furthermore, while there have been some previous works that examine the problem, most of them focus on content-based pub/sub systems or require a centralized coordinator for determining the appropriate assignments. In this work we propose a novel decentralized load balancing technique for topic-based publish/subscribe systems. More specifically, we exploit the fact that brokers in systems like Kafka can communicate using internal topics to exchange their load-related information, and propose a novel decentralized algorithm that executes on each individual broker to determine the topics' partitions that should be migrated in order to avoid overloaded conditions. Our detailed experimental evaluation on our local cluster, using different applications that process various data forms from different topics, illustrates the benefits of our approach and shows that we can efficiently balance the load between brokers without the need for a centralized coordination mechanism.
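The per-broker decision step might look like the following sketch (plain Python, no Kafka client): an overloaded broker, knowing peer loads from an internal topic, nominates its hottest partitions for migration to the least-loaded peer. Thresholds are assumptions.

```python
# Sketch of the per-broker decision step (no Kafka client involved):
# given its own per-partition load and peer loads learned via an
# internal topic, an overloaded broker nominates its hottest partitions
# for migration to the least-loaded peer. Thresholds are assumptions.
def migration_plan(my_partitions: dict[str, float],
                   peer_loads: dict[str, float],
                   overload_factor: float = 1.2):
    my_load = sum(my_partitions.values())
    avg = (my_load + sum(peer_loads.values())) / (1 + len(peer_loads))
    plan = []
    # Shed hottest partitions first until load drops back near the average.
    for part, load in sorted(my_partitions.items(), key=lambda kv: -kv[1]):
        if my_load <= overload_factor * avg:
            break
        target = min(peer_loads, key=peer_loads.get)
        plan.append((part, target))
        my_load -= load
        peer_loads[target] += load
    return plan

print(migration_plan({"t0-p0": 50, "t0-p1": 30, "t1-p0": 5},
                     {"broker-2": 10, "broker-3": 15}))
```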
Conference Paper
In this demonstration we present Dione, a novel framework for automatic profiling and tuning of big data applications. Our system allows a non-expert user to submit Spark or Flink applications to his/her cluster, and Dione automatically determines the impact of different configuration parameters on the application's execution time and monetary cost. Dione is the first framework that exploits similarities in the execution plans of different applications to narrow down the number of profiling runs required for building prediction models that capture the impact of the configuration parameters on the metrics of interest. Dione exploits these prediction models to tune the configuration parameters in a way that minimizes the application's execution time or the user's budget. Finally, Dione's Web-UI visualizes the impact of the configuration parameters on the execution time and the monetary cost, and enables the user to submit the application with the recommended parameter values.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Conference Paper
Recurrent Neural Networks are powerful tools for modeling sequences. They are flexibly extensible and can incorporate various kinds of information, including temporal order. These properties make them well suited for generating sequential recommendations. In this paper, we extend Recurrent Neural Networks by considering unique characteristics of the Recommender Systems domain. One of these characteristics is the explicit notion of the user for whom recommendations are specifically generated. We show how individual users can be represented, in addition to sequences of consumed items, in a new type of Gated Recurrent Unit to effectively produce personalized next-item recommendations. Offline experiments on two real-world datasets indicate that our extensions clearly improve objective performance when compared to state-of-the-art recommender algorithms and to a conventional Recurrent Neural Network.
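The core idea, a recurrent step conditioned on the user, can be sketched in numpy by concatenating a user embedding with each consumed item's embedding. This mirrors the concept rather than the paper's exact gated unit.

```python
# Numpy sketch of the idea: a GRU step whose input is the consumed item's
# embedding concatenated with an embedding of the user the sequence
# belongs to. This mirrors the concept, not the paper's exact unit.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def user_gru_step(h, item_vec, user_vec, W, U, b):
    # W: (3, hidden, item+user), U: (3, hidden, hidden), b: (3, hidden)
    x = np.concatenate([item_vec, user_vec])
    z = sigmoid(W[0] @ x + U[0] @ h + b[0])          # update gate
    r = sigmoid(W[1] @ x + U[1] @ h + b[1])          # reset gate
    h_tilde = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])
    return (1 - z) * h + z * h_tilde                 # next hidden state

hidden, item_d, user_d = 8, 6, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((3, hidden, item_d + user_d)) * 0.1
U = rng.standard_normal((3, hidden, hidden)) * 0.1
b = np.zeros((3, hidden))
h = np.zeros(hidden)
h = user_gru_step(h, rng.standard_normal(item_d), rng.standard_normal(user_d), W, U, b)
```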
Article
Graph data have become ubiquitous, and manipulating them based on similarity is essential for many applications. Graph edit distance is one of the most widely accepted measures for determining similarity between graphs and has extensive applications in the fields of pattern recognition, computer vision, etc. Unfortunately, the problem of graph edit distance computation is NP-hard in general. Accordingly, in this paper we introduce three novel methods to compute upper and lower bounds for the edit distance between two graphs in polynomial time. Applying these methods, two algorithms, AppFull and AppSub, are introduced to perform different kinds of graph search on graph databases. Comprehensive experimental studies are conducted on both real and synthetic datasets to examine various aspects of the methods for bounding graph edit distance. Results show that these methods achieve good scalability in terms of both the number of graphs and the size of graphs. The effectiveness of these algorithms also confirms the usefulness of using our bounds in the filtering and searching of graphs.
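A simple polynomial-time lower bound in the spirit described (not the paper's exact bounds): any edit script must at least reconcile the node-label multisets and the edge counts.

```python
# Sketch of a simple polynomial-time lower bound on graph edit distance
# under unit edit costs (in the spirit of the bounds discussed, not the
# paper's exact ones): any edit script must at least fix the node-label
# multiset mismatch and the difference in edge counts.
from collections import Counter

def ged_lower_bound(labels1, labels2, n_edges1: int, n_edges2: int) -> int:
    c1, c2 = Counter(labels1), Counter(labels2)
    # Node relabels/insertions/deletions needed to reconcile label multisets.
    node_term = max(sum((c1 - c2).values()), sum((c2 - c1).values()))
    edge_term = abs(n_edges1 - n_edges2)
    return node_term + edge_term

print(ged_lower_bound("AABC", "ABCC", 4, 6))  # 1 node op + 2 edge ops = 3
```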
Article
This survey is divided into three major sections. The first concerns mathematical results about the choice axiom and the choice models that devolve from it. For example, its relationship to Thurstonian theory is satisfyingly understood; much is known about how choice and ranking probabilities may relate, although little of this knowledge seems empirically useful; and there are certain interesting statistical facts. The second section describes attempts that have been made to test and apply these models. The testing has been done mostly, though not exclusively, by psychologists; the applications have been mostly in economics and sociology. Although it is clear from many experiments that the conditions under which the choice axiom holds are surely delicate, the need for simple, rational underpinnings in complex theories, as in economics and sociology, leads one to accept assumptions that are at best approximate. And the third section concerns alternative, more general theories which, in spirit, are much like the choice axiom. Perhaps I had best admit at the outset that, as a commentator on this scene, I am qualified no better than many others and rather less well than some who have been working in this area recently, which I have not been. My pursuits have led me along other, somewhat related routes. On the one hand, I have contributed to some of the recent, purely algebraic aspects of fundamental measurement (for a survey of some of this material, see Krantz, Luce, Suppes, & Tversky, 1971). And on the other hand, I have worked in the highly probabilistic area of psychophysical theory; but the empirical materials have led me away from axiomatic structures, such as the choice axiom, to more structural, neural models which are not readily axiomatized at the present time. After some attempts to apply choice models to psychophysical phenomena (discussed below in its proper place), I was led to conclude that it is not a very promising approach to these data, and so I have not been actively studying any aspect of the choice axiom in over 12 years. With that understood, let us begin.
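For reference, the choice axiom admits a compact statement; the standard rendering below uses common notation and is not quoted from the survey.

```latex
% Standard statement of Luce's choice axiom (common notation, not quoted
% from the survey): choice probabilities are ratios of positive scale values,
\[
  P(a \mid A) \;=\; \frac{v(a)}{\sum_{b \in A} v(b)}, \qquad a \in A,
\]
% so the odds of choosing a over b do not depend on the offer set A
% (independence from irrelevant alternatives):
\[
  \frac{P(a \mid A)}{P(b \mid A)} \;=\; \frac{v(a)}{v(b)}
  \quad \text{for all } A \supseteq \{a, b\}.
\]
```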
Article
Error-tolerant graph matching is a powerful concept that has various applications in pattern recognition and machine vision. In the present paper, a new distance measure on graphs is proposed. It is based on the maximal common subgraph of two graphs. The new measure is superior to edit distance based measures in that no particular edit operations together with their costs need to be defined. It is formally shown that the new distance measure is a metric. Potential algorithms for the efficient computation of the new measure are discussed.
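The measure described is commonly written as follows (Bunke and Shearer's formulation), where |g| denotes the number of nodes of g.

```latex
% Maximal-common-subgraph distance between two graphs:
\[
  d(g_1, g_2) \;=\; 1 \;-\;
  \frac{\lvert \mathrm{mcs}(g_1, g_2) \rvert}
       {\max\bigl(\lvert g_1 \rvert,\; \lvert g_2 \rvert\bigr)}
\]
% d takes values in [0, 1] and satisfies the metric axioms; no edit
% operations or costs need to be defined.
```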
Faasm: Lightweight isolation for efficient stateful serverless computing
  • S Shillaker
  • P Pietzuch
SOCK: Rapid task provisioning with serverless-optimized containers
  • E Oakes
EdgeWise: A better stream processing engine for the edge
  • X Fu
Generalized cross entropy loss for training deep neural networks with noisy labels
  • Z Zhang
  • M Sabuncu
FaaSNet: Scalable and fast provisioning of custom serverless container runtimes at Alibaba Cloud Function Compute
  • A Wang
Uses and abuses of the cross-entropy loss: Case studies in modern deep learning
  • E Gordon-Rodriguez
Static call graph construction in AWS Lambda serverless applications
  • M Obetz
  • S Patterson
  • A L Milanova
Similarity of neural network representations revisited
  • Kornblith
Neural networks trained to solve differential equations learn general representations
  • Magill