Computing (2025) 107:47
https://doi.org/10.1007/s00607-024-01377-9
REGULAR PAPER
Machine learning inference serving models in serverless computing: a survey
Akram Aslani1 · Mostafa Ghobaei-Arani1
Received: 12 May 2024 / Accepted: 22 November 2024 / Published online: 7 January 2025
© The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2025
Abstract
Serverless computing has attracted many researchers with features such as scalability, optimized operating costs, freedom from infrastructure management, and faster application development. It can be used for real-time machine learning (ML) prediction through serverless inference functions. Deploying an ML serverless inference function involves provisioning compute resources, deploying an ML model, configuring the network infrastructure, and granting permissions to invoke the inference function. However, machine learning inference (MLI) faces challenges such as resource management, latency and response time, large and complex models, and security and privacy, and relatively few studies have been conducted in this field. This comprehensive literature review examines recent developments in MLI in serverless computing environments. The mechanisms presented in the taxonomy fall into four categories: service-level-objective (SLO)-aware, acceleration-aware, framework-aware, and latency-aware. In each category, the methods and algorithms used to optimize inference in serverless environments are examined along with their advantages and disadvantages. We show that acceleration-aware methods focus on the optimal use of computing resources, while framework-aware methods play an important role in improving system efficiency and scalability by examining different frameworks for inference in serverless environments. SLO-aware and latency-aware methods, which account for time limits and service-level agreements, help provide high-quality and reliable inference in serverless environments. Finally, this article presents a vision of future challenges and opportunities in this field and outlines directions for future research on MLI in serverless computing.
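To make the deployment steps mentioned above concrete, the following is a minimal, illustrative sketch of a serverless inference function in Python, written in the style of an AWS-Lambda handler. The model file name, the request schema, and the predict call are assumptions for illustration only and do not correspond to any specific system surveyed in this article.

import json
import pickle

# Loading the model at module scope means the cost is paid once per cold start
# and the loaded model is reused across warm invocations of the function instance.
with open("model.pkl", "rb") as f:  # assumed: a serialized model bundled with the function
    MODEL = pickle.load(f)

def handler(event, context):
    # Entry point invoked by the serverless platform for each inference request.
    features = json.loads(event["body"])["features"]  # assumed request schema
    prediction = MODEL.predict([features])            # assumed scikit-learn-style model API
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": [float(p) for p in prediction]}),
    }

In this pattern, the platform provisions compute and networking on demand, while the developer supplies only the model artifact and the handler; permissions to invoke the function are configured separately in the platform's access-control layer.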
Keywords Serverless computing · Function-as-a-service · Machine learning inference · Deep learning · Inference serving models
* Mostafa Ghobaei-Arani
mo.ghobaei@iau.ac.ir
Akram Aslani
ak.aslani@iau.ac.ir
1 Department of Computer Engineering, Qom Branch, Islamic Azad University, Qom, Iran