
Abstract and Figures

The Actor-based programming model is largely used in the context of distributed systems for its message-passing semantics and neat separation between the concurrency model and the underlying hardware platform. However, in the context of a single multi-core node where the performance metric is the primary optimization objective, the “pure” Actor Model is generally not used because Actors cannot exploit the physical shared-memory, thus reducing the optimization options. In this work, we propose to enrich the Actor Model with some well-known Parallel Patterns to face the performance issues of using the “pure” Actor Model on a single multi-core platform. In the experimental study, conducted on two different multi-core systems by using the C++ Actor Framework, we considered a subset of the Parsec benchmarks and two Savina benchmarks. The analysis of results demonstrates that the Actor Model enriched with suitable Parallel Patterns implementations provides a robust abstraction layer capable of delivering performance results comparable with those of thread-based libraries (i.e. Pthreads and FastFlow) while offering a safer and versatile programming environment.
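To make the approach summarized above concrete, the following is a minimal, hypothetical sketch (our own illustration, not the authors' implementation) of a data-parallel map/reduce pattern built on top of CAF actors: a pool of worker actors receives index ranges and operates directly on a buffer living in the node's shared memory, passed as a spawn argument rather than copied into messages. The worker/pool structure and all names are ours; CAF API names follow the 0.17/0.18 releases and may differ in other versions.

// Hypothetical sketch: a data-parallel map/reduce "macro Actor" built from CAF
// worker actors. The input buffer stays in shared memory; only index ranges
// and partial results travel through messages.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

#include "caf/all.hpp"

// Each worker squares the elements of its chunk and returns the partial sum.
caf::behavior worker(caf::event_based_actor*, const double* data) {
  return {
    [=](uint64_t begin, uint64_t end) -> double {
      double partial = 0.0;
      for (uint64_t i = begin; i < end; ++i)
        partial += data[i] * data[i];
      return partial; // the return value is sent back to the sender
    },
  };
}

void caf_main(caf::actor_system& sys) {
  const uint64_t N = 1u << 20;
  const uint64_t n_workers = 8;
  std::vector<double> data(N, 1.0);
  // The "pattern" spawns a fixed pool of workers sharing the same buffer...
  std::vector<caf::actor> pool;
  for (uint64_t w = 0; w < n_workers; ++w)
    pool.push_back(sys.spawn(worker, data.data()));
  // ...scatters index ranges (not data) to them...
  caf::scoped_actor self{sys};
  const uint64_t chunk = (N + n_workers - 1) / n_workers;
  for (uint64_t w = 0; w < n_workers; ++w)
    self->send(pool[w], w * chunk, std::min(N, (w + 1) * chunk));
  // ...and gathers the partial results.
  double total = 0.0;
  for (uint64_t w = 0; w < n_workers; ++w)
    self->receive([&](double partial) { total += partial; });
  std::cout << "sum of squares: " << total << '\n';
  for (auto& w : pool)
    self->send_exit(w, caf::exit_reason::user_shutdown);
}

CAF_MAIN()

Passing the buffer pointer at spawn time keeps the messages small; across distributed nodes this shortcut would not apply, which is why such optimizations are restricted to a single multi-core platform.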
International Journal of Parallel Programming (2020) 48:692–712
https://doi.org/10.1007/s10766-020-00663-1
Improving thePerformance ofActors onMulti‑cores
withParallel Patterns
LucaRinaldi1· MassimoTorquati1 · DanieleDeSensi1· GabrieleMencagli1·
MarcoDanelutto1
Received: 16 October 2019 / Accepted: 27 May 2020 / Published online: 4 June 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Keywords: Actors · Parallel patterns · Programming model · Multi-cores
1 Introduction
The Actor Model (AM) proposed by Hewitt et al. [24] is attracting revived attention among software developers and academics. In the AM, the concurrent unit is the Actor. Actors are isolated entities with an internal state that can receive ...
This work has been partially supported by University of Pisa PRA 2018 66 DECLware: Declarative methodologies for designing and deploying applications.
Corresponding author: Massimo Torquati, torquati@di.unipi.it
Computer Science Department, University of Pisa, Pisa, Italy
... The Actors parallel pattern is a programming model and parallel pattern that emphasizes the use of independent, concurrent entities called Actors for the design and implementation of parallel systems [31]. Each actor is a self-contained computation unit with its own state and behavior. ...
... An integration between the processing units and the application business logic is needed to benefit from dynamism at either the application or the resource level: the application can first use an existing resource and migrate to the new resource when it becomes available. It has also been observed that in many actor-based applications, a small number of actors (called hubs) exchange many more messages than the average actor and communicate with a large number of different actors [31]. As a result, these hub actors require more resources than their regular counterparts. ...
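As a minimal illustration of "state plus behavior" as described in the snippet above (our own toy example, not code from the cited works; CAF API names as in the 0.17/0.18 releases), a counter actor whose internal state is reachable only through messages could look as follows.

// Toy example: an actor as a self-contained unit of state plus behavior.
// The counter value is private to the actor and mutated only by its handlers.
#include <cstdint>
#include <iostream>
#include <string>

#include "caf/all.hpp"

struct counter_state {
  int64_t count = 0;
};

caf::behavior counter(caf::stateful_actor<counter_state>* self) {
  return {
    [=](int64_t delta) {             // behavior 1: update the internal state
      self->state.count += delta;
    },
    [=](const std::string&) {        // behavior 2: report the current state
      return self->state.count;      // (a real design would use CAF atoms)
    },
  };
}

void caf_main(caf::actor_system& sys) {
  auto c = sys.spawn(counter);
  caf::scoped_actor self{sys};
  self->send(c, int64_t{3});
  self->send(c, int64_t{4});
  self->request(c, caf::infinite, std::string{"get"}).receive(
    [](int64_t value) { std::cout << "count = " << value << '\n'; },
    [](caf::error&) { /* request failed */ });
  self->send_exit(c, caf::exit_reason::user_shutdown);
}

CAF_MAIN()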
Article
Full-text available
Today, all computers have some degree of usable parallelism. Modern computers are explicitly equipped with hardware support for parallelism, such as multiple nodes, multi-cores, multiple CPUs, and accelerators. At the same time, the Cloud Continuum has become a viable platform for running parallel applications. Building software for these parallel and distributed platforms can be challenging due to the numerous considerations programmers must make during the development process. With this in mind, the high-performance computing literature proposed the concept of parallel patterns to hide some of this complexity. However, there are no patterns that address the design and creation of adaptive applications. With the compute continuum era in mind, we present how adaptability features can be explored within each parallel programming pattern, providing technical details on managing dynamic resources and handling changes in application behavior. In addition to this contribution, we address practical implications by presenting some frameworks that can be used to implement adaptive applications and examples of using them with the proposed patterns.
... In our previous works [42,43], to face the performance optimization issues of the AM on a scale-up server, we proposed to enhance the model with a set of well-known Parallel Patterns (PPs). PPs are integrated into the AM as "macro Actors". ...
... Recently we proposed to use Parallel Pattern abstractions to enhance the performance of the Actor Model in shared-memory systems [42,43]. The motivation is twofold: a) to introduce a communication structure in Actor-based applications that usually are characterized by unstructured communication topologies [31], and b) to safely enable some low-level shared-memory based optimizations that are generally not allowed by the "pure" Actor Model. ...
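One such optimization, sketched below in a framework-agnostic way (our illustration, not the code of [42,43]), is to let a message carry a reference to a large immutable payload instead of a deep copy when producer and consumer live in the same address space; keeping the payload immutable preserves the Actor Model's no-shared-mutable-state discipline. The mailbox and payload types are our own names.

// Framework-agnostic sketch: within one process, a "message" can hand over a
// reference to an immutable buffer instead of copying it. Only the pointer
// crosses the mailbox; the payload itself stays where it is.
#include <condition_variable>
#include <iostream>
#include <memory>
#include <mutex>
#include <numeric>
#include <queue>
#include <thread>
#include <vector>

using payload = std::shared_ptr<const std::vector<double>>;

struct mailbox {
  std::mutex m;
  std::condition_variable cv;
  std::queue<payload> q;
  void push(payload p) {
    { std::lock_guard<std::mutex> lk(m); q.push(std::move(p)); }
    cv.notify_one();
  }
  payload pop() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return !q.empty(); });
    payload p = std::move(q.front());
    q.pop();
    return p;
  }
};

int main() {
  mailbox mb;
  // "Consumer actor": receives a reference to the buffer, never a copy of it.
  std::thread consumer([&] {
    payload msg = mb.pop();
    std::cout << "sum = " << std::accumulate(msg->begin(), msg->end(), 0.0) << '\n';
  });
  // "Producer actor": builds a large buffer once and publishes it read-only.
  auto data = std::make_shared<std::vector<double>>(1 << 20, 1.0);
  mb.push(payload{std::move(data)}); // only the pointer crosses the "message"
  consumer.join();
}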
Conference Paper
Full-text available
The steady growth of data volume produced as continuous streams makes paramount the development of software capable of providing timely results to the users. The Actor Model (AM) offers a high level of abstraction suited for developing scalable message-passing applications. It allows the application developer to focus on the application logic, moving the burden of implementing fast and reliable inter-Actor message exchange to the implementation framework. In this paper, by using the CAF framework as the reference AM implementation, we focus on evaluating the model in high data rate streaming applications targeting scale-up servers. Our approach leverages Parallel Pattern (PP) abstractions to model streaming computations and introduces optimizations that otherwise could be challenging to implement without violating the Actor Model's semantics. The experimental analysis demonstrates that the new implementation skeletons we propose for our PPs can bring significant performance boosts (more than 2X) in high data rate streaming applications.
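A stream computation expressed as a chain of actors can be sketched as follows (a simplified illustration of the pipeline idea, not the implementation skeletons proposed in that paper; CAF API names as in the 0.17/0.18 releases, and the stage/sink names are our own).

// Simplified pipeline: a stage actor transforms each item and forwards it to
// the next stage; an end-of-stream marker shuts the stages down in order.
#include <iostream>
#include <string>

#include "caf/all.hpp"

caf::behavior stage(caf::event_based_actor* self, caf::actor next) {
  return {
    [=](int x) { self->send(next, 2 * x); },        // transform and forward
    [=](const std::string& eos) {                   // propagate end-of-stream
      self->send(next, eos);
      self->quit();
    },
  };
}

caf::behavior sink(caf::event_based_actor* self) {
  return {
    [=](int x) { std::cout << "got " << x << '\n'; },
    [=](const std::string&) { self->quit(); },
  };
}

void caf_main(caf::actor_system& sys) {
  auto last = sys.spawn(sink);
  auto first = sys.spawn(stage, last);
  caf::scoped_actor self{sys};
  for (int i = 0; i < 5; ++i)
    self->send(first, i);                           // the source of the stream
  self->send(first, std::string{"eos"});
}

CAF_MAIN()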
... The research presented in [32] investigates the performance obtained when converting two parallel patterns (based on C++ FastFlow) to Rust. The results revealed that the Rust version of the parallel patterns achieved performance similar to that of the C++ version. ...
Article
This work aims at contributing a structured parallel programming abstraction for Rust in order to provide ready-to-use parallel patterns that abstract low-level and architecture-dependent details from application programmers. We focus on stream processing applications running on shared-memory multi-core architectures (e.g., video processing, compression, and others). Therefore, we provide a new high-level and efficient parallel programming abstraction for expressing stream parallelism, named Rust-SSP. We also created a new stream benchmark suite for Rust that represents real-world scenarios and has different application characteristics and workloads. Our benchmark suite is an initiative to assess existing parallelism abstractions for this domain, as parallel implementations using these abstractions were provided. The results revealed that Rust-SSP achieved up to 41.1% better performance than other solutions. In terms of programmability, the results revealed that Rust-SSP requires the smallest number of extra lines of code to enable stream parallelism.
... By extending their support to recursion, FaaS platforms could avoid launching new function instances (e.g. via tail recursion optimisation) and the associated unnecessary billing and resource-wasting behaviour. Finally, as an alternative to the orchestration languages and workflows employed by most of the surveyed approaches, Calvin [42] exploits an actor-based model to compose services and serverless functions into applications. Such models might be interesting to investigate further in FaaS settings, as they have recently been employed to describe AI applications [43] (including annotations on their hardware requirements) and parallel computing patterns [44]. ...
Article
Full-text available
Function-as-a-Service (FaaS) allows developers to define, orchestrate and run modular event-based pieces of code on virtualised resources, without the burden of managing the underlying infrastructure or the life-cycle of such pieces of code. Indeed, FaaS providers offer resource auto-provisioning, auto-scaling and pay-per-use billing at no cost for idle time. This makes it easy to scale running code, and it represents an effective and increasingly adopted way to deliver software. This article aims at offering an overview of the existing literature in the field of next-gen FaaS from three different perspectives: (i) the definition of FaaS orchestrations, (ii) the execution of FaaS orchestrations in Fog computing environments, and (iii) the security of FaaS orchestrations. Our analysis identifies trends and gaps in the literature, paving the way to further research on securing FaaS orchestrations in Fog computing landscapes.
Conference Paper
Full-text available
Parallel programmers demand high-level parallel programming tools that reduce the effort of efficiently parallelizing their applications. Parallel programming leveraging parallel patterns has recently received renewed attention thanks to their clear functional and parallel semantics. In this work, we propose a synergy between the well-known Actor-based programming model and the pattern-based parallelization methodology. We present our preliminary results in that direction, discussing and assessing the implementation of the Map parallel pattern by using an Actor-based software accelerator abstraction that seamlessly integrates within the C++ Actor Framework (CAF). The results obtained on the Intel Xeon Phi KNL platform demonstrate good performance figures achieved with negligible programming effort.
Conference Paper
Full-text available
In this work, we consider the C++ Actor Framework (CAF), a recent proposal that revamped the interest in building concurrent and distributed applications using the actor programming model in C++. CAF has been optimized for high-throughput computing, whereas message latency between actors is greatly influenced by the message data rate: at low and moderate rates the latency is higher than at high data rates. To this end, we propose a modification of the polling strategies in the work-stealing CAF scheduler, which can reduce message latency at low and moderate data rates by up to two orders of magnitude without compromising the overall throughput and message latency at maximum pressure. The proposed technique uses a lightweight event notification protocol that is general enough to be used to optimize the runtime of other frameworks experiencing similar issues.
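The general shape of such a scheme can be sketched as follows (our own simplification, not the CAF scheduler code; the notify_queue name and constants are hypothetical): a consumer polls its queue for a bounded number of attempts and, only if it finds nothing, blocks on a notification that producers signal when they enqueue a message, so the blocking path is paid only at low data rates.

// Simplified sketch of the polling idea: bounded spinning keeps throughput at
// high message rates, while blocking on a notification keeps latency (and CPU
// usage) low when messages are rare.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

template <typename T>
class notify_queue {
public:
  void push(T msg) {
    { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(msg)); }
    cv_.notify_one(); // lightweight notification for a sleeping consumer
  }
  T pop() {
    // Phase 1: poll for a bounded number of attempts.
    for (int spin = 0; spin < 1024; ++spin)
      if (auto msg = try_pop())
        return std::move(*msg);
    // Phase 2: block until a producer notifies us.
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [&] { return !q_.empty(); });
    T msg = std::move(q_.front());
    q_.pop();
    return msg;
  }
private:
  std::optional<T> try_pop() {
    std::lock_guard<std::mutex> lk(m_);
    if (q_.empty())
      return std::nullopt;
    T msg = std::move(q_.front());
    q_.pop();
    return msg;
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<T> q_;
};

int main() {
  notify_queue<int> q;
  std::thread consumer([&] { std::cout << "got " << q.pop() << '\n'; });
  q.push(42);
  consumer.join();
}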
Article
Full-text available
The actor model of computation has been designed for a seamless support of concurrency and distribution. However, it remains unspecific about data-parallel program flows, while the available processing power of modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, hence operating in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data-parallel programs from primitives without leaving the actor paradigm, nor sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to largely differ between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.
Article
Full-text available
High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions to develop complex parallel software with reduced time-to-solution. Pattern-based parallel programming is based on a set of composable and customizable parallel patterns used as basic building blocks in parallel applications. In recent years, a considerable effort has been made in empowering this programming model with features able to overcome shortcomings of early approaches concerning flexibility and performance. In this paper we demonstrate that the approach is flexible and efficient enough by applying it to 12 out of 13 PARSEC applications. Our analysis, conducted on three different multi-core architectures, demonstrates that pattern-based parallel programming has reached a good level of maturity, providing comparable results in terms of performance with respect to both other parallel programming methodologies based on pragma-based annotations (i.e. OpenMP and OmpSs) and native implementations (i.e. Pthreads). Regarding the programming effort, we also demonstrate a considerable reduction in Lines-Of-Code (LOC) and Code Churn compared with Pthreads and comparable results with respect to other existing implementations.
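As a flavour of the pattern-based style evaluated there, a data-parallel loop written with FastFlow could look like the toy example below (our own example, not one of the PARSEC kernels; the ParallelFor interface is recalled from recent FastFlow releases and its exact signature should be treated as an assumption).

// Toy example of the pattern-based style: a data-parallel loop expressed with
// FastFlow's ParallelFor pattern instead of hand-written Pthreads code.
#include <iostream>
#include <vector>

#include <ff/parallel_for.hpp>

int main() {
  const long N = 1 << 20;
  const long nworkers = 8;
  std::vector<double> a(N, 1.0), b(N, 2.0), c(N, 0.0);
  ff::ParallelFor pf;                    // the pattern owns the worker threads
  pf.parallel_for(0L, N, [&](const long i) {
    c[i] = a[i] * b[i];                  // user code: only the loop body
  }, nworkers);
  std::cout << "c[0] = " << c[0] << '\n';
}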
Article
Full-text available
In this article we present SkePU 2, the next generation of the SkePU C++ skeleton programming framework for heterogeneous parallel systems. We critically examine the design and limitations of the SkePU 1 programming interface. We present a new, flexible and type-safe interface for skeleton programming in SkePU 2, and a source-to-source transformation tool which knows about SkePU 2 constructs such as skeletons and user functions. We demonstrate how the source-to-source compiler transforms programs to enable efficient execution on parallel heterogeneous systems. We show how SkePU 2 enables new use-cases and applications by increasing the flexibility from SkePU 1, and how programming errors can be caught earlier and more easily thanks to improved type safety. We propose a new skeleton, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations. We also discuss how the source-to-source compiler can enable a new optimization opportunity by selecting among multiple user function specializations when building a parallel program. Finally, we show that the performance of our prototype SkePU 2 implementation closely matches that of SkePU 1.
Conference Paper
Full-text available
The Actor Model is a message passing concurrency model that was originally proposed by Hewitt et al. in 1973. It is now 43 years later and since then researchers have explored a plethora of variations on this model. This paper presents a history of the Actor Model throughout those years. The goal of this paper is not to provide an exhaustive overview of every actor system in existence but rather to give an overview of some of the exemplar languages and libraries that influenced the design and rationale of other actor systems throughout those years. This paper therefore shows that most actor systems can be roughly classified into four families, namely: Classic Actors, Active Objects, Processes and Communicating Event-Loops. This paper also defines the Isolated Turn Principle as a unifying principle across those four families. Additionally this paper lists some of the key properties along which actor systems can be evaluated and formulates some general insights about the design and rationale of the different actor families across those dimensions.
Chapter
Among the programming models for parallel and distributed computing, one can identify two important families: the programming models adapted to data parallelism, where a set of coordinated processes performs a computation by splitting the input data, and the coordination languages able to express complex coordination patterns and rich interactions between processing entities. This article takes two successful programming models belonging to the two categories and puts them together into an effective programming model. More precisely, we investigate the use of active objects to coordinate BSP processes. We choose two paradigms that both enforce the absence of data races, one of the major sources of error in parallel programming. This article explains why we believe such a model is interesting and provides a formal semantics integrating the notions of the two programming paradigms in a coherent and effective manner.
Article
Distributed actor languages are an effective means of constructing scalable reliable systems, and the Erlang programming language has a well-established and influential model. While the Erlang model conceptually provides reliable scalability, it has some inherent scalability limits and these force developers to depart from the model at scale. This article establishes the scalability limits of Erlang systems and reports the work of the EU RELEASE project to improve the scalability and understandability of the Erlang reliable distributed actor model. We systematically study the scalability limits of Erlang and then address the issues at the virtual machine, language, and tool levels. More specifically: (1) We have evolved the Erlang virtual machine so that it can work effectively in large-scale single-host multicore and NUMA architectures. We have made important changes and architectural improvements to the widely used Erlang/OTP release. (2) We have designed and implemented Scalable Distributed (SD) Erlang libraries to address language-level scalability issues and provided and validated a set of semantics for the new language constructs. (3) To make large Erlang systems easier to deploy, monitor, and debug, we have developed and made open source releases of five complementary tools, some specific to SD Erlang. Throughout the article we use two case studies to investigate the capabilities of our new technologies and tools: a distributed hash table based Orbit calculation and Ant Colony Optimisation (ACO). Chaos Monkey experiments show that two versions of ACO survive random process failure and hence that SD Erlang preserves the Erlang reliability model. While we report measurements on a range of NUMA and cluster architectures, the key scalability experiments are conducted on the Athos cluster with 256 hosts (6,144 cores). Even for programs with no global recovery data to maintain, SD Erlang partitions the network to reduce network traffic and hence improves performance of the Orbit and ACO benchmarks above 80 hosts. ACO measurements show that maintaining global recovery data dramatically limits scalability; however, scalability is recovered by partitioning the recovery data. We exceed the established scalability limits of distributed Erlang, and do not reach the limits of SD Erlang for these benchmarks at this scale (256 hosts, 6,144 cores).