Gabriele Mencagli

  • Ph.D in Computer Science, University of Pisa, Italy
  • Professor (Assistant) at University of Pisa

About

108
Publications
23,489
Reads
1,081
Citations
Introduction
I am an Associate Professor at the Computer Science Department, University of Pisa, Italy. My research topics are Parallel Programming, Parallel Architectures, Autonomic Computing and Data Stream Processing. I am author or co-author of 36 journal papers, 44 conference and workshop papers, 3 chapters appearing in handbooks, and 1 book.
Current institution
University of Pisa
Current position
  • Professor (Assistant)
Additional affiliations
October 2014 - present
University of Pisa
Position
  • Professor (Assistant)
Description
  • Fixed-term Researcher (Ricercatore a Tempo Determinato, type A)

Publications

Article
Full-text available
Stream processing is a computing paradigm enabling the continuous processing of unbounded data streams. Some classes of stream processing applications can greatly benefit from the parallel processing power and affordability offered by GPUs. However, efficient GPU utilization with stream processing applications often requires micro-batching techniqu...
Article
Full-text available
In stream processing, a vast volume of data is continuously processed by standing queries that extract insights from raw inputs. These queries often maintain an internal state, representing useful information from the stream’s history, to produce results. Notable examples of state paradigms include sliding windows, where computation is periodically...
Preprint
Full-text available
In the stream processing paradigm, a huge volume of data is continuously processed by standing queries that extract insights from raw inputs. Such queries often keep an internal state (representing useful information of the stream history) to produce results. Examples of state paradigms are notably sliding windows, where computation is periodic...
Article
Full-text available
An increasing number of application domains require high-throughput processing to extract insights from massive data streams. The Data Stream Processing (DSP) paradigm provides formal approaches to analyze structured data streams considered as special, unbounded relations. The most used class of stateful operators in DSP are the ones running slidin...
Chapter
Stream processing plays a vital role in applications that require continuous, low-latency data processing. Thanks to their extensive parallel processing capabilities and relatively low cost, GPUs are well-suited to scenarios where such applications require substantial computational resources. However, micro-batching becomes essential for efficient...
Article
Full-text available
Reconfigurable devices such as field-programmable gate arrays (FPGAs) offer flexible solutions to workload acceleration with high energy efficiency. Despite such a potential advantage, they often reveal hard to program by application programmers. High-level synthesis languages have been developed to provide higher-level abstractions, allowing the d...
Chapter
High-Performance Computing (HPC) have evolved to be used to perform simulations of systems where physical experimentation is prohibitively impractical, expensive, or dangerous. This paper provides a general overview and showcases the analysis of non-functional properties in RISC-V-based platforms for HPCs. In particular, our analyses target the eva...
Conference Paper
Full-text available
High-Performance Computing (HPC) have evolved to be used to perform simulations of systems where physical experimentation is prohibitively impractical, expensive, or dangerous. This paper provides a general overview and showcases the analysis of non-functional properties in RISC-V-based platforms for HPCs. In particular, our analyses target the evaluati...
Article
Full-text available
We present the new distributed-memory run-time system (RTS) of the C++-based open-source structured parallel programming library FastFlow. The new RTS enables the execution of FastFlow shared-memory applications written using its Building Blocks (BBs) on distributed systems with minimal changes to the original program. The changes required are all hi...
Conference Paper
Full-text available
We present the new distributed-memory run-time system (RTS) of the C++-based open-source structured parallel programming library FastFlow. The new RTS enables the execution of FastFlow shared-memory applications written using its Building Blocks (BBs) on distributed systems with minimal changes to the original program. The changes required are all...
Article
Full-text available
Several real-world parallel applications are becoming more dynamic and long-running, demanding online (at run-time) adaptations. Stream processing is a representative scenario that computes data items arriving in real-time and where parallel executions are necessary. However, it is challenging for humans to monitor and manually self-optimize comple...
Preprint
Full-text available
This paper discusses the perspective of the H2020 TEACHING project on the next generation of autonomous applications running in a distributed and highly heterogeneous environment comprising both virtual and physical resources spanning the edge-cloud continuum. TEACHING puts forward a human-centred vision leveraging the physiological, emotional, and...
Article
The NAS Parallel Benchmarks (NPB), originally implemented mostly in Fortran, is a consolidated suite containing several benchmarks extracted from Computational Fluid Dynamics (CFD) models. The benchmark suite has important characteristics such as intensive memory communications, complex data dependencies, different memory access patterns, and hardw...
Conference Paper
Stream processing applications compute streams of data and provide insightful results in a timely manner, where parallel computing is necessary for accelerating the application executions. Considering that these applications are becoming increasingly dynamic and long-running, a potential solution is to apply dynamic runtime changes. However, it is...
Article
Nowadays, we are witnessing the diffusion of Stream Processing Systems (SPSs) able to analyze data streams in near realtime. Traditional SPSs like Storm and Flink target distributed clusters and adopt the continuous streaming model, where inputs are processed as soon as they are available while outputs are continuously emitted. Recently, the...
Article
Full-text available
This paper discusses the impact of structured parallel programming methodologies in state-of-the-art industrial and research parallel programming frameworks. We first recap the main ideas underpinning structured parallel programming models and then present the concepts of algorithmic skeletons and parallel design patterns. We then discuss how such...
Article
Full-text available
Systems enabling the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. Data Stream Processing Systems (DSPSs) are complex and powerful frameworks able to ease the development of streaming applications in distributed computing environments like clusters and...
Conference Paper
Full-text available
Structured parallel programming models based on parallel design patterns are gaining more and more importance. Several state-of-the-art industrial frameworks build on the parallel design pattern concept, including Intel TBB and Microsoft PPL. In these frameworks, the explicit exposition of parallel structure of the application favours the identific...
Conference Paper
Full-text available
The steady growth of data volume produced as continuous streams makes paramount the development of software capable of providing timely results to the users. The Actor Model (AM) offers a high-level of abstraction suited for developing scalable message-passing applications. It allows the application developer to focus on the application logic movi...
Article
Full-text available
The Actor-based programming model is largely used in the context of distributed systems for its message-passing semantics and neat separation between the concurrency model and the underlying hardware platform. However, in the context of a single multi-core node where the performance metric is the primary optimization objective, the “pure” Actor Mod...
Article
Full-text available
In the last years, pattern-based programming has been recognized as a good practice for efficiently exploiting parallel hardware resources. Following this approach, multiple libraries have been designed for providing such high-level abstractions to ease the parallel programming. However, those libraries do not share a common interface. To pave the...
Article
Full-text available
Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their usage in stream processing applications has increased as well. The efficient utilization of GPU accelerators in streaming scenarios requires to batch input elements in microbatches, whose computation is offloaded on...
Chapter
Full-text available
The amount of data generated is increasing exponentially. However, processing data and producing fast results is a technological challenge. Parallel stream processing can be implemented for handling high frequency and big data flows. The MPI parallel programming model offers low-level and flexible mechanisms for dealing with distributed architectur...
Article
Full-text available
In the stream processing domain, applications are represented by graphs of operators arbitrarily connected and filled with their business logic code. The APIs of existing Stream Processing Systems (SPSs) ease the development of transformations that recur in the streaming practice (e.g., filtering, aggregation and joins). In contrast, their parallel...
Article
Full-text available
This work studies the issues related to dynamic memory management in Data Stream Processing, an emerging paradigm enabling the real-time processing of live data streams. In this paper we consider two streaming parallel patterns and we discuss different implementation variants related on how dynamic memory is managed. The results show that the stand...
Conference Paper
Full-text available
The Actor-based parallel programming model is having its momentum in the context of IoT and Cloud computing thanks to a clear separation between the concurrency model and the underlying HW platform. The message-driven style of non-blocking and asynchronous interactions via immutable messages among Actors fits well with the complexity of modern hete...
Conference Paper
Full-text available
The stream processing paradigm is used in several scientific and enterprise applications in order to continuously compute results out of data items coming from data sources such as sensors. The full exploitation of the potential parallelism offered by current heterogeneous multi-cores equipped with one or more GPUs is still a challenge in the conte...
Article
The “New Landscapes of the Data Stream Processing in the era of Fog Computing” special issue aims to present new research works on topics related to recent advances in Data Streaming Processing (DSP) computing paradigm in the emerging environments of Fog Computing and Internet of Things (IoT). The papers included in this special issue are relevant...
Article
Full-text available
We discuss the extended parallel pattern set identified within the EU-funded project RePhrase as a candidate pattern set to support data intensive applications targeting heterogeneous architectures. The set has been designed to include three classes of pattern, namely i) core patterns, modelling common, not necessarily data intensive parallelism ex...
Article
Full-text available
Today’s stream processing systems handle high-volume data streams in an efficient manner. To achieve this goal, they are designed to scale out on large clusters of commodity machines. However, despite the efficient use of distributed architectures, they lack support to co-processors like graphical processing units (GPUs) ready to accelerate data-...
Conference Paper
Full-text available
Parallel programmers mandate high-level parallel programming tools allowing to reduce the effort of the efficient parallelization of their applications. Parallel programming leveraging parallel patterns has recently received renovated attention thanks to their clear functional and parallel semantics. In this work, we propose a synergy between the...
Conference Paper
Full-text available
The ubiquity of data streams in different fields of computing has led to the emergence of Stream Processing Systems (SPSs) used to program applications that extract insights from unbounded sequences of data items. Streaming applications demand various kinds of optimizations. Most of them are aimed at increasing throughput and reducing processing la...
Article
Full-text available
Time-to-solution is an important metric when parallelizing existing code. The REPARA approach provides a systematic way to instantiate stream and data parallel patterns by annotating the sequential source code with C++11 attributes. Annotations are automatically transformed in a target parallel code that uses existin...
Article
Full-text available
Continuous streaming computations are usually composed of different modules, exchanging data through shared message queues. The selection of the algorithm used to access such queues (i.e. the concurrency control) is a critical aspect both for performance and power consumption. In this paper we describe the design of automatic concurrency control al...
Conference Paper
Full-text available
We present a container-based architecture for supporting autonomic data stream processing application on fog computing infrastructures. Our architecture runs applications as Docker containers, and it exploits the native features of Docker to dynamically scale up/down the resources of a fog node assigned to the applications running on it. Preliminar...
Conference Paper
Full-text available
Benchmarking is a way to study the performance of new architectures and parallel programming frameworks. Well-established benchmark suites such as the NAS Parallel Benchmarks (NPB) comprise legacy codes that still lack portability to C++ language. As consequence, a set of high-level and easy-to-use C++ parallel programming frameworks cannot be test...
Conference Paper
Full-text available
In this work, we consider the C++ Actor Framework (CAF), a recent proposal that revamped the interest in building concurrent and distributed applications using the actor programming model in C++. CAF has been optimized for high-throughput computing, whereas message latency between actors is greatly influenced by the message data rate: at l...
Article
Full-text available
According to the recent trend in data acquisition and processing technology, big data are increasingly available in the form of unbounded streams of elementary data items to be processed in real-time. In this paper we study in detail the paradigm of sliding windows, a well-known technique for approximated queries that update their results continuou...
Article
Full-text available
High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions to develop complex parallel software with reduced time-to-solution. Pattern-based parallel programming is based on a set of composable and customizable parallel patterns used as b...
Article
Full-text available
Paradigms like Internet of Things and the most recent Internet of Everything are shifting the attention towards systems able to process unbounded sequences of items in the form of data streams. In the real world, data streams may be highly variable, exhibiting burstiness in the arrival rate and non-stationarities such as trends and cyclic behaviors...
Conference Paper
Full-text available
Power consumption management has become a major concern in software development. Continuous streaming computations are usually composed by different modules, exchanging data through shared message queues. The selection of the algorithm used to access such queues (i.e., the concurrency control) is a critical aspect for both performance and power con...
Conference Paper
Full-text available
High-level parallel programming is a de-facto standard approach to develop parallel software with reduced time to development. High-level abstractions are provided by existing frameworks as pragma-based annotations in the source code, or through pre-built parallel patterns that recur frequently in parallel algorithms, and that can be easily instant...
Article
Full-text available
The topic of Data Stream Processing is a recent and highly active research area dealing with the in-memory, tuple-by-tuple analysis of streaming data. Continuous queries typically consume huge volumes of data received at a great velocity. Solutions that persistently store all the input tuples and then perform off-line computation are impractical. R...
Article
Full-text available
We introduce a set of state access patterns suitable for managing accesses to state in parallel computations operating on streams. The state access patterns are useful for modelling typical stream parallel applications. We present a classification of the patterns according to the extent and way in which the state can be structured and accessed. We...
Article
Full-text available
Techniques to handle traffic bursts and out-of-order arrivals are of paramount importance to provide real-time sensor data analytics in domains like traffic surveillance, transportation management, healthcare and security applications. In these systems the amount of raw data coming from sensors must be analyzed by continuous queries that extract va...
Article
Full-text available
On the road to computer systems able to support the requirements of exascale applications, Chip Multi-Processors (CMPs) are equipped with an ever increasing number of cores interconnected through fast on-chip networks. To exploit such new architectures, the parallel software must be able to scale almost linearly with the number of cores available....
Conference Paper
Full-text available
High-volume data streams are straining the limits of stream processing frameworks which need advanced parallel processing capabilities to withstand the actual incoming bandwidth. Parallel processing must be synergically integrated with elastic features in order to dynamically scale the amount of utilized resources by accomplishing the Quality of Serv...
Conference Paper
We present a container-based architecture for supporting autonomic data stream processing application on Fog computing infrastructures. Our architecture runs applications as Docker containers, and it exploits Docker's native features to dynamically scale up/down the resources of a Fog node assigned to the applications running on it. Preliminary r...
Conference Paper
Full-text available
Divide-and-Conquer (DaC) is a sequential programming paradigm which models a large class of algorithms used in real-life applications. Although suitable to extract parallelism in a straightforward way, the parallel implementation of DaC algorithms still requires some expertise in parallel programming tools by the programmer. In this paper we aim at...
Article
Full-text available
The emergence of real-time decision-making applications in domains like high-frequency trading, emergency management, and service level analysis in communication networks has led to the definition of new classes of queries. Skyline queries are a notable example. Their results consist of all the tuples whose attribute vector is not dominated (in the...
Article
Full-text available
Data stream processing applications have a long running nature (24hr/7d) with workload conditions that may exhibit wide variations at run-time. Elasticity is the term coined to describe the capability of applications to change dynamically their resource usage in response to workload fluctuations. This paper focuses on strategies for elastic data st...
Article
Full-text available
Distributed data stream processing applications are structured as graphs of interconnected modules able to ingest high-speed data and to transform them in order to generate results of interest. Elasticity is one of the most appealing features of stream processing applications. It makes it possible to scale up/down the allocated computing resources...
Conference Paper
Full-text available
This paper addresses the problem of designing scaling strategies for elastic data stream processing. Elasticity allows applications to rapidly change their configuration on-the-fly (e.g., the amount of used resources) in response to dynamic workload fluctuations. In this work we face this problem by adopting the Model Predictive Control technique,...
Conference Paper
Full-text available
With the wide diffusion of parallel architectures parallelism has become an indispensable factor in the application design. However, the cost of the parallelization process of existing applications is still too high in terms of time-to-development, and often requires a large effort and expertise by the programmer. The REPARA methodology consists in...
Conference Paper
Full-text available
Skyline queries are preference queries frequently used in multi-criteria decision making to retrieve interesting points from large datasets. They return the points whose attribute vector is not dominated by any other point. Over the last years, sequential and parallel implementations over static datasets have been proposed for multiprocessors and c...
Article
Full-text available
Autonomic computing is a paradigm for building systems capable of adapting their operation when external changes occur, such as workload variations, load surges and changes in the resource availability. The optimal configuration in terms of the number of computing resources assigned to each component must be automatically adjusted to the new enviro...
Article
Full-text available
Adaptiveness is an essential feature for distributed parallel applications executed on dynamic environments like Grids and Clouds. Being adaptive means that parallel components can change their configuration at run-time (by modifying their parallelism degree or switching to a different parallel variant) to face irregular workload or to react to unc...
Conference Paper
Full-text available
Data Stream Processing (DaSP) is a paradigm characterized by on-line (often real-time) applications working on unlimited data streams whose elements must be processed efficiently "on the fly". DaSP computations are characterized by data-flow graphs of operators connected via streams and working on the received elements according to high throughpu...
Conference Paper
Full-text available
The efficient parallelization of very fine-grained computations is an old problem still challenging also on modern shared memory architectures. Scalable parallelizations are possible if the base mechanisms provided by the run-time support (for inter-thread/inter-process synchronization/communication) are carefully designed and developed on top of...
Article
Full-text available
The development of radar systems on general-purpose off-the-shelf parallel hardware represents an effective means of providing efficient implementations with reasonable realisation costs. However, the fulfilment of the required real-time constraints poses serious problems of performance and efficiency: parallel architectures need to be exploited at...
Conference Paper
Full-text available
The work proposes ffMDF, a lightweight dynamic run-time support able to achieve high performance in the execution of dense linear algebra kernels on shared-cache multi-core. ffMDF implements a dynamic macro-dataflow interpreter processing DAG graphs generated on-the-fly out of standard numeric kernel code. The experimental results demonstrate that...
Article
Full-text available
Adaptiveness in distributed parallel applications is a key feature to provide satisfactory performance results in the face of unexpected events such as workload variations and time-varying user requirements. The adaptation process is based on the ability to change specific characteristics of parallel components (e.g., their parallelism degree) and...
Conference Paper
Full-text available
Shared-memory and message-passing are two opposite models to develop parallel computations. The shared-memory model, adopted by existing frameworks such as OpenMP, represents a de-facto standard on multi-/many-core architectures. However, message-passing deserves to be studied for its inherent properties in terms of portability and flexibility as w...
Conference Paper
Full-text available
Distributed parallel applications executed on heterogeneous and dynamic environments need to adapt their configuration (in terms of parallelism degree and parallelism form for each component) in response to unpredictable factors related to the physical platform and the application semantics. On emerging Cloud computing scenarios, reconfigurations i...
Conference Paper
Full-text available
Cloud Computing is a paradigm that enables the access to a set of shared networking and computing resources and high-level platforms and services through the exploitation of virtualization technologies. On Clouds, it is of relevant importance to make applications adaptive and reconfigurable, in the sense that the optimal configuration (satisfying d...
Conference Paper
Full-text available
In adaptive distributed parallel applications the adaptation process is based on the ability to change some characteristics of parallel components, such as the parallelism form and the parallelism degree, in response to unexpected execution conditions. Although existing research work has studied this problem, it is of increasing importance to inves...
Conference Paper
Full-text available
The advent of multi-/many-core architectures demands efficient run-time supports to sustain parallel applications scalability. Synchronization mechanisms should be optimized in order to account for different scenarios, such as the interaction between threads executed on different cores as well as intra-core synchronization, i.e. involving threa...
Conference Paper
Full-text available
A central issue for parallel applications executed on heterogeneous distributed platforms (e.g. Grids and Clouds) is assuring that performance and cost parameters are optimized throughout the execution. A solution is based on providing application components with adaptation strategies able to select at run-time the best component configuration. In...
Conference Paper
Full-text available
Nowadays, a central issue for applications executed on heterogeneous distributed platforms is represented by assuring that certain performance and reliability parameters are respected throughout the system execution. A typical solution is based on supporting application components with adaptation strategies, able to select at run-time the better co...
Conference Paper
Full-text available
Programming models for Pervasive Computing applications typically include the possibility of specifying software components according to multiple alternative versions, each optimized for a certain class of computing and communication technologies. A main mechanism provided by these programming models permits to dynamically select one of the alterna...
Chapter
Full-text available
Several complex and time-critical applications require the existence of novel distributed, heterogeneous and dynamic platforms composed of a variety of fixed and mobile processing nodes and networks. Such platforms, that can be called Pervasive Mobile Grids, aim to merge the features of Pervasive Computing and High-performance Grid Computing onto a...
