
Massimo TorquatiUniversità di Pisa | UNIPI · Department of Computer Science
Massimo Torquati
Ph.D in Computer Science, University of Pisa.
About
149
Publications
33,458
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
1,618
Citations
Citations since 2017
Introduction
Additional affiliations
July 2014 - January 2023
Publications
Publications (149)
Decentralised Machine Learning (DML) enables collaborative machine learning without centralised input data. Federated Learning (FL) and Edge Inference are examples of DML. While tools for DML (especially FL) are starting to flourish, many are not flexible and portable enough to experiment with novel systems (e.g., RISC-V), non-fully connected topol...
We present the new distributed-memory run-time system (RTS) of the C++-based open-source structured parallel programming library FastFlow . The new RTS enables the execution of FastFlow shared-memory applications written using its Building Blocks () on distributed systems with minimal changes to the original program. The changes required are all hi...
In the near future, Exascale systems will need to bridge three technology gaps to achieve high performance while remaining under tight power constraints: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetic; methods and tools for seamless integration of reconfigurable accelerators in heterogen...
We present the new distributed-memory run-time system (RTS) of the C++-based open-source structured parallel programming library FastFlow.
The new RTS enables the execution of FastFlow shared-memory applications written using its Building Blocks (BBs) on distributed systems with minimal changes to the original program. The changes required are all...
The papers in this special section focus on collecting high-quality scientific contributions from the research community working in the fields of parallel and distributed computing, data analytics algorithms and big data frameworks, application specific processing. Specifically, the main focus is on emerging new computing trends that affect the con...
To achieve high performance and high energy efficiency
on near-future exascale computing systems, three key
technology gaps needs to be bridged. These gaps include: energy
efficiency and thermal control; extreme computation efficiency
via HW acceleration and new arithmetics; methods and
tools for seamless integration of reconfigurable accelerators...
The NAS Parallel Benchmarks (NPB), originally implemented mostly in Fortran, is a consolidated suite containing several benchmarks extracted from Computational Fluid Dynamics (CFD) models. The benchmark suite has important characteristics such as intensive memory communications, complex data dependencies, different memory access patterns, and hardw...
Nowadays, we are witnessing the diffusion of Stream Processing Systems (SPSs) able to analyze data streams in near realtime. Traditional SPSs like
Storm
and
Flink
target distributed clusters and adopt the
continuous streaming model
, where inputs are processed as soon as they are available while outputs are continuously emitted. Recently, the...
High-Performance Computing (HPC) is one of the strategic priorities for research and innovation worldwide due to its relevance for industrial and scientific applications. We envision HPC as composed of three pillars: infrastructures, applications, and key technologies and tools. While infrastructures are by construction centralized in large-scale H...
This paper discusses the impact of structured parallel programming methodologies in state-of-the-art industrial and research parallel programming frameworks. We first recap the main ideas underpinning structured parallel programming models and then present the concepts of algorithmic skeletons and parallel design patterns. We then discuss how such...
Structured parallel programming models based on parallel design patterns are gaining more and more importance. Several state-of-the-art industrial frameworks build on the parallel design pattern concept, including Intel TBB and Microsoft PPL. In these frameworks, the explicit exposition of parallel structure of the application favours the identific...
The steady growth of data volume produced as continuous streams makes paramount the development of software capable of providing timely results to the users.
The Actor Model (AM) offers a high-level of abstraction suited for developing scalable message-passing applications. It allows the application developer to focus on the application logic movi...
The Actor-based programming model is largely used in the context of distributed systems for its message-passing semantics and neat separation between the concurrency model and the underlying hardware platform. However, in the context of a single multi-core node where the performance metric is the primary optimization objective, the “pure” Actor Mod...
In the last years, pattern-based programming has been recognized as a good practice for efficiently exploiting parallel hardware resources. Following this approach, multiple libraries have been designed for providing such high-level abstractions to ease the parallel programming. However, those libraries do not share a common interface. To pave the...
Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their usage in stream processing applications has increased as well. The efficient utilization of GPU accelerators in streaming scenarios requires to batch input elements in microbatches, whose computation is offloaded on...
In this work, we investigate the performance impact of using the Rust programming language instead of the C++ one to implement two basic parallel patterns as provided by the FASTFLOW parallel library. The rationale of using Rust is that it is a modern system-level language capable to statically guarantee that if a data reference is sent over a comm...
In the stream processing domain, applications are represented by graphs of operators arbitrarily connected and filled with their business logic code. The APIs of existing Stream Processing Systems (SPSs) ease the development of transformations that recur in the streaming practice (e.g., filtering, aggregation and joins). In contrast, their parallel...
This work studies the issues related to dynamic memory management in Data Stream Processing, an emerging paradigm enabling the real-time processing of live data streams. In this paper we consider two streaming parallel patterns and we discuss different implementation variants related on how dynamic memory is managed. The results show that the stand...
The Actor-based parallel programming model is having its momentum in the context of IoT and Cloud computing thanks to a clear separation between the concurrency model and the underlying HW platform. The message-driven style of non-blocking and asynchronous interactions via immutable messages among Actors fits well with the complexity of modern hete...
The stream processing paradigm is used in several scientific and enterprise applications in order to continuously compute results out of data items coming from data sources such as sensors. The full exploitation of the potential parallelism offered by current heterogeneous multi-cores equipped with one or more GPUs is still a challenge in the conte...
The “New Landscapes of the Data Stream Processing in the era of Fog Computing” special issue aims to present new research works on topics related to recent advances in Data Streaming Processing (DSP) computing paradigm in the emerging environments of Fog Computing and Internet of Things (IoT). The papers included in this special issue
are relevant...
We discuss the extended parallel pattern set identified within the EU-funded project RePhrase as a candidate pattern set to support data intensive applications targeting heterogeneous architectures. The set has been designed to include three classes of pattern, namely i) core patterns, modelling common, not necessarily data intensive parallelism ex...
Today’s stream processing systems handle high-volume data streams in an efficient manner. To achieve this goal, they are designed to
scale out
on large clusters of commodity machines. However, despite the efficient use of distributed architectures, they lack support to co-processors like graphical processing units (GPUs) ready to accelerate data-...
Parallel programmers mandate high-level parallel programming tools allowing to reduce the effort of the efficient parallelization of their applications. Parallel programming leveraging parallel patterns has recently received renovated attention thanks to their clear functional and parallel semantics.
In this work, we propose a synergy between the...
Time-to-solution is an important metric when parallelizing existing code. The REPARA approach provides a systematic way to instantiate stream and data parallel patterns by annotating the sequential source code with \({\mathtt {C}}\)++\({\mathtt {11}}\) attributes. Annotations are automatically transformed in a target parallel code that uses existin...
We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or...
Continuous streaming computations are usually composed of different modules, exchanging data through shared message queues. The selection of the algorithm used to access such queues (i.e. the concurrency control) is a critical aspect both for performance and power consumption. In this paper we describe the design of automatic concurrency control al...
The ability to teach parallel programming principles and techniques is becoming fundamental to prepare a new generation of programmers able to master the pervasive parallelism made available by hardware vendors. Classical parallel programming courses leverage either low-level programming frameworks (e.g. those based on Pthreads) or higher level fra...
We present a container-based architecture for supporting autonomic data stream processing application on fog computing infrastructures. Our architecture runs applications as Docker containers, and it exploits the native features of Docker to dynamically scale up/down the resources of a fog node assigned to the applications running on it. Preliminar...
Parallel design patterns can be fruitfully combined to develop parallel software applications. Different combinations of patterns can feature different QoS while being functionally equivalent. To support application developers in selecting the best combinations of patterns to develop their applications, we hereby propose a probabilistic approach th...
Abstract—In this work, we consider the C++ Actor Framework (CAF), a recent proposal that revamped the interest in building concurrent and distributed applications using the actor programming model in C++. CAF has been optimized for high-throughput computing, whereas message latency between actors is greatly influenced by the message data rate: at l...
This book constitutes the proceedings of the 24th International Conference on Parallel and Distributed Computing, Euro-Par 2018, held in Turin, Italy, in August 2018.
The 57 full papers presented in this volume were carefully reviewed and selected from 194 submissions. They were organized in topical sections named: support tools and environments;...
Managing low-level architectural features for controlling performance and power consumption is a growing demand in the parallel computing community. Such features include, but are not limited to: energy profiling, platform topology analysis, CPU cores disabling and frequency scaling. However, these low-level mechanisms are usually managed by specif...
According to the recent trend in data acquisition and processing technology, big data are increasingly available in the form of unbounded streams of elementary data items to be processed in real-time. In this paper we study in detail the paradigm of sliding windows, a well-known technique for approximated queries that update their results continuou...
High-level parallel programming is an active research topic aimed at promoting parallel programming
methodologies that provide the programmer with high-level abstractions to develop complex parallel software
with reduced time-to-solution. Pattern-based parallel programming is based on a set of composable and
customizable parallel patterns used as b...
Paradigms like Internet of Things and the most recent Internet of Everything are shifting the attention towards systems able to process unbounded sequences of items in the form of data streams. In the real world, data streams may be highly variable, exhibiting burstiness in the arrival rate and non-stationarities such as trends and cyclic behaviors...
Power consumption management has become a major concern in software development. Continuous streaming computations are usually composed by different modules, exchanging data through shared message queues. The selection of the algorithm used to access such queues (i.e., the concurrency control) is a critical aspect for both performance and power con...
The rapid progress of multi/many-core architectures has caused data-intensive parallel applications not yet fully optimized to deliver the best performance. In the advent of concurrent programming, frameworks offering structured patterns have alleviated developers' burden adapting such applications to multithreaded architectures. While some of thes...
High-level parallel programming is a de-facto standard approach to develop parallel software with reduced time to development. High-level abstractions are provided by existing frameworks as pragma-based annotations in the source code, or through pre-built parallel patterns that recur frequently in parallel algorithms, and that can be easily instant...
The Armadillo C++ library provides programmers with a high-level Matlab-like syntax for linear algebra. Its design aims at providing a good balance between speed and ease of use. It can be linked with different back-ends, i.e. different LAPACK-compliant libraries. In this work we present a novel run-time support of Armadillo, which gracefully exten...
We introduce a set of state access patterns suitable for managing accesses to state in parallel computations operating on streams. The state access patterns are useful for modelling typical stream parallel applications. We present a classification of the patterns according to the extent and way in which the state can be structured and accessed. We...
Since the ‘free lunch’ of processor performance is over, parallelism has become the new trend in hardware and architecture design. However, parallel resources deployed in data centers are underused in many cases, given that sequential programming is still deeply rooted in current software development. To address this problem, new methodologies and...
Techniques to handle traffic bursts and out-of-order arrivals are of paramount importance to provide real-time sensor data analytics in domains like traffic surveillance, transportation management, healthcare and security applications. In these systems the amount of raw data coming from sensors must be analyzed by continuous queries that extract va...
Power-aware computing is gaining an increasing attention both in academic and industrial settings. The problem of guaranteeing a given QoS requirement (either in terms of performance or power consumption) can be faced by selecting and dynamically adapting the amount of physical and logical resources used by the application. In this study, we consid...
The dataflow programming model has been extensively used as an effective solution to implement efficient parallel programming frameworks. However, the amount of resources allocated to the runtime support is usually fixed once by the programmer or the runtime, and kept static during the entire execution. While there are cases where such a static cho...
This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools process SPar code...
We present a container-based architecture for supporting au-tonomic data stream processing application on Fog computing infras-tructures. Our architecture runs applications as Docker containers, and it exploits Docker's native features to dynamically scale up/down the resources of a Fog node assigned to the applications running on it. Preliminary r...
We introduce a DSL based toolchain supporting the design of parallel applications where parallelism is structured after parallel design pattern compositions. A DSL provides the possibility to write high level parallel design pattern expressions representing the structure of parallel applications, to refactor the pattern expressions, to evaluate the...
In current computing systems, many applications require guarantees on their maximum power consumption to not exceed the available power budget. On the other hand, for some applications, it could be possible to decrease their performance, yet maintain an acceptable level, in order to reduce their power consumption. To provide such guarantees, a poss...
Divide-and-Conquer (DaC) is a sequential programming paradigm which models a large class of algorithms used in real-life applications. Although suitable to extract parallelism in a straightforward way, the parallel implementation of DaC algorithms still requires some expertise in parallel programming tools by the programmer.
In this paper we aim at...
Recent advances in molecular biology and bioinformatics techniques brought to an explosion of the information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organization of chromosomes at unprecedented scales, which permit to identify phys...
Background
Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments of an individual, it consists of determining which one of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association s...
We introduce a set of state access patterns suitable for managing state in embarrassingly parallel computations on streams. The state access patterns are useful to model typical stream parallel applications. We present a classification of the patterns according to the extent and way in which the state is modified. We define precisely the state acce...