Enrico Calore, PhD
INFN - Istituto Nazionale di Fisica Nucleare · Ferrara
About
100 Publications
12,648 Reads
1,758 Citations
Introduction
I received my BSc and MSc in Computer Engineering from the University of Padova in 2006 and 2010, respectively.
In 2014 I received my PhD in Computer Science from the University of Milan.
I was a postdoc at INFN and the University of Ferrara until 2019, and I am now a Research Engineer at INFN Ferrara.
My research interests are mainly in HPC, parallel computing, performance evaluation, scientific computing, code portability, and code optimization for performance and energy efficiency.
Additional affiliations
January 2020 - present
January 2019 - December 2019
January 2015 - December 2018
Education
January 2014
November 2010
July 2006
Publications (100)
Improper camera orientation produces convergent vertical lines (keystone distortion) and skewed horizon lines (horizon distortion) in digital pictures; a-posteriori processing is then necessary to obtain appealing pictures. We show here that, after accurate calibration, the camera's on-board accelerometer can be used to automatically generate an a...
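The geometric idea behind the accelerometer-based correction can be sketched in a few lines: at rest the sensor measures only gravity, so the camera's pitch and roll follow from the measured vector. The axis convention and the function below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def camera_tilt_from_accel(ax, ay, az):
    """Estimate camera pitch and roll (degrees) from a calibrated
    accelerometer reading at rest, when the sensor measures only gravity.
    Axis convention (an assumption): x right, y up, z out of the lens."""
    pitch = math.degrees(math.atan2(az, math.hypot(ax, ay)))
    roll = math.degrees(math.atan2(ax, ay))
    return pitch, roll

# Camera held level: gravity lies entirely along the sensor's up axis.
print(camera_tilt_from_accel(0.0, 9.81, 0.0))  # → (0.0, 0.0)
```

These two angles are what a de-keystoning step would then feed into a perspective correction.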
This paper describes a massively parallel code for a state-of-the-art thermal lattice–Boltzmann method. Our code has been carefully optimized for performance on a single GPU and for good scaling behavior across a large number of GPUs. Versions of this code have already been used for large-scale studies of convective turbulence. GPUs are beco...
Energy efficiency is becoming increasingly important for computing systems, in particular for large-scale HPC facilities. In this work we evaluate, from a user perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS) techniques, assisted by the power- and energy-monitoring capabilities of modern processors, in order to tune applications...
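As a toy illustration of the tuning idea (DVFS trades clock frequency against dynamic power), the sketch below minimizes energy-to-solution under a simple analytical model. The model and all constants are illustrative assumptions, not measurements from the paper.

```python
def energy_to_solution(f, f_max=2.0, t_compute=10.0, t_memory=5.0,
                       p_static=50.0, c_dyn=10.0):
    """Toy model: compute-bound time scales inversely with frequency f
    (GHz), memory-bound time does not; dynamic power grows roughly as
    f**3 (since supply voltage scales with f). All constants invented."""
    time = t_compute * (f_max / f) + t_memory
    power = p_static + c_dyn * f**3
    return power * time

# Scan a frequency range: for a partially memory-bound code the energy
# minimum typically sits below the maximum frequency, which is exactly
# what DVFS tuning exploits.
freqs = [1.0 + 0.1 * i for i in range(11)]  # 1.0 .. 2.0 GHz
best = min(freqs, key=energy_to_solution)
```

With these constants the minimum falls strictly inside the range, at a frequency lower than the nominal maximum.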
Nowadays, the use of hardware accelerators to boost the performance of HPC applications is a consolidated practice, and among others, GPUs are by far the most widespread. More recently, some data centers have also successfully deployed FPGA-accelerated systems, especially to boost machine learning inference algorithms. Given the growing use of mach...
Calcification of the aortic valve (CAVDS) is a major cause of aortic stenosis (AS), leading to loss of valve function, which requires substitution by surgical aortic valve replacement (SAVR) or transcatheter aortic valve intervention (TAVI). These procedures are associated with high post-intervention mortality, so the corresponding risk assessm...
Quantum Sensing is a rapidly expanding research field that finds one of its applications in Fundamental Physics, such as the search for Dark Matter. Devices based on superconducting qubits have already been successfully applied to detecting few-GHz single photons via Quantum Non-Demolition (QND) measurement. This technique allows us to perform repeatabl...
We unveil the multifractal behavior of Ising spin glasses in their low-temperature phase. Using the Janus II custom-built supercomputer, the spin-glass correlation function is studied locally. Dramatic fluctuations are found when pairs of sites at the same distance are compared. The scaling of these fluctuations, as the spin-glass coherence length...
The Computational Storage paradigm is attracting increasing interest in many applications because of the performance and the energy-efficiency improvement, given by the tight coupling of processing elements with Solid State Drives through proper interconnection fabrics. In this work, we study a computational storage architecture aimed to boost the...
Agriculture acts as a catalyst for comprehensive economic growth, boosting income levels, mitigating poverty, and combating hunger. For these reasons, it is important to monitor agricultural practices and the use of parcels carefully and automatically, to support the development of a sustainable use of natural resources. The deployment of high-resol...
Rejuvenation and memory, long considered the distinguishing features of spin glasses, have recently been proven to result from the growth of multiple length scales. This insight, enabled by simulations on the Janus~II supercomputer, has opened the door to a quantitative analysis. We combine numerical simulations with comparable experiments to intro...
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications such as image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent advances in designing DL accelerators suitable to reach the pe...
The extended principle of superposition has been a touchstone of spin-glass dynamics for almost 30 years. The Uppsala group has demonstrated its validity for the metallic spin glass, CuMn, for magnetic fields H up to 10 Oe at the reduced temperature Tr=T/Tg=0.95, where Tg is the spin-glass condensation temperature. For H>10 Oe, they observe a depar...
Memory and rejuvenation effects in the magnetic response of off-equilibrium spin glasses have been widely regarded as the doorway into the experimental exploration of ultrametricity and temperature chaos. Unfortunately, despite more than twenty years of theoretical efforts following the experimental discovery of memory and rejuvenation, these effec...
Precise assessment of calcification lesions in the Aortic Root (AR) is relevant for the success of the Transcatheter Aortic Valve Implantation (TAVI) procedure. To this end, the radiologists analyze the Cardiac Computed Tomography (CCT) scans of patients, and detect the position and extent of the calcium deposits. In this contribution, we develop a...
One of the objectives fostered in medical science is the so-called precision medicine, which requires the analysis of a large amount of survival data from patients to deeply understand treatment options. Tools like machine learning (ML) and deep neural networks are becoming a de-facto standard. Nowadays, computing facilities based on the Von Neuman...
Time-reversal symmetry is spontaneously broken in spin glasses below their glass temperature. Under such conditions, the standard assumption about the equivalence of the most standard protocols (i.e., no big difference between switching the field on or off, as it is sometimes said) is not really justified. In fact, we show here that the spin-gl...
Memory and rejuvenation effects in the magnetic response of off-equilibrium spin glasses have been widely regarded as the doorway into the experimental exploration of ultrametricity and temperature chaos (maybe the most exotic features in glassy free-energy landscapes). Unfortunately, despite more than twenty years of theoretical efforts following...
Experiments featuring non-equilibrium glassy dynamics under temperature changes still await interpretation. There is a widespread feeling that temperature chaos (an extreme sensitivity of the glass to temperature changes) should play a major role but, up to now, this phenomenon has been investigated solely under equilibrium conditions. In fact, the...
The synergy between experiment, theory, and simulations enables a microscopic analysis of spin-glass dynamics in a magnetic field in the vicinity of and below the spin-glass transition temperature T_g. The spin-glass correlation length, ξ(t, t_w; T), is analysed both in experiments and in simulations in terms of the waiting time t_w after the...
The synergy between experiment, theory, and simulations enables a microscopic analysis of spin-glass dynamics in a magnetic field in the vicinity of and below the spin-glass transition temperature $T_\mathrm{g}$. The spin-glass correlation length, $\xi(t,t_\mathrm{w};T)$, is analysed both in experiments and in simulations in terms of the waiting ti...
The correlation length ξ, a key quantity in glassy dynamics, can now be precisely measured for spin glasses both in experiments and in simulations. However, known analysis methods lead to discrepancies either for large external fields or close to the glass temperature. We solve this problem by introducing a scaling law that takes into account both...
We find a dynamic effect in the non-equilibrium dynamics of a spin glass that closely parallels equilibrium temperature chaos. This effect, that we name dynamic temperature chaos, is spatially heterogeneous to a large degree. The key controlling quantity is the time-growing spin-glass coherence length. Our detailed characterization of dynamic tempe...
The correlation length $\xi$, a key quantity in glassy dynamics, can now be precisely measured for spin glasses both in experiments and in simulations. However, known analysis methods lead to discrepancies either for large external fields or close to the glass temperature. We solve this problem by introducing a scaling law that takes into account b...
This paper presents the performance analysis for both the computing performance and the energy efficiency of a Lattice Boltzmann Method (LBM) based application, used to simulate three-dimensional multicomponent turbulent systems on massively parallel architectures for high-performance computing. Extending results reported in previous works, the ana...
We illustrate the application of quantum computing techniques to the investigation of the thermodynamical properties of a simple system, made up of three quantum spins with frustrated pair interactions and affected by a hard sign problem when treated within classical computational schemes. We show how quantum algorithms completely solve the problem...
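The frustration at the heart of this system can be illustrated with its classical Ising analogue. The paper treats quantum spins; the sketch below only shows why, with antiferromagnetic couplings on all three bonds of a triangle, no spin assignment can satisfy every bond, leaving a degenerate ground state.

```python
from itertools import product

# Classical Ising triangle with antiferromagnetic J > 0 on all three
# bonds (an illustration of frustrated pair interactions only, not the
# quantum system studied in the paper).
J = 1.0
energies = []
for s in product([-1, 1], repeat=3):
    e = J * (s[0] * s[1] + s[1] * s[2] + s[2] * s[0])
    energies.append(e)

e_min = min(energies)
degeneracy = energies.count(e_min)
print(e_min, degeneracy)  # → -1.0 6
```

Every ground state leaves exactly one bond unsatisfied, hence the six-fold degeneracy; it is this kind of frustration that makes the system hard for classical sampling.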
In recent years, the energy efficiency of HPC systems has become increasingly important for environmental, technical, and economic reasons. Several projects have investigated the use of different processors and accelerators in the quest to build systems able to achieve high energy-efficiency levels for data centers and HPC ins...
In this work we describe a method to measure the computing performance and energy-efficiency to be expected of an FPGA device. The motivation of this work is given by their possible usage as accelerators in the context of floating-point intensive HPC workloads. In fact, FPGA devices in the past were not considered an efficient option to address flo...
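A common first-order way to reason about the performance one can expect from such a device is a Roofline-style bound; the sketch below uses hypothetical FPGA figures (the peak rate and memory bandwidth are invented for illustration, not taken from this work).

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, arith_intensity):
    """Roofline-style bound: performance is limited either by the
    device's peak floating-point rate or by memory bandwidth times the
    kernel's arithmetic intensity (flop per byte moved)."""
    return min(peak_gflops, mem_bw_gbs * arith_intensity)

# Hypothetical FPGA figures (assumptions, not measurements):
# 600 GFLOP/s peak, 34 GB/s external-memory bandwidth.
assert attainable_gflops(600, 34, 1.0) == 34     # bandwidth-bound kernel
assert attainable_gflops(600, 34, 50.0) == 600   # compute-bound kernel
```

Whether a floating-point HPC kernel lands left or right of the ridge point is precisely what decides if an FPGA is a sensible accelerator for it.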
Reconfigurable computing, exploiting Field Programmable Gate Arrays (FPGA), has become of great interest for both academia and industry research thanks to the possibility of greatly accelerating a variety of applications. The interest has been further boosted by recent developments of FPGA programming frameworks, which allow designing applications at...
In this paper we report results of the analysis of the computational performance and energy efficiency of a Lattice Boltzmann method (LBM) based application on the Intel KNL family of processors. In particular, we analyse the impact of the main memory (DRAM) while using optimised memory access patterns to access data on the on-chip memory (MCDRAM) c...
Energy-efficiency is already of paramount importance for High Performance Computing (HPC) systems operation, and tools to monitor power usage and tune relevant hardware parameters are already available and in use at major supercomputing centres. On the other hand, HPC application developers and users still usually focus just on performance, even if...
This paper presents an early performance assessment of the ThunderX2, the most recent Arm-based multi-core processor designed for HPC applications. We use as benchmarks well known stencil-based LBM and LQCD algorithms, widely used to study respectively fluid flows, and interaction properties of elementary particles. We run benchmark kernels derived...
We illustrate the application of Quantum Computing techniques to the investigation of the thermodynamical properties of a simple system, made up of three quantum spins with frustrated pair interactions and affected by a hard sign problem when treated within classical computational schemes. We show how quantum algorithms completely solve the problem...
We investigate the fate of the Roberge-Weiss endpoint transition and its connection with the restoration of chiral symmetry as the chiral limit of Nf=2+1 QCD is approached. We adopt a stout staggered discretization on lattices with Nt=4 sites in the temporal direction; the chiral limit is approached maintaining a constant physical value of the stra...
We investigate the fate of the Roberge-Weiss endpoint transition and its connection with the restoration of chiral symmetry as the chiral limit of $N_f = 2+1$ QCD is approached. We adopt a stout staggered discretization on lattices with $N_t = 4$ sites in the temporal direction; the chiral limit is approached maintaining a constant physical value o...
Experiments on spin glasses can now make precise measurements of the exponent z(T) governing the growth of glassy domains, while our computational capabilities allow us to make quantitative predictions for experimental scales. However, experimental and numerical values for z(T) have differed. We use new simulations on the Janus II computer to resol...
In this contribution we measure the computing and energy performance of the recently developed DAVIDE HPC-cluster, a massively parallel machine based on IBM POWER CPUs and NVIDIA Pascal GPUs. We use as an application benchmark the OpenStaPLE Lattice QCD code, written using the OpenACC programming framework. Our code exploits the computing performan...
Energy consumption of processors and memories is quickly becoming a limiting factor in the deployment of large computing systems. For this reason, it is important to understand the energy performance of these processors and to study strategies allowing their use in the most efficient way. In this work, we focus on the computing and energy performan...
Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, allowing for application optimizations. Due to the increasing interest in the High Performance Computing (HPC) community towards energy-efficiency issues, it is of paramount importance to be able...
GPUs deliver higher performance than traditional processors, offering remarkable energy efficiency, and are quickly becoming very popular processors for HPC applications. Still, writing efficient and scalable programs for GPUs is not an easy task as codes must adapt to increasingly parallel architecture features. In this chapter, the authors descri...
Significance
The Mpemba effect, wherein an initially hotter system relaxes faster when quenched to lower temperatures than an initially cooler system, has attracted much attention. Paradoxically, its very existence is a hot topic. Using massive numerical simulations, we show unambiguously that the Mpemba effect is present in the archetypical model...
Energy consumption is increasingly becoming a limiting factor to the design of faster large-scale parallel systems, and development of energy-efficient and energy-aware applications is today a relevant issue for HPC code-developer communities. In this work we focus on energy performance of the Knights Landing (KNL) Xeon Phi, the latest many-core ar...
Energy consumption of processors and memories is quickly becoming a limiting factor in the deployment of large computing systems. For this reason, it is important to understand the energy performance of these processors and to study strategies allowing their use in the most efficient way. In this work we focus on computing and energy performance o...
The Knights Landing (KNL) is the codename for the latest generation of Intel processors based on Intel Many Integrated Core (MIC) architecture. It relies on massive thread and data parallelism, and fast on-chip memory. This processor operates in standalone mode, booting an off-the-shelf Linux operating system. The KNL peak performance is very high...
Experiments on spin glasses can now make precise measurements of the exponent $z(T)$ governing the growth of glassy domains, while our computational capabilities allow us to make quantitative predictions for experimental scales. However, experimental and numerical values for $z(T)$ have differed. We use new simulations on the Janus II computer to r...
This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved using the OpenACC parallel programming model, used to develop a code that can be compiled for several processor ar...
Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes. Due to the increasing interest in the High Performance Computing (HPC) community towards energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figu...
Varying from multi-core CPU processors to many-core GPUs, the present scenario of HPC architectures is extremely heterogeneous. In this context, code portability is increasingly important for easy maintainability of applications; this is relevant in scientific computing where code changes are numerous and frequent. In this talk we present the desig...
Energy consumption is today one of the most relevant issues in operating HPC systems for scientific applications. The use of unconventional computing systems is therefore of great interest for several scientific communities looking for a better tradeoff between time-to-solution and energy-to-solution. In this context, the performance assessment of...
We first reproduce on the Janus and Janus II computers a milestone experiment that measures the spin-glass coherence length through the lowering of free-energy barriers induced by the Zeeman effect. Secondly we determine the scaling behavior that allows a quantitative analysis of a new experiment reported in the companion Letter [S. Guchhait and R....
High-performance computing systems are more and more often based on accelerators. Computing applications targeting those systems often follow a host-driven approach in which hosts offload almost all compute-intensive sections of the code onto accelerators; this approach only marginally exploits the computational resources available on the host CPUs...
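The concurrent host/accelerator approach hinted at above ultimately reduces to a load-balancing split: give each side a share of the domain proportional to its sustained throughput so both finish together. The function name and numbers below are illustrative assumptions.

```python
def split_lattice(n_sites, host_tput, dev_tput):
    """Partition n_sites between host and accelerator in proportion to
    their sustained throughputs (sites processed per second), so that
    both sides of a concurrent host/device scheme finish their share
    at roughly the same time."""
    host_share = round(n_sites * host_tput / (host_tput + dev_tput))
    return host_share, n_sites - host_share

# Hypothetical throughputs: the device is four times faster than the host.
print(split_lattice(1000, 1.0, 4.0))  # → (200, 800)
```

In a real code the throughputs would be measured at run time and the split adjusted, but the proportionality rule is the core of the idea.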
Significance
The unifying feature of glass formers (such as polymers, supercooled liquids, colloids, granulars, spin glasses, superconductors, etc.) is a sluggish dynamics at low temperatures. Indeed, their dynamics are so slow that thermal equilibrium is never reached in macroscopic samples: in analogy with living beings, glasses are said to age....
The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this sc...
An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power efficient accelerators. Designing efficient applications for these systems has been troublesome in the past as accelerators could usually be programmed using specific programming languages, threatening maintainability, po...
Current development trends of fast processors call for an increasing number of cores, each core featuring wide vector processing units. Applications must then exploit both directions of parallelism to run efficiently. In this work we focus on the efficient use of vector instructions. These process several data-elements in parallel, and memory data...
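The memory-layout point behind efficient vector loads is the classic Array-of-Structures versus Structure-of-Arrays choice; the toy Python below mimics the two layouts to show why SoA gives the unit-stride accesses that wide vector registers need (an illustration only, not code from the work).

```python
# Array-of-Structures: the fields of one particle sit next to each
# other, so loading "all x coordinates" into a vector register needs a
# strided gather (stride 3 here).
aos = [1.0, 10.0, 100.0,   # particle 0: x, y, z
       2.0, 20.0, 200.0,   # particle 1: x, y, z
       3.0, 30.0, 300.0]   # particle 2: x, y, z

# Structure-of-Arrays: each field is contiguous, so the same load is
# unit-stride and maps directly onto wide vector units.
soa_x = aos[0::3]
soa_y = aos[1::3]
soa_z = aos[2::3]

print(soa_x)  # → [1.0, 2.0, 3.0]
```

The same transposition, done on C arrays at allocation time, is what lets a compiler auto-vectorize the inner loops.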
Whereas overt visuospatial attention is customarily measured with eye tracking, covert attention is assessed by various methods. Here we exploited SSVEPs – the oscillatory responses of the visual cortex to incoming flickering stimuli – to record the movements of covert visuospatial attention in a way operatively similar to eye tracking (at...
An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power efficient accelerators. Designing efficient applications for these systems has been troublesome in the past as accelerators could usually be programmed only using specific programming languages – such as CUDA – threatenin...
Energy efficiency is becoming more and more important in the HPC field; high-end processors are quickly evolving towards more advanced power-saving and power-monitoring technologies. On the other hand, low-power processors, designed for the mobile market, attract interest in the HPC area for their increasing computing capabilities, competitive pric...
An increasingly large number of scientific applications run on large clusters based on GPU systems. In most cases the large scale parallelism of the applications uses MPI, widely recognized as the de-facto standard for building parallel applications, while several programming languages are used to express the parallelism available in the applicatio...
Accelerators are quickly emerging as the leading technology to further boost computing performances; their main feature is a massively parallel on-chip architecture. NVIDIA and AMD GPUs and the Intel Xeon-Phi are examples of accelerators available today. Accelerators are power-efficient and deliver up to one order of magnitude more peak performance...
Many scientific software applications that solve complex compute- or data-intensive problems, such as large parallel simulations of physics phenomena, increasingly use HPC systems in order to achieve scientifically relevant results. An increasing number of HPC systems adopt heterogeneous node architectures, combining traditional multi-core CPUs wit...
2⁺ and 1⁻ states in ⁹⁰Zr were populated via the (¹⁷O, ¹⁷O′γ) reaction at 340 MeV. The gamma decay was measured with high resolution using AGATA (the Advanced GAmma Tracking Array demonstrator). Differential cross sections were obtained at a few different angles for the scattered particle. The results of the elastic scattering and in...
The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task, that often requires the use of specific programming environments. Programming frameworks supporting codes por...
Brain-Computer Interfaces (BCIs) implement a direct communication pathway between the brain of a user and an external device, such as a computer or a machine in general. One of the most used brain responses to implement non-invasive BCIs is the so-called steady-state visually evoked potential (SSVEP). This periodic response is generated when a user ga...
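As a minimal stand-in for SSVEP-based detection (not the classifier used in this work), the sketch below recovers the stimulation frequency of a synthetic oscillatory response with a direct DFT: the flicker frequency with the largest spectral power is taken as the one the user is attending.

```python
import math

def dominant_frequency(signal, fs):
    """Return the frequency (Hz) with the largest spectral power,
    estimated with a direct DFT over the positive-frequency bins.
    A toy stand-in for practical SSVEP classifiers."""
    n = len(signal)
    best_k, best_p = 0, 0.0
    for k in range(1, n // 2):
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        p = re * re + im * im
        if p > best_p:
            best_k, best_p = k, p
    return best_k * fs / n

# Synthetic 12 Hz SSVEP-like response, sampled at 256 Hz for one second.
fs, f_stim = 256, 12
eeg = [math.sin(2 * math.pi * f_stim * t / fs) for t in range(fs)]
print(dominant_frequency(eeg, fs))  # → 12.0
```

Real EEG is noisy and multi-channel, so practical systems add filtering and averaging, but the frequency-tagging principle is the same.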
http://hdl.handle.net/10077/10529
We sought to provide direct evidence of the attention movements during dynamic mental imagery. Observers extrapolated in imagery the horizontal motion of a target with the gaze in central fixation. We recorded the steady-state visual-evoked potentials (SSVEP) generated by flickering the left and right sides of the...
An increasing number of massively parallel machines adopt heterogeneous node architectures combining traditional multicore CPUs with energy-efficient and fast accelerators. Programming heterogeneous systems can be cumbersome and designing efficient codes often becomes a hard task. The lack of standard programming frameworks for accelerator based...
High performance computing increasingly relies on heterogeneous systems, based on multi-core CPUs, tightly coupled to accelerators: GPUs or many core systems. Programming heterogeneous systems raises new issues: reaching high sustained performances means that one must exploit parallelism at several levels; at the same time the lack of a standard pr...