R. Tripiccione

Complutense University of Madrid, Madrid, Madrid, Spain

Publications (174) · 225.33 Total Impact Points

  • Enrico Calore, Sebastiano Fabio Schifano, Raffaele Tripiccione
    ABSTRACT: High performance computing increasingly relies on heterogeneous systems, based on multi-core CPUs tightly coupled to accelerators: GPUs or many-core systems. Programming heterogeneous systems raises new issues: reaching high sustained performance means that one must exploit parallelism at several levels; at the same time, the lack of a standard programming environment has an impact on code portability. This paper presents a performance assessment of a massively parallel and portable Lattice Boltzmann code, based on the Open Computing Language (OpenCL) and the Message Passing Interface (MPI). Exactly the same code runs on standard clusters of multi-core CPUs, as well as on hybrid clusters including accelerators. We consider a state-of-the-art Lattice Boltzmann model that accurately reproduces the thermo-hydrodynamics of a fluid in two dimensions. This algorithm has a regular structure suitable for accelerator architectures with a large degree of parallelism, but it is not straightforward to obtain a large fraction of the theoretically available performance. In this work we focus on portability of the code across several heterogeneous architectures while preserving performance, and on techniques to move data between accelerators that minimize the overheads of communication latencies. We describe the organization of the code and present and analyze performance and scalability results on a cluster of nodes based on NVIDIA K20 GPUs and Intel Xeon-Phi accelerators.
    Euro-Par 2014: Parallel Processing Workshops, Porto; 08/2014
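The data-movement techniques mentioned in this abstract revolve around halo (ghost-cell) exchange between the sub-lattices assigned to different nodes. The following pure-Python sketch is illustrative only: the function names are assumptions, and plain lists stand in for MPI ranks and device buffers.

```python
# Illustrative sketch of ghost-cell (halo) exchange for a 1D domain
# decomposition with periodic boundaries. In a real code each "part"
# lives on a different node/accelerator and the copies below become
# MPI transfers between device buffers.

def split_with_halos(lattice, nranks):
    """Split a periodic 1D lattice into nranks chunks, each padded with
    one ghost cell per side, pre-filled from the neighbouring chunks."""
    n = len(lattice)
    chunk = n // nranks
    parts = []
    for r in range(nranks):
        left = lattice[(r * chunk - 1) % n]      # ghost from left neighbour
        right = lattice[((r + 1) * chunk) % n]   # ghost from right neighbour
        parts.append([left] + lattice[r * chunk:(r + 1) * chunk] + [right])
    return parts

def exchange_halos(parts):
    """Refresh all ghost cells from the periodic neighbours
    (this copy is what MPI sends and receives replace)."""
    nranks = len(parts)
    for r in range(nranks):
        parts[r][0] = parts[(r - 1) % nranks][-2]   # last interior cell on the left
        parts[r][-1] = parts[(r + 1) % nranks][1]   # first interior cell on the right
    return parts
```

After each update of the interior cells, the halos must be refreshed before the next streaming step; overlapping this exchange with computation on interior cells is the standard way to hide the communication latencies the abstract refers to.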
  • ABSTRACT: A recent trend in scientific computing is the increasingly important role of co-processors, originally built to accelerate graphics rendering, and now used for general high-performance computing. The INFN Computing On Knights and Kepler Architectures (COKA) project focuses on assessing the suitability of co-processor boards for scientific computing in a wide range of physics applications, and on studying the best programming methodologies for these systems. Here we present in a comparative way our results in porting a Lattice Boltzmann code on two state-of-the-art accelerators: the NVIDIA K20X, and the Intel Xeon-Phi. We describe our implementations, analyze results and compare with a baseline architecture adopting Intel Sandy Bridge CPUs.
    Journal of Physics Conference Series 06/2014; 513(5):052032.
  • Source
    ABSTRACT: We study the turbulent evolution originated from a system subjected to a Rayleigh-Taylor instability with a double density at high resolution in a two-dimensional geometry using a highly optimized thermal Lattice Boltzmann code for GPUs. The novelty of our investigation stems from the initial condition, given by the superposition of three layers with three different densities, leading to the development of two Rayleigh-Taylor fronts that expand upward and downward and collide in the middle of the cell. By using high resolution numerical data we highlight the effects induced by the collision of the two turbulent fronts in the long time asymptotic regime. We also provide details on the optimized Lattice-Boltzmann code that we have run on a cluster of GPUs.
    05/2014;
  • Source
    P Ripesi, L Biferale, S F Schifano, R Tripiccione
    ABSTRACT: We study the turbulent evolution originated from a system subjected to a Rayleigh-Taylor instability with a double density at high resolution in a two-dimensional geometry using a highly optimized thermal lattice-Boltzmann code for GPUs. Our investigation's initial condition, given by the superposition of three layers with three different densities, leads to the development of two Rayleigh-Taylor fronts that expand upward and downward and collide in the middle of the cell. By using high-resolution numerical data we highlight the effects induced by the collision of the two turbulent fronts in the long-time asymptotic regime. We also provide details on the optimized lattice-Boltzmann code that we have run on a cluster of GPUs.
    Physical Review E 04/2014; 89(4-1):043022. · 2.31 Impact Factor
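The three-layer initial condition described in this abstract can be illustrated with a toy density profile (the densities and layer positions below are made-up values, not the paper's): with density increasing with height, both interfaces are Rayleigh-Taylor unstable, so two fronts develop and eventually collide at mid-cell.

```python
# Toy vertical density profile for a three-layer Rayleigh-Taylor setup:
# density increases with height, so both interfaces are unstable and two
# mixing fronts develop, later colliding at mid-cell. The numerical
# values are illustrative only.

def three_layer_density(ny, rho_bot=1.0, rho_mid=2.0, rho_top=3.0):
    """Return a 1D density profile of ny cells split into three layers."""
    third = ny // 3
    profile = []
    for y in range(ny):            # y = 0 is the bottom of the cell
        if y < third:
            profile.append(rho_bot)
        elif y < 2 * third:
            profile.append(rho_mid)
        else:
            profile.append(rho_top)
    return profile
```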
  • Source
    ABSTRACT: We perform equilibrium parallel-tempering simulations of the 3D Ising Edwards-Anderson spin glass in a field. A traditional analysis shows no signs of a phase transition. Yet, we encounter dramatic fluctuations in the behaviour of the model: Averages over all the data only describe the behaviour of a small fraction of it. Therefore we develop a new approach to study the equilibrium behaviour of the system, by classifying the measurements as a function of a conditioning variate. We propose a finite-size scaling analysis based on the probability distribution function of the conditioning variate, which may accelerate the convergence to the thermodynamic limit. In this way, we find a non-trivial spectrum of behaviours, where a part of the measurements behaves as the average, while the majority of them shows signs of scale invariance. As a result, we can estimate the temperature interval where the phase transition in a field ought to lie, if it exists. Although this would-be critical regime is unreachable with present resources, the numerical challenge is finally well posed.
    Journal of Statistical Mechanics Theory and Experiment 03/2014; 2014(5). · 1.87 Impact Factor
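The conditioning-variate analysis sketched in this abstract amounts to sorting the measurements by the conditioning variate and averaging within quantiles instead of over the full sample. A minimal sketch with assumed names, not the paper's code:

```python
# Illustrative sketch: average a measured observable separately over
# quantiles of a conditioning variate, instead of over the whole data
# set, so that qualitatively different samples are not mixed together.

def quantile_averages(measurements, variate, nquantiles):
    """Average `measurements` separately over equal-size quantiles
    of the conditioning `variate` (largest variate first)."""
    order = sorted(range(len(variate)), key=lambda i: variate[i], reverse=True)
    size = len(order) // nquantiles
    averages = []
    for q in range(nquantiles):
        idx = order[q * size:(q + 1) * size]
        averages.append(sum(measurements[i] for i in idx) / len(idx))
    return averages
```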
  • Source
    ABSTRACT: We study the off-equilibrium dynamics of the three-dimensional Ising spin glass in the presence of an external magnetic field. We have performed simulations both at fixed temperature and with an annealing protocol. Thanks to the Janus special-purpose computer, based on field-programmable gate arrays (FPGAs), we have been able to reach times equivalent to 0.01 s in experiments. We have studied the system relaxation both for high and for low temperatures, clearly identifying a dynamical transition point. This dynamical temperature is strictly positive and depends on the external applied magnetic field. We discuss different possibilities for the underlying physics, which include a thermodynamical spin-glass transition, a mode-coupling crossover, or an interpretation reminiscent of the random first-order picture of structural glasses.
    Physical Review E 03/2014; 89(3-1):032140. · 2.31 Impact Factor
  • Enrico Calore, Sebastiano Fabio Schifano, Raffaele Tripiccione
    ABSTRACT: The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one must carefully assess the relative costs of portability versus computing efficiency, and find a reasonable tradeoff point. In this paper we address precisely this issue, using as test-bench a Lattice Boltzmann code implemented in OpenCL. We analyze its performance on several different state-of-the-art processors: NVIDIA GPUs and Intel Xeon-Phi many-core accelerators, as well as more traditional Ivy Bridge and Opteron multi-core commodity CPUs. We also compare with results obtained with codes specifically optimized for each of these systems. Our work shows that a properly structured OpenCL code runs on many different systems reaching performance levels close to those obtained by architecture-tuned CUDA or C codes.
    Procedia Computer Science 01/2014; 29:40–49.
  • ABSTRACT: Spin glasses - theoretical models used to capture several physical properties of real glasses - are mostly studied by Monte Carlo simulations. The associated algorithms have a very large and easily identifiable degree of available parallelism, which can also be easily cast in SIMD form. State-of-the-art multi- and many-core processors and accelerators are therefore a promising computational platform to support these Grand Challenge applications. In this paper we port and optimize for many-core processors a Monte Carlo code for the simulation of the 3D Edwards-Anderson spin glass, focusing on a dual eight-core Sandy Bridge processor, and on a Xeon-Phi co-processor based on the new Many Integrated Core architecture. We present performance results, discuss bottlenecks preventing further performance gains and compare with the corresponding figures for GPU-based implementations and for application-specific dedicated machines.
    2013 20th International Conference on High Performance Computing (HiPC); 12/2013
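The elementary operation behind all the SIMD and many-core variants discussed in this abstract is the single-spin Metropolis update of the Edwards-Anderson model. Below is a minimal pure-Python sketch with assumed data structures (dictionaries for spins and couplings); the production codes implement the same step with bit-packed multi-spin coding and vector instructions.

```python
# Illustrative single-spin Metropolis dynamics for the 3D Edwards-Anderson
# spin glass: +-1 spins on an L^3 periodic lattice, quenched random +-1
# couplings on the links. Data structures and names are assumptions for
# clarity, not the optimized code's layout.
import math, random

def random_system(L, rng):
    """Random spins, plus random couplings on the three links leaving each site."""
    sites = [(x, y, z) for x in range(L) for y in range(L) for z in range(L)]
    spins = {s: rng.choice([-1, 1]) for s in sites}
    J = {(s, d): rng.choice([-1, 1]) for s in sites for d in range(3)}
    return spins, J

def metropolis_sweep(spins, J, L, beta, rng):
    """One full sweep of single-spin Metropolis updates."""
    steps = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    for x in range(L):
        for y in range(L):
            for z in range(L):
                h = 0
                for d, (dx, dy, dz) in enumerate(steps):
                    for sgn in (+1, -1):
                        nb = ((x + sgn * dx) % L,
                              (y + sgn * dy) % L,
                              (z + sgn * dz) % L)
                        # each link's coupling is stored on its lower site
                        site = (x, y, z) if sgn > 0 else nb
                        h += J[site, d] * spins[nb]
                dE = 2 * spins[x, y, z] * h   # energy change if we flip
                if dE <= 0 or rng.random() < math.exp(-beta * dE):
                    spins[x, y, z] *= -1
```

The checkerboard structure of this update (each spin couples only to its six neighbours) is what makes the huge, easily identifiable parallelism the abstract mentions: all spins of one sublattice can be updated simultaneously.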
  • Source
    David Mesterházy, Luca Biferale, Karl Jansen, Raffaele Tripiccione
    ABSTRACT: We present a new numerical Monte Carlo approach to determine the scaling behavior of lattice field theories far from equilibrium. The presented methods are generally applicable to systems where classical-statistical fluctuations dominate the dynamics. As an example, these methods are applied to the random-force-driven one-dimensional Burgers' equation - a model for hydrodynamic turbulence. For a self-similar forcing acting on all scales the system is driven to a nonequilibrium steady state characterized by a Kolmogorov energy spectrum. We extract correlation functions of single- and multi-point quantities and determine their scaling spectrum displaying anomalous scaling for high-order moments. Varying the external forcing we are able to tune the system continuously from equilibrium, where the fluctuations are short-range correlated, to the case where the system is strongly driven in the infrared. In the latter case the nonequilibrium scaling of small-scale fluctuations is shown to be universal.
    PoS(LATTICE 2013)05. 11/2013;
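The scaling spectra this abstract refers to are extracted from structure functions of the velocity field. A minimal sketch (assumed names, periodic 1D field stored as a plain list):

```python
# Illustrative sketch: the p-th order structure function
# S_p(r) = < |u(x + r) - u(x)|^p >, whose power-law behaviour in r gives
# the scaling exponents; anomalous scaling shows up as exponents that
# are not linear in p.

def structure_function(u, r, p):
    """S_p(r) for a periodic 1D velocity field stored as a list."""
    n = len(u)
    return sum(abs(u[(i + r) % n] - u[i]) ** p for i in range(n)) / n
```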
  • Source
    ABSTRACT: We report a high-precision finite-size scaling study of the critical behavior of the three-dimensional Ising Edwards-Anderson model (the Ising spin glass). We have thermalized lattices up to L=40 using the Janus dedicated computer. Our analysis takes into account leading-order corrections to scaling. We obtain Tc = 1.1019(29) for the critical temperature, ν = 2.562(42) for the thermal exponent, η = -0.3900(36) for the anomalous dimension and ω = 1.12(10) for the exponent of the leading corrections to scaling. Standard (hyper)scaling relations yield α = -5.69(13), β = 0.782(10) and γ = 6.13(11). We also compute several universal quantities at Tc.
    10/2013;
  • Source
    ABSTRACT: This paper describes the architecture, the development and the implementation of Janus II, a new generation application-driven number cruncher optimized for Monte Carlo simulations of spin systems (mainly spin glasses). This domain of computational physics is a recognized grand challenge of high-performance computing: the resources necessary to study in detail theoretical models that can make contact with experimental data are by far beyond those available using commodity computer systems. On the other hand, several specific features of the associated algorithms suggest that unconventional computer architectures, which can be implemented with available electronics technologies, may lead to order-of-magnitude increases in performance, reducing the time needed to carry out simulation campaigns that would take centuries on commercially available machines to values acceptable on a human scale. Janus II is one such machine, recently developed and commissioned, that builds upon and improves on the successful JANUS machine, which has been used for physics since 2008 and is still in operation today. This paper describes in detail the motivations behind the project, the computational requirements, the architecture and the implementation of this new machine, and compares its expected performance with that of currently available commercial systems.
    Computer Physics Communications 10/2013; 185(2). · 2.41 Impact Factor
  • Source
    ABSTRACT: A track reconstruction system for the trigger of the ATLAS detector at the Large Hadron Collider is described. The Fast Tracker is a highly parallel hardware system designed to operate at the Level-1 trigger output rate. It will provide high-quality tracks reconstructed over the entire inner detector by the start of processing in the Level-2 trigger. The system is based on associative memories for pattern recognition and fast FPGAs for track reconstruction. Its design and expected performance under instantaneous luminosities up to 3 × 10^34 cm^-2 s^-1 are discussed.
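The associative-memory matching step can be illustrated with a toy model: each stored pattern is a tuple of coarse bins, one per detector layer, and a pattern fires when enough of its bins contain a hit. This is a software stand-in with assumed names; the actual AM chips compare all stored patterns against each event in parallel.

```python
# Toy software model of associative-memory pattern matching: each pattern
# ("road") lists one coarse bin per detector layer; an event provides the
# set of hit bins per layer. A pattern fires when at least min_layers of
# its bins were hit. The real AM hardware evaluates every stored pattern
# simultaneously.

def match_roads(pattern_bank, event_hits, min_layers):
    """Return the indices of patterns matched by this event."""
    matched = []
    for i, pattern in enumerate(pattern_bank):
        hit_layers = sum(1 for layer, bin_id in enumerate(pattern)
                         if bin_id in event_hits[layer])
        if hit_layers >= min_layers:
            matched.append(i)
    return matched
```

Allowing `min_layers` to be slightly below the number of layers is what tolerates detector inefficiencies; the matched coarse candidates are then passed downstream for full-resolution track fitting.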
  • ABSTRACT: In this paper we address the problem of identifying and exploiting techniques that optimize the performance of large scale scientific codes on many-core processors. We consider as a test-bed a state-of-the-art Lattice Boltzmann (LB) model that accurately reproduces the thermo-hydrodynamics of a 2D-fluid obeying the equations of state of a perfect gas. The regular structure of Lattice Boltzmann algorithms makes it relatively easy to identify a large degree of available parallelism; the challenge is that of mapping this parallelism onto processors whose architecture is becoming more and more complex, both in terms of an increasing number of independent cores and - within each core - of vector instructions on longer and longer data words. We take as an example the Intel Sandy Bridge micro-architecture, which supports AVX instructions operating on 256-bit vectors; we address the problem of efficiently implementing the key computational kernels of LB codes - streaming and collision - on this family of processors; we introduce several successive optimization steps and quantitatively assess the impact of each of them on performance. Our final result is a production-ready code already in use for large scale simulations of the Rayleigh-Taylor instability. We analyze both raw performance and scaling figures, and compare with GPU-based implementations of similar codes.
    Journal of Physics Conference Series 08/2013; 454(1):2015-.
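The streaming kernel discussed in this abstract moves each population along its lattice velocity; it is pure data movement, which is why memory layout and vector instructions dominate its performance. A reduced 1D illustration in pure Python follows (the optimized codes operate on a 2D lattice with 37 populations and AVX intrinsics; names here are assumptions):

```python
# Reduced illustration of the LB streaming step: each population f_i is
# shifted along its lattice velocity c_i with periodic wrap-around.
# The real D2Q37 kernel does this on a 2D lattice for 37 populations,
# where the sparse access pattern makes memory bandwidth the bottleneck.

def stream(populations, velocities):
    """Shift each population list by its (integer) velocity, periodically."""
    n = len(populations[0])
    return [[f[(j - c) % n] for j in range(n)]
            for f, c in zip(populations, velocities)]
```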
  • Source
    ABSTRACT: The Fast Tracker (FTK) processor is an approved ATLAS upgrade that will reconstruct tracks using the full silicon tracker at the Level-1 rate (up to 100 kHz). FTK uses a completely parallel approach to read the silicon tracker information, execute the pattern matching and reconstruct the tracks. This approach, according to detailed simulation results, allows full tracking with nearly offline resolution within an execution time of 100 ms. A central component of the system is the associative memories (AM); these special devices reduce the pattern matching combinatoric problem, providing identification of coarse resolution track candidates. The system consists of a pipeline of several components whose goal is to organize and filter the data for the AM, and then to reconstruct and filter the final tracks. This document presents an overview of the system and reports the status of its different elements.
    Nuclear Instruments and Methods in Physics Research Section A Accelerators Spectrometers Detectors and Associated Equipment 01/2013; · 1.14 Impact Factor
  • ABSTRACT: Accelerators are an increasingly common option to boost performance of codes that require extensive number crunching. In this paper we report on our experience with NVIDIA accelerators to study fluid systems using the Lattice Boltzmann (LB) method. The regular structure of LB algorithms makes them suitable for processor architectures with a large degree of parallelism, such as recent multi- and many-core processors and GPUs; however, the challenge of exploiting a large fraction of the theoretically available performance of this new class of processors is not easily met. We consider a state-of-the-art two-dimensional LB model based on 37 populations (a D2Q37 model) that accurately reproduces the thermo-hydrodynamics of a 2D-fluid obeying the equation-of-state of a perfect gas. The computational features of this model make it a significant benchmark to analyze the performance of new computational platforms, since critical kernels in this code require both high memory-bandwidth on sparse memory addressing patterns and floating-point throughput. In this paper we consider two recent classes of GPU boards based on the Fermi and Kepler architectures; we describe in detail all steps done to implement and optimize our LB code and analyze its performance first on single-GPU systems, and then on parallel multi-GPU systems based on one node as well as on a cluster of many nodes; in the latter case we use CUDA-aware MPI as an abstraction layer to assess the advantages of advanced GPU-to-GPU communication technologies like GPUDirect. On our implementation, aggregate sustained performance of the most compute intensive part of the code breaks the 1 Tflops double-precision barrier on a single-host system with two GPUs.
    Computer Architecture and High Performance Computing (SBAC-PAD), 2013 25th International Symposium on; 01/2013
  • ABSTRACT: We describe the implementation of a thermal compressible Lattice Boltzmann algorithm on an NVIDIA Tesla C2050 system based on the Fermi GP-GPU. We consider two different versions, including and not including reactive effects. We describe the overall organization of the algorithm and give details on its implementations. Efficiency ranges from 25% to 31% of the double precision peak performance of the GP-GPU. We compare our results with a different implementation of the same algorithm, developed and optimized for many-core Intel Westmere CPUs.
    Computers & Fluids 01/2013; 80:55–62. · 1.47 Impact Factor

Publication Stats

2k Citations
225.33 Total Impact Points

Institutions

  • 2014
    • Complutense University of Madrid
      • Department of Theoretical physics I
      Madrid, Madrid, Spain
  • 2001–2013
    • Universita degli studi di Ferrara
      • Department of Physics and Earth Sciences
      Ferrara, Emilia-Romagna, Italy
  • 2010–2012
    • Institute for Biocomputation and Physics of Complex Systems
      Zaragoza, Aragon, Spain
    • University of Zaragoza
      • Mechanical Engineering
      Zaragoza, Aragon, Spain
  • 1997–2012
    • INFN - Istituto Nazionale di Fisica Nucleare
      Frascati, Latium, Italy
  • 1995–2011
    • University of Rome Tor Vergata
      • Dipartimento di Fisica
      Roma, Latium, Italy
  • 1984–2005
    • Università di Pisa
      • Department of Physics "E.Fermi"
      Pisa, Tuscany, Italy
  • 2002
    • CERN
      Genève, Geneva, Switzerland
  • 1983–2002
    • Scuola Normale Superiore di Pisa
      Pisa, Tuscany, Italy
  • 1987
    • The Rockefeller University
      New York City, New York, United States