Philippe Canal's research while affiliated with Fermi National Accelerator Laboratory (Fermilab) and other places
What is this page?
This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (50)
ROOT is high energy physics' software for storing and mining data in a statistically sound way, to publish results with scientific graphics. It has been evolving for 25 years, now providing the storage format for more than one exabyte of data; virtually all high energy physics experiments use ROOT. With another significant increase in the amount of dat...
This document discusses the state, roadmap, and risks of the foundational components of ROOT with respect to the experiments at the HL-LHC (Run 4 and beyond). As foundational components, the document considers in particular the ROOT input/output (I/O) subsystem. The current HEP I/O is based on the TFile container file format and the TTree binary ev...
Within the next decade, experimental High Energy Physics (HEP) will enter a new era of scientific discovery through a set of targeted programs recommended by the Particle Physics Project Prioritization Panel (P5), including the upcoming High Luminosity Large Hadron Collider (LHC) HL-LHC upgrade and the Deep Underground Neutrino Experiment (DUNE). T...
Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider. In the early 2010s, it was projected that simulation demands would scale linearly with increasing luminosity, with only partial compensation from increasing computing resources. The extension of fa...
Celeritas is a new computational transport code designed for high-performance simulation of high-energy physics detectors. This work describes some of its current capabilities and the design choices that enable the rapid development of efficient on-device physics. The abstractions that underpin the code design facilitate low-level performance tweak...
The upcoming generation of exascale HPC machines will all have most of their computing power provided by GPGPU accelerators. In order to be able to take advantage of this class of machines for HEP Monte Carlo simulations, we started to develop a Geant pilot application as a collaboration between HEP and the Exascale Computing Project. We will use t...
Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider (LHC). In the early 2010s, the projections were that simulation demands would scale linearly with luminosity increase, compensated only partially by an increase of computing resources. The extension...
The high energy physics community is discussing where investment is needed to prepare software for the HL-LHC and its unprecedented challenges. The ROOT project has been one of the central software players in high energy physics for decades. From its experience and expectations, the ROOT team has distilled a comprehensive set of areas that should see r...
We overview recent changes in the ROOT I/O system that increase its performance and improve its interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly improve experiments' software perfo...
Efficient random number generation with high quality statistical properties and exact reproducibility of Monte Carlo simulations are important requirements in many areas of computational science. VecRNG is a package providing pseudo-random number generation (pRNG) in the context of a new library VecMath. This library bundles up several general-purp...
The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout drives rapid analyses, as only those parts ("branches") that are really used in a given analysis need to be read from storage. Its unique feature is the seamless C++ integration, which allows users to directly store their event cl...
For the last 5 years, Accelogic has pioneered and perfected a radically new theory of numerical computing codenamed "Compressive Computing", which has an extremely profound impact on real-world computer science [1]. At the core of this new theory is the discovery of one of its fundamental theorems which states that, under very general conditions, the va...
The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout drives rapid analyses, as only those parts (“branches”) that are really used in a given analysis need to be read from storage. Its unique feature is the seamless C++ integration, which allows users to directly store their event cl...
We overview recent changes in the ROOT I/O system, enhancing it by improving its performance and interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly improve experiments' software performance.
The nee...
Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the she...
Experiments at the Large Hadron Collider (LHC) produce tens of petabytes of new data in ROOT format per year that need to be processed and analysed. In the next decade, following the planned upgrades of the LHC and its detectors, this data production rate is expected to increase at least ten-fold. Therefore, optimizing the ROOT I/O subsystem is of...
In the coming years, HEP data processing will need to exploit parallelism on present and future hardware resources to sustain the bandwidth requirements. As one of the cornerstones of the HEP software ecosystem, ROOT embraced an ambitious parallelisation plan which delivered compelling results. In this contribution the strategy is characterised as...
SIMD acceleration can potentially boost application throughput by significant factors. Achieving efficient SIMD vectorization for scalar code with complex data flow and branching logic, however, goes well beyond breaking some loop dependencies and relying on the compiler. Since the refactoring effort scales with the number of lines of code, it is important t...
The development of the GeantV Electromagnetic (EM) physics package has evolved following two necessary paths towards code modernization. A first phase required the revision of the main electromagnetic physics models and their implementation. The main objectives were to improve their accuracy, extend them to the new high-energy frontier posed by the...
The Physics programmes of LHC Run III and HL-LHC challenge the HEP community. The volume of data to be handled is unprecedented at every step of the data processing chain: analysis is no exception. Physicists must be provided with first-class analysis tools which are easy to use, exploit bleeding edge hardware technologies and allow to seamlessly e...
Portable and efficient vectorization is a significant challenge in large software projects such as GeantV, ROOT, and experiments' frameworks. Nevertheless, fully exploiting SIMD parallelism will be a required step in order to bridge the widening gap between the needs and availability of computing resources for data analysis and processing in particl...
The bright future of particle physics at the Energy and Intensity frontiers poses exciting challenges to the scientific software community. The traditional strategies for processing and analysing data are evolving in order to (i) offer higher-level programming model approaches and (ii) exploit parallelism to cope with the ever increasing complexity...
In the fall of 2016, GeantV went through a thorough community evaluation of the project status and of its strategy for sharing the R&D results with the LHC experiments and with the HEP simulation community in general. Following this discussion, GeantV embarked on an ambitious 2-year roadmap aiming to deliver a beta version that has most of the...
When processing large amounts of data, the rate at which reading and writing can take place is a critical factor. High energy physics data processing relying on ROOT is no exception. The recent parallelisation of LHC experiments' software frameworks and the analysis of the ever increasing amount of collision data collected by experiments further em...
GeantV is a complex system based on the interaction of different modules needed for detector simulation, which include transport of particles in fields, physics models simulating their interactions with matter and a geometrical modeler library for describing the detector and locating the particles and computing the path length to the current volume...
The recent progress in parallel hardware architectures with deeper vector pipelines or many-cores technologies brings opportunities for HEP experiments to take advantage of SIMD and SIMT computing models. Launched in 2013, the GeantV project studies performance gains in propagating multiple particles in parallel, improving instruction throughput an...
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments.
The incarnations of the aforementioned parallelism are multi-threading, multi...
An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting latest advances in computing te...
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-t...
The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPUs, Intel® Xeon Phi, Atom, or ARM can no longer be ignored by HEP CPU-bound applications. The proof of concept GeantV prototype...
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and m...
The GeantV project is focused on the R&D of new particle transport techniques to maximize parallelism on multiple levels, profiting from the use of both SIMD instructions and co-processors for the CPU-intensive calculations specific to this type of applications. In our approach, vectors of tracks belonging to multiple events and matching different...
Thread-parallelisation and single-instruction multiple data (SIMD) "vectorisation" of software components in HEP computing have become a necessity to fully benefit from current and future computing hardware. In this context, the Geant-Vector/GPU simulation project aims to re-engineer current software for the simulation of the passage of particles th...
We present massively parallel high energy electromagnetic particle transportation through a finely segmented detector on a Graphics Processing Unit (GPU). Simulating events of energetic particle decay in a general-purpose high energy physics (HEP) detector requires intensive computing resources, due to the complexity of the geometry as well as phys...
This paper represents the vision of the members of the Fermilab Scientific Computing Division's Computational Physics Department (SCD-CPD) on the status and the evolution of various HEP software tools such as the Geant4 detector simulation toolkit, the Pythia and GENIE physics generators, and the ROOT data analysis framework. The goal of this paper...
The increases in data size and in the geographical distribution of analysis intensify the demand on the I/O subsystem. Over the last year, we greatly improved the I/O throughput, in some cases by several factors, when reading ROOT files. ROOT's improved techniques include improving the pre-existing prefetching, the automatic flushing of data buffers...
Gratia originated as an accounting system for batch systems and Linux process accounting. In production since 2006 at FNAL, it was adopted by the Open Science Grid as a distributed, grid-wide accounting system in 2007. Since adoption Gratia's next challenge has been to adapt to an explosive increase in data volume and to handle several new categori...
For the last several months the main focus of development in the ROOT I/O package has been code consolidation and performance improvements. We introduced a new pre-fetch mechanism to minimize the number of transactions between client and server, hence reducing the effect of latency on the time it takes to read a file both locally and over wide are...
High performance computing with a large code base and C++ has proved to be a good combination. But when it comes to storing data, C++ is a problematic choice: it offers no support for serialization, type definitions are amazingly complex to parse, and the dependency analysis (what does object A need to be stored?) is incredibly difficult. Neverthel...
The Parallel ROOT Facility, PROOF, enables the analysis of much larger data sets on a shorter time scale. It exploits the inherent parallelism in data of uncorrelated events via a multi-tier architecture that optimizes I/O and CPU utilization in heterogeneous clusters with distributed storage. The system provides transparent and interactive access...
Most physics analysis jobs involve multiple selection steps on the input data. These selection steps are called cuts or queries. A common strategy to implement these queries is to read all input data from files and then process the queries in memory. In many applications the number of variables used to define these queries is a relative small porti...
The parallel ROOT facility, PROOF, enables the interactive analysis of distributed data sets in a transparent way. It exploits the inherent parallelism in data sets of uncorrelated events via a multi-tier architecture that optimizes I/O and CPU utilization in heterogeneous clusters with distributed storage. On a grid, PROOF can use the available se...
In this talk we will review the major additions and improvements made to the ROOT system in the last 18 months and present our plans for future developments. The additions and improvements range from modifications to the I/O sub-system to allow users to save and restore objects of classes that have not been instrumented by special ROOT macros, to th...
Several years ago, the two major collider experiments at Fermilab (DØ and CDF) decided that new software development for Run II will be largely done in C++. The run is slated to start in 1.5 years, an aggressive time frame for a major change in development language and style. If despite the transition each experiment (and sometimes multiple grou...
Citations
... Because detector simulation is naturally an HTC problem, nontrivial effort is required to adapt the necessary computations to use computer accelerators effectively. Two ongoing projects, AdePT [50] and Celeritas [51,52], the latter led by US national laboratory personnel, have produced viable prototypes of high-performing GPU-native simulation engines. However, there is a significant burden in developing and maintaining coprocessor-friendly code, including portability to different GPU platforms and, potentially, future non-GPU architectures. ...
... A recently concluded R&D project called GeantV introduced sub-event parallelism using vectorized instructions, primarily targeting many-core CPU architectures, in a new prototype transport engine. The primary findings of this project were that the achievable speedup compared to Geant4 was limited to a factor 2 ± 0.5 and that a small percentage of the speedup actually arose from vectorization [33]. Nevertheless, the project produced several useful developments, such as VecGeom, and lessons, such as the importance of instruction cache locality. ...
... We expect the implementation of support for RNTuple to be easier than that of TTree, which it is expected to replace, thanks to its design. Data are stored in columns of fundamental types (float, int, ...) [54], similar to Apache Arrow [55], which should ease support from programming languages other than C++, like Julia. ...
... Refs. [6][7][8]. Once the storage limit is reached, one is forced to discard parts of the dataset, or only save certain features of the data. Generally, this can be done without impacting the overall scientific program of the experiments, for example by using a data selection system called trigger that only stores data satisfying certain pre-determined characteristics that ensure the dataset will be aligned with the experiment's main scientific goals. ...
... In this respect, generators should be somewhat easier to reengineer efficiently for GPUs than detector simulation software (notably Geant4 [156]), where the abundance of conditional branching of a stochastic nature may lead to "thread divergence" and poor software performance (see, for examples, Refs. [157][158][159][160][161][162]). ...
... The resulting technology has the capability to enable substantial economic and operational gains (including speedup) for High Energy and Nuclear Physics data storage/analysis. In our initial studies, a factor of nearly x4 (3.9) compression was achieved with RHIC/STAR data where ROOT compression managed only x1.4 [6]. ...
... In [55], the authors present an optimization of the SHR3 generator to solve differential equations, which outperformed the Curand XORWOW generator by 79% and 38% for uniform and normal distributions, respectively. Reference [56] compared the performance of PRNGs from the VecRNG package (VecRNG is part of VecMath, a collection of vectorized algorithms for high-energy physics applications based on the VecCore library [56,57]) with the Curand library PRNGs to generate 10^7 double-precision PRNs. They find that the Curand library implementation of the PHILOX4_32_10 generator is five times faster than the VecRNG implementation, while the MRG32k3a generators in both libraries show similar performance. ...
... Awkward RDataFrame [13] uses the C++ header-only libraries to simplify the process of just-in-time (JIT) compilation in ROOT [14]. The ak.from_rdataframe [15] function converts the selected ROOT RDataFrame [16] columns to native Awkward Arrays. The templated header-only implementation constructs the Form from the primitive data types [17]. ...
Reference: The Awkward World of Python and C++
... When processing large data volumes, the time spent on reading or writing data sometimes comes to the fore. New possibilities for parallelizing these processes are actively used in the optimization of HEP software [5][6][7]. ...
... Besides the scheduler efficiency to dispatch baskets, the global performance is highly impacted by the intrinsic efficiency of the basketizing procedure, involving gather and scatter actions as well as concurrent access. As presented in detail in Ref. [44], the main conclusion is that the basketizing dynamics strongly depends on the complexity of the workflow and on state parameters, such as number of tracks in flight, particle production budget, or percent to completion for a given event. ...
Reference: GeantV