Chapter

# Distributed Training of Generative Adversarial Networks for Fast Detector Simulation

In: ISC High Performance 2018 International Workshops, Frankfurt/Main, Germany, June 28, 2018, Revised Selected Papers

Authors:
• Transmutex (https://www.transmutex.com)

## Abstract

The simulation of particle interactions in High Energy Physics detectors is a compute-intensive task. Since some level of approximation is acceptable, it is possible to implement simplified fast-simulation models that are less computationally intensive. Here we present a fast simulation based on Generative Adversarial Networks (GANs). The model consists of a generative network that describes the detector response and a discriminative network, trained in an adversarial manner. The adversarial training process is compute-intensive, so a distributed approach becomes particularly important. We present scaling results of a data-parallel approach to distributing GAN training across multiple nodes on TACC’s Stampede2. The efficiency achieved was above 94% when going from 1 to 128 Xeon Scalable Processor nodes. We report on the accuracy of the generated samples and on the scaling of time-to-solution. We demonstrate how HPC installations, thanks to their large computational power and excellent connectivity, could be used to globally optimize this kind of model, leading to quicker research cycles and experimentation.
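The efficiency figure quoted above can be read as achieved speedup divided by ideal speedup. The following is a minimal sketch of that calculation, assuming this common definition of scaling efficiency; the abstract does not spell out the exact formula, and the epoch times used here are hypothetical placeholders rather than measurements from the paper.

```python
def scaling_efficiency(t_base, t_scaled, n_base, n_scaled):
    """Scaling efficiency: achieved speedup divided by ideal (linear) speedup."""
    speedup = t_base / t_scaled      # how much faster the larger run is
    ideal = n_scaled / n_base        # speedup expected from perfect scaling
    return speedup / ideal

# Hypothetical epoch times in seconds (assumptions, not taken from the paper).
t_1_node = 12800.0
t_128_nodes = 106.0

eff = scaling_efficiency(t_1_node, t_128_nodes, n_base=1, n_scaled=128)
print(f"scaling efficiency from 1 to 128 nodes: {eff:.1%}")
```

With this definition, a value of 100% corresponds to time-to-solution shrinking exactly in proportion to the number of nodes.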


... In this perspective, an efficient training process becomes essential and, therefore, we focus on optimizing the computing resources needed to train 3DGAN, studying parallelization on HPC centers and commercial cloud resources [12,13,14]. This work presents two main contributions: the results of the optimisation of 3DGAN inference and training on Intel Xeon™ Scalable Processors (Cascade Lake) and the deployment of a distributed training approach over 256 nodes. ...
... Using dedicated hardware, such as GPGPUs, the generation time could be reduced further, but we choose not to quote comparison results since the Geant4 application cannot run on GPGPUs. As explained in [12], training 3DGAN on a single Intel Xeon™ 8160 node (2 sockets, 24 cores each) required slightly less than 5 hours per epoch using an Intel optimised version of Tensorflow 1. ...
... The best results are obtained with 2 MPI processes per node (1 per socket), with tuned Tensorflow inter-op and intra-op parallelism threads and with processes pinned to separate NUMA domains in order to minimize NUMA effects. Figure 1 compares the results we obtained on the Endeavour cluster to the 2018 benchmarks described in [12]. Tensorflow optimisation improves the single node results by a factor of 2.8. ...
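The process layout described in this excerpt (one MPI process per socket, each pinned to its own NUMA domain) can be illustrated with a small sketch. The defaults of 2 sockets and 24 cores per socket are borrowed from the Xeon 8160 node mentioned in the neighbouring excerpt, and the contiguous core numbering and the helper name `socket_pinning` are assumptions made for illustration; real machines may number cores differently.

```python
def socket_pinning(sockets_per_node=2, cores_per_socket=24):
    """Return, for each local MPI rank, the core IDs it would be pinned to.

    Assumes one rank per socket and contiguous core numbering per socket
    (an assumption; check the real topology with a tool such as lscpu).
    """
    layout = {}
    for local_rank in range(sockets_per_node):
        first = local_rank * cores_per_socket
        layout[local_rank] = list(range(first, first + cores_per_socket))
    return layout

layout = socket_pinning()
for rank, cores in layout.items():
    # One plausible starting point is one intra-op thread per pinned core;
    # the excerpt only says these values were tuned, not what they were.
    print(f"local rank {rank}: cores {cores[0]}-{cores[-1]}, "
          f"candidate intra-op threads = {len(cores)}")
```

Keeping each rank's threads on a single socket avoids memory accesses that cross NUMA domains, which is the effect the excerpt aims to minimize.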
... The architecture is implemented using Keras and Tensorflow as a backend. The RMSProp [29] optimizer is used. ...
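For readers unfamiliar with the optimizer named here, the sketch below shows one RMSProp update in plain Python. It follows the commonly used form of the rule (a running average of squared gradients that rescales each step); the hyperparameter values and the function name `rmsprop_step` are illustrative assumptions and are not taken from the cited work.

```python
import math

def rmsprop_step(weights, grads, cache, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSProp update on lists of floats; returns (new_weights, new_cache)."""
    new_weights, new_cache = [], []
    for w, g, c in zip(weights, grads, cache):
        c = rho * c + (1.0 - rho) * g * g      # running average of squared gradients
        w = w - lr * g / (math.sqrt(c) + eps)  # step scaled by the RMS of recent gradients
        new_weights.append(w)
        new_cache.append(c)
    return new_weights, new_cache

# Toy problem: minimise f(w) = w^2 starting from w = 1.0 (gradient is 2w).
w, cache = [1.0], [0.0]
for _ in range(300):
    w, cache = rmsprop_step(w, [2.0 * w[0]], cache, lr=0.01)
print(w[0])  # ends up close to the minimum at 0
```

Because the step is normalised by the recent gradient magnitude, the update size stays on the order of the learning rate, so with a constant learning rate the iterate hovers near the minimum rather than converging exactly.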
... Integration with the Horovod framework and optimisation of Tensorflow and Horovod parameters in order to improve the training process performance on Intel Xeon processors is detailed in [29]: first results on scaling out the training process achieve close to 94% scaling efficiency up to 128 nodes. ...
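The data-parallel scheme referred to here can be sketched without any distributed runtime: each worker computes a gradient on its own shard, and an allreduce averages the results before every worker applies the same update. The code below simulates this in a single process on a toy linear model; `allreduce_mean` is a hypothetical stand-in for the collective operation and is not Horovod's API.

```python
def gradient(w, shard):
    """Gradient of the mean of 0.5 * (w*x - y)^2 over one data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an allreduce that averages one value per worker."""
    return sum(values) / len(values)

data = [(float(x), 3.0 * x) for x in range(1, 9)]        # toy dataset, y = 3x
n_workers = 4
shards = [data[i::n_workers] for i in range(n_workers)]  # equal-sized shards

# With equal shard sizes, the averaged gradient equals the full-batch gradient.
avg_grad = allreduce_mean([gradient(0.0, s) for s in shards])
full_grad = gradient(0.0, data)

# Synchronous data-parallel SGD: every worker applies the same averaged update.
w = 0.0
for _ in range(100):
    local_grads = [gradient(w, s) for s in shards]
    w -= 0.01 * allreduce_mean(local_grads)
print(w)  # approaches the true slope, 3.0
```

The equivalence with the full-batch gradient relies on the shards having equal size; with uneven shards the average would need to be weighted by shard size.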
Preprint
The increasing interest from the research community and industry in using Artificial Intelligence (AI) techniques to tackle "real world" problems requires High Performance Computing (HPC) resources to efficiently compute and scale complex algorithms across thousands of nodes. Unfortunately, typical data scientists are not familiar with the unique requirements and characteristics of HPC environments. They usually develop their applications with high-level scripting languages or frameworks such as TensorFlow, and the installation process often requires a connection to external systems to download open source software during the build. HPC environments, on the other hand, are often based on closed source applications that incorporate parallel and distributed computing APIs such as MPI and OpenMP, while users have restricted administrator privileges and face security restrictions such as not being allowed access to external systems. In this paper we discuss the issues associated with the deployment of AI frameworks in a secure HPC environment and how we successfully deploy AI frameworks on SuperMUC-NG with Charliecloud.
Article
The future need for simulated events by the LHC experiments and their High Luminosity upgrades is expected to increase by one or two orders of magnitude. As a consequence, research on new fast simulation solutions, including deep Generative Models, is very active and initial results look promising. We have previously reported on a prototype that we have developed, based on a three-dimensional convolutional Generative Adversarial Network, to simulate particle showers in high-granularity calorimeters. In this contribution we present improved results on a more realistic simulation. Detailed validation studies show very good agreement with Monte Carlo simulation. In particular, we show how increasing the network representational power, introducing physics-based constraints and using a transfer-learning approach for training improve the level of agreement over a large energy range.
Chapter
The increased availability of High-Performance Computing resources can enable data scientists to deploy and evaluate data-driven approaches, notably in the field of deep learning, at a rapid pace. As deep neural networks become more complex and ingest increasingly large datasets, it becomes impractical to perform the training phase on single machine instances due to memory constraints and extremely long training times. Rather than scaling up, scaling out the computing resources is a productive approach to improve performance. The paradigm of data parallelism allows us to split the training dataset into manageable chunks that can be processed in parallel. In this work, we evaluate the scaling performance of training a 3D generative adversarial network (GAN) on an IBM POWER8 cluster equipped with 12 NVIDIA P100 GPUs. The full training duration of the GAN, including evaluation, is reduced from 20 h and 16 min on a single GPU to 2 h and 14 min on all 12 GPUs. We achieve a scaling efficiency of 98.9% when scaling from 1 to 12 GPUs, taking only the training process into consideration.
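The two durations quoted in this abstract allow a quick end-to-end check. The arithmetic below uses only the wall-clock figures given above; dividing the speedup by the GPU count to obtain an end-to-end efficiency is an assumption about how one might summarise them, not a figure reported by the authors.

```python
single_gpu_min = 20 * 60 + 16   # 20 h 16 min on one GPU
multi_gpu_min = 2 * 60 + 14     # 2 h 14 min on 12 GPUs
n_gpus = 12

speedup = single_gpu_min / multi_gpu_min
end_to_end_efficiency = speedup / n_gpus
print(f"speedup {speedup:.2f}x, end-to-end efficiency {end_to_end_efficiency:.1%}")
# prints: speedup 9.07x, end-to-end efficiency 75.6%
```

The end-to-end value is lower than the 98.9% quoted for training alone, which is consistent with the abstract's remark that the full duration also includes evaluation.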
Article
The Petaflops supercomputer “Zhores”, recently launched in the “Center for Computational and Data-Intensive Science and Engineering” (CDISE) of the Skolkovo Institute of Science and Technology (Skoltech), opens up new exciting opportunities for scientific discoveries in the institute, especially in the areas of data-driven modeling, machine learning and artificial intelligence. This supercomputer utilizes the latest generation of Intel and NVIDIA processors to provide resources for the most compute-intensive tasks of the Skoltech scientists working in digital pharma, predictive analytics, photonics, material science, image processing, plasma physics and many more. Currently it places 7th in the Russian and CIS TOP-50 (2019) supercomputer list. In this article we summarize the cluster properties and discuss the measured performance and usage modes of this new scientific instrument in Skoltech.
Article
We present the first application of a three-dimensional convolutional Generative Adversarial Network to High Energy Physics simulation. We generate three-dimensional images of particles depositing energy in high granularity calorimeters. This is the first time such an approach is taken in HEP, where most of the data is three-dimensional in nature but it is customary to convert it into two-dimensional slices. The present work proves the success of using a three-dimensional convolutional GAN. Energy showers are well reproduced in all dimensions and show good agreement with standard techniques (Geant4 detailed simulation). We also demonstrate the ability to condition training on several parameters such as particle type and energy. This work aims at proving that deep learning techniques represent a valid fast alternative to standard Monte Carlo approaches. It is part of the GeantV project.
Article
Machine Learning techniques have been used in different applications by the HEP community: in this talk, we discuss the case of detector simulation. The need for simulated events, expected in the future for LHC experiments and their High Luminosity upgrades, is increasing dramatically and requires new fast simulation solutions. We will present results of several studies on the application of computer vision techniques to the simulation of detectors, such as calorimeters. We will also describe a new R&D activity, within the GeantV project, aimed at providing a configurable tool capable of training a neural network to reproduce the detector response and replace standard Monte Carlo simulation. This represents a generic approach in the sense that such a network could be designed and trained to simulate any kind of detector and, eventually, the whole data processing chain in order to obtain the final reconstructed quantities directly in one step, in a small fraction of the time. We will present the first three-dimensional images of energy showers in a high granularity calorimeter, obtained using Generative Adversarial Networks.
Article
The precise modeling of subatomic particle interactions and propagation through matter is paramount for the advancement of nuclear and particle physics searches and precision measurements. The most computationally expensive step in the simulation pipeline of a typical experiment at the Large Hadron Collider (LHC) is the detailed modeling of the full complexity of physics processes that govern the motion and evolution of particle showers inside calorimeters. We introduce CaloGAN, a new fast simulation technique based on generative adversarial networks (GANs). We apply these neural networks to the modeling of electromagnetic showers in a longitudinally segmented calorimeter, and achieve speedup factors comparable to or better than existing full simulation techniques on CPU ($100\times$-$1000\times$) and even faster on GPU (up to $\sim10^5\times$). There are still challenges for achieving precision across the entire phase space, but our solution can reproduce a variety of geometric shower shape properties of photons, positrons and charged pions. This represents a significant stepping stone toward a full neural network-based detector simulation that could save significant computing time and enable many analyses now and in the future.
Article
We present a lightweight Python framework for distributed training of neural networks on multiple GPUs or CPUs. The framework is built on the popular Keras machine learning library. The Message Passing Interface (MPI) protocol is used to coordinate the training process, and the system is well suited for job submission at supercomputing sites. We detail the software's features, describe its use, and demonstrate its performance on systems of varying sizes on a benchmark problem drawn from high-energy physics research.
Article
We provide a bridge between generative modeling in the Machine Learning community and simulated physical processes in High Energy Particle Physics by applying a novel Generative Adversarial Network (GAN) architecture to the production of jet images -- 2D representations of energy depositions from particles interacting with a calorimeter. We propose a simple architecture, the Location-Aware Generative Adversarial Network, that learns to produce realistic radiation patterns from simulated high energy particle collisions. The pixel intensities of GAN-generated images faithfully span over many orders of magnitude and exhibit the desired low-dimensional physical properties (i.e., jet mass, n-subjettiness, etc.). We shed light on limitations, and provide a novel empirical validation of image quality and validity of GAN-produced simulations of the natural world. This work provides a base for further explorations of GANs for use in faster simulation in High Energy Particle Physics.
Article
The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPU, Intel© Xeon Phi, Atom or ARM can no longer be ignored by CPU-bound HEP applications. The proof-of-concept GeantV prototype has mainly been engineered for CPUs with vector units, but from the early stages we have foreseen a bridge to arbitrary accelerators. A software layer consisting of architecture- and technology-specific backends currently supports this concept. This approach makes it possible to abstract out basic types such as scalar/vector, and also to formalize generic computation kernels that transparently use library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it insulates the core application and algorithms from the technology layer. This keeps our application maintainable in the long term and adaptable to changes on the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel© Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs.
Article
The stochastic gradient descent method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, usually $32$--$512$ data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize. There have been some attempts to investigate the cause of this generalization drop in the large-batch regime; however, the precise answer for this phenomenon is hitherto unknown. In this paper, we present ample numerical evidence that supports the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions -- and that sharp minima lead to poorer generalization. In contrast, small-batch methods consistently converge to flat minimizers, and our experiments support a commonly held view that this is due to the inherent noise in the gradient estimation. We also discuss several empirical strategies that help large-batch methods eliminate the generalization gap and conclude with a set of future research ideas and open questions.
Article
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems, and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
Article
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
Article
Two recently introduced criteria for estimation of generative models are both based on a reduction to binary classification. Noise-contrastive estimation (NCE) is an estimation procedure in which a generative model is trained to be able to distinguish data samples from noise samples. Generative adversarial networks (GANs) are pairs of generator and discriminator networks, with the generator network learning to generate samples by attempting to fool the discriminator network into believing its samples are real data. Both estimation procedures use the same function to drive learning, which naturally raises questions about how they are related to each other, as well as whether this function is related to maximum likelihood estimation (MLE). NCE corresponds to training an internal data model belonging to the *discriminator* network but using a fixed generator network. We show that a variant of NCE, with a dynamic generator network, is equivalent to maximum likelihood estimation. Since pairing a learned discriminator with an appropriate dynamically selected generator recovers MLE, one might expect the reverse to hold for pairing a learned generator with a certain discriminator. However, we show that recovering MLE for a learned generator requires departing from the distinguishability game.
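For reference, the distinguishability game mentioned above is usually written as the minimax objective of the original GAN formulation. The notation below ($G$ for the generator, $D$ for the discriminator, $p_{\text{data}}$ for the data distribution, $p_z$ for the noise prior) is the standard one and does not appear in the abstract itself.

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$

The discriminator's side of this objective is the log-likelihood of a binary classifier separating real from generated samples, which is the reduction to binary classification that the abstract refers to.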
Article
We present results from a case study comparing different multivariate classification methods. The input is a set of Monte Carlo data, generated and approximately triggered and pre-processed for an imaging gamma-ray Cherenkov telescope. Such data belong to two classes, originating either from incident gamma rays or caused by hadronic showers. There is only a weak discrimination between signal (gamma) and background (hadrons), making the data an excellent proving ground for classification techniques. The data and methods are described, and a comparison of the results is made. Several methods give results comparable in quality within small fluctuations, suggesting that they perform at or close to the Bayesian limit of achievable separation. Other methods give clearly inferior or inconclusive results. Some problems that this study cannot address are also discussed.
Article
The field of high energy physics aims to discover the underlying structure of matter by searching for and studying exotic particles, such as the top quark and Higgs boson, produced in collisions at modern accelerators. Since such accelerators are extraordinarily expensive, extracting maximal information from the resulting data is essential. However, most accelerator events do not produce particles of interest, so making effective measurements requires event selection, in which events producing particles of interest (signal) are separated from events producing other particles (background). This article studies the use of machine learning to aid event selection. First, we apply supervised learning methods, which have succeeded previously in similar tasks. However, they are suboptimal in this case because they assume that the selector with the highest classification accuracy will yield the best final analysis; this is not true in practice, as such analyses are more sensitive to some backgrounds than others. Second, we present a new approach that uses stochastic optimization techniques to directly search for selectors that maximize either the precision of top quark mass measurements or the sensitivity to the presence of the Higgs boson. Empirical results confirm that stochastically optimized selectors result in substantially better analyses. We also describe a case study in which the best selector is applied to real data from the Fermilab Tevatron accelerator, resulting in the most precise top quark mass measurement of this type to date. Hence, this new approach to event selection has already contributed to our knowledge of the top quark's mass and our understanding of the larger questions upon which it sheds light.
Article
Simulation is a key component of physics analysis in particle physics and nuclear physics. The most computationally expensive simulation step is the detailed modeling of particle showers inside calorimeters. Full detector simulations are too slow to meet the growing demands resulting from large quantities of data; current fast simulations are not precise enough to serve the entire physics program. Therefore, we introduce CaloGAN, a new fast simulation based on generative adversarial neural networks (GANs). We apply the CaloGAN to model electromagnetic showers in a longitudinally segmented calorimeter. This represents a significant stepping stone toward a full neural network-based detector simulation that could save significant computing time and enable many analyses now and in the future. In particular, the CaloGAN achieves speedup factors comparable to or better than existing fast simulation techniques on CPU ($100\times$-$1000\times$) and even faster on GPU (up to $\sim10^5\times$), and has the capability of faithfully reproducing many aspects of key shower shape variables for a variety of particle types.
Article
Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice. However, explaining why this is the case is still an open area of research. One standing hypothesis that is gaining popularity, e.g. Hochreiter & Schmidhuber (1997); Keskar et al. (2017), is that the flatness of minima of the loss function found by stochastic gradient based methods results in good generalization. This paper argues that most notions of flatness are problematic for deep models and cannot be directly applied to explain generalization. Specifically, when focusing on deep networks with rectifier units, we can exploit the particular geometry of parameter space induced by the inherent symmetries that these architectures exhibit to build equivalent models corresponding to arbitrarily sharper minima. Furthermore, if we allow a function to be reparametrized, the geometry of its parameters can change drastically without affecting its generalization properties.
Article
This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape at solutions found by gradient descent. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage this observation to construct a local entropy based objective that favors well-generalizable solutions lying in the flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Our algorithm resembles two nested loops of SGD, where we use Langevin dynamics to compute the gradient of local entropy at each update of the weights. We prove that incorporating local entropy into the objective function results in a smoother energy landscape and use uniform stability to show improved generalization bounds over SGD. Our experiments on competitive baselines demonstrate that Entropy-SGD leads to improved generalization and has the potential to accelerate training.
Article
In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.
Conference Paper
Simulations of molecular dynamics play an important role in computational chemistry and physics. Such simulations require accurate information about the state and properties of interacting systems. The computation of the water cluster potential energy surface is a complex and computationally expensive operation. Therefore, machine learning methods such as Artificial Neural Networks have recently been employed to learn and further approximate cluster potential energy surfaces. This work presents the application of another highly successful machine learning method, Support Vector Regression, to the modeling and approximation of the potential energy of water clusters as representatives of more general molecular clusters.
Article
High-level triggering is a vital component in many modern particle physics experiments. This paper describes a modification to the standard boosted decision tree (BDT) classifier, the so-called "bonsai" BDT, that has the following important properties: it is more efficient than traditional cut-based approaches; it is robust against detector instabilities, and it is very fast. Thus, it is fit-for-purpose for the online running conditions faced by any large-scale data acquisition system.
Article
We have studied the application of different classification algorithms in the analysis of simulated high energy physics data. Whereas Neural Network algorithms have become a standard tool for data analysis, the performance of other classifiers such as Support Vector Machines has not yet been tested in this environment. We chose two different problems to compare the performance of a Support Vector Machine and a Neural Net trained with back-propagation: tagging events of the type e+e- -> ccbar and the identification of muons produced in multihadronic e+e- annihilation events.