Table 1 - uploaded by Vassil G Vassilev


## Source publication

Differentiation is ubiquitous in high energy physics, for instance in minimization algorithms and statistical analysis, in detector alignment and calibration, and in theory. Automatic differentiation (AD) avoids the well-known round-off and speed limitations that numerical and symbolic differentiation suffer from by transforming the source code...

## Citations

... Recent advancements of production quality compilers like Clang allow tools to reuse the language parsing infrastructure, making it easier to implement source transformation AD. ADIC [26], Enzyme [27] and Clad [28] are compiler-based AD tools using source transformation. The increasing importance of AD is evident in newer programming languages such as Swift and Julia where it is integrated deep into the language [29,30]. ...

Automatic Differentiation (AD) is instrumental for science and industry. It is a tool to evaluate the derivative of a function specified through a computer program. The range of AD application domains spans from Machine Learning to Robotics to High Energy Physics. Computing gradients with AD is guaranteed to be more precise than the numerical alternative and to require at most a small constant factor more arithmetic operations than the original function. Moreover, AD applications to domain problems are typically compute-bound: they are often limited by the computational requirements of high-dimensional parameters and thus can benefit from parallel implementations on graphics processing units (GPUs).
Clad aims to enable differential analysis for C/C++ and CUDA. It is a compiler-assisted AD tool, available both as a compiler extension and in ROOT: it works as a plugin extending the Clang compiler, as a plugin extending the interactive interpreter Cling, and as a Jupyter kernel extension based on xeus-cling.
We demonstrate the advantages of parallel gradient computations on GPUs with Clad. We explain how extending Clad to support CUDA brings forth a new layer of optimization and a proportional speedup. The gradients of well-behaved C++ functions can be automatically executed on a GPU. The library can be easily integrated into existing frameworks or used interactively. Furthermore, we demonstrate the achieved application performance improvements, including a roughly 10x speedup in ROOT histogram fitting, and the corresponding gains from offloading to GPUs.

... This could be done by following the work presented in [137] for instance, which defines a differentiable array-based programming language. Another solution would be to rely on an external system that could perform automatic differentiation at a lower level, such as Clad [148], which performs compiler-level automatic differentiation of C/C++ programs, or Tapenade [72], which works on Fortran or C programs. ...

This thesis is concerned with modelling languages aimed at assisting with modelling and simulation of systems described in terms of differential equations. These languages can be split into two classes: causal languages, where models are expressed using directed equations; and non-causal languages, where models are expressed using undirected equations. This thesis focuses on two related paradigms: functional reactive programming (FRP) and functional hybrid modelling (FHM). FRP is an approach to programming causal time-aware applications that has successfully been used in causal modelling applications, while FHM is an approach to programming non-causal modelling applications. However, both are built on similar principles, namely, the treatment of models as first-class entities, allowing for models to be parametrised by other models or computed at runtime; and support for structurally dynamic models, whose behaviour can change during the simulation. This makes FRP and FHM particularly flexible and expressive approaches to modelling, especially compared to other mainstream languages. Because of their highly expressive and flexible nature, providing efficient implementations of these languages is a challenge. This thesis explores novel implementation techniques aimed at improving the performance of existing implementations of FRP and FHM, and other expressive modelling languages built on similar ideas. In the setting of FRP, this thesis proposes a novel embedded FRP library that uses the implementation approach of synchronous dataflow languages. This allows for significant performance improvement by better handling of the reactive network's topology, which represents a large portion of the runtime in current implementations, especially for applications that make heavy use of continuously varying values, such as modelling applications. In the setting of FHM, this thesis presents the modular compilation of a language based on FHM.
Due to inherent difficulties with the simulation of systems of undirected equations, previous implementations of FHM and similarly expressive languages were either interpreted or generated code on the fly using just-in-time compilation, two techniques which have runtime overhead over ahead-of-time compilation. This thesis presents a new method for generating code for equation systems which allows for the separate compilation of FHM models. Compared with current approaches to FRP and FHM implementation, there is greater commonality between the implementation approaches described here, suggesting a possible way forward towards a future non-causal modelling language supporting FRP-like features, resulting in an even more expressive modelling language.

... Several computer codes exist in which all or some of these techniques are implemented. They include, for example, TAPENADE [38], the Stan Math Library [39], CppAD [40], CasADi [41], ADOL-C [42], Clad [43], Adept [44], and autodiff [45]. We focus below on the description of automatic differentiation techniques in which operator overloading is used and the derivatives are computed in a forward-mode approach instead of reverse-mode. ...

This work uses advanced numerical techniques (complex differentiation and automatic differentiation) to efficiently and accurately compute all the required thermodynamic properties of an equation of state without any analytical derivatives, particularly without any handwritten derivatives. It avoids the tedious and error-prone process of symbolic differentiation, thus allowing for more rapid development of new thermodynamic models. The technique presented here was tested with several equations of state (van der Waals, Peng-Robinson, Soave-Redlich-Kwong, PC-SAFT, and cubic-plus-association) and high-accuracy multifluid models. A minimal set of algorithms (critical locus tracing and vapor-liquid equilibrium tracing) were implemented in an extensible and concise open-source C++ library: teqp (for Templated EQuation of state Package). This work demonstrates that highly complicated equations of state can be implemented faster yet with minimal computational overhead and negligible loss in numerical precision compared with the traditional approach that relies on analytical derivatives. We believe that the approach outlined in this work has the potential to establish a new computational standard when implementing computer codes for thermodynamic models.

... Alternatively, AD arithmetics can be added by a modified or special compiler [79,80] or through source-to-source transformation tools [81][82][83][84]. An extensive overview of AD tools can be consulted online [85]. ...

The full optimization of the design and operation of instruments whose functioning relies on the interaction of radiation with matter is a super-human task, given the large dimensionality of the space of possible choices for geometry, detection technology, materials, data-acquisition, and information-extraction techniques, and the interdependence of the related parameters. On the other hand, massive potential gains in performance over standard, "experience-driven" layouts are in principle within our reach if an objective function fully aligned with the final goals of the instrument is maximized by means of a systematic search of the configuration space. The stochastic nature of the involved quantum processes makes the modeling of these systems an intractable problem from a classical statistics point of view, yet the construction of a fully differentiable pipeline and the use of deep learning techniques may allow the simultaneous optimization of all design parameters. In this document we lay down our plans for the design of a modular and versatile modeling tool for the end-to-end optimization of complex instruments for particle physics experiments as well as industrial and medical applications that share the detection of radiation as their basic ingredient. We consider a selected set of use cases to highlight the specific needs of different applications.

... Dedicated automatic differentiation tools capable of augmenting existing software, rather than requiring complete software rewrites, are needed. New compiler-based, source-translation AD tools, such as Enzyme [108] and Clad [109], are promising for such tasks. ...

The computational cost for high energy physics detector simulation in future experimental facilities is going to exceed the currently available resources. To overcome this challenge, new ideas on surrogate models using machine learning methods are being explored to replace computationally expensive components. Additionally, differentiable programming has been proposed as a complementary approach, providing controllable and scalable simulation routines. In this document, new and ongoing efforts for surrogate models and differentiable programming applied to detector simulation are discussed in the context of the 2021 Particle Physics Community Planning Exercise (`Snowmass').

... Moreover, it imposes constraints on the compiler and complicates porting to GPUs. Nevertheless, applying source-to-source tools built on a special compiler is promising from a practical point of view [99] and gives hope that HMC can be incorporated into future computation systems. ...

... • A study demonstrating that running AD after optimization results in significant performance gains on a standard machine learning benchmark suite [57] and achieves state-of-the-art performance. Related work: Clad is a plugin to the Clang compiler that implements forward mode automatic differentiation on a subset of C/C++ with reverse mode in development [59]. Chen et al. [11] present an end-to-end differentiable model for protein structure prediction. ...

Applying differentiable programming techniques and machine learning algorithms to foreign programs requires developers to either rewrite their code in a machine learning framework, or otherwise provide derivatives of the foreign code. This paper presents Enzyme, a high-performance automatic differentiation (AD) compiler plugin for the LLVM compiler framework capable of synthesizing gradients of statically analyzable programs expressed in the LLVM intermediate representation (IR). Enzyme synthesizes gradients for programs written in any language whose compiler targets LLVM IR including C, C++, Fortran, Julia, Rust, Swift, MLIR, etc., thereby providing native AD capabilities in these languages. Unlike traditional source-to-source and operator-overloading tools, Enzyme performs AD on optimized IR. On a machine-learning focused benchmark suite including Microsoft's ADBench, AD on optimized IR achieves a geometric mean speedup of 4.2 times over AD on IR before optimization allowing Enzyme to achieve state-of-the-art performance. Packaging Enzyme for PyTorch and TensorFlow provides convenient access to gradients of foreign code with state-of-the-art performance, enabling foreign code to be directly incorporated into existing machine learning workflows.

... In this paper, we describe the implementation of automatic differentiation techniques in ROOT, which is the data analysis framework broadly used in High-Energy Physics [4]. This implementation is based on Clad [5,6], which is an automatic differentiation plugin for computation expressed in C/C++. ...

... Automatic differentiation in ROOT is based on Clad [5,6]. Clad is a source transformation AD tool for C++. ...

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.), elementary functions (exp, log, sin, cos, etc.) and control flow statements. AD takes source code of a function as input and produces source code of the derived function. By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program. This paper presents AD techniques available in ROOT, supported by Cling, to produce derivatives of arbitrary C/C++ functions through implementing source code transformation and employing the chain rule of differential calculus in both forward mode and reverse mode. We explain its current integration for gradient computation in TFormula. We demonstrate the correctness and performance improvements in ROOT’s fitting algorithms.
