OVERVIEW
An introduction to algorithmic differentiation
Assefaw H. Gebremedhin¹ | Andrea Walther²

¹School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington
²Institute for Mathematics, Paderborn University, Paderborn, Germany

Correspondence
Assefaw H. Gebremedhin, School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA.
Email: assefaw.gebremedhin@wsu.edu

Funding information
National Science Foundation, Grant/Award Number: IIS-1553528
Abstract
Algorithmic differentiation (AD), also known as automatic differentiation, is a technology for accurate and efficient evaluation of derivatives of a function given as a computer model. The evaluations of such models are essential building blocks in numerous scientific computing and data analysis applications, including optimization, parameter identification, sensitivity analysis, uncertainty quantification, nonlinear equation solving, and integration of differential equations. We provide an introduction to AD and present its basic ideas and techniques, some of its most important results, the implementation paradigms it relies on, the connection it has to other domains including machine learning and parallel computing, and a few of the major open problems in the area. Topics we discuss include: forward mode and reverse mode of AD, higher-order derivatives, operator overloading and source transformation, sparsity exploitation, checkpointing, cross-country mode, and differentiating iterative processes.
This article is categorized under:
Algorithmic Development > Scalable Statistical Methods
Technologies > Data Preprocessing
KEYWORDS
adjoints, algorithmic differentiation, automatic differentiation, backpropagation, checkpointing, sensitivities
1 | INTRODUCTION
Efficient calculation of derivative information is an indispensable building block in numerous applications ranging from methods for solving nonlinear equations to sophisticated simulations in unconstrained and constrained optimization. Suppose a sufficiently smooth function $F\colon \mathbb{R}^n \to \mathbb{R}^m$ is given. There are a few different techniques one can use to compute, either exactly or approximately, the derivatives of F. In comparing and contrasting the different conceivable approaches, it is necessary to take into account the desired accuracy and the computational effort involved.
The finite difference method is a very simple way to obtain approximations of derivatives. Following a truncated Taylor expansion analysis, all that is needed to estimate derivatives with finite differences is function evaluations at different arguments: one takes the difference of the function values and divides by a step length (see Quarteroni, Sacco, & Saleri, 2000, Chapter 10.10 for details on this class of methods). The runtime of approximating derivatives using finite differences grows linearly with n, the number of unknowns, which is quite often prohibitively expensive. The finite difference method is thus simple, but inexact and slow. On the other hand, we should also note that some optimization algorithms are not much affected by the inexactness of the derivative information provided by a finite difference method.
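For concreteness, the following minimal Python sketch (our illustration, not part of the original article) approximates the Jacobian of $F\colon \mathbb{R}^n \to \mathbb{R}^m$ by forward finite differences; the step length h and the example function at the end are assumptions chosen only for demonstration. The n + 1 function evaluations it performs illustrate the linear growth in cost with n noted above.

```python
import numpy as np

def finite_difference_jacobian(F, x, h=1e-6):
    """Approximate the Jacobian of F at x with forward differences.

    Requires n + 1 evaluations of F for n = len(x), which is the
    linear growth in n discussed in the text.
    """
    x = np.asarray(x, dtype=float)
    f0 = F(x)                       # evaluation at the base point
    J = np.empty((f0.size, x.size))
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += h              # perturb one coordinate at a time
        J[:, i] = (F(x_step) - f0) / h
    return J

# Illustrative example with F: R^2 -> R^2
F = lambda x: np.array([x[0] * x[1], np.sin(x[0]) + x[1] ** 2])
print(finite_difference_jacobian(F, [1.0, 2.0]))
```

Besides the linear cost in n, the accuracy of such an approximation hinges on the choice of h: too large a step incurs truncation error, too small a step amplifies rounding error.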