
OVERVIEW

An introduction to algorithmic differentiation

Assefaw H. Gebremedhin¹ | Andrea Walther²

¹School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington
²Institute for Mathematics, Paderborn University, Paderborn, Germany

Correspondence
Assefaw H. Gebremedhin, School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA.
Email: assefaw.gebremedhin@wsu.edu

Funding information
National Science Foundation, Grant/Award Number: IIS-1553528

Abstract

Algorithmic differentiation (AD), also known as automatic differentiation, is a technology for accurate and efficient evaluation of derivatives of a function given as a computer model. The evaluations of such models are essential building blocks in numerous scientific computing and data analysis applications, including optimization, parameter identification, sensitivity analysis, uncertainty quantification, nonlinear equation solving, and integration of differential equations. We provide an introduction to AD and present its basic ideas and techniques, some of its most important results, the implementation paradigms it relies on, the connection it has to other domains including machine learning and parallel computing, and a few of the major open problems in the area. Topics we discuss include: forward mode and reverse mode of AD, higher-order derivatives, operator overloading and source transformation, sparsity exploitation, checkpointing, cross-country mode, and differentiating iterative processes.

This article is categorized under:
Algorithmic Development > Scalable Statistical Methods
Technologies > Data Preprocessing

KEYWORDS
adjoints, algorithmic differentiation, automatic differentiation, backpropagation, checkpointing, sensitivities

1 | INTRODUCTION

Efficient calculation of derivative information is an indispensable building block in numerous applications ranging from methods for solving nonlinear equations to sophisticated simulations in unconstrained and constrained optimization. Suppose a sufficiently smooth function F: R^n → R^m is given. There are a few different techniques one can use to compute, either exactly or approximately, the derivatives of F. In comparing and contrasting the different conceivable approaches, it is necessary to take into account the desired accuracy and the computational effort involved.

The finite difference method is a very simple way to obtain approximations of derivatives. Following a truncated Taylor expansion analysis, all that is needed to estimate derivatives by finite differences is evaluations of the function at different arguments: one takes the difference of the function values and divides by a step length (see Quarteroni, Sacco, & Saleri, 2000, Chapter 10.10 for details on this class of methods). The runtime complexity of approximating derivatives using finite differences grows linearly with n, the number of unknowns, which is quite often prohibitively expensive. So the finite difference method is simple, but inexact and slow. On the other hand, we note that some optimization algorithms are not greatly affected by the inexactness of the derivative information provided by a finite difference method.
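As a concrete illustration (not from the original article), the following Python/NumPy sketch approximates the Jacobian of a function F: R^n → R^m by forward differences, using (F(x + h e_j) − F(x))/h for each unit vector e_j. It requires n + 1 evaluations of F, which is the source of the linear growth in cost with n noted above; the function name fd_jacobian and the default step length are illustrative choices.

import numpy as np

def fd_jacobian(F, x, h=1e-6):
    """Approximate the Jacobian of F at x by forward differences."""
    x = np.asarray(x, dtype=float)
    f0 = np.atleast_1d(F(x))                  # one baseline evaluation F(x)
    J = np.empty((f0.size, x.size))
    for j in range(x.size):                   # one extra evaluation per unknown
        xj = x.copy()
        xj[j] += h                            # perturb the j-th coordinate
        J[:, j] = (np.atleast_1d(F(xj)) - f0) / h
    return J

# Example: F(x) = (x1*x2, sin(x1)) has exact Jacobian [[x2, x1], [cos(x1), 0]].
F = lambda x: np.array([x[0] * x[1], np.sin(x[0])])
print(fd_jacobian(F, np.array([1.0, 2.0])))   # close to [[2, 1], [0.5403, 0]]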
