6 Calculus on Computational Graphs
6.1 Introduction
Fast, accurate and reliable computational schemes are essential when
implementing the complex systems required in deep learning applications. One
of the techniques for achieving this is the so-called computational graph.
Computational graphs break a complex computation down into small,
executable steps that could be performed quickly with pencil and paper,
and better still with computers. In many cases, loops that repeat the same
algorithm, and would otherwise waste computation time in their processing,
become much easier to handle. Computational graphs ease the training
of neural networks with the gradient descent algorithm, making them many times
faster than traditional implementations of neural networks.
Computational graphs have also found applications in weather forecasting
by reducing the associated computation time. Their strength is the fast
computation of derivatives, a technique also known as "reverse-mode
differentiation."
Beyond its use in deep learning, backpropagation is a powerful
computational tool in many other areas such as weather forecasting and the
analysis of numerical stability. In many ways, computational graph theory
is similar to logic gate operations in digital circuits, where dedicated logic
operations are undertaken with gates such as AND, OR, NOR and NAND in
the implementation of many binary operations. While the use of logic gates
leads to complex systems such as multiplexers, adders, multipliers and more
complex digital circuits, computational graphs have found their way into deep
learning operations involving derivatives, additions, scaling and multiplications
of real numbers by simplifying the operations.
6.1.1 Elements of Computational Graphs
Computational graphs are a useful means of breaking down complex
mathematical computations and operations into micro-computations, thereby
making them much easier to solve in a sequential manner. They also
make it easier to track computations and to understand where solutions break
down. A computational graph is a connection of links and nodes at which
operations take place. Nodes represent variables, and links represent functions
and operations.
The range of operations includes addition, multiplication, subtraction,
exponentiation and many more operations not mentioned here. Consider
Figure 6.1: the three nodes represent three variables a, b and c. The variable c
is the result of the operation of the function f on a and b. This means we can
write the result as

c = f(a, b)    (6.1)

Computational graphs allow nesting of operations, which makes it possible to
solve more complex problems. Consider the following nesting of operations,
shown as a computational graph in Figure 6.2 and written as Equation (6.2),
where u = f(x), v = g(u) and y = h(v).
Clearly, from these three operations, we see that it is a lot easier to
undertake the more complex operation when y is computed as in the equation

y = h(g(f(x)))    (6.2)

From Figure 6.2, it is apparent that the first operation, f(x), is performed first,
followed by the second operation g(·), and lastly the h(·) operation. Equally,
from the point of view of Equation (6.2), the innermost operation f(x) is
performed first, followed by g(·), with h(·) as the last operation.
Figure 6.1 A computational graph in which nodes a and b feed the operation f to produce c.
Figure 6.2 Nested operations u = f(x), v = g(u), y = h(v) forming the chain x → u → v → y.
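As a small illustration (an added sketch, not part of the original text), the nesting of Figure 6.2 can be evaluated step by step in Python; the concrete functions f, g and h below are arbitrary placeholders chosen only to make the chain runnable.

    # Nested operations of Figure 6.2; f, g and h are arbitrary placeholder choices.
    def f(x):
        return x + 2        # innermost operation
    def g(u):
        return 3 * u        # second operation
    def h(v):
        return v ** 2       # outermost (last) operation

    x = 1.0
    u = f(x)                # u = f(x)
    v = g(u)                # v = g(u)
    y = h(v)                # y = h(v) = h(g(f(x)))
    print(u, v, y)          # 3.0 9.0 81.0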
Consider the case where f is the operation of addition (+). The result c is
then given by the expression

c = (a + b)

When the operation is multiplication (×), the result is given by the expression

c = (a × b)

Since f is given as a general operator, we may use any operator in the diagram
and write an equivalent expression for c.
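As a small illustrative sketch (not from the original text), the operator f can be treated as an interchangeable function object in Python, so the same graph evaluates differently depending on which operation is plugged in.

    import operator

    a, b = 2, 3
    f = operator.add        # choose addition as the operation f
    print(f(a, b))          # 5, i.e. c = a + b
    f = operator.mul        # swap in multiplication
    print(f(a, b))          # 6, i.e. c = a * b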
6.2 Compound Expressions
For a computer to use compound expressions with a computational graph
efficiently, it is essential to factor expressions into unit cells. For example, the
expression r = p × q = (x + y)(y + 1) can first be reduced to two unit
cells, or two terms, followed by computation of the product of the terms. The
product term is r = p × q.

r = p × q = (x + y)(y + 1) = xy + y² + x + y
p = x + y
q = y + 1
Each computational operation or component is created, and a graph is then
built out of them by connecting them appropriately with arrows. The arrows
originate from the terms used to build a unit term and end at that term, as
in Figure 6.3.
This form of abstraction is extremely useful in building neural networks
and deep learning frameworks. In fact, it is also useful in the programming
of expressions, for example in operations involving parallel support vector
machines (PSVM). In Figure 6.4, once the values at the input (root) nodes of
the computational graph are known, the solution of the expression becomes
easy and almost trivial. Take the case where x = 3 and y = 4; the resulting
solutions are shown in Figure 6.5.
The evaluation of the compound expression is 35.
Figure 6.3 A computational graph assembled from the unit cells of the compound expression.
Figure 6.4 Computational graphs of compound expressions.
Figure 6.5 Evaluating compound expressions.
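A minimal Python sketch (an illustrative addition, not from the text) treats each unit cell as its own small function and wires their outputs together, reproducing the value 35 shown in Figure 6.5.

    # Unit cells for the compound expression r = (x + y)(y + 1).
    def add(a, b):
        return a + b
    def mul(a, b):
        return a * b

    x, y = 3, 4
    p = add(x, y)       # p = x + y = 7
    q = add(y, 1)       # q = y + 1 = 5
    r = mul(p, q)       # r = p * q = 35
    print(r)            # 35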
6.3 Computing Partial Derivatives
One of the areas where computational graphs are used widely in neural
network applications is the computation of derivatives of variables and
functions in simple form. For derivatives, the graph simplifies the use of the
chain rule. Consider computing the partial derivative of y with respect to b, where

y = (a + b) × (b − 3) = c × d
c = (a + b);  d = (b − 3)    (6.3)
The partial derivative of y with respect to b is

∂y/∂b = ∂y/∂c × ∂c/∂b + ∂y/∂d × ∂d/∂b    (6.4)
Figure 6.6 Computational graph for Equation (6.3).
Figure 6.7 Computation of partial derivatives.
The computational graph for Equation (6.3) is given in
Figure 6.6. The partial derivative is given in Equation (6.4). From
Equation (6.3), the partial derivatives are given as

∂y/∂c = d = (b − 3);  ∂y/∂d = c = (a + b);
∂c/∂a = 1;  ∂c/∂b = 1;  ∂d/∂b = 1;
∂y/∂b = (b − 3) × 1 + (a + b) × 1    (6.5)
These derivatives are superimposed on the computational graph in Figure 6.7.
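As a quick numerical check (added here for illustration), the chain-rule result of Equation (6.5) can be compared with a central finite-difference estimate; the values a = 2 and b = 5 are arbitrary.

    # Verify dy/db = (b - 3) * 1 + (a + b) * 1 from Equation (6.5).
    def y_fn(a, b):
        return (a + b) * (b - 3)

    a, b = 2.0, 5.0
    analytic = (b - 3) * 1 + (a + b) * 1                         # Equation (6.5)
    eps = 1e-6
    numeric = (y_fn(a, b + eps) - y_fn(a, b - eps)) / (2 * eps)  # finite difference
    print(analytic, round(numeric, 6))                           # 9.0 9.0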
6.3.1 Partial Derivatives: Two Cases of the Chain Rule
The chain rule may be applied in various circumstances. Two cases are of
interest: the linear case (Figure 6.8) and the loop case (Figure 6.9). These two
cases are illustrated in this section.
Figure 6.8 Linear chain rule: Δx → Δy → Δz.
Figure 6.9 Loop chain rule.
6.3.1.1 Linear chain rule
The objective in the linear case is to find the derivative of the output z with
respect to the input x, that is dz/dx. The output depends on both y and
x, and hence it is expected that the derivative of z will also involve these
two variables; the partial derivative of y with respect to x is therefore needed
to account for this. From the diagram, we can write generally

z = f(x);  z = h(y);  y = g(x)    (6.6)

Therefore, the derivative of z with respect to x must be taken through y as well,
and is a product of two terms:

dz/dx = ∂z/∂y × ∂y/∂x    (6.7)

Thus, the derivative of z with respect to x is computed as a product of the two
partial derivatives of the variables leading to it. This is shown in Figure 6.8.
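A short sketch of Equation (6.7), with arbitrarily chosen functions g(x) = x² and h(y) = 3y, multiplies the two local derivatives along the chain.

    # Linear chain rule dz/dx = dz/dy * dy/dx for y = x**2 and z = 3*y.
    x = 2.0
    y = x ** 2              # y = g(x)
    dy_dx = 2 * x           # local derivative of g
    dz_dy = 3.0             # local derivative of h
    dz_dx = dz_dy * dy_dx   # Equation (6.7)
    print(dz_dx)            # 12.0, matching d(3*x**2)/dx = 6*x at x = 2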
6.3.1.2 Loop chain rule
The loop chain rule in Figure 6.9 is an application of the linear chain rule:
each loop is treated with a linear chain rule. Consider the loop diagram in
Figure 6.9. The objective is to find the derivative of the output z by applying
the linear chain rule along the two arms of the loop and summing them.
z is a function of s, z = f(s), through two branches involving x and y,
respectively. In the upper branch, x = g(s) and z = h(x). In the lower
Figure 6.10 Multiple loop chain rule.
branch, y = g(s) and z = h(y). Two branches contribute to the value of z, so that
z = p(x, y). Therefore, there will also be a sum of partial derivatives coming
from the two branches. The derivative of z with respect to s is obtained as

dz/ds = ∂z/∂x × dx/ds + ∂z/∂y × dy/ds    (6.8)
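Assuming, purely for illustration, the branches x = s² and y = 3s feeding z = x × y, Equation (6.8) becomes a sum of two branch contributions.

    # Loop chain rule (Equation (6.8)) with assumed branches x = s**2, y = 3*s and z = x*y.
    s = 2.0
    x, y = s ** 2, 3 * s
    dz_dx, dz_dy = y, x                      # partials of z = x * y
    dx_ds, dy_ds = 2 * s, 3.0                # branch derivatives
    dz_ds = dz_dx * dx_ds + dz_dy * dy_ds    # sum over the two branches
    print(dz_ds)                             # 36.0, since z = 3*s**3 gives dz/ds = 9*s**2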
6.3.1.3 Multiple loop chain rule
Generally, if z is computed from N loops so that it is a function of N variables,
z = k(x1, x2, . . . , xN), then N branches contribute to the output z.
Therefore, the total derivative of z with respect to the input is a sum of N
chain-rule products.
The general partial derivative expression, illustrated in Figure 6.10, is

dz/ds = ∂z/∂x1 × dx1/ds + ∂z/∂x2 × dx2/ds + · · · + ∂z/∂xN × dxN/ds
      = Σ_{n=1}^{N} ∂z/∂xn × dxn/ds    (6.9)
The general case is more suited to deep learning situations where there are
many stages in the neural network and many branches are also involved.
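Equation (6.9) translates directly into a loop over branches. The sketch below uses the assumed example xn = sⁿ with z = x1 + x2 + · · · + xN, so that every ∂z/∂xn equals 1.

    # Multiple-loop chain rule (Equation (6.9)): sum the contribution of every branch.
    s, N = 2.0, 3
    dz_dxn = [1.0] * N                                     # partials of z w.r.t. each x_n
    dxn_ds = [n * s ** (n - 1) for n in range(1, N + 1)]   # branch derivatives dx_n/ds
    dz_ds = sum(p * d for p, d in zip(dz_dxn, dxn_ds))     # Equation (6.9)
    print(dz_ds)                                           # 17.0, i.e. d(s + s**2 + s**3)/ds at s = 2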
6.4 Computing of Integrals
In the following subsections, we introduce the use of computational graphs for
computing integrals using some of the well-known traditional approaches,
including the trapezoidal and Simpson rules.
6.4.1 Trapezoidal Rule
Integration is traditionally the finding of the area under a curve. This area has
two dimensions: the discrete step used in sampling the function, multiplied by
the amplitude of the function at that discrete step. Thus, the integral of the
function f(x) may be computed using the trapezoidal rule with the expression

∫_a^b f(x) dx ≈ (Δx/2)[f(x_0) + 2f(x_1) + 2f(x_2) + · · · + 2f(x_{N−1}) + f(x_N)]
             = (Δx/2)[f(x_0) + f(x_N)] + Δx Σ_{i=1}^{N−1} f(x_i)    (6.10)

so that Δx = (b − a)/N and x_i = a + iΔx, where N is the number of discrete sampling steps.
Therefore, the computational graph for this integration method can easily be
drawn by first computing the values of the function at the N + 1 locations,
from zero to N, in discrete steps of Δx. Just how big N is will be determined by
some error bound, which is traditionally derived to be

|E| ≤ K₂(b − a)³/(12N²)

where K₂ is the maximum absolute value of the second derivative of the function
f(x) on [a, b]. This sets the bound on the error in the integration value obtained
by using the trapezoidal rule for the function. Thus, once the choice of N is
made, an error bound has been set for the result of the integration. This error
may be reduced by changing the value of N, the number of terms in the
summation. Figure 6.11 shows how to compute an integral using the trapezoidal rule.

Figure 6.11 Integration using the trapezoidal rule.
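A direct Python rendering of Equation (6.10) (a sketch, with an arbitrary test integrand) accumulates the sampled function values as a chain of small additions, in the spirit of Figure 6.11.

    # Trapezoidal rule of Equation (6.10).
    def trapezoid(f, a, b, N):
        dx = (b - a) / N
        total = 0.5 * (f(a) + f(b))      # end-point terms f(x_0) and f(x_N)
        for i in range(1, N):            # interior samples x_i = a + i*dx
            total += f(a + i * dx)
        return dx * total

    # Example: integral of x**2 on [0, 1]; the exact value is 1/3.
    print(trapezoid(lambda x: x ** 2, 0.0, 1.0, 100))    # approximately 0.33335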
6.4.2 Simpson Rule
The Simpson rule for computing the integral of a function follows the same
type of method as used for the trapezoidal rule, with two exceptions: the
summation expression is different, and the number of subintervals N must be
even. The rule is

∫_a^b f(x) dx ≈ (Δx/3)[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + · · · + 2f(x_{N−2}) + 4f(x_{N−1}) + f(x_N)]
             = (Δx/3)[f(x_0) + f(x_N)] + (4Δx/3) Σ_{i=0}^{(N/2)−1} f(x_{2i+1}) + (2Δx/3) Σ_{i=1}^{(N/2)−1} f(x_{2i})    (6.11)
so that Δx = (b − a)/N and x_i = a + iΔx. Therefore, the computational graph for
this integration method can easily be drawn by first computing the values of
the function at the N + 1 locations, from zero to N, in discrete steps of Δx.
Just how big N is will be determined by some error bound, which is
traditionally derived to be

|E| ≤ K₄(b − a)⁵/(180N⁴)

where K₄ is the maximum absolute value of the fourth derivative of the function
f(x) on [a, b]. This sets the bound on the error in the integration value obtained
by using the Simpson rule for the function. Once the choice of N is made, an error bound has been
set for the result of the integration. This error may be reduced by changing
the value of N, the number of terms in the summation.
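A corresponding sketch of Equation (6.11) follows; the test integrand is arbitrary, and Simpson's rule is exact for cubics, which makes the check easy.

    # Simpson rule of Equation (6.11); the number of steps N must be even.
    def simpson(f, a, b, N):
        if N % 2:
            raise ValueError("N must be even for the Simpson rule")
        dx = (b - a) / N
        total = f(a) + f(b)
        for i in range(1, N):
            total += (4 if i % 2 else 2) * f(a + i * dx)   # 4 on odd, 2 on even samples
        return dx * total / 3

    # Example: integral of x**3 on [0, 2]; the exact value is 4.
    print(simpson(lambda x: x ** 3, 0.0, 2.0, 10))    # 4.0 (up to rounding)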
Exercise: Draw the computational graph for the Simpson Rule for integrating
a function.
6.5 Multipath Compound Derivatives
In multipath differentiation, as used in neural network applications, there is
tandem differentiation: results from previous steps affect the derivative at
the current node. For example, in Figure 6.12 the derivative of Y is
influenced by the derivative of X in the forward path.
In the reverse path, the derivative of Z affects the derivative of Y. Let us
look at these two cases of multipath differentiation. In Figure 6.12,
the weights or factors by which each path derivative affects the next node are
shown with arrows and variables.
Multipath Forward Differentiation
In the discussion, we limit the number of paths to three, with the
understanding that the number of paths is limitless and depends on the
application. Observe the dependence of the partial derivatives on the weights
carried over from the previous node.
In Figure 6.13, the derivative of Z with respect to X depends
on the derivative of Y with respect to X. Notice that the starting point is the
derivative of the variable X with respect to X itself.
Figure 6.12 Multipath differentiation.
Figure 6.13 Multipath forward differentiation.
Figure 6.14 Multipath reverse differentiation.
Multipath Reverse Differentiation
In the reverse (backward) path dependence, Figure 6.14 shows that the
derivative of Z with respect to X starts with the derivative of Z with respect
to Z, which of course is one.
In Figure 6.14, there is a tandem of three paths from stage 1 and another
three paths to the end node, making a total of 9 paths. These paths, for
the forward and reverse partial differentiations, are given by the product
(α + β + γ)(δ + ε + ξ).
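This factoring is easy to check numerically. In the sketch below the edge derivatives α, β, γ and δ, ε, ξ are arbitrary values; summing the products over all nine individual paths agrees with the factored form, which is precisely the saving that forward- and reverse-mode differentiation exploit.

    # Nine individual paths versus the factored sum (alpha + beta + gamma) * (delta + epsilon + xi).
    first_stage = [1.0, 2.0, 3.0]     # alpha, beta, gamma (assumed edge derivatives)
    second_stage = [4.0, 5.0, 6.0]    # delta, epsilon, xi (assumed edge derivatives)

    path_by_path = sum(a * b for a in first_stage for b in second_stage)   # 9 products
    factored = sum(first_stage) * sum(second_stage)                        # 2 sums, 1 product
    print(path_by_path, factored)     # 90.0 90.0 -- identical, with far fewer multiplications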