On Bernoulli matrix polynomials and matrix exponential approximation

E. Defez^a, J. Ibáñez^b, P. Alonso-Jordá^c,*, José M. Alonso^b, J. Peinado^c

Universitat Politècnica de València, Camino de Vera s/n, 46022, Valencia, Spain
^a Instituto de Matemática Multidisciplinar
^b Instituto de Instrumentación para Imagen Molecular
^c Department of Information Systems and Computation

* Corresponding author.
Email addresses: edefez@imm.upv.es (E. Defez), jjibanez@dsic.upv.es (J. Ibáñez), palonso@upv.es (P. Alonso-Jordá), jmalonso@dsic.upv.es (José M. Alonso), jpeinado@dsic.upv.es (J. Peinado)
Abstract
We present in this paper a new method, based on Bernoulli matrix polynomials, to approximate the exponential of a matrix. The developed method has given rise to two new algorithms whose efficiency and precision are compared with those of the most efficient implementations that currently exist. For that purpose, a state-of-the-art test matrix battery, which allows a deep exploration of the strengths and weaknesses of each method, has been used. Since the new algorithms proposed here make intensive use of matrix products, we also provide a GPU-based implementation that achieves high performance thanks to the optimal implementation of matrix multiplication available on these devices.
Keywords:
Bernoulli matrix approximation, Matrix exponential function, GPU computing
1. Introduction
The computation of matrix functions has received remarkable attention in recent years because of its many applications in different areas of science and technology. Among all the matrix functions, the matrix exponential $e^A$, $A \in \mathbb{C}^{r\times r}$, stands out, due both to its applications in the solution of systems of differential equations and in graph theory, and to the difficulties involved in its computation, see [1-5].
Among the methods proposed for the approximate computation of the matrix exponential, two fundamental families stand out: those based on rational Padé approximations [6-10], and those based on polynomial approximations, which are either Taylor series expansions [11-13] or series expansions of Hermite matrix polynomials [14, 15]. In general, polynomial approximations have shown to
be more efficient than the Padé algorithm in tests because they are more accurate, despite a slightly higher cost in some cases. All these methods use the basic scaling and squaring property, based on the relationship

\[ e^A = \big( e^{A/2^s} \big)^{2^s}. \]

Thus, if $P_m(A)$ is a matrix polynomial approximation of $e^A$, then, given a matrix $A$ and a scaling factor $s$, $P_m(A/2^s)$ is an approximation to $e^{A/2^s}$ and

\[ e^A \approx \big( P_m(A/2^s) \big)^{2^s}. \qquad (1) \]
Bernoulli polynomials and Bernoulli numbers have been extensively used in several areas of mathematics, such as number theory, and they appear in many mathematical formulas, such as the residual term of the Euler–Maclaurin quadrature rule [16, p. 63], the Taylor series expansions of the trigonometric functions $\tan(x)$, $\csc(x)$ and $\cot(x)$ [16, pp. 116-117] and the Taylor series expansion of the hyperbolic function $\tanh(x)$ [16, p. 125]. They also appear in the well-known exact expression for the even values of the Riemann zeta function:

\[ \zeta(2k) = \sum_{n \ge 1} \frac{1}{n^{2k}} = \frac{(-1)^{k-1}(2\pi)^{2k} B_{2k}}{2\,(2k)!}, \qquad k \ge 1. \]

Moreover, they are even used for solving initial value problems [17], boundary value problems [18, 19], high-order linear and nonlinear Fredholm and Volterra integro-differential equations [20, 21], complex differential equations [22] and partial differential equations [23-25]. An excellent survey of Bernoulli polynomials and their applications can be found in [26]. Series expansions of functions in terms of Bernoulli polynomials have been studied in [27, 28].
In this paper, we present a new series expansion of the matrix exponential in terms of Bernoulli matrix polynomials, and we show that the resulting polynomial approximations of the matrix exponential are more accurate and less computationally expensive in most cases than those based on Padé approximants. We also verify that this new method based on Bernoulli matrix polynomials is a competitive method for the approximation of the matrix exponential, with a computational cost similar to that of the Taylor approach but, generally, more accurate.
The organization of the paper is as follows. Section 2 is devoted to Bernoulli polynomials. We show how to obtain a series expansion of the matrix exponential in terms of Bernoulli matrix polynomials and how to approximate the exponential of a matrix from it. The following section describes the proposed algorithms. Tests and comparisons are presented in Section 4. We close the document with some concluding remarks.
Notation
Throughout this paper, we denote by $\mathbb{C}^{r\times r}$ the set of all complex square matrices of size $r$ and by $I$ the identity matrix. A polynomial of degree $m$ means an expression of the form $P_m(t) = a_m t^m + a_{m-1}t^{m-1} + \cdots + a_1 t + a_0$, where $t$ is a real variable and $a_j$, for $0 \le j \le m$, are complex numbers. In this way, we can define the matrix polynomial $P_m(B)$ for $B \in \mathbb{C}^{r\times r}$ as $P_m(B) = a_m B^m + a_{m-1}B^{m-1} + \cdots + a_1 B + a_0 I$. We denote by $I_n$ (or $I$) and $0_{n\times n}$ the identity matrix and the null matrix of order $n$, respectively. With $\lceil x \rceil$ we denote the result of rounding $x$ to the nearest integer greater than or equal to $x$, and $\lfloor x \rfloor$ denotes the result of rounding $x$ to the nearest integer less than or equal to $x$. As usual, the matrix norm $\|\cdot\|$ denotes any subordinate matrix norm; in particular, $\|\cdot\|_1$ is the usual 1-norm. Finally, if $A(k,m)$ are matrices in $\mathbb{C}^{n\times n}$ for $m \ge 0$, $k \ge 0$, from [29] it follows that

\[ \sum_{m \ge 0} \sum_{k \ge 0} A(k,m) = \sum_{m \ge 0} \sum_{k=0}^{m} A(k, m-k). \qquad (2) \]
2. On Bernoulli matrix polynomials
The Bernoulli polynomials $B_n(x)$ are defined in [16, p. 588] as the coefficients of the generating function

\[ g(x,t) = \frac{t\,e^{tx}}{e^t - 1} = \sum_{n \ge 0} \frac{B_n(x)}{n!}\, t^n, \qquad |t| < 2\pi, \qquad (3) \]

where $g(x,t)$ is a holomorphic function of the variable $t$ in $\mathbb{C}$ (it has a removable singularity at $t = 0$). The Bernoulli polynomials $B_n(x)$ have the explicit expression

\[ B_n(x) = \sum_{k=0}^{n} \binom{n}{k} B_k\, x^{n-k}, \qquad (4) \]

where the Bernoulli numbers are defined by $B_n = B_n(0)$. Therefore, it follows that the Bernoulli numbers satisfy

\[ B_0 = 1, \qquad B_k = -\sum_{i=0}^{k-1} \binom{k}{i} \frac{B_i}{k+1-i}, \quad k \ge 1. \qquad (5) \]

Note that $B_3 = B_5 = \cdots = B_{2k+1} = 0$, for $k \ge 1$. For a matrix $A \in \mathbb{C}^{r\times r}$ we define the $m$-th Bernoulli matrix polynomial by the expression

\[ B_m(A) = \sum_{k=0}^{m} \binom{m}{k} B_k\, A^{m-k}. \qquad (6) \]
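For illustration, (5) and (6) can be evaluated numerically; the following MATLAB sketch (our own illustrative code, not taken from any released implementation) computes $B_m(A)$:

    function Bm = bernoulli_matrix_poly(A, m)
    % BERNOULLI_MATRIX_POLY  m-th Bernoulli matrix polynomial B_m(A), eq. (6).
        B = zeros(1, m+1);  B(1) = 1;      % B(k+1) stores the Bernoulli number B_k
        for k = 1:m                        % recurrence (5)
            s = 0;
            for i = 0:k-1
                s = s + nchoosek(k, i) * B(i+1) / (k + 1 - i);
            end
            B(k+1) = -s;
        end
        Bm = zeros(size(A));
        Ak = eye(size(A, 1));              % running power A^{m-k}, starting at A^0
        for k = m:-1:0
            Bm = Bm + nchoosek(m, k) * B(k+1) * Ak;
            Ak = Ak * A;
        end
    end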
Thus, we can now calculate the exact value of $e^{At}\,\frac{t}{e^t-1}$, where $A \in \mathbb{C}^{r\times r}$. By using (3) and (6) one gets

\[ e^{At}\,\frac{t}{e^t-1} = \left( \sum_{n \ge 0} \frac{A^n}{n!}\, t^n \right) \left( \sum_{k \ge 0} \frac{B_k}{k!}\, t^k \right) = \sum_{n \ge 0} \sum_{k \ge 0} \frac{A^n B_k}{n!\,k!}\, t^n t^k. \]
Taking into account that $A(k,n) = \frac{A^n B_k}{n!\,k!}\, t^n t^k$ in (2), we have

\[ e^{At}\,\frac{t}{e^t-1} = \sum_{n \ge 0} \sum_{k=0}^{n} \frac{B_k}{k!}\, t^k\, \frac{A^{n-k}}{(n-k)!}\, t^{n-k} = \sum_{n \ge 0} \left( \sum_{k=0}^{n} \binom{n}{k} B_k A^{n-k} \right) \frac{t^n}{n!} = \sum_{n \ge 0} B_n(A)\, \frac{t^n}{n!}, \]

where $B_n(A)$ is the $n$-th Bernoulli matrix polynomial defined in (6). In this way, we can use the series expansion

\[ e^{At} = \frac{e^t - 1}{t} \sum_{n \ge 0} B_n(A)\, \frac{t^n}{n!}, \qquad |t| < 2\pi, \qquad (7) \]
to obtain approximations of the matrix exponential. To do this, let us take $s$ as the scaling (to be determined) of the matrix $A$ and take $t = 1$ in (7). We use the matrix exponential approximation

\[ P_m(A/2^s) = (e-1) \sum_{n=0}^{m} \frac{B_n(A/2^s)}{n!}. \qquad (8) \]
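As a quick sanity check of (7) at $t = 1$, one can truncate the series and compare against MATLAB's built-in expm; a sketch using the helper above (test matrix and truncation order chosen arbitrarily):

    A = randn(4) / 4;                      % small test matrix with modest norm
    m = 30;  P = zeros(4);
    for n = 0:m
        P = P + bernoulli_matrix_poly(A, n) / factorial(n);
    end
    P = (exp(1) - 1) * P;                  % the factor (e^t - 1)/t at t = 1
    disp(norm(P - expm(A), 1))             % should be near machine precision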
Approximation (8) has the problem that it is not expressed explicitly in terms of powers of the matrix $A/2^s$. This explicit relationship is provided below.
Lemma 1. Given expression (8), we get

\[ \frac{1}{e-1}\, P_m(A/2^s) = \sum_{n=0}^{m} \frac{B_n(A/2^s)}{n!} = \sum_{i=0}^{m} \alpha_i^{(m)} (A/2^s)^i, \qquad (9) \]

where $\displaystyle \alpha_i^{(m)} = \sum_{k=i}^{m} \binom{k}{k-i} \frac{B_{k-i}}{k!}$.
Proof: For $m = 0$ and $m = 1$, formula (9) trivially holds. From (8) we have that, for $m = 2$ and using (6), one gets

\[ \frac{1}{e-1}\, P_2(A/2^s) = \sum_{n=0}^{2} \frac{B_n(A/2^s)}{n!} = \frac{1}{0!} B_0(A/2^s) + \frac{1}{1!} B_1(A/2^s) + \frac{1}{2!} B_2(A/2^s) \]
\[ = \frac{1}{0!} \left( \sum_{k=0}^{0} \binom{0}{k} B_k (A/2^s)^{0-k} \right) + \frac{1}{1!} \left( \sum_{k=0}^{1} \binom{1}{k} B_k (A/2^s)^{1-k} \right) + \frac{1}{2!} \left( \sum_{k=0}^{2} \binom{2}{k} B_k (A/2^s)^{2-k} \right) \]
\[ = \left( \sum_{k=0}^{2} \frac{1}{k!} \binom{k}{k} B_k \right) (A/2^s)^0 + \left( \sum_{k=1}^{2} \frac{1}{k!} \binom{k}{k-1} B_{k-1} \right) (A/2^s)^1 + \left( \sum_{k=2}^{2} \frac{1}{k!} \binom{k}{k-2} B_{k-2} \right) (A/2^s)^2 \]
\[ = \alpha_0^{(2)} (A/2^s)^0 + \alpha_1^{(2)} (A/2^s)^1 + \alpha_2^{(2)} (A/2^s)^2 = \sum_{i=0}^{2} \alpha_i^{(2)} (A/2^s)^i, \]
thus formula (9) is true for $m = 2$. Similarly, for $m = 3$ one gets:

\[ \frac{1}{e-1}\, P_3(A/2^s) = \sum_{n=0}^{3} \frac{B_n(A/2^s)}{n!} = \frac{1}{0!} B_0(A/2^s) + \frac{1}{1!} B_1(A/2^s) + \frac{1}{2!} B_2(A/2^s) + \frac{1}{3!} B_3(A/2^s) \]
\[ = \frac{1}{0!} \left( \sum_{k=0}^{0} \binom{0}{k} B_k (A/2^s)^{0-k} \right) + \frac{1}{1!} \left( \sum_{k=0}^{1} \binom{1}{k} B_k (A/2^s)^{1-k} \right) + \frac{1}{2!} \left( \sum_{k=0}^{2} \binom{2}{k} B_k (A/2^s)^{2-k} \right) + \frac{1}{3!} \left( \sum_{k=0}^{3} \binom{3}{k} B_k (A/2^s)^{3-k} \right) \]
\[ = \left( \sum_{k=0}^{3} \frac{1}{k!} \binom{k}{k} B_k \right) (A/2^s)^0 + \left( \sum_{k=1}^{3} \frac{1}{k!} \binom{k}{k-1} B_{k-1} \right) (A/2^s)^1 + \left( \sum_{k=2}^{3} \frac{1}{k!} \binom{k}{k-2} B_{k-2} \right) (A/2^s)^2 + \left( \sum_{k=3}^{3} \frac{1}{k!} \binom{k}{k-3} B_{k-3} \right) (A/2^s)^3 \]
\[ = \alpha_0^{(3)} (A/2^s)^0 + \alpha_1^{(3)} (A/2^s)^1 + \alpha_2^{(3)} (A/2^s)^2 + \alpha_3^{(3)} (A/2^s)^3 = \sum_{i=0}^{3} \alpha_i^{(3)} (A/2^s)^i, \]
and formula (9) is also true for $m = 3$. We will now use an induction argument. Suppose that formula (9) is valid for $m$, and let us prove it for $m+1$. Using (6) together with the induction hypothesis, one gets

\[ \frac{1}{e-1}\, P_{m+1}(A/2^s) = \sum_{n=0}^{m+1} \frac{B_n(A/2^s)}{n!} = \sum_{n=0}^{m} \frac{B_n(A/2^s)}{n!} + \frac{1}{(m+1)!} B_{m+1}(A/2^s) \]
\[ = \sum_{i=0}^{m} \alpha_i^{(m)} (A/2^s)^i + \frac{1}{(m+1)!} \left( \sum_{k=0}^{m+1} \binom{m+1}{k} B_k (A/2^s)^{m+1-k} \right) \]
\[ = \left( \alpha_0^{(m)} + \binom{m+1}{m+1} \frac{B_{m+1}}{(m+1)!} \right) (A/2^s)^0 + \cdots + \left( \alpha_m^{(m)} + \binom{m+1}{1} \frac{B_1}{(m+1)!} \right) (A/2^s)^m + \binom{m+1}{0} \frac{B_0}{(m+1)!} (A/2^s)^{m+1} \]
\[ = \left( \sum_{k=0}^{m+1} \binom{k}{k} \frac{B_k}{k!} \right) (A/2^s)^0 + \left( \sum_{k=1}^{m+1} \binom{k}{k-1} \frac{B_{k-1}}{k!} \right) (A/2^s)^1 + \cdots + \left( \sum_{k=m+1}^{m+1} \binom{k}{k-(m+1)} \frac{B_{k-(m+1)}}{k!} \right) (A/2^s)^{m+1} \]
\[ = \sum_{i=0}^{m+1} \alpha_i^{(m+1)} (A/2^s)^i. \]
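Lemma 1 lends itself to a direct numerical check. The sketch below (again illustrative MATLAB code of ours) builds the coefficients $\alpha_i^{(m)}$ of (9) exactly as stated; multiplying them by $(e-1)$ gives the coefficients $b_i^{(m)}$ used later in (15), which can be compared against the Taylor coefficients $1/i!$:

    function alpha = bernoulli_alpha(m)
    % BERNOULLI_ALPHA  Coefficients alpha_i^(m) of (9), for i = 0..m.
        B = zeros(1, m+1);  B(1) = 1;      % Bernoulli numbers via recurrence (5)
        for k = 1:m
            s = 0;
            for i = 0:k-1
                s = s + nchoosek(k, i) * B(i+1) / (k + 1 - i);
            end
            B(k+1) = -s;
        end
        alpha = zeros(1, m+1);
        for i = 0:m
            for k = i:m                    % alpha_i^(m) = sum_k C(k,k-i) B_{k-i}/k!
                alpha(i+1) = alpha(i+1) + nchoosek(k, k-i) * B(k-i+1) / factorial(k);
            end
        end
    end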
3. The proposed algorithms
The matrix polynomial $P_m(A/2^s)$ from (8) can be computed efficiently in terms of matrix products using values of $m$ in the set

\[ m_k \in \{2, 4, 6, 9, 12, 16, 20, 25, 30, \ldots\}, \qquad k = 1, 2, \ldots, \]

by means of the Paterson-Stockmeyer method [30]. If we consider $P_{m_k}(A)$, then:

\[ P_{m_k}(A) = \big( \cdots \big( \big( p_{m_k} A^q + p_{m_k-1} A^{q-1} + p_{m_k-2} A^{q-2} + \cdots + p_{m_k-q+1} A + p_{m_k-q} I \big) A^q \qquad (10) \]
\[ \qquad + p_{m_k-q-1} A^{q-1} + p_{m_k-q-2} A^{q-2} + \cdots + p_{m_k-2q+1} A + p_{m_k-2q} I \big) A^q \]
\[ \qquad + p_{m_k-2q-1} A^{q-1} + p_{m_k-2q-2} A^{q-2} + \cdots + p_{m_k-3q+1} A + p_{m_k-3q} I \big) A^q \]
\[ \qquad \cdots \]
\[ \qquad + p_{q-1} A^{q-1} + p_{q-2} A^{q-2} + \cdots + p_1 A + p_0 I, \]
where $q = \lceil \sqrt{m_k}\, \rceil$ or $q = \lfloor \sqrt{m_k}\, \rfloor$. Taking into account Table 4.1 from [4, p. 74], the computational cost of (10) in terms of matrix products is $k$. To obtain the exponential of a matrix $A$ with enough precision and efficiency, it is necessary to determine the values of $m$ and $s$ in expression (8). Once these values have been determined, the approximation (1) is used to compute $e^A$ by means of Bernoulli matrix polynomials.
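A minimal sketch of the evaluation scheme (10) could read as follows. It is our own illustrative MATLAB code: for simplicity it assumes that $q = \lceil \sqrt{m}\, \rceil$ divides $m$, which holds for every order in the set $\{2, 4, 6, 9, 12, 16, 20, 25, 30\}$, and this plain version may spend one more matrix product than the optimal count:

    function P = ps_eval(p, A)
    % PS_EVAL  Evaluate P(A) = sum_{i=0}^m p(i+1)*A^i by Paterson-Stockmeyer,
    % written as a Horner recursion in A^q as in (10).
        m = numel(p) - 1;
        q = ceil(sqrt(m));                 % assumes q divides m
        n = size(A, 1);
        Ap = cell(q, 1);  Ap{1} = A;       % precompute A, A^2, ..., A^q
        for j = 2:q
            Ap{j} = Ap{j-1} * A;
        end
        P = p(m+1) * eye(n);               % top block: p_m * I
        for s = m/q - 1 : -1 : 0           % blocks of q coefficients
            B = p(s*q + 1) * eye(n);       % B_s = sum_{j=0}^{q-1} p_{sq+j} A^j
            for j = 1:q-1
                B = B + p(s*q + j + 1) * Ap{j};
            end
            P = P * Ap{q} + B;             % Horner step in A^q
        end
    end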
To obtain the optimal order of the series expansion $m_k$ and the scaling parameter $s$, we have used an error analysis similar to that in [12]. Let

\[ A \in \Omega_m = \left\{ X \in \mathbb{C}^{n\times n} : \rho\big( e^{-X} T_m(X) - I \big) < 1 \right\}, \]

where $\rho(\cdot)$ is the spectral radius of a matrix and $T_m(X)$ is the Taylor approximation of order $m$ of the matrix exponential.
The backward error for computing $e^A$ can be defined as the matrix $\Delta A$ such that $e^{A+\Delta A} = T_m(A)$. It can be verified that

\[ \Delta A \approx \sum_{k \ge m+1} c_k^{(m)} A^k, \]

where $\sum_{k \ge m+1} c_k^{(m)} A^k$ is the backward error of the matrix exponential for the Taylor approximation. The absolute backward error can be bounded as follows:

\[ E_{ab}(A) = \|\Delta A\| \approx \Big\| \sum_{k \ge m+1} c_k^{(m)} A^k \Big\| \le \sum_{k \ge m+1} |c_k^{(m)}|\, \big( \|A^k\|^{1/k} \big)^k \le \sum_{k \ge m+1} |c_k^{(m)}|\, \beta_m^k, \qquad (11) \]
where $\beta_m = \max\{ \|A^k\|^{1/k} : k \ge m+1 \}$. Theorem 2 from [12] shows that $\beta_m = \|A^{k_0}\|^{1/k_0}$, where $m+1 \le k_0 \le 2m+1$, and that this value can be approximated by means of

\[ \beta_m \approx \max\big\{ a_{m+1}^{1/(m+1)},\ a_{m+2}^{1/(m+2)} \big\}, \]

where $a_{m+1}$ and $a_{m+2}$ are the 1-norm estimates of $\|A^{m+1}\|$ and $\|A^{m+2}\|$, respectively, obtained using the block 1-norm estimation algorithm of Higham and Tisseur [31].
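The estimates $a_{m+1}$ and $a_{m+2}$ can be obtained without forming the matrix powers explicitly. A sketch using MATLAB's normest1, which implements the Higham-Tisseur block 1-norm estimator [31] (the helper name and the loop-based power action are ours):

    function est = power_norm_est(A, p)
    % POWER_NORM_EST  Estimate ||A^p||_1 via normest1 without forming A^p.
        est = normest1(@afun);
        function y = afun(flag, x)
            switch flag
                case 'dim',      y = size(A, 1);
                case 'real',     y = isreal(A);
                case 'notransp'                % y = A^p * x
                    y = x;  for k = 1:p, y = A * y; end
                case 'transp'                  % y = (A^p)' * x
                    y = x;  for k = 1:p, y = A' * y; end
            end
        end
    end

so that $\beta_m$ can be approximated as

    beta_m = max(power_norm_est(A, m+1)^(1/(m+1)), power_norm_est(A, m+2)^(1/(m+2)));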
Let

\[ \Theta_m^{(ab)} = \max\Big\{ \theta \ge 0 : \sum_{k \ge m+1} |c_k^{(m)}|\, \theta^k \le u \Big\}, \qquad (12) \]

where $u = 2^{-53}$ is the unit roundoff in IEEE double precision floating-point arithmetic. If an integer $s \ge 0$ verifies $2^{-s}\beta_m < \Theta_m^{(ab)}$, then

\[ E_{ab}(A/2^s) \le \sum_{k \ge m+1} |c_k^{(m)}|\, \big(2^{-s}\beta_m\big)^k < \sum_{k \ge m+1} |c_k^{(m)}|\, \big(\Theta_m^{(ab)}\big)^k \le u, \]

and $m$ and $s$ will be taken, respectively, as the adequate order of the polynomial and the scaling parameter.
Using the above reasoning, the relative backward error can be bounded in the following way:

\[ E_{rb}(A) = \frac{\|\Delta A\|}{\|A\|} \approx \frac{\big\| \sum_{k \ge m+1} c_k^{(m)} A^k \big\|}{\|A\|} \le \sum_{k \ge m+1} |c_k^{(m)}|\, \frac{\|A^k\|}{\|A\|} \le \sum_{k \ge m} |c_{k+1}^{(m)}|\, \big( \|A^k\|^{1/k} \big)^k \le \sum_{k \ge m} |c_{k+1}^{(m)}|\, \beta_m^k. \qquad (13) \]
Let

\[ \Theta_m^{(rb)} = \max\Big\{ \theta \ge 0 : \sum_{k \ge m} |c_{k+1}^{(m)}|\, \theta^k \le u \Big\}. \qquad (14) \]

Therefore, if an integer $s \ge 0$ satisfies $2^{-s}\beta_m < \Theta_m^{(rb)}$, then

\[ E_{rb}(A/2^s) \le \sum_{k \ge m} |c_{k+1}^{(m)}|\, \big(2^{-s}\beta_m\big)^k < \sum_{k \ge m} |c_{k+1}^{(m)}|\, \big(\Theta_m^{(rb)}\big)^k \le u, \]

and the polynomial order $m$ and the scaling parameter $s$ will have been obtained.
The $\Theta_m^{(ab)}$ and $\Theta_m^{(rb)}$ parameters were worked out, with the required precision, by using symbolic computations, from $m = 2$ to $m = 30$. Then, the maximum value $\Theta_m$ between $\Theta_m^{(ab)}$ and $\Theta_m^{(rb)}$ was computed for each $m$, i.e. $\Theta_m = \max(\Theta_m^{(ab)}, \Theta_m^{(rb)})$, giving rise to the $\Theta_m$ parameter included in the second column of Table 2 from [12]. As a result, $\Theta_m$ equals $\Theta_m^{(ab)}$ when $m \le 16$ and matches $\Theta_m^{(rb)}$ otherwise. In other words, the absolute backward error was considered when the polynomial order is less than or equal to 16, and the relative backward error was taken into account for greater values.
Algorithm 1 computes the matrix exponential function based on the Bernoulli series and the Paterson-Stockmeyer method. Step 1 of this algorithm uses the procedure previously described to obtain the values of $m$ and $s$ (a more detailed description can be found in [12]). In step 2, the Bernoulli approximation is employed, depending on the value of $m$ calculated in step 1. Finally, in steps 3-5, the matrix exponential is recovered.
Algorithm 1 Scaling and squaring Bernoulli algorithm for computing $B = e^A$, where $A \in \mathbb{C}^{r\times r}$ and $m_M$ is the maximum approximation order allowed.
1: Choose adequate order $m_k \le m_M$ and scaling parameter $s \in \mathbb{N} \cup \{0\}$
2: $B = P_{m_k}(A/2^s)$ by using (10) ($P_{m_k}(\cdot)$ Bernoulli matrix polynomial)
3: for $i = 1:s$ do    ▷ Recovering the matrix exponential
4:     $B = B^2$
5: end for
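Putting the pieces together, Algorithm 1 can be sketched in a few lines of MATLAB; select_order_scaling is a hypothetical helper standing in for the order and scaling selection of [12], while ps_eval and bernoulli_alpha are the sketches given above:

    function B = expm_bernoulli_sketch(A)
    % EXPM_BERNOULLI_SKETCH  Illustrative driver for Algorithm 1.
        [m, s] = select_order_scaling(A);          % step 1 (hypothetical helper)
        p = (exp(1) - 1) * bernoulli_alpha(m);     % coefficients b_i^(m) of (15)
        B = ps_eval(p, A / 2^s);                   % step 2: evaluate (10)
        for i = 1:s                                % steps 3-5: squaring phase
            B = B * B;
        end
    end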
Bearing in mind (9), if $P_m(A/2^s)$ represents the matrix polynomial of order $m$ corresponding to the Bernoulli approximation of the exponential of the matrix $A/2^s$, and $T_m(A/2^s)$ denotes the same matrix polynomial according to the Taylor approach, then:

\[ P_m(A/2^s) = (e-1) \sum_{i=0}^{m} \alpha_i^{(m)} (A/2^s)^i = \sum_{i=0}^{m} b_i^{(m)} (A/2^s)^i, \qquad (15) \]
\[ T_m(A/2^s) = \sum_{i=0}^{m} \frac{1}{i!} (A/2^s)^i = \sum_{i=0}^{m} t_i\, (A/2^s)^i. \]
Table 1: Approximation polynomial coefficient vector differences between Bernoulli (b) and
Taylor (t) methods.
m ||b-t|| ||b-t||/||t||
2 5.023311e-01 2.009324e-01
4 5.695696e-02 2.103026e-02
6 2.741618e-03 1.008669e-03
9 1.293850e-05 4.759808e-06
12 4.657888e-08 1.713541e-08
16 2.819122e-11 1.037097e-11
20 1.445479e-14 5.317621e-15
25 2.735502e-19 1.006335e-19
30 4.901565e-22 1.803185e-22
The coefficients $b_i^{(m)}$ of the Bernoulli approximation polynomial differ significantly from those of the Taylor one, $t_i$, with $i = 0, \ldots, m$, when $m \in \{2, 4, 6, 9, 12, 16, 20\}$. However, they are practically identical for $m \in \{25, 30, \ldots\}$. This is so because, as the degree $m$ of the Bernoulli polynomial increases, all its coefficients $b_i^{(m)}$ vary, approaching the corresponding Taylor ones. Table 1 collects the 1-norm of the absolute and relative differences between the coefficient vectors of the Bernoulli and Taylor polynomial approximations for different values of $m$. As can be seen, the 1-norm of these differences is less than the unit roundoff ($u = 2^{-53} \approx 1.11 \times 10^{-16}$) when $m = 25$ or $m = 30$. As an example, the 1-norm of the relative difference between these coefficients when $m = 25$ equals $1.006335 \times 10^{-19}$. As expected, the differences become smaller as $m$ increases.
Therefore, these experimental results show that the backward error bounds expressed in (11) and (13) hold for the Bernoulli approximation only for values of $m$ greater than or equal to 25, but not for lower values. As a consequence of this analysis, Algorithm 2 has been developed. It computes the matrix exponential by means of the Taylor series when $m$ is less than or equal to 20, and by means of the Bernoulli series when $m$ is equal to 25 or 30.
Algorithm 2 Scaling and squaring Bernoulli algorithm for computing $B = e^A$, where $A \in \mathbb{C}^{r\times r}$ and $m_M$ is the maximum approximation order allowed.
1: Choose adequate order $m_k \le m_M$ and scaling parameter $s \in \mathbb{N} \cup \{0\}$
2: if $m_k \le 20$ then
3:     $B = P_{m_k}(A/2^s)$ using (10) ($P_{m_k}(\cdot)$ Taylor matrix polynomial)
4: else
5:     $B = P_{m_k}(A/2^s)$ using (10) ($P_{m_k}(\cdot)$ Bernoulli matrix polynomial)
6: end if
7: for $i = 1:s$ do    ▷ Recovering the matrix exponential
8:     $B = B^2$
9: end for
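A corresponding sketch of Algorithm 2 only changes the coefficient vector handed to the Paterson-Stockmeyer evaluator (select_order_scaling is again a hypothetical stand-in for the selection procedure of [12]):

    function B = expm_bertay_sketch(A)
    % EXPM_BERTAY_SKETCH  Illustrative driver for Algorithm 2.
        [m, s] = select_order_scaling(A);
        if m <= 20
            p = 1 ./ factorial(0:m);               % Taylor coefficients t_i = 1/i!
        else
            p = (exp(1) - 1) * bernoulli_alpha(m); % Bernoulli coefficients b_i^(m)
        end
        B = ps_eval(p, A / 2^s);
        for i = 1:s
            B = B * B;
        end
    end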
4. Numerical experiments
In this section, we will firstly compare expmber, the MATLAB implementa-
tion corresponding to Algorithm 1, based on Bernoulli approximation, with the
functions exptaynsv3 [12], that computes the matrix exponential using Taylor
matrix polynomials, and expm new [8], which implements a scaling and squaring
Pad´e-based algorithm to work out the mentioned matrix function. Next, we
will compare expmbertay, the function that combines Taylor and Bernoulli ap-
proximations in accordance with Algorithm 2, with expmber,exptaynsv3 and
expm new.
Algorithm 3 computes the "exact" matrix exponential function thanks to MATLAB's Symbolic Math Toolbox with 256 digits of precision. This algorithm provides the exact solution when it finds that the relative error between $T_{m_k}^{(j)}(n)$ and $T_{m_{k-1}}^{(i)}(n)$ is less than $u = 2^{-53}$ (see (18)), where $T_m^{(s)}(n)$ is the Taylor matrix approximation of order $m$ of the scaled matrix $A/2^s$, computed with $n$ digits of precision by using the vpa (variable-precision arithmetic) MATLAB function in Algorithm 4. Previously, $T_{m_k}^{(j)}(n)$ and $T_{m_{k-1}}^{(i)}(n)$ have been calculated so that (16) and (17) are fulfilled.
Algorithm 3 Computes the "exact" matrix exponential $T = e^A$, where $A \in \mathbb{C}^{r\times r}$, by means of Taylor expansion.
1: if there exist two consecutive orders $m_{k-1}, m_k \in \{30, 36, 42, 49, 56, 64\}$ and integers $1 \le i, j \le 15$ for $s$ such that

\[ \frac{\big\| T_{m_{k-1}}^{(i)}(n) - T_{m_{k-1}}^{(i-1)}(n) \big\|_1}{\big\| T_{m_{k-1}}^{(i)}(n) \big\|_1} < u, \qquad (16) \]

and

\[ \frac{\big\| T_{m_k}^{(j)}(n) - T_{m_k}^{(j-1)}(n) \big\|_1}{\big\| T_{m_k}^{(j)}(n) \big\|_1} < u, \qquad (17) \]

and

\[ \frac{\big\| T_{m_k}^{(j)}(n) - T_{m_{k-1}}^{(i)}(n) \big\|_1}{\big\| T_{m_k}^{(j)}(n) \big\|_1} < u, \qquad (18) \]

by using Algorithm 4, then
       return $T = T_{m_k}^{(j)}(n)$
2: else
       return error
3: end if
Algorithm 4 Computes $T_m^{(s)}(n) \approx e^A$, where $A \in \mathbb{C}^{r\times r}$, by Taylor expansion of order $m$ and scaling parameter $s$ using the vpa MATLAB function with $n$ digits of precision.
1: Compute $T_m^{(s)}(n) = P_m(A/2^s)$ using Taylor expansion of order $m$ with $n$ digits of precision
2: for $i = 1:s$ do
3:     $T_m^{(s)}(n) = \big[ T_m^{(s)}(n) \big]^2$
4: end for
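A direct transcription of Algorithm 4 with the Symbolic Math Toolbox might look as follows (a sketch: for clarity the series is summed term by term rather than with (10), which is affordable at this precision):

    function T = taylor_vpa_sketch(A, m, s, n)
    % TAYLOR_VPA_SKETCH  T_m^(s)(n): scaled Taylor sum in n-digit arithmetic.
        oldDigits = digits;  digits(n);    % set vpa precision to n digits
        As = vpa(A) / 2^s;
        T = vpa(eye(size(A, 1)));
        term = T;
        for k = 1:m                        % truncated Taylor series of e^{A/2^s}
            term = term * As / k;
            T = T + term;
        end
        for i = 1:s                        % recover e^A by repeated squaring
            T = T * T;
        end
        digits(oldDigits);                 % restore previous precision
    end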
4.1. Experiments description

The following test battery, composed of three types of different and representative matrices, has been chosen to compare the numerical performance of the above described codes:

a) One hundred diagonalizable $128 \times 128$ real matrices with 1-norms varying from 2.18 to 207.52. These matrices have the form $A = V D V^T$, where $D$ is a diagonal matrix with real and complex eigenvalues and $V$ is an orthogonal matrix obtained as $V = H/\sqrt{128}$, $H$ being the Hadamard matrix. The "exact" matrix exponential was computed as $\exp(A) = V \exp(D) V^T$ (see [4, p. 10]).

b) One hundred non-diagonalizable $128 \times 128$ complex matrices with 1-norms ranging from 84 to 98. These matrices have the form $A = V J V^T$, where $J$ is a Jordan matrix with complex eigenvalues of modulus less than 10 and random algebraic multiplicities varying from 1 to 5. $V$ is an orthogonal matrix obtained as $V = H/\sqrt{128}$, where $H$ is the Hadamard matrix. The "exact" matrix exponential was worked out as $\exp(A) = V \exp(J) V^T$.
c) State-of-the-art matrices:

   - Forty $128 \times 128$ matrices from the Matrix Computation Toolbox (MCT) [32].
   - Sixteen matrices from the Eigtool MATLAB package (EMP) [33] with sizes $127 \times 127$ and $128 \times 128$.

   The "exact" matrix exponential for these matrices was computed by using Taylor approximations of orders 30, 36, 42, 49, 56 and 64, changing their scaling parameter (see Algorithm 3).

   Although the MCT and the EMP initially comprise fifty-two and twenty matrices, respectively, twelve matrices from the MCT and four from the EMP were discarded for different reasons. For example, matrices number 5, 10, 16, 17, 21, 25, 26, 42, 43, 44 and 49 belonging to the MCT and matrices number 5 and 6 appertaining to the EMP were not taken into account since the exact exponential solution could not be computed. Besides, matrix number 2 from
the MCT and matrices number 3 and 10 from the EMP were not considered because of the excessively high relative error provided by all the methods under comparison.

Table 2: Matrix products (P) for Tests 1, 2 and 3 using the expmber, exptaynsv3 and expm_new MATLAB codes.

         P(expmber)  P(exptaynsv3)  P(expm_new)
Test 1   1131        1131           1178.33
Test 2   1100        1100           1227.33
Test 3   617         617            654.67
An experiment, called a Test, is performed for each of the three sets of matrices described above, evaluating the computational cost and the numerical accuracy of the methods under comparison. The three tests have been executed using MATLAB (R2018b) running on an HP Pavilion dv8 Notebook PC with an Intel Core i7 CPU Q720 @1.60GHz processor and 6 GB of RAM.
4.2. Experimental results

Table 2 shows the computational cost of each method in terms of the number of matrix products (P), taking into account that the cost of the rest of the operations is negligible in comparison for sufficiently large matrices. As can be seen, expmber and exptaynsv3 performed an identical number of matrix multiplications, since the same algorithm was used by both of them to calculate the degree of the polynomial ($m$) and the value of the scaling ($s$). This number of products was lower than that required by expm_new, which gave rise to the highest computational cost. In addition to the matrix products, expm_new solves a system of linear equations with $n$ right-hand side vectors, where $n$ represents the size of the square coefficient matrix, whose computational cost was approximated as 4/3 matrix products.
Table 3, on the other hand, shows the percentage of cases in which the relative errors of expmber are lower than, greater than or equal to those of exptaynsv3 and expm_new. In more detail, the relative error was computed as

\[ E = \frac{\| \exp(A) - \widetilde{\exp}(A) \|_1}{\| \exp(A) \|_1}, \]

where $\widetilde{\exp}(A)$ is the approximate solution and $\exp(A)$ is the exact one.
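In MATLAB terms, with expA_exact and expA_approx denoting the reference and the computed exponentials (variable names ours):

    Er = norm(expA_exact - expA_approx, 1) / norm(expA_exact, 1);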
With the exception of Test 3, the Bernoulli approach resulted in relative errors lower than those of the Taylor one. With regard to Padé, the Bernoulli algorithm always offered considerably more accurate results, reaching 100% of the matrices for Test 2.
For the three tests, respectively, the normwise relative errors (a), the performance profiles (b), the ratios of the relative errors (c) and the ratios of the matrix products (d) among the compared methods have been plotted in Figures 1, 2 and 3.
Table 3: Relative error comparison between expmber vs exptaynsv3 and expmber vs expm_new for the three tests.

                               Test 1   Test 2   Test 3
E(expmber) < E(exptaynsv3)     56%      91%      30.36%
E(expmber) > E(exptaynsv3)     43%      9%       62.5%
E(expmber) = E(exptaynsv3)     1%       0%       7.14%
E(expmber) < E(expm_new)       97%      100%     69.64%
E(expmber) > E(expm_new)       3%       0%       30.36%
E(expmber) = E(expm_new)       0%       0%       0%
Regarding the normwise relative errors presented in Figures 1a, 2a and 3a, the solid line represents the function $k_{exp} u$, where $k_{exp}$ (or cond) is the condition number of the matrix exponential function [4, Chapter 3] and $u$ is the unit roundoff. In general, expmber exhibited very good numerical stability. This can be appreciated by looking at the distance from each matrix normwise relative error to the cond*u line. In Figures 1a and 2a, the numerical stability is even better because these errors lie below this line. Because $k_{exp}$ was infinite or enormously high for matrices 6, 7, 12, 15, 23, 36, 39, 50 and 51 from the MCT and for matrices 1, 4, 8 and 15 from the EMP, all of them were excluded from the Figure 3a visualisation but considered in the other ones.
In the performance profile figures (1b, 2b and 3b), the $\alpha$ coordinate, on the x-axis, varies from 1 to 5 in steps of 0.1. For a given $\alpha$ value, the $p$ coordinate, on the y-axis, represents the probability that the considered algorithm has a relative error lower than or equal to $\alpha$ times the smallest relative error over all the methods on the given test. For the first two tests (Figures 1b, 2b), the performance profiles show that the accuracy of the Bernoulli and Taylor methods was similar. Both of them were considerably more accurate than the Padé method. Notwithstanding, Figure 3b reveals that the exptaynsv3 code improved the result accuracy with respect to the expmber function for Test 3.
In Figures 1c, 2c and 3c, the ratios of relative errors are presented in decreasing order with respect to E(expmber)/E(exptaynsv3). They confirm the data exposed in Table 3, where it was shown that expmber provides more accurate results than exptaynsv3 for Tests 1 and 2, but not for Test 3. Clearly, Padé offered the worst accuracy in most cases.
In our opinion, this is clearly due to the distinctive numerical characteristics of the three sets of matrices analysed and the degree of the polynomial ($m$) required. According to our experience, expmber provides results with very appropriate accuracy for values of $m$ equal to 25 or 30. However, for significantly lower values, expmber is less competitive than other codes, such as exptaynsv3. The minimum, maximum and average values of $m$ required for Tests 1, 2 and 3 are collected in Table 4. In more detail, Figure 4 shows the approximation polynomial order employed in the calculation of the exponential function by means of expmber (or exptaynsv3) for each of the matrices in the test battery.
Figure 1: Experimental results for Test 1. (a) Normwise relative errors. (b) Performance profile. (c) Ratio of relative errors. (d) Ratio of matrix products.
Table 4: Minimum, maximum and average polynomial degree (m) required for Tests 1, 2 and
3 using expmber or exptaynsv3 functions.
Minimum Maximum Average
Test 1 16 30 27.51
Test 2 30 30 30
Test 3 12 30 25.70
As presented in Table 2, the expmber and exptaynsv3 functions performed a lower number of matrix operations than expm_new. This statement can also be corroborated from the results displayed in Figures 1d, 2d and 3d, where the ratio between the number of expm_new and expmber matrix products ranged from 1.03 to 1.22 for Test 1, from 1.03 to 1.12 for Test 2 and from 0.67 to 2.87 for Test 3.
Figure 2: Experimental results for Test 2. (a) Normwise relative errors. (b) Performance profile. (c) Ratio of relative errors. (d) Ratio of matrix products.

Next, we analyse the idea of using the Bernoulli and Taylor methods together, giving rise to a novel approach to compute the matrix exponential function. For that, we first assess the benefits of the exptaynsv3 function against the expm_new one. As Table 5 shows, the percentage of cases in which the Taylor relative error is lower than the Padé one reaches 100% for Tests 1 and 2, and 89.29% for Test 3. Evidently, these error percentages improve on those offered by the Bernoulli approximation with respect to expm_new, described in Table 3.

Table 5: Relative error comparison between exptaynsv3 and expm_new for the three tests.

                                  Test 1   Test 2   Test 3
E(exptaynsv3) < E(expm_new)       100%     100%     89.29%
E(exptaynsv3) > E(expm_new)       0%       0%       10.71%
E(exptaynsv3) = E(expm_new)       0%       0%       0%
From these excellent results, we therefore considered the possibility of combining the Bernoulli and Taylor methods, giving rise to the expmbertay code. In this new function, and according to the comparison between the coefficients of their polynomials carried out previously, we use the Taylor approach (exptaynsv3) for values of $m$ below 25 and the Bernoulli approximation (expmber) when $m$ equals 25 or 30. In this way, the number of matrix products needed by expmbertay is obviously identical to that of expmber or
exptaynsv3.

Figure 3: Experimental results for Test 3. (a) Normwise relative errors. (b) Performance profile. (c) Ratio of relative errors. (d) Ratio of matrix products.
Table 6 thus collects the percentage of matrices in which the relative errors of expmbertay are lower than, greater than or equal to those of exptaynsv3, expmber and expm_new. For the vast majority of matrices, expmbertay provided an accuracy practically identical to that of expmber, even improving on the latter for 23.21% of the matrices of Test 3. With respect to exptaynsv3, expmbertay also enhanced the results achieved by expmber, so that this combined method is now better than or equal to exptaynsv3 in 55.36% of cases for Test 3. Moreover, expmbertay became better than expm_new in 100% of the matrices for Tests 1 and 2, and in 91.07% for Test 3, which is higher than the percentages individually offered by expmber (69.64%) and exptaynsv3 (89.29%).
The numerical features of expmbertay are finally presented in Figures 5, 6 and 7 for the three tests by means of the normwise relative errors (a), the performance profiles (b) and the ratio of the relative errors (c). As can be seen, the method presents excellent precision, with very low relative errors and a very high probability in the performance profile plots.
Figure 4: Polynomial order (m) for Tests 1, 2 and 3. (a) Test 1. (b) Test 2. (c) Test 3.
Table 6: Relative error comparison among expmbertay vs exptaynsv3, expmbertay vs expmber, and expmbertay vs expm_new for the three tests.

                                   Test 1   Test 2   Test 3
E(expmbertay) < E(exptaynsv3)      57%      91%      44.64%
E(expmbertay) > E(exptaynsv3)      42%      9%       44.64%
E(expmbertay) = E(exptaynsv3)      1%       0%       10.72%
E(expmbertay) < E(expmber)         3%       0%       23.21%
E(expmbertay) > E(expmber)         0%       0%       0%
E(expmbertay) = E(expmber)         97%      100%     76.79%
E(expmbertay) < E(expm_new)        100%     100%     91.07%
E(expmbertay) > E(expm_new)        0%       0%       8.93%
E(expmbertay) = E(expm_new)        0%       0%       0%
Figure 5: Experimental results for Test 1 (expmbertay, exptaynsv3, expmber, expm_new). (a) Normwise relative errors. (b) Performance profile. (c) Ratio of relative errors.

We have also included in the developed software an "accelerated" version of the expmber function that computes the matrix exponential on an NVIDIA GPU. Matrix multiplication is an operation very rich in intrinsic parallelism that
can be optimized for GPUs. Algorithms that rely on many matrix multiplications, like the one proposed here, can take full advantage of these devices through the use of the cuBLAS [34] package. Our "GPU version" uses the regular MATLAB scripting language in the same way as the other algorithms used so far but, at some points in the code, a function implemented in a MEX file is called. This function is implemented in CUDA [35] and dispatches the operation it describes to the GPU. In this way, all the matrix products are computed by the GPU present in the computing platform. The exact details about how these MEX files are implemented can be found in [36].
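The released GPU version relies on custom CUDA MEX files [36]. A functionally similar effect can be sketched with MATLAB's built-in gpuArray support (Parallel Computing Toolbox), which also routes matrix products through cuBLAS, although this is not the mechanism used in the paper:

    Ag = gpuArray(A);                 % copy the input matrix to GPU memory once
    Bg = expm_bernoulli_sketch(Ag);   % products in (10) and the squarings run on the GPU
    B  = gather(Bg);                  % bring the result back to host memory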
Figure 6: Experimental results for Test 2 (expmbertay, exptaynsv3, expmber, expm_new). (a) Normwise relative errors. (b) Performance profile. (c) Ratio of relative errors.

The experimental results corresponding to this part of the work were obtained on a computer equipped with two Intel Xeon E5-2698 v4 @2.20GHz processors (Broadwell architecture), featuring 20 cores each. The regular MATLAB files, i.e. all those that do not make use of the GPU through a MEX file,
use the 40 cores available in the target computer by default¹. We denote this implementation as the "CPU version", as opposed to the "GPU version" described above. To measure the algorithm performance on the GPU, we used one NVIDIA Tesla P100-SXM2 (Pascal architecture), which counts on 3584 CUDA cores and 16 GB of memory.
¹ "Linear algebra and numerical functions such as fft, \ (mldivide), eig, svd, and sort are multithreaded in MATLAB. Multithreaded computations have been on by default in MATLAB since Release 2008a." In particular, MATLAB uses the Intel MKL, where the matrix multiplication is threaded, i.e. it is a parallel implementation with OpenMP.

Figure 7: Experimental results for Test 3 (expmbertay, exptaynsv3, expmber, expm_new). (a) Normwise relative errors. (b) Performance profile. (c) Ratio of relative errors.

Figure 8 shows the execution time in seconds on the left and the speed-up achieved with the GPU version with respect to its CPU counterpart on the right. The plots also compare the performance of the former algorithm, based on the Taylor series (exptaynsv3), with the new one based on the Bernoulli series (expmber) presented here. In light of the figure, it can be concluded that both algorithms, exptaynsv3 and expmber, behave very similarly. The reduction in time
obtained with the GPU with respect to the CPU starts approximately with matrices of size $n = 1000$ and increases with the problem size. The weight of both algorithms falls on the same basic computational kernel (matrix multiplications), and both of them require an identical number of them. The computational performance of the routine expmbertay would be very similar to that of expmber, since it once again uses the same number of matrix products.
5. Conclusions
The starting point of this work is a new expression of the matrix exponential function cast in terms of Bernoulli matrix polynomials. Using this series expansion, a new method for calculating the exponential of a matrix (implemented as the expmber code) has been developed. The proposed algorithm has been tested using a state-of-the-art matrix test battery with different features (diagonalizable and non-diagonalizable matrices, with particular eigenvalue spectra) that covers a wide
range of cases.

Figure 8: Execution time (a) and speed-up (b) of the algorithms to compute the matrix exponential using the Taylor series (exptaynsv3) and the Bernoulli series (expmber) on CPU and on GPU for large randomly generated matrices.

The developed code has been compared with the best implementations available, i.e. the Padé-based algorithm (expm_new) and the Taylor-based one (exptaynsv3), outperforming the Padé-based algorithm and giving results at the level of the Taylor-based solutions in both accuracy and computational cost.
Preliminary results with the Bernoulli version for the matrix exponential function motivated us to develop a hybrid code (called expmbertay) that combines the best of both the Taylor and Bernoulli solutions, yielding excellent results. Therefore, the expmbertay code is clearly competitive and highly recommended for the matrix exponential calculation, regardless of the type of matrix to be computed. Finally, we showed that the two algorithms developed in this contribution keep the advantages of other ones based on matrix polynomial expansions. Since they are all based on matrix multiplications, the GPU version implemented has turned out to be a strong tool to compute the matrix exponential approximation when the numerical methods employed are stressed with large-dimension matrices.
Acknowledgements

This work has been partially supported by the Spanish Ministerio de Economía y Competitividad and European Regional Development Fund (ERDF) grant TIN2017-89314-P, and by the Programa de Apoyo a la Investigación y Desarrollo 2018 of the Universitat Politècnica de València (PAID-06-18) grant SP20180016.
References
[1] C. F. Van Loan, A study of the matrix exponential, Numerical Analysis Report, Tech. rep., Manchester Institute for Mathematical Sciences, The University of Manchester (2006).
[2] C. B. Moler, C. F. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, SIAM Rev. 20 (4) (1978) 801–836.
[3] C. B. Moler, C. F. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev. 45 (2003) 3–49.
[4] N. J. Higham, Functions of Matrices: Theory and Computation, SIAM,
Philadelphia, PA, USA, 2008.
[5] M. Benzi, E. Estrada, C. Klymko, Ranking hubs and authorities using
matrix functions, Linear Algebra and its Applications 438 (2013) 2447–
2474.
[6] G. A. Baker, P. Graves-Morris, Padé Approximants, Encyclopedia of Mathematics and its Applications, Cambridge University Press, 1996.
[7] L. Dieci, A. Papini, Padé approximation for the exponential of a block triangular matrix, Linear Algebra Appl. 308 (2000) 183–202.
[8] A. H. Al-Mohy, N. J. Higham, A new scaling and squaring algorithm for
the matrix exponential, SIAM J. Matrix Anal. Appl. 31 (3) (2009) 970–989.
[9] N. J. Higham, The scaling and squaring method for the matrix exponential
revisited, Tech. Rep. 452, Manchester Centre for Computational Mathe-
matics (2004).
[10] R. B. Sidje, Expokit: A software package for computing matrix exponen-
tials, ACM Trans. Math. Softw. 24 (1) (1998) 130–156.
[11] J. Sastre, J. Ibáñez, E. Defez, P. Ruiz, New scaling-squaring Taylor algo-
rithms for computing the matrix exponential, SIAM Journal on Scientific
Computing 37 (1) (2015) A439–A455.
[12] P. Ruiz, J. Sastre, J. Ibáñez, E. Defez, High performance computing of the
matrix exponential, Journal of Computational and Applied Mathematics
291 (2016) 370–379.
[13] J. Sastre, J. Ibáñez, E. Defez, Boosting the computation of the matrix
exponential, Applied Mathematics and Computation 340 (2019) 206–220.
[14] E. Defez, L. Jódar, Some applications of the Hermite matrix polynomi-
als series expansions, Journal of Computational and Applied Mathematics
99 (1) (1998) 105–117.
[15] J. Sastre, J. Ibáñez, E. Defez, P. Ruiz, Efficient orthogonal matrix polyno-
mial based method for computing matrix exponential, Applied Mathemat-
ics and Computation 217 (14) (2011) 6451–6463.
[16] F. W. Olver, D. W. Lozier, R. F. Boisvert, C. W. Clark, NIST handbook
of mathematical functions hardback and CD-ROM, Cambridge University
Press, 2010.
[17] E. Tohidi, K. Erfani, M. Gachpazan, S. Shateyi, A new Tau method for
solving nonlinear Lane-Emden type equations via Bernoulli operational ma-
trix of differentiation, Journal of Applied Mathematics 2013 (2013).
[18] A. W. Islam, M. A. Sharif, E. S. Carlson, Numerical investigation of double diffusive natural convection of CO2 in a brine-saturated geothermal reservoir, Geothermics 48 (2013) 101–111.
[19] E. Tohidi, A. Bhrawy, K. Erfani, A collocation method based on Bernoulli
operational matrix for numerical solution of generalized pantograph equa-
tion, Applied Mathematical Modelling 37 (6) (2013) 4283–4294.
[20] A. Bhrawy, E. Tohidi, F. Soleymani, A new Bernoulli matrix method
for solving high-order linear and nonlinear Fredholm integro-differential
equations with piecewise intervals, Applied Mathematics and Computation
219 (2) (2012) 482–497.
[21] E. Tohidi, M. Ezadkhah, S. Shateyi, Numerical solution of nonlinear frac-
tional Volterra integro-differential equations via Bernoulli polynomials, Ab-
stract and Applied Analysis 2014 (2014).
[22] F. Toutounian, E. Tohidi, S. Shateyi, A collocation method based on the
Bernoulli operational matrix for solving high-order linear complex differ-
ential equations in a rectangular domain, Abstract and Applied Analysis
2013 (2013).
[23] E. Tohidi, F. Toutounian, Convergence analysis of Bernoulli matrix ap-
proach for one-dimensional matrix hyperbolic equations of the first order,
Computers & Mathematics with Applications 68 (1-2) (2014) 1–12.
[24] E. Tohidi, M. K. Zak, A new matrix approach for solving second-order lin-
ear matrix partial differential equations, Mediterranean Journal of Mathe-
matics 13 (3) (2016) 1353–1376.
[25] F. Toutounian, E. Tohidi, A new Bernoulli matrix method for solving sec-
ond order linear partial differential equations with the convergence analysis,
Applied Mathematics and Computation 223 (2013) 298–310.
[26] O. Kouba, Lecture Notes, Bernoulli Polynomials and Applications, arXiv
preprint arXiv:1309.7560 (2013).
[27] F. Costabile, F. Dell’Accio, Expansion over a rectangle of real functions in
Bernoulli polynomials and applications, BIT Numerical Mathematics 41 (3)
(2001) 451–464.
[28] F. Costabile, F. Dell’Accio, Expansions over a simplex of real functions
by means of Bernoulli polynomials, Numerical Algorithms 28 (1-4) (2001)
63–86.
[29] E. D. Rainville, Special functions, Vol. 442, New York, 1960.
[30] M. S. Paterson, L. J. Stockmeyer, On the number of nonscalar multiplica-
tions necessary to evaluate polynomials, SIAM Journal on Computing 2 (1)
(1973) 60–66.
[31] N. J. Higham, F. Tisseur, A block algorithm for matrix 1-norm estimation,
with an application to 1-norm pseudospectra, SIAM J. Matrix Anal. Appl.
21 (2000) 1185–1201.
[32] N. J. Higham, The test matrix toolbox for MATLAB (Version 3.0), University of Manchester, Manchester, 1995.
[33] T. Wright, Eigtool, version 2.1 (2009).
URL web.comlab.ox.ac.uk/pseudospectra/eigtool.
[34] NVIDIA, cuBLAS (2020).
URL https://docs.nvidia.com/cuda/cublas
[35] NVIDIA, CUDA Toolkit Documentation v11.0.3 (2020).
URL https://docs.nvidia.com/cuda
[36] P. Alonso, J. Peinado, J. Ibáñez, J. Sastre, E. Defez, Computing matrix
trigonometric functions with GPUs through Matlab, The Journal of Super-
computing 75 (3) (2019) 1227–1240.