
On Bernoulli matrix polynomials and matrix exponential approximation

E. Defez^a, J. Ibáñez^b, P. Alonso-Jordá^c,*, José M. Alonso^b, J. Peinado^c

Universitat Politècnica de València, Camino de Vera s/n, 46022, Valencia, Spain
^a Instituto de Matemática Multidisciplinar
^b Instituto de Instrumentación para Imagen Molecular
^c Department of Information Systems and Computation

* Corresponding author. Email addresses: edefez@imm.upv.es (E. Defez), jjibanez@dsic.upv.es (J. Ibáñez), palonso@upv.es (P. Alonso-Jordá), jmalonso@dsic.upv.es (José M. Alonso), jpeinado@dsic.upv.es (J. Peinado)

Abstract

We present in this paper a new method based on Bernoulli matrix polynomials to approximate the exponential of a matrix. The method has given rise to two new algorithms whose efficiency and accuracy are compared to those of the most efficient implementations currently available. For that purpose, a state-of-the-art test matrix battery, which allows the strengths and weaknesses of each method to be explored in depth, has been used. Since the new algorithms proposed here make intensive use of matrix products, we also provide a GPU-based implementation that achieves high performance thanks to the optimal implementation of matrix multiplication available on these devices.

Keywords:

Bernoulli matrix approximation, Matrix exponential function, GPU computing

1. Introduction

The computation of matrix functions has received considerable attention in recent years because of its many applications in different areas of science and technology. Among all matrix functions, the matrix exponential $e^A$, $A \in \mathbb{C}^{r \times r}$, stands out, due both to its applications in the solution of systems of differential equations and in graph theory, and to the difficulties involved in its computation, see [1–5].

Among the methods proposed for the approximate computation of the matrix exponential, two fundamental families stand out: those based on rational Padé approximations [6–10], and those based on polynomial approximations, which are either Taylor series expansions [11–13] or series expansions of Hermite matrix polynomials [14, 15]. In general, polynomial approximations have proved to be more efficient than the Padé algorithm in tests because they are more accurate, despite a slightly higher cost in some cases. All these methods use the basic scaling-and-squaring property, based on the relationship

$$e^A = \left( e^{A/2^s} \right)^{2^s}.$$

Thus, if $P_m(A)$ is a matrix polynomial approximation of $e^A$, then, given a matrix $A$ and a scaling factor $s$, $P_m(A/2^s)$ is an approximation to $e^{A/2^s}$ and

$$e^A \approx \left( P_m(A/2^s) \right)^{2^s}. \tag{1}$$

Bernoulli polynomials and Bernoulli numbers have been extensively used in several areas of mathematics, such as number theory, and they appear in many mathematical formulas, such as the residual term of the Euler–Maclaurin quadrature rule [16, p. 63], the Taylor series expansions of the trigonometric functions $\tan(x)$, $\csc(x)$ and $\cot(x)$ [16, pp. 116–117] and the Taylor series expansion of the hyperbolic function $\tanh(x)$ [16, p. 125]. They are also employed in the well-known exact expression for the even values of the Riemann zeta function:

$$\zeta(2k) = \sum_{n \ge 1} \frac{1}{n^{2k}} = \frac{(-1)^{k-1} (2\pi)^{2k} B_{2k}}{2\,(2k)!}, \qquad k \ge 1.$$

Moreover, they are even used for solving initial value problems [17], boundary value problems [18, 19], high-order linear and nonlinear Fredholm and Volterra integro-differential equations [20, 21], complex differential equations [22] and partial differential equations [23–25]. An excellent survey of Bernoulli polynomials and their applications can be found in [26]. Series expansions of functions in terms of Bernoulli polynomials have been studied in [27, 28].

In this paper, we present a new series expansion of the matrix exponential in terms of Bernoulli matrix polynomials. We show that the resulting polynomial approximations of the matrix exponential are, in most cases, more accurate and less computationally expensive than those based on Padé approximants. We also verify that this new method based on Bernoulli matrix polynomials is a competitive method for the approximation of the matrix exponential, with a computational cost similar to that of the Taylor method but, generally, better accuracy.

The organization of the paper is as follows. Section 2 is devoted to Bernoulli polynomials: we show how to obtain a series expansion of the matrix exponential in terms of Bernoulli matrix polynomials and how to use it to approximate the exponential of a matrix. The following section describes the proposed algorithms. Tests and comparisons are presented in Section 4. We close the document with some concluding remarks.

Notation

Throughout this paper, we denote by $\mathbb{C}^{r \times r}$ the set of all complex square matrices of size $r$. A polynomial of degree $m$ means an expression of the form $P_m(t) = a_m t^m + a_{m-1} t^{m-1} + \cdots + a_1 t + a_0$, where $t$ is a real variable and $a_j$, for $0 \le j \le m$, are complex numbers. In this way, we can define the matrix polynomial $P_m(B)$ for $B \in \mathbb{C}^{r \times r}$ as $P_m(B) = a_m B^m + a_{m-1} B^{m-1} + \cdots + a_1 B + a_0 I$. We denote by $I_n$ (or simply $I$) and $0_{n \times n}$ the identity matrix and the null matrix of order $n$, respectively. With $\lceil x \rceil$ we denote the result of rounding $x$ to the nearest integer greater than or equal to $x$, and $\lfloor x \rfloor$ denotes the result of rounding $x$ to the nearest integer less than or equal to $x$. As usual, $\|\cdot\|$ denotes any subordinate matrix norm; in particular, $\|\cdot\|_1$ is the usual 1-norm. Finally, if $A(k, m)$ are matrices in $\mathbb{C}^{n \times n}$ for $m \ge 0$, $k \ge 0$, from [29] it follows that

$$\sum_{m \ge 0} \sum_{k \ge 0} A(k, m) = \sum_{m \ge 0} \sum_{k=0}^{m} A(k, m-k). \tag{2}$$

2. On Bernoulli matrix polynomials

The Bernoulli polynomials $B_n(x)$ are defined in [16, p. 588] as the coefficients of the generating function

$$g(x, t) = \frac{t e^{tx}}{e^t - 1} = \sum_{n \ge 0} \frac{B_n(x)}{n!} t^n, \qquad |t| < 2\pi, \tag{3}$$

where $g(x, t)$ is a holomorphic function in $\mathbb{C}$ in the variable $t$ (it has a removable singularity at $t = 0$). The Bernoulli polynomials $B_n(x)$ have the explicit expression

$$B_n(x) = \sum_{k=0}^{n} \binom{n}{k} B_k x^{n-k}, \tag{4}$$

where the Bernoulli numbers are defined by $B_n = B_n(0)$. Therefore, it follows that the Bernoulli numbers satisfy

$$B_0 = 1, \qquad B_k = -\sum_{i=0}^{k-1} \binom{k}{i} \frac{B_i}{k + 1 - i}, \quad k \ge 1. \tag{5}$$

Note that $B_3 = B_5 = \cdots = B_{2k+1} = 0$ for $k \ge 1$. For a matrix $A \in \mathbb{C}^{r \times r}$, we define the $m$-th Bernoulli matrix polynomial by the expression

$$B_m(A) = \sum_{k=0}^{m} \binom{m}{k} B_k A^{m-k}. \tag{6}$$
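As an illustration only, the recurrence (5) and definition (6) translate directly into MATLAB; the helper names bernoulli_numbers and bernoulli_matpoly are ours and do not belong to the codes compared later:

function B = bernoulli_numbers(m)
% Bernoulli numbers B_0, ..., B_m via the recurrence (5).
% In double precision this recurrence loses accuracy as m grows; it is
% adequate for the moderate orders (m <= 30) used in this paper.
B = zeros(1, m+1);
B(1) = 1;                                 % B_0 = 1
for k = 1:m
    s = 0;
    for i = 0:k-1
        s = s + nchoosek(k, i) * B(i+1) / (k + 1 - i);
    end
    B(k+1) = -s;                          % B_k from (5)
end
end

function P = bernoulli_matpoly(A, m)
% m-th Bernoulli matrix polynomial B_m(A) as in (6).
B = bernoulli_numbers(m);
P = zeros(size(A));
for k = 0:m
    P = P + nchoosek(m, k) * B(k+1) * A^(m-k);
end
end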

Thus, we can now calculate the exact value of $e^{At} \frac{t}{e^t - 1}$, where $A \in \mathbb{C}^{r \times r}$. By using (3) and (6) one gets

$$e^{At} \frac{t}{e^t - 1} = \left( \sum_{n \ge 0} \frac{A^n}{n!} t^n \right) \left( \sum_{k \ge 0} \frac{B_k}{k!} t^k \right) = \sum_{n \ge 0} \sum_{k \ge 0} \frac{A^n B_k}{n! \, k!} t^n t^k.$$

Taking into account that $A(k, n) = \frac{A^n B_k}{n! \, k!} t^n t^k$, from (2) we have

$$e^{At} \frac{t}{e^t - 1} = \sum_{n \ge 0} \sum_{k=0}^{n} \frac{B_k}{k!} t^k \, \frac{A^{n-k}}{(n-k)!} t^{n-k} = \sum_{n \ge 0} \left( \sum_{k=0}^{n} \binom{n}{k} B_k A^{n-k} \right) \frac{t^n}{n!} = \sum_{n \ge 0} B_n(A) \frac{t^n}{n!},$$

where $B_n(A)$ is the $n$-th Bernoulli matrix polynomial defined in (6). In this way, we can use the series expansion

$$e^{At} = \frac{e^t - 1}{t} \sum_{n \ge 0} B_n(A) \frac{t^n}{n!}, \qquad |t| < 2\pi, \tag{7}$$

to obtain approximations of the matrix exponential. To do this, let $s$ be the scaling parameter (to be determined) for the matrix $A$, and take $t = 1$ in (7). We use the matrix exponential approximation

$$P_m(A/2^s) = (e - 1) \sum_{n=0}^{m} \frac{B_n(A/2^s)}{n!}. \tag{8}$$

Approximation (8) has the drawback that it is not expressed explicitly in terms of powers of the matrix $A/2^s$. This explicit relationship is provided below.

Lemma 1. Given expression (8), we get

$$\frac{1}{e - 1} P_m(A/2^s) = \sum_{n=0}^{m} \frac{B_n(A/2^s)}{n!} = \sum_{i=0}^{m} \alpha_i^{(m)} (A/2^s)^i, \tag{9}$$

where $\alpha_i^{(m)} = \displaystyle\sum_{k=i}^{m} \binom{k}{k-i} \frac{B_{k-i}}{k!}$.

Proof: For $m = 0$ and $m = 1$, formula (9) trivially holds. From (8) we have that, for $m = 2$ and using (6), one gets

$$\frac{1}{e - 1} P_2(A/2^s) = \sum_{n=0}^{2} \frac{B_n(A/2^s)}{n!} = \frac{1}{0!} B_0(A/2^s) + \frac{1}{1!} B_1(A/2^s) + \frac{1}{2!} B_2(A/2^s)$$

$$= \frac{1}{0!} \left( \sum_{k=0}^{0} \binom{0}{k} B_k (A/2^s)^{0-k} \right) + \frac{1}{1!} \left( \sum_{k=0}^{1} \binom{1}{k} B_k (A/2^s)^{1-k} \right) + \frac{1}{2!} \left( \sum_{k=0}^{2} \binom{2}{k} B_k (A/2^s)^{2-k} \right)$$

$$= \left( \sum_{k=0}^{2} \frac{1}{k!} \binom{k}{k} B_k \right) (A/2^s)^0 + \left( \sum_{k=1}^{2} \frac{1}{k!} \binom{k}{k-1} B_{k-1} \right) (A/2^s)^1 + \left( \sum_{k=2}^{2} \frac{1}{k!} \binom{k}{k-2} B_{k-2} \right) (A/2^s)^2$$

$$= \alpha_0^{(2)} (A/2^s)^0 + \alpha_1^{(2)} (A/2^s)^1 + \alpha_2^{(2)} (A/2^s)^2 = \sum_{i=0}^{2} \alpha_i^{(2)} (A/2^s)^i,$$

thus formula (9) is true for $m = 2$. Similarly, for $m = 3$ one gets:

$$\frac{1}{e - 1} P_3(A/2^s) = \sum_{n=0}^{3} \frac{B_n(A/2^s)}{n!} = \sum_{n=0}^{3} \frac{1}{n!} \left( \sum_{k=0}^{n} \binom{n}{k} B_k (A/2^s)^{n-k} \right)$$

$$= \left( \sum_{k=0}^{3} \frac{1}{k!} \binom{k}{k} B_k \right) (A/2^s)^0 + \left( \sum_{k=1}^{3} \frac{1}{k!} \binom{k}{k-1} B_{k-1} \right) (A/2^s)^1 + \left( \sum_{k=2}^{3} \frac{1}{k!} \binom{k}{k-2} B_{k-2} \right) (A/2^s)^2 + \left( \sum_{k=3}^{3} \frac{1}{k!} \binom{k}{k-3} B_{k-3} \right) (A/2^s)^3$$

$$= \alpha_0^{(3)} (A/2^s)^0 + \alpha_1^{(3)} (A/2^s)^1 + \alpha_2^{(3)} (A/2^s)^2 + \alpha_3^{(3)} (A/2^s)^3 = \sum_{i=0}^{3} \alpha_i^{(3)} (A/2^s)^i,$$

and formula (9) is also true for $m = 3$. We now use an induction argument. Suppose that formula (9) is valid for $m$, and let us check it for $m + 1$. Using (6) together with the induction hypothesis, one gets

$$\frac{1}{e - 1} P_{m+1}(A/2^s) = \sum_{n=0}^{m+1} \frac{B_n(A/2^s)}{n!} = \sum_{n=0}^{m} \frac{B_n(A/2^s)}{n!} + \frac{1}{(m+1)!} B_{m+1}(A/2^s)$$

$$= \sum_{i=0}^{m} \alpha_i^{(m)} (A/2^s)^i + \frac{1}{(m+1)!} \left( \sum_{k=0}^{m+1} \binom{m+1}{k} B_k (A/2^s)^{m+1-k} \right)$$

$$= \left( \alpha_0^{(m)} + \binom{m+1}{m+1} \frac{B_{m+1}}{(m+1)!} \right) (A/2^s)^0 + \cdots + \left( \alpha_m^{(m)} + \binom{m+1}{1} \frac{B_1}{(m+1)!} \right) (A/2^s)^m + \binom{m+1}{0} \frac{B_0}{(m+1)!} (A/2^s)^{m+1}$$

$$= \left( \sum_{k=0}^{m+1} \binom{k}{k} \frac{B_k}{k!} \right) (A/2^s)^0 + \left( \sum_{k=1}^{m+1} \binom{k}{k-1} \frac{B_{k-1}}{k!} \right) (A/2^s)^1 + \cdots + \left( \sum_{k=m+1}^{m+1} \binom{k}{k-(m+1)} \frac{B_{k-(m+1)}}{k!} \right) (A/2^s)^{m+1}$$

$$= \sum_{i=0}^{m+1} \alpha_i^{(m+1)} (A/2^s)^i. \qquad \square$$
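For reference, a direct (unoptimized) MATLAB transcription of the coefficients of Lemma 1, reusing the hypothetical bernoulli_numbers helper sketched after (6):

function alpha = bernoulli_alpha(m)
% Coefficients of Lemma 1: alpha_i^(m) = sum_{k=i}^{m} C(k, k-i) B_{k-i} / k!
B = bernoulli_numbers(m);
alpha = zeros(1, m+1);
for i = 0:m
    s = 0;
    for k = i:m
        s = s + nchoosek(k, k-i) * B(k-i+1) / factorial(k);
    end
    alpha(i+1) = s;
end
end

The coefficients of the approximation (8) in powers of $A/2^s$ are then simply $p_i = (e-1)\,\alpha_i^{(m)}$, which is the form evaluated in the next section.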

3. The proposed algorithms

The matrix polynomial $P_m(A/2^s)$ from (8) can be computed efficiently in terms of matrix products, using values of $m$ in the set

$$m_k \in \{2, 4, 6, 9, 12, 16, 20, 25, 30, \ldots\}, \qquad k = 1, 2, \ldots,$$

by means of the Paterson–Stockmeyer method [30]. If we consider $P_{m_k}(A)$, then:

$$P_{m_k}(A) = \Big( \cdots \big( ( p_{m_k} A^q + p_{m_k-1} A^{q-1} + p_{m_k-2} A^{q-2} + \cdots + p_{m_k-q+1} A + p_{m_k-q} I ) A^q$$
$$+ \; p_{m_k-q-1} A^{q-1} + p_{m_k-q-2} A^{q-2} + \cdots + p_{m_k-2q+1} A + p_{m_k-2q} I \big) A^q$$
$$+ \; p_{m_k-2q-1} A^{q-1} + p_{m_k-2q-2} A^{q-2} + \cdots + p_{m_k-3q+1} A + p_{m_k-3q} I \Big) A^q$$
$$\cdots$$
$$+ \; p_{q-1} A^{q-1} + p_{q-2} A^{q-2} + \cdots + p_1 A + p_0 I, \tag{10}$$

where $q = \lceil \sqrt{m_k} \rceil$ or $q = \lfloor \sqrt{m_k} \rfloor$. Taking into account Table 4.1 from [4, p. 74], the computational cost of (10) in terms of matrix products is $k$. To obtain the exponential of matrix $A$ with sufficient accuracy and efficiency, it is necessary to determine the values of $m$ and $s$ in expression (8). Once these values have been determined, the approximation (1) is used to compute $e^A$ by means of Bernoulli matrix polynomials.
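A minimal sketch of a Paterson–Stockmeyer evaluation of (10) follows, assuming the coefficient vector p stores $p_0, \ldots, p_m$ in positions p(1), ..., p(m+1); a production code would organize the computation to reuse the stored powers across calls, but the evaluation order is the same:

function P = ps_eval(p, A)
% Evaluate P = sum_{i=0}^{m} p(i+1) * A^i by the Paterson-Stockmeyer
% scheme (10); only the powers A^2, ..., A^q are formed explicitly.
m = numel(p) - 1;
q = ceil(sqrt(m));                 % floor(sqrt(m)) is also admissible
r = size(A, 1);
Apow = cell(q+1, 1);               % Apow{j+1} holds A^j, j = 0..q
Apow{1} = eye(r);
Apow{2} = A;
for j = 2:q
    Apow{j+1} = Apow{j} * A;
end
P = zeros(r);
for j = floor(m/q):-1:0            % Horner recursion on blocks of A^q
    blk = zeros(r);
    for i = 0:min(q-1, m - j*q)
        blk = blk + p(j*q + i + 1) * Apow{i+1};
    end
    P = P * Apow{q+1} + blk;
end
end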

To obtain the optimal order $m_k$ of the series expansion and the scaling parameter $s$, we have used an error analysis similar to that in [12]. Let

$$A \in \Omega_m = \left\{ X \in \mathbb{C}^{n \times n} : \rho\!\left( e^{-X} T_m(X) - I \right) < 1 \right\},$$

where $\rho(\cdot)$ is the spectral radius of a matrix and $T_m(X)$ is the Taylor approximation of order $m$ of the matrix exponential.

6

The backward error of computing $e^A$ can be defined as the matrix $\Delta A$ such that $e^{A + \Delta A} = T_m(A)$. It can be verified that

$$\Delta A \approx \sum_{k \ge m+1} c_k^{(m)} A^k,$$

where $\sum_{k \ge m+1} c_k^{(m)} A^k$ is the backward error of the Taylor approximation of the matrix exponential. The absolute backward error can be bounded as follows:

$$E_{ab}(A) = \| \Delta A \| \approx \left\| \sum_{k \ge m+1} c_k^{(m)} A^k \right\| \le \sum_{k \ge m+1} \left| c_k^{(m)} \right| \left( \| A^k \|^{1/k} \right)^k \le \sum_{k \ge m+1} \left| c_k^{(m)} \right| \beta_m^k, \tag{11}$$

where $\beta_m = \max \left\{ \| A^k \|^{1/k} : k \ge m + 1 \right\}$. Theorem 2 from [12] shows that $\beta_m = \| A^{k_0} \|^{1/k_0}$, where $m + 1 \le k_0 \le 2m + 1$, and that this value can be approximated by means of

$$\beta_m \approx \max \left\{ a_{m+1}^{1/(m+1)}, \; a_{m+2}^{1/(m+2)} \right\},$$

where $a_{m+1}$ and $a_{m+2}$ are 1-norm estimations of $\| A^{m+1} \|$ and $\| A^{m+2} \|$, respectively, computed using the block 1-norm estimation algorithm of Higham and Tisseur [31]. Let

$$\Theta_m^{(ab)} = \max \left\{ \theta \ge 0 : \sum_{k \ge m+1} \left| c_k^{(m)} \right| \theta^k \le u \right\}, \tag{12}$$

where $u = 2^{-53}$ is the unit roundoff in IEEE double precision floating-point arithmetic. If an integer $s \ge 0$ verifies $2^{-s} \beta_m < \Theta_m^{(ab)}$, then

$$E_{ab}(A/2^s) \le \sum_{k \ge m+1} \left| c_k^{(m)} \right| \left( \Theta_m^{(ab)} \right)^k < u,$$

and $m$ and $s$ will be taken, respectively, as the adequate order of the polynomial and the scaling parameter.

Using the above reasoning, the relative backward error can be bounded in the following way:

$$E_{rb}(A) = \frac{\| \Delta A \|}{\| A \|} \approx \frac{\left\| \sum_{k \ge m+1} c_k^{(m)} A^k \right\|}{\| A \|} \le \sum_{k \ge m+1} \left| c_k^{(m)} \right| \frac{\| A^k \|}{\| A \|} = \sum_{k \ge m} \left| c_{k+1}^{(m)} \right| \left( \| A^k \|^{1/k} \right)^k \le \sum_{k \ge m} \left| c_{k+1}^{(m)} \right| \beta_m^k. \tag{13}$$

Let

$$\Theta_m^{(rb)} = \max \left\{ \theta \ge 0 : \sum_{k \ge m} \left| c_{k+1}^{(m)} \right| \theta^k \le u \right\}. \tag{14}$$

Therefore, if an integer $s \ge 0$ satisfies $2^{-s} \beta_m < \Theta_m^{(rb)}$, then

$$E_{rb}(A/2^s) \le \sum_{k \ge m} \left| c_{k+1}^{(m)} \right| \left( \Theta_m^{(rb)} \right)^k < u,$$

and the polynomial order $m$ and the scaling parameter $s$ will have been obtained.

The $\Theta_m^{(ab)}$ and $\Theta_m^{(rb)}$ parameters were worked out, with the required precision, by using symbolic computations, from $m = 2$ to $m = 30$. Then, the maximum value $\Theta_m$ between $\Theta_m^{(ab)}$ and $\Theta_m^{(rb)}$ was computed for each $m$, i.e. $\Theta_m = \max(\Theta_m^{(ab)}, \Theta_m^{(rb)})$, giving rise to the $\Theta_m$ parameter included in the second column of Table 2 from [12]. As a result, $\Theta_m$ equals $\Theta_m^{(ab)}$ when $m \le 16$ and $\Theta_m$ matches $\Theta_m^{(rb)}$ otherwise. In other words, the absolute backward error was considered when the polynomial order is less than or equal to 16, and the relative backward error was taken into account for greater values.

Algorithm 1 computes the matrix exponential function based on the Bernoulli series and the Paterson–Stockmeyer method. Step 1 of this algorithm uses the procedure previously described to obtain the values of $m$ and $s$ (a more detailed description can be found in [12]). In step 2, the Bernoulli approximation is employed, depending on the value of $m$ computed in step 1. Finally, in steps 3–5, the matrix exponential is recovered.

Algorithm 1 Scaling and squaring Bernoulli algorithm for computing $B = e^A$, where $A \in \mathbb{C}^{r \times r}$ and $m_M$ is the maximum approximation order allowed.
1: Choose an adequate order $m_k \le m_M$ and scaling parameter $s \in \mathbb{N} \cup \{0\}$
2: $B = P_{m_k}(A/2^s)$ by using (10) ($P_{m_k}(\cdot)$ Bernoulli matrix polynomial)
3: for $i = 1:s$ do   ▷ Recovering the matrix exponential
4:     $B = B^2$
5: end for
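Under the stated assumptions, a compact MATLAB transcription of Algorithm 1 might read as follows, reusing the hypothetical helpers bernoulli_alpha and ps_eval sketched earlier:

function B = expm_bernoulli(A, m, s)
% Sketch of Algorithm 1; m and s are assumed already chosen as in Step 1.
p = (exp(1) - 1) * bernoulli_alpha(m);   % coefficients of (8) via Lemma 1
B = ps_eval(p, A / 2^s);                 % Step 2: evaluate (10)
for i = 1:s                              % Steps 3-5: squaring phase
    B = B * B;
end
end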

Bearing in mind (9), if $P_m(A/2^s)$ represents the matrix polynomial of order $m$ corresponding to the Bernoulli approximation of the exponential of matrix $A/2^s$, and $T_m(A/2^s)$ denotes the same matrix polynomial but according to the Taylor approach, then:

$$P_m(A/2^s) = (e - 1) \sum_{i=0}^{m} \alpha_i^{(m)} (A/2^s)^i = \sum_{i=0}^{m} b_i^{(m)} (A/2^s)^i, \tag{15}$$

$$T_m(A/2^s) = \sum_{i=0}^{m} \frac{1}{i!} (A/2^s)^i = \sum_{i=0}^{m} t_i (A/2^s)^i.$$

8

Table 1: Approximation polynomial coefficient vector differences between Bernoulli (b) and Taylor (t) methods.

m    ||b-t||        ||b-t||/||t||
2    5.023311e-01   2.009324e-01
4    5.695696e-02   2.103026e-02
6    2.741618e-03   1.008669e-03
9    1.293850e-05   4.759808e-06
12   4.657888e-08   1.713541e-08
16   2.819122e-11   1.037097e-11
20   1.445479e-14   5.317621e-15
25   2.735502e-19   1.006335e-19
30   4.901565e-22   1.803185e-22

The coefficients $b_i^{(m)}$ of the Bernoulli approximation polynomial differ significantly from those of the Taylor one, $t_i$, with $i = 0, \ldots, m$, when $m \in \{2, 4, 6, 9, 12, 16, 20\}$. However, they are practically identical for $m \in \{25, 30, \ldots\}$. This is so because, as the degree $m$ of the Bernoulli polynomial increases, all its coefficients $b_i^{(m)}$ vary, approaching the corresponding Taylor ones. Table 1 collects the 1-norm of the absolute and relative differences between the coefficient vectors of the Bernoulli and Taylor polynomial approximations for different values of $m$. As can be seen, the 1-norm of these differences is less than the unit roundoff ($u = 2^{-53} \approx 1.11 \times 10^{-16}$) when $m = 25$ or $m = 30$. As an example, the 1-norm of the relative difference between these coefficients when $m = 25$ is equal to $1.006335 \times 10^{-19}$. As expected, the differences are smaller when $m$ increases.
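The entries of Table 1 can be checked along the following lines (helper names from the earlier sketches). Note that the paper obtained these differences with symbolic computations; in double precision the Bernoulli-number recurrence loses digits for the largest orders, so the table entries below the unit roundoff can only be reproduced approximately:

% 1-norms of the absolute and relative coefficient differences of Table 1.
for m = [2 4 6 9 12 16 20 25 30]
    b = (exp(1) - 1) * bernoulli_alpha(m);   % Bernoulli coefficients of (15)
    t = 1 ./ factorial(0:m);                 % Taylor coefficients t_i = 1/i!
    fprintf('%2d  %e  %e\n', m, norm(b - t, 1), norm(b - t, 1) / norm(t, 1));
end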

Therefore, these experimental results show that the backward error bounds expressed in (11) and (13) are only fulfilled by the Bernoulli approximation for values of $m$ greater than or equal to 25, but not for lower values. As a consequence of this analysis, Algorithm 2 has been developed. It computes the matrix exponential by means of the Taylor series, when $m$ is less than or equal to 20, or by means of the Bernoulli series, when $m$ is equal to 25 or 30.

Algorithm 2 Scaling and squaring Bernoulli algorithm for computing $B = e^A$, where $A \in \mathbb{C}^{r \times r}$ and $m_M$ is the maximum approximation order allowed.
1: Choose an adequate order $m_k \le m_M$ and scaling parameter $s \in \mathbb{N} \cup \{0\}$
2: if $m_k \le 20$ then
3:     $B = P_{m_k}(A/2^s)$ using (10) ($P_{m_k}(\cdot)$ Taylor matrix polynomial)
4: else
5:     $B = P_{m_k}(A/2^s)$ using (10) ($P_{m_k}(\cdot)$ Bernoulli matrix polynomial)
6: end if
7: for $i = 1:s$ do   ▷ Recovering the matrix exponential
8:     $B = B^2$
9: end for

9

4. Numerical experiments

In this section, we first compare expmber, the MATLAB implementation corresponding to Algorithm 1, based on the Bernoulli approximation, with the functions exptaynsv3 [12], which computes the matrix exponential using Taylor matrix polynomials, and expm_new [8], which implements a scaling and squaring Padé-based algorithm for the same matrix function. Next, we compare expmbertay, the function that combines the Taylor and Bernoulli approximations in accordance with Algorithm 2, with expmber, exptaynsv3 and expm_new.

Algorithm 3 computes the “exact” matrix exponential function thanks to MATLAB's Symbolic Math Toolbox with 256 digits of precision. This algorithm provides the exact solution when it finds that the relative error between $T_{m_k}^{(j)}(n)$ and $T_{m_{k-1}}^{(i)}(n)$ is less than $u = 2^{-53}$ (see (18)), where $T_m^{(s)}(n)$ is the Taylor matrix approximation of order $m$ of the scaled matrix $A/2^s$, computed with $n$ digits of precision by using the vpa (variable-precision arithmetic) MATLAB function in Algorithm 4. Previously, $T_{m_k}^{(j)}(n)$ and $T_{m_{k-1}}^{(i)}(n)$ have been calculated so that (16) and (17) are fulfilled.

Algorithm 3 Computes the “exact” matrix exponential $T = e^A$, where $A \in \mathbb{C}^{r \times r}$, by means of the Taylor expansion.
1: if there exist two consecutive orders $m_{k-1}, m_k \in \{30, 36, 42, 49, 56, 64\}$ and integers $1 \le i, j \le 15$ for $s$ such that

$$\frac{\left\| T_{m_{k-1}}^{(i)}(n) - T_{m_{k-1}}^{(i-1)}(n) \right\|_1}{\left\| T_{m_{k-1}}^{(i)}(n) \right\|_1} < u, \tag{16}$$

$$\frac{\left\| T_{m_k}^{(j)}(n) - T_{m_k}^{(j-1)}(n) \right\|_1}{\left\| T_{m_k}^{(j)}(n) \right\|_1} < u, \tag{17}$$

and

$$\frac{\left\| T_{m_k}^{(j)}(n) - T_{m_{k-1}}^{(i)}(n) \right\|_1}{\left\| T_{m_k}^{(j)}(n) \right\|_1} < u \tag{18}$$

by using Algorithm 4, then
     return $T = T_{m_k}^{(j)}(n)$
2: else
     return error
3: end if

10

Algorithm 4 Computes $T_m^{(s)}(n) = e^A$, where $A \in \mathbb{C}^{r \times r}$, by a Taylor expansion of order $m$ and scaling parameter $s$, using the vpa MATLAB function with $n$ digits of precision.
1: Compute $T_m^{(s)}(n) = P_m(A/2^s)$ using the Taylor expansion of order $m$ with $n$ digits of precision
2: for $i = 1:s$ do
3:     $T_m^{(s)}(n) = [T_m^{(s)}(n)]^2$
4: end for
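A sketch of Algorithm 4 in MATLAB, using the vpa function of the Symbolic Math Toolbox (the function name taylor_vpa is ours):

function T = taylor_vpa(A, m, s, n)
% Order-m Taylor approximation of exp(A/2^s) with n decimal digits.
digits(n);                       % set variable-precision arithmetic to n digits
As = vpa(A) / 2^s;
T = vpa(eye(size(A, 1)));
term = T;
for k = 1:m
    term = term * As / k;        % builds (A/2^s)^k / k! incrementally
    T = T + term;
end
for i = 1:s                      % undo the scaling by repeated squaring
    T = T * T;
end
end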

4.1. Experiments description

The following test battery, composed of three types of different and representative matrices, has been chosen to compare the numerical performance of the codes described above:

a) One hundred diagonalizable $128 \times 128$ real matrices with 1-norms varying from 2.18 to 207.52. These matrices have the form $A = V D V^T$, where $D$ is a diagonal matrix with real and complex eigenvalues and $V$ is an orthogonal matrix obtained as $V = H / \sqrt{128}$, $H$ being the Hadamard matrix. The “exact” matrix exponential was computed as $\exp(A) = V \exp(D) V^T$ (see [4, p. 10]).

b) One hundred non-diagonalizable $128 \times 128$ complex matrices with 1-norms ranging from 84 to 98. These matrices have the form $A = V J V^T$, where $J$ is a Jordan matrix with complex eigenvalues of modulus less than 10 and random algebraic multiplicity varying from 1 to 5. $V$ is an orthogonal matrix obtained as $V = H / \sqrt{128}$, where $H$ is the Hadamard matrix. The “exact” matrix exponential was worked out as $\exp(A) = V \exp(J) V^T$.

c) State-of-the-art matrices:
– Forty $128 \times 128$ matrices from the Matrix Computation Toolbox (MCT) [32].
– Sixteen matrices from the Eigtool MATLAB package (EMP) [33] with sizes $127 \times 127$ and $128 \times 128$.
The “exact” matrix exponential for these matrices was computed by using Taylor approximations of orders 30, 36, 42, 49, 56 and 64, changing their scaling parameter (see Algorithm 3).
Although the MCT and the EMP initially comprise fifty-two and twenty matrices, respectively, twelve from the MCT and four from the EMP were discarded for different reasons. For example, matrices 5, 10, 16, 17, 21, 25, 26, 42, 43, 44 and 49 from the MCT and matrices 5 and 6 from the EMP were not taken into account since the exact exponential solution could not be computed. Besides, matrix 2 from the MCT and matrices 3 and 10 from the EMP were not considered because of the excessively high relative error provided by all the methods under comparison.

Table 2: Matrix products (P) for Tests 1, 2 and 3 using the expmber, exptaynsv3 and expm_new MATLAB codes.

         P(expmber)  P(exptaynsv3)  P(expm_new)
Test 1   1131        1131           1178.33
Test 2   1100        1100           1227.33
Test 3   617         617            654.67

An experiment, called a Test, is performed for each of the three sets of matrices described above, evaluating the computational cost and the numerical accuracy of the methods under comparison. The three tests have been executed using MATLAB (R2018b) running on an HP Pavilion dv8 Notebook PC with an Intel Core i7 CPU Q720 @1.60GHz processor and 6 GB of RAM.

4.2. Experimental results

Table 2 shows the computational cost of each method, expressed in terms of the number of matrix products (P), taking into account that the cost of the remaining operations is negligible in comparison for sufficiently large matrices. As can be seen, expmber and exptaynsv3 performed an identical number of matrix multiplications, since both use the same algorithm to calculate the degree of the polynomial ($m$) and the value of the scaling parameter ($s$). This number of products was lower than that required by expm_new, which incurred the highest computational cost. In addition to the matrix products, expm_new solves a system of linear equations with $n$ right-hand side vectors, where $n$ is the size of the square coefficient matrix, whose computational cost was approximated as 4/3 matrix products.

Table 3, on the other hand, shows the percentage of cases in which the relative errors of expmber are lower than, greater than or equal to those of exptaynsv3 and expm_new. In more detail, the relative error was computed as

$$E = \frac{\| \exp(A) - \widetilde{\exp}(A) \|_1}{\| \exp(A) \|_1},$$

where $\widetilde{\exp}(A)$ is the approximate solution and $\exp(A)$ is the exact one.

With the exception of Test 3, the Bernoulli approach resulted in relative errors lower than those of the Taylor one. With regard to Padé, the Bernoulli algorithm always offered considerably more accurate results, reaching up to 100% of the matrices for Test 2.

For the three tests, respectively, the normwise relative errors (a), the performance profiles (b), the ratios of the relative errors (c) and the ratios of the matrix products (d) of the compared methods have been plotted in Figures 1, 2 and 3.

12

Table 3: Relative error comparison of expmber vs exptaynsv3 and expmber vs expm_new for the three tests.

                              Test 1   Test 2   Test 3
E(expmber) < E(exptaynsv3)    56%      91%      30.36%
E(expmber) > E(exptaynsv3)    43%      9%       62.5%
E(expmber) = E(exptaynsv3)    1%       0%       7.14%
E(expmber) < E(expm_new)      97%      100%     69.64%
E(expmber) > E(expm_new)      3%       0%       30.36%
E(expmber) = E(expm_new)      0%       0%       0%

Regarding the normwise relative errors presented in Figures 1a, 2a and 3a, the solid line represents the function $k_{\exp} u$, where $k_{\exp}$ (or cond) is the condition number of the matrix exponential function [4, Chapter 3] and $u$ is the unit roundoff. In general, expmber exhibited very good numerical stability. This can be appreciated by looking at the distance from each matrix normwise relative error to the cond·u line. In Figures 1a and 2a, the numerical stability is even better, because these errors lie below this line. Because $k_{\exp}$ was infinite or enormously high for matrices 6, 7, 12, 15, 23, 36, 39, 50 and 51 from the MCT and for matrices 1, 4, 8 and 15 from the EMP, all of them were excluded from the visualisation in Figure 3a but considered in the other ones.

In the performance profile figures (1b, 2b and 3b), the $\alpha$ coordinate, on the x-axis, varies from 1 to 5 in steps of 0.1. For a given value of $\alpha$, the $p$ coordinate, on the y-axis, is the probability that the considered algorithm has a relative error lower than or equal to $\alpha$ times the smallest relative error over all the methods on the given test. For the first two tests (Figures 1b, 2b), the performance profile shows that the accuracy of the Bernoulli and Taylor methods was similar, and both were considerably more accurate than the Padé method. Notwithstanding, Figure 3b reveals that the exptaynsv3 code improved on the accuracy of the expmber function for Test 3.
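For completeness, a sketch of how such a profile can be computed, assuming E is an array of relative errors with one row per test matrix and one column per method:

% Performance profile: p(alpha) = fraction of matrices for which a method's
% error is within alpha times the best error over all methods.
alphas = 1:0.1:5;
best = min(E, [], 2);                       % smallest error per matrix
p = zeros(numel(alphas), size(E, 2));
for k = 1:numel(alphas)
    p(k, :) = mean(E <= alphas(k) * best, 1);
end
plot(alphas, p); xlabel('\alpha'); ylabel('p');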

In Figures 1c, 2c and 3c, the ratios of relative errors are presented in decreasing order with respect to E(expmber)/E(exptaynsv3). They confirm the data reported in Table 3, where it was shown that expmber provides more accurate results than exptaynsv3 for Tests 1 and 2, but not for Test 3. Clearly, Padé offered the worst performance in most cases.

In our opinion, this is clearly due to the distinctive numerical characteristics of the three sets of matrices analysed and the degree of the polynomial ($m$) required. According to our experience, expmber provides results with very good accuracy for values of $m$ equal to 25 or 30. However, for significantly lower values, expmber is less competitive than other codes, such as exptaynsv3. The minimum, maximum and average values of $m$ required for Tests 1, 2 and 3 are collected in Table 4. In more detail, Figure 4 shows the approximation polynomial order employed in the calculation of the exponential function by means of expmber (or exptaynsv3) for each of the matrices in the test battery.

[Figure 1: Experimental results for Test 1. Panels: (a) Normwise relative errors; (b) Performance profile; (c) Ratio of relative errors; (d) Ratio of matrix products.]

Table 4: Minimum, maximum and average polynomial degree (m) required for Tests 1, 2 and 3 using the expmber or exptaynsv3 functions.

         Minimum  Maximum  Average
Test 1   16       30       27.51
Test 2   30       30       30
Test 3   12       30       25.70

As presented in Table 2, the expmber and exptaynsv3 functions performed a lower number of matrix operations than expm_new. This statement is also corroborated by the results displayed in Figures 1d, 2d and 3d, where the ratio between the number of expm_new and expmber matrix products ranged from 1.03 to 1.22 for Test 1, from 1.03 to 1.12 for Test 2 and from 0.67 to 2.87 for Test 3.

[Figure 2: Experimental results for Test 2. Panels: (a) Normwise relative errors; (b) Performance profile; (c) Ratio of relative errors; (d) Ratio of matrix products.]

Table 5: Relative error comparison between exptaynsv3 and expm_new for the three tests.

                               Test 1   Test 2   Test 3
E(exptaynsv3) < E(expm_new)    100%     100%     89.29%
E(exptaynsv3) > E(expm_new)    0%       0%       10.71%
E(exptaynsv3) = E(expm_new)    0%       0%       0%

Next, we analyse the idea of using the Bernoulli and Taylor methods together, giving rise to a novel approach to compute the matrix exponential function. For that, we start from the benefits of the exptaynsv3 function over expm_new. As Table 5 shows, the percentage of cases in which the Taylor relative error is lower than the Padé one reaches 100% for Tests 1 and 2, and 89.29% for Test 3. Evidently, these error percentages improve on those offered by the Bernoulli approximation with respect to expm_new, as described in Table 3. In view of these excellent results, we therefore considered the possibility of combining the Bernoulli and Taylor methods, giving rise to the expmbertay code.

In this new function, and according to the comparison between the coefficients of their polynomials carried out previously, we use the Taylor approach (exptaynsv3) for values of $m$ below 25 and the Bernoulli approximation (expmber) when $m$ equals 25 or 30. In this way, the number of matrix products needed by expmbertay is obviously identical to that of expmber or exptaynsv3.

15

[Figure 3: Experimental results for Test 3. Panels: (a) Normwise relative errors; (b) Performance profile; (c) Ratio of relative errors; (d) Ratio of matrix products.]

Table 6 thus collects the percentage of matrices in which the relative errors of expmbertay are lower than, greater than or equal to those of exptaynsv3, expmber and expm_new. For the vast majority of matrices, expmbertay provided an accuracy practically identical to that of expmber, even improving on the latter for 23.21% of the matrices of Test 3. With respect to exptaynsv3, expmbertay also enhanced the results achieved by expmber, so that this combined method is now better than or equal to exptaynsv3 in 55.36% of the cases for Test 3. Moreover, expmbertay became better than expm_new in 100% of the matrices for Tests 1 and 2, and in 91.07% for Test 3, which is higher than the percentages individually offered by expmber (69.64%) and exptaynsv3 (89.29%).

The numerical features of expmbertay are finally shown in Figures 5, 6 and 7 for the three tests by means of the normwise relative errors (a), the performance profiles (b) and the ratios of the relative errors (c). As can be seen, the method delivers excellent accuracy, with very low relative errors and a very high probability in the performance profile plots.

16

[Figure 4: Polynomial order (m) for Tests 1, 2 and 3. Panels: (a) Test 1; (b) Test 2; (c) Test 3.]

Table 6: Relative error comparison of expmbertay vs exptaynsv3, expmbertay vs expmber, and expmbertay vs expm_new for the three tests.

                                 Test 1   Test 2   Test 3
E(expmbertay) < E(exptaynsv3)    57%      91%      44.64%
E(expmbertay) > E(exptaynsv3)    42%      9%       44.64%
E(expmbertay) = E(exptaynsv3)    1%       0%       10.72%
E(expmbertay) < E(expmber)       3%       0%       23.21%
E(expmbertay) > E(expmber)       0%       0%       0%
E(expmbertay) = E(expmber)       97%      100%     76.79%
E(expmbertay) < E(expm_new)      100%     100%     91.07%
E(expmbertay) > E(expm_new)      0%       0%       8.93%
E(expmbertay) = E(expm_new)      0%       0%       0%

[Figure 5: Experimental results for Test 1. Panels: (a) Normwise relative errors; (b) Performance profile; (c) Ratio of relative errors.]

We have also included in the developed software an “accelerated” version of the expmber function that computes the matrix exponential on an NVIDIA GPU. Matrix multiplication is an operation very rich in intrinsic parallelism that can be optimized for GPUs. Algorithms that rely on many matrix multiplications, like the one proposed here, can take full advantage of these devices through the use of the cuBLAS [34] package. Our “GPU version” uses the regular MATLAB scripting language in the same way as the other algorithms used so far but, at some points in the code, a function implemented in a MEX file is called. This function is implemented in CUDA [35] and dispatches the operation described in the function to the GPU. This way, all the matrix products are computed by the GPU present in the computing platform. The exact implementation details about how these MEX files are built can be found in [36].
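The MEX/cuBLAS machinery itself is described in [36] and is not reproduced here. For readers who want a quick, functionally similar (though not identical) experiment, MATLAB's own gpuArray type from the Parallel Computing Toolbox also runs matrix products on the GPU through cuBLAS:

% Illustration only; the paper's implementation uses custom MEX files [36].
Ag = gpuArray(A / 2^s);          % move the scaled matrix to the GPU
Bg = ps_eval(p, Ag);             % the mtimes calls inside now run on the GPU
for i = 1:s
    Bg = Bg * Bg;                % squaring phase, also on the GPU
end
B = gather(Bg);                  % bring the result back to host memory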

[Figure 6: Experimental results for Test 2. Panels: (a) Normwise relative errors; (b) Performance profile; (c) Ratio of relative errors.]

The experimental results corresponding to this part of the work were obtained on a computer equipped with two Intel Xeon CPU E5-2698 v4 @2.20GHz processors (Broadwell architecture) featuring 20 cores each. The regular MATLAB files, i.e. all those that do not make use of the GPU through a MEX file, use the 40 cores available in the target computer by default¹. We denote this implementation as the “CPU version” when compared with the “GPU version” described above. To measure the algorithm performance on the GPU, we used one NVIDIA Tesla P100-SXM2 (Pascal architecture) with 3584 CUDA cores and 16 GB of memory.

Figure 8 shows the execution time in seconds on the left and the speed-up achieved by the GPU version with respect to its CPU counterpart on the right. The plots also compare the performance of the former algorithm based on the Taylor series (exptaynsv3) with the new one based on the Bernoulli series (expmber) presented here.

¹ “Linear algebra and numerical functions such as fft, \ (mldivide), eig, svd, and sort are multithreaded in MATLAB. Multithreaded computations have been on by default in MATLAB since Release 2008a.” In particular, MATLAB uses the Intel MKL, where the matrix multiplication is threaded, i.e. it is a parallel implementation with OpenMP.

[Figure 7: Experimental results for Test 3. Panels: (a) Normwise relative errors; (b) Performance profile; (c) Ratio of relative errors.]

In the light of the figure, it can be concluded that both algorithms, exptaynsv3 and expmber, behave very similarly. The reduction in time obtained with the GPU with respect to the CPU starts approximately with matrices of size $n = 1000$ and increases with the problem size. The weight of both algorithms falls on the same basic computational kernel (matrix multiplications), and both of them require an identical number of products. The computational performance of the routine expmbertay would be very similar to that of expmber, since it once again uses the same number of matrix products.

[Figure 8: Execution time (a) and speed-up (b) of the algorithms to compute the matrix exponential using the Taylor (exptaynsv3) and the Bernoulli (expmber) series on CPU and on GPU for large randomly generated matrices.]

5. Conclusions

The starting point of this work is a new expression of the matrix exponential function cast in terms of Bernoulli matrix polynomials. Using this series expansion, a new method for calculating the exponential of a matrix (implemented as the expmber code) has been developed. The proposed algorithm has been tested using a state-of-the-art matrix test battery with different features (diagonalizable and non-diagonalizable matrices, with particular eigenvalue spectra) that covers a wide range of cases.

The developed code has been compared with the best implementations available, i.e. the Padé-based algorithm (expm_new) and the Taylor-based one (exptaynsv3), outperforming the Padé-based algorithm and giving results at the level of the Taylor-based solutions in both accuracy and computational cost.

Preliminary results with the Bernoulli version of the matrix exponential function motivated us to develop a hybrid code (called expmbertay) that combines the best of both the Taylor and Bernoulli solutions, with excellent results. The expmbertay code is therefore clearly competitive and highly recommended for the matrix exponential calculation, regardless of the type of matrix involved. Finally, we showed that the two algorithms developed in this contribution keep the advantages of other algorithms based on matrix polynomial expansions. Since they are all based on matrix multiplications, the GPU version implemented has turned out to be a strong tool to compute the matrix exponential approximation when the numerical methods employed are stressed with large matrices.

Acknowledgements

This work has been partially supported by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF) under grant TIN2017-89314-P, and by the Programa de Apoyo a la Investigación y Desarrollo 2018 of the Universitat Politècnica de València (PAID-06-18) under grant SP20180016.

References

[1] C. F. Van Loan, A study of the matrix exponential, numerical analysis report, Tech. rep., Manchester Institute for Mathematical Sciences, The University of Manchester (2006).

[2] C. B. Moler, C. F. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, SIAM Rev. 20 (4) (1978) 801–836.

[3] C. B. Moler, C. F. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev. 45 (2003) 3–49.

[4] N. J. Higham, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, PA, USA, 2008.

[5] M. Benzi, E. Estrada, C. Klymko, Ranking hubs and authorities using matrix functions, Linear Algebra and its Applications 438 (2013) 2447–2474.

[6] G. A. Baker, P. Graves-Morris, Padé Approximants, Encyclopedia of Mathematics and its Applications, Cambridge University Press, 1996.

[7] L. Dieci, A. Papini, Padé approximation for the exponential of a block triangular matrix, Linear Algebra Appl. 308 (2000) 183–202.

[8] A. H. Al-Mohy, N. J. Higham, A new scaling and squaring algorithm for the matrix exponential, SIAM J. Matrix Anal. Appl. 31 (3) (2009) 970–989.

[9] N. J. Higham, The scaling and squaring method for the matrix exponential revisited, Tech. Rep. 452, Manchester Centre for Computational Mathematics (2004).

[10] R. B. Sidje, Expokit: A software package for computing matrix exponentials, ACM Trans. Math. Softw. 24 (1) (1998) 130–156.

[11] J. Sastre, J. Ibáñez, E. Defez, P. Ruiz, New scaling-squaring Taylor algorithms for computing the matrix exponential, SIAM Journal on Scientific Computing 37 (1) (2015) A439–A455.

[12] P. Ruiz, J. Sastre, J. Ibáñez, E. Defez, High performance computing of the matrix exponential, Journal of Computational and Applied Mathematics 291 (2016) 370–379.

[13] J. Sastre, J. Ibáñez, E. Defez, Boosting the computation of the matrix exponential, Applied Mathematics and Computation 340 (2019) 206–220.

[14] E. Defez, L. Jódar, Some applications of the Hermite matrix polynomials series expansions, Journal of Computational and Applied Mathematics 99 (1) (1998) 105–117.

[15] J. Sastre, J. Ibáñez, E. Defez, P. Ruiz, Efficient orthogonal matrix polynomial based method for computing matrix exponential, Applied Mathematics and Computation 217 (14) (2011) 6451–6463.

[16] F. W. Olver, D. W. Lozier, R. F. Boisvert, C. W. Clark, NIST Handbook of Mathematical Functions, Cambridge University Press, 2010.

[17] E. Tohidi, K. Erfani, M. Gachpazan, S. Shateyi, A new Tau method for solving nonlinear Lane-Emden type equations via Bernoulli operational matrix of differentiation, Journal of Applied Mathematics 2013 (2013).

[18] A. W. Islam, M. A. Sharif, E. S. Carlson, Numerical investigation of double diffusive natural convection of CO2 in a brine saturated geothermal reservoir, Geothermics 48 (2013) 101–111.

[19] E. Tohidi, A. Bhrawy, K. Erfani, A collocation method based on Bernoulli operational matrix for numerical solution of generalized pantograph equation, Applied Mathematical Modelling 37 (6) (2013) 4283–4294.

[20] A. Bhrawy, E. Tohidi, F. Soleymani, A new Bernoulli matrix method for solving high-order linear and nonlinear Fredholm integro-differential equations with piecewise intervals, Applied Mathematics and Computation 219 (2) (2012) 482–497.

[21] E. Tohidi, M. Ezadkhah, S. Shateyi, Numerical solution of nonlinear fractional Volterra integro-differential equations via Bernoulli polynomials, Abstract and Applied Analysis 2014 (2014).

[22] F. Toutounian, E. Tohidi, S. Shateyi, A collocation method based on the Bernoulli operational matrix for solving high-order linear complex differential equations in a rectangular domain, Abstract and Applied Analysis 2013 (2013).

[23] E. Tohidi, F. Toutounian, Convergence analysis of Bernoulli matrix approach for one-dimensional matrix hyperbolic equations of the first order, Computers & Mathematics with Applications 68 (1-2) (2014) 1–12.

[24] E. Tohidi, M. K. Zak, A new matrix approach for solving second-order linear matrix partial differential equations, Mediterranean Journal of Mathematics 13 (3) (2016) 1353–1376.

[25] F. Toutounian, E. Tohidi, A new Bernoulli matrix method for solving second order linear partial differential equations with the convergence analysis, Applied Mathematics and Computation 223 (2013) 298–310.

[26] O. Kouba, Lecture Notes, Bernoulli Polynomials and Applications, arXiv preprint arXiv:1309.7560 (2013).

[27] F. Costabile, F. Dell'Accio, Expansion over a rectangle of real functions in Bernoulli polynomials and applications, BIT Numerical Mathematics 41 (3) (2001) 451–464.

[28] F. Costabile, F. Dell'Accio, Expansions over a simplex of real functions by means of Bernoulli polynomials, Numerical Algorithms 28 (1-4) (2001) 63–86.

[29] E. D. Rainville, Special Functions, Vol. 442, New York, 1960.

[30] M. S. Paterson, L. J. Stockmeyer, On the number of nonscalar multiplications necessary to evaluate polynomials, SIAM Journal on Computing 2 (1) (1973) 60–66.

[31] N. J. Higham, F. Tisseur, A block algorithm for matrix 1-norm estimation, with an application to 1-norm pseudospectra, SIAM J. Matrix Anal. Appl. 21 (2000) 1185–1201.

[32] N. J. Higham, The Test Matrix Toolbox for MATLAB (Version 3.0), University of Manchester, Manchester, 1995.

[33] T. Wright, Eigtool, version 2.1 (2009). URL web.comlab.ox.ac.uk/pseudospectra/eigtool

[34] NVIDIA, cuBLAS (2020). URL https://docs.nvidia.com/cuda/cublas

[35] NVIDIA, CUDA Toolkit Documentation v11.0.3 (2020). URL https://docs.nvidia.com/cuda

[36] P. Alonso, J. Peinado, J. Ibáñez, J. Sastre, E. Defez, Computing matrix trigonometric functions with GPUs through Matlab, The Journal of Supercomputing 75 (3) (2019) 1227–1240.