Efficient evaluation of matrix polynomials

J. Sastre
Instituto de Telecomunicaciones y Aplicaciones Multimedia, Universitat Politècnica de València, Camino de Vera s/n, 46022-Valencia (Spain)
Abstract

This paper presents a new family of methods for evaluating matrix polynomials more efficiently than the state-of-the-art Paterson–Stockmeyer method. Examples of the application of the methods to the Taylor polynomial approximation of matrix functions like the matrix exponential and matrix cosine are given. Their efficiency is compared with that of the best existing evaluation schemes for general polynomial and rational approximations, and also with a recent method based on mixed rational and polynomial approximants. For many years, the Paterson–Stockmeyer method has been considered the most efficient general method for the evaluation of matrix polynomials. In this paper we show that this statement is no longer true. Moreover, for many years rational approximations have been considered more efficient than polynomial approximations, although recently it has been shown that often this is not the case in the computation of the matrix exponential and matrix cosine. In this paper we show that in fact polynomial approximations provide a higher order of approximation than the state-of-the-art computational methods for rational approximations for the same cost in terms of matrix products.

Keywords: matrix, polynomial, rational, mixed rational and polynomial, approximation, computation, matrix function.
PACS: 87.64.Aa
1. Introduction
In this paper we propose a new family of methods for evaluating matrix polynomials more efficiently than the state-of-the-art Paterson–Stockmeyer method combined with Horner's method [1], [2, Sec. 4.2]. The proposed
methods are applied to compute efficiently Taylor polynomial approximations of matrix functions. The computation of matrix functions is a research field with applications in many areas of science, and many algorithms for their computation have been proposed [2, 3]. Among all matrix functions, the matrix exponential has attracted special attention, see [4, 5, 6] and the references therein, and lately the matrix cosine, see [7, 8] and the references therein. The main methods for computing matrix functions are those based on rational approximations, like Padé or Chebyshev approximations, polynomial approximations, like Taylor approximation, similarity transformations, and matrix iterations [2]. Moreover, a new kind of approximation based on mixed rational and polynomial approximants has been proposed in [9].
Recently, it has been shown that, using the combination of the Horner and Paterson–Stockmeyer methods [1], [2, Sec. 4.2], polynomial approximations may be more efficient than rational Padé approximations for both the matrix exponential and cosine [6, 8]. In this paper we show that, using the proposed matrix polynomial evaluation methods, polynomial approximations are more accurate than existing state-of-the-art methods for evaluating both polynomial and rational approximants for the same computing cost. Moreover, we show that the new methods are more efficient than the recent mixed rational and polynomial approximation [9] in some cases, and examples for the computation of the matrix exponential and the matrix cosine are given.
Throughout this paper $\lceil x \rceil$ denotes the lowest integer not less than $x$, $\lfloor x \rfloor$ denotes the highest integer not exceeding $x$, $\mathbb{N}$ denotes the set of positive integers, $\mathbb{C}^{n\times n}$ and $\mathbb{R}^{n\times n}$ denote the sets of complex and real matrices of size $n \times n$, respectively, $I$ denotes the identity matrix for both sets, and $\mathcal{R}_{k,m}$ denotes the space of rational functions with numerator and denominator of degrees at most $k$ and $m$, respectively.
Note that the multiplication by the matrix inverse in matrix rational approximations is calculated as the solution of a multiple right-hand side linear system. Therefore, the cost of evaluating polynomial and rational approximations will be given in terms of the number of matrix products, denoted by $M$, and the cost of the solution of multiple right-hand side linear systems $AX = B$, where matrices $A$ and $B$ are $n \times n$, denoted by $D$. From [10, App. C] it follows that, see [9, p. 11940]:

$$D \approx 4/3\, M. \quad (1)$$
This paper is organized as follows. Section 2 recalls some results for efficient Taylor, Padé, and mixed rational and polynomial approximation of general matrix functions. Section 3 deals with the new matrix polynomial evaluation methods, giving examples for the computation of the matrix exponential and the matrix cosine. Section 4 compares the new techniques with efficient state-of-the-art evaluation schemes for polynomial, rational, and mixed rational and polynomial approximants. Section 5 gives examples for the matrix exponential computation even more efficient than the ones given in Section 3, suggesting more general formulas for evaluating matrix polynomials. Finally, conclusions are given in Section 6.
2. Polynomial, rational, and mixed rational and polynomial approximants

This section summarizes some results on the computational costs of the Taylor, Padé, and mixed rational and polynomial approximants given in [9].
2.1. Taylor approximation of matrix functions

If $f(A)$ is a matrix function defined by a Taylor series according to Theorem 4.7 of [2, p. 76], where $A$ is a complex square matrix, then we will denote by $T_m(A)$ the matrix polynomial defined by the truncated Taylor series of degree $m$ of $f(A)$. For scalar $x \in \mathbb{C}$ it follows that

$$f(x) - T_m(x) = O(x^{m+1}), \quad (2)$$

about the origin, and, from now on, we will refer to $m$ as the order of the Taylor approximation. The most efficient method in the literature to evaluate a matrix polynomial

$$P_m(A) = \sum_{i=0}^{m} b_i A^i, \quad (3)$$
is the combination of the Horner and Paterson–Stockmeyer methods [1], given by

$$PS_m(A) = \Big( \cdots \big( (b_m A^s + b_{m-1} A^{s-1} + \ldots + b_{m-s+1} A + b_{m-s} I)$$
$$\times A^s + b_{m-s-1} A^{s-1} + b_{m-s-2} A^{s-2} + \ldots + b_{m-2s+1} A + b_{m-2s} I \big)$$
$$\times A^s + b_{m-2s-1} A^{s-1} + b_{m-2s-2} A^{s-2} + \ldots + b_{m-3s+1} A + b_{m-3s} I$$
$$\vdots$$
$$\Big) \times A^s + b_{s-1} A^{s-1} + b_{s-2} A^{s-2} + \cdots + b_1 A + b_0 I, \quad (4)$$

where the integer $s > 0$ divides $m$ and the matrix powers $A^2, A^3, \ldots, A^s$ are computed and stored previously.

m      1  2  4  6  9  12  16  20  25  30  36
C_PS   0  1  2  3  4  5   6   7   8   9   10

Table 1: Cost $C_{PS}$ in terms of matrix products for the evaluation of polynomial $P_m(A)$ with the Horner and Paterson–Stockmeyer methods, for the first eleven values of $m$ that maximize the polynomial degree obtained for a given cost.
Table 1 shows the maximum values of $m$ that can be obtained for a given number of matrix products in $T_m(A)$ using the Paterson–Stockmeyer method, corresponding to $m = s^2$ and $m = s(s+1)$, for $s \in \mathbb{N}$. The cost of evaluating (4), denoted by $C_{PS}$, for the values of $m$ in the set $\mathcal{M} = \{1, 2, 4, 6, 9, 12, 16, 20, 25, 30, 36, \ldots\}$ from Table 1, is given by [9, Eq. (6)]

$$C_{PS} = (r + s - 2)M, \quad \text{with } r = m/s, \ m \in \mathcal{M}. \quad (5)$$

For orders $m \notin \mathcal{M}$ we evaluate $P_m(A) = PS_{m'}(A)$ using (4), taking $m' = \min\{m_1 \in \mathcal{M} : m_1 > m\}$ and setting the coefficients $b_i = 0$ in (4) for $i = m', m'-1, \ldots, m+1$, at the same cost as evaluating $PS_{m'}(A)$. Note that, because of the way the polynomial is evaluated, the cost of using (4) is lower than that of the Paterson–Stockmeyer method as implemented in [2, Sec. 4.2] (compare (5) and [2, Eq. (4.3)]).
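As an illustration, the following MATLAB sketch evaluates (4) when $s$ divides $m$; the function name ps_eval and its interface are ours, not from [1] or [2]:

    function P = ps_eval(A, b, s)
    % Evaluate P_m(A) = sum_i b(i+1)*A^i by the Horner/Paterson-Stockmeyer
    % scheme (4); requires that s divides m = numel(b)-1.
        m = numel(b) - 1;  r = m/s;
        I = eye(size(A));
        Apow = cell(s, 1);  Apow{1} = A;
        for i = 2:s, Apow{i} = Apow{i-1}*A; end     % s-1 matrix products
        P = b(m+1)*Apow{s} + b(m+1-s)*I;            % top block of (4)
        for i = 1:s-1, P = P + b(m+1-i)*Apow{s-i}; end
        for k = 1:r-1                               % r-1 further products
            P = P*Apow{s} + b(m+1-(k+1)*s)*I;
            for i = 1:s-1, P = P + b(m+1-k*s-i)*Apow{s-i}; end
        end
    end

The powers $A^2, \ldots, A^s$ are stored and reused, so the total cost is $(s-1) + (r-1) = r + s - 2$ products, matching (5).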
The matrix exponential is the most studied matrix function [4], [2, Chap. 10]. For $A \in \mathbb{C}^{n\times n}$ the matrix exponential of $A$ can be defined by the Taylor series

$$\exp(A) = \sum_{i \ge 0} \frac{A^i}{i!}. \quad (6)$$

Another matrix function that has received attention recently is the matrix cosine, which can be defined analogously by means of its Taylor series

$$\cos(A) = \sum_{i \ge 0} \frac{(-1)^i A^{2i}}{(2i)!}. \quad (7)$$

Several efficient algorithms based on Taylor approximations have been proposed recently for the computation of the matrix exponential and cosine [6, 8].
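For instance, a minimal check of the truncated series (6) against MATLAB's expm, using the illustrative ps_eval sketch above:

    A = randn(4)/8;                   % small norm, so T_16(A) is accurate
    b = 1./factorial(0:16);           % Taylor coefficients of exp
    E = ps_eval(A, b, 4);             % m = 16, s = 4: 6 products (Table 1)
    rel_err = norm(E - expm(A), 1)/norm(expm(A), 1)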
m in M+   1     2     3     4     6     8     10    12    15    18    21
C_R       1.33  2.33  3.33  4.33  5.33  6.33  7.33  8.33  9.33  10.33 11.33
d_R       2     4     6     8     12    16    20    24    30    36    42

Table 2: Cost $C_R$ in terms of matrix products for the diagonal rational approximation $r_{mm}(A)$, taking $D = 4/3\,M$, and approximation order $d_R$ if $r_{mm}$ is a Padé approximant of a given function $f$.
2.2. Padé approximations of matrix functions

The rational scalar function $r_{km}(x) = p_{km}(x)/q_{km}(x)$ is a $[k/m]$ Padé approximant of the scalar function $f(x)$ if $r_{km} \in \mathcal{R}_{k,m}$, $q_{km}(0) = 1$, and

$$f(x) - r_{km}(x) = O(x^{k+m+1}). \quad (8)$$

From now on, $d_R$ will denote the degree of the last term of the Taylor series of $f$ about the origin that $r_{km}(x)$ agrees with, i.e. $d_R = k + m$, and we will refer to $d_R$ as the order of the Padé approximation. Table 2 (see [9, Table 2]) shows the maximum values of $m$ that can be obtained for a given number of matrix products in $r_{mm}(A)$, denoted by the set $\mathcal{M}^+$, and the corresponding computing cost, denoted by $C_R$, given by

$$C_R = (2r + s - 3)M + D \approx (2r + s - 1 - 2/3)M, \quad r = m/s, \quad (9)$$

where $s$ takes whichever of the values $s = \lceil \sqrt{2m} \rceil$ or $s = \lfloor \sqrt{2m} \rfloor$ divides $m$ and gives the smaller $C_R$. Table 2 also gives the corresponding order $d_R$ of the approximation $r_{mm}(x)$ if it is a Padé approximant of a given function $f(x)$, i.e. $d_R = 2m$.

Finally, it is important to note that for a given $f$, $k$ and $m$, a $[k/m]$ Padé approximant might not exist. Moreover, when computing rational approximations $r_{km}$ of a function $f$ for a given square matrix $A$, we must verify that the matrix $q_{km}(A)$ is nonsingular and, for an accurate computation, that it is well conditioned. This is not the case for polynomial approximations, since they do not require matrix inversions.
2.3. Mixed rational and polynomial approximants

For a square matrix $A$, the method proposed in [9] is based on using aggregations of mixed rational and polynomial approximants of the type

$$t_{ijs}(A) = \Big( \cdots \big( \big( u_s^{(i)}(A)\, (v_s^{(i)}(A))^{-1} + u_s^{(i-1)}(A) \big) (v_s^{(i-1)}(A))^{-1} + u_s^{(i-2)}(A) \big) (v_s^{(i-2)}(A))^{-1} + \cdots + u_s^{(1)}(A) \Big) (v_s^{(1)}(A))^{-1} + w_{js}(A), \quad (10)$$
where $v_s^{(k)}(A)$, $u_s^{(k)}(A)$, $k = 1, 2, \ldots, i$, are polynomials of $A$ of degrees at most $s$, $w_{js}(A)$ is a polynomial of $A$ with degree at most $js$, and $i \ge 0$, $s \ge 0$ and $j \ge 0$. Note that if $i = 0$ we consider that $t_{ijs}(A) = w_{js}(A)$, having no rational part. In [9, Sec. 4] a method to obtain $t_{ijs}$ from rational approximations is given. Similarly to rational approximations, each multiplication by a matrix inverse is calculated as the solution of a multiple right-hand side linear system. Therefore, when computing $t_{ijs}(A)$ it is important to verify that the matrices $v_s^{(1)}(A), v_s^{(2)}(A), \ldots, v_s^{(i)}(A)$ are nonsingular and well conditioned. The total cost of computing (10), denoted by $C_{RP}$, is given by, see [9, Sec. 5],

$$C_{RP} = (s + j - 2)M + iD \approx (s + j - 2 + 4i/3)M, \quad j > 0, \ s > 0, \ i \ge 0. \quad (11)$$

Note that, in the case where approximation (10) is intended to reproduce the first terms of the Taylor series of a given function $f$, it is equivalent to a $[(i+j)s/is]$ Padé approximant, and then, whenever it exists, $t_{ijs}$ for scalar $x \in \mathbb{C}$ satisfies

$$f(x) - t_{ijs}(x) = O(x^{(2i+j)s+1}). \quad (12)$$

In that case we denote by $d_{RP}$ the order of the mixed rational and polynomial approximation,

$$d_{RP} = (2i + j)s. \quad (13)$$
Table 3 (see [9, Table 3]) shows for $t_{ijs}(A)$ the approximation order $d_{RP}$, if $t_{ijs}$ reproduces the first terms of the Taylor series of a given function $f$, and the cost $C_{RP}$ in terms of matrix products, for the values of $i$, $j$, $s$ that maximize $d_{RP}$ for a given cost. See [9] for a complete description.

d_RP   1  2  3     4  6     9     10    12    15    16    20    21
i      0  0  1     0  1     1     2     1     2     1     2     3
j      1  1  1     2  1     1     1     1     1     2     1     1
s      1  2  1     2  2     3     2     4     3     4     4     3
C_RP   0  1  1.33  2  2.33  3.33  3.67  4.33  4.67  5.33  5.67  6

d_RP   25    28  30    35  36    42  45    49  54     55     56  63
i      2     3   2     3   4     3   4     3   4      5      3   4
j      1     1   1     1   1     1   1     1   1      1      1   1
s      5     4   6     5   4     6   5     7   6      5      8   7
C_RP   6.67  7   7.67  8   8.33  9   9.33  10  10.33  10.67  11  11.33

Table 3: Approximation order $d_{RP}$ if the mixed rational and polynomial approximation $t_{ijs}(A)$ from Section 2.3 reproduces the $d_{RP}$ first terms of the Taylor series of a given function $f$, cost in terms of matrix products $C_{RP}$ for the mixed rational and polynomial approximation $t_{ijs}(A)$, taking $D = 4/3\,M$, and values of $i$, $j$ and $s$ that maximize $d_{RP}$ for a given cost.
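As a quick illustrative check of (11) and (13) (our own snippet, not from [9]), the last column of Table 3 is reproduced by:

    i = 4; j = 1; s = 7;            % last column of Table 3
    d_RP = (2*i + j)*s              % order (13): 63
    C_RP = (s + j - 2) + 4*i/3      % cost (11) in matrix products: 11.33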
3. On the evaluation of matrix polynomials. Application to the approximation of matrix functions

This section gives new general methods for evaluating matrix polynomials in a more efficient way than the combination of the Horner and Paterson–Stockmeyer methods. Examples for computing the Taylor matrix polynomial approximation of degree $m$ of the matrix exponential and the matrix cosine are given. These examples allow us to compute both approximations at a lower cost than with the Horner and Paterson–Stockmeyer methods. Note that in this section we used MATLAB R2017a for all the computations.
Example 3.1. Let

$$y_{02}(A) = A^2 (c_4 A^2 + c_3 A), \quad (14)$$

$$y_{12}(A) = (y_{02}(A) + d_2 A^2 + d_1 A)(y_{02}(A) + e_2 A^2) + e_0\, y_{02}(A) + f_2 A^2 + f_1 A + f_0 I, \quad (15)$$

where $c_4$, $c_3$, $d_2$, $d_1$, $e_2$, $e_0$, $f_2$, $f_1$ and $f_0$ are scalar coefficients. In order to evaluate a matrix polynomial (3) of degree $m = 8$, taking $y_{12}(A) = P_m(A)$ and equating the coefficients of the matrix powers $A^i$, $i = 8, 7, \ldots, 0$, the following system of equations arises:

$$c_4^2 A^8 = b_8 A^8, \quad (16)$$
$$2 c_3 c_4 A^7 = b_7 A^7, \quad (17)$$
$$(c_4 (d_2 + e_2) + c_3^2) A^6 = b_6 A^6, \quad (18)$$
$$(c_4 d_1 + c_3 (d_2 + e_2)) A^5 = b_5 A^5, \quad (19)$$
$$(d_2 e_2 + c_3 d_1 + c_4 e_0) A^4 = b_4 A^4, \quad (20)$$
$$(d_1 e_2 + c_3 e_0) A^3 = b_3 A^3, \quad (21)$$
$$f_2 A^2 = b_2 A^2, \quad (22)$$
$$f_1 A = b_1 A, \quad (23)$$
$$f_0 I = b_0 I. \quad (24)$$
Note that for clarity the coefficient indices were chosen so that the sum of the indices equals the exponent of the power of $A$ that the coefficient multiplies. For instance, for (16) one gets $4 + 4 = 8$, for (17) one gets $3 + 4 = 7$, for (18) one gets $4 + 2 = 6$ and $3 + 3 = 6$, and so on.
We can solve the previous system using the equations (16)-(24) from top to bottom. Using (16)-(19), one gets

$$c_4 = \pm\sqrt{b_8}, \quad (25)$$
$$c_3 = b_7/(2 c_4), \quad (26)$$
$$d_2 + e_2 = (b_6 - c_3^2)/c_4, \quad (27)$$
$$d_1 = (b_5 - c_3 (d_2 + e_2))/c_4. \quad (28)$$

If $b_8 \ne 0$ then $c_4 \ne 0$, and therefore $c_4$, $c_3$, the sum $d_2 + e_2$ and $d_1$ can be obtained explicitly. From now on we will denote $de_2 = d_2 + e_2$ to simplify the notation and to remark that this quantity can be computed explicitly. Using (20) it follows that

$$e_0 = (b_4 - c_3 d_1 - de_2\, e_2 + e_2^2)/c_4, \quad (29)$$

where, using (25)-(28), $e_0$ is a polynomial of second order in the variable $e_2$. Hence, using (21) and (29) one gets

$$d_1 e_2 + c_3 e_0 = b_3 \;\Rightarrow\; -b_3 + d_1 e_2 + c_3 (b_4 - c_3 d_1 - de_2\, e_2 + e_2^2)/c_4 = 0, \quad (30)$$

which is an equation of second order in the variable $e_2$, and therefore, using (25)-(28), the equation on the right-hand side of (30) has the solutions

$$e_2 = \frac{\frac{c_3}{c_4} de_2 - d_1 \pm \sqrt{\left( d_1 - \frac{c_3}{c_4} de_2 \right)^2 + 4 \frac{c_3}{c_4} \left( b_3 + \frac{c_3^2}{c_4} d_1 - \frac{c_3}{c_4} b_4 \right)}}{2 c_3/c_4}, \quad (31)$$

i.e., two solutions if we take $c_4 = \sqrt{b_8}$ from (25), and another two solutions if we take $c_4 = -\sqrt{b_8}$. Substituting the four solutions of $e_2$ in (27) and (29), four solutions are obtained for $d_2 = de_2 - e_2$ and $e_0$, respectively, and from (22)-(24) it follows that

$$f_2 = b_2, \quad f_1 = b_1, \quad f_0 = b_0. \quad (32)$$
The cost of evaluating (15) is $3M$: one matrix product to compute and store $A^2$, and then two matrix products to compute (14) and (15), $y_{12}(A)$ being a polynomial of degree 8. From Table 1, the polynomial of maximum degree that can be computed with the Horner and Paterson–Stockmeyer methods at cost $3M$ has the lower degree $d_{PS} = 6$.

        exp                           cos
c_4     4.980119205559973×10^{-3}     2.186201576339059×10^{-7}
c_3     1.992047682223989×10^{-2}    -2.623441891606870×10^{-5}
d_2     7.665265321119147×10^{-2}     6.257028774393310×10^{-3}
d_1     8.765009801785554×10^{-1}    -4.923675742167775×10^{-1}
e_2     1.225521150112075×10^{-1}     1.441694411274536×10^{-4}
e_0     2.974307204847627×10^{0}      5.023570505224926×10^{1}

Table 4: One possible choice of the coefficients in (14) and (15) for the Taylor approximation of the exponential and cosine of order $m = 8$.
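As an illustration, the following MATLAB sketch solves (25)-(31) with the '+' roots, sets (32), and evaluates (14)-(15) with three matrix products; for $b_i = 1/i!$ it reproduces the exponential column of Table 4. The function name y12_eval is ours:

    function Y = y12_eval(A, b)
    % b = [b_0, ..., b_8]; evaluates (14)-(15) with coefficients (25)-(32).
        c4  = sqrt(b(9));                     % (25), '+' root chosen
        c3  = b(8)/(2*c4);                    % (26)
        de2 = (b(7) - c3^2)/c4;               % (27), de2 = d2 + e2
        d1  = (b(6) - c3*de2)/c4;             % (28)
        g   = c3/c4;
        e2  = (g*de2 - d1 + sqrt((d1 - g*de2)^2 ...
              + 4*g*(b(4) + (c3^2/c4)*d1 - g*b(5))))/(2*g);  % (31), '+'
        d2  = de2 - e2;
        e0  = (b(5) - c3*d1 - de2*e2 + e2^2)/c4;             % (29)
        A2  = A*A;                            % product 1
        y02 = A2*(c4*A2 + c3*A);              % product 2, (14)
        Y   = (y02 + d2*A2 + d1*A)*(y02 + e2*A2) ...         % product 3
              + e0*y02 + b(3)*A2 + b(2)*A + b(1)*eye(size(A)); % (15),(32)
    end

For instance, y12_eval(A, 1./factorial(0:8)) gives the order-8 exponential Taylor approximation at cost $3M$; evaluating the same polynomial with (4) would cost $4M$ (Table 1, taking $m' = 9$).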
Table 4 shows one of the four solutions, in IEEE double precision arithmetic, for the coefficients of the Taylor approximation of the exponential and cosine, where $b_i = 1/i!$ and $b_i = (-1)^i/(2i)!$, respectively, for $i = 0, 1, \ldots, 8$. Note that all four solutions are real, avoiding complex arithmetic if $A \in \mathbb{R}^{n\times n}$. In order to check the stability of the double precision arithmetic solutions $c_i$, $d_i$ and $e_i$ from Table 4, they were substituted in equations (16)-(21) to compute the relative error for each coefficient $b_i$, for $i = 3, 4, \ldots, 8$. For instance, from (21) it follows that the relative error for $b_3$ is $|b_3 - (d_1 e_2 + c_3 e_0)|/|b_3|$. We checked that all the relative errors for all $b_i$, $i = 3, 4, \ldots, 8$, were below the unit roundoff in IEEE double precision arithmetic, i.e. $u = 2^{-53} \approx 1.11 \times 10^{-16}$.
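This check can be reproduced with a short MATLAB snippet (ours; conv multiplies polynomial coefficient vectors) that expands (14)-(15) with the exponential values of Table 4 and compares the resulting coefficients with $1/i!$:

    c4 = 4.980119205559973e-3;  c3 = 1.992047682223989e-2;
    d2 = 7.665265321119147e-2;  d1 = 8.765009801785554e-1;
    e2 = 1.225521150112075e-1;  e0 = 2.974307204847627e0;
    y02 = [c4 c3 0 0 0];                      % c4*x^4 + c3*x^3, descending
    p = conv([c4 c3 d2 d1 0], [c4 c3 e2 0 0]); % (y02+d2x^2+d1x)(y02+e2x^2)
    p(5:9) = p(5:9) + e0*y02;                 % + e0*y02
    p(7:9) = p(7:9) + 1./factorial([2 1 0]);  % + f2*x^2 + f1*x + f0, (32)
    rel_err = abs(p - 1./factorial(8:-1:0)).*factorial(8:-1:0)
    % relative errors, expected of order u or below, cf. the check above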
Note that if we take

$$y_{12}(A) = (y_{02}(A) + d_2 A^2 + d_1 A)(y_{02}(A) + e_2 A^2 + e_1 A) + f_2 A^2 + f_1 A + f_0 I, \quad (33)$$

instead of (15), the four solutions for the corresponding coefficients for the exponential and cosine Taylor approximations of order $m = 8$ are complex. Therefore, if $A$ is real, using (33) instead of (15) is not efficient for the computation of either matrix function, since it is necessary to use complex arithmetic for evaluating (33).
Following Example 3.1, we can take in general

$$y_{0s}(A) = A^s \sum_{i=1}^{s} c_{s+i} A^i, \quad (34)$$

$$y_{1s}(A) = \left( y_{0s}(A) + \sum_{i=1}^{s} d_i A^i \right) \left( y_{0s}(A) + \sum_{i=2}^{s} e_i A^i \right) + e_0\, y_{0s}(A) + \sum_{i=0}^{s} f_i A^i, \quad (35)$$

where $A^i$, $i = 2, 3, \ldots, s$, can be computed once and stored to be reused in all the computations; then $y_{1s}(A)$ is a matrix polynomial whose degree, denoted by $d_{y_{1s}}$, and computing cost, denoted by $C_{y_{1s}}$, are

$$d_{y_{1s}} = 4s, \quad C_{y_{1s}} = (s + 1)M, \quad s = 2, 3, \ldots \quad (36)$$
Note that (14) and (15) are a particular case of (34) and (35) where $s = 2$. Again, in order to evaluate a matrix polynomial $P_m(A)$ of degree $m = 4s$, we take $y_{1s}(A) = P_m(A)$ and equate the coefficients of the matrix powers $A^i$, $i = m, m-1, \ldots, 0$, from $y_{1s}(A)$ and $P_m(A)$. The solution for the coefficients taking $s = 2$ is given in Example 3.1, where the substitution of variables gives a polynomial equation in $e_s = e_2$ of degree 2 with the exact solution given by (31). In the following, a general solution is given for $s > 2$. The $s$ equations corresponding to the coefficients of the powers $A^{4s-k}$, for $k = 0, 1, \ldots, s-1$, are, respectively,

$$\sum_{i=0}^{k} c_{2s-i}\, c_{2s+i-k} = b_{4s-k}, \quad k = 0, 1, \ldots, s-1. \quad (37)$$

Since (37) is a triangular system, if $b_{4s} \ne 0$ then $c_{2s} \ne 0$ and it follows that:

$$c_{2s} = \pm\sqrt{b_{4s}},$$
$$c_{2s-1} = b_{4s-1}/(2 c_{2s}), \quad (38)$$
$$c_{2s-k} = \Big( b_{4s-k} - \sum_{i=1}^{k-1} c_{2s-i}\, c_{2s+i-k} \Big) \Big/ (2 c_{2s}), \quad k = 2, 3, \ldots, s-1.$$
Note that if $b_{4s} < 0$, to prevent $c_{2s}$ from being complex we can compute $y_{1s}(A) = -P_m(A)$ using (35), where $c_{2s} = \sqrt{-b_{4s}} > 0$, which gives $P_m(A) = -y_{1s}(A)$.
Taking again $de_i = d_i + e_i$ as an abbreviation, and $de_1 = d_1$, since there is no coefficient $e_1$ in (35), the equations corresponding to the coefficients of the powers $A^{3s-k}$, for $k = 0, 1, \ldots, s-1$, are, respectively,

$$\sum_{j=s-k}^{s} c_{3s-k-j}\, de_j + \sum_{i=1}^{s-k-1} c_{2s-k-i}\, c_{s+i} = b_{3s-k}, \quad k = 0, 1, \ldots, s-2, \quad (39)$$
$$\sum_{j=s-k}^{s} c_{3s-k-j}\, de_j = b_{3s-k}, \quad k = s-1,$$

and using (38) it follows that

$$de_s = \Big( b_{3s} - \sum_{i=1}^{s-1} c_{2s-i}\, c_{s+i} \Big) \Big/ c_{2s},$$
$$de_{s-k} = \Big( b_{3s-k} - \sum_{j=s+1-k}^{s} c_{3s-k-j}\, de_j - \sum_{i=1}^{s-1-k} c_{2s-k-i}\, c_{s+i} \Big) \Big/ c_{2s}, \quad k = 1, 2, \ldots, s-2, \quad (40)$$
$$d_1 = \Big( b_{2s+1} - \sum_{j=2}^{s} c_{2s+1-j}\, de_j \Big) \Big/ c_{2s},$$

where, if $c_{2s} \ne 0$, each sum $de_i = d_i + e_i$, $i = s, s-1, \ldots, 2$, and the coefficient $d_1$ can be obtained explicitly using the coefficients $c_i$, $i = s+1, s+2, \ldots, 2s$, obtained from (38).
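A hedged MATLAB sketch of the forward recurrences (38) and (40), here for the order-$4s$ exponential Taylor coefficients with $s = 5$ (the array index $i+1$ stores the coefficient of index $i$; the variable names are ours):

    s = 5;
    b = 1./factorial(0:4*s);              % b(i+1) = b_i, exponential Taylor
    c = zeros(1, 2*s);                    % c(i) stores c_i, i = s+1..2s
    c(2*s)   = sqrt(b(4*s+1));            % (38); the sign of the root is free
    c(2*s-1) = b(4*s)/(2*c(2*s));
    for k = 2:s-1
        acc = 0;
        for i = 1:k-1, acc = acc + c(2*s-i)*c(2*s+i-k); end
        c(2*s-k) = (b(4*s-k+1) - acc)/(2*c(2*s));
    end
    de = zeros(1, s);                     % de(i) stores de_i = d_i + e_i
    acc = 0;
    for i = 1:s-1, acc = acc + c(2*s-i)*c(s+i); end
    de(s) = (b(3*s+1) - acc)/c(2*s);      % (40), k = 0
    for k = 1:s-2
        acc = 0;
        for j = s+1-k:s, acc = acc + c(3*s-k-j)*de(j); end
        for i = 1:s-1-k, acc = acc + c(2*s-k-i)*c(s+i); end
        de(s-k) = (b(3*s-k+1) - acc)/c(2*s);
    end
    acc = 0;
    for j = 2:s, acc = acc + c(2*s+1-j)*de(j); end
    d1 = (b(2*s+2) - acc)/c(2*s);         % (40), last line

The remaining unknowns $e_0$, $e_{s-k}$ and $d_i$ are then determined below via (41)-(51), once a root $e_s$ of the polynomial equation derived from (50) is chosen.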
The equations corresponding to the coefficients of the powers $A^{2s-k}$, for $k = 0, 1, \ldots, s-1$, are

$$\sum_{i=0}^{k} d_{s-i}\, e_{s-k+i} + g_k + e_0 c_{2s-k} = b_{2s-k}, \quad k = 0, 1, \ldots, s-1, \quad (41)$$

where

$$g_k = \sum_{i=1}^{s-1-k} c_{s+i}\, de_{s-i-k}, \quad k = 0, 1, \ldots, s-2, \quad g_{s-1} = 0, \quad (42)$$

and the coefficients $g_k$ can be computed explicitly using (38) and (40). Using (41) with $k = 0$ it follows that

$$e_s\, de_s - e_s^2 + g_0 + e_0 c_{2s} = b_{2s} \;\Rightarrow\; e_0 = (b_{2s} - g_0 - e_s\, de_s + e_s^2)/c_{2s}, \quad (43)$$
provided that $c_{2s} \ne 0$. Hence, since $de_s$, $g_0$ and $c_{2s}$ can be computed using (38) and (40), the coefficient $e_0$ is a polynomial of second order in the variable $e_s$. Using now (41) with $k = 1$ one gets

$$e_{s-1}(de_s - 2e_s) + e_s\, de_{s-1} + g_1 + e_0 c_{2s-1} = b_{2s-1}, \quad (44)$$

and then, if $d_s \ne e_s$, it follows that $de_s - 2e_s = d_s - e_s \ne 0$ and

$$e_{s-1} = (b_{2s-1} - g_1 - e_0 c_{2s-1} - e_s\, de_{s-1})/(de_s - 2e_s), \quad (45)$$

where $e_{s-1}$ is a rational function of $e_s$, since by (43) $e_0$ is a polynomial of $e_s$ of second order, and all the remaining quantities can be computed using (38), (40) and (42). Note that, analogously, using (41) with $k = 2$ it follows that

$$e_{s-2}(de_s - 2e_s) + e_s\, de_{s-2} + e_{s-1}\, de_{s-1} - e_{s-1}^2 + g_2 + e_0 c_{2s-2} = b_{2s-2}, \quad (46)$$

and then, again if $d_s \ne e_s$, it follows that

$$e_{s-2} = (b_{2s-2} - g_2 - e_0 c_{2s-2} - e_s\, de_{s-2} - e_{s-1}\, de_{s-1} + e_{s-1}^2)/(de_s - 2e_s), \quad (47)$$

where, similarly, $e_{s-2}$ is also a rational function of $e_s$, since by (43) and (45) $e_0$ is a polynomial of $e_s$ and $e_{s-1}$ is a rational function of $e_s$, and all the remaining quantities can be computed using (38), (40) and (42). Note that from (45) and (47) it follows that the rational function $e_{s-2}$ has denominator $(de_s - 2e_s)^3$.

Analogously, it is easy to show that

$$e_{s-k} = \left( b_{2s-k} - g_k - e_0 c_{2s-k} - e_s\, de_{s-k} - \sum_{i=1}^{\lceil k/2 \rceil - 1} \big[ e_{s-i}\, de_{s-k+i} + e_{s-k+i}(de_{s-i} - 2e_{s-i}) \big] - \begin{cases} 0, & \text{odd } k,\ 2 < k \le s-2, \\ e_{s-k/2}\, de_{s-k/2} - e_{s-k/2}^2, & \text{even } k,\ 2 < k \le s-2, \end{cases} \right) \Big/ (de_s - 2e_s), \quad (48)$$

where $e_{s-k}$ is also a rational function of $e_s$ with denominator $(de_s - 2e_s)^{i_{k,s}}$, where $i_{k,s} > 0$ is an integer number depending on $k$ and $s$.
The last equation of this group is

$$0 = -b_{s+1} + e_0 c_{s+1} + e_s d_1 + \sum_{i=1}^{\lfloor s/2 \rfloor - 1} \big[ e_{s-i}\, de_{1+i} + e_{1+i}(de_{s-i} - 2e_{s-i}) \big] + \begin{cases} 0, & \text{even } s > 2, \\ e_{\frac{s+1}{2}}\, de_{\frac{s+1}{2}} - e_{\frac{s+1}{2}}^2, & \text{odd } s > 2. \end{cases} \quad (50)$$
Using the expressions (45), (47) and (48) obtained for $e_{s-k}$, $k = 1, 2, \ldots, s-2$, as rational functions of $e_s$, and $e_0$ in (43) as a polynomial of $e_s$, it follows that expression (50) is a rational function of $e_s$; multiplying it by $(de_s - 2e_s)^{i_s}$, where $i_s$ is an integer number depending on $s$, expression (50) can be written as a polynomial in $e_s$, provided that $de_s - 2e_s = d_s - e_s \ne 0$. Hence, it has as many solutions as the resulting polynomial degree. Substituting these solutions in the expressions (45), (47) and (48) obtained for $e_{s-k}$, $k = 1, 2, \ldots, s-2$, and in (43) for $e_0$, the coefficients $e_0$ and $e_{s-k}$, $k = 1, 2, \ldots, s-2$, can be obtained. The coefficients $d_i$, for $i = 1, 2, \ldots, s$, can then be obtained using the coefficients $e_i$, for $i = 0, 2, 3, \ldots, s$, and (40). The solution for the coefficients with $s = 3$ and $s = 4$ gives polynomial equations in the variable $e_s$ of degrees 4 and 6, respectively; for $s \ge 5$ polynomials of larger degree are obtained, and then there are even more solutions for $e_s$.
Finally, from the equations involving $A^i$, for $i = s, s-1, \ldots, 0$, it is easy to show that

$$f_{s-k} = b_{s-k} - \sum_{i=1}^{s-k-2} d_i\, e_{s-k-i}, \quad k = 0, 1, \ldots, s-3, \quad (51)$$
$$f_i = b_i, \quad i = 2, 1, 0.$$
Using (36) and Table 1, Table 5 shows the maximum orders that can be achieved for a given cost $C(M)$ in terms of matrix products with the Horner and Paterson–Stockmeyer methods and with the method given by $y_{1s}(A)$ using (34) and (35). Note that $y_{1s}(A)$ allows evaluating a polynomial of degree greater than the Horner and Paterson–Stockmeyer methods for costs from $3M$ to $9M$, i.e. polynomial degrees from $d_{y_{1s}} = 8$ to $32$, corresponding to $s = 2, 3, \ldots, 8$ in $y_{1s}(A)$. We checked that there were at least 4 real solutions for all the coefficients in (34) and (35) when $y_{1s}(A)$ was equal to the exponential and cosine Taylor approximations of the corresponding degrees $d_{y_{1s}}$, avoiding complex arithmetic if $A$ is a real square matrix.
C(M)     3   4   5   6   7   8   9   10  11  12
d_PS     6   9   12  16  20  25  30  36  42  49
d_y1s    8   12  16  20  24  28  32  36  40  44

Table 5: Order of the approximation $d_{PS}$ that can be achieved using the Horner and Paterson–Stockmeyer methods, and order $d_{y_{1s}}$ using the method given by (34) and (35), for a given cost $C$ in terms of matrix products.
3.1. Combination of $y_{1s}(A)$ with the Horner and Paterson–Stockmeyer methods

The following proposition combines the Horner and Paterson–Stockmeyer evaluation formula (4) with (35) to increase the degree of the resulting polynomial to be evaluated:

Proposition 1. Let $z_{1ps}(x)$ be

$$z_{1ps}(x) = \Big( \cdots \big( ( y_{1s}(x)\, x^s + a_{p-1} x^{s-1} + a_{p-2} x^{s-2} + \ldots + a_{p-s+1} x + a_{p-s} )$$
$$\times x^s + a_{p-s-1} x^{s-1} + a_{p-s-2} x^{s-2} + \ldots + a_{p-2s+1} x + a_{p-2s} \big)$$
$$\times x^s + a_{p-2s-1} x^{s-1} + a_{p-2s-2} x^{s-2} + \ldots + a_{p-3s+1} x + a_{p-3s}$$
$$\vdots$$
$$\Big) \times x^s + a_{s-1} x^{s-1} + a_{s-2} x^{s-2} + \cdots + a_1 x + a_0, \quad (52)$$

where $p$ is a multiple of $s$ and $y_{1s}(x)$ is computed with (34) and (35). Then the degree of $z_{1ps}(x)$ and its computational cost for $x = A \in \mathbb{C}^{n\times n}$ are

$$d_{z_{1ps}} = 4s + p, \quad C_{z_{1ps}} = (1 + s + p/s)M. \quad (53)$$

Proof. The value of $d_{z_{1ps}}$ follows from (36) and (52). For the value of $C_{z_{1ps}}$, note that the matrix powers $A^i$, $i = 2, 3, \ldots, s$, to be evaluated for the Horner and Paterson–Stockmeyer evaluation formula can be reused to compute $y_{1s}(A)$, and note also that one matrix product is needed to compute $y_{1s}(A)A^s$ in (52). Then, if $p$ is a multiple of $s$, the value of $C_{z_{1ps}}$ in (53) follows using (36) and (52).
If we apply the evaluation formula (52) to evaluate a polynomial of degree $m + p$, i.e. $P_{m+p}(A)$, it follows that

$$z_{1ps}(A) = y_{1s}(A) A^p + \sum_{i=0}^{p-1} a_i A^i = P_{m+p}(A) = \sum_{i=0}^{m+p} b_i A^i. \quad (54)$$
Therefore, the coefficients $a_i$, $i = 0, 1, \ldots, p-1$, are directly the corresponding coefficients $b_i$, $i = 0, 1, \ldots, p-1$, from (54), and the coefficients from $y_{1s}(A)$ can be obtained by changing $b_i$ to $b_{i+p}$ in (38), (40), (43), (45), (47), (48), (50) and (51).
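For instance, a hedged MATLAB sketch of (52)/(54) for $s = 2$, $p = 4$ (a combination chosen only for brevity, not listed in Table 6): it evaluates the order-12 exponential Taylor polynomial as $z(A) = y_{12}(A)A^4 + b_3 A^3 + b_2 A^2 + b_1 A + b_0 I$, using the y12_eval sketch from Example 3.1 on the shifted coefficients $b_4, \ldots, b_{12}$:

    b = 1./factorial(0:12);           % exponential Taylor, order m + p = 12
    Y = y12_eval(A, b(5:13));         % y_12 carries b_4..b_12, shift p = 4
    A2 = A*A;                         % in practice reused from y12_eval
    I = eye(size(A));
    Z = (Y*A2 + b(4)*A + b(3)*I)*A2 + b(2)*A + b(1)*I;   % tail of (52)

Sharing $A^2$ between the two stages, the cost is $(1 + s + p/s)M = 5M$, here the same as the Paterson–Stockmeyer cost for $m = 12$; the savings of one product come from the larger parameter values listed in Table 6 (e.g. $s = 5$, $p = 10$ for $m = 30$), but the mechanics are identical.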
Using (53), Table 6 shows the parameters $s$ and $p$ needed to evaluate a polynomial of maximum degree $m$ for a given cost using $z_{1ps}(A)$ from (52), compared to the cost of the Paterson–Stockmeyer method for the same values of $m$.

m        8  12  16  20  20  25  30  30  36  42  42  49  56  56  ...
s        2  3   4   4   5   5   5   6   6   6   7   7   7   8   ...
p        0  0   0   4   0   5   10  6   12  18  14  21  28  24  ...
C_PS     4  5   6   7   7   8   9   9   10  11  11  12  13  13  ...
C_z1ps   3  4   5   6   6   7   8   8   9   10  10  11  12  12  ...

Table 6: Parameters $s$ and $p$ for $z_{1ps}(x)$ from (52) to obtain the same approximation order $m$ as the Horner and Paterson–Stockmeyer methods with a saving of one matrix product, where $C_{PS}$ is the cost of evaluating (4) and $C_{z_{1ps}}$ is the cost of computing $z_{1ps}(x)$, both costs in terms of matrix products. The first row shows the maximum values of $m$ obtained in $z_{1ps}(x)$ for a given number of matrix products.

Except for $m = 8$, all the values of $m$ are in the set $\mathcal{M}$ from Table 1, and for all of them one matrix product is saved with respect to using only the Paterson–Stockmeyer method. The evaluation scheme $z_{1ps}(A)$ allows evaluating polynomials of higher degree than that of the Paterson–Stockmeyer method for any cost greater than or equal to $3M$. Note that for a cost lower than or equal to $5M$ the maximum degree is obtained using

$$z_{1,p=0,s}(A) = y_{1s}(A), \quad (55)$$

from (35). Therefore, $z_{1ps}(A)$ can be considered a generalization of $y_{1s}(A)$. In order to evaluate polynomials of degrees different from those given in Table 6, other combinations $z_{1ps}(A)$ of the new method with the Paterson–Stockmeyer method can be used, where $p$ is not a multiple of $s$. For instance, a polynomial of degree $m = 23$ can be written as

$$P_{23}(x) = z_{1,7,4}(x) = (y_{1,4}(x)\, x^3 + a_6 x^2 + a_5 x + a_4)\, x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0, \quad (56)$$

where the coefficients of $y_{1,4}(x)$ can be obtained similarly to those of $y_{1s}(x)$ in (54).
c_10  -6.140022498994532×10^{-17}    e_4  -2.785084196756015×10^{-9}
c_9   -9.210033748491798×10^{-16}    e_3  -4.032817333361947×10^{-8}
c_8   -1.980157255925737×10^{-14}    e_2  -5.100472475630675×10^{-7}
c_7   -4.508311519886735×10^{-13}    e_0  -1.023463999572971×10^{-3}
c_6   -1.023660713518307×10^{-11}    f_5   4.024189993755686×10^{-13}
d_5   -1.227011356117036×10^{-10}    f_4   7.556768134694921×10^{-12}
d_4   -6.770221628797445×10^{-9}     f_3   1.305311326377090×10^{-10}
d_3   -1.502070379373464×10^{-7}     f_2   2.087675698786810×10^{-9}
d_2   -3.013961104055248×10^{-6}     f_1   2.505210838544172×10^{-8}
d_1   -5.893435534477677×10^{-5}     f_0   2.755731922398589×10^{-7}
e_5   -3.294026127901678×10^{-10}

Table 7: One real solution for the coefficients of (34) and (35) for computing the Taylor approximation of the exponential of order $m = 30$ with (52), taking $s = 5$ and $p = 10$. Note that in this case the coefficients in (54) are $b_i = 1/i!$, $i = 0, 1, \ldots, 30$.
Example 3.2. Table 7 presents one solution for the coefficients for an example of $z_{1ps}(x)$ from (52), combining (34) and (35) with the Horner and Paterson–Stockmeyer methods with $p = 10$ and $s = 5$, to compute the Taylor approximation of the matrix exponential of order $m = 30$.

From (53), the cost of computing $z_{1,10,5}(A)$ is $C_{z_{1,10,5}} = 8M$, one matrix product less than using the Horner and Paterson–Stockmeyer methods, see Table 6.

Analogously, using $z_{1ps}(x)$ from (52) with (34) and (35), we computed the coefficients from (34) and (35) for the Taylor exponential and cosine approximation polynomials for all the approximation orders $m$ in Table 6, up to approximation order $m = 81$. This process always gave several real solutions for all the coefficients involved. The maximum degree used in the Taylor approximation of the matrix exponential in double precision arithmetic from [6] is $m = 30$, and in the matrix cosine in [8] it is $m = 16$. Note that the values from Table 7 can be used directly to evaluate the Taylor approximation of order $m = 30$ in the algorithm from [6]. We also checked that using $z_{1,p=0,s}(A) = y_{1s}(A)$ from (35) also gave real coefficients for computing the Taylor exponential and cosine approximation polynomials with $s = 2, 3, 4$. Hence, if $A$ is a real square matrix, using $z_{1ps}(A)$ we can compute the exponential and cosine approximations using real arithmetic, saving $1M$ with respect to the algorithms in [6, 8] for Taylor polynomial degrees $m \in \mathcal{M}$ from Table 1, $m \ge 12$.
Finally, similarly to Example 3.1, we checked the stability of the double precision solutions for the coefficients from Table 7, substituting them in the system of equations (37), (39) (taking $de_i = d_i + e_i$, where $d_i$ and $e_i$ are the values from Table 7), (41) and (51). Analogously, in all cases the relative error $|b_i - 1/i!|\, i!$, $i = p, p+1, \ldots, m+p$, see (54), was lower than the unit roundoff $u$.
In a similar way we also checked the stability of the computation of the exponential Taylor polynomial approximation for all the degrees $m$ from Table 6 up to $m = 81$, obtaining the following results:

- There were 4 real solutions for all orders except for $m = 25$, with 12 real solutions, $m = 49$, $64$ and $56$ (with parameters $s = 8$, $p = 24$), with 8 real solutions, and $m = 42$ (with $p = 14$, $s = 7$), with 20 real solutions.

- The solutions for $e_s$ were of decreasing modulus, from $m = 12$ with $|e_s|$ of order $10^{-2}$ to $m = 81$ with $|e_s|$ of order $10^{-44}$.

- In the case $m = 42$ (with $p = 14$, $s = 7$) the 20 solutions all had positive values $e_s \in [2.23 \times 10^{-16}, 8.07 \times 10^{-16}]$. Taking the solutions $e_s$ in double precision arithmetic, from the 20 solutions there were 12 solutions that gave a maximum relative error for all coefficients $b_i$ less than $3u$, being stable. However, 8 solutions showed certain signs of instability, giving a maximum relative error for the coefficients $b_i$ between $5.04 \times 10^{-12}$ and $2.99 \times 10^{-10} > u$. Therefore, it is important to select a solution for $e_s$ in double precision arithmetic that gives relative errors for all coefficients $b_i$ of order $u$.

- We also checked the stability of the Taylor approximation of the matrix exponential in all the cases from Table 5 and found that the worst case was $m = 28$ with $s = 7$. This is not a case of practical use since, from Table 5, it has a cost $8M$, and, from Table 6, using $z_{1ps}(A)$ with $p = 10$ and $s = 5$ gives the greater order $m = 30$ for the same cost, an option checked above to be stable. However, we checked its stability as a worst case study. This case gave 3 real solutions, one of them with multiplicity 10. For the coefficients obtained using the two solutions $e_s$ with multiplicity 1, the maximum relative errors for all coefficients $b_i$ were of order $10^{-15} > u$. We also checked the scalar case $A = 1$, giving relative errors $|\exp(1) - y_{1,s=7}(1)|/\exp(1) = 4.36 \times 10^{-16}$ and $3.70 \times 10^{-15}$, respectively. However, using the solution with multiplicity 10 gave a maximum relative error of $10.75u$ for coefficient $b_8$. For the rest of the coefficients the maximum relative error was $1.49 \times 10^{-14}$, and $|\exp(1) - y_{1,s=7}(1)|/\exp(1) = 9.81 \times 10^{-5}$, so the accuracy was much lower when using the solution of $e_s$ with multiplicity 10.

- Therefore, it is necessary to check the stability of the solutions for $e_s$ before using the method to evaluate a given polynomial. In general, we propose to select the solution for $e_s$ in double precision arithmetic that gives the lowest maximum relative error for all coefficients $b_i$. If there is no solution giving relative errors of order $u$ for a given polynomial of degree $m$, a different parameter selection from Tables 5 and 6 should be tested, since for $m > 16$ there are two possibilities for $p$ and $s$ that give each value of $m$.
4. Comparison with existing methods

Using (36), (53) and Tables 1, 2, 3, 5 and 6, Table 8 shows the approximation orders that can be obtained, for a given cost in terms of matrix products, with Taylor polynomial approximations evaluated using the Horner and Paterson–Stockmeyer methods $PS_m(A)$, $y_{1s}(A)$ from (35) and $z_{1ps}(A)$ from (52), with the Padé rational approximation from Section 2.2, and with the mixed rational and polynomial approximation from Section 2.3, if each approximation reproduces the first terms of the Taylor series of a given function $f$ and whenever all the approximations exist. Note that the cost of solving the multiple right-hand side linear systems in rational approximations was taken as $4/3\,M$.

Table 8 shows that the polynomial approximation that allows the highest approximation order is $y_{1s}(A)$ for a cost $C \le 6M$ and $z_{1ps}(A)$ for $C \ge 7M$. Note that in Section 3.1, for $C \le 5M$, we took $z_{1ps}(A) = z_{1,p=0,s}(A) = y_{1s}(A)$, see (55). Hence, the approximation orders allowed by $z_{1ps}(A)$ for $C \ge 3M$ are higher than the approximation orders available with both the Paterson–Stockmeyer and the rational Padé methods. The highest order for $C \ge 6M$ is given by the mixed rational and polynomial approximation $t_{ijs}(A)$ (10). In the following section particular examples are given in order to increase the efficiency of polynomial approximations even more.
C(M)  PS_m(A)  y_1s(A)  z_1ps(A)  C_R(M)  r_mm(A)  C_RP(M)  t_ijs(A)
3     6        8        8         3.33    6        3.33     9
4     9        12       12        4.33    8        4.33     12
5     12       16       16        5.33    12       5.33     16
6     16       20       20        6.33    16       6        21
7     20       24       25        7.33    20       7        28
8     25       28       30        8.33    24       8        35
9     30       32       36        9.33    30       9        42
10    36       36       42        10.33   36       10       49
11    42       40       49        11.33   42       11       56

Table 8: Maximum approximation orders if any of the approximations reproduces the first terms of the Taylor series of a given function $f$, for a given cost $C$ for polynomial approximations, $C_R$ for rational approximations and $C_{RP}$ for mixed rational and polynomial approximants, where rational approximations are computed as in Section 2.2 and mixed rational and polynomial approximants are evaluated as in Section 2.3. The polynomial approximations considered are Horner and Paterson–Stockmeyer $PS_m(A)$ from Section 2.1, and $y_{1s}(A)$ and $z_{1ps}(A)$ from Section 3. Bold style is applied to the maximum degrees over all polynomial approximations, and to $t_{ijs}(A)$ when it provides the maximum degree over all approximations with an integer cost.
5. General expressions

This section gives examples that suggest new general expressions for evaluating matrix polynomials more efficiently than the evaluation schemes given in Section 3.

Example 5.1. Consider

$$y_{02}(A) = A^2 (c_{16} A^2 + c_{15} A), \quad (57)$$

$$y_{12}(A) = (y_{02}(A) + c_{14} A^2 + c_{13} A)(y_{02}(A) + c_{12} A^2 + c_{11} I) + c_{10}\, y_{02}(A), \quad (58)$$

$$y_{22}(A) = (y_{12}(A) + c_9 A^2 + c_8 A)(y_{12}(A) + c_7\, y_{02}(A) + c_6 A) + c_5\, y_{12}(A) + c_4\, y_{02}(A) + c_3 A^2 + c_2 A + c_1 I, \quad (59)$$

where the coefficients are numbered correlatively and $A^2$ is computed once and stored to be reused in all the computations. It is easy to show that the degree of the polynomial $y_{22}(A)$ is $m = 16$ and that it can be evaluated at a cost $C_{y_{22}} = 4M$.
Using the function solve from the MATLAB Symbolic Math Toolbox, Table 9 gives one solution for the coefficients to compute the exponential Taylor approximation $P_m(A)$ of order $m = 15$, i.e. $b_i = 1/i!$, $i = 0, 1, \ldots, 15$. For the solution given in Table 9, if we write $y_{22}(A)$ as a polynomial $P_m(A)$ of degree $m = 16$, the relative error of $b_{16}$ with respect to the corresponding Taylor polynomial coefficient is

$$(b_{16} - 1/16!)\, 16! = 0.454, \quad (60)$$

showing three significant digits.

c_16    4.018761610201036×10^{-4}    c_8    2.116367017255747×10^{0}
c_15    2.945531440279683×10^{-3}    c_7   -5.792361707073261×10^{0}
c_14    8.712167566050691×10^{-2}    c_6   -1.491449188999246×10^{-1}
c_13    4.017568440673568×10^{-1}    c_5    1.040801735231354×10^{1}
c_12   -6.352311335612147×10^{-2}    c_4   -6.331712455883370×10^{1}
c_11    2.684264296504340×10^{-1}    c_3    3.484665863364574×10^{-1}
c_10    1.857143141426026×10^{1}     c_2   -1.224230230553340×10^{-1}
c_9     2.381070373870987×10^{-1}    c_1    1

Table 9: Coefficients of $y_{02}$, $y_{12}$, $y_{22}$ from (57)-(59) for computing the matrix exponential Taylor approximation of order $m = 15$.
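A hedged MATLAB sketch of the evaluation (57)-(59) with the Table 9 coefficients, using 4 matrix products (the variable names are ours; per the example, the coefficients of the resulting polynomial match $1/i!$ for $i \le 15$):

    A2  = A*A;  I = eye(size(A));                                 % product 1
    y02 = A2*(4.018761610201036e-4*A2 + 2.945531440279683e-3*A);  % product 2
    y12 = (y02 + 8.712167566050691e-2*A2 + 4.017568440673568e-1*A) ...
        * (y02 - 6.352311335612147e-2*A2 + 2.684264296504340e-1*I) ...
        + 1.857143141426026e1*y02;                                % product 3
    y22 = (y12 + 2.381070373870987e-1*A2 + 2.116367017255747e0*A) ...
        * (y12 - 5.792361707073261e0*y02 - 1.491449188999246e-1*A) ...
        + 1.040801735231354e1*y12 - 6.331712455883370e1*y02 ...
        + 3.484665863364574e-1*A2 - 1.224230230553340e-1*A + I;   % product 4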
We tried different possibilities for a new coefficient $c_0$ added in (57)-(59), trying to compute the matrix exponential and the matrix cosine Taylor approximations of order 16, for instance changing (58) to

$$y_{12}(A) = (y_{02}(A) + c_{14} A^2 + c_{13} A + c_0 I)(y_{02}(A) + c_{12} A^2 + c_{11} I) + c_{10}\, y_{02}(A), \quad (61)$$

and other options. However, sometimes MATLAB could not find an explicit solution for the coefficients, and the other times MATLAB gave solutions with numerical instability.
Note that in Example 5.1 the degree of $y_{k,2}(A)$, $k = 1, 2$, is twice the degree of the polynomial $y_{k-1,2}(A)$, while the cost increases by just $1M$ when computing $y_{k,2}(A)$ from $y_{k-1,2}(A)$. Therefore, the polynomial degree increases exponentially while the cost increases linearly. Following this idea, Proposition 2 gives expressions $y_{ks}(A)$, $k \ge 1$, more general than (34) and (35), where the degree of the polynomial $y_{ks}(A)$ is twice the degree of the polynomial $y_{k-1,s}(A)$, $k \ge 1$, while the cost increases by $1M$ when computing $y_{ks}(A)$ from $y_{k-1,s}(A)$:
Proposition 2. Let

$$y_{0s}(x) = x^s \sum_{i=1}^{s} c_i^{(0,1)} x^i + \sum_{i=0}^{s} c_i^{(0,2)} x^i, \quad (62)$$

$$y_{1s}(x) = \left( \sum_{i=0}^{0} c_i^{(1,1)} y_{is}(x) + \sum_{i=0}^{s} c_i^{(1,2)} x^i \right) \left( \sum_{i=0}^{0} c_i^{(1,3)} y_{is}(x) + \sum_{i=0}^{s} c_i^{(1,4)} x^i \right) + \sum_{i=0}^{0} c_i^{(1,5)} y_{is}(x) + \sum_{i=0}^{s} c_i^{(1,6)} x^i, \quad (63)$$

$$y_{2s}(x) = \left( \sum_{i=0}^{1} c_i^{(2,1)} y_{is}(x) + \sum_{i=0}^{s} c_i^{(2,2)} x^i \right) \left( \sum_{i=0}^{1} c_i^{(2,3)} y_{is}(x) + \sum_{i=0}^{s} c_i^{(2,4)} x^i \right) + \sum_{i=0}^{1} c_i^{(2,5)} y_{is}(x) + \sum_{i=0}^{s} c_i^{(2,6)} x^i, \quad (64)$$

$$\vdots$$

$$y_{ks}(x) = \left( \sum_{i=0}^{k-1} c_i^{(k,1)} y_{is}(x) + \sum_{i=0}^{s} c_i^{(k,2)} x^i \right) \left( \sum_{i=0}^{k-1} c_i^{(k,3)} y_{is}(x) + \sum_{i=0}^{s} c_i^{(k,4)} x^i \right) + \sum_{i=0}^{k-1} c_i^{(k,5)} y_{is}(x) + \sum_{i=0}^{s} c_i^{(k,6)} x^i, \quad (65)$$

where $y_{ks}(x)$ is a polynomial of $x$. Then the maximum polynomial degree, denoted by $d_{y_{ks}}$, and the computing cost if $x = A$, $A \in \mathbb{C}^{n\times n}$, in terms of matrix products, denoted by $C_{y_{ks}}$, are given by

$$d_{y_{ks}} = 2^{k+1} s, \quad C_{y_{ks}} = (s + k)M. \quad (66)$$

Proof. From (62), the maximum degree of the polynomial $y_{0s}(x)$ is $2s$. Then, using (62)-(65), the maximum degree of $y_{is}(x)$, $i \le k$, is $2^{i+1} s$. If $x = A$, $A \in \mathbb{C}^{n\times n}$, then the cost of computing $y_{ks}(A)$ is $s - 1$ matrix products for computing $A^i$, $i = 2, 3, \ldots, s$, plus one matrix product in each iteration from (62)-(65), i.e. $k + 1$ products. Therefore, $C_{y_{ks}} = (s + k)M$.
Note that (34) and (35) are particular cases of Proposition 2 where $k = 1$ and some coefficients $c_i^{(l,j)}$, $l = 0, 1$, in (62) and (63) are zero. Similarly, (57)-(59) are particular cases of (62)-(64) where $k = 2$, $s = 2$, and some coefficients $c_i^{(l,j)}$, $l = 0, 1, 2$, are also zero.
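An illustrative check of (66) against Example 5.1 (our snippet):

    k = 2; s = 2;               % parameters of Example 5.1
    d = 2^(k+1)*s               % maximum degree (66): 16
    C = s + k                   % cost in matrix products: 4, i.e. C_y22 = 4M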
If we write (65) in powers of $x$ as

$$y_{ks}(x) = \sum_{i=0}^{m} a_i x^i, \quad (67)$$

then $a_i$, $i = 0, 1, \ldots, m$, are functions of the coefficients $c_i^{(l,j)}$, for all $i$, $j$, $l$ in (62)-(65). Hence, it is possible to evaluate the matrix polynomial $P_m(A)$ using (62)-(65) if the system of equations

$$a_m(c_i^{(l,j)}) = b_m,$$
$$a_{m-1}(c_i^{(l,j)}) = b_{m-1}, \quad (68)$$
$$\vdots$$
$$a_0(c_i^{(l,j)}) = b_0,$$

in all the coefficients $c_i^{(l,j)}$ from (62)-(65) involved in each coefficient $a_i$, $i = 0, 1, \ldots, m$, has at least one solution, where the $b_i$ are the polynomial coefficients of $P_m(A)$. We have obtained a general solution for evaluating polynomials using (34) and (35), corresponding to particular cases of (62) and (63), and we have obtained one solution for computing the exponential Taylor approximation of order 15 with (57)-(59). Future work will address obtaining general solutions for evaluating matrix polynomials of different degrees using (62)-(65), and studying whether there are at least particular solutions for evaluating polynomials such as the Taylor polynomial approximations of certain degrees of different matrix functions. That is the case of Example 5.1, which provides formulas for computing the exponential Taylor approximation polynomial of order $m = 15$ at a cost $C = 4M$. From Table 8 it follows that with a cost of $4M$ the Paterson–Stockmeyer method allows computing the matrix exponential Taylor approximation polynomial of order only $m = 9$, the Padé rational method $r_{mm}(A)$ allows an order less than 8, the mixed rational and polynomial approximation $t_{ijs}(A)$ allows an order less than 12, and the new method based on (34) and (35) allows an order $m = 12$.
In the following example we consider the computation of the Taylor exponential approximation of order 16 by using the product of two polynomials of degree 8, both evaluated using (14) and (15).

Example 5.2. Let

$$h_{2m_1}(A) = P_{m_1}(A)\, P'_{m_1}(A) + \beta_0 I = \sum_{i=0}^{m_1} b_i A^i \sum_{i=0}^{m_1} b'_i A^i + \beta_0 I, \quad (69)$$
where we took $m_1 = 8$, $b'_8 = b_8$, $b'_0 = 0$ and $h_{2m_1}(0) = \beta_0 I$; therefore, $P_{m_1}(A)$ and $P'_{m_1}(A)$ are both polynomials of the form (3) of degree 8, and $h_{2m_1}(A)$ can be written as a polynomial of degree 16 with 17 coefficients, i.e. $b_i$, $i = 0, 1, \ldots, 8$, $b'_i$, $i = 1, \ldots, 7$, and $\beta_0$. Using the MATLAB Symbolic Math Toolbox solve function, Table 10 presents one solution for the coefficients of an example where $h_{2m_1}(A) = \sum_{i=0}^{16} A^i/i!$, i.e. the exponential Taylor polynomial approximation of degree $m = 16$.

Note that one can evaluate both polynomials $P_{m_1}(A)$ and $P'_{m_1}(A)$ using the evaluation scheme (14) and (15), see Example 3.1. Finally, from (69) it follows that $\beta_0 = 1$, so that $h_{2m_1}(0) = \exp(0) I = I$. Table 11 shows one solution for the coefficients from (16)-(24), using (25)-(32) and taking $y_{12}(A) = P_{m_1}(A)$, together with the corresponding coefficients taking $y'_{12}(A) = P'_{m_1}(A)$, i.e. $c'_4$, $c'_3$, $d'_2$, $d'_1$, $e'_2$, $e'_0$, $f'_2$, $f'_1$ and $f'_0$.

b_8   2.186201576339059×10^{-7}     b'_8   2.186201576339059×10^{-7}
b_7   9.839057366529322×10^{-7}     b'_7   2.514016785489562×10^{-6}
b_6   1.058964584814256×10^{-5}     b'_6   3.056479369585950×10^{-5}
b_5   1.554700173279057×10^{-4}     b'_5   3.197607034851565×10^{-4}
b_4   2.256892506343887×10^{-3}     b'_4   2.585006547542889×10^{-3}
b_3   2.358987357109499×10^{-2}     b'_3   1.619043970183846×10^{-2}
b_2   1.673139636901279×10^{-1}     b'_2   8.092036376147299×10^{-2}
b_1   7.723603212944010×10^{-1}     b'_1   3.229486011362677×10^{-1}
b_0   3.096467971936040×10^{0}      β_0    1

Table 10: Coefficients from (69) for computing the matrix exponential Taylor approximation of order $m = 16$, where coefficient $b'_8 = b_8$ and $b'_0 = 0$.

c_4   4.675683454147702×10^{-4}     c'_4   4.675683454147702×10^{-4}
c_3   1.052151783051235×10^{-3}     c'_3   2.688394980266927×10^{-3}
d_2  -3.289442879547955×10^{-2}     d'_2   2.219811707032801×10^{-2}
d_1   2.868706220817633×10^{-1}     d'_1   3.968985915411500×10^{-1}
e_2   5.317514832355802×10^{-2}     e'_2   2.771400028062960×10^{-2}
e_0   7.922322450524197×10^{0}      e'_0   1.930814505527068×10^{0}
f_2   1.673139636901279×10^{-1}     f'_2   8.092036376147299×10^{-2}
f_1   7.723603212944010×10^{-1}     f'_1   1.614743005681339×10^{-1}
f_0   3.096467971936040×10^{0}      f'_0   0

Table 11: Coefficients from the system (16)-(24) for evaluating the polynomials $y_{12}(A) = P_{m_1}(A)$ and $y'_{12}(A) = P'_{m_1}(A)$ from (69), with the coefficients given by Table 10. Note that $f'_0 = 0$ since $y'_{12}(0) = b'_0 = 0$.
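A hedged MATLAB sketch of (69), reusing the illustrative y12_eval helper from Example 3.1 directly on the Table 10 coefficients (with the root choices hard-coded in y12_eval, the computed inner coefficients may differ from Table 11, but any solution of (16)-(24) evaluates the same polynomials):

    b  = [3.096467971936040e0  7.723603212944010e-1 1.673139636901279e-1 ...
          2.358987357109499e-2 2.256892506343887e-3 1.554700173279057e-4 ...
          1.058964584814256e-5 9.839057366529322e-7 2.186201576339059e-7];
    bp = [0                    3.229486011362677e-1 8.092036376147299e-2 ...
          1.619043970183846e-2 2.585006547542889e-3 3.197607034851565e-4 ...
          3.056479369585950e-5 2.514016785489562e-6 2.186201576339059e-7];
    H  = y12_eval(A, b)*y12_eval(A, bp) + eye(size(A));   % beta_0 = 1
    % With A^2 shared between the two factors this amounts to 6 products,
    % matching C = (s + 4)M = 6M from (70) below for s = 2; as written,
    % y12_eval recomputes A^2, costing one extra product.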
C(M)     6   7   8   9   10  11  12
d_PS     16  20  25  30  36  42  49
d_z1ps   20  25  30  36  42  49  56
d_h2m1   16  24  32  40  48  56  64

Table 12: Order of the approximation $d_{PS}$ that can be obtained using the Horner and Paterson–Stockmeyer methods, order $d_{z_{1ps}}$ that can be obtained using $z_{1ps}(A)$ from (52), and order $d_{h_{2m_1}}$ that can be obtained using the method given by $h_{2m_1}(A)$ from (69), using (34) and (35) for evaluating the polynomials therein, for a given cost $C$ in terms of matrix products, whenever the solutions for the coefficients from (69), (34) and (35) exist.
In general, if we evaluate both polynomials $P_{m_1}(A)$ and $P'_{m_1}(A)$ by using (34) and (35) with $m_1 = 4s$, and if there exists a solution for the coefficients $b_i$ and $b'_i$ of $P_{m_1}(A)$ and $P'_{m_1}(A)$, then, using (36), the degree of the matrix polynomial $h_{2m_1}(A)$ and its computing cost are

$$d_{h_{2m_1}} = 8s, \quad C_{h_{2m_1}} = (s + 4)M. \quad (70)$$

Table 12 compares the polynomial degrees that can be obtained by the Horner and Paterson–Stockmeyer methods, by $z_{1ps}(A)$ from (52), and by $h_{2m_1}(A)$ given by (69) varying $m_1$, for a given cost, whenever a solution for all the coefficients involved in $h_{2m_1}(A)$ exists. Since for $C > 6M$ such schemes would be more efficient than the Paterson–Stockmeyer method, and for $C > 7M$ they would be more efficient than the method given by (52), it is worth studying whether evaluation schemes like (69) exist in general or, at least, whether they exist for the polynomial approximation of specific matrix functions or for the evaluation of the matrix polynomials arising in applications. Moreover, in order to obtain a polynomial degree equal to $2m_1$, note that one can think of other possibilities for having $2m_1 + 1$ coefficients in $h_{2m_1}(A)$, different from selecting $b'_{m_1} = b_{m_1}$ and $b'_0 = 0$ as in Example 5.2.
Note that, similarly to Section 3.1, the Paterson–Stockmeyer method can be combined with any other method proposed above. Analogously to Example 5.2, we can also obtain new methods for evaluating matrix polynomials and matrix polynomial approximations using products of the evaluation schemes proposed above, whenever a solution for all the coefficients involved exists. The same powers $A^i$, $i = 1, 2, \ldots, s$, should be used in each evaluation scheme involved, so that they can be reused in all the computations. It is important to note that, even in the case of the well-known Padé approximations, for a given function $f$, $k$ and $m$, a $[k/m]$ Padé approximant $r_{km}$ might not exist, see Section 2.2. Therefore, the existence of particular cases of the methods proposed in this section for computing matrix functions arising often in applications is useful whenever they are more efficient than the existing methods in those concrete cases. That is the case of Example 5.1, with the matrix exponential Taylor approximation of order 15, which can be computed with just $4M$.
6. Conclusions

This paper has proposed the new general evaluation schemes for matrix polynomials given by $y_{0s}(A)$ (34), $y_{1s}(A)$ (35) and $z_{1ps}(A)$ (52), and a method to check their stability was given. It was shown that these evaluation schemes allow evaluating polynomials of degree higher than the Paterson–Stockmeyer method for the same cost. It was also shown that they provide a greater Taylor approximation order than diagonal Padé approximations for the same cost. Moreover, the new evaluation schemes are more efficient than the recent mixed rational and polynomial approximation from [9] for several orders of approximation.

Through Examples 5.1 and 5.2 we suggest the study of more general polynomial evaluation schemes that can be even more efficient, and applications to the Taylor approximation of matrix functions were given.

With the proposed methods we can state that the combination of the Horner and Paterson–Stockmeyer methods is no longer the most efficient general method for evaluating matrix polynomials, and that Padé approximations are no longer more accurate than polynomial approximations for the same cost either.

Future work is:

- To determine if it is possible to find general solutions for evaluating matrix polynomials using (62)-(65) with $s \ge 2$ and $k \ge 2$, or at least particular solutions for cases of interest, as in Example 5.1.

- To study if there are general solutions, or at least particular solutions, for matrix polynomial evaluation using products of the newly proposed matrix polynomial evaluation schemes, similarly to Example 5.2.
7. Acknowledgements

This work has been supported by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF) grant TIN2014-59294-P. We thank the anonymous referee who revised this paper so thoroughly and carefully.
References

[1] M. S. Paterson, L. J. Stockmeyer, On the number of nonscalar multiplications necessary to evaluate polynomials, SIAM J. Comput., 2(1) (1973), pp. 60-66.

[2] N. J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.

[3] G. H. Golub, C. F. Van Loan, Matrix Computations, 3rd Ed., Johns Hopkins Studies in Math. Sci., The Johns Hopkins University Press, 1996.

[4] C. B. Moler, C. F. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev., 45 (2003), pp. 3-49.

[5] A. H. Al-Mohy, N. J. Higham, A new scaling and squaring algorithm for the matrix exponential, SIAM J. Matrix Anal. Appl., 31(3) (2009), pp. 970-989.

[6] P. Ruiz, J. Sastre, J. Ibáñez, E. Defez, High performance computing of the matrix exponential, J. Comput. Appl. Math., 291 (2016), pp. 370-379.

[7] A. H. Al-Mohy, N. J. Higham, S. D. Relton, New algorithms for computing the matrix sine and cosine separately or simultaneously, SIAM J. Sci. Comput., 37(1) (2015), pp. A456-A487.

[8] J. Sastre, J. Ibáñez, P. Alonso, J. Peinado, E. Defez, Two algorithms for computing the matrix cosine function, Appl. Math. Comput., 312 (2017), pp. 66-77.

[9] J. Sastre, Efficient mixed rational and polynomial approximation of matrix functions, Appl. Math. Comput., 218(24) (2012), pp. 11938-11946.

[10] S. Blackford, J. Dongarra, Installation guide for LAPACK, LAPACK Working Note 41, Department of Computer Science, University of Tennessee, 1999.
... Next, in Line 2 of Algorithm 1 (Line 3 of Algorithm 2), matrix A is properly scaled, and in the following line, polynomial P m k (A) orP m k (A) must be efficiently computed using methods such as those described in [35,36]. In our implementations, the Paterson-Stockmeyer method [35] was employed. ...
... In our implementations, the Paterson-Stockmeyer method [35] was employed. In this procedure, assuming that polynomial order m k is chosen from the set 2,4,6,9,12,16,20,25,30,36,42,49, 56, 64, . . . }, powers A i , 2 ≤ i ≤ q, must be calculated, where q = √ m k or q = √ m k is an integer divisor of m k . ...
... The same is also performed in Figure 4e, which is referenced later. These matrices were number 6,7,12,15,23,36,39,50 and 51 from MCT. Figure 1b,d,f present performance profiles comparing the accuracy of the codes on the different matrix sets. For a given value α on the x-axis, the value of p on the y-axis is the proportion of matrices for which the considered code had a relative error lower than or equal to α times the smallest relative error of all the codes for the matrix. ...
Article
Full-text available
This paper presents three different alternatives to evaluate the matrix hyperbolic cosine using Bernoulli matrix polynomials, comparing them from the point of view of accuracy and computational complexity. The first two alternatives are derived from two different Bernoulli series expansions of the matrix hyperbolic cosine, while the third one is based on the approximation of the matrix exponential by means of Bernoulli matrix polynomials. We carry out an analysis of the absolute and relative forward errors incurred in the approximations, deriving corresponding suitable values for the matrix polynomial degree and the scaling factor to be used. Finally, we use a comprehensive matrix testbed to perform a thorough comparison of the alternative approximations, also taking into account other current state-of-the-art approaches. The most accurate and efficient options are identified as results.
... However, we will show that our analysis provides a larger radius of convergence of the backward error series. Finally, we will give an example where, by using the new matrix polynomial evaluation methods from [17], the Taylor method will be more efficient than the Padé method. The method from [17,Prop. ...
... Finally, we will give an example where, by using the new matrix polynomial evaluation methods from [17], the Taylor method will be more efficient than the Padé method. The method from [17,Prop. 1] is more efficient than the well-known Paterson-Stockmeyer method [18], see (5) from [17] for the cost of the Paterson-Stockmeyer method and [17, Tab. 8] for a cost comparison of both methods and also the Padé method. ...
... The method from [17,Prop. 1] is more efficient than the well-known Paterson-Stockmeyer method [18], see (5) from [17] for the cost of the Paterson-Stockmeyer method and [17, Tab. 8] for a cost comparison of both methods and also the Padé method. ...
Article
Full-text available
In this paper we give a new formula to write the forward error of Taylor approximations of analytical functions in terms of the backward error of those approximations, considering exact arithmetic in both errors. Using this formula, a method to compute a backward error given by the power series centered in the same expansion point as the Taylor approximation is provided. The application of this method for Padé approximations is also shown. Based on the previous method, a MATLAB implementation for computing the first power series terms of the backward error for Taylor and Padé approximations of a given analytical function is provided, and examples of its use are given. Applications to the computation of matrix functions are given that overcome limitations of other backward error analyses which uses inverse compositional functions in the literature.
... Their implementations for other distinct precisions are straightforward. This paper is organized as follows: Section 2 describes an inverse scaling and squaring Taylor algorithm based on efficient evaluation formulas [41] to approximate the matrix logarithm, including an error analysis. Section 3 includes the results corresponding to the experiments performed in order to compare the numerical and computational performance of different codes against a test battery composed of distinct types of matrices. ...
... For the lower orders m = 1, 2, and 4, the corresponding polynomial approximations from (3) use the Paterson-Stockmeyer method for the evaluation. For the Taylor approximation orders m ≥ 8, Sastre evaluation formulas from [41,42] are used for evaluating Taylor-based polynomial approximations from (3) more efficiently than the Paterson-Stockmeyer method. For reasons that will be shown below, the Taylor approximation of − log(I − A), denoted by T m , is used in all the Sastre approximations and evaluation formulas of the matrix logarithm. ...
... For m = 8, the following evaluation formulas from [41] (Example 3.1) are used: ...
Article
Full-text available
The most popular method for computing the matrix logarithm is a combination of the inverse scaling and squaring method in conjunction with a Padé approximation, sometimes accompanied by the Schur decomposition. In this work, we present a Taylor series algorithm, based on the free-transformation approach of the inverse scaling and squaring technique, that uses recent matrix polynomial formulas for evaluating the Taylor approximation of the matrix logarithm more efficiently than the Paterson–Stockmeyer method. Two MATLAB implementations of this algorithm, related to relative forward or backward error analysis, were developed and compared with different state-of-the art MATLAB functions. Numerical tests showed that the new implementations are generally more accurate than the previously available codes, with an intermediate execution time among all the codes in comparison.
... It is noteworthy that research in the field of matrix polynomial evaluation is increasingly active. In particular, a new family of methods for evaluating matrix polynomials, which are more efficient than the established Paterson-Stockmeyer method, was proposed in [29]. This area could be a subject of future research for us, as this section concentrates on introducing general techniques for approximating the MLF matrix using rational approximation, aiming for a general comparison. ...
Article
Full-text available
The two-parameter Mittag–Leffler function Eα,β is of fundamental importance in fractional calculus, and it appears frequently in the solutions of fractional differential and integral equations. However, the expense of calculating this function often prompts efforts to devise accurate approximations that are more cost-effective. When α>1, the monotonicity property is largely lost, resulting in the emergence of roots and oscillations. As a result, current rational approximants constructed mainly for α∈(0,1) often fail to capture this oscillatory behavior. In this paper, we develop computationally efficient rational approximants for Eα,β(−t), t≥0, with α∈(1,2). This process involves decomposing the Mittag–Leffler function with real roots into a weighted root-free Mittag–Leffler function and a polynomial. This provides approximants valid over extended intervals. These approximants are then extended to the matrix Mittag–Leffler function, and different implementation strategies are discussed, including using partial fraction decomposition. Numerical experiments are conducted to illustrate the performance of the proposed approximants.
... The efficient computation of an important number of functions of matrices of moderate size is of great interest in many different fields [1,3,4,5,7,8,9,20,22,23,24,25,30,31,34,35,36,37,38]. Frequently, it suffices to compute their action on a vector [2,17,25,26,27,32] allowing to solve problems of large dimensions, or using an appropriate filtering technique the previous methods can also be used to compute functions of large sparse matrices [40]. ...
Preprint
Full-text available
We present a novel class of methods to compute functions of matrices or their action on vectors that are suitable for parallel programming. Solving appropriate simple linear systems of equations in parallel (or computing the inverse of several matrices) and with a proper linear combination of the results, allows us to obtain new high order approximations to the desired functions of matrices. An error analysis to obtain forward and backward error bounds is presented. The coefficients of each method, which depends on the number of processors, can be adjusted to improve the accuracy, the stability or to reduce round off errors of the methods. We illustrate this procedure by explicitly constructing some methods which are then tested on several numerical examples.
... Comparing (12) with (11), we observe that evaluating (12) always requires one extra matrix product, namely the one indicated by the symbol "?". Given the importance of reducing the number of matrix products, see [19][20][21] for more details, we will focus mainly on the expansion (11). ...
Article
There are currently very few implementations for computing the hyperbolic cosine of a matrix, and this work aims to fill that gap. To this end, we first introduce a new rational-polynomial Hermite matrix expansion, together with a formula, with a sharp bound, for the forward relative error of the Hermite approximation in exact arithmetic. This matrix expansion yields a new accurate and efficient method for computing the hyperbolic matrix cosine. We present a MATLAB implementation based on this method that is more efficient and more accurate than other state-of-the-art codes. The algorithm developed from this method can also run on an NVIDIA GPU thanks to a MEX file that connects the MATLAB implementation to the CUDA code.
... This algorithm allows one to reduce the number of matrix multiplications and thus speed up calculations. We describe here only the specific case of this algorithm that we use in our numerical experiments in Section 8. See also the modifications of the Paterson–Stockmeyer algorithm in [5,50]. Let p be a polynomial of degree 15 expanded in powers of λ: ...
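The degree-15 expansion itself is elided in the snippet above, but the classical Paterson–Stockmeyer evaluation it refers to can be sketched generically: with the powers A², A³, A⁴ and a block Horner recursion, any degree-15 polynomial costs six matrix products. The coefficients c below are generic inputs, not those of the cited experiments.

```python
# Generic Paterson–Stockmeyer evaluation of p(A) = sum_{k=0}^{15} c[k] A^k:
# 3 products for the powers plus 3 in the block Horner loop = 6 in total.
import numpy as np

def ps_degree15(c, A):
    assert len(c) == 16
    I = np.eye(A.shape[0])
    A2 = A @ A                         # product 1
    A3 = A2 @ A                        # product 2
    A4 = A3 @ A                        # product 3
    # p(A) = B0 + B1 A4 + B2 A4^2 + B3 A4^3 with degree-3 blocks Bk
    B = [c[4*k]*I + c[4*k+1]*A + c[4*k+2]*A2 + c[4*k+3]*A3 for k in range(4)]
    P = B[3]
    for k in (2, 1, 0):                # products 4, 5, 6
        P = P @ A4 + B[k]
    return P
```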
Article
Let T be a square matrix with a real spectrum, and let f be an analytic function. The problem of the approximate calculation of f(T) is discussed. Applying the Schur triangular decomposition and a reordering, one can assume that T is triangular and that its diagonal entries tii are arranged in increasing order. To avoid calculations involving the differences tii − tjj for close (including equal) tii and tjj, it is proposed to represent T in block form and to calculate the two main block diagonals using interpolating polynomials. The remaining entries of f(T) can be calculated using the Parlett recurrence algorithm. It is also proposed to perform some scalar operations (such as building the interpolating polynomials) with an increased number of significant decimal digits.
Article
Many numerical methods for evaluating matrix functions can be naturally viewed as computational graphs. Rephrasing these methods as directed acyclic graphs (DAGs) is a particularly effective approach to study existing techniques, improve them, and eventually derive new ones. The accuracy of these matrix techniques can be characterized by the accuracy of their scalar counterparts; thus, designing algorithms for matrix functions can be regarded as a scalar-valued optimization problem. The derivatives needed during the optimization can be calculated automatically by exploiting the structure of the DAG, in a fashion analogous to backpropagation. This paper describes GraphMatFun.jl, a Julia package that offers the means to generate and manipulate computational graphs, optimize their coefficients, and generate Julia, MATLAB, and C code to evaluate them efficiently at a matrix argument. The software also provides tools to estimate the accuracy of a graph-based algorithm and thus obtain numerically reliable methods. For the exponential, for example, using a particular form (degree-optimal) of polynomials produces implementations that in many cases are cheaper, in terms of computational cost, than the Padé-based techniques typically used in mathematical software. The optimized graphs and the corresponding generated code are available online.
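To make the DAG viewpoint concrete, here is a deliberately simple illustration in Python (emphatically not GraphMatFun.jl's API): an evaluation scheme is a list of nodes, each being the input matrix, a linear combination of earlier nodes, or a product of two earlier nodes; the coefficients of the linear combinations are the quantities one would optimize.

```python
# Illustrative DAG evaluation of a matrix polynomial scheme; not GraphMatFun.jl.
import numpy as np

def eval_graph(nodes, A):
    """nodes: list of ('input',), ('lincomb', {index: coeff}), or
    ('mult', i, j); index -1 in a lincomb denotes the identity matrix."""
    I = np.eye(A.shape[0])
    vals = []
    for node in nodes:
        if node[0] == 'input':
            vals.append(A)
        elif node[0] == 'lincomb':
            vals.append(sum(c * (I if i == -1 else vals[i])
                            for i, c in node[1].items()))
        else:                                      # ('mult', i, j)
            vals.append(vals[node[1]] @ vals[node[2]])
    return vals[-1]

# One product, degree 2: p(A) = I + A + A^2/2 (degree-2 Taylor of exp).
graph = [('input',), ('mult', 0, 0), ('lincomb', {-1: 1.0, 0: 1.0, 1: 0.5})]
```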
Article
We present a practical algorithm to approximate the exponential of skew-Hermitian matrices up to round-off error based on an efficient computation of Chebyshev polynomials of matrices and the corresponding error analysis. It is based on Chebyshev polynomials of degrees 2, 4, 8, 12 and 18, which are computed with only 1, 2, 3, 4 and 5 matrix–matrix products, respectively. For problems of the form exp(−iA), with A a real and symmetric matrix, an improved version is presented that computes the sine and cosine of A with a reduced computational cost. The theoretical analysis, supported by numerical experiments, indicates that the new methods are more efficient than schemes based on rational Padé approximants and Taylor polynomials for all tolerances and time interval lengths. The new procedure is particularly recommended to be used in conjunction with exponential integrators for the numerical time integration of the Schrödinger equation.
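The quoted product counts (degree 4 from two products, and so on) come from evaluation schemes of this family. As a hedged illustration, an arbitrary degree-4 polynomial can be reached with two products as follows; the coefficients are generic, not the cited Chebyshev coefficients.

```python
# Two products suffice for any degree-4 polynomial:
# q(A) = c0 I + c1 A + c2 B + B (c3 A + c4 B), with B = A^2.
import numpy as np

def degree4_two_products(c, A):
    I = np.eye(A.shape[0])
    B = A @ A                                                 # product 1
    return c[0]*I + c[1]*A + c[2]*B + B @ (c[3]*A + c[4]*B)   # product 2
```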
Article
The computation of matrix trigonometric functions has received remarkable attention in recent decades due to its usefulness in solving systems of second-order linear differential equations. Several state-of-the-art algorithms for computing these matrix functions have appeared recently. In this work, we present two efficient Taylor-series algorithms, with forward and backward error analysis, for computing the matrix cosine. A MATLAB implementation of the algorithms is compared with state-of-the-art implementations, showing excellent performance in both accuracy and cost.
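The Taylor-based pattern for the matrix cosine can be sketched as follows: scale, apply a truncated Taylor series in A², then recover cos(A) through the double angle formula cos(2X) = 2cos²(X) − I. The fixed s and m below are illustrative; the cited algorithms select them via forward and backward error analysis and evaluate the polynomial more cheaply.

```python
# Minimal Taylor-plus-double-angle sketch for cos(A); illustrative s, m.
import numpy as np
from math import factorial

def cosm_taylor(A, s=4, m=8):
    X = A / 2 ** s
    X2 = X @ X
    I = np.eye(A.shape[0])
    C = np.zeros_like(X2)
    for k in range(m, -1, -1):     # Horner: sum_{k=0}^m (-1)^k X2^k / (2k)!
        C = X2 @ C + ((-1) ** k / factorial(2 * k)) * I
    for _ in range(s):             # recover: cos(2Y) = 2 cos(Y)^2 - I
        C = 2 * (C @ C) - I
    return C
```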
Article
This work presents a new algorithm for matrix exponential computation that significantly simplifies a Taylor scaling and squaring algorithm presented previously by the authors, while preserving accuracy. A MATLAB version of the new simplified algorithm has been compared with the original one, giving similar accuracy but reduced processing time. It has also been compared with two state-of-the-art implementations based on Padé approximations, one commercial and one implemented in MATLAB, obtaining better accuracy and processing times in the majority of cases.
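The Taylor scaling and squaring pattern that this entry simplifies follows exp(A) ≈ (T_m(A/2^s))^(2^s). A minimal sketch with arbitrary fixed s and m is given below; the cited algorithm derives both parameters, and a cheaper evaluation of T_m, from error bounds.

```python
# Minimal Taylor scaling-and-squaring sketch for exp(A); illustrative s, m.
import numpy as np

def expm_taylor(A, s=5, m=12):
    X = A / 2 ** s
    I = np.eye(A.shape[0])
    T = I.copy()
    for k in range(m, 0, -1):      # Horner: T_m(X) = sum_{k=0}^m X^k / k!
        T = X @ T / k + I
    for _ in range(s):             # squaring phase
        T = T @ T
    return T
```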
Article
This working note describes how to install, test, and time version 3.0 of LAPACK, a linear algebra package for high-performance computers. Separate instructions are provided for the Unix and non-Unix versions of the test package. Further details are also given on the design of the test and timing programs.
Article
In principle, the exponential of a matrix could be computed in many ways. Methods involving approximation theory, differential equations, the matrix eigenvalues, and the matrix characteristic polynomial have been proposed. In practice, consideration of computational stability and efficiency indicates that some of the methods are preferable to others, but that none are completely satisfactory. Most of this paper was originally published in 1978. An update, with a separate bibliography, describes a few recent developments.
Article
We present algorithms which use only O(√n) nonscalar multiplications (i.e. multiplications involving "x" on both sides) to evaluate polynomials of degree n, and proofs that at least √n are required. These results have practical application in the evaluation of matrix polynomials with scalar coefficients, since the "matrix–matrix" multiplications are relatively expensive, and also in determining how many multiplications are needed for polynomials with rational coefficients, since multiplications by integers can in principle be replaced by several additions. Key words: polynomial evaluation, nonscalar multiplications, rational coefficients, matrix polynomial. 1. Introduction. A well-known result given by Motzkin [2] and Winograd [6] is that, even with preliminary adaptation of the coefficients, at least n/2 multiplications are required to evaluate a polynomial of degree n if the coefficients of the polynomial are algebraically independent. However, we frequently wish to evaluate polynomials with rational or integer coefficients, for which this result does not apply.
Article
The scaling and squaring method for the matrix exponential is based on the approximation e^A ≈ (r_m(2^{-s}A))^{2^s}, where r_m(x) is the [m/m] Padé approximant to e^x and the integers m and s are to be chosen. Several authors have identified a weakness of existing scaling and squaring algorithms termed overscaling, in which a value of s much larger than necessary is chosen, causing a loss of accuracy in floating point arithmetic. Building on the scaling and squaring algorithm of Higham [SIAM J. Matrix Anal. Appl., 26 (2005), pp. 1179-1193], which is used by the MATLAB function expm, we derive a new algorithm that alleviates the overscaling problem. Two key ideas are employed. The first, specific to triangular matrices, is to compute the diagonal elements in the squaring phase as exponentials instead of from powers of r_m. The second idea is to base the backward error analysis that underlies the algorithm on members of the sequence {∥A^k∥^{1/k}} instead of ∥A∥, since for nonnormal matrices it is possible that ∥A^k∥^{1/k} is much smaller than ∥A∥, and indeed this is likely when overscaling occurs in existing algorithms. The terms ∥A^k∥^{1/k} are estimated without computing powers of A by using a matrix 1-norm estimator in conjunction with a bound of the form ∥A^k∥^{1/k} ≤ max(∥A^p∥^{1/p}, ∥A^q∥^{1/q}) that holds for certain fixed p and q less than k. The improvements to the truncation error bounds have to be balanced by the potential for a large ∥A∥ to cause inaccurate evaluation of r_m in floating point arithmetic. We employ rigorous error bounds along with some heuristics to ensure that rounding errors are kept under control. Our numerical experiments show that the new algorithm generally provides accuracy at least as good as the existing algorithm of Higham at no higher cost, while for matrices that are triangular or cause overscaling it usually yields significant improvements in accuracy, cost, or both.
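The point about the sequence ∥A^k∥^{1/k} is easy to demonstrate numerically. The small experiment below forms the powers explicitly, which is feasible only for moderate n and is precisely why the paper uses a 1-norm estimator instead; the matrix is a contrived nonnormal example.

```python
# Why norm(A^k)^(1/k) can be far smaller than norm(A) for nonnormal A.
import numpy as np

A = np.array([[1.0, 1000.0],
              [0.0,   -1.0]])      # nonnormal; note A @ A = I
for k in (1, 2, 3, 4):
    ak = np.linalg.norm(np.linalg.matrix_power(A, k), 1) ** (1.0 / k)
    print(f"||A^{k}||_1^(1/{k}) = {ak:.3e}")
# ||A||_1 = 1001, yet ||A^k||_1^(1/k) = 1 for even k: scaling decisions
# based on these quantities avoid the overscaling described above.
```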
Article
Several existing algorithms for computing the matrix cosine employ polynomial or rational approximations combined with scaling and use of a double angle formula. Their derivations are based on forward error bounds. We derive new algorithms for computing the matrix cosine, the matrix sine, and both simultaneously that are backward stable in exact arithmetic and behave in a forward stable manner in floating point arithmetic. Our new algorithms employ both Padé approximants of sin x and new rational approximants to cos x and sin x obtained from Padé approximants to e^x. The amount of scaling and the degree of the approximants are chosen to minimize the computational cost subject to backward stability in exact arithmetic. Numerical experiments show that the new algorithms have backward and forward errors that rival or surpass those of existing algorithms and are particularly favorable for triangular matrices.
Article
A thorough and elegant treatment of the theory of matrix functions and numerical methods for computing them, including an overview of applications, new and unpublished research results, and improved algorithms. Key features include a detailed treatment of the matrix sign function and matrix roots; a development of the theory of conditioning and properties of the Fréchet derivative; Schur decomposition; block Parlett recurrence; a thorough analysis of the accuracy, stability, and computational cost of numerical methods; general results on convergence and stability of matrix iterations; and a chapter devoted to the f(A)b problem. Ideal for advanced courses and for self-study, its broad content, references and appendix also make this book a convenient general reference. Contains an extensive collection of problems with solutions and MATLAB implementations of key algorithms.
Article
Contents: 1) Matrix multiplication problems; 2) Matrix analysis; 3) General linear systems; 4) Special linear systems; 5) Orthogonalization and least squares; 6) Parallel matrix computations; 7) The unsymmetric eigenvalue problem; 8) The symmetric eigenvalue problem; 9) Lanczos methods; 10) Iterative methods for linear systems; 11) Functions of matrices; 12) Special topics.
Article
This paper presents an efficient method for computing approximations of general matrix functions based on mixed rational and polynomial approximations. A method to obtain this kind of approximation from rational approximations is given, reaching the highest efficiency when transforming nondiagonal rational approximations whose numerator degree is higher than the denominator degree. The proposed mixed rational and polynomial approximation can thus be applied to matrix functions that admit any type of rational approximation, such as Padé or Chebyshev, with maximum efficiency for numerator degrees higher than the denominator degrees. The efficiency of the mixed rational and polynomial approximation is compared with that of the best existing evaluation schemes for general polynomial and rational approximations, providing greater theoretical accuracy for the same cost in terms of matrix multiplications. It is well known that diagonal rational approximants are generally more accurate than nondiagonal rational approximants of the same computational cost. Using the proposed mixed approximation we show that this statement is no longer true: nondiagonal rational approximants are in fact generally more accurate than the corresponding diagonal rational approximants with the same cost.
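For contrast with the mixed scheme, the standard evaluation of a rational approximant r(A) = q(A)^{-1} p(A), whose cost the mixed rational and polynomial approximants improve upon, can be sketched as below. The coefficient vectors are generic, not a specific Padé or Chebyshev pair, and the powers are formed naively rather than by Paterson–Stockmeyer.

```python
# Baseline rational evaluation r(A) = q(A)^{-1} p(A) via one linear solve;
# generic coefficients, naive powers (a sketch, not an optimized scheme).
import numpy as np

def ratm(p_c, q_c, A):
    powers = [np.eye(A.shape[0])]
    for _ in range(max(len(p_c), len(q_c)) - 1):
        powers.append(powers[-1] @ A)          # shared powers I, A, A^2, ...
    P = sum(c * powers[i] for i, c in enumerate(p_c))
    Q = sum(c * powers[i] for i, c in enumerate(q_c))
    return np.linalg.solve(Q, P)               # avoids forming q(A)^{-1}
```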