Fast Taylor polynomial evaluation for the computation
of the matrix cosine
Jorge Sastre1
Instituto de Telecomunicaciones y Aplicaciones Multimedia
Javier Ibáñez1
Instituto de Instrumentación para Imagen Molecular
Pedro Alonso-Jordá1,∗
Department of Information Systems and Computation
Jesús Peinado1
Department of Information Systems and Computation
Emilio Defez1
Instituto de Matemática Multidisciplinar
Abstract
In this work we introduce a new method to compute the matrix cosine. It
is based on recent matrix polynomial evaluation methods for the Taylor
approximation and on a mixed forward and backward error analysis. These matrix
polynomial evaluation methods make it possible to evaluate the Taylor polynomial
approximation of the matrix cosine more efficiently than with the Paterson–
Stockmeyer method. A sequential MATLAB implementation of the new algorithm
is provided, giving better efficiency and accuracy than state-of-the-art
algorithms. Moreover, we provide a MATLAB implementation that can use
NVIDIA GPUs easily and efficiently.
Keywords: matrix, matrix trigonometric functions, matrix cosine, Taylor, fast
matrix polynomial evaluation, GPU computing
∗Corresponding author
Email address: palonso@upv.es (Pedro Alonso-Jordá)
1All authors belong to Universitat Politècnica de València.
Preprint submitted to Journal of Computational and Applied Mathematics     September 4, 2018
1. Introduction
The exact solution of many engineering processes that are described by sec-
ond order differential equations is given in terms of the trigonometric matrix
functions sine and/or cosine. This is the case, for instance, of the wave prob-
lem. The most popular state-of-the-art algorithms used to calculate these ma-
trix functions are based on polynomial and rational approximations with scaling
and recovering techniques [1, 2, 3, 4, 5]. The Paterson–Stockmeyer method [6] is
the one most commonly used to compute the matrix polynomials that appear in these
approximations at a reduced computational cost. Recently, a new family of
methods for the evaluation of general matrix polynomials has been proposed [7],
which reduces the number of matrix products needed to evaluate a polynomial
with respect to the Paterson–Stockmeyer method. In this work, we
present competitive algorithms for the computation of the matrix cosine based
on the evaluation of Taylor approximations using those methods. Sequential
and NVIDIA GPU based Matlab implementations of the new algorithms are
given. The basic computational kernel of algorithms based on Taylor approxi-
mations is matrix multiplication. This kernel can be executed very rapidly on
accelerator devices like GPUs (Graphic Processing Units). In this paper we have
exploited this fact, together with our previous experience on this subject [8], to
build a Matlab script plus a mex file capable of executing the new algorithm
very efficiently.
The next section presents a scaling and squaring Taylor algorithm for com-
puting the matrix cosine based on the methods described in [7]. Section 3
describes a forward and backward error analysis for computing the Taylor ap-
proximation using our algorithm. Section 4 shows some numerical results of the
MATLAB sequential and GPU implementations from both the performance and
accuracy points of view. Finally, Section 5 gives some conclusions.
2. Taylor algorithm for computing the matrix cosine
The matrix cosine can be defined for all A ∈ C^{n×n} by the series

cos(A) = \sum_{i=0}^{\infty} (-1)^i A^{2i} / (2i)!.    (1)

Let

T_{2m}(A) = \sum_{i=0}^{m} p_i A^{2i} = \sum_{i=0}^{m} p_i B^i ≡ P_m(B),    (2)

be the Taylor approximation of order 2m of cos(A), where p_i = (-1)^i/(2i)! and
B = A^2. Algorithm 1 shows a general algorithm for computing the matrix
cosine based on the Taylor series. Since (2) is accurate only near the origin, to compute
cos(A) from the Taylor approximation it is often necessary to scale the matrix B
and recover cos(A) from the Taylor approximation of the cosine of the scaled
matrix. Scaling the matrix B by an integer s > 0 consists of computing B := 4^{-s}B
Algorithm 1 Given a matrix A ∈ C^{n×n} and a maximum order m_M ∈ N,
this algorithm computes C = cos(A) by a Taylor approximation of order 2m_k ≤
2m_M, where the m_k are optimal degrees of the Taylor polynomial, i.e. the maximum
degrees of the Taylor polynomial that can be evaluated for a certain number
of matrix products.
1: B = A^2
2: Scaling phase: choose m_k ≤ m_M and an integer scaling parameter s for
   the Taylor approximation with scaling.
3: B = B/4^s
4: Compute C = P_{m_k}(B)
5: for i = 1:s do
6:     C = 2C^2 - I
7: end for
(Step 3). Once the cosine of the scaled matrix has been computed, cos(A)
can be recovered (Steps 5-7) by repeatedly applying the double angle formula
cos(2X) = 2cos^2(X) - I [9, p. 288].
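As an illustration, the overall structure of Algorithm 1 can be sketched in a few lines of MATLAB. This is only a schematic sketch: the helper functions select_m_s (Step 2, Section 3.3) and eval_Pm (Step 4, expressions (3)-(6)) are placeholders introduced here for illustration and are not part of the actual cosmpol code.

function C = cos_taylor_sketch(A)
% Schematic MATLAB version of Algorithm 1 (illustrative sketch only).
B = A*A;                         % Step 1: B = A^2
[m, s] = select_m_s(B);          % Step 2: choose Taylor order m_k and scaling s (Section 3.3)
B = B/4^s;                       % Step 3: scale B
C = eval_Pm(B, m);               % Step 4: evaluate P_m(B) with (3)-(6)
for i = 1:s                      % Steps 5-7: double angle recovery
    C = 2*C*C - eye(size(A));    %   cos(2X) = 2*cos(X)^2 - I
end
end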
Matrix A can be preprocessed to reduce its norm as described in [10, Alg. 1.2];
this procedure will not be discussed in this paper. Step 4 was traditionally
performed by using the Paterson–Stockmeyer method [6]; however, in this paper
we use a more efficient method for evaluating C = P_{m_k}(B) based on [7].
This method depends on the value of m_k selected in Step 2 of Algorithm 1 (from
now on we will use m instead of m_k for simplicity). Below we analyse each case.
For m = 1, 2 and 4, similarly to (10) from [11], the Taylor polynomials
P_m(B) can be computed by using the following expressions:

P_1(B) = -B/2 + I,    (3)
P_2(B) = (B^2/12 - B)/2 + I,
P_4(B) = (((B^2/56 - B)/30 + I)B^2/12 - B)/2 + I.
Following [7, Ex. 3.1], P_8(B) can be evaluated by using the following formulae:

y_{02}(B) = B^2(c_1 B^2 + c_2 B),    (4)
P_8(B) = (y_{02}(B) + c_3 B^2 + c_4 B)(y_{02}(B) + c_5 B^2) + c_6 y_{02}(B) + B^2/24 - B/2 + I,

with a cost of 3 matrix products. With that cost the maximum approximation
order available with Paterson–Stockmeyer is m = 6. The coefficients c_i for IEEE
double precision arithmetic are given in Table 1, see [7, Table 4].
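For instance, (4) can be transcribed directly into MATLAB as in the following sketch, which uses exactly three matrix products (one for B^2, one for y_{02} and one for the product of the two bracketed factors); the vector c is assumed to hold the coefficients c_1, ..., c_6 of the column m = 8 of Table 1:

% Sketch of the evaluation of P8(B) by (4) with 3 matrix products.
% c(1),...,c(6): the m = 8 coefficients of Table 1 (assumed precomputed).
I   = eye(size(B));
B2  = B*B;                                   % product 1
y02 = B2*(c(1)*B2 + c(2)*B);                 % product 2
P8  = (y02 + c(3)*B2 + c(4)*B)*(y02 + c(5)*B2) ...
      + c(6)*y02 + B2/24 - B/2 + I;          % product 3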
In the following we show that the most efficient methods proposed in [7]
to evaluate the Taylor polynomial for m > 8 are not accurate enough for the
matrix cosine approximation. Therefore, other possibilities are proposed to
increase accuracy, in exchange for a higher cost. Despite this higher cost,
the proposed matrix polynomial evaluation methods are still more efficient than
the Paterson–Stockmeyer method.
Table 1: Coefficients for computing the matrix cosine Taylor approximations (4)-(6).

          m = 8                          m = 12                           m = 15
c_1    2.186201576339059 × 10^{-7}    1.269542268337734 × 10^{-12}    6.140022498994532 × 10^{-17}
c_2    2.623441891606870 × 10^{-5}    3.503936660612145 × 10^{-10}    2.670909787062621 × 10^{-14}
c_3    6.257028774393310 × 10^{-3}    1.135275478038335 × 10^{-7}     1.438284920333222 × 10^{-11}
c_4    4.923675742167775 × 10^{-1}    2.027712316612395 × 10^{-5}     1.050202496489896 × 10^{-8}
c_5    1.441694411274536 × 10^{-4}    1.647243380001247 × 10^{-3}     4.215975785860907 × 10^{-6}
c_6    5.023570505224926 × 10^{-1}    6.469859264308602 × 10^{-1}     1.238347173261210 × 10^{-3}
c_7    -                              4.008589447357360 × 10^{-5}     3.234597615453410 × 10^{-9}
c_8    -                              9.187724869020796 × 10^{-3}     9.292820886910254 × 10^{-7}
c_9    -                              1.432942184841715 × 10^{-2}     2.466381973203188 × 10^{-1}
c_10   -                              4.555439797286385 × 10^{-3}     9.369018510939971 × 10^{-10}
Following [12, Sec. 3.2], and similarly to [7, Ex. 5.1], with a cost of 4 matrix
products it is possible to obtain a Taylor based approximation P_{16}(B) of the
matrix cosine of order m = 15, with several real solutions for the coefficients
involved. However, for all the real solutions rounded to IEEE double precision
arithmetic, the stability check proposed in [7, Ex. 3.1] gives errors of order
10^{-14} > u or greater, where u is the unit roundoff in IEEE double precision
arithmetic, i.e. u = 2^{-53} ≈ 1.11 × 10^{-16}. We have checked that these evaluation
formulae provide results of reduced accuracy in numerical tests.
Therefore, for a cost of 4 matrix products we use (34) and (35) from [7] to
evaluate P_{12}(B) by means of the following formulae:

y_{02}(B) = B^3(c_1 B^3 + c_2 B^2 + c_3 B),    (5)
P_{12}(B) = (y_{02}(B) + c_4 B^3 + c_5 B^2 + c_6 B)(y_{02}(B) + c_7 B^3 + c_8 B^2) + c_9 y_{02}(B) + c_{10} B^3 + B^2/24 - B/2 + I,

where the coefficients c_i are given in Table 1. From the different real solutions for
the coefficients, we selected the ones giving the lowest maximum error in the
stability test, similarly to [7, Ex. 3.1], with errors lower than 1.31 × 10^{-16} =
O(u). Equation (5) provides a lower order than 15, but it behaves in a stable
manner and is, in turn, more efficient than the Paterson–Stockmeyer method, since
with a cost of 4 matrix products the maximum approximation order available
with Paterson–Stockmeyer is m = 9.
The highest order m used in [5] for P_m(B) is m = 16, available with 6 matrix
products using the Paterson–Stockmeyer method. Using (34) and (35) from [7] it is
possible to evaluate P_{16}(B) with 5 matrix products and several possibilities of
real coefficients. The stability check proposed in [7, Ex. 3.1] gives a maximum
error of 1.03 × 10^{-15} > u, and we checked that the numerical results were not
accurate enough. The stability can be improved using expression (52) from [7],
with s = 3 and p = 3, giving the following formulae for m = 15:

y_{02}(B) = B^3(c_1 B^3 + c_2 B^2 + c_3 B),    (6)
P_{15}(B) = -((y_{02}(B) + c_4 B^3 + c_5 B^2 + c_6 B)(y_{02}(B) + c_7 B^3 + c_8 B^2) + c_9 y_{02}(B) + c_{10} B^3 + B^2/3628800 - B/40320 + I/720)B^3 + B^2/24 - B/2 + I.
The stability check for the coefficients c_i given in Table 1, selected among all the
possible real solutions for the coefficients, gives a maximum error of order u, in
exchange for a lower order m = 15 < 16. The minus sign at the beginning of the
expression for P_{15}(B) makes it possible to obtain real solutions for all the coefficients
involved, as suggested in [7, p. 237]. With a cost of 5 matrix products the maximum
approximation order available with Paterson–Stockmeyer is m = 12.
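A direct MATLAB transcription of (6), analogous to the sketch given for (4), is shown below; it uses five matrix products (B^2, B^3, y_{02}, the product of the two bracketed factors and the final multiplication by B^3), and the vector c is assumed to hold the ten coefficients of the column m = 15 of Table 1:

% Sketch of the evaluation of P15(B) by (6) with 5 matrix products.
% c(1),...,c(10): the m = 15 coefficients of Table 1 (assumed precomputed).
I   = eye(size(B));
B2  = B*B;                                   % product 1
B3  = B2*B;                                  % product 2
y02 = B3*(c(1)*B3 + c(2)*B2 + c(3)*B);       % product 3
T   = (y02 + c(4)*B3 + c(5)*B2 + c(6)*B)*(y02 + c(7)*B3 + c(8)*B2) ...
      + c(9)*y02 + c(10)*B3 + B2/3628800 - B/40320 + I/720;   % product 4
P15 = -T*B3 + B2/24 - B/2 + I;               % product 5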
All coefficients c_i that appear in expressions (4)-(6) were computed with the
MATLAB R2018a Symbolic Math Toolbox, using 200 decimal digit arithmetic.
Table 1 shows these values in IEEE double precision arithmetic.
3. Error analysis
In [2] an absolute forward error analysis of the Taylor approximation for the
matrix cosine was developed. In [5] a combination of a relative forward and
backward error analysis was developed for the same function. In this section
we present a unified study of the error analysis for the computation of that
matrix function, selecting for each degree m of the cosine Taylor approximation
the type of analysis, among the three, that gives the most efficient option.
The following theorem is used in this study:
Theorem 1 ([2]). Let h_l(x) = \sum_{i ≥ l} p_i x^i be a power series with radius of convergence w, \tilde{h}_l(x) = \sum_{i ≥ l} |p_i| x^i, B ∈ C^{n×n} with ρ(B) < w, l ∈ N and t ∈ N with 1 ≤ t ≤ l. If t_0 is the multiple of t such that l ≤ t_0 ≤ l + t - 1 and

β_t = max{ d_j^{1/j} : j = t, l, l+1, ..., t_0 - 1, t_0 + 1, t_0 + 2, ..., l + t - 1 },

where d_j is an upper bound for ||B^j||, d_j ≥ ||B^j||, then ||h_l(B)|| ≤ \tilde{h}_l(β_t).

If we apply Theorem 1 with t = l, then ||h_l(B)|| ≤ \tilde{h}_l(β_l), where

β_l = max{ d_j^{1/j} : j = l, l+1, ..., 2l - 1 }.    (7)

In [13, Sec. 4.1] the authors approximated β_min = min{ β_t^{(l)} : 1 ≤ t ≤ m + 1 } by

β_min ≈ max{ d_{l+1}^{1/(l+1)}, d_{l+2}^{1/(l+2)} },    (8)

corresponding to the first two terms of (7).
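In practice these bounds are evaluated from the available norm bounds d_j; for instance, (7) and (8) amount to the following minimal MATLAB sketch, where d is an assumed vector with d(j) ≥ ||B^j||:

% Sketch of (7) and (8); d(j) >= norm(B^j) are assumed precomputed bounds.
j        = l:2*l-1;
beta_l   = max(d(j).^(1./j));                        % beta_l as in (7)
beta_min = max(d(l+1)^(1/(l+1)), d(l+2)^(1/(l+2)));  % approximation (8)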
3.1. Absolute and relative forward errors
Let A ∈ C^{n×n} and B = A^2. Using (1), the absolute forward error in exact
arithmetic of the Taylor approximation (2) of cos(A), denoted by E_f^{(1)}, is

E_f^{(1)} = ||cos(A) - P_m(B)|| = || \sum_{i ≥ m+1} f_{m,i}^{(1)} B^i ||,    (9)

where f_{m,i}^{(1)} = (-1)^i/(2i)!. This error analysis is used in [2, Sec. 4].
If ||B|| = ||A^2|| < acosh^2(2) ≈ 1.7343, then cos^{-1}(A) exists [5, Proposition 1] and it follows that the relative forward error of computing cos(A) in exact arithmetic, denoted by E_f^{(2)}, is [5, Sec. 2.1]

E_f^{(2)} = || cos^{-1}(A)(cos(A) - P_m(B)) || = || I - P_m(B) cos^{-1}(A) || = || \sum_{i ≥ m+1} f_{m,i}^{(2)} B^i ||.    (10)

If we define g_{m+1}^{(k)}(x) = \sum_{i ≥ m+1} f_{m,i}^{(k)} x^i and \tilde{g}_{m+1}^{(k)}(x) = \sum_{i ≥ m+1} |f_{m,i}^{(k)}| x^i, k = 1, 2, and we apply Theorem 1, then

E_f^{(k)} = || g_{m+1}^{(k)}(B) || ≤ \tilde{g}_{m+1}^{(k)}(β_t),    (11)

for every t, 1 ≤ t ≤ m + 1.
3.2. Relative backward error
The backward error ΔA of approximating cos(A) by the Taylor approximation
T_{2m}(A) verifies

T_{2m}(A) = cos(A + ΔA).

From [5, Sec. 2.2] the backward error ΔA can be expressed as

ΔA = \sum_{i ≥ m} b_{m,i} A^{2i+1},

where the coefficients b_{m,i} can be computed symbolically, see (8)-(11) of [5, Sec. 2.2]
for details. Then, the relative backward error E_b in exact arithmetic of
approximating cos(A) by T_{2m}(A) can be computed as

E_b = ||ΔA|| / ||A|| = || \sum_{i ≥ m} b_{m,i} A^{2i+1} || / ||A|| ≤ || \sum_{i ≥ m} b_{m,i} A^{2i} || = || \sum_{i ≥ m} b_{m,i} B^i ||.

If we define h_m(x) = \sum_{i ≥ m} b_{m,i} x^i and \tilde{h}_m(x) = \sum_{i ≥ m} |b_{m,i}| x^i, and we apply Theorem 1, then

E_b = ||h_m(B)|| ≤ \tilde{h}_m(β_t),    (12)

for every t, 1 ≤ t ≤ m. In [5] an error analysis was used that combines the relative
forward and backward error analyses in exact arithmetic.
Table 2: Values of Θ_{f1}(m), Θ_{f2}(m), Θ_b(m), and Θ(m) for m = 8, 12, 15.

              m = 8                 m = 12               m = 15
Θ_{f1}(m)     0.96                  6.59                 16.45
Θ_{f2}(m)     0.91                  -                    -
Θ_b(m)        0.94                  6.75                 9.91
Θ(m)          0.9625107544271462    6.752349007371135    16.45123831556254
3.3. Computation of the Taylor order m and the scaling parameter s
Let Θ_{fk}(m), k = 1, 2, be

Θ_{fk}(m) = max{ θ ≥ 0 : \tilde{g}_{m+1}^{(k)}(θ) = \sum_{i ≥ m+1} |f_{m,i}^{(k)}| θ^i ≤ u },    (13)

and let Θ_b(m) be

Θ_b(m) = max{ θ ≥ 0 : \tilde{h}_m(θ) = \sum_{i ≥ m} |b_{m,i}| θ^i ≤ u },    (14)

where u = 2^{-53} is the unit roundoff in double precision floating-point arithmetic.
We have used the MATLAB Symbolic Math Toolbox to compute Θ_{fk}(m), k = 1, 2,
and Θ_b(m) for m = 8, 12, 15 in 250-decimal digit arithmetic, obtaining the
coefficients symbolically and considering enough terms to obtain all the Θ values
for each m with enough significant digits. Note that Θ_b(15) needs more than 1500
terms to obtain three significant digits, similarly to what happens with Θ_b(16)
and Θ_b(20) in [5, Sec. 2.2]. Then, a numerical zero-finder is invoked to determine
the highest values Θ_{fk}(m), k = 1, 2, and Θ_b(m) such that

\tilde{g}_{m+1}^{(k)}(Θ_{fk}(m)) = \sum_{i ≥ m+1} |f_{m,i}^{(k)}| Θ_{fk}(m)^i ≤ u,

and

\tilde{h}_m(Θ_b(m)) = \sum_{i ≥ m} |b_{m,i}| Θ_b(m)^i ≤ u,

hold.
holds. The values of Θfk(m), k= 1,2, and Θb(m) for m= 8,12,15 are de-
picted in Table 2. For m={1,2,4}[5, Sec. 2.2] shows that Θf2(m)>Θb(m),
and comparing tables [2, Table 2] and [5, Table 1] one gets that Θf1(m)'
Θf2(m). This is a normal behavior since, for those values of m, it follows
that cos(pΘf1(m)1 and, in that case, the forward absolute error bound (9)
and the forward relative error bound (10) are approximately equal. Analo-
gously to [5, Sec. 2], and to minimize the computational cost, we are select-
ing Θ(m) = max {Θf1(m),Θf2(m),Θb(m)}for each mk={1,2,4,8,12,15},
k= 1,2,...,6, i.e. Θf1(m) for m={1,2,4,8,15}and Θb(m) for m= 12.
Then, considering (11) and taking into account the values of Θmfrom Table 2,
7
it follows that, for m={1,2,4,8,15}, if βtΘf1(m) the absolute forward error
is lower than or equal to the unit roundoff:
E(1)
f˜g(1)
m(βt)˜g(1)
mf1(m)) u,
and using (8) one gets
β(m)
m+1 max{b1/(m+1)
m+1 , b1/(m+2)
m+2 },(15)
where the superscript (m) stands for the Taylor approximation order used (see
(15) from [5]). For m= 12 and considering (12), if β(m)
min Θb(m), then relative
backward error is lower or equal than the unit roundoff:
Eb˜
hm(βt)˜
hmb(m)) u,
and using (8) one gets
β(m)
min max{b1/(m)
m, b1/(m+1)
m+1 },(16)
where the superscript (m) also stands for the Taylor approximation order used
(see (16) from [5]).
We provide here a scaling algorithm without norm estimation of matrix
powers and another one with norm estimation of matrix powers, similar to
the algorithms developed in [5, Sec. 2.4]. If there exists a value m_k ≤ 15 such
that β_{min}^{(m_k)} ≤ Θ(m_k), then one of the above conditions is verified and, in this
case, we choose the lowest order m_k verifying it, with scaling parameter s = 0.
Otherwise, we choose the Taylor approximation of order 12 or 15 providing the
lower cost, with

s = max{ 0, ⌈(1/2) log_2( β_{min}^{(m_k)} / Θ(m_k) )⌉ },   m_k = 12 or 15.

Note that Θ(8) < Θ(12)/4 and Θ(12) > Θ(15)/4, and then only m_k = 12 and 15 are
efficient orders for scaling.
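A minimal sketch of this selection strategy is given below; Theta is an assumed vector holding the values Θ(m_k) for m_k = 1, 2, 4, 8, 12, 15 (the last three from Table 2), and beta_min_bound(B, m) stands for whichever bound β_{min}^{(m)} of this section is used (it is a placeholder, not part of the actual cosmpol code):

% Sketch of the selection of the order m and the scaling parameter s.
m_list = [1 2 4 8 12 15];
m = 15;  s = 0;  found = false;
for k = 1:numel(m_list)
    if beta_min_bound(B, m_list(k)) <= Theta(k)
        m = m_list(k);  s = 0;  found = true;  break;   % lowest valid order, no scaling
    end
end
if ~found
    % Only m = 12 and 15 are efficient for scaling; compare total costs
    % (assumed cost model: 4 or 5 products for P12/P15 plus one product per squaring).
    s12 = max(0, ceil(0.5*log2(beta_min_bound(B,12)/Theta(5))));
    s15 = max(0, ceil(0.5*log2(beta_min_bound(B,15)/Theta(6))));
    if 4 + s12 <= 5 + s15
        m = 12;  s = s12;
    else
        m = 15;  s = s15;
    end
end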
The algorithm without estimation of norms of matrix powers uses bounds
of matrix powers based on products of matrix powers previously computed and
is analogous to Algorithm 2 from [5]. For m_k ≤ 8, only B and B^2 are available
and then, using Theorem 1 and (15), we take

β_{min}^{(m_k)} = β_2 = max{ (||B^2||^{m_k/2} ||B||)^{1/(m_k+1)}, (||B^2||^{(m_k+2)/2})^{1/(m_k+2)} } = (||B^2||^{m_k/2} ||B||)^{1/(m_k+1)}.

For m_k = 12 and 15, B^3 is also available and then, using Theorem 1, we take
β_{min}^{(m_k)} = min{β_2, β_3}. The functions ms_selectNoNormEst and beta_NoNormEst
from http://personales.upv.es/jorsasma/software/cosmpol.m are MATLAB
implementations of the scaling algorithm without estimation of norms of
matrix powers, and of the computation of β_2 and β_3, respectively.
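For reference, the bound β_2 above amounts to a single line of MATLAB, assuming that nB = ||B||_1 and nB2 = ||B^2||_1 are the 1-norms of the powers already computed:

% Sketch of beta_2 for Taylor order mk <= 8 (only B and B^2 available).
beta2 = (nB2^(mk/2)*nB)^(1/(mk+1));
% For mk = 12 or 15 an analogous bound beta3 based on norm(B^3,1) is also
% computed (see beta_NoNormEst in cosmpol.m), and beta_min = min(beta2, beta3).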
The algorithm with estimation of norms of matrix powers uses the estimation
of two matrix powers, taking into account the simplifications (15) and (16),
where the values d_k are computed approximately by the block 1-norm estimation
algorithm of [14]; it is also analogous to the corresponding algorithm in [5]. It
reduces the number of estimations by combining estimations of the values d_k
based on products of norms of matrix powers previously computed or estimated;
see function ms_selectNormEst from cosmpol.m.
4. Numerical experiments
In this section we compare the new MATLAB function developed in this
paper, cosmpol, with two other functions:

cosm: Code based on the Padé rational approximation for the matrix
cosine [3]. The MATLAB function cosm has an argument which allows us
to compute cos(A) by means of just Padé approximants, or also using the
real or the complex Schur decomposition. In these tests we did not use the
Schur decomposition since the tests carried out in [5] showed that using the
Schur decomposition in the Taylor algorithms from [5] provides higher efficiency
than the Padé method from [3] with the Schur decomposition, with similar
accuracy. The MATLAB code can be found at http://github.com/sdrelton/cosm_sinm.

cosmtay: Code based on the Taylor series for the matrix cosine [5]. This
code can optionally use norm estimation; in this paper, we did not use it.
The MATLAB code of this algorithm can be found at
http://personales.upv.es/jorsasma/software/cosmtay.m.

cosmpol: The new code presented in this paper for computing the matrix
cosine. This code can also optionally use norm estimation but, as with
cosmtay, we have not used norm estimation here.
4.1. Numerical tests
To test and compare the accuracy of the three algorithms we define the
following tests:

Test 1: One hundred diagonalizable 128×128 real matrices with 1-norms
varying from 2.32 to 220.04. These matrices have the form A = V^T D V,
where D is diagonal with real and complex eigenvalues and V is an orthogonal
matrix obtained as V = H/16, H being a Hadamard matrix, i.e. a square matrix
whose entries are either +1 or -1 and whose rows are mutually orthogonal,
with H^{-1} = H^T, where H^T is the transpose of H.

Test 2: One hundred non-diagonalizable 128×128 real matrices whose
1-norms vary from 6.5 to 249.5. These matrices have the form A = V^T J V,
where J is a Jordan matrix with real and complex eigenvalues. The algebraic
multiplicities of the eigenvalues vary between 1 and 3. Matrix V is an
orthogonal matrix obtained as V = H/16, H being a Hadamard matrix.

Test 3: Fifteen matrices with dimensions lower than or equal to 128 from
the Eigtool MATLAB package [15] and forty-four 128×128 real matrices
from the function matrix of the Matrix Computation Toolbox [16]. The
matrices whose condition number cannot be calculated have been dropped from
the test. In addition, we have scaled some matrices of this test so that
their 1-norm is lower than or equal to 1024 and their matrix cosine can be
calculated with the compared functions.
The “exact” matrix cosine is computed as cos(A) = V^T cos(D) V for the matrices
of Test 1, and cos(A) = V^T cos(J) V for the matrices of Test 2 (see [9, p. 10]),
using the MATLAB Symbolic Math Toolbox with 256 decimal digit arithmetic
in all the computations. Following [4, Sec. 4.1], for the other matrices we used
MATLAB symbolic versions of a scaled Padé rational approximation from [3]
and a scaled Taylor Paterson–Stockmeyer approximation [5, p. 67], both with
4096 decimal digit arithmetic and several orders m and/or scaling parameters s
higher than the ones used by cosm and cosmtay, respectively, checking that
their relative difference was small enough. The accuracy of the algorithms was
tested by computing the relative error

E = ||cos(A) - \tilde{Y}||_1 / ||cos(A)||_1,

where \tilde{Y} is the computed solution and cos(A) is the exact solution.
To compute the condition number of the matrix cosine function we have used
the MATLAB function funm_condest1, which estimates the condition number
in the matrix 1-norm.
Table 3 shows the computational costs. In this table, the computational
cost of each algorithm has been calculated by counting the number of matrix
products (M) of each code, since the cost of the remaining operations is negligible
compared to matrix products for big enough matrices. The cost of the solution
of the linear systems that appear in the code based on Padé approximations
has been counted as 4/3 matrix products because, from a computational point of
view, the cost of that operation is approximately 4/3 times the cost of a matrix
product (see Table C.1 from [9, p. 336]). According to the figures in this table,
cosmpol is clearly faster than the other two routines.
To compare the relative errors we can look at Table 4. This table shows the
percentage of cases in which the relative errors of cosmpol are lower than the
relative errors of the MATLAB codes cosm and cosmtay.
We have plotted in Figures 1, 2, and 3 the normwise relative errors (a), the
performance profiles (b), and the ratios of relative errors (c), to show whether
these ratios are significant:

E(cosmpol)/E(cosm),  E(cosmtay)/E(cosm),
Table 3: Matrix products (M) for the three tests using the MATLAB functions cosmpol, cosmtay,
and cosm. The values shown in columns cosmtay and cosm are the percentage of extra products
carried out by these routines with respect to cosmpol.

          M(cosmpol)    M(cosmtay)    M(cosm)
Test 1    854           11.00%        32.20%
Test 2    871           10.67%        31.57%
Test 3    511           9.20%         31.70%
Table 4: Relative error comparison.

                           Test 1    Test 2    Test 3
E(cosmpol) < E(cosm)       97%       97%       71.27%
E(cosmpol) < E(cosmtay)    60%       55%       50.85%
and the ratios of matrix products (d):

M(cosmpol)/M(cosm),  M(cosmtay)/M(cosm),

for the three tests, respectively. In the performance profiles, the α coordinate
varies between 1 and 5 in steps of 0.1, and the p coordinate is the probability
that the considered algorithm has a relative error lower than or equal to α times
the smallest error over all methods. The ratios of relative errors are presented in
decreasing order with respect to E(cosmpol)/E(cosm). The solid lines in Figures
1a, 2a and 3a represent the function k_{cos} u, where k_{cos} is the condition number
of the matrix cosine function [9, Chap. 3] and u = 2^{-53} is the unit roundoff in
double precision floating-point arithmetic.
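For reference, the performance profile values p(α) can be obtained from the matrix of relative errors as in the following MATLAB sketch, where E is an assumed array with one row per test matrix and one column per compared function:

% Sketch of the performance profile computation.
% E(i,j): relative error of function j on test matrix i (assumed given).
alpha = 1:0.1:5;
Emin  = min(E, [], 2);                       % best error over all functions, per matrix
p     = zeros(numel(alpha), size(E,2));
for k = 1:numel(alpha)
    p(k,:) = mean(E <= alpha(k)*Emin, 1);    % fraction of matrices within alpha*best
end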
In the light of the results shown by the tables and figures we can make the
following analysis:

Regarding numerical stability, the figures show the normwise relative errors:
Figure 1a shows that all the functions behave in a numerically stable way in
Test 1. Figure 2a shows that in Test 2 the Taylor functions are more stable
numerically than the Padé function cosm. Figure 3a shows that the three
functions have a similar numerical stability in Test 3. Only for one matrix of
this test do the three functions present a certain numerical instability, with a
relative error more than 10^8 times higher than the solid line (see Figure 3a).
The functions based on polynomial approximations are more accurate than
the one based on Padé approximants, and the new function cosmpol is slightly
more accurate than our former cosmtay function. The performance profiles
(Figures 1b, 2b, and 3b) show that the graph of cosmpol lies above the graphs
of the other two functions, demonstrating that, in general, it is the most accurate.
This is also shown by Table 4, where we see that cosmpol has a lower relative
error than cosm in between 71.27% and 97% of the matrices, and a lower relative
error than cosmtay in between 50.85% and 60% of the matrices.

Figure 1: Experimental results for Test 1: (a) normwise relative errors, (b) performance profile, (c) ratio of relative errors, (d) ratio of matrix products.
Regarding the computational costs, Table 3 shows that function cosmpol
has a lower computational cost than the other two functions. This is also
confirmed by Figures 1d, 2d, and 3d, which show that the ratio of matrix
products of cosmpol and cosm, i.e. M(cosmpol)/M(cosm), is lower than 1 for all
the test matrices, and in almost all cases lower than the ratio of cosmtay and
cosm, i.e. M(cosmtay)/M(cosm).
4.2. The accelerated algorithm
We have implemented an “accelerated” version of Algorithm 1 that can use
one NVIDIA GPU. The accelerated version of the algorithm has been developed
with the aim of being efficient and easy to use, for which we implemented a
MATLAB mex file.
We use the CUDA and C++ languages to implement the mex file. This code
accelerates those parts of the original MATLAB function that have a high
computational cost, i.e. matrix multiplications.

Figure 2: Experimental results for Test 2: (a) normwise relative errors, (b) performance profile, (c) ratio of relative errors, (d) ratio of matrix products.

In this work we have taken
the mex file developed in [8] and adapted it to cosmpol using the new
method for selecting the degree m and the scaling parameter s from Section 3,
corresponding to Step 2 of Algorithm 1, and the new methods for evaluating
the Taylor matrix polynomial approximations of the matrix cosine from Section
2, corresponding to Step 4 of Algorithm 1. There is a single mex function, but
it can perform the different operations required by the algorithm. This way,
data (matrices) are kept in the device (GPU) memory between consecutive
calls to the mex function. The GPU is mainly in charge of executing matrix
multiplications, but it also performs low cost operations, e.g. the calculation of
the 1-norm of a matrix, to avoid transmitting data between CPU and GPU only
to perform these operations. Other low cost operations are carried out on
the host CPU. The MATLAB mex function, called call_gpu, executes different
operations (init, power, scale, . . . ) depending on the arguments with which
it is called [8]. The only operation that has been changed in this paper is eval,
which evaluates a matrix polynomial. Now, this operation implements Eqs. (3)-(6)
using the coefficients of Table 1.
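The call_gpu interface itself is documented in [8]. For illustration only, a conceptually similar effect, keeping the matrices in device memory so that the products of Steps 4-7 of Algorithm 1 run on the GPU, can be obtained with MATLAB's built-in gpuArray type, as in the following sketch (this is not the call_gpu interface used by cosmpol, and eval_Pm is a placeholder for the polynomial evaluation of Section 2):

% Illustrative sketch only: GPU-resident evaluation with gpuArray.
Bg = gpuArray(B/4^s);                  % transfer the scaled B = A^2/4^s to the device once
Cg = eval_Pm(Bg, m);                   % polynomial evaluation: all products on the GPU
for i = 1:s
    Cg = 2*(Cg*Cg) - eye(size(Cg,1), 'like', Cg);   % double angle recovery on the GPU
end
C = gather(Cg);                        % single transfer of the result back to the host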
With this implementation of Algorithm 1, the MATLAB script can be executed
on either the CPU or the GPU.

Figure 3: Experimental results for Test 3: (a) normwise relative errors, (b) performance profile, (c) ratio of relative errors, (d) ratio of matrix products.

Table 5 shows the execution time (in sec.) obtained
in both devices for randomly generated matrices. To obtain the CPU time
we used two processors with 12 cores each (model Intel Xeon CPU E5-2697
v2 @2.70GHz); thus, the matrix multiplication used by MATLAB exploits the
24 cores available in our host. The GPU time was obtained on an NVIDIA
Tesla K20Xm, a high performance device that features 2688 CUDA cores. We
observe that our new algorithm cosmpol is faster than cosmtay on both devices,
a reduction in time due to the saving in matrix products. The speedup from
the CPU to the GPU is not as high for cosmpol as for cosmtay, but it is still
important since the algorithm is also based on matrix multiplication, a highly
optimized operation included in the cuBLAS library [17] for NVIDIA GPUs.
5. Conclusions
In this paper we have introduced a new method to compute the matrix
cosine function. This method is based on the Taylor approximation of the cosine
function using matrix polynomial evaluation methods from [7] and an improved
Table 5: Execution time (sec.) of the algorithm in CPU and of the accelerated version in GPU.

           cosmtay             cosmpol
  n      CPU      GPU        CPU      GPU
 1000    0.21     0.19       0.17     0.16
 1500    0.54     0.37       0.52     0.31
 2000    1.05     0.56       0.77     0.48
 2500    1.98     0.93       1.83     0.81
 3000    3.36     1.40       3.26     1.23
 3500    5.12     1.97       4.71     1.78
 4000    7.19     2.69       5.83     2.46
 4500    8.30     3.61       8.07     3.32
 5000   10.86     4.77      10.13     4.36
 5500   15.55     6.13      15.02     5.62
 6000   26.01     7.91      21.83     7.33
version of the scaling algorithm from [5]. From the different real solutions for the
coefficients of the formulae from [7], the coefficients selected in this paper give
the lowest maximum error in the stability check from [7], providing a maximum
order of approximation m_M = 15, i.e. a maximum order of approximation of
the cosine Taylor series equal to 30, and excellent accuracy results in numerical
tests.
A MATLAB implementation (cosmpol) based on that method has been
developed and compared with other state-of-the-art codes: one based on Taylor
approximations (cosmtay), which uses the Paterson–Stockmeyer method to
evaluate the Taylor matrix polynomial approximations, and one based on Padé
approximants (cosm). Numerical experiments show that, in general, cosmpol has
a lower computational cost in terms of matrix products than the other two
functions; moreover, cosmpol is more accurate in the majority of tests than the
other codes, with a similar numerical stability.
Finally, we note that all the above discussion on the fast computation of
the matrix cosine is applicable to the computation of the matrix sine, since
sin(A) = cos(A - (π/2)I).
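For example, assuming the default one-argument call of cosmpol, this amounts to:

% Matrix sine via the matrix cosine: sin(A) = cos(A - (pi/2)*I).
S = cosmpol(A - (pi/2)*eye(size(A)));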
Acknowledgements
This work has been partially supported by Spanish Ministerio de Economía
y Competitividad and European Regional Development Fund (ERDF) grants
TIN2014-59294-P and TIN2017-89314-P.
References
[1] E. Defez, J. Sastre, J. J. Ibáñez, P. A. Ruiz, Computing matrix functions
arising in engineering models with orthogonal matrix polynomials, Math.
Comput. Model. 57 (7-8) (2013) 1738–1743.
[2] J. Sastre, J. Ibáñez, P. Ruiz, E. Defez, Efficient computation of the matrix
cosine, Appl. Math. Comput. 219 (2013) 7575–7585.
[3] A. H. Al-Mohy, N. J. Higham, S. D. Relton, New algorithms for comput-
ing the matrix sine and cosine separately or simultaneously, SIAM J. Sci.
Comput. 37 (1) (2015) A456–A487.
[4] P. Alonso, J. Ibáñez, J. Sastre, J. Peinado, E. Defez, Efficient and accurate
algorithms for computing matrix trigonometric functions, J. Comput. Appl.
Math. 309 (2017) 325–332.
[5] J. Sastre, J. Ibáñez, P. Alonso, J. Peinado, E. Defez, Two algorithms for
computing the matrix cosine function, Appl. Math. Comput. 312 (2017)
66–77.
[6] M. S. Paterson, L. J. Stockmeyer, On the number of nonscalar multipli-
cations necessary to evaluate polynomials, SIAM J. Comput. 2 (1) (1973)
60–66.
[7] J. Sastre, Efficient evaluation of matrix polynomials, Linear Alg. Appl. 539
(2018) 229–250.
[8] P. Alonso, J. Peinado, J. Ibáñez, J. Sastre, E. Defez, Computing matrix
trigonometric functions with GPUs through Matlab, The Journal of Super-
computing (2018) Online.
[9] N. J. Higham, Functions of Matrices: Theory and Computation, SIAM,
Philadelphia, PA, USA, 2008.
[10] G. I. Hargreaves, N. J. Higham, Efficient algorithms for the matrix cosine
and sine, Numer. Algorithms 40 (2005) 383–400.
[11] J. Sastre, J. J. Ibáñez, E. Defez, P. A. Ruiz, Accurate matrix exponen-
tial computation to solve coupled differential models in engineering, Math.
Comput. Model. 54 (2011) 1835–1840.
[12] J. Sastre, J. J. Ibáñez, E. Defez, Boosting the computation of the matrix
exponential, Appl. Math. Comput., in press.
[13] P. Ruiz, J. Sastre, J. Ibáñez, E. Defez, High performance computing of the
matrix exponential, J. Comput. Appl. Math. 291 (2016) 370–379.
[14] N. J. Higham, Fortran codes for estimating the one-norm of a real or com-
plex matrix, with applications to condition estimation, ACM Trans. Math.
Softw. 14 (4) (1988) 381–396.
[15] T. G. Wright, Eigtool, version 2.1 (2009).
URL web.comlab.ox.ac.uk/pseudospectra/eigtool.
[16] N. J. Higham, The Test Matrix Toolbox for MATLAB, Numerical Analysis
Report No. 237, Manchester, England (Dec. 1993).
[17] NVIDIA, CUDA Toolkit. cuBLAS library, docs.nvidia.com/cuda/cublas,
v9.2.148 Edition, Last accessed July, 2018 (July 2018).