Two algorithms for computing the matrix cosine function

Jorge Sastre†, Javier Ibáñez♮, Pedro Alonso‡, Jesús Peinado♮, Emilio Defez⋆

†Instituto de Telecomunicaciones y Aplicaciones Multimedia.
♮Instituto de Instrumentación para Imagen Molecular.
‡Dept. of Information Systems and Computation.
⋆Instituto de Matemática Multidisciplinar.
Universitat Politècnica de València, Camino de Vera s/n, 46022, Valencia, España.
jsastrem@upv.es, jjibanez@dsic.upv.es, palonso@dsic.upv.es, jpeinado@dsic.upv.es, edefez@imm.upv.es

Corresponding e-mail: jjibanez@dsic.upv.es. This work has been supported by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF), grant TIN2014-59294-P.
Abstract
The computation of matrix trigonometric functions has received remarkable attention in recent decades due to its usefulness in the solution of systems of second order linear differential equations. Several state-of-the-art algorithms
have been provided recently for computing these matrix functions. In this
work we present two efficient algorithms based on Taylor series with forward
and backward error analysis for computing the matrix cosine. A MATLAB
implementation of the algorithms is compared to state-of-the-art algorithms,
with excellent performance in both accuracy and cost.
Keywords: matrix cosine, scaling and recovering method, Taylor series,
forward error analysis, backward error analysis, MATLAB.
1. Introduction
Many engineering processes are described by second order differential
equations, whose solution is given in terms of the trigonometric matrix func-
tions sine and cosine. Examples arise in the spatial semi-discretization of
the wave equation or in mechanical systems without damping, where their
solutions can be expressed in terms of integrals involving the matrix sine and
cosine [1, 2]. Several state-of-the-art algorithms have been provided recently
for computing these matrix functions using polynomial and rational approxi-
mations with scaling and recovering techniques [3, 4, 5, 6]. In order to reduce computational costs, the Paterson-Stockmeyer method [7] is used to evaluate the matrix polynomials arising in these approximations.
In the Taylor algorithm proposed in [4] we used sharp absolute forward error bounds. In the Taylor algorithm proposed in [6] we improved the previous algorithm using relative error bounds based on backward error bounds of the matrix exponentials involved in cos(A). Those error bounds do not guarantee that the cosine backward error in exact arithmetic is less than the unit roundoff in double precision arithmetic [6, Sec. 2]. However, according to the tests, that algorithm improved the accuracy with respect to the previous Taylor algorithm at the expense of some increase in cost (measured in flops). The algorithm proposed in [6] was also superior, in both accuracy and cost, to the version of the state-of-the-art scaling and recovering Padé algorithm in [5] that does not use the Schur decomposition.
Other algorithms, based on L∞ approximations for normal and nonnegative matrices, have been presented recently in [8]. In this work we focus on general matrices and on algorithms using approximations at the origin. We present two algorithms based on Taylor series that use Theorem 1 from [4] for computing the matrix cosine. We provide relative forward and backward error analyses for the matrix cosine Taylor approximation that improve even further the comparison with the algorithm in [5], with and without Schur decomposition, in both accuracy and cost tests.

Throughout this paper C^{n×n} denotes the set of complex matrices of size n×n, I the identity matrix for this set, ρ(X) the spectral radius of matrix X, and N the set of positive integers. In this work we use the 1-norm to compute the actual norms. This paper is organized as follows. Section 2 presents a Taylor algorithm for computing the matrix cosine function. Section 3 deals with numerical tests and, finally, Section 4 gives some conclusions.
2. Algorithms for computing the matrix cosine
The matrix cosine can be defined for all A ∈ C^{n×n} by

cos(A) = Σ_{i≥0} (−1)^i A^{2i}/(2i)!,

and let

T_{2m}(A) = Σ_{i=0}^{m} (−1)^i B^i/(2i)! ≡ P_m(B),    (1)

be the Taylor approximation of order 2m of cos(A), where B = A². Since Taylor series are accurate only near the origin, in algorithms that use this approximation the norm of matrix B is reduced by scaling the matrix. Then, a Taylor approximation is computed, and finally the approximation of cos(A) is recovered by means of the double angle formula cos(2X) = 2cos²(X) − I. Algorithm 1 shows a general algorithm for computing the matrix cosine based on the Taylor approximation. Using the fact that sin(A) = cos(A − (π/2)I), Algorithm 1 can also easily be used to compute the matrix sine.
Algorithm 1 Given a matrix A ∈ C^{n×n}, this algorithm computes C = cos(A) by Taylor series.
1: Select adequate values of m and s    ▹ Phase I
2: B = 4^{−s} A²
3: C = P_m(B)    ▹ Phase II: Compute Taylor approximation
4: for i = 1:s do    ▹ Phase III: Recovering cos(A)
5:   C = 2C² − I
6: end for
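For concreteness, the overall structure of Algorithm 1 can be sketched in MATLAB as follows. This is only a minimal illustration, not the actual cosmtay implementation: m and s are assumed to be given, and Phase II uses a plain Horner recurrence instead of the Paterson-Stockmeyer scheme described below.

function C = cos_taylor_sketch(A, m, s)
% Minimal sketch of Algorithm 1 (illustration only; cosmtay.m selects
% m and s adaptively and uses Paterson-Stockmeyer in Phase II).
n = size(A,1); I = eye(n);
B = (A*A)/4^s;                    % Phase I: B = 4^(-s)*A^2 (m, s given here)
i = 0:m;
p = (-1).^i./factorial(2*i);      % coefficients p_i of (1)
C = p(m+1)*I;                     % Phase II: Horner evaluation of P_m(B)
for k = m:-1:1
    C = C*B + p(k)*I;
end
for k = 1:s                       % Phase III: double angle recovering
    C = 2*C*C - I;
end
end

For the matrix sine, the same sketch can be called with A − (π/2)I in place of A.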
In Phase I of Algorithm 1, m and s must be calculated so that the Taylor approximation of the scaled matrix is computed accurately and efficiently. In this phase some powers B^i, i ≥ 2, are usually computed for estimating m and s, and if so they are reused in Phase II.

Phase II consists of computing the Taylor approximation (1). For clarity of the exposition we recall some results summarized in [6, Sec. 2]. The Taylor matrix polynomial approximation (1), expressed as P_m(B) = Σ_{i=0}^{m} p_i B^i, B ∈ C^{n×n}, can be computed with optimal cost by the Paterson-Stockmeyer method [7], choosing m from the set

M = {1, 2, 4, 6, 9, 12, 16, 20, 25, 30, 36, 42, ...},
where the elements of M are denoted m_1, m_2, m_3, ... The algorithm first computes the powers B^i, 2 ≤ i ≤ q, not computed in the previous phase, where q = ⌈√m_k⌉ or q = ⌊√m_k⌋ is an integer divisor of m_k, k ≥ 1, both values giving the same cost in terms of matrix products. Therefore, (1) can be computed efficiently as

P_{m_k}(B) = (((p_{m_k} B^q + p_{m_k−1} B^{q−1} + p_{m_k−2} B^{q−2} + ··· + p_{m_k−q+1} B + p_{m_k−q} I) B^q
    + p_{m_k−q−1} B^{q−1} + p_{m_k−q−2} B^{q−2} + ··· + p_{m_k−2q+1} B + p_{m_k−2q} I) B^q
    + p_{m_k−2q−1} B^{q−1} + p_{m_k−2q−2} B^{q−2} + ··· + p_{m_k−3q+1} B + p_{m_k−3q} I) B^q
    ···
    + p_{q−1} B^{q−1} + p_{q−2} B^{q−2} + ··· + p_1 B + p_0 I.    (2)

Table 1 shows the values of q for the different values of m. From Table 4.1 of [9, p. 74], the cost of computing (1) with (2) is Π_{m_k} = k matrix products, k = 1, 2, ...
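For illustration, (2) can be transcribed directly into MATLAB as follows. This sketch assumes that q divides m (which holds for every pair (m_k, q_k) in Table 1) and that p(i+1) stores the coefficient p_i; it is not the optimized code used in cosmtay.m.

function P = polyvalm_ps(p, B, q)
% Sketch of the Paterson-Stockmeyer evaluation (2) of P_m(B), assuming
% that q divides m = numel(p)-1.
n = size(B,1); I = eye(n);
m = numel(p) - 1; c = m/q;
Bp = cell(q,1); Bp{1} = B;
for i = 2:q
    Bp{i} = Bp{i-1}*B;             % B^2, ..., B^q, reused in every block
end
P = p(m-q+1)*I;                    % innermost block: p_{m-q} I + ... + p_m B^q
for j = 1:q
    P = P + p(m-q+1+j)*Bp{j};
end
for blk = c-1:-1:1                 % remaining blocks of q coefficients each
    T = p((blk-1)*q+1)*I;          % p_{(blk-1)q} I
    for j = 1:q-1
        T = T + p((blk-1)*q+1+j)*Bp{j};
    end
    P = P*Bp{q} + T;               % multiply the accumulated part by B^q
end
end

The cost is q−1 matrix products for the powers B^2, ..., B^q plus m/q products by B^q, which reproduces Π_{m_k} = k for every row of Table 1.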
Finally, Phase III is necessary to obtain the cosine of matrix A from the approximation P_{m_k}(4^{−s}B) ≈ cos(2^{−s}A) computed previously in Phase II. If m_k is the order used and s is the scaling parameter, then the computational cost of Algorithm 1 is 2(k+s)n³ flops, and the storage cost is (2+q_k)n².

The difficulty of Algorithm 1 lies in finding appropriate values of m_k and s such that cos(A) is computed accurately with minimum cost. For that, in the following sections we will use Theorem 1:
Theorem 1 ([4]). Let h_l(x) = Σ_{i≥l} p_i x^i be a power series with radius of convergence w, let h̃_l(x) = Σ_{i≥l} |p_i| x^i, and let B ∈ C^{n×n} with ρ(B) < w, l ∈ N and t ∈ N with 1 ≤ t ≤ l. If t_0 is the multiple of t such that l ≤ t_0 ≤ l+t−1 and

β_t = max{ d_j^{1/j} : j = t, l, l+1, ..., t_0−1, t_0+1, t_0+2, ..., l+t−1 },

where d_j is an upper bound for ∥B^j∥, d_j ≥ ∥B^j∥, then

∥h_l(B)∥ ≤ h̃_l(β_t).
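Given precomputed upper bounds d_j ≥ ∥B^j∥, the value β_t of Theorem 1 is a simple maximum over a small index set, as the following MATLAB sketch shows (an illustration with a hypothetical vector d of bounds; Algorithm 3 below avoids forming the bounds for j > q explicitly):

function beta = beta_from_bounds(d, l, t)
% Sketch of beta_t from Theorem 1, given d(j) >= ||B^j|| for the needed j.
t0 = t*ceil(l/t);                 % multiple of t with l <= t0 <= l+t-1
J = [t, setdiff(l:l+t-1, t0)];    % j = t, l, ..., t0-1, t0+1, ..., l+t-1
beta = max(d(J).^(1./J));
end

Theorem 1 then gives ∥h_l(B)∥ ≤ h̃_l(β_t) for any such t, and the algorithms below take the t giving the smallest β_t among those that are cheap to evaluate.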
2.1. Relative forward error in Taylor approximation

The following proposition gives a sufficient condition for the existence of cos^{−1}(A).

Proposition 1. Let A be a matrix in C^{n×n} and let B = A². If ∥B∥ < a², where a = arccosh(2), then cos(A) is invertible.
Proof. Since

∥I − cos(A)∥ = ∥Σ_{k≥1} (−1)^k A^{2k}/(2k)!∥ ≤ Σ_{k≥1} ∥A²∥^k/(2k)! < Σ_{k≥1} a^{2k}/(2k)! = cosh(a) − 1 = 1,

then, by applying Lemma 2.3.3 from [10, p. 58], we obtain that I − (I − cos(A)) = cos(A) is invertible. □
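Proposition 1 is easy to check numerically (a sanity check only, using MATLAB's funm as reference): scaling a random A so that ∥A²∥_1 < arccosh²(2) gives ∥I − cos(A)∥_1 < 1 by the series bound, and hence invertibility.

% Numeric sanity check of Proposition 1 (illustration only).
n = 8; A = randn(n);
A = A*sqrt(0.9*acosh(2)^2/norm(A*A,1));   % enforce ||A^2||_1 < acosh(2)^2
C = funm(A, @cos);                        % reference matrix cosine
assert(norm(eye(n) - C, 1) < 1)           % so cos(A) = I - (I - cos(A)) is invertible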
Using Proposition 1, if

∥B∥ = ∥A²∥ < arccosh²(2) ≈ 1.7343,    (3)

then cos^{−1}(A) exists, and it follows that the relative forward error of computing cos(A) by means of (1), denoted by E_f, is

E_f = ∥cos^{−1}(A)(cos(A) − T_{2m}(A))∥ = ∥Σ_{i≥m+1} e_i^{(2m)} A^{2i}∥ = ∥Σ_{i≥m+1} e_i^{(2m)} B^i∥,

where the coefficients e_i^{(2m)} depend on the Taylor approximation order 2m. If we define g_{m+1}(x) = Σ_{i≥m+1} e_i^{(2m)} x^i and g̃_{m+1}(x) = Σ_{i≥m+1} |e_i^{(2m)}| x^i, and we apply Theorem 1, then

E_f = ∥g_{m+1}(B)∥ ≤ g̃_{m+1}(β_t^{(m)}),    (4)

for every t, 1 ≤ t ≤ m+1. Following [4, Sec. 5.1], in (4) we denote by β_t^{(m)} the corresponding value of β_t from Theorem 1 for order m, and from now on we will use that nomenclature.
Let Θ_m be

Θ_m = max{ θ ≥ 0 : Σ_{i≥m+1} |e_i^{(2m)}| θ^i ≤ u },    (5)

where u = 2^{−53} is the unit roundoff in double precision floating-point arithmetic. We have used the MATLAB Symbolic Math Toolbox to evaluate Σ_{i≥m+1} |e_i^{(2m)}| θ^i for each m in 250-digit decimal arithmetic, adding the first 250 series terms with the coefficients obtained symbolically. Then, a numerical zero-finder is invoked to determine the highest value Θ_m such that Σ_{i≥m+1} |e_i^{(2m)}| Θ_m^i ≤ u holds. For this analysis to hold it is necessary that cos(A) is invertible. Hence, if condition (3) holds and β_t^{(m)} ≤ Θ_m, then E_f ≤ u. Some values of Θ_m are given in Table 1.
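The computation of Θ_m can be sketched with the Symbolic Math Toolbox using the scalar model cos(A) → cos(√x), B → x. The sketch below truncates the error series at N = 60 terms and uses fzero as the zero-finder with an assumed bracketing interval, whereas the paper uses 250 terms in 250-digit decimal arithmetic; it is meant only to convey the idea.

% Sketch of the computation of Theta_m in (5) (illustration only).
m = 4; N = 60;                            % order and retained series terms
syms x
f  = cos(sqrt(x));                        % f(x) = sum_{i>=0} (-1)^i x^i/(2i)!
Pm = taylor(f, x, 'Order', m+1);          % Taylor approximation P_m(x) of (1)
g  = taylor(1 - Pm/f, x, 'Order', N+1);   % forward error series g_{m+1}(x)
c  = double(abs(coeffs(g, x, 'All')));    % |e_i^{(2m)}|, highest degree first
u  = 2^(-53);                             % unit roundoff
Theta_m = fzero(@(th) polyval(c, th) - u, [1e-17 1])  % about 1.3197e-2 for m = 4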
2.2. Backward error in Taylor approximation

In [5], a backward error analysis is made for computing the sine and cosine matrix functions. For each matrix function, two analyses were made. For the cosine function, the first one is based on considering the function

h_{2m}(x) := arccos(r_m(x)) − x,

where r_m(x) is the [m/m] Padé approximant to the cosine function, and the authors conclude that different restrictions make this analysis unusable [5, Sec. 2.2]. We checked that an error analysis for the matrix cosine Taylor approximation similar to that of [5, Sec. 2.2] yields analogous results. Therefore, in order to calculate the backward error ΔX of approximating cos(X) by the Taylor polynomial T_{2m}(X) such that

T_{2m}(X) = cos(X + ΔX),    (6)

we propose a different approach that holds for any matrix X ∈ C^{r×r} and uses the following result, whose proof is trivial:
Lemma 1. If A and B are matrices in C^{r×r} and AB = BA, then

cos(A + B) = cos(A)cos(B) − sin(A)sin(B).    (7)
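Identity (7) is easy to verify numerically for commuting matrices with MATLAB's funm (a quick check, not part of the algorithms); here B is taken as a polynomial in A so that AB = BA:

% Numeric check of the identity (7) for commuting matrices.
n = 6; A = randn(n)/4;
B = 0.3*A^2 - 0.2*A;                      % polynomial in A, hence A*B = B*A
lhs = funm(A + B, @cos);
rhs = funm(A,@cos)*funm(B,@cos) - funm(A,@sin)*funm(B,@sin);
norm(lhs - rhs, 1)/norm(lhs, 1)           % of the order of rounding errors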
Note that the backward error ΔX from (6) is a holomorphic function of X, and then XΔX = ΔX·X. Therefore, using (6) and Lemma 1,

T_{2m}(X) = cos(X + ΔX) = cos(X)cos(ΔX) − sin(X)sin(ΔX)
          = cos(X) Σ_{i≥0} (−1)^i ΔX^{2i}/(2i)! − sin(X) Σ_{i≥0} (−1)^i ΔX^{2i+1}/(2i+1)!.
Hence,

cos(X) − T_{2m}(X) = Σ_{i≥m+1} (−1)^i X^{2i}/(2i)!    (8)
                   = sin(X) Σ_{i≥0} (−1)^i ΔX^{2i+1}/(2i+1)! − cos(X) Σ_{i≥1} (−1)^i ΔX^{2i}/(2i)!,

and consequently the backward error ΔX can be expressed as

ΔX = Σ_{i≥m} c_i^{(2m)} X^{2i+1},    (9)

where the coefficients c_i^{(2m)} depend on the Taylor approximation order 2m, and ΔX commutes with X. Note that an expression similar to (8) can be obtained for other approximations of the matrix cosine, such as Padé approximation.
Using (8) and (9) it follows that

sin(X)ΔX + O(X^{4m+2}) = Σ_{i≥m+1} (−1)^i X^{2i}/(2i)!.    (10)

Hence, the coefficients c_i^{(2m)}, i = m, m+1, ..., 2m−1, can be computed by obtaining symbolically the Taylor series of sin(X)ΔX from the left-hand side of (10) and solving the system of equations that arises when equating the coefficients of X^{2i}, i = m+1, m+2, ..., 2m, from both sides of (10), using the function solve from the MATLAB Symbolic Math Toolbox.

Analogously, using (8) and (9) it follows that

sin(X)ΔX + cos(X)ΔX²/2! + O(X^{6m+4}) = Σ_{i≥m+1} (−1)^i X^{2i}/(2i)!,    (11)

and the coefficients c_i^{(2m)}, i = 2m, 2m+1, ..., 3m+1, can be calculated by using the coefficients c_i^{(2m)}, i = m, m+1, ..., 2m−1, obtained previously, computing symbolically the Taylor series of sin(X)ΔX + cos(X)ΔX²/2! on the left-hand side of (11), and solving the system of equations that arises when equating the coefficients of X^{2i}, i = 2m+1, 2m+2, ..., 3m+1, from both sides of (11). Proceeding analogously, c_i^{(2m)}, i > 3m+1, can be computed.
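For small m, the determination of the first coefficients c_i^{(2m)} from (10) can be sketched with the Symbolic Math Toolbox as follows (an illustration for m = 2 in the scalar model; the symbolic unknowns c1, c2 are names introduced here, and the higher coefficients from (11) are obtained in the same way):

% Sketch: coefficients c_i^{(2m)}, i = m..2m-1, from (10), for m = 2.
m = 2;
syms x
cs = sym('c', [1 m]);                         % cs(j) stands for c_{m+j-1}^{(2m)}
dX = sum(cs.*x.^(2*(m:2*m-1)+1));             % (9) truncated at i = 2m-1
rhs = sym(0);
for i = m+1:2*m
    rhs = rhs + (-1)^i*x^(2*i)/factorial(sym(2*i));
end
P = taylor(sin(x)*dX, x, 'Order', 4*m+1) - rhs;  % must vanish through x^{4m}
eqs = sym(zeros(1, m));
for k = 1:m
    i = m + k;                                % equate coefficients of x^{2i}
    eqs(k) = subs(diff(P, x, 2*i), x, 0) == 0;
end
sol = solve(eqs, cs);                         % here sol.c1 = c_2^{(4)} = -1/720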
Then we compute the relative backward error of approximating cos(A) by T_{2m}(A), denoted by E_b, as

E_b = ∥ΔA∥/∥A∥ = ∥Σ_{i≥m} c_i^{(2m)} A^{2i+1}∥/∥A∥ ≤ ∥Σ_{i≥m} c_i^{(2m)} A^{2i}∥ = ∥Σ_{i≥m} c_i^{(2m)} B^i∥.

If we define h_m(x) = Σ_{i≥m} c_i^{(2m)} x^i and h̃_m(x) = Σ_{i≥m} |c_i^{(2m)}| x^i, and we apply Theorem 1, then

E_b ≤ ∥h_m(B)∥ ≤ h̃_m(β_t^{(m)}).    (12)
Let Θ̄_m be

Θ̄_m = max{ θ ≥ 0 : Σ_{i≥m} |c_i^{(2m)}| θ^i ≤ u }.    (13)

For computing Θ̄_m we have used the MATLAB Symbolic Math Toolbox to evaluate Σ_{i≥m} |c_i^{(2m)}| θ^i in 250-digit decimal arithmetic for each m, adding a different number of series terms depending on m, with the coefficients obtained symbolically, and a numerical zero-finder was invoked to determine the highest value Θ̄_m such that Σ_{i≥m} |c_i^{(2m)}| Θ̄_m^i ≤ u holds. We have checked that for m = 1, 2, 4, 6, 9, 12 the values of Θ̄_m obtained in double precision arithmetic do not vary if we take more than 256 series terms (and even fewer terms suffice for the lower orders).

Note that the values Θ̄_1 = 2.66·10^{−15}, Θ̄_2 = 2.83·10^{−7}, Θ̄_4 = 4.48·10^{−3} and Θ̄_6 = 1.45·10^{−1}, presented with two significant digits, are lower than the corresponding values Θ_m, m = 1, 2, 4, 6, from Table 1 for the forward error analysis, respectively.

On the other hand, considering 1880 and 1946 series terms for m = 16, the corresponding Θ̄_16 values have a relative difference of 4.0010·10^{−4}. Considering 1880 and 1980 series terms for m = 20, the corresponding Θ̄_20 values have a relative difference of 1.6569·10^{−3}. The process of computing those values for so many terms was very time consuming, and we took as final values the ones shown in Table 1, corresponding to 1946 series terms for m = 16 and 1980 series terms for m = 20.

With the final selected values of Θ̄_m in Table 1, it follows that if β_t^{(m)} ≤ Θ̄_m, then the relative backward error is lower than the unit roundoff in double precision floating-point arithmetic, i.e.

E_b ≤ u for m_k = 9, 12, and E_b ≲ u for m_k = 16, 20.
2.3. Backward error in the double angle formula of the matrix cosine

We are interested in the backward error in Phase III of Algorithm 1. In the previous section we have shown that it is possible to obtain a small backward error in the Taylor approximation of the matrix cosine. As in [5, Sec. 2.3], it can be shown that the backward error propagates linearly through the double angle formula. A result for the backward error similar to Lemma 2.1 from [5] can be obtained for polynomial approximations of the matrix cosine.

Lemma 2. Let A ∈ C^{n×n} and X = 2^{−s}A, with s a non-negative integer, and suppose that t(X) = cos(X + ΔX) for a polynomial function t. Then the approximation Y obtained by applying the double angle formula satisfies Y = cos(A + ΔA) in exact arithmetic, and hence

∥ΔA∥/∥A∥ = ∥ΔX∥/∥X∥.

Proof. The proof is similar to that of Lemma 2.1 from [5]. □

Lemma 2 shows that if we choose m and s such that ∥ΔX∥/∥X∥ ≤ u, with X = 2^{−s}A, then for s ≥ 1 the total backward error in exact arithmetic after Phase III of Algorithm 1 is bounded by u, producing no error growth.
2.4. Determining the values of the Taylor approximation order m and the scaling parameter s

Since Θ_6 ≈ 0.1895 < arccosh²(2) ≈ 1.7343 < Θ̄_9 ≈ 1.7985, see (3), and the values Θ_{m_k} for the forward error analysis with m_k ≤ 6 are greater than the corresponding Θ̄_{m_k} values for the backward error analysis, we use the relative forward analysis for m_k ≤ 6 and the relative backward analysis for m_k ≥ 9. Therefore, by Theorem 1, for m_k = 1, 2, 4, 6, if there exists t such that β_t^{(m_k)} ≤ Θ_{m_k}, one gets E_f ≤ u (Section 2.1), and for m_k = 9, 12, 16, 20, if there exists t such that β_t^{(m_k)} ≤ Θ̄_{m_k}, it follows that E_b ≤ u (Section 2.2). Table 1 shows the values Θ_{m_k}, m_k = 1, 2, 4, 6, and Θ̄_{m_k}, m_k = 9, 12, 16, 20. For simplicity of notation, from now on we will denote by Θ_{m_k} both Θ_{m_k}, m_k ≤ 6, and Θ̄_{m_k}, m_k ≥ 9.

The selection of the order m and the scaling parameter s is as follows.
Table 1: Values of Θ_{m_k} (forward analysis), Θ̄_{m_k} (backward analysis) and q_k used to compute (1) by the Paterson-Stockmeyer method (2).

k  m_k  q_k  Θ_{m_k}                 k  m_k  q_k  Θ̄_{m_k}
1   1    1   5.161913593731081e-8    5   9    3   1.798505876916759
2   2    2   4.307691256676447e-5    6  12    4   6.752349007371135
3   4    2   1.319680929892753e-2    7  16    4   9.971046342716772
4   6    3   1.895232414039165e-1    8  20    5   10.177842844012551
If there exist t and m_k such that β_t^{(m_k)} ≤ Θ_{m_k}, then it is not necessary to scale B, and the Taylor approximation order m_k with k = min{k : β_t^{(m_k)} ≤ Θ_{m_k}} is selected, i.e. the order m_k providing the minimum cost. Since in this case no scaling is applied, the double angle formula of Phase III of Algorithm 1 is not applied. Otherwise, we scale the matrix B by the scaling parameter

s = max{ 0, ⌈(1/2) log₂(β_t^{(m_k)}/Θ_{m_k})⌉ },  m_k ∈ {9, 12, 16},    (14)

such that the matrix cosine is computed with minimum cost. Lemma 2 ensures that the backward error propagates linearly through the double angle formula in exact arithmetic if s ≥ 1. Following [11, Sec. 3.1], the explanation for the minimum and maximum orders m_k to be used in (14) for scaling (giving the minimum cost) is as follows: since Θ_9/4 > Θ_6 and Θ_9·4 > Θ_{12}, the minimum order to select for scaling is m = m_5 = 9. On the other hand, since Θ_{20}/4 < Θ_{12} and Θ_{16}/4 > Θ_9, if ∥B∥ > Θ_9 the maximum order to select for scaling is m = m_7 = 16. Following [11, Sec. 3.1], the final selection of m_k is the maximum order m_k ∈ {9, 12, 16} also giving the minimum cost. This selection provides the minimum scaling parameter s over all selections of m_k that provide the minimum cost.

Then the Taylor approximation of order m_k, P_{m_k}(4^{−s}B) ≈ cos(2^{−s}A), is computed, and if s ≥ 1 the recovering Phase III of Algorithm 1 is applied.
For computing the parameters β_t^{(m_k)} for 1 ≤ m_k ≤ 16 from Theorem 1, it is necessary to calculate upper bounds d_k for ∥B^k∥. We have developed two different algorithms to obtain the order m_k and the scaling parameter s. Algorithm 2 uses an estimation β_min^{(m_k)} of the minimum of the values β_t^{(m_k)} from Theorem 1, obtained using Algorithm 3. In order to calculate the upper bounds d_k of ∥B^k∥ for obtaining β_min^{(m_k)}, Algorithm 3 uses only products of norms of the matrix powers previously computed in Algorithm 2, i.e. ∥B^i∥, i ≤ 4. For instance, note that in order to compute β_2^{(m)} for the relative forward error bound and m = 2, by Theorem 1 we need only the bounds d_2^{1/2} and d_3^{1/3}. From Table 1, for m = 2 one gets q = q_2 = 2, and B^i, i = 1, 2, are available, so we take d_1 = ∥B∥ and d_2 = ∥B²∥. Since ∥B²∥^{1/2} ≤ ∥B∥, it follows that β_min^{(2)} = β_2^{(2)} = max{d_2^{1/2}, (d_2 d_1)^{1/3}} = (d_2 d_1)^{1/3} (Step 3 of Algorithm 3). Something similar happens with β_2^{(4)}, resulting in β_min^{(4)} = β_2^{(4)} = max{d_2^{1/2}, (d_2^2 d_1)^{1/5}} = (d_2^2 d_1)^{1/5} (Step 5 of Algorithm 3).
Using Table 1, for m = 6 one gets q = q_4 = 3, B³ now also being available, and we take d_3 = ∥B³∥. By Theorem 1, if d_2^{1/2} ≤ d_3^{1/3} we select β_min^{(6)} = β_2^{(6)} = max{d_2^{1/2}, (d_2^2 d_3)^{1/7}} = (d_2^2 d_3)^{1/7} (Step 8 of Algorithm 3). Else, we select

β_min^{(6)} = β_3^{(6)} = max{ d_3^{1/3}, min{d_2^2 d_3, d_1 d_3^2}^{1/7}, (d_3^2 d_2)^{1/8} }
            = max{ min{d_2^2 d_3, d_1 d_3^2}^{1/7}, (d_3^2 d_2)^{1/8} }

(Step 10 of Algorithm 3). The value β_min^{(m_k)} is obtained analogously for m_k = 9, 12, 16 (Steps 12-37 of Algorithm 3).
Algorithm 2 Given a matrix A ∈ C^{n×n}, this algorithm determines the order m, the scaling parameter s, and the powers of B = A² needed for computing the Taylor approximation of cos(A), using no estimation of norms of matrix powers.
1: B_1 = A²
2: if ∥B_1∥ ≤ Θ_1 then m = 1, s = 0, quit
3: B_2 = B_1², obtain β_min^{(2)} using Algorithm 3 with m = 2, q = 2
4: if β_min^{(2)} ≤ Θ_2 then m = 2, s = 0, quit
5: β_min^{(4)} = min{β_min^{(2)}, β_min^{(4)} using Algorithm 3 with m = 4, q = 2}
6: if β_min^{(4)} ≤ Θ_4 then m = 4, s = 0, quit
7: B_3 = B_2 B_1
8: β_min^{(6)} = min{β_min^{(4)}, β_min^{(6)} using Algorithm 3 with m = 6, q = 3}
9: if β_min^{(6)} ≤ Θ_6 then m = 6, s = 0, quit
10: β_min^{(9)} = min{β_min^{(6)}, β_min^{(9)} using Algorithm 3 with m = 9, q = 3}
11: if β_min^{(9)} ≤ Θ_9 then m = 9, s = 0, quit
12: β_min^{(12)} = min{β_min^{(9)}, β_min^{(12)} using Algorithm 3 with m = 12, q = 3}
13: if β_min^{(12)} ≤ Θ_{12} then m = 12, s = 0, quit
14: s_9 = ⌈(1/2) log₂(β_min^{(9)}/Θ_9)⌉
15: s_{12} = ⌈(1/2) log₂(β_min^{(12)}/Θ_{12})⌉
16: if s_9 ≤ s_{12} then s = s_9, m = 9, quit    ▹ m = 9 used for scaling only if providing less cost than m = 12
17: B_4 = B_3 B_1
18: β_min^{(12)} = min{β_min^{(12)}, β_min^{(12)} using Algorithm 3 with m = 12, q = 4}
19: if β_min^{(12)} ≤ Θ_{12} then m = 12, s = 0, quit
20: s_{12} = ⌈(1/2) log₂(β_min^{(12)}/Θ_{12})⌉
21: β_min^{(16)} = min{β_min^{(12)}, β_min^{(16)} using Algorithm 3 with m = 16, q = 4}
22: s_{16} = max{0, ⌈(1/2) log₂(β_min^{(16)}/Θ_{16})⌉}
23: if s_{12} ≤ s_{16} then m = 12, s = s_{12}, else m = 16, s = s_{16}, quit    ▹ m = 12 only used if it provides less cost than m = 16
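A compact MATLAB sketch of Algorithm 2 follows. It assumes a vector Theta indexed so that Theta(m) holds Θ_m for m ∈ {1, 2, 4, 6, 9, 12, 16} (taken from Table 1, with the convention Θ_m = Θ̄_m for m ≥ 9), and the function beta_NoNormEst sketched after Algorithm 3 below; the real nested function ms_selectNoNormEst in cosmtay.m is organized slightly differently.

function [m, s, Bpow] = ms_select_sketch(A, Theta)
% Sketch of Algorithm 2: order m, scaling s and powers of B = A^2,
% using no estimation of norms of matrix powers (illustration only).
Bpow{1} = A*A;             d(1) = norm(Bpow{1},1);
if d(1) <= Theta(1), m = 1; s = 0; return, end
Bpow{2} = Bpow{1}^2;       d(2) = norm(Bpow{2},1);
bmin(2) = beta_NoNormEst(2, d);                        % Steps 3-4
if bmin(2) <= Theta(2), m = 2; s = 0; return, end
bmin(4) = min(bmin(2), beta_NoNormEst(4, d));          % Steps 5-6
if bmin(4) <= Theta(4), m = 4; s = 0; return, end
Bpow{3} = Bpow{2}*Bpow{1}; d(3) = norm(Bpow{3},1);
prev = bmin(4);
for m = [6 9 12]                                       % Steps 7-13 (q = 3)
    bmin(m) = min(prev, beta_NoNormEst(m, d)); prev = bmin(m);
    if bmin(m) <= Theta(m), s = 0; return, end
end
s9  = ceil(0.5*log2(bmin(9)/Theta(9)));                % Steps 14-16
s12 = ceil(0.5*log2(bmin(12)/Theta(12)));
if s9 <= s12, m = 9; s = s9; return, end
Bpow{4} = Bpow{3}*Bpow{1}; d(4) = norm(Bpow{4},1);     % Steps 17-23 (q = 4)
b12 = min(bmin(12), beta_NoNormEst(12, d));
if b12 <= Theta(12), m = 12; s = 0; return, end
s12 = ceil(0.5*log2(b12/Theta(12)));
b16 = min(b12, beta_NoNormEst(16, d));
s16 = max(0, ceil(0.5*log2(b16/Theta(16))));
if s12 <= s16, m = 12; s = s12; else, m = 16; s = s16; end
end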
Algorithm 3 beta_NoNormEst: determines the value β_min^{(m)} = min{β_t^{(m)}} from Theorem 1, given m ∈ {2, 4, 6, 9, 12, 16}, d_i = ∥B^i∥, b_i = ∥B^i∥^{1/i}, i = 1, 2, ..., q, for B ∈ C^{n×n}, using bounds d_i ≥ ∥B^i∥, i > q, based on products of ∥B^i∥, i ≤ q.
1: switch m do
2:   case 2    ▹ m = 2
3:     β_min^{(2)} = (d_2 d_1)^{1/3}
4:   case 4    ▹ m = 4
5:     β_min^{(4)} = (d_2^2 d_1)^{1/5}
6:   case 6    ▹ m = 6
7:     if b_2 ≤ b_3 then
8:       β_min^{(6)} = min{d_2^2 d_3, d_1 d_3^2}^{1/7}
9:     else
10:      β_min^{(6)} = max{min{d_2^2 d_3, d_1 d_3^2}^{1/7}, (d_3^2 d_2)^{1/8}}
11:    end if
12:   case 9    ▹ m = 9
13:    if b_2 ≤ b_3 then
14:      β_min^{(9)} = (d_2^3 d_3)^{1/9}
15:    else
16:      β_min^{(9)} = max{min{d_2^2 d_3^2, d_3^3 d_1}^{1/10}, (d_3^3 d_2)^{1/11}}
17:    end if
18:   case 12    ▹ m = 12
19:    if q = 3 then
20:      if b_2 ≤ b_3 then
21:        β_min^{(12)} = (d_2^5 d_3)^{1/13}
22:      else
23:        β_min^{(12)} = max{min{d_3^4 d_1, d_3^3 d_2^2}^{1/13}, (d_3^4 d_2)^{1/14}}
24:      end if
25:    else if q = 4 then
26:      if b_3 ≤ b_4 then
27:        β_min^{(12)} = max{(d_3^3 d_4)^{1/13}, min{d_3^2 d_4^2, d_3^4 d_2}^{1/14}}
28:      else
29:        β_min^{(12)} = max{(d_4^2 min{d_3 d_2, d_4 d_1})^{1/13}, (d_4^2 min{d_3^2, d_4 d_2})^{1/14}}
30:      end if
31:    end if
32:   case 16    ▹ m = 16
33:    if b_3 ≤ b_4 then
34:      β_min^{(16)} = max{(d_3^4 d_4)^{1/16}, min{d_3^5 d_2, d_3^3 d_4^2}^{1/17}}
35:    else
36:      β_min^{(16)} = max{(d_4^3 min{d_4 d_1, d_3 d_2})^{1/17}, (d_4^3 min{d_3^2, d_4 d_2})^{1/18}}
37:    end if
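The case analysis of Algorithm 3 translates directly into MATLAB; the sketch below infers q from the number of available norms d(i) = ∥B^i∥_1 (an illustration consistent with the listing above; the nested function beta_NoNormEst in cosmtay.m differs slightly):

function beta = beta_NoNormEst(m, d)
% Sketch of Algorithm 3: estimate of min_t beta_t^{(m)} from Theorem 1
% using only d(i) = ||B^i||_1, i <= q, with q = numel(d).
q = numel(d);
b = d.^(1./(1:q));                        % b(i) = ||B^i||^(1/i)
switch m
  case 2
    beta = (d(2)*d(1))^(1/3);
  case 4
    beta = (d(2)^2*d(1))^(1/5);
  case 6
    if b(2) <= b(3)
      beta = min(d(2)^2*d(3), d(1)*d(3)^2)^(1/7);
    else
      beta = max(min(d(2)^2*d(3), d(1)*d(3)^2)^(1/7), (d(3)^2*d(2))^(1/8));
    end
  case 9
    if b(2) <= b(3)
      beta = (d(2)^3*d(3))^(1/9);
    else
      beta = max(min(d(2)^2*d(3)^2, d(3)^3*d(1))^(1/10), (d(3)^3*d(2))^(1/11));
    end
  case 12
    if q == 3
      if b(2) <= b(3)
        beta = (d(2)^5*d(3))^(1/13);
      else
        beta = max(min(d(3)^4*d(1), d(3)^3*d(2)^2)^(1/13), (d(3)^4*d(2))^(1/14));
      end
    else                                  % q = 4
      if b(3) <= b(4)
        beta = max((d(3)^3*d(4))^(1/13), min(d(3)^2*d(4)^2, d(3)^4*d(2))^(1/14));
      else
        beta = max((d(4)^2*min(d(3)*d(2), d(4)*d(1)))^(1/13), ...
                   (d(4)^2*min(d(3)^2, d(4)*d(2)))^(1/14));
      end
    end
  case 16
    if b(3) <= b(4)
      beta = max((d(3)^4*d(4))^(1/16), min(d(3)^5*d(2), d(3)^3*d(4)^2)^(1/17));
    else
      beta = max((d(4)^3*min(d(4)*d(1), d(3)*d(2)))^(1/17), ...
                 (d(4)^3*min(d(3)^2, d(4)*d(2)))^(1/18));
    end
end
end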
On the other hand, in order to reduce the value β_min^{(m_k)}, and therefore the scaling parameter s and/or the order m given by Algorithm 2, using (4) and (12), similarly to (16) from [12], we approximated β_min^{(m_k)} = min{β_t^{(m_k)}} from Theorem 1 by

β_min^{(m_k)} ≈ max{ d_{m_k+1}^{1/(m_k+1)}, d_{m_k+2}^{1/(m_k+2)} },  m_k ≤ 6 (forward bound),    (15)

β_min^{(m_k)} ≈ max{ d_{m_k}^{1/m_k}, d_{m_k+1}^{1/(m_k+1)} },  m_k ≥ 9 (backward bound),    (16)

computing the 1-norms of the corresponding matrix powers ∥B^i∥ with the estimation algorithm from [13] and taking the bounds d_i = ∥B^i∥. Equations (15) and (16) may give values β_min^{(m_k)} lower than the ones given by Algorithm 3, especially for nonnormal matrices, since

∥B^p∥ ≤ ∥B∥^{i_1} ∥B^2∥^{i_2} ∥B^3∥^{i_3} ∥B^4∥^{i_4},  i_1 + 2i_2 + 3i_3 + 4i_4 = p.

Then it is possible to substitute Algorithm 3 with a new algorithm that computes β_min^{(m_k)} using (15) and (16), with norm estimations of the matrix powers [13] for the corresponding d_i. For a complete MATLAB implementation see the nested function ms_selectNormEst of the function cosmtay.m, available at http://personales.upv.es/jorsasma/Software/cosmtay.m. Function cosmtay.m also implements the option without norm estimation: the MATLAB implementation of Algorithm 2 can be seen in the nested function ms_selectNoNormEst, and that of Algorithm 3 in beta_NoNormEst of cosmtay.m. Both functions are slightly different from Algorithms 2 and 3 in order to be compatible with the version with norm estimation, ms_selectNormEst.
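The 1-norms of the matrix powers in (15) and (16) can be estimated without forming B^p explicitly, applying the block 1-norm estimator of [13] (MATLAB's normest1) to a function handle that multiplies by B p times. This is a sketch of the idea; the actual organization inside cosmtay.m may differ.

function est = normest1_pow(B, p)
% Estimate ||B^p||_1 via the block 1-norm estimator [13] without forming B^p.
n = size(B,1);
est = normest1(@afun);
    function Z = afun(flag, X)
        switch flag
            case 'dim',      Z = n;                   % order of the operator
            case 'real',     Z = isreal(B);
            case 'notransp', Z = X; for k = 1:p, Z = B*Z;  end
            case 'transp',   Z = X; for k = 1:p, Z = B'*Z; end
        end
    end
end

Each invocation costs O(p n²) flops per column of the iteration matrix, which is negligible compared to a matrix product only for large n, in agreement with the remarks in Section 3.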
3. Numerical experiments

In this section we compare the MATLAB functions cosmtay, cosm and costaym:

Function cosmtay(A,NormEst) (http://personales.upv.es/jorsasma/software/cosmtay.m) is the MATLAB implementation of Algorithm 1, with determination of the order m and the scaling parameter s using the 1-norm estimator [13] (NormEst=1, corresponding in cosmtay to the nested function ms_selectNormEst). It is compared with cosm [5, Alg. 4.2] (http://github.com/sdrelton/cosm_sinm). The tests using Algorithm 2 for determining m and s (NormEst=0 in cosmtay(A,NormEst), corresponding to the nested function ms_selectNoNormEst in cosmtay) gave similar accuracy results and a relative increase in the number of matrix products of at most 3% in the tests, so we omit them. We also noted that for small matrices the cost of the norm estimation algorithm is not negligible compared to a matrix product; therefore, the algorithm using no norm estimation is typically more efficient. For large matrices the cost of the norm estimation algorithm is negligible, and then the algorithm with norm estimation is faster whenever it actually saves matrix products.
The MATLAB function cosm has an argument which allows computing cos(A) by means of just Padé approximants, or also using the real Schur decomposition or the complex Schur decomposition. For this function, we have used Padé approximants with and without the real Schur decomposition, denoted by cosmSchur and cosm, respectively. Finally, function costaym is the MATLAB implementation of Algorithm 1 from [6] (http://personales.upv.es/jorsasma/software/costaym.m).

In the tests we used MATLAB (R2014b) running on an Intel Core 2 Duo processor at 3.00 GHz with 4 GB main memory. The following tests were made:
• Test 1: 100 diagonalizable 128×128 real matrices with real and complex eigenvalues and 1-norms varying from 2.32 to 220.04.

• Test 2: 100 non-diagonalizable 128×128 real matrices with eigenvalues whose algebraic multiplicities vary between 1 and 128 and 1-norms varying from 5.27 to 21.97.

• Test 3: Seventeen matrices with dimensions lower than or equal to 128 from the Eigtool MATLAB package [14], twenty-eight matrices from the matrix function literature with dimensions lower than or equal to 128, fifty-one 128×128 real matrices from the function matrix of the Matrix Computation Toolbox [15], and fifty-two 8×8 real matrices obtained by scaling the matrices from the Matrix Computation Toolbox [15] so that their norms vary between 0.000145 and 0.334780. This last group of 8×8 matrices was used specifically to test the forward error analysis from Section 2.1, since for those matrices only the lower orders m = 1, 2, 4, 6 were used.

• Test 4: Fifty matrices from the semidiscretization of the wave equation from [5, Sec. 7.5] [16, Problem 4].
The "exact" matrix cosine was computed exactly for the matrices of Tests 1 and 2. Following [6, Sec. 4.1], for the other matrices we used MATLAB symbolic versions of a scaled Padé rational approximation from [5] and a scaled Taylor Paterson-Stockmeyer approximation (2), both in 4096-decimal digit arithmetic and with several orders m and/or scaling parameters s higher than the ones used by cosm and cosmtay, respectively, checking that their relative difference was small enough. The algorithm accuracy was tested by computing the relative error

E = ∥cos(A) − Ỹ∥_1 / ∥cos(A)∥_1,

where Ỹ is the computed solution and cos(A) is the exact solution.
To compare the relative errors of the functions, we plotted in Figure 1 the performance profiles and the ratios of relative errors E(cosm)/E(cosmtay) and E(cosmSchur)/E(cosmtay) for the three tests. In the performance profile, the α coordinate varies between 1 and 5 in steps of 0.1, and the p coordinate is the probability that the considered algorithm has a relative error lower than or equal to α times the smallest error over all the methods. The ratios of relative errors are presented in decreasing order of E(cosm)/E(cosmtay).
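For reference, a performance profile can be computed from the matrix of measured relative errors in a few lines (a sketch; E here is a hypothetical nmethods-by-nproblems array, and bsxfun is used so that the code also runs on older MATLAB versions such as R2014b):

% Sketch: performance profile p(alpha) from a relative error array E,
% where E(i,j) is the error of method i on problem j (hypothetical data).
alpha = 1:0.1:5;
best  = min(E, [], 1);                               % best error per problem
p = zeros(numel(alpha), size(E,1));
for k = 1:numel(alpha)
    p(k,:) = mean(bsxfun(@le, E, alpha(k)*best), 2)';% fraction within alpha*best
end
plot(alpha, p), xlabel('\alpha'), ylabel('p')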
The results were:
The results were:

• Figure 1 shows the performance profiles and the relative error ratios giving the accuracy of the tested functions. Figures 1a, 1c and 1e show that the most accurate functions in Tests 1, 2 and 3 were costaym from [6] and cosmtay. In Test 1, function costaym was slightly more accurate than cosmtay, and cosmtay was more accurate than cosm for 96 of the 100 matrices of Test 1 and more accurate than cosmSchur for all of them (see Figure 1b). The graph of cosmSchur does not appear in Figure 1a because, for all matrices of Test 1, the relative error of this function was greater than 5 times the error of the other functions. In Test 2, cosmtay was the most accurate function, being more accurate than cosm for 93 of the 100 matrices and more accurate than cosmSchur for 98 matrices (see Figure 1d). Finally, in Test 3, costaym from [6] was the most accurate function, and cosmtay was more accurate than cosm for 114 of the 135 matrices of Test 3 and more accurate than cosmSchur for 109 matrices of that test (see Figure 1f).

• The ratios of flops in Figure 2 show that the computational cost of cosmtay is always lower than that of cosm, cosmSchur and costaym. In the majority of the test matrices, the ratios of flops of cosmSchur over cosmtay are between 2 and 4, and the ratios of flops of cosm over cosmtay are between 1 and 2 (Figures 2a, 2c and 2e). In the majority of the test matrices, the execution time ratios of cosm over cosmtay are between 1 and 5, the ratios of cosmSchur over cosmtay are greater than 5 (Figures 2b, 2d and 2f), and the execution time ratios of costaym over cosmtay are between 1 and 2. Hence, cosmtay always provided a lower cost than costaym, though with a lower accuracy for some test matrices.
• Test 4: Section 7.5 of [5] showed that, for the matrices arising in a wave equation problem, the version of cosm using the Schur decomposition (cosmSchur) was increasingly more accurate than the MATLAB function costay for the biggest matrix dimensions in that test. That function was our first MATLAB implementation for computing the matrix cosine [4]. In Test 4 we compared the MATLAB implementations cosmSchur, cosm, cosmtay and cosmtaySchur for the biggest matrix size given in Section 7.5 of [5]. cosmtaySchur is based on the real Schur decomposition given by a modified implementation of Algorithm 4.2 from [5], where Algorithm 1 is used for computing the cosine of the real Schur matrix of A. Figure 3 shows that the accuracies of cosmSchur and cosmtaySchur are similar, and both implementations are more accurate than the implementations not based on the real Schur decomposition. In [5] the authors claimed that costay had signs of instability. Note that cosm without Schur decomposition also shows signs of instability, even greater than those of cosmtay. We have verified that function cosmtaySchur is more accurate than cosmSchur for 28 of the 50 matrices of Test 4 and less accurate for 22 matrices of that test. In any case, the differences are negligible, and the main result from this test is that the algorithms cosmtay, cosm, cosmtaySchur and cosmSchur had costs of 1100, 1200, 1800 and 1900 matrix products, respectively. Therefore, cosmtay is 8.33% more efficient than cosm [5], and cosmtaySchur is 5.26% more efficient than cosmSchur [5].
We have used cosmtay with the 1-norm estimator (cosmtay(A,NormEst) with NormEst=1) in the four tests above because the results obtained with the variant that does not use the 1-norm estimator are similar in accuracy, and we found that the computational cost in terms of matrix products of the implementation that uses the 1-norm estimator is only 1.90%, 1.85%, 3.00% and 0% lower than that of the implementation without estimation in Tests 1, 2, 3 and 4, respectively. However, the execution time was greater due to the overhead of the 1-norm estimator, since the matrix sizes are not large enough for the cost of the estimation (O(n²) for n×n matrices [13]) to be negligible compared to the cost of matrix products (O(n³)).
4. Conclusions

In this work two accurate Taylor algorithms have been proposed to compute the matrix cosine. These algorithms use the scaling technique based on the double angle formula of the cosine function, the Paterson-Stockmeyer method for computing the Taylor approximation, and new forward and backward relative error bounds for the matrix cosine Taylor approximation, which allow calculating the optimal scaling parameter and the optimal order of the Taylor approximation. The two algorithms differ only in whether they use the 1-norm estimation of norms of matrix powers [13]. The algorithm with no norm estimation applies Theorem 1 with the norms of the matrix powers computed for evaluating the matrix cosine, using them to bound the norms of the matrix powers involved; in tests it gave small relative cost differences, in terms of matrix products, with respect to the version with norm estimation. The accuracy of both algorithms is similar, and the cost of the norm estimation algorithm is negligible only for large matrices. Therefore, we recommend using the algorithm with no norm estimation for small matrices and the algorithm with norm estimation for large matrices.

The MATLAB implementation that uses estimation was compared with other state-of-the-art MATLAB implementations for matrices of size up to 128×128 (analogous results were obtained with the other implementation). Numerical experiments show that, in general, our Taylor implementations have higher accuracy and lower cost than the state-of-the-art Padé implementation cosm from [5] in the majority of tests. In particular, when the real Schur decomposition was used, the ratio of flops between cosmSchur and cosmtay was flops(cosmSchur)/flops(cosmtay) > 5 for some matrices, and using the Schur decomposition in our algorithms gave the same accuracy results as cosmSchur with less cost. Numerical experiments also showed that function cosmtay was slightly less accurate than costaym from [6] in some tests, but cosmtay always provided a lower computational cost.
Figure 1: Accuracy in Tests 1, 2 and 3. Panels: (a) performance profile, Test 1; (b) ratio of relative errors, Test 1; (c) performance profile, Test 2; (d) ratio of relative errors, Test 2; (e) performance profile, Test 3; (f) ratio of relative errors, Test 3. The performance profiles plot p against α for cosm, cosmSchur, costaym and cosmtay; the ratio plots show E(cosm)/E(cosmtay), E(cosmSchur)/E(cosmtay) and E(costaym)/E(cosmtay). [Plots omitted.]
Figure 2: Computational costs in Tests 1, 2 and 3. Panels: (a) ratio of flops, Test 1; (b) ratio of execution times, Test 1; (c) ratio of flops, Test 2; (d) ratio of execution times, Test 2; (e) ratio of flops, Test 3; (f) ratio of execution times, Test 3. The plots show the flop and execution time ratios of cosm, cosmSchur and costaym over cosmtay. [Plots omitted.]
Figure 3: Accuracy in Test 4. Panels: (a) normwise relative errors of cosm, cosmSchur, cosmtay and cosmtaySchur, together with cond·u; (b) ratio of relative errors E(cosm)/E(cosmtaySchur), E(cosmSchur)/E(cosmtaySchur) and E(cosmtay)/E(cosmtaySchur). [Plots omitted.]
Acknowledgments
The authors are very grateful to the anonymous referees, whose comments
greatly improved this paper.
References

[1] S. Serbin, Rational approximations of trigonometric matrices with application to second-order systems of differential equations, Appl. Math. Comput. 5 (1) (1979) 75-92.

[2] S. M. Serbin, S. A. Blalock, An algorithm for computing the matrix cosine, SIAM J. Sci. Statist. Comput. 1 (2) (1980) 198-204.

[3] E. Defez, J. Sastre, J. J. Ibáñez, P. A. Ruiz, Computing matrix functions arising in engineering models with orthogonal matrix polynomials, Math. Comput. Model. 57 (7-8) (2013) 1738-1743.

[4] J. Sastre, J. Ibáñez, P. Ruiz, E. Defez, Efficient computation of the matrix cosine, Appl. Math. Comput. 219 (2013) 7575-7585.

[5] A. H. Al-Mohy, N. J. Higham, S. D. Relton, New algorithms for computing the matrix sine and cosine separately or simultaneously, SIAM J. Sci. Comput. 37 (1) (2015) A456-A487.

[6] P. Alonso, J. Ibáñez, J. Sastre, J. Peinado, E. Defez, Efficient and accurate algorithms for computing matrix trigonometric functions, J. Comput. Appl. Math. 309 (2017) 325-332. doi:10.1016/j.cam.2016.05.015.

[7] M. S. Paterson, L. J. Stockmeyer, On the number of nonscalar multiplications necessary to evaluate polynomials, SIAM J. Comput. 2 (1) (1973) 60-66.

[8] C. Tsitouras, V. N. Katsikis, Bounds for variable degree rational L∞ approximations to the matrix cosine, Comput. Phys. Commun. 185 (11) (2014) 2834-2840.

[9] N. J. Higham, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, PA, USA, 2008.

[10] G. H. Golub, C. F. Van Loan, Matrix Computations, 3rd Edition, Johns Hopkins Studies in the Mathematical Sciences, The Johns Hopkins University Press, 1996.

[11] J. Sastre, J. J. Ibáñez, E. Defez, P. A. Ruiz, Efficient scaling-squaring Taylor method for computing the matrix exponential, SIAM J. Sci. Comput. 37 (1) (2015) A439-A455.

[12] P. Ruiz, J. Sastre, J. Ibáñez, E. Defez, High performance computing of the matrix exponential, J. Comput. Appl. Math. 291 (2016) 370-379.

[13] N. J. Higham, F. Tisseur, A block algorithm for matrix 1-norm estimation, with an application to 1-norm pseudospectra, SIAM J. Matrix Anal. Appl. 21 (2000) 1185-1201.

[14] T. G. Wright, Eigtool, version 2.1 (2009). URL: web.comlab.ox.ac.uk/pseudospectra/eigtool.

[15] N. J. Higham, The Test Matrix Toolbox for MATLAB, Numerical Analysis Report No. 237, Manchester, England (Dec. 1993).

[16] J. M. Franco, New methods for oscillatory systems based on ARKN methods, Appl. Numer. Math. 56 (8) (2006) 1040-1053. doi:10.1016/j.apnum.2005.09.005.