
Two algorithms for computing the matrix cosine function ✩

Jorge Sastre†, Javier Ibáñez♮, Pedro Alonso‡, Jesús Peinado♮, Emilio Defez⋆

†Instituto de Telecomunicaciones y Aplicaciones Multimedia.
♮Instituto de Instrumentación para Imagen Molecular.
‡Dept. of Information Systems and Computation.
⋆Instituto de Matemática Multidisciplinar.
Universitat Politècnica de València, Camino de Vera s/n, 46022, Valencia, España.
jsastrem@upv.es, jjibanez@dsic.upv.es, palonso@dsic.upv.es, jpeinado@dsic.upv.es, edefez@imm.upv.es

✩ e-mail: jjibanez@dsic.upv.es. This work has been supported by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF) grant TIN2014-59294-P.

Preprint submitted to Applied Mathematical Modelling. February 21, 2017

Abstract

The computation of matrix trigonometric functions has received remarkable attention in the last decades due to its usefulness in the solution of systems of second order linear differential equations. Several state-of-the-art algorithms have been provided recently for computing these matrix functions. In this work we present two efficient algorithms based on Taylor series with forward and backward error analysis for computing the matrix cosine. A MATLAB implementation of the algorithms is compared to state-of-the-art algorithms, with excellent performance in both accuracy and cost.

Keywords: matrix cosine, scaling and recovering method, Taylor series, forward error analysis, backward error analysis, MATLAB.

1. Introduction

Many engineering processes are described by second order differential equations, whose solution is given in terms of the trigonometric matrix functions sine and cosine. Examples arise in the spatial semi-discretization of the wave equation or in mechanical systems without damping, where their solutions can be expressed in terms of integrals involving the matrix sine and cosine [1, 2]. Several state-of-the-art algorithms have been provided recently for computing these matrix functions using polynomial and rational approximations with scaling and recovering techniques [3, 4, 5, 6]. In order to reduce computational costs, the Paterson-Stockmeyer method [7] is used to evaluate the matrix polynomials arising in these approximations.

In the Taylor algorithm proposed in [4] we used sharp absolute forward error bounds. In the Taylor algorithm proposed in [6] we improved the previous algorithm using relative error bounds based on backward error bounds of the matrix exponentials involved in cos(A). Those error bounds do not guarantee that the cosine backward error bound in exact arithmetic is less than the unit roundoff in double precision arithmetic [6, Sec. 2]. However, according to the tests, that algorithm improved the accuracy with respect to the previous Taylor algorithm at the expense of some increase in cost (measured in flops). The algorithm proposed in [6] was also superior in both accuracy and cost to the variant of the state-of-the-art scaling and recovering Padé algorithm in [5] that does not use the Schur decomposition.

Other algorithms, based on L∞ approximations for normal and nonnegative matrices, have been presented recently in [8]. In this work we focus on general matrices and algorithms using approximations at the origin. We present two algorithms based on Taylor series that use Theorem 1 from [4] for computing the matrix cosine. We provide relative forward and backward error analyses for the matrix cosine Taylor approximation that improve even further the comparison to the algorithm in [5], with and without Schur decomposition, in both accuracy and cost tests.

Throughout this paper C^{n×n} denotes the set of complex matrices of size n×n, I the identity matrix for this set, ρ(X) the spectral radius of matrix X, and N the set of positive integers. In this work we use the 1-norm to compute the actual norms. This paper is organized as follows. Section 2 presents a Taylor algorithm for computing the matrix cosine function. Section 3 deals with numerical tests and, finally, Section 4 gives some conclusions.

2. Algorithms for computing the matrix cosine

The matrix cosine can be defined for all A ∈ C^{n×n} by

\cos(A) = \sum_{i=0}^{\infty} \frac{(-1)^i A^{2i}}{(2i)!},

and let

T_{2m}(A) = \sum_{i=0}^{m} \frac{(-1)^i B^i}{(2i)!} \equiv P_m(B),    (1)

be the Taylor approximation of order 2m of cos(A), where B = A^2. Since Taylor series are accurate only near the origin, in algorithms that use this approximation the norm of matrix B is reduced by scaling the matrix. Then, a Taylor approximation is computed, and finally the approximation of cos(A) is recovered by means of the double angle formula cos(2X) = 2cos^2(X) − I. Algorithm 1 shows a general algorithm for computing the matrix cosine based on the Taylor approximation. By using the fact that sin(A) = cos(A − (π/2)I), Algorithm 1 can also easily be used to compute the matrix sine.

Algorithm 1 Given a matrix A ∈ C^{n×n}, this algorithm computes C = cos(A) by Taylor series.
1: Select adequate values of m and s        ▷ Phase I
2: B = 4^{−s} A^2
3: C = P_m(B)                               ▷ Phase II: Compute Taylor approximation
4: for i = 1 : s do                         ▷ Phase III: Recovering cos(A)
5:     C = 2C^2 − I
6: end for
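As an illustration, a minimal MATLAB sketch of Algorithm 1 follows (our own illustration, not the cosmtay.m implementation). It assumes that m and s have already been selected (Phase I is the subject of Section 2.4) and, for brevity, evaluates P_m(B) with a plain Horner scheme instead of the Paterson-Stockmeyer scheme (2) used in practice:

    function C = cosm_taylor_sketch(A, m, s)
    % Sketch of Algorithm 1 with the order m and scaling parameter s
    % supplied by the caller; Horner's rule replaces Paterson-Stockmeyer.
    I = eye(size(A));
    B = 4^(-s) * A^2;                       % Phase I result: B = A^2/4^s
    p = (-1).^(0:m) ./ factorial(2*(0:m));  % Taylor coefficients of (1)
    C = p(m+1) * I;                         % Phase II: C = P_m(B)
    for i = m:-1:1
        C = C*B + p(i)*I;
    end
    for i = 1:s                             % Phase III: double angle formula
        C = 2*C^2 - I;
    end
    end

Since sin(A) = cos(A − (π/2)I), the same sketch computes the matrix sine after shifting the argument.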

In Phase I of Algorithm 1, m and s must be calculated so that the Taylor approximation of the scaled matrix is computed accurately and efficiently. In this phase some powers B^i, i ≥ 2, are usually computed for estimating m and s, and if so they are used in Phase II.

Phase II consists of computing the Taylor approximation (1). For clarity of the exposition we recall some results summarized in [6, Sec. 2]. The Taylor matrix polynomial approximation (1), expressed as P_m(B) = \sum_{i=0}^{m} p_i B^i, B ∈ C^{n×n}, can be computed with optimal cost by the Paterson-Stockmeyer method [7], choosing m from the set

M = {1, 2, 4, 6, 9, 12, 16, 20, 25, 30, 36, 42, ...},

where the elements of M are denoted as m_1, m_2, m_3, .... The algorithm first computes the powers B^i, 2 ≤ i ≤ q, not computed in the previous phase, where q = ⌈√m_k⌉ or q = ⌊√m_k⌋ is an integer divisor of m_k, k ≥ 1, both values giving the same cost in terms of matrix products. Therefore, (1) can be computed efficiently as

P_{m_k}(B) =    (2)
 (((p_{m_k} B^q + p_{m_k-1} B^{q-1} + p_{m_k-2} B^{q-2} + \cdots + p_{m_k-q+1} B + p_{m_k-q} I) B^q
 + p_{m_k-q-1} B^{q-1} + p_{m_k-q-2} B^{q-2} + \cdots + p_{m_k-2q+1} B + p_{m_k-2q} I) B^q
 + p_{m_k-2q-1} B^{q-1} + p_{m_k-2q-2} B^{q-2} + \cdots + p_{m_k-3q+1} B + p_{m_k-3q} I) B^q
 \cdots
 + p_{q-1} B^{q-1} + p_{q-2} B^{q-2} + \cdots + p_1 B + p_0 I.

Table 1 shows the values of q for different values of m. From Table 4.1 of [9, p. 74], the cost of computing (1) with (2) is Π_{m_k} = k matrix products, k = 1, 2, ...
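A hedged MATLAB sketch of the evaluation (2) may help fix ideas (our own code, not the cosmtay.m implementation; it receives the coefficients p_0, ..., p_m in the vector p and assumes q divides m, which holds for every m ∈ M with the values of q in Table 1):

    function P = ps_eval(p, B, q)
    % Sketch: evaluate P(B) = sum_{i=0}^m p(i+1)*B^i by the nested
    % Paterson-Stockmeyer scheme (2), assuming q divides m.
    n = size(B,1); I = eye(n); m = numel(p) - 1;
    Bp = cell(q,1); Bp{1} = B;
    for i = 2:q, Bp{i} = Bp{i-1}*B; end   % B^2,...,B^q: q-1 products
    P = p(m+1)*Bp{q} + p(m-q+1)*I;        % leading chunk of (2):
    for i = 1:q-1                         % p_m B^q + ... + p_{m-q} I
        P = P + p(m-i+1)*Bp{q-i};
    end
    for j = m/q-1:-1:1                    % m/q - 1 Horner steps in B^q
        P = P*Bp{q} + p((j-1)*q+1)*I;
        for i = 1:q-1
            P = P + p(j*q-i+1)*Bp{q-i};
        end
    end
    end

The sketch performs (q − 1) + (m/q − 1) matrix products, which together with the product B = A^2 reproduces the Π_{m_k} = k products quoted above.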

Finally, Phase III is necessary to obtain the cosine of matrix A from the approximation of cos(2^{−s}A) computed previously in Phase II. If m_k is the order used and s is the scaling parameter, then the computational cost of Algorithm 1 is 2(k+s)n^3 flops, and the storage cost is (2+q_k)n^2. For instance, for m_7 = 16 (k = 7) and s = 2 the cost is 18n^3 flops, i.e. nine matrix products.

The difficulty of Algorithm 1 is to find appropriate values of m_k and s such that cos(A) is computed accurately with minimum cost. For that, in the following sections we will use Theorem 1:

Theorem 1 ([4]). Let h_l(x) = \sum_{i\ge l} p_i x^i be a power series with radius of convergence w, \tilde{h}_l(x) = \sum_{i\ge l} |p_i| x^i, B ∈ C^{n×n} with ρ(B) < w, l ∈ N and t ∈ N with 1 ≤ t ≤ l. If t_0 is the multiple of t such that l ≤ t_0 ≤ l + t − 1 and

\beta_t = \max\big\{ d_j^{1/j} : j = t,\ l,\ l+1, \ldots, t_0 - 1,\ t_0 + 1,\ t_0 + 2, \ldots, l + t - 1 \big\},

where d_j is an upper bound for \|B^j\|, d_j \ge \|B^j\|, then

\|h_l(B)\| \le \tilde{h}_l(\beta_t).

2.1. Relative forward error in Taylor approximation

The following proposition gives a sufficient condition for the existence of cos^{−1}(A).

Proposition 1. Let A be a matrix in C^{n×n} and let B = A^2. If ∥B∥ < a^2, where a = acosh(2), then cos(A) is invertible.

Proof. Since

\|I - \cos(A)\| = \Big\|\sum_{k\ge 1} \frac{(-1)^k A^{2k}}{(2k)!}\Big\| \le \sum_{k\ge 1} \frac{\|A^2\|^k}{(2k)!} < \sum_{k\ge 1} \frac{a^{2k}}{(2k)!} = \cosh(a) - 1 = 1,

then, by applying Lemma 2.3.3 from [10, p. 58], we obtain that I − (I − cos(A)) = cos(A) is invertible. □

Using Proposition 1, if

\|B\| = \|A^2\| < \mathrm{acosh}^2(2) \approx 1.7343,    (3)

then cos^{−1}(A) exists, and it follows that the relative forward error of computing cos(A) by means of (1), denoted by E_f, is

E_f = \|\cos^{-1}(A)(\cos(A) - T_{2m}(A))\| = \Big\|\sum_{i\ge m+1} e_i^{(2m)} A^{2i}\Big\| = \Big\|\sum_{i\ge m+1} e_i^{(2m)} B^i\Big\|,

where the coefficients e_i^{(2m)} depend on the Taylor approximation order 2m. If we define g_{m+1}(x) = \sum_{i\ge m+1} e_i^{(2m)} x^i and \tilde{g}_{m+1}(x) = \sum_{i\ge m+1} |e_i^{(2m)}| x^i, and we apply Theorem 1, then

E_f = \|g_{m+1}(B)\| \le \tilde{g}_{m+1}(\beta_t^{(m)}),    (4)

for every t, 1 ≤ t ≤ m + 1. Following [4, Sec. 5.1], in (4) we denote by β_t^{(m)} the corresponding value of β_t from Theorem 1 for order m, and from now on we will use that nomenclature.

Let Θ_m be

\Theta_m = \max\Big\{\theta \ge 0 : \sum_{i\ge m+1} \big|e_i^{(2m)}\big| \theta^i \le u\Big\},    (5)

where u = 2^{−53} is the unit roundoff in double precision floating-point arithmetic. We have used the MATLAB Symbolic Math Toolbox to evaluate \sum_{i\ge m+1} |e_i^{(2m)}| \theta^i for each m in 250-digit decimal arithmetic, adding the first 250 series terms with the coefficients obtained symbolically. Then, a numerical zero-finder is invoked to determine the highest value Θ_m such that \sum_{i\ge m+1} |e_i^{(2m)}| \Theta_m^i \le u holds. For this analysis to hold it is necessary that cos(A) is invertible. Hence, if condition (3) holds and β_t^{(m)} ≤ Θ_m, then E_f ≤ u. Some values of Θ_m are given in Table 1.
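To illustrate how such a threshold can be located numerically, the following sketch finds the largest θ with \sum_{i\ge m+1} |e_i^{(2m)}| θ^i ≤ u by simple bisection; the exponent vector idx and the coefficient magnitudes absC are assumed to have been generated symbolically beforehand (the names are ours), and the sketch works in double precision, whereas the actual values were obtained in 250-digit arithmetic:

    % Sketch: Theta = max{theta >= 0 : sum_i absC(i)*theta^idx(i) <= u}.
    u  = 2^-53;                                  % unit roundoff
    g  = @(theta) sum(absC .* theta.^idx) - u;   % increasing for theta >= 0
    lo = 0; hi = 1;
    while g(hi) < 0, hi = 2*hi; end              % bracket the crossing point
    for it = 1:100                               % bisection refinement
        mid = (lo + hi)/2;
        if g(mid) <= 0, lo = mid; else, hi = mid; end
    end
    Theta = lo;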

2.2. Backward error in Taylor approximation

In [5], a backward error analysis is made for computing the sine and cosine matrix functions. For each matrix function, two analyses were made. For the cosine function, the first one is based on considering the function

h_{2m}(x) := \arccos(r_m(x)) - x,

where r_m(x) is the [m/m] Padé approximant to the cosine function, and the authors conclude that different restrictions make this analysis unusable [5, Sec. 2.2]. We checked that an error analysis for the matrix cosine Taylor approximation similar to that in [5, Sec. 2.2] yields analogous results. Therefore, in order to calculate the backward error ∆X of approximating cos(X) by the Taylor polynomial T_{2m}(X) such that

T_{2m}(X) = \cos(X + \Delta X),    (6)

we propose a different approach that holds for any matrix X ∈ C^{r×r} and uses the following result, whose proof is trivial:

Lemma 1. If A and B are matrices in C^{r×r} and AB = BA, then

\cos(A + B) = \cos(A)\cos(B) - \sin(A)\sin(B).    (7)

Note that the backward error ∆X from (6) is a holomorphic function of X, and then X∆X = ∆X X. Therefore, using (6) and Lemma 1,

T_{2m}(X) = \cos(X + \Delta X) = \cos(X)\cos(\Delta X) - \sin(X)\sin(\Delta X)
          = \cos(X) \sum_{i\ge 0} \frac{(-1)^i \Delta X^{2i}}{(2i)!} - \sin(X) \sum_{i\ge 0} \frac{(-1)^i \Delta X^{2i+1}}{(2i+1)!}.

Hence,

\cos(X) - T_{2m}(X) = \sum_{i\ge m+1} \frac{(-1)^i X^{2i}}{(2i)!}    (8)
 = \sin(X) \sum_{i\ge 0} \frac{(-1)^i \Delta X^{2i+1}}{(2i+1)!} - \cos(X) \sum_{i\ge 1} \frac{(-1)^i \Delta X^{2i}}{(2i)!},

and consequently the backward error ∆X can be expressed as

\Delta X = \sum_{i\ge m} c_i^{(2m)} X^{2i+1},    (9)

where the coefficients c_i^{(2m)} depend on the Taylor approximation order 2m, and ∆X commutes with X. Note that an expression similar to (8) can be obtained for other approximations of the matrix cosine, such as Padé approximation. Using (8) and (9) it follows that

\sin(X)\Delta X + O(X^{4m+2}) = \sum_{i\ge m+1} \frac{(-1)^i X^{2i}}{(2i)!}.    (10)

Hence, the coefficients c_i^{(2m)}, i = m, m+1, ..., 2m−1, can be computed by obtaining symbolically the Taylor series of sin(X)∆X on the left-hand side of (10) and solving the system of equations that arises when equating the coefficients of X^{2i}, i = m+1, m+2, ..., 2m, on both sides of (10), using function solve from the MATLAB Symbolic Math Toolbox.
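For instance, for m = 1 (order 2m = 2) this symbolic computation can be sketched in a few lines with a truncated ansatz for (9); the variable names are ours, and the first two coefficients come out as c_1^{(2)} = 1/24 and c_2^{(2)} = 3/640:

    % Sketch: first backward error coefficients of (9) for m = 1, obtained
    % by matching series coefficients in T_2(x) = cos(x + Dx), cf. (6).
    syms x c1 c2
    Dx  = c1*x^3 + c2*x^5;            % truncated ansatz for (9), m = 1
    T2  = 1 - x^2/2;                  % Taylor approximant T_2(x)
    lhs = taylor(cos(x + Dx), x, 'Order', 8);
    sol = solve(coeffs(lhs - T2, x), [c1, c2]);
    % sol.c1 = 1/24, sol.c2 = 3/640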

Analogously, using (8) and (9) it follows that

\sin(X)\Delta X + \cos(X)\frac{\Delta X^2}{2!} + O(X^{6m+4}) = \sum_{i\ge m+1} \frac{(-1)^i X^{2i}}{(2i)!},    (11)

and the coefficients c_i^{(2m)}, i = 2m, 2m+1, ..., 3m+1, can be calculated by using the coefficients c_i^{(2m)}, i = m, m+1, ..., 2m−1, obtained previously, computing symbolically the Taylor series of sin(X)∆X + cos(X)∆X^2/2! on the left-hand side of (11), and solving the system of equations that arises when equating the coefficients of X^{2i}, i = 2m+1, 2m+2, ..., 3m+1, on both sides of (11). Proceeding analogously, the coefficients c_i^{(2m)}, i > 3m+1, can be computed. Then we compute the relative backward error of approximating cos(A) by T_{2m}(A), denoted by E_b, as

E_b = \frac{\|\Delta A\|}{\|A\|} = \frac{\big\|\sum_{i\ge m} c_i^{(2m)} A^{2i+1}\big\|}{\|A\|} \le \Big\|\sum_{i\ge m} c_i^{(2m)} A^{2i}\Big\| = \Big\|\sum_{i\ge m} c_i^{(2m)} B^i\Big\|.

If we define h_m(x) = \sum_{i\ge m} c_i^{(2m)} x^i and \tilde{h}_m(x) = \sum_{i\ge m} |c_i^{(2m)}| x^i, and we apply Theorem 1, then

E_b \le \|h_m(B)\| \le \tilde{h}_m(\beta_t^{(m)}).    (12)

Let ¯Θ_m be

\bar{\Theta}_m = \max\Big\{\theta \ge 0 : \sum_{i\ge m} \big|c_i^{(2m)}\big| \theta^i \le u\Big\}.    (13)

For computing ¯Θ_m, we have used the MATLAB Symbolic Math Toolbox to evaluate \sum_{i\ge m} |c_i^{(2m)}| \theta^i in 250-digit decimal arithmetic for each m, adding a different number of series terms depending on m, with the coefficients obtained symbolically, and a numerical zero-finder was invoked to determine the highest value ¯Θ_m such that \sum_{i\ge m} |c_i^{(2m)}| \bar{\Theta}_m^i \le u holds. We have checked that for m = 1, 2, 4, 6, 9, 12 the values of ¯Θ_m obtained in double precision arithmetic do not vary if we take more than 256 series terms (and even fewer terms for the lower orders).

Note that the values ¯Θ_1 = 2.66·10^{−15}, ¯Θ_2 = 2.83·10^{−7}, ¯Θ_4 = 4.48·10^{−3} and ¯Θ_6 = 1.45·10^{−1}, presented here with three significant digits, are lower than the corresponding Θ_m values, m = 1, 2, 4, 6, from Table 1 for the forward error analysis, respectively.

On the other hand, considering 1880 and 1946 series terms for m = 16, the corresponding ¯Θ_16 values have a relative difference of 4.0010·10^{−4}. Considering 1880 and 1980 series terms for m = 20, the corresponding ¯Θ_20 values have a relative difference of 1.6569·10^{−3}. The process of computing those values for so many terms was very time consuming, and we took as final values the ones shown in Table 1, corresponding to 1946 series terms for m = 16 and 1980 series terms for m = 20.

With the final selected values of ¯Θ_m in Table 1, it follows that if β_t^{(m)} ≤ ¯Θ_m, then the relative backward error is lower than the unit roundoff in double precision floating-point arithmetic, i.e.

E_b ≤ u for m_k = 9, 12, and E_b ≲ u for m_k = 16, 20.

2.3. Backward error in double angle formula of the matrix cosine

We are interested in the backward error in Phase III of Algorithm 1. In the previous section we have shown that it is possible to obtain a small backward error in the Taylor approximation of the matrix cosine. As in [5, Sec. 2.3], it can be shown that the backward error propagates linearly through the double angle formula. A result for the backward error similar to Lemma 2.1 from [5] can be obtained for polynomial approximations of the matrix cosine.

Lemma 2. Let A ∈ C^{n×n} and X = 2^{−s}A, with s a nonnegative integer, and suppose that t(X) = cos(X + ∆X) for a polynomial function t. Then the approximation Y obtained by applying the double angle formula satisfies Y = cos(A + ∆A) in exact arithmetic, and hence

\frac{\|\Delta A\|}{\|A\|} = \frac{\|\Delta X\|}{\|X\|}.

Proof. The proof is similar to that given in Lemma 2.1 from [5]. □

Lemma 2 shows that if we choose m and s such that ∥∆X∥/∥X∥ ≤ u, with X = 2^{−s}A, then if s ≥ 1 the total backward error in exact arithmetic after Phase III of Algorithm 1 is bounded by u, producing no error growth.
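A scalar toy check of Lemma 2 (our own illustration, not part of the algorithms): starting from cos(x + δ) at the scaled argument x = 2^{−s}a and doubling s times yields cos(a + 2^s δ), so the relative backward error |2^s δ|/|a| = |δ|/|x| is indeed unchanged.

    % Toy check of Lemma 2 in the scalar case.
    a = 1.7; s = 3; delta = 1e-8;
    x = 2^(-s)*a;
    C = cos(x + delta);                % plays the role of t(X) = cos(X + dX)
    for i = 1:s, C = 2*C^2 - 1; end    % Phase III double angle recovering
    abs(C - cos(a + 2^s*delta))        % agrees up to rounding errors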

2.4. Determining the values of the Taylor approximation order m and the scaling parameter s

Since Θ_6 ≃ 0.1895 < acosh^2(2) ≃ 1.7343 < ¯Θ_9 ≃ 1.7985, see (3), and the values Θ_{m_k} from the forward error analysis for m_k ≤ 6 are greater than the corresponding ¯Θ_{m_k} values from the backward error analysis, we use the relative forward analysis for m_k ≤ 6 and the relative backward analysis for m_k ≥ 9. Therefore, by Theorem 1, for m_k = 1, 2, 4, 6, if there exists t such that β_t^{(m_k)} ≤ Θ_{m_k}, one gets that the forward error bound E_f ≤ u from Section 2.1 holds, and for m_k = 9, 12, 16, 20, if there exists t such that β_t^{(m_k)} ≤ ¯Θ_{m_k}, it follows that the backward error bound of Section 2.2 holds. Table 1 shows the values Θ_{m_k}, m_k = 1, 2, 4, 6, and ¯Θ_{m_k}, m_k = 9, 12, 16, 20. For simplicity of notation, from now on we will denote by Θ_{m_k} both Θ_{m_k}, m_k ≤ 6, and ¯Θ_{m_k}, m_k ≥ 9.

Table 1: Values of Θ_{m_k} (forward analysis), ¯Θ_{m_k} (backward analysis) and q_k, used to compute (1) by the Paterson-Stockmeyer method (2).

k   m_k   q_k   Θ_{m_k}                   k   m_k   q_k   ¯Θ_{m_k}
1    1     1    5.161913593731081e-8      5    9     3    1.798505876916759
2    2     2    4.307691256676447e-5      6   12     4    6.752349007371135
3    4     2    1.319680929892753e-2      7   16     4    9.971046342716772
4    6     3    1.895232414039165e-1      8   20     5    10.177842844012551

The selection of the order m and the scaling parameter s is as follows. If there exist t and m_k such that β_t^{(m_k)} ≤ Θ_{m_k}, then it is not necessary to scale B, and the Taylor approximation order m_k with k = min{k : β_t^{(m_k)} ≤ Θ_{m_k}} is selected, i.e. the order m_k providing the minimum cost. Since in this case no scaling is applied, the double angle formula of Phase III of Algorithm 1 is not applied either. Otherwise, we scale the matrix B by the scaling parameter

s = \max\left\{0, \left\lceil \tfrac{1}{2}\log_2\big(\beta_t^{(m_k)}/\Theta_{m_k}\big) \right\rceil\right\}, \quad m_k \in \{9, 12, 16\},    (14)

such that the matrix cosine is computed with minimum cost. Lemma 2 ensures that the backward error propagates linearly through the double angle formula in exact arithmetic if s ≥ 1. Following [11, Sec. 3.1], the explanation for the minimum and maximum orders m_k to be used in (14) for scaling (giving the minimum cost) is as follows: since Θ_9/4 > Θ_6 and 4Θ_9 > Θ_12, the minimum order to select for scaling is m = m_5 = 9. On the other hand, since Θ_20/4 < Θ_12 and Θ_16/4 > Θ_9, if ∥B∥ > Θ_9 the maximum order to select for scaling is m = m_7 = 16. Following [11, Sec. 3.1], the final selection of m_k is the maximum order m_k ∈ {9, 12, 16} also giving the minimum cost. This selection provides the minimum scaling parameter s over all selections of m_k that provide the minimum cost.

Then the Taylor approximation of order m_k, P_{m_k}(4^{−s}B) ≈ cos(2^{−s}A), is computed, and if s ≥ 1 the recovering Phase III of Algorithm 1 is applied.
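In MATLAB terms, the scaling selection (14) is a one-line computation per candidate order (a sketch, where beta stands for β_t^{(m_k)} and Theta for the corresponding Θ_{m_k} from Table 1); among m_k ∈ {9, 12, 16}, the order giving the minimum cost k + s is then kept:

    % Sketch of (14): smallest nonnegative s with 4^(-s)*beta <= Theta.
    s = max(0, ceil(0.5 * log2(beta / Theta)));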

For computing the parameters β_t^{(m_k)}, 1 ≤ m_k ≤ 16, from Theorem 1, it is necessary to calculate upper bounds d_k for ∥B^k∥. We have developed two different algorithms to obtain the order m_k and the scaling parameter s. Algorithm 2 uses an estimation β_min^{(m_k)} of the minimum of the values β_t^{(m_k)} from Theorem 1, obtained using Algorithm 3. In order to calculate the upper bounds d_k of ∥B^k∥ for obtaining β_min^{(m_k)}, Algorithm 3 uses only products of norms of the matrix powers previously computed in Algorithm 2, i.e. ∥B^i∥, i ≤ 4.

For instance, note that in order to compute β_2^{(m)} for the relative forward error bound and m = 2, by Theorem 1 we only need the bounds d_2^{1/2} and d_3^{1/3}. From Table 1, for m = 2 one gets q = q_2 = 2, so B^i, i = 1, 2, are available, and we take d_1 = ∥B∥ and d_2 = ∥B^2∥. Since ∥B^2∥^{1/2} ≤ ∥B∥, it follows that

\beta_{\min}^{(2)} = \beta_2^{(2)} = \max\{d_2^{1/2}, (d_2 d_1)^{1/3}\} = (d_2 d_1)^{1/3}

(Step 3 of Algorithm 3). Something similar happens with β_2^{(4)}, resulting in

\beta_{\min}^{(4)} = \beta_2^{(4)} = \max\{d_2^{1/2}, (d_2^2 d_1)^{1/5}\} = (d_2^2 d_1)^{1/5}

(Step 5 of Algorithm 3). Using Table 1, for m = 6 one gets q = q_4 = 3, B^3 now also being available, and we take d_3 = ∥B^3∥. By Theorem 1, if d_2^{1/2} ≤ d_3^{1/3} we select

\beta_{\min}^{(6)} = \beta_2^{(6)} = \max\{d_2^{1/2}, (d_2^2 d_3)^{1/7}\} = (d_2^2 d_3)^{1/7}

(Step 8 of Algorithm 3). Else, we select

\beta_{\min}^{(6)} = \beta_3^{(6)} = \max\{d_3^{1/3}, \min\{d_2^2 d_3, d_1 d_3^2\}^{1/7}, (d_3^2 d_2)^{1/8}\}
                 = \max\{\min\{d_2^2 d_3, d_1 d_3^2\}^{1/7}, (d_3^2 d_2)^{1/8}\}

(Step 10 of Algorithm 3). The value β_min^{(m_k)} is obtained analogously for m_k = 9, 12, 16 (Steps 12-37 of Algorithm 3).

Algorithm 2 Given a matrix A ∈ C^{n×n}, this algorithm determines the order m, the scaling parameter s, and the powers of B = A^2 needed for computing the Taylor approximation of cos(A), using no estimation of norms of matrix powers.
1: B_1 = A^2
2: if ∥B_1∥ ≤ Θ_1 then m = 1, s = 0, quit
3: B_2 = B_1^2, obtain β_min^{(2)} using Algorithm 3 with m = 2, q = 2
4: if β_min^{(2)} ≤ Θ_2 then m = 2, s = 0, quit
5: β_min^{(4)} = min{β_min^{(2)}, β_min^{(4)} using Algorithm 3 with m = 4, q = 2}
6: if β_min^{(4)} ≤ Θ_4 then m = 4, s = 0, quit
7: B_3 = B_2 B_1
8: β_min^{(6)} = min{β_min^{(4)}, β_min^{(6)} using Algorithm 3 with m = 6, q = 3}
9: if β_min^{(6)} ≤ Θ_6 then m = 6, s = 0, quit
10: β_min^{(9)} = min{β_min^{(6)}, β_min^{(9)} using Algorithm 3 with m = 9, q = 3}
11: if β_min^{(9)} ≤ Θ_9 then m = 9, s = 0, quit
12: β_min^{(12)} = min{β_min^{(9)}, β_min^{(12)} using Algorithm 3 with m = 12, q = 3}
13: if β_min^{(12)} ≤ Θ_12 then m = 12, s = 0, quit
14: s_9 = ⌈(1/2) log_2(β_min^{(9)}/Θ_9)⌉
15: s_12 = ⌈(1/2) log_2(β_min^{(12)}/Θ_12)⌉
16: if s_9 ≤ s_12 then s = s_9, m = 9, quit    ▷ m = 9 used for scaling only if providing less cost than m = 12
17: B_4 = B_3 B_1
18: β_min^{(12)} = min{β_min^{(12)}, β_min^{(12)} using Algorithm 3 with m = 12, q = 4}
19: if β_min^{(12)} ≤ Θ_12 then m = 12, s = 0, quit
20: s_12 = ⌈(1/2) log_2(β_min^{(12)}/Θ_12)⌉
21: β_min^{(16)} = min{β_min^{(12)}, β_min^{(16)} using Algorithm 3 with m = 16, q = 4}
22: s_16 = max{0, ⌈(1/2) log_2(β_min^{(16)}/Θ_16)⌉}
23: if s_12 ≤ s_16 then m = 12, s = s_12, else m = 16, s = s_16, quit    ▷ m = 12 only used if it provides less cost than m = 16

Algorithm 3 beta_NoNormEst: determines the value β_min^{(m)} = min{β_t^{(m)}} from Theorem 1, given m ∈ {2, 4, 6, 9, 12, 16}, d_i = ∥B^i∥, b_i = ∥B^i∥^{1/i}, i = 1, 2, ..., q, for B ∈ C^{n×n}, using bounds d_i ≥ ∥B^i∥, i > q, based on products of ∥B^i∥, i ≤ q.
1: switch m do
2:   case 2                                  ▷ m = 2
3:     β_min^{(2)} = (d_2 d_1)^{1/3}
4:   case 4                                  ▷ m = 4
5:     β_min^{(4)} = (d_2^2 d_1)^{1/5}
6:   case 6                                  ▷ m = 6
7:     if b_2 ≤ b_3 then
8:       β_min^{(6)} = min{d_2^2 d_3, d_1 d_3^2}^{1/7}
9:     else
10:      β_min^{(6)} = max{min{d_2^2 d_3, d_1 d_3^2}^{1/7}, (d_3^2 d_2)^{1/8}}
11:    end if
12:  case 9                                  ▷ m = 9
13:    if b_2 ≤ b_3 then
14:      β_min^{(9)} = (d_2^3 d_3)^{1/9}
15:    else
16:      β_min^{(9)} = max{min{d_2^2 d_3^2, d_3^3 d_1}^{1/10}, (d_3^3 d_2)^{1/11}}
17:    end if
18:  case 12                                 ▷ m = 12
19:    if q = 3 then
20:      if b_2 ≤ b_3 then
21:        β_min^{(12)} = (d_2^5 d_3)^{1/13}
22:      else
23:        β_min^{(12)} = max{min{d_3^4 d_1, d_3^3 d_2^2}^{1/13}, (d_3^4 d_2)^{1/14}}
24:      end if
25:    else if q = 4 then
26:      if b_3 ≤ b_4 then
27:        β_min^{(12)} = max{(d_3^3 d_4)^{1/13}, min{d_3^2 d_4^2, d_3^4 d_2}^{1/14}}
28:      else
29:        β_min^{(12)} = max{(d_4^2 min{d_3 d_2, d_4 d_1})^{1/13}, (d_4^2 min{d_3^2, d_4 d_2})^{1/14}}
30:      end if
31:    end if
32:  case 16                                 ▷ m = 16
33:    if b_3 ≤ b_4 then
34:      β_min^{(16)} = max{(d_3^4 d_4)^{1/16}, min{d_3^5 d_2, d_3^3 d_4^2}^{1/17}}
35:    else
36:      β_min^{(16)} = max{(d_4^3 min{d_4 d_1, d_3 d_2})^{1/17}, (d_4^3 min{d_3^2, d_4 d_2})^{1/18}}
37:    end if

On the other hand, in order to reduce the value β_min^{(m_k)}, and therefore the scaling parameter s and/or the order m given by Algorithm 2, using the forward bound (4) and the backward bound (12), similarly to (16) from [12] we approximated β_min^{(m_k)} = min{β_t^{(m_k)}} from Theorem 1 by

\beta_{\min}^{(m_k)} \approx \max\big\{ d_{m_k+1}^{1/(m_k+1)},\ d_{m_k+2}^{1/(m_k+2)} \big\}, \quad m_k \le 6 \ \text{(forward bound)},    (15)

\beta_{\min}^{(m_k)} \approx \max\big\{ d_{m_k}^{1/m_k},\ d_{m_k+1}^{1/(m_k+1)} \big\}, \quad m_k \ge 9 \ \text{(backward bound)},    (16)

computing the 1-norms of the corresponding matrix powers ∥B^i∥ with the estimation algorithm from [13] and taking the bounds d_i = ∥B^i∥. Equations (15) and (16) may give values β_min^{(m_k)} lower than the ones given by Algorithm 3, especially for nonnormal matrices, since

\|B^p\| \le \|B\|^{i_1} \|B^2\|^{i_2} \|B^3\|^{i_3} \|B^4\|^{i_4}, \quad i_1 + 2i_2 + 3i_3 + 4i_4 = p.

Then it is possible to substitute Algorithm 3 with a new algorithm that computes β_min^{(m_k)} using (15) and (16) with norm estimations of matrix powers [13] for the corresponding d_i. For a complete MATLAB implementation see the nested function ms_selectNormEst from function cosmtay.m, available at http://personales.upv.es/jorsasma/Software/cosmtay.m.
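The estimates d_i ≈ ∥B^i∥_1 needed in (15) and (16) can be obtained without forming B^i explicitly: the block 1-norm estimator of [13] is available in MATLAB as normest1 and accepts a function handle that applies the operator. A sketch of such a wrapper (our own, not the one inside cosmtay.m):

    function est = norm1est_pow(B, p)
    % Sketch: estimate ||B^p||_1 with the block 1-norm estimator [13]
    % (MATLAB's normest1), applying B p times instead of forming B^p.
        est = normest1(@afun);
        function Y = afun(flag, X)
            switch flag
                case 'dim',      Y = size(B,1);
                case 'real',     Y = isreal(B);
                case 'notransp', Y = X; for k = 1:p, Y = B*Y;  end
                case 'transp',   Y = X; for k = 1:p, Y = B'*Y; end
            end
        end
    end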

Function cosmtay.m also implements the option without norm estimation: the MATLAB implementation of Algorithm 2 can be seen in the nested function ms_selectNoNormEst, and that of Algorithm 3 in beta_NoNormEst from cosmtay.m. Both functions are slightly different from Algorithms 2 and 3 so as to be compatible with the version with norm estimation ms_selectNormEst.

3. Numerical experiments

In this section we compare the MATLAB functions cosmtay, cosm and costaym.

Function cosmtay(A,NormEst) (http://personales.upv.es/jorsasma/software/cosmtay.m) is the MATLAB implementation of Algorithm 1, with determination of the order m and the scaling parameter s using the 1-norm estimator [13] (NormEst=1, corresponding in cosmtay to the nested function ms_selectNormEst); it is compared with cosm [5, Alg. 4.2] (http://github.com/sdrelton/cosm_sinm). The tests using Algorithm 2 for determining m and s (NormEst=0 in cosmtay(A,NormEst), corresponding to the nested function ms_selectNoNormEst in cosmtay) gave similar accuracy results and a relative increase of the number of matrix products less than or equal to 3% in the tests; therefore we omit them. We also noted that for small matrices the cost of the norm estimation algorithm is not negligible compared to a matrix product. Therefore, the algorithm using no norm estimation is typically more efficient. For large matrices the cost of the norm estimation algorithm is negligible, and then the algorithm with norm estimation is faster when it really saves matrix products.

The MATLAB function cosm has an argument which allows computing cos(A) by means of just Padé approximants, or also using the real Schur decomposition or the complex Schur decomposition. For this function, we have used Padé approximants with and without the real Schur decomposition, denoted by cosmSchur and cosm, respectively.

Finally, function costaym is the MATLAB implementation of Algorithm 1 from [6] (http://personales.upv.es/jorsasma/software/costaym.m).

In the tests we used MATLAB (R2014b) running on an Intel Core 2 Duo processor at 3.00 GHz with 4 GB main memory. The following tests were made:

• Test 1: 100 diagonalizable 128×128 real matrices with real and complex eigenvalues and 1-norms varying from 2.32 to 220.04.

• Test 2: 100 non-diagonalizable 128×128 real matrices with eigenvalues whose algebraic multiplicities vary between 1 and 128 and 1-norms varying from 5.27 to 21.97.

• Test 3: seventeen matrices with dimensions lower than or equal to 128 from the Eigtool MATLAB package [14], twenty-eight matrices from the matrix function literature with dimensions lower than or equal to 128, fifty-one 128×128 real matrices from the function matrix of the Matrix Computation Toolbox [15], and fifty-two 8×8 real matrices obtained by scaling the matrices from the Matrix Computation Toolbox [15] so that their norms vary between 0.000145 and 0.334780. This last group of 8×8 matrices was used specifically for testing the forward error analysis from Section 2.1, since for those matrices only the lower orders m = 1, 2, 4, 6 were used.

• Test 4: fifty matrices from the semidiscretization of the wave equation from [5, Sec. 7.5] [16, Problem 4].

The "exact" matrix cosine was computed exactly for the matrices of Tests 1 and 2. Following [6, Sec. 4.1], for the other matrices we used MATLAB symbolic versions of a scaled Padé rational approximation from [5] and a scaled Taylor Paterson-Stockmeyer approximation (2), both with 4096-decimal-digit arithmetic and several orders m and/or scaling parameters s higher than the ones used by cosm and cosmtay, respectively, checking that their relative difference was small enough. The algorithm accuracy was tested by computing the relative error

E = \frac{\|\cos(A) - \tilde{Y}\|_1}{\|\cos(A)\|_1},

where Ỹ is the computed solution and cos(A) is the exact solution.

To compare the relative errors of the functions, we plotted in Figure 1 the performance profiles and the ratios of relative errors E(cosm)/E(cosmtay) and E(cosmSchur)/E(cosmtay) for the three tests. In the performance profile, the α coordinate varies between 1 and 5 in steps of 0.1, and the p coordinate is the probability that the considered algorithm has a relative error lower than or equal to α times the smallest error over all the methods. The ratios of relative errors are presented in decreasing order of E(cosm)/E(cosmtay).
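Such a performance profile can be reproduced with a few lines of MATLAB (a generic sketch; Err is assumed to hold the relative errors with one row per method and one column per test matrix):

    % Sketch: performance profile p(alpha), as plotted in Figure 1.
    alpha = 1:0.1:5;
    best  = min(Err, [], 1);                 % smallest error per test matrix
    p = zeros(size(Err,1), numel(alpha));
    for j = 1:numel(alpha)
        % fraction of matrices where each method is within alpha*best
        p(:,j) = mean(bsxfun(@le, Err, alpha(j)*best), 2);
    end
    plot(alpha, p); xlabel('\alpha'); ylabel('p');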

The results were:

• Figure 1 shows the performance profiles and the relative error ratios giving the accuracy of the tested functions. Figures 1a, 1c and 1e show that the most accurate functions in Tests 1, 2 and 3 were costaym from [6] and cosmtay. In Test 1 function costaym was slightly more accurate than cosmtay, and the latter was more accurate than cosm for 96 of the 100 matrices of Test 1 and more accurate than cosmSchur for all of them (see Figure 1b). The graph of cosmSchur does not appear in Figure 1a because for all matrices of Test 1 the relative error of this function was greater than 5 times the error of the other functions. In Test 2 cosmtay was the most accurate function, being more accurate than cosm for 93 of the 100 matrices and more accurate than cosmSchur for 98 matrices (see Figure 1d). Finally, in Test 3 costaym from [6] was the most accurate function, and cosmtay was more accurate than cosm for 114 of the 135 matrices of Test 3 and more accurate than cosmSchur for 109 matrices of that test (see Figure 1f).

• The ratios of flops from Figure 2 show that the computational costs of cosmtay are always lower than those of cosm, cosmSchur and costaym. In the majority of matrices of Test 3, the ratios of flops of cosmSchur to cosmtay are between 2 and 4, and the ratios of flops of cosm to cosmtay are between 1 and 2 (Figures 2a, 2c and 2e). In the majority of matrices of Test 3, the execution time ratios of cosm to cosmtay are between 1 and 5, the ratios of cosmSchur to cosmtay are greater than 5 (Figures 2b, 2d and 2f), and the execution time ratios of costaym to cosmtay are between 1 and 2. Thus, cosmtay always provided a lower cost than costaym, though with lower accuracy for some test matrices.

• Test 4: Section 7.5 from [5] showed that, for the matrices which appear in a wave equation problem, the version of cosm using the Schur decomposition (cosmSchur) was increasingly more accurate than the MATLAB function costay for the biggest matrix dimensions in that test. That function was our first MATLAB implementation for computing the matrix cosine [4]. In Test 4 we compared the MATLAB implementations cosmSchur, cosm, cosmtay and cosmtaySchur for the biggest matrix size given in Section 7.5 from [5]. cosmtaySchur is based on the real Schur decomposition given by a modified implementation of Algorithm 4.2 from [5], where Algorithm 1 is used for computing the cosine of the real Schur matrix of A. Figure 3 shows that the accuracies of cosmSchur and cosmtaySchur are similar, and both implementations are more accurate than the other implementations not based on the real Schur decomposition of a matrix. In [5] the authors claimed that costay had signs of instability. Note that cosm without Schur decomposition also shows signs of instability, even greater than those of cosmtay. Our tests show that function cosmtaySchur is more accurate than cosmSchur for 28 of the 50 matrices of Test 4 and less accurate for 22 matrices of that test. Anyway, the differences are negligible, and the main result from this test is that the algorithms cosmtay, cosm, cosmtaySchur and cosmSchur had costs of 1100, 1200, 1800 and 1900 matrix products, respectively. Therefore, cosmtay is 8.33% more efficient than cosm [5], and cosmtaySchur is 5.26% more efficient than cosmSchur [5].

We have used cosmtay with the 1-norm estimator (cosmtay(A,NormEst) with NormEst=1) in the four tests above because the results obtained with the variant that does not use the 1-norm estimator are similar in accuracy, and we found that the computational cost in terms of matrix products for the implementation that uses the 1-norm estimator is only 1.90%, 1.85%, 3.00% and 0% lower than that of the implementation without estimation in Tests 1, 2, 3 and 4, respectively. However, the execution time was greater due to the overhead of using the 1-norm estimator, since the matrix sizes are not large enough for the cost of the estimation (O(n^2) for n×n matrices [13]) to be negligible compared to the cost of matrix products (O(n^3)).

4. Conclusions

In this work two accurate Taylor algorithms have been proposed to compute the matrix cosine. These algorithms use the scaling technique based on the double angle formula of the cosine function, the Paterson-Stockmeyer method for computing the Taylor approximation, and new forward and backward relative error bounds for the matrix cosine Taylor approximation, which allow calculating the optimal scaling parameter and the optimal order of the Taylor approximation. The two algorithms differ only in whether or not they use the 1-norm estimation of norms of matrix powers [13]. The algorithm with no norm estimation applies Theorem 1 to the norms of the matrix powers computed for the matrix cosine itself in order to bound the norms of the higher matrix powers involved, giving in our tests only small relative cost differences, in terms of matrix products, with respect to the version with norm estimation. The accuracy of both algorithms is similar, and the cost of the norm estimation algorithm is negligible only for large matrices. Therefore, we recommend using the algorithm with no norm estimation for small matrices, and the algorithm with norm estimation for large matrices.

The MATLAB implementation that uses estimation was compared with other state-of-the-art MATLAB implementations for matrices of size up to 128×128 (analogous results were obtained with the other implementation). Numerical experiments show in general that our Taylor implementations have higher accuracy and lower cost than the state-of-the-art Padé implementation cosm from [5] in the majority of tests. In particular, when the real Schur decomposition was used, the ratio of flops between cosmSchur and cosmtay was flops(cosmSchur)/flops(cosmtay) > 5 for some matrices, and using the Schur decomposition in our algorithms gave the same accuracy results as cosmSchur with less cost. Numerical experiments also showed that function cosmtay was slightly less accurate than costaym from [6] in some tests, but cosmtay always provided a lower computational cost.

Figure 1: Accuracy in Tests 1, 2 and 3. (a) Performance profile, Test 1. (b) Ratio of relative errors, Test 1. (c) Performance profile, Test 2. (d) Ratio of relative errors, Test 2. (e) Performance profile, Test 3. (f) Ratio of relative errors, Test 3. Each profile compares cosm, cosmSchur, costaym and cosmtay; the ratio panels show E(cosm)/E(cosmtay), E(cosmSchur)/E(cosmtay) and E(costaym)/E(cosmtay).

Figure 2: Computational costs in Tests 1, 2 and 3. (a) Ratio of flops, Test 1. (b) Ratio of execution times, Test 1. (c) Ratio of flops, Test 2. (d) Ratio of execution times, Test 2. (e) Ratio of flops, Test 3. (f) Ratio of execution times, Test 3. The panels show the flop and execution-time ratios of cosm, cosmSchur and costaym to cosmtay.

Figure 3: Accuracy in Test 4. (a) Normwise relative errors of cosm, cosmSchur, cosmtay and cosmtaySchur, plotted against cond·u. (b) Ratios of relative errors E(cosm)/E(cosmtaySchur), E(cosmSchur)/E(cosmtaySchur) and E(cosmtay)/E(cosmtaySchur).

Acknowledgments

The authors are very grateful to the anonymous referees, whose comments

greatly improved this paper.

[1] S. Serbin, Rational approximations of trigonometric matrices with application to second-order systems of differential equations, Appl. Math. Comput. 5 (1) (1979) 75–92.

[2] S. M. Serbin, S. A. Blalock, An algorithm for computing the matrix cosine, SIAM J. Sci. Statist. Comput. 1 (2) (1980) 198–204.

[3] E. Defez, J. Sastre, J. J. Ibáñez, P. A. Ruiz, Computing matrix functions arising in engineering models with orthogonal matrix polynomials, Math. Comput. Model. 57 (7-8) (2013) 1738–1743.

[4] J. Sastre, J. Ibáñez, P. Ruiz, E. Defez, Efficient computation of the matrix cosine, Appl. Math. Comput. 219 (2013) 7575–7585.

[5] A. H. Al-Mohy, N. J. Higham, S. D. Relton, New algorithms for computing the matrix sine and cosine separately or simultaneously, SIAM J. Sci. Comput. 37 (1) (2015) A456–A487.

[6] P. Alonso, J. Ibáñez, J. Sastre, J. Peinado, E. Defez, Efficient and accurate algorithms for computing matrix trigonometric functions, J. Comput. Appl. Math. 309 (2017) 325–332. doi:10.1016/j.cam.2016.05.015.

[7] M. S. Paterson, L. J. Stockmeyer, On the number of nonscalar multiplications necessary to evaluate polynomials, SIAM J. Comput. 2 (1) (1973) 60–66.

[8] C. Tsitouras, V. N. Katsikis, Bounds for variable degree rational L∞ approximations to the matrix cosine, Computer Physics Communications 185 (11) (2014) 2834–2840.

[9] N. J. Higham, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, PA, USA, 2008.

[10] G. H. Golub, C. F. Van Loan, Matrix Computations, 3rd Edition, Johns Hopkins Studies in Mathematical Sciences, The Johns Hopkins University Press, 1996.

[11] J. Sastre, J. J. Ibáñez, E. Defez, P. A. Ruiz, New scaling-squaring Taylor algorithms for computing the matrix exponential, SIAM J. Sci. Comput. 37 (1) (2015) A439–A455.

[12] P. Ruiz, J. Sastre, J. Ibáñez, E. Defez, High performance computing of the matrix exponential, J. Comput. Appl. Math. 291 (2016) 370–379.

[13] N. J. Higham, F. Tisseur, A block algorithm for matrix 1-norm estimation, with an application to 1-norm pseudospectra, SIAM J. Matrix Anal. Appl. 21 (2000) 1185–1201.

[14] T. G. Wright, Eigtool, version 2.1 (2009). URL: web.comlab.ox.ac.uk/pseudospectra/eigtool.

[15] N. J. Higham, The Test Matrix Toolbox for MATLAB, Numerical Analysis Report No. 237, Manchester, England (Dec. 1993).

[16] J. M. Franco, New methods for oscillatory systems based on ARKN methods, Appl. Numer. Math. 56 (8) (2006) 1040–1053. doi:10.1016/j.apnum.2005.09.005.