
Author's personal copy

Chebyshev-type methods and preconditioning techniques ☆

Hou-Biao Li a,b,⇑, Ting-Zhu Huang a, Yong Zhang a, Xing-Ping Liu b, Tong-Xiang Gu b

a School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 610054, PR China
b Lab of Comp. Phys., Institute of Applied Physics and Computational Mathematics, Beijing 100088, PR China

article info

Keywords:
Chebyshev's method
Approximate inverse preconditioner
Convergence

abstract

Recently, Newton's iterative method has been attracting attention from various fields of science and engineering. The method is generally quadratically convergent. In this paper, some Chebyshev-type methods with third-order convergence are analyzed in detail and used to compute approximate inverse preconditioners for solving the linear system Ax = b. Theoretical analysis and numerical experiments show that Chebyshev's method is more effective than Newton's in the construction of approximate inverse preconditioners.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

In scientific and engineering applications, one often needs to solve large sparse linear systems [1,28],

  Ax = b,  x, b ∈ R^n,  (1)

where A ∈ R^{n×n} is a large sparse real matrix.

For the linear system (1), direct solvers become prohibitively expensive because of the large amount of work and storage required. As an alternative, one usually considers preconditioned Krylov subspace methods [1,28]; that is, one constructs an easily invertible matrix M, called the preconditioning matrix or preconditioner, and then applies the iterative solver either to the left preconditioned linear system MAx = Mb or to the right preconditioned linear system AMy = b, where y = M^{-1}x. Generally speaking, the preconditioner M should be chosen so that MA or AM is a good approximation to the identity matrix [5,28]. However, it is not easy to find such a preconditioner M for the large sparse linear system (1).

Recently, a Newton’s iterative method with quadratic convergence is applied to compute the approximate inverse pre-

conditioners Mfor solving the linear system (1) in Refs. [4,22,29]. The iterative formula, which is known as the Schulz iter-

ation in [30], as follows.

N

mþ1

¼N

m

ð2IAN

m

Þ;m¼0;1;...;ð2Þ

where N

0

is an initial approximation for A

1

and Iis an identity matrix with the same dimensions as the matrix A. In Ref. [30],

authors showed the iterative formula (2) is quadratically convergent provided that kIAN

0

k< 1, where kkdenotes any

subordinate matrix norm. Lately, Wu [33] reconsidered its convergence analysis by using different convergence criteria

and obtained the same conclusions as those of Ref. [30].
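As an illustrative sketch of iteration (2) (in Python/NumPy; the paper's experiments use MATLAB, and the test matrix and starting scale here are our own choices, cf. choice (21) in Section 5):

```python
import numpy as np

def newton_schulz(A, N0, iters):
    """Schulz iteration (2): N_{m+1} = N_m (2I - A N_m)."""
    I = np.eye(A.shape[0])
    N = N0.copy()
    for _ in range(iters):
        N = N @ (2 * I - A @ N)
    return N

# Illustrative strictly diagonally dominant matrix; the scaling of N0
# guarantees ||I - A N0|| < 1, so quadratic convergence applies.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 1.0],
              [0.0, 1.0, 4.0]])
N0 = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
N = newton_schulz(A, N0, 10)
print(np.linalg.norm(np.eye(3) - A @ N))  # residual after 10 steps
```

After ten quadratic steps the residual ‖I − A N_m‖ is at machine-precision level.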

In this paper, some Chebyshev-type methods are investigated and used to compute approximate inverse preconditioners for solving the linear system Ax = b. In this setting, Chebyshev's scheme is proved to be at least cubically convergent to

0096-3003/$ - see front matter © 2011 Elsevier Inc. All rights reserved.
doi:10.1016/j.amc.2011.05.036

☆ Supported by NSFC (11026085, 60973015, 60973151), 973 Program (2007CB311002), Sichuan Province Sci. & Tech. Research Project (2011JY0002, 2009SPT-1, 2009GZ0004, 2009HH0025) and the Fundamental Research Funds for the Central Universities (ZYGX2009J103).
⇑ Corresponding author at: School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 610054, PR China.
E-mail addresses: lihoubiao0189@163.com, txgu@iapcm.ac.cn (H.-B. Li).

Applied Mathematics and Computation 218 (2011) 260–270



A^{-1} under the same convergence conditions as Newton's method (2), and it has the same properties as Newton's method, which ensure that it can be applied not only to the left preconditioned linear system N_m Ax = N_m b but also to the right preconditioned one A N_m y = b. Moreover, theoretical analysis and numerical experiments show that Chebyshev's method is more effective than Newton's in the construction of approximate inverse preconditioners N_m.

The remainder of the paper is organized as follows. In Section 2, we mainly recall some results on Newton's and Chebyshev's iterative methods. In Section 3, some preconditioning techniques based on Chebyshev's method are presented. The computational complexity of these schemes is considered in Section 4. In addition, some choices for the initial value N_0 of the iterative process (2) are discussed in Section 5. Finally, the effectiveness of these improvements is demonstrated by some numerical experiments in Section 6.

2. Newton’s and Chebyshev’s iterative methods

First, let us introduce Newton’s iterative idea and some corresponding results.

It is well known that Newton’s method arises from the ﬁrst-degree Taylor’s expansion of f(x)=0at(x

n

,f(x

n

)), and it usually

has quadratic convergence for simple roots. For x

0

given, Newton’s iterative method applies the following iterative process

x

nþ1

¼x

n

fðx

n

Þ

f

0

ðx

n

Þ;n¼0;1;... ð3Þ

to solve the nonlinear equation f(x)=0ðx2RÞ, see Ref. [2,27].

Since Newton’s iteration has strong numerical stability, it is attracting more and more attention from various ﬁelds of

science and engineering.

Recently, some higher order methods have been obtained [2,17], but these methods also involve the inverse of the operator f′ in their expression. To avoid this problem, Ulm [18] and Hald [19] proposed the following quadratically convergent iteration [15,16], obtained by introducing an approximation B_n of f′(x_{n+1})^{-1}:

  x_{n+1} = x_n − B_n f(x_n),
  B_{n+1} = 2B_n − B_n f′(x_{n+1}) B_n,  n = 0, 1, ...,  (4)

where x_0 and B_0 ∈ L(Y, X), the set of bounded linear operators from Y into X, are given.

Generally, for improvements of Newton's method, the higher the order, the higher the rate of convergence. However, the operational cost of a method also increases with the order, which leads one to seek an equilibrium between high convergence rate and operational cost (see [14]). Therefore, we only consider third-order methods in this paper.

Recently, two kinds of third-order methods were presented in [14]. One is

  x_{n+1} = x_n − [1 + (1/2) L_f(x_n) / (1 + b_n f(x_n)/f′(x_n))] f(x_n)/f′(x_n),  n = 0, 1, ...,  (5)

where b_n is a parameter depending on n, and

  L_f(x) = f(x) f″(x) / f′(x)².

This third-order method includes, as particular cases, the following ones (see [14]):

1. For b_n = 0, the famous Chebyshev's method, which is obtained by quadratic interpolation of the inverse function of f in [2]:

  x_{n+1} = x_n − f(x_n)/f′(x_n) − [f″(x_n)/(2 f′(x_n))] (f(x_n)/f′(x_n))².  (6)

2. For b_n = α f″(x_n)/f′(x_n), α ∈ R, we have [13]

  x_{n+1} = x_n − [1 + (1/2) L_f(x_n) / (1 − α L_f(x_n))] f(x_n)/f′(x_n),  n = 0, 1, ...  (7)

3. Finally, as a limit case, when b_n → ±∞, Newton's method (3) is obtained.
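As a quick sanity check (a sketch in Python; the equation f(x) = x² − 2 and the starting point are our own illustrative choices), Chebyshev's method (6) reaches √2 in very few steps:

```python
import math

def chebyshev(f, df, d2f, x, iters):
    """Chebyshev's method (6), cubically convergent for simple roots."""
    for _ in range(iters):
        u = f(x) / df(x)                            # Newton correction f/f'
        x = x - u - d2f(x) / (2.0 * df(x)) * u * u  # extra second-order term
    return x

root = chebyshev(lambda x: x * x - 2.0, lambda x: 2.0 * x, lambda x: 2.0, 1.5, 4)
print(root)  # ~1.4142135623731
```

Starting from x_0 = 1.5, the error drops from about 8.6e-2 to 1.4e-4 in a single step, illustrating the cubic rate.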

The other is the family of C-methods [13]:

  x_{n+1} = x_n − [1 + (1/2) L_f(x_n) + C L_f(x_n)²] f(x_n)/f′(x_n),  n = 0, 1, ...  (8)

But these are not the only third-order schemes. For instance, the following methods are also needed in this paper:

1. The midpoint rule method (MM) [34]:

  x_{n+1} = x_n − f(x_n) / f′(x_n − f(x_n)/(2 f′(x_n))),  n = 0, 1, ...  (9)


2. Homeier’s method (HM)[34]:

x

nþ1

¼x

n

fðx

n

Þ

2

1

f

0

ðx

n

Þþ1

f

0

x

n

fðx

n

Þ

f

0

ðx

n

Þ

0

@1

A;n¼0;1;... ð10Þ

3. Preconditioning techniques based on Chebyshev's method

As Amat and Ezquerro pointed out in [14,16], most of the methods mentioned in the previous section can be extended to Banach spaces by writing inverse operators instead of quotients. For instance, Eq. (5) can be written in terms of linear operators as (see [14])

  x_{n+1} = x_n − [I + (1/2)(I + b_n f′(x_n)^{-1} f(x_n))^{-1} L_f(x_n)] f′(x_n)^{-1} f(x_n),

where I is the identity operator on X and, for each x ∈ X, L_f(x) is the linear operator on X defined by

  L_f(x) = f′(x)^{-1} f″(x) f′(x)^{-1} f(x),

assuming that f′(x)^{-1} exists.

Based on these results, if we apply Newton's method (3) to the equation f(x) = x^{-1} − A, then one obtains the famous Newton iterative process (2).¹ In fact, Newton's iterative method for matrix inversion or Moore–Penrose generalized inversion has been covered in Refs. [14,23,29–31,33] and is attractive because of its very strong numerical stability, local quadratic convergence, and convenience for parallel implementation.

Similarly, Chebyshev’s method (6) allows us to obtain a sequence of approximations {N

m

} for calculating the inverse of a

matrix A, which was proposed by Amat [14]:

N

mþ1

¼N

m

ð3IAN

m

ð3IAN

m

ÞÞ;m¼0;1;...;ð11Þ

or equivalently

N

mþ1

¼ð3IN

m

Að3IN

m

AÞÞN

m

;m¼0;1;... ð12Þ
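A minimal sketch of iteration (11) in Python/NumPy (the test matrix and starting guess are illustrative; the paper's experiments use MATLAB). Each step cubes the residual E_m = I − A N_m exactly, as the proof of Theorem 3.1 shows, and the code can confirm this numerically:

```python
import numpy as np

def chebyshev_inverse(A, N0, iters):
    """Iteration (11): N_{m+1} = N_m (3I - A N_m (3I - A N_m))."""
    I = np.eye(A.shape[0])
    N = N0.copy()
    residuals = [np.linalg.norm(I - A @ N, 2)]
    for _ in range(iters):
        AN = A @ N
        N = N @ (3 * I - AN @ (3 * I - AN))
        residuals.append(np.linalg.norm(I - A @ N, 2))
    return N, residuals

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 1.0],
              [0.0, 1.0, 4.0]])          # illustrative symmetric test matrix
N0 = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
N, res = chebyshev_inverse(A, N0, 5)
# Here E_0 is symmetric, so its 2-norm equals its spectral radius and the
# norm itself is cubed per step; in general only ||E_{k+1}|| <= ||E_k||^3.
print(res[1], res[0] ** 3)
```

Five steps suffice to drive the residual to machine precision for this matrix.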

Though Chebyshev’s method has at least three order rate of convergence for nonlinear equations f(x) = 0, the paper [14] did

not give detail proof on approximating the matrix A

1

. Next, we will show that the above sequences (11) and (12) will con-

verge to A

1

with the same convergence rate under some conditions.

Theorem 3.1. Let A = [a_ij] be any nonsingular square matrix. If the initial approximation N_0 satisfies

  ‖E_0‖ := ‖I − A N_0‖ < 1,  (13)

then the iterative formula (11) is at least cubically convergent to A^{-1}.

Proof. The proof is mainly based on the ideas in Refs. [29,33]. Let E_k := I − A N_k. Then

  E_{k+1} = I − A N_{k+1} = I − A[N_k (3I − A N_k (3I − A N_k))]
          = I − A N_k [3I − A N_k (3I − A N_k)]
          = I − A N_k [3I − 3A N_k + (A N_k)²]
          = (I − A N_k)³ = E_k³.  (14)

Since ‖E_0‖ < 1, from (14) we have

  ‖E_{k+1}‖ ≤ ‖E_k‖³ ≤ ··· ≤ ‖E_0‖^{3^{k+1}} → 0, as k → ∞,  (15)

i.e.,

  I − A N_k → 0, as k → ∞,

and

  N_k → A^{-1}, as k → ∞.

Next, we prove that the order of convergence of the sequence {N_m} is at least 3. Let e_k denote the error matrix e_k := N_k − A^{-1}. Then from (11) we have

  A^{-1} + e_{k+1} = N_{k+1} = N_k (3I − A N_k (3I − A N_k))
                 = (A^{-1} + e_k)[3I − A(A^{-1} + e_k)(3I − A(A^{-1} + e_k))]
                 = (A^{-1} + e_k)(I − A e_k + (A e_k)²)
                 = A^{-1} + e_k (A e_k)².  (16)

¹ In some literature, let B be an approximate inverse of A, i.e., ‖I − BA‖ < 1; then A^{-1} = Σ_{i=0}^{∞} (I − BA)^i B, which also implies Newton's method (2).


i.e.,

  e_{k+1} = e_k (A e_k)².

So ‖e_{k+1}‖ ≤ ‖A‖² ‖e_k‖³, which indicates that the sequence {N_m} has at least third-order convergence. The proof is completed. □
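The identity e_{k+1} = e_k (A e_k)² is purely algebraic, so it can be confirmed numerically for a single step of (11) (a sketch in Python/NumPy; the random test matrix and starting guess are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))   # well-conditioned test matrix
Ainv = np.linalg.inv(A)
I = np.eye(4)

N0 = 0.9 * I                                        # a rough approximation of A^{-1}
N1 = N0 @ (3 * I - A @ N0 @ (3 * I - A @ N0))       # one step of (11)
e0, e1 = N0 - Ainv, N1 - Ainv
print(np.linalg.norm(e1 - e0 @ (A @ e0) @ (A @ e0)))  # ~0 up to round-off
```

The difference is at round-off level regardless of how good N_0 is, since (16) holds exactly.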

Remark 2.1. For Theorem 3.1 above, we give the following explanations:

1) From (14), one sees that condition (13) may be weakened. In fact, we only need the spectral radius of A N_0 to be less than one for the convergence of formula (11). In this case, the choice of N_0 may be guided by the estimates of the spectral radius ρ(A N_0) in Refs. [8–10] for some classes of matrices.

2) In addition, it should be mentioned that the matrix A N_m does not need to be formed explicitly, since A N_m can be computed from a sequence of matrix-by-vector products, see Ref. [22]. In some experiments, to further reduce the computational cost, we may exploit vector and parallel processors, see Section 6.

3) Finally, for the choice of N_0, there exist many different forms. We describe this problem in detail in Section 5.

Next, we give a property of Chebyshev's schemes (11) and (12), similar to Theorem 2.3 of [33]. This property shows that N_m (m ≥ 0) of (11) may be applied not only to the left preconditioned linear system N_m Ax = N_m b but also to the right preconditioned linear system A N_m y = b.

Theorem 3.2. Let A = [a_ij] be any nonsingular matrix. If A N_0 = N_0 A holds, then for the sequence {N_m} of (11) we have

  A N_k = N_k A  (17)

for all k = 1, 2, ....

Proof. First, since A N_0 = N_0 A, we have from (11) that

  A N_1 = A N_0 (3I − A N_0 (3I − A N_0))
        = N_0 A (3I − N_0 A (3I − N_0 A))
        = N_0 (3A − A N_0 (3A − A N_0 A))
        = N_0 (3I − A N_0 (3I − A N_0)) A
        = N_1 A.

That is, Eq. (17) holds for k = 1.

Next we use mathematical induction to prove Eq. (17). Suppose that A N_k = N_k A is true. Then a straightforward calculation using (11) shows that, for all k ≥ 1,

  A N_{k+1} = A N_k (3I − A N_k (3I − A N_k))
            = N_k A (3I − N_k A (3I − N_k A))
            = N_k (3A − A N_k (3A − A N_k A))
            = N_k (3I − A N_k (3I − A N_k)) A
            = N_{k+1} A,

i.e., A N_{k+1} = N_{k+1} A. Thus, the proof is completed. □
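Theorem 3.2 can be checked numerically: with N_0 = αI, which trivially commutes with A, every iterate of (11) keeps commuting with A (a sketch with an illustrative random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.eye(4) + 0.2 * rng.standard_normal((4, 4))
I = np.eye(4)
N = 0.5 * I                     # N0 = alpha*I, so A N0 = N0 A
for _ in range(3):
    AN = A @ N
    N = N @ (3 * I - AN @ (3 * I - AN))
print(np.linalg.norm(A @ N - N @ A))   # commutator stays ~0 at every step
```

Since each iterate is a polynomial in A when N_0 is, the commutator remains at round-off level.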

In addition, there also exist other analogous iterative processes. For example, for the midpoint rule method (9) and Homeier's method (10), we may obtain the following analogous iterations

  N_{m+1} = [I + (1/4)(I − N_m A)(3I − N_m A)²] N_m,  m = 0, 1, ...,  (18)

and

  N_{m+1} = N_m [I + (1/2)(I − A N_m)(I + (2I − A N_m)²)],  m = 0, 1, ...,  (19)

for computing the approximate inverse of a matrix, respectively. Moreover, Theorem 3.1 also holds for (18) and (19).
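Iterations (18) and (19) can be sketched the same way (Python/NumPy; matrix and starting guess are illustrative). A short calculation shows that one step of (18) maps the left residual F = I − N_m A to (1/4)F³(3I + F), and one step of (19) maps E = I − A N_m to (1/2)E³(I + E), so both are cubically convergent when the initial residual norm is below one:

```python
import numpy as np

def mm_step(A, N, I):
    """One step of (18), the midpoint-rule analogue."""
    F = I - N @ A
    return (I + 0.25 * F @ (3 * I - N @ A) @ (3 * I - N @ A)) @ N

def hm_step(A, N, I):
    """One step of (19), the Homeier analogue."""
    AN = A @ N
    return N @ (I + 0.5 * (I - AN) @ (I + (2 * I - AN) @ (2 * I - AN)))

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative SPD test matrix
I = np.eye(2)
N_mm = N_hm = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
for _ in range(6):
    N_mm, N_hm = mm_step(A, N_mm, I), hm_step(A, N_hm, I)
print(np.linalg.norm(I - A @ N_mm), np.linalg.norm(I - A @ N_hm))
```

Both residuals reach machine precision within six steps for this small matrix.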

Besides, as a general scheme, the literature [23,26] presented the following form:

  N_{m+1} = a N_m (A N_m)² + b N_m (A N_m) + c N_m + dI,  m = 0, 1, ...,

where a, b, c, d ∈ R satisfy some restricted conditions, to approximate the matrix inverse or the Moore–Penrose generalized inverse. Such schemes usually behave similarly to the third-order methods above [23,26].

Finally, we mention that not all methods of the previous section, such as (4), (7) and (8), can be used to compute the matrix inverse or the Moore–Penrose generalized inverse, since inverse operators appear in their expressions.

4. Comparisons of computational complexity

Now let us compare the computational complexity of the iterative processes (11) (or (12)), (18) and (19), since they are all at least cubically convergent to A^{-1} under the same conditions.

First, for Newton's and Chebyshev's methods, let m and n be the numbers of iterations required for the same output precision, respectively. According to their convergence orders, for the same initial value N_0, we have

  ‖E_0‖^{2^m} = ‖E_0‖^{3^n},

i.e.,

  m/n = ln 3 / ln 2 ≈ 1.585.

In addition, A N_m (or N_m A) in the expressions (2), (11), (18) and (19) needs to be computed only once per iterative step. Thus, we have the comparative results in Table 1, where n is the number of iterations.

Comparing these results, one can see that Chebyshev's method (11) reduces the computational complexity by using fewer basic operations, and leads to a better equilibrium between convergence rate and operational cost.
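The ratio m/n ≈ 1.585 is easy to confirm by counting steps; for instance, with ‖E_0‖ = 0.9 and a 1e-8 target (illustrative values of our own):

```python
import math

def iters_needed(order, e0, tol):
    """Smallest k with e0**(order**k) <= tol."""
    k, err = 0, e0
    while err > tol:
        err = err ** order
        k += 1
    return k

m = iters_needed(2, 0.9, 1e-8)   # Newton-type, quadratic
n = iters_needed(3, 0.9, 1e-8)   # Chebyshev-type, cubic
print(m, n, math.log(3) / math.log(2))  # 8 5 ~1.585
```

Newton's method needs 8 steps where Chebyshev's needs 5, matching m/n = ln 3 / ln 2 up to rounding.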

5. The choice of the initial value N_0

As is well known, the choice of the initial value N_0 for the iterative processes (2), (11), (18) and (19) is very important to ensure convergence. There exist many different forms for the initial value N_0.

First, note that the analysis of Newton’s method to calculate the matrix inversion or Moore–Penrose generalized inversion

is generally based on the singular value decomposition of A. For instance, Pan [26] assumed that the approximations N

m

share singular vectors with A

T

, and both the largest (

r

1

) and smallest singular values (

r

r

)ofAare available, then for general

matrices A, one chooses

N

0

¼2A

T

r

2

1

þ

r

2

r

;ð20Þ

which may yield the bound

kE

0

k

2

¼

r

2

1

r

2

r

r

2

1

þ

r

2

r

¼

j

2

1

j

2

þ1<1;

j

¼

r

1

r

r

:
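A small Python/NumPy check of choice (20) (with an illustrative random matrix): the 2-norm of E_0 matches (κ² − 1)/(κ² + 1) exactly, since I − A N_0 here has singular values |1 − 2σ_i²/(σ_1² + σ_r²)|:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
s = np.linalg.svd(A, compute_uv=False)          # singular values, descending
s1, sr = s[0], s[-1]
N0 = 2.0 * A.T / (s1**2 + sr**2)                # choice (20)
kappa = s1 / sr
bound = (kappa**2 - 1.0) / (kappa**2 + 1.0)
E0 = np.linalg.norm(np.eye(5) - A @ N0, 2)
print(E0, bound)                                # the two agree
```

The agreement holds for any nonsingular A, not just this example.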

However, if one uses Eq. (20) to compute approximate inverse preconditioners N_m, the cost is very high, even though the largest singular value σ_1 can be effectively approximated by the power method or (better) the Lanczos method, and σ_r can also be obtained by approximating the largest eigenvalue ν − σ_r² of the symmetric positive definite matrix νI − AA^T for ν > σ_1² (see [26]). Fortunately, Pan [26] also pointed out that almost the same upper bounds on ‖E_i‖_2 (i = 0, 1, ...) are obtained when σ_1 and σ_r are replaced by their upper estimates. Thus we may apply the lower dimension norm matrices in [12] to cheaply obtain such upper estimates in some cases.

In addition, even without approximating σ_1 and σ_r, we may still apply the processes (2) and (11) with other choices of N_0. In particular, we may easily compute

  N_0 = A^T / (‖A‖_1 ‖A‖_∞),  (21)

Table 1
Comparisons of the computational complexity for different algorithms at the same output precision.

  Algorithm        Matrix–matrix multiplications   Additions and subtractions
  Newton (2)       3.17n                           1.585n
  Chebyshev (11)   3n                              2n
  MM (18)          4n                              3n
  HM (19)          4n                              4n


to yield the bound

  ‖E_0‖_2 ≤ 1 − 1/(n κ²)

for any n×n matrix A, where ‖A‖_1 = max_j Σ_{i=1}^{n} |a_ij| and ‖A‖_∞ = max_i Σ_{j=1}^{n} |a_ij| (see [23]). Moreover, for an n×n symmetric positive definite matrix A, we may obtain the bound

  ‖E_0‖_2 ≤ 1 − 1/(√n κ)

by choosing N_0 = I/‖A‖_F as in [23].
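The bound for choice (21) can likewise be spot-checked (illustrative random matrix; Python/NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))
N0 = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))  # choice (21)
E0 = np.linalg.norm(np.eye(n) - A @ N0, 2)
s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]
print(E0, 1.0 - 1.0 / (n * kappa**2))   # E0 does not exceed the bound
```

Since ‖A‖_1 ‖A‖_∞ ≥ σ_1² and ‖A‖_1 ‖A‖_∞ ≤ n σ_1², the residual norm always stays below 1 − 1/(nκ²) < 1.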

Note that Eqs. (20) and (21) involve the matrix A^T, so it is expensive to compute N_0 A. For some specific matrices, we hope that N_0 is sparser than A^T. For the matrix splitting A = N − P, we have I − N^{-1}A = N^{-1}P. Therefore, generally speaking, N_0 may be obtained from a convergent splitting of A (i.e., let N_0 = N^{-1}), such as the stair splitting (see [21]), regular splitting, H-splitting (see [3]), etc. In addition, according to the Householder–John theorem, for any symmetric positive definite matrix A, we may let N_0 = P^{-1}, where P is any matrix such that P + P^T − A is symmetric positive definite.

Recently, Moriya [22] also proposed the diagonal matrix

  N_0 = diag(1/a_11, 1/a_22, ..., 1/a_nn)  (22)

as the initial approximation, where a_ii is the ith diagonal entry of A, since condition (13) is satisfied when A is a strictly diagonally dominant (SD) matrix. However, the numerical results reported in Ref. [22] show that the sequence {N_m} of (2) also performs well even if Eq. (22) is used to obtain N_0 when A ∉ SD. Noting that for any H-matrix [1] there exists a diagonal matrix D such that AD ∈ SD (see [7,20]), we may in fact select a diagonal matrix as the initial approximation whenever A is an H-matrix.
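For a strictly diagonally dominant matrix, choice (22) immediately gives a contractive left residual I − N_0 A (whose spectral radius equals that of I − A N_0, so condition (13) effectively holds); e.g., with an illustrative matrix:

```python
import numpy as np

A = np.array([[10.0, 2.0, 1.0],
              [1.0, 8.0, 2.0],
              [2.0, 1.0, 9.0]])              # strictly diagonally dominant
N0 = np.diag(1.0 / np.diag(A))               # choice (22)
# Row i of I - N0 A sums to sum_{j != i} |a_ij| / a_ii < 1 by dominance.
print(np.linalg.norm(np.eye(3) - N0 @ A, np.inf))  # 0.375 for this matrix
```

The ∞-norm of the residual here is max(3/10, 3/8, 3/9) = 0.375, strictly below one as predicted.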

In addition, if it is very difficult to find the initial approximation N_0 for a certain matrix A, then we may also choose αI (α ∈ R) as the initial approximation (see [23]):

Corollary 5.1. If ‖I − αA‖ < 1, where α is a scalar, then the iterative procedure (11) for determining A^{-1} converges for N_0 = αI.

Proof. It is obvious by Theorem 3.1. □

Finally, for some ill-conditioned matrices A, we may apply the homotopic (continuation) approach [24] to decrease the residual norms, or add a matrix P (a preconditioner, having a smaller rank and/or structured) to the input matrix A to obtain an additive modification C with a smaller condition number (i.e., additive preprocessing, see [25]). In addition, we may also transform the linear system Ax = b into (A + Ã) d_{k+1} = r^{(k)} (where d_{k+1} = x_{k+1} − x_k, r^{(k)} = ω(b − A x_k)) by the strongly implicit method [32]:

Algorithm 1: The strongly implicit method [32]
(1) Choose any initial value x_0;
(2) Compute r^{(k)} = ω(b − A x_k);
(3) Solve (A + Ã) d_{k+1} = r^{(k)};
(4) Compute x_{k+1} = x_k + d_{k+1};
(5) If ‖d_{k+1}‖ < ε, then x_{k+1} is an approximate solution of Ax = b; otherwise, repeat steps (2)–(5).
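A sketch of Algorithm 1 in Python/NumPy. The choice A + Ã = diag(A) and ω = 1 below is purely illustrative (it reduces the scheme to a Jacobi-type correction); the paper does not prescribe this Ã:

```python
import numpy as np

def residual_correction(A, b, x0, omega=1.0, tol=1e-10, maxit=200):
    """Algorithm 1 sketch: repeatedly solve (A + Ã) d = ω(b - A x)."""
    M = np.diag(np.diag(A))           # illustrative easily-solvable A + Ã
    x = x0.copy()
    for _ in range(maxit):
        r = omega * (b - A @ x)       # step (2)
        d = np.linalg.solve(M, r)     # step (3)
        x = x + d                     # step (4)
        if np.linalg.norm(d) < tol:   # step (5)
            break
    return x

A = np.array([[10.0, 2.0, 1.0],
              [1.0, 8.0, 2.0],
              [2.0, 1.0, 9.0]])
b = A @ np.ones(3)
x = residual_correction(A, b, np.zeros(3))
print(x)  # converges to the exact solution (1, 1, 1)
```

For this diagonally dominant matrix the correction loop converges; in general the quality of A + Ã governs the convergence rate.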

Next, let us use some numerical experiments to examine the effectiveness of these schemes for the preconditioned BiCGSTAB Krylov subspace method [28].

6. Numerical experiments

As mentioned in Ref. [23], if floating-point operations are the measure of cost, Newton's method (2) is more expensive than the conventional alternatives. The reason for our interest in this method is that (2) essentially amounts to matrix multiplications, which can be implemented very efficiently on systolic arrays and on vector and parallel computers.

Next, to further reduce the computational cost, we propose a new implementation of Chebyshev's scheme (11), which makes the iterative solver use the matrix–vector product W_m := N_m v (where v is the object vector) instead of N_m. Moreover, the vectors W_m (m = 1, 2, ...) are computed using only N_{m−1}, A and W_{m−1}:

  W_0 = N_0 v,
  W_1 = N_0 (3I − A N_0 (3I − A N_0)) v = 3W_0 − N_0 A (3W_0 − N_0 A W_0),
  W_2 = N_1 (3I − A N_1 (3I − A N_1)) v = 3W_1 − N_1 (3I − A N_1) A W_1.
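The identity behind this implementation is easy to verify (Python/NumPy sketch with illustrative data): the matrix-free W_1 coincides with N_1 v computed explicitly.

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
I = np.eye(4)
N0 = 0.8 * I
v = rng.standard_normal(4)

W0 = N0 @ v
# Matrix-free form: only matrix-vector products with A and N0 are needed.
W1 = 3 * W0 - N0 @ (A @ (3 * W0 - N0 @ (A @ W0)))
N1 = N0 @ (3 * I - A @ N0 @ (3 * I - A @ N0))   # explicit iterate of (11)
print(np.linalg.norm(W1 - N1 @ v))              # ~0: the two agree
```

Grouping the products this way replaces one matrix–matrix multiplication per step with a few matrix–vector products.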


In addition, Newton’s iteration begins with (21) and (22) in Ref. [4,22,23,29], respectively. They show that these initial

approximations perform well in most cases. Recently, we ﬁnd that the stair matrix ([21]) is also a very good choice in some

cases, see [11]. In this paper, we mainly use the stair matrix as an initial approximation. Next, let us recall stair matrices and

their properties introduced in the ﬁrst part of [21]. All notations are the same as those of Ref. [21].

Definition 6.1. An n×n tridiagonal matrix A = tridiag(a_{i,i−1}, a_{i,i}, a_{i,i+1}) is called a stair matrix if one of the following conditions is satisfied:

(I) a_{i,i−1} = 0, a_{i,i+1} = 0, i = 1, 3, ..., 2⌊(n−1)/2⌋ + 1;
(II) a_{i,i−1} = 0, a_{i,i+1} = 0, i = 2, 4, ..., 2⌊n/2⌋.

A stair matrix is of type I if condition (I) is satisfied and of type II if condition (II) holds.

And a stair matrix is of type I if condition I is satisﬁed and is of type II if condition II holds.

Generally speaking, the inverse of a nonsingular stair matrix is also a stair matrix, and the inverse of a stair matrix may be

obtained by the following parallel algorithm:

Algorithm 2:([21]). Let A= stair (a

i,i1

,a

i,i

,a

i,i+1

) be a nonsingular stair matrix and A

1

= stair (b

i,i1

,b

i,i

,b

i,i+1

).

If (Ais of the type I)

for i= 1:1:n

b

i;i

¼a

1

i;i

endfor i

for i¼2:2:2

n

2

b

i;j

¼a

1

i;i

a

i;j

a

1

j;j

,j=i1,i+1

endfor i

endif

If (Ais of the type II)

for i= 1:1:n

b

i;i

¼a

1

i;i

endfor i

for i¼1:2:2

n1

2

þ1

b

i;j

¼a

1

i;i

a

i;j

a

1

j;j

,j=i1,i+1

endfor i

endif

where b_{i,i} = 0 if i < 1 or i > n. A remarkable feature of Algorithm 2 is its high parallelism. For example, if A is a stair matrix of type I, the computations of b_{i,i} can first be carried out by different processors for all i at the same time, and then the b_{i,j} are computed in parallel for even i. Thus, if n is reasonably large, we achieve fair parallelism.
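A Python sketch of Algorithm 2 for a type-I stair matrix (the 5×5 example is our own; note the minus sign in b_{i,j} = −a_{i,i}^{-1} a_{i,j} a_{j,j}^{-1}):

```python
import numpy as np

def stair_inverse_type1(A):
    """Algorithm 2 for a type-I stair matrix (odd rows, 1-based, are diagonal).
    Both loops are embarrassingly parallel across i."""
    n = A.shape[0]
    B = np.zeros_like(A)
    for i in range(n):                      # b_{i,i} = a_{i,i}^{-1}
        B[i, i] = 1.0 / A[i, i]
    for i in range(1, n, 2):                # even rows in 1-based indexing
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                B[i, j] = -A[i, j] / (A[i, i] * A[j, j])
    return B

# 5x5 type-I stair matrix: off-diagonal entries only in (1-based) rows 2 and 4.
A = np.diag([3.0, 4.0, 5.0, 4.0, 3.0])
A[1, 0] = A[1, 2] = 1.0
A[3, 2] = A[3, 4] = 1.0
B = stair_inverse_type1(A)
print(np.linalg.norm(B @ A - np.eye(5)))    # ~0: B is exactly A^{-1}
```

Each b entry depends only on a handful of entries of A, which is what makes the algorithm attractive on vector and parallel processors.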

In this section, the algorithms discussed above are tested for different matrices, using the MATLAB 6.5 software² and the BiCGSTAB Krylov subspace method. The machine we used is a PC Pentium (R) 4, CPU 1.70 GHz, RAM 256 M. In all of our runs, we used the zero vector as the initial approximate solution, and the right-hand side vector b is created artificially using b = A(1, 1, ..., 1)^T. Unless otherwise stated, BiCGSTAB is used with right preconditioning. The iterative process ends when the residual satisfies

  ‖r^{(k)}‖_2 / ‖r^{(0)}‖_2 ≤ 1.0 × 10^{-8},  (23)

where r^{(k)} is the residual vector after the kth iteration. In addition, we mainly study the case m < 3 since, in most cases, the performances of N_2 and N_3 are almost the same; we therefore do not always benefit from increasing m.

Example 6.1 [6]. Let us consider the linear system of the form

  Ax = b,  x, b ∈ R^n,  (24)

where the matrix A arises from the five-point discretization of the following second-order elliptic partial differential equation:

  −∂/∂x(a ∂u/∂x) − ∂/∂y(b ∂u/∂y) + ∂/∂x(cu) + ∂/∂y(du) + fu = 0  (25)

with a(x,y) > 0, b(x,y) > 0, c(x,y), d(x,y) and f(x,y) defined on the unit square region Ω = (0,1) × (0,1), and the Dirichlet boundary condition u(x,y) = 0 on ∂Ω, where ∂Ω denotes the boundary of Ω.

² Matrix computations in MATLAB 6.5 are based on LAPACK and the optimized Basic Linear Algebra Subroutines (BLAS) on all MATLAB platforms, which speeds up matrix multiplication and the LAPACK routines themselves; see the MATLAB user manual.


Now, we consider Eq. (25) with a(x,y) = b(x,y) = 1, c(x,y) = cos(x/6), d(x,y) = sin(y/6) and f(x,y) = 1. We use four uniform meshes of n_s = n_b = 1/11, 1/21, 1/31 and 1/41, which lead to four matrices of order n = 10×10 = 100, 20×20 = 400, 30×30 = 900 and 40×40 = 1600, where n_s and n_b refer to the mesh sizes in the x-direction and y-direction, respectively.

Table 2 shows the cost of computing the preconditioners N_m in explicit form for Eqs. (2) and (11) when N_0 = [diag(A)]^{-1}, where "0.0000" means that only negligible time is needed.

We see that Chebyshev's scheme (11) is more expensive per preconditioner than Newton's (2), but the following experiments show that Chebyshev's scheme (11) is more effective than Newton's (2) in total time consumed.

Now, we compare the two schemes (2) and (11) with different initial approximations. Comparisons are made, for the same method, between the diagonal matrix of Eq. (22) and the stair matrix Stair(A).

Table 3 presents the computation time and iteration counts of the BiCGSTAB algorithm required to satisfy condition (23). "N + diag(A)/Stair(A) + N_l" and "M-N + diag(A)/Stair(A) + N_l" denote the BiCGSTAB algorithms based on the proposed

Table 2
CPU time(s) for computing preconditioners N_m in explicit form when N_0 = [diag(A)]^{-1}.

  Algorithm   Size (n)   N_1      N_2
  Newton      100        0.0000   0.0000
  Newton      400        0.0000   0.0100
  Newton      900        0.0000   0.0200
  Newton      1600       0.0100   0.0200
  Chebyshev   100        0.0000   0.0100
  Chebyshev   400        0.0100   0.0500
  Chebyshev   900        0.0100   0.1200
  Chebyshev   1600       0.0200   0.1910

Table 3
The computation time(s) and iterations of the BiCGSTAB algorithm (Time: computation time, Iter: iterations).

  Algorithm              n = 100        n = 400        n = 900         n = 1600
                         Time    Iter   Time    Iter   Time     Iter   Time      Iter
  N + diag(A) + N_1      0.0780  11     3.2650  20     40.1410  28     240.3280  37.5
  N + Stair(A) + N_1     0.0780  10.5   3.4530  21     46.1250  34.5   241.7340  40.5
  M-N + diag(A) + N_1    0.0940  12     3.3910  20.5   43.9380  33     262.5000  43.5
  M-N + Stair(A) + N_1   0.0790  9      2.2190  16     34.2660  24     53.3600   8.5
  N + diag(A) + N_2      0.0620  8      2.0160  14     27.0160  20.5   158.9530  26.5
  N + Stair(A) + N_2     0.0580  7      2.3120  14     27.7190  20.5   162.9060  27
  M-N + diag(A) + N_2    0.0470  6      1.8600  12.5   23.7180  18     140.0310  23.5
  M-N + Stair(A) + N_2   0.0310  4.5    1.1710  8.5    18.9530  12.5   98.6250   16

[Fig. 1. Patterns of the matrix in Example 6.2 (left, nz = 460) and its stair matrix (right, nz = 190).]

implementations (2) and (11), respectively. The symbols "N" and "M-N" mean that Newton's scheme (2) and Chebyshev's scheme (11) are used, respectively.

The numerical results reported in Table 3 illustrate that the initial approximation N_0 has little effect on the computation time and iteration count of the BiCGSTAB algorithm in the case of Newton's scheme (2), while it does have an effect in the case of Chebyshev's scheme (11). As can be seen, the application of Chebyshev's scheme greatly improves the convergence rate and so reduces the number of iterations in most cases. These numerical experiments also show that the "M-N + Stair(A) + N_2" algorithm is the best among all the algorithms.

Example 6.2 [6]. Let us consider the linear system (24), where A arises from (25) with a(x,y) = 1, b(x,y) = 1, c(x,y) = 10(x+y), d(x,y) = 10(x−y), and f(x,y) = 0 defined on the unit square region Ω = (0,1) × (0,1), with Dirichlet boundary condition u(x,y) = 0 on ∂Ω. The structure of the resulting matrix A and its stair matrix are plotted in Fig. 1 for n = 50.

Table 4 shows the computation cost of selecting N_m (m = 1, 2). We choose the stair matrix instead of Eq. (22) as the initial approximation. The results are presented in Figs. 2 and 3, which illustrate the behavior of the residual norm of the preconditioned BiCGSTAB algorithm in the case n = 30×30. These numerical results show that, for any residual norm, the preconditioned BiCGSTAB algorithm using (11) works better than the one using (2).

In many applications, N_m may be too large to be used inexpensively, and therefore further research is needed. Generally speaking, we can exploit the sparsity or other properties of A to reduce the cost. For instance, a dropping strategy based on some given threshold tolerance may be adopted:

  N_{m+1} = α_m J ∘ [N_m (3I − A N_m (3I − A N_m))],  m = 0, 1, ...,

where α_m ∈ [0,2] is an acceleration parameter, J is a sparse matrix with elements in {0, 1}, and "∘" denotes the Hadamard (elementwise) product. It is hoped that the question can be resolved with this approach. However, whether there exist other, better modified forms of Newton's scheme remains an interesting problem.
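A sketch of one dropped (masked) step in Python/NumPy (the pattern J equal to the sparsity pattern of A, the test matrix, and α_m = 1 are illustrative choices of our own, not prescriptions from the paper):

```python
import numpy as np

def dropped_chebyshev_step(A, N, J, alpha=1.0):
    """alpha_m * J ∘ [N_m (3I - A N_m (3I - A N_m))]; the final '*' is the
    Hadamard (elementwise) product with the 0/1 pattern J."""
    I = np.eye(A.shape[0])
    AN = A @ N
    return alpha * J * (N @ (3 * I - AN @ (3 * I - AN)))

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 1.0],
              [0.0, 1.0, 4.0]])
J = (A != 0).astype(float)          # keep only the sparsity pattern of A
N = np.diag(1.0 / np.diag(A))       # start from choice (22)
for _ in range(3):
    N = dropped_chebyshev_step(A, N, J)
print(np.linalg.norm(np.eye(3) - A @ N, 2))  # small: still a usable preconditioner
```

The dropped entries put a floor on the attainable residual, but for this matrix it stays well below one, so the sparsified N remains a useful preconditioner.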

Table 4
CPU time(s) for computing preconditioners N_m in explicit form when N_0 = Stair(A)^{-1}.

  Algorithm   N_1      N_2
  Newton      0.0300   0.0700
  Chebyshev   0.0400   0.2910

[Fig. 2. The convergence history of the residual norm of the preconditioned BiCGSTAB algorithm (n = 900): residual norm vs. time(s), for Newton + N_1, Newton + N_2, Modified + N_1 and Modified + N_2 with BiCGSTAB.]

7. Conclusions

As was shown in Ref. [23], Newton's iteration is simple to describe and to analyze, and is strongly numerically stable for nonsingular input matrices. It essentially amounts to matrix multiplications, which can be very efficiently implemented

on systolic arrays and on vector and parallel computers. In addition, further savings are possible (see [23]): if A is sparse, then N_m A or A N_m is less costly to compute; in the case of Toeplitz and many other structured matrices, only the product of A with some vectors is required, and we do not need to form A explicitly, which is convenient in many applications.

In this paper, we study Chebyshev's iteration (11). Chebyshev's scheme has a form similar to Newton's (2), but it has a better convergence rate. In addition, the stair matrix, instead of Eq. (22), is selected as the initial approximation for the iterations (2) and (11) in our proposed implementation. As a result, the total time consumed by Chebyshev's iteration (11) is less than that of the classical Newton iteration (2) in the construction of approximate inverse preconditioners.

In addition, it is worth mentioning that other recent applications of Newton's method, such as homotopic residual correction, the preservation of intermediate matrix structure and the computation of √A, are given in [35,36], which shows that Newton's method remains interesting; we will continue to study it in the future.

Acknowledgements

The authors sincerely thank Dr. Melvin Scott and the reviewers for their valuable comments and suggestions on the early manuscript of this paper, which led to a substantial improvement in its presentation and contents.

References

[1] O. Axelsson, Iterative Solution Methods, Cambridge University Press, New York, 1994.

[2] J.F. Traub, Iterative Methods for the Solution of Equations, Prentice-Hall, Englewood Cliffs, NJ, 1964.

[3] L. Elsner, Comparisons of weak regular splittings and multisplitting methods, Numer. Math. 56 (1989) 283–289.

[4] L. Grosz, Preconditioning by incomplete block elimination, Numer. Linear Algebra Appl. 7 (7-8) (2000) 527–541.

[5] T. Huckle, Approximate sparsity patterns for the inverse of a matrix and preconditioning, Appl. Numer. Math. 30 (1999) 291–303.

[6] S.W. Kim, J.H. Yun, Block ILU preconditioners for a block-tridiagonal H-matrix, Linear Algebra Appl. 317 (2000) 103–125.

[7] H.B. Li, T.Z. Huang, On a new criterion for the H-matrix property, Appl. Math. Lett. 19 (10) (2006) 1134–1142.

[8] H.B. Li, T.Z. Huang, H. Li, An improvement on a new upper bound for moduli of eigenvalues of iterative matrices, Appl. Math. Comput. 173 (2006)

977–984.

[9] H.B. Li, T.Z. Huang, et al, A new upper bound for spectral radius of block iterative matrices, J. Comput. Inform. Syst. 1 (2005) 595–599.

[10] H.B. Li, T.Z. Huang, H. Li, On some subclasses of P-matrices, Numer. Lin. Alg. Appl. 14 (5) (2007) 391–405.

[11] H.B. Li, T.Z. Huang, Y. Zhang, et al, On some new approximate factorization methods for block tridiagonal matrices suitable for vector and parallel

processors, Math. Comput. Simul. 79 (2009) 2135–2147.

[12] H.B. Li, T.Z. Huang, X.P. Liu, H. Li, Singularity, Wielandt’s lemma and singular values, J. Comput. Appl. Math. 234 (2010) 2943–2952.

[13] S. Amat, S. Busquier, Third-order iterative methods under Kantorovich conditions, J. Math. Anal. Appl. 336 (1) (2007) 243–261.

[14] S. Amat, S. Busquier, J.M. Gutiérrez, Geometric constructions of iterative functions to solve nonlinear equations, J. Comput. Appl. Math. 157 (1) (2003)

197–205.

[15] J.M. Gutiérrez, M.A. Hernández, N. Romero, A note on a modiﬁcation of Moser’s method, J. Complex. 24 (2008) 185–197.

[16] J.A. Ezquerro, M.A. Hernández, The Ulm method under mild differentiability conditions, Numer. Math. 109 (2) (2008) 193–207.

[17] S. Amat, M.A. Hernández, N. Romero, A modiﬁed Chebyshev’s iterative method with at least sixth order of convergence, Appl. Math. Comput. 206

(2008) 164–174.

[18] S. Ulm, On iterative methods with successive approximation of the inverse operator (in Russian), Izv. Akad. Nauk Est. SSR 16 (1967) 403–411.

[Figure 3 omitted in this copy: plot of residual norm versus iteration number for the curves Newton+N1+BiCGSTAB, Newton+N2+BiCGSTAB, Modified+N1+BiCGSTAB and Modified+N2+BiCGSTAB.]

Fig. 3. The convergence history of the residual norm of the preconditioned BiCGSTAB algorithm (n = 900): (b) residual norm vs. iterations.


[19] O.H. Hald, On a Newton-Moser type method, Numer. Math. 23 (1975) 411–425.

[20] W.M. Lioen, On the diagonal approximation of full matrices, J. Comput. Appl. Math. 75 (1996) 35–42.

[21] H. Lu, Stair matrices and their generalizations with applications to iterative methods I: A generalization of the successive overrelaxation method, SIAM

J. Numer. Anal. 37 (1) (2001) 1–17.

[22] K. Moriya, T. Noderab, A new scheme of computing the approximate inverse preconditioner for the reduced linear systems, J. Comput. Appl. Math. 199

(2007) 345–352.

[23] V.Y. Pan, R. Schreiber, An improved Newton iteration for the generalized inverse of a matrix with applications, SIAM J. Sci. Stat. Comput. 12 (5) (1991)

1109–1130.

[24] V.Y. Pan, New homotopic/factorization and symmetrization techniques for Newton’s and Newton/structured iteration, Comput. Math. Appl. 54 (2007)

721–729.

[25] V.Y. Pan, D. Ivolgin, B. Murphy, et al, Additive preconditioning and aggregation in matrix computations, Comput. Math. Appl. 55 (2008) 1870–1886.

[26] G. Codevico, V.Y. Pan, M.V. Barel, Newton-like iteration based on a cubic polynomial for structured matrices, Numer. Algorithms 36 (2004) 365–380.

[27] G.M. Phillips, P.J. Taylor, Theory and Applications of Numerical Analysis, Academic Press, 1980.

[28] Y. Saad, Iterative Methods for Sparse Linear Systems, second ed., PWS Publishing Company, Boston, MA, 2000.

[29] H. Saberi Najaﬁ, M. Shams Solary, Computational algorithms for computing the inverse of a square matrix, quasi-inverse of a non-square matrix and

block matrices, Appl. Math. Comput. 183 (2006) 539–550.

[30] G. Schulz, Iterative Berechnung der Reziproken matrix, Z. Angew. Math. Mech. 13 (1933) 57–59.

[31] A. Ben-Israel, D. Cohen, On iterative computation of generalized inverses and associated projections, SIAM J. Numer. Anal. 3 (1966) 410–419.

[32] S.G. Rubin, P.K. Khosla, Navier-Stokes calculations with a coupled strongly implicit method-I: Finite-difference solutions, Comput. Fluids 9 (2) (1981)

163–180.

[33] X.Y. Wu, A note on computational algorithm for the inverse of a square matrix, Appl. Math. Comput. 187 (2) (2007) 962–964.

[34] C. Chun, A geometric construction of iterative functions of order three to solve nonlinear equations, Comput. Math. Appl. 53 (2007) 972–976.

[35] V.Y. Pan, M. Kunin, R.E. Rosholt, H. Kodal, Homotopic residual correction processes, Math. Comput. 75 (253) (2006) 345–368.

[36] W. Hackbusch, B.N. Khoromskij, E.E. Tyrtyshnikov, Approximate iterations for structured matrices, Numer. Math. 109 (2008) 365–383.
