
Computational Statistics and Data Analysis 53 (2008) 477–485


Computational Statistics and Data Analysis

journal homepage: www.elsevier.com/locate/csda

Selection between Weibull and lognormal distributions: A comparative simulation study

Jin Seon Kim, Bong-Jin Yum∗

Department of Industrial Engineering, Korea Advanced Institute of Science and Technology, 373-1, Guseong-Dong, Yuseong-Gu, Daejeon, 305-701, Republic of Korea

Article info

Article history:

Received 15 April 2008

Received in revised form 7 August 2008

Accepted 15 August 2008

Available online 20 August 2008

Abstract

How to select the correct distribution for a given set of data is an important issue, especially

when the tail probabilities are of interest as in lifetime data analysis. The Weibull and

lognormal distributions are assumed most often in analyzing lifetime data, and in many

cases, they are competing with each other. In addition, lifetime data are usually censored

due to the constraint on the amount of testing time. A literature review reveals that little

attention has been paid to the selection problems for the case of censored samples. In

this article, relative performances of the two selection procedures, namely, the maximized

likelihood and scale invariant procedures are compared for selecting between the Weibull

and lognormal distributions for the cases of not only complete but also censored samples.

Monte Carlo simulation experiments are conducted for various combinations of the

censoring rate and sample size, and the performance of each procedure is evaluated in

terms of the probability of correct selection (PCS) and average error rate. Then, previously

unknown behaviors and relative performances of the two procedures are summarized.

Computational results suggest that the maximized likelihood procedure can be generally

recommended for censored as well as complete sample cases.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Choosing the correct or best-fitting distribution for a given set of data is an important issue, especially when the tail

probability, which is usually sensitive to the assumed model, is of interest as in reliability engineering. Statistical methods

for distribution choice include probability plotting (Nelson, 1982), goodness-of-fit (GOF) testing, hypothesis testing (HT), and

selection procedures. Probability plotting provides useful information concerning the right distribution. However, it can be

subjective and may yield multiple plots which appear to be equally adequate. Concerning the usual GOF and HT approaches,

one may prefer the latter when one may not wish to reject the distribution in the null hypothesis without an alternative

to take its place. However, the HT approach treats the two distributions asymmetrically, while selection procedures put

candidate distributions on equal footings and can deal with more than two distributions at the same time.

Many authors developed selection procedures with respective selection statistics and decision rules. The two well-known

procedures are those that are respectively based on the maximized likelihood function (MLF) and scale invariant (SI)

selection statistic. For the MLF-based procedure, the reader is referred to Bain and Engelhardt (1980), Kappenman (1982),

Gupta and Kundu (2003, 2004), Kundu and Manglick (2004), Kundu et al. (2005), Strupczewski et al. (2006), and Kundu and

Raqab (2007), among others. The SI selection procedure was developed in Quesenberry and Kent (1982) for selecting among

the exponential, Weibull, lognormal, and gamma distributions. In both procedures, the distribution with the largest value of

the selection statistic is selected. For other selection procedures, the reader is referred to Croes et al. (1998), Marshall et al.

∗Corresponding author. Tel.: +82 42 869 3116; fax: +82 42 869 3110.

E-mail address: bjyum@kaist.ac.kr (B.-J. Yum).

0167-9473/$ – see front matter © 2008 Elsevier B.V. All rights reserved.

doi:10.1016/j.csda.2008.08.012


(2001), Cain (2002), Dick (2004) and Mitosek et al. (2006), among others. In addition, Bayesian selection procedures were

developed by Kim et al. (2000), Upadhyay and Peshwani (2003), Araújo and Pereira (2007), etc.

Several authors compared the relative performance of selection procedures. Siswadi and Quesenberry (1982) compared

the SI, scale-shape invariant (SSI), and MLF procedures when complete data are available and the SI and MLF procedures

when the data are Type-I censored. For other comparative studies, the reader is referred to Kappenman (1989), Pandey et al.

(1991), Lee and Pope (2006), Mitosek et al. (2006), and Basu et al. (2008).

A review of the related literature reveals that little attention has been paid to the selection problems for the case of

censoring. Exceptions include Siswadi and Quesenberry (1982) for Type-I censored samples, Croes et al. (1998), Cain (2002)

and Kim et al. (2000) for Type-II censored samples, and Block and Leemis (2008) for randomly right-censored samples.

In this paper, the ratio of the maximized likelihoods (RML) and SI procedures are compared for discriminating between

the Weibull and lognormal distributions for the cases of complete, Type-I censored, and Type-II censored samples (The RML

procedure is equivalent to the MLF procedure in the case of two distributions). The Weibull and lognormal distributions

are most often assumed and competing with each other in analyzing lifetime data (e.g., see Bain (1978), Chen et al. (1987)

and Prendergast et al. (2005) for Time Dependent Dielectric Breakdown (TDDB) data, and Lloyd (1979) and Pinto (1991)

for electromigration lifetime data). In such cases, it is highly desirable to have a means of discriminating between the two

distributions since there is a significant difference in the low percentiles which are of interest in lifetime data analysis.

In addition, lifetime data are usually censored due to the constraint on the amount of testing time, which necessitates an

extensive comparative study of selection procedures for the case of censoring. Siswadi and Quesenberry (1982) compared

the SI and MLF procedures for Type-I censored samples, but only considered the case where the sample size is 30 and the

expected censoring rate is 10%. As can be seen in the simulation results in Section 3, considering some specific cases only

could be misleading in understanding the whole behaviors of the two procedures. In this paper, the performances of the

RML and SI procedures are evaluated in terms of the probability of correct selection (PCS) for various simulated data sets

with different sample sizes and censoring rates.

The rest of this article is organized as follows. In Section 2, the two procedures are described in detail, and Section 3

shows the comparison results obtained from the simulation experiments. Finally, conclusions and guidelines are presented

in Section 4.

2. RML and SI procedures for discriminating between Weibull and lognormal distributions

The probability density functions (pdfs) of the Weibull and lognormal distributions are respectively given as follows.

\[
W(\eta,\beta):\quad f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]
\]

where η > 0 and β > 0 are the scale and shape parameters, respectively, and

\[
LN(\theta,\sigma):\quad g(t) = \frac{1}{\sqrt{2\pi}\,\sigma t}\exp\left\{-\frac{[\ln(t/\theta)]^{2}}{2\sigma^{2}}\right\}
\]

where θ > 0 and σ > 0 are the scale and shape parameters, respectively.
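As a quick sanity check on the parameterization used for W(η, β) and LN(θ, σ), the two densities can be coded directly and integrated numerically. The following is a minimal sketch using only the Python standard library; the function names and the crude midpoint integrator are illustrative assumptions, not part of the original study.

```python
import math

def weibull_pdf(t, eta, beta):
    """Density of W(eta, beta) at t > 0 (eta: scale, beta: shape)."""
    z = t / eta
    return (beta / eta) * z ** (beta - 1.0) * math.exp(-(z ** beta))

def lognormal_pdf(t, theta, sigma):
    """Density of LN(theta, sigma) at t > 0 (theta: scale, sigma: shape)."""
    z = math.log(t / theta) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)

def integrate(f, a, b, steps=200_000):
    """Plain midpoint rule on (a, b); adequate for a sanity check only."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Both densities should integrate to approximately 1 over a wide enough range.
print(integrate(lambda t: weibull_pdf(t, 2.0, 1.5), 0.0, 40.0))
print(integrate(lambda t: lognormal_pdf(t, 2.0, 0.5), 0.0, 60.0))
```

Note that θ enters the lognormal density as a scale parameter, so ln T has mean ln θ and standard deviation σ.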

2.1. Procedure RML: Maximized likelihood function approach

For the case of complete data, the logarithm of the MLF for the Weibull distribution (i.e., ln L_W) and that for the lognormal distribution (i.e., ln L_LN) are given as in Eqs. (1) and (2), respectively.

\[
\ln L_W = n\ln\hat\beta - n\hat\beta\ln\hat\eta + (\hat\beta-1)\sum_{i=1}^{n}\ln t_i - \sum_{i=1}^{n}\left(t_i/\hat\eta\right)^{\hat\beta} \tag{1}
\]

\[
\ln L_{LN} = -n\ln\hat\sigma - \frac{n}{2}\ln 2\pi - \sum_{i=1}^{n}\ln t_i - \frac{1}{2\hat\sigma^{2}}\sum_{i=1}^{n}\left(\ln t_i - \ln\hat\theta\right)^{2} \tag{2}
\]

where η̂, β̂, θ̂ and σ̂ are the maximum likelihood estimators (MLEs) of η, β, θ and σ, respectively.

For the case of Type-I censoring, let t_1, t_2, ..., t_r be the ordered failure times observed until the censoring time t_c. Then, the logarithm of the MLF for each distribution can be derived as follows except for the common term, ln[n!/(n − r)!] (Siswadi and Quesenberry, 1982).

\[
\ln L_W = r\ln\hat\beta - r\hat\beta\ln\hat\eta + (\hat\beta-1)\sum_{i=1}^{r}\ln t_i - \sum_{i=1}^{r}\left(t_i/\hat\eta\right)^{\hat\beta} - (n-r)\left(t_c/\hat\eta\right)^{\hat\beta} \tag{3}
\]

\[
\ln L_{LN} = -r\ln\hat\sigma - \frac{r}{2}\ln 2\pi - \sum_{i=1}^{r}\ln t_i - \frac{1}{2\hat\sigma^{2}}\sum_{i=1}^{r}\left(\ln t_i - \ln\hat\theta\right)^{2} + (n-r)\ln\left[1-\Phi\left(\frac{\ln t_c - \ln\hat\theta}{\hat\sigma}\right)\right] \tag{4}
\]

where Φ is the cumulative distribution function (cdf) of the standard normal distribution.


It can be easily shown that the MLFs for the case of Type-II censoring are obtained by replacing the censoring time t_c in

Eqs. (3) and (4) with the rth failure time t_r.
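The censored log-likelihoods in Eqs. (3) and (4) can be evaluated directly once parameter values are supplied. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the parameters are passed in rather than estimated (maximizing over them to obtain the MLEs is a separate numerical step not shown here), and the common term ln[n!/(n − r)!] is omitted as in the text.

```python
import math
from statistics import NormalDist

def loglik_weibull_censored(failures, n, t_cut, eta, beta):
    """Eq. (3) evaluated at (eta, beta); failures are the r observed times."""
    r = len(failures)
    sum_log = sum(math.log(t) for t in failures)
    return (r * math.log(beta) - r * beta * math.log(eta)
            + (beta - 1.0) * sum_log
            - sum((t / eta) ** beta for t in failures)
            - (n - r) * (t_cut / eta) ** beta)

def loglik_lognormal_censored(failures, n, t_cut, theta, sigma):
    """Eq. (4) evaluated at (theta, sigma)."""
    r = len(failures)
    logs = [math.log(t) for t in failures]
    mu = math.log(theta)
    # Probability of surviving past the censoring time under LN(theta, sigma).
    surv = 1.0 - NormalDist().cdf((math.log(t_cut) - mu) / sigma)
    return (-r * math.log(sigma) - 0.5 * r * math.log(2.0 * math.pi)
            - sum(logs)
            - sum((x - mu) ** 2 for x in logs) / (2.0 * sigma ** 2)
            + (n - r) * math.log(surv))

# Type-I censoring: t_cut is the fixed censoring time.
# Type-II censoring: pass the r-th failure time, i.e. t_cut = failures[-1].
obs = [0.5, 1.2, 2.0]
print(loglik_weibull_censored(obs, 5, 2.5, 2.0, 1.5))
print(loglik_lognormal_censored(obs, 5, 2.5, 1.5, 0.8))
```

With n = r the censoring terms vanish and the two functions reduce to Eqs. (1) and (2) evaluated at the same parameter values.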

Let RML = L_W/L_LN in the above three cases. Then, the RML procedure selects the Weibull distribution if RML > 1 (or

equivalently, if ln L_W > ln L_LN), and the lognormal distribution otherwise.
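For a complete sample, the RML rule can be sketched in a few lines. The code below is an illustrative implementation, not the authors' Matlab code: the function names are hypothetical, the Weibull shape MLE is found by bisection on the usual profile likelihood equation (an assumed solver choice), and the sample is assumed to contain at least two distinct positive values.

```python
import math
from statistics import NormalDist

def weibull_mle(times):
    """Complete-sample Weibull MLEs (eta_hat, beta_hat) by bisection on the shape equation."""
    logs = [math.log(t) for t in times]
    n, mean_log, max_log = len(logs), sum(logs) / len(logs), max(logs)

    def score(beta):
        # sum(t^b ln t)/sum(t^b) - 1/b - mean(ln t), with weights rescaled to avoid overflow
        w = [math.exp(beta * (x - max_log)) for x in logs]
        return sum(wi * x for wi, x in zip(w, logs)) / sum(w) - 1.0 / beta - mean_log

    lo, hi = 1e-6, 1.0
    while score(hi) < 0.0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):            # score() is increasing in beta
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    log_eta = max_log + math.log(sum(math.exp(beta * (x - max_log)) for x in logs) / n) / beta
    return math.exp(log_eta), beta

def log_rml(times):
    """ln L_W - ln L_LN for a complete sample, following Eqs. (1) and (2)."""
    n = len(times)
    logs = [math.log(t) for t in times]
    sum_log = sum(logs)
    eta, beta = weibull_mle(times)
    ln_lw = (n * math.log(beta) - n * beta * math.log(eta)
             + (beta - 1.0) * sum_log - sum((t / eta) ** beta for t in times))
    mu = sum_log / n                                  # ln(theta_hat)
    var = sum((x - mu) ** 2 for x in logs) / n        # sigma_hat squared
    ln_lln = -0.5 * n * math.log(var) - 0.5 * n * math.log(2.0 * math.pi) - sum_log - 0.5 * n
    return ln_lw - ln_lln

def select_rml(times):
    return "Weibull" if log_rml(times) > 0.0 else "lognormal"

# Idealised "samples": exact quantiles of W(1, 2) and LN(1, 1) at positions (i + 0.5)/n.
n = 50
weibull_like = [(-math.log(1.0 - (i + 0.5) / n)) ** 0.5 for i in range(n)]
lognormal_like = [math.exp(NormalDist().inv_cdf((i + 0.5) / n)) for i in range(n)]
print(select_rml(weibull_like), select_rml(lognormal_like))
```

Working with ln RML rather than RML itself avoids overflow for large n; the censored cases would additionally require numerical maximization of Eqs. (3) and (4).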

2.2. Procedure SI: Scale invariant statistic

Quesenberry and Kent (1982) adopted the scale transformation maximal invariant statistics proposed by Hájek and Šidák

(1967) as selection statistics and modified the most powerful scale invariant test procedure of Hájek and Šidák (1967) to

propose the following selection rule for discriminating k candidate distributions given a set of complete data.

Select the jth distribution for which S_j = max{S_1, S_2, ..., S_k}

where S is the scale invariant statistic of Hájek and Šidák (1967).

For the case where k = 2, Quesenberry and Kent (1982) state that the above rule minimizes the sum of the two

probabilities of selecting the incorrect distribution among scale invariant procedures, and also has some other optimal

properties. On the other hand, the above rule may be no longer optimal for selecting between two distributions from

respective classes which involve not only the scale but also the shape parameter. For this case, Quesenberry and Kent

(1982) proposed to derive the optimal selection rule assuming that the shape parameter is known, and then a scale invariant

estimate of the shape parameter is substituted for the parameter. This suboptimal procedure is called the SI procedure in this

paper. In the present study, the SI procedure is considered instead of the SSI procedure since, as Siswadi and Quesenberry

(1982) showed based on simulation, the performances of the two procedures are almost equivalent, and the former is

computationally easier to implement than the latter.

Let S_W and S_LN be the scale invariant statistics for the Weibull and lognormal distributions, respectively. In the SI

procedure, the Weibull distribution is selected if S_W > S_LN (or if ln S_W > ln S_LN), and the lognormal distribution otherwise.

For complete data, the logarithms of the scale invariant statistics for the two distributions are given as follows (Quesenberry and Kent, 1982).

\[
\ln S_W = (n-1)\ln\hat\beta + \ln\Gamma(n) + (\hat\beta-1)\sum_{i=1}^{n}\ln t_i - n\ln\left(\sum_{i=1}^{n} t_i^{\hat\beta}\right)
\]

\[
\ln S_{LN} = -(n-1)\ln\left(\sqrt{2\pi}\,\hat\sigma\right) - \frac{1}{2}\ln n - \sum_{i=1}^{n}\ln t_i - \frac{1}{2\hat\sigma^{2}}\left[\sum_{i=1}^{n}(\ln t_i)^{2} - \frac{1}{n}\left(\sum_{i=1}^{n}\ln t_i\right)^{2}\right]
\]

where Γ(·) denotes the gamma function and β̂ and σ̂ are the MLEs of β and σ, respectively.

The logarithms of the scale invariant statistics of the Weibull and lognormal distributions based on Type-I censored data are respectively given as follows except for the common term ln[n!/(n − r)!] (Siswadi and Quesenberry, 1982).

\[
\ln S_W = \ln\Gamma(r) + (r-1)\ln\hat\beta + (\hat\beta-1)\sum_{i=1}^{r}\ln t_i - r\ln\left[\sum_{i=1}^{r} t_i^{\hat\beta} + (n-r)\,t_c^{\hat\beta}\right] \tag{5}
\]

\[
\ln S_{LN} = -(r-1)\ln\left(\sqrt{2\pi}\,\hat\sigma\right) - \frac{1}{2}\ln r - \sum_{i=1}^{r}\ln t_i - \frac{1}{2\hat\sigma^{2}}\left[\sum_{i=1}^{r}(\ln t_i)^{2} - \frac{1}{r}\left(\sum_{i=1}^{r}\ln t_i\right)^{2}\right] + \ln E\left\{\left[1-\Phi\left(\frac{u+\ln t_c}{\hat\sigma}\right)\right]^{n-r}\right\} \tag{6}
\]

where β̂ and σ̂ are the MLEs of β and σ, respectively, and u is a random variable which follows a normal distribution with mean −Σ_{i=1}^{r} ln t_i / r and variance σ̂²/r.

It can be shown that the selection statistics for the case of Type-II censoring are obtained by replacing t_c in Eqs. (5) and (6) with the rth failure time t_r.

Notice in Eq. (6) that the selection statistic for the lognormal distribution involves an expectation which is difficult to analytically determine or time-consuming to evaluate by simulation. This may be the reason why Quesenberry and Kent (1982) considered the complete sample case only.
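To make the structure of Eqs. (5) and (6) concrete, the following sketch evaluates both statistics for given shape estimates and approximates the expectation in Eq. (6) by plain Monte Carlo. It is an illustration only: the function names, the number of draws, and the fixed seed are assumptions, and the shape estimates β̂ and σ̂ are passed in rather than computed.

```python
import math
import random
from statistics import NormalDist

def ln_s_weibull(failures, n, t_cut, beta):
    """Eq. (5); with n == len(failures) it reduces to the complete-data formula."""
    r = len(failures)
    sum_log = sum(math.log(t) for t in failures)
    total = sum(t ** beta for t in failures) + (n - r) * t_cut ** beta
    return (math.lgamma(r) + (r - 1) * math.log(beta)
            + (beta - 1.0) * sum_log - r * math.log(total))

def ln_s_lognormal(failures, n, t_cut, sigma, draws=20_000, seed=0):
    """Eq. (6), with the expectation term estimated by simulation."""
    r = len(failures)
    logs = [math.log(t) for t in failures]
    sum_log = sum(logs)
    spread = sum(x * x for x in logs) - sum_log ** 2 / r
    value = (-(r - 1) * math.log(math.sqrt(2.0 * math.pi) * sigma)
             - 0.5 * math.log(r) - sum_log - spread / (2.0 * sigma ** 2))
    if n > r:
        rng = random.Random(seed)
        phi = NormalDist().cdf
        mean_u, sd_u = -sum_log / r, sigma / math.sqrt(r)   # u ~ N(-mean(ln t), sigma^2 / r)
        log_tc = math.log(t_cut)
        acc = 0.0
        for _ in range(draws):
            u = rng.gauss(mean_u, sd_u)
            acc += (1.0 - phi((u + log_tc) / sigma)) ** (n - r)
        # acc may underflow to 0 when n - r is very large; this sketch does not guard against that.
        value += math.log(acc / draws)
    return value

obs = [0.2, 0.5, 0.9, 1.3]          # r = 4 failures out of n = 6 units
print(ln_s_weibull(obs, 6, 1.5, 1.4) - ln_s_lognormal(obs, 6, 1.5, 0.8))
```

For Type-II censoring, `t_cut` would be set to the last observed failure time. The simulation loop is what makes the SI procedure comparatively expensive for censored samples.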

3. Comparisons of the two selection procedures

In this section, the performances of the two procedures are compared for the cases of complete, Type-I censored, and

Type-II censored samples based on simulation. The simulation conditions are as follows.

(i) Sample size (n): 10, 20, 30, 50, 100 and 200

(ii) Censoring rate (c): 0 (complete data), 10, 20, 30, 40, 50, 70 and 90%

(iii) Number of simulation runs (k) for each sample size: k = 100,000 when n < 100 and k = 50,000 when n ≥ 100.

To conduct simulation experiments, the values of the unknown parameters of the Weibull or lognormal distribution must be specified for each combination of n, c, and k. It is well known that the RML statistic for a complete or Type-II censored sample is independent of location-scale parameters (Bain, 1978). Note that selecting between the Weibull and lognormal distributions is equivalent to selecting between the extreme value and normal distributions which belong to the location-scale family. In addition, it is shown in Appendix A that the RML statistic for a Type-I censored sample is also independent of location-scale parameters if the censoring rate c is given as in the present simulation experiments. Quesenberry and Kent (1982) showed that, for the case of a complete sample from the Weibull or lognormal distribution, the selection statistics of the SI procedure are invariant under scale and shape transformations of the form a t_i^b (a, b > 0, i = 1, 2, ..., n) if scale invariant estimators (e.g., MLEs) of the shape parameters are used. It is shown in Appendix B that the same is true for censored samples.

The above discussions indicate that the scale and shape parameters of the Weibull and lognormal distributions need not be varied, but can be fixed to some specific values without any loss of generality in assessing the relative performances of the two selection procedures. In the present simulation experiments, random deviates are generated from W(1,1) or LN(1,1) for each combination of n, c and k.

In simulating the cases of Type-I censoring, the censoring time t_c is determined according to the given censoring rate c. For instance, for W(1,1), t_c is given by −ln c. For the simulation of Type-II censoring cases, r is determined as n(1 − c).

The performance of each procedure is evaluated by the PCS which is the ratio of the number of simulation runs in which the procedure selects the true distribution to the total number of simulation runs. In the simulation of Type-I censoring cases, r could be zero especially when c is large and n is small. In such cases, the MLEs of the unknown parameters of the Weibull and lognormal distributions do not exist, and therefore, such cases are excluded from calculating the PCS.

In addition to the PCS, the selection error rate e (= 1 − PCS) is also calculated and the average error rate ē = (e_W + e_LN)/2 is tabulated where, for instance, e_W is the error rate when the true distribution is Weibull. All the simulations were conducted using Matlab (Matlab Statistics Toolbox, 2008).

Table 1
PCS values for complete data

         T ∼ Weibull        T ∼ Lognormal      ē = (e_W + e_LN)/2
n        RML      SI        RML      SI        RML      SI
10       0.6793   0.5988    0.6598   0.7430    0.3304   0.3291
20       0.7730   0.7407    0.7682   0.8010    0.2294   0.2291
30       0.8312   0.8109    0.8295   0.8507    0.1696   0.1692
50       0.9039   0.8942    0.9043   0.9113    0.0959   0.0972
100      0.9724   0.9691    0.9727   0.9740    0.0274   0.0284
200      0.9972   0.9971    0.9969   0.9969    0.0029   0.0030
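The sampling scheme described in this section (deviates from W(1,1) or LN(1,1), a Type-I censoring time fixed by the censoring rate c, and r = n(1 − c) for Type-II censoring) can be sketched as follows. This is a Python illustration rather than the authors' Matlab code; the function names are hypothetical, rounding n(1 − c) to the nearest integer is an assumption, and 0 < c < 1 is required.

```python
import math
import random
from statistics import NormalDist

def censoring_time(dist, c):
    """Type-I censoring time with expected censoring rate c under W(1,1) or LN(1,1)."""
    if dist == "weibull":
        return -math.log(c)                            # exp(-t_c) = c
    return math.exp(NormalDist().inv_cdf(1.0 - c))     # Phi(ln t_c) = 1 - c

def draw(dist, n, rng):
    """Ordered sample of size n from W(1,1) or LN(1,1)."""
    if dist == "weibull":
        return sorted(rng.weibullvariate(1.0, 1.0) for _ in range(n))
    return sorted(rng.lognormvariate(0.0, 1.0) for _ in range(n))

def type1_sample(dist, n, c, rng):
    """Failures observed up to t_c; the list may be empty when c is large and n small."""
    t_c = censoring_time(dist, c)
    return [t for t in draw(dist, n, rng) if t <= t_c], t_c

def type2_sample(dist, n, c, rng):
    """First r = n(1 - c) ordered failures; the r-th failure time plays the role of t_c."""
    r = round(n * (1.0 - c))
    failures = draw(dist, n, rng)[:r]
    return failures, failures[-1]

rng = random.Random(2008)
failures, t_c = type1_sample("weibull", 30, 0.5, rng)
print(len(failures), t_c)
```

A PCS estimate would then be the fraction of such samples for which a selection rule returns the true distribution, skipping Type-I samples with no failures.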

3.1. Complete data

Table 1 shows the PCS and ē values for the two procedures. As expected, the PCS values for the two procedures increase as

the sample size increases. When the true distribution is Weibull (i.e., T ∼ Weibull in Table 1), procedure RML yields higher

PCS values for all sample sizes considered. On the other hand, the reverse holds when the true distribution is lognormal.

In terms of the average error rate, procedure SI performs slightly better than procedure RML for ‘small’ samples (n ≤ 30,

say), while procedure RML performs slightly better than SI for ‘large’ samples (n ≥ 50, say). That is, in terms of the average

error rate, the two procedures perform similarly. However, comparing the PCS values of procedure SI for both distributions

shows that it has a tendency to be in favor of the lognormal distribution, while procedure RML yields balanced PCS values

for both distributions. In summary, the RML procedure may be recommended for complete samples.
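The average error rate is a simple function of the two PCS values, which can be checked against the n = 10 row of Table 1. The helper name below is illustrative.

```python
def average_error(pcs_weibull, pcs_lognormal):
    """Average error rate (e_W + e_LN)/2 with e = 1 - PCS."""
    return ((1.0 - pcs_weibull) + (1.0 - pcs_lognormal)) / 2.0

# PCS values for n = 10 taken from Table 1.
rml = average_error(0.6793, 0.6598)
si = average_error(0.5988, 0.7430)
print(round(rml, 4), round(si, 4))
```

The two results agree with the tabulated values 0.3304 and 0.3291 up to rounding of the reported PCS entries, and illustrate how the SI procedure attains a slightly smaller average error at n = 10 despite its imbalance between the two distributions.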

3.2. Type-I censored data

3.2.1. Simulation results

Selected simulation results are shown in Tables 2a–2c in which the censoring rates are 10, 50, and 90%, respectively. The

results for other cases are available from the authors upon request. Let PCS(P|D) be the probability of correct selection for

procedure P when the true distribution is D. The following is a summary of the findings of the simulation experiment.

1. When the true distribution is Weibull, it is observed that:

(a) both PCS(RML|W) and PCS(SI|W) increase as n increases regardless of the censoring rate c; and

(b) PCS(RML|W) is larger than PCS(SI|W) up to a certain sample size n1, beyond which PCS(RML|W) becomes smaller

than PCS(SI|W). n1 decreases as c increases.

Table 2a
PCS values for the case of Type-I censoring (censoring rate = 10%)

         T ∼ Weibull        T ∼ Lognormal      ē = (e_W + e_LN)/2
n        RML      SI        RML      SI        RML      SI
10       0.5383   0.4543    0.7306   0.7975    0.3655   0.3741
20       0.6413   0.5959    0.8057   0.8379    0.2765   0.2831
30       0.7165   0.6846    0.8478   0.8691    0.2178   0.2231
50       0.8102   0.7885    0.8954   0.9099    0.1472   0.1508
100      0.9182   0.9113    0.9588   0.9614    0.0615   0.0636
200      0.9818   0.9999    0.9913   0.9923    0.0134   0.0039

Table 2b
PCS values for the case of Type-I censoring (censoring rate = 50%)

         T ∼ Weibull        T ∼ Lognormal      ē = (e_W + e_LN)/2
n        RML      SI        RML      SI        RML      SI
10       0.2877   0.1748    0.8276   0.9013    0.4423   0.4619
20       0.4161   0.3851    0.7983   0.7949    0.3928   0.4100
30       0.4868   0.5651    0.7968   0.6656    0.3582   0.3846
50       0.5779   0.8028    0.8101   0.3887    0.3060   0.4042
100      0.7116   0.9805    0.8513   0.0639    0.2185   0.4778
200      0.8346   1         0.9077   0.0012    0.1288   0.4994

Table 2c
PCS values for the case of Type-I censoring (censoring rate = 90%)

         T ∼ Weibull        T ∼ Lognormal      ē = (e_W + e_LN)/2
n        RML      SI        RML      SI        RML      SI
10       0.0358   0.3059    0.9708   0.6975    0.4967   0.4983
20       0.0865   0.3913    0.9314   0.6134    0.4911   0.4977
30       0.1388   0.5162    0.8940   0.4880    0.4836   0.4979
50       0.2268   0.7621    0.8378   0.2405    0.4677   0.4987
100      0.3474   0.9839    0.7807   0.0158    0.4360   0.5002
200      0.4509   1         0.7664   0         0.3914   0.5000

2. When the true distribution is lognormal, the following is observed.

(a) PCS(RML|LN) increases as n increases if c is about 30% or less (e.g., see Table 2a). However, if c is 40% or larger, PCS(RML|LN) decreases up to a certain sample size n2, beyond which it starts to increase. For instance, when c = 50% (see Table 2b), PCS(RML|LN) decreases up to n2 = 30, and then increases. n2 becomes larger as c increases. Although not shown in Table 2c (c = 90%), additional simulation runs show that PCS(RML|LN) decreases up to approximately n2 = 300, and then increases.

(b) PCS(SI|LN) increases as n increases if c is about 20% or less (e.g., see Table 2a). However, if c is 30% or larger, it decreases as n increases (e.g., see Tables 2b and 2c).

(c) PCS(SI|LN) is larger than PCS(RML|LN) up to a certain sample size n3, beyond which PCS(SI|LN) becomes smaller than PCS(RML|LN). n3 decreases as c increases.

3. PCS(RML|LN) is larger than PCS(RML|W) for any combination of n and c considered in the experiment. On the other hand, PCS(SI|LN) is larger than PCS(SI|W) up to a certain sample size n4, beyond which PCS(SI|LN) becomes smaller than PCS(SI|W). n4 decreases as c increases.

4. The average error rate of the RML procedure is smaller than that of the SI procedure for almost all combinations of n and c considered in the experiment. It is also noted that the two procedures do not differ appreciably in terms of ē when c is 20% or less (e.g., see Table 2a). However, when c is 30% or larger, ē of the RML procedure becomes noticeably smaller than that of the SI procedure if n is about 50 or larger (e.g., see Table 2b).

3.2.2. Interpretations of the results

Findings 2(a) and (b) are counter-intuitive in the sense that PCS values for both procedures may decrease (depending on c) even if n increases. For the SI procedure, this result can be explained using Eq. (6). In the last term on the right-hand side of Eq. (6), let

\[
p = 1 - \Phi\left(\frac{u + \ln t_c}{\hat\sigma}\right).
\]

Note that p is always less than 1. Then, as n increases for a fixed, sufficiently large c, n − r tends to be large, p^{n−r} and its expectation tend to approach 0, the last term ln E[p^{n−r}] tends to have a large negative value, and therefore, ln S_LN tends to be less than ln S_W.

Table 3a
PCS values for the case of Type-II censoring (censoring rate = 10%)

           T ∼ Weibull        T ∼ Lognormal      ē = (e_W + e_LN)/2
n/r        RML      SI        RML      SI        RML      SI
10/9       0.6231   0.5330    0.6503   0.7403    0.3633   0.3633
20/18      0.7107   0.6625    0.7390   0.7833    0.2751   0.2771
30/27      0.7681   0.7367    0.7926   0.8234    0.2196   0.2199
50/45      0.8442   0.8258    0.8601   0.8776    0.1478   0.1483
100/90     0.9339   0.9267    0.9389   0.9457    0.0636   0.0638
200/180    0.9850   0.9844    0.9878   0.9881    0.0136   0.0137

Table 3b
PCS values for the case of Type-II censoring (censoring rate = 50%)

           T ∼ Weibull        T ∼ Lognormal      ē = (e_W + e_LN)/2
n/r        RML      SI        RML      SI        RML      SI
10/5       0.5213   0.2998    0.5870   0.7925    0.4458   0.4538
20/10      0.5665   0.4856    0.6429   0.6969    0.3953   0.4087
30/15      0.6062   0.6288    0.6791   0.5869    0.3573   0.3921
50/25      0.6634   0.8326    0.7262   0.3299    0.3052   0.4187
100/50     0.7583   0.9838    0.8002   0.0475    0.2207   0.4843
200/100    0.8604   0.9999    0.8819   0.0006    0.1288   0.4997

Table 3c
PCS values for the case of Type-II censoring (censoring rate = 90%)

           T ∼ Weibull        T ∼ Lognormal      ē = (e_W + e_LN)/2
n/r        RML      SI        RML      SI        RML      SI
10/1       – (a)    –         –        –         –        –
20/2       1        0.3466    0        0.6593    0.5000   0.4970
30/3       0.4885   0.5224    0.5394   0.4816    0.4860   0.4980
50/5       0.4861   0.8258    0.5782   0.1980    0.4678   0.4881
100/10     0.5120   0.9267    0.6174   0.0093    0.4353   0.5320
200/20     0.5590   1         0.6538   0         0.3936   0.5000

(a) Not possible.

Finding 2(a) implies that as c increases the discriminating power of the RML procedure is lowered up to sample

size n2, beyond which it is recovered. We are currently unable to explain why such a phenomenon occurs for the RML

procedure.

Finding 3 implies that the RML procedure tends to favor the lognormal distribution for all combinations of c and n. On the

other hand, the SI procedure tends to favor the lognormal distribution for relatively small n’s, but tends to favor the Weibull

distribution for large n’s. These tendencies differ from those observed in complete sample cases.

3.3. Type-II censored data

As in the case of Type-I censoring, selected simulation results are presented in Tables 3a–3c.

The behaviors of the PCS and ē with respect to the selection procedure, c and n are similar to those for the Type-I censoring case except the following.

1. If r = 1, the MLEs of the unknown parameters of the Weibull and lognormal distributions do not exist. Therefore, it is not possible to select a distribution using the RML or SI procedure (see Table 3c).

2. If r = 2 and the true distribution is Weibull, the RML procedure always selects the Weibull distribution (i.e., PCS(RML|W) = 1) as shown in Table 3c. In fact, our additional computational results (not shown here) show that this holds regardless of n as long as r = 2.

3. Unlike the case of Type-I censoring (see Finding 2(a) in Section 3.2.1), PCS(RML|LN) increases as n increases regardless of c.

4. The tendency that the RML procedure favors the lognormal distribution is still present but weakened compared to the case of Type-I censoring (see Finding 3 for the case of Type-I censoring).

4. Conclusions

Relative performances of the RML and SI procedures are compared for selecting between the Weibull and lognormal

distributions. Not only complete but also censored sample cases are considered for various combinations of n and c using

Monte Carlo simulation.


Notable behaviors of the two selection procedures, observed in the present study, but not recognized in the previous

works, include: (1) the decrease in the PCS(SI|LN) as n increases if c is about 30% or larger; (2) in the Type-I censoring case,

the decrease in the PCS(RML|LN) up to a certain n if c is about 40% or larger; (3) the tendency of the SI procedure to favor

the Weibull distribution for moderate to heavily censored cases if n exceeds a certain threshold; and (4) the tendency of the

RML procedure to favor the lognormal distribution for censored samples for all combinations of n and c considered.

As for the relative performance of the RML and SI procedures, computational results indicate that: (1) for complete

samples, the RML procedure yields balanced PCS values for both distributions, although the ē's of the two procedures are similar;

and (2) for censored samples, if c is about 20% or less, ¯ e’s of the two procedures are not appreciably different. However,

under heavier censoring and for relatively large n’s, ¯ e of the RML procedure becomes noticeably smaller than that of the SI

procedure. In addition, the SI procedure involves additional time-consuming simulation runs to evaluate the last term on

the right-hand side of Eq. (6).

In summary, the RML procedure is recommended to select between the Weibull and lognormal distributions for censored

as well as complete samples with an understanding that it (and any other selection procedures) may require a large n and/or

large number of failures for censored cases to lower ¯ e and that it tends to favor the lognormal distribution especially in the

case of Type-I censoring. In reliability engineering, the problem of a large n and/or large number of failures could be alleviated

by employing an accelerated life test.

Further investigation is needed to explain why the PCS of the RML procedure when the true distribution is lognormal

decreases up to a certain sample size and then increases under moderate to heavy Type-I censoring. In addition, various

extended Weibull and lognormal distributions have recently appeared in the literature (e.g., see Murthy et al. (2004),

Al-Saleh and Agarwal (2006), Pham and Lai (2007), Sultan et al. (2007) and Zhang and Xie (2007) for extended Weibull

distributions; and Flynn (2004), Flynn (2005), Chen (2006), and Vera and Díaz-García (2008) for extended lognormal

distributions). They generally fit the data better than the two-parameter distributions, although the difference in fits to

the data could be insignificant (Alqam et al., 2002) or may depend on the selection criterion adopted (Lu et al., 2002).

This suggests that it would be a fruitful area of future research to extend the present study to the cases of those extended

distributions and/or other selection procedures than the RML or SI procedure.

Acknowledgements

We would like to thank the Co-Editor, Associate Editor and two anonymous referees for their constructive suggestions

that improved the original manuscript.

Appendix A. Properties of the RML for Type-I censored samples

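Suppose that random variable X follows a distribution which belongs to the location-scale family. That is, the pdf and cdf of X are respectively given by

\[
\frac{1}{b}\,g\left(\frac{x-a}{b}\right) \quad\text{and}\quad G\left(\frac{x-a}{b}\right)
\]

where −∞ < x < ∞, −∞ < a < ∞, and b > 0. Let x_1, x_2, ..., x_r represent the ordered observations on X until the censoring time x_c and assume that we wish to discriminate between two distributions for X. Then,

\[
\mathrm{RML} = \frac{\max_{a_1,b_1}\; b_1^{-r}\prod_{i=1}^{r} g_1\left(\frac{x_i-a_1}{b_1}\right)\left[1-G_1\left(\frac{x_c-a_1}{b_1}\right)\right]^{n-r}}{\max_{a_2,b_2}\; b_2^{-r}\prod_{i=1}^{r} g_2\left(\frac{x_i-a_2}{b_2}\right)\left[1-G_2\left(\frac{x_c-a_2}{b_2}\right)\right]^{n-r}}.
\]

Let a_0 and b_0 be the true values of a and b, respectively. Then,

\[
\begin{aligned}
\mathrm{RML} &= \frac{\max_{a_1,b_1}\; b_0^{-r}\left(\frac{b_1}{b_0}\right)^{-r}\prod_{i=1}^{r} g_1\left(\frac{(x_i-a_0)/b_0-(a_1-a_0)/b_0}{b_1/b_0}\right)\left[1-G_1\left(\frac{(x_c-a_0)/b_0-(a_1-a_0)/b_0}{b_1/b_0}\right)\right]^{n-r}}{\max_{a_2,b_2}\; b_0^{-r}\left(\frac{b_2}{b_0}\right)^{-r}\prod_{i=1}^{r} g_2\left(\frac{(x_i-a_0)/b_0-(a_2-a_0)/b_0}{b_2/b_0}\right)\left[1-G_2\left(\frac{(x_c-a_0)/b_0-(a_2-a_0)/b_0}{b_2/b_0}\right)\right]^{n-r}} \\
&= \frac{\max_{\tilde a_1,\tilde b_1}\; \tilde b_1^{-r}\prod_{i=1}^{r} g_1\left(\frac{z_i-\tilde a_1}{\tilde b_1}\right)\left[1-G_1\left(\frac{z_c-\tilde a_1}{\tilde b_1}\right)\right]^{n-r}}{\max_{\tilde a_2,\tilde b_2}\; \tilde b_2^{-r}\prod_{i=1}^{r} g_2\left(\frac{z_i-\tilde a_2}{\tilde b_2}\right)\left[1-G_2\left(\frac{z_c-\tilde a_2}{\tilde b_2}\right)\right]^{n-r}}
\end{aligned}
\]

where

\[
z_i = (x_i-a_0)/b_0,\quad z_c = (x_c-a_0)/b_0,\quad \tilde b = b/b_0,\quad \tilde a = (a-a_0)/b_0.
\]

Note that z_i's are observations on a standardized variable, and therefore, independent of unknown parameters. In addition,

\[
G\left(\frac{x_c-a_0}{b_0}\right) = G(z_c) = 1-c
\]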


where c is the censoring rate. Therefore, if c is given, z_c is completely specified. In summary, the RML for a Type-I censored

sample is independent of location-scale parameters if the censoring rate is given.
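The key step of this argument, that the standardized censoring point z_c depends only on c, can be checked numerically on the log scale, where the Weibull and lognormal models become the extreme value and normal location-scale families. The sketch below is illustrative; the function names and the parameter values are arbitrary assumptions.

```python
import math
from statistics import NormalDist

def z_cut_lognormal(theta, sigma, c):
    """Standardized log censoring point for LN(theta, sigma) with censoring rate c."""
    t_c = theta * math.exp(sigma * NormalDist().inv_cdf(1.0 - c))  # P(T > t_c) = c
    return (math.log(t_c) - math.log(theta)) / sigma               # location ln(theta), scale sigma

def z_cut_weibull(eta, beta, c):
    """Standardized log censoring point for W(eta, beta) with censoring rate c."""
    t_c = eta * (-math.log(c)) ** (1.0 / beta)                     # exp(-(t_c/eta)^beta) = c
    return (math.log(t_c) - math.log(eta)) * beta                  # location ln(eta), scale 1/beta

# Different parameter values, same censoring rate: z_c should coincide within each family.
print(z_cut_lognormal(1.0, 1.0, 0.3), z_cut_lognormal(250.0, 0.4, 0.3))
print(z_cut_weibull(1.0, 1.0, 0.3), z_cut_weibull(40.0, 2.5, 0.3))
```

This is why fixing the parameters at W(1,1) and LN(1,1) loses no generality once the Type-I censoring time is tied to c.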

Appendix B. Properties of the SI procedure for censored samples

Consider the following scale and shape transformation of random variable T.

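\[
Y = aT^{b},\quad (a, b > 0).
\]

It is well known that if T ∼ W(η, β), then Y ∼ W(aη^b, β/b), and if T ∼ LN(θ, σ), then Y ∼ LN(aθ^b, bσ).

First, consider the case of Type-I censoring. The logarithm of the scale invariant statistic of Y when T is Weibull is given by (see Eq. (5))

\[
\begin{aligned}
\ln S'_W &= \ln\Gamma(r) + (r-1)\ln\left(\frac{\hat\beta}{b}\right) + \left(\frac{\hat\beta}{b}-1\right)\sum_{i=1}^{r}\ln\left(a t_i^{b}\right) - r\ln\left[\sum_{i=1}^{r}\left(a t_i^{b}\right)^{\hat\beta/b} + (n-r)\left(a t_c^{b}\right)^{\hat\beta/b}\right] \\
&= \ln\Gamma(r) + (r-1)\ln\hat\beta + (\hat\beta-1)\sum_{i=1}^{r}\ln t_i - r\ln\left[\sum_{i=1}^{r} t_i^{\hat\beta} + (n-r)\,t_c^{\hat\beta}\right] \\
&\quad - (r-1)\ln b + r\left(\frac{\hat\beta-b}{b}\right)\ln a + (1-b)\sum_{i=1}^{r}\ln t_i - \frac{r\hat\beta}{b}\ln a \\
&= \ln S_W - (r-1)\ln b - r\ln a + (1-b)\sum_{i=1}^{r}\ln t_i
\end{aligned}
\]

where ln S_W is defined in Eq. (5). Similarly, the logarithm of the scale invariant statistic of Y when T is lognormal is given by (see Eq. (6))

\[
\begin{aligned}
\ln S'_{LN} &= -(r-1)\ln\left(\sqrt{2\pi}\,b\hat\sigma\right) - \frac{1}{2}\ln r - \sum_{i=1}^{r}\ln\left(a t_i^{b}\right) - \frac{1}{2b^{2}\hat\sigma^{2}}\left[\sum_{i=1}^{r}\left(\ln a t_i^{b}\right)^{2} - \frac{1}{r}\left(\sum_{i=1}^{r}\ln a t_i^{b}\right)^{2}\right] \\
&\quad + \ln E\left\{\left[1-\Phi\left(\frac{u' + \ln a t_c^{b}}{b\hat\sigma}\right)\right]^{n-r}\right\} \\
&= -(r-1)\ln\left(\sqrt{2\pi}\,\hat\sigma\right) - \frac{1}{2}\ln r - \sum_{i=1}^{r}\ln t_i - \frac{1}{2\hat\sigma^{2}}\left[\sum_{i=1}^{r}(\ln t_i)^{2} - \frac{1}{r}\left(\sum_{i=1}^{r}\ln t_i\right)^{2}\right] \\
&\quad + \ln E\left\{\left[1-\Phi\left(\frac{u' + \ln a t_c^{b}}{b\hat\sigma}\right)\right]^{n-r}\right\} - (r-1)\ln b - r\ln a + (1-b)\sum_{i=1}^{r}\ln t_i
\end{aligned}
\]

where u' ∼ N(−Σ_{i=1}^{r} ln(a t_i^b)/r, b²σ̂²/r). Note that

\[
\Phi\left(\frac{u' + \ln a t_c^{b}}{b\hat\sigma}\right) = \Phi\left(\frac{(u' + \ln a)/b + \ln t_c}{\hat\sigma}\right)
\]

and (u' + ln a)/b follows the same distribution as u in Eq. (6). Therefore,

\[
\ln S'_{LN} = \ln S_{LN} - (r-1)\ln b - r\ln a + (1-b)\sum_{i=1}^{r}\ln t_i.
\]

Consequently,

\[
\ln S'_W - \ln S'_{LN} = \ln S_W - \ln S_{LN}.
\]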

That is, the selection statistics of the SI procedure are invariant under scale and shape transformations for Type-I censored

samples. Similar arguments hold for Type-II censored samples.
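The Weibull half of this argument is easy to verify numerically: transforming the data by Y = aT^b (and the shape estimate by β̂/b) should change ln S_W of Eq. (5) by exactly −(r − 1) ln b − r ln a + (1 − b) Σ ln t_i. The following sketch checks this for arbitrary illustrative values; the function names and numbers are assumptions made for the example.

```python
import math

def ln_s_weibull(failures, n, t_cut, beta):
    """Eq. (5) for given failures, sample size n, censoring time and shape estimate."""
    r = len(failures)
    sum_log = sum(math.log(t) for t in failures)
    total = sum(t ** beta for t in failures) + (n - r) * t_cut ** beta
    return (math.lgamma(r) + (r - 1) * math.log(beta)
            + (beta - 1.0) * sum_log - r * math.log(total))

def predicted_shift(failures, a, b):
    """Shift of the log statistic under Y = a * T**b, as derived above."""
    r = len(failures)
    return (-(r - 1) * math.log(b) - r * math.log(a)
            + (1.0 - b) * sum(math.log(t) for t in failures))

t_obs, n, t_c, beta_hat = [0.3, 0.8, 1.1, 2.4], 7, 2.5, 1.7
a, b = 2.0, 0.6
y_obs = [a * t ** b for t in t_obs]

observed = (ln_s_weibull(y_obs, n, a * t_c ** b, beta_hat / b)
            - ln_s_weibull(t_obs, n, t_c, beta_hat))
print(observed, predicted_shift(t_obs, a, b))
```

Because the lognormal statistic shifts by the same amount, the difference ln S_W − ln S_LN, and hence the SI selection, is unaffected by the transformation.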


References

Alqam, M., Bennett, R.M., Zureick, A.-H., 2002. Three-parameter vs. two-parameter Weibull distribution for pultruded composite material properties. Composite Structures 58 (4), 497–503.

Al-Saleh, J.A., Agarwal, S.K., 2006. Extended Weibull type distribution and finite mixture of distributions. Statistical Methodology 3 (3), 224–233.

Araújo, M.I., Pereira, B.B., 2007. A comparison of Bayes factors for separated models: Some simulation results. Communications in Statistics-Simulation and Computation 36 (2), 297–309.

Bain, L.J., 1978. Statistical Analysis of Reliability and Life-testing Models. Marcel Dekker, New York.

Bain, L.J., Engelhardt, M., 1980. Probability of correct selection of Weibull versus gamma based on likelihood ratio. Communications in Statistics-Theory and Methods 9 (4), 375–381.

Basu, B., Tiwari, D., Kundu, D., Prasad, R., 2008. Is Weibull distribution the most appropriate statistical strength distribution for brittle materials? Ceramics International. doi:10.1016/j.ceramint.2007.10.003.

Block, A.D., Leemis, L.M., 2008. Parametric model discrimination for heavily censored survival data. IEEE Transactions on Reliability 57 (2), 248–259.

Cain, S.R., 2002. Distinguishing between lognormal and Weibull distributions. IEEE Transactions on Reliability 51 (1), 32–38.

Chen, C., 2006. Tests of fit for the three-parameter lognormal distribution. Computational Statistics and Data Analysis 50 (6), 1418–1440.

Chen, C.-F., Wu, C.-Y., Lee, M.-K., Chen, C.-N., 1987. The dielectric reliability of intrinsic thin SiO2 films thermally grown on a heavily doped Si substrate—Characterization and modeling. IEEE Transactions on Electron Devices 34 (7), 1540–1552.

Croes, K., Manca, V., Ceuninck, W.D., Schepper, L.D., Molenberghs, G., 1998. The time of “guessing” your failure time distribution is over! Microelectronics Reliability 38 (6–8), 1187–1191.

Dick, E.J., 2004. Beyond lognormal versus gamma: Discrimination among error distributions for generalized linear models. Fisheries Research 70 (2–3), 351–366.

Flynn, M.R., 2004. The 4-parameter lognormal (SB) model of human exposure. Annals of Occupational Hygiene 48 (7), 617–622.

Flynn, M.R., 2005. On the moments of the 4-parameter lognormal distribution. Communications in Statistics-Theory and Methods 34 (4), 745–751.

Gupta, R.D., Kundu, D., 2003. Discriminating between Weibull and generalized exponential distributions. Computational Statistics and Data Analysis 43 (2), 179–196.

Gupta, R.D., Kundu, D., 2004. Discriminating between gamma and generalized exponential distributions. Journal of Statistical Computation & Simulation 74 (2), 107–121.

Hájek, J., Šidák, Z., 1967. Theory of Rank Tests. Academic Press, New York.

Kappenman, R.F., 1982. On a method for selecting a distributional model. Communications in Statistics-Theory and Methods 11 (6), 663–672.

Kappenman, R.F., 1989. A simple method for choosing between the lognormal and Weibull models. Statistics & Probability Letters 7 (22), 123–126.

Kim, D.H., Lee, W.D., Kang, S.G., 2000. Bayesian model selection for lifetime data under type II censoring. Communications in Statistics-Theory and Methods 29 (12), 2865–2878.

Kundu, D., Manglick, A., 2004. Discriminating between the Weibull and log-normal distributions. Naval Research Logistics 51 (6), 893–905.

Kundu, D., Raqab, M.Z., 2007. Discriminating between the generalized Rayleigh and log-normal distribution. Statistics 41 (6), 505–515.

Kundu, D., Gupta, R.D., Manglick, A., 2005. Discriminating between the log-normal and generalized exponential distributions. Journal of Statistical Planning and Inference 127 (1–2), 213–227.

Lee, M.D., Pope, K.J., 2006. Model selection for the rate problem: A comparison of significance testing, Bayesian, and minimum description length statistical inference. Journal of Mathematical Psychology 50 (2), 193–202.

Lloyd, J.R., 1979. On the log-normal distribution of electromigration lifetimes. Journal of Applied Physics 50 (7), 5062–5064.

Lu, C., Danzer, R., Fischer, F.D., 2002. Fracture statistics of brittle materials: Weibull or normal distribution. Physical Review E 65, 1–4.

Marshall, A.W., Meza, J.C., Olkin, I., 2001. Can data recognize its parent distribution? Journal of Computational and Graphical Statistics 10 (3), 555–580.

Matlab Statistics Toolbox, 2008. http://www.mathworks.com/access/helpdesk/help/toolbox/stats/.

Mitosek, H.T., Strupczewski, W.G., Singh, V.P., 2006. Three procedures for selection of annual flood peak distribution. Journal of Hydrology 323 (1–4), 57–73.

Murthy, D.N.P., Bulmer, M., Eccleston, J.A., 2004. Weibull model selection for reliability modelling. Reliability Engineering and System Safety 86 (3), 257–267.

Nelson, W., 1982. Applied Life Data Analysis. John Wiley & Sons, New York.

Pandey, M., Ferdous, J., Uddin, M.B., 1991. Selection of probability distribution for life testing data. Communications in Statistics-Theory and Methods 20 (4), 1373–1388.

Pham, H., Lai, C.-D., 2007. On recent generalizations of the Weibull distribution. IEEE Transactions on Reliability 56 (3), 454–458.

Pinto, M., 1991. The effect of barrier layers on the distribution function of interconnect electromigration failures. Quality and Reliability Engineering International 7 (4), 287–291.

Prendergast, J., O’Driscoll, E., Mullen, E., 2005. Investigation into the correct statistical distribution for oxide breakdown over oxide thickness range. Microelectronics Reliability 45 (5–6), 973–977.

Quesenberry, C.P., Kent, J., 1982. Selecting among probability distributions in reliability. Technometrics 24 (1), 59–65.

Siswadi, Quesenberry, C.P., 1982. Selecting among Weibull, lognormal and gamma distributions using complete and censored samples. Naval Research Logistics Quarterly 29 (4), 557–569.

Strupczewski, W.G., Mitosek, H.T., Kochanek, K., Singh, V.P., Weglarczyk, S., 2006. Probability of correct selection from lognormal and convective diffusion models based on the likelihood ratio. Stochastic Environmental Research and Risk Assessment 20 (3), 152–163.

Sultan, K.S., Ismail, M.A., Al-Moisheer, A.S., 2007. Mixture of two inverse Weibull distributions: Properties and estimation. Computational Statistics and Data Analysis 51 (11), 5377–5387.

Upadhyay, S.K., Peshwani, M., 2003. Choice between Weibull and lognormal models: A simulation based Bayesian study. Communications in Statistics-Theory and Methods 32 (2), 381–405.

Vera, J.F., Díaz-García, J.A., 2008. A global simulated annealing heuristic for the three-parameter lognormal maximum likelihood estimation. Computational Statistics and Data Analysis 52 (12), 5055–5065.

Zhang, T., Xie, M., 2007. Failure data analysis with extended Weibull distribution. Communications in Statistics-Simulation and Computation 36 (3), 579–592.