
Computational Statistics and Data Analysis 53 (2008) 477–485

Contents lists available at ScienceDirect

Computational Statistics and Data Analysis

journal homepage: www.elsevier.com/locate/csda

Selection between Weibull and lognormal distributions: A comparative simulation study

Jin Seon Kim, Bong-Jin Yum∗

Department of Industrial Engineering, Korea Advanced Institute of Science and Technology, 373-1, Guseong-Dong, Yuseong-Gu, Daejeon, 305-701, Republic of Korea

Article info

Article history:

Received 15 April 2008

Received in revised form 7 August 2008

Accepted 15 August 2008

Available online 20 August 2008

Abstract

How to select the correct distribution for a given set of data is an important issue, especially

when the tail probabilities are of interest as in lifetime data analysis. The Weibull and

lognormal distributions are assumed most often in analyzing lifetime data, and in many

cases, they are competing with each other. In addition, lifetime data are usually censored

due to the constraint on the amount of testing time. A literature review reveals that little

attention has been paid to the selection problems for the case of censored samples. In

this article, the relative performances of two selection procedures, namely, the maximized

likelihood and scale invariant procedures, are compared for selecting between the Weibull

and lognormal distributions for the cases of not only complete but also censored samples.

Monte Carlo simulation experiments are conducted for various combinations of the

censoring rate and sample size, and the performance of each procedure is evaluated in

terms of the probability of correct selection (PCS) and average error rate. Then, previously

unknown behaviors and relative performances of the two procedures are summarized.

Computational results suggest that the maximized likelihood procedure can be generally

recommended for censored as well as complete sample cases.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Choosing the correct or best-fitting distribution for a given set of data is an important issue, especially when the tail

probability, which is usually sensitive to the assumed model, is of interest as in reliability engineering. Statistical methods

for distribution choice include probability plotting (Nelson, 1982), goodness-of-fit (GOF) testing, hypothesis testing (HT), and

selection procedures. Probability plotting provides useful information concerning the right distribution. However, it can be

subjective and may yield multiple plots which appear to be equally adequate. Concerning the usual GOF and HT approaches,

one may prefer the latter when one may not wish to reject the distribution in the null hypothesis without an alternative

to take its place. However, the HT approach treats the two distributions asymmetrically, while selection procedures put

candidate distributions on equal footings and can deal with more than two distributions at the same time.

Many authors have developed selection procedures with respective selection statistics and decision rules. Two well-known

procedures are those based, respectively, on the maximized likelihood function (MLF) and the scale invariant (SI)

selection statistic. For the MLF-based procedure, the reader is referred to Bain and Engelhardt (1980), Kappenman (1982),

Gupta and Kundu (2003, 2004), Kundu and Manglick (2004), Kundu et al. (2005), Strupczewski et al. (2006), and Kundu and

Raqab (2007), among others. The SI selection procedure was developed in Quesenberry and Kent (1982) for selecting among

the exponential, Weibull, lognormal, and gamma distributions. In both procedures, the distribution with the largest value of

the selection statistic is selected. For other selection procedures, the reader is referred to Croes et al. (1998), Marshall et al.

∗Corresponding author. Tel.: +82 42 869 3116; fax: +82 42 869 3110.

E-mail address: bjyum@kaist.ac.kr (B.-J. Yum).

0167-9473/$ – see front matter © 2008 Elsevier B.V. All rights reserved.

doi:10.1016/j.csda.2008.08.012


(2001), Cain (2002), Dick (2004) and Mitosek et al. (2006), among others. In addition, Bayesian selection procedures were

developed by Kim et al. (2000), Upadhyay and Peshwani (2003), Araújo and Pereira (2007), etc.

Several authors compared the relative performance of selection procedures. Siswadi and Quesenberry (1982) compared

the SI, scale-shape invariant (SSI), and MLF procedures when complete data are available and the SI and MLF procedures

when the data are Type-I censored. For other comparative studies, the reader is referred to Kappenman (1989), Pandey et al.

(1991), Lee and Pope (2006), Mitosek et al. (2006), and Basu et al. (2008).

A review of the related literature reveals that little attention has been paid to the selection problems for the case of

censoring. Exceptions include Siswadi and Quesenberry (1982) for Type-I censored samples, Croes et al. (1998), Cain (2002)

and Kim et al. (2000) for Type-II censored samples, and Block and Leemis (2008) for randomly right-censored samples.

In this paper, the ratio of the maximized likelihoods (RML) and SI procedures are compared for discriminating between

the Weibull and lognormal distributions for the cases of complete, Type-I censored, and Type-II censored samples (The RML

procedure is equivalent to the MLF procedure in the case of two distributions). The Weibull and lognormal distributions

are most often assumed and competing with each other in analyzing lifetime data (e.g., see Bain (1978), Chen et al. (1987)

and Prendergast et al. (2005) for Time Dependent Dielectric Breakdown (TDDB) data, and Lloyd (1979) and Pinto (1991)

for electromigration lifetime data). In such cases, it is highly desirable to have a means of discriminating between the two

distributions since there is a significant difference in the low percentiles which are of interest in lifetime data analysis.

In addition, lifetime data are usually censored due to the constraint on the amount of testing time, which necessitates an

extensive comparative study of selection procedures for the case of censoring. Siswadi and Quesenberry (1982) compared

the SI and MLF procedures for Type-I censored samples, but only considered the case where the sample size is 30 and the

expected censoring rate is 10%. As the simulation results in Section 3 show, considering only a few specific cases

could give a misleading picture of the overall behavior of the two procedures. In this paper, the performances of the

RML and SI procedures are evaluated in terms of the probability of correct selection (PCS) for various simulated data sets

with different sample sizes and censoring rates.

The rest of this article is organized as follows. In Section 2, the two procedures are described in detail, and Section 3

shows the comparison results obtained from the simulation experiments. Finally, conclusions and guidelines are presented

in Section 4.

2. RML and SI procedures for discriminating between Weibull and lognormal distributions

The probability density functions (pdfs) of the Weibull and lognormal distributions are respectively given as follows.

$$W(\eta,\beta):\quad f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]$$

where η > 0 and β > 0 are the scale and shape parameters, respectively, and

$$LN(\theta,\sigma):\quad g(t) = \frac{1}{\sqrt{2\pi}\,\sigma t}\exp\left\{-\frac{[\ln(t/\theta)]^{2}}{2\sigma^{2}}\right\}$$

where θ > 0 and σ > 0 are the scale and shape parameters, respectively.

2.1. Procedure RML: Maximized likelihood function approach

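For the case of complete data, the logarithm of the MLF for the Weibull distribution (i.e., $\ln L_W$) and that for the lognormal distribution (i.e., $\ln L_{LN}$) are given as in Eqs. (1) and (2), respectively.

$$\ln L_W = n\ln\hat{\beta} - n\hat{\beta}\ln\hat{\eta} + (\hat{\beta}-1)\sum_{i=1}^{n}\ln t_i - \sum_{i=1}^{n}\left(t_i/\hat{\eta}\right)^{\hat{\beta}} \qquad (1)$$

$$\ln L_{LN} = -n\ln\hat{\sigma} - \frac{n}{2}\ln 2\pi - \sum_{i=1}^{n}\ln t_i - \frac{1}{2\hat{\sigma}^{2}}\sum_{i=1}^{n}\left(\ln t_i - \ln\hat{\theta}\right)^{2} \qquad (2)$$

where $\hat{\eta}$, $\hat{\beta}$, $\hat{\theta}$ and $\hat{\sigma}$ are the maximum likelihood estimators (MLEs) of η, β, θ and σ, respectively.

For the case of Type-I censoring, let $t_1, t_2, \ldots, t_r$ be the ordered failure times observed until the censoring time $t_c$. Then, the logarithm of the MLF for each distribution can be derived as follows except for the common term, $\ln[n!/(n-r)!]$ (Siswadi and Quesenberry, 1982).

$$\ln L_W = r\ln\hat{\beta} - r\hat{\beta}\ln\hat{\eta} + (\hat{\beta}-1)\sum_{i=1}^{r}\ln t_i - \sum_{i=1}^{r}\left(t_i/\hat{\eta}\right)^{\hat{\beta}} - (n-r)\left(t_c/\hat{\eta}\right)^{\hat{\beta}} \qquad (3)$$

$$\ln L_{LN} = -r\ln\hat{\sigma} - \frac{r}{2}\ln 2\pi - \sum_{i=1}^{r}\ln t_i - \frac{1}{2\hat{\sigma}^{2}}\sum_{i=1}^{r}\left(\ln t_i - \ln\hat{\theta}\right)^{2} + (n-r)\ln\left[1 - \Phi\left(\frac{\ln t_c - \ln\hat{\theta}}{\hat{\sigma}}\right)\right] \qquad (4)$$

where Φ is the cumulative distribution function (cdf) of the standard normal distribution.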


It can easily be shown that the MLFs for the case of Type-II censoring are obtained by replacing the censoring time $t_c$ in Eqs. (3) and (4) with the $r$th failure time $t_r$.

Let RML = $L_W/L_{LN}$ in the above three cases. Then, the RML procedure selects the Weibull distribution if RML > 1 (or equivalently, if $\ln L_W > \ln L_{LN}$), and the lognormal distribution otherwise.
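As a rough illustration of the RML rule for complete data, the following Python sketch evaluates Eqs. (1) and (2) at the MLEs and applies the decision rule. It is not the authors' Matlab implementation: the function names (`weibull_mle`, `weibull_loglik`, `lognormal_loglik`, `rml_select`) and the bisection solver for the Weibull shape estimate are assumptions made here for illustration, the sample is assumed to contain distinct positive values, and the censored cases of Eqs. (3) and (4) are not covered.

```python
import math
import random


def weibull_mle(t):
    """Return (eta_hat, beta_hat) for a complete sample (hypothetical helper)."""
    n = len(t)
    logs = [math.log(x) for x in t]
    mean_log, max_log = sum(logs) / n, max(logs)

    def score(beta):
        # Profile likelihood equation for beta; weights are rescaled by
        # max_log so that exp() cannot overflow.
        w = [math.exp(beta * (l - max_log)) for l in logs]
        return 1.0 / beta + mean_log - sum(wi * li for wi, li in zip(w, logs)) / sum(w)

    lo, hi = 1.0, 1.0
    while score(lo) < 0.0:    # score decreases in beta: shrink lo until score >= 0
        lo /= 2.0
    while score(hi) > 0.0:    # grow hi until score <= 0
        hi *= 2.0
    for _ in range(200):      # plain bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    w = [math.exp(beta * (l - max_log)) for l in logs]
    eta = math.exp(max_log + math.log(sum(w) / n) / beta)  # (sum t^beta / n)^(1/beta)
    return eta, beta


def weibull_loglik(t):
    """Eq. (1): maximized Weibull log-likelihood for complete data."""
    n = len(t)
    eta, beta = weibull_mle(t)
    return (n * math.log(beta) - n * beta * math.log(eta)
            + (beta - 1.0) * sum(math.log(x) for x in t)
            - sum((x / eta) ** beta for x in t))


def lognormal_loglik(t):
    """Eq. (2): maximized lognormal log-likelihood for complete data."""
    n = len(t)
    logs = [math.log(x) for x in t]
    mu = sum(logs) / n                       # ln(theta_hat)
    ss = sum((l - mu) ** 2 for l in logs)
    sigma = math.sqrt(ss / n)                # sigma_hat
    return (-n * math.log(sigma) - 0.5 * n * math.log(2.0 * math.pi)
            - sum(logs) - ss / (2.0 * sigma ** 2))


def rml_select(t):
    """Select 'Weibull' if ln L_W > ln L_LN, and 'lognormal' otherwise."""
    return "Weibull" if weibull_loglik(t) > lognormal_loglik(t) else "lognormal"


rng = random.Random(0)
sample = [rng.weibullvariate(1.0, 1.0) for _ in range(2000)]  # deviates from W(1,1)
print(rml_select(sample))
```

Bisection is used only because it needs nothing beyond the standard library; any root finder applied to the same likelihood equation would serve.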

2.2. Procedure SI: Scale invariant statistic

Quesenberry and Kent (1982) adopted the scale transformation maximal invariant statistics proposed by Hájek and Šidák

(1967) as selection statistics and modified the most powerful scale invariant test procedure of Hájek and Šidák (1967) to

propose the following selection rule for discriminating k candidate distributions given a set of complete data.

Select the $j$th distribution for which $S_j = \max\{S_1, S_2, \ldots, S_k\}$,

where $S$ is the scale invariant statistic of Hájek and Šidák (1967).

For the case where k = 2, Quesenberry and Kent (1982) state that the above rule minimizes the sum of the two

probabilities of selecting the incorrect distribution among scale invariant procedures, and also has some other optimal

properties. On the other hand, the above rule may be no longer optimal for selecting between two distributions from

respective classes which involve not only the scale but also the shape parameter. For this case, Quesenberry and Kent

(1982) proposed to derive the optimal selection rule assuming that the shape parameter is known, and then to substitute a scale invariant

estimate of the shape parameter for the parameter. This suboptimal procedure is called the SI procedure in this

paper. In the present study, the SI procedure is considered instead of the SSI procedure since, as Siswadi and Quesenberry

(1982) showed based on simulation, the performances of the two procedures are almost equivalent, and the former is

computationally easier to implement than the latter.

Let $S_W$ and $S_{LN}$ be the scale invariant statistics for the Weibull and lognormal distributions, respectively. In the SI

procedure, the Weibull distribution is selected if $S_W > S_{LN}$ (or if $\ln S_W > \ln S_{LN}$), and the lognormal distribution otherwise.

For complete data, the logarithms of the scale invariant statistics for the two distributions are given as follows (Quesenberry and Kent, 1982).

$$\ln S_W = (n-1)\ln\hat{\beta} + \ln\Gamma(n) + (\hat{\beta}-1)\sum_{i=1}^{n}\ln t_i - n\ln\left(\sum_{i=1}^{n} t_i^{\hat{\beta}}\right)$$

$$\ln S_{LN} = -(n-1)\ln\left(\sqrt{2\pi}\,\hat{\sigma}\right) - \frac{1}{2}\ln n - \sum_{i=1}^{n}\ln t_i - \frac{1}{2\hat{\sigma}^{2}}\left[\sum_{i=1}^{n}(\ln t_i)^{2} - \frac{1}{n}\left(\sum_{i=1}^{n}\ln t_i\right)^{2}\right]$$

where Γ(·) denotes the gamma function and $\hat{\beta}$ and $\hat{\sigma}$ are the MLEs of β and σ, respectively.

The logarithms of the scale invariant statistics of the Weibull and lognormal distributions based on Type-I censored data are respectively given as follows except for the common term $\ln[n!/(n-r)!]$ (Siswadi and Quesenberry, 1982).

$$\ln S_W = \ln\Gamma(r) + (r-1)\ln\hat{\beta} + (\hat{\beta}-1)\sum_{i=1}^{r}\ln t_i - r\ln\left[\sum_{i=1}^{r} t_i^{\hat{\beta}} + (n-r)\,t_c^{\hat{\beta}}\right] \qquad (5)$$

$$\ln S_{LN} = -(r-1)\ln\left(\sqrt{2\pi}\,\hat{\sigma}\right) - \frac{1}{2}\ln r - \sum_{i=1}^{r}\ln t_i - \frac{1}{2\hat{\sigma}^{2}}\left[\sum_{i=1}^{r}(\ln t_i)^{2} - \frac{1}{r}\left(\sum_{i=1}^{r}\ln t_i\right)^{2}\right] + \ln E\left\{\left[1 - \Phi\left(\frac{u + \ln t_c}{\hat{\sigma}}\right)\right]^{n-r}\right\} \qquad (6)$$

where $\hat{\beta}$ and $\hat{\sigma}$ are the MLEs of β and σ, respectively, and $u$ is a random variable which follows a normal distribution with mean $-\sum_{i=1}^{r}\ln t_i / r$ and variance $\hat{\sigma}^{2}/r$.

It can be shown that the selection statistics for the case of Type-II censoring are obtained by replacing $t_c$ in Eqs. (5) and (6) with the $r$th failure time $t_r$.

Notice in Eq. (6) that the selection statistic for the lognormal distribution involves an expectation which is difficult to determine analytically or time-consuming to evaluate by simulation. This may be the reason why Quesenberry and Kent (1982) considered the complete sample case only.
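The complete-data SI statistics can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the helper names (`weibull_shape_mle`, `ln_s_weibull`, `ln_s_lognormal`, `si_select`) are hypothetical, the shape estimates are taken to be the MLEs, the Weibull shape MLE is obtained by a simple bisection chosen here for convenience, and the censored statistics of Eqs. (5) and (6) are not implemented.

```python
import math
import random


def weibull_shape_mle(logs):
    """Bisection solver for the Weibull shape MLE from log failure times (hypothetical helper)."""
    n = len(logs)
    mean_log, max_log = sum(logs) / n, max(logs)

    def score(beta):
        # Likelihood equation for beta, with weights rescaled to avoid overflow.
        w = [math.exp(beta * (l - max_log)) for l in logs]
        return 1.0 / beta + mean_log - sum(wi * li for wi, li in zip(w, logs)) / sum(w)

    lo, hi = 1.0, 1.0
    while score(lo) < 0.0:
        lo /= 2.0
    while score(hi) > 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ln_s_weibull(t):
    """ln S_W for complete data, with beta replaced by its MLE."""
    n = len(t)
    logs = [math.log(x) for x in t]
    beta = weibull_shape_mle(logs)
    max_log = max(logs)
    # ln(sum of t_i^beta), computed as a log-sum-exp for numerical stability
    ln_sum_pow = beta * max_log + math.log(sum(math.exp(beta * (l - max_log)) for l in logs))
    return ((n - 1) * math.log(beta) + math.lgamma(n)
            + (beta - 1.0) * sum(logs) - n * ln_sum_pow)


def ln_s_lognormal(t):
    """ln S_LN for complete data, with sigma replaced by its MLE."""
    n = len(t)
    logs = [math.log(x) for x in t]
    ss = sum(l * l for l in logs) - sum(logs) ** 2 / n   # corrected sum of squares
    sigma = math.sqrt(ss / n)
    return (-(n - 1) * math.log(math.sqrt(2.0 * math.pi) * sigma)
            - 0.5 * math.log(n) - sum(logs) - ss / (2.0 * sigma ** 2))


def si_select(t):
    """Select 'Weibull' if ln S_W > ln S_LN, and 'lognormal' otherwise."""
    return "Weibull" if ln_s_weibull(t) > ln_s_lognormal(t) else "lognormal"


rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(50)]
rescaled = [7.5 * x for x in sample]
# The difference ln S_W - ln S_LN should not change when the data are rescaled.
print(ln_s_weibull(sample) - ln_s_lognormal(sample))
print(ln_s_weibull(rescaled) - ln_s_lognormal(rescaled))
```

Multiplying every observation by a constant a shifts both statistics by the same amount, −n ln a, so the comparison between them, and hence the selection, is unaffected; the two printed differences should agree up to rounding.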

3. Comparisons of the two selection procedures

In this section, the performances of the two procedures are compared for the cases of complete, Type-I censored, and

Type-II censored samples based on simulation. The simulation conditions are as follows.

(i) Sample size (n): 10, 20, 30, 50, 100 and 200

(ii) Censoring rate (c): 0 (complete data), 10, 20, 30, 40, 50, 70 and 90%

(iii) Number of simulation runs (k) for each sample size: k = 100,000 when n < 100 and k = 50,000 when n ≥ 100

To conduct simulation experiments, the values of the unknown parameters of the Weibull or lognormal distribution must be specified for each combination of n, c, and k. It is well known that the RML statistic for a complete or Type-II censored sample is independent of location-scale parameters (Bain, 1978). Note that selecting between the Weibull and lognormal distributions is equivalent to selecting between the extreme value and normal distributions which belong to the location-scale family. In addition, it is shown in Appendix A that the RML statistic for a Type-I censored sample is also independent of location-scale parameters if the censoring rate c is given as in the present simulation experiments. Quesenberry and Kent (1982) showed that, for the case of a complete sample from the Weibull or lognormal distribution, the selection statistics of the SI procedure are invariant under scale and shape transformations of the form $a t_i^{b}$ (a, b > 0, i = 1, 2, ..., n) if scale invariant estimators (e.g., MLEs) of the shape parameters are used. It is shown in Appendix B that the same is true for censored samples.

The above discussions indicate that the scale and shape parameters of the Weibull and lognormal distributions need not be varied, but can be fixed to some specific values without any loss of generality in assessing the relative performances of the two selection procedures. In the present simulation experiments, random deviates are generated from W(1,1) or LN(1,1) for each combination of n, c and k.

In simulating the cases of Type-I censoring, the censoring time $t_c$ is determined according to the given censoring rate c. For instance, for W(1,1), $t_c$ is given by −ln c. For the simulation of Type-II censoring cases, r is determined as n(1 − c).

The performance of each procedure is evaluated by the PCS, which is the ratio of the number of simulation runs in which the procedure selects the true distribution to the total number of simulation runs. In the simulation of Type-I censoring cases, r could be zero, especially when c is large and n is small. In such cases, the MLEs of the unknown parameters of the Weibull and lognormal distributions do not exist, and therefore, such cases are excluded from calculating the PCS.

In addition to the PCS, the selection error rate e (= 1 − PCS) is also calculated and the average error rate $\bar{e} = (e_W + e_{LN})/2$ is tabulated where, for instance, $e_W$ is the error rate when the true distribution is Weibull. All the simulations were conducted using Matlab (Matlab Statistics Toolbox, 2008).

Table 1
PCS values for complete data

         T ∼ Weibull         T ∼ Lognormal       ē = (e_W + e_LN)/2
   n     RML      SI         RML      SI         RML      SI
  10     0.6793   0.5988     0.6598   0.7430     0.3304   0.3291
  20     0.7730   0.7407     0.7682   0.8010     0.2294   0.2291
  30     0.8312   0.8109     0.8295   0.8507     0.1696   0.1692
  50     0.9039   0.8942     0.9043   0.9113     0.0959   0.0972
 100     0.9724   0.9691     0.9727   0.9740     0.0274   0.0284
 200     0.9972   0.9971     0.9969   0.9969     0.0029   0.0030
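The censoring set-up of the simulation described in this section can be sketched as follows. The helper names (`type1_censoring_time`, `draw`, `type1_censor`, `type2_censor`) are hypothetical, and c is taken here as a fraction in (0, 1). The Weibull censoring time −ln c is the one stated in the text; the lognormal censoring time is an assumption, derived from requiring P(T > t_c) = c under LN(1,1), and the rounding of n(1 − c) to an integer is likewise assumed.

```python
import math
import random
from statistics import NormalDist


def type1_censoring_time(dist, c):
    """Censoring time t_c with P(T > t_c) = c for W(1,1) or LN(1,1)."""
    if dist == "weibull":
        return -math.log(c)                              # survival function exp(-t) = c
    if dist == "lognormal":
        return math.exp(NormalDist().inv_cdf(1.0 - c))   # ln T ~ N(0, 1) (assumed form)
    raise ValueError("dist must be 'weibull' or 'lognormal'")


def draw(dist, n, rng):
    """Generate n deviates from W(1,1) or LN(1,1)."""
    if dist == "weibull":
        return [rng.weibullvariate(1.0, 1.0) for _ in range(n)]
    return [rng.lognormvariate(0.0, 1.0) for _ in range(n)]


def type1_censor(sample, t_c):
    """Ordered failure times observed up to t_c; r is the length of the result."""
    return sorted(x for x in sample if x <= t_c)


def type2_censor(sample, c):
    """The r = n(1 - c) smallest failure times; t_r is the last element."""
    r = round(len(sample) * (1.0 - c))
    return sorted(sample)[:r]


rng = random.Random(0)
sample = draw("weibull", 30, rng)
t_c = type1_censoring_time("weibull", 0.5)
observed = type1_censor(sample, t_c)
print(len(observed), "failures observed before t_c =", t_c)
print(len(type2_censor(sample, 0.5)), "failures retained under Type-II censoring")
```

Under Type-I censoring the number of observed failures r is random, so a run with r = 0 is possible and would have to be discarded as described above; under Type-II censoring r is fixed in advance.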

3.1. Complete data

Table 1 shows the PCS and ē values for the two procedures. As expected, the PCS values for the two procedures increase as

the sample size increases. When the true distribution is Weibull (i.e., T ∼ Weibull in Table 1), procedure RML yields higher

PCS values for all sample sizes considered. On the other hand, the reverse holds when the true distribution is lognormal.

In terms of the average error rate, procedure SI performs slightly better than procedure RML for ‘small’ samples (n ≤ 30,

say), while procedure RML performs slightly better than SI for ‘large’ samples (n ≥ 50, say). That is, in terms of the average

error rate, the two procedures perform similarly. However, comparing the PCS values of procedure SI for both distributions

shows that it has a tendency to be in favor of the lognormal distribution, while procedure RML yields balanced PCS values

for both distributions. In summary, the RML procedure may be recommended for complete samples.
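The average error rates reported in Table 1 follow directly from the tabulated PCS values through ē = (e_W + e_LN)/2 with e = 1 − PCS. The short Python check below recomputes them from the table entries; the dictionary layout and the function name `average_error_rate` are illustrative choices, and small discrepancies in the fourth decimal place are to be expected because the published PCS values are themselves rounded.

```python
# Table 1 entries: n -> procedure -> (PCS | Weibull, PCS | lognormal, reported average error)
table1 = {
    10:  {"RML": (0.6793, 0.6598, 0.3304), "SI": (0.5988, 0.7430, 0.3291)},
    20:  {"RML": (0.7730, 0.7682, 0.2294), "SI": (0.7407, 0.8010, 0.2291)},
    30:  {"RML": (0.8312, 0.8295, 0.1696), "SI": (0.8109, 0.8507, 0.1692)},
    50:  {"RML": (0.9039, 0.9043, 0.0959), "SI": (0.8942, 0.9113, 0.0972)},
    100: {"RML": (0.9724, 0.9727, 0.0274), "SI": (0.9691, 0.9740, 0.0284)},
    200: {"RML": (0.9972, 0.9969, 0.0029), "SI": (0.9971, 0.9969, 0.0030)},
}


def average_error_rate(pcs_weibull, pcs_lognormal):
    """Average of the two selection error rates, each equal to 1 - PCS."""
    return ((1.0 - pcs_weibull) + (1.0 - pcs_lognormal)) / 2.0


for n, row in table1.items():
    for procedure, (pcs_w, pcs_ln, reported) in row.items():
        recomputed = average_error_rate(pcs_w, pcs_ln)
        # Gap between the two PCS values: a positive gap favours the lognormal.
        gap = pcs_ln - pcs_w
        print(n, procedure, round(recomputed, 5), reported, round(gap, 4))
```

The printed gap PCS(·|LN) − PCS(·|W) gives a simple view of the imbalance discussed above: for small n it is noticeably larger for procedure SI than for procedure RML.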

3.2. Type-I censored data

3.2.1. Simulation results

Selected simulation results are shown in Tables 2a–2c in which the censoring rates are 10, 50, and 90%, respectively. The

results for other cases are available from the authors upon request. Let PCS(P|D) be the probability of correct selection for

procedure P when the true distribution is D. The following is a summary of the findings of the simulation experiment.

1. When the true distribution is Weibull, it is observed that:

(a) both PCS(RML|W) and PCS(SI|W) increase as n increases regardless of the censoring rate c; and

(b) PCS(RML|W) is larger than PCS(SI|W) up to a certain sample size $n_1$, beyond which PCS(RML|W) becomes smaller

than PCS(SI|W). $n_1$ decreases as c increases.


Table 2a
PCS values for the case of Type-I censoring (censoring rate = 10%)

         T ∼ Weibull         T ∼ Lognormal       ē = (e_W + e_LN)/2
   n     RML      SI         RML      SI         RML      SI
  10     0.5383   0.4543     0.7306   0.7975     0.3655   0.3741
  20     0.6413   0.5959     0.8057   0.8379     0.2765   0.2831
  30     0.7165   0.6846     0.8478   0.8691     0.2178   0.2231
  50     0.8102   0.7885     0.8954   0.9099     0.1472   0.1508
 100     0.9182   0.9113     0.9588   0.9614     0.0615   0.0636
 200     0.9818   0.9999     0.9913   0.9923     0.0134   0.0039

Table 2b
PCS values for the case of Type-I censoring (censoring rate = 50%)

         T ∼ Weibull         T ∼ Lognormal       ē = (e_W + e_LN)/2
   n     RML      SI         RML      SI         RML      SI
  10     0.2877   0.1748     0.8276   0.9013     0.4423   0.4619
  20     0.4161   0.3851     0.7983   0.7949     0.3928   0.4100
  30     0.4868   0.5651     0.7968   0.6656     0.3582   0.3846
  50     0.5779   0.8028     0.8101   0.3887     0.3060   0.4042
 100     0.7116   0.9805     0.8513   0.0639     0.2185   0.4778
 200     0.8346   1          0.9077   0.0012     0.1288   0.4994

Table 2c
PCS values for the case of Type-I censoring (censoring rate = 90%)

         T ∼ Weibull         T ∼ Lognormal       ē = (e_W + e_LN)/2
   n     RML      SI         RML      SI         RML      SI
  10     0.0358   0.3059     0.9708   0.6975     0.4967   0.4983
  20     0.0865   0.3913     0.9314   0.6134     0.4911   0.4977
  30     0.1388   0.5162     0.8940   0.4880     0.4836   0.4979
  50     0.2268   0.7621     0.8378   0.2405     0.4677   0.4987
 100     0.3474   0.9839     0.7807   0.0158     0.4360   0.5002
 200     0.4509   1          0.7664   0          0.3914   0.5000

2. For the case where the true distribution is lognormal, the following is observed.

(a) PCS(RML|LN) increases as n increases if c is about 30% or less (e.g., see Table 2a). However, if c is 40% or larger, PCS(RML|LN) decreases up to a certain sample size $n_2$, beyond which it starts to increase. For instance, when c = 50% (see Table 2b), PCS(RML|LN) decreases up to $n_2$ = 30, and then increases. $n_2$ becomes larger as c increases. Although this does not appear in Table 2c, additional simulation runs show that PCS(RML|LN) decreases up to approximately $n_2$ = 300, and then increases.

(b) PCS(SI|LN) increases as n increases if c is about 20% or less (e.g., see Table 2a). However, if c is 30% or larger, it decreases as n increases (e.g., see Tables 2b and 2c).

(c) PCS(SI|LN) is larger than PCS(RML|LN) up to a certain sample size $n_3$, beyond which PCS(SI|LN) becomes smaller than PCS(RML|LN). $n_3$ decreases as c increases.

3. PCS(RML|LN) is larger than PCS(RML|W) for any combination of n and c considered in the experiment. On the other hand, PCS(SI|LN) is larger than PCS(SI|W) up to a certain sample size $n_4$, beyond which PCS(SI|LN) becomes smaller than PCS(SI|W). $n_4$ decreases as c increases.

4. The average error rate of the RML procedure is smaller than that of the SI procedure for almost all combinations of n and c considered in the experiment. It is also noted that the two procedures do not differ appreciably in terms of ē when c is 20% or less (e.g., see Table 2a). However, when c is 30% or larger, ē of the RML procedure becomes noticeably smaller than that of the SI procedure if n is about 50 or larger (e.g., see Table 2b).

3.2.2. Interpretations of the results

Findings 2(a) and (b) are counter-intuitive in the sense that PCS values for both procedures may decrease (depending on c) even if n increases. For the SI procedure, this result can be explained using Eq. (6). In the last term on the right-hand side of Eq. (6), let

$$p = 1 - \Phi\left(\frac{u + \ln t_c}{\hat{\sigma}}\right).$$

Note that p is always less than 1. Then, as n increases for a fixed, sufficiently large c, n − r tends to be large, $p^{n-r}$ and its expectation tend to approach 0, the last term $\ln E\left[p^{n-r}\right]$ tends to have a large negative value, and therefore, $\ln S_{LN}$ tends to be less than $\ln S_W$.
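The behavior of the last term of Eq. (6) can be illustrated numerically. The Python sketch below estimates ln E[p^(n−r)] by plain Monte Carlo sampling of u; it is only an illustration, not necessarily how the authors evaluated the expectation, and the function name `ln_expected_p_power` as well as all input values (mean log failure time, σ̂, r, ln t_c) are hypothetical.

```python
import math
import random


def ln_expected_p_power(mean_log_t, sigma_hat, r, m, ln_tc, draws=20000, seed=0):
    """Monte Carlo estimate of ln E[p^m], where p = 1 - Phi((u + ln t_c)/sigma_hat),
    u ~ N(-mean_log_t, sigma_hat^2 / r), and m plays the role of n - r."""
    rng = random.Random(seed)
    sd_u = sigma_hat / math.sqrt(r)
    log_terms = []
    for _ in range(draws):
        u = rng.gauss(-mean_log_t, sd_u)
        z = (u + ln_tc) / sigma_hat
        p = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper-tail standard normal probability
        log_terms.append(m * math.log(p))
    top = max(log_terms)                           # log-mean-exp to avoid underflow
    return top + math.log(sum(math.exp(x - top) for x in log_terms) / draws)


# Hypothetical inputs: the same failure-time summary, with a growing number of
# censored units n - r.
for m in (5, 50, 500):
    print(m, ln_expected_p_power(mean_log_t=-1.0, sigma_hat=1.0, r=20, m=m, ln_tc=0.0))
```

Because p lies strictly between 0 and 1, each simulated value of p^m shrinks as m grows, so the estimated term becomes more negative as the number of censored units increases, in line with the argument above.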