Computational Statistics and Data Analysis 53 (2008) 477–485
Selection between Weibull and lognormal distributions: A comparative
simulation study
Jin Seon Kim, Bong-Jin Yum∗
Department of Industrial Engineering, Korea Advanced Institute of Science and Technology, 373-1, Guseong-Dong, Yuseong-Gu,
Daejeon, 305-701, Republic of Korea
Article info
Article history:
Received 15 April 2008
Received in revised form 7 August 2008
Accepted 15 August 2008
Available online 20 August 2008
Abstract
How to select the correct distribution for a given set of data is an important issue, especially
when the tail probabilities are of interest as in lifetime data analysis. The Weibull and
lognormal distributions are assumed most often in analyzing lifetime data, and in many
cases, they are competing with each other. In addition, lifetime data are usually censored
due to the constraint on the amount of testing time. A literature review reveals that little
attention has been paid to the selection problems for the case of censored samples. In
this article, the relative performances of two selection procedures, namely the maximized
likelihood and scale invariant procedures, are compared for selecting between the Weibull
and lognormal distributions for the cases of not only complete but also censored samples.
Monte Carlo simulation experiments are conducted for various combinations of the
censoring rate and sample size, and the performance of each procedure is evaluated in
terms of the probability of correct selection (PCS) and average error rate. Then, previously
unknown behaviors and relative performances of the two procedures are summarized.
Computational results suggest that the maximized likelihood procedure can be generally
recommended for censored as well as complete sample cases.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Choosing the correct or best-fitting distribution for a given set of data is an important issue, especially when the tail
probability, which is usually sensitive to the assumed model, is of interest as in reliability engineering. Statistical methods
for distribution choice include probability plotting (Nelson, 1982), goodness-of-fit (GOF) testing, hypothesis testing (HT), and
selection procedures. Probability plotting provides useful information concerning the right distribution. However, it can be
subjective and may yield multiple plots which appear to be equally adequate. Concerning the usual GOF and HT approaches,
one may prefer the latter when one may not wish to reject the distribution in the null hypothesis without an alternative
to take its place. However, the HT approach treats the two distributions asymmetrically, while selection procedures put
candidate distributions on an equal footing and can deal with more than two distributions at the same time.
Many authors have developed selection procedures, each with its own selection statistic and decision rule. Two well-known
procedures are those based, respectively, on the maximized likelihood function (MLF) and the scale invariant (SI)
selection statistic. For the MLF-based procedure, the reader is referred to Bain and Engelhardt (1980), Kappenman (1982),
Gupta and Kundu (2003, 2004), Kundu and Manglick (2004), Kundu et al. (2005), Strupczewski et al. (2006), and Kundu and
Raqab (2007), among others. The SI selection procedure was developed in Quesenberry and Kent (1982) for selecting among
the exponential, Weibull, lognormal, and gamma distributions. In both procedures, the distribution with the largest value of
the selection statistic is selected. For other selection procedures, the reader is referred to Croes et al. (1998), Marshall et al.
(2001), Cain (2002), Dick (2004) and Mitosek et al. (2006), among others. In addition, Bayesian selection procedures were
developed by Kim et al. (2000), Upadhyay and Peshwani (2003), Araújo and Pereira (2007), etc.
∗Corresponding author. Tel.: +82 42 869 3116; fax: +82 42 869 3110.
E-mail address: bjyum@kaist.ac.kr (B.-J. Yum).
0167-9473/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.csda.2008.08.012
Several authors compared the relative performance of selection procedures. Siswadi and Quesenberry (1982) compared
the SI, scale-shape invariant (SSI), and MLF procedures when complete data are available and the SI and MLF procedures
when the data are Type-I censored. For other comparative studies, the reader is referred to Kappenman (1989), Pandey et al.
(1991), Lee and Pope (2006), Mitosek et al. (2006), and Basu et al. (2008).
A review of the related literature reveals that little attention has been paid to the selection problems for the case of
censoring. Exceptions include Siswadi and Quesenberry (1982) for Type-I censored samples, Croes et al. (1998), Cain (2002)
and Kim et al. (2000) for Type-II censored samples, and Block and Leemis (2008) for randomly right-censored samples.
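For readers less familiar with the censoring schemes named above, the following is a minimal Python sketch of how Type-I and Type-II censored samples could be produced from a complete sample. The function names, the Weibull parameters (η = 1, β = 2), the sample size, and the 10% expected censoring rate are illustrative assumptions and are not taken from the studies cited.

```python
import math
import random

def type1_censor(times, tc):
    """Type-I: testing stops at a fixed time tc, so the number of failures is random."""
    failures = sorted(t for t in times if t <= tc)
    return failures, len(times) - len(failures)

def type2_censor(times, r):
    """Type-II: testing stops at the r-th failure, so the stopping time is random."""
    failures = sorted(times)[:r]
    return failures, len(times) - r

def weibull_censoring_time(eta, beta, rate):
    """Time tc with P(T > tc) = rate under W(eta, beta), i.e. exp(-(tc/eta)^beta) = rate."""
    return eta * (-math.log(rate)) ** (1.0 / beta)

rng = random.Random(0)
# Assumed values for illustration only: W(1, 2), n = 30, 10% expected censoring.
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(30)]
tc = weibull_censoring_time(1.0, 2.0, 0.10)
obs1, n_cens1 = type1_censor(sample, tc)   # observed failures and number censored
obs2, n_cens2 = type2_censor(sample, 27)   # keep the 27 smallest failure times
```

Under Type-I censoring only the expected censoring rate can be fixed in advance, whereas under Type-II censoring the rate is fixed exactly by the choice of r.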
In this paper, the ratio of the maximized likelihoods (RML) and SI procedures are compared for discriminating between
the Weibull and lognormal distributions for the cases of complete, Type-I censored, and Type-II censored samples. (The RML
procedure is equivalent to the MLF procedure in the case of two distributions.) The Weibull and lognormal distributions
are most often assumed and competing with each other in analyzing lifetime data (e.g., see Bain (1978), Chen et al. (1987)
and Prendergast et al. (2005) for Time Dependent Dielectric Breakdown (TDDB) data, and Lloyd (1979) and Pinto (1991)
for electromigration lifetime data). In such cases, it is highly desirable to have a means of discriminating between the two
distributions since there is a significant difference in the low percentiles which are of interest in lifetime data analysis.
In addition, lifetime data are usually censored due to the constraint on the amount of testing time, which necessitates an
extensive comparative study of selection procedures for the case of censoring. Siswadi and Quesenberry (1982) compared
the SI and MLF procedures for Type-I censored samples, but only considered the case where the sample size is 30 and the
expected censoring rate is 10%. As the simulation results in Section 3 show, considering only a few specific cases
can give a misleading picture of the overall behavior of the two procedures. In this paper, the performances of the
RML and SI procedures are evaluated in terms of the probability of correct selection (PCS) for various simulated data sets
with different sample sizes and censoring rates.
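The remark above about low percentiles can be illustrated numerically. The sketch below is an assumption-laden example and not part of the simulation study: it takes a Weibull distribution with η = 1 and β = 2 (illustrative values), constructs the lognormal distribution that shares its median and 90th percentile (a matching rule chosen here only for illustration), and compares the lower percentiles of the two.

```python
import math
from statistics import NormalDist

def weibull_quantile(p, eta, beta):
    """p-th quantile of W(eta, beta): eta * (-ln(1 - p))^(1/beta)."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)

def lognormal_quantile(p, theta, sigma):
    """p-th quantile of LN(theta, sigma): theta * exp(sigma * z_p)."""
    return theta * math.exp(sigma * NormalDist().inv_cdf(p))

eta, beta = 1.0, 2.0                          # assumed Weibull parameters
theta = weibull_quantile(0.5, eta, beta)      # match the medians
sigma = (math.log(weibull_quantile(0.9, eta, beta) / theta)
         / NormalDist().inv_cdf(0.9))         # match the 90th percentiles

for p in (0.001, 0.01, 0.05, 0.5, 0.9):
    print(p, weibull_quantile(p, eta, beta), lognormal_quantile(p, theta, sigma))
```

With these assumed values the two models agree at the median and the 90th percentile by construction, yet the lognormal 1st percentile is more than twice the Weibull one.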
The rest of this article is organized as follows. In Section 2, the two procedures are described in detail, and Section 3
shows the comparison results obtained from the simulation experiments. Finally, conclusions and guidelines are presented
in Section 4.
2. RML and SI procedures for discriminating between Weibull and lognormal distributions
The probability density functions (pdfs) of the Weibull and lognormal distributions are respectively given as follows.
$$W(\eta,\beta):\quad f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]$$
where η > 0 and β > 0 are the scale and shape parameters, respectively, and
$$LN(\theta,\sigma):\quad g(t) = \frac{1}{\sqrt{2\pi}\,\sigma t}\exp\left\{-\frac{[\ln(t/\theta)]^{2}}{2\sigma^{2}}\right\}$$
where θ > 0 and σ > 0 are the scale and shape parameters, respectively.
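As a quick check on the parameterization, the two densities can be written directly in Python. This is a minimal sketch using only the standard library; the function names are chosen here for illustration.

```python
import math

def weibull_pdf(t, eta, beta):
    """f(t) for W(eta, beta), with eta the scale and beta the shape parameter."""
    return (beta / eta) * (t / eta) ** (beta - 1.0) * math.exp(-((t / eta) ** beta))

def lognormal_pdf(t, theta, sigma):
    """g(t) for LN(theta, sigma), with theta the scale and sigma the shape parameter."""
    z = math.log(t / theta) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)
```

In this parameterization ln θ is the mean of ln T, so θ is the median of the lognormal distribution, and W(η, 1) reduces to the exponential distribution with mean η.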
2.1. Procedure RML: Maximized likelihood function approach
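For the case of complete data, the logarithm of the MLF for the Weibull distribution (i.e., ln L_W) and that for the lognormal
distribution (i.e., ln L_LN) are given as in Eqs. (1) and (2), respectively.
$$\ln L_W = n\ln\hat{\beta} - n\hat{\beta}\ln\hat{\eta} + (\hat{\beta}-1)\sum_{i=1}^{n}\ln t_i - \sum_{i=1}^{n}\left(t_i/\hat{\eta}\right)^{\hat{\beta}} \tag{1}$$
$$\ln L_{LN} = -n\ln\hat{\sigma} - \frac{n}{2}\ln 2\pi - \sum_{i=1}^{n}\ln t_i - \frac{1}{2\hat{\sigma}^{2}}\sum_{i=1}^{n}\left(\ln t_i - \ln\hat{\theta}\right)^{2} \tag{2}$$
where \hat{\eta}, \hat{\beta}, \hat{\theta} and \hat{\sigma} are the maximum likelihood estimators (MLEs) of η, β, θ and σ, respectively.
For the case of Type-I censoring, let t_1, t_2, ..., t_r be the ordered failure times observed until the censoring time t_c. Then,
the logarithm of the MLF for each distribution can be derived as follows except for the common term, ln[n!/(n − r)!] (Siswadi
and Quesenberry, 1982).
$$\ln L_W = r\ln\hat{\beta} - r\hat{\beta}\ln\hat{\eta} + (\hat{\beta}-1)\sum_{i=1}^{r}\ln t_i - \sum_{i=1}^{r}\left(t_i/\hat{\eta}\right)^{\hat{\beta}} - (n-r)\left(t_c/\hat{\eta}\right)^{\hat{\beta}} \tag{3}$$
$$\ln L_{LN} = -r\ln\hat{\sigma} - \frac{r}{2}\ln 2\pi - \sum_{i=1}^{r}\ln t_i - \frac{1}{2\hat{\sigma}^{2}}\sum_{i=1}^{r}\left(\ln t_i - \ln\hat{\theta}\right)^{2} + (n-r)\ln\left[1 - \Phi\left(\frac{\ln t_c - \ln\hat{\theta}}{\hat{\sigma}}\right)\right] \tag{4}$$
where Φ is the cumulative distribution function (cdf) of the standard normal distribution.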
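The log-likelihood expressions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the Weibull MLEs are found by a simple bisection on the profile likelihood equation for β (a solver chosen here for brevity), the function names are hypothetical, and the parameter values in `estimate_pcs` (η = 1, β = 2; θ = 1, σ = 0.5) are illustrative. The two log-likelihood functions follow Eqs. (3) and (4) and reduce to Eqs. (1) and (2) when no observation is censored. For censored samples the MLEs would have to be obtained by numerical maximization, which is not shown, so the selection rule and the PCS estimate below cover complete samples only.

```python
import math
import random
from statistics import NormalDist

def weibull_mle(t):
    """Return (eta_hat, beta_hat) for a complete sample by bisection on the shape."""
    n, tmax = len(t), max(t)
    logs = [math.log(x) for x in t]
    mean_log = sum(logs) / n

    def score(beta):
        # Profile likelihood equation in beta; scaling by tmax avoids overflow.
        w = [(x / tmax) ** beta for x in t]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / beta - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0.0:              # widen the bracket until the sign changes
        lo, hi = hi, 2.0 * hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) < 0.0 else (lo, mid)
    beta = 0.5 * (lo + hi)
    eta = tmax * (sum((x / tmax) ** beta for x in t) / n) ** (1.0 / beta)
    return eta, beta

def loglik_weibull(t, eta, beta, n_censored=0, tc=None):
    """Eq. (3) for the r observed failures in t; equals Eq. (1) when n_censored = 0."""
    r = len(t)
    cens = n_censored * (tc / eta) ** beta if n_censored else 0.0
    return (r * math.log(beta) - r * beta * math.log(eta)
            + (beta - 1.0) * sum(math.log(x) for x in t)
            - sum((x / eta) ** beta for x in t) - cens)

def loglik_lognormal(t, theta, sigma, n_censored=0, tc=None):
    """Eq. (4) for the r observed failures in t; equals Eq. (2) when n_censored = 0."""
    r = len(t)
    logs = [math.log(x) for x in t]
    cens = 0.0
    if n_censored:
        z = (math.log(tc) - math.log(theta)) / sigma
        cens = n_censored * math.log(1.0 - NormalDist().cdf(z))
    return (-r * math.log(sigma) - 0.5 * r * math.log(2.0 * math.pi) - sum(logs)
            - sum((l - math.log(theta)) ** 2 for l in logs) / (2.0 * sigma ** 2) + cens)

def rml_select(t):
    """Complete-sample rule: choose the model with the larger maximized log-likelihood."""
    eta, beta = weibull_mle(t)
    logs = [math.log(x) for x in t]
    mu = sum(logs) / len(t)                                   # ln(theta_hat)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(t))
    l_w = loglik_weibull(t, eta, beta)
    l_ln = loglik_lognormal(t, math.exp(mu), sigma)
    return "Weibull" if l_w > l_ln else "lognormal"

def estimate_pcs(true_model, n, reps=200, seed=1):
    """Monte Carlo estimate of the probability of correct selection (complete samples)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if true_model == "Weibull":
            t = [rng.weibullvariate(1.0, 2.0) for _ in range(n)]   # assumed eta=1, beta=2
        else:
            t = [rng.lognormvariate(0.0, 0.5) for _ in range(n)]   # assumed theta=1, sigma=0.5
        hits += rml_select(t) == true_model
    return hits / reps
```

For example, `estimate_pcs("Weibull", 50)` returns the fraction of 200 simulated complete samples of size 50 for which the rule selects the Weibull distribution when the data are in fact Weibull.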
References
Alqam, M., Bennett, R.M., Zureick, A.-H., 2002. Three-parameter vs. two-parameter Weibull distribution for pultruded composite material properties.
Composite Structures 58 (4), 497–503.
Al-Saleh, J.A., Agarwal, S.K., 2006. Extended Weibull type distribution and finite mixture of distributions. Statistical Methodology 3 (3), 224–233.
Araújo, M.I., Pereira, B.B., 2007. A comparison of Bayes factors for separated models: Some simulation results. Communications in Statistics-Simulation and
Computation 36 (2), 297–309.
Bain, L.J., 1978. Statistical Analysis of Reliability and Life-testing Models. Marcel Dekker, New York.
Bain, L.J., Engelhardt, M., 1980. Probability of correct selection of Weibull versus gamma based on likelihood ratio. Communications in Statistics-Theory
and Methods 9 (4), 375–381.
Basu, B., Tiwari, D., Kundu, D., Prasad, R., 2008. Is Weibull distribution the most appropriate statistical strength distribution for brittle materials? Ceramics
International. doi:10.1016/j.ceramint.2007.10.003.
Block, A.D., Leemis, L.M., 2008. Parametric model discrimination for heavily censored survival data. IEEE Transactions on Reliability 57 (2), 248–259.
Cain, S.R., 2002. Distinguishing between lognormal and Weibull distributions. IEEE Transactions on Reliability 51 (1), 32–38.
Chen, C., 2006. Tests of fit for the three-parameter lognormal distribution. Computational Statistics and Data Analysis 50 (6), 1418–1440.
Chen, C.-F., Wu, C.-Y., Lee, M.-K., Chen, C.-N., 1987. The dielectric reliability of intrinsic thin SiO2 films thermally grown on a heavily doped Si substrate:
Characterization and modeling. IEEE Transactions on Electron Devices 34 (7), 1540–1552.
Croes, K., Manca, V., Ceuninck, W.D., Schepper, L.D., Molenberghs, G., 1998. The time of ‘‘guessing’’ your failure time distribution is over! Microelectronics
Reliability 38 (6–8), 1187–1191.
Dick, E.J., 2004. Beyond lognormal versus gamma: Discrimination among error distributions for generalized linear models. Fisheries Research 70 (2–3),
351–366.
Flynn, M.R., 2004. The 4-parameter lognormal (SB) model of human exposure. Annals of Occupational Hygiene 48 (7), 617–622.
Flynn, M.R., 2005. On the moments of the 4-parameter lognormal distribution. Communications in Statistics-Theory and Methods 34 (4), 745–751.
Gupta, R.D., Kundu, D., 2003. Discriminating between Weibull and generalized exponential distributions. Computational Statistics and Data Analysis 43
(2), 179–196.
Gupta, R.D., Kundu, D., 2004. Discriminating between gamma and generalized exponential distributions. Journal of Statistical Computation & Simulation
74 (2), 107–121.
Hájek, J., Šidák, Z., 1967. Theory of Rank Tests. Academic Press, New York.
Kappenman, R.F., 1982. On a method for selecting a distributional model. Communications in Statistics-Theory and Methods 11 (6), 663–672.
Kappenman, R.F., 1989. A simple method for choosing between the lognormal and Weibull models. Statistics & Probability Letters 7 (22), 123–126.
Kim, D.H., Lee, W.D., Kang, S.G., 2000. Bayesian model selection for lifetime data under type II censoring. Communications in Statistics-Theory and Methods
29 (12), 2865–2878.
Kundu, D., Manglick, A., 2004. Discriminating between the Weibull and log-normal distributions. Naval Research Logistics 51 (6), 893–905.
Kundu, D., Raqab, M.Z., 2007. Discriminating between the generalized Rayleigh and log-normal distribution. Statistics 41 (6), 505–515.
Kundu, D., Gupta, R.D., Manglick, A., 2005. Discriminating between the log-normal and generalized exponential distributions. Journal of Statistical Planning
and Inference 127 (1–2), 213–227.
Lee, M.D., Pope, K.J., 2006. Model selection for the rate problem: A comparison of significance testing, Bayesian, and minimum description length statistical
inference. Journal of Mathematical Psychology 50 (2), 193–202.
Lloyd, J.R., 1979. On the log-normal distribution of electromigration lifetimes. Journal of Applied Physics 50 (7), 5062–5064.
Lu, C., Danzer, R., Fischer, F.D., 2002. Fracture statistics of brittle materials: Weibull or normal distribution. Physical Review E 65, 1–4.
Marshall, A.W., Meza, J.C., Olkin, I., 2001. Can data recognize its parent distribution? Journal of Computational and Graphical Statistics 10 (3), 555–580.
Matlab Statistics Toolbox (2008), http://www.mathworks.com/access/helpdesk/help/toolbox/stats/.
Mitosek, H.T., Strupczewski, W.G., Singh, V.P., 2006. Three procedures for selection of annual flood peak distribution. Journal of Hydrology 323 (1–4), 57–73.
Murthy, D.N.P., Bulmer, M., Eccleston, J.A., 2004. Weibull model selection for reliability modelling. Reliability Engineering and System Safety 86 (3), 257–267.
Nelson, W., 1982. Applied Life Data Analysis. John Wiley & Sons, New York.
Pandey, M., Ferdous, J., Uddin, M.B., 1991. Selection of probability distribution for life testing data. Communications in Statistics-Theory and Methods 20
(4), 1373–1388.
Pham, H., Lai, C.-D., 2007. On recent generalizations of the Weibull distribution. IEEE Transactions on Reliability 56 (3), 454–458.
Pinto, M., 1991. The effect of barrier layers on the distribution function of interconnect electromigration failures. Quality and Reliability Engineering
International 7 (4), 287–291.
Prendergast, J., O’Driscoll, E., Mullen, E., 2005. Investigation into the correct statistical distribution for oxide breakdown over oxide thickness range.
Microelectronics Reliability 45 (5–6), 973–977.
Quesenberry, C.P., Kent, J., 1982. Selecting among probability distributions in reliability. Technometrics 24 (1), 59–65.
Siswadi, Quesenberry, C.P., 1982. Selecting among Weibull, lognormal and gamma distributions using complete and censored samples. Naval Research
Logistics Quarterly 29 (4), 557–569.
Strupczewski, W.G., Mitosek, H.T., Kochanek, K., Singh, V.P., Weglarczyk, S., 2006. Probability of correct selection from lognormal and convective diffusion
models based on the likelihood ratio. Stochastic Environmental Research and Risk Assessment 20 (3), 152–163.
Sultan, K.S., Ismail, M.A., Al-Moisheer, A.S., 2007. Mixture of two inverse Weibull distributions: Properties and estimation. Computational Statistics and
Data Analysis 51 (11), 5377–5387.
Upadhyay, S.K., Peshwani, M., 2003. Choice between Weibull and lognormal models: A simulation based Bayesian study. Communications in Statistics-
Theory and Methods 32 (2), 381–405.
Vera, J.F., Díaz-García, J.A., 2008. A global simulated annealing heuristic for the three-parameter lognormal maximum likelihood estimation. Computational
Statistics and Data Analysis 52 (12), 5055–5065.
Zhang, T., Xie, M., 2007. Failure data analysis with extended Weibull distribution. Communications in Statistics-Simulation and Computation 36 (3),
579–592.