
Journal of Occupational and Environmental Hygiene, 3: 642–650
ISSN: 1545-9624 print / 1545-9632 online
Copyright © 2006 JOEH, LLC
DOI: 10.1080/15459620600961196
Generalized P-Values and Confidence Intervals: A Novel Approach for Analyzing Lognormally Distributed Exposure Data
K. Krishnamoorthy,¹ Thomas Mathew,² and Gurumurthy Ramachandran³
¹Department of Mathematics, University of Louisiana at Lafayette, Lafayette, Louisiana
²Department of Mathematics and Statistics, University of Maryland, Baltimore, Maryland
³Division of Environmental Health Sciences, School of Public Health, University of Minnesota, Minneapolis, Minnesota
The problem of assessing occupational exposure using the mean of a lognormal distribution is addressed. The novel concepts of generalized p-values and generalized confidence intervals are applied for testing hypotheses and computing confidence intervals for a lognormal mean. The proposed methods perform well, are applicable to small sample sizes, and are easy to implement. Power studies and sample size calculation are also discussed. Computational details and a source for the computer program are given. The procedures are also extended to compare two lognormal means and to make inference about a lognormal variance. In fact, our approach based on generalized p-values and generalized confidence intervals is easily adapted to deal with any parametric function involving one or two lognormal distributions. Several examples involving industrial exposure data are used to illustrate the methods. An added advantage of the generalized variables approach is the ease of computation and implementation; the procedures can be easily coded in a programming language. Furthermore, extensive numerical computations by the authors show that the results based on the generalized p-value approach are essentially equivalent to those based on Land's method. We want to draw the attention of the industrial hygiene community to this accurate and unified methodology for dealing with any parameter associated with the lognormal distribution.
Keywords
confidence interval, hypothesis test, Type 1 error
Address correspondence to: K. Krishnamoorthy, Department
of Mathematics, 217 Maxim D. Doucet Hall, P.O. Box 41010,
University of Louisiana at Lafayette, Lafayette, LA 70504; email:
krishna@louisiana.edu.
INTRODUCTION
It has been well established that occupational exposure data and pollution data very often follow the lognormal distribution. Since Oldham's(1) 1953 report that the distribution of dust levels in coal mines is approximately lognormal, several authors have postulated the lognormal model for studying and analyzing workplace pollutant data.(2–8) The most common explanation for this phenomenon is as follows: workplace concentrations are related to rates of contaminant generation and ventilation rates that are variable. Workers move around in this nonuniform environment, and their activity patterns also vary from day to day. The workers' exposures are related to the above factors in a multiplicative manner. Irrespective of the distribution of contaminant generation rates, ventilation rates, and worker activity patterns, their multiplicative interactions typically lead to exposure distributions that are right skewed and described well by the lognormal probability distribution.
The validity of the lognormality assumption for a given data set can be easily tested. By definition, data $y_1, \ldots, y_n$ follow a lognormal distribution if $\ln(y_1), \ldots, \ln(y_n)$ follow a normal distribution (where "ln" denotes the natural logarithm). Thus, testing for lognormality is simply a matter of validating the normality assumption for the logged data, which can be done using widely available software such as Minitab, SPSS, and SAS, or using popular procedures such as the Shapiro-Wilk test or the Anderson-Darling test.
If we let $y$ denote the lognormally distributed exposure measurement of an employee, then $x = \ln(y)$ is normally distributed with mean and standard deviation to be denoted by $\mu_l$ and $\sigma_l$, respectively, and the mean of the lognormal distribution (say, $\mu$) is given by

$$\mu = \exp(\eta), \quad \text{where } \eta = \mu_l + \sigma_l^2/2. \tag{1}$$

If repeated exposure measurements are available from a single worker, then $\mu$ can be viewed as the mean of the worker, and our approach can be used to estimate the individual worker's mean. Our approach is also applicable to estimating the mean of a similarly exposed group (SEG) of workers if only one exposure measurement is obtained per worker, or to estimating the mean contaminant level in a workplace. If multiple measurements exist for each worker, and both between- and within-worker variability are significant and need to be accounted for, then one should use the random effects model.(9,10,11)

642 Journal of Occupational and Environmental Hygiene November 2006

NOMENCLATURE
$y_1, \ldots, y_n$: sample from a lognormal distribution
$x_1, \ldots, x_n$: logged data; $x_i = \ln(y_i)$, $i = 1, \ldots, n$
$\mu_l$: population mean of the logged data
$\sigma_l$: population standard deviation of the logged data
$\mu$: mean of the lognormal distribution; $\mu = \exp(\mu_l + \sigma_l^2/2)$
$\sigma^2$: variance of the lognormal distribution; $\sigma^2 = \exp(2\mu_l + \sigma_l^2)\left[\exp(\sigma_l^2) - 1\right]$
$\sigma_g$: geometric standard deviation; $\sigma_g = \exp(\sigma_l)$
$\bar{x}$: sample mean of the logged data
$s$: sample standard deviation of the logged data
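As a quick numerical check on Eq. 1 and the nomenclature formulas, the Python sketch below (standard library only) computes $\eta$, the lognormal mean, the lognormal variance, the geometric standard deviation, and the median $\exp(\mu_l)$ from assumed values of $\mu_l$ and $\sigma_l$, and compares the mean against a brute-force simulation. The function names, seed, and example parameters are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def lognormal_summary(mu_l: float, sigma_l: float) -> dict:
    """Moments of y when ln(y) ~ N(mu_l, sigma_l**2), following Eq. 1 and the nomenclature."""
    eta = mu_l + sigma_l ** 2 / 2.0
    return {
        "eta": eta,
        "mean": math.exp(eta),  # mu = exp(eta)
        "variance": math.exp(2 * mu_l + sigma_l ** 2) * (math.exp(sigma_l ** 2) - 1),
        "gsd": math.exp(sigma_l),  # sigma_g = exp(sigma_l)
        "median": math.exp(mu_l),  # geometric mean; below the arithmetic mean when sigma_l > 0
    }


def simulated_mean(mu_l: float, sigma_l: float, draws: int = 100_000, seed: int = 1) -> float:
    """Brute-force estimate of E(y) obtained by exponentiating simulated normal draws."""
    rng = random.Random(seed)
    return statistics.fmean(math.exp(rng.gauss(mu_l, sigma_l)) for _ in range(draws))


if __name__ == "__main__":
    params = (0.0, math.log(2.0))  # hypothetical: median 1.0, geometric SD 2.0
    print(lognormal_summary(*params))
    print(simulated_mean(*params))
```

The comparison of `mean` with `median` makes the point behind Eq. 1 concrete: the arithmetic mean of a lognormal variable exceeds $\exp(\mu_l)$ by the factor $\exp(\sigma_l^2/2)$.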
The sample mean exposure can be used as an estimate of the long-term average exposure or the average exposure for a SEG of workers over an extended period of time. For substances that cause health effects due to chronic exposures, day-to-day variability in long-term exposures is less health relevant than the long-term mean. For such exposures, the arithmetic mean is the best measure of cumulative exposure over a biologically relevant time period, since the body would have integrated exposures over this time period.(9) The long-term mean is also relevant in occupational epidemiology, where the estimated value of the long-term mean is assigned to all workers in a SEG. Once lognormality has been verified for an exposure sample, inferences on the parameters of the lognormal distribution can be made. Whereas there are currently only a few legal standards and threshold values based on long-term averages, some researchers have explored the statistics of exposures exceeding long-term limits.(3) To show that the mean exposure does not exceed the long-term average exposure limit (LTA-OEL), we may want to test the hypotheses

$$H_0: \mu \ge \text{LTA-OEL} \quad \text{vs.} \quad H_a: \mu < \text{LTA-OEL}. \tag{2}$$

Note that the null and alternative hypotheses in Eq. 2 are set up to look for evidence in favor of $H_a$. Rejection of the null hypothesis in Eq. 2 implies that the exposure level is acceptable.

Another method of assessing workplace exposure, suggested by some investigators,(2,5,8) is based on the proportion of exposure data in excess of the LTA-OEL. Because the proportion of the measurements that are above the LTA-OEL is equal to the proportion of the logged measurements that are above ln(LTA-OEL), this approach reduces to the problem of hypothesis testing about an upper quantile of a normal distribution. This hypothesis testing can be carried out using an appropriate tolerance limit of the normal distribution, and it has been well addressed in the context of assessing occupational exposure by Tuggle,(2) Selvin et al.,(5) and Lyles and Kupper.(8)

In the context of exposure assessment, the problem of comparing two lognormal means arises when we want to compare exposure levels of two similarly exposed groups of workers, or when we want to compare two exposure assessment methods or two different sampling devices. Thus, let $y_1$ and $y_2$ be lognormally distributed random variables denoting exposure levels at two different sites or measurements obtained by two different methods, and let $\mu_{l1}, \mu_{l2}$ and $\sigma_{l1}^2, \sigma_{l2}^2$ denote the respective means and variances of the normally distributed random variables $\ln(y_1)$ and $\ln(y_2)$. Then the means of $y_1$ and $y_2$, say $\mu_1$ and $\mu_2$, respectively, are given by

$$\mu_1 = \exp(\eta_1) \quad \text{and} \quad \mu_2 = \exp(\eta_2), \quad \text{where } \eta_1 = \mu_{l1} + \sigma_{l1}^2/2 \text{ and } \eta_2 = \mu_{l2} + \sigma_{l2}^2/2. \tag{3}$$

For comparing the exposure levels at the two sites, it is of interest to test the hypotheses

$$H_0: \mu_1 \le \mu_2 \quad \text{vs.} \quad H_a: \mu_1 > \mu_2. \tag{4}$$

Land(12) has proposed exact methods for constructing confidence intervals and hypothesis tests for the lognormal mean. His methods, however, are computationally intensive and depend on the standard deviation of the logged data, which makes the necessary tabulation difficult. For this reason, Rappaport and Selvin(3) proposed a simple approximate method that is satisfactory as long as $\sigma_l^2 \le 3$ and $n > 5$. Zhou and Gao(13) reviewed and compared several approximate methods and concluded that all the approximate methods are either too conservative or too liberal, except for large samples, in which case a method developed by Cox(12) is satisfactory. Armstrong(14) compared four approximate methods for estimating confidence intervals (CIs) with Land's(12) exact interval: (a) the "simple t-interval," (b) the "lognormal t-interval," (c) the Cox interval proposed by Land,(15) and (d) a variation of the Cox interval. Armstrong(14) found that whereas some of these approximate intervals were adequate for large sample sizes ($n \ge 25$) or small geometric standard deviations ($\sigma_g = 1.5$), none of them were accurate for small sample sizes and large $\sigma_g$, precisely the situations that are commonly encountered in occupational exposure assessment. Hewett and Ganser(16) have developed procedures that considerably simplify the calculation of Land's exact confidence interval. In a recent article, Taylor and colleagues(17) evaluated several approximate confidence intervals in terms of their coverage probabilities and also suggested an improved approximation.

Very little work is available on the problem of comparing two lognormal means. A large sample test is derived in Zhou et al.(18) for testing the equality of two lognormal means.
The purpose of this article is to illustrate the application of a novel approach for carrying out tests and constructing confidence intervals for a single lognormal mean, for the ratio of two lognormal means, for a single lognormal variance, and for the ratio of two lognormal variances. The approach is based on the concepts of generalized p-values and generalized confidence intervals, collectively referred to as the generalized variables method. The generalized variables methodology has already been described in Krishnamoorthy and Mathew(19) for obtaining tests and confidence intervals for a single lognormal mean and for comparing two lognormal means; however, the lognormal variance is not considered in that article. In this article, we extend this approach to obtain confidence intervals for the lognormal variance. Even though the lognormal mean is addressed by Krishnamoorthy and Mathew, we shall first briefly review the generalized variables procedure for the lognormal means described in their article and then apply it to the lognormal variance. We want to draw the attention of the industrial hygiene community to an accurate and unified methodology for dealing with any parameter associated with the lognormal distribution. An added advantage of the generalized variables approach is the ease of computation and implementation; the procedures can be easily coded in a programming language. Furthermore, extensive numerical results by the authors(19) show that for one-sided tests concerning a single lognormal mean, the results based on the generalized p-value approach are essentially equivalent to those based on Land's(12) method.
The concept of the generalized p-value was originally introduced by Tsui and Weerahandi,(20) and the concept of generalized confidence intervals was introduced by Weerahandi.(21) A later book by Weerahandi(22) illustrates several nonstandard statistical problems where the generalized variable approach produced remarkably useful results. Because the concepts are not well known, we have presented them in a brief outline in Appendix 1. In this article, we first present generalized variables for making inferences about a normal mean and variance. We then outline the hypothesis testing and interval estimation procedures for a single lognormal mean and then for the difference between two lognormal means. The necessary algorithms and Fortran and SAS programs to carry out our procedures are posted at http://www.ucs.louisiana.edu/~kxk4695 and are available as an appendix to the online version of this article on the JOEH website. In a later section, we also address the problem of obtaining tests and confidence intervals concerning a single lognormal variance, or the ratio of two lognormal variances. A confidence interval for the lognormal variance should be of interest to assess the variability among exposure measurements.
We have used two examples to illustrate our methods. The first example involves a sample of air lead level data collected from a lab by the National Institute for Occupational Safety and Health (NIOSH) health hazard evaluation staff. The problem is to assess the contaminant level within the facility based on the sample. We also illustrate the generalized variable method for testing the equality of the means of measurements obtained by two different methods. For this purpose, we used the data presented in O'Brien et al.(23)
Generalized Variables for the Mean and Variance of a Normal Distribution
As the mean of a lognormal distribution is a function of the mean and variance of a normal distribution, we present the generalized variables for the mean and variance of a normal population. The details of construction of generalized variables can be found in Krishnamoorthy and Mathew(19) or in Weerahandi,(22) and for easy reference they are provided in Appendix 1. Let $X_1, \ldots, X_n$ be a sample from a normal population with mean $\mu_l$ and variance $\sigma_l^2$, $N(\mu_l, \sigma_l^2)$. The sample mean and the sample variance of the $X_i$'s are, respectively, given by

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2. \tag{5}$$
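Let $Z$ and $V$ be independent random variables with

$$Z = \frac{\sqrt{n}(\bar{X} - \mu_l)}{\sigma_l} \sim N(0,1) \quad \text{and} \quad V^2 = \frac{(n-1)S^2}{\sigma_l^2} \sim \chi^2_{n-1}, \tag{6}$$

where $\chi^2_r$ denotes the central chi-square distribution with $r$ degrees of freedom. Let $\bar{x}$ and $s$ be the observed values of $\bar{X}$ and $S$, respectively. Following the procedure outlined in Appendix 1, a generalized variable for making inferences on $\mu_l$ is given by

$$\begin{aligned}
G_{\mu_l} &= \bar{x} - \frac{\bar{X} - \mu_l}{\sigma_l/\sqrt{n}} \cdot \frac{\sigma_l}{\sqrt{n}} \cdot \frac{s}{S} - \mu_l \\
&= \bar{x} - \frac{Z}{V/\sqrt{n-1}} \cdot \frac{s}{\sqrt{n}} - \mu_l \\
&= T_{\mu_l} - \mu_l,
\end{aligned} \tag{7}$$

where

$$T_{\mu_l} = \bar{x} - \frac{Z}{V/\sqrt{n-1}} \cdot \frac{s}{\sqrt{n}}, \tag{8}$$

and $Z$ and $V$ are as defined in Eq. 6. In the above, $G_{\mu_l}$ denotes the generalized test variable for $\mu_l$, and $T_{\mu_l}$ denotes the generalized pivot statistic (the statistic that can be used for making inference about the unknown parameter) for $\mu_l$. We shall now show that $G_{\mu_l}$ satisfies the three conditions given in Eq. A3 of Appendix 1: (1) for a given $\bar{x}$ and $s$, the distribution of $G_{\mu_l}$ does not depend on the nuisance parameter $\sigma_l^2$; (2) it follows from Step 1 of Eq. 7 that the value of $G_{\mu_l}$ at $(\bar{X}, S) = (\bar{x}, s)$ is $\bar{x} - (\bar{x} - \mu_l) - \mu_l = 0$, which is free of unknown parameters (equivalently, the observed value of $T_{\mu_l}$ is $\mu_l$); (3) it follows from Step 3 of Eq. 7 that, for a given $\bar{x}$ and $s$, the generalized variable is stochastically decreasing with respect to $\mu_l$, and hence the generalized p-value for testing $H_0: \mu_l \ge \mu_{l0}$ vs. $H_a: \mu_l < \mu_{l0}$ is given by

$$\sup_{H_0} P(G_{\mu_l} \ge 0) = P(G_{\mu_l} \ge 0 \mid \mu_l = \mu_{l0}) = P(T_{\mu_l} \ge \mu_{l0}) = P\left(t_{n-1} < \frac{\bar{x} - \mu_{l0}}{s/\sqrt{n}}\right),$$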
which is the p-value based on the usual t-test. To get the last equality, we used the fact that $Z/(V/\sqrt{n-1})$ follows a Student's t distribution with $n - 1$ degrees of freedom, $t_{n-1}$.

For a given $\bar{x}$ and $s$, the lower $\alpha/2$ quantile $T_{\mu_l,\alpha/2}$ of $T_{\mu_l}$ and the upper $\alpha/2$ quantile $T_{\mu_l,1-\alpha/2}$ of $T_{\mu_l}$ form a $1-\alpha$ generalized confidence interval for $\mu_l$. This generalized CI is indeed equal to the usual t-interval; that is,

$$\left(T_{\mu_l,\alpha/2},\ T_{\mu_l,1-\alpha/2}\right) = \left(\bar{x} - t_{n-1,1-\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,1-\alpha/2}\frac{s}{\sqrt{n}}\right),$$

where $t_{m,p}$ denotes the $100p$th percentile of the Student's t distribution with $m$ degrees of freedom.

The generalized test variable for the variance $\sigma_l^2$ is given by

$$G_{\sigma_l^2} = \frac{s^2}{V^2/(n-1)} - \sigma_l^2 = T_{\sigma_l^2} - \sigma_l^2, \tag{9}$$

where

$$T_{\sigma_l^2} = \frac{s^2}{V^2/(n-1)} \tag{10}$$

is the generalized pivot statistic, and $V$ is as defined in Eq. 6. Again, for a given $s^2$, the generalized $1-\alpha$ CI for $\sigma_l^2$ is formed by the lower and upper $\alpha/2$ quantiles of $T_{\sigma_l^2}$ and is equal to the usual CI based on a chi-square distribution with $n - 1$ degrees of freedom.

Even though the generalized variable method produces exact inferential procedures for the normal parameters, in general it is not necessarily exact. In other words, the generalized p-value may not satisfy the conventional properties of the usual p-value. In such cases, the properties of the generalized variable method (such as the Type I error rates of the generalized variable test and the coverage probabilities of the generalized confidence limits) should be evaluated numerically.

Suppose we are interested in making inference about a function of $\mu_l$ and $\sigma_l^2$, say, $q(\mu_l, \sigma_l^2)$. Then the generalized test variable for $q(\mu_l, \sigma_l^2)$ is given by $q(T_{\mu_l}, T_{\sigma_l^2}) - q(\mu_l, \sigma_l^2)$, and the generalized pivot statistic is given by $q(T_{\mu_l}, T_{\sigma_l^2})$. For a given $\bar{x}$ and $s$, the variable $q(T_{\mu_l}, T_{\sigma_l^2})$ depends only on the random variables $Z$ and $V$, whose distributions do not depend on any unknown parameters. Therefore, Monte Carlo simulation can be used to find a generalized CI for $q(\mu_l, \sigma_l^2)$. This will be illustrated for the lognormal case in the following section.
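Because $T_{\mu_l}$ and $T_{\sigma_l^2}$ depend only on $Z$ and $V$ once $\bar{x}$ and $s$ are fixed, their quantiles can be approximated by simulation. The Python sketch below is a hypothetical illustration using only the standard library (the function names, seed, and nearest-rank quantile rule are our own choices): it draws $Z \sim N(0,1)$ and $V^2 \sim \chi^2_{n-1}$, the latter as a gamma variate with shape $(n-1)/2$ and scale 2, and returns Monte Carlo generalized CIs for $\mu_l$ (Eq. 8) and $\sigma_l^2$ (Eq. 10). For $n = 10$, $\bar{x} = 0$, and $s = 1$, the results should agree, up to simulation error, with the t-interval $\pm t_{9,0.975}/\sqrt{10} \approx \pm 0.715$ and the chi-square interval $(9/19.02,\ 9/2.70) \approx (0.473,\ 3.333)$.

```python
import math
import random


def quantile(sorted_values, p):
    """Nearest-rank empirical quantile of an already sorted list."""
    k = round(p * (len(sorted_values) - 1))
    return sorted_values[min(max(k, 0), len(sorted_values) - 1)]


def normal_generalized_cis(xbar, s, n, alpha=0.05, draws=100_000, seed=2006):
    """Monte Carlo generalized CIs for mu_l (Eq. 8) and sigma_l**2 (Eq. 10)."""
    rng = random.Random(seed)
    t_mu, t_var = [], []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        v2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-square with n - 1 df
        t_mu.append(xbar - z / math.sqrt(v2 / (n - 1)) * s / math.sqrt(n))
        t_var.append(s ** 2 / (v2 / (n - 1)))
    t_mu.sort()
    t_var.sort()
    ci_mu = (quantile(t_mu, alpha / 2), quantile(t_mu, 1 - alpha / 2))
    ci_var = (quantile(t_var, alpha / 2), quantile(t_var, 1 - alpha / 2))
    return ci_mu, ci_var


if __name__ == "__main__":
    # Hypothetical summary statistics of logged data
    print(normal_generalized_cis(xbar=0.0, s=1.0, n=10))
```

In practice the exact t and chi-square intervals are of course preferable for $\mu_l$ and $\sigma_l^2$ themselves; the value of the simulation route is that the same loop works unchanged for any function $q(T_{\mu_l}, T_{\sigma_l^2})$.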
Inference about a Lognormal Mean
Let $y_1, \ldots, y_n$ be a sample of exposure measurements and let $x_i = \ln(y_i)$, $i = 1, \ldots, n$. Then $x_1, \ldots, x_n$ is a random sample from a $N(\mu_l, \sigma_l^2)$ distribution. Since the lognormal mean $\exp(\mu_l + \sigma_l^2/2)$ is a function of $\mu_l$ and $\sigma_l^2$, the results of the preceding section can be readily applied to construct a generalized test variable and a generalized pivot statistic for the lognormal mean. From the preceding section, we have the generalized test variable for making inference on $\eta = \mu_l + \sigma_l^2/2$ as

$$\begin{aligned}
G_\eta &= T_{\mu_l} + \frac{T_{\sigma_l^2}}{2} - \eta \\
&= \bar{x} - \frac{Z}{V/\sqrt{n-1}}\frac{s}{\sqrt{n}} + \frac{s^2}{2V^2/(n-1)} - \eta \\
&= T_\eta - \eta,
\end{aligned} \tag{11}$$

where

$$T_\eta = \bar{x} - \frac{Z}{V/\sqrt{n-1}}\frac{s}{\sqrt{n}} + \frac{s^2}{2V^2/(n-1)}, \tag{12}$$

and $Z$ and $V$ are as defined in Eq. 6. For given sample statistics $\bar{x}$ and $s$, we note that $G_\eta$ is stochastically decreasing in $\eta$, and hence the generalized p-value for testing Eq. 2 is given by

$$P\big(G_\eta \ge 0 \mid \eta = \ln(\text{LTA-OEL})\big) = P\big(T_\eta \ge \ln(\text{LTA-OEL})\big). \tag{13}$$

The null hypothesis in Eq. 2 will be rejected whenever the probability in Eq. 13 is less than the nominal level $\alpha$.

The generalized pivot statistic for interval estimation of $\eta$ is given by $T_\eta$. Appropriate quantiles of $T_\eta$ can be used to obtain confidence intervals for $\eta$ or for the lognormal mean $\exp(\eta)$. Specifically, if $T_{\eta,p}$, $0 < p < 1$, denotes the $p$th quantile of $T_\eta$, then $(T_{\eta,\alpha/2}, T_{\eta,1-\alpha/2})$ is a $1-\alpha$ generalized confidence interval for $\eta$, and $(\exp(T_{\eta,\alpha/2}), \exp(T_{\eta,1-\alpha/2}))$ is a $1-\alpha$ generalized confidence interval for the lognormal mean $\exp(\eta)$. One-sided limits for $\eta$ and $\exp(\eta)$ can be similarly obtained. In particular, a $1-\alpha$ lower limit for $\exp(\eta)$ is given by $\exp(T_{\eta,\alpha})$.

Through numerical results, Krishnamoorthy and Mathew(19) noted that the confidence limits based on Land's(12) approach and the generalized confidence interval are practically the same. However, computationally, our approach is very easy to implement. The simple algorithm presented in Appendix 2 of Krishnamoorthy and Mathew(19) can be used for computing the generalized p-value and the generalized confidence interval.
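A minimal Python sketch of Eqs. 12 and 13 follows. It is not the posted Fortran/SAS program or the Appendix 2 algorithm of Krishnamoorthy and Mathew; it simply simulates $T_\eta$ with the standard library and reports the generalized p-value for Eq. 2, the two-sided generalized CI for $\exp(\eta)$, and the one-sided lower and upper limits. The names `lognormal_mean_inference` and `lta_oel`, the seeds, and the synthetic data in the usage block are hypothetical.

```python
import math
import random
import statistics


def lognormal_mean_inference(y, lta_oel, alpha=0.05, draws=100_000, seed=2006):
    """Generalized p-value (Eq. 13) and generalized confidence limits for a lognormal mean."""
    x = [math.log(v) for v in y]  # logged data
    n = len(x)
    xbar, s = statistics.fmean(x), statistics.stdev(x)  # stdev uses the n - 1 divisor
    rng = random.Random(seed)
    t_eta = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        v2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # V^2 ~ chi-square(n - 1)
        t_eta.append(xbar - z / math.sqrt(v2 / (n - 1)) * s / math.sqrt(n)
                     + s ** 2 / (2.0 * v2 / (n - 1)))  # Eq. 12
    log_oel = math.log(lta_oel)
    p_value = sum(1 for t in t_eta if t >= log_oel) / draws  # Eq. 13
    t_eta.sort()

    def q(p):  # nearest-rank empirical quantile of T_eta
        return t_eta[min(draws - 1, max(0, round(p * (draws - 1))))]

    return {
        "p_value": p_value,
        "ci_mean": (math.exp(q(alpha / 2)), math.exp(q(1 - alpha / 2))),
        "lower_limit": math.exp(q(alpha)),  # 1 - alpha lower limit for exp(eta)
        "upper_limit": math.exp(q(1 - alpha)),  # 1 - alpha upper limit for exp(eta)
        "plug_in_mean": math.exp(xbar + s ** 2 / 2),  # naive point estimate, for reference
    }


if __name__ == "__main__":
    # Hypothetical data: 12 draws from a lognormal with mu_l = 0 and sigma_l = ln 2
    data_rng = random.Random(1)
    sample = [math.exp(data_rng.gauss(0.0, math.log(2.0))) for _ in range(12)]
    print(lognormal_mean_inference(sample, lta_oel=3.0))
```

Because the p-value and the limits are computed from the same simulated values of $T_\eta$, rejecting $H_0$ in Eq. 2 at level $\alpha$ corresponds, up to the discreteness of the simulation, to the $1-\alpha$ upper limit falling below the LTA-OEL.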
Power Studies and Sample Size Calculation for Testing a Lognormal Mean
We shall now discuss the power of the test based on the generalized p-value in Eq. 13. For a given sample size $n$ and for given values of $\mu_l$ and $\sigma_l$ such that $H_a$ in Eq. 2 holds (i.e., $\eta = \mu_l + \sigma_l^2/2 < \ln(\text{LTA-OEL})$), the power of the test can be estimated by Monte Carlo simulation. In practice, however, practitioners are mainly interested in finding the sample size required to attain a specified power at a given level of significance. The sample size can be calculated using an iterative method. For power calculation, an algorithm and Fortran and SAS programs based on the algorithm are posted at http://www.ucs.louisiana.edu/~kxk4695 and are available as an appendix to the online version of this article. Using this program, we computed the sample sizes required to attain a power of 0.90 at the level of significance $\alpha = 0.05$ for various parameter configurations; these are presented in Table I. As an example, if an employer speculates that the mean exposure level is 40% of the LTA-OEL (the value $R = 0.4$ in Table I) and the geometric standard deviation is 2.0, then the sample size required to attain a power of at least 0.90 at the level 0.05 is 13.

We observe from Table I that the power of the test increases as the ratio $R$ decreases, which is a natural requirement for a test. We also note that the power decreases as $\sigma_g$ increases and, hence, large samples are required to make correct decisions when $\sigma_g$ is expected to be large.

TABLE I. Sample Size for Testing Equation 2 to Attain a Power of 0.90 at the Level of 0.05, Using the Generalized P-Value Test

R \ σg  1.5        2.0        2.5        3.0        3.5
0.1     4 (.96)    6 (.94)    8 (.90)    11 (.90)   13 (.91)
0.2     4 (.90)    7 (.91)    11 (.91)   16 (.90)   21 (.91)
0.3     5 (.93)    10 (.91)   16 (.90)   21 (.91)   30 (.90)
0.4     6 (.91)    13 (.90)   23 (.91)   35 (.91)   45 (.90)
0.5     8 (.93)    18 (.91)   33 (.90)   52 (.90)   68 (.90)
0.7     18 (.91)   56 (.90)   99 (.90)   162 (.90)  235 (.90)
0.8     37 (.90)   120 (.90)  241 (.90)  363 (.90)  563 (.90)

Note: R = µ/LTA-OEL, where µ is the lognormal mean; σg = exp(σl) = geometric standard deviation; the numbers in parentheses represent actual attained powers; LTA-OEL = 1.0.
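The power calculation and iterative sample size search described above can be sketched in Python as follows. This is our own simplified illustration, not the authors' posted program. The true parameters are set from $R = \mu/\text{LTA-OEL}$ and $\sigma_g$, so that $\sigma_l = \ln \sigma_g$ and $\mu_l = \ln(R \cdot \text{LTA-OEL}) - \sigma_l^2/2$; each simulated data set is tested with the generalized p-value of Eq. 13; and, as a speed shortcut we are assuming is acceptable for illustration, one set of simulated $(Z, V)$ pairs is reused across all replicates. The function names, defaults, and seeds are hypothetical.

```python
import math
import random
import statistics


def gpv_power(n, r_ratio, gsd, lta_oel=1.0, alpha=0.05, reps=500, draws=2000, seed=11):
    """Monte Carlo power of the generalized p-value test of Eq. 2 (reject when p < alpha)."""
    sigma_l = math.log(gsd)
    mu_l = math.log(r_ratio * lta_oel) - sigma_l ** 2 / 2  # true lognormal mean = R * LTA-OEL
    rng = random.Random(seed)
    # Reused draws: t = Z / (V / sqrt(n - 1)) and w = (n - 1) / V^2
    tw = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        v2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # V^2 ~ chi-square(n - 1)
        tw.append((z / math.sqrt(v2 / (n - 1)), (n - 1) / v2))
    log_oel, root_n = math.log(lta_oel), math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(mu_l, sigma_l) for _ in range(n)]  # simulated logged data
        xbar, s = statistics.fmean(x), statistics.stdev(x)
        # Generalized p-value P(T_eta >= ln(LTA-OEL)) with T_eta from Eq. 12
        hits = sum(1 for t, w in tw if xbar - t * s / root_n + 0.5 * s * s * w >= log_oel)
        if hits / draws < alpha:
            rejections += 1
    return rejections / reps


def min_sample_size(r_ratio, gsd, target=0.90, n_start=3, n_max=200, **kwargs):
    """Smallest n, searched upward from n_start, whose estimated power reaches target."""
    for n in range(n_start, n_max + 1):
        if gpv_power(n, r_ratio, gsd, **kwargs) >= target:
            return n
    return None
```

With the defaults, `gpv_power(13, 0.4, 2.0)` should give an estimate in the vicinity of the 0.90 listed in Table I, while `gpv_power(13, 1.0, 2.0)`, where the mean equals the LTA-OEL, estimates the size of the test and should be near 0.05. Exact figures depend on the seed and on `reps` and `draws`, and a search driven by noisy power estimates can land one or two units away from the tabulated sample sizes.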
Comparison of Two Lognormal Means
Consider the independent lognormal random variables $y_1$ and $y_2$ such that $x_1 = \ln(y_1) \sim N(\mu_{l1}, \sigma_{l1}^2)$ and $x_2 = \ln(y_2) \sim N(\mu_{l2}, \sigma_{l2}^2)$. Then the lognormal means are given by $E(y_1) = \exp(\eta_1)$ and $E(y_2) = \exp(\eta_2)$, where

$$\eta_1 = \mu_{l1} + \sigma_{l1}^2/2 \quad \text{and} \quad \eta_2 = \mu_{l2} + \sigma_{l2}^2/2. \tag{14}$$

Because the ratio of the two lognormal means is $\exp(\eta_1 - \eta_2)$, hypothesis tests and confidence intervals for this ratio are equivalent to those for the difference $\eta_1 - \eta_2$. We shall now develop generalized p-values and generalized confidence intervals for this problem.

We shall first consider the testing problem

$$H_0: \eta_1 \le \eta_2 \quad \text{vs.} \quad H_a: \eta_1 > \eta_2. \tag{15}$$

Let $y_{1j}$, $j = 1, \ldots, n_1$, and $y_{2j}$, $j = 1, \ldots, n_2$, denote random samples from the lognormal distributions of $y_1$ and $y_2$, respectively. Let $x_{1j} = \ln(y_{1j})$, $j = 1, \ldots, n_1$, and $x_{2j} = \ln(y_{2j})$, $j = 1, \ldots, n_2$. The sample means $\bar{x}_1$ and $\bar{x}_2$ and the sample variances $s_1^2$ and $s_2^2$ are then given by

$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} \quad \text{and} \quad s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2, \quad i = 1, 2. \tag{16}$$

It follows from Eq. 12 that the generalized variable for $\eta_i$ can be expressed as

$$T_{\eta_i} = \bar{x}_i - \frac{Z_i}{V_i/\sqrt{n_i - 1}}\frac{s_i}{\sqrt{n_i}} + \frac{s_i^2}{2V_i^2/(n_i - 1)}, \quad i = 1, 2, \tag{17}$$

where $Z_i \sim N(0,1)$ and $V_i^2 \sim \chi^2_{n_i - 1}$ for $i = 1, 2$, and all these random variables are independent. The generalized test variable for testing Eq. 15 is given by

$$G_{\eta_1 - \eta_2} = T_{\eta_1} - T_{\eta_2} - (\eta_1 - \eta_2), \tag{18}$$

and the generalized pivot statistic for constructing a CI for $\eta_1 - \eta_2$ is given by

$$T_{\eta_1 - \eta_2} = T_{\eta_1} - T_{\eta_2}. \tag{19}$$

For given sample statistics, $G_{\eta_1 - \eta_2}$ is stochastically decreasing in $\eta_1 - \eta_2$. Thus the generalized p-value for testing the hypotheses in Eq. 15 is given by

$$\sup_{H_0} P(G_{\eta_1 - \eta_2} \le 0) = P(G_{\eta_1 - \eta_2} \le 0 \mid \eta_1 - \eta_2 = 0) = P(T_{\eta_1 - \eta_2} \le 0). \tag{20}$$

For given sample statistics, confidence intervals for $\eta_1 - \eta_2$ can be computed using the percentiles of $T_{\eta_1 - \eta_2}$. Because, given $\bar{x}_1$, $\bar{x}_2$, $s_1^2$, and $s_2^2$, the distribution of $T_{\eta_1 - \eta_2}$ is free of any unknown parameters, the percentiles of $T_{\eta_1 - \eta_2}$ can be estimated using Monte Carlo simulation. We can also construct confidence intervals for the difference between the lognormal means, that is, $\exp(\eta_1) - \exp(\eta_2)$. For this, we can use the percentiles of $\exp(T_{\eta_1}) - \exp(T_{\eta_2})$, where $T_{\eta_1}$ and $T_{\eta_2}$ are given in Eq. 17. Note that algorithms similar to Algorithm 1 can be easily developed for computing the above generalized p-values and confidence intervals. An algorithm and Fortran and SAS programs for computing the generalized p-value test and the CI for $\exp(\eta_1) - \exp(\eta_2)$ are posted at http://www.ucs.louisiana.edu/~kxk4695 and are available as an appendix to the online version of this article.
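The two-sample procedure of Eqs. 17 to 20 can be sketched in the same way. The Python code below is a hypothetical illustration using only the standard library (function names, defaults, and seeds are our own): it simulates $T_{\eta_1}$ and $T_{\eta_2}$ independently and returns the generalized p-value of Eq. 20, a CI for the ratio $\exp(\eta_1 - \eta_2)$ of the two lognormal means, and a CI for the difference $\exp(\eta_1) - \exp(\eta_2)$ based on the percentiles of $\exp(T_{\eta_1}) - \exp(T_{\eta_2})$.

```python
import math
import random
import statistics


def _log_summary(y):
    """Sample mean, sample SD (n - 1 divisor), and size of the logged data."""
    x = [math.log(v) for v in y]
    return statistics.fmean(x), statistics.stdev(x), len(x)


def _draw_t_eta(rng, xbar, s, n):
    """One realization of the generalized pivot T_eta_i of Eq. 17."""
    z = rng.gauss(0.0, 1.0)
    v2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # V_i^2 ~ chi-square(n_i - 1)
    return xbar - z / math.sqrt(v2 / (n - 1)) * s / math.sqrt(n) + s ** 2 / (2.0 * v2 / (n - 1))


def compare_lognormal_means(y1, y2, alpha=0.05, draws=100_000, seed=2006):
    """Generalized p-value for H0: eta1 <= eta2 (Eq. 20) plus CIs for the ratio and difference."""
    rng = random.Random(seed)
    s1, s2 = _log_summary(y1), _log_summary(y2)
    diffs, mean_diffs = [], []
    for _ in range(draws):
        a, b = _draw_t_eta(rng, *s1), _draw_t_eta(rng, *s2)
        diffs.append(a - b)  # T_{eta1 - eta2}, Eq. 19
        mean_diffs.append(math.exp(a) - math.exp(b))
    p_value = sum(1 for d in diffs if d <= 0) / draws  # Eq. 20
    diffs.sort()
    mean_diffs.sort()

    def q(values, p):  # nearest-rank empirical quantile
        return values[min(draws - 1, max(0, round(p * (draws - 1))))]

    return {
        "p_value": p_value,
        "ratio_ci": (math.exp(q(diffs, alpha / 2)), math.exp(q(diffs, 1 - alpha / 2))),
        "difference_ci": (q(mean_diffs, alpha / 2), q(mean_diffs, 1 - alpha / 2)),
    }
```

For example, `compare_lognormal_means(y_site1, y_site2)` with two lists of positive exposure measurements returns all three quantities from a single simulation run. Since $\exp(\cdot)$ is increasing, a ratio interval lying entirely above 1 and a difference interval lying entirely above 0 always occur together here.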
Power Properties of the Generalized Test for the Two-Sample Case
For given sample sizes $n_1$ and $n_2$ and parameters $\mu_{l1}$, $\mu_{l2}$, $\sigma_{l1}$, and $\sigma_{l2}$, the power of the generalized test based on Eq. 20 can be estimated using the Monte Carlo method. A Fortran program and SAS code for computing the power (along with a help file) are posted at http://www.ucs.louisiana.edu/~kxk4695 and are available as an appendix to the online version of this article. The help file also contains an algorithm that can be coded in any desired computing language. Krishnamoorthy and Mathew(19) computed powers for several sample sizes and parameter combinations and observed that the generalized test possesses all natural properties. However, the power of the test depends on $\mu_{l1} - \mu_{l2}$, $\sigma_{l1}$, and $\sigma_{l2}$. Therefore, to compute the sample sizes required to attain a specified power, the practitioner should have some knowledge about $\mu_{l1} - \mu_{l2}$, $\sigma_{l1}$, and $\sigma_{l2}$.
Inference about a Lognormal Variance and Geometric Standard Deviation
For assessing the extent of variability among exposure measurements, confidence intervals or tests concerning the variance become necessary. If $y$ denotes the lognormally distributed exposure measurement, then $x = \ln(y)$ is normally distributed with mean $\mu_l$ and variance $\sigma_l^2$.
TABLE II. Monte Carlo Estimates of the Sizes of the Generalized P-Value Test Based on Equation 24 for the Lognormal Variance in Equation 21; Nominal Level = 0.05

            σl = 0.5                σl = 1.0                σl = 1.5
µl      n=10  n=15  n=20     n=10  n=15  n=20     n=10  n=15  n=20
0.00    .050  .041  .047     .047  .048  .046     .050  .044  .049
0.30    .046  .056  .052     .053  .044  .049     .043  .046  .056
0.70    .047  .050  .052     .045  .050  .048     .049  .044  .049
1.00    .050  .050  .054     .048  .049  .050     .048  .044  .049
1.30    .054  .046  .043     .052  .049  .045     .045  .051  .050
1.50    .048  .060  .048     .053  .048  .049     .048  .050  .045
1.70    .053  .053  .051     .043  .045  .048     .052  .048  .042
2.00    .055  .053  .054     .046  .054  .054     .054  .048  .047
The variance of y, to be denoted by σ², is given by

$$\sigma^2 = \exp\left(2\mu_l + \sigma_l^2\right)\left[\exp\left(\sigma_l^2\right) - 1\right]. \tag{21}$$

As far as we are aware, no procedures (except obvious large sample procedures) are known for computing a confidence interval or for testing hypotheses concerning σ². It turns out that the ideas of generalized p-values and generalized confidence intervals provide solutions to this problem, regardless of the sample size. We shall now construct a generalized pivot statistic that can be used to compute a confidence interval for σ², and a generalized test variable that can be used for testing the hypotheses

$$H_0: \sigma^2 \ge \sigma_0^2 \quad \text{vs.} \quad H_a: \sigma^2 < \sigma_0^2, \tag{22}$$

where σ0² is a known constant. Note that it is by rejecting H0 that we conclude that the variability is small, that is, below the bound σ0².

Using earlier notations, the generalized test variable for σ² is given by

$$G_{\sigma^2} = \exp\left(2T_{\mu_l} + T_{\sigma_l^2}\right)\left[\exp\left(T_{\sigma_l^2}\right) - 1\right] - \sigma^2 = \exp\left\{2\left(\bar{x} - \frac{Z}{V/\sqrt{n-1}}\,\frac{s}{\sqrt{n}}\right) + \frac{s^2}{V^2/(n-1)}\right\} \times \left[\exp\left(\frac{s^2}{V^2/(n-1)}\right) - 1\right] - \sigma^2, \tag{23}$$

where Z and V are as defined in Eq. 6. The generalized pivot statistic for constructing a CI for σ² is given by

$$T_{\sigma^2} = \exp\left\{2\left(\bar{x} - \frac{Z}{V/\sqrt{n-1}}\,\frac{s}{\sqrt{n}}\right) + \frac{s^2}{V^2/(n-1)}\right\} \times \left[\exp\left(\frac{s^2}{V^2/(n-1)}\right) - 1\right]. \tag{24}$$

Arguing as in previous sections, the generalized p-value for testing the hypotheses in Eq. 22 is given by

$$P\left(G_{\sigma^2} \ge 0 \mid \sigma^2 = \sigma_0^2\right) = P\left(T_{\sigma^2} \ge \sigma_0^2\right). \tag{25}$$

Furthermore, the percentiles of Tσ² can be used for computing a generalized confidence interval for σ². An algorithm (similar to Algorithm 1 in Appendix 2) can be easily developed for computing the above generalized p-value and confidence interval. We also note that the above procedure can be easily extended for the purpose of comparing two lognormal variances.
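An algorithm of the kind mentioned above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' program: it draws Tσ² from Eq. 24 (using Python's standard `random` module, with a chi-square variate generated as a gamma variate of shape (n − 1)/2 and scale 2), estimates the generalized p-value of Eq. 25, and takes equal-tailed percentiles for the interval. To avoid overflow for large s²/(V²/(n − 1)), it works with log Tσ² and uses a stable log(eᵗ − 1); percentiles are taken on the log scale and exponentiated, which agrees with percentiles of Tσ² up to interpolation. All function names are hypothetical.

```python
import math
import random


def log_expm1(t):
    """Numerically stable log(exp(t) - 1) for t > 0."""
    return t + math.log(-math.expm1(-t))


def log_t_sigma2_draws(xbar, s, n, m=100_000, seed=1):
    """m draws of log(T_sigma2) from Eq. 24 for logged data with mean xbar, SD s."""
    rng = random.Random(seed)
    draws = []
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)
        v2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # V^2 ~ chi-square, n - 1 df
        t_mu = xbar - z / math.sqrt(v2 / (n - 1)) * s / math.sqrt(n)
        t_var = s * s / (v2 / (n - 1))             # generalized variable for sigma_l^2
        draws.append(2.0 * t_mu + t_var + log_expm1(t_var))
    return draws


def percentile(values, q):
    """Linearly interpolated q-quantile, 0 <= q <= 1."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (pos - lo) * (v[hi] - v[lo])


def variance_pvalue(log_draws, sigma2_0):
    """Eq. 25: P(T_sigma2 >= sigma0^2) for H0: sigma^2 >= sigma0^2 vs. Ha: sigma^2 < sigma0^2."""
    c = math.log(sigma2_0)
    return sum(1 for d in log_draws if d >= c) / len(log_draws)


def variance_ci(log_draws, alpha=0.05):
    """Equal-tailed 1 - alpha generalized confidence interval for sigma^2."""
    return (math.exp(percentile(log_draws, alpha / 2)),
            math.exp(percentile(log_draws, 1 - alpha / 2)))


if __name__ == "__main__":
    draws = log_t_sigma2_draws(4.333, 1.739, 15)  # summary statistics of Example 1
    print(variance_ci(draws))
    print(variance_pvalue(draws, 1.0e6))
```

Drawing once and reusing the draws for several values of σ0² keeps the p-values mutually consistent, since the distribution of Tσ² does not depend on σ0².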
To assess the validity of the generalized test based on Eq. 25, we estimated its sizes (Type I error rates) by Monte Carlo simulation for various values of µl and σl and for n = 10, 15, and 20. The sizes are estimated for testing the hypotheses in Eq. 22 at the nominal level 0.05, and they are given in Table II. For a good test, the estimated sizes should be close to the nominal level. We see from Table II that the estimated sizes are very close to the nominal level for all the cases considered.
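A size study like the one summarized in Table II can be sketched as a nested simulation: generate logged samples at the boundary σ² = σ0² of H0 in Eq. 22, compute the generalized p-value of Eq. 25 for each sample, and record how often it falls below 0.05. This is a minimal sketch; the numbers of simulated data sets and inner draws used for Table II are not stated in this section, so the defaults below are assumptions, and the names `gen_pvalue_variance` and `estimate_size` are hypothetical.

```python
import math
import random
import statistics


def log_expm1(t):
    """Numerically stable log(exp(t) - 1) for t > 0."""
    return t + math.log(-math.expm1(-t))


def gen_pvalue_variance(xbar, s, n, sigma2_0, m, rng):
    """Eq. 25, P(T_sigma2 >= sigma0^2), estimated from m draws (log scale)."""
    c = math.log(sigma2_0)
    hits = 0
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)
        v2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-square, n - 1 df
        t_mu = xbar - z / math.sqrt(v2 / (n - 1)) * s / math.sqrt(n)
        t_var = s * s / (v2 / (n - 1))
        if 2.0 * t_mu + t_var + log_expm1(t_var) >= c:
            hits += 1
    return hits / m


def estimate_size(mu_l, sigma_l, n, alpha=0.05, n_datasets=2000, m=2000, seed=1):
    """Rejection rate when the true lognormal variance equals sigma0^2 (H0 boundary)."""
    rng = random.Random(seed)
    sigma2_0 = math.exp(2 * mu_l + sigma_l ** 2) * math.expm1(sigma_l ** 2)  # Eq. 21
    rejections = 0
    for _ in range(n_datasets):
        x = [rng.gauss(mu_l, sigma_l) for _ in range(n)]  # logged sample
        p = gen_pvalue_variance(statistics.mean(x), statistics.stdev(x), n,
                                sigma2_0, m, rng)
        if p < alpha:
            rejections += 1
    return rejections / n_datasets


if __name__ == "__main__":
    print(estimate_size(1.0, 1.0, 10, n_datasets=300, m=1000))
```

With a few hundred simulated data sets the estimate carries a Monte Carlo standard error of roughly 0.01, so values between about .03 and .07 are consistent with a nominal level of 0.05.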
The generalized variable for the geometric standard deviation σg = exp(σl) is given by

$$G_{\sigma_g} = \exp\left(\sqrt{G_{\sigma_l^2}}\right), \tag{26}$$

where the generalized variable Gσl² for σl² is given in Eq. 9. However, it was pointed out earlier that the generalized variable approach gives the same confidence interval for σl² as the conventional chi-square interval. From this, a confidence interval for the geometric standard deviation is easily obtained as

$$\left(\exp\left\{s\sqrt{\frac{n-1}{\chi^2_{n-1,\,1-\alpha/2}}}\right\},\ \exp\left\{s\sqrt{\frac{n-1}{\chi^2_{n-1,\,\alpha/2}}}\right\}\right), \tag{27}$$

where χ²m,p denotes the 100pth percentile of the central chi-square distribution with df = m. The expression in Eq. 27 is an exact 1 − α confidence interval for σg. Similarly, since for c > 1 we have σg ≤ c if and only if σl² ≤ (ln c)², a test for

$$H_0: \sigma_g \le c \quad \text{vs.} \quad H_a: \sigma_g > c, \tag{28}$$

is essentially a test concerning the variance σl², and the usual chi-square test for the variance can be applied.
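Both intervals for σg can be sketched in a few lines. The exact interval of Eq. 27 needs chi-square percentiles, which the sketch takes as inputs (for example, from a table); the generalized interval takes percentiles of Gσg in Eq. 26, assuming, consistent with the statement that it reproduces the chi-square interval, that Gσl² = (n − 1)s²/V² with V² ~ χ²n−1. The function names are hypothetical.

```python
import math
import random


def gsd_exact_ci(s, n, chi2_lower, chi2_upper):
    """Eq. 27. chi2_lower and chi2_upper are the 100(alpha/2)th and
    100(1 - alpha/2)th percentiles of chi-square with n - 1 df."""
    return (math.exp(s * math.sqrt((n - 1) / chi2_upper)),
            math.exp(s * math.sqrt((n - 1) / chi2_lower)))


def gsd_generalized_ci(s, n, alpha=0.05, m=100_000, seed=1):
    """Percentiles of G_sigma_g = exp(sqrt((n - 1) s^2 / V^2)), V^2 ~ chi-square(n - 1)."""
    rng = random.Random(seed)
    g = sorted(math.exp(s * math.sqrt((n - 1) / rng.gammavariate((n - 1) / 2.0, 2.0)))
               for _ in range(m))
    return g[int(alpha / 2 * (m - 1))], g[int((1 - alpha / 2) * (m - 1))]


if __name__ == "__main__":
    # Example 1: s = 1.739, n = 15; tabulated chi-square(14) percentiles 5.629 and 26.119
    print(gsd_exact_ci(1.739, 15, 5.629, 26.119))   # about (3.57, 15.53)
    print(gsd_generalized_ci(1.739, 15))
```

The exact call reproduces the interval reported for Example 1; the Monte Carlo interval differs from it only by simulation error.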
Illustrative Examples
Example 1
The data represent air lead levels collected by NIOSH at the Alma American Labs, Fairplay, Colorado, for health hazard evaluation purposes (HETA 89-052) on February 23, 1989. The air lead levels were collected from 15 different areas within the facility.
Air Lead Levels (µg/m³): 200, 120, 15, 7, 8, 6, 48, 61, 380, 80, 29, 1000, 350, 1400, 110
For these data, the mean (= 254) is much larger than the median (= 80), which indicates that the distribution is right skewed. Normal probability plots (Minitab 14.0, default method) were constructed for the actual lead levels (Figure 1A) and for the logged lead levels (Figure 1B). It is clear from Figures 1A and 1B that the distribution of the data is far from normal (p-value < 0.05), but a lognormal model adequately describes the data (p-value = 0.871). The p-values are based on the Anderson-Darling test. Therefore, we apply the methods of this paper to make valid inferences about the mean lead level. Based on the logged data, we have the observed values x̄ = 4.333 and s = 1.739. Using these numbers in Algorithm 1, we computed the 95% upper limit for exp(η) as 2405. We also computed the 95% lower limit for the lognormal mean as 141. That is, the mean air lead level within the facility exceeds 141 µg/m³ with 95% confidence.
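The summary statistics quoted above can be reproduced directly from the 15 measurements with Python's standard `statistics` module. This sketch covers only the descriptive step (raw mean and median, and x̄ and s of the logged data, with s using the n − 1 divisor); the Anderson-Darling normality check is not reproduced here.

```python
import math
import statistics

# Air lead levels (µg/m³) from Example 1
lead = [200, 120, 15, 7, 8, 6, 48, 61, 380, 80, 29, 1000, 350, 1400, 110]

mean_raw = statistics.mean(lead)      # arithmetic mean of the raw data
median_raw = statistics.median(lead)  # middle value of the 15 sorted data
logged = [math.log(y) for y in lead]  # x = ln(y)
xbar = statistics.mean(logged)        # observed mean of the logged data
s = statistics.stdev(logged)          # sample SD of the logged data, divisor n - 1

print(round(mean_raw), median_raw)    # 254 80
print(round(xbar, 3), round(s, 3))    # 4.333 1.739
```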
Suppose we want to test whether the mean is below some value (e.g., 120 µg/m³), such as a limit value:
H0: µ ≥ 120 vs. Ha: µ < 120,
where µ = exp(η) (with η = µl + σl²/2) denotes the actual unknown mean air lead level within the lab facility. Using Algorithm 1 again, we computed the generalized p-value as 0.97, and so we conclude that the data do not provide enough evidence to indicate that the mean air lead level within the facility is less than 120 µg/m³.
Regarding the lognormal variance, we computed the maximum likelihood estimate as 2337098 (µg/m³)². This estimate is obtained by replacing µl and σl² in Eq. 21 by x̄ and (n − 1)s²/n, respectively. We also computed a 95% confidence interval for the lognormal variance, using the generalized pivot statistic Tσ² in Eq. 24, as (128538, 2956026772).
Finally, we computed the 95% CI for the geometric standard deviation using the generalized variable in Eq. 26 as (3.57, 15.49); using the exact formula in Eq. 27, we get (3.57, 15.53).

FIGURE 1. Normal probability plots of (A) actual air lead levels and (B) logged air lead levels.

TABLE III. Summary Statistics for Airborne Concentration of Metalworking Fluids (MWF) in 23 Plants

Method                                        Sample Size    x̄         s
Thoracic MWF aerosol (gravimetric analysis)   23             −1.277     0.835
Closed-face MWF analysis                      23             −0.979     0.917

Note: x̄ = mean of the logged data; s = standard deviation of the logged data.
Example 2
In this example, we illustrate the generalized variable procedures for testing the equality of the means of measurements obtained by two different methods. The data were reported in Table I of O'Brien et al.(23) and represent the total mass of metalworking fluids (MWF) obtained by the thoracic MWF aerosol and closed-face MWF aerosol methods. Normal probability plots of the logged data indicated that the lognormality assumption for the original data is tenable. The means and the standard deviations of the logged data are given in Table III. Let µt and µc denote the true means of the airborne concentrations by thoracic MWF aerosol and closed-face MWF aerosol, respectively. To test the equality of the means, we consider
H0: µt = µc vs. Ha: µt ≠ µc.
Using the summary statistics in Table III, we simulated 100,000 values of D = exp(Tη1) − exp(Tη2), where Tη1 and Tη2 are given in Eq. 17. The generalized p-value for the above two-tail test can be estimated by 2 × min{proportion of Ds < 0, proportion of Ds > 0}. Our simulation yielded a generalized p-value of 0.244. The 2.5th and 97.5th percentiles of D form a 95% confidence interval for the difference between the means, computed as (−0.657, 0.145). Thus, at the 5% level, both the generalized p-value and the confidence interval indicate that there is no significant difference between the means.
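The simulation described in Example 2 can be sketched as follows. Eq. 17 is not reproduced in this section, so the sketch assumes that Tη1 and Tη2 have the same form as Tη in the Appendix 2 algorithm and are drawn independently for the two samples; the names `t_eta` and `compare_means` are hypothetical.

```python
import math
import random


def t_eta(xbar, s, n, rng):
    """One draw of T_eta (form assumed from the Appendix 2 algorithm)."""
    z = rng.gauss(0.0, 1.0)
    v2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # V^2 ~ chi-square, n - 1 df
    return (xbar - z / math.sqrt(v2 / (n - 1)) * s / math.sqrt(n)
            + s * s / (2.0 * v2 / (n - 1)))


def compare_means(stats1, stats2, m=100_000, alpha=0.05, seed=1):
    """Two-sided generalized p-value and 1 - alpha CI for exp(eta1) - exp(eta2).
    Each stats tuple is (xbar, s, n) of the logged data."""
    rng = random.Random(seed)
    d = sorted(math.exp(t_eta(*stats1, rng)) - math.exp(t_eta(*stats2, rng))
               for _ in range(m))
    below = sum(1 for v in d if v < 0) / m       # proportion of Ds < 0
    p = 2 * min(below, 1 - below)                # 1 - below = proportion of Ds >= 0
    return p, (d[int(alpha / 2 * (m - 1))], d[int((1 - alpha / 2) * (m - 1))])


if __name__ == "__main__":
    thoracic = (-1.277, 0.835, 23)      # Table III
    closed_face = (-0.979, 0.917, 23)   # Table III
    print(compare_means(thoracic, closed_face))
```

Under these assumptions a run should land near the reported p-value of 0.244 and interval (−0.657, 0.145), up to Monte Carlo error.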
CONCLUSIONS
Several attempts have been made in the literature to draw inferences concerning the mean of a single lognormal distribution. To a much lesser extent, attempts have also been made to draw inferences for the ratio of the means of two lognormal distributions. These problems have certain inherent difficulties, and the available solutions are either approximate, applicable only to large samples, or difficult to compute. In this article, we have explored a novel approach for solving these problems, based on the concepts of generalized p-values and generalized confidence intervals. It turns out that these concepts provide a unified and versatile approach for handling any parametric function associated with one or two lognormal distributions. Even though analytic expressions are not available for the resulting confidence intervals or p-values, their computation is both easy and straightforward. We have provided the necessary programs for their computation, and we have also illustrated our approach using several examples dealing with the analysis of exposure data. In writing this article, our intention has been to draw the attention of industrial hygienists to this new methodology.
ACKNOWLEDGMENT
This research was supported by a grant from the National Institute for Occupational Safety and Health (NIOSH).
REFERENCES
1. Oldham, P.: The nature of the variability of dust concentrations at the coal face. Br. J. Ind. Med. 10:227–234 (1953).
2. Tuggle, R.M.: Assessment of occupational exposure using one-sided tolerance limits. Am. Ind. Hyg. Assoc. J. 43:338–346 (1982).
3. Rappaport, S.M., and S. Selvin: A method for evaluating the mean exposure from a lognormal distribution. Am. Ind. Hyg. Assoc. J. 48:374–379 (1987).
4. Selvin, S., and S.M. Rappaport: Note on the estimation of the mean value from a lognormal distribution. Am. Ind. Hyg. Assoc. J. 50:627–630 (1989).
5. Selvin, S., S.M. Rappaport, R. Spear, J. Schulman, and M. Francis: A note on the assessment of exposure using one-sided tolerance limits. Am. Ind. Hyg. Assoc. J. 48:89–93 (1987).
6. Borjanovic, S.S., S.V. Djordjevic, and M.D. Vukovic-Pal: A method for evaluating exposure to nitrous oxides by application of lognormal distribution. J. Occup. Health 41:27–32 (1999).
7. Saltzman, B.E.: Health risk assessment of fluctuating concentrations using lognormal models. J. Air Waste Manag. Assoc. 47:1152–1160 (1997).
8. Lyles, R.H., and L.L. Kupper: On strategies for comparing occupational exposure data to limits. Am. Ind. Hyg. Assoc. J. 57:6–15 (1996).
9. Rappaport, S.M.: Assessment of long-term exposures to toxic substances in air. Ann. Occup. Hyg. 35:61–121 (1991).
10. Lyles, R.H., L.L. Kupper, and S.M. Rappaport: Assessing regulatory compliance of occupational exposures via the balanced one-way random effects ANOVA model. J. Agric. Biol. Environ. Statist. 2:64–86 (1997).
11. Krishnamoorthy, K., and T. Mathew: One-sided tolerance limits in balanced and unbalanced one-way random models based on generalized confidence limits. Technometrics 46:44–52 (2004).
12. Land, C.E.: Hypotheses tests and interval estimates. In Lognormal Distribution (E.L. Crow and K. Shimizu, eds.). New York: Marcel Dekker, 1988. pp. 87–112.
13. Zhou, X.H., and S. Gao: Confidence intervals for the lognormal mean. Statist. Med. 16:783–790 (1997).
14. Armstrong, B.G.: Confidence intervals for arithmetic means of lognormally distributed exposures. Am. Ind. Hyg. Assoc. J. 53:481–485 (1992).
15. Land, C.E.: An evaluation of approximate confidence interval methods for lognormal means. Technometrics 14:145–158 (1972).
16. Hewett, P., and G.H. Ganser: Simple procedures for calculating confidence intervals around the sample mean and exceedance fraction derived from lognormally distributed data. Appl. Occup. Environ. Hyg. 12:132–142 (1997).
17. Taylor, D.J., L.L. Kupper, and K.E. Muller: Improved approximate confidence intervals for the mean of a lognormal random variable. Statist. Med. 21:1443–1459 (2002).
18. Zhou, X.H., S. Gao, and S.L. Hui: Methods for comparing the means of two independent lognormal samples. Biometrics 53:1129–1135 (1997).
19. Krishnamoorthy, K., and T. Mathew: Inferences on the means of lognormal distributions using generalized p-values and generalized confidence intervals. J. Statist. Plan. Infer. 115:103–121 (2003).
20. Tsui, K.W., and S. Weerahandi: Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters. J. Am. Statist. Assoc. 84:602–607 (1989).
21. Weerahandi, S.: Generalized confidence intervals. J. Am. Statist. Assoc. 88:899–905 (1993).
22. Weerahandi, S.: Exact Statistical Methods for Data Analysis. New York: Springer-Verlag, 1995.
23. O'Brien, D.M., G.M. Piacitelli, W.K. Sieber, R.T. Hughes, and J.D. Catalano: An evaluation of short-term exposures to metal working fluids in small machine shops. Am. Ind. Hyg. Assoc. J. 62:342–348 (2001).
APPENDIX 1
The Generalized Confidence Interval and Generalized P-Value
A general setup in which the concepts of generalized confidence intervals and generalized p-values are defined is as follows. Consider a random variable X whose distribution depends on a scalar parameter of interest θ and a nuisance parameter η (a parameter that is not of direct inferential interest), where η could be a vector. Here X could also be a vector. Suppose we are interested in computing a confidence interval for θ. Let x denote the observed value of X; that is, x represents the data that have been collected. To obtain a generalized confidence interval for θ, we need a generalized pivot statistic T1(X;x,θ,η) (the pivotal quantity on which the inferential procedures will be based) that is a function of the random variable X, the observed data x, and the parameters θ and η, and that satisfies the following two conditions:
(i) Given x, the distribution of T1(X;x,θ,η) is free of the unknown parameters θ and η;
(ii) The observed value of T1(X;x,θ,η), namely, T1(x;x,θ,η), is equal to θ. (A1)
The percentiles of T1(X;x,θ,η) can then be used to obtain confidence intervals for θ. Such confidence intervals are referred to as generalized confidence intervals. For example, if T1−α denotes the 100(1 − α)th percentile of T1(X;x,θ,η), then T1−α is a 1 − α generalized upper confidence limit for θ. A lower confidence limit or two-sided confidence limits can be similarly defined.
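To make the definition concrete, consider a normal mean µ with σ as the nuisance parameter. One generalized pivot (an illustration chosen here, not taken from this appendix) is T1 = x̄ − [Z/(V/√(n − 1))](s/√n), with Z ~ N(0,1) and V² ~ χ²n−1 independent: given the data its distribution is free of (µ, σ), and with Z = √n(X̄ − µ)/σ and V² = (n − 1)S²/σ², its observed value is µ. The sketch below computes an equal-tailed generalized interval from Monte Carlo percentiles; the function names are hypothetical.

```python
import math
import random


def generalized_ci(draw_pivot, m=100_000, alpha=0.05, seed=1):
    """Equal-tailed generalized CI: the alpha/2 and 1 - alpha/2 percentiles
    of m Monte Carlo draws of the generalized pivot T1."""
    rng = random.Random(seed)
    t = sorted(draw_pivot(rng) for _ in range(m))
    return t[int(alpha / 2 * (m - 1))], t[int((1 - alpha / 2) * (m - 1))]


def normal_mean_pivot(xbar, s, n):
    """Sampler for T1 = xbar - Z / (V / sqrt(n - 1)) * s / sqrt(n), normal mean."""
    def draw(rng):
        z = rng.gauss(0.0, 1.0)
        v2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # V^2 ~ chi-square, n - 1 df
        return xbar - z / math.sqrt(v2 / (n - 1)) * s / math.sqrt(n)
    return draw


if __name__ == "__main__":
    print(generalized_ci(normal_mean_pivot(0.0, 1.0, 10)))
```

Because Z/(V/√(n − 1)) has a Student t distribution with n − 1 df, this generalized interval should agree, up to Monte Carlo error, with the classical interval x̄ ± t9,0.975 s/√n ≈ ±0.715 for the inputs above.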
Now suppose we are interested in testing the hypotheses
H0: θ ≤ θ0 vs. Ha: θ > θ0, (A2)
where θ0 is a specified quantity. Suppose we can define a generalized test variable T2(X;x,θ,η) satisfying the following conditions:
(i) For a given x, the distribution of T2(X;x,θ,η) is free of the nuisance parameter η;
(ii) The observed value of T2(X;x,θ,η), namely, T2(x;x,θ,η), is free of any unknown parameters; (A3)
(iii) For a given x and η, the distribution of T2(X;x,θ,η) is stochastically monotone in θ (i.e., stochastically increasing or decreasing in θ).
Typically, for a given x and η, T2(X;x,θ,η) is stochastically decreasing in θ, and the generalized p-value for testing Eq. A2 is given by P(T2(X;x,θ0,η) ≤ t), where t = T2(x;x,θ,η). On the other hand, if T2(X;x,θ,η) is stochastically increasing in θ, then the generalized p-value for Eq. A2 is defined as P(T2(X;x,θ0,η) ≥ t). In general, the observed value t is equal to θ0, and because the distribution of T2(X;x,θ,η) is free of the nuisance parameter η, the generalized p-value at θ0 can be computed using Monte Carlo simulation.
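The Monte Carlo computation can be sketched for the same normal-mean illustration (an assumed example, not taken from this appendix). Taking T2 = T1 − θ, where T1 is the generalized pivot for µ, gives an observed value free of unknowns and a variable that is stochastically decreasing in θ, so the generalized p-value for H0: µ ≤ µ0 vs. Ha: µ > µ0 reduces to P(T1 ≤ µ0). The function names are hypothetical.

```python
import math
import random


def normal_mean_pivot_draw(xbar, s, n, rng):
    """One draw of T1 = xbar - Z / (V / sqrt(n - 1)) * s / sqrt(n)."""
    z = rng.gauss(0.0, 1.0)
    v2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # V^2 ~ chi-square, n - 1 df
    return xbar - z / math.sqrt(v2 / (n - 1)) * s / math.sqrt(n)


def generalized_pvalue(xbar, s, n, theta0, m=100_000, seed=1):
    """Monte Carlo estimate of P(T1 <= theta0), the generalized p-value for
    H0: theta <= theta0 vs. Ha: theta > theta0 under the assumed T2 = T1 - theta."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(m) if normal_mean_pivot_draw(xbar, s, n, rng) <= theta0)
    return hits / m


if __name__ == "__main__":
    print(generalized_pvalue(0.5, 1.0, 10, 0.0))
```

For these inputs the result should match, up to Monte Carlo error, the one-sided t-test p-value for t = 0.5√10 ≈ 1.58 with 9 df, which lies between 0.05 and 0.10.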
APPENDIX 2
Algorithm for Computing the Generalized P-Value and the Generalized Confidence Interval
The following algorithm given by Krishnamoorthy and Mathew(19) can be used for computing the generalized p-value and the generalized confidence interval.
For a given logged data set, compute the observed sample mean and variance, namely, x̄ and s², respectively.

For i = 1 to m
    Generate a standard normal variate Z
    Generate a chi-square random variate V² with n − 1 degrees of freedom
    Set Tηi = x̄ − [Z/(V/√(n − 1))](s/√n) + s²/[2V²/(n − 1)]
    Set Ki = 1 if Tηi > ln(LTA-PEL), else Ki = 0
(end i loop)

Then (1/m) Σ(i=1 to m) Ki is the generalized p-value for testing the hypotheses in Eq. 2. The 100(1 − α)th percentile of Tη1, ..., Tηm, denoted by Tη,1−α, is the 1 − α generalized upper confidence limit for η = µl + σl²/2. Furthermore, exp(Tη,1−α) is the 1 − α generalized upper limit for the lognormal mean.
Based on our experience, we recommend a simulation with at least 100,000 runs (i.e., m ≥ 100,000) to get consistent results regardless of the initial seed used for random number generation. The above algorithm can be easily programmed in any programming language. A Fortran program and SAS codes for computing generalized p-values for one-tail tests and one-sided confidence limits are posted at http://www.ucs.louisiana.edu/∼kxk4695. Interested readers can download these files from this address.
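The algorithm above translates almost line by line into Python. This is a minimal sketch using the standard `random` module (a chi-square variate with n − 1 df drawn as a gamma variate of shape (n − 1)/2 and scale 2); it is not the authors' Fortran/SAS code. The function name `algorithm1` and the parameter name `lta_pel` are hypothetical, and the lower limit, taken as exp of the 100αth percentile, is added by analogy with the upper limit.

```python
import math
import random


def algorithm1(xbar, s, n, lta_pel, m=100_000, alpha=0.05, seed=1):
    """Returns (generalized p-value, 1 - alpha lower limit, 1 - alpha upper limit)
    for the lognormal mean exp(eta), eta = mu_l + sigma_l^2 / 2."""
    rng = random.Random(seed)
    cut = math.log(lta_pel)                          # ln(LTA-PEL)
    t_eta = []
    k = 0
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)                      # standard normal Z
        v2 = rng.gammavariate((n - 1) / 2.0, 2.0)    # V^2 ~ chi-square, n - 1 df
        t = (xbar - z / math.sqrt(v2 / (n - 1)) * s / math.sqrt(n)
             + s * s / (2.0 * v2 / (n - 1)))
        t_eta.append(t)
        if t > cut:                                  # K_i = 1 if T_eta_i > ln(LTA-PEL)
            k += 1
    t_eta.sort()
    lower = math.exp(t_eta[int(alpha * (m - 1))])        # exp of 100(alpha)th percentile
    upper = math.exp(t_eta[int((1 - alpha) * (m - 1))])  # exp of 100(1 - alpha)th percentile
    return k / m, lower, upper


if __name__ == "__main__":
    # Example 1: logged-data summaries and the value 120 µg/m³ used there
    print(algorithm1(4.333, 1.739, 15, 120))
```

With the Example 1 inputs, the output should be close to the reported p-value of 0.97, lower limit 141, and upper limit 2405, with small differences due to Monte Carlo error and the seed.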