Stochastic Frontier Production Function With
Errors-In-Variables
Rajeev Dhawan* =
Anderson Graduate School of Management
University of California, Los Angeles
&
Peter Jochumzen
Department of Economics
Lund University
July 1999
JEL Codes: C15; C21; D24
Keywords: Errors-In-Variables; Stochastic Frontier;
Technical Efficiency; Reliability Ratio
Abstract
This paper develops a procedure for estimating parameters of a cross-sectional stochastic
frontier production function when the factors of production suffer from measurement
errors. Specifically, we use Fuller’s (1987) reliability ratio concept to develop an
estimator for the model in Aigner et al. (1977). Our Monte-Carlo simulation exercise
illustrates the direction and the severity of the bias in the estimates of the elasticity
parameters and of the returns to scale feature of the production function when using the
traditional maximum-likelihood estimator (MLE) in the presence of measurement errors. In
contrast, the reliability ratio based estimator consistently estimates these parameters even
under extreme degrees of measurement error. Additionally, firm-level technical efficiency
estimates from the traditional MLE are severely biased in comparison to those from the
reliability ratio estimator, rendering inter-firm efficiency comparisons infeasible. The seriousness
of measurement errors in a practical setting is demonstrated by using data for a cross-
section of publicly traded U.S. corporations.
* Corresponding Author: The mailing address is 110 Westwood Plaza, Los Angeles, CA 90095.
E-mail: rdhawan@anderson.ucla.edu, Phone: (310) 206-2529 and Fax: (310) 206-9940.
=This research originated in Jochumzen’s Ph.D. dissertation. The authors wish to thank Alok
Bohara, Kurt Brännäs, David Edgerton, Kishore Gawande and Almas Heshmati for their
numerous comments and suggestions that substantially improved this paper. We would also like
to thank seminar participants at Lund University, University of New Mexico, conference
participants at the econometric conference held at Stockholm School of Economics in May 1997,
and the 3rd Biennial Georgia Productivity Workshop held at University of Georgia in October
1998 for their helpful comments and suggestions. However, any errors are the responsibility of
the authors.
1. Introduction
Since the development of the stochastic frontier production function (SFPF) by Aigner et
al. (1977) and Meeusen and van den Broeck (1977), evaluating the efficiency of individual firms
and industries has become popular with the increasing availability of firm-level data and the growing
capacity of computers to process them1. Econometrically, the most common approach to
estimating a SFPF is to specify a deterministic, parametric production function common to all
economic units and a stochastic component that consists of a two-part error term2. The first
component of this error term is a symmetric disturbance that represents statistical noise and
follows a normal distribution. The second part is a one-sided error that represents technical
inefficiency and typically follows a half-normal or truncated normal distribution3. This
approach implicitly assumes that the explanatory variables, or factors of production, are measured
without any errors. It is difficult to argue that the data collected on input use represent true
measurements of their theoretical counterparts. For example, a major source of measurement
error in measuring flow of services from capital is the lack of information on the vintage nature
of capital stock. Additionally, the labor usage figures reported by firms can also be mismeasured
due to the non-availability of information on the skill levels of the labor force being used.
Consequently, the bias introduced by measurement errors can potentially be rather severe and
this paper devises a methodology to investigate the severity of this bias.
1 Both economists and policy makers have made use of this trend as the notion of frontier is consistent with the
theory of optimization in addition to identifying factors that can explain relative efficiencies of economic units. A
partial list of studies that use the SFPF approach for efficiency measurement related issues is: Kumbhakar
(1987,1988), Battese and Coelli (1992), Bauer (1992), Kumbhakar and Hjalmarsson (1995) and Dhawan and Gerdes
(1997).
2 Bauer (1990) and Greene (1993) contain detailed surveys of different econometric techniques for estimating SFPF
and technical efficiency.
3 An estimate of the technical inefficiency is then obtained from the mean or mode of the conditional distribution of
the one-sided error term given the composed error term (Jondrow et al 1982).
We utilize and extend Fuller's (1987) reliability ratio concept to investigate the
bias due to errors-in-variables in a cross-sectional SFPF framework4. Briefly, the
concept is as follows. If x is the true (but unobserved) value of a variable, u is the measurement
error and z = x + u is the observed measurement, then the reliability ratio may be defined as the
ratio of the variance of x to the variance of z. This means that a variable with no measurement error
(z = x) has a reliability ratio of one. Thus, the lower the reliability ratio, the higher the degree of
measurement error in the observed data. Given the reliability ratio, consistent estimates
can then be derived for the parameters of the model under consideration. As the reliability ratio is
typically unknown in a practical setting, the best alternative is to derive a range of
estimates for plausible values of the reliability ratio5. One can then examine the sensitivity of
the estimates to this range of reliability ratios. For example, when estimating the SFPF model,
one issue is how sensitive the estimated firm-specific technical efficiency is to the degree of
measurement error in the input data.
The outline of the paper is as follows. In section 2, we set up the cross-sectional SFPF
with no measurement errors as developed in Aigner, Lovell and Schmidt (1977). We then define
the setup of a SFPF model when inputs (capital and labor) are measured with errors and illustrate
how to estimate it. Section 3 presents a Monte-Carlo simulation study that illustrates the
superiority of the method developed in section 2 for dealing with measurement errors. An empirical
4 The concept of reliability ratio is not new and Fuller (1987) contains a detailed exposition of how to use this
concept to derive maximum-likelihood estimate of a multiple regression equation when the reliability ratio is
known.
5 The word plausible is used in the statement as not all degrees of measurement errors or reliability ratios can be
supported by the data in a practical setting. Thus, in this paper we also derive the expression for the upper bound for
the variance of the measurement error (lower bound for the reliability ratio) that can be supported by a given
empirical data set.
example based on U.S. firm-level data is given in section 4. Section 5 concludes with a
summary of findings and directions for future research.
2. SFPF in Cross-Section
2.1 Basic set-up
Starting with Aigner et al. (1977), the cross-sectional SFPF may be written as:

yi = f(xi; β) + εi − ξi   (1)

where, for each firm i, yi is output in logs and xi is the (actual) k × 1 vector of inputs in log terms,
εi is an IID random variable representing statistical noise in production, and β is a k × 1 vector of
unknown parameters. Aigner et al. assume that εi ∼ N(0, σε²), so that the maximum output firm i
can produce using xi is f(xi; β) + εi. They also focus on a linear model, i.e., f(xi; β) = α + xi'β,
which is consistent with a Cobb-Douglas production function. Technical inefficiency is then
introduced as a positive random variable ξi. The most common assumption made in the literature
is that ξi follows a normal distribution truncated at zero from below (the positive half), i.e.,
ξi ∼ iid N+(0, σξ²)6,7. Defining ei = εi − ξi as the compound residual, equation (1) can be written as:

yi = α + xi'β + ei.   (2)
The density of ei is well known (see Weinstein (1964)) and given below:
f_e(ei) = [1/√(2πσ²)] exp( −ei²/(2σ²) ) erfc( λei/(√2 σ) )   (3)

where λ = σξ/σε, σ² = σε² + σξ², σξ² = λ²σ²/(1 + λ²) and σε² = σ²/(1 + λ²),
6 Other common assumptions are the exponential (Meeusen and van den Broeck (1977) and also Aigner, Lovell and
Schmidt (1977)), the gamma (Beckers and Hammond (1987)) and the truncated normal with non-zero mean (Stevenson
(1980)). See Greene (1997a) for a detailed discussion regarding the merits and shortcomings of these different
distributional assumptions.
7 The variance of ξi is equal to (1 − 2/π)σξ².
and erfc(x) is defined as 1 − erf(x), where the error function erf(x) is defined as:

erf(x) = (2/√π) ∫₀ˣ exp(−t²) dt.
λ is directly related to the skewness of e while σ² is directly related to the variance of e8. The
empirical distribution of e is key to the estimation of the SFPF model in equation (2), as the variance
of the technical efficiency term (σξ²) is derived from the estimated skewness of e.
With no measurement errors, estimation of the parameters (α, β', λ and σ²) in equation (2) is
straightforward. OLS will give us a consistent but inefficient estimate of β and an inconsistent
estimate of α. OLS, however, does not allow us to estimate the variance of the technical
efficiency, σξ² 9. An alternative is to use the maximum likelihood method. The log-likelihood
function following Aigner et al. (1977) can be written as:
l(α, β, λ, σ²) = Σ_{i=1}^{n} ln f_e(yi − α − xi'β)
  = Σ_{i=1}^{n} ln erfc( λ(yi − α − xi'β)/(√2 σ) ) − (1/(2σ²)) Σ_{i=1}^{n} (yi − α − xi'β)² − (n/2) ln(2πσ²)   (4)
where the above likelihood can easily be modified for a different specification of the production
function f(xi ; β) as well as for different distributional assumptions on ξi.
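To make this concrete, the following is a minimal sketch (ours, not the authors' code) of how the log-likelihood in equation (4) can be evaluated and maximized numerically. It assumes Python with numpy/scipy, data arrays y and X, and illustrative function names; σ² and λ are parametrized in logs to keep them positive.

```python
import numpy as np
from scipy.special import erfc
from scipy.optimize import minimize

def neg_loglik(theta, y, X):
    """Negative Aigner-Lovell-Schmidt log-likelihood of equation (4).
    theta = (alpha, beta_1, ..., beta_k, ln sigma^2, ln lambda); the variance
    and skewness parameters are estimated in logs to keep them positive."""
    k = X.shape[1]
    alpha, beta = theta[0], theta[1:1 + k]
    sigma2, lam = np.exp(theta[1 + k]), np.exp(theta[2 + k])
    sigma = np.sqrt(sigma2)
    e = y - alpha - X @ beta                     # compound residuals e_i
    return -(np.sum(np.log(erfc(lam * e / (np.sqrt(2.0) * sigma))))
             - np.sum(e ** 2) / (2.0 * sigma2)
             - 0.5 * len(y) * np.log(2.0 * np.pi * sigma2))

# Usage on data arrays y (n,) and X (n, k); OLS-type values make a sensible start.
# start = np.concatenate(([y.mean()], np.zeros(X.shape[1]), [0.0, 0.0]))
# fit = minimize(neg_loglik, start, args=(y, X), method="BFGS")
```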
2.2 Technical efficiency
The production function implicit in equation (2) when written in level terms is:
Yi = e^α [ ∏_{j=1}^{k} Xij^{βj} ] e^{εi} e^{−ξi}
8 Note that σ² is not exactly the variance of e, which is given by the following expression:
Var(e) = Var(ε) + Var(ξ) = σε² + (1 − 2/π)σξ².
9 But OLS does a good job of testing whether the error term is skewed or not. Another alternative would be to use
Corrected OLS which gives consistent but inefficient estimates of the regression parameters (see Greene (1997)).
where the output Yi and input Xij are in levels. Thus, the technical efficiency of firm i
can be defined as ϕi = exp(-ξi)10 and it may be interpreted as the percentage of maximum
possible output achieved when the residual εi is zero. This involves the technical inefficiency
effect ξi , which is an unobservable random variable. Even if the true value of the parameter
vector β was known, only the difference, ei = εi - ξi, could be observed. Thus, if ei is the
compound residual for firm i , the predictor of firm-specific technical efficiency is the random
variable (ϕi | ei). The expected value of (ϕi | ei), with ei replaced by the residual êi, is our best
prediction of technical efficiency for firm i, given the residual êi = yi − α̂ − xi'β̂. Jondrow et al.
(1982) compute E(ξi | êi) as the predicted technical inefficiency. This is based on the
approximation that ξi = −ln ϕi ≈ 1 − ϕi. We prefer to avoid this approximation and calculate
directly the conditional expected value of ϕi, as suggested by Battese and Coelli (1988). This,
after some tedious algebra, is11:
E[exp(−ξi) | êi] = exp{ σε²σξ²/(2σ²) + (σξ²/σ²)êi } · [ Φ( −λêi/σ − σεσξ/σ ) / Φ( −λêi/σ ) ]   (5)
The average or mean technical efficiency of firms is straightforward to derive as:
E[exp(-ξi)] = E(ϕi) = 2exp(σξ2/2)[1-Φ(σξ)] (6)
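For concreteness, here is a minimal sketch of how the point predictor in equation (5) and the mean efficiency in equation (6) can be computed from estimates of σ² and λ. It assumes Python with numpy/scipy, uses the standard Battese-Coelli conditional-expectation form in which equation (5) is written above, and the function names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def conditional_efficiency(e_hat, sigma2, lam):
    """Point predictor E[exp(-xi_i) | e_hat_i] of equation (5), written in the
    standard Battese-Coelli conditional-expectation form."""
    sigma = np.sqrt(sigma2)
    sig_xi2 = lam ** 2 * sigma2 / (1.0 + lam ** 2)    # sigma_xi^2, from equation (3)
    sig_eps2 = sigma2 / (1.0 + lam ** 2)              # sigma_eps^2, from equation (3)
    sig_star = np.sqrt(sig_eps2 * sig_xi2) / sigma    # std. dev. of xi_i given e_i
    mu_ratio = -lam * e_hat / sigma                   # mean/std. dev. ratio of xi_i given e_i
    return (np.exp(sig_eps2 * sig_xi2 / (2.0 * sigma2) + sig_xi2 * e_hat / sigma2)
            * norm.cdf(mu_ratio - sig_star) / norm.cdf(mu_ratio))

def mean_efficiency(sig_xi2):
    """Unconditional mean efficiency E[exp(-xi)] of equation (6)."""
    return 2.0 * np.exp(sig_xi2 / 2.0) * (1.0 - norm.cdf(np.sqrt(sig_xi2)))
```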
2.3 SFPF with errors-in-variables
Let xi denote the actual (unobserved) k × 1 vector of inputs in log terms for firm i. We also
observe an equal number of variables zi which are related to xi in the following manner:
zi = xi + ui   (7)
10 Since ξi ≥ 0, the technical efficiency will always be between zero and one. A firm is defined as fully efficient or
located on the frontier if ξi = 0 in which case the technical efficiency measure is equal to one.
11 Note that this expression converges to exp(êi) as σε² → 0, which is what we should expect since êi in this case
estimates −ξi.
where ui denotes the k × 1 vector of measurement errors (also in logs) and E(ui | xi) = 0. This is
the error model approach12, which implies that the measurement errors are multiplicative in
nature13. In addition this model is non-calibrated in the sense that zi is not specified to be related
to xi in a systematic manner 14. The non-calibrated model is more suitable as we have no reason,
a priori, to believe that there is a systematic bias in the measurements of inputs. Given the
production function specification in equation (2), and assuming normal distributions for the
measurement errors and the unobserved explanatory variables, we get the following model
specification:
yi = α + xi'β + εi - ξ i
zi = xi + ui , where E(ui | xi) = 0, (8)
εi ∼ N(0, σε2), ξi ∼ iidN+(0, σξ2), xi ∼ N(µ,Σx) and ui ∼ N(0, Σu).
This specification implies that x and u are independent and that zi ∼ N(µ,Σz) with Σz = Σx
+ Σu.15 Finally, we assume that the true factors (xi), the measurement errors (ui), the stochastic
frontier error term (εi) and the technical inefficiency (ξi) are all independent random variables for
all i.
12 The other approach to modeling errors is the Berkson model where xi = zi + vi , and vi are the measurement errors.
Here, E(vi | zi) = 0 which implies that the expectation conditional on observed zi is zero as opposed to the error
models where the expectation conditional on actual xi is zero. In addition, in the error model, the observed value is
correlated with the measurement error while the actual value is correlated with the measurement error in the Berkson
model. Which approach to use depends on what one believes is independent of the measurement errors: the true
values of inputs or the observed values. In our view it is natural to assume that the true values are independent of
the measurement errors and thus we follow the error model approach.
13 They are multiplicative in the sense that exp(zi) = exp(xi) · exp(ui) where exp(zi) is the actual observations on z
(not in logs) and similarly for x and u.
14 An error model would be calibrated if it was specified that zi = γ0 + γ1xi + ui.
15 Pal, Neogi and Ghosh (1998) have analyzed a similar setup with nonstochastic explanatory variables (both
observed and unobserved z and x). Although this approach has some advantages, in the sense that the estimated
coefficients do not depend on the distributional assumption made for x and z and the likelihood function is easier to
derive, it is an unconventional assumption in the errors-in-variable literature (see for example Fuller (1987) and
Greene(1997b)). It is hard to motivate the errors-in-variables assumption that zi = xi + ui assuming that the
measurement errors are stochastic without assuming that x and z are stochastic too.
At this point it is clear that the model (8) is unidentifiable, as we only observe zi while xi
and ui are unobserved. This implies that Σu and Σx cannot be identified separately, which means
that one needs additional information such as instrumental variables. As appropriate
instruments for labor and capital are hard to justify, one can instead try to identify the "consistent
bounds" for the parameter β as advocated by Klepper and Leamer (1984). In the Klepper and Leamer
approach one identifies all β's that imply positive estimates for the variance of ei = εi − ξi and
positive semi-definite estimates of Σx and Σu.16 For the specification in equation (8) we cannot
use this approach, as OLS does not provide an estimate of σξ², which is needed to calculate the
conditional as well as the unconditional technical efficiency estimates. Instead, we will follow (and
extend) Fuller's reliability ratio approach which, although computationally more expensive,
allows us to use the maximum-likelihood method to consistently estimate all the parameters of the
model as well as the technical efficiencies.
The relevant reliability ratios are defined as follows:

πi = Var(xi)/Var(zi),   i = 1,…,k   (9a)

πij = Cov(xi, xj)/Cov(zi, zj) = Cov(xi, xj)/[Cov(xi, xj) + Cov(ui, uj)],   i = 1,…,k and j = 1,…,k, i ≠ j   (9b)

where πi is the (traditional) reliability ratio associated with variable i, 0 ≤ πi ≤ 1, and πi is equal to
one if there are no measurement errors for variable i. Typically, one specifies the covariance
matrix of the measurement errors u to be diagonal (see, for example, Klepper and Leamer (1984)).
If this is the case, the reliability ratios πi are all we need to identify the model. However, we
would like to allow for covariances between the measurement errors of the different variables, which then
16 The consistent bounds are found by running k + 1 regressions and if all these regressions are in the same orthant,
then the set of maximum likelihood estimates will be the convex hull of these estimates.
warrants the introduction of πij. πij is the ratio of the true (unobserved) covariance between two
variables to the observed covariance and will be called the covariance reliability ratio. Note that
if one assumes that the measurement errors of different variables are uncorrelated, then all the
πij’s are equal to one17,18. Since xi and ui are independent, it follows that:
Var(ui) = (1 − πi)Var(zi),   i = 1,…,k   (10a)

Cov(ui, uj) = (1 − πij)Cov(zi, zj),   i = 1,…,k and j = 1,…,k, i ≠ j.   (10b)

It is possible to collect all the reliability ratios into one matrix Π, where:

Π = | π1   ⋯  π1k |
    | ⋮    ⋱   ⋮  |
    | πk1  ⋯  πk  |

With this notation we may write Σx = Π.*Σz and Σu = (1 − Π).*Σz, where the notation ".*"
means element-by-element multiplication and 1 is a k × k matrix of ones. Because zi is
observed, it is possible to estimate the variance matrix Σz. In order to identify the model we have
specified so far, we must also know one of Π, Σx or Σu19.
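To fix ideas, the following is a minimal sketch (ours, not part of the original estimation code) of how the observed covariance matrix of z is split into Σx and Σu for a given Π, using the relations above. It assumes Python with numpy; the names split_covariance, S_z and Pi are illustrative.

```python
import numpy as np

def split_covariance(S_z, Pi):
    """Split the observed covariance matrix of z into Sigma_x and Sigma_u for a
    given reliability-ratio matrix Pi, using Sigma_x = Pi .* Sigma_z and
    Sigma_u = (1 - Pi) .* Sigma_z; '*' on numpy arrays is elementwise, i.e. '.*'."""
    return Pi * S_z, (1.0 - Pi) * S_z

# Example with k = 2 inputs, reliability ratio 0.9 on the diagonal and covariance
# reliability ratio 1 off the diagonal (uncorrelated measurement errors):
# S_z = np.cov(Z, rowvar=False)              # Z is the n x k matrix of observed inputs
# Pi  = np.array([[0.9, 1.0], [1.0, 0.9]])
# S_x, S_u = split_covariance(S_z, Pi)
```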
2.4 Maximum likelihood estimation
Our goal is to estimate the parameters α, β, σε2, σξ2, or equivalently, α, β, σ2 and λ, in model (8)
given different values for reliability ratio Π. However, the joint distribution of the observations
on yi and zi will involve all the parameters. Given the matrix of reliability ratios Π these
parameters are given by ω´ = (µ´, vech(Σz)´, α, β´, λ, σ2). We will use a two-step procedure also
known as the method of limited information maximum likelihood (LIML) to maximize the
17 Also, unless the measurement errors are negatively correlated, the πij’s will be less than one.
18 If the variables xi and xj are uncorrelated, we let πij be equal to zero, irrespective of whether the
measurement errors are correlated or uncorrelated.
19 Now, even if the reliability ratio is unknown for the particular variables under study one can then investigate the
sensitivity of the estimates to the changes in the reliability ratio.
likelihood function20. In the first step of this procedure we use the sample moments of z to
estimate µ and Σz. Substituting these estimates into the full likelihood will, under weak
assumptions, lead to consistent estimates of (α, β´, λ, σ2). This second stage likelihood function is much
easier to maximize than the full likelihood, but the estimated covariance matrix of (α, β´, λ, σ2)
at this stage will no longer be consistent. However, by using the results of Murphy and Topel
(1985), we will show how the estimated covariance matrix of (α, β´, λ, σ2) can be adjusted to
yield consistent results.
Now, if we assume that x, and therefore z, is weakly exogenous with respect to (α, β´, λ,
σ2) we can then write the log-likelihood of the parameters as:
l(ω) = Σ_{i=1}^{n} ln f(yi, zi; ω) = Σ_{i=1}^{n} ln f1(zi; ω1) + Σ_{i=1}^{n} ln f2(yi | zi; ω1, ω2)   (11)
where ω1´ = (µ´, vech(Σz)´) and ω2´ = (α, β´, λ, σ2).
In the LIML procedure we first maximize the first part of the likelihood expression in (11):

l1(ω1) = Σ_{i=1}^{n} ln f1(zi; ω1)   (12)

over ω1. This gives us an estimate ω̂1(z), which is then used to maximize the remaining
portion of the likelihood:

l2(ω̂1, ω2) = Σ_{i=1}^{n} ln f2(yi | zi; ω̂1(z), ω2)   (13)

over ω2. This gives us an estimate ω̂2(y, z). The exact expression for the first-step
likelihood is:

l1 = constant − (n/2) ln|Σz| − (n/2) tr(Σz⁻¹Sz) − (n/2)(z̄ − µ)'Σz⁻¹(z̄ − µ)   (14)
20 Estimating these parameters using full information maximum likelihood (FIML) could not be done as the
likelihood function was too flat, mostly due to the elements in Σz, to allow us to find the maximum.
where Sz = (1/n) Σi (zi − z̄)(zi − z̄)'. It is well known that z̄ and Sz are the ML estimates of µ and Σz,
and that they are independently distributed as N(µ, Σz/n) and ((n − 1)/n)·Wishart(Σz/(n − 1), n − 1),
respectively. The sample moments thus consistently estimate the population moments.

These results enable us to calculate an estimate of V1, the asymptotic covariance matrix
of ω̂1' = (µ̂', vech(Σ̂z)') = (z̄', vech(Sz)'). Although we are not directly interested in an estimate of
V1, it is needed later to obtain a consistent estimate of V2, the asymptotic covariance matrix
of ω̂2(y, z). The asymptotic covariance matrix V1 is of dimension ½k(k + 3) and can be written
as follows:

V1 = | V1µµ  V1µσ |
     | V1σµ  V1σσ |
If we define Szi = (zi − z̄)(zi − z̄)', then, following Edgerton and Jochumzen (1999), we can write the
elements of V1 as21:

V̂1µµ = Sz/n

V̂1σσ = (1/n)[ (1/n) Σi vech(Szi)vech(Szi)' − vech(Sz)vech(Sz)' ]

V̂1µσ = V̂1σµ' = (1/n²) Σi (zi − z̄) vech(Szi)'
To calculate the second-step likelihood we need to find the conditional distribution of y given z.
Let v be the random variable x'β. Then by considering the joint density of y and v we have:
f_{y|z}(y | z) = ∫ f_{y,v|z}(y, v | z) dv = ∫ f_{y|v,z}(y | v, z) f_{v|z}(v | z) dv   (15)
Because we are conditioning on z and x'β, we have:
f_{y|v,z}(y | v, z) = f_e(y − α − v | v, z) = f_e(y − α − v)   (16)
The last equality follows as e is independent of z and x'β. By substituting (16) into (15) we get:
f_{y|z}(y | z) = ∫ f_{v|z}(v | z) f_e(y − α − v) dv   (17)
21 Note that if zi is normal, the estimate of V1σµ will be zero.
where the integral is a single integral over all possible values of x'β. Since x and z are normal, v | z
will also be normal. Straightforward application of the results for the conditional density of a
multivariate normal gives us the expected value and variance of x'β | z 22:

E(v | z) = µ'β + (z − µ)'Σz⁻¹(Π.*Σz)β   (18a)

and

Var(v | z) = β'(Π.*Σz − Π.*Π.*Σz)β = β'(Π.*Σz.*Πc)β   (18b)

where Πc = 1 − Π.
Thus, the density f_{v|z} can be written as:

f_{v|z}(v | z) = [1/√(2πβ'(Π.*Σz.*Πc)β)] exp{ −[v − µ'β − (z − µ)'Σz⁻¹(Π.*Σz)β]² / (2β'(Π.*Σz.*Πc)β) }   (19)
Therefore, the conditional density of yi given zi is:

f(yi | zi) = [1/(2πσ√(β'(Π.*Σz.*Πc)β))] ∫ erfc( λ(yi − α − v)/(√2 σ) ) exp{ −(yi − α − v)²/(2σ²) − [v − µ'β − (zi − µ)'Σz⁻¹(Π.*Σz)β]² / (2β'(Π.*Σz.*Πc)β) } dv   (20)
and the second-step likelihood function l2 is given by

l2(ω̂1, ω2) = Σ_{i=1}^{n} ln f(yi | zi; ω̂1, ω2)   (21)

where f(yi | zi) is the density in (20) with µ and Σz replaced by their first-step estimates.
22 Since E(xi'β) = µ'β and Cov(xi'β, zi) = Σxβ, we have that (xi'β, zi')' is jointly normal with mean (µ'β, µ')' and a
covariance matrix with blocks β'Σxβ, β'Σx, Σxβ and Σz. Then, using the formula for the conditional distribution of
xi'β given zi, we get xi'β | zi ∼ N[µ'β + (zi − µ)'Σz⁻¹Σxβ, β'(Σx − ΣxΣz⁻¹Σx)β]. Using Σx = Π.*Σz, this can be written
as N[µ'β + (zi − µ)'Σz⁻¹(Π.*Σz)β, β'[Π.*Σz.*(1 − Π)]β].
The LIML estimates are found by substituting the first-step estimates of µ and Σz (z̄ and
Sz) into (21), which is then maximized. It is possible to show that this integral will not simplify
unless every reliability ratio in Π is equal to one, in which case the density above converges to
the density of e in section 2. However, it is possible to evaluate the likelihood function
numerically for any matrix Π23. Note that the likelihood function simplifies further if all the
reliability ratios are equal to one common value, say π̄. Conditional on this reliability ratio π̄, we
have:

E(xi'β | zi) = β'µ + π̄ β'(zi − µ)   (22a)

Var(xi'β | zi) = π̄(1 − π̄)β'Σzβ   (22b)
and a simpler form for the conditional density of y given z is:

f(yi | zi) = [1/(2πσ√(π̄(1 − π̄)β'Σzβ))] ∫ erfc( λ(yi − α − v)/(√2 σ) ) exp{ −(yi − α − v)²/(2σ²) − [v − β'µ − π̄ β'(zi − µ)]² / (2π̄(1 − π̄)β'Σzβ) } dv   (23)
The adjusted asymptotic covariance matrix of the second-step estimates ω̂2, denoted V2*, has to be
calculated as follows:

V2* = V2 + V2( C V1 C' − R V1 C' − C V1 R' )V2   (24)

where V2 is the unadjusted second-step covariance matrix. Also, C and R, following Murphy and
Topel (1985), who establish consistency of LIML under the usual regularity conditions, are
defined as24:

C = E[ (∂l2/∂ω2)(∂l2/∂ω1') ],   R = E[ (∂l2/∂ω2)(∂l1/∂ω1') ].
23 We need to evaluate n integrals each time we calculate the value of the likelihood function which is not a problem
for a fast computer.
24 We used numerical derivatives to calculate C and R in our Monte-Carlo analysis.
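The following is a small sketch of the adjustment in equation (24) given observation-level scores. It is our illustration, not the authors' code; it uses sample analogues of C and R, and it assumes that the scaling conventions used for V1, V2 and the scores are mutually consistent.

```python
import numpy as np

def murphy_topel_vcov(V1, V2, score2_w2, score2_w1, score1_w1):
    """Adjusted covariance matrix of the second-step estimates, equation (24).
    V1, V2    : unadjusted covariance matrices of the first- and second-step estimates
    score2_w2 : (n x p2) observation-level derivatives of l2 with respect to omega2
    score2_w1 : (n x p1) observation-level derivatives of l2 with respect to omega1
    score1_w1 : (n x p1) observation-level derivatives of l1 with respect to omega1
    (numerical derivatives can be used, as in footnote 24)."""
    n = score2_w2.shape[0]
    C = score2_w2.T @ score2_w1 / n          # sample analogue of C
    R = score2_w2.T @ score1_w1 / n          # sample analogue of R
    return V2 + V2 @ (C @ V1 @ C.T - R @ V1 @ C.T - C @ V1 @ R.T) @ V2
```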
2.5 Bounds on reliability ratio
Not all Π-matrices are possible, as some will give rise to negative estimates of
the variance of εi. The reasoning is as follows. Combining equations (2) and (7) one gets:

yi = α + zi'β + εi − ui'β − ξi   (25)

where the error term is now composed of three parts. Since ui'β is normal, the skewness of the
data will determine the relative share of the variance between εi − ui'β on the one hand and ξi on the
other in equation (25). Given this division, the data will determine the variance of εi − ui'β
(denote it s²ε−uβ = σε² + β'Σuβ), but not how s²ε−uβ is shared between the two terms εi and ui'β. With no
measurement errors, all of s²ε−uβ can be attributed to the variance of the residual ε. As
the variance of u increases, more of s²ε−uβ will be due to the variance of ui'β. In the
extreme case all the variance of εi − ui'β is due to measurement errors; in this case σε² = 0 and
s²ε−uβ = β'Σuβ. Attempting to increase the measurement-error variance above this limit will result in a negative
estimate of σε². This in turn determines the lower limit for the reliability ratios. In practice, the
estimated variance of ε decreases when we decrease the reliability ratios, and the lower limit of
the reliability ratios will be found where the estimated variance of ε goes to zero.

If we define bΠ as the estimated value of β given the reliability ratio matrix Π, the
restriction that the estimate of σε² ≥ 0 is equivalent to bΠ'ΣubΠ = bΠ'((1 − Π).*Σz)bΠ ≤ s²ε−uβ, which
is the restriction for determining feasible values of Π. Additional bounds on the reliability ratios
may be found if some other simplifying assumptions are made. This will be discussed in detail
when doing the simulation exercise in section 3.
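As a small illustration, this feasibility check can be written as follows. It is a sketch under the assumption Σu = (1 − Π).*Σz from section 2.3, with illustrative names (b_pi, Pi, S_z, s2_eps_minus_ubeta), and is not part of the original estimation code.

```python
import numpy as np

def pi_matrix_feasible(b_pi, Pi, S_z, s2_eps_minus_ubeta):
    """Check the restriction of section 2.5: for the estimate b_pi of beta under the
    reliability-ratio matrix Pi, the implied measurement-error variance b'Sigma_u b
    must not exceed the estimated variance of (epsilon - u'beta)."""
    Sigma_u = (1.0 - Pi) * S_z               # Sigma_u = (1 - Pi) .* Sigma_z, section 2.3
    return b_pi @ Sigma_u @ b_pi <= s2_eps_minus_ubeta
```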
2.6 Technical efficiency with errors-in-variables
Our aim is now to estimate the technical efficiency of the firm and the mean for the
sample for different values of the reliability ratio matrix Π. The expression for the mean technical
efficiency, E[exp(−ξi)], is the same as in equation (6) even with measurement errors, since the
distribution of ξi is unaffected. However, the expression for the firm-specific technical
efficiency requires a slight modification. With measurement errors, the compound residual will
be given by ei* = εi − ui'β − ξi (see equation (25)) instead of ei = εi − ξi as in the case of no
measurement errors. Since ui'β is normal, the expression we derived for E[exp(−ξi) | êi] in
equation (5) will be valid if we replace εi by εi − ui'β and redefine σ² and λ as follows:

σ*² = σε² + β'Σuβ + σξ²   and   λ* = σξ / √(σε² + β'Σuβ)   (26)
Consequently, the expression for the conditional technical efficiency under measurement errors is:

E[exp(−ξi) | êi*] = exp{ (σε² + β'Σuβ)σξ²/(2σ*²) + (σξ²/σ*²)êi* } · [ Φ( −λ*êi*/σ* − √(σε² + β'Σuβ)·σξ/σ* ) / Φ( −λ*êi*/σ* ) ]   (27)
3. Simulation Study
3.1 Simulation set-up
This section compares the new estimator for the cross-sectional SFPF developed in the
previous section (henceforth called the EIV estimator) with the traditional ML estimator on
simulated data. The aim is to investigate the bias introduced by measurement errors in
estimating the production function parameters and the resulting technical efficiency estimates.
The model that we choose to simulate is a Cobb-Douglas production function with two inputs,
capital and labor. The choice of only two inputs was motivated by the desire to stay as close as
possible to our empirical example in section 4, where the data allow identification of only two
broadly defined categories of inputs: total capital and total labor. In addition, the basic parameters
for the simulation are chosen so as to closely mimic the actual data analyzed in section 4.
The starting point of the simulation is the following model specification:

ln(Yi) = α + βK ln(Ki) + βL ln(Li) + εi − ξi   (28a)

ln(K̃i) = ln(Ki) + ln(UKi),   ln(L̃i) = ln(Li) + ln(ULi)   (28b, 28c)

where Ki and Li are the actual but unobserved amounts of capital and labor of firm i and K̃i and L̃i
are their measured counterparts. U, ε and ξ are as defined in section 2; let σK² denote the variance
of ln(K̃i), σL² the variance of ln(L̃i) and σKL the covariance between ln(K̃i) and ln(L̃i).

Now, a slight modification of the model in (28), writing it in per-capita terms, is
preferred. This is achieved by subtracting ln(Li) from both sides of (28a) and subtracting (28c)
from (28b). There are three advantages of doing this. First, it is easier to find the maximum of
the likelihood function when regressing ln(Y/L) on ln(K/L) and ln(L) instead of regressing ln(Y)
on ln(K) and ln(L). Second, the parameter on ln(L) directly estimates the degree of departure
from constant returns to scale. Third, the per-capita specification allows us to find bounds on
the feasible reliability ratios, as we discuss in the next sub-section. Thus, after the
transformation the model in equations (28a,b,c) can be written as:

ln(Yi/Li) = α + βK ln(Ki/Li) + (βL + βK − 1) ln(Li) + εi − ξi   (29a)

ln(K̃i/L̃i) = ln(Ki/Li) + ln(UKi/ULi),   ln(L̃i) = ln(Li) + ln(ULi)   (29b, 29c)

or equivalently as:

yi = α + xi'γ + εi − ξi   (30a)

zi = xi + ui   (30b)

where yi = ln(Yi/Li), xi = [ln(Ki/Li), ln(Li)]', zi = [ln(K̃i/L̃i), ln(L̃i)]', ui = [ln(UKi/ULi), ln(ULi)]'
and γ = [βK, (βL + βK − 1)]'25. We simulate x1, x2, u1 and u2 from normal distributions such that
xi ∼ N(0, 2π̄) and ui ∼ N(0, 2(1 − π̄)), with π̄ being the common reliability ratio. Then, by adding x to u,
we get z with the desired properties. Once x and u are simulated, we simulate εi from a N(0, σε²) and
ξi from a truncated N+(0, σξ²) with σε² = 0.2 and σξ² = 0.8, which implies that σ² = 1 and λ = 2.
Finally, we create y by selecting α = 1.7, γ1 = 0.6 and γ2 = 0.1. This implies increasing returns to
scale with coefficients βK = 0.6 and βL = 0.5. This is the model structure that we will use for our
simulation study.

25 Note that xi and ui are independent since ln(Ki/Li) and ln(UKi/ULi) are independent.
3.2 Restrictions on the reliability ratios
In contrast to the generalized bounds discussed in section 2.5, we will derive simplified bounds
when simulating independent series for x1 and x2. The independence assumption implies that the
covariance between actual ln(Ki / Li) and ln(Li) is zero. Obviously, we do not know what this
covariance is in reality but we can estimate the covariance between observed ln(Ki / Li) and
ln(Li). In the actual data that we have examined in section 4, this covariance is almost zero26.
Since Cov(z1, z2) = Cov(x1, x2) + Cov(u1, u2), and unless there is a reason to believe that the
covariance between the measurement errors of ln(Ki / Li) and ln(Li) is far from zero, it seems
reasonable to assume that Cov(x1, x2) is close to zero as well. The implications of setting these
covariances to zero are as follows:
1. σKL = σL².27
2. πKL = πL, where again πKL is the "covariance reliability ratio" Cov[ln(Ki), ln(Li)] / Cov[ln(K̃i), ln(L̃i)].28
This simplification is very useful since it decreases the number of unknown parameters from three to two
when we do the simulations.
3. Var(z1) = Var[ln(K̃i/L̃i)] = σK² − σL², Var(z2) = σL².
4. Var(x1) = Var[ln(Ki/Li)] = πKσK² − πLσL², Var(x2) = πLσL².
5. Var(u1) = Var[ln(UKi/ULi)] = (1 − πK)σK² − (1 − πL)σL², Var(u2) = (1 − πL)σL².
6. The reliability ratio of the variable ln(Ki/Li), expressed in terms of the reliability ratios of
capital and labor, is Var[ln(Ki/Li)] / Var[ln(K̃i/L̃i)] = (πKσK² − πLσL²)/(σK² − σL²) by (4) and (5).
Note that if the reliability ratios of capital and labor are equal (to, say, π̄), then the reliability
ratio of ln(Ki/Li) is π̄ itself.

26 The actual correlation between ln(K̃i/L̃i) and ln(L̃i) in the empirical data of section 4 was -0.03.
27 Since Cov(z1, z2) = Cov[ln(K̃i/L̃i), ln(L̃i)] = Cov[ln(K̃i), ln(L̃i)] − Var[ln(L̃i)] = σKL − σL² = 0, this follows.
28 Since Cov(x1, x2) = Cov[ln(Ki/Li), ln(Li)] = Cov[ln(Ki), ln(Li)] − Var[ln(Li)] = πKL·Cov[ln(K̃i), ln(L̃i)] − πL·Var[ln(L̃i)] = 0
and Cov[ln(K̃i), ln(L̃i)] = Var[ln(L̃i)] from (1), this follows.
By noting that the reliability ratio of ln(Ki/Li) must itself be between zero and one, we can
find feasible bounds on the reliability ratios for capital and labor. These bounds are29:

(σL²/σK²)πL ≤ πK ≤ (σL²/σK²)πL + (σK² − σL²)/σK²   (31)

This expression evaluates to 0.845πL ≤ πK ≤ 0.845πL + 0.155 using σK² = 8.15 and σL² = 6.17 as
observed in the empirical data analyzed in section 4. These are powerful restrictions that,
together with the condition that the estimate of σε² be positive, will limit the set of possible
reliability ratios that we can consider during the simulations. The table below shows the feasible
values for πK given the values of πL; a small sketch of the bound calculation follows the table.

πL       1.00  0.98  0.96  0.94  0.92  0.90  0.88  0.86  0.84  0.82  0.80  0.78  0.76  0.74
Min. πK  0.84  0.83  0.81  0.79  0.78  0.76  0.74  0.73  0.71  0.69  0.66  0.66  0.64  0.62
Max. πK  1.00  0.97  0.97  0.95  0.93  0.92  0.90  0.88  0.86  0.85  0.83  0.81  0.80  0.78
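The two-line sketch below computes the interval in equation (31) for a given πL; this implements only the bound above, while the feasible set used in the simulations additionally requires that the implied estimate of σε² remain non-negative. The function name and arguments are illustrative.

```python
def pik_bounds(pi_L, sK2, sL2):
    """Feasible range for pi_K implied by equation (31), given pi_L and the observed
    log-variances of capital (sK2) and labor (sL2)."""
    ratio = sL2 / sK2
    return ratio * pi_L, ratio * pi_L + (sK2 - sL2) / sK2
```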
For each simulation round, we start by setting πL and πK so that they are within the
bounds defined in equation (31). In practical terms this implies setting πL = πK, as the bounds
expression does allow πL to equal πK 30. Thus, what matters during the simulations is whether
πL and πK are large (close to one) or small.

29 The bounds are calculated by setting the expression (πKσK² − πLσL²)/(σK² − σL²) equal to 0 and 1, respectively.
3.3 Simulation results: parameter estimates
Each simulation round consisted of 500 observations to estimate the parameters and this
was repeated 100 times. Table 1 presents the averages and the standard deviations of the
estimated parameters for the MLE method and the EIV estimator under different levels of
reliability ratios. In the table estimates of σξ2 and σε2 are derived from the estimates of σ2 and λ
using expressions defined in equation (3). The most striking result of the simulation study is the
severe downward bias in the MLE estimates of γ1 and γ2 as the common reliability ratio falls.
Since γ1 = βK and γ2 = βL + βK − 1, this implies that we underestimate the elasticity of capital
while we overestimate that of labor when there are measurement errors. For example, in the
simulated data the elasticity of capital was 0.6 while that of labor was 0.5. With 80%
reliability in the data, the capital elasticity is underestimated by 20%, and for a 70% reliability
ratio the estimates are completely reversed: 0.4175 for capital and 0.6499 for labor. Thus, the
biases are quite severe and clearly show the need for a procedure that consistently estimates the
elasticity parameters even under a reasonable degree of measurement error.
Table 1 results also imply that MLE tends to underestimate the returns to scale parameter
γ2. Therefore, if one wants to test for increasing returns, the MLE does a poor job whereas the
EIV estimator will pick out true increasing returns even for a 70% reliability ratio. Table 1 also
shows that the MLE based λ estimate is biased downward and σ2 is biased upwards. The
30 If we assume that the covariance between x1 and x2 is less than 0.05 in absolute terms, then -0.00727 + 1.0058πL ≤
πK ≤ 0.00727 + 1.0058πL is the equation that defines the bounds. Then if πL = 1, πKL must be between 0.9985 and
1.013 and if πL = 0.9, πKL must be between 0.8979 and 0.9124. Thus, setting πKL = πL seems to be the most
reasonable choice.
combined effect of these two on the estimate of the variance of technical efficiency (σξ²) is that
it seems to be estimated consistently, whereas the variance of the residual (σε²) is biased upwards.
This is not surprising: the measurement errors, being normally distributed, are captured in
the σε² term, biasing it upwards and leaving the σξ² estimate unaffected. Thus, there is
an upward bias in the estimated σ² and a downward bias in the estimated λ under MLE. From table 1
it is clear that both σ² and λ are consistently estimated by the EIV estimator. To sum up, even with
extreme measurement errors, the EIV estimator succeeds in estimating the elasticity parameters,
the returns to scale and the relevant variance parameters consistently.
3.4 Simulation results: technical efficiency
Table 2 presents the mean technical efficiency for different reliability ratios; the true
value, computed from equation (6) with σξ² = 0.8, is 0.5536 and is shown in column 2. Columns 3 and 4
respectively contain estimates of the expected value of the technical efficiency based on the MLE and
EIV estimates from table 1. Here the MLE estimator does as well as the EIV estimator when it
comes to estimating the mean value of the technical efficiency. This happens because both MLE
and EIV produce estimates of σξ² that are identical to the third decimal, and that is the only
parameter that determines the average technical efficiency. Hence, if one is only interested in the
mean technical efficiency of the sample, one may just as well use the traditional MLE estimator,
even if the data suffer from measurement errors, as long as these are normally distributed.
Next, we analyze the technical efficiency of firm i once we have estimated the residual
for that firm. For each simulation round we compare the true technical efficiencies to the
estimated technical efficiencies calculated using the MLE and EIV techniques. This comparison
was done by calculating the average absolute deviation between the true technical efficiency and
the MLE and EIV estimates of it. The means and standard deviations of the absolute deviations
from the 100 simulation rounds are presented in table 3. It is clear from the table that EIV is
much more successful in estimating the firm-specific technical efficiency. The average absolute
deviation between the true value and the MLE estimate rises as the severity of the measurement
errors increases, unlike for the EIV estimator where the absolute deviation stays about the same.
In the last column of table 3 we report another test that demonstrates the superiority of EIV over MLE
based firm-specific efficiency estimates. The performance criterion in this test is the percentage
of the 500 observations for which the EIV technique results in an estimate closer to the true
value than the MLE. On this criterion the EIV based technical efficiency estimates outperform
MLE very convincingly.
To summarize, the MLE estimator is seriously biased when it comes to estimating the
elasticities of labor and capital under measurement errors. MLE is also a poor choice if one wants
to estimate the technical efficiency of a particular firm. However, both the MLE and EIV estimators
estimate the mean technical efficiency level very well. Thus, in the presence of measurement
errors in the input data, the EIV estimator developed in section 2 is the preferred choice.
4. Empirical Example
4.1 Data
In this section we examine the impact of measurement errors on SFPF estimates of a
production structure in actual data. We draw a cross-section of firms from the COMPUSTAT
industrial data files maintained by Standard and Poor. These files consist of all the publicly traded
firms on the U.S. stock exchanges for the period 1970-1989. The files provide information on
balance sheet components, cash flow and income statements and other relevant financial
information. The frequency of reporting is annual. We chose the year 1988 for our analysis as it
provided the largest number of firms with the relevant information31. Labor use by a firm is measured
by the number of employees, Li. Standard practice is to define labor in terms of hours
worked, but this information is not available in COMPUSTAT. Since we do not know the proportion of
skilled versus unskilled workers or their quality level, this introduces measurement
error into our labor use variable. To calculate the output of a firm, or value added, Yi, the cost of
goods was subtracted from the sales figure32. To complete the value added calculation, total
inventories were added to the above measure. The measure of capital Ki is the book value of the total
assets of a firm33. Thus, we have full information to estimate the production structure, and the
accompanying level of technical efficiency, for 484 firms.
4.2 Production structure
The model that we estimate is identical to the one considered in the simulation study (see
(30)). Tables 4a, 4b and 4c provide the necessary summary statistics for the data variables. In
particular note that the covariance between z1 and z2 is almost zero. As before, πK = Var(ln(Ki)) /
Var(ln(K̃i)) and πL = Var(ln(Li)) / Var(ln(L̃i)) are the reliability ratios of capital and labor
respectively, while πKL is the "covariance reliability" ratio equal to Cov(ln(Ki), ln(Li)) /
Cov(ln(K̃i), ln(L̃i)). As explained in the simulation section, it is reasonable to set πKL = πL if
31 We could have chosen the year 1989 which is the terminal year of the database. Because of non-reporting of
relevant information by quite a number of firms, the highest number of firms with usable information was present in
1988. Another reason for choosing 1988 was the fact that this year was characterized by a stable economic
environment, especially the inflation situation and financial market volatility.
32 Because the reporting procedure for the cost of goods component contains labor expenses, a component of the value
added by a firm, the labor expense component was added to the above calculation. Since not every firm reports this item
as an expense separate from cost of goods, this correction dropped the number of firms that could ultimately be used in
the analysis.
33 Using total assets as a proxy for productive, physical capital requires qualifications. First, this measure of assets
includes the current investment component of a firm. Second, this measure includes cash and other short term liquid
investments which may not be appropriate measures of physical capital. A justification for using this measure is the
theoretical models and empirical evidence that extend the notion of production structure by incorporating the effects
of liquidity and borrowing constraints [for e.g. see Gertler and Hubbard (1988), Dhawan (1997) etc.].
cov(z1, z2) is close to zero, which is the case here. Also, we will only consider cases where πK =
πL = π̄ 34. Based on the summary statistics in table 4c, we can derive consistent estimates of the
expected value and the variance of z. Given a particular reliability ratio π̄, we can then find
consistent estimates of the expected value and variance of x as well as of the variance of u;
these are:

µ̂x = (4.93, 1.15)'

Σ̂x = π̄ · | 1.31   -0.03 |        Σ̂u = (1 − π̄) · | 1.31   -0.03 |
          | -0.03   6.90 |                        | -0.03   6.90 |

Thus, given this information and the discussion regarding the reliability ratio bounds in section 3.2,
the lowest possible value for the reliability ratio is 0.86. Any value lower than that is not feasible
given the data characteristics.
4.3 Parameter estimates
Table 5 presents the estimates of the parameters in equations (30a) and (30b) using three
techniques: OLS, the traditional MLE and the EIV estimator developed in this paper. The first
row presents the estimates when the simple OLS technique is used, which can be characterized as
estimating an “average” production function. As is well known, with no measurement errors,
OLS will provide us with consistent but inefficient estimates of γ, an inconsistent estimate of α
and no estimates for σε2 and σξ2. With measurement errors even the OLS estimate for the
parameter γ is inconsistent. In the second row the MLE based estimates are presented. Rows 3
to 9 display the estimates using the EIV technique, based on the likelihood function in equation
(23). Each row provides a set of estimates for a particular common reliability ratio. These
results should be interpreted as follows: if the reliability ratio of labor and capital is, say, 0.94,
then the consistently estimated coefficients are found in that row. Based on these estimates for
α, β, σ² and λ, we can then derive estimates of the elasticities of capital and labor
(βK and βL) as well as the variances of ε and ξ (σε² and σξ²), presented in table 6.

34 Given a value of πL, πK may deviate according to table (A1) in the appendix, but we found that the estimated
coefficients were not affected by setting it apart from πL.
A number of interesting, though given our simulation experience not surprising, results are
apparent from tables 5 and 6. First, MLE underestimates the elasticity of capital. According to
MLE, the elasticity of capital is 0.6261, while it is as much as 0.7280 using the EIV technique
when the reliability ratio is 0.86. We also find that MLE estimates returns to scale very well, which
then implies that it is overestimating the elasticity of labor. Second, as the reliability ratio
decreases, the estimated λ increases while the estimated σε² goes to zero. This happens because,
as the reliability ratio decreases, the variance of u'β increases. Since it is the same data set, this
comes at the expense of a decline in the variance of ε (σε²), and as σε² goes to zero, λ, which
is equal to σξ/σε, increases35. Third, we find that MLE estimates σξ², the variance of ξi, very
well. This has important implications for the estimates of the technical efficiencies, as discussed
in the next sub-section. MLE also overestimates the variance of εi, which is natural, since it
assumes no measurement errors.
4.4 Estimates of Technical Efficiencies
We begin by considering the mean or average technical efficiency under varying
degrees of measurement error, presented in table 7. It is interesting to note that the average level
of firm efficiency is almost independent of the assumption on measurement errors. The EIV
estimates are also close to the MLE estimate of the average technical efficiency. This happens
35 As a matter of fact, π̄ = 0.86 is a lower bound for the reliability ratios. There simply is not enough variation in
the data to support more measurement error than this. With π̄ = 0.86, the only disturbances to the model, apart from
the technical inefficiencies, are measurement errors, as ε vanishes in this case.
because the only parameter that determines the distribution of the technical efficiencies, σξ², is
almost identical for the MLE and the EIV technique regardless of the degree of measurement
errors. At first, this may suggest that measurement errors are not an issue when it comes to
technical efficiencies. However, as we know from the results of the simulation section, the EIV
estimator clearly outperforms the MLE estimator for firm-specific efficiency.
Given that we have 486 firms, it is not possible to present the estimates of technical
efficiency for each firm. To get an idea of the bias caused by measurement problems, we present
technical efficiency estimates for the first ten firms in table 8. From this table, we note that one
cannot predict the direction of the bias, as the changes seem to be random. To explore this further,
and to get an idea of how severe the problem could be, we ranked all the firms in the sample
by their MLE based technical efficiency estimates. Then, as the reliability ratio was decreased, it
was found that the relative ranking of the firms changed. For the reliability ratio 0.98 the
maximum rank change was 23 on the upper side and 19 on the lower side. In addition, 50
percent of the changes in ranks were between ±2. For the lowest feasible reliability ratio of 0.86,
50% of the rank changes were within ±15. For this particular reliability ratio the maximum rank
change was 132 on the upper side and 131 on the lower side! In percentage terms the maximal
change in firm level efficiency was 22% on the up side and 14% on the down side. This is an
important outcome, since the technical efficiency estimate tells us what percentage of "frontier
output" the firm is delivering. This precludes a researcher using the MLE method under
measurement errors from establishing a comparative efficiency ranking of the firms in the
sample, as is evident from the EIV estimates36.
36 In fact, we tested whether the changes in rankings were predictable (non-random) by running an AR(1)
regression on a given firm's efficiency estimates across the different reliability ratio assumptions. We observed that 90%
of the autoregressive coefficients were above 0.95, with at least 50% of them at or above 1, making the rank
changes very much a random outcome. A proper unit root test on these coefficients, although desirable, could
not be conducted, as only 8 observations exist for each firm, which is not enough to test for the presence of a unit root.
5. Summary and Conclusions
This paper investigates the impact that measurement errors in inputs have on estimates of
production function parameters and on firm-specific technical efficiency estimates in a cross-
sectional SFPF setting. We first develop the methodology for estimating the standard cross-
sectional SFPF with measurement errors by using Fuller's reliability ratio concept. Next, our
numerical simulation results show that the estimates of the deterministic frontier (the elasticity
parameters), the distribution of the stochastic part of the frontier and the distribution of the technical
inefficiency are very sensitive to the degree of measurement error. Our simulation results
indicate that MLE biases the elasticity coefficient estimates, and consequently the returns to
scale feature. These biases are quite severe and clearly demonstrate the need for a method that
consistently estimates the production function parameters even for small degrees of measurement
error. The simulation exercise also shows that while MLE overestimates the variance of the
composite error term, it underestimates the skewness parameter, with the result that the variance
of the technical efficiency parameter is consistently estimated. Although the mean level of
technical efficiency, or average sample efficiency, is unaffected by the presence of measurement
errors, the firm-specific estimate of technical efficiency will be seriously biased, as it depends
upon the estimated skewness parameter. Additionally, we develop theoretical bounds on
the possible values of the reliability ratios given the data summary statistics. These
bounds are extremely useful for a researcher in a practical setting when analyzing the
sensitivity of parameter estimates to varying degrees of belief regarding measurement errors.
Next, the practical applicability of the reliability ratio estimator developed in this paper was
demonstrated by applying it to actual firm-level data from the U.S. industrial sector. For this
data set, issues regarding the returns to scale feature, the elasticity coefficients and firm-specific
technical efficiency were explored in detail. We demonstrated how the relative ranking of
the firms by their technical efficiency estimates changed when the degree of measurement
error was increased. Most importantly, this change in ranking appeared to be random and not
related to the change in the degree of measurement error. In addition, the percentage change in the
firm-specific technical efficiency levels from their MLE estimates was also quite severe when the
degree of measurement error increased. This exercise has serious implications for economic
researchers engaged in inter-firm or inter-industry comparisons, as ignoring
measurement errors and relying solely on simple MLE estimates will most likely lead to
erroneous efficiency comparisons.
The analysis in this paper has been undertaken for a cross-sectional SFPF model with a
Cobb-Douglas production structure that is in many respects very simplistic. Consequently,
practical issues such as analyzing technical change over time and the evolution of a firm's
efficiency levels, which require a more general production structure (e.g., translog) in a panel
setting, are a subject for future research.
References
Aigner, D.J., C.A.K. Lovell, and P. Schmidt, 1977, Formulation and estimation of stochastic
frontier production function models, Journal of Econometrics 6, 21-37.
Battese, G.E. and T.J. Coelli, 1988, Prediction of firm-level technical efficiencies with a
generalized frontier production function and panel data, Journal of Econometrics 38, 387-399.
Battese, G.E. and T.J. Coelli, 1992, Frontier production functions, technical efficiency and panel
data: with application to paddy farmers in India, Journal of Productivity Analysis 3, 153-169.
Bauer, P.W., 1990, Recent developments in the econometric estimation of frontiers, Journal of
Econometrics 46, 39-56.
Bauer, P.W., 1990, Decomposing TFP growth in the presence of cost inefficiency, non-constant
returns to scale, and technological progress, Journal of Productivity Analysis 1, 287-299.
Beckers, D. and C. Hammond , 1987, A tractable likelihood function for the normal-gamma
stochastic frontier model, Economics Letters 24, 33-38.
Dhawan, R., 1997, Asymmetric information and debt financing: The empirical importance of size
and balance sheet factors, International Journal of the Economics of Business 4, 189-202.
Dhawan, R. and G. Gerdes, 1997, Estimating technological change using a stochastic frontier
production function framework: evidence from U.S. firm-level data, Journal of Productivity
Analysis 8, 431-446.
Edgerton, D. and P. Jochumzen, 1999, Estimation in binary choice models with measurement
errors, mimeo, Lund University.
Fuller, W.A., 1987, Measurement Error Models, Wiley, New York.
Gertler, M.L. and R.G. Hubbard, 1988, Financial factors and business fluctuations, in Financial
Market Volatility, Federal Reserve Bank of Kansas City.
Greene, W.H., 1993, The econometric approach to efficiency analysis, in H. Fried, C.A.K. Lovell
and S. Schmidt (eds), The Measurement of Productive Efficiency: Techniques and Applications,
Oxford University Press, Oxford.
Greene, W.H., 1997a, Frontier production functions, in M. Hashem Pesaran and P. Schmidt (eds),
Handbook of Applied Econometrics, Vol. II: Microeconomics, Blackwell Publishers, Massachusetts.
Greene, W.H., 1997b, Econometric Analysis, Prentice Hall, New Jersey.
Jondrow, J., C.A.K. Lovell, I.S. Materov, and P. Schmidt, 1982, On the estimation of technical
inefficiency in the stochastic frontier production function model, Journal of Econometrics 19,
233-238.
Klepper, S. and E. E. Leamer, 1984, Consistent sets of estimates for regressions with errors in
all variables, Econometrica 52, 163-83.
Kumbhakar, S.C., 1987, Production frontiers and panel data: An application to U.S. class 1
railroads, Journal of Business and Economic Statistics 5, 249-255.
Kumbhakar, S.C., 1988, On the estimation of technical and allocative inefficiency using stochastic
frontier functions: The case of U.S. class 1 railroads, International Economic Review 29, 727-743.
Kumbhakar, S.C. and L. Hjalmarsson, 1995, Labor-use efficiency in Swedish social insurance
offices, Journal of Applied Econometrics 10, 33-47.
Meeusen, W. and J. van den Broeck, 1977, Efficiency estimation from Cobb-Douglas production
functions with composed error, International Economic Review 18, 435-444.
Murphy, K. and R. Topel, 1985, Estimation and inference in two-step econometric models, Journal
of Business and Economic Statistics 3, 370-379.
Pal, M., Neogi, C. and B. Ghosh , 1998, Estimation of frontier production function with errors-in-
variables: An illustration from Indian industry, in Sr. Chakravarty, D. Coondoo and R. Mukherjee
(eds.) Quantitative Economics: Theory and Practice, Allied Publishers Limited, New Delhi, India.
Stevenson, R., (1980), Likelihood functions for generalized stochastic frontier estimation, Journal of
Econometrics 13, 58-66.
Weinstein, M., 1964, The sum of values from a normal and a truncated normal distribution,
Technometrics 6, 104-105.
Table 1. Traditional Maximum Likelihood Estimates of SFPF for Simulated Data

               α             γ1             γ2              σ²             λ             σξ²            σε²
True Value*    1.700         0.6000         0.1000          1.000          2.000         0.8000         0.2000

Traditional Maximum Likelihood Estimates
π̄ = 1.00      1.711 (0.07)  0.5998 (0.02)  0.1030 (0.02)   1.013 (0.13)   2.132 (0.39)  0.8242 (0.15)  0.1883 (0.04)
π̄ = 0.90      1.691 (0.08)  0.5376 (0.02)  0.09289 (0.02)  1.055 (0.14)   1.757 (0.36)  0.7903 (0.18)  0.2644 (0.05)
π̄ = 0.80      1.685 (0.09)  0.4778 (0.02)  0.08143 (0.03)  1.100 (0.14)   1.606 (0.35)  0.7848 (0.19)  0.3154 (0.06)
π̄ = 0.70      1.676 (0.12)  0.4175 (0.02)  0.06741 (0.03)  1.121 (0.16)   1.494 (0.39)  0.7641 (0.22)  0.3568 (0.07)

EIV Method Maximum Likelihood Estimates
π̄ = 1.00      1.711 (0.07)  0.5998 (0.02)  0.1030 (0.02)   1.013 (0.13)   2.132 (0.39)  0.8242 (0.15)  0.1883 (0.04)
π̄ = 0.90      1.691 (0.08)  0.5973 (0.03)  0.1032 (0.02)   0.9890 (0.14)  2.058 (0.49)  0.7903 (0.18)  0.1987 (0.05)
π̄ = 0.80      1.685 (0.09)  0.5972 (0.03)  0.1018 (0.03)   0.9833 (0.14)  2.095 (0.62)  0.7848 (0.19)  0.1985 (0.06)
π̄ = 0.70      1.676 (0.12)  0.5964 (0.03)  0.0963 (0.04)   0.9681 (0.17)  2.116 (0.85)  0.7641 (0.22)  0.2040 (0.08)

* The data was simulated from the model yi = α + xi'γ + εi − ξi with zi = xi + ui, xi ∼ N(0, 2π̄) and ui ∼ N(0, 2(1 − π̄)),
where π̄ is the common reliability ratio of log labor and log capital (and thus of log capital per unit of labor). π̄ is varied
in the table, ε ∼ N(0, 0.2) and ξ ∼ N+(0, 0.8). The standard errors are reported in parentheses.
Table 2. Mean Technical Efficiency and Reliability Ratio

Reliability ratio   Actual Value   MLE Estimate   EIV Estimate
1.00                0.5536         0.5496         0.5496
0.90                0.5536         0.5553         0.5553
0.80                0.5536         0.5562         0.5562
0.70                0.5536         0.5598         0.5598
Table 3. Comparing Firm-Specific Technical Efficiency Estimates

Reliability ratio   Average absolute deviation    Average absolute deviation    Percentage won by EIV
                    between EIV and true value    between MLE and true value
1.00*               N/A                           N/A                           N/A
0.90                0.0542 (1.8×10⁻³)             0.0755 (1.9×10⁻³)             94.47% (1.0%)
0.80                0.0463 (1.6×10⁻³)             0.125 (3.1×10⁻³)              99.27% (0.4%)
0.70                0.0406 (1.4×10⁻³)             0.161 (2.5×10⁻²)              99.72% (0.2%)

* For a reliability ratio of 1, MLE and EIV will produce exactly the same estimates and the formulas for the expected
value of the conditional technical efficiencies will coincide. N/A means not applicable here. The standard errors
are reported in parentheses.
Table 4a. Transformed and Non-Transformed Data Variable Means

        ln(Y)   ln(K)   ln(L)   y = ln(Y/L)   z1 = ln(K/L)   z2 = ln(L)
Mean:   5.55    6.08    1.15    4.40          4.93           1.15

Table 4b. Untransformed Data Variance and Covariance Matrix

         ln(Y)   ln(K)   ln(L)
ln(Y)    8.23    7.17    6.86
ln(K)    7.17    8.14    6.86
ln(L)    6.86    6.86    6.90

Table 4c. Transformed Data Variance and Covariance Matrix

        y = ln(Y/L)   z1 = ln(K/L)   z2 = ln(L)
y       0.78          0.76           0.26
z1      0.76          1.31           -0.03
z2      0.26          -0.03          6.90
Table 5. SFPF Parameter Estimates: OLS, MLE and EIV*

                α              γ1             γ2              σ²             λ
OLS             1.5036 (0.33)  0.5781 (0.07)  0.04132 (0.03)  N/A            N/A
MLE             1.9195 (0.10)  0.6261 (0.02)  0.00071 (0.01)  0.7146 (0.07)  2.7072 (0.37)
EIV π̄ = 0.98   1.8565 (0.10)  0.6389 (0.02)  0.00073 (0.01)  0.7041 (0.07)  2.8904 (0.46)
EIV π̄ = 0.96   1.7910 (0.10)  0.6522 (0.02)  0.00074 (0.01)  0.6931 (0.07)  3.1270 (0.54)
EIV π̄ = 0.94   1.7226 (0.11)  0.6661 (0.02)  0.00076 (0.01)  0.6817 (0.06)  3.4486 (0.68)
EIV π̄ = 0.92   1.6512 (0.11)  0.6806 (0.02)  0.00078 (0.01)  0.6698 (0.06)  3.9188 (0.99)
EIV π̄ = 0.90   1.5767 (0.11)  0.6957 (0.02)  0.00079 (0.01)  0.6573 (0.06)  4.6979 (1.66)
EIV π̄ = 0.88   1.499 (0.12)   0.7115 (0.03)  0.00081 (0.01)  0.6443 (0.07)  6.376 (4.19)
EIV π̄ = 0.86   1.417 (0.14)   0.7280 (0.03)  0.00083 (0.01)  0.6307 (0.07)  18.51 (169)

* The standard errors are in parentheses and N/A means not applicable.
Table 6. Basic Production Structure Estimates

                βK       βL       σξ²      σε²
OLS             0.5781   0.4351   N/A      N/A
MLE             0.6261   0.3746   0.6288   0.0858
EIV π̄ = 0.98   0.6389   0.3618   0.6288   0.0753
EIV π̄ = 0.96   0.6522   0.3485   0.6288   0.0643
EIV π̄ = 0.94   0.6661   0.3347   0.6288   0.0529
EIV π̄ = 0.92   0.6806   0.3203   0.6289   0.0409
EIV π̄ = 0.90   0.6957   0.3051   0.6288   0.0285
EIV π̄ = 0.88   0.7115   0.2893   0.6288   0.0155
EIV π̄ = 0.86   0.7280   0.2728   0.6289   0.00184
Table 7. Average Technical Efficiency for the Sample

        MLE (π̄ = 1)   π̄ = 0.98   π̄ = 0.96   π̄ = 0.94   π̄ = 0.92   π̄ = 0.90   π̄ = 0.88   π̄ = 0.86
Mean    0.6008         0.6008      0.6008      0.6008      0.6008      0.6012      0.6068      0.6180
Table 8. Predicted Firm Efficiency of the First 10 Firms

          MLE (π̄ = 1)   π̄ = 0.98   π̄ = 0.96   π̄ = 0.94   π̄ = 0.92   π̄ = 0.90   π̄ = 0.88   π̄ = 0.86
Firm 1    0.753          0.759       0.765       0.771       0.777       0.783       0.788       0.793
Firm 2    0.487          0.487       0.487       0.487       0.488       0.489       0.489       0.492
Firm 3    0.885          0.886       0.888       0.890       0.891       0.891       0.891       0.891
Firm 4    0.933          0.933       0.932       0.931       0.930       0.928       0.926       0.923
Firm 5    0.810          0.814       0.818       0.822       0.826       0.829       0.832       0.834
Firm 6    0.827          0.829       0.832       0.834       0.836       0.838       0.839       0.840
Firm 7    0.281          0.275       0.271       0.266       0.262       0.258       0.250       0.250
Firm 8    0.615          0.623       0.630       0.638       0.647       0.655       0.664       0.673
Firm 9    0.714          0.712       0.710       0.707       0.704       0.701       0.697       0.694
Firm 10   0.596          0.595       0.594       0.593       0.592       0.592       0.591       0.591