IMS Collections
Borrowing Strength: Theory Powering Applications –
A Festschrift for Lawrence D. Brown
Vol. 6 (2010) 272–282
© Institute of Mathematical Statistics, 2010
DOI: 10.1214/10-IMSCOLL618
Empirical Bayes in-season prediction of
baseball batting averages
Wenhua Jiang1 and Cun-Hui Zhang2,*
National Heart, Lung, and Blood Institute and Rutgers University
Abstract: The performance of a number of empirical Bayes methods is
examined for the in-season prediction of batting averages with the 2005 Major
League baseball data. Among the methodologies considered are new general
empirical Bayes estimators in homoscedastic and heteroscedastic partial linear
models.
1. Introduction
This paper is motivated by our desire to extend advances in empirical Bayes to more
general settings, including homoscedastic and heteroscedastic partial linear models,
and by Brown’s [3] field test of empirical Bayes with the 2005 Major League baseball
data.
The main thrust of empirical Bayes is the possibility of a substantial reduction of
the compound risk in a collection of statistical decision problems by making the
decision in each problem with observations from all of the problems to be combined
[14, 15, 17, 10, 1, 2, 18, 19, 7, 8, 21]. In the classic compound decision theory, the significance
of this phenomenon is proven for similar decision problems involving independent
observations and different unknown parameters. Baseball is a favorite example for
empirical Bayes since it raises problems involving many players in well understood
statistical models for counting data.
Before the benefits and importance of empirical Bayes became well understood,
there were heated debates about its usefulness, especially about what collection of
decision problems should be combined in actual statistical practice. In the midst
of such a debate, Efron and Morris [8] used predicting the baseball batting average
(ratio of base hits to the number of times at bat for an individual player) as a
primary example of a proper collection of decision problems to combine. In that
example, the James-Stein estimator is applied to the arcsin-root transformation
of the batting averages of 14 players for the first 45 at bats in the 1970 season
to predict their batting averages in the rest of the season. In fact, the 14 players
were selected as Roberto Clemente and all those having the same number of 45
at bats as him according to a certain 1970 issue of the New York Times. The
selection of Roberto Clemente, one of the most successful hitters in the preceding
∗Supported in part by NSF Grants DMS 0804626, DMS 0906420 and NSA Grant H98230-09-
1-0006.
1Office of Biostatistics, DECA, National Heart, Lung, and Blood Institute, 2 Rockledge Center,
Bethesda, Maryland, USA, e-mail: jiangw3@nhlbi.nih.gov
2Department of Statistics and Biostatistics, Rutgers University, Hill Center, Busch Campus,
Piscataway, NJ 08854, USA, e-mail: czhang@stat.rutgers.edu
Keywords and phrases: empirical Bayes, compound decisions, partial linear model, nonpara-
metric estimation, semiparametric estimation, sports, hitting, batting average.
AMS 2000 subject classifications: Primary 62J05, 62J07; secondary 62H12, 62H25.
seasons, was designed "to embarrass the James-Stein rule" [8], since the method
does not favor heterogeneity among the unknown means. Incidentally, the selection
of these 14 players also provided homoscedasticity since the variance of the arcsin-
root transformed success rate in Bernoulli trials is approximately a quarter of the
reciprocal of the number of trials.
Brown [3] examined the performance of seven statistical methods for the in-
season prediction of batting averages with the 2005 Major League data. He used
the data from all players with at least 11 at bats in the first half season to predict
the batting averages in the second half season for all players with at least 11 at bats
in both half seasons. This involves about 500 players. Since different players have
different at bats in the first half season, he raised an interesting heteroscedastic
compound decision problem and studied extensions of empirical Bayes methods for
such data. He observed a moderate correlation in the 2005 data between at bats
and batting averages, and significantly improved the performance of empirical Bayes
predictors by applying them separately to the groups of pitchers and nonpitchers.
Brown’s [3] results motivated us to consider a heteroscedastic partial linear
model. In this model, the unknown means are partially explained by a certain
linear effect but not completely, and the errors are normal variables with zero mean
and different known variances. In the baseball application, the linear component
could include pitcher-nonpitcher group and at bats effects. We extend a number of
empirical Bayes methods to this heteroscedastic partial linear model and examine
their performance with the 2005 baseball data.
The rest of the paper is organized in four sections to cover data description
and performance measures, partial linear models, extensions of empirical Bayes
methods, and experiments with the 2005 baseball data.
2. Data description and prediction loss functions
We shall keep most of Brown’s [3] notation and consider the same set of loss func-
tions for the prediction of batting averages.
The 2005 baseball data in [3] provide the name, status as a pitcher or not, at
bats for the entire 2005 regular season and for the months April through September,
and monthly total base hits for each Major League player. Post season records are
excluded and the records of a few October games are merged into September. For
convenience, we treat April, May and June as the first half season and the rest of
the regular season as the second half [3].
Let $S$ denote the set of all players. For each $i \in S$, let $N_{ji}$ and $H_{ji}$ denote the
number of at bats and the number of hits for the first (second) half season for $j = 1$
($j = 2$). The corresponding batting averages are then

$$R_{ji} = H_{ji}/N_{ji}. \tag{1}$$

Let $S_j = \{i : N_{ji} \ge 11\}$ be the sets of all players with at least 11 at bats for each
half season. We consider the prediction of $\{R_{2i},\, i \in S_1 \cap S_2\}$ based on the data
$\{\Delta_i, N_{1i}, H_{1i},\, i \in S_1\}$, where $\Delta_i = 1$ if the $i$-th player is a pitcher and $\Delta_i = 0$
otherwise.

A reasonable model for the data assumes that the numbers of hits $H_{1i}$ and $H_{2i}$
are conditionally independent with the binomial distribution

$$H_{ji} \mid (\Delta_i, N_{ji}, p_i) \sim \mathrm{Bin}(N_{ji}, p_i), \quad j = 1, 2, \tag{2}$$
where $p_i$ is the batting probability of the $i$-th player. The standard variance
stabilizing transformation for $H \sim \mathrm{Bin}(N, p)$ is $\arcsin\sqrt{H/N}$. Brown [3] suggested a
finer version of this arcsin-root transformation,

$$X = \arcsin\sqrt{\frac{H + 1/4}{N + 1/2}} \approx N\Big(\arcsin\sqrt{p},\ \frac{1}{4N}\Big),$$

to minimize the approximation error for the mean to the order of $N^{-2}$. Under the
binomial model (2), the batting averages are transformed into

$$X_{ji} = \arcsin\sqrt{\frac{H_{ji} + 1/4}{N_{ji} + 1/2}} \approx N\big(\theta_i, \sigma_{ji}^2\big), \tag{3}$$

where $\theta_i = \arcsin\sqrt{p_i}$ and $\sigma_{ji}^2 = 1/(4N_{ji})$.
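As a concrete illustration, the following minimal sketch computes the transformation (3) and its approximate variance from arrays of hits and at bats; the function name and the use of numpy are our own choices, not part of [3].

```python
import numpy as np

def arcsin_root_transform(hits, at_bats):
    """Finer arcsin-root transform in (3): X = arcsin(sqrt((H+1/4)/(N+1/2))),
    approximately N(arcsin(sqrt(p)), 1/(4N))."""
    hits = np.asarray(hits, dtype=float)
    at_bats = np.asarray(at_bats, dtype=float)
    x = np.arcsin(np.sqrt((hits + 0.25) / (at_bats + 0.5)))
    var = 1.0 / (4.0 * at_bats)  # approximate variance sigma_ji^2 = 1/(4 N_ji)
    return x, var
```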
Let $R_1 = (R_{1i},\, i \in S_1)$ and $R_2 = (R_{2i},\, i \in S_1 \cap S_2)$ with $S_j = \{i : N_{ji} \ge 11\}$.
Define vectors $\Delta$, $X_j$, $\theta_j$ and $\sigma_j = 1/\sqrt{4N_j}$ analogously via (3). The problem
is to predict $R_2$ or $X_2$ based on $(\Delta, X_1, \sigma_1)$. Thus, a predictor is a Borel map
$\delta(\Delta, X_1, \sigma_1) \in \mathbb{R}^{S_2}$.
The data $X_2$ and $\sigma_2$ are used to validate the performance of predictors. As in
[3], we use the error measure

$$\widehat{\mathrm{TSE}}^*[\delta] = \frac{\widehat{\mathrm{TSE}}[\delta]}{\widehat{\mathrm{TSE}}[\delta^0]}
\quad\text{with}\quad
\widehat{\mathrm{TSE}}[\delta] = \sum_{i \in S_1 \cap S_2}\big\{(X_{2i} - \delta_i)^2 - \sigma_{2i}^2\big\}$$

and its weighted version

$$\widehat{\mathrm{TWSE}}^*[\delta] = \frac{\widehat{\mathrm{TWSE}}[\delta]}{\widehat{\mathrm{TWSE}}[\delta^0]}
\quad\text{with}\quad
\widehat{\mathrm{TWSE}}[\delta] = \sum_{i \in S_1 \cap S_2}\frac{(X_{2i} - \delta_i)^2 - \sigma_{2i}^2}{4\sigma_{1i}^2}.$$

These error measures compare predictors $\delta$ of $X_2$ with the naive $\delta^0 = (X_{1i},\, i \in S_1 \cap S_2)$.
They can be viewed as approximations of the (weighted) MSE for the
estimation of $\theta$, since

$$E\,\widehat{\mathrm{TSE}}[\delta] = E\sum_{i \in S_1 \cap S_2}(\delta_i - \theta_i)^2,
\qquad
E\,\widehat{\mathrm{TWSE}}[\delta] = E\sum_{i \in S_1 \cap S_2}\frac{(\delta_i - \theta_i)^2}{4\sigma_{1i}^2},$$
in the normal model. For the prediction of $R_2$ we use the error measure

$$\widehat{\mathrm{TSE}}_R^*[\widehat R] = \frac{\widehat{\mathrm{TSE}}_R[\widehat R]}{\widehat{\mathrm{TSE}}_R[\widehat R^0]},$$

where $\widehat R = (\widehat R_i,\, i \in S_1 \cap S_2)$ with $\widehat R_i = \sin^2(\delta_i)$ for any predictor $\delta$ of $X_2$,
$\widehat{\mathrm{TSE}}_R[\widehat R] = \sum_{i \in S_1 \cap S_2}\big\{(\widehat R_i - R_{2i})^2 - R_{2i}(1 - R_{2i})/N_{2i}\big\}$ and $\widehat R^0 = \sin^2(\delta^0)$.
3. A partial linear model
Let $y = (y_1, \ldots, y_n)'$ be a response vector and $Z = (z_1, \ldots, z_n)' = (z_{ij})_{n \times p}$ be a
covariate matrix. In a partial linear model, the mean of $y$ is partially explained by
a linear function of $Z$ but not completely. This can be written as

$$y = Z\beta + \xi + \varepsilon, \tag{4}$$
where $\beta = (\beta_1, \ldots, \beta_p)'$ is a vector of unknown deterministic regression coefficients,
$\xi = (\xi_1, \ldots, \xi_n)'$ is an unknown vector not dependent on known covariates, and
$\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is a vector of independent random errors with $E\varepsilon_i = 0$ and
$\mathrm{Var}(\varepsilon_i) = \sigma_i^2$. The partial linear model is homoscedastic if the $\sigma_i$ are all equal and
heteroscedastic otherwise. In the homoscedastic model, the common variance $\sigma^2$ can
be estimated with the data in (4) if the $\xi_i$ are known to be sparse, e.g. by the $\widehat\sigma$ in (11).
In the heteroscedastic model, the estimation of $\xi$ requires known $\sigma = (\sigma_1, \ldots, \sigma_n)'$
or the availability of consistent or $\chi^2$-type estimates of $\sigma_i^2$.
The partial linear model (4) is different from the partial (semi-, semiparametric)
linear regression model in which $\xi_i$ is assumed to be a smooth (possibly nonlinear)
function of some additional covariates.

From a likelihood point of view, the linear effect $Z\beta$ in (4) may not help, since a
sum of the linear effect and a general unknown effect is still a general unknown effect.
However, after removing the linear effect, the benefits of the compound estimation
of $\xi$ could be far greater than those of the direct compound estimation of the mean of
$y$. This is also clear from an empirical Bayes point of view, since the sum of $Z\beta$ and
a vector $\xi$ with i.i.d. components is typically not a vector with i.i.d. components.
The partial linear model is closely related to the parametric random effects model

$$y_{ik} = \beta' z_{ik} + \xi_i + \varepsilon_{ik}, \quad k = 1, \ldots, K_i, \tag{5}$$

where the $\xi_i$ are assumed to be i.i.d. variables from a distribution depending on an
unknown parameter $\tau$ (e.g. $N(0, \tau^2)$). Existing work on random effects models typically
focuses on the estimation of the parameters $\beta$ and $\tau$ in the case where the
identifiability of $\tau$ is ensured by the multiplicity $K_i > 1$. Parametric statistical
inference for $\{\beta, \tau\}$ and the noise level in (5) is well understood.
For the baseball data, (4) provides an interpretation of the data in (3) with

$$X_1 = y, \quad \theta = Z\beta + \xi, \quad \frac{1}{4N_1} = \sigma^2, \quad |S_1| = n. \tag{6}$$

We use the least squares estimator (LSE) $\widehat\beta = \big(\sum_{i=1}^n z_i z_i'\big)^{-1}\sum_{i=1}^n z_i y_i$
(heteroscedasticity ignored) and the weighted least squares estimator (WLSE)
$\widehat\beta = \big(\sum_{i=1}^n z_i z_i'/\sigma_i^2\big)^{-1}\sum_{i=1}^n z_i y_i/\sigma_i^2$ to carry out a preliminary examination
of possible choices of $Z$ for the linear component. Brown's [3] results suggest a strong
group effect with $\Delta$ (Pitcher) and a moderate at bats $N_1$ (AB) effect. This is
confirmed in Table 1. It turns out that the regression analysis exhibits no Pitcher-AB
interaction. The scatterplots in Figure 1 demonstrate a moderate AB effect for
nonpitchers and a very small AB effect for pitchers. Since most data points are
within $\pm 1.96\sigma_i$ of the regression line, the model for the noise level $\sigma_i^2 = 1/(4N_{1i})$
explains a vast majority of the residuals in the regression analysis, suggesting a
sparse $\xi$.
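For reference, a minimal sketch of the two preliminary fits (with numpy, and with known variances $\sigma_i^2 = 1/(4N_{1i})$ passed in; the function names are ours):

```python
import numpy as np

def lse(Z, y):
    """Ordinary least squares coefficients (heteroscedasticity ignored)."""
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

def wlse(Z, y, sigma2):
    """Weighted least squares with known variances sigma2_i = 1/(4 N_1i)."""
    W = 1.0 / sigma2
    return np.linalg.solve(Z.T @ (Z * W[:, None]), Z.T @ (y * W))
```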
Table 1
Squared multiple correlation coefficients, $R^2$

Models LSE WLSE
Pitcher 0.256 0.190
AB 0.247 0.208
Pitcher+AB 0.342 0.292
Pitcher*AB 0.342 0.293
Fig 1. Scatterplots of the transformed batting average ($X_1$) vs at bats ($N_1$) with weighted least
squares regression line $\pm 1.96\sqrt{1/(4N_1)}$; $R^2 = 0.151$ for nonpitchers and $R^2 = 0.009$ for pitchers.
4. Empirical Bayes methods in partial linear models
We consider in separate subsections the general (nonparametric) empirical Bayes,
parametric (linear) empirical Bayes, and the James-Stein methods.
4.1. General empirical Bayes
Suppose that for a deterministic or given unknown vector $\theta = (\theta_1, \ldots, \theta_n)'$, the
observation $y = (y_1, \ldots, y_n)'$ is distributed as independent $y_i \sim f(y|\theta_i)\,\nu(dy)$ with
a known $f(\cdot|\cdot)$. Given a loss function $L(a, \theta)$ for the statistical inference about $\theta_i$,
the compound risk for a statistical procedure $\widehat\theta = (\widehat\theta_1, \ldots, \widehat\theta_n)'$ is

$$E_\theta \sum_{i=1}^n \frac{L(\widehat\theta_i, \theta_i)}{n}. \tag{7}$$

For any separable statistical procedure of the form $\widehat\theta_i = t(y_i)$, the compound risk
is identical to the Bayes risk

$$E_\theta \sum_{i=1}^n \frac{L(\widehat\theta_i, \theta_i)}{n}
= \int\!\!\int L\big(t(y), \theta\big)\, f(y|\theta)\,\nu(dy)\,dG_n(\theta) \tag{8}$$

with respect to the unknown prior $G_n(A) = \sum_{i=1}^n I\{\theta_i \in A\}/n$. Thus, the risk of
separable rules is minimized at the ideal Bayes rule

$$t^*_{G_n}(y) = \mathop{\arg\min}_a \int L(a, \theta)\, f(y|\theta)\,dG_n(\theta). \tag{9}$$

In the original formulation of Robbins [14, 15], empirical Bayes seeks statistical
procedures which approximate the ideal Bayes rule $t^*_{G_n}(\cdot)$ or its performance. Robbins
[16] referred to his approach as general empirical Bayes.
From a methodological point of view, there is minimal difference in the general
empirical Bayes approach between the cases of deterministic $\theta$ (the compound setting)
and i.i.d. $\theta_i \sim G$ (the empirical Bayes setting), since one seeks an approximation
of a (nominal) Bayes rule $t^*_G$ for some (nearly) completely unknown $G$ anyway.
However, theoretical results in the compound setting typically have broader impact
(e.g. on minimaxity and adaptive estimation). The use of (9) as a benchmark, originally
designed to "beat" standard (minimax, invariant, or maximum likelihood)
procedures, has been further justified via the minimax theory for the estimation of
sparse vectors [6] and the approximate risk equivalence between exchangeable and
separable rules [9].
Jiang and Zhang [11] developed and studied a general maximum likelihood empirical
Bayes (GMLEB) method for the approximation of $t^*_{G_n}$ in (9):

$$\widehat\theta = t^*_{\widehat G}(y), \qquad
\widehat G = \mathop{\arg\max}_{G \in \mathcal{G}} \prod_{i=1}^n \int f(y_i|\theta)\,G(d\theta), \tag{10}$$

where $\mathcal{G}$ is the class of all distribution functions. This procedure estimates $t^*_{G_n}$ with
the generalized (nonparametric) maximum likelihood method [12, 22]. Different
kernel estimates of the ideal Bayes rule for the estimation of $\theta$ have been developed
in [20] and [5].
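The generalized MLE in (10) has no closed form. A common computational device, not prescribed in [11], is to discretize the support of $G$ on a fixed grid and run EM on the mixing weights; the sketch below is such an approximation for the normal model under squared loss, with illustrative names and tuning constants.

```python
import numpy as np

def npmle_mixing(y, sigma=1.0, n_grid=200, n_iter=500):
    """EM approximation of the generalized MLE G-hat in (10) on a fixed grid."""
    y = np.asarray(y, dtype=float)
    grid = np.linspace(y.min(), y.max(), n_grid)   # candidate support points of G
    w = np.full(n_grid, 1.0 / n_grid)              # mixing weights, start uniform
    lik = np.exp(-0.5 * ((y[:, None] - grid[None, :]) / sigma) ** 2)  # f(y_i | theta_j)
    for _ in range(n_iter):
        post = lik * w                             # unnormalized posterior over grid
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)                      # EM update of the mixing weights
    return grid, w

def gmleb_rule(y, sigma=1.0):
    """Plug-in posterior-mean rule t*_{G-hat}(y_i) under squared loss."""
    y = np.asarray(y, dtype=float)
    grid, w = npmle_mixing(y, sigma)
    lik = np.exp(-0.5 * ((y[:, None] - grid[None, :]) / sigma) ** 2)
    return (lik * w * grid).sum(axis=1) / (lik * w).sum(axis=1)
```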
Nothing in the general empirical Bayes formulation prevents adding a parameter
$\beta$ as long as the link (8) between the compound and empirical Bayes settings holds
and the corresponding ideal Bayes rule $t^*_{G_n,\beta}$ can be estimated.
In the homoscedastic partial linear model (4) where $\varepsilon \sim N(0, \sigma^2 I_n)$, the GMLEB
can be directly extended with

$$\widehat\theta_i = t^*_{\widehat G, \widehat\sigma}\big(y_i - \widehat\beta' z_i\big), \qquad
t^*_{G,\sigma}(y) = \mathop{\arg\min}_a \int L(a, \theta)\,\varphi\big((y - \theta)/\sigma\big)\,dG(\theta), \tag{11}$$

where $\widehat\sigma = \sigma$ for known $\sigma$ and $\widehat\sigma = \mathrm{median}\big(|y_i - \widehat\beta' z_i|\big)/\mathrm{median}\big(|N(0,1)|\big)$ otherwise,
$\varphi(x) = e^{-x^2/2}/\sqrt{2\pi}$ is the $N(0,1)$ density, and

$$\{\widehat\beta, \widehat G\} = \mathop{\arg\max}_{\beta,\, G \in \mathcal{G}}
\prod_{i=1}^n \int \varphi\Big(\frac{y_i - \beta' z_i - \xi}{\widehat\sigma}\Big)\,G(d\xi). \tag{12}$$

It follows from (8) that for known $\{\beta, \sigma\}$, the compound risk for rules of the form
$t(y - Z\beta)$ is minimized by the ideal Bayes rule $t^*_{G_n,\sigma}(y - Z\beta)$ with $G_n(A) =
\sum_{i=1}^n I\{\xi_i \in A\}/n$. Thus, for known $\sigma$, (11) simply replaces the unknown $\{\beta, G_n\}$
in the ideal Bayes rule with their joint generalized maximum likelihood estimator.
For unknown $\sigma$, we compute $\{\widehat\beta, \widehat G, \widehat\sigma\}$ by iterating the estimating equations for $\beta$,
$G_n$ and $\sigma$.
The direct link (8) between the compound and empirical Bayes settings breaks
down in the heteroscedastic partial linear model with known $\sigma_i$, since it is unreasonable
to use a prior to mix known quantities $\sigma_i$. However, the GMLEB still makes
sense in the empirical Bayes setting where the $\xi_i$ are (nominally) treated as i.i.d.
variables from an unknown distribution $G$. For $\varepsilon_i \sim N(0, \sigma_i^2)$ in (4) and the squared
loss $L(a, \theta) = (a - \theta)^2$,

$$\widehat\theta_i = \widehat\beta' z_i
+ \frac{\int \xi\,\varphi\big((y_i - \widehat\beta' z_i - \xi)/\sigma_i\big)\,\widehat G(d\xi)}
{\int \varphi\big((y_i - \widehat\beta' z_i - \xi)/\sigma_i\big)\,\widehat G(d\xi)}, \tag{13}$$
where $\widehat\beta$ and $\widehat G$ could be solved by iterating

$$\begin{cases}
\widehat\beta = \mathop{\arg\max}_{b} \prod_{i=1}^n \int \sigma_i^{-1}\varphi\big((y_i - b' z_i - \xi)/\sigma_i\big)\,\widehat G(d\xi), \\[4pt]
\widehat G = \mathop{\arg\max}_{G \in \mathcal{G}} \prod_{i=1}^n \int \sigma_i^{-1}\varphi\big((y_i - \widehat\beta' z_i - \xi)/\sigma_i\big)\,G(d\xi).
\end{cases} \tag{14}$$
Another general empirical Bayes method in the heteroscedastic partial linear
model first rescales the data to unit variance and then applies the GMLEB (11)
in the homoscedastic partial linear model. This effectively treats $G$ as a nominal
prior for $\zeta_i = \xi_i/\sigma_i$. We call this estimator the weighted general maximum likelihood
empirical Bayes (WGMLEB) since the approach is parallel to the extension of the LSE
to the WLSE. Explicitly, the WGMLEB is

$$\widehat\theta_i = \widehat\beta' z_i
+ \sigma_i \frac{\int \zeta\,\varphi\big(y_i/\sigma_i - \widehat\beta' z_i/\sigma_i - \zeta\big)\,\widehat G(d\zeta)}
{\int \varphi\big(y_i/\sigma_i - \widehat\beta' z_i/\sigma_i - \zeta\big)\,\widehat G(d\zeta)} \tag{15}$$
for $L(a, \theta) = (a - \theta)^2$, where $\widehat\beta$ and $\widehat G$ could be solved by iterating

$$\begin{cases}
\widehat\beta = \mathop{\arg\max}_{b} \prod_{i=1}^n \int \varphi\big(y_i/\sigma_i - b' z_i/\sigma_i - \zeta\big)\,\widehat G(d\zeta), \\[4pt]
\widehat G = \mathop{\arg\max}_{G \in \mathcal{G}} \prod_{i=1}^n \int \varphi\big(y_i/\sigma_i - \widehat\beta' z_i/\sigma_i - \zeta\big)\,G(d\zeta).
\end{cases} \tag{16}$$

The WGMLEB maintains the link (8) between the compound estimation and empirical
Bayes under the weighted compound risk $E_\theta \sum_{i=1}^n (\widehat\theta_i - \theta_i)^2/(\sigma_i^2 n)$.
4.2. Parametric empirical Bayes
A parametric empirical Bayes method, as defined in [13], approximates

$$t^*_\tau(y) = \mathop{\arg\min}_a \int L(a, \theta)\, f(y|\theta)\,dG(\theta|\tau), \tag{17}$$

where $\tau$ is an unknown parameter and $G(\cdot|\cdot)$ is a known family of distributions.
Brown [3] extended parametric empirical Bayes methods to the heteroscedastic
model

$$y_i = \xi_i + \varepsilon_i, \qquad \xi_i \sim N(\mu, \tau^2), \qquad \varepsilon_i \sim N(0, \sigma_i^2)$$

for the estimation of $\xi_i$ under the squared loss. Specifically, he considered the maximum
likelihood (ML) or method of moments (MM) estimates of $(\mu, \tau)$. Direct
extension of these methods to the partial linear model yields

$$\widehat\theta_i = \widehat\beta' z_i + \frac{\widehat\tau^2\big(y_i - \widehat\beta' z_i\big)}{\sigma_i^2 + \widehat\tau^2}, \tag{18}$$
where $\widehat\beta$ and $\widehat\tau$ are computed by iteratively solving

$$\begin{cases}
\widehat\beta = \Big\{\sum_{i=1}^n z_i z_i'/(\widehat\tau^2 + \sigma_i^2)\Big\}^{-1} \sum_{i=1}^n z_i y_i/(\widehat\tau^2 + \sigma_i^2), \\[4pt]
\sum_{i=1}^n \big(y_i - \widehat\beta' z_i\big)^2/(\sigma_i^2 + \widehat\tau^2)^2 = \sum_{i=1}^n 1/(\sigma_i^2 + \widehat\tau^2) \quad \text{(ML)}, \\[4pt]
\widehat\tau^2 = \Big\{\sum_{i=1}^n \big(y_i - \widehat\beta' z_i\big)^2/(n - p) - \sum_{i=1}^n \sigma_i^2/n\Big\}_+ \quad \text{(MM)}.
\end{cases} \tag{19}$$
4.3. The James-Stein estimator
A distinct feature of the James-Stein estimator is its dominance of a standard (e.g.
minimum risk unbiased or invariant) estimator under the squared loss, although it
has an empirical Bayes interpretation [7, 8]. In this spirit, the name James-Stein
estimator is typically reserved for its extensions with this dominance feature to
explicitly allow their applications to small samples.
In the homoscedastic partial linear model (4) with $\varepsilon_i \sim N(0, \sigma^2)$, one way of
achieving this dominance is to apply the James-Stein estimator separately to the
projection of $y$ to the linear span of the columns of $Z$ and to its orthogonal complement.

In the heteroscedastic partial linear model (4) with known $\sigma_i$, one may first scale
to unit variance and then apply the James-Stein estimator in the homoscedastic
partial linear model to achieve dominance of the naive estimator $y$ under the
$\sigma_i^{-2}$-weighted mean squared error. This was done in [3] for the common mean model
($z_i = 1, \forall i$). In partial linear models with general $z_i \in \mathbb{R}^p$, this James-Stein
estimator is

$$\widehat\theta = \Big(1 - \frac{p - 2}{\sum_{i=1}^n (\widehat\beta' z_i)^2/\sigma_i^2}\Big)_+ Z\widehat\beta
+ \Big(1 - \frac{n - p - 2}{\sum_{i=1}^n (y_i - \widehat\beta' z_i)^2/\sigma_i^2}\Big)_+ \big(y - Z\widehat\beta\big) \tag{20}$$

with the WLSE $\widehat\beta = \big\{\sum_{i=1}^n z_i z_i'/\sigma_i^2\big\}^{-1} \sum_{i=1}^n z_i y_i/\sigma_i^2$. Brown and Zhao [5] proposed
further shrinkage of (20) to reduce the weighted mean squared error.
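As a concrete reading of (20), the following minimal sketch (function name ours) computes the WLSE and applies positive-part shrinkage to both the fitted and residual components, which is the dominance-preserving form above.

```python
import numpy as np

def james_stein_plm(y, Z, sigma):
    """James-Stein estimator (20) in the heteroscedastic partial linear model."""
    n, p = Z.shape
    W = 1.0 / sigma ** 2
    beta = np.linalg.solve(Z.T @ (Z * W[:, None]), Z.T @ (y * W))  # WLSE
    fit, resid = Z @ beta, y - Z @ beta
    c1 = max(1.0 - (p - 2) / np.sum(fit ** 2 / sigma ** 2), 0.0)       # shrink fit
    c2 = max(1.0 - (n - p - 2) / np.sum(resid ** 2 / sigma ** 2), 0.0)  # shrink residuals
    return c1 * fit + c2 * resid
```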
5. Empirical Bayes prediction with the baseball data

We report applications of the GMLEB in (13)-(14), the WGMLEB in (15)-(16) and the
James-Stein estimator in (20) to the 2005 baseball data described in Section 2, and
compare them with the LSE, WLSE and Brown's [3] results. The data map is given
in (6) and the predictors are

$$\widehat X_{2i} = \widehat\theta_i, \qquad \widehat R_{2i} = \sin^2\big(\widehat\theta_i\big), \quad i \in S_1 \cap S_2,$$

for any estimator under consideration. The models always include the intercept
and are given in parentheses, with (Pitcher*AB) denoting the model (Pitcher + AB +
interaction). For example, James-Stein(Null) means the estimator (20) with $z_i = 1$
as in Brown [3]. Since $p$ is small and $n$ is large in all cases, the shrinkage of the
regression component $Z\widehat\beta$ and the positive part in (20) were not implemented with the
James-Stein estimator in our experiments.
Table 2 reports results based on data from all players with at least 11 at bats
in the first half season for the prediction of the batting averages of all players
with at least 11 at bats in both half seasons. Brown’s [3] results are included in
the first block of rows as the parametric EB(MM) and EB(ML) in (18)-(19) with
$z_i = 1$, a modified nonparametric empirical Bayes estimator (NPEB) of Brown
and Greenshtein [4], and a Bayes estimator with a harmonic prior (HP; [18]). We
refer to Brown [3] for a detailed description and further discussion of these methods.
In the next 5 blocks of rows, we report results in the (Null), (AB), (Pitcher),
(Pitcher+AB) and (Pitcher*AB) models for the LSE, WLSE, James-Stein, GMLEB
and WGMLEB methods.
Table 3 reports results for separate applications of the predictors to the non-
pitcher and pitcher groups. Again, Brown’s [3] results are included in the first
block of rows, followed by our results in the (Null) and (AB) models.
Table 2
Midseason prediction for all batters, $(|S_1|, |S_1 \cap S_2|) = (567, 499)$

 TSE* TSE*_R TWSE*
Naive 1 1 1
EB(MM) 0.593 0.606 0.626
EB(ML) 0.902 0.925 0.607
NPEB 0.508 0.509 0.560
HP 0.884 0.905 0.600
LSE(Null) 0.853 0.897 1.116
WLSE(Null) 1.074 1.129 0.742
James-Stein(Null) 0.535 0.540 0.502
GMLEB(Null) 0.663 0.671 0.547
WGMLEB(Null) 0.306 0.298 0.427
LSE(AB) 0.518 0.535 0.686
WLSE(AB) 0.537 0.527 0.545
James-Stein(AB) 0.370 0.352 0.443
GMLEB(AB) 0.410 0.397 0.455
WGMLEB(AB) 0.301 0.291 0.424
LSE(Pitcher) 0.272 0.283 0.559
WLSE(Pitcher) 0.324 0.343 0.519
James-Stein(Pitcher) 0.243 0.244 0.427
GMLEB(Pitcher) 0.259 0.266 0.429
WGMLEB(Pitcher) 0.208 0.204 0.401
LSE(Pitcher+AB) 0.242 0.246 0.477
WLSE(Pitcher+AB) 0.219 0.215 0.435
James-Stein(Pitcher+AB) 0.184 0.175 0.391
GMLEB(Pitcher+AB) 0.191 0.183 0.387
WGMLEB(Pitcher+AB) 0.184 0.175 0.385
LSE(Pitcher*AB) 0.240 0.244 0.476
WLSE(Pitcher*AB) 0.204 0.201 0.429
James-Stein(Pitcher*AB) 0.171 0.162 0.386
GMLEB(Pitcher*AB) 0.178 0.170 0.382
WGMLEB(Pitcher*AB) 0.177 0.167 0.382
Table 3
Midseason prediction for nonpitchers and pitchers; $(|S_1|, |S_1 \cap S_2|) = (486, 435)$ for nonpitchers
and $(|S_1|, |S_1 \cap S_2|) = (81, 64)$ for pitchers

 Nonpitchers (TSE* TWSE*) Pitchers (TSE* TWSE*)
Naive 1 1 1 1
EB(MM) 0.387 0.494 0.129 0.191
EB(ML) 0.398 0.477 0.117 0.180
NPEB 0.372 0.527 0.212 0.266
HP 0.391 0.473 0.128 0.190
LSE(Null) 0.378 0.606 0.127 0.235
WLSE(Null) 0.468 0.561 0.127 0.234
James-Stein(Null) 0.348 0.469 0.165 0.202
GMLEB(Null) 0.378 0.465 0.134 0.178
WGMLEB(Null) 0.326 0.446 0.173 0.212
LSE(AB) 0.333 0.514 0.115 0.218
WLSE(AB) 0.290 0.465 0.087 0.182
James-Stein(AB) 0.262 0.436 0.142 0.177
GMLEB(AB) 0.257 0.415 0.111 0.154
WGMLEB(AB) 0.261 0.423 0.141 0.178
Take-away messages
The empirical Bayes in-season prediction results with the 2005 baseball data suggest
the following four messages:
I. Empirical Bayes methods may substantially improve upon the least squares
predictor even when the linear model assumption seems to hold well: Empirical
Bayes predictors outperform least squares predictors in all models, groups and
loss functions studied, with the exception of $\widehat{\mathrm{TSE}}^*$ for pitchers in Table 3. The
exception is probably due to the smaller sample size and the mismatch between the
loss function and the weighted empirical Bayes methods. The regression analysis in
Section 3 exhibits a satisfactory fit in both the mean and the variance.
II. Empirical Bayes methods may capture a great portion of the effects of missing
covariables in the linear model: The phenomenon is clear in comparisons between
the empirical Bayes predictors in smaller models and the least squares predictors
in larger models. It may have significant implications in dealing with latent effects
and in semilinear regression when a nonparametric component suffers from the curse of
dimensionality.
III. The GMLEB is highly competitive compared with the James-Stein predic-
tor with moderately large samples: In fact, the WGMLEB outperforms the (also
weighted) James-Stein predictor in a great majority of combinations of models,
groups and loss functions.
IV. The heteroscedastic partial linear model is tricky: The unweighted GMLEB
may not handle the correlation between the mean and variance as well as
the weighted empirical Bayes methods, as the (Null) model in Table 2 demonstrates.
Still, the unweighted GMLEB significantly outperforms both least squares methods
in this case.
References
[1] Brown, L. D. (1966). On the admissibility of invariant estimators of one or
more location parameters. Ann. Math. Statist. 37 1087–1136.
[2] Brown, L. D. (1971). Admissible estimators, recurrent diffusions, and insoluble
boundary value problems. Ann. Math. Statist. 42 855–903.
[3] Brown, L. D. (2008). In-season prediction of batting averages: A field test of
empirical Bayes and Bayes methodologies. Ann. Appl. Statist. 2 113–152.
[4] Brown, L. D. and Greenshtein, E. (2009). Empirical Bayes and compound
decision approaches for estimation of a high dimensional vector of normal
means. Ann. Statist. 37 1685–1704.
[5] Brown, L. D. and Zhao, L. H. (2009). Estimators for Gaussian models
having a block-wise structure. Statist. Sinica 19 885–903.
[6] Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over ℓp-balls
for ℓq-error. Probab. Theory Related Fields 99 277–303.
[7] Efron, B. and Morris, C. (1972). Empirical Bayes on vector observations:
An extension of Stein’s method. Biometrika 59 335–347.
[8] Efron, B. and Morris, C. (1973). Combining possibly related estimation
problems (with discussion). J. Roy. Statist. Soc. Ser. B 35 379–421.
[9] Greenshtein, E. and Ritov, Y. (2009). Asymptotic efficiency of simple
decisions for the compound decision problem. In Optimality: The Third Erich
L. Lehmann Symposium (J. Rojo, ed.). IMS Lecture Notes—Monograph Series
57 266–275.
[10] James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc.
Fourth Berkeley Symp. Math. Statist. Probab. 1 361–379. Univ. California
Press, Berkeley.
[11] Jiang, W. and Zhang, C.-H. (2009). General maximum likelihood empirical
Bayes estimation of normal means. Ann. Statist. 37 1647–1684.
[12] Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood
estimator in the presence of infinitely many incidental parameters. Ann.
Math. Statist. 27 887–906.
[13] Morris, C. N. (1983). Parametric empirical Bayes inference: Theory and
applications. J. Amer. Statist. Assoc. 78 47–55.
[14] Robbins, H. (1951). Asymptotically subminimax solutions of compound
statistical decision problems. In Proc. Second Berkeley Symp. Math. Statist.
Probab. 1 131–148. Univ. California Press, Berkeley.
[15] Robbins, H. (1956). An empirical Bayes approach to statistics. In Proc.
Third Berkeley Symp. Math. Statist. Probab. 1 157–163. Univ. California Press,
Berkeley.
[16] Robbins, H. (1983). Some thoughts on empirical Bayes estimation. Ann.
Statist. 11 713–723.
[17] Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a
multivariate normal distribution. In Proc. Third Berkeley Symp. Math. Statist.
Probab. 1 197–206. Univ. California Press, Berkeley.
[18] Strawderman, W. E. (1971). Proper Bayes estimators of the multivariate
normal mean. Ann. Math. Statist. 42 385–388.
[19] Strawderman, W. E. (1973). Proper Bayes minimax estimators of the
multivariate normal mean for the case of common unknown variances. Ann. Math.
Statist. 44 1189–1194.
[20] Zhang, C.-H. (1997). Empirical Bayes and compound estimation of normal
means. Statist. Sinica 7 181–193.
[21] Zhang, C.-H. (2003). Compound decision theory and empirical Bayes
methods. Ann. Statist. 31 379–390.
[22] Zhang, C.-H. (2009). Generalized maximum likelihood estimation of normal
mixture densities. Statist. Sinica 19 1297–1318.