IMS Collections
Borrowing Strength: Theory Powering Applications –
A Festschrift for Lawrence D. Brown
Vol. 6 (2010) 272–282
© Institute of Mathematical Statistics, 2010
DOI: 10.1214/10-IMSCOLL618
Empirical Bayes in-season prediction of
baseball batting averages
Wenhua Jiang1 and Cun-Hui Zhang2,*
National Heart, Lung, and Blood Institute and Rutgers University
Abstract: The performance of a number of empirical Bayes methods is
examined for the in-season prediction of batting averages with the 2005 Major
League baseball data. Among the methodologies considered are new general
empirical Bayes estimators in homoscedastic and heteroscedastic partial linear
models.
1. Introduction
This paper is motivated by our desire to extend advances in empirical Bayes to more
general settings, including homoscedastic and heteroscedastic partial linear models,
and by Brown’s [3] field test of empirical Bayes with the 2005 Major League baseball
data.
The main thrust of empirical Bayes is the possibility of a substantial reduction of
the compound risk in a collection of statistical decision problems by making the
decision in each problem with the observations from all the problems combined [14, 15, 17,
10, 1, 2, 18, 19, 7, 8, 21]. In the classic compound decision theory, the significance
of this phenomenon is proven for similar decision problems involving independent
observations and different unknown parameters. Baseball is a favorite example for
empirical Bayes since it raises problems involving many players in well understood
statistical models for counting data.
Before the benefits and importance of empirical Bayes became well understood,
there were heated debates about its usefulness, especially about what collection of
decision problems should be combined in actual statistical practice. In the midst
of such a debate, Efron and Morris [8] used predicting the baseball batting average
(ratio of base hits to the number of times at bat for an individual player) as a
primary example of a proper collection of decision problems to combine. In that
example, the James-Stein estimator is applied to the arcsin-root transformation
of the batting averages of 14 players for the first 45 at bats in the 1970 season
to predict their batting averages in the rest of the season. In fact, the 14 players
were selected as Roberto Clemente and all those having the same number of 45
at bats as him according to a certain 1970 issue of the New York Times. The
selection of Roberto Clemente, one of the most successful hitters in the preceding
*Supported in part by NSF Grants DMS 0804626, DMS 0906420 and NSA Grant H98230-09-1-0006.
1Office of Biostatistics, DECA, National Heart, Lung, and Blood Institute, 2 Rockledge Center,
Bethesda, Maryland, USA, e-mail: jiangw3@nhlbi.nih.gov
2Department of Statistics and Biostatistics, Rutgers University, Hill Center, Busch Campus,
Piscataway, NJ 08854, USA, e-mail: czhang@stat.rutgers.edu
Keywords and phrases: empirical Bayes, compound decisions, partial linear model, nonparametric estimation, semiparametric estimation, sports, hitting, batting average.
AMS 2000 subject classifications: Primary 62J05, 62J07; secondary 62H12, 62H25.
seasons, was designed “to embarrass the James-Stein rule” [8], since the method
does not favor heterogeneity among the unknown means. Incidentally, the selection
of these 14 players also provided homoscedasticity since the variance of the arcsin-
root transformed success rate in Bernoulli trials is approximately a quarter of the
reciprocal of the number of trials.
Brown [3] examined the performance of seven statistical methods for the in-season
prediction of batting averages with the 2005 Major League data. He used
the data from all players with at least 11 at bats in the first half season to predict
the batting averages in the second half season for all players with at least 11 at bats
in both half seasons. This involves about 500 players. Since different players have
different numbers of at bats in the first half season, he raised an interesting heteroscedastic
compound decision problem and studied extensions of empirical Bayes methods for
such data. He observed a moderate correlation in the 2005 data between at bats
and batting averages, and significantly improved the performance of empirical Bayes
predictors by applying them separately to the groups of pitchers and nonpitchers.
Brown’s [3] results motivated us to consider a heteroscedastic partial linear
model. In this model, the unknown means are partially explained by a certain
linear effect but not completely, and the errors are normal variables with zero mean
and different known variances. In the baseball application, the linear component
could include pitcher-nonpitcher group and at bats effects. We extend a number of
empirical Bayes methods to this heteroscedastic partial linear model and examine
their performance with the 2005 baseball data.
The rest of the paper is organized in four sections to cover data description
and performance measures, partial linear models, extensions of empirical Bayes
methods, and experiments with the 2005 baseball data.
2. Data description and prediction loss functions
We shall keep most of Brown's [3] notation and consider the same set of loss
functions for the prediction of batting averages.
The 2005 baseball data in [3] provide the name, status as a pitcher or not, at
bats for the entire 2005 regular season and for the months April through September,
and monthly total base hits for each Major League player. Post season records are
excluded and the records of a few October games are merged into September. For
convenience, we treat April, May and June as the first half season and the rest of
the regular season as the second half [3].
Let $S$ denote the set of all players. For each $i\in S$, let $N_{ji}$ and $H_{ji}$ denote the
number of at bats and the number of hits for the first (second) half season for $j=1$
($j=2$). The corresponding batting averages are then
$$R_{ji} = H_{ji}/N_{ji}. \tag{1}$$
Let $S_j=\{i: N_{ji}\ge 11\}$ be the sets of all players with at least 11 at bats for each
half season. We consider the prediction of $\{R_{2i},\, i\in S_1\cap S_2\}$ based on the data
$\{\Delta_i, N_{1i}, H_{1i},\, i\in S_1\}$, where $\Delta_i=1$ if the $i$-th player is a pitcher and $\Delta_i=0$
otherwise.

A reasonable model for the data assumes that the numbers of hits $H_{1i}$ and $H_{2i}$
are conditionally independent with the binomial distribution
$$H_{ji}\,\big|\,(\Delta_i, N_{ji}, p_i) \;\sim\; \mathrm{Bin}(N_{ji}, p_i), \quad j=1,2, \tag{2}$$
where $p_i$ is the batting probability of the $i$-th player. The standard variance stabilizing
transformation for $H\sim \mathrm{Bin}(N,p)$ is $\arcsin\sqrt{H/N}$. Brown [3] suggested a
finer version of this arcsin-root transformation,
$$X=\arcsin\sqrt{\frac{H+1/4}{N+1/2}} \;\approx\; N\Big(\arcsin\sqrt{p},\ \frac{1}{4N}\Big),$$
to minimize the approximation error for the mean to the order of $N^{-2}$. Under the
binomial model (2), the batting averages are transformed into
$$X_{ji}=\arcsin\sqrt{\frac{H_{ji}+1/4}{N_{ji}+1/2}} \;\approx\; N(\theta_i, \sigma_{ji}^2), \tag{3}$$
where $\theta_i=\arcsin\sqrt{p_i}$ and $\sigma_{ji}^2 = 1/(4N_{ji})$.
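In code, the transformation (3) is a one-liner. The following minimal Python sketch (our illustration; the function name is ours) maps hit and at-bat counts to the transformed scale and its approximate variance:

```python
import numpy as np

def arcsine_transform(H, N):
    """Brown's finer arcsin-root transform (3).

    Maps hit/at-bat counts to X = arcsin(sqrt((H + 1/4)/(N + 1/2))),
    approximately N(arcsin(sqrt(p)), 1/(4N)) under H ~ Bin(N, p).
    Returns the transformed values and the approximate variances.
    """
    H = np.asarray(H, dtype=float)
    N = np.asarray(N, dtype=float)
    X = np.arcsin(np.sqrt((H + 0.25) / (N + 0.5)))
    return X, 1.0 / (4.0 * N)

# Example: 30 hits in 100 first-half at bats; sin(X)^2 maps back to the
# batting-average scale.
# X, var = arcsine_transform([30], [100]);  R_hat = np.sin(X) ** 2
```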
Let $R_1=(R_{1i},\, i\in S_1)$ and $R_2=(R_{2i},\, i\in S_1\cap S_2)$ with $S_j=\{i: N_{ji}\ge 11\}$.
Define the vectors $\Delta$, $X_j$, $\theta_j$ and $\sigma_j=1/\sqrt{4N_j}$ analogously via (3). The problem
is to predict $R_2$ or $X_2$ based on $(\Delta, X_1, \sigma_1)$. Thus, a predictor is a Borel map
$\delta(\Delta, X_1, \sigma_1)\in \mathbb{R}^{S_2}$.
The data $X_2$ and $\sigma_2$ are used to validate the performance of predictors. As in
[3], we use the error measure
$$\widehat{\mathrm{TSE}}[\delta]=\frac{\mathrm{TSE}[\delta]}{\mathrm{TSE}[\delta^0]} \quad\text{with}\quad \mathrm{TSE}[\delta]=\sum_{i\in S_1\cap S_2}\big\{(X_{2i}-\delta_i)^2-\sigma_{2i}^2\big\}$$
and its weighted version
$$\widehat{\mathrm{TWSE}}[\delta]=\frac{\mathrm{TWSE}[\delta]}{\mathrm{TWSE}[\delta^0]} \quad\text{with}\quad \mathrm{TWSE}[\delta]=\sum_{i\in S_1\cap S_2}\frac{(X_{2i}-\delta_i)^2-\sigma_{2i}^2}{4\sigma_{1i}^2}.$$
These error measures compare predictors $\delta$ of $X_2$ with the naive $\delta^0=(X_{1i},\, i\in S_1\cap S_2)$. They can be viewed as approximations of the (weighted) MSE for the
estimation of $\theta$, since
$$E\,\mathrm{TSE}[\delta]=E\sum_{i\in S_1\cap S_2}(\delta_i-\theta_i)^2, \qquad E\,\mathrm{TWSE}[\delta]=E\sum_{i\in S_1\cap S_2}\frac{(\delta_i-\theta_i)^2}{4\sigma_{1i}^2},$$
in the normal model. For the prediction of $R_2$ we use the error measure
$$\widehat{\mathrm{TSE}}_R[\widehat R]=\frac{\mathrm{TSE}_R[\widehat R]}{\mathrm{TSE}_R[\widehat R^0]},$$
where $\widehat R=(\widehat R_i,\, i\in S_1\cap S_2)$ with $\widehat R_i=\sin^2(\delta_i)$ for any predictor $\delta$ of $X_2$,
$\mathrm{TSE}_R[\widehat R]=\sum_{i\in S_1\cap S_2}\big\{(\widehat R_{2i}-R_{2i})^2-R_{2i}(1-R_{2i})/N_{2i}\big\}$, and $\widehat R^0=\sin^2(\delta^0)$.
3. A partial linear model
Let $y=(y_1,\ldots,y_n)'$ be a response vector and $Z=(z_1,\ldots,z_n)'=(z_{ij})_{n\times p}$ be a
covariate matrix. In a partial linear model, the mean of $y$ is partially explained by
a linear function of $Z$ but not completely. This can be written as
$$y=Z\beta+\xi+\varepsilon, \tag{4}$$
where $\beta=(\beta_1,\ldots,\beta_p)'$ is a vector of unknown deterministic regression coefficients,
$\xi=(\xi_1,\ldots,\xi_n)'$ is an unknown vector not dependent on known covariates, and
$\varepsilon=(\varepsilon_1,\ldots,\varepsilon_n)'$ is a vector of independent random errors with $E\varepsilon_i=0$ and
$\mathrm{Var}(\varepsilon_i)=\sigma_i^2$. The partial linear model is homoscedastic if the $\sigma_i$ are all equal and
heteroscedastic otherwise. In the homoscedastic model, the common variance $\sigma^2$ can
be estimated with the data in (4) if the $\xi_i$ are known to be sparse, e.g. by the $\widehat\sigma$ in (11).
In the heteroscedastic model, the estimation of $\xi$ requires known $\sigma=(\sigma_1,\ldots,\sigma_n)$
or the availability of consistent or $\chi^2$-type estimates of $\sigma_i^2$.
The partial linear model (4) is different from the partially linear (semiparametric)
regression model, in which $\xi_i$ is assumed to be a smooth (possibly nonlinear)
function of some additional covariates.

From a likelihood point of view, the linear effect $Z\beta$ in (4) may not help since a
sum of a linear effect and a general unknown effect is still a general unknown effect.
However, after removing the linear effect, the benefits of the compound estimation
of $\xi$ could be far greater than those of the direct compound estimation of the mean of
$y$. This is also clear from an empirical Bayes point of view, since the sum of $Z\beta$ and
a vector $\xi$ with i.i.d. components is typically not a vector with i.i.d. components.
The partial linear model is closely related to the parametric random effects model
$$y_{ik}=\beta' z_{ik}+\xi_i+\varepsilon_{ik},\quad k=1,\ldots,K_i, \tag{5}$$
where the $\xi_i$ are assumed to be i.i.d. variables from a distribution depending on an
unknown parameter $\tau$ (e.g. $N(0,\tau^2)$). Existing work on random effects models typically
focuses on the estimation of the parameters $\beta$ and $\tau$ in the case where the
identifiability of $\tau$ is ensured by the multiplicity $K_i>1$. Parametric statistical
inference for $\{\beta,\tau\}$ and the noise level in (5) is well understood.
For the baseball data, (4) provides an interpretation of the data in (3) with
$$X_1=y,\quad \theta=Z\beta+\xi,\quad \frac{1}{4N_1}=\sigma^2,\quad |S_1|=n. \tag{6}$$
We use the least squares estimator (LSE) $\widehat\beta=\big(\sum_{i=1}^n z_i z_i'\big)^{-1}\sum_{i=1}^n z_i y_i$ (heteroscedasticity
ignored) and the weighted least squares estimator (WLSE) $\widehat\beta=\big(\sum_{i=1}^n z_i z_i'/\sigma_i^2\big)^{-1}\sum_{i=1}^n z_i y_i/\sigma_i^2$ to carry out a preliminary examination of possible choices of
$Z$ for the linear component. Brown's [3] results suggest a strong group effect with
$\Delta$ (Pitcher) and a moderate at bats $N_1$ (AB) effect. This is confirmed in Table 1.
It turns out that the regression analysis exhibits no Pitcher-AB interaction. The
scatterplots in Figure 1 demonstrate a moderate AB effect for nonpitchers and a
very small AB effect for pitchers. Since most data points are within $\pm 1.96\sigma_i$ of
the regression line, the model for the noise level $\sigma_i^2=1/(4N_{1i})$ explains the vast
majority of the residuals in the regression analysis, suggesting a sparse $\xi$.
Table 1
Squared multiple correlation coefficients, $R^2$

Model         LSE     WLSE
Pitcher       0.256   0.190
AB            0.247   0.208
Pitcher+AB    0.342   0.292
Pitcher*AB    0.342   0.293
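For reference, the WLSE behind Table 1 is a single weighted normal-equations solve; the sketch below is ours, with hypothetical array names for the design:

```python
import numpy as np

def wlse(Z, y, sigma2):
    """WLSE: beta-hat = (sum z_i z_i' / s_i^2)^{-1} sum z_i y_i / s_i^2."""
    W = 1.0 / sigma2
    return np.linalg.solve(Z.T @ (Z * W[:, None]), Z.T @ (y * W))

# Hypothetical usage for the (Pitcher+AB) model of Table 1:
#   Z = np.column_stack([np.ones(n), pitcher, N1])   # intercept, Delta, AB
#   beta = wlse(Z, X1, 1.0 / (4.0 * N1))             # weights = 4*N_{1i}
```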
Fig 1. Scatterplots of the transformed batting average ($X_1$) vs at bats ($N_1$) with weighted least
squares regression line $\pm 1.96\sqrt{1/(4N_1)}$; $R^2=0.151$ for nonpitchers and $R^2=0.009$ for pitchers.
4. Empirical Bayes methods in partial linear models
We consider in separate subsections the general (nonparametric) empirical Bayes,
parametric (linear) empirical Bayes, and the James-Stein methods.
4.1. General empirical Bayes
Suppose that for a deterministic or given unknown vector $\theta=(\theta_1,\ldots,\theta_n)$, the
observation $y=(y_1,\ldots,y_n)$ is distributed as independent $y_i\sim f(y|\theta_i)\nu(dy)$ with
a known $f(\cdot|\cdot)$. Given a loss function $L(a,\theta)$ for the statistical inference about $\theta_i$,
the compound risk for a statistical procedure $\widehat\theta=(\widehat\theta_1,\ldots,\widehat\theta_n)$ is
$$E_\theta \sum_{i=1}^n \frac{L(\widehat\theta_i,\theta_i)}{n}. \tag{7}$$
For any separable statistical procedure of the form $\widehat\theta_i=t(y_i)$, the compound risk
is identical to the Bayes risk
$$E_\theta \sum_{i=1}^n \frac{L(\widehat\theta_i,\theta_i)}{n} = \int\!\!\int L(t(y),\theta)\,f(y|\theta)\,\nu(dy)\,dG_n(\theta) \tag{8}$$
with respect to the unknown prior $G_n(A)=\sum_{i=1}^n I\{\theta_i\in A\}/n$. Thus, the risk of
separable rules is minimized at the ideal Bayes rule
$$t^*_{G_n}(y)=\mathop{\arg\min}_a \int L(a,\theta)\,f(y|\theta)\,dG_n(\theta). \tag{9}$$
In the original formulation of Robbins [14, 15], empirical Bayes seeks statistical procedures
which approximate the ideal Bayes rule $t^*_{G_n}(\cdot)$ or its performance. Robbins
[16] referred to his approach as general empirical Bayes.
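Under squared loss and a normal likelihood, the ideal Bayes rule (9) is the posterior mean with respect to $G_n$. A minimal numerical sketch (our illustration) for a discrete prior on a finite support:

```python
import numpy as np
from scipy.stats import norm

def posterior_mean(y, support, weights, sigma=1.0):
    """Ideal Bayes rule t*_G(y) under squared loss for a discrete prior G.

    G puts mass `weights` on the points `support`; f(y|theta) is taken to
    be the N(theta, sigma^2) density.  Returns E(theta | y) for each y.
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    like = norm.pdf((y[:, None] - support[None, :]) / sigma)  # n x m
    post = like * weights          # unnormalized posterior over the support
    return (post @ support) / post.sum(axis=1)

# With G = G_n, the empirical distribution of theta_1, ..., theta_n:
#   t_star = posterior_mean(y, support=theta, weights=np.full(n, 1.0 / n))
```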
From a methodological point of view, there is little difference in the general
empirical Bayes approach between the cases of deterministic $\theta$ (the compound setting)
and i.i.d. $\theta_i\sim G$ (the empirical Bayes setting), since one seeks an approximation
of a (nominal) Bayes rule $t^*_G$ for some (nearly) completely unknown $G$ anyway.
However, theoretical results in the compound setting typically have broader impact
(e.g. on minimaxity and adaptive estimation). The use of (9) as a benchmark, originally
designed to “beat” standard (minimax, invariant, or maximum likelihood)
procedures, has been further justified via the minimax theory for the estimation of
sparse vectors [6] and the approximate risk equivalence between exchangeable and
separable rules [9].
Jiang and Zhang [11] developed and studied a general maximum likelihood empirical
Bayes (GMLEB) method for the approximation of $t^*_{G_n}$ in (9):
$$\widehat\theta=t^*_{\widehat G}(y),\quad \widehat G=\mathop{\arg\max}_{G\in\mathscr G}\prod_{i=1}^n\int f(y_i|\theta)\,G(d\theta), \tag{10}$$
where $\mathscr G$ is the class of all distribution functions. This procedure estimates $t^*_{G_n}$ with
the generalized (nonparametric) maximum likelihood method [12, 22]. Different
kernel estimates of the ideal Bayes rule for the estimation of $\theta$ have been developed
in [20] and [5].
Nothing in the general empirical Bayes formulation prevents adding a parameter
$\beta$, as long as the link (8) between the compound and empirical Bayes settings holds
and the corresponding ideal Bayes rule $t^*_{G_n,\beta}$ can be estimated.
In the homoscedastic partial linear model (4) where $\varepsilon\sim N(0,\sigma^2 I_n)$, the GMLEB
can be directly extended with
$$\widehat\theta_i=\widehat\beta' z_i + t^*_{\widehat G,\widehat\sigma}\big(y_i-\widehat\beta' z_i\big), \quad t^*_{G,\sigma}(y)=\mathop{\arg\min}_a \int L(a,\theta)\,\varphi\big((y-\theta)/\sigma\big)\,dG(\theta), \tag{11}$$
where $\widehat\sigma=\sigma$ for known $\sigma$ and $\widehat\sigma=\mathrm{median}(|y_i-\widehat\beta' z_i|)/\mathrm{median}(|N(0,1)|)$ otherwise,
$\varphi(x)=e^{-x^2/2}/\sqrt{2\pi}$ is the $N(0,1)$ density, and
$$\{\widehat\beta,\widehat G\}=\mathop{\arg\max}_{\beta,\,G\in\mathscr G} \prod_{i=1}^n \int \varphi\Big(\frac{y_i-\beta' z_i-\xi}{\widehat\sigma}\Big)\,G(d\xi). \tag{12}$$
It follows from (8) that for known $\{\beta,\sigma\}$, the compound risk for rules of the form
$\beta' z_i+t(y_i-\beta' z_i)$ is minimized by the ideal Bayes rule $t^*_{G_n}(y)$ with $G_n(A)=\sum_{i=1}^n I\{\xi_i\in A\}/n$. Thus, for known $\sigma$, (11) simply replaces the unknown $\{\beta,G_n\}$
in the ideal Bayes rule with their joint generalized maximum likelihood estimator.
For unknown $\sigma$, we compute $\{\widehat\beta,\widehat G,\widehat\sigma\}$ by iterating the estimating equations for $\beta$,
$G_n$ and $\sigma$.
The direct link (8) between the compound and empirical Bayes settings breaks
down in the heteroscedastic partial linear model with known $\sigma_i$, since it is unreasonable
to use a prior to mix the known quantities $\sigma_i$. However, the GMLEB still makes
sense in the empirical Bayes setting where the $\xi_i$ are (nominally) treated as i.i.d. variables
from an unknown distribution $G$. For $\varepsilon_i\sim N(0,\sigma_i^2)$ in (4) and the squared
loss $L(a,\theta)=(a-\theta)^2$,
$$\widehat\theta_i=\widehat\beta' z_i+\frac{\int \xi\,\varphi\big((y_i-\widehat\beta' z_i-\xi)/\sigma_i\big)\,\widehat G(d\xi)}{\int \varphi\big((y_i-\widehat\beta' z_i-\xi)/\sigma_i\big)\,\widehat G(d\xi)}, \tag{13}$$
where $\widehat\beta$ and $\widehat G$ can be solved by iterating
$$\widehat\beta=\mathop{\arg\max}_b \prod_{i=1}^n \int \sigma_i^{-1}\varphi\big((y_i-b' z_i-\xi)/\sigma_i\big)\,\widehat G(d\xi),$$
$$\widehat G=\mathop{\arg\max}_{G\in\mathscr G} \prod_{i=1}^n \int \sigma_i^{-1}\varphi\big((y_i-\widehat\beta' z_i-\xi)/\sigma_i\big)\,G(d\xi). \tag{14}$$
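The paper does not spell out an algorithm for (14), but a common computational route is to discretize $G$ on a fixed grid, in the spirit of the Kiefer-Wolfowitz NPMLE [12, 22], and run EM for the weights. The sketch below is our simplified illustration; in particular, the $\beta$-step is a crude least-squares surrogate for the exact mixture-likelihood maximization in (14):

```python
import numpy as np
from scipy.stats import norm

def npmle_em(resid, sigma, grid, n_iter=200):
    """EM for the nonparametric MLE of G in (14), with beta held fixed.

    resid: residuals y_i - beta'z_i; sigma: the known standard deviations;
    grid: a fixed finite support approximating G.  Returns grid weights.
    """
    like = norm.pdf((resid[:, None] - grid[None, :]) / sigma[:, None])
    w = np.full(grid.size, 1.0 / grid.size)
    for _ in range(n_iter):
        post = like * w
        post /= post.sum(axis=1, keepdims=True)   # P(xi_i = grid_j | y_i)
        w = post.mean(axis=0)                     # EM update of the weights
    return w

def posterior_mean_xi(resid, sigma, grid, w):
    """The ratio of integrals in (13) for a discrete G with weights w."""
    like = norm.pdf((resid[:, None] - grid[None, :]) / sigma[:, None])
    post = like * w
    return (post @ grid) / post.sum(axis=1)

def gmleb(y, Z, sigma, grid, n_outer=10):
    """Alternate the two estimating equations in (14), schematically."""
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]   # initial beta
    for _ in range(n_outer):
        w = npmle_em(y - Z @ beta, sigma, grid)
        xi_hat = posterior_mean_xi(y - Z @ beta, sigma, grid, w)
        # Surrogate beta-step: least squares on y - E(xi | y); this is our
        # simplification of the mixture-likelihood maximization over b.
        beta = np.linalg.lstsq(Z, y - xi_hat, rcond=None)[0]
    xi_hat = posterior_mean_xi(y - Z @ beta, sigma, grid, w)
    return Z @ beta + xi_hat                      # theta-hat as in (13)
```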
Another general empirical Bayes method in the heteroscedastic partial linear
model first rescales the data to unit variance and then applies the GMLEB (11)
in the homoscedastic partial linear model. This effectively treats $G$ as a nominal
prior for $\zeta_i=\xi_i/\sigma_i$. We call this estimator the weighted general maximum likelihood
empirical Bayes (WGMLEB), since the approach is parallel to the extension of the LSE
to the WLSE. Explicitly, the WGMLEB is
$$\widehat\theta_i=\widehat\beta' z_i+\sigma_i\,\frac{\int \zeta\,\varphi\big(y_i/\sigma_i-\widehat\beta' z_i/\sigma_i-\zeta\big)\,\widehat G(d\zeta)}{\int \varphi\big(y_i/\sigma_i-\widehat\beta' z_i/\sigma_i-\zeta\big)\,\widehat G(d\zeta)} \tag{15}$$
for $L(a,\theta)=(a-\theta)^2$, where $\widehat\beta$ and $\widehat G$ can be solved by iterating
$$\widehat\beta=\mathop{\arg\max}_b \prod_{i=1}^n \int \varphi\big(y_i/\sigma_i-b' z_i/\sigma_i-\zeta\big)\,\widehat G(d\zeta),$$
$$\widehat G=\mathop{\arg\max}_{G\in\mathscr G} \prod_{i=1}^n \int \varphi\big(y_i/\sigma_i-\widehat\beta' z_i/\sigma_i-\zeta\big)\,G(d\zeta). \tag{16}$$
The WGMLEB maintains the link (8) between compound estimation and empirical
Bayes under the weighted compound risk $E_\theta\sum_{i=1}^n(\widehat\theta_i-\theta_i)^2/(\sigma_i^2 n)$.
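Given a GMLEB routine such as the hypothetical `gmleb()` sketched above, the WGMLEB reduces to a rescaling wrapper (again a sketch of ours, not the paper's code):

```python
import numpy as np

def wgmleb(y, Z, sigma, grid, n_outer=10):
    """WGMLEB (15)-(16) by rescaling to unit variance and reusing gmleb().

    After dividing each observation and covariate row by sigma_i, the model
    is homoscedastic with unit noise variance and a nominal prior on
    zeta_i = xi_i / sigma_i; undoing the scaling recovers theta-hat.
    """
    ys, Zs = y / sigma, Z / sigma[:, None]
    theta_scaled = gmleb(ys, Zs, np.ones_like(sigma), grid, n_outer)
    return sigma * theta_scaled
```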
4.2. Parametric empirical Bayes
A parametric empirical Bayes method, as defined in [13], approximates
$$t^*_\tau(y)=\mathop{\arg\min}_a \int L(a,\theta)\,f(y|\theta)\,dG(\theta|\tau), \tag{17}$$
where $\tau$ is an unknown parameter and $G(\cdot|\cdot)$ is a known family of distributions.
Brown [3] extended parametric empirical Bayes methods to the heteroscedastic
model
$$y_i=\xi_i+\varepsilon_i,\quad \xi_i\sim N(\mu,\tau^2),\quad \varepsilon_i\sim N(0,\sigma_i^2)$$
for the estimation of $\xi_i$ under the squared loss. Specifically, he considered the maximum
likelihood (ML) or method of moments (MM) estimates of $(\mu,\tau)$. Direct
extension of these methods to the partial linear model yields
$$\widehat\theta_i=\widehat\beta' z_i+\frac{\widehat\tau^2\,(y_i-\widehat\beta' z_i)}{\sigma_i^2+\widehat\tau^2}, \tag{18}$$
where $\widehat\beta$ and $\widehat\tau$ are computed by iteratively solving
$$\widehat\beta=\Big(\sum_{i=1}^n z_i z_i'/(\widehat\tau^2+\sigma_i^2)\Big)^{-1}\sum_{i=1}^n z_i y_i/(\widehat\tau^2+\sigma_i^2)$$
together with either
$$\sum_{i=1}^n (y_i-\widehat\beta' z_i)^2/(\sigma_i^2+\widehat\tau^2)^2=\sum_{i=1}^n 1/(\sigma_i^2+\widehat\tau^2) \quad\text{(ML)}$$
or
$$\widehat\tau^2=\Big(\sum_{i=1}^n (y_i-\widehat\beta' z_i)^2/(n-p)-\sum_{i=1}^n \sigma_i^2/n\Big)_+ \quad\text{(MM)}. \tag{19}$$
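A sketch of the MM variant of (18)-(19) follows (our illustration; the ML variant would replace the tau2 update with a numerical solve of its estimating equation):

```python
import numpy as np

def eb_mm(y, Z, sigma2, n_iter=50):
    """Parametric EB (18) with the method-of-moments tau^2 update in (19).

    Iterates the weighted regression for beta and the MM equation for tau^2,
    then returns theta-hat from the shrinkage rule (18).
    """
    n, p = Z.shape
    tau2 = float(np.var(y))                 # crude starting value (ours)
    for _ in range(n_iter):
        W = 1.0 / (tau2 + sigma2)
        beta = np.linalg.solve(Z.T @ (Z * W[:, None]), Z.T @ (y * W))
        resid2 = (y - Z @ beta) ** 2
        tau2 = max(resid2.sum() / (n - p) - sigma2.mean(), 0.0)   # (19), MM
    return Z @ beta + tau2 * (y - Z @ beta) / (sigma2 + tau2)     # (18)
```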
4.3. The James-Stein estimator
A distinct feature of the James-Stein estimator is its dominance of a standard (e.g.
minimum risk unbiased or invariant) estimator under the squared loss, although it
has an empirical Bayes interpretation [7, 8]. In this spirit, the name James-Stein
estimator is typically reserved for extensions that retain this dominance feature, to
explicitly allow their application to small samples.

In the homoscedastic partial linear model (4) with $\varepsilon_i\sim N(0,\sigma^2)$, one way of
achieving this dominance is to apply the James-Stein estimator separately to the
projection of $y$ to the linear span of the columns of $Z$ and to its orthogonal complement.

In the heteroscedastic partial linear model (4) with known $\sigma_i$, one may first scale
to unit variance and then apply the James-Stein estimator in the homoscedastic
partial linear model to achieve dominance of the naive estimator $y$ under the $\sigma_i^{-1}$-weighted
mean squared error. This was done in [3] for the common mean model
($z_i=1\ \forall i$). In partial linear models with general $z_i\in\mathbb{R}^p$, this James-Stein
estimator is
$$\widehat\theta=\Big(1-\frac{p-2}{\sum_{i=1}^n(\widehat\beta' z_i)^2/\sigma_i^2}\Big)_+ Z\widehat\beta+\Big(1-\frac{n-p-2}{\sum_{i=1}^n(y_i-\widehat\beta' z_i)^2/\sigma_i^2}\Big)_+\big(y-Z\widehat\beta\big) \tag{20}$$
with the WLSE $\widehat\beta=\big\{\sum_{i=1}^n z_i z_i'/\sigma_i^2\big\}^{-1}\sum_{i=1}^n z_i y_i/\sigma_i^2$. Brown and Zhao [5] proposed
further shrinkage of (20) to reduce the weighted mean squared error.
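Equation (20) translates directly into code; the sketch below (ours) includes both positive-part factors, although, as noted in Section 5, the experiments there omit the shrinkage of the regression component:

```python
import numpy as np

def james_stein_plm(y, Z, sigma2):
    """James-Stein estimator (20) in the heteroscedastic partial linear model.

    Computes the WLSE beta-hat and applies positive-part shrinkage to the
    fitted regression component and to the residual separately, with the
    known variances sigma2 supplying the weights.
    """
    n, p = Z.shape
    W = 1.0 / sigma2                                   # weights 1/sigma_i^2
    beta = np.linalg.solve(Z.T @ (Z * W[:, None]), Z.T @ (y * W))
    fit = Z @ beta
    s1 = max(1.0 - (p - 2) / np.sum(fit ** 2 * W), 0.0)
    s2 = max(1.0 - (n - p - 2) / np.sum((y - fit) ** 2 * W), 0.0)
    # Section 5 notes that with small p and large n neither the shrinkage of
    # fit nor the positive parts were used; set s1 = 1.0 to reproduce that.
    return s1 * fit + s2 * (y - fit)
```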
5. Empirical Bayes prediction with the baseball data
We report applications of the GMLEB in (13)-(14), the WGMLEB in (15)-(16) and the
James-Stein estimator in (20) to the 2005 baseball data described in Section 2, and
compare them with the LSE, WLSE and Brown's [3] results. The data map is given
in (6) and the predictors are
$$\widehat X_{2i}=\widehat\theta_i,\quad \widehat R_{2i}=\sin^2\big(\widehat\theta_i\big),\quad i\in S_1\cap S_2,$$
for any estimator $\widehat\theta$ under consideration. The models always include the intercept
and are given in parentheses, with (Pitcher*AB) denoting the model (Pitcher + AB +
interaction). For example, James-Stein(Null) means the estimator (20) with $z_i=1$
as in Brown [3]. Since $p$ is small and $n$ is large in all cases, the shrinkage of the
regression component $Z\widehat\beta$ and the positive part in (20) were not implemented with the
James-Stein estimator in our experiments.
Table 2 reports results based on data from all players with at least 11 at bats
in the first half season for the prediction of the batting averages of all players
with at least 11 at bats in both half seasons. Brown's [3] results are included in
the first block of rows as the parametric EB(MM) and EB(ML) in (18)-(19) with
$z_i=1$, a modified nonparametric empirical Bayes estimator (NPEB) of Brown
and Greenshtein [4], and a Bayes estimator with a harmonic prior (HP; [18]). We
refer to Brown [3] for a detailed description and further discussion of these methods.
In the next 5 blocks of rows, we report results in the (Null), (AB), (Pitcher),
(Pitcher+AB) and (Pitcher*AB) models for the LSE, WLSE, James-Stein, GMLEB
and WGMLEB methods.

Table 3 reports results for separate applications of the predictors to the nonpitcher
and pitcher groups. Again, Brown's [3] results are included in the first
block of rows, followed by our results in the (Null) and (AB) models.
Table 2
Midseason prediction for all batters $(|S_1|, |S_1\cap S_2|)=(567, 499)$

Method                     $\widehat{\mathrm{TSE}}$   $\widehat{\mathrm{TSE}}_R$   $\widehat{\mathrm{TWSE}}$
Naive                      1      1      1
EB(MM)                     0.593  0.606  0.626
EB(ML)                     0.902  0.925  0.607
NPEB                       0.508  0.509  0.560
HP                         0.884  0.905  0.600
LSE(Null)                  0.853  0.897  1.116
WLSE(Null)                 1.074  1.129  0.742
James-Stein(Null)          0.535  0.540  0.502
GMLEB(Null)                0.663  0.671  0.547
WGMLEB(Null)               0.306  0.298  0.427
LSE(AB)                    0.518  0.535  0.686
WLSE(AB)                   0.537  0.527  0.545
James-Stein(AB)            0.370  0.352  0.443
GMLEB(AB)                  0.410  0.397  0.455
WGMLEB(AB)                 0.301  0.291  0.424
LSE(Pitcher)               0.272  0.283  0.559
WLSE(Pitcher)              0.324  0.343  0.519
James-Stein(Pitcher)       0.243  0.244  0.427
GMLEB(Pitcher)             0.259  0.266  0.429
WGMLEB(Pitcher)            0.208  0.204  0.401
LSE(Pitcher+AB)            0.242  0.246  0.477
WLSE(Pitcher+AB)           0.219  0.215  0.435
James-Stein(Pitcher+AB)    0.184  0.175  0.391
GMLEB(Pitcher+AB)          0.191  0.183  0.387
WGMLEB(Pitcher+AB)         0.184  0.175  0.385
LSE(Pitcher*AB)            0.240  0.244  0.476
WLSE(Pitcher*AB)           0.204  0.201  0.429
James-Stein(Pitcher*AB)    0.171  0.162  0.386
GMLEB(Pitcher*AB)          0.178  0.170  0.382
WGMLEB(Pitcher*AB)         0.177  0.167  0.382
Table 3
Midseason prediction for nonpitchers and pitchers; $(|S_1|, |S_1\cap S_2|)=(486, 435)$ for nonpitchers
and $(|S_1|, |S_1\cap S_2|)=(81, 64)$ for pitchers

                     Nonpitchers                  Pitchers
Method               $\widehat{\mathrm{TSE}}$   $\widehat{\mathrm{TWSE}}$   $\widehat{\mathrm{TSE}}$   $\widehat{\mathrm{TWSE}}$
Naive                1      1          1      1
EB(MM)               0.387  0.494      0.129  0.191
EB(ML)               0.398  0.477      0.117  0.180
NPEB                 0.372  0.527      0.212  0.266
HP                   0.391  0.473      0.128  0.190
LSE(Null)            0.378  0.606      0.127  0.235
WLSE(Null)           0.468  0.561      0.127  0.234
James-Stein(Null)    0.348  0.469      0.165  0.202
GMLEB(Null)          0.378  0.465      0.134  0.178
WGMLEB(Null)         0.326  0.446      0.173  0.212
LSE(AB)              0.333  0.514      0.115  0.218
WLSE(AB)             0.290  0.465      0.087  0.182
James-Stein(AB)      0.262  0.436      0.142  0.177
GMLEB(AB)            0.257  0.415      0.111  0.154
WGMLEB(AB)           0.261  0.423      0.141  0.178
Take-away messages

The empirical Bayes in-season prediction results with the 2005 baseball data suggest
the following four messages:

I. Empirical Bayes methods may substantially improve upon the least squares
predictor even when the linear model assumption seems to hold well: Empirical
Bayes predictors outperform least squares predictors in all models, groups and
loss functions studied, with the exception of $\widehat{\mathrm{TSE}}$ for pitchers in Table 3. The
exception is probably due to the smaller sample size and the mismatch between the
loss function and the weighted empirical Bayes methods. The regression analysis in
Section 3 exhibits a satisfactory fit in both the mean and the variance.

II. Empirical Bayes methods may capture a great portion of the effects of missing
covariates in the linear model: The phenomenon is clear in comparisons between
the empirical Bayes predictors in smaller models and the least squares predictors
in larger models. It may have significant implications for dealing with latent effects
and for semilinear regression when a nonparametric component suffers the curse of
dimensionality.

III. The GMLEB is highly competitive with the James-Stein predictor for
moderately large samples: In fact, the WGMLEB outperforms the (also
weighted) James-Stein predictor in a great majority of combinations of models,
groups and loss functions.

IV. The heteroscedastic partial linear model is tricky: The unweighted GMLEB
may not handle the correlation between the mean and variance as well as
the weighted empirical Bayes methods do, as the (Null) model in Table 2 demonstrates.
Still, the unweighted GMLEB significantly outperforms both least squares methods
in that case.
References
[1] Brown, L. D. (1966). On the admissibility of invariant estimators of one or
more location parameters. Ann. Math. Statist. 37 1087–1136.
[2] Brown, L. D. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Statist. 42 855–903.
[3] Brown, L. D. (2008). In-season prediction of batting averages: A field test of empirical Bayes and Bayes methodologies. Ann. Appl. Statist. 2 113–152.
[4] Brown, L. D. and Greenshtein, E. (2009). Empirical Bayes and compound
decision approaches for estimation of a high dimensional vector of normal
means. Ann. Statist. 37 1685–1704.
[5] Brown, L. D. and Zhao, L. H. (2009). Estimators for Gaussian models having a block-wise structure. Statist. Sinica 19 885–903.
[6] Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over $\ell_p$-balls for $\ell_q$-error. Probab. Theory Related Fields 99 277–303.
[7] Efron, B. and Morris, C. (1972). Empirical Bayes on vector observations:
An extension of Stein’s method. Biometrika 59 335–347.
[8] Efron, B. and Morris, C. (1973). Combining possibly related estimation
problems (with discussion). J. Roy. Statist. Soc. Ser. B 35 379–421.
[9] Greenshtein, E. and Ritov, Y. (2009). Asymptotic efficiency of simple
decisions for the compound decision problem. In Optimality: The Third Erich
L. Lehmann Symposium (J. Rojo, ed.). IMS Lecture Notes—Monograph Series
57 266–275.
[10] James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. Fourth Berkeley Symp. Math. Statist. Probab. 1 361–379. Univ. California Press, Berkeley.
[11] Jiang, W. and Zhang, C.-H. (2009). General maximum likelihood empirical
Bayes estimation of normal means. Ann. Statist. 37 1647–1684.
[12] Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27 887–906.
[13] Morris, C. N. (1983). Parametric empirical Bayes inference: Theory and
applications. J. Amer. Statist. Assoc. 78 47–55.
[14] Robbins, H. (1951). Asymptotically subminimax solutions of compound statistical decision problems. In Proc. Second Berkeley Symp. Math. Statist. Probab. 1 131–148. Univ. California Press, Berkeley.
[15] Robbins, H. (1956). An empirical Bayes approach to statistics. In Proc. Third Berkeley Symp. Math. Statist. Probab. 1 157–163. Univ. California Press, Berkeley.
[16] Robbins, H. (1983). Some thoughts on empirical Bayes estimation. Ann. Statist. 11 713–723.
[17] Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proc. Third Berkeley Symp. Math. Statist. Probab. 1 197–206. Univ. California Press, Berkeley.
[18] Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Ann. Math. Statist. 42 385–388.
[19] Strawderman, W. E. (1973). Proper Bayes minimax estimators of the multivariate normal mean for the case of common unknown variances. Ann. Math. Statist. 44 1189–1194.
[20] Zhang, C.-H. (1997). Empirical Bayes and compound estimation of normal means. Statist. Sinica 7 181–193.
[21] Zhang, C.-H. (2003). Compound decision theory and empirical Bayes method. Ann. Statist. 31 379–390.
[22] Zhang, C.-H. (2009). Generalized maximum likelihood estimation of normal
mixture densities. Statist. Sinica 19 1297–1318.