Confidence in Correlation
doi:10.13140/RG.2.2.23673.49769
Gunnar Taraldsen
Department of Mathematical Sciences
Norwegian University of Science and Technology
May 18, 2021
Abstract
In 1895 Karl Pearson published his definition of the empirical correlation coefficient, but the idea of statistical correlation was anticipated substantially before this. Linear regression, and the associated correlation, is the principal statistical methodology in many applied sciences. We derive an explicit formula for the exact confidence density of the correlation. This can be used to replace the approximations currently in use.

Keywords: confidence distributions; fiducial inference; correlation coefficient; binormal distribution; Gaussian law
1 Introduction
The result of an experiment is given by four points with (x, y) coordinates (773, 727), (777, 735), (284, 286), and (519, 573). There are reasons a priori for assuming a linear relationship. This is further supported by Figure 1, and a high value for the coefficient of determination R² = 97.00%. The R² equals the square of the empirical correlation r = 98.49%. An approximate 95% one-sided confidence interval for the correlation ρ based on the Fisher (1921) z-transformation is [66.08, 100]%. Linear interpolation in the table presented by Fisher (1930, p. 534) gives an exact 95% confidence interval [67.42, 100]%.¹ Our Theorem 1, without linear interpolation, gives the true exact 95% confidence interval [67.39, 100]%.

Figure 1: A sample of size 4 with a regression line.

¹ 66.4037 + (71.6298 − 66.4037) · (98.4893 − 98.4298)/(98.7371 − 98.4298) = 67.4156.
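These numbers are easy to reproduce. The following is a minimal sketch in Python (assuming numpy and scipy are available; the variable names are ours, not from the paper) of the empirical correlation and the z-transform interval:

```python
# Sketch: reproduce r, R^2, and the approximate one-sided 95% interval
# from the Fisher z-transformation for the four data points above.
import numpy as np
from scipy import stats

x = np.array([773, 777, 284, 519])
y = np.array([727, 735, 286, 573])

r = np.corrcoef(x, y)[0, 1]                      # empirical correlation
print(f"r = {100 * r:.2f}%, R^2 = {100 * r**2:.2f}%")

n = len(x)
z_lo = np.arctanh(r) - stats.norm.ppf(0.95) / np.sqrt(n - 3)
print(f"z-transform 95% interval: [{100 * np.tanh(z_lo):.2f}, 100]%")
# prints about [66.08, 100]%
```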
The previous analysis is probably familiar to many readers, with the possible exception of the exact solution. Unfortunately, the exact solution by Fisher (1930)
seems to be essentially forgotten. It can, and should, be implemented in standard
statistical software and practice. The purpose of this paper is to explain the
necessary theory, and to expand on the analysis given by Fisher (1930). The main
result is an explicit formula for the exact confidence density for the correlation.
This can be seen as adding an important example to the theory of confidence
distributions as presented by Schweder and Hjort (2016). It can also be seen as
an important example of fiducial inference as formulated by Hannig et al. (2016)
and others.
Much has been written on the correlation coefficient. Major sources of inspiration for the presented proof are the above-mentioned references and the work of Hotelling (1953). Instead of giving a more thorough introduction we refer to Rodgers and Nicewander (1988) and Rovine and von Eye (1997), which give further references and several different interpretations of the correlation.
2 Theory
The correlation ρ between two random variables X and Y equals the cosine of the angle β between X − µ_X and Y − µ_Y in the Hilbert space of finite variance random variables. It is given by

$$\rho = \cos(\beta) = \frac{\langle X - \mu_X,\, Y - \mu_Y \rangle}{\sigma_X \sigma_Y} \qquad (1)$$

exactly as for the calculus definition for vectors in ℝ². The inner product ⟨X, Y⟩ = E(XY) = ∫ X(ω)Y(ω) P(dω) defines ‖X‖² = ⟨X, X⟩ and orthogonality X ⊥ Y by ⟨X, Y⟩ = 0. The mean µ_X and standard deviation σ_X of the random variable X equal the projection µ_X = ⟨1, X⟩ and the norm σ_X = ‖X − µ_X‖.
The reader may feel that the Hilbert space approach is unnecessarily abstract. It is, in fact, rather useful. The problem has now been reformulated into the problem of making inference regarding an angle between two vectors based on a random sample. The empirical correlation r for a random sample of size n is then, naturally, given by the cosine of the angle between the vectors (x_i − x̄), (y_i − ȳ) in ℝⁿ. This was the key geometrical idea when Fisher (1915) derived his explicit formula for the probability density of the empirical correlation. The Hilbert space approach is also very well described and motivated by Brockwell and Davis (1991) in their text on time series, where the correlation function of a process is the main tool.
The best linear predictor Ŷ of Y given X is the projection

$$\hat{Y} = \mu_Y + \rho\,\sigma_Y\,\frac{X - \mu_X}{\sigma_X} \qquad (2)$$

of Y onto the subspace spanned by the orthonormal basis {1, (X − µ_X)/σ_X}. Alternatively, equation (2) can be seen to correspond to the elementary definition of the cosine from a triangle in a plane. The angle in the triangle is given by the angle β between the vectors X − µ_X and Y − µ_Y spanning a two dimensional subspace. Equation (2) gives that the correlation can be interpreted as the slope of the best predictor line for standardized variables. This is possibly the most direct natural interpretation in applications. Furthermore, it can be generalized to give a similar interpretation for partial correlation.
If X and Y are jointly Gaussian, then Ŷ = E(Y | X), and the conditional law of Y given X = x is Gaussian. This gives the link between equation (2) and ordinary regression

$$y_i = a + b x_i + \sigma v_i \qquad (3)$$

where v_1, …, v_n is a random sample from the standard normal law. Comparison of equation (3) with equation (2) gives the constant term a = µ_Y − ρ µ_X σ_Y/σ_X, the slope b = ρ σ_Y/σ_X, and the conditional variance σ² = (1 − ρ²)σ_Y². The binormal is hence parameterized alternatively by (µ_X, σ_X, a, b, σ²). A random sample ((x_1, y_1), …, (x_n, y_n)) of size n from the binormal can be generated by equation (3)
where x_i = µ_X + σ_X u_i and u_1, …, u_n is a random sample from the standard normal law. Combining gives

$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix} + \begin{pmatrix} \sigma_X & 0 \\ \rho\,\sigma_Y & \sqrt{1-\rho^2}\,\sigma_Y \end{pmatrix} \cdot \begin{pmatrix} u_i \\ v_i \end{pmatrix} \qquad (4)$$
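Equation (4) translates line by line into a sampler. A minimal sketch (our illustration; the function name binormal_sample is an assumption, not from the paper):

```python
# Sketch of the data generating equation (4): x_i = mu_X + sigma_X u_i,
# y_i = mu_Y + rho sigma_Y u_i + sqrt(1 - rho^2) sigma_Y v_i.
import numpy as np

def binormal_sample(n, mu_x, mu_y, sigma_x, sigma_y, rho, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    u = rng.standard_normal(n)   # standard normal sample
    v = rng.standard_normal(n)   # independent standard normal sample
    x = mu_x + sigma_x * u
    y = mu_y + rho * sigma_y * u + np.sqrt(1 - rho**2) * sigma_y * v
    return x, y
```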
The data generating equation (4), and multivariate generalizations beyond Gaussian, is treated by Fraser (1968, 1979). This involves group actions and maximal invariants, and leads to optimal inference methods as demonstrated by Taraldsen and Lindqvist (2013). A particular consequence of equation (4), proved by Fraser (1964, p. 853), is

$$\frac{\sqrt{u}\,\rho}{\sqrt{1-\rho^2}} - \frac{\sqrt{v}\,r}{\sqrt{1-r^2}} = z \qquad (5)$$

where u ∼ χ²(ν), v ∼ χ²(ν − 1), z ∼ N(0, 1) are independent.
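Given observed r, equation (5) can be solved for ρ, which yields a direct Monte Carlo sampler for the resulting law. A hedged sketch, writing s = ρ/√(1 − ρ²) and t = r/√(1 − r²) (our notation, anticipating the proof below), illustrated with the data of the introduction:

```python
# Sketch: draw from the law of rho given r by solving equation (5),
# sqrt(u) s - sqrt(v) t = z, for s = rho / sqrt(1 - rho^2).
import numpy as np

def sample_rho(r, nu, size=100_000, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.chisquare(nu, size)
    v = rng.chisquare(nu - 1, size)
    z = rng.standard_normal(size)
    t = r / np.sqrt(1 - r**2)
    s = (np.sqrt(v) * t + z) / np.sqrt(u)
    return s / np.sqrt(1 + s**2)     # invert s back to rho

rho = sample_rho(r=0.9849, nu=3)
print(f"Monte Carlo 95% interval: [{100 * np.quantile(rho, 0.05):.1f}, 100]%")
# about [67.4, 100]%, in agreement with the introduction
```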
Equation (5) gives the law of ρ when r is known. The degrees of freedom are ν = n − 1 for sample size n. With known mean, ν = n, and r is the cosine of the angle between the vectors (x_i − µ_X), (y_i − µ_Y) in ℝⁿ. The following result holds for any real ν > 1 as a consequence of equation (5).
Theorem 1. Let r be the empirical correlation of a random sample of size n from the binormal. The confidence density for the correlation ρ is

$$\pi(\rho \mid r, \nu) = \frac{\nu(\nu-1)\,\Gamma(\nu-1)}{\sqrt{2\pi}\,\Gamma(\nu+\tfrac{1}{2})}\,(1-r^2)^{\frac{\nu-1}{2}} \cdot (1-\rho^2)^{\frac{\nu-2}{2}} \cdot (1-r\rho)^{\frac{1-2\nu}{2}}\, F\!\left(\tfrac{3}{2}, -\tfrac{1}{2};\, \nu+\tfrac{1}{2};\, \tfrac{1+r\rho}{2}\right)$$

where F is the Gaussian hypergeometric function and ν = n − 1 > 1.
Proof. The idea is that equation (5) gives the conditional densities, and hence the marginal densities after integration over u, v. This integration is done by a change of variables resulting in a gamma integral and the above density. The details are as follows.

Write s = ρ/√(1 − ρ²) and t = r/√(1 − r²), so that equation (5) reads √u s − √v t = z. The conditional density of s given u, v is normal by equation (5) with (s | u, v) ∼ N(√(v/u) t, 1/u). Using this, the law of u, v, and ds = (1 − ρ²)^(−3/2) dρ give the joint density of ρ, u, v as

$$(1-\rho^2)^{-3/2} \cdot \frac{u^{\frac{\nu}{2}-1} e^{-\frac{u}{2}}}{2^{\frac{\nu}{2}}\,\Gamma(\frac{\nu}{2})} \cdot \frac{v^{\frac{\nu-1}{2}-1} e^{-\frac{v}{2}}}{2^{\frac{\nu-1}{2}}\,\Gamma(\frac{\nu-1}{2})} \cdot \sqrt{\frac{u}{2\pi}}\; e^{-\frac{u}{2}\left(s - \sqrt{\frac{v}{u}}\,t\right)^2} \qquad (6)$$
The terms in the exponential are

$$-\frac{1}{2}\left[\frac{u}{1-\rho^2} - \frac{2\sqrt{uv}\,\rho r}{\sqrt{(1-\rho^2)(1-r^2)}} + \frac{v}{1-r^2}\right] = -\frac{\nu\left(s_1^2 - 2 s_1 s_2 r\rho + s_2^2\right)}{2(1-r^2)} \qquad (7)$$

using new coordinates (s₁, s₂) defined by νs₁² = u(1 − r²)/(1 − ρ²) and νs₂² = v.
Let s₁ = √α exp(−β/2) and s₂ = √α exp(β/2). The density for ρ, α, β from equation (6) is

$$\frac{2^{1-\nu}\,\nu^{\nu}}{\sqrt{\pi}\,\Gamma(\frac{\nu}{2})\,\Gamma(\frac{\nu-1}{2})}\,(1-r^2)^{-\frac{\nu+1}{2}}\,(1-\rho^2)^{\frac{\nu-2}{2}}\, e^{-\beta}\,\alpha^{\nu-1}\, e^{-\frac{\nu\alpha(\cosh(\beta)-\rho r)}{1-r^2}} \qquad (8)$$

Integration over α gives π(ρ | r, ν) using the identity π(ν − 2)! = √π 2^(ν−2) Γ(ν/2) Γ((ν−1)/2) and adjusting an integral representation of F (Olver et al., 2010, 14.3.9, 14.12.4).
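The density of Theorem 1 can be evaluated and inverted numerically. A sketch using scipy's implementation of the Gaussian hypergeometric function (our transcription of the formula; the helper names confidence_density and exact_interval are ours):

```python
# Sketch: the exact confidence density of Theorem 1 and its quantiles.
import numpy as np
from scipy.special import gamma, hyp2f1
from scipy.integrate import quad
from scipy.optimize import brentq

def confidence_density(rho, r, nu):
    c = nu * (nu - 1) * gamma(nu - 1) / (np.sqrt(2 * np.pi) * gamma(nu + 0.5))
    return (c * (1 - r**2) ** ((nu - 1) / 2)
              * (1 - rho**2) ** ((nu - 2) / 2)
              * (1 - r * rho) ** ((1 - 2 * nu) / 2)
              * hyp2f1(1.5, -0.5, nu + 0.5, (1 + r * rho) / 2))

def exact_interval(r, nu, p):
    # p-quantile of the confidence distribution by root finding on the CDF
    cdf = lambda t: quad(confidence_density, -1, t, args=(r, nu))[0]
    return brentq(lambda t: cdf(t) - p, -0.9999, 0.9999)

print(f"[{100 * exact_interval(0.9849, 3, 0.05):.2f}, 100]%")
# about [67.39, 100]%, the exact interval of the introduction
```

Integrating confidence_density over (−1, 1) should return 1, which gives a quick sanity check of the transcription.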
The Fisher (1921) z-transformation argument implies

$$\frac{1}{2}\ln\!\left(\frac{1+\rho}{1-\rho}\right) - \frac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right) \approx \frac{z}{\sqrt{\nu-2}} \qquad (9)$$

Replacing equation (5) with this gives the z-transform approximate confidence density

$$\tilde{\pi}(\rho \mid r, \nu) = \sqrt{\frac{\nu-2}{2\pi}}\,(1-\rho^2)^{-1}\, \exp\!\left(-\frac{\nu-2}{8}\left[\ln\!\left(\frac{(1+\rho)(1-r)}{(1-\rho)(1+r)}\right)\right]^2\right) \qquad (10)$$

for ν > 2.
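Equation (10) is equally direct to evaluate; a companion sketch to confidence_density above (again our transcription, not code from the paper):

```python
# Sketch: the z-transform approximate confidence density of equation (10).
import numpy as np

def z_transform_density(rho, r, nu):
    q = np.log((1 + rho) * (1 - r) / ((1 - rho) * (1 + r)))
    return (np.sqrt((nu - 2) / (2 * np.pi)) / (1 - rho**2)
            * np.exp(-(nu - 2) / 8 * q**2))
```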
The very last formula in the classical book by Fisher (1973) gives

$$\pi(\rho \mid r, \nu) = \frac{(1-r^2)^{\frac{\nu-1}{2}} \cdot (1-\rho^2)^{\frac{\nu-2}{2}}}{\pi\,(\nu-2)!}\; \partial_{\rho r}^{\,\nu-2}\, \frac{\theta - \frac{1}{2}\sin 2\theta}{\sin^3\theta} \qquad (11)$$

where cos θ = −ρr and 0 < θ < π. This formula was derived by C. R. Rao.
3 Examples

Fisher (1930, p. 534) considers the case with an observed correlation r = 99% from a sample of size n = 4. Fisher, relying on calculations of Miss F. E. Allan, states that the corresponding 5% value of ρ is about 76.5%. Using equation (5) on these ρ and r values confirms this.
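A quick Monte Carlo confirmation, assuming the sample_rho sketch from Section 2 is in scope:

```python
# Sketch: Monte Carlo check of Fisher's 5% point for r = 99%, n = 4.
rho = sample_rho(r=0.99, nu=3, size=1_000_000)
print(f"5% point: {100 * np.quantile(rho, 0.05):.1f}%")   # about 76.5%
```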
Consider next the data presented in Figure 1. Theorem 1 applied to these data confirms the calculations giving the stated confidence intervals. More complete information is given by the confidence densities shown in Figure 2. The empirical correlation is r = 0.9849. The exact confidence density in Figure 2 illustrates the uncertainty corresponding to all possible confidence intervals at all possible confidence levels.
Figure 4: The confidence density and the z-transform density.
Figure 3 shows the cd4 counts for 20 HIV-positive subjects (Efron, 1998, p. 101). The x-axis gives the baseline count and the y-axis gives the count after one year of treatment with an experimental antiviral drug. The empirical correlation is r = 0.7232, and the equitail z-transform 90% approximate confidence interval is [47.41, 86.51]%. Figure 4 shows the closeness of the confidence density and the z-transform density. The exact equitail 90% confidence interval from Theorem 1 is [46.54, 85.74]%. It is shifted to the left, as can also be inferred from Figure 4.
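Assuming the exact_interval sketch from Section 2 is in scope, this interval is reproduced by:

```python
# Sketch: exact equitail 90% interval for the cd4 data, r = 0.7232, nu = 19.
lo = exact_interval(0.7232, 19, 0.05)
hi = exact_interval(0.7232, 19, 0.95)
print(f"exact 90% interval: [{100 * lo:.2f}, {100 * hi:.2f}]%")
# about [46.54, 85.74]%
```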
As a final example, consider certain pairs of measurements with r = 0.534 taken on n = 8 children at age 24 months in connection with a study at a university hospital in Hong Kong. Figure 5 shows again the closeness of the confidence density and the z-transform density. Schweder and Hjort (2016, p. 227, Figure 7.8) discuss this example in much more detail, including different bootstrap approaches. Using the method of Fisher (1930) they arrive at the same plot of the confidence density using the exact distribution for the empirical correlation. This provides additional verification of the exact result in Theorem 1.
Figure 5: The confidence density and the z-transform density.
4 Conclusion
The z-transform has been historically convenient since confidence intervals can be calculated directly from tables of the standard normal distribution. Today, there seems to be little reason for using it instead of the exact result in Theorem 1, since the hypergeometric function is implemented in standard numerical libraries.
References
Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods.
Springer Series in Statistics. New York: Springer-Verlag.
DiCiccio, T. J. and B. Efron (1996). Bootstrap Confidence Intervals. Statistical
Science 11 (3), 189–212.
Efron, B. (1998). R. A. Fisher in the 21st century (Invited paper presented at the
1996 R. A. Fisher Lecture). Statistical Science 13 (2), 95–122.
Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10, 507–521.
Fisher, R. A. (1921). On the ’probable error’ of a coefficient of correlation deduced
from a small sample. Metron 1 (4), 1–32.
Fisher, R. A. (1930). Inverse probability. Proc. Camb. Phil. Soc. 26, 528–535.
Fisher, R. A. (1973). Statistical Methods and Scientific Inference. Hafner Press.
Fraser, D. A. S. (1964). On the definition of fiducial probability. Bull. Int. Statist. Inst. 40, 842–856.
Fraser, D. A. S. (1968). The Structure of Inference. John Wiley.
Fraser, D. A. S. (1979). Inference and Linear Models. McGraw-Hill.
Hannig, J., H. Iyer, R. C. S. Lai, and T. C. M. Lee (2016). Generalized Fiducial
Inference: A Review and New Results. Journal of the American Statistical
Association 111 (515), 1346–1361.
Hotelling, H. (1953). New Light on the Correlation Coefficient and its Transforms. Journal of the Royal Statistical Society. Series B (Methodological) 15 (2), 193–232.
Olver, F. W. J., D. W. Lozier, R. F. Boisvert, and C. W. Clark (Eds.) (2010).
NIST Handbook of Mathematical Functions. Cambridge University Press.
Rodgers, J. L. and W. A. Nicewander (1988). Thirteen Ways to Look at the
Correlation Coefficient. The American Statistician 42 (1), 59–66.
Rovine, M. J. and A. von Eye (1997). A 14th Way to Look at a Correlation Coefficient: Correlation as the Proportion of Matches. The American Statistician 51 (1), 42–46.
Schweder, T. and N. L. Hjort (2016). Confidence, Likelihood, Probability: Statistical Inference with Confidence Distributions. Cambridge University Press.
Taraldsen, G. and B. H. Lindqvist (2013). Fiducial theory and optimal inference.
Annals of Statistics 41 (1), 323–341.