Technical Report

The Relationships between Variances of Normal Distribution and Sample Median: Sample size from 200 to 1000

Authors:
  • Ji-Tong CO. LTD

Abstract

This report finds the function obtained by regressing k(n), the variance of the standardized sample median, on the sample size n, for n from 200 to 1000.
Mei-Yu Lee
CEO of Ji-Tong Co., LTD., Taipei, Taiwan
Graduate Institute of Management, Minghsin University of Science and Technology, Hsinchu, Taiwan
1 Introduction
Statistics focuses mainly on inference about the parameters of the normal distribution and seldom discusses the sampling distribution of the median. Yet the sample median appears often in economic and financial indicators, for example, the median individual wage reported in national statistics.
Discussion of the sampling distribution of the sample median goes back to Hojo and Pearson (1931), who established the distribution of the median and quartiles in random samples from a normal distribution. Nair (1940) constructed a table of confidence intervals for the median. Bassett (2016) studied median stable distributions, and Bassett (2019) reviewed them and described how they approach the normal distribution in the limit, building on the functional equation of Schröder (1870). Despite this literature, the distribution of the median remains largely unexplored. Because the variance and distribution of the sample median under the normal distribution have not been expressed in a simple form, this paper generates the variance of the sample median by simulation and examines its relationship with the variance of the normal distribution. The result lets users obtain the variance of the median directly from the variance of the normal distribution.
I found that the variance of the sample median equals the variance of the normal distribution multiplied by a ratio that depends on the sample size. Moreover, the median is computed differently depending on whether the sample size is odd or even, so the analysis of this relationship is also divided into odd and even cases.
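The odd/even distinction comes from how the median is defined. The report does not spell out its convention, so this sketch assumes the usual one. When n is odd, the median is the middle order statistic. When n is even, it is the average of the two middle order statistics. A minimal Python sketch, with the hypothetical name `sample_median`:

```python
def sample_median(values):
    """Return the sample median, handling odd and even sample sizes."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle order statistic.
        return xs[mid]
    # Even n: the average of the two middle order statistics.
    return (xs[mid - 1] + xs[mid]) / 2


print(sample_median([3.0, 1.0, 2.0]))       # 2.0
print(sample_median([4.0, 1.0, 3.0, 2.0]))  # 2.5
```

Under this convention an even-n median averages two order statistics. That averaging is one reason the even and odd cases can behave differently.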
2 Benchmark Model Setting
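Let
$$X \sim \mathrm{Normal}(\mu, \sigma^2),$$
where μ is the mean and σ² is the variance, and the pdf is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$
where $-\infty < x < \infty$, $-\infty < \mu < \infty$ and $\sigma > 0$. Let $Z = (X - \mu)/\sigma \sim \mathrm{Normal}(0, 1)$, whose pdf is
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right), \quad -\infty < z < \infty.$$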
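The two densities are linked by $f_X(x) = f_Z((x-\mu)/\sigma)/\sigma$. Here is a short Python sketch of both (the function names are hypothetical):

```python
import math


def std_normal_pdf(z):
    """Standard normal density f_Z(z) = exp(-z^2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_pdf(x, mu, sigma):
    """Normal(mu, sigma^2) density via the standardization Z = (X - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return std_normal_pdf((x - mu) / sigma) / sigma


print(normal_pdf(0.0, 0.0, 1.0))  # about 0.39894, i.e. 1 / sqrt(2*pi)
```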
If $Z_1, Z_2, \ldots, Z_n \overset{\text{iid}}{\sim} \mathrm{Normal}(0, 1)$, then the descriptive statistics of the simulated Z values are as shown in Table 1.
Table 1. Descriptive statistics of the simulated Z values
Statistic          Value
Arithmetic mean    0.00044
Geometric mean     none
Harmonic mean      none
Variance           1.00015
S.D.               1.00007
Skewness coef.     0.00035
Kurtosis coef.     3.00160
MAD                0.79790
Range              10.82439
Mid-range          -0.09182
Median             0.00038
Q1                 -0.67428
Q2                 0.00038
Q3                 0.67457
IQR                1.34885
C.V.               none
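Table 1 does not state how its statistics were computed. The sketch below shows one way to produce comparable statistics from simulated Z values. It rests on these assumptions:

- variance uses the n − 1 divisor;
- kurtosis is non-excess, so a normal sample gives about 3;
- MAD is the mean absolute deviation from the mean (Table 1's 0.79790 is close to √(2/π) ≈ 0.7979, the value of E|Z| for Normal(0, 1));
- quartiles use NumPy's default linear interpolation.

The sample size of one million and the seed are arbitrary. The geometric mean, harmonic mean and C.V. are omitted because Table 1 reports them as none.

```python
import numpy as np


def describe(z):
    """Descriptive statistics of the kind listed in Table 1 (illustrative sketch)."""
    z = np.asarray(z, dtype=float)
    mean = z.mean()
    centered = z - mean
    m2 = np.mean(centered**2)  # second central moment
    q1, q2, q3 = np.percentile(z, [25, 50, 75])
    return {
        "mean": mean,
        "variance": z.var(ddof=1),
        "sd": z.std(ddof=1),
        "skewness": np.mean(centered**3) / m2**1.5,
        "kurtosis": np.mean(centered**4) / m2**2,  # non-excess kurtosis
        "mad": np.mean(np.abs(centered)),  # mean absolute deviation
        "range": z.max() - z.min(),
        "mid_range": (z.max() + z.min()) / 2,
        "median": np.median(z),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "iqr": q3 - q1,
    }


rng = np.random.default_rng(seed=1)
stats = describe(rng.standard_normal(1_000_000))
for name, value in stats.items():
    print(f"{name:10s} {value: .5f}")
```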
Denote $W_3 = \mathrm{Median}(Z_1, Z_2, \ldots, Z_n)$, where $Z_1, Z_2, \ldots, Z_n$ are the i.i.d. Normal(0, 1) random variables above.
W3 is computed from a random sample of size n. Let $\mathrm{Var}(W_3) = k(n)$, where k(n) is a function of n. If the population is normal with mean μ and variance σ², write $X_i = \mu + \sigma Z_i$. Because σ > 0, the sample median satisfies $\mathrm{Median}(X_1, \ldots, X_n) = \mu + \sigma W_3$, and $E(W_3) = 0$ by the symmetry of Z. Hence the expected value of the median of a random sample of size n is μ, and its variance is $\sigma^2 k(n)$. The main question is how to obtain k(n) as n changes.
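The identity Median(X_1, …, X_n) = μ + σW3 can be checked numerically. This NumPy sketch draws the same standard normal samples on both scales. It then compares Var(median of X)/σ² with Var(W3). The sample size, replication count, μ, σ and seed are hypothetical illustration values.

As a rough check, the sketch also prints π/(2n). This is the classical large-sample approximation to k(n), a known result rather than part of this report.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n, reps = 201, 10_000
mu, sigma = 5.0, 3.0  # hypothetical population parameters for illustration

z = rng.standard_normal((reps, n))  # each row is one sample Z_1, ..., Z_n
x = mu + sigma * z  # the same samples on the X scale

w3 = np.median(z, axis=1)  # one W3 per replication
med_x = np.median(x, axis=1)  # sample median of X for the same rows

k_hat = w3.var(ddof=1)  # Monte Carlo estimate of k(n)
print(k_hat, med_x.var(ddof=1) / sigma**2)  # agree up to floating-point rounding
print(med_x.mean())  # close to mu
print(np.pi / (2 * n))  # large-sample approximation, for comparison only
```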
2.1 Simulation process
The simulation combines mathematical and numerical analysis, as follows.
Step 1. Simulate the Z distribution and obtain $Z_1, Z_2, \ldots, Z_n$.
The first step is to simulate the Z distribution, the standardized form of a normal distribution with expected value μ and variance σ². This gives n i.i.d. random variables from the Z distribution, that is, a random sample of size n.
Step 2. Run 10 billion replications, sorting the n random samples in each one, and obtain 10 billion values of the median and the variance of the median.
Each replication of n random samples gives one value of the median. So 10 billion replications generate 10 billion values of the median, from which the variance of the median is calculated.
Step 3. Calculate W3 and obtain 10 billion values of W3.
Step 4. Repeat Steps 1 to 3 for each value of n from 200 to 1000.
The same procedure can be applied to n from 2 to 1000. This step reveals the relationship between Var(W3) and n.
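Steps 1 to 4 can be sketched in NumPy at a much smaller scale than the report's 10 billion replications. This hypothetical version rests on illustrative assumptions: the helper `estimate_k`, the replication count, the chunk size and the list of sample sizes. `np.median` uses partial sorting rather than a full sort, but it yields the same median. Replications are processed in chunks so that memory stays bounded.

```python
import numpy as np


def estimate_k(n, reps, rng, chunk=5_000):
    """Monte Carlo estimate of k(n) = Var(W3) from `reps` samples of size n."""
    medians = np.empty(reps)
    done = 0
    while done < reps:
        m = min(chunk, reps - done)
        z = rng.standard_normal((m, n))  # Step 1: m samples of size n
        medians[done:done + m] = np.median(z, axis=1)  # Steps 2-3: one W3 per sample
        done += m
    return medians.var(ddof=1)  # variance of the simulated medians


rng = np.random.default_rng(seed=3)
sizes = [200, 201, 400, 401, 1000]  # hypothetical subset of the n grid
k_by_n = {n: estimate_k(n, reps=20_000, rng=rng) for n in sizes}  # Step 4
for n, k in k_by_n.items():
    print(n, k, n * k)  # n * k(n) should hover near 1.56 to 1.57
```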
Step 5. Estimate k(n) = Var(W3) as a function of n.
The function is estimated by nonlinear regression: 37 nonlinear models are fitted, and the model with the highest R² is chosen.
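The report does not list its 37 candidate models. The sketch below illustrates Step 5 with three hypothetical candidate forms, each linear in its parameters after transforming n. Each is fitted by ordinary least squares with `np.linalg.lstsq` and ranked by R². This is not the author's model set; genuinely nonlinear models would need an iterative fitter instead.

```python
import numpy as np


def fit_linear_in_params(features, y):
    """Least-squares fit y ~ features @ beta; returns (beta, R^2)."""
    beta, *_ = np.linalg.lstsq(features, y, rcond=None)
    resid = y - features @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2


def select_model(n, k):
    """Fit hypothetical candidate forms for k(n) and return the one with the highest R^2."""
    n = np.asarray(n, dtype=float)
    k = np.asarray(k, dtype=float)
    ones = np.ones_like(n)
    candidates = {
        "a + b/n": np.column_stack([ones, 1.0 / n]),
        "a + b/sqrt(n)": np.column_stack([ones, 1.0 / np.sqrt(n)]),
        "a + b*log(n)": np.column_stack([ones, np.log(n)]),
    }
    fits = {name: fit_linear_in_params(X, k) for name, X in candidates.items()}
    best = max(fits, key=lambda name: fits[name][1])
    return best, fits[best]


# Noise-free values from the all-data fit in Section 3.1, used only to check recovery.
n_grid = np.arange(200, 1001, 20)
k_grid = 0.0000117288 + 1.5636672124 / n_grid
best, (beta, r2) = select_model(n_grid, k_grid)
print(best, beta, r2)
```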
3 Results
3.1 The non-linear model analysis of all data
Let X = n be the sample size and Y = Var((sample median − μ)/σ) = k(n). Using all sample sizes, the estimated function is
k(n) = 0.0000117288 + 1.5636672124 / n.
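A small helper turns the all-data fit into a variance for the sample median of a Normal(μ, σ²) sample, since that variance is σ²k(n). The function names are hypothetical, and the fit should only be used for 200 ≤ n ≤ 1000. For comparison, the sketch also prints π/(2n), the classical large-sample approximation to k(n). That approximation is a known result rather than part of this report.

```python
import math

A_ALL, B_ALL = 0.0000117288, 1.5636672124  # coefficients from the all-data fit


def k_all(n):
    """Fitted k(n) from the all-data regression; intended for 200 <= n <= 1000."""
    return A_ALL + B_ALL / n


def median_variance(n, sigma):
    """Approximate Var(sample median) for a Normal(mu, sigma^2) sample of size n."""
    return sigma**2 * k_all(n)


print(k_all(400))  # about 0.003921
print(math.pi / (2 * 400))  # large-sample value pi/(2n), about 0.003927
print(median_variance(400, 2.0))  # about 0.015684
```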
Table 2. ANOVA result of all data case
Model: Y = 0.0000117288 + 1.5636672124 × (1/X), X = the sample size, Y = Var((sample median − μ)/σ)
H0: slope = 0
ANOVA
Source       df   SS             MS             F
Regression    1   0.0001206036   0.0001206036   3082895.0849418547
Error        39   0.0000000015   0.0000000000
Total        40   0.0001206051
Test statistic = 3082895.084942, p-value = 0.000000
R² = 0.999987, R²(adj) = 0.999987, MSE = 0.000000
Table 2 shows the ANOVA result. R² and adjusted R² are both 99.9987%. However, because the sample median is computed differently for odd and even n, the data should be divided into odd and even cases.
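The quantities in Table 2 follow the standard ANOVA decomposition for a straight-line regression of Y on 1/X. For m data points, SST = SSR + SSE, with degrees of freedom 1, m − 2 and m − 1. Then F = SSR / MSE and R² = SSR / SST.

The sketch below computes these quantities with NumPy. The report does not give its raw k(n) estimates, so the usage lines build hypothetical noisy data on 41 sample sizes, matching Table 2's total df of 40. The coefficients and noise level are illustrative only.

```python
import numpy as np


def anova_simple_regression(x, y):
    """ANOVA quantities for the straight-line fit y = a + b*x (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(y)
    b, a = np.polyfit(x, y, deg=1)  # slope, intercept
    fitted = a + b * x
    sst = np.sum((y - y.mean()) ** 2)  # total SS, df = m - 1
    sse = np.sum((y - fitted) ** 2)  # error SS, df = m - 2
    ssr = sst - sse  # regression SS, df = 1
    mse = sse / (m - 2)
    r2 = ssr / sst
    return {
        "a": a, "b": b, "SSR": ssr, "SSE": sse, "SST": sst, "MSE": mse,
        "F": ssr / mse,  # test statistic for H0: slope = 0
        "R2": r2,
        "R2_adj": 1 - (1 - r2) * (m - 1) / (m - 2),
        "df": (1, m - 2, m - 1),
    }


rng = np.random.default_rng(seed=4)
n = np.arange(200, 1001, 20)  # 41 hypothetical sample sizes
k = 1.2e-5 + 1.56 / n + rng.normal(0.0, 6e-6, size=n.size)  # hypothetical noisy k(n)
table = anova_simple_regression(1.0 / n, k)  # regress k on 1/n, as in Table 2
print(table)
```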
3.2 n is even
Restricting the regression to even n gives
k(n) = 0.0000148965 + 1.5599936862 / n.
Table 3 shows the ANOVA result for the even case. R² is 99.9997% and adjusted R² is 99.9996%, a better fit than the all-data case.
Table 3. ANOVA result of even number case
Model: Y = 0.0000148965 + 1.5599936862 × (1/X), X = the sample size, Y = Var((sample median − μ)/σ)
H0: slope = 0
ANOVA
Source       df   SS             MS             F
Regression    1   0.0000616260   0.0000616260   5451166.0203510961
Error        19   0.0000000002   0.0000000000
Total        20   0.0000616262
Test statistic = 5451166.020351, p-value = 0.000000
R² = 0.999997, R²(adj) = 0.999996, MSE = 0.000000
3.3 n is odd
Restricting the regression to odd n gives
k(n) = 0.0000084608 + 1.5674001064 / n.
Table 4 shows the ANOVA result for the odd case. R² and adjusted R² are both 99.9999%, a better fit than the all-data case.
Table 4. ANOVA result of odd number case
Model: Y = 0.0000084608 + 1.5674001064 × (1/X), X = the sample size, Y = Var((sample median − μ)/σ)
H0: slope = 0
ANOVA
Source       df   SS             MS             F
Regression    1   0.0000589142   0.0000589142   15562232.8413353510
Error        18   0.0000000001   0.0000000000
Total        19   0.0000589143
Test statistic = 15562232.841335, p-value = 0.000000
R² = 0.999999, R²(adj) = 0.999999, MSE = 0.000000
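The even and odd fits can be combined into one function that picks coefficients by the parity of n. This is a hypothetical helper; the names `k_parity` and `median_sd` are illustrative. It refuses n outside 200 to 1000 because the regressions were fitted only on that range.

```python
# Coefficients (intercept, slope on 1/n) reported in Sections 3.2 and 3.3.
COEFFS = {
    "even": (0.0000148965, 1.5599936862),
    "odd": (0.0000084608, 1.5674001064),
}


def k_parity(n):
    """Fitted k(n) using the even- or odd-n regression; valid for 200 <= n <= 1000."""
    if not 200 <= n <= 1000:
        raise ValueError("the fitted models cover only 200 <= n <= 1000")
    a, b = COEFFS["even" if n % 2 == 0 else "odd"]
    return a + b / n


def median_sd(n, sigma):
    """Approximate standard deviation of the sample median: sigma * sqrt(k(n))."""
    return sigma * k_parity(n) ** 0.5


print(k_parity(500), k_parity(501))
print(median_sd(500, 2.0))
```

Raising an error outside the fitted range is a deliberate choice: extrapolating a + b/n below n = 200 is not supported by the reported fits.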
References
Hojo, T., & Pearson, K. (1931). Distribution of the Median, Quartiles and Interquartile Distance in Samples From a Normal Population. Biometrika, 23(3/4), 315-363.
Nair, K. R. (1940). Table of confidence interval for the median in samples from any continuous population. Sankhyā: The Indian Journal of Statistics, 4(4), 551-558.
Bassett, G. (2016). Median Stable Distributions. In R. Y. Liu & J. W. McKean (Eds.), Robust Rank-Based and Nonparametric Methods (Vol. 168, pp. 249-260). Springer.
Bassett, G. W. (2019). Review of Median Stable Distributions and Schröder's Equation. Journal of Econometrics, 213, 289-295.
Schröder, E. (1870). Ueber iterirte Functionen. Mathematische Annalen, 3, 296-322.