All content in this area was uploaded by Mei-Yu Lee on Sep 01, 2020


The Relationships between Variances of Normal Distribution and Sample Median: Sample size from 200 to 1000

Mei-Yu Lee

CEO of Ji-Tong Co., LTD., Taipei, Taiwan

Graduate Institute of Management, Minghsin University of Science and Technology, Hsinchu, Taiwan

1 Introduction

Statistics is mainly concerned with inference for the parameters of the Normal distribution and seldom mentions the sampling distribution of the median. Yet the sample median appears often in economic and financial indicators, for example the median of individual wages in national statistics.

The study of the sampling distribution of the median can be traced back to Hojo and Pearson (1931), who established the distribution of the median and quantiles of random samples from the Normal distribution. Nair (1940) established a table of confidence intervals for the median. Bassett (2016) tried to find the stable distribution of the median. Bassett (2019) reviewed this work and described the limiting behavior in which the median stable distribution approaches the normal distribution, building on Schröder (1870), who defined the functional equation of the median stable distribution. Despite this literature, the distribution of the median remains almost unexplored territory. Because the variance and distribution of the median have not yet been established, this paper takes the normal distribution as its basis, generates the variance of the median by simulation, and examines the association between the variance of the median and the variance parameter of the Normal distribution. This research can help users obtain the variance of the median directly from the variance of the normal distribution.

I found that the ratio of the variance of the median to the variance of the Normal distribution depends on the sample size. In addition, the median is calculated differently depending on whether the sample size is odd or even, so the association must also be examined separately for the odd and even cases.

2 Benchmarked Model Setting

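Let $X \sim \mathrm{Normal}(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance, and the p.d.f. is

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

where $-\infty < x < \infty$, $-\infty < \mu < \infty$ and $\sigma > 0$. Let $Z = (X - \mu)/\sigma \sim \mathrm{Normal}(0, 1)$, whose p.d.f. is

$$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right), \quad -\infty < z < \infty.$$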

If $Z_1, Z_2, \ldots, Z_n \overset{iid}{\sim} \mathrm{Normal}(0, 1)$, then for each Z, its p.d.f., distribution function (d.f.) and coefficients are as shown in Table 1.

Table 1. Z p.d.f. and d.f., and coefficients

Coefficient          Value
Mathematical Mean     0.00044
Geometrical Mean      none
Harmonic Mean         none
Variance              1.00015
S.D.                  1.00007
Skewed Coef.          0.00035
Kurtosis Coef.        3.00160
MAD                   0.79790
Range                10.82439
Mid_range            -0.09182
Median                0.00038
Q1                   -0.67428
Q2                    0.00038
Q3                    0.67457
IQR                   1.34885
C.V.                  none

Denote W3 = Median($Z_1, Z_2, \ldots, Z_n$), where $Z_1, Z_2, \ldots, Z_n$ are random variables, so that W3 is formed from n random samples. Let Var(W3) = k(n), where k(n) is a function of n. If the population is the normal distribution with mean $\mu$ and variance $\sigma^2$, the expected value of the sample median of n random samples is $\mu$, and the variance of the sample median is $\sigma^2 k(n)$. The main question is how to obtain k(n) as n changes.
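The scaling by $\sigma^2$ follows from the fact that the median commutes with increasing affine maps. The short derivation below is a sketch of that step; the symbols $X_i$ and $M_X$ are notation introduced here for illustration and do not appear in the original text.

```latex
% X_i = \mu + \sigma Z_i with \sigma > 0 is increasing, so it preserves the ordering of the sample.
\begin{align*}
M_X &= \operatorname{Median}(X_1,\ldots,X_n)
     = \mu + \sigma\,\operatorname{Median}(Z_1,\ldots,Z_n)
     = \mu + \sigma W_3,\\
E(M_X) &= \mu + \sigma\,E(W_3) = \mu
     \quad \text{(since } E(W_3) = 0 \text{ by the symmetry of } Z\text{)},\\
\operatorname{Var}(M_X) &= \sigma^2\,\operatorname{Var}(W_3) = \sigma^2\,k(n).
\end{align*}
```

This holds for both odd and even n, because the average of the two middle order statistics is also preserved by the map.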

2.1 Simulation process

The simulation process combines mathematical analysis and numerical analysis, as follows.

Step 1. Simulate the Z distribution and obtain $Z_1, Z_2, \ldots, Z_n$.

The first step is to simulate the Z distribution, that is, the standardized form of a normal distribution with expected value $\mu$ and variance $\sigma^2$. This gives n i.i.d. random variables from the Z distribution, which constitute n random samples.

Step 2. Run 10 billion replications and sort the n random samples in each replication. Then obtain 10 billion values of the median and the variance of the median.

Each replication of n random samples yields one value of the median, so 10 billion replications generate 10 billion values of the median; the variance of the median is computed across these values.

Step 3. Calculate W3 and obtain 10 billion values of W3.

Step 4. Repeat Step 1 to Step 3 for each value of n, where n ranges from 200 to 1000.

Varying n from 200 to 1000 helps us find the relationship between W3 and n.

Step 5. Calculate the variance of W3 as a function of n.

The estimation uses nonlinear regression to fit 37 nonlinear models. We choose the best model, namely the one with the highest value of $R^2$.
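The steps above can be sketched in a few lines of Python. This is a scaled-down illustration under stated assumptions, not the software used for the paper: the replication count (2000 instead of 10 billion), the sample sizes, the seed, and the function names `simulate_k` and `fit_inverse_n` are all hypothetical choices made here, and only the single model form k(n) = a + b/n is fitted (by ordinary least squares on 1/n) rather than the 37 candidate models.

```python
import random
import statistics


def simulate_k(n, reps, rng):
    """Estimate k(n) = Var(median of n i.i.d. Normal(0, 1) draws) by Monte Carlo."""
    medians = []
    for _ in range(reps):
        # Steps 1-3: draw n standard normal values and take their median.
        # statistics.median sorts internally and handles odd and even n.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        medians.append(statistics.median(sample))
    # The variance is taken across replications, not within one sample.
    return statistics.pvariance(medians)


def fit_inverse_n(ns, ks):
    """Least-squares fit of k = a + b * (1 / n); returns (a, b)."""
    xs = [1.0 / n for n in ns]
    x_bar = sum(xs) / len(xs)
    k_bar = sum(ks) / len(ks)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (k - k_bar) for x, k in zip(xs, ks))
    b = sxy / sxx
    a = k_bar - b * x_bar
    return a, b


rng = random.Random(2020)          # fixed seed, an arbitrary choice
sample_sizes = [21, 41, 81, 161]   # illustrative sizes, far below 200 to 1000
# Step 4: repeat the simulation for each sample size.
k_hat = [simulate_k(n, reps=2000, rng=rng) for n in sample_sizes]
# Step 5: estimate the variance of the median as a function of n.
intercept, slope = fit_inverse_n(sample_sizes, k_hat)
print(intercept, slope)
```

With so few replications the estimates are noisy, so the fitted coefficients should be expected to differ from those reported in Section 3.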

3 Result

3.1 The non-linear model analysis of all data

Let X1 = n be the sample size and Y1 = Var((sample median-mu)/sigma) = k(n). Using all sample sizes, the estimated function is

k(n) = 0.0000117288 + 1.5636672124 / n.
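As a rough cross-check, the fitted slope can be compared with the classical large-sample approximation for the variance of a sample median, $1/(4 n f(m)^2)$, where $f$ is the population density and $m$ the population median. This approximation is a standard asymptotic result and is not part of the simulation described here; it is evaluated below for the standard normal density, written $f_Z$, with $m = 0$.

```latex
\begin{align*}
\operatorname{Var}(W_3) \approx \frac{1}{4n\,f_Z(0)^2}
  = \frac{1}{4n \cdot \frac{1}{2\pi}}
  = \frac{\pi}{2n}
  \approx \frac{1.5708}{n}.
\end{align*}
```

The fitted slope 1.5636672124 lies within about 0.5% of $\pi/2$.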

Table 2. ANOVA result of all data case

Y = 0.0000117288 + 1.5636672124 * 1/X, X = the sample size, Y = Var((sample median-mu)/sigma), H0: slope = 0

Source       df   SS             MS             F
Regression    1   0.0001206036   0.0001206036   3082895.0849418547
Error        39   0.0000000015   0.0000000000
Total        40   0.0001206051

H0: slope = 0, test statistic = 3082895.084942, p value = 0.000000
R2 = 0.999987, R2(adj) = 0.999987, MSE = 0.000000

Table 2 is the ANOVA result. $R^2$ is 99.9987% and adjusted $R^2$ is 99.9987%. However, because the median is calculated differently for odd and even sample sizes, the analysis should be divided into odd and even cases.

3.2 n is even

Restricting the regression to the even sample sizes, we obtain

k(n) = 0.0000148965 + 1.5599936862 / n.

Table 3 is the ANOVA result of the even number case. $R^2$ is 99.9997% and adjusted $R^2$ is 99.9996%. The fit is better than in the all-data case.

Table 3. ANOVA result of even number case

Y = 0.0000148965 + 1.5599936862 * 1/X, X = the sample size, Y = Var((sample median-mu)/sigma), H0: slope = 0

Source       df   SS             MS             F
Regression    1   0.0000616260   0.0000616260   5451166.0203510961
Error        19   0.0000000002   0.0000000000
Total        20   0.0000616262

H0: slope = 0, test statistic = 5451166.020351, p value = 0.000000
R2 = 0.999997, R2(adj) = 0.999996, MSE = 0.000000


3.3 n is odd

Restricting the regression to the odd sample sizes, we obtain

k(n) = 0.0000084608 + 1.5674001064 / n.

Table 4 is the ANOVA result of the odd number case. $R^2$ is 99.9999% and adjusted $R^2$ is 99.9999%. The fit is better than in the all-data case.

Table 4. ANOVA result of odd number case

Y = 0.0000084608 + 1.5674001064 * 1/X, X = the sample size, Y = Var((sample median-mu)/sigma), H0: slope = 0

Source       df   SS             MS             F
Regression    1   0.0000589142   0.0000589142   15562232.8413353510
Error        18   0.0000000001   0.0000000000
Total        19   0.0000589143

H0: slope = 0, test statistic = 15562232.841335, p value = 0.000000
R2 = 0.999999, R2(adj) = 0.999999, MSE = 0.000000
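To illustrate how the fitted functions might be used in practice, the following minimal sketch turns a population variance and a sample size into an approximate variance of the sample median. The coefficients are copied from Sections 3.1 to 3.3; the names `k_fitted` and `median_variance`, the `by_parity` switch, and the choice to reject sample sizes outside 200 to 1000 are assumptions made here for illustration.

```python
# Fitted (intercept, slope) pairs for k(n) = a + b / n, copied from Sections 3.1-3.3.
COEF_ALL = (0.0000117288, 1.5636672124)
COEF_EVEN = (0.0000148965, 1.5599936862)
COEF_ODD = (0.0000084608, 1.5674001064)


def k_fitted(n, by_parity=True):
    """Return the fitted k(n); use the odd/even fits unless by_parity is False."""
    # The regressions were estimated on n from 200 to 1000, so other
    # sample sizes are rejected instead of being extrapolated.
    if not 200 <= n <= 1000:
        raise ValueError("the fits cover sample sizes from 200 to 1000 only")
    if by_parity:
        a, b = COEF_EVEN if n % 2 == 0 else COEF_ODD
    else:
        a, b = COEF_ALL
    return a + b / n


def median_variance(sigma2, n, by_parity=True):
    """Approximate Var(sample median) = sigma^2 * k(n) for a Normal population."""
    return sigma2 * k_fitted(n, by_parity)


# Example: population variance 4.0, sample sizes 500 (even) and 501 (odd).
print(median_variance(4.0, 500))
print(median_variance(4.0, 501))
```

Rejecting out-of-range sample sizes is a conservative design choice; a caller who accepts extrapolation could relax that check.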

References

Hojo, T., Pearson, K. (1931). Distribution of the Median, Quartiles and Interquartile Distance in Samples From a Normal Population. Biometrika, 23(3/4), 315-363.

Nair, K.R. (1940). Table of confidence interval for the median in samples from any continuous population. Sankhyā: The Indian Journal of Statistics, 4(4), 551-558.

Bassett, G. (2016). Median Stable Distributions. In: Liu, Regina Y., and Joseph W. McKean, eds., Robust Rank-Based and Nonparametric Methods, Vol. 168. Springer, 249-260.

Bassett, G.W. (2019). Review of Median Stable Distributions and Schröder's Equation. Journal of Econometrics, 213, 289-295.

Schröder, E. (1870). Ueber iterirte Functionen. Mathematische Annalen, 3, 296-322.
