Journal of Applied Statistical Science ISSN 1067-5817

Volume 12, Number 3, pp. 169-177 ©2003 Nova Science Publishers, Inc.

________________________________________________

1Department of Statistics, Yarmouk University, Irbid, Jordan

QUANTILE INTERVAL ESTIMATION OF FINITE POPULATION USING MULTIVARIATE AUXILIARY INFORMATION

M S Ahmed, Walid Abu-Dayyeh1 and Hani Samawi

Department of Mathematics & Statistics

Sultan Qaboos University, Muscat, Oman

(e-mail: m_s_ahmed@yahoo.com)

SUMMARY

This paper derives a class of methods for estimating confidence intervals for finite population quantiles using multivariate auxiliary information. The proposed methods perform better than the method of Rueda and Arcos (1998). A simulation study is carried out on a real data set to make a relative comparison.

Key words and Phrases: Finite population estimation, Multivariate ratio estimation, distribution function,

quantiles, confidence interval, multivariate auxiliary variables

AMS 1991 Subject Classifications: 62D05

1. INTRODUCTION

Estimators of finite population quantiles obtained by inverting improved estimates of the distribution function, using a single auxiliary variable, have been suggested by Chambers and Dunstan (1986) and Rao et al. (1990). Using multivariate auxiliary information, Rueda and Arcos (1998) proposed a method for interval estimation of finite population quantiles. The motivation for these types of estimators has been explained in detail by Mak and Kuk (1993). In this paper, we propose a generalized class of methods for finding confidence intervals for quantiles.

Suppose $U = \{1, 2, \ldots, N\}$ is a finite population of size $N$, and a random sample $s$ of size $n$ is drawn according to a sampling plan $p(s)$ with inclusion probabilities $\pi_i$.

The population cumulative distribution function of a random variable $Y$ is

$$F_Y(y) = N^{-1} \sum_{i \in U} \Delta(y - y_i) \tag{1.1}$$

where $\Delta(y - y_i) = 1$ when $y_i \le y$, and $\Delta(y - y_i) = 0$ otherwise.


The $\beta$th quantile, denoted by $Y_\beta$ for any $0 < \beta < 1$, is defined as

$$Y_\beta = \inf\{y : F_Y(y) \ge \beta\} \tag{1.2}$$

Let $Y$ and $X$ be the survey and auxiliary variables, respectively, where $X$ is known for the whole population. The design-based empirical cumulative distribution function is

$$\hat F_Y(y) = \sum_{i \in s} \pi_i^{-1} \Delta(y - y_i) \Big/ \sum_{i \in s} \pi_i^{-1} \tag{1.3}$$

where $\Delta(y - y_i) = 1$ when $y_i \le y$, and $\Delta(y - y_i) = 0$ otherwise. The inverse of $\hat F_Y$, or the empirical quantile function $\hat F_Y^{-1}$, is then defined as

$$\hat F_Y^{-1}(\beta) = \inf\{y : \hat F_Y(y) \ge \beta\} = y_\beta \tag{1.4}$$

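It is noted that $\hat F_Y(y_\beta) \approx \beta$ and $Y_\beta \approx \hat F_Y^{-1}\{\hat F_Y(Y_\beta)\}$, so that estimating $Y_\beta$ is equivalent to estimating the unobservable random variable $\hat F_Y(Y_\beta)$, that is, the proportion of sample units with $y_i \le Y_\beta$.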
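The design-based distribution function estimate (1.3) and its inverse (1.4) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `weighted_ecdf` and `weighted_quantile` and the toy data are assumptions made here, not part of the paper.

```python
import numpy as np

def weighted_ecdf(y_sample, pi, y):
    """Estimate F_Y(y) as in (1.3): ratio of inverse-probability-weighted sums."""
    y_sample = np.asarray(y_sample, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)          # weights pi_i^{-1}
    return float(np.sum(w * (y_sample <= y)) / np.sum(w))

def weighted_quantile(y_sample, pi, beta):
    """Empirical quantile as in (1.4): smallest sample value y with F_hat(y) >= beta."""
    y_sample = np.asarray(y_sample, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)
    order = np.argsort(y_sample)
    cum = np.cumsum(w[order]) / np.sum(w)          # cumulative weight share in sorted order
    idx = int(np.searchsorted(cum, beta, side="left"))  # first position with cum >= beta
    idx = min(idx, len(cum) - 1)
    return float(y_sample[order][idx])

# Toy sample with equal inclusion probabilities (hypothetical values).
y_s = [12.0, 5.0, 9.0, 20.0, 7.0]
pi = [0.1] * 5
print(weighted_ecdf(y_s, pi, 9.0))
print(weighted_quantile(y_s, pi, 0.5))
```

With equal inclusion probabilities the weights cancel and the estimate reduces to the ordinary sample proportion, which is the case treated later under simple random sampling.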

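The design-based empirical cumulative distribution function for the ratio-type estimator is defined as

$$\hat F_R(y) = \frac{\hat F_Y(y)}{\hat F_X(x)}\, F_X(x) = \beta\, \frac{\hat F_Y(y)}{\hat F_X(x)} \tag{1.5}$$

where $x$ is the $\beta$th population quantile of $X$, so that $F_X(x) = \beta$.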

2. THE CLASS OF ESTIMATORS AND ITS PROPERTIES

Suppose $Y$ and $X_j$ ($j = 1, 2, \ldots, k$) are the survey variable and $k$ auxiliary variables, respectively. The purpose is to find a confidence interval for the quantile $Y_\beta$, for any given $\beta$, by using the $k$ auxiliary variables. We observe $(x_i, y_i)$ for $i \in s$ and assume that the finite population distribution functions of the $X_j$ ($j = 1, 2, \ldots, k$) are known.

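For large $n$, $\hat F_Y(y)$ is approximately normally distributed, so that

$$P\{c_1 \le \hat F_Y(y) \le c_2\} = 1 - \alpha \tag{2.1}$$

where $c_1 = \beta - z_{\alpha/2}\sqrt{V(\hat F_Y(y))}$ and $c_2 = \beta + z_{\alpha/2}\sqrt{V(\hat F_Y(y))}$.

Hence, a $100(1-\alpha)\%$ confidence interval for $Y_\beta$ is given by

$$\left[\hat F_Y^{-1}(c_1),\ \hat F_Y^{-1}(c_2)\right] \tag{2.2}$$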
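As a rough illustration of the interval (2.2), the sketch below inverts the sample distribution function at $c_1$ and $c_2$. It assumes simple random sampling without replacement with $V(\hat F_Y) = (n^{-1} - N^{-1})\beta(1-\beta)$, the variance form used in Section 3, and $z = 1.96$ for a 95% level; the helper names and the toy data are hypothetical.

```python
import math
import numpy as np

def ecdf_inverse(y_sample, p):
    """inf{y : F_hat(y) >= p} for an equal-weight sample; p is clipped to [1/n, 1]."""
    ys = np.sort(np.asarray(y_sample, dtype=float))
    n = len(ys)
    p = min(max(p, 1.0 / n), 1.0)
    k = math.ceil(p * n - 1e-12)       # smallest k with k/n >= p (tolerance for rounding)
    return float(ys[k - 1])

def quantile_ci(y_sample, N, beta, z=1.96):
    """Interval [F_hat^{-1}(c1), F_hat^{-1}(c2)] with c1, c2 as in (2.1)."""
    n = len(y_sample)
    var = (1.0 / n - 1.0 / N) * beta * (1.0 - beta)   # assumed SRSWOR variance
    c1 = beta - z * math.sqrt(var)
    c2 = beta + z * math.sqrt(var)
    return ecdf_inverse(y_sample, c1), ecdf_inverse(y_sample, c2)

# Toy sample of n = 50 values from a population of N = 340 (hypothetical data).
sample = list(range(1, 51))
print(quantile_ci(sample, N=340, beta=0.5))
```

The interval limits are sample order statistics, so the interval length depends directly on how far $c_1$ and $c_2$ lie from $\beta$, that is, on the variance of the distribution function estimator.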

If a single auxiliary variable $X$ is correlated with $Y$, we propose the following class of estimators

$$\hat F_c(y) = \hat F_Y(y) + h\,[F_X(x) - \hat F_X(x)] = \hat F_Y(y) - h\,[\hat F_X(x) - \beta] \tag{2.3}$$

where $h$ is a suitably chosen statistic and $\beta = F_X(x)$.

If $h = \hat F_Y(y)/\hat F_X(x)$, then

$$\hat F_c(y) = \frac{\hat F_Y(y)}{\hat F_X(x)}\, F_X(x)$$

is a ratio-type estimator; if $h = -\hat F_Y(y)/F_X(x)$, then

$$\hat F_c(y) = \hat F_Y(y)\, \frac{\hat F_X(x)}{F_X(x)}$$

is a product-type estimator; and if

$$h = b = \frac{\operatorname{Cov}[\hat F_Y(y), \hat F_X(x)]}{V[\hat F_X(x)]},$$

then

$$\hat F_c(y) = \hat F_Y(y) + b\,[F_X(x) - \hat F_X(x)] = \hat F_Y(y) - b\,[\hat F_X(x) - \beta]$$

is a regression-type estimator.

Now,

$$P\{d_1 \le \hat F_c(y) \le d_2\} = 1 - \alpha \tag{2.4}$$

where $d_1 = \beta - z_{\alpha/2}\sqrt{V(\hat F_c(y))}$ and $d_2 = \beta + z_{\alpha/2}\sqrt{V(\hat F_c(y))}$.

For large $n$, we assume that $H = E(h)$ or $H \approx E(h)$; then

$$V[\hat F_c(y)] = V[\hat F_Y(y)] + H^2\, V[\hat F_X(x)] - 2H \operatorname{Cov}[\hat F_Y(y), \hat F_X(x)].$$

The optimum choice is $H = b$, and the minimum variance is

$$V_m[\hat F_c(y)] = V[\hat F_Y(y)] - b^2\, V[\hat F_X(x)].$$
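The optimum value of $H$ follows from a one-line minimization of the variance, which is quadratic in $H$. The shorthand $V_Y = V[\hat F_Y(y)]$, $V_X = V[\hat F_X(x)]$ and $C = \operatorname{Cov}[\hat F_Y(y), \hat F_X(x)]$ is introduced here only for this sketch:

```latex
\begin{aligned}
V(H) &= V_Y + H^2 V_X - 2 H C, \\
\frac{dV}{dH} &= 2 H V_X - 2 C = 0 \;\Longrightarrow\; H = \frac{C}{V_X} = b, \\
V(b) &= V_Y + b^2 V_X - 2 b\,(b V_X) = V_Y - b^2 V_X .
\end{aligned}
```

Since $V_X > 0$, the second derivative $2V_X$ is positive, so $H = b$ is a minimum.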

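Hence $P\{d_1 \le (\hat F_Y(y) - h\,[\hat F_X(x) - \beta]) \le d_2\} = 1 - \alpha$, and a $100(1-\alpha)\%$ confidence interval for $Y_\beta$ is given by

$$\left[\hat F_Y^{-1}\big(d_1 + h\,[\hat F_X(x) - \beta]\big),\ \hat F_Y^{-1}\big(d_2 + h\,[\hat F_X(x) - \beta]\big)\right] \tag{2.5}$$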

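For $k$ auxiliary variables $X_j$ ($j = 1, 2, \ldots, k$), we propose the following class of estimators of $F_Y(y)$:

$$\hat F_{ml}(y) = \hat F_Y(y) - \sum_{j=1}^{k} t_j\,[\hat F_{X_j}(x_j) - \beta] = \hat F_Y(y) - t'z \tag{2.6}$$

where $z = \{z_j\}$, $z_j = (\hat F_{X_j}(x_j) - \beta)$ and $t = (t_1, t_2, \ldots, t_k)'$.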

Now, for large $n$, we assume $T = E(t)$ or $T \approx E(t)$; the approximate variance of $\hat F_{ml}(y)$ is

$$V[\hat F_{ml}(y)] = v_0 - 2T'm + T'MT \tag{2.7}$$

where $v_0 = V[\hat F_Y(y)]$, $M = (M_{jj'})$ is the dispersion matrix with $M_{jj'} = \operatorname{Cov}[\hat F_{X_j}(x_j), \hat F_{X_{j'}}(x_{j'})]$, and $m = (m_j)$ with $m_j = \operatorname{Cov}[\hat F_Y(y), \hat F_{X_j}(x_j)]$, for $j, j' = 1, 2, \ldots, k$.

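Then

$$P\Big\{e_1 \le \Big(\hat F_Y(y) - \sum_{j=1}^{k} t_j\,[\hat F_{X_j}(x_j) - \beta]\Big) \le e_2\Big\} = 1 - \alpha$$

and a $100(1-\alpha)\%$ confidence interval for $Y_\beta$ is

$$\left[\hat F_Y^{-1}\Big(e_1 + \sum_{j=1}^{k} t_j\,[\hat F_{X_j}(x_j) - \beta]\Big),\ \hat F_Y^{-1}\Big(e_2 + \sum_{j=1}^{k} t_j\,[\hat F_{X_j}(x_j) - \beta]\Big)\right] \tag{2.8}$$

where $e_1 = \beta - z_{\alpha/2}\sqrt{V(\hat F_{ml}(y))}$ and $e_2 = \beta + z_{\alpha/2}\sqrt{V(\hat F_{ml}(y))}$.

The optimum choice of $T$ is $T_{opt} = M^{-1}m$, and the minimum variance is

$$V_m(\hat F_{ml}(y)) = v_0 - m'M^{-1}m \tag{2.9}$$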
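The optimum in (2.9) amounts to a linear solve. The following NumPy sketch uses made-up values of $v_0$, $m$ and $M$ (illustrative only, not taken from the paper) to check that $T = M^{-1}m$ attains the variance $v_0 - m'M^{-1}m$ in (2.7) and that another choice of $T$ does no better; the function names are assumptions of this sketch.

```python
import numpy as np

def variance_ml(v0, m, M, T):
    """Approximate variance (2.7): v0 - 2 T'm + T'MT."""
    m, M, T = (np.asarray(a, dtype=float) for a in (m, M, T))
    return float(v0 - 2.0 * T @ m + T @ M @ T)

def optimum_T(v0, m, M):
    """Return T_opt = M^{-1} m and the minimum variance v0 - m' M^{-1} m of (2.9)."""
    m = np.asarray(m, dtype=float)
    M = np.asarray(M, dtype=float)
    T_opt = np.linalg.solve(M, m)      # solves M T = m without forming M^{-1}
    v_min = float(v0 - m @ T_opt)
    return T_opt, v_min

# Hypothetical inputs for k = 2 auxiliary variables.
v0 = 1.0
M = [[1.0, 0.5], [0.5, 1.0]]
m = [0.6, 0.4]

T_opt, v_min = optimum_T(v0, m, M)
print(T_opt, v_min)
print(variance_ml(v0, m, M, T_opt))        # equals v_min up to rounding
print(variance_ml(v0, m, M, [0.5, 0.5]))   # a non-optimal T gives a larger value
```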

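Rueda and Arcos (1998) suggested the estimation procedure

$$\hat F_{mR}(y) = \sum_{j=1}^{k} \omega_j \big(\hat F_Y(y)/\hat F_{X_j}(x_j)\big) F_{X_j}(x_j) = \sum_{j=1}^{k} \omega_j \hat F_{R_j}(y) = \omega' \hat F_R \tag{2.10}$$

where $\hat F_R = (\hat F_{R_1}, \hat F_{R_2}, \ldots, \hat F_{R_k})'$ and $\omega = (\omega_1, \omega_2, \ldots, \omega_k)'$, such that $\omega'e = 1$, $e = (1, 1, \ldots, 1)'$. Now, the variance of $\hat F_{mR}(y)$ is given by

$$V(\hat F_{mR}(y)) = \omega' \Sigma\, \omega \tag{2.11}$$

where $\Sigma = (\sigma_{jj'})$ is the dispersion matrix with $\sigma_{jj'} = \operatorname{Cov}(\hat F_{R_j}, \hat F_{R_{j'}})$ for $j, j' = 1, 2, \ldots, k$.

The optimum choice of $\omega$ is $\omega_{opt} = \Sigma^{-1}e/(e'\Sigma^{-1}e)$, and the minimum variance is

$$V_{\min}(\hat F_{mR}(y)) = (e'\Sigma^{-1}e)^{-1} \tag{2.12}$$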

It is noted that the variance function of Rueda and Arcos (1998) is not correct.

Under the normal approximation,

$$P\{c_1 \le \hat F_{mR}(y) \le c_2\} = 1 - \alpha,$$

where $c_1 = \beta - z_{\alpha/2}\sqrt{V(\hat F_{mR}(y))}$ and $c_2 = \beta + z_{\alpha/2}\sqrt{V(\hat F_{mR}(y))}$.

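Rewriting (2.10) with $F_{X_j}(x_j) = \beta$,

$$\hat F_{mR}(y) = \sum_{j=1}^{k} \omega_j \hat F_{R_j}(y) = \beta\, \hat F_Y(y) \sum_{j=1}^{k} \omega_j / \hat F_{X_j}(x_j)$$

and

$$P\Big\{c_1 \Big(\beta \sum_{j=1}^{k} \omega_j / \hat F_{X_j}(x_j)\Big)^{-1} \le \hat F_Y(y) \le c_2 \Big(\beta \sum_{j=1}^{k} \omega_j / \hat F_{X_j}(x_j)\Big)^{-1}\Big\} = 1 - \alpha.$$

Hence an approximate $100(1-\alpha)\%$ confidence interval for $Y_\beta$ is given by

$$\left[\hat F_Y^{-1}\Big\{c_1 \Big(\beta \sum_{j=1}^{k} \omega_j / \hat F_{X_j}(x_j)\Big)^{-1}\Big\},\ \hat F_Y^{-1}\Big\{c_2 \Big(\beta \sum_{j=1}^{k} \omega_j / \hat F_{X_j}(x_j)\Big)^{-1}\Big\}\right] \tag{2.13}$$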

Note that the estimator of Rueda and Arcos (1998) is a particular case of (2.6).

Another method is given here as

$$\hat F_{mw}(y) = \sum_{j=1}^{k} \lambda_j \hat F_{R_j}(y) + \lambda_0 \hat F_Y(y), \quad \text{such that } \sum_{j=0}^{k} \lambda_j = 1. \tag{2.14}$$

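Rewriting,

$$\hat F_{mw}(y) = \hat F_Y(y) - \sum_{j=1}^{k} \lambda_j \big(\hat F_Y(y) - \hat F_{R_j}(y)\big) = \hat F_Y(y) - \lambda'u \tag{2.15}$$

where $\hat F_Y(y) - \hat F_{R_j}(y) = u_j$, $u = (u_1, u_2, \ldots, u_k)'$ and $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k)'$.

Now, the variance of $\hat F_{mw}(y)$ is

$$V(\hat F_{mw}(y)) = v_0 - 2\lambda'm + \lambda'M\lambda \tag{2.16}$$

which is the same as (2.7) for $T = \lambda$, with minimum variance (2.9).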

Under the normal approximation,

$$P\{d_1 \le \hat F_{mw}(y) \le d_2\} = 1 - \alpha,$$

where $d_1 = \beta - z_{\alpha/2}\sqrt{V(\hat F_{mw}(y))}$ and $d_2 = \beta + z_{\alpha/2}\sqrt{V(\hat F_{mw}(y))}$.

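Then

$$P\Big\{d_1 \Big(\lambda_0 + \beta \sum_{j=1}^{k} \lambda_j / \hat F_{X_j}(x_j)\Big)^{-1} \le \hat F_Y(y) \le d_2 \Big(\lambda_0 + \beta \sum_{j=1}^{k} \lambda_j / \hat F_{X_j}(x_j)\Big)^{-1}\Big\} = 1 - \alpha.$$

Hence an approximate $100(1-\alpha)\%$ confidence interval for $Y_\beta$ is

$$\left[\hat F_Y^{-1}\Big\{d_1 \Big(\lambda_0 + \beta \sum_{j=1}^{k} \lambda_j / \hat F_{X_j}(x_j)\Big)^{-1}\Big\},\ \hat F_Y^{-1}\Big\{d_2 \Big(\lambda_0 + \beta \sum_{j=1}^{k} \lambda_j / \hat F_{X_j}(x_j)\Big)^{-1}\Big\}\right] \tag{2.17}$$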

which is the desired improved interval estimator. Tankou and Dharmadhikari (1989) proved that (2.12) is always greater than (2.9). So, the interval (2.8) under the optimum choice of $T$ will always be shorter than (2.13).

3. RESULTS WITH TWO AUXILIARY VARIABLES

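Suppose a simple random sample (without replacement) of size $n$ is drawn from a finite population of size $N$; then the unbiased estimate of $F_V(v)$ is

$$\hat F_V(v) = \sum_{i \in s} \Delta(v - v_i)/n.$$

Then, when $v$ is the $\beta$th population quantile, $E(\hat F_V(v)) = \beta$ and $V(\hat F_V(v)) = (n^{-1} - N^{-1})\,\beta(1-\beta)$, for $v = y$ and $x_j$, $j = 1, 2, \ldots, k$.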

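Suppose $P_{jj'}$ is the proportion of population units $i$ with $x_{ij} \le x_j$ and $x_{ij'} \le x_{j'}$, and $P_{yj}$ is the proportion with $y_i \le y$ and $x_{ij} \le x_j$; then $P_{jj'} = \beta$ if $j = j'$. Further, writing $t_0 = \hat F_Y(y)$ and $t_j = \hat F_{X_j}(x_j)$ in this section,

$$M_{jj'} = \operatorname{Cov}(t_j, t_{j'}) = \operatorname{Cov}(\hat F_{X_j}(x_j), \hat F_{X_{j'}}(x_{j'})) = (n^{-1} - N^{-1})(P_{jj'} - \beta^2),$$

$$m_j = \operatorname{Cov}(t_j, t_0) = \operatorname{Cov}(\hat F_{X_j}(x_j), \hat F_Y(y)) = (n^{-1} - N^{-1})(P_{yj} - \beta^2),$$

$$\sigma_{jj'} = \operatorname{Cov}(\hat F_{R_j}, \hat F_{R_{j'}}) = \operatorname{Var}(\hat F_Y(y)) - \operatorname{Cov}\big(\hat F_Y(y),\ \hat F_{X_j}(x_j) + \hat F_{X_{j'}}(x_{j'})\big) + \operatorname{Cov}\big(\hat F_{X_j}(x_j), \hat F_{X_{j'}}(x_{j'})\big),$$

$$\sigma_{jj'} = \operatorname{Cov}(\hat F_{R_j}, \hat F_{R_{j'}}) = (n^{-1} - N^{-1})(\beta - P_{yj} - P_{yj'} + P_{jj'}),$$

$$\sigma_{jj} = V(\hat F_{R_j}) = 2(n^{-1} - N^{-1})(\beta - P_{yj}),$$

$$\rho_{jj'} = \operatorname{Corr}(\hat F_{X_j}(x_j), \hat F_{X_{j'}}(x_{j'})) = (P_{jj'} - \beta^2)/[\beta(1-\beta)],$$

$$\rho_{yj} = \operatorname{Corr}(\hat F_Y(y), \hat F_{X_j}(x_j)) = (P_{yj} - \beta^2)/[\beta(1-\beta)],$$

$$M_{jj'} = (n^{-1} - N^{-1})\,\beta(1-\beta)\,\rho_{jj'}, \qquad m_j = (n^{-1} - N^{-1})\,\beta(1-\beta)\,\rho_{yj},$$

$$\sigma_{jj'} = (n^{-1} - N^{-1})\,\beta(1-\beta)\,(1 - \rho_{yj} - \rho_{yj'} + \rho_{jj'}),$$

$$\sigma_{jj} = V(\hat F_{R_j}) = 2(n^{-1} - N^{-1})\,\beta(1-\beta)\,(1 - \rho_{yj}).$$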

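Now, detailed results are given for two auxiliary variables. For $k = 2$, we have

$$\Sigma = (n^{-1} - N^{-1})\,\beta(1-\beta) \begin{pmatrix} 2(1 - \rho_{y1}) & 1 - \rho_{y1} - \rho_{y2} + \rho_{12} \\ 1 - \rho_{y1} - \rho_{y2} + \rho_{12} & 2(1 - \rho_{y2}) \end{pmatrix},$$

$$\omega_{opt} = \Sigma^{-1}e/(e'\Sigma^{-1}e) = \begin{pmatrix} \omega_{01} \\ \omega_{02} \end{pmatrix} = \frac{1}{2(1 - \rho_{12})} \begin{pmatrix} 1 + \rho_{y1} - \rho_{y2} - \rho_{12} \\ 1 - \rho_{y1} + \rho_{y2} - \rho_{12} \end{pmatrix},$$

$$V_{\min}(\hat F_{mR}(y)) = (n^{-1} - N^{-1})\,\beta(1-\beta)\, \frac{4(1 - \rho_{y1})(1 - \rho_{y2}) - (1 - \rho_{y1} - \rho_{y2} + \rho_{12})^2}{2(1 - \rho_{12})},$$

$$M^{-1}m = \frac{1}{1 - \rho_{12}^2} \begin{pmatrix} \rho_{y1} - \rho_{12}\rho_{y2} \\ \rho_{y2} - \rho_{12}\rho_{y1} \end{pmatrix}, \qquad \sigma_{t_0}^2 = (n^{-1} - N^{-1})\,\beta(1-\beta),$$

and

$$m'M^{-1}m/\sigma_{t_0}^2 = (1 - \rho_{12}^2)^{-1}(\rho_{y1}^2 + \rho_{y2}^2 - 2\rho_{12}\rho_{y1}\rho_{y2}),$$

$$V_{\min}(\hat F_{mw}(y)) = (n^{-1} - N^{-1})\,\beta(1-\beta)\,\big[1 - (1 - \rho_{12}^2)^{-1}(\rho_{y1}^2 + \rho_{y2}^2 - 2\rho_{12}\rho_{y1}\rho_{y2})\big].$$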
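The closed forms for $k = 2$ can be checked numerically against the general matrix expressions $(e'\Sigma^{-1}e)^{-1}$ and $v_0 - m'M^{-1}m$. The sketch below does this for one set of correlations; the correlation values, sample sizes and the function name `k2_min_variances` are illustrative assumptions, not results from the paper.

```python
import numpy as np

def k2_min_variances(rho_y1, rho_y2, rho_12, n, N, beta):
    """Minimum variances for k = 2, by matrix algebra and by the closed forms."""
    c = (1.0 / n - 1.0 / N) * beta * (1.0 - beta)   # common factor, also equal to v0
    off = 1.0 - rho_y1 - rho_y2 + rho_12
    Sigma = c * np.array([[2.0 * (1.0 - rho_y1), off],
                          [off, 2.0 * (1.0 - rho_y2)]])
    e = np.ones(2)
    v_ra_matrix = 1.0 / (e @ np.linalg.solve(Sigma, e))      # (e' Sigma^{-1} e)^{-1}
    M = c * np.array([[1.0, rho_12], [rho_12, 1.0]])
    m = c * np.array([rho_y1, rho_y2])
    v_mw_matrix = c - m @ np.linalg.solve(M, m)              # v0 - m' M^{-1} m
    v_ra_closed = c * (4.0 * (1.0 - rho_y1) * (1.0 - rho_y2) - off ** 2) / (2.0 * (1.0 - rho_12))
    v_mw_closed = c * (1.0 - (rho_y1 ** 2 + rho_y2 ** 2
                              - 2.0 * rho_12 * rho_y1 * rho_y2) / (1.0 - rho_12 ** 2))
    return float(v_ra_matrix), float(v_ra_closed), float(v_mw_matrix), float(v_mw_closed)

# Hypothetical correlations; n = 50 and N = 340 mirror one simulation setting.
results = k2_min_variances(0.6, 0.5, 0.4, n=50, N=340, beta=0.5)
print(results)
```

For these inputs the unconstrained minimum (the second pair) is smaller than the minimum under the constraint $\omega'e = 1$ (the first pair), as expected when one minimization is a restricted version of the other.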

The estimates of these may be obtained by replacing $\rho_{yj}$ by its estimate

$$\hat\rho_{yj} = \beta^{-1}(1-\beta)^{-1}(p_{yj} - \beta^2),$$

where $p_{yj}$ is the proportion of sample units such that $y_i \le y$ and $x_{ij} \le x_j$. In the following section, we present a simulation study with real data to compare the above results.
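The plug-in estimate of $\rho_{yj}$ is a simple function of a joint sample proportion. The sketch below is illustrative: the function name `rho_hat` is hypothetical, and the cut-off `y_cut` stands for the point at which $\hat F_Y$ is evaluated (in practice it would have to be replaced by an estimate such as the sample quantile, which is an assumption of this sketch rather than something stated in the text).

```python
import numpy as np

def rho_hat(y_s, x_s, y_cut, x_cut, beta):
    """Estimate rho_yj = (p_yj - beta^2) / (beta (1 - beta)) from sample values."""
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    p_yj = np.mean((y_s <= y_cut) & (x_s <= x_cut))   # joint sample proportion
    return float((p_yj - beta ** 2) / (beta * (1.0 - beta)))

# Toy data: x ordered exactly like y, then exactly opposite to y.
y = [1.0, 2.0, 3.0, 4.0]
print(rho_hat(y, [1.0, 2.0, 3.0, 4.0], y_cut=2.0, x_cut=2.0, beta=0.5))
print(rho_hat(y, [4.0, 3.0, 2.0, 1.0], y_cut=2.0, x_cut=2.0, beta=0.5))
```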


4. SIMULATION AND CONCLUSIONS

The methods are compared using a real data set taken from Koil thasil, District Handbook of Aligarh, India, 1990, covering 340 villages. Here $y$ is the number of cultivators, $x_1$ is the area, and $x_2$ is the number of households in a village. A simulation study of 10,000 repeated samples drawn without replacement, of sizes 35, 50 and 70 (10%, 15% and 20%, respectively), was conducted for the quartiles ($\beta$ = 0.25, 0.50 and 0.75) to compare these methods. We compute the expected length, coverage, variance and percentage relative gain in efficiency (PRGE) of the proposed optimum method (2.8), called the ML method, over the Rueda and Arcos (1998) method (2.13), called RA, and the other method (2.17), called MW, at the 95% confidence level.

Suppose $d$ is the interval length ($d = U - L$, where $U$ and $L$ are the upper and lower limits, respectively), and $E_m(d)$, $Cov_m$ and $V_m(d)$ are the expected length, the proportion of coverage and the variance of the length over $m = 10{,}000$ repeated samples. The percentage relative gain in efficiency is

$$PRGE = \frac{V_m(d_h) - V_m(d_{ML})}{V_m(d_{ML})} \times 100, \quad h = \text{RA and MW}.$$
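As a quick arithmetic check of the PRGE definition, the sketch below plugs in three variances reported for $n = 70$ in Table 1 (44.648 and 40.321 for the competing methods, 40.179 for ML). The function name `prge` is a choice made for this illustration.

```python
def prge(var_h, var_ml):
    """Percentage relative gain in efficiency of ML over method h."""
    return 100.0 * (var_h - var_ml) / var_ml

# Variances of the interval length reported for n = 70 in Table 1.
print(round(prge(44.648, 40.179), 3))
print(round(prge(40.321, 40.179), 3))
```

The two printed values agree with the PRGE entries tabulated for that sample size.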

Note that Rueda and Arcos (1998) gave numerical illustrations for the median using 500 repeated samples. Their simulation results and conclusions may not hold, because they use an incorrect minimum variance and an incorrect estimated optimum $\omega$ (see Rueda and Arcos (1998), page 211, line 5 and the last line).

Tables 1, 2 and 3 show the simulation results for the quantiles $\beta$ = 0.25, 0.50 and 0.75, respectively, for $m = 10{,}000$.

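Table 1: Expected length, variance, proportion coverage, and percentage relative gain in efficiency, $\beta$ = 0.25.

| n | Method | Expected length | Variance | Coverage | PRGE |
|---|---|---|---|---|---|
| 70 | RA | 22.116 | 44.648 | 0.861 | 11.123 |
| 70 | MW | 21.049 | 40.321 | 0.859 | 0.353 |
| 70 | ML | 17.888 | 40.179 | 0.851 | |
| 50 | RA | 26.871 | 82.195 | 0.842 | 15.709 |
| 50 | MW | 25.407 | 73.092 | 0.836 | 2.895 |
| 50 | ML | 20.908 | 71.036 | 0.832 | |
| 35 | RA | 33.884 | 154.557 | 0.850 | 22.005 |
| 35 | MW | 31.762 | 129.291 | 0.844 | 2.060 |
| 35 | ML | 27.526 | 126.681 | 0.843 | |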


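Table 2: Expected length, variance, proportion coverage, and percentage relative gain in efficiency, $\beta$ = 0.50.

| n | Method | Expected length | Variance | Coverage | PRGE |
|---|---|---|---|---|---|
| 75 | RA | 29.259 | 55.236 | 0.879 | 14.083 |
| 75 | MW | 27.819 | 49.334 | 0.868 | 1.894 |
| 75 | ML | 24.853 | 48.417 | 0.859 | |
| 50 | RA | 39.515 | 153.831 | 0.843 | 16.977 |
| 50 | MW | 37.350 | 136.304 | 0.840 | 3.649 |
| 50 | ML | 31.711 | 131.506 | 0.839 | |
| 35 | RA | 44.798 | 263.259 | 0.826 | 24.515 |
| 35 | MW | 42.085 | 228.670 | 0.811 | 8.156 |
| 35 | ML | 34.757 | 211.426 | 0.809 | |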

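Table 3: Expected length, variance, proportion coverage, and percentage relative gain in efficiency, $\beta$ = 0.75.

| n | Method | Expected length | Variance | Coverage | PRGE |
|---|---|---|---|---|---|
| 70 | RA | 50.130 | 438.939 | 0.832 | 42.383 |
| 70 | MW | 47.296 | 391.443 | 0.822 | 26.976 |
| 70 | ML | 38.558 | 308.279 | 0.813 | |
| 50 | RA | 63.307 | 946.278 | 0.799 | 49.877 |
| 50 | MW | 58.306 | 800.568 | 0.785 | 26.799 |
| 50 | ML | 45.621 | 631.368 | 0.778 | |
| 35 | RA | 74.918 | 2135.947 | 0.702 | 100.387 |
| 35 | MW | 68.247 | 1829.200 | 0.685 | 71.609 |
| 35 | ML | 39.269 | 1065.909 | 0.684 | |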

From the above tables, one may conclude that the ML method has the shortest expected length and the smallest variance. The expected lengths of all methods decrease as the sample size increases. The differences among the coverage proportions of the methods are negligible in all cases, and coverage generally improves as the sample size increases. The PRGE values show that the ML method is more efficient than both RA and MW in all situations.

REFERENCES

Chambers, R L and Dunstan, R (1986) Estimating distribution functions from survey data. Biometrika 73(3):597-604

Mak, Tak K and Kuk, Anthony (1993) A new method for estimating finite population quantiles using auxiliary information. The Canadian Journal of Statistics 21(1):29-38

Rao, J N K, Kovar, J G, and Mantel, H J (1990) On estimating distribution functions and quantiles from survey data using auxiliary information. Biometrika 77(2):365-375

Rueda Garcia, M and Arcos Cebrian, A (1998) Quantile interval estimation in finite population using a multivariate ratio estimator. Metrika 47:203-213

Tankou, V and Dharmadhikari, S (1989) Improvement of ratio-type estimators. Biometrical Journal 31(7):795-802