Statistics & Operations Research Transactions
SORT 38 (2) July-December 2014, 251-270
© Institut d’Estadística de Catalunya
sort@idescat.cat
ISSN: 1696-2281
eISSN: 2013-8830
www.idescat.cat/sort/
Exact prediction intervals for future current
records and record range from any
continuous distribution
H. M. Barakat¹, E. M. Nigm¹ and R. A. Aldallal²
Abstract
In this paper, a general method for predicting future lower and upper current records and record
range from any arbitrary continuous distribution is proposed. Two pivotal statistics with the same
explicit distribution for lower and upper current records are developed to construct prediction
intervals for future current records. In addition, prediction intervals for future observations of the
record range are constructed. A simulation study is applied on normal and Weibull distributions to
investigate the efficiency of the suggested method. Finally, an example for real lifetime data with
unknown distribution is analysed.
MSC: 62G30, 62G32, 62M20, 62F25.
Keywords: Current record values, record range, pivotal quantity, prediction interval, coverage
probability.
1. Introduction
Let $\{X_i;\, i \ge 1\}$ be a sequence of iid continuous random variables, each distributed
according to the cumulative distribution function (cdf) $F_X(x) = P(X \le x)$ and probability
density function (pdf) $f_X(x)$. An observation $X_j$ will be called an upper record value if
its value exceeds that of all previous observations. Thus, $X_j$ is an upper record if $X_j > X_i$
for every $i < j$. An analogous definition, with the inequality reversed, deals with
lower record values. The times at which the records occur are called record times.
¹ Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt
² Department of Mathematics and Statistics, MSA University, Cairo, Egypt
Received: September 2013
Accepted: July 2014
There are some situations wherein upper and lower records are observed together,
such as the case of weather data. In these cases, it is natural to consider lower
and upper records jointly, updating them whenever a new record of either kind (upper or lower) occurs;
these records are called current records. In this paper, we denote them by $U^c_n$ and $L^c_n$,
respectively, and call them the $n$th upper current record and the $n$th lower current record of the
sequence $\{X_n\}$ when the $n$th record of any kind (either upper or lower) is observed. It
can be noticed that $U^c_{n+1} = U^c_n$ if $L^c_{n+1} < L^c_n$ and that $L^c_{n+1} = L^c_n$ if $U^c_{n+1} > U^c_n$. That is, the
upper current record value is the largest observation seen to date at the time when the $n$th
record (of either kind) is observed. According to the definition, $L^c_0 = U^c_0 = X_1$. For $n \ge 1$,
the interval $(L^c_n, U^c_n)$ is then referred to as the record coverage. The record range is then
defined by $R^c_n = U^c_n - L^c_n$. The record range may also be defined as the $n$th record range in
the sequence of the usual sample range $R_n = \max(X_1, X_2, \dots, X_n) - \min(X_1, X_2, \dots, X_n)$,
where by definition $R^c_0 = 0$ and $R^c_1 = R_2$. Notice that a new record range is attained
once a new upper or lower record is observed (see Basak, 2000). Both current record
values and record range can be detected in several real-life situations. For example, the
consistency of the production process is required to meet a product’s specifications. If
the record range is large, then it is likely that a large number of products will lie outside
the specifications of the product. Predictions of future upper and lower current records,
as well as record range, are of natural interest in this context. Prediction of future events
is a problem of great interest and plays an important role in many applications, such
as meteorology, hydrology, industrial stress testing and athletic events. Several authors
have considered prediction problems involving record values. For example, Ahmadi
and Balakrishnan (2004) derived distribution-free confidence intervals to estimate the
fixed quantiles of an arbitrary unknown distribution, based on current records of an iid
sequence from that distribution. Raqab and Balakrishnan (2008) obtained distribution-
free prediction intervals for records from the Y-sequence based on record values from
the X-sequence of iid random variables from the same distribution. Raqab (2009)
obtained prediction intervals for the current records from a future iid sequence based
on observed current records from an independent iid sequence of the same distribution.
Ahmadi and Balakrishnan (2011) discussed the prediction of future order statistics based
on the current record values. In this paper, we consider two pivotal quantities for the
lower and upper current records, based on an arbitrary cdf $F_X$, that share the same explicit
distribution, which is distribution-free (i.e., it does not depend on the cdf $F_X$). By using these pivotal quantities,
prediction intervals for future observations of the lower and upper current records and of the record
range are explicitly derived. Moreover, a simulation study is applied on normal and
Weibull distributions to investigate the efficiency of the suggested method. Finally, an
example of real lifetime data is analysed, where it is assumed that the distribution of the
data is unknown.
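The definitions of $U^c_n$, $L^c_n$ and $R^c_n$ given above amount to a single pass over the observations. The following Python sketch is only an illustration: the function name `current_records` and the toy sequence are assumptions introduced here, not part of the paper. It starts from $L^c_0 = U^c_0 = X_1$ and stores a new triple whenever an observation is strictly larger than the current maximum or strictly smaller than the current minimum.

```python
def current_records(xs):
    """Return the list of (U_n, L_n, R_n), n = 0, 1, ..., for the sequence xs.

    A new entry is appended each time a record of either kind occurs;
    ties are not counted as records (the data are assumed continuous).
    """
    it = iter(xs)
    first = next(it)
    upper = lower = first          # U_0 = L_0 = X_1
    out = [(upper, lower, upper - lower)]
    for x in it:
        if x > upper:              # new upper record, lower one is kept
            upper = x
        elif x < lower:            # new lower record, upper one is kept
            lower = x
        else:
            continue               # no record of either kind
        out.append((upper, lower, upper - lower))
    return out


# Hypothetical toy observations, chosen only to illustrate the definition.
toy = [5, 7, 3, 6, 9, 1]
records = current_records(toy)
for n, (u, l, r) in enumerate(records):
    print(n, u, l, r)
```

In this toy run the observation 6 produces no entry, since it is neither a new maximum nor a new minimum.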
2. Auxiliary results
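Houchens (1984) used an inductive argument to derive the pdfs of $U^c_n$, $L^c_n$ and $R^c_n$ based
on an arbitrary cdf $F_X$ (in the sequel we write $U^c_n \| X$, $L^c_n \| X$ and $R^c_n \| X$ to indicate that
these statistics are based on the cdf $F_X$). They are given, respectively, by

$$f_{U^c_n \| X}(x) = 2^n f_X(x)\Big[1 - \bar F_X(x) \sum_{k=0}^{n-1} \frac{[-\log \bar F_X(x)]^k}{k!}\Big], \tag{2.1}$$

$$f_{L^c_n \| X}(x) = 2^n f_X(x)\Big[1 - F_X(x) \sum_{k=0}^{n-1} \frac{[-\log F_X(x)]^k}{k!}\Big]$$

and

$$f_{R^c_n \| X}(r) = \frac{2^n}{(n-1)!} \int_{-\infty}^{\infty} f_X(r+x)\, f_X(x)\, \big[-\log\big(1 - F_X(r+x) + F_X(x)\big)\big]^{n-1}\, dx, \quad 0 < r < \infty,$$

where $\bar F_X(x) = 1 - F_X(x)$.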
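Houchens (1984) deduced a useful representation for $U^c_n \| Y$ when $Y$ has a negative
exponential distribution with parameter 2, i.e., $Y \sim EX(2)$. Namely,

$$U^c_n \| Y \stackrel{d}{=} Y_0 + Y_1 + \cdots + Y_n, \tag{2.2}$$

where "$\stackrel{d}{=}$" means identical in distribution and the $Y_i$'s are independent random variables
such that $Y_0 \sim EX(2)$ and the remaining $Y_i \sim EX(1)$. An analogous representation for
the lower current record can be easily obtained by noting that

$$f_{-U^c_n \| X}(x) = f_{U^c_n \| X}(-x) = 2^n f_X(-x)\Big[1 - \bar F_X(-x) \sum_{k=0}^{n-1} \frac{(-\log \bar F_X(-x))^k}{k!}\Big] = 2^n f_{-X}(x)\Big[1 - F_{-X}(x) \sum_{k=0}^{n-1} \frac{(-\log F_{-X}(x))^k}{k!}\Big],$$

where we used $\bar F_X(-x) = P(-X \le x) = F_{-X}(x)$. This yields

$$-U^c_n \| X \stackrel{d}{=} L^c_n \| (-X). \tag{2.3}$$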
Applying (2.3), we get $-U^c_n \| Y \stackrel{d}{=} -Y_0 - Y_1 - \cdots - Y_n \stackrel{d}{=} Z_0 + Z_1 + \cdots + Z_n$, where
$Z_0 \sim EX^+(2)$, $Z_i \sim EX^+(1)$, $i = 1, 2, \dots, n$, and $EX^+(\beta)$ is the positive exponential cdf
with parameter $\beta$. Thus, by applying (2.3) again and noting that $Y \sim EX(\beta) \Rightarrow Z = -Y \sim EX^+(\beta)$, we get

$$L^c_n \| Z \stackrel{d}{=} Z_0 + Z_1 + \cdots + Z_n,$$

where $Z \sim EX^+(2)$, $Z_0 \sim EX^+(2)$ and $Z_i \sim EX^+(1)$, $i = 1, 2, \dots, n$.
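As a quick consistency check of the representation (2.2) against the pdf (2.1), consider the case $n = 1$. The sketch below reads $EX(2)$ as the exponential law with pdf $\tfrac12 e^{-y/2}$, $y > 0$, and $EX(1)$ as the one with pdf $e^{-y}$, $y > 0$; this reading is an assumption taken from the form of $f_{U^\star_n}$ used in the Appendix, since the pdf is not written out in this section.

```latex
\begin{align*}
f_{Y_0+Y_1}(y) &= \int_0^y \tfrac12 e^{-(y-s)/2}\, e^{-s}\, ds
               = \tfrac12 e^{-y/2}\int_0^y e^{-s/2}\, ds
               = e^{-y/2}\bigl(1 - e^{-y/2}\bigr), \\
f_{U^c_1 \| Y}(y) &= 2\cdot\tfrac12 e^{-y/2}\bigl[1 - \bar F_Y(y)\bigr]
               = e^{-y/2}\bigl(1 - e^{-y/2}\bigr), \qquad \bar F_Y(y) = e^{-y/2}.
\end{align*}
```

Both expressions coincide, as (2.2) requires for $n = 1$.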
3. Main results
The following theorem is the main result of this article. In what follows we assume that
$F_X$ is a continuous cdf with the generalized inverse function $F_X^{-1}(y) = \inf\{x : F_X(x) \ge y\}$.
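Theorem 3.1. Let $U^c_n = U^c_n \| X$, $L^c_n = L^c_n \| X$ and $R^c_n = R^c_n \| X$ be the upper current
record, the lower current record and the record range based on the cdf $F_X$, respectively.
Furthermore, let $0 < \alpha, \beta < 1$ and $m = 1, 2, \dots$ Then,
1. $\big(U^c_n,\; F_X^{-1}\big(1 - \bar F_X^{\,1+t_{m:\alpha}}(U^c_n)\big)\big)$ is a $100(1-\alpha)\%$ confidence interval for $U^c_{n+m}$,
2. $\big(F_X^{-1}\big(F_X^{\,1+t_{m:\beta}}(L^c_n)\big),\; L^c_n\big)$ is a $100(1-\beta)\%$ confidence interval for $L^c_{n+m}$,
3. $\big(R^c_n = U^c_n - L^c_n,\; F_X^{-1}\big(1 - \bar F_X^{\,1+t_{m:\alpha}}(U^c_n)\big) - F_X^{-1}\big(F_X^{\,1+t_{m:\beta}}(L^c_n)\big)\big)$ is a $100\gamma\%$ confidence
interval for $R^c_{n+m}$, where $\gamma \ge \max(1-\alpha-\beta, 0)$ (e.g., $\gamma \ge 0.98$ if $\alpha = \beta = 0.01$).
Here $t_{m:\theta}$ denotes the $(1-\theta)$-quantile of the pdf $f$ given in (3.1), i.e., $\int_0^{t_{m:\theta}} f(t)\,dt = 1-\theta$.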
Theorem 3.1 follows from the next lemma, which is proved in the Appendix
and is of independent interest.
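Lemma 3.1. Let $U^\star_n = U^c_n \| Y$ and $L^\star_n = L^c_n \| Z$, where $Y \sim EX(2)$ and $Z \sim EX^+(2)$.
Then, for every $m = 1, 2, \dots$, the two pivotal statistics
$$\bar T_m = \frac{U^\star_{n+m} - U^\star_n}{U^\star_n} \quad \text{and} \quad T_m = \frac{L^\star_{n+m} - L^\star_n}{L^\star_n}$$
have the same pdf $f(t)$, where

$$f(t) = \frac{2^{n-1}\, m\, t^{m-1}}{(t+\frac12)^{m+1}} - \sum_{k=0}^{n-1} \binom{k+m}{k} \frac{2^{n-k-1}\, m\, t^{m-1}}{(t+1)^{k+m+1}}. \tag{3.1}$$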
Remark 3.1. One can easily check that $\int_0^\infty f(t)\,dt = 1$ by using the two formulas

$$\int_0^\infty \frac{t^N}{(t+a)^M}\,dt = a^{N-M+1} \sum_{i=0}^{N} \binom{N}{i} \frac{(-1)^{i+1}}{N-i-M+1}, \quad a > 0,$$

and

$$\sum_{i=0}^{N} \frac{(-1)^i}{M+i} \binom{N}{i} = \frac{N!\,(M-1)!}{(M+N)!},$$

for any two positive integers $N$ and $M$ for which $N < M-1$.
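To make the normalisation concrete, the simplest case $n = m = 1$ of (3.1) can be integrated directly. The following worked computation is added here as an illustration and is not part of the original remark.

```latex
\begin{align*}
f(t) &= \frac{1}{(t+\frac12)^{2}} - \frac{1}{(t+1)^{2}}, \qquad t \ge 0, \\
\int_0^\infty f(t)\,dt &= \Bigl[-\frac{1}{t+\frac12}\Bigr]_0^\infty - \Bigl[-\frac{1}{t+1}\Bigr]_0^\infty = 2 - 1 = 1, \\
\int_0^{x} f(t)\,dt &= 1 - \frac{1}{x+\frac12} + \frac{1}{x+1}, \qquad x \ge 0.
\end{align*}
```

The last line gives the cdf of the pivotal statistic in this case, so the corresponding quantile $t_{1:\theta}$ solves $\frac{1}{x+\frac12} - \frac{1}{x+1} = \theta$.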
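Proof of Theorem 3.1. On applying Lemma 3.1, we get $P(0 \le \bar T_m \le t_{m:\alpha}) = 1-\alpha$ and
$P(0 \le T_m \le t_{m:\beta}) = 1-\beta$. Therefore, we get

$$P\Big(0 \le \frac{U^\star_{n+m} - U^\star_n}{U^\star_n} \le t_{m:\alpha}\Big) = P\big(U^\star_n \le U^\star_{n+m} \le U^\star_n (1 + t_{m:\alpha})\big) = 1-\alpha \tag{3.2}$$

and

$$P\Big(0 \le \frac{L^\star_{n+m} - L^\star_n}{L^\star_n} \le t_{m:\beta}\Big) = P\big(0 \ge L^\star_{n+m} - L^\star_n \ge L^\star_n\, t_{m:\beta}\big) = 1-\beta \tag{3.3}$$

(note that $L^\star_n \le 0$). Thus, the first two relations of Theorem 3.1 (1. and 2.) follow immediately
by applying the transformations $U^\star_n = -2\log(\bar F_X(U^c_n))$ and $L^\star_n = 2\log(F_X(L^c_n))$,
respectively, to the relations (3.2) and (3.3).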
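The inversion step behind the first two intervals can be spelled out as follows. This is a sketch of the algebra only, written for a fixed $t > 0$ and, for simplicity, under the additional assumption that $F_X$ is strictly increasing, so that $F_X^{-1}$ is an ordinary inverse.

```latex
\begin{align*}
U^\star_{n+m} \le (1+t)\,U^\star_n
  &\iff -2\log \bar F_X(U^c_{n+m}) \le -2(1+t)\log \bar F_X(U^c_n) \\
  &\iff \bar F_X(U^c_{n+m}) \ge \bar F_X^{\,1+t}(U^c_n) \\
  &\iff U^c_{n+m} \le F_X^{-1}\bigl(1 - \bar F_X^{\,1+t}(U^c_n)\bigr), \\[4pt]
L^\star_{n+m} \ge (1+t)\,L^\star_n
  &\iff 2\log F_X(L^c_{n+m}) \ge 2(1+t)\log F_X(L^c_n) \\
  &\iff L^c_{n+m} \ge F_X^{-1}\bigl(F_X^{\,1+t}(L^c_n)\bigr).
\end{align*}
```

Taking $t = t_{m:\alpha}$ in the first chain and $t = t_{m:\beta}$ in the second gives the upper bound in item 1 and the lower bound in item 2 of Theorem 3.1.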
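In order to find the confidence interval for the record range we use the two well-known
relations

$$P(C_1 C_2) \ge \max\big(P(C_1) + P(C_2) - 1,\; 0\big),$$

for any two events $C_1$ and $C_2$, and

$$\{\bar a < X < \bar b,\; a < Y < b\} \subset \{a + \bar a \le X + Y \le b + \bar b\},$$

for any two random variables $X$ and $Y$, to get

$$P\Big(R^c_n = U^c_n - L^c_n \le R^c_{n+m} \le F_X^{-1}\big(1 - \bar F_X^{\,1+t_{m:\alpha}}(U^c_n)\big) - F_X^{-1}\big(F_X^{\,1+t_{m:\beta}}(L^c_n)\big)\Big)$$
$$\ge P\Big(U^c_n \le U^c_{n+m} \le F_X^{-1}\big(1 - \bar F_X^{\,1+t_{m:\alpha}}(U^c_n)\big),\; -L^c_n \le -L^c_{n+m} \le -F_X^{-1}\big(F_X^{\,1+t_{m:\beta}}(L^c_n)\big)\Big)$$
$$= \gamma \ge \max(1-\alpha-\beta,\, 0).$$

This completes the proof.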
The following two results are proved in the Appendix by an argument similar to the
one used for Lemma 3.1.
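Lemma 3.2. The joint pdfs of $U^\star_1, U^\star_2, \dots, U^\star_n$ and $L^\star_1, L^\star_2, \dots, L^\star_n$ are given, respectively,
by

$$f_{U^\star_n, U^\star_{n-1}, \dots, U^\star_1}(y_n, y_{n-1}, \dots, y_1) = e^{-y_n}\big[e^{y_1/2} - 1\big], \quad 0 < y_1 < y_2 < \cdots < y_n,$$

and

$$f_{L^\star_n, L^\star_{n-1}, \dots, L^\star_1}(z_n, z_{n-1}, \dots, z_1) = e^{z_n}\big[e^{-z_1/2} - 1\big], \quad z_n < z_{n-1} < \cdots < z_1 < 0.$$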
Lemma 3.2 opens the way for an interesting inferential study based on the current records.
Indeed, by noting that $U^\star_n = -2\log(\bar F_X(U^c_n \| X))$ and $L^\star_n = 2\log(F_X(L^c_n \| X))$, we
can obtain the likelihood functions based on the upper and lower current records,
respectively, as

$$f_{U^c_n \| X, \dots, U^c_1 \| X}(x_n, \dots, x_1) = \bar F_X^{\,2}(x_n)\, \frac{F_X(x_1)}{\bar F_X(x_1)} \prod_{j=1}^{n} \frac{2 f_X(x_j)}{\bar F_X(x_j)}, \quad x_1 < x_2 < \cdots < x_n,$$

and

$$f_{L^c_n \| X, \dots, L^c_1 \| X}(x_n, \dots, x_1) = F_X^{\,2}(x_n)\, \frac{\bar F_X(x_1)}{F_X(x_1)} \prod_{j=1}^{n} \frac{2 f_X(x_j)}{F_X(x_j)}, \quad x_n < x_{n-1} < \cdots < x_1.$$

The above likelihood functions can be used to obtain point estimators of any
unknown parameters of the cdf $F_X$, especially if the available data are the current record
values.
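As an illustration of how the likelihood based on the upper current records may be used, suppose that $F_X$ is exponential with unknown mean $\theta$, so that $\bar F_X(x) = e^{-x/\theta}$ and $f_X(x)/\bar F_X(x) = 1/\theta$. This parametric family is an assumption made here for the sake of the example; it is not a case worked out in the paper.

```latex
\begin{align*}
L(\theta;\, x_1,\dots,x_n)
  &= \bar F_X^{\,2}(x_n)\,\frac{F_X(x_1)}{\bar F_X(x_1)}\prod_{j=1}^{n}\frac{2 f_X(x_j)}{\bar F_X(x_j)}
   = \Bigl(\frac{2}{\theta}\Bigr)^{n} e^{-2x_n/\theta}\bigl(e^{x_1/\theta}-1\bigr), \\
\log L(\theta)
  &= n\log 2 - n\log\theta - \frac{2x_n}{\theta} + \log\bigl(e^{x_1/\theta}-1\bigr).
\end{align*}
```

In this example the likelihood depends on the data only through the first and the last upper current records, $x_1$ and $x_n$, and $\log L(\theta)$ can be maximised numerically in $\theta$.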
Lemma 3.3. Each of the sequences $\{U^c_n \| X\}$ and $\{L^c_n \| X\}$ forms a Markov chain.
Tables 1, 2 and 3 give the values of $t_{m:\theta}$, where $\int_0^{t_{m:\theta}} f(t)\,dt = 1-\theta$, for the values of
$n = 2, 3, \dots, 20$, $m = 1, 2, \dots, 5$ and $\theta = 0.1, 0.05, 0.01$. The calculations in these tables
were carried out with Mathematica 8.
Table 1: $P(\bar T_m \le t_{m:0.1}) = P(T_m \le t_{m:0.1}) = 0.9$.
n m=1 m=2 m=3 m=4 m=5
2 0.893932 1.64789 2.12161 3.09928 3.81681
3 0.637903 1.15382 1.64826 2.13481 2.61746
4 0.496616 0.887298 1.25887 1.62313 1.98369
5 0.406947 0.720864 1.01764 1.30767 1.59422
6 0.34491 0.607108 0.853803 1.09426 1.33144
7 0.299402 0.524443 0.735349 0.940467 1.14249
8 0.264573 0.461651 0.645749 0.824454 1.00024
9 0.237047 0.41233 0.575618 0.733861 0.88935
10 0.214737 0.37256 0.519236 0.661177 0.80051
11 0.196285 0.339808 0.472924 0.601579 0.727758
12 0.180767 0.312366 0.434205 0.551829 0.667099
13 0.167533 0.289036 0.401353 0.509676 0.615756
14 0.156111 0.268957 0.373128 0.473505 0.571739
15 0.146152 0.251494 0.348617 0.442127 0.533588
16 0.137392 0.236166 0.327132 0.41465 0.500204
17 0.129626 0.222602 0.308145 0.390389 0.470749
18 0.122693 0.210515 0.291243 0.368811 0.444567
19 0.116466 0.199676 0.276101 0.349494 0.421142
20 0.110841 0.1899 0.262458 0.332101 0.400062
Table 2: $P(\bar T_m \le t_{m:0.05}) = P(T_m \le t_{m:0.05}) = 0.95$.
n m=1 m=2 m=3 m=4 m=5
2 1.33466 2.36039 3.35038 4.32794 5.29962
3 0.917775 1.58465 2.22095 2.84604 3.46562
4 0.699294 1.18883 1.65183 2.10471 2.55247
5 0.565044 0.950237 1.31202 1.66461 2.01244
6 0.474206 0.791113 1.08708 1.37465 1.6578
7 0.408643 0.677553 0.927536 1.16979 1.4079
8 0.359082 0.592484 0.808619 1.01759 1.2227
9 0.320292 0.526398 0.716627 0.900193 1.08012
10 0.2891 0.473584 0.643377 0.806939 0.967069
11 0.263469 0.430412 0.583685 0.731109 0.875286
12 0.242029 0.394463 0.534115 0.668256 0.799318
13 0.223828 0.364064 0.492298 0.615323 0.735419
14 0.208182 0.338022 0.45655 0.570139 0.680937
15 0.194588 0.315462 0.42564 0.531123 0.633941
16 0.182666 0.29573 0.39865 0.497096 0.592992
17 0.172124 0.278324 0.374878 0.46716 0.556997
18 0.162736 0.262857 0.353783 0.440621 0.525111
19 0.154321 0.249021 0.334935 0.416931 0.49667
20 0.146736 0.23657 0.317995 0.395657 0.471145
Table 3: $P(\bar T_m \le t_{m:0.01}) = P(T_m \le t_{m:0.01}) = 0.99$.
n m=1 m=2 m=3 m=4 m=5
2 2.85847 4.79726 6.66544 8.50916 10.3413
3 1.79354 2.91229 3.97659 5.02093 6.05546
4 1.29678 2.06118 2.78104 3.4839 4.17816
5 1.01294 1.58618 2.12161 2.64218 3.15505
6 0.830235 1.28585 1.70856 2.11803 2.52051
7 0.703094 1.07977 1.42729 1.76285 2.09202
8 0.609623 0.929976 1.22414 1.50739 1.78474
9 0.538056 0.816357 1.07087 1.31536 1.55434
10 0.481519 0.727304 0.951294 1.166 1.37557
11 0.435735 0.655671 0.855492 1.04666 1.23301
12 0.397907 0.596827 0.777067 0.949212 1.11681
13 0.366128 0.547641 0.711716 0.868181 1.02035
14 0.339055 0.505923 0.656439 0.799775 0.939036
15 0.315716 0.470098 0.609086 0.741278 0.869594
16 0.295387 0.439003 0.568075 0.690695 0.80962
17 0.277521 0.411761 0.532217 0.646532 0.757316
18 0.261696 0.387699 0.500602 0.607646 0.711309
19 0.247582 0.366292 0.472522 0.573149 0.670533
20 0.234914 0.347123 0.447416 0.542341 0.63415
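The quantiles $t_{m:\theta}$ can also be computed without a computer algebra system. Integrating (3.1) term by term gives the cdf $F(t) = 2^n \big(\tfrac{2t}{2t+1}\big)^m - \sum_{k=0}^{n-1} 2^{n-k-1}\, P\big(B_{m+k} \ge m\big)$, where $B_{m+k}$ is binomial with $m+k$ trials and success probability $t/(t+1)$. This closed form is derived here and is not stated in the paper, so the Python sketch below should be read as an illustration under that assumption; the function names `pivot_cdf` and `pivot_quantile` are hypothetical, and the root is found by plain bisection rather than by the Mathematica routine used for the tables.

```python
from math import comb


def pivot_cdf(t, n, m):
    """CDF at t of the pivotal statistic whose pdf is (3.1)."""
    if t <= 0:
        return 0.0
    u = t / (t + 1.0)
    first = 2.0 ** n * (2.0 * t / (2.0 * t + 1.0)) ** m
    second = 0.0
    for k in range(n):
        # Binomial tail P(B >= m), B ~ Binomial(m + k, u); this equals the
        # regularised incomplete beta function I_u(m, k + 1).
        tail = sum(comb(m + k, j) * u ** j * (1.0 - u) ** (m + k - j)
                   for j in range(m, m + k + 1))
        second += 2.0 ** (n - k - 1) * tail
    return first - second


def pivot_quantile(n, m, theta, tol=1e-10):
    """Solve pivot_cdf(t, n, m) = 1 - theta for t by bisection."""
    target = 1.0 - theta
    lo, hi = 0.0, 1.0
    while pivot_cdf(hi, n, m) < target:   # bracket the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pivot_cdf(mid, n, m) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_2_1 = pivot_quantile(2, 1, 0.1)
print(round(t_2_1, 6))   # to be compared with the entry n = 2, m = 1 of Table 1
```

For large $n$ the two terms of the cdf are both of order $2^n$ and nearly cancel, so double precision limits the attainable accuracy; for the range $n \le 20$ used in the tables this loss is small.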
4. Simulation study
In order to check the efficiency of the method presented in Theorem 3.1, a simulation
study is conducted for two important lifetime distributions: Weibull[1,2], with shape and
scale parameters 1 and 2, respectively, and Normal[0,1]. For each of these distributions,
we generate a random sample of size 100. Moreover, for each of these random samples,
the lower and upper current record values are picked up and then the corresponding
record ranges are computed. By coincidence, we got the same number, 12, of current
records (lower and upper) for the two random samples (i.e., for the two distributions).
Table 4 gives these 12 observed values of $U^c_n \| X$ and $L^c_n \| X$, as well as $R^c_n \| X$,
where $X \sim$ Weibull[1,2] or $X \sim$ Normal[0,1]. Now, we assume that we have only
observed the first 9 values of the current records (lower and upper) (i.e., 75% of the observed
values of the current records) and we want to predict the next three (i.e., 25% of
the observed values of the current records). Theorem 3.1 enables us to get predictive
confidence intervals for these three values. Tables 5 and 6 give these predictive
confidence intervals for $U^c_{9+m} \| X$, $L^c_{9+m} \| X$ and $R^c_{9+m} \| X$, where $m = 1, 2, 3$, for
$X \sim$ Weibull[1,2] and $X \sim$ Normal[0,1], respectively.
Algorithm
Step 1: select the cdf $F_X$ from which the data will come,
Step 2: choose the value of $N$,
Step 3: generate a random sample of size $N$ from $F_X$,
Step 4: pick up the lower and upper current record values from the observed data and
compute the corresponding record range values. Let the number of the observed lower
and upper current record values be $n$. Choose the value of $M$, which is about 25% of $n$,
Step 5: choose a significance level $\theta$ and numerically solve the equation
$$\int_0^{t_{m:\theta}} f(t)\,dt = 1-\theta, \quad m = 1, 2, \dots, M,$$
using (3.1) (after replacing $n$ in (3.1) by $n-M$) and Mathematica 8,
Step 6: determine the lower and upper bounds of the predictive confidence intervals for
$U^c_{n-M+m} \| X$, $L^c_{n-M+m} \| X$ and $R^c_{n-M+m} \| X$, $m = 1, 2, \dots, M$, by using Theorem 3.1 and
Step 5.
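Step 6 of the algorithm above can be sketched in a few lines of Python. The function name `prediction_intervals` and its arguments are hypothetical, and the worked example rests on an assumption: Weibull[1,2] is read as the Weibull law with shape 1 and scale 2, i.e. the exponential law with survival function $e^{-x/2}$, which is the reading consistent with the magnitudes in Table 4. The quantile $t_{1:0.1} = 0.237047$ is taken from Table 1 ($n = 9$, $m = 1$), and the last observed records are taken from row 9 of Table 4.

```python
from math import expm1, log1p


def prediction_intervals(u_n, l_n, t_alpha, t_beta, cdf, quantile):
    """Intervals of Theorem 3.1 for the next upper/lower current record and range.

    cdf and quantile are the fitted F_X and its inverse; t_alpha and t_beta
    are the tabulated quantiles t_{m:alpha} and t_{m:beta}.
    """
    sf_u = 1.0 - cdf(u_n)                               # survival function at U_n
    upper_hi = quantile(1.0 - sf_u ** (1.0 + t_alpha))  # item 1, upper bound
    lower_lo = quantile(cdf(l_n) ** (1.0 + t_beta))     # item 2, lower bound
    return {
        "U": (u_n, upper_hi),
        "L": (lower_lo, l_n),
        "R": (u_n - l_n, upper_hi - lower_lo),          # item 3
    }


# Assumed model for the example: survival function exp(-x/2).
def exp_cdf(x):
    return -expm1(-x / 2.0)


def exp_quantile(p):
    return -2.0 * log1p(-p)


t = 0.237047                        # Table 1, n = 9, m = 1 (90% level)
iv = prediction_intervals(10.2282, 0.0235643, t, t, exp_cdf, exp_quantile)
for name, (lo, hi) in iv.items():
    print(name, round(lo, 6), round(hi, 4))
```

Under this model the upper bound for the upper record reduces to $(1 + t)\,U^c_n$, which makes the growth of the interval length with $m$ easy to see from Tables 1 to 3.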
The results presented in Tables 5 and 6 show that all the true values of $U^c_{9+m} \| X$,
$L^c_{9+m} \| X$ and $R^c_{9+m} \| X$, where $m = 1, 2$, are included in their predictive confidence
intervals for the two cdfs $X \sim$ Weibull[1,2] and $X \sim$ Normal[0,1]. Moreover, almost all
the true values of these statistics are also included in their predictive confidence intervals
for the two cdfs for $m = 3$. Nevertheless, the length of the predictive confidence interval
increases (i.e., we get less accuracy) as $m$, the number of
unobserved values, increases. Therefore, we advise predicting no more than one
fourth of the data that we have.
Table 4: Current records and record range from Weibull[1,2] and Normal[0,1].
Weibull[1,2] (left four columns), Normal[0,1] (right four columns)
n $U^c_n$ $L^c_n$ $R^c_n$ n $U^c_n$ $L^c_n$ $R^c_n$
1 3.84915 3.84915 0 1 −0.187968 −0.187968 0
2 3.84915 0.446312 3.402838 2 −0.187968 −0.35455 0.166582
3 5.64291 0.446312 5.196598 3 0.1652 −0.35455 0.51975
4 5.64291 0.375142 5.267768 4 0.1652 −1.21013 1.37533
5 5.64291 0.192999 5.449911 5 1.40996 −1.21013 2.62009
6 6.1647 0.192999 5.971701 6 1.40996 −1.37108 2.78104
7 10.2282 0.192999 10.035201 7 1.40996 −1.66077 3.07073
8 10.2282 0.108285 10.119915 8 2.07656 −1.66077 3.73733
9 10.2282 0.0235643 10.2046357 9 2.07656 −1.90336 3.97992
10 10.5855 0.0235643 10.5619357 10 2.10684 −1.90336 4.0102
11 12.9219 0.0235643 12.8983357 11 2.10684 −2.15466 4.2615
12 12.9219 0.0202959 12.9016041 12 2.96574 −2.15466 5.1204
Table 5: Predictive confidence intervals for the next three observations of current records and record range
from Weibull[1,2], with different significance levels (SL's) 90%, 95% and 99%.
for m = 1   SL = 90%   SL = 95%   SL = 99%
$U^c_{10}$   (10.2282, 12.6528)   (10.2282, 13.5042)   (10.2282, 15.7315)
$L^c_{10}$   (0.00818032, 0.0235643)   (0.00564575, 0.0235643)   (0.00214174, 0.0235643)
$R^c_{10}$   (10.2046357, 12.6446)   (10.2046357, 13.4986)   (10.2046357, 15.7294)
for m = 2   SL = 90%   SL = 95%   SL = 99%
$U^c_{11}$   (10.2282, 14.4456)   (10.2282, 15.6123)   (10.2282, 18.5781)
$L^c_{11}$   (0.00374765, 0.0235643)   (0.00225577, 0.0235643)   (0.000621026, 0.0235643)
$R^c_{11}$   (10.2046357, 14.4418)   (10.2046357, 15.61)   (10.2046357, 18.5774)
for m = 3   SL = 90%   SL = 95%   SL = 99%
$U^c_{12}$   (10.2282, 16.1157)   (10.2282, 17.558)   (10.2282, 21.1813)
$L^c_{12}$   (0.00181212, 0.0235643)   (0.000967742, 0.0235643)   (0.000200224, 0.0235643)
$R^c_{12}$   (10.2046357, 16.1139)   (10.2046357, 17.557)   (10.2046357, 21.1811)
Table 6: Predictive confidence intervals for the next three observations of current records and record range
from Normal[0,1], with different SL's 90%, 95% and 99%.
for m = 1   SL = 90%   SL = 95%   SL = 99%
$U^c_{10}$   (2.07656, 2.43784)   (2.07656, 2.55498)   (2.07656, 2.84252)
$L^c_{10}$   (−2.24886, −1.90336)   (−2.36081, −1.90336)   (−2.63544, −1.90336)
$R^c_{10}$   (3.97992, 4.6867)   (3.97992, 4.91579)   (3.97992, 5.47796)
for m = 2   SL = 90%   SL = 95%   SL = 99%
$U^c_{11}$   (2.07656, 2.67961)   (2.07656, 2.82774)   (2.07656, 3.17785)
$L^c_{11}$   (−2.47987, −1.90336)   (−2.62134, −1.90336)   (−2.9555, −1.90336)
$R^c_{11}$   (3.97992, 5.15948)   (3.97992, 5.44908)   (3.97992, 6.13335)
for m = 3   SL = 90%   SL = 95%   SL = 99%
$U^c_{12}$   (2.07656, 2.88969)   (2.07656, 3.06129)   (2.07656, 3.45983)
$L^c_{12}$   (−2.68049, −1.90336)   (−2.84427, −1.90336)   (−3.22445, −1.90336)
$R^c_{12}$   (3.97992, 5.57018)   (3.97992, 5.90556)   (3.97992, 6.68428)
5. The case when the cdf F is unknown and real data example
Undoubtedly, lack of knowledge of the distribution of the data resulting from a statistical
experiment is the most frequent case. In fact, the assumption that the distribution
$F$ is known is unrealistic. However, we can overcome this problem by using the observed
data that we have (i.e., $X_1, X_2, \dots, X_N$) to select a statistical distribution that best fits
this data set. We cannot “just guess” and use a particular distribution
without testing several alternative models, as this can result in analysis errors. In most
cases, we need to fit two or more distributions, compare the results, and select the most
valid model (see Example 5.1). Naturally, the “candidate” distributions we fit should be
chosen depending on the nature of the observed data. For example, in the case of a life
testing experiment we should fit non-negative distributions such as Gamma or Weibull.
When this procedure is applied, all we need is that the size $N$ of the observed
data be large enough to carry out the necessary identification methods (e.g., building
a histogram) and goodness-of-fit tests (e.g., the Kolmogorov-Smirnov test) based on the
empirical cdf of $X_1, \dots, X_N$. In Example 5.1, we consider $N = 130$ real observations
(cf. Arnold et al., 1998, page 49) with unknown distribution. These data yield 14 current
records (lower and upper). The first 11 of them result from the first 48 observations.
Thus, we look for the best distribution $F$ that fits these data (the 48 observations). After
that, we predict the last three current records and their corresponding record ranges by
applying the results of Theorem 3.1 to the first 11 current records and their corresponding
record ranges. We find that almost all the predictions are accurate, even when we select
another fitted distribution with a poorer goodness-of-fit to the data than the
first one.
Example 5.1. The following data (read row-wise) represent the average July temperatures
(in degrees centigrade) of Neuenburg, Switzerland, during the period 1864-1993
(from Klüppelberg and Schwere, 1995).
19.0 20.1 18.4 17.4 19.7 21.0 21.4 19.2 19.9 20.4 20.9 17.2 20.2 17.8 18.1
15.6 19.4 21.7 16.2 16.4 19.0 20.6 19.0 20.7 15.8 17.7 16.8 17.1 18.1 18.4
18.7 18.7 18.4 19.2 18.0 18.7 20.7 19.4 19.2 17.4 22.0 21.4 19.3 16.8 18.2
16.2 15.9 22.1 17.5 15.3 16.5 17.4 17.0 18.3 18.3 15.3 18.2 21.5 17.0 21.6
18.2 18.1 17.6 18.2 22.6 19.9 17.1 17.2 17.3 19.4 20.1 20.1 17.0 19.4 17.5
16.8 17.0 19.9 18.2 19.2 18.5 20.8 19.5 21.1 15.8 21.3 21.2 18.8 22.3 18.6
16.8 18.2 17.2 18.4 18.7 21.1 16.3 17.4 18.0 19.5 21.2 16.8 17.4 20.7 18.4
19.8 18.7 20.5 18.3 18.2 18.2 19.2 20.2 18.2 17.4 19.2 16.3 17.4 20.3 23.4
19.2 20.2 19.3 19.0 18.8 20.3 19.7 20.7 19.6 18.1
The above data yield 14 current records. These current records and their corresponding
record ranges are presented in Table 7. First, we try to fit the first 48 observations with
several cdfs, namely the exponential, logistic, Gamma, normal, Weibull, Gumbel, Laplace
and inverse Gamma distributions. The methods of maximum likelihood and moments
are used to estimate the parameters of the candidate cdfs. After that, we apply the
Anderson-Darling, Cramér-von Mises and Kolmogorov-Smirnov goodness-of-fit tests to
check the fit of these cdfs. Among these cdfs, we found that only the Gamma, normal
and logistic distributions fit these data. Moreover, the Gamma[119.277, 0.157808]
distribution is the cdf that fits these data best (on average, with respect to the three
goodness-of-fit tests applied and the two estimation methods used); the second is
Normal[18.8229, 1.71722], while the third is the logistic distribution Logistic[18.8205, 1.01236],
see Tables 8-10 and Figures 1-3. The predictive confidence intervals for the next three
statistics $U^c_{11+m}$, $L^c_{11+m}$ and $R^c_{11+m}$, $m = 1, 2, 3$, for the Gamma, normal and logistic cdfs
are presented in Tables 11-13, respectively. These tables show that almost all the true
values of the above three statistics are included in the predictive confidence intervals.
This result shows that our suggested method is stable regardless of the choice of the cdf
that fits the data.
Table 7: Current records and record ranges resulting from all the data.
n   1 2 3 4 5 6 7 8 9 10 11 12 13 14
$U^c_n$   19.0 20.1 20.1 20.1 21.0 21.4 21.4 21.4 21.7 22.0 22.1 22.1 22.6 23.4
$L^c_n$   19.0 19.0 18.4 17.4 17.4 17.4 17.2 15.6 15.6 15.6 15.6 15.3 15.3 15.3
$R^c_n$   0 1.1 1.7 2.7 3.6 4.0 4.2 5.8 6.1 6.4 6.5 6.8 7.3 8.1
Table 8: Fitting the first 48 observations for gamma cdf.
Distribution/Test-Method   Gamma[α, β]
Maximum Likelihood: $\hat\alpha_{ML} = 119.277$, $\hat\beta_{ML} = 0.157808$
Test   P-Value   Statistic
Kolmogorov-Smirnov   0.995234   0.0569809
Anderson-Darling   0.977713   0.235002
Cramér-von Mises   0.983675   0.0274912
Moments: $\hat\alpha_{M} = 120.149$, $\hat\beta_{M} = 0.156663$
Test   P-Value   Statistic
Kolmogorov-Smirnov   0.994289   0.0578043
Anderson-Darling   0.974785   0.241202
Cramér-von Mises   0.981763   0.0281783
Table 9: Fitting the first 48 observations for normal cdf.
Distribution/Test-Method   Normal[µ, σ]
Maximum Likelihood: $\hat\mu_{ML} = 18.8229$, $\hat\sigma_{ML} = 1.71722$
Test   P-Value   Statistic
Kolmogorov-Smirnov   0.994086   0.0579686
Anderson-Darling   0.982812   0.222963
Cramér-von Mises   0.987088   0.0261305
Moments: $\hat\mu_{M} = 18.8229$, $\hat\sigma_{M} = 1.71722$
Test   P-Value   Statistic
Kolmogorov-Smirnov   0.994086   0.0579686
Anderson-Darling   0.982812   0.222963
Cramér-von Mises   0.987088   0.0261305
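The normal fit summarised in Table 9 can be checked with the Python standard library alone. The sketch below is an illustration rather than the authors' Mathematica workflow: the helper names `normal_cdf` and `ks_statistic` are hypothetical, the list `first48` is the first 48 observations of Example 5.1 copied row-wise, and the standard deviation is computed with divisor $N$, which is assumed here to be the convention behind both estimates in Table 9. Only the Kolmogorov-Smirnov statistic is computed; p-values and the other two tests are not covered.

```python
from math import erf, sqrt
from statistics import fmean, pstdev

# First 48 observations of Example 5.1 (average July temperatures).
first48 = [
    19.0, 20.1, 18.4, 17.4, 19.7, 21.0, 21.4, 19.2, 19.9, 20.4, 20.9, 17.2,
    20.2, 17.8, 18.1, 15.6, 19.4, 21.7, 16.2, 16.4, 19.0, 20.6, 19.0, 20.7,
    15.8, 17.7, 16.8, 17.1, 18.1, 18.4, 18.7, 18.7, 18.4, 19.2, 18.0, 18.7,
    20.7, 19.4, 19.2, 17.4, 22.0, 21.4, 19.3, 16.8, 18.2, 16.2, 15.9, 22.1,
]

mu = fmean(first48)       # moment (and ML) estimate of the mean
sigma = pstdev(first48)   # standard deviation with divisor N


def normal_cdf(x, mean, sd):
    """CDF of the normal distribution with the given mean and sd."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical cdf and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # compare cdf with the empirical cdf just after and just before x
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


d = ks_statistic(first48, lambda x: normal_cdf(x, mu, sigma))
print(round(mu, 4), round(sigma, 5), round(d, 4))
```

The printed estimates can be compared with the entries of Table 9; the same `ks_statistic` helper can be reused with any other candidate cdf.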
Table 10: Fitting the first 48 observations for logistic cdf.
Distribution/Test-Method   Logistic[µ, β]
Maximum Likelihood: $\hat\mu_{ML} = 18.8205$, $\hat\beta_{ML} = 1.01236$
Test   P-Value   Statistic
Kolmogorov-Smirnov   0.98876   0.061264
Anderson-Darling   0.964482   0.260431
Cramér-von Mises   0.979247   0.02903
Moments: $\hat\mu_{M} = 18.8229$, $\hat\beta_{M} = 0.946754$
Test   P-Value   Statistic
Kolmogorov-Smirnov   0.927317   0.0756047
Anderson-Darling   0.838543   0.409448
Cramér-von Mises   0.882778   0.0489246
Figure 1: Plot showing the goodness-of-fit for gamma cdf.
Figure 2: Plot showing the goodness-of-fit for normal cdf.
Figure 3: Plot showing the goodness-of-fit for logistic cdf.
Table 11: Predictive confidence intervals for $U^c_{11+m}$, $L^c_{11+m}$ and $R^c_{11+m}$, $m = 1, 2, 3$, from
Gamma[119.277, 0.157808].
for m = 1   SL = 90%   SL = 95%   SL = 99%
$U^c_{12}$   (22.1, 22.6482)   (22.1, 22.8253)   (22.1, 23.2593)
$L^c_{12}$   (15.1588, 15.6)   (15.0195, 15.6)   (14.6846, 15.6)
$R^c_{12}$   (6.5, 7.4894)   (6.5, 7.8058)   (6.5, 8.5747)
for m = 2   SL = 90%   SL = 95%   SL = 99%
$U^c_{13}$   (22.1, 23.021)   (22.1, 23.2463)   (22.1, 23.7782)
$L^c_{13}$   (14.8674, 15.6)   (14.6945, 15.6)   (14.2959, 15.6)
$R^c_{13}$   (6.5, 8.1536)   (6.5, 8.5516)   (6.5, 9.4823)
for m = 3   SL = 90%   SL = 95%   SL = 99%
$U^c_{14}$   (22.1, 23.3496)   (22.1, 23.6122)   (22.1, 24.222)
$L^c_{14}$   (14.6161, 15.6)   (14.4189, 15.6)   (13.9732, 15.6)
$R^c_{14}$   (6.5, 8.7335)   (6.5, 9.1933)   (6.5, 10.2488)
Table 12: Predictive confidence intervals for $U^c_{11+m}$, $L^c_{11+m}$ and $R^c_{11+m}$, $m = 1, 2, 3$, from
Normal[18.8229, 1.71722].
for m = 1   SL = 90%   SL = 95%   SL = 99%
$U^c_{12}$   (22.1, 22.5884)   (22.1, 22.756)   (22.1, 23.1421)
$L^c_{12}$   (15.107, 15.6)   (14.9494, 15.6)   (14.5665, 15.6)
$R^c_{12}$   (6.5, 7.4814)   (6.5, 7.8066)   (6.5, 8.5756)
for m = 2   SL = 90%   SL = 95%   SL = 99%
$U^c_{13}$   (22.1, 22.9307)   (22.1, 23.1306)   (22.1, 23.5979)
$L^c_{13}$   (14.7762, 15.6)   (14.578, 15.6)   (14.1147, 15.6)
$R^c_{13}$   (6.5, 8.1545)   (6.5, 8.5526)   (6.5, 9.4832)
for m = 3   SL = 90%   SL = 95%   SL = 99%
$U^c_{14}$   (22.1, 23.2219)   (22.1, 23.4528)   (22.1, 23.9826)
$L^c_{14}$   (14.4875, 15.6)   (14.2586, 15.6)   (13.7333, 15.6)
$R^c_{14}$   (6.5, 8.7344)   (6.5, 9.1942)   (6.5, 10.2493)
Table 13: Predictive confidence intervals for $U^c_{11+m}$, $L^c_{11+m}$ and $R^c_{11+m}$, $m = 1, 2, 3$, from
Logistic[18.8205, 1.01236].
for m = 1   SL = 90%   SL = 95%   SL = 99%
$U^c_{12}$   (22.1, 22.77)   (22.1, 22.997)   (22.1, 23.5757)
$L^c_{12}$   (14.9403, 15.6)   (14.7169, 15.6)   (14.1475, 15.6)
$R^c_{12}$   (6.5, 7.8297)   (6.5, 8.2801)   (6.5, 9.4282)
for m = 2   SL = 90%   SL = 95%   SL = 99%
$U^c_{13}$   (22.1, 23.2539)   (22.1, 23.5578)   (22.1, 24.3102)
$L^c_{13}$   (14.4641, 15.6)   (14.1651, 15.6)   (13.4251, 15.6)
$R^c_{13}$   (6.5, 8.7898)   (6.5, 9.3927)   (6.5, 10.8851)
for m = 3   SL = 90%   SL = 95%   SL = 99%
$U^c_{14}$   (22.1, 23.7001)   (22.1, 24.0702)   (22.1, 24.9755)
$L^c_{14}$   (14.0251, 15.6)   (13.6612, 15.6)   (12.771, 15.6)
$R^c_{14}$   (6.5, 9.675)   (6.5, 10.409)   (6.5, 12.2045)
6. Conclusion
In this paper we focused on the prediction of upper and lower records. The obtained
results are useful when people are interested in knowing extreme values on different
periods, areas, etc. and their range of variation. Theorem 3.1 suggests a new method
to estimate confidence intervals for upper, lower and range records. This new method
depends on constructing two pivotal statistics with the same distribution for lower and
upper current records. The real-data Example 5.1 shows that when the cdf of the
data is unknown, this method is applicable with an acceptable degree of accuracy, even
if we fail to identify the type of the distribution of the data with high accuracy. It is
worth mentioning that the result and the method of the proofs of this paper are quite
different from the known results concerning the prediction problems of record values.
For example, Ahmadi and Balakrishnan (2004) used only the current records to estimate
the fixed quantiles of the given cdf (unknown cdf), while Raqab and Balakrishnan (2008)
obtained distribution-free prediction intervals for the usual records (not the current
records). Finally Raqab (2009) predicted the current records, by using the two-sample
prediction plan, where the variable to be predicted comes from an independent future
sample. In this paper, we consider the one-sample prediction plan, where the variable to
be predicted comes from the same sample so that it may be correlated with the observed
data.
Acknowledgement
The authors would like to thank the anonymous referees for constructive suggestions
and comments that improved the representation substantially.
Appendix
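Proof of Lemma 3.1. By using (2.2), we get

$$P(U^\star_{n+m} \le x \mid U^\star_n = y) = P(Y_0 + Y_1 + \cdots + Y_n + \cdots + Y_{n+m} \le x \mid Y_0 + Y_1 + \cdots + Y_n = y)$$
$$= P(Y_{n+1} + \cdots + Y_{n+m} \le x - y \mid Y_0 + Y_1 + \cdots + Y_n = y) = P(Y_{n+1} + \cdots + Y_{n+m} \le x - y). \tag{1}$$

On the other hand, since $Y_i \sim EX(1)$ for $i = n+1, \dots, n+m$, then

$$f_{U^\star_{n+m} | U^\star_n}(x \mid y) = f_{Y_{n+1} + \cdots + Y_{n+m}}(x - y) = \frac{(x-y)^{m-1}}{(m-1)!}\, e^{-(x-y)}\, I_{(0,\infty)}(x - y), \tag{2}$$

where $I_A(\cdot)$ is the usual indicator function of the set $A$. Therefore, by combining (1) and
(2) with (2.1), we get

$$f_{U^\star_{n+m}, U^\star_n}(x, y) = f_{U^\star_{n+m} | U^\star_n}(x \mid y)\, f_{U^\star_n}(y)$$
$$= \frac{(x-y)^{m-1}}{(m-1)!}\, e^{-(x-y)}\, 2^n \big(\tfrac12 e^{-y/2}\big)\Big[1 - e^{-y/2} \sum_{k=0}^{n-1} \frac{(-\log e^{-y/2})^k}{k!}\Big]$$
$$= \frac{2^{n-1} (x-y)^{m-1} e^{-(x - y/2)}}{(m-1)!}\Big[1 - e^{-y/2} \sum_{k=0}^{n-1} \frac{y^k}{2^k k!}\Big]. \tag{3}$$

Now, by using the transformation $\bar T_m = \dfrac{U^\star_{n+m} - U^\star_n}{U^\star_n}$ and $W = U^\star_n$, we get

$$f_{\bar T_m, W}(t, w) = \frac{2^{n-1} w^m t^{m-1} e^{-w(t + \frac12)}}{(m-1)!} - \frac{2^{n-1} t^{m-1} e^{-w(t+1)}}{(m-1)!} \sum_{k=0}^{n-1} \frac{w^{k+m}}{2^k k!}.$$

Thus, we conclude that

$$f_{\bar T_m}(t) = \int_0^\infty f_{\bar T_m, W}(t, w)\, dw = \frac{2^{n-1}\, m\, t^{m-1}}{(t + \frac12)^{m+1}} - \sum_{k=0}^{n-1} \binom{k+m}{k} \frac{2^{n-k-1}\, m\, t^{m-1}}{(t+1)^{k+m+1}}.$$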
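The last integration uses only the gamma integral $\int_0^\infty w^{j} e^{-cw}\,dw = j!/c^{\,j+1}$ for an integer $j \ge 0$ and $c > 0$. Spelled out term by term, as a check added here, it reads:

```latex
\begin{align*}
\frac{2^{n-1} t^{m-1}}{(m-1)!}\int_0^\infty w^{m} e^{-w(t+\frac12)}\,dw
  &= \frac{2^{n-1} t^{m-1}}{(m-1)!}\cdot\frac{m!}{(t+\frac12)^{m+1}}
   = \frac{2^{n-1}\, m\, t^{m-1}}{(t+\frac12)^{m+1}}, \\
\frac{2^{n-1} t^{m-1}}{(m-1)!\,2^{k} k!}\int_0^\infty w^{k+m} e^{-w(t+1)}\,dw
  &= \frac{2^{n-k-1} t^{m-1}}{(m-1)!\,k!}\cdot\frac{(k+m)!}{(t+1)^{k+m+1}}
   = \binom{k+m}{k}\frac{2^{n-k-1}\, m\, t^{m-1}}{(t+1)^{k+m+1}}.
\end{align*}
```

The second line uses $\frac{(k+m)!}{(m-1)!\,k!} = m\binom{k+m}{k}$.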
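Similarly, we can show, for any $x \le z \le 0$, that $P(L^\star_{n+m} \le x \mid L^\star_n = z) = P(Z_{n+1} + \cdots + Z_{n+m} \le x - z)$. Since $Z_i \sim EX^+(1)$ for $i = n+1, \dots, n+m$, then

$$f_{L^\star_{n+m} | L^\star_n}(x \mid z) = f_{Z_{n+1} + \cdots + Z_{n+m}}(x - z) = \frac{(-(x-z))^{m-1}}{(m-1)!}\, e^{(x-z)}\, I_{(-\infty,0)}(x - z).$$

Thus,

$$f_{L^\star_{n+m}, L^\star_n}(x, z) = f_{L^\star_{n+m} | L^\star_n}(x \mid z)\, f_{L^\star_n}(z) = \frac{2^{n-1} (-(x-z))^{m-1} e^{(x - z/2)}}{(m-1)!}\Big[1 - e^{z/2} \sum_{k=0}^{n-1} \frac{(-z)^k}{2^k k!}\Big], \quad x \le z \le 0.$$

Now, by using the transformation $T_m = \dfrac{L^\star_{n+m} - L^\star_n}{L^\star_n}$ and $V = L^\star_n$, we get

$$f_{T_m, V}(t, v) = \frac{2^{n-1} (-v)^m t^{m-1} e^{v(t + \frac12)}}{(m-1)!} - \frac{2^{n-1} t^{m-1} e^{v(t+1)}}{(m-1)!} \sum_{k=0}^{n-1} \frac{(-v)^{k+m}}{2^k k!}, \quad v \le 0,\; t \ge 0.$$

Then, we conclude that

$$f_{T_m}(t) = \int_{-\infty}^{0} f_{T_m, V}(t, v)\, dv = \frac{2^{n-1}\, m\, t^{m-1}}{(t + \frac12)^{m+1}} - \sum_{k=0}^{n-1} \binom{k+m}{k} \frac{2^{n-k-1}\, m\, t^{m-1}}{(t+1)^{k+m+1}}.$$

This completes the proof.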
Proof of Lemma 3.2. Clearly, (3) yields

$$f_{U^\star_n, U^\star_{n-1}}(y_n, y_{n-1}) = 2^{n-2} e^{-(y_n - y_{n-1}/2)}\Big[1 - e^{-y_{n-1}/2} \sum_{k=0}^{n-2} \frac{(y_{n-1}/2)^k}{k!}\Big].$$

On the other hand, by applying the same argument as in Lemma 3.1, we can show that

$$P(U^\star_n \le y_n,\, U^\star_{n-1} \le y_{n-1} \mid U^\star_{n-2} = y_{n-2})$$
$$= P(Y_{n-1} + Y_n \le y_n - y_{n-2},\, Y_{n-1} \le y_{n-1} - y_{n-2} \mid Y_0 + Y_1 + \cdots + Y_{n-2} = y_{n-2})$$
$$= P(Y_{n-1} + Y_n \le y_n - y_{n-2},\, Y_{n-1} \le y_{n-1} - y_{n-2}).$$

Since $f_{Y_{n-1}, Y_n}(y_{n-1}, y_n) = e^{-y_{n-1} - y_n}$, we get

$$f_{Y_{n-1},\, Y_{n-1} + Y_n}(y_{n-1} - y_{n-2},\, y_n - y_{n-2}) = e^{-(y_n - y_{n-2})}, \quad y_{n-2} < y_{n-1} < y_n.$$

Therefore, $f_{U^\star_n, U^\star_{n-1} | U^\star_{n-2}}(y_n, y_{n-1} \mid y_{n-2}) = e^{-(y_n - y_{n-2})}$, which by using (2.1) implies

$$f_{U^\star_n, U^\star_{n-1}, U^\star_{n-2}}(y_n, y_{n-1}, y_{n-2}) = e^{-(y_n - y_{n-2})} f_{U^\star_{n-2}}(y_{n-2}) = 2^{n-3} e^{-(y_n - y_{n-2}/2)}\Big[1 - e^{-y_{n-2}/2} \sum_{k=0}^{n-3} \frac{(y_{n-2}/2)^k}{k!}\Big].$$

Therefore, by induction we get the claimed result for the upper current records, and the
result for the lower current records can be proved by applying the same argument.
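Proof of Lemma 3.3. Since the proofs of the lemma for the two sequences $\{U^c_n \| X\}$ and
$\{L^c_n \| X\}$ are very similar, we only prove the lemma for the first sequence. For any two
positive integers $t < s$, we can easily show, by applying the same argument as in the proofs of
Lemmas 3.1 and 3.2, that

$$P(U^c_s \| X \le x_s \mid U^c_1 \| X = x_1, \dots, U^c_t \| X = x_t) = P(U^\star_s \le x^\star_s \mid U^\star_1 = x^\star_1, \dots, U^\star_t = x^\star_t) = P(Y_{t+1} + \cdots + Y_s \le x^\star_s - x^\star_t),$$

where $x^\star_i = -2\log[\bar F_X(x_i)]$, $i = 1, \dots, t, s$. Therefore, with $m = s - t$,

$$f_{U^\star_s | U^\star_1, \dots, U^\star_t}(x^\star_s \mid x^\star_1, \dots, x^\star_t) = \frac{(x^\star_s - x^\star_t)^{m-1}}{(m-1)!}\, e^{-(x^\star_s - x^\star_t)}\, I_{(0,\infty)}(x^\star_s - x^\star_t),$$

which depends on the conditioning values only through $x^\star_t$.
This completes the proof.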
References
Ahmadi, J. and Balakrishnan, N. (2004). Confidence intervals for quantiles in terms of record range.
Statistics & Probability Letters, 68, 395–405.
Ahmadi, J. and Balakrishnan, N. (2011). Distribution-free prediction intervals for order statistics based on
record coverage. Journal of the Korean Statistical Society, 40, 181–192.
Arnold, B. C., Balakrishnan, N. and Nagaraja, H. N. (1998). Records. Wiley, New York.
Basak, P. (2000). An application of record range and some characterization results. In: Balakrishnan, N.
(ed.) Advances on Methodological and Applied Aspects of Probability and Statistics. Gordon and
Breach Science Publishers, New York: 83–95.
Houchens, R. L. (1984). Record Value, Theory and Inference. Ph.D. Dissertation, University of California,
Riverside, CA.
Raqab, M. Z. (2009). Distribution-free prediction intervals for the future current record statistics. Statistical
Papers, 50, 429–439.
Raqab, M. Z. and Balakrishnan, N. (2008). Prediction intervals for future records. Statistics & Probability
Letters, 78, 1955–1963.