Statistics & Operations Research Transactions
SORT 38 (2) July-December 2014, 251-270
© Institut d'Estadística de Catalunya
sort@idescat.cat
ISSN: 1696-2281
eISSN: 2013-8830
www.idescat.cat/sort/

Exact prediction intervals for future current records and record range from any continuous distribution

H. M. Barakat¹, E. M. Nigm¹ and R. A. Aldallal²
Abstract
In this paper, a general method for predicting future lower and upper current records and record
range from any arbitrary continuous distribution is proposed. Two pivotal statistics with the same
explicit distribution for lower and upper current records are developed to construct prediction
intervals for future current records. In addition, prediction intervals for future observations of the
record range are constructed. A simulation study is applied on normal and Weibull distributions to
investigate the efficiency of the suggested method. Finally, an example for real lifetime data with
unknown distribution is analysed.
MSC: 62G30, 62G32, 62M20, 62F25.
Keywords: Current record values, record range, pivotal quantity, prediction interval, coverage
probability.
1. Introduction

Let $\{X_i;\ i \ge 1\}$ be a sequence of iid continuous random variables, each distributed according to the cumulative distribution function (cdf) $F_X(x) = P(X \le x)$ and probability density function (pdf) $f_X(x)$. An observation $X_j$ is called an upper record value if its value exceeds that of all previous observations. Thus, $X_j$ is an upper record if $X_j > X_i$ for every $i < j$. An analogous definition, with the inequality reversed, deals with lower record values. The times at which the records occur are called record times.
¹ Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt
² Department of Mathematics and Statistics, MSA University, Cairo, Egypt
Received: September 2013
Accepted: July 2014
There are some situations wherein upper and lower records are observed together, such as the case of weather data. In these cases, it is natural to consider lower and upper records jointly, updating the pair whenever a new record of either kind (upper or lower) occurs; these records are called current records. In this paper, we denote them by $U_n^c$ and $L_n^c$, respectively, and call them the $n$th upper current record and the $n$th lower current record of the sequence $\{X_n\}$ when the $n$th record of any kind (either upper or lower) is observed. It can be noticed that $U_{n+1}^c = U_n^c$ if $L_{n+1}^c < L_n^c$ and that $L_{n+1}^c = L_n^c$ if $U_{n+1}^c > U_n^c$. That is, the upper current record value is the largest observation seen to date at the time when the $n$th record (of either kind) is observed. According to the definition, $L_0^c = U_0^c = X_1$. For $n \ge 1$, the interval $(L_n^c, U_n^c)$ is then referred to as the record coverage. The record range is then defined by $R_n^c = U_n^c - L_n^c$. The record range may also be defined as the $n$th record range in the sequence of the usual sample range $R_n = \max(X_1, X_2, \dots, X_n) - \min(X_1, X_2, \dots, X_n)$, where by definition $R_0^c = 0$ and $R_1^c = R_2$. Notice that a new record range is attained once a new upper or lower record is observed (see Basak, 2000). Both current record values and record ranges arise in several real-life situations. For example, the consistency of a production process is required to meet a product's specifications. If the record range is large, then it is likely that a large number of products will lie outside the specifications of the product. Predictions of future upper and lower current records, as well as of the record range, are of natural interest in this context.

Prediction of future events is a problem of great interest and plays an important role in many applications, such as meteorology, hydrology, industrial stress testing and athletic events. Several authors have considered prediction problems involving record values. For example, Ahmadi and Balakrishnan (2004) derived distribution-free confidence intervals to estimate the fixed quantiles of an arbitrary unknown distribution, based on current records of an iid sequence from that distribution. Raqab and Balakrishnan (2008) obtained distribution-free prediction intervals for records from the $Y$-sequence based on record values from the $X$-sequence of iid random variables from the same distribution. Raqab (2009) obtained prediction intervals for the current records from a future iid sequence based on observed current records from an independent iid sequence of the same distribution. Ahmadi and Balakrishnan (2011) discussed the prediction of future order statistics based on the current record values. In this paper, we consider two pivotal quantities for the lower and upper current records based on an arbitrary cdf $F_X$; both have the same explicit distribution, which is distribution-free (i.e., it does not depend on the cdf $F_X$). By using these pivotal quantities, prediction intervals for future observations of lower and upper current records and of the record range are explicitly derived. Moreover, a simulation study is carried out on normal and Weibull distributions to investigate the efficiency of the suggested method. Finally, an example of real lifetime data is analysed, where it is assumed that the distribution of the data is unknown.
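The definition of current records above can be sketched in a few lines of Python. This is only an illustrative reading of the definition, not code from the paper: the function name `current_records` and the toy sequence are assumptions, and ties are treated as non-records because the records are defined with strict inequalities.

```python
from typing import List, Sequence, Tuple


def current_records(xs: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (U_n^c, L_n^c, R_n^c) for n = 0, 1, 2, ...

    Entry 0 is (X_1, X_1, 0); a new entry is appended each time a new
    record of either kind (upper or lower) is observed.
    """
    if not xs:
        return []
    upper = lower = xs[0]
    records = [(upper, lower, upper - lower)]
    for x in xs[1:]:
        if x > upper:        # new upper record: lower current record is unchanged
            upper = x
        elif x < lower:      # new lower record: upper current record is unchanged
            lower = x
        else:                # no record of either kind (ties are not records)
            continue
        records.append((upper, lower, upper - lower))
    return records


# Hypothetical toy sequence, used only to illustrate the bookkeeping.
toy = [5, 7, 6, 3, 9, 9, 2]
for n, (u, l, r) in enumerate(current_records(toy)):
    print(n, u, l, r)
```

The list index here follows the convention $L_0^c = U_0^c = X_1$; note that Tables 4 and 7 of the paper number their rows starting from 1.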
2. Auxiliary results

Houchens (1984) used an inductive argument to derive the pdfs of $U_n^c$, $L_n^c$ and $R_n^c$, based on an arbitrary cdf $F_X$ (in the sequel we write $U_n^c\|X$, $L_n^c\|X$ and $R_n^c\|X$ to indicate that these statistics are based on the cdf $F_X$), respectively, by

$$f_{U_n^c\|X}(x) = 2^n f_X(x)\Big[1 - \bar F_X(x)\sum_{k=0}^{n-1}\frac{[-\log \bar F_X(x)]^k}{k!}\Big], \tag{2.1}$$

$$f_{L_n^c\|X}(x) = 2^n f_X(x)\Big[1 - F_X(x)\sum_{k=0}^{n-1}\frac{[-\log F_X(x)]^k}{k!}\Big]$$

and

$$f_{R_n^c\|X}(r) = \frac{2^n}{(n-1)!}\int_{-\infty}^{\infty} f_X(r+x)\, f_X(x)\,\big[-\log\big(1 - F_X(r+x) + F_X(x)\big)\big]^{n-1}\,dx,\quad 0 < r < \infty,$$

where $\bar F_X(x) = 1 - F_X(x)$.

Houchens (1984) deduced a useful representation for $U_n^c\|Y$ when $Y$ has a negative exponential distribution with parameter 2, i.e., $Y \sim EX^-(2)$. Namely,

$$U_n^c\|Y \overset{d}{=} Y_0 + Y_1 + \cdots + Y_n, \tag{2.2}$$

where "$\overset{d}{=}$" means identical in distribution and the $Y_i$'s are independent random variables such that $Y_0 \sim EX^-(2)$ and the remaining $Y_i \sim EX^-(1)$. An analogous representation for the lower current record can be easily obtained by noting that

$$f_{-U_n^c\|X}(x) = f_{U_n^c\|X}(-x) = 2^n f_X(-x)\Big[1 - \bar F_X(-x)\sum_{k=0}^{n-1}\frac{(-\log \bar F_X(-x))^k}{k!}\Big]$$

$$= 2^n f_{-X}(x)\Big[1 - F_{-X}(x)\sum_{k=0}^{n-1}\frac{(-\log F_{-X}(x))^k}{k!}\Big],$$

which yields

$$U_n^c\|X \overset{d}{=} -L_n^c\|{-X}. \tag{2.3}$$

Applying (2.3), we get $-U_n^c\|Y \overset{d}{=} -Y_0 - Y_1 - \cdots - Y_n \overset{d}{=} Z_0 + Z_1 + \cdots + Z_n$, where $Z_0 \sim EX^+(2)$, $Z_i \sim EX^+(1)$, $i = 1, 2, \dots, n$, and $EX^+(\beta)$ is the positive exponential cdf with parameter $\beta$. Thus, by applying (2.3) again and noting that $Y \sim EX^-(\beta) \Leftrightarrow Z = -Y \sim EX^+(\beta)$, we get

$$L_n^c\|Z \overset{d}{=} Z_0 + Z_1 + \cdots + Z_n,$$

where $Z \sim EX^+(2)$, $Z_0 \sim EX^+(2)$ and $Z_i \sim EX^+(1)$, $i = 1, 2, \dots, n$.
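A quick numerical sanity check of how (2.1) and (2.2) fit together can be sketched as follows. It assumes, as the Appendix suggests, that $Y \sim EX^-(2)$ has survival function $e^{-y/2}$ for $y > 0$, so that (2.2) implies a mean of $2 + n$ for $U_n^c\|Y$. The helper names, the truncation point 200 and the Simpson rule are choices made for this illustration only.

```python
import math


def upper_current_record_pdf(y: float, n: int) -> float:
    """Density (2.1) when the survival function is exp(-y/2), y > 0 (assumed)."""
    sf = math.exp(-y / 2.0)              # \bar F_X(y)
    pdf = 0.5 * sf                       # f_X(y)
    # -log(\bar F_X(y)) = y/2, so the sum in (2.1) is sum_k (y/2)^k / k!
    partial = sum((y / 2.0) ** k / math.factorial(k) for k in range(n))
    return 2.0 ** n * pdf * (1.0 - sf * partial)


def simpson(g, a: float, b: float, steps: int = 20000) -> float:
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def moments(n: int, upper: float = 200.0):
    """Total mass and mean of (2.1), integrating numerically over [0, upper]."""
    mass = simpson(lambda y: upper_current_record_pdf(y, n), 0.0, upper)
    mean = simpson(lambda y: y * upper_current_record_pdf(y, n), 0.0, upper)
    return mass, mean


for n in (0, 1, 3):
    mass, mean = moments(n)
    print(n, round(mass, 6), round(mean, 6))   # mean should be close to n + 2
```

The check only concerns the marginal distribution of a single $U_n^c\|Y$; it says nothing about the joint behaviour of successive current records.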
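3. Main results

The following theorem is the main result of this article. In what follows we assume that $F_X$ is a continuous cdf with the generalized inverse function $F_X^{-1}(y) = \inf\{x : F_X(x) \ge y\}$.

Theorem 3.1. Let $U_n^c = U_n^c\|X$, $L_n^c = L_n^c\|X$ and $R_n^c = R_n^c\|X$ be the upper current record, the lower current record and the record range based on the cdf $F_X$, respectively. Furthermore, let $0 < \alpha, \beta < 1$ and $m = 1, 2, \dots$ Then,

1. $\Big(U_n^c,\ F_X^{-1}\big(1 - \bar F_X^{\,1+t_{m:\alpha}}(U_n^c)\big)\Big)$ is a $100(1-\alpha)\%$ confidence interval for $U_{n+m}^c$,
2. $\Big(F_X^{-1}\big(F_X^{\,1+t_{m:\beta}}(L_n^c)\big),\ L_n^c\Big)$ is a $100(1-\beta)\%$ confidence interval for $L_{n+m}^c$,
3. $\Big(R_n^c = U_n^c - L_n^c,\ F_X^{-1}\big(1 - \bar F_X^{\,1+t_{m:\alpha}}(U_n^c)\big) - F_X^{-1}\big(F_X^{\,1+t_{m:\beta}}(L_n^c)\big)\Big)$ is a $100\gamma\%$ confidence interval for $R_{n+m}^c$, where $\gamma \ge \max(1-\alpha-\beta, 0)$ (e.g., $\gamma \ge 0.98$ if $\alpha = \beta = 0.01$).

Here $t_{m:\theta}$ denotes the solution of $\int_0^{t_{m:\theta}} f(t)\,dt = 1-\theta$, where $f$ is the pdf given in Lemma 3.1.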
Theorem 3.1 will follow from the following lemma, which is proved in the Appendix and is of interest in its own right.

Lemma 3.1. Let $U_n^* = U_n^c\|Y$ and $L_n^* = L_n^c\|Z$, where $Y \sim EX^-(2)$ and $Z \sim EX^+(2)$. Then, for every $m = 1, 2, \dots$, the two pivotal statistics $\bar T_m = \dfrac{U_{n+m}^* - U_n^*}{U_n^*}$ and $T_m = \dfrac{L_{n+m}^* - L_n^*}{L_n^*}$ have the same pdf $f(t)$, where

$$f(t) = \frac{2^{n-1}\, m\, t^{m-1}}{(t+\frac{1}{2})^{m+1}} - \sum_{k=0}^{n-1}\binom{k+m}{k}\frac{2^{n-k-1}\, m\, t^{m-1}}{(t+1)^{k+m+1}}. \tag{3.1}$$

Remark 3.1. One can easily check that $\int_0^\infty f(t)\,dt = 1$ by using the two formulas

$$\int_0^\infty \frac{t^N}{(t+a)^M}\,dt = a^{N-M+1}\sum_{i=0}^{N}\binom{N}{i}\frac{(-1)^{i+1}}{N-i-M+1},\quad a > 0,$$

and

$$\sum_{i=0}^{N}\frac{(-1)^i}{M+i}\binom{N}{i} = \frac{N!\,(M-1)!}{(M+N)!},$$

for any two positive integers $N$ and $M$ for which $N < M-1$.
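Proof of Theorem 3.1. On applying Lemma 3.1, we get $P(0 \le \bar T_m \le t_{m:\alpha}) = 1-\alpha$ and $P(0 \le T_m \le t_{m:\beta}) = 1-\beta$. Therefore, we get

$$P\Big(0 \le \frac{U_{n+m}^* - U_n^*}{U_n^*} \le t_{m:\alpha}\Big) = P\big(U_n^* \le U_{n+m}^* \le U_n^*(1+t_{m:\alpha})\big) = 1-\alpha \tag{3.2}$$

and

$$P\Big(0 \le \frac{L_{n+m}^* - L_n^*}{L_n^*} \le t_{m:\beta}\Big) = P\big(0 \ge L_{n+m}^* - L_n^* \ge L_n^*\, t_{m:\beta}\big) = 1-\beta \tag{3.3}$$

(note that $L_n^* \le 0$). Thus, the first two relations of Theorem 3.1 (1. and 2.) follow immediately by applying the transformations $U_n^* = -2\log(\bar F_X(U_n^c))$ and $L_n^* = 2\log(F_X(L_n^c))$, respectively, to the relations (3.2) and (3.3).

In order to find the confidence interval for the record range we use the two well-known relations

$$P(C_1 \cap C_2) \ge \max\big(P(C_1) + P(C_2) - 1,\ 0\big),$$

for any two events $C_1$ and $C_2$, and

$$\{a + \bar a \le X + Y \le b + \bar b\} \supset \{\bar a < X < \bar b,\ a < Y < b\},$$

for any two random variables $X$ and $Y$ (the event on the right implies the event on the left), to get

$$P\Big(R_n^c = U_n^c - L_n^c \le R_{n+m}^c \le F_X^{-1}\big(1 - \bar F_X^{\,1+t_{m:\alpha}}(U_n^c)\big) - F_X^{-1}\big(F_X^{\,1+t_{m:\beta}}(L_n^c)\big)\Big)$$

$$\ge P\Big(U_n^c \le U_{n+m}^c \le F_X^{-1}\big(1 - \bar F_X^{\,1+t_{m:\alpha}}(U_n^c)\big),\ \ L_n^c \ge L_{n+m}^c \ge F_X^{-1}\big(F_X^{\,1+t_{m:\beta}}(L_n^c)\big)\Big)$$

$$= \gamma \ge \max(1-\alpha-\beta,\ 0).$$

This completes the proof.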
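As an illustration of parts 1 and 2 of Theorem 3.1 (a worked special case, not part of the original proof), assume $\bar F_X(x) = e^{-x/\sigma}$, $x > 0$, where the scale $\sigma > 0$ is a symbol introduced here for the example. The bounds then reduce to closed forms, and the upper bound does not depend on $\sigma$:

```latex
\begin{align*}
F_X^{-1}(p) &= -\sigma \log(1-p), \\
F_X^{-1}\!\bigl(1 - \bar F_X^{\,1+t_{m:\alpha}}(U_n^c)\bigr)
  &= -\sigma \log e^{-(1+t_{m:\alpha})\,U_n^c/\sigma}
   = (1+t_{m:\alpha})\,U_n^c, \\
F_X^{-1}\!\bigl(F_X^{\,1+t_{m:\beta}}(L_n^c)\bigr)
  &= -\sigma \log\!\Bigl(1 - \bigl(1 - e^{-L_n^c/\sigma}\bigr)^{1+t_{m:\beta}}\Bigr).
\end{align*}
```

For instance, with the upper current record 10.2282 from the ninth row of Table 4 and $t_{1:0.1} = 0.237047$ (Table 1, $n = 9$), the upper bound is $1.237047 \times 10.2282 \approx 12.6528$, which agrees with the 90% entry for $U_{10}^c$ in Table 5.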
By using an argument similar to the one applied in Lemma 3.1, we obtain the following two results, whose proofs are given in the Appendix.

Lemma 3.2. The joint pdfs of $U_1^*, U_2^*, \dots, U_n^*$ and $L_1^*, L_2^*, \dots, L_n^*$ are given, respectively, by

$$f_{U_n^*, U_{n-1}^*, \dots, U_1^*}(y_n, y_{n-1}, \dots, y_1) = e^{-y_n}\big[e^{y_1/2} - 1\big],\quad 0 < y_1 < y_2 < \cdots < y_n,$$

and

$$f_{L_n^*, L_{n-1}^*, \dots, L_1^*}(z_n, z_{n-1}, \dots, z_1) = e^{z_n}\big[e^{-z_1/2} - 1\big],\quad z_n < z_{n-1} < \cdots < z_1 < 0.$$

Lemma 3.2 opens the way for interesting inferential studies based on the current records. Actually, by noting that $U_n^* = -2\log(\bar F_X(U_n^c\|X))$ and $L_n^* = 2\log(F_X(L_n^c\|X))$, we can obtain the likelihood functions based on the upper and lower current records, respectively, as

$$f_{U_n^c\|X, \dots, U_1^c\|X}(x_n, \dots, x_1) = \bar F_X^{\,2}(x_n)\,\frac{F_X(x_1)}{\bar F_X(x_1)}\Big(\prod_{j=1}^{n}\frac{2 f_X(x_j)}{\bar F_X(x_j)}\Big),\quad x_1 < x_2 < \cdots < x_n,$$

and

$$f_{L_n^c\|X, \dots, L_1^c\|X}(x_n, \dots, x_1) = F_X^{\,2}(x_n)\,\frac{\bar F_X(x_1)}{F_X(x_1)}\Big(\prod_{j=1}^{n}\frac{2 f_X(x_j)}{F_X(x_j)}\Big),\quad x_n < x_{n-1} < \cdots < x_1.$$

The above likelihood functions can be used to obtain point estimators of any unknown parameters of the cdf $F_X$, especially if the available data are the current record values.
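The likelihood based on upper current records can be evaluated on the log scale as in the sketch below. It is a direct transcription of the formula $\bar F_X^{\,2}(x_n)\,F_X(x_1)/\bar F_X(x_1)\,\prod_j 2 f_X(x_j)/\bar F_X(x_j)$ on its stated domain $x_1 < \cdots < x_n$; the function name and the exponential example (pdf $\tfrac12 e^{-x/2}$) are assumptions made only for illustration.

```python
import math
from typing import Callable, Sequence


def upper_current_record_loglik(
    xs: Sequence[float],
    pdf: Callable[[float], float],
    cdf: Callable[[float], float],
) -> float:
    """Log of the upper-current-record likelihood for x_1 < x_2 < ... < x_n."""
    if not xs:
        raise ValueError("at least one value is required")
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("values must be strictly increasing")
    sf = [1.0 - cdf(x) for x in xs]                      # \bar F_X(x_j)
    # log of \bar F_X^2(x_n) * F_X(x_1) / \bar F_X(x_1)
    loglik = 2.0 * math.log(sf[-1]) + math.log(cdf(xs[0])) - math.log(sf[0])
    # log of prod_j 2 f_X(x_j) / \bar F_X(x_j)
    for x, s in zip(xs, sf):
        loglik += math.log(2.0 * pdf(x)) - math.log(s)
    return loglik


# Illustrative model (an assumption): exponential with mean 2.
def exp2_pdf(x: float) -> float:
    return 0.5 * math.exp(-x / 2.0)


def exp2_cdf(x: float) -> float:
    return 1.0 - math.exp(-x / 2.0)


print(upper_current_record_loglik([1.5, 2.0, 4.0], exp2_pdf, exp2_cdf))
```

For $n = 1$ the expression reduces to $2 f_X(x_1) F_X(x_1)$, which matches (2.1) with $n = 1$; in practice one would maximise this function over the parameters of the assumed family.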
Lemma 3.3. Each of the sequences $\{U_n^c\|X\}$ and $\{L_n^c\|X\}$ forms a Markov chain.

Tables 1, 2 and 3 give the values of $t_{m:\theta}$, where $\int_0^{t_{m:\theta}} f(t)\,dt = 1-\theta$, for $n = 2, 3, \dots, 20$, $m = 1, 2, \dots, 5$ and $\theta = 0.1, 0.05, 0.01$. The calculations in these tables were carried out with Mathematica 8.
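The tabulated quantiles can be approximated without Mathematica. The sketch below evaluates the pdf (3.1), integrates it with a composite Simpson rule and solves $\int_0^{t} f = 1-\theta$ by bisection. The function names, the step count and the tolerance are choices made for this illustration; it is not the authors' code.

```python
import math


def pivot_pdf(t: float, n: int, m: int) -> float:
    """Pdf (3.1) of the pivotal statistics."""
    lead = 2.0 ** (n - 1) * m * t ** (m - 1) / (t + 0.5) ** (m + 1)
    tail = sum(
        math.comb(k + m, k) * 2.0 ** (n - k - 1) * m * t ** (m - 1)
        / (t + 1.0) ** (k + m + 1)
        for k in range(n)
    )
    return lead - tail


def pivot_cdf(t: float, n: int, m: int, steps: int = 2000) -> float:
    """Integral of (3.1) over [0, t] by the composite Simpson rule (steps even)."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    total = pivot_pdf(0.0, n, m) + pivot_pdf(t, n, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pivot_pdf(i * h, n, m)
    return total * h / 3.0


def pivot_quantile(n: int, m: int, theta: float, tol: float = 1e-9) -> float:
    """Solve pivot_cdf(t) = 1 - theta for t by bisection."""
    lo, hi = 0.0, 1.0
    while pivot_cdf(hi, n, m) < 1.0 - theta:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pivot_cdf(mid, n, m) < 1.0 - theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(pivot_quantile(2, 1, 0.1), 6))   # compare with Table 1, n = 2, m = 1
```

The result for $n = 2$, $m = 1$, $\theta = 0.1$ should agree with the first entry of Table 1 up to the numerical tolerance.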
Table 1: $P(\bar T_m \le t_{m:0.1}) = P(T_m \le t_{m:0.1}) = 0.9$.
n m=1 m=2 m=3 m=4 m=5
2 0.893932 1.64789 2.12161 3.09928 3.81681
3 0.637903 1.15382 1.64826 2.13481 2.61746
4 0.496616 0.887298 1.25887 1.62313 1.98369
5 0.406947 0.720864 1.01764 1.30767 1.59422
6 0.34491 0.607108 0.853803 1.09426 1.33144
7 0.299402 0.524443 0.735349 0.940467 1.14249
8 0.264573 0.461651 0.645749 0.824454 1.00024
9 0.237047 0.41233 0.575618 0.733861 0.88935
10 0.214737 0.37256 0.519236 0.661177 0.80051
11 0.196285 0.339808 0.472924 0.601579 0.727758
12 0.180767 0.312366 0.434205 0.551829 0.667099
13 0.167533 0.289036 0.401353 0.509676 0.615756
14 0.156111 0.268957 0.373128 0.473505 0.571739
15 0.146152 0.251494 0.348617 0.442127 0.533588
16 0.137392 0.236166 0.327132 0.41465 0.500204
17 0.129626 0.222602 0.308145 0.390389 0.470749
18 0.122693 0.210515 0.291243 0.368811 0.444567
19 0.116466 0.199676 0.276101 0.349494 0.421142
20 0.110841 0.1899 0.262458 0.332101 0.400062
Table 2: $P(\bar T_m \le t_{m:0.05}) = P(T_m \le t_{m:0.05}) = 0.95$.
n m=1 m=2 m=3 m=4 m=5
2 1.33466 2.36039 3.35038 4.32794 5.29962
3 0.917775 1.58465 2.22095 2.84604 3.46562
4 0.699294 1.18883 1.65183 2.10471 2.55247
5 0.565044 0.950237 1.31202 1.66461 2.01244
6 0.474206 0.791113 1.08708 1.37465 1.6578
7 0.408643 0.677553 0.927536 1.16979 1.4079
8 0.359082 0.592484 0.808619 1.01759 1.2227
9 0.320292 0.526398 0.716627 0.900193 1.08012
10 0.2891 0.473584 0.643377 0.806939 0.967069
11 0.263469 0.430412 0.583685 0.731109 0.875286
12 0.242029 0.394463 0.534115 0.668256 0.799318
13 0.223828 0.364064 0.492298 0.615323 0.735419
14 0.208182 0.338022 0.45655 0.570139 0.680937
15 0.194588 0.315462 0.42564 0.531123 0.633941
16 0.182666 0.29573 0.39865 0.497096 0.592992
17 0.172124 0.278324 0.374878 0.46716 0.556997
18 0.162736 0.262857 0.353783 0.440621 0.525111
19 0.154321 0.249021 0.334935 0.416931 0.49667
20 0.146736 0.23657 0.317995 0.395657 0.471145
Table 3: $P(\bar T_m \le t_{m:0.01}) = P(T_m \le t_{m:0.01}) = 0.99$.
n m=1 m=2 m=3 m=4 m=5
2 2.85847 4.79726 6.66544 8.50916 10.3413
3 1.79354 2.91229 3.97659 5.02093 6.05546
4 1.29678 2.06118 2.78104 3.4839 4.17816
5 1.01294 1.58618 2.12161 2.64218 3.15505
6 0.830235 1.28585 1.70856 2.11803 2.52051
7 0.703094 1.07977 1.42729 1.76285 2.09202
8 0.609623 0.929976 1.22414 1.50739 1.78474
9 0.538056 0.816357 1.07087 1.31536 1.55434
10 0.481519 0.727304 0.951294 1.166 1.37557
11 0.435735 0.655671 0.855492 1.04666 1.23301
12 0.397907 0.596827 0.777067 0.949212 1.11681
13 0.366128 0.547641 0.711716 0.868181 1.02035
14 0.339055 0.505923 0.656439 0.799775 0.939036
15 0.315716 0.470098 0.609086 0.741278 0.869594
16 0.295387 0.439003 0.568075 0.690695 0.80962
17 0.277521 0.411761 0.532217 0.646532 0.757316
18 0.261696 0.387699 0.500602 0.607646 0.711309
19 0.247582 0.366292 0.472522 0.573149 0.670533
20 0.234914 0.347123 0.447416 0.542341 0.63415
4. Simulation study

In order to check the efficiency of the method presented in Theorem 3.1, a simulation study is conducted for two important distributions: Weibull[1,2], with shape and scale parameters 1 and 2, respectively (so that $\bar F_X(x) = e^{-x/2}$, $x > 0$, which is the form consistent with the intervals reported in Table 5), and Normal[0,1]. For each of these distributions, we generate a random sample of size 100. Moreover, for each of these random samples, the lower and upper current record values are picked out and then the corresponding record ranges are computed. By coincidence, both random samples (i.e., both distributions) produced the same number, 12, of current records (lower and upper). Table 4 gives these 12 observed values of $U_n^c\|X$ and $L_n^c\|X$, as well as $R_n^c\|X$, where $X \sim$ Weibull[1,2] or $X \sim$ Normal[0,1]. Now, we assume that we have only observed the first 9 values of the current records (lower and upper) (i.e., 75% of the observed values of the current records) and we want to predict the next three (i.e., 25% of the observed values of the current records). Theorem 3.1 enables us to get predictive confidence intervals for these next three values. Tables 5 and 6 give these predictive confidence intervals for $U_{9+m}^c\|X$, $L_{9+m}^c\|X$ and $R_{9+m}^c\|X$, where $m = 1, 2, 3$, for $X \sim$ Weibull[1,2] and $X \sim$ Normal[0,1], respectively.
Algorithm
Step 1: select the cdf $F_X$ from which the data will come,
Step 2: choose the sample size $N$,
Step 3: generate a random sample of size $N$ from $F_X$,
Step 4: pick out the lower and upper current record values from the observed data and compute the corresponding record range values. Let the number of observed lower and upper current record values be $n$. Choose the value of $M$, which is about 25% of $n$,
Step 5: choose a significance level $\theta$ and numerically solve the equation
$$\int_0^{t_{m:\theta}} f(t)\,dt = 1-\theta,\quad m = 1, 2, \dots, M,$$
using (3.1) (after replacing $n$ in (3.1) by $n-M$) and Mathematica 8,
Step 6: determine the lower and upper bounds of the predictive confidence intervals for $U_{n-M+m}^c\|X$, $L_{n-M+m}^c\|X$ and $R_{n-M+m}^c\|X$, $m = 1, 2, \dots, M$, by using Theorem 3.1 and Step 5.
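Step 6 can be sketched as follows for any cdf with an available quantile function. The function `prediction_intervals` and its argument names are hypothetical; the numerical inputs are taken from the ninth row of Table 4 (Weibull[1,2] columns) and from Table 1 ($n = 9$, $m = 1$), and the model $\bar F_X(x) = e^{-x/2}$ is the reading of Weibull[1,2] that reproduces Table 5.

```python
import math
from typing import Callable, Tuple

Interval = Tuple[float, float]


def prediction_intervals(
    cdf: Callable[[float], float],
    quantile: Callable[[float], float],
    upper: float,
    lower: float,
    t_alpha: float,
    t_beta: float,
) -> Tuple[Interval, Interval, Interval]:
    """Intervals of Theorem 3.1 for U_{n+m}^c, L_{n+m}^c and R_{n+m}^c.

    upper and lower are the last observed current records; t_alpha and t_beta
    are the quantiles t_{m:alpha} and t_{m:beta} of the pdf (3.1).
    """
    sf_upper = 1.0 - cdf(upper)
    u_hi = quantile(1.0 - sf_upper ** (1.0 + t_alpha))   # part 1
    l_lo = quantile(cdf(lower) ** (1.0 + t_beta))        # part 2
    return (upper, u_hi), (l_lo, lower), (upper - lower, u_hi - l_lo)


# Assumed model for the illustration: survival function exp(-x/2), x > 0.
def exp2_cdf(x: float) -> float:
    return 1.0 - math.exp(-x / 2.0)


def exp2_quantile(p: float) -> float:
    return -2.0 * math.log(1.0 - p)


t = 0.237047   # Table 1, n = 9, m = 1 (theta = 0.1)
u_int, l_int, r_int = prediction_intervals(
    exp2_cdf, exp2_quantile, upper=10.2282, lower=0.0235643, t_alpha=t, t_beta=t
)
print(u_int, l_int, r_int)   # compare with the SL = 90%, m = 1 block of Table 5
```

Passing the cdf and its quantile function as callables keeps the sketch independent of any particular distribution; a fitted model, as in Section 5, could be plugged in the same way.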
The results presented in Tables 5 and 6 show that all the true values of $U_{9+m}^c\|X$, $L_{9+m}^c\|X$ and $R_{9+m}^c\|X$, where $m = 1, 2$, are included in their predictive confidence intervals for both $X \sim$ Weibull[1,2] and $X \sim$ Normal[0,1]. Moreover, for $m = 3$, almost all the true values of these statistics are also included in their predictive confidence intervals for the two cdfs. Nevertheless, the length of the predictive confidence interval increases (i.e., we get less accuracy) as $m$, the number of unobserved values to be predicted, increases. Therefore, we advise predicting no more than one fourth of the data that we have.
Table 4: Current records and record range from Weibull[1,2] and Normal[0,1].

          Weibull[1,2]                           Normal[0,1]
 n   U_n^c     L_n^c       R_n^c          U_n^c       L_n^c       R_n^c
 1   3.84915   3.84915     0             -0.187968   -0.187968    0
 2   3.84915   0.446312    3.402838      -0.187968   -0.35455     0.166582
 3   5.64291   0.446312    5.196598       0.1652     -0.35455     0.51975
 4   5.64291   0.375142    5.267768       0.1652     -1.21013     1.37533
 5   5.64291   0.192999    5.449911       1.40996    -1.21013     2.62009
 6   6.1647    0.192999    5.971701       1.40996    -1.37108     2.78104
 7   10.2282   0.192999    10.035201      1.40996    -1.66077     3.07073
 8   10.2282   0.108285    10.119915      2.07656    -1.66077     3.73733
 9   10.2282   0.0235643   10.2046357     2.07656    -1.90336     3.97992
 10  10.5855   0.0235643   10.5619357     2.10684    -1.90336     4.0102
 11  12.9219   0.0235643   12.8983357     2.10684    -2.15466     4.2615
 12  12.9219   0.0202959   12.9016041     2.96574    -2.15466     5.1204
Table 5: Predictive confidence intervals for the next three observations of current records and record range from Weibull[1,2], with different significance levels (SL's) 90%, 95% and 99%.

m = 1     SL = 90%                   SL = 95%                   SL = 99%
U_10^c    (10.2282, 12.6528)         (10.2282, 13.5042)         (10.2282, 15.7315)
L_10^c    (0.00818032, 0.0235643)    (0.00564575, 0.0235643)    (0.00214174, 0.0235643)
R_10^c    (10.2046357, 12.6446)      (10.2046357, 13.4986)      (10.2046357, 15.7294)

m = 2     SL = 90%                   SL = 95%                   SL = 99%
U_11^c    (10.2282, 14.4456)         (10.2282, 15.6123)         (10.2282, 18.5781)
L_11^c    (0.00374765, 0.0235643)    (0.00225577, 0.0235643)    (0.000621026, 0.0235643)
R_11^c    (10.2046357, 14.4418)      (10.2046357, 15.61)        (10.2046357, 18.5774)

m = 3     SL = 90%                   SL = 95%                   SL = 99%
U_12^c    (10.2282, 16.1157)         (10.2282, 17.558)          (10.2282, 21.1813)
L_12^c    (0.00181212, 0.0235643)    (0.000967742, 0.0235643)   (0.000200224, 0.0235643)
R_12^c    (10.2046357, 16.1139)      (10.2046357, 17.577)       (10.2046357, 21.1811)

Table 6: Predictive confidence intervals for the next three observations of current records and record range from Normal[0,1], with different SL's 90%, 95% and 99%.

m = 1     SL = 90%                SL = 95%                SL = 99%
U_10^c    (2.07656, 2.43784)      (2.07656, 2.55498)      (2.07656, 2.84252)
L_10^c    (-2.24886, -1.90336)    (-2.36081, -1.90336)    (-2.63544, -1.90336)
R_10^c    (3.97992, 4.6867)       (3.97992, 4.91579)      (3.97992, 5.47796)

m = 2     SL = 90%                SL = 95%                SL = 99%
U_11^c    (2.07656, 2.67961)      (2.07656, 2.82774)      (2.07656, 3.17785)
L_11^c    (-2.47987, -1.90336)    (-2.62134, -1.90336)    (-2.9555, -1.90336)
R_11^c    (3.97992, 5.15948)      (3.97992, 5.44908)      (3.97992, 6.13335)

m = 3     SL = 90%                SL = 95%                SL = 99%
U_12^c    (2.07656, 2.88969)      (2.07656, 3.06129)      (2.07656, 3.45983)
L_12^c    (-2.68049, -1.90336)    (-2.84427, -1.90336)    (-3.22445, -1.90336)
R_12^c    (3.97992, 5.57018)      (3.97992, 5.90556)      (3.97992, 6.68428)
5. The case when the cdf F is unknown and real data example

Undoubtedly, lack of knowledge of the distribution of the data resulting from a statistical experiment is the most frequent case. In fact, the assumption that the distribution $F$ is known is unrealistic. However, we can overcome this problem by using the observed data (i.e., $X_1, X_2, \dots, X_N$) to select a statistical distribution that best fits this data set. We cannot "just guess" and use a particular distribution without testing several alternative models, as this can result in analysis errors. In most cases, we need to fit two or more distributions, compare the results, and select the most valid model (see Example 5.1). Naturally, the "candidate" distributions we fit should be chosen depending on the nature of the observed data. For example, in the case of a life testing experiment we should fit non-negative distributions such as Gamma or Weibull. When this procedure is applied, all we need is that the size $N$ of the observed data be large enough to carry out the necessary identification methods (e.g., building a histogram) and goodness-of-fit tests (e.g., the Kolmogorov-Smirnov test) based on the empirical cdf of $X_1, \dots, X_N$. In Example 5.1, we consider $N = 130$ real observations (cf. Arnold et al., 1998, page 49) with unknown distribution. These data yield 14 current records (lower and upper). The first 11 of them result from the first 48 observations. Thus, we look for the distribution $F$ that best fits these 48 observations. After that, we predict the last three current records and their corresponding record ranges by applying the results of Theorem 3.1 to the first 11 current records and their corresponding record ranges. We find that almost all the predictions are accurate, even when we select another fitted distribution with a poorer goodness-of-fit to the data than the first one.
Example 5.1. The following data (read row-wise) represent the average July temperatures (in degrees centigrade) of Neuenburg, Switzerland, during the period 1864-1993 (from Klüppelberg and Schwere, 1995).
19.0 20.1 18.4 17.4 19.7 21.0 21.4 19.2 19.9 20.4 20.9 17.2 20.2 17.8 18.1
15.6 19.4 21.7 16.2 16.4 19.0 20.6 19.0 20.7 15.8 17.7 16.8 17.1 18.1 18.4
18.7 18.7 18.4 19.2 18.0 18.7 20.7 19.4 19.2 17.4 22.0 21.4 19.3 16.8 18.2
16.2 15.9 22.1 17.5 15.3 16.5 17.4 17.0 18.3 18.3 15.3 18.2 21.5 17.0 21.6
18.2 18.1 17.6 18.2 22.6 19.9 17.1 17.2 17.3 19.4 20.1 20.1 17.0 19.4 17.5
16.8 17.0 19.9 18.2 19.2 18.5 20.8 19.5 21.1 15.8 21.3 21.2 18.8 22.3 18.6
16.8 18.2 17.2 18.4 18.7 21.1 16.3 17.4 18.0 19.5 21.2 16.8 17.4 20.7 18.4
19.8 18.7 20.5 18.3 18.2 18.2 19.2 20.2 18.2 17.4 19.2 16.3 17.4 20.3 23.4
19.2 20.2 19.3 19.0 18.8 20.3 19.7 20.7 19.6 18.1
The above data yield 14 current records. These current records and their corresponding record ranges are presented in Table 7. First, we try to fit the first 48 observations with several cdfs, such as the exponential, logistic, Gamma, normal, Weibull, Gumbel, Laplace and inverse Gamma distributions. The methods of maximum likelihood and moments are used to estimate the parameters of the candidate cdfs. After that, we apply the Anderson-Darling, Cramér-von Mises and Kolmogorov-Smirnov goodness-of-fit tests to check the fit of these cdfs. Among these cdfs, we found that only the Gamma, normal and logistic distributions fit these data. Moreover, the Gamma[119.277, 0.157808] distribution is the cdf that best fits these data (on average with respect to the three applied goodness-of-fit tests and the two estimation methods used); the second is Normal[18.8229, 1.71722], while the third is Logistic[18.8205, 1.01236]; see Tables 8-10 and Figures 1-3. The predictive confidence intervals for the next three statistics $U_{11+m}^c$, $L_{11+m}^c$ and $R_{11+m}^c$, $m = 1, 2, 3$, for the Gamma, normal and logistic cdfs are presented in Tables 11-13, respectively. These tables show that almost all the true values of the above three statistics are included in the predictive confidence intervals. This result suggests that our method is stable regardless of the choice of the cdf that fits the data.
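The fitting step for one candidate can be sketched with the Python standard library alone. The snippet below fits a normal distribution by maximum likelihood to the first 48 observations listed above and computes the Kolmogorov-Smirnov statistic against the fitted cdf. The helper `ks_statistic` is written for this illustration, and no p-value is computed; the paper's own calculations may differ in numerical detail.

```python
from statistics import NormalDist, fmean, pstdev
from typing import Callable, Sequence

# First 48 observations of the July temperature data (copied from the listing above).
first_48 = [
    19.0, 20.1, 18.4, 17.4, 19.7, 21.0, 21.4, 19.2, 19.9, 20.4, 20.9, 17.2, 20.2, 17.8, 18.1,
    15.6, 19.4, 21.7, 16.2, 16.4, 19.0, 20.6, 19.0, 20.7, 15.8, 17.7, 16.8, 17.1, 18.1, 18.4,
    18.7, 18.7, 18.4, 19.2, 18.0, 18.7, 20.7, 19.4, 19.2, 17.4, 22.0, 21.4, 19.3, 16.8, 18.2,
    16.2, 15.9, 22.1,
]


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Kolmogorov-Smirnov distance between the empirical cdf and a given cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        p = cdf(x)
        # compare with the empirical cdf just after and just before x
        d = max(d, (i + 1) / n - p, p - i / n)
    return d


mu = fmean(first_48)        # ML estimate of the normal mean
sigma = pstdev(first_48)    # ML estimate of the standard deviation (divisor N)
fitted = NormalDist(mu, sigma)
d_n = ks_statistic(first_48, fitted.cdf)
print(round(mu, 4), round(sigma, 5), round(d_n, 4))
```

The estimates can be compared with $\hat\mu_{ML}$ and $\hat\sigma_{ML}$ in Table 9, and the statistic with the Kolmogorov-Smirnov entry reported there.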
Table 7: Current records and record ranges resulting from the full data set.

n       1     2     3     4     5     6     7     8     9     10    11    12    13    14
U_n^c   19.0  20.1  20.1  20.1  21.0  21.4  21.4  21.4  21.7  22.0  22.1  22.1  22.6  23.4
L_n^c   19.0  19.0  18.4  17.4  17.4  17.4  17.2  15.6  15.6  15.6  15.6  15.3  15.3  15.3
R_n^c   0     1.1   1.7   2.7   3.6   4.0   4.2   5.8   6.1   6.4   6.5   6.8   7.3   8.1
Table 8: Fitting the first 48 observations for gamma cdf.

Distribution: Gamma[α, β]

Maximum likelihood: $\hat\alpha_{ML} = 119.277$, $\hat\beta_{ML} = 0.157808$
Test                  P-Value    Statistic
Kolmogorov-Smirnov    0.995234   0.0569809
Anderson-Darling      0.977713   0.235002
Cramér-von Mises      0.983675   0.0274912

Moments: $\hat\alpha_{M} = 120.149$, $\hat\beta_{M} = 0.156663$
Test                  P-Value    Statistic
Kolmogorov-Smirnov    0.994289   0.0578043
Anderson-Darling      0.974785   0.241202
Cramér-von Mises      0.981763   0.0281783

Table 9: Fitting the first 48 observations for normal cdf.

Distribution: Normal[µ, σ]

Maximum likelihood: $\hat\mu_{ML} = 18.8229$, $\hat\sigma_{ML} = 1.71722$
Test                  P-Value    Statistic
Kolmogorov-Smirnov    0.994086   0.0579686
Anderson-Darling      0.982812   0.222963
Cramér-von Mises      0.987088   0.0261305

Moments: $\hat\mu_{M} = 18.8229$, $\hat\sigma_{M} = 1.71722$
Test                  P-Value    Statistic
Kolmogorov-Smirnov    0.994086   0.0579686
Anderson-Darling      0.982812   0.222963
Cramér-von Mises      0.987088   0.0261305

Table 10: Fitting the first 48 observations for logistic cdf.

Distribution: Logistic[µ, β]

Maximum likelihood: $\hat\mu_{ML} = 18.8205$, $\hat\beta_{ML} = 1.01236$
Test                  P-Value    Statistic
Kolmogorov-Smirnov    0.98876    0.061264
Anderson-Darling      0.964482   0.260431
Cramér-von Mises      0.979247   0.02903

Moments: $\hat\mu_{M} = 18.8229$, $\hat\beta_{M} = 0.946754$
Test                  P-Value    Statistic
Kolmogorov-Smirnov    0.927317   0.0756047
Anderson-Darling      0.838543   0.409448
Cramér-von Mises      0.882778   0.0489246
Figure 1: Plot showing the goodness-of-fit for gamma cdf.
Figure 2: Plot showing the goodness-of-fit for normal cdf.
Figure 3: Plot showing the goodness-of-fit for logistic cdf.
Table 11: Predictive confidence intervals for $U_{11+m}^c$, $L_{11+m}^c$ and $R_{11+m}^c$, $m = 1, 2, 3$, from Gamma[119.277, 0.157808].

m = 1     SL = 90%           SL = 95%           SL = 99%
U_12^c    (22.1, 22.6482)    (22.1, 22.8253)    (22.1, 23.2593)
L_12^c    (15.1588, 15.6)    (15.0195, 15.6)    (14.6846, 15.6)
R_12^c    (6.5, 7.4894)      (6.5, 7.8058)      (6.5, 8.5747)

m = 2     SL = 90%           SL = 95%           SL = 99%
U_13^c    (22.1, 23.021)     (22.1, 23.2463)    (22.1, 23.7782)
L_13^c    (14.8674, 15.6)    (14.6945, 15.6)    (14.2959, 15.6)
R_13^c    (6.5, 8.1536)      (6.5, 8.5516)      (6.5, 9.4823)

m = 3     SL = 90%           SL = 95%           SL = 99%
U_14^c    (22.1, 23.3496)    (22.1, 23.6122)    (22.1, 24.222)
L_14^c    (14.6161, 15.6)    (14.4189, 15.6)    (13.9732, 15.6)
R_14^c    (6.5, 8.7335)      (6.5, 9.1933)      (6.5, 10.2488)

Table 12: Predictive confidence intervals for $U_{11+m}^c$, $L_{11+m}^c$ and $R_{11+m}^c$, $m = 1, 2, 3$, from Normal[18.8229, 1.71722].

m = 1     SL = 90%           SL = 95%           SL = 99%
U_12^c    (22.1, 22.5884)    (22.1, 22.756)     (22.1, 23.1421)
L_12^c    (15.107, 15.6)     (14.9494, 15.6)    (14.5665, 15.6)
R_12^c    (6.5, 7.4814)      (6.5, 7.8066)      (6.5, 8.5756)

m = 2     SL = 90%           SL = 95%           SL = 99%
U_13^c    (22.1, 22.9307)    (22.1, 23.1306)    (22.1, 23.5979)
L_13^c    (14.7762, 15.6)    (14.578, 15.6)     (14.1147, 15.6)
R_13^c    (6.5, 8.1545)      (6.5, 8.5526)      (6.5, 9.4832)

m = 3     SL = 90%           SL = 95%           SL = 99%
U_14^c    (22.1, 23.2219)    (22.1, 23.4528)    (22.1, 23.9826)
L_14^c    (14.4875, 15.6)    (14.2586, 15.6)    (13.7333, 15.6)
R_14^c    (6.5, 8.7344)      (6.5, 9.1942)      (6.5, 10.2493)

Table 13: Predictive confidence intervals for $U_{11+m}^c$, $L_{11+m}^c$ and $R_{11+m}^c$, $m = 1, 2, 3$, from Logistic[18.8205, 1.01236].

m = 1     SL = 90%           SL = 95%           SL = 99%
U_12^c    (22.1, 22.77)      (22.1, 22.997)     (22.1, 23.5757)
L_12^c    (14.9403, 15.6)    (14.7169, 15.6)    (14.1475, 15.6)
R_12^c    (6.5, 7.8297)      (6.5, 8.2801)      (6.5, 9.4282)

m = 2     SL = 90%           SL = 95%           SL = 99%
U_13^c    (22.1, 23.2539)    (22.1, 23.5578)    (22.1, 24.3102)
L_13^c    (14.4641, 15.6)    (14.1651, 15.6)    (13.4251, 15.6)
R_13^c    (6.5, 8.7898)      (6.5, 9.3927)      (6.5, 10.8851)

m = 3     SL = 90%           SL = 95%           SL = 99%
U_14^c    (22.1, 23.7001)    (22.1, 24.0702)    (22.1, 24.9755)
L_14^c    (14.0251, 15.6)    (13.6612, 15.6)    (12.771, 15.6)
R_14^c    (6.5, 9.675)       (6.5, 10.409)      (6.5, 12.2045)
6. Conclusion

In this paper we focused on the prediction of upper and lower current records. The obtained results are useful when one is interested in extreme values over different periods, areas, etc., and in their range of variation. Theorem 3.1 suggests a new method to construct confidence intervals for future upper and lower current records and for the record range. This method depends on constructing two pivotal statistics with the same distribution for lower and upper current records. The real-data Example 5.1 shows that, when the cdf of the data is unknown, this method is applicable with an acceptable degree of accuracy, even if we fail to identify the type of the distribution of the data with high accuracy. It is worth mentioning that the results and the method of proof in this paper are quite different from the known results concerning prediction problems for record values. For example, Ahmadi and Balakrishnan (2004) used only the current records to estimate fixed quantiles of the given (unknown) cdf, while Raqab and Balakrishnan (2008) obtained distribution-free prediction intervals for the usual records (not the current records). Finally, Raqab (2009) predicted the current records by using the two-sample prediction plan, where the variable to be predicted comes from an independent future sample. In this paper, we consider the one-sample prediction plan, where the variable to be predicted comes from the same sample, so that it may be correlated with the observed data.
Acknowledgement
The authors would like to thank the anonymous referees for constructive suggestions and comments that improved the presentation substantially.
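Appendix

Proof of Lemma 3.1. By using (2.2), we get

$$P(U_{n+m}^* \le x \mid U_n^* = y) = P(Y_0 + Y_1 + \cdots + Y_n + \cdots + Y_{n+m} \le x \mid Y_0 + Y_1 + \cdots + Y_n = y)$$

$$= P(Y_{n+1} + \cdots + Y_{n+m} \le x - y \mid Y_0 + Y_1 + \cdots + Y_n = y) = P(Y_{n+1} + \cdots + Y_{n+m} \le x - y). \tag{1}$$

On the other hand, since $Y_i \sim EX^-(1)$ for $i = n+1, \dots, n+m$, then

$$f_{U_{n+m}^* \mid U_n^*}(x \mid y) = f_{Y_{n+1} + \cdots + Y_{n+m}}(x - y) = \frac{(x-y)^{m-1}}{(m-1)!}\, e^{-(x-y)}\, I_{(0,\infty)}(x-y), \tag{2}$$

where $I_A(\cdot)$ is the usual indicator function of the set $A$. Therefore, by combining (1) and (2) with (2.1), we get

$$f_{U_{n+m}^*, U_n^*}(x, y) = f_{U_{n+m}^* \mid U_n^*}(x \mid y)\, f_{U_n^*}(y)$$

$$= \frac{(x-y)^{m-1}}{(m-1)!}\, e^{-(x-y)}\, 2^n \Big(\tfrac{1}{2} e^{-y/2}\Big)\Big[1 - e^{-y/2}\sum_{k=0}^{n-1}\frac{(-\log e^{-y/2})^k}{k!}\Big]$$

$$= \frac{2^{n-1}(x-y)^{m-1}\, e^{-(x - y/2)}}{(m-1)!}\Big[1 - e^{-y/2}\sum_{k=0}^{n-1}\frac{y^k}{2^k k!}\Big]. \tag{3}$$

Now, by using the transformation $\bar T_m = \dfrac{U_{n+m}^* - U_n^*}{U_n^*}$ and $W = U_n^*$, we get

$$f_{\bar T_m, W}(t, w) = \frac{2^{n-1} w^m t^{m-1} e^{-w(t+\frac{1}{2})}}{(m-1)!} - \frac{2^{n-1} t^{m-1} e^{-w(t+1)}}{(m-1)!}\sum_{k=0}^{n-1}\frac{w^{k+m}}{2^k k!}.$$

Thus, we conclude that

$$f_{\bar T_m}(t) = \int_0^\infty f_{\bar T_m, W}(t, w)\,dw = \frac{2^{n-1}\, m\, t^{m-1}}{(t+\frac{1}{2})^{m+1}} - \sum_{k=0}^{n-1}\binom{k+m}{k}\frac{2^{n-k-1}\, m\, t^{m-1}}{(t+1)^{k+m+1}}.$$

Similarly, we can show, for any $x \le z \le 0$, that $P(L_{n+m}^* \le x \mid L_n^* = z) = P(Z_{n+1} + \cdots + Z_{n+m} \le x - z)$. Since $Z_i \sim EX^+(1)$ for $i = n+1, \dots, n+m$, then

$$f_{L_{n+m}^* \mid L_n^*}(x \mid z) = f_{Z_{n+1} + \cdots + Z_{n+m}}(x - z) = \frac{(-(x-z))^{m-1}}{(m-1)!}\, e^{(x-z)}\, I_{(-\infty,0)}(x-z).$$

Thus,

$$f_{L_{n+m}^*, L_n^*}(x, z) = f_{L_{n+m}^* \mid L_n^*}(x \mid z)\, f_{L_n^*}(z)$$

$$= \frac{2^{n-1}(-(x-z))^{m-1}\, e^{(x - z/2)}}{(m-1)!}\Big[1 - e^{z/2}\sum_{k=0}^{n-1}\frac{(-z)^k}{2^k k!}\Big],\quad x \le z \le 0.$$

Now, by using the transformation $T_m = \dfrac{L_{n+m}^* - L_n^*}{L_n^*}$ and $V = L_n^*$, we get

$$f_{T_m, V}(t, v) = \frac{2^{n-1}(-v)^m t^{m-1} e^{v(t+\frac{1}{2})}}{(m-1)!} - \frac{2^{n-1} t^{m-1} e^{v(t+1)}}{(m-1)!}\sum_{k=0}^{n-1}\frac{(-v)^{k+m}}{2^k k!},\quad v \le 0,\ t \ge 0.$$

Then, we conclude that

$$f_{T_m}(t) = \int_{-\infty}^{0} f_{T_m, V}(t, v)\,dv = \frac{2^{n-1}\, m\, t^{m-1}}{(t+\frac{1}{2})^{m+1}} - \sum_{k=0}^{n-1}\binom{k+m}{k}\frac{2^{n-k-1}\, m\, t^{m-1}}{(t+1)^{k+m+1}}.$$

This completes the proof.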
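Proof of Lemma 3.2. Clearly, (3) yields

$$f_{U_n^*, U_{n-1}^*}(y_n, y_{n-1}) = 2^{n-2}\, e^{-(y_n - y_{n-1}/2)}\Big[1 - e^{-y_{n-1}/2}\sum_{k=0}^{n-2}\frac{(y_{n-1}/2)^k}{k!}\Big].$$

On the other hand, by applying the same argument as in Lemma 3.1, we can show that

$$P(U_n^* \le y_n,\ U_{n-1}^* \le y_{n-1} \mid U_{n-2}^* = y_{n-2})$$

$$= P(Y_{n-1} + Y_n \le y_n - y_{n-2},\ Y_{n-1} \le y_{n-1} - y_{n-2} \mid Y_0 + Y_1 + \cdots + Y_{n-2} = y_{n-2})$$

$$= P(Y_{n-1} + Y_n \le y_n - y_{n-2},\ Y_{n-1} \le y_{n-1} - y_{n-2}).$$

Since $f_{Y_{n-1}, Y_n}(y_{n-1}, y_n) = e^{-y_{n-1} - y_n}$, we get

$$f_{Y_{n-1},\, Y_{n-1}+Y_n}(y_{n-1} - y_{n-2},\ y_n - y_{n-2}) = e^{-(y_n - y_{n-2})},\quad y_{n-2} < y_{n-1} < y_n.$$

Therefore, $f_{U_n^*, U_{n-1}^* \mid U_{n-2}^*}(y_n, y_{n-1} \mid y_{n-2}) = e^{-(y_n - y_{n-2})}$, which by using (2.1) implies

$$f_{U_n^*, U_{n-1}^*, U_{n-2}^*}(y_n, y_{n-1}, y_{n-2}) = e^{-(y_n - y_{n-2})}\, f_{U_{n-2}^*}(y_{n-2})$$

$$= 2^{n-3}\, e^{-(y_n - y_{n-2}/2)}\Big[1 - e^{-y_{n-2}/2}\sum_{k=0}^{n-3}\frac{(y_{n-2}/2)^k}{k!}\Big].$$

Therefore, by induction we get the claimed result for the upper current records; the result for the lower current records can be proved by applying the same argument.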
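Proof of Lemma 3.3. Since the proofs of the lemma for the two sequences $\{U_n^c\|X\}$ and $\{L_n^c\|X\}$ are very similar, we only prove the lemma for the first sequence. For any two positive integers $t < s$, we can easily show, by applying the same argument as in the proofs of Lemmas 3.1 and 3.2, that

$$P(U_s^c\|X \le x_s \mid U_1^c\|X = x_1, \dots, U_t^c\|X = x_t)$$

$$= P(U_s^* \le x_s^* \mid U_1^* = x_1^*, \dots, U_t^* = x_t^*) = P(Y_{t+1} + \cdots + Y_s \le x_s^* - x_t^*),$$

where $x_i^* = -2\log[\bar F_X(x_i)]$, $i = 1, \dots, t, s$. Therefore, with $m = s - t$,

$$f_{U_s^* \mid U_1^*, \dots, U_t^*}(x_s^* \mid x_1^*, \dots, x_t^*) = \frac{(x_s^* - x_t^*)^{m-1}}{(m-1)!}\, e^{-(x_s^* - x_t^*)}\, I_{(0,\infty)}(x_s^* - x_t^*),$$

which depends on the conditioning values only through $x_t^*$. This completes the proof.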
References
Ahmadi, J. and Balakrishnan, N. (2011). Distribution-free prediction intervals for order statistics based on
record coverage. Korean Statistical Society, 40, 181–192.
Ahmadi, J. and Balakrishnan, N. (2008). Prediction intervals for future records. Statistics & Probability
Letters, 78, 1955–1963.
Ahmadi, J. and Balakrishnan, N. (2004). Confidence intervals for quantiles in terms of record range.
Statistics & Probability Letters, 68, 395–405.
Arnold, B. C., Balakrishnan, N. and Nagaraja, H. N. (1998). Records. Wiley, New York.
Basak, P. (2000). An application of record range and some characterization results. In: Balakrishnan, N.
(ed.) Advances on Methodological and Applied Aspects of Probability and Statistics. Gordon and
Breach Science Publishers, New York: 83–95.
Houchens, R. L. (1984). Record Value, Theory and Inference, Ph. D. Dissertation, University of California,
Riverside, CA.
Raqab, M. R. (2009). Distribution-free prediction intervals for the future current record statistics. Statistical
Papers, 50, 429–439.