Fuzzy Confidence Intervals by the Likelihood Ratio: Testing Equality of Means—Application on Swiss SILC Data

SN Computer Science (2022) 3: 374
https://doi.org/10.1007/s42979-022-01257-z
ORIGINAL RESEARCH
Fuzzy Confidence Intervals bytheLikelihood Ratio: Testing Equality
ofMeans—Application onSwiss SILC Data
RédinaBerkachy1 · LaurentDonzé1
Received: 20 July 2021 / Accepted: 20 June 2022 / Published online: 15 July 2022
© The Author(s) 2022
Abstract
We propose a practical procedure for constructing fuzzy confidence intervals by the likelihood method, where the observations and the hypotheses are considered to be fuzzy. We use the bootstrap technique to estimate the distribution of the likelihood ratio. The chosen bootstrap algorithm consists in randomly drawing observations while preserving the location and dispersion measures of the original fuzzy data set. A metric $d_{SGD}^{\theta^{\star}}$ based on the well-known signed distance measure is considered in this case. We present a simulation study to investigate the influence of the fuzziness of the computed maximum likelihood estimator on the constructed confidence intervals. Based on these intervals, we introduce a hypothesis test for the equality of means of two groups with its corresponding decision rule. The highlight of this paper is the application of the defended approach to the Swiss SILC Surveys. We empirically investigate the influence of the fuzziness vs. the randomness of the data as well as of the maximum likelihood estimator on the confidence intervals. In addition, we perform an empirical analysis where we compare the mean of the group “Swiss nationality” to the group “Other nationalities” for the variables Satisfaction of health situation and Satisfaction of financial situation.
Keywords Bootstrap technique · Likelihood ratio · Fuzzy confidence interval · Fuzzy statistics · Fuzzy hypotheses · Equality of means · Fuzzy data · Statistical inference · Fuzzy analysis of variance (FANOVA)
Introduction andMotivation
A typical hypothesis testing procedure can be accomplished by, for example, constructing confidence intervals for a particular parameter. This method is widely used in practice. However, once we consider the data and/or the hypotheses to be fuzzy, the corresponding statistical methods have to be updated. Some approaches already exist in the theory of fuzzy sets. For instance, Kruse and Meyer [17] presented a theoretical definition of fuzzy confidence intervals. Several researchers have afterwards proposed refined definitions of fuzzy confidence intervals. For instance, Viertl and Yeganeh [22] proposed a definition of the so-called confidence regions. Their main application was in the Bayesian context. Kahraman et al. [16] described some approaches to the construction of fuzzy confidence intervals, as well as the concept of hesitant fuzzy confidence intervals. Couso and Sanchez [9] provided an approach that considers the inner and outer approximations of confidence intervals in the context of fuzzy observations. Unfortunately, these various approaches are limited because they were all conceived to test a specific parameter with a pre-defined distribution. It would therefore be advantageous to develop a unified general approach to fuzzy confidence intervals.
In classical statistics, the likelihood ratio method is considered an alternative tool for the construction of confidence intervals. In the fuzzy environment, this method using uncertain data has multiple advantages.
Gil and Casals [13] used the likelihood ratio in a hypothesis testing procedure where fuzziness is contained in the data. In Berkachy and Donzé [5], we proposed a practical procedure to construct confidence intervals by the likelihood ratio method, which can be seen as general in some sense. The procedure can be easily adapted to specific cases. However, the distribution of the likelihood ratio is a priori unknown and has to be estimated or derived from strong assumptions. Under classical assumptions, we note that this ratio is known to be $\chi^2$-distributed with degrees of freedom corresponding to the number of constraints applied to the parameters. We propose to use the bootstrap technique extended to the fuzzy environment to estimate the distribution of the likelihood ratio. A main contribution of Berkachy and Donzé [7] is to provide two algorithms to construct the bootstrapped samples, mainly using the location and dispersion characteristics calculated based on a new version of the signed distance measure written as the $d_{SGD}^{\theta^{\star}}$ metric and detailed in Berkachy [1]. We highlight that the Expectation-Maximization (EM) algorithm based on the fuzziness of data described by Denoeux [10] is used to calculate the maximum likelihood estimators (ML-estimators).
This article is part of the topical collection “Computational Intelligence” guest edited by Kurosh Madani, Kevin Warwick, Juan Julian Merelo, Thomas Bäck and Anna Kononova.
* Rédina Berkachy, Redina.Berkachy@UniFR.CH
Laurent Donzé, Laurent.Donze@UniFR.CH
1 Applied Statistics and Modelling, Department of Informatics, University of Fribourg, Boulevard de Pérolles 90, 1700 Fribourg, Switzerland
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
The defended procedure is considered efficient and computationally light because we do not have to consider every single value of the support set of the involved fuzzy numbers, as in the traditional fuzzy method. Indeed, only four conveniently chosen values are used in the construction process. The presented calculations are done using the R package FuzzySTs described in [8]. We propose to use our fuzzy confidence interval to test the equality of means. We present a procedure and give the corresponding decision rule. An application to the Swiss SILC data, described in [21], gives us the opportunity to apply and test our methods. Considerations on sensitivity and robustness are also shown.
The paper is organised as follows. We open the paper in “Definitions” with fundamental definitions of fuzziness. In “The signed distance”, we present the definition of the signed distance measure, followed by the definition of the $d_{SGD}^{\theta^{\star}}$ metric in “The $d_{SGD}^{\theta^{\star}}$ Metric”. “Traditional fuzzy confidence intervals” is devoted to the construction of the traditional fuzzy confidence intervals. In “Fuzzy confidence intervals by the likelihood method”, we discuss our concept of fuzzy confidence intervals constructed using the likelihood method and detail the bootstrap algorithms to approximate the distribution of the likelihood ratio. In addition, a simulation study illustrates the proposed algorithms. We end the paper with “Application on SILC 2017”, which applies the approach to the Swiss SILC data.
Denitions
Let us first expose the basic definitions and concepts of
fuzziness.
Definition 1 (Fuzzy set) If $A$ is a collection of objects denoted generically by $x$, then a fuzzy set or class $\tilde{X}$ in $A$ is a set of ordered pairs:
$$\tilde{X} = \{(x, \mu_{\tilde{X}}(x)) \mid x \in A\}, \tag{1}$$
where the mapping $\mu_{\tilde{X}}$ representing the “grade of membership” is a crisp real-valued function such that
$$\mu_{\tilde{X}}: \mathbb{R} \to [0;1], \quad x \mapsto \mu_{\tilde{X}}(x)$$
is called the membership function.
It is useful to show the support and the kernel of a given fuzzy set. They are given as follows:
Definition 2 (Support and kernel of a fuzzy set)
The support and the kernel of a fuzzy set $\tilde{X}$, denoted respectively by supp $\tilde{X}$ and core $\tilde{X}$, are given by:
$$\text{supp}\, \tilde{X} = \{x \in \mathbb{R} \mid \mu_{\tilde{X}}(x) > 0\}, \tag{2}$$
$$\text{core}\, \tilde{X} = \{x \in \mathbb{R} \mid \mu_{\tilde{X}}(x) = 1\}. \tag{3}$$
In other terms, the support of a fuzzy set $\tilde{X}$ is a crisp set containing all the elements whose membership degree is not zero. In the same manner, the core of the fuzzy set $\tilde{X}$ is a crisp set containing all elements with degree of membership equal to one.
We often characterize a given fuzzy set by a collection of crisp sets called the $\alpha$-level sets. They are given in the following manner:
Definition 3 ($\alpha$-level set or $\alpha$-cut)
An $\alpha$-level set $\tilde{X}_\alpha$ of the fuzzy set $\tilde{X}$ is the (crisp) set of elements such that:
$$\tilde{X}_\alpha = \{x \in A \mid \mu_{\tilde{X}}(x) \geq \alpha\}. \tag{4}$$
The $\alpha$-level set is a closed, bounded and non-empty interval denoted generally by $[\tilde{X}_\alpha^L; \tilde{X}_\alpha^R]$ where, for all $\alpha \in [0;1]$, $\tilde{X}_\alpha^L$ and $\tilde{X}_\alpha^R$ are the left and right hand sides of $\tilde{X}_\alpha$, called respectively the left and right $\alpha$-cuts, such that:
$$\tilde{X}_\alpha^L = \inf\{x \in \mathbb{R} \mid \mu_{\tilde{X}}(x) \geq \alpha\} \quad \text{and} \quad \tilde{X}_\alpha^R = \sup\{x \in \mathbb{R} \mid \mu_{\tilde{X}}(x) \geq \alpha\}. \tag{5}$$
Furthermore, a fuzzy number $\tilde{X}$, also called Left–Right (L–R) fuzzy number, can be represented by the family set of its $\alpha$-cuts $\{\tilde{X}_\alpha \mid \alpha \in [0;1]\}$. This set is a union of finite compact and bounded intervals $[\tilde{X}_\alpha^L(\alpha); \tilde{X}_\alpha^R(\alpha)]$ such that, for all $\alpha \in [0;1]$,
$$\tilde{X} = \bigcup_{0 \leq \alpha \leq 1} [\tilde{X}_\alpha^L(\alpha); \tilde{X}_\alpha^R(\alpha)], \tag{6}$$
where $\tilde{X}_\alpha^L(\alpha)$ and $\tilde{X}_\alpha^R(\alpha)$ are the functions of the left and right hand sides of $\tilde{X}$.
Remark 1 For the sake of simplicity, common shapes of L–R fuzzy numbers are often used in practice. We particularly mention triangular fuzzy numbers denoted by a triplet as $\tilde{X} = (p, q, r)$, with $p, q$ and $r \in \mathbb{R}$, and trapezoidal fuzzy numbers denoted by a quadruple as $\tilde{X} = (p, q, r, s)$, with $p, q, r$ and $s \in \mathbb{R}$.
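To make Definitions 1 to 3 and Remark 1 concrete, the following minimal Python sketch evaluates the membership function and the $\alpha$-cuts of a trapezoidal fuzzy number. The function names are hypothetical and are not taken from the FuzzySTs package; the ordering $p \leq q \leq r \leq s$ and the convention that the 0-cut is the closure of the support are assumptions of this sketch.

```python
from typing import Tuple


def trapezoid_membership(p: float, q: float, r: float, s: float, x: float) -> float:
    """Membership degree of x in the trapezoidal fuzzy number (p, q, r, s)."""
    if not (p <= q <= r <= s):
        raise ValueError("expected p <= q <= r <= s")
    if q <= x <= r:          # core: membership equal to one
        return 1.0
    if p < x < q:            # rising left side
        return (x - p) / (q - p)
    if r < x < s:            # falling right side
        return (s - x) / (s - r)
    return 0.0               # outside the support


def trapezoid_alpha_cut(p: float, q: float, r: float, s: float,
                        alpha: float) -> Tuple[float, float]:
    """Left and right alpha-cuts of (p, q, r, s), as in Eq. 5.

    Assumption: for alpha = 0 the closure of the support [p, s] is returned.
    """
    if not (p <= q <= r <= s):
        raise ValueError("expected p <= q <= r <= s")
    if not (0.0 <= alpha <= 1.0):
        raise ValueError("alpha must lie in [0, 1]")
    left = p + alpha * (q - p)
    right = s - alpha * (s - r)
    return left, right
```

A triangular fuzzy number $(p, q, r)$ is obtained as the special case `(p, q, q, r)`.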
The Signed Distance
The signed distance measure was first used in the context of ranking fuzzy numbers by Yao and Wu [23]. It has also served in other contexts: for example, Berkachy and Donzé [3] used it in the assessment of linguistic questionnaires, and Berkachy and Donzé [6] used it in hypothesis testing. Although this measure is considered simple in terms of computations, it has attracted the interest of specialists because of its directionality: it can be positive or negative, indicating the direction between two particular fuzzy numbers. In addition, Dubois and Prade [11] described it as the expected value of a given fuzzy number. This measure is briefly written as follows:
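Definition 4 (Signed distance between two fuzzy sets)
Let $\tilde{X}$ and $\tilde{Y}$ be two sets of the class of fuzzy sets $\mathbb{F}(\mathbb{R})$. Their respective $\alpha$-cuts are written as $\tilde{X}_\alpha$ and $\tilde{Y}_\alpha$ such that their left and right $\alpha$-cuts, denoted respectively by $\tilde{X}_\alpha^L$, $\tilde{X}_\alpha^R$, $\tilde{Y}_\alpha^L$ and $\tilde{Y}_\alpha^R$, are integrable for all $\alpha \in [0;1]$. The signed distance $d_{SGD}$ between $\tilde{X}$ and $\tilde{Y}$ is the mapping
$$d_{SGD}: \mathbb{F}(\mathbb{R}) \times \mathbb{F}(\mathbb{R}) \to \mathbb{R}, \quad \tilde{X} \times \tilde{Y} \mapsto d_{SGD}(\tilde{X}, \tilde{Y}),$$
such that
$$d_{SGD}(\tilde{X}, \tilde{Y}) = \frac{1}{2} \int_0^1 \left[\tilde{X}_\alpha^L(\alpha) + \tilde{X}_\alpha^R(\alpha) - \tilde{Y}_\alpha^L(\alpha) - \tilde{Y}_\alpha^R(\alpha)\right] d\alpha. \tag{7}$$
We are often interested in the signed distance of a particular fuzzy number measured from the fuzzy origin $\tilde{0}$ as follows:
Definition 5 (Signed distance of a fuzzy set)
The signed distance of the fuzzy set $\tilde{X}$ measured from the fuzzy origin $\tilde{0}$ is given by:
$$d_{SGD}(\tilde{X}, \tilde{0}) = \frac{1}{2} \int_0^1 \left[\tilde{X}_\alpha^L(\alpha) + \tilde{X}_\alpha^R(\alpha)\right] d\alpha. \tag{8}$$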
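As an illustration of Eqs. 7 and 8, the sketch below approximates the signed distance numerically with the trapezoidal rule over a grid of $\alpha$ values. It is only a sketch under stated assumptions: fuzzy numbers are represented by hypothetical functions returning their left and right $\alpha$-cuts, and the fuzzy origin is taken to be the crisp value 0.

```python
from typing import Callable, Tuple

# An alpha-cut function maps alpha in [0, 1] to the pair (left cut, right cut).
AlphaCut = Callable[[float], Tuple[float, float]]


def trapezoid_cut(p: float, q: float, r: float, s: float) -> AlphaCut:
    """Alpha-cut function of the trapezoidal fuzzy number (p, q, r, s)."""
    def cut(alpha: float) -> Tuple[float, float]:
        return p + alpha * (q - p), s - alpha * (s - r)
    return cut


def crisp_cut(c: float) -> AlphaCut:
    """Alpha-cut function of a crisp value c (used here for the fuzzy origin)."""
    return lambda alpha: (c, c)


def signed_distance(x_cut: AlphaCut, y_cut: AlphaCut = crisp_cut(0.0),
                    n: int = 1000) -> float:
    """Signed distance of Eq. 7, integrated over alpha with the trapezoidal rule."""
    h = 1.0 / n
    total = 0.0
    for k in range(n + 1):
        alpha = k * h
        xl, xr = x_cut(alpha)
        yl, yr = y_cut(alpha)
        weight = 0.5 if k in (0, n) else 1.0   # end points count half
        total += weight * (xl + xr - yl - yr)
    return 0.5 * h * total
```

For a trapezoidal fuzzy number the integral has the closed form $(p + q + r + s)/4$, which offers a simple check of the numerical result. Swapping the two arguments changes the sign of the result, which is the directionality discussed above.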
The $d_{SGD}^{\theta^{\star}}$ Metric
Although the signed distance $d_{SGD}$ is seen as advantageous in terms of simplicity and accessibility, it also presents important drawbacks, as detailed in Berkachy [2]. The major ones are given as follows:
1. Mainly because of its directionality, this distance cannot be defined as a full metric. It lacks topological characteristics, such as separability and symmetry.
2. It coincides with a central location measure. Thus, this distance depends strongly on its extreme values. In other words, neither the inner points between the extreme values nor the shape of the fuzzy numbers could affect this measure.
For these reasons, we propose a new $L_2$ metric denoted by $d_{SGD}^{\theta^{\star}}$. It is seen as a generalisation of the signed distance $d_{SGD}$. This new metric depends on a weight parameter called $\theta^{\star}$. Using $d_{SGD}^{\theta^{\star}}$, we take into account the deviation in the shapes and its possible irregularities on the one hand, and the central location measure on the other. This measure satisfies the necessary and sufficient conditions to constitute a metric of fuzzy quantities, as proved in Berkachy [2]. Let us first define the so-called deviations of the shape of a given fuzzy number written in terms of the distance $d_{SGD}$:
Definition 6 (Left and right deviations [2])
Consider $\tilde{X}$ to be a fuzzy number with its $\alpha$-level set $\tilde{X}_\alpha = [\tilde{X}_\alpha^L, \tilde{X}_\alpha^R]$, $\tilde{X} \in \mathbb{F}(\mathbb{R})$. The left and right deviations of the shape of $\tilde{X}$, denoted by $dev^L \tilde{X}$ and $dev^R \tilde{X}$, can be written as:
$$dev^L \tilde{X}(\alpha) = d_{SGD}(\tilde{X}, \tilde{0}) - \tilde{X}_\alpha^L, \tag{9}$$
$$dev^R \tilde{X}(\alpha) = \tilde{X}_\alpha^R - d_{SGD}(\tilde{X}, \tilde{0}), \tag{10}$$
where $d_{SGD}(\tilde{X}, \tilde{0})$ is the signed distance of $\tilde{X}$ measured from the fuzzy origin $\tilde{0}$.
We now define the new metric $d_{SGD}^{\theta^{\star}}$ as follows:
Definition 7 (The $d_{SGD}^{\theta^{\star}}$ distance [2])
Consider two fuzzy numbers $\tilde{X}$ and $\tilde{Y}$ of the class of non-empty compact and bounded fuzzy numbers. Let $\theta^{\star}$ be the weight chosen for the modelling of the shape of these fuzzy numbers such that $0 \leq \theta^{\star} \leq 1$. Based on the signed distance between $\tilde{X}$ and $\tilde{Y}$, the $L_2$ metric $d_{SGD}^{\theta^{\star}}$ is the mapping
$$d_{SGD}^{\theta^{\star}}: \mathbb{F}(\mathbb{R}) \times \mathbb{F}(\mathbb{R}) \to \mathbb{R}^+, \quad \tilde{X} \times \tilde{Y} \mapsto d_{SGD}^{\theta^{\star}}(\tilde{X}, \tilde{Y}),$$
such that
$$d_{SGD}^{\theta^{\star}}(\tilde{X}, \tilde{Y}) = \left( d_{SGD}(\tilde{X}, \tilde{Y})^2 + \theta^{\star} \left( \int_0^1 \max\left( dev^R \tilde{Y}(\alpha) - dev^L \tilde{X}(\alpha),\; dev^R \tilde{X}(\alpha) - dev^L \tilde{Y}(\alpha) \right) d\alpha \right)^2 \right)^{\frac{1}{2}}. \tag{11}$$
It is important at this stage to show a direct mathematical relationship between the $d_{SGD}^{\theta^{\star}}$ metric and the signed distance $d_{SGD}$. Therefore, let us recall the concept of the nearest symmetrical trapezoidal fuzzy number. Further information, in addition to the detailed proof, is given in [2]. Note that this concept will be used in the process of generating random samples in the forthcoming sections.
Definition 8 (Nearest trapezoidal fuzzy number [2])
The nearest symmetrical trapezoidal fuzzy number $\tilde{S}$, written by the quadruple $\tilde{S} = [s_0 - 2\epsilon, s_0 - \epsilon, s_0 + \epsilon, s_0 + 2\epsilon]$, to a fuzzy number $\tilde{X}$ with respect to the metric $d_{SGD}^{\theta^{\star}}$ is given such that
$$s_0 = d_{SGD}(\tilde{X}, \tilde{0}), \tag{12}$$
$$\epsilon = \frac{9}{14}\, d_{SGD}(\tilde{X}, \tilde{0}) - \frac{3}{7} \int_0^1 \tilde{X}_\alpha^L (2 - \alpha)\, d\alpha. \tag{13}$$
Fuzzy Confidence Intervals for a Pre-Defined Parameter
A fuzzy confidence interval is a valuable tool for statistical inference. We first present the definition of a traditional fuzzy confidence interval in “Traditional fuzzy confidence intervals”, followed by our procedure of estimation of intervals by the likelihood ratio method and using the bootstrap technique in “Fuzzy confidence intervals by the likelihood method”. Based on the designed confidence intervals, we also introduce a hypothesis test for the equality of means. A simulation study is finally provided in “Simulation study”.
Traditional Fuzzy Confidence Intervals
A given confidence interval is often produced for a particular parameter denoted by $\theta$. In an epistemic approach, this interval is considered to be fuzzy. This fuzziness is a direct consequence of the fuzziness of the considered parameter. Kruse and Meyer [17] proposed a main approach to write a fuzzy confidence interval under such conditions. Many procedures have been derived to compute this interval. A known one relies on considering a pre-defined distribution, as seen in the following construction procedure:
First, let $X_1, \ldots, X_n$ be a random sample of size $n$. We consider this sample to be fuzzy, and we call $\tilde{X}_1, \ldots, \tilde{X}_n$ its fuzzy perception. For a particular parameter denoted by $\theta$, we are interested in testing the following hypotheses:
$$H_0: \theta = \theta_0 \quad \text{against} \quad H_1: \theta \neq \theta_0.$$
To accomplish this task, an idea could be to construct a fuzzy confidence interval for $\theta$ at a given significance level $\delta$. Based on [17], a two-sided fuzzy confidence interval $\tilde{\Pi}$ for $\theta$ is defined by:
Definition 9 (Fuzzy confidence interval [17])
Let $[\pi_1, \pi_2]$ be a symmetrical confidence interval for a particular parameter $\theta$ at the significance level $\delta$. A fuzzy confidence interval $\tilde{\Pi}$ is a convex and normal fuzzy set such that its left and right $\alpha$-cuts, respectively written by $\tilde{\Pi}_\alpha = [\tilde{\Pi}_\alpha^L, \tilde{\Pi}_\alpha^R]$, are written in the following manner:
$$\tilde{\Pi}_\alpha^L = \inf\{a : \exists x_i \in (\tilde{X}_i)_\alpha,\ \forall i = 1, \ldots, n, \text{ such that } \pi_1(x_1, \ldots, x_n) \leq a\},$$
$$\tilde{\Pi}_\alpha^R = \sup\{a : \exists x_i \in (\tilde{X}_i)_\alpha,\ \forall i = 1, \ldots, n, \text{ such that } \pi_2(x_1, \ldots, x_n) \geq a\}.$$
The constructed fuzzy confidence interval has a confidence level of $1 - \delta$ if, for a parameter $\theta$, the equation
$$P(\tilde{\Pi}_\alpha^L \leq \theta \leq \tilde{\Pi}_\alpha^R) \geq 1 - \delta, \quad \forall \alpha \in [0;1] \tag{14}$$
is verified. A one-sided fuzzy confidence interval is likewise conceivable. A left one-sided fuzzy confidence interval at a confidence level $1 - \delta$, denoted by $\tilde{\Pi}_\alpha$, is written by its $\alpha$-level sets as follows:
$$\tilde{\Pi}_\alpha = [\tilde{\Pi}_\alpha^L, \infty].$$
In the same way, the $\alpha$-cuts of a right one-sided one are written by:
$$\tilde{\Pi}_\alpha = [-\infty, \tilde{\Pi}_\alpha^R].$$
Detailed examples illustrating this definition can be found in Berkachy [2] and Berkachy and Donzé [7].
Fuzzy Confidence Intervals by the Likelihood Method
We presented in Berkachy and Donzé [7] and Berkachy and Donzé [5] a generalisation of the traditional construction procedure. The aim was to provide a practical tool based on the likelihood ratio method to estimate fuzzy confidence intervals, in which the fuzziness contained in the variables is conveniently taken into consideration. We highlight that the likelihood ratio is a common tool in classical statistics as well. In the fuzzy environment, Gil and Casals [13], for instance, used it in the context of hypothesis testing. In this section, we briefly recall the defended procedure. Further detailed information can be found in Berkachy and Donzé [7] and Berkachy [2].
We first define the likelihood function of a fuzzy observation. Consider $\tilde{X}_i$ to be a fuzzy variable, with its fuzzy perception. Therefore, we denote by $\tilde{X}_i$ a fuzzy random variable (FRV) such that its corresponding fuzzy realisation $\tilde{x}_i$ is associated with a measurable membership function given by $\mu_{\tilde{x}_i}$ in the sense of Borel, i.e. $\mu_{\tilde{x}_i}: x \to [0;1]$. Based on the probability concepts proposed in Zadeh [24], the likelihood function in the fuzzy context can be expressed by:
Definition 10 (Likelihood function of a fuzzy observation)
Let $\tilde{\theta}$ be a vector of fuzzy parameters in the parameter space $\Theta$. For a single fuzzy observation $\tilde{x}_i$, the likelihood function can be given by:
$$L(\tilde{\theta}; \tilde{x}_i) = P(\tilde{x}_i; \tilde{\theta}) = \int_{\mathbb{R}} \mu_{\tilde{x}_i}(x) f(x; \tilde{\theta})\, dx. \tag{15}$$
This probability can also be written using the $\alpha$-cuts of the involved fuzzy numbers.
Let now $\underline{\tilde{x}}$ be a fuzzy sample composed of all the fuzzy realisations $\tilde{x}_i$ of the fuzzy random variables $\tilde{X}_i$, $i = 1, \ldots, n$. The corresponding likelihood function $L(\tilde{\theta}; \underline{\tilde{x}})$ can then be given by:
$$L(\tilde{\theta}; \underline{\tilde{x}}) = P(\underline{\tilde{x}}; \tilde{\theta}) \tag{16}$$
$$= \int_{\mathbb{R}} \mu_{\tilde{x}_1}(x) f(x; \tilde{\theta})\, dx \times \cdots \times \int_{\mathbb{R}} \mu_{\tilde{x}_n}(x) f(x; \tilde{\theta})\, dx \tag{17}$$
$$= \prod_{i=1}^{n} \int_{\mathbb{R}} \mu_{\tilde{x}_i}(x) f(x; \tilde{\theta})\, dx. \tag{18}$$
It is then important to write the log-likelihood function $l(\tilde{\theta}; \underline{\tilde{x}})$ as follows:
$$l(\tilde{\theta}; \underline{\tilde{x}}) = \log L(\tilde{\theta}; \underline{\tilde{x}}) = \log \int_{\mathbb{R}} \mu_{\tilde{x}_1}(x) f(x; \tilde{\theta})\, dx + \cdots + \log \int_{\mathbb{R}} \mu_{\tilde{x}_n}(x) f(x; \tilde{\theta})\, dx. \tag{19}$$
Now consider $\hat{\tilde{\theta}}$, the maximum likelihood estimator (ML-estimator) of the fuzzy parameter $\tilde{\theta}$. The likelihood ratio is written as:
$$\frac{L(\tilde{\theta}; \underline{\tilde{x}})}{L(\hat{\tilde{\theta}}; \underline{\tilde{x}})},$$
such that $L(\tilde{\theta}; \underline{\tilde{x}})$ is the likelihood function related to the fuzzy parameter $\tilde{\theta}$, and $L(\hat{\tilde{\theta}}; \underline{\tilde{x}})$ is the likelihood function evaluated at the estimator $\hat{\tilde{\theta}}$. It is essential at this stage to write the logarithm of this ratio, given also by the difference between the log-likelihood functions evaluated at $\tilde{\theta}$ and at $\hat{\tilde{\theta}}$. Therefore, the statistic LR can be given in the following manner:
$$LR = -2 \log \frac{L(\tilde{\theta}; \underline{\tilde{x}})}{L(\hat{\tilde{\theta}}; \underline{\tilde{x}})} = 2\left[l(\hat{\tilde{\theta}}; \underline{\tilde{x}}) - l(\tilde{\theta}; \underline{\tilde{x}})\right], \tag{20}$$
such that $L(\tilde{\theta}; \underline{\tilde{x}}) \neq 0$, $L(\hat{\tilde{\theta}}; \underline{\tilde{x}}) \neq 0$ and both are finite.
Under classical statistical assumptions in the crisp case, the ratio LR is proven to be asymptotically $\chi^2$-distributed with a given number of degrees of freedom. In the fuzzy statistical theory, a main issue is that we do not have any proven asymptotic property for the distribution of this ratio. Hence, we propose to solve this problem using the so-called bootstrap techniques.
We recall that constructing a $100(1 - \delta)\%$ confidence interval means determining, for every value of $\tilde{\theta}$, whether or not we reject the null hypothesis $H_0$. For this construction, consider $\eta$ to be the $(1 - \delta)$-quantile of the distribution of the statistic LR. We could then write the confidence interval by:
$$2\left[l(\hat{\tilde{\theta}}; \underline{\tilde{x}}) - l(\tilde{\theta}; \underline{\tilde{x}})\right] \leq \eta. \tag{21}$$
This latter is equivalent to
$$l(\tilde{\theta}; \underline{\tilde{x}}) \geq l(\hat{\tilde{\theta}}; \underline{\tilde{x}}) - \frac{\eta}{2}. \tag{22}$$
In other terms, the constructed interval has to be composed of all possible values of $\tilde{\theta}$ for which the log-likelihood differs from its maximum by at most $\frac{\eta}{2}$. Based on the statistic LR, a mandatory condition for this construction is that for every value of the parameter $\tilde{\theta}$, the fuzzy confidence interval by the likelihood ratio $\tilde{\Pi}_{LR}$, given by its left and right $\alpha$-cuts $[(\tilde{\Pi}_{LR})_\alpha^L; (\tilde{\Pi}_{LR})_\alpha^R]$, has to verify the following equation
$$P\left((\tilde{\Pi}_{LR})_\alpha^L \leq \tilde{\theta} \leq (\tilde{\Pi}_{LR})_\alpha^R\right) \geq 1 - \delta, \quad \forall \alpha \in [0;1]. \tag{23}$$
To ensure the above-mentioned conditions, we propose the following procedure of construction of fuzzy confidence intervals as seen in [7] and [2].
Procedure
We propose a revisited approach to the construction of fuzzy confidence intervals by the likelihood ratio method, in which the data set is assumed to be vague. Consequently, the log-likelihood function becomes fuzzy-dependent, as does the considered parameter. We can directly deduce that the needed ML-estimator has to be fuzzy as well. Assume then that the calculated crisp ML-estimator is modelled by a well-chosen fuzzy number. The support set of this fuzzy number is a set of crisp elements. Therefore, every element of this set would have to be used in the calculation of the log-likelihood function. However, this task is very tedious because of its computational burden. We therefore suggest choosing specific values, which trigger the calculation of the so-called threshold points. The process of calculating the fuzzy confidence interval will be based on the intersection between these threshold points and the log-likelihood curve. The complete description of the procedure is as follows:
First, let us present the so-called standardising function. It is intentionally proposed to preserve the $[0;1]$-interval identity as a basic property of $\alpha$-level sets. It is written as:
Definition 11 (Standardising function [1])
Consider a value $\theta$ contained in the support set of a fuzzy number $\tilde{\theta}$, i.e. $\theta \in \text{supp}(\tilde{\theta})$. The standardising function $I_{stand}$ is given by:
$$I_{stand}: \; l(\theta, \underline{\tilde{x}}) \mapsto I_{stand}\left(l(\theta, \underline{\tilde{x}})\right) = \frac{l(\theta, \underline{\tilde{x}}) - I_a}{I_b - I_a},$$
where $I_a$ and $I_b$ are arbitrary real values such that $I_a \leq l(\theta, \underline{\tilde{x}}) \leq I_b$ and $I_a \neq I_b$. We have that $I_{stand}\left(l(\theta, \underline{\tilde{x}})\right)$ is bounded and $0 \leq I_{stand}\left(l(\theta, \underline{\tilde{x}})\right) \leq 1$.
The steps of the calculation process are written in the following manner:
1. Let $\tilde{\theta}$ be a fuzzy parameter. We first have to calculate the log-likelihood function $l(\tilde{\theta}; \underline{\tilde{x}})$ shown in Eq. 19.
2. The support and the core sets defining the fuzzy number modelling the ML-estimator are composed of an infinity of values. We choose the lower and upper bounds of these sets only. The aim is to consider a reduced number of elements. Therefore, let $p$, $q$, $r$ and $s$, $p \leq q \leq r \leq s$, be the considered four elements, and supp($\hat{\tilde{\theta}}$) and core($\hat{\tilde{\theta}}$) be respectively the support and the core sets of $\hat{\tilde{\theta}}$. The four values $p$, $q$, $r$ and $s$ are then:
$$p = \min(\text{supp}(\hat{\tilde{\theta}})); \quad q = \min(\text{core}(\hat{\tilde{\theta}})); \tag{24}$$
$$r = \max(\text{core}(\hat{\tilde{\theta}})) \quad \text{and} \quad s = \max(\text{supp}(\hat{\tilde{\theta}})). \tag{25}$$
In addition, the fuzzy parameter is bounded and the sets supp($\hat{\tilde{\theta}}$) and core($\hat{\tilde{\theta}}$) are not empty. This leads us to conclude that the four values $p$, $q$, $r$ and $s$ always exist. We highlight that our intentional choice of elements is in some sense evident. Note that, assuming the symmetry of the probability function, the left and right-hand sides of a log-likelihood function are monotonic and continuous.
3. Next, $\eta$ has to be estimated. The bootstrap technique is suggested as described in the next section.
4. Based on the estimated parameter $\eta$, we construct the threshold values denoted by $I_1$, $I_2$, $I_3$ and $I_4$ corresponding to the chosen values $p$, $q$, $r$ and $s$, respectively. The idea is to evaluate $\hat{\tilde{\theta}}$ at each of the four values on the right-hand side of Eq. 22. The threshold values are then calculated as follows:
$$I_1 = l(p; \underline{\tilde{x}}) - \frac{\eta}{2}; \quad I_2 = l(q; \underline{\tilde{x}}) - \frac{\eta}{2}; \tag{26}$$
$$I_3 = l(r; \underline{\tilde{x}}) - \frac{\eta}{2} \quad \text{and} \quad I_4 = l(s; \underline{\tilde{x}}) - \frac{\eta}{2}. \tag{27}$$
5. Next, we calculate $I_{min}$ and $I_{max}$, the minimum and maximum thresholds, written as:
$$I_{min} = \min(I_1, I_2, I_3, I_4), \tag{28}$$
$$\text{and} \quad I_{max} = \max(I_1, I_2, I_3, I_4). \tag{29}$$
Computing $I_{min}$ and $I_{max}$ and including them in the calculation process are essential at this stage. The reason is that we want to cover the entire interval of the possible values verifying Eq. 22.
6. We now find the intersection between the log-likelihood function and the threshold values $I_1$, $I_2$, $I_3$ and $I_4$. Consider $\theta_1^L$, $\theta_2^L$, $\theta_3^L$, $\theta_4^L$ and $\theta_1^R$, $\theta_2^R$, $\theta_3^R$, $\theta_4^R$ to be the intersection abscissas. Note that the letters “L” and “R” refer to the left and right sides of a particular entity. We calculate these abscissas by solving the following equations:
$$l^L(\theta_1^L; \underline{\tilde{x}}) = I_1 \quad \text{and} \quad l^R(\theta_1^R; \underline{\tilde{x}}) = I_1,  \tag{30}$$
$$l^L(\theta_2^L; \underline{\tilde{x}}) = I_2 \quad \text{and} \quad l^R(\theta_2^R; \underline{\tilde{x}}) = I_2, \tag{31}$$
$$l^L(\theta_3^L; \underline{\tilde{x}}) = I_3 \quad \text{and} \quad l^R(\theta_3^R; \underline{\tilde{x}}) = I_3, \tag{32}$$
$$l^L(\theta_4^L; \underline{\tilde{x}}) = I_4 \quad \text{and} \quad l^R(\theta_4^R; \underline{\tilde{x}}) = I_4. \tag{33}$$
7. Next, we compute the minimum and maximum left intersection abscissas given by
$$\theta_{inf}^L = \inf(\theta_1^L, \theta_2^L, \theta_3^L, \theta_4^L), \tag{34}$$
$$\text{and} \quad \theta_{sup}^L = \sup(\theta_1^L, \theta_2^L, \theta_3^L, \theta_4^L). \tag{35}$$
The minimum and maximum right intersection abscissas are similarly written as:
$$\theta_{inf}^R = \inf(\theta_1^R, \theta_2^R, \theta_3^R, \theta_4^R), \tag{36}$$
$$\text{and} \quad \theta_{sup}^R = \sup(\theta_1^R, \theta_2^R, \theta_3^R, \theta_4^R). \tag{37}$$
We remark that the left and right side intersection abscissas are single and real values.
8. The previously calculated entities are accordingly used to construct the $\alpha$-cuts of the fuzzy confidence interval using the likelihood ratio method $\tilde{\Pi}_{LR}$. We define the left and right $\alpha$-cuts $(\tilde{\Pi}_{LR})_\alpha = [(\tilde{\Pi}_{LR})_\alpha^L; (\tilde{\Pi}_{LR})_\alpha^R]$ in the following manner:
$$(\tilde{\Pi}_{LR})_\alpha^L = \left\{\theta \in \mathbb{R} \;\middle|\; \theta_{inf}^L \leq \theta \leq \theta_{sup}^L \text{ and } \alpha = I_{stand}\left(l(\theta, \underline{\tilde{x}})\right) = \frac{l(\theta, \underline{\tilde{x}}) - I_{min}}{I_{max} - I_{min}}\right\}, \tag{38}$$
$$(\tilde{\Pi}_{LR})_\alpha^R = \left\{\theta \in \mathbb{R} \;\middle|\; \theta_{inf}^R \leq \theta \leq \theta_{sup}^R \text{ and } \alpha = I_{stand}\left(l(\theta, \underline{\tilde{x}})\right) = \frac{l(\theta, \underline{\tilde{x}}) - I_{min}}{I_{max} - I_{min}}\right\}. \tag{39}$$
Note that Berkachy [2] gives the complete proof that the defended fuzzy confidence interval $\tilde{\Pi}_{LR}$ verifies Definition 9. Concerning the coverage rate, it is also proven that Eq. 23 theoretically holds.
Bootstrap Technique for the Approximation of the Likelihood Ratio and its Distribution
The bootstrap technique formally described by Efron [12] is a powerful tool to empirically estimate a specific sampling distribution using observed data. This technique is based on drawing a large number of samples from a primary random sample taken from an unknown distribution. This operation leads to the construction of a so-called bootstrap distribution of the statistic of interest. To sum up, this approach estimates such distributions using random simulation-based calculation processes. The bootstrap technique has also served in fuzzy statistics. For instance, Gonzalez-Rodriguez et al. [15] used it in the hypothesis testing procedure for the mean of fuzzy random variables. In the same direction, Montenegro et al. [18] concluded that a bootstrap process is considered to be computationally lighter than asymptotic ones.
In our strategy, we propose to use a bootstrap methodology to empirically estimate the distribution of the likelihood ratio LR given in Eq. 20, i.e. the difference between the log-likelihood function evaluated at $\hat{\tilde{\theta}}$ and the one evaluated at $\tilde{\theta}$. Berkachy and Donzé [7] introduced two approaches to construct the bootstrap imprecise samples as follows:
1. The first one is based on simply generating $D$ bootstrap samples with replacement. For each sample, we then calculate the needed deviance.
2. The second one is based on generating $D$ samples by preserving the couple of location and dispersion characteristics, respectively denoted by $(s_0, \epsilon)$, of the nearest symmetrical trapezoidal fuzzy numbers described in Definition 8. Note that these fuzzy numbers are calculated from the primary data set.
Further description of both approaches can be found in [7] and [2]. From [2], we can clearly see that no notable differences exist between the two algorithms. Although their designs differ somewhat, the obtained results are very similar. For this reason, we detail hereafter only the second approach, considered to be more complicated than the first one, but conceptually very attractive. The algorithm based on the second bootstrap approach using the characteristics $(s_0, \epsilon)$ is then given by the following steps:
Algorithm:
1. Consider a primary sample. For each observation of this sample, calculate the set of characteristics $(s_0, \epsilon)$.
2. From the calculated set of characteristics $(s_0, \epsilon)$, randomly draw with replacement and with equal probabilities a new set of characteristics $(s_0, \epsilon)$. Construct a bootstrap sample based on this set.
3. Calculate the deviance $2\left[l(\hat{\tilde{\theta}}; \underline{\tilde{x}}) - l(\tilde{\theta}; \underline{\tilde{x}})\right]_{boot}$ for each bootstrap sample.
4. Recursively repeat Steps 2 and 3 a large number $D$ of times. A bootstrap distribution composed of $D$ values has to be constructed.
5. Calculate $\eta$, the $(1 - \delta)$-quantile of the bootstrap distribution of the statistic LR.
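The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the FuzzySTs package: observations are assumed to be trapezoidal fuzzy numbers, so that Eqs. 12 and 13 have closed forms; the pairs $(s_0, \epsilon)$ are assumed to be drawn jointly; and the deviance is passed in as a user-supplied function. All function names are hypothetical, and `toy_deviance` is only a crisp placeholder used to make the sketch runnable.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Trapezoid = Tuple[float, float, float, float]  # (p, q, r, s), p <= q <= r <= s


def characteristics(x: Trapezoid) -> Tuple[float, float]:
    """Location s0 (Eq. 12) and dispersion eps (Eq. 13) of a trapezoid.

    The left alpha-cut is p + alpha * (q - p), hence
    s0 = (p + q + r + s) / 4 and
    integral_0^1 X^L_alpha * (2 - alpha) d alpha = 5p/6 + 2q/3.
    """
    p, q, r, s = x
    s0 = (p + q + r + s) / 4.0
    eps = 9.0 / 14.0 * s0 - 3.0 / 7.0 * (5.0 * p / 6.0 + 2.0 * q / 3.0)
    return s0, eps


def symmetric_trapezoid(s0: float, eps: float) -> Trapezoid:
    """Symmetrical trapezoid of Definition 8 built from (s0, eps)."""
    return (s0 - 2.0 * eps, s0 - eps, s0 + eps, s0 + 2.0 * eps)


def bootstrap_eta(sample: Sequence[Trapezoid],
                  deviance: Callable[[List[Trapezoid]], float],
                  delta: float = 0.05, n_boot: int = 1000,
                  seed: Optional[int] = None) -> float:
    """Empirical (1 - delta)-quantile of the bootstrapped deviance."""
    rng = random.Random(seed)
    chars = [characteristics(x) for x in sample]            # Step 1
    values = []
    for _ in range(n_boot):                                  # Step 4
        drawn = [rng.choice(chars) for _ in chars]           # Step 2 (joint draw)
        boot = [symmetric_trapezoid(s0, eps) for s0, eps in drawn]
        values.append(deviance(boot))                        # Step 3
    values.sort()
    k = min(n_boot - 1, max(0, math.ceil((1.0 - delta) * n_boot) - 1))
    return values[k]                                         # Step 5


def toy_deviance(boot: Sequence[Trapezoid], theta0: float = 1.0,
                 sd: float = 1.0) -> float:
    """Placeholder only: crisp normal deviance computed on the locations s0."""
    centres = [characteristics(b)[0] for b in boot]
    n = len(centres)
    mean = sum(centres) / n
    return n * (mean - theta0) ** 2 / sd ** 2
```

In an actual application, `toy_deviance` would be replaced by the deviance of Eq. 20 computed from the fuzzy log-likelihood and the EM-based estimator.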
This algorithm requires the calculation of a maximum likelihood estimator. For this task, Denoeux [10] proposed a tool based on the fuzzy EM algorithm, which can be computed using the R package EM.Fuzzy described in [19]. Nevertheless, this methodology presents a drawback since it produces a crisp estimator instead of a fuzzy one. Unfortunately, methods for calculating a fuzzy maximum likelihood estimator in such contexts are not yet established. For this reason, we model the obtained crisp EM-based estimator using, for instance, a symmetrical triangular fuzzy number. The calculated crisp element is chosen to be the core of the modelling fuzzy number. Note that the choice of the symmetrical shape of the fuzzy number is intentional, since the purpose is to reduce as much as possible the complexity related to this choice.
We finally highlight that all the previously described steps of the calculation procedure can be easily computed using our R package FuzzySTs described in [8]. This package is complete and user-friendly, and was developed for application purposes. In addition, a detailed numerical example of the defended procedure with its interpretation can be found in [7].
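For readers who want to trace the construction procedure outside of R, the following Python sketch walks through Eq. 15, Eq. 19 and the threshold and intersection steps. It rests on several assumptions that are not part of the paper: observations are triangular fuzzy numbers with non-degenerate support, the model is a normal distribution with known standard deviation, the crisp estimator is found by a direct numerical maximisation instead of the fuzzy EM algorithm, and the log-likelihood is unimodal. The value of `eta` is supplied by the user, for instance from a bootstrap run. All names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Triangle = Tuple[float, float, float]  # (p, q, r) with p < q < r


def tri_membership(t: Triangle, x: float) -> float:
    p, q, r = t
    if x == q:
        return 1.0
    if p < x < q:
        return (x - p) / (q - p)
    if q < x < r:
        return (r - x) / (r - q)
    return 0.0


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fuzzy_likelihood(t: Triangle, theta: float, sd: float, n: int = 200) -> float:
    """Eq. 15 by the trapezoidal rule over the support of the observation."""
    p, _, r = t
    h = (r - p) / n
    total = 0.0
    for k in range(n + 1):
        x = p + k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * tri_membership(t, x) * normal_pdf(x, theta, sd)
    return h * total


def log_likelihood(sample: Sequence[Triangle], theta: float, sd: float) -> float:
    """Eq. 19; assumes theta is close enough to the data to avoid log(0)."""
    return sum(math.log(fuzzy_likelihood(t, theta, sd)) for t in sample)


def argmax_unimodal(f: Callable[[float], float], lo: float, hi: float) -> float:
    """Ternary search for the maximiser of a unimodal function on [lo, hi]."""
    for _ in range(60):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def solve_monotone(f: Callable[[float], float], target: float,
                   lo: float, hi: float) -> float:
    """Bisection for f(theta) = target where f is monotone on [lo, hi]."""
    increasing = f(lo) < f(hi)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fuzzy_lr_interval(sample: Sequence[Triangle], sd: float, eta: float,
                      half_width: float):
    """Support, core and alpha mapping of the fuzzy interval (Steps 1 to 8)."""
    def ll(theta: float) -> float:
        return log_likelihood(sample, theta, sd)

    theta_hat = argmax_unimodal(ll, min(t[0] for t in sample),
                                max(t[2] for t in sample))
    # Step 2: symmetrical triangular fuzzy estimator, half_width > 0 assumed.
    points = (theta_hat - half_width, theta_hat, theta_hat, theta_hat + half_width)
    thresholds = [ll(v) - eta / 2.0 for v in points]          # Eqs. 26-27
    i_min, i_max = min(thresholds), max(thresholds)           # Eqs. 28-29

    def far(direction: float) -> float:
        step = sd
        while ll(theta_hat + direction * step) >= i_min:      # widen the bracket
            step *= 2.0
        return theta_hat + direction * step

    far_left, far_right = far(-1.0), far(1.0)
    left = [solve_monotone(ll, i, far_left, theta_hat) for i in thresholds]    # Eqs. 30-33
    right = [solve_monotone(ll, i, theta_hat, far_right) for i in thresholds]
    support = (min(left), max(right))                         # alpha = 0
    core = (max(left), min(right))                            # alpha = 1

    def alpha(theta: float) -> float:                         # Eqs. 38-39
        return (ll(theta) - i_min) / (i_max - i_min)

    return theta_hat, support, core, alpha
```

With this representation, the left side of the fuzzy interval runs from `support[0]` to `core[0]` and the right side from `core[1]` to `support[1]`, with membership degrees given by `alpha`.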
Inference: Comparison ofMeans
Fuzzy confidence intervals are very useful in statistical inference. We propose to use these intervals in a more or less complex testing situation. Indeed, we introduce a pragmatic approach to perform a hypothesis test for comparing the means of groups using the constructed fuzzy confidence intervals. The fuzzy analysis of variance (FANOVA) is often used for this purpose, as seen in Berkachy and Donzé [4], Parchami et al. [20], and Gonzalez-Rodriguez et al. [14]. We complete this analysis by proposing a test based on fuzzy confidence intervals. Our approach is as follows:
Similarly to the approach of a classical analysis of variance
(ANOVA), we define the null hypothesis $H_0$ that the
means related to the two groups are equal, against the
alternative one $H_1$ that the pair of means is not equal, at a
significance level $\delta$. The null and alternative hypotheses
$H_0$ and $H_1$ can then be written as follows:
$$H_0: \mu_1 = \mu_2, \quad \text{against} \quad H_1: \mu_1 \neq \mu_2,$$
where $\mu_1$ and $\mu_2$ are the means of the groups 1 and 2
respectively.
For the groups 1 and 2, we first construct the fuzzy
confidence intervals by the likelihood ratio method, denoted by
$\Pi_{LR_1}$ and $\Pi_{LR_2}$. Our strategy of hypothesis testing is to
analyse the overlap between the fuzzy confidence intervals
of the group means. The aim is to be able to identify
whether the means of groups are potentially equal or not,
using these intervals. In case of perfect overlap, we
could infer that there is no difference between the means.
Since the metrics described in “The $d_{SGD}^{\theta^{\star}}$ Metric” are seen
as a powerful alternative for the difference between two fuzzy
sets, we propose to calculate the $d_{SGD}^{\theta^{\star}}$ metric of both
constructed intervals, denoted by
$d_{SGD}^{\theta^{\star}}(\Pi_{LR_1}, \Pi_{LR_2})$. The objective
is then to quantify the overlap between them. We
highlight that the choice of the $d_{SGD}^{\theta^{\star}}$ metric is intentional,
since taking into consideration the shape of the fuzzy
confidence intervals and their possible irregularities is crucial in
this situation. In addition, a mapping into $\mathbb{R}^{+}$ is important
in absolute terms.
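To illustrate why a mapping into the non-negative reals matters, the sketch below computes the original signed distance of Yao and Wu [23] for trapezoidal fuzzy numbers, which can take negative values. It is a hedged Python illustration and not the $d_{SGD}^{\theta^{\star}}$ metric used in this paper: approximating a fuzzy interval by a trapezoid (support and core bounds) is an assumption made here for simplicity, the function names are hypothetical, and the numbers are invented for the example.

```python
def signed_distance_to_zero(a, b, c, d):
    """Signed distance from the origin of a trapezoidal fuzzy number.

    The trapezoid has support [a, d] and core [b, c]. Integrating half the
    sum of the left and right alpha-cut bounds over alpha in [0, 1] gives
    (a + b + c + d) / 4.
    """
    if not a <= b <= c <= d:
        raise ValueError("expected a <= b <= c <= d")
    return (a + b + c + d) / 4.0


def signed_distance(p, q):
    """Signed distance between two trapezoids; the result may be negative."""
    return signed_distance_to_zero(*p) - signed_distance_to_zero(*q)


if __name__ == "__main__":
    first = (1.0, 2.0, 3.0, 4.0)   # hypothetical fuzzy interval
    second = (2.0, 3.0, 4.0, 5.0)  # the same shape shifted by one unit
    print(signed_distance(first, second))  # -1.0
    print(signed_distance(second, first))  # 1.0
```

Because the sign depends on the order of the arguments, such a quantity cannot serve directly as an amount of overlap, which is one motivation for working with a metric taking values in $\mathbb{R}^{+}$.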
Next, we would like to define a decision rule according to
the obtained overlap between both fuzzy sets. As such, we
propose to “normalise” the calculated distance in order to
obtain a relative ratio. Thus, by translating one set while the
other remains fixed, we calculate an optimal distance of
rejection between the fuzzy confidence intervals, defined as the
distance at the position where both intervals become tangent.
This distance is denoted by
$d_{SGD}^{\theta^{\star}}(\Pi_{LR_1}, \Pi_{LR_2})_{opt}$. The statistic R,
given by
$$R = \frac{d_{SGD}^{\theta^{\star}}(\Pi_{LR_1}, \Pi_{LR_2})}{d_{SGD}^{\theta^{\star}}(\Pi_{LR_1}, \Pi_{LR_2})_{opt}}, \tag{40}$$
will help us to reject or not the null hypothesis. The decision
rule can then be written as follows:
Decision rule: The statistic R belongs to the interval
[0; 1]. The rules are then:
The closer the statistic R is to the value 0, the more strongly we fail to reject the null hypothesis $H_0$.
The closer the statistic R is to the value 1, the more strongly we reject the null hypothesis $H_0$.
Simulation Study
In [7], we presented a simulation study illustrating the use
of the two defended bootstrap algorithms in the process of
calculation of the fuzzy confidence intervals. This study is
based on randomly generating data sets taken from a normal
distribution N(5, 1) and composed of $N = 50$, 100 and 500
observations. The observations are then modelled by
triangular symmetrical fuzzy numbers of spread 2.
Following the described procedure, the fuzzy confidence
intervals by the likelihood ratio for the theoretical mean of
the constructed data sets were computed at the confidence
level $1 - \delta = 1 - 0.05$. The algorithm presented in
“Bootstrap technique for the approximation of the likelihood ratio
and its distribution” has been used to estimate the
bootstrapped quantile $\eta$.
In addition, since the number of iterations did not really
influence the outcome of the calculations, $D = 1000$
iterations were considered for all our calculations. Concerning
the crisp-based estimators calculated using the fuzzy EM
algorithm, we have considered the following two fuzzy
numbers to model the estimators:
the first one is a triangular symmetrical fuzzy number of
spread 2;
the second one is a triangular symmetrical fuzzy number
of spread 1.
We would like to explore the influence of the degree of
fuzziness in the modelling procedure of the estimators on
the constructed fuzzy confidence intervals. For the sake of
comparison, we additionally use the fuzzy sample mean as a
fuzzy estimator.
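The data-generating setting of this simulation, together with the fuzzy sample mean, can be sketched as follows. This is only an illustrative Python outline and not the code used for the study: it assumes that "spread 2" means a support of total width 2 centred on the drawn value, represents a triangular fuzzy number by the tuple (left, core, right), and uses the hypothetical function names `simulate_fuzzy_sample` and `fuzzy_sample_mean`.

```python
import random


def simulate_fuzzy_sample(n, mu=5.0, sigma=1.0, spread=2.0, seed=0):
    """Draw n values from N(mu, sigma^2) and fuzzify each one.

    Each draw x becomes the symmetric triangular number
    (x - spread/2, x, x + spread/2).
    """
    rng = random.Random(seed)
    half = spread / 2.0
    draws = [rng.gauss(mu, sigma) for _ in range(n)]
    return [(x - half, x, x + half) for x in draws]


def fuzzy_sample_mean(sample):
    """Fuzzy sample mean of triangular fuzzy numbers.

    For triangular numbers, fuzzy addition and multiplication by a positive
    scalar act component-wise, so the mean is the component-wise average.
    """
    n = len(sample)
    return tuple(sum(t[i] for t in sample) / n for i in range(3))


if __name__ == "__main__":
    for size in (50, 100, 500):
        print(size, fuzzy_sample_mean(simulate_fuzzy_sample(size)))
```

The fuzzy sample mean of symmetric triangular observations with a common spread is again a symmetric triangular number with that same spread, centred on the crisp sample mean.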
We show in Table1 the 95%-quantiles of the bootstrapped
distribution of the likelihood ratio, where data sets of size
50, 100 and 500 are considered. It is clear to see that the
quantiles corresponding to the considered sample sizes are
in some sense very close. We could directly remark also
that modelling the ML-estimator using less fuzziness (fuzzy
number with spread 1) leads to a lower quantile, compared to
modelling the ML-estimator using greater fuzziness (fuzzy
number with spread 2).
Based on the bootstrapped quantiles shown in Table 1,
we now calculate the fuzzy confidence intervals using the
likelihood ratio method following the instructions given in
“Procedure”. For the sake of simplicity, we develop the
construction of confidence intervals for the case with
$N = 500$ observations only. Table 1 gives the lower and upper
bounds of the support and the core sets of the calculated
fuzzy confidence intervals.
Concerning the interpretation of the choice of the degree of
fuzziness related to the ML-estimators, less fuzziness clearly
leads to a smaller support set of the calculated
confidence interval. In other terms, this choice affects the
obtained fuzzy confidence interval. Therefore, carefully
modelling the ML-estimator is crucial.
By traditional fuzzy tools, a fuzzy confidence
interval defined in the same settings is given by
$\Pi = (3.907, 4.907, 5.080, 6.080)$. An important conclusion from
the difference between the traditional and the defended fuzzy
confidence intervals is that the core sets are slightly larger
in the case of bootstrap intervals using the ML-estimators.
Note that for these intervals, interpreting the spread of the
support sets is in some sense difficult since they are affected
by the degree of fuzziness of the ML-estimator. When the
fuzzy sample mean is used as an estimator, the obtained
fuzzy confidence interval has tighter support and core sets
compared to the traditional fuzzy confidence interval.
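These comparisons can be checked directly from the bounds reported in Table 1 and from the traditional interval given above. The short Python sketch below only recomputes widths from the published figures; the function name `widths` and the dictionary layout are ours.

```python
# (support lower, core lower, core upper, support upper), N = 500,
# copied from Table 1 and from the traditional interval quoted in the text.
intervals = {
    "sample mean": (3.991, 4.945, 5.042, 5.996),
    "ML estimator (spread 2)": (3.803, 4.795, 5.193, 6.184),
    "ML estimator (spread 1)": (4.303, 4.797, 5.191, 5.685),
    "traditional": (3.907, 4.907, 5.080, 6.080),
}


def widths(interval):
    """Return (support width, core width), rounded to three decimals."""
    support_lower, core_lower, core_upper, support_upper = interval
    return (
        round(support_upper - support_lower, 3),
        round(core_upper - core_lower, 3),
    )


if __name__ == "__main__":
    for name, interval in intervals.items():
        support_width, core_width = widths(interval)
        print(f"{name:26s} support {support_width:.3f}  core {core_width:.3f}")
```

The support widths are 2.005, 2.381, 1.382 and 2.173 respectively, and the core widths 0.097, 0.398, 0.394 and 0.173, which matches the ordering described in the text.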
Simulation Study onCoverage Rates
In [7], we have conducted a simulation study on coverage
rates corresponding to the fuzzy confidence intervals
calculated using the likelihood ratio method. A large number of
data sets composed of $N = 100$, 500 and 1000 observations
are generated. We consider these data sets to be uncertain
and we model each observation by a triangular symmetrical
fuzzy number of spread 2. The objective of this study
is to estimate, for the mean, fuzzy confidence intervals by
the likelihood ratio method on the one hand, and the
traditional fuzzy ones on the other, at the confidence level
$1 - \delta = 1 - 0.05$, and consequently to calculate the coverage
rates of these intervals in order to compare them. We note
that the ML-estimators were modelled by fuzzy numbers of
spreads 1 and 2. Similarly to the previous study, the fuzzy
sample mean was also used for the sake of comparison only.
Table 1 The 95%-quantiles of the bootstrapped distribution of LR
and the corresponding fuzzy confidence intervals by the likelihood
ratio (data set of 500 observations), for a data set taken from a
normal distribution N(5, 1) modelled using triangular symmetrical
fuzzy numbers at 1000 iterations

Bootstrap quantiles
Sample size                                             N = 50   N = 100   N = 500
Bootstrap quantile using the sample mean                1.802    1.845     2.118
Bootstrap quantile using the ML-estimator (spread 2)    1.854    1.971     2.201
Bootstrap quantile using the ML-estimator (spread 1)    1.563    1.671     1.864

Fuzzy confidence intervals
                                         Support set        Core set
                                         Lower    Upper     Lower    Upper
Fci using the sample mean                3.991    5.996     4.945    5.042
Fci using the ML estimator (spread 2)    3.803    6.184     4.795    5.193
Fci using the ML estimator (spread 1)    4.303    5.685     4.797    5.191
Some important conclusions of the comparison of coverage
rates mentioned in this study are as follows:
The difference between the coverage rates of the bootstrap fuzzy confidence intervals and those of the traditional fuzzy interval is very slight.
The coverage rate of the fuzzy confidence intervals by the likelihood method, where the fuzzy sample mean is used as an estimator, is the same as the rate for the traditional fuzzy one.
All the coverage rates of the LR fuzzy confidence intervals guarantee the required theoretical 95% confidence level.
The fuzziness of the ML-estimators does not influence the coverage rates of the calculated fuzzy confidence intervals.
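For readers less familiar with the notion of a coverage rate, the following Python sketch estimates it in the simplest crisp setting. It is a hedged illustration only: it uses a classical interval for the mean of a normal sample with known variance, not the fuzzy likelihood-ratio intervals studied in [7], and the function name `coverage_rate` and its default arguments are hypothetical.

```python
import random
import statistics


def coverage_rate(n, reps=2000, mu=5.0, sigma=1.0, z=1.959964, seed=0):
    """Share of simulated 95% intervals that contain the true mean mu.

    Assumptions of this sketch: crisp data, known sigma, and the classical
    interval xbar +/- z * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for size in (100, 500):
        print(size, coverage_rate(size, reps=1000))
```

A procedure is said to guarantee the nominal level when this estimated share is close to, or above, the targeted 95%.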
Application onSILC 2017
The Swiss SILC surveys are large and complex surveys
conducted every year by the Swiss Federal Statistical Office
[21] on Statistics on Income and Living Conditions in
Switzerland. In our application, we use the data from the
2017 edition. After selecting only the active population, i.e.
persons aged 18 or older, and persons who
are respondents for the whole household, we get a sample
of 8120 observations. In this study, we do not apply any
weighting scheme. We consider in particular two variables
describing respectively the health situation (PW5020) and
the financial condition (HQ5010). Both variables are coded
on a Likert scale from 0 to 10, where the value 0 means not
satisfied. We are mainly interested in the difference between
the group of Swiss nationality on the one hand, and the group
of other nationalities on the other. As such, we perform
the defended calculation approaches for the variables
PW5020 and HQ5010 for each of the two groups.
First of all, we would like to construct the fuzzy
confidence intervals by the likelihood ratio method at the
confidence level 95% for the variables PW5020 and HQ5010
as previously discussed. Each modality of these variables
is modelled by a triangular symmetrical fuzzy number of
spread 2. In other words, the value 3 is for example modelled
by the triangular fuzzy number (2, 3, 4). The same holds for all
the other modalities of both variables. The obtained intervals
for both variables are shown in Fig. 1. The support and the
core sets of these intervals are also given in Table 2. We
highlight that the bootstrap algorithm presented in
“Bootstrap technique for the approximation of the likelihood ratio
and its distribution” is used.
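The fuzzification of the Likert modalities can be sketched as follows. This is a minimal Python illustration of the rule stated above (the value 3 becomes (2, 3, 4)); the function names `fuzzify_likert` and `membership` are hypothetical and the code is not taken from FuzzySTs.

```python
def fuzzify_likert(value, spread=2.0):
    """Triangular symmetrical fuzzy number (left, core, right) for a score.

    The score must be an integer between 0 and 10; `spread` is read as the
    total width of the support and must be positive.
    """
    if value not in range(0, 11):
        raise ValueError("expected an integer score between 0 and 10")
    if spread <= 0:
        raise ValueError("spread must be positive")
    half = spread / 2.0
    return (value - half, float(value), value + half)


def membership(x, fuzzy_number):
    """Membership degree of x in a triangular fuzzy number."""
    left, core, right = fuzzy_number
    if x <= left or x >= right:
        return 0.0
    if x <= core:
        return (x - left) / (core - left)
    return (right - x) / (right - core)


if __name__ == "__main__":
    three = fuzzify_likert(3)
    print(three)                   # (2.0, 3.0, 4.0)
    print(membership(2.5, three))  # 0.5
```

Applied literally to every modality, this rule gives the extreme scores 0 and 10 supports reaching -1 and 11; whether these are truncated to the scale in the application is not stated in the text.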
We focus the analyses of this section on two main axes
described as follows:
1. The influence of the variation in the fuzziness of the
ML-estimator from one side, and the influence of the
variation of the confidence level from another one on the
constructed FCI. These analyses are performed based on
the variable PW5020;
2. The hypothesis test on the means of the groups Swiss
vs. Other nationalities using the constructed fuzzy confi-
dence intervals as proposed in “Inference: comparison of
means”. This analysis is based on the variable PW5020
and on the variable HQ5010.
Fuzziness vs. Randomness
As previously mentioned, the objective of a first set of
analyses is to investigate the influence of the variation of
fuzziness related to the ML-estimator on the one hand, and of
the variation of the confidence level on the other.
In our context, the ML-estimator is calculated by the
EM-algorithm defined in the fuzzy environment as seen in [10].
This approach unfortunately leads to a crisp-based estimator,
and we consequently need to re-fuzzify it. For this reason,
we have proposed to model this estimator by a triangular
symmetrical fuzzy number of spread 2. Such a construction
is in some sense natural. We are interested in investigating
the influence of such a choice on the constructed confidence
intervals. For this reason, we propose in a second stage to
model these estimators by triangular symmetrical fuzzy
numbers of spread 1 in order to understand the influence of
such a variation.
Fig. 1 Fuzzy confidence intervals (FCI) of the variables Health
situation (PW5020) (a) and Satisfaction of financial
situation (HQ5010) (b) at the confidence level 95%. Each panel
plots the membership level α (from 0 to 1) against θ (from 6 to 10),
with one FCI for Swiss nationality and one for Other nationalities.
In Tables3 and 4, we show the support and the core
sets of the obtained confidence intervals by the described
procedure. By these tables, we could clearly confirm the
conclusion of the simulation study performed in “Simula-
tion study”. In other words, independently from the chosen
groups of the variable PW5020, the degree of fuzziness
of the fuzzy confidence interval is strongly affected by the
choice of fuzziness of the ML-estimator. For the same con-
fidence level, more fuzziness of the ML-estimator leads to
a greater support set of the constructed confidence interval.
Therefore, it would be ideal if a complete fuzzy-based
approach of calculation of ML-estimator exists.
In terms of confidence levels, Table3 shows the fuzzy
confidence intervals at the confidence level 95% while
Table4 gives the ones at the confidence level 70%. By com-
paring both tables, one could clearly remark that no impor-
tant variation in terms of spread of the constructed fuzzy
intervals, is depicted. A very small fluctuation is seen only.
A general conclusion that we could propose is that the ran-
domness in the data is less influencing the constructed fuzzy
confidence intervals than the fuzziness in the data and in the
ML-estimator.
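The relative size of the two effects can be quantified from the support bounds reported in Tables 3 and 4. The Python sketch below is only an illustrative check on the published figures; the helper names `width`, `effect_of_spread` and `effect_of_level` are ours.

```python
# Support bounds (lower, upper) for PW5020, copied from Tables 3 and 4.
# Keys: (group, confidence level, spread of the ML-estimator).
support = {
    ("Swiss", 0.95, 2): (6.705, 9.068),
    ("Swiss", 0.95, 1): (7.205, 8.568),
    ("Swiss", 0.70, 2): (6.715, 9.058),
    ("Swiss", 0.70, 1): (7.214, 8.559),
    ("Other", 0.95, 2): (6.549, 9.161),
    ("Other", 0.95, 1): (7.048, 8.662),
    ("Other", 0.70, 2): (6.570, 9.139),
    ("Other", 0.70, 1): (7.069, 8.641),
}


def width(group, level, spread):
    """Width of the support set, rounded to three decimals."""
    lower, upper = support[(group, level, spread)]
    return round(upper - lower, 3)


def effect_of_spread(group, level=0.95):
    """Reduction in support width when the spread goes from 2 to 1."""
    return round(width(group, level, 2) - width(group, level, 1), 3)


def effect_of_level(group, spread=2):
    """Reduction in support width when the level goes from 95% to 70%."""
    return round(width(group, 0.95, spread) - width(group, 0.70, spread), 3)


if __name__ == "__main__":
    for group in ("Swiss", "Other"):
        print(group, effect_of_spread(group), effect_of_level(group))
```

For the Swiss group, reducing the spread of the ML-estimator from 2 to 1 shortens the support by 1.000, whereas lowering the confidence level from 95% to 70% shortens it by only 0.020; for the group of other nationalities the corresponding values are 0.998 and 0.043.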
Test onEquality ofMeans
We are now interested in comparing the means of groups
based on the interpretation of fuzzy confidence intervals. We
would like then to determine whether the means of the two
Table 2 The fuzzy confidence intervals by the likelihood ratio for the
groups “Swiss” and “Other” nationalities of the variables Health
situation—PW5020 and Satisfaction of nancial
situation—HQ5010 at the confidence level 95% using triangular
symmetrical fuzzy numbers of spread 2
Health situation—PW5020
Nationality Support set Core set
Lower Upper Lower Upper
Swiss 6.705 9.068 7.703 8.069
Other 6.549 9.161 7.544 8.166
Financial situation—HQ5010
Nationality Support set Core set
Lower Upper Lower Upper
Swiss 6.742 8.581 7.589 7.732
Other 6.078 7.942 6.951 7.072
Table 3 The fuzzy confidence intervals by the likelihood ratio for the groups “Swiss” and “Other” nationalities of the variable Health situ-
ation—PW5020 at the confidence level 95% using triangular symmetrical fuzzy numbers of spread 2
Swiss nationality
Support set Core set
Lower Upper Lower Upper
Fci using the ML estimator (spread 2) 6.705 9.068 7.703 8.069
Fci using the ML estimator (spread 1) 7.205 8.568 7.703 8.069
Other nationality
Support set Core set
Lower Upper Lower Upper
Fci using the ML estimator (spread 2) 6.549 9.161 7.544 8.166
Fci using the ML estimator (spread 1) 7.048 8.662 7.649 8.161
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
SN Computer Science (2022) 3: 374374 Page 12 of 15
SN Computer Science
groups “Swiss” versus “Other” nationality of the variables
PW5020 and HQ5010 are equal to each other, or not. Such
hypotheses test are usually made by the well-known analysis
of variance (ANOVA) in the classical statistical theory, or
by the fuzzy analysis of variance (FANOVA) in the fuzzy
set theory. For this reason, we perform the ANOVA as well
as the FANOVA for the variables PW5020 and HQ5010
by nationality. The summaries are given in Tables5 and 6
respectively.
Table5 shows that for the variable PW5020, the null
hypothesis of equality of means is not rejected with a p-value
of 0.605 by ANOVA and 0.597 by FANOVA. Let us now
Table 4 The fuzzy confidence intervals by the likelihood ratio for the groups “Swiss” and “Other” nationalities of the variable Health situ-
ation—PW5020 at the confidence level 70% using triangular symmetrical fuzzy numbers of spread 2
Swiss nationality
Support set Core set
Lower Upper Lower Upper
Fci using the ML estimator (spread 2) 6.715 9.058 7.713 8.059
Fci using the ML estimator (spread 1) 7.214 8.559 7.713 8.060
Other nationality
Support set Core set
Lower Upper Lower Upper
Fci using the ML estimator (spread 2) 6.570 9.139 7.565 8.145
Fci using the ML estimator (spread 1) 7.069 8.641 7.629 8.081
Table 5 The results of the classical analysis of variance (ANOVA) and the fuzzy analysis of variance (FANOVA) for the variable Health
situation—PW5020
AN OVA
Df Sum Sq F value P value
Factor 1 1 0.267 0.605
Residuals 7423 24437
FANOVA
Df Sum Sq F value P value
Factor 1 0.854 0.279 0.597
Residuals 7423 22740.454
Table 6 The results of the classical analysis of variance (ANOVA) and the fuzzy analysis of variance (FANOVA) for the variable Satisfac-
tion of nancial situation— HQ5010
AN OVA
Df Sum Sq F value P value
Factor 1 484 111.7 0
∗∗∗
Residuals 8084 35052
FANOVA
Df Sum Sq F value P value
Factor 1 451.175 112.049 0
∗∗∗
Residuals 8084 32551.067
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
SN Computer Science (2022) 3: 374 Page 13 of 15 374
SN Computer Science
perform the same hypothesis test but using the fuzzy confi-
dence intervals. First of all, the Fig.1a is a great tool to visu-
alise the potential equality of means. From this figure, we
could clearly see that the support and the core sets of both
intervals overlap completely. In other terms, we could have
an evident suspicion of the equality between the perception
of health situation by Swiss natives and by Other nationalities.
We would like to apply now the testing procedure described
in “Inference: Comparison of means”. Thus, we calculate
the actual distance between the fuzzy confidence interval by
the likelihood ratio for the group “Swiss”, i.e. $\Pi_{LR_{CH}}$, and the
one for the group “Other”, i.e. $\Pi_{LR_{OT}}$, using the metric
$d_{SGD}^{\theta^{\star}}$. This distance is given by
$$d_{SGD}^{\theta^{\star}}(\Pi_{LR_{CH}}, \Pi_{LR_{OT}}) = 0.111,$$
for which the optimal distance between these two intervals
can be found by the translation technique such that
$$d_{SGD}^{\theta^{\star}}(\Pi_{LR_{CH}}, \Pi_{LR_{OT}})_{opt} = 2.489.$$
The statistic R can then be computed using Eq. 40. We
get:
$$R = 0.045.$$
Decision rule: Since the statistic R is very close to the value
0, we strongly fail to reject the null hypothesis of equality of
means between both groups. This conclusion confirms our
ANOVA and FANOVA results.
We perform the same analysis for the variable HQ5010.
By Table 6, the null hypothesis of equality of means is
strongly rejected in the ANOVA and the FANOVA with a
P value reported as 0. If we consider the fuzzy
confidence intervals by the likelihood ratio for both studied
groups shown in Fig. 1b, we remark that both intervals
partially overlap. In this case, the overlap is exclusively a
matter of the support sets of the intervals. The core sets do
not overlap. By the hypothesis testing procedure presented
in “Inference: Comparison of means”, we calculate the
distance $d_{SGD}^{\theta^{\star}}(\Pi_{LR_{CH}}, \Pi_{LR_{OT}})$ and we get:
$$d_{SGD}^{\theta^{\star}}(\Pi_{LR_{CH}}, \Pi_{LR_{OT}}) = 0.652.$$
In this case, the maximal distance between the fuzzy
confidence interval by the likelihood ratio for the group “Swiss”,
i.e. $\Pi_{LR_{CH}}$, and the one for the group “Other”, i.e. $\Pi_{LR_{OT}}$, is
$$d_{SGD}^{\theta^{\star}}(\Pi_{LR_{CH}}, \Pi_{LR_{OT}})_{opt} = 1.852.$$
Therefore, the statistic R is calculated as:
$$R = 0.352.$$
Decision rule: Since the statistic R is relatively far from
the value 0, we tend to strongly reject the null hypothesis
of equality of means between both groups, confirming our
ANOVA and FANOVA results.
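The two values of R reported above follow directly from Eq. 40 and the published distances. The following Python sketch simply reproduces this arithmetic; the function name `overlap_statistic` is hypothetical, and the distances are taken as given since the $d_{SGD}^{\theta^{\star}}$ metric itself is not recomputed here.

```python
def overlap_statistic(distance, optimal_distance):
    """Statistic R of Eq. 40: observed distance over the optimal distance."""
    if optimal_distance <= 0:
        raise ValueError("the optimal distance must be positive")
    return distance / optimal_distance


# Distances reported in the text for the groups "Swiss" and "Other".
r_health = overlap_statistic(0.111, 2.489)     # variable PW5020
r_financial = overlap_statistic(0.652, 1.852)  # variable HQ5010

if __name__ == "__main__":
    print(round(r_health, 3))     # 0.045
    print(round(r_financial, 3))  # 0.352
```

The ratio for the financial situation is roughly eight times the one for the health situation, in line with the opposite conclusions drawn for the two variables.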
To sum up, a clear conclusion from the use of such a decision
rule is that the closer the statistic R is to the value 0, the more
strongly we fail to reject the hypothesis of equality of means.
However, once we move away from the neighbourhood of
0, we clearly enter the rejection region. By our decision
rule, one could reject or not the hypothesis of equality
of means with confidence. Accordingly, a final remark is
that the statistic R has to be used prudently, since R is
in some sense an indicator and not a threshold of rejection.
The criterion adopted in our methodology is mainly related to
the shape and the position of the fuzzy confidence intervals.
Conclusion
This study proposed a complete practical procedure of
estimation of fuzzy confidence intervals using the likelihood
ratio method. The bootstrap technique was also used. A
complex bootstrap algorithm based on preserving the
location and dispersion measures related to the
$d_{SGD}^{\theta^{\star}}$ metric is presented. Such calculations are often
seen as expensive in terms of computational burden. Our
procedure is designed to avoid such complexities. Based on
the developed fuzzy confidence intervals, we introduced a
hypothesis test for the equality of means of two groups with
its corresponding decision rule.
Our approaches have been applied on the Swiss SILC
surveys where two main axes of analyses were developed:
the influence of the fuzziness versus the randomness of the
data and of the maximum likelihood estimator on the con-
fidence intervals from one side, and the application of the
hypotheses testing procedure for comparing the mean of the
group “Swiss nationality” to that of the group “Other nation-
alities” for the variables Satisfaction of health situation and
Satisfaction of financial situation.
The main conclusions of these analyses are that the
randomness in the data influences the fuzzy confidence
intervals less than the fuzziness in the data and in the
ML-estimator. Furthermore, one could clearly see that fuzzy
confidence intervals can be considered as a proper tool to perform
a preliminary analysis for the comparison of means. Thus,
the introduced statistic is in some sense an indicator of
rejection of the hypothesis of equality of means of two groups.
Furthermore, we have to mention that it is often difficult
to conduct a hypothesis test based on fuzzy confidence
intervals. It requires deeper knowledge of, for instance, the
hypotheses, the considered estimators and the distributions.
Based on our constructed procedure, we managed to propose a
conceptually different way of comparing two means.
This method also provides an indicator of overlap, which
can be very helpful in the decision-making process. On
the other hand, the defended hypothesis test by the likelihood
ratio method is an interesting alternative to the classical test.
Our approach is practicable, direct, and easily reproducible
using our R toolbox given in Berkachy and Donzé [8]. The
latter is conceived and implemented to perform well
in terms of time and computational burden.
Multiple benchmark tests have been previously performed
to ensure its performance. Thus, this R package can be
easily used to carry out such calculations of fuzzy
confidence intervals in a user-friendly environment. We highlight
that fuzzy hypotheses can also be taken into consideration
in the calculation of these fuzzy intervals by our toolbox.
Finally, the main problem encountered lies in the
fuzzification of the crisp-based maximum likelihood estimator.
An approach leading to a fuzzy-based one would be very welcome.
In addition, refining our decision rule for testing
the hypotheses of equality of means could also be
interesting to investigate in future research.
Funding Open access funding provided by University of Fribourg.
Declarations
Conflict of Interest The authors declare that they have no conflict of
interest.
Open Access This article is licensed under a Creative Commons Attri-
bution 4.0 International License, which permits use, sharing, adapta-
tion, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons licence, and indicate if changes
were made. The images or other third party material in this article are
included in the article's Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in
the article's Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
1. Berkachy R. The signed distance measure in fuzzy statistical
analysis. Some theoretical, empirical and programming advances.
Ph.D. thesis, University of Fribourg, Switzerland, 2020.
2. Berkachy R. The signed distance measure in fuzzy statistical
analysis. In: Theoretical, empirical and programming advances fuzzy
management methods. Berlin: Springer International Publishing;
2021. https://doi.org/10.1007/978-3-030-76916-1.
3. Berkachy R, Donzé L. Individual and global assessments with
signed distance defuzzification, and characteristics of the output
distributions based on an empirical analysis. In: Proceedings of
the 8th International Joint Conference on Computational
Intelligence - volume 1: FCTA, pp. 75–82, 2016.
https://doi.org/10.5220/0006036500750082.
4. Berkachy R, Donzé L. Fuzzy one-way ANOVA using the
signed distance method to approximate the fuzzy product. LFA
2018 - Rencontres francophones sur la Logique Floue et ses
Applications, Cépaduès, 2018, pp 253–264.
5. Berkachy R, Donzé L. Fuzzy confidence interval estimation by
likelihood ratio. In: Proceedings of the 2019 Conference of the
International Fuzzy Systems Association and the European
Society for Fuzzy Logic and Technology (EUSFLAT 2019), 2019.
6. Berkachy R, Donzé L. Testing hypotheses by fuzzy methods: a
comparison with the classical approach. Cham: Springer
International Publishing; 2019. p. 1–22.
7. Berkachy R, Donzé L. Fuzzy confidence intervals by the
likelihood ratio with bootstrapped distribution. In: Proceedings of the
12th International Joint Conference on Computational Intelligence
- FCTA. INSTICC, SciTePress; 2020. pp 231–242.
https://doi.org/10.5220/0010023602310242.
8. Berkachy R, Donzé L. FuzzySTs: Fuzzy statistical tools, R
package (2020). https://CRAN.R-project.org/package=FuzzySTs.
9. Couso I, Sanchez L. Inner and outer fuzzy approximations of
confidence intervals. Fuzzy Sets Syst. 2011;184(1):68–83.
https://doi.org/10.1016/j.fss.2010.11.004.
http://www.sciencedirect.com/science/article/pii/S0165011410004550.
Preference Modelling and Decision Analysis (Selected Papers from
EUROFUSE 2009).
10. Denoeux T. Maximum likelihood estimation from fuzzy data
using the EM algorithm. Fuzzy Sets Syst. 2011;183(1):72–91.
https://doi.org/10.1016/j.fss.2011.05.022 (Theme: information
processing).
11. Dubois D, Prade H. The mean value of a fuzzy number. Fuzzy Sets
Syst. 1987;24(3):279–300.
https://doi.org/10.1016/0165-0114(87)90028-5.
http://www.sciencedirect.com/science/article/pii/0165011487900285.
Fuzzy numbers.
12. Efron B. Bootstrap methods: another look at the jackknife. Ann
Stat. 1979;7(1):1–26. http://www.jstor.org/stable/2958830.
13. Gil MA, Casals MR. An operative extension of the likelihood ratio
test from fuzzy data. Stat Pap. 1988;29(1):191–203.
https://doi.org/10.1007/BF02924524.
14. Gonzalez-Rodriguez G, Colubi A, Gil MA. Fuzzy data treated as
functional data: a one-way ANOVA test approach. Comput Stat
Data Anal. 2012;56(4):943–55.
https://doi.org/10.1016/j.csda.2010.06.01.
15. Gonzalez-Rodriguez G, Montenegro M, Colubi A, Gil MA.
Bootstrap techniques and fuzzy random variables: synergy in
hypothesis testing with fuzzy data. Fuzzy Sets Syst.
2006;157(19):2608–2613.
https://doi.org/10.1016/j.fss.2003.11.021.
http://www.sciencedirect.com/science/article/pii/S0165011406002089.
Fuzzy sets and probability/statistics theories.
16. Kahraman C, Otay I, Öztayşi B. Fuzzy extensions of confidence
intervals: estimation for $\mu$, $\sigma^2$, and p. Cham: Springer
International Publishing; 2016. p. 129–54.
17. Kruse R, Meyer KD. Statistics with vague data, vol. 6.
Netherlands: Springer; 1987.
18. Montenegro M, Colubi A, Rosa Casals M, Ángeles Gil M.
Asymptotic and bootstrap techniques for testing the expected value of a
fuzzy random variable. Metrika. 2004;59(1):31–49.
https://doi.org/10.1007/s001840300270.
19. Parchami A. EM Fuzzy: EM algorithm for maximum likelihood
estimation by non-precise information, R package (2018).
https://CRAN.R-project.org/package=EM.Fuzzy.
20. Parchami A, Nourbakhsh M, Mashinchi M. Analysis of variance
in uncertain environments. Complex Intell Syst. 2017;3(3):189–96.
https://doi.org/10.1007/s40747-017-0046-8.
21. Swiss Federal Statistical Office: Statistics on income and living
conditions (SILC). Surveys 2015–2017 (2017).
https://www.bfs.admin.ch/bfs/en/home/statistics/economic-social-situation-population/surveys/silc.assetdetail.1822744.html.
22. Viertl R, Yeganeh SM. Fuzzy confidence regions. Cham: Springer
International Publishing; 2016. p. 119–27.
23. Yao JS, Wu K. Ranking fuzzy numbers based on decom-
position principle and signed distance. Fuzzy Sets Syst.
2000;116(2):275–88.
24. Zadeh L. Probability measures of fuzzy events. J Math Anal Appl.
1968;23(2):421–7.
Publisher's Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
1.
2.
3.
4.
5.
6.