
# A Problem in the Bayesian Analysis of Data without Gold Standards

Authors:
Nick Gray, Marco De Angelis, Dominic Calleja and Scott Ferson
Institute for Risk and Uncertainty, University of Liverpool, United Kingdom.
E-mail: nickgray@liverpool.ac.uk
We review methods of calculating the positive predictive value of a test (the probability of having a condition given a positive test for it) in situations where there is no 'gold standard' way to determine the true classification. We show that Bayesian methods lead to illogical results, and that a new approach using imprecise probabilities is logically consistent.
Keywords: diagnostics, Bayes’ rule, false positives, prevalence, sensitivity, speciﬁcity, uncertainty, gold standard
1. Introduction
In many applications, the problem with classification is the lack of a 'gold standard' of evidence, that is, cases where we cannot decisively assign an observation to a particular category. This phenomenon is pervasive, arising in many fields, from structural health monitoring in engineering, to supervised learning in computer science, to patient diagnosis in medicine. Tests, on which classifications rest, are often imperfect; they yield false alarms (false positives), fail to detect threats (false negatives), or are prone to other misclassifications.
For instance, medical practitioners commonly diagnose a patient's health condition based on some diagnostic test which, in isolation, is not definitive, although sometimes multiple tests can be combined to become definitive (Joseph et al. 1995, Albert 2009). The diagnostic result has some statistical uncertainty associated with detecting the true health state. Naively interpreting the result from a medical test can lead to an incorrect assessment of a patient's true health condition. Bayes' rule is commonly used to estimate the actual probability of some individual being a member of a class, given some piece of evidence (Mossman & Berger 2001).
It is often impractical or even impossible to know gold standard information about classification. In medicine there are many diseases for which there is no way to conclusively determine whether a patient has a particular disease. For instance, for patients with Giant Cell Arteritis, even after a biopsy has been undertaken there is still uncertainty about the true health state (Hunder et al. 1990).
There is also the situation where gold standard information can only be gathered for some classes, while for others it remains unknown. For example, some prison authorities use classification algorithms to assess whether a prisoner is likely to reoffend when released from prison on parole (Fry 2018). As authorities are unwilling to release prisoners whom the test flags as likely to reoffend, it is impossible to know whether the recidivism test was accurate or not. If a prisoner fails the test, and thus remains imprisoned, there is no available data on whether they would have re-offended had they been released. Therefore, we could never know whether the result was a true positive or a false positive.
Winkler & Smith (2004) have argued that the traditional application of Bayes' rule in medical counselling is inappropriate and represents a confusion in the medical decision-making literature. They propose in its place a radically different formulation that makes special use of the information about the test results for new patients, although not their actual disease status. As the ground truth cannot be established, they instead construct two new confusion matrices: one based upon the assumption that the test result is a true positive, and the other assuming that it is a false positive. They then make use of these alternative facts in order to update the test's sensitivity, specificity and the underlying prevalence of the disease, thus reducing the test's uncertainty asymptotically as the test is applied.
According to Google Scholar, the Winkler and Smith paper has only been referenced 11 times since its publication: Jafar et al. (2007), Finkel (2008), Zuk (2008), Proeve (2009), Weber (2009), Raab (2010), Low-Choy et al. (2011), Cuevas (2015), Cuevas et al. (2016), Rzepiński (2018) and Rushdi & Rushdi (2018). The majority of these are papers relating to medical decision making; however, Proeve (2009) concerns child abuse decision making and Low-Choy et al. (2011) concerns plant pest dispersal. Only Cuevas et al. (2016) appears to actually make use of their method. Nevertheless, we think their argument deserves a clear rebuttal because of the centrality of the issue in diagnostic testing, and the remarkable delicacy of Bayesian reasoning by which
such a profound disagreement could emerge and escape resolution for many years.

Proceedings of the 29th European Safety and Reliability Conference.
Edited by Michael Beer and Enrico Zio.
2019 European Safety and Reliability Association.
Research Publishing, Singapore.
ISBN: 978-981-11-2724-3; doi:10.3850/978-981-11-2724-3_0458-cd
2. The Standard Bayesian Approach to Calculating Positive Predictive Value
Throughout this paper, we will refer to the following hypothetical dataset for T_N trials of a diagnostic test for condition D: we have α true positives, β false positives, γ false negatives and δ true negatives. Let T_+ be the total number of positive tests, T_− the total number of negative tests, T_S the total number of sick people and T_W the total number of well people. This allows us to construct the confusion matrix shown in Table 1.
Table 1. Sample data.

|               | Has Problem | No Problem | Total |
|---------------|-------------|------------|-------|
| Positive Test | α           | β          | T_+   |
| Negative Test | γ           | δ          | T_−   |
| Total         | T_S         | T_W        | T_N   |
The probability that someone has the disease given they have had a positive test result is given by Pr(D|+), whilst the probability that they do not have the disease is Pr(¬D|+). Throughout, we will only consider positive test results; however, all the arguments made could equally apply to negative test results. The prevalence is given by

p = T_S / T_N = (α + γ) / (α + β + γ + δ),    (1)

the sensitivity by

s = α / T_S = α / (α + γ),    (2)

and the specificity by

t = δ / T_W = δ / (β + δ).    (3)

We will also define the ratio of positive tests out of the total number tested as

h = T_+ / T_N = (α + β) / (α + β + γ + δ).    (4)

Often the values of p, s and t are published independently of each other, in which case the following notation can also be used: p = p_k/p_n, s = s_k/s_n and t = t_k/t_n.
Given the published values of p, s and t, Bayes' rule gives the probability that a patient has D given that they have tested positive as

Pr(D|+, p, s, t) = ps / (ps + (1 − p)(1 − t))    (5)

(Baron 1994, Lesaffre & Lawson 2012). We are usually more interested in obtaining the probability that is conditioned only on a positive test outcome, Pr(D|+), also known as the positive predictive value (PPV). When p, s and t are available in scalar form, we can take Pr(D|+) = Pr(D|+, p, s, t).
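As a minimal illustration, Eq. 5 can be computed directly from scalar estimates. The sketch below is ours, not the authors' code; the function name and the example values p = 0.05, s = t = 0.9 (the ratios implied by the p_k/p_n, s_k/s_n and t_k/t_n figures used later in the paper) are illustrative:

```python
def ppv(p: float, s: float, t: float) -> float:
    """Positive predictive value Pr(D|+) from prevalence p,
    sensitivity s and specificity t, via Bayes' rule (Eq. 5)."""
    return p * s / (p * s + (1 - p) * (1 - t))

# Scalar point estimates: p = 5/100, s = 9/10, t = 9/10.
print(round(ppv(0.05, 0.9, 0.9), 3))  # prints 0.321
```

Even with a fairly accurate test (s = t = 0.9), the low prevalence drags the PPV down to about a third, which is why naive interpretation of a positive result is misleading.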
Mossman & Berger (2001) take this a step further and consider the following hypothetical situation:

Mr Smith has tested positive for disorder D. He asks his doctor the following: "Given the published estimates for prevalence, sensitivity and specificity, what is the 95% confidence interval for my probability of having D given my positive test result and the imprecision in the estimates?"

When there is uncertainty about the values of p, s and t, they can be described by distributions (Smith & Winkler 1999, Smith et al. 2000).
There are a couple of ways in which the PPV can be determined. The simplest is to estimate it using Eq. 5 but replacing p, s and t with their expected values:

Pr(D|+, p, s, t) = E(p)E(s) / (E(p)E(s) + (1 − E(p))(1 − E(t)))    (6)

In order to obtain a distribution for the PPV, Mossman & Berger (2001) use a convolution of the distributions of p, s and t within Eq. 5. In their numerical calculation they sample random variables from the distributions of p, s and t and use Eq. 5 to find the distribution of the PPV. Both Mossman & Berger (2001) and Winkler & Smith (2004) use Jeffreys' prior for p, s and t:

f_j(x | a, b) = B(x | a + 1/2, b − a + 1/2)    (7)

where B is the beta distribution, although alternative priors could be used. Using priors defined by p_k = 5, p_n = 100, s_k = 9, s_n = 10, t_k = 9 and t_n = 10, where the values of p, s and t have come from independent trials, the Mossman and Berger method gives the 95% confidence interval for the PPV as [0.075, 0.824].
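The Mossman and Berger sampling calculation can be sketched in a few lines. This is our reconstruction, not their code; it assumes the Jeffreys-prior parameterisation of Eq. 7, i.e. Beta(k + 1/2, n − k + 1/2) for k successes in n trials:

```python
import random

random.seed(1)

def mb_interval(pk=5, pn=100, sk=9, sn=10, tk=9, tn=10, m=100_000):
    """Monte Carlo 95% interval for the PPV: sample p, s, t from their
    Jeffreys posteriors (Eq. 7) and push each triple through Eq. 5."""
    draws = []
    for _ in range(m):
        p = random.betavariate(pk + 0.5, pn - pk + 0.5)
        s = random.betavariate(sk + 0.5, sn - sk + 0.5)
        t = random.betavariate(tk + 0.5, tn - tk + 0.5)
        draws.append(p * s / (p * s + (1 - p) * (1 - t)))
    draws.sort()
    return draws[int(0.025 * m)], draws[int(0.975 * m)]

lo, hi = mb_interval()
print(lo, hi)  # roughly [0.075, 0.824], as reported in the text
```

The width of this interval reflects how much the imprecision in p, s and t propagates into the PPV.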
3. The Winkler and Smith Method

Winkler & Smith (2004) diverge from Mossman & Berger (2001) and from the textbook method (see Lesaffre & Lawson (2012) as an example): they argue that the outcome of a patient's test should itself be used to update the distribution. They assert that the PPV of a positive medical test for a disease is not Eq. 5, Eq. 6 or Mossman and Berger's (2001) "objective Bayesian method". Rather, it should be computed as a weighted average of assuming the positive result is a true positive (and accordingly augmenting the estimates for prevalence and sensitivity) and assuming the positive result is a false positive (and accordingly decrementing the estimates for prevalence and specificity).
Starting with the same sample data as Table 1, Winkler and Smith suggest that we should construct two new confusion matrices, one assuming a true positive and one assuming a false positive, shown in Table 2 and Table 3 respectively.

Table 2. New confusion matrix assuming a true positive.

|               | Has Problem | No Problem | Total   |
|---------------|-------------|------------|---------|
| Positive Test | α + 1       | β          | T_+ + 1 |
| Negative Test | γ           | δ          | T_−     |
| Total         | T_S + 1     | T_W        | T_N + 1 |

Table 3. New confusion matrix assuming a false positive.

|               | Has Problem | No Problem | Total   |
|---------------|-------------|------------|---------|
| Positive Test | α           | β + 1      | T_+ + 1 |
| Negative Test | γ           | δ          | T_−     |
| Total         | T_S         | T_W + 1    | T_N + 1 |
Assuming a true positive, the prevalence, sensitivity and specificity become:

p_S = (T_S + 1) / (T_N + 1)    (8)

s_S = (α + 1) / (T_S + 1)    (9)

t_S = t.    (10)

Similarly for the assumed false positive case:

p_W = T_S / (T_N + 1)    (11)

s_W = s    (12)

t_W = δ / (T_W + 1)    (13)
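Eqs. 8-13 amount to simple count updates on Table 1, which can be sketched as a small function. The counts below (α = 9, β = 10, γ = 1, δ = 90) are hypothetical values of ours, chosen so that s = t = 0.9; the function name is illustrative:

```python
def ws_update(alpha, beta, gamma, delta):
    """Winkler-Smith contingent (p, s, t) estimates after one positive test:
    one triple assuming it was a true positive (Table 2, Eqs. 8-10),
    one assuming it was a false positive (Table 3, Eqs. 11-13)."""
    ts, tw = alpha + gamma, beta + delta   # sick and well totals
    tn = ts + tw                           # total trials
    true_pos = ((ts + 1) / (tn + 1),       # p_S, Eq. 8
                (alpha + 1) / (ts + 1),    # s_S, Eq. 9
                delta / tw)                # t_S = t, Eq. 10
    false_pos = (ts / (tn + 1),            # p_W, Eq. 11
                 alpha / ts,               # s_W = s, Eq. 12
                 delta / (tw + 1))         # t_W, Eq. 13
    return true_pos, false_pos

tp, fp = ws_update(alpha=9, beta=10, gamma=1, delta=90)
print(tp, fp)
```

Note how the true-positive branch nudges p and s upwards while the false-positive branch nudges p and t downwards, exactly as described in the text.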
They then construct a weighted average over the two new confusion matrices using

Pr(D|+) = f(p_S, s_S, t_S) Pr(D|+, p, s, t) + f(p_W, s_W, t_W) Pr(¬D|+, p, s, t),    (14)

where f(p, s, t) is a convolution of the Jeffreys prior distributions (Eq. 7) for p, s and t:

f(p, s, t) = f_j(p | p_k, p_n) f_j(s | s_k, s_n) f_j(t | t_k, t_n),    (15)

Pr(D|+, p, s, t) is calculated using Eq. 6, and Pr(¬D|+, p, s, t) = 1 − Pr(D|+, p, s, t). Returning to the numerical example at the end of Section 2 (p_k = 5, p_n = 100, s_k = 9, s_n = 10, t_k = 9 and t_n = 10), we find that the 95% confidence interval is [0.062, 0.741]. Notice that this confidence interval is narrower than that of the Mossman and Berger method.
3.1. The Logical Inconsistency with the Winkler and Smith Method

In order to show that the Winkler and Smith method is flawed, we will explore the situation in which more than one test is conducted and the number of tests approaches infinity. We will then contrast this with the imprecise probability approach which, whilst not providing useful information, at least makes logical sense.

As Winkler and Smith make use of the test result in their calculations, instead of considering the effect of just one test we can also consider what happens after X positive tests. Using the Winkler and Smith method we would have the two confusion matrices shown in Table 4 and Table 5.
Table 4. New confusion matrix assuming true positives for X positive tests.

|               | Has Problem | No Problem | Total   |
|---------------|-------------|------------|---------|
| Positive Test | α + X       | β          | T_+ + X |
| Negative Test | γ           | δ          | T_−     |
| Total         | T_S + X     | T_W        | T_N + X |

Table 5. New confusion matrix assuming false positives for X positive tests.

|               | Has Problem | No Problem | Total   |
|---------------|-------------|------------|---------|
| Positive Test | α           | β + X      | T_+ + X |
| Negative Test | γ           | δ          | T_−     |
| Total         | T_S         | T_W + X    | T_N + X |
In the assumed true positive case the new prevalence would be given by

p_+ = (T_S + X) / (T_N + X),    (16)

the new sensitivity by

s_+ = (α + X) / (α + X + γ),    (17)

and the specificity wouldn't change:

t_+ = t.    (18)

Using Table 5 for the assumed false positive case, we get the new prevalence as

p_− = T_S / (T_N + X),    (19)

the sensitivity wouldn't change,

s_− = s,    (20)

and the specificity would become

t_− = δ / (β + δ + X).    (21)
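The behaviour of these contingent estimates as X grows can be checked numerically. The sketch below is ours, reusing the same hypothetical counts (α = 9, β = 10, γ = 1, δ = 90) as before:

```python
def after_x_positives(alpha, beta, gamma, delta, x):
    """Contingent (p, s, t) estimates after x further positive tests:
    all assumed true positives (Table 4) vs all false positives (Table 5)."""
    ts, tw = alpha + gamma, beta + delta
    tn = ts + tw
    # All x positives taken as true positives:
    plus = ((ts + x) / (tn + x),             # p_+
            (alpha + x) / (alpha + x + gamma),  # s_+
            delta / tw)                      # t_+ = t
    # All x positives taken as false positives:
    minus = (ts / (tn + x),                  # p_-
             alpha / ts,                     # s_- = s
             delta / (beta + delta + x))     # t_-
    return plus, minus

for x in (1, 100, 10_000, 1_000_000):
    plus, minus = after_x_positives(9, 10, 1, 90, x)
    print(x, plus, minus)  # p_+ and s_+ climb towards 1; p_- and t_- fall towards 0
```

Running this makes the limits of the next paragraph concrete: the two branches are forced to opposite extremes as X grows.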
We will now consider what happens to p_+, s_+, t_+, p_−, s_− and t_− when the number of tests becomes large (X → ∞). We find

lim_{X→∞} p_+ = 1,    (22)

lim_{X→∞} s_+ = 1,    (23)

lim_{X→∞} t_+ = t,    (24)

lim_{X→∞} p_− = 0,    (25)

lim_{X→∞} s_− = s,    (26)

and

lim_{X→∞} t_− = 0.    (27)
Hence, using Equation 5 with the true-positive limits we get Pr(D|+) = 1, which implies that Pr(¬D|+) = 0. Naively interpreting this result at face value would let you conclude that any positive test at this limit is a true positive; alternatively, that there is no such thing as a false positive. Winkler and Smith do not say to use this result directly; they would instead use Equation 14 along with Equation 15, which at the limit gives us:

f(p, s, t) = f_β(p | T_S + X, T_N + X) f_β(s | α + X, α + γ + X) f_β(t | δ, β + δ),    (28)

where f_β(x | a, b) denotes the beta density with shape parameters a and b. As

lim_{a,b→∞} f_β(x | a, b) = { ∞ if x = 1/2;  0 if x ≠ 1/2 }    (29)

and

∫_{−∞}^{∞} Pr(x) dx = 1,    (30)

the cumulative distribution for the PPV becomes

Pr(D|+, p, s, t) = { 0 if x < 1/2;  1 if x ≥ 1/2 }.    (31)

Figure 1 shows this migration from the first test, X = 1, towards the result in Equation 31, starting from the Section 2 example data set (p_k = 5, p_n = 100, s_k = 9, s_n = 10, t_k = 9 and t_n = 10). These results amount to a logical flaw in the Winkler and Smith method: we haven't added any new information (apart from the number of tests) and yet the uncertainty has reduced. It should be noted that this asymptote is due to the choice of prior. Interestingly, and perhaps worryingly, different priors would give different values in the limit X → ∞.
Fig. 1. Plot of the CDFs of the PPV for the first 1000 tests using the Winkler and Smith method.
4. Imprecise Probability Approach

It is possible to reconsider the argument made by Winkler and Smith using a framework provided by the theory of imprecise probabilities (Walley 1991, 1996, Walley et al. 1996). Under this perspective, the prevalence p, sensitivity s, and specificity t can each be updated as prescribed by Winkler and Smith to yield a distribution for the PPV assuming the patient is actually sick (in which case the test was a true positive) and a distribution for the PPV assuming the patient is actually not sick (in which case the test was a false positive). However, the appropriate synthesis of these two contingent estimates of the PPV is not a weighted mixture as Winkler and Smith conceive it. Instead, because whether the patient is sick or not is precisely what is unknown in this problem, an envelope of the two distributions would be more appropriate.
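One way to sketch the enveloping calculation is to sample the two contingent PPV distributions and take the pointwise envelope of their empirical CDFs. This is our illustration, not the authors' code; it assumes that the Table 2 and Table 3 updates are pushed through the Eq. 7 priors by incrementing the relevant counts by one:

```python
import random
from bisect import bisect_right

random.seed(1)

def contingent_samples(pk, pn, sk, sn, tk, tn, m=50_000):
    """Draw PPV samples with Jeffreys posteriors Beta(k + 1/2, n - k + 1/2)."""
    out = []
    for _ in range(m):
        p = random.betavariate(pk + 0.5, pn - pk + 0.5)
        s = random.betavariate(sk + 0.5, sn - sk + 0.5)
        t = random.betavariate(tk + 0.5, tn - tk + 0.5)
        out.append(p * s / (p * s + (1 - p) * (1 - t)))
    return sorted(out)

def ecdf(samples, x):
    """Empirical CDF of a sorted sample at x."""
    return bisect_right(samples, x) / len(samples)

# True-positive assumption increments the counts behind p and s (Table 2);
# false-positive assumption increments the denominators of p and t (Table 3).
tp = contingent_samples(6, 101, 10, 11, 9, 10)
fp = contingent_samples(5, 101, 9, 10, 9, 11)

for x in (0.1, 0.3, 0.5, 0.9):
    # The p-box at x is the band between the two contingent CDFs.
    print(x, min(ecdf(tp, x), ecdf(fp, x)), max(ecdf(tp, x), ecdf(fp, x)))
```

The pointwise min and max of the two CDFs give the right and left bounds of the p-box, which is what taking a mixture with vacuous [0, 1] weights produces.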
Returning to the numerical example in Section 2, with priors for p, s and t implied by p_k = 5, p_n = 100, s_k = 9, s_n = 10, t_k = 9 and t_n = 10, the envelope of the two contingent distributions yields a rather wide probability box (Ferson et al. 2003), shown as the outer bounds in black in Figure 2. The leftmost edge corresponds to the distribution that assumes the positive test result was a false positive, decrementing p and t according to Table 3. The rightmost edge corresponds to the distribution that assumes the test result was a true positive, incrementing p and s according to Table 2. This enveloping calculation is equivalent to a mixture with unknown weights characterised by the vacuous interval [0, 1] for both distributions. The traditional Bayesian result and the Winkler and Smith result are also shown in the figure; we see that the envelope encloses both the traditional and the Winkler and Smith distributions. The 95% confidence interval using imprecise probabilities is [0.057, 0.848], which, as expected, encompasses the intervals for both the traditional Bayesian result and the Winkler and Smith result. This envelope is reminiscent of the probabilistic dilation of uncertainty that sometimes accompanies the addition of weakly informative data in probabilistic calculations (Seidenfeld & Wasserman 1993). In this case, the unverified test result is certainly information, but it does not seem to be information that leads to a contraction of uncertainty about what the test result itself means.
Fig. 2. P-box showing the distribution envelope for the PPV, together with the Mossman-Berger and Winkler-Smith distributions.
4.1. Logical Consistency

Having shown that the Winkler and Smith method becomes logically inconsistent in the extreme scenario, we will now show that using imprecise probabilities at least leads to a logical result in the limit. The imprecise probability confusion matrix after X positive tests is shown in Table 6.
Table 6. New imprecise probability confusion matrix after X positive tests.

|               | Has Problem    | No Problem     | Total   |
|---------------|----------------|----------------|---------|
| Positive Test | α + X[0, 1]    | β + X[0, 1]    | T_+ + X |
| Negative Test | γ              | δ              | T_−     |
| Total         | T_S + X[0, 1]  | T_W + X[0, 1]  | T_N + X |

The new prevalence would be

p = (α + γ + X[0, 1]) / (T_N + X),    (32)

the new sensitivity

s = (α + X[0, 1]) / (α + γ + X[0, 1]),    (33)

and the new specificity

t = δ / (β + δ + X[0, 1]).    (34)
Now in the X → ∞ limit:

lim_{X→∞} p = [0, 1],    (35)

lim_{X→∞} s = 1,    (36)

lim_{X→∞} t = 0.    (37)

Using these results along with Eq. 5 gives

Pr(D|+) = [0, 1]    (38)

as the final value for the PPV. Figure 3 shows the migration to the vacuous p-box using the imprecise probability method, starting from the Mossman & Berger (2001) sample data.
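The widening of the bounds towards the vacuous interval can be demonstrated numerically by evaluating Eq. 5 at the two endpoint assumptions of the unknown split, all X positives false versus all true. This sketch is ours, using the same hypothetical counts as earlier:

```python
def ppv_bounds(alpha, beta, gamma, delta, x):
    """Interval bounds on the PPV after x unverified positive tests, from
    the all-false-positive and all-true-positive endpoints of X[0, 1]."""
    ts, tw = alpha + gamma, beta + delta
    tn = ts + tw

    def ppv(p, s, t):
        return p * s / (p * s + (1 - p) * (1 - t))  # Eq. 5

    # Endpoint where none of the x positives are true positives:
    lo = ppv(ts / (tn + x), alpha / ts, delta / (beta + delta + x))
    # Endpoint where all of them are:
    hi = ppv((ts + x) / (tn + x), (alpha + x) / (ts + x), delta / tw)
    return lo, hi

for x in (1, 100, 10_000, 1_000_000):
    print(x, ppv_bounds(9, 10, 1, 90, x))  # widens towards the vacuous [0, 1]
```

Unlike the Winkler and Smith limit, where the uncertainty spuriously collapses, here each unverified positive test widens the bounds, consistent with Eqs. 35-38.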
Fig. 3. Plot of the p-boxes for the first 1000 tests (X = 1, 10, 100, 1000) using the imprecise probability method.
5. Discussion

Let us first consider the difference between the Winkler and Smith method and the imprecise probability method. Figure 1 shows that as the number of tests increases, the uncertainty on the PPV decreases. This amounts to a reductio ad absurdum, proving their method untenable. This uncertainty reduction happens even after one test, as demonstrated in the numerical example in Section 3.

In the imprecise version, we have likewise given the test no new information, but the uncertainty has increased, which we argue is reasonable. Although at the infinity limit the vacuous p-box result is not useful, it at least makes logical sense: it is perfectly reasonable to say 'I don't know' when one does not know.
Equation 15 assumes that the prevalence, sensitivity and specificity are independent of each other. For the Winkler and Smith data this is a fair assumption, as they state that the values all come from different independent studies; however, this may not always be the case. For example, when conducting non-invasive neonatal screening for foetal aneuploidy conditions, such as Down's syndrome, the prevalence of these conditions changes with the age of the mother, and, as the conditions are rare, studies of the test statistics are often focused on higher-risk categories (Badeau et al. 2015, Montgomery et al. 2017). Therefore, it is not unimaginable that there is dependence between p, s and t.
6. Conclusion

We have shown that the Winkler and Smith method for dealing with the lack of a gold standard in a classification test is inappropriate: it leads to the illogical result of the test becoming less uncertain after more trials, even though no new information is added. We have also shown that it is possible to reimagine their method using imprecise probabilities in order to produce logically consistent results.
Acknowledgement

This research is partly funded by the UK Engineering & Physical Sciences Research Council (EPSRC) "Digital twins for improved dynamic design", through grant number EP/R006768/1, and the UK Medical Research Council (MRC) "Treatment According to Response in Giant cEll arTeritis (TARGET)", through grant number MR/N011775/1. The funding and support of EPSRC and MRC are gratefully acknowledged.

This paper benefited from discussion with many people, including Masatoshi Sugeno, Jack Siegrist, Michael Balch and Jason O'Rawe.
References
Albert, P. S. (2009), ‘Estimating diagnostic ac-
curacy of multiple binary tests with an imper-
fect reference standard’, Statistics in Medicine
28(December 2008), 780–797.
Badeau, M., Lindsay, C., Blais, J., Takwoingi, Y., Langlois, S., Légaré, F., Giguère, Y., Turgeon, A. F., William, W. & Rousseau, F. (2015), 'Genomics-based non-invasive prenatal testing for detection of fetal chromosomal aneuploidy in pregnant women', Cochrane Database of Systematic Reviews (7).
Baron, J. A. (1994), ‘Uncertainty in Bayes’, Med-
ical Decision Making 14(1), 46–51.
Bolstad, W. M. (2007), Introduction to Bayesian
Statistics, 2 edn, Wiley.
Cuevas, J. R. T., Bravo Melo, L. & Achcar, J. (2016), 'Estimación del Valor Predictivo Positivo de la Colangiopancreatografía Magnética utilizando métodos de Bayes (Estimation of the Positive Predictive Value of Magnetic Resonance Cholangiopancreatography using Bayes methods) (in Spanish)', Revista Médica de Risaralda 22(1), 19-26.
Cuevas, T. (2015), 'Inferencia Bayesiana e Investigación en salud: un caso de aplicación en diagnóstico clínico (Bayesian Inference and Health Research: a case of application in clinical diagnosis) (in Spanish)', Revista Médica de Risaralda 21(3), 9-16.
Ferson, S., Kreinovich, V., Ginzburg, L., Myers, D. S. & Sentz, K. (2003), Constructing Probability Boxes and Dempster-Shafer Structures, Technical Report, Sandia National Laboratories (SNL-NM), Albuquerque, United States.
Finkel, A. M. (2008), Protecting People in Spite of--or Thanks to--the Veil of Ignorance, in R. R. Sharp & G. E. Marchant, eds, 'Genomics and Environmental Regulation: Science, Ethics, and Law', The Johns Hopkins University Press, Baltimore, pp. 290-342.
Fry, H. (2018), Hello World: How to be Human
in the Age of the Machine, WW Norton &
Company Inc, New York.
Hunder, G. G., Bloch, D. A., Michel, B. A., Stevens, M. B., Arend, W. P., Calabrese, L. H., Edworthy, S. M., Fauci, A. S., Leavitt, R. Y., Lie, J. T., Lightfoot Jr., R. W., Masi, A. T., McShane, D. J., Mills, J. A., Wallace, S. L. & Zvaifler, N. J. (1990), 'The American College of Rheumatology 1990 criteria for the classification of giant cell arteritis', Arthritis & Rheumatism 33(8), 1122-1128.
Jafar, T. H., Chaturvedi, N., Hatcher, J. & Levey,
A. S. (2007), ‘Use of albumin creatinine ratio
and urine albumin concentration as a screen-
ing test for albuminuria in an Indo-Asian pop-
ulation’, Nephrology Dialysis Transplantation
22(8), 2194–2200.
Joseph, L., Gyorkos, T. W. & Coupal, L. (1995),
‘Bayesian estimation of disease prevalence and
the parameters of diagnostic tests in the absence
of a gold standard’, American Journal of Epi-
demiology 141(3), 263–272.
Lesaffre, E. & Lawson, A. B. (2012), Bayesian
Biostatistics, John Wiley & Sons, Ltd, Chich-
ester.
Low-Choy, S., Hammond, N., Penrose, L., An-
derson, C. & Taylor, S. (2011), Dispersal in a
hurry: Bayesian learning from surveillance to
establish area freedom from plant pests with
early dispersal, in ‘MODSIM2011, 19th Inter-
national Congress on Modelling and Simula-
tion’, Perth, Australia, pp. 2521–2527.
Montgomery, J., Caney, S., Clancy, T., Edwards, J., Gallagher, A., Greenfield, A., Haimes, E., Hughes, J., Jackson, R., Lawrence, D., Pattinson, S. D., Shakespear, T., Siddiqui, M., Watson, C., Widdows, H., Wishart, A. & de Zulueta, P. (2017), Non-invasive prenatal testing: ethical issues, Technical report, Nuffield Council on Bioethics.
Mossman, D. & Berger, J. O. (2001), ‘Intervals for
posttest probabilities: A comparison of 5 meth-
ods’, Medical Decision Making 21(6), 498–
507.
Proeve, M. (2009), ‘Issues in the application of
Bayes’ Theorem to child abuse decision mak-
ing’, Child Maltreatment 14(1), 114–120.
Raab, S. (2010), Kidney and Urinary Tract, in
W. Gray & G. Kocjan, eds, ‘Diagnostic Cy-
topathology’, 3 edn, Churchill Livingstone El-
sevier, pp. 365–401.
Rushdi, R. A. & Rushdi, A. M. (2018), 'Karnaugh-Map Utility in Medical Studies: The Case of Fetal Malnutrition', International Journal of Mathematical, Engineering and Management Sciences (IJMEMS) 3(3), 220-244.
Rzepiński, T. (2018), 'Twierdzenie Bayesa w projektowaniu strategii diagnostycznych w medycynie (Making Diagnostic Strategies in Medical Practice with the Use of Bayes' Theorem) (in Polish)', Diametros 57(57), 39-60.
Seidenfeld, T. & Wasserman, L. (1993), ‘Dilation
for Sets of Probabilities’, The Annals of Statis-
tics 21(3), 1139–1154.
Smith, J. E. & Winkler, R. L. (1999), ‘Casey’s
Problem: Interpreting and Evaluating a New
Test’, Interfaces 29(3), 63–76.
Smith, J. E., Winkler, R. L. & Fryback, D. G.
(2000), ‘The First Positive: Computing Positive
Predictive Value at the Extremes’, Annals of
Internal Medicine 132(10), 804.
Walley, P. (1991), Statistical reasoning with im-
precise probabilities, Chapman and Hall, Lon-
don.
Walley, P. (1996), ‘Inferences from Multinomial
Data: Learning about a Bag of Marbles’, Jour-
nal of the Royal Statistical Society. Series B
(Methodological) 58(1), 3–57.
Walley, P., Gurrin, L. & Burton, P. (1996), ‘Anal-
ysis of Clinical Data Using Imprecise Prior
Probabilities’, Journal of the Royal Statistical
Society. Series D (The Statistician) 45(4), 457–
485.
Weber, K. M. (2009), Making a treatment decision
for breast cancer: Associations among mari-
tal qualities, couple communication, and breast
cancer treatment decision outcomes, PhD the-
sis, The Pennsylvania State University.
URL: https://etda.libraries.psu.edu/files/final submissions/3502
Winkler, R. L. & Smith, J. E. (2004), ‘On Un-
certainty in Medical Testing’, Medical Decision
Making 24(6), 654–658.
Zuk, T. (2008), Visualizing Uncertainty, PhD the-
sis, The University of Calgary.
URL: https://prism.ucalgary.ca/bitstream/handle/1880/46780/Zuk 2008.pdf?sequence=1