
A Problem in the Bayesian Analysis of Data without Gold Standards

Nick Gray, Marco De Angelis, Dominic Calleja and Scott Ferson

Institute for Risk and Uncertainty, University of Liverpool, United Kingdom.

E-mail: nickgray@liverpool.ac.uk

We review methods of calculating the positive predictive value of a test (the probability of having a condition given a positive test for it) in situations where there is no 'gold standard' way to determine the true classification. We show that Bayesian methods can lead to illogical results, and that a new approach using imprecise probabilities is logically consistent.

Keywords: diagnostics, Bayes’ rule, false positives, prevalence, sensitivity, speciﬁcity, uncertainty, gold standard

1. Introduction

In many applications, the problem with classification is the lack of a 'gold standard' of evidence; that is, we cannot decisively assign an observation to a particular category. This phenomenon is pervasive, arising in many fields, from structural health monitoring in engineering, to supervised learning in computer science, to patient diagnosis in medicine. Tests, on which classifications rest, are often imperfect; they yield false alarms (false positives), fail to detect threats (false negatives), or are prone to other misclassifications.

For instance, medical practitioners commonly diagnose a patient's health condition based on some diagnostic test which, in isolation, is not definitive, although sometimes multiple tests can be combined in order to become definitive (Joseph et al. 1995, Albert 2009). The diagnostic result has some statistical uncertainty associated with detecting the true health state. Naively interpreting the result from a medical test can lead to an incorrect assessment of a patient's true health condition. Bayes' rule is commonly used to estimate the actual probability of some individual being a member of a class, given some piece of evidence (Mossman & Berger 2001).

It is often impractical or even impossible to know gold standard information about classification. In medicine there are many diseases for which there is no way to conclusively determine whether a patient is affected. For instance, for patients with giant cell arteritis, even after a biopsy has been undertaken there is still uncertainty about the true health state (Hunder et al. 1990).

There is also the situation where gold standard information can only be gathered for some classes, while for others it is unknown. For example, some prison authorities use classification algorithms to assess whether a prisoner is likely to reoffend when released on parole (Fry 2018). As authorities are unwilling to release prisoners if the test says that they are likely to return to crime, it is impossible to know whether the recidivism test was accurate. If a prisoner fails the test, and thus remains imprisoned, there is no available data on whether they would have re-offended had they been released; therefore, we can never know whether the result was a true positive or a false positive.

Winkler & Smith (2004) have argued that the traditional application of Bayes' rule in medical counselling is inappropriate and represents a confusion in the medical decision-making literature. They propose in its place a radically different formulation that makes special use of the information about the test results for new patients, although not their actual disease status. As the ground truth cannot be established, they instead construct two new confusion matrices: one based upon the assumption that the test result is a true positive, and the other assuming it is a false positive. They then make use of these alternative facts in order to update the test's sensitivity, specificity and the underlying prevalence of the disease, thus reducing the test's uncertainty asymptotically as the test is applied.

According to Google Scholar, the Winkler and Smith paper has only been referenced 11 times since its publication: Jafar et al. (2007), Finkel (2008), Zuk (2008), Proeve (2009), Weber (2009), Raab (2010), Low-Choy et al. (2011), Cuevas (2015), Cuevas et al. (2016), Rzepiński (2018) and Rushdi & Rushdi (2018). The majority of these are papers relating to medical decision making; however, Proeve (2009) concerns child abuse decision making and Low-Choy et al. (2011) concerns plant pest dispersal. Only Cuevas et al. (2016) appears to actually make use of their method. Nevertheless, we think their argument deserves a clear rebuttal because of the centrality

of the issue in diagnostic testing, and the remarkable delicacy of Bayesian reasoning by which such a profound disagreement could emerge and escape resolution for many years.

Proceedings of the 29th European Safety and Reliability Conference. Edited by Michael Beer and Enrico Zio. Copyright © 2019 European Safety and Reliability Association. Published by Research Publishing, Singapore. ISBN: 978-981-11-2724-3; doi:10.3850/978-981-11-2724-3_0458-cd

2. The Standard Bayesian Approach to

Calculating Positive Predictive Value

Throughout this paper, we will refer to the following hypothetical dataset for TN trials of a diagnostic test for condition D: we have α true positives, β false positives, γ false negatives and δ true negatives. Let T+ be the total number of positive tests, T− the total number of negative tests, TS the total number of sick people and TW the total number of well people. This allows us to construct the confusion matrix shown in Table 1.

Table 1. Sample data.

                Has Problem   No Problem   Total
Positive Test   α             β            T+
Negative Test   γ             δ            T−
Total           TS            TW           TN

The probability that someone has the disease given they have had a positive test result is given by Pr(D|+), whilst the probability that they do not have the disease is Pr(¬D|+). Throughout, we will only consider positive test results; however, all the arguments made could equally apply to negative test results. The prevalence is given by

p = TS / TN = (α + γ) / (α + β + γ + δ),  (1)

the sensitivity by

s = α / TS = α / (α + γ),  (2)

and the specificity by

t = δ / TW = δ / (β + δ).  (3)

We will also define the ratio of positive tests out of the total number tested as

h = T+ / TN = (α + β) / (α + β + γ + δ).  (4)

Often the values of p, s and t are published independently of each other, in which case the following notation can also be used: p = pk/pn, s = sk/sn and t = tk/tn.

Given the published values of p, s and t, Bayes' rule gives the probability that a patient has D given they have tested positive as

Pr(D|+, p, s, t) = ps / (ps + (1 − p)(1 − t))  (5)

(Baron 1994, Lesaffre & Lawson 2012). We are usually more interested in obtaining the probability that is only conditioned on a positive test outcome, Pr(D|+); this is also known as the positive predictive value (PPV). When p, s and t are available in scalar form, we can obtain Pr(D|+) = Pr(D|+, p, s, t).
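As a quick numerical illustration of Eq. 5, the following Python sketch computes the PPV from scalar values of p, s and t (the function name `ppv` is ours, not from the literature):

```python
def ppv(p, s, t):
    """Positive predictive value via Bayes' rule (Eq. 5):
    Pr(D|+) = p*s / (p*s + (1-p)*(1-t))."""
    return p * s / (p * s + (1 - p) * (1 - t))

# With point estimates p = 5/100, s = 9/10, t = 9/10:
print(round(ppv(0.05, 0.9, 0.9), 3))  # 0.321
```

Note how strongly the low prevalence pulls the PPV below the test's 90% sensitivity and specificity.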

The Mossman & Berger (2001) paper takes this a step further and considers the following hypothetical situation: Mr Smith has tested positive for disorder D and asks his doctor: "Given the published estimates for prevalence, sensitivity and specificity, what is the 95% confidence interval for my probability of having D, given my positive test result and the imprecision in the estimates?"

When there is uncertainty about the values of p, s and t, they can be described by distributions (Smith & Winkler 1999, Smith et al. 2000). There are several ways in which the PPV can then be determined; the simplest is to estimate it using Eq. 5 but replacing p, s and t with their expected values:

Pr*(D|+, p, s, t) = E(p)E(s) / (E(p)E(s) + (1 − E(p))(1 − E(t))).  (6)

In order to obtain a distribution for the PPV, Mossman & Berger (2001) use a convolution of the distributions of p, s and t within Eq. 5. In their numerical calculation they sample random variables from the distributions of p, s and t and use Eq. 5 to find the distribution of the PPV. Both Mossman & Berger (2001) and Winkler & Smith (2004) use the Jeffreys prior for p, s and t:

fj(x | a, b) = B(x | a + 1/2, b − a + 1/2),  (7)

where B is the beta distribution; however, alternatives are available (Bolstad 2007).

Using priors defined by pk = 5, pn = 100, sk = 9, sn = 10, tk = 9 and tn = 10, where the values of p, s and t have come from independent trials, the Mossman and Berger method gives the 95% confidence interval for the PPV as [0.075, 0.824].
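This interval can be checked with a small Monte Carlo simulation; the sketch below is our code, not Mossman and Berger's, and the sampled endpoints will wobble slightly around the published [0.075, 0.824]:

```python
import random

random.seed(1)

def jeffreys_sample(k, n):
    """Draw from the Jeffreys posterior Beta(k + 1/2, n - k + 1/2) of Eq. 7."""
    return random.betavariate(k + 0.5, n - k + 0.5)

def ppv(p, s, t):
    """Eq. 5."""
    return p * s / (p * s + (1 - p) * (1 - t))

# Priors implied by pk=5, pn=100, sk=9, sn=10, tk=9, tn=10.
draws = sorted(
    ppv(jeffreys_sample(5, 100), jeffreys_sample(9, 10), jeffreys_sample(9, 10))
    for _ in range(100_000)
)
lo = draws[int(0.025 * len(draws))]
hi = draws[int(0.975 * len(draws))]
print(f"95% interval ~ [{lo:.3f}, {hi:.3f}]")  # close to the published [0.075, 0.824]
```

The width of this interval, roughly 0.75, is the baseline against which the later methods are compared.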

3. The Winkler and Smith Method

Winkler & Smith (2004) diverge from Mossman & Berger (2001) and the textbook method (see Lesaffre & Lawson (2012) for an example): they argue that the outcome of a patient's test should be used to update the distribution. They assert that the PPV of a positive medical test for a disease is not Eq. 5, Eq. 6 or Mossman and Berger's (2001) "objective Bayesian method". Rather, it should be computed as a weighted average of assuming the positive result is a true positive (and accordingly augmenting the estimates for prevalence and sensitivity) and assuming the positive result is a false positive (and accordingly decrementing the estimates for prevalence and specificity).


Starting with the same sample data as Table 1,

Winkler and Smith suggest that we should con-

struct two new confusion matrices, one assuming

a true positive and one assuming a false positive,

Table 2 and Table 3 respectively.

Table 2. New confusion matrix assuming true positive.

                Has Problem   No Problem   Total
Positive Test   α + 1         β            T+ + 1
Negative Test   γ             δ            T−
Total           TS + 1        TW           TN + 1

Table 3. New confusion matrix assuming false positive.

                Has Problem   No Problem   Total
Positive Test   α             β + 1        T+ + 1
Negative Test   γ             δ            T−
Total           TS            TW + 1       TN + 1

Assuming a true positive, the prevalence, sensitivity and specificity become:

p′S = (TS + 1) / (TN + 1),  (8)

s′S = (α + 1) / (TS + 1),  (9)

t′S = t.  (10)

Similarly for the assuming false positive case:

p′W = TS / (TN + 1),  (11)

s′W = s,  (12)

t′W = δ / (TW + 1).  (13)

They then construct a weighted average over the two new confusion matrices using

Pr(D|+) = f(p′S, s′S, t′S) Pr(D|+, p, s, t) + f(p′W, s′W, t′W) Pr(¬D|+, p, s, t),  (14)

where f(p, s, t) is the joint density given by the product of the Jeffreys prior distributions (Eq. 7) for p, s and t:

f(p, s, t) = fj(p | pk, pn) fj(s | sk, sn) fj(t | tk, tn).  (15)

Pr(D|+, p, s, t) is calculated using Eq. 6 and Pr(¬D|+, p, s, t) = 1 − Pr(D|+, p, s, t). Returning to the numerical example at the end of Section 2 (pk = 5, pn = 100, sk = 9, sn = 10, tk = 9 and tn = 10), we find that the 95% confidence interval is [0.062, 0.741]. Notice that this confidence interval is narrower than that of the Mossman and Berger method.
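One numerical reading of Eqs. 14 and 15 (our sketch, not the authors' published code) mixes draws from the two contingent Jeffreys posteriors of Tables 2 and 3, with the fixed mixture weight taken from Eq. 6 evaluated at the Jeffreys posterior means:

```python
import random

random.seed(1)

def jeffreys_sample(k, n):
    return random.betavariate(k + 0.5, n - k + 0.5)

def ppv(p, s, t):
    return p * s / (p * s + (1 - p) * (1 - t))

pk, pn, sk, sn, tk, tn = 5, 100, 9, 10, 9, 10

# Mixture weight from Eq. 6, using the Jeffreys posterior means E(x) = (k+1/2)/(n+1).
mean = lambda k, n: (k + 0.5) / (n + 1)
w = ppv(mean(pk, pn), mean(sk, sn), mean(tk, tn))

draws = []
for _ in range(100_000):
    if random.random() < w:
        # True-positive branch: increment prevalence and sensitivity counts (Table 2).
        x = ppv(jeffreys_sample(pk + 1, pn + 1), jeffreys_sample(sk + 1, sn + 1),
                jeffreys_sample(tk, tn))
    else:
        # False-positive branch: decrement prevalence and specificity (Table 3).
        x = ppv(jeffreys_sample(pk, pn + 1), jeffreys_sample(sk, sn),
                jeffreys_sample(tk, tn + 1))
    draws.append(x)
draws.sort()
lo = draws[int(0.025 * len(draws))]
hi = draws[int(0.975 * len(draws))]
print(f"95% interval ~ [{lo:.3f}, {hi:.3f}]")  # near the published [0.062, 0.741]
```

The sampled interval is visibly narrower than the Mossman and Berger one, which is exactly the behaviour examined in the next subsection.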

3.1. The Logical Inconsistency with the

Winkler and Smith Method

In order to show that the Winkler and Smith method is flawed, we will explore the situation in which more than one test is conducted and show that it leads to a reductio ad absurdum as the number of tests approaches infinity. We will then contrast this with the imprecise probability approach which, whilst not providing useful information in that limit, at least makes logical sense.

As Winkler and Smith make use of the test result in their calculations, instead of considering the effect of just one test we can also consider what happens after X positive tests. Using the Winkler and Smith method we would have the two confusion matrices shown in Table 4 and Table 5.

Table 4. New confusion matrix assuming true positives for X positive tests.

                Has Problem   No Problem   Total
Positive Test   α + X         β            T+ + X
Negative Test   γ             δ            T−
Total           TS + X        TW           TN + X

Table 5. New confusion matrix assuming false positives for X positive tests.

                Has Problem   No Problem   Total
Positive Test   α             β + X        T+ + X
Negative Test   γ             δ            T−
Total           TS            TW + X       TN + X

In the assuming true positive case the new prevalence would be given by

p′+ = (TS + X) / (TN + X),  (16)

the new sensitivity by

s′+ = (α + X) / (α + X + γ),  (17)

and the specificity would not change:

t′+ = t.  (18)

Using Table 5, in the assuming false positive case the new prevalence is

p′− = TS / (TN + X),  (19)

the sensitivity would not change,

s′− = s,  (20)

and the new specificity would be

t′− = δ / (β + δ + X).  (21)

We will now consider what happens to p′+, s′+, t′+, p′−, s′− and t′− when the number of tests becomes large (X → ∞). We find

lim(X→∞) p′+ = 1,  (22)

lim(X→∞) s′+ = 1,  (23)

lim(X→∞) t′+ = t,  (24)

lim(X→∞) p′− = 0,  (25)

lim(X→∞) s′− = s,  (26)

and

lim(X→∞) t′− = 0.  (27)

Hence, using Equation 5 we get Pr(D|+) = 1, which implies that Pr(¬D|+) = 0. Naively interpreting this result at face value would let you conclude that any positive test at this limit is a true positive, or alternatively that there is no such thing as a false positive. Winkler and Smith do not say to use this result though; they would next use Equation 14 along with Equation 15. At the limit this gives us

Pr(D|+, p, s, t) = fβ(p | TS + X, TN + X) × fβ(s | α + X, α + γ + X) × fβ(t | δ, β + δ).  (28)

(28)

As

lim(a,b→∞) fβ(x | a, b) = ∞ for x = 1/2, and 0 for x ≠ 1/2,  (29)

and

∫ Pr(x) dx = 1 over (−∞, ∞),  (30)

the cumulative distribution for the PPV becomes

Pr(D|+, p, s, t) = 0 for x < 1/2, and 1 for x ≥ 1/2.  (31)

Figure 1 shows this migration from the first test, X = 1, towards the result in Equation 31, starting from the Section 2 example data set (pk = 5, pn = 100, sk = 9, sn = 10, tk = 9 and tn = 10). These results amount to a logical flaw in the Winkler and Smith method: we have not added any new information (apart from the number of tests) and yet the uncertainty has reduced. It should be noted that this asymptote is due to the choice of prior. Interestingly, and perhaps worryingly, different priors would give different values in the limit X → ∞.
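The pull towards certainty can be seen directly by updating the published ratios in the Winkler and Smith style for growing X. The sketch below is ours; treating pk/pn, sk/sn and tk/tn as the counts to update is an assumption, since the three estimates came from studies of different sizes:

```python
def ppv(p, s, t):
    """Eq. 5."""
    return p * s / (p * s + (1 - p) * (1 - t))

pk, pn, sk, sn, tk, tn = 5, 100, 9, 10, 9, 10

for X in (0, 10, 1000, 10**6):
    # Assume every one of the X positives is a true positive (Eqs. 16-18).
    p_plus = (pk + X) / (pn + X)
    s_plus = (sk + X) / (sn + X)
    t_plus = tk / tn
    print(X, round(ppv(p_plus, s_plus, t_plus), 6))
```

The printed PPV climbs from about 0.32 at X = 0 towards 1 as X grows, without any verified disease outcomes ever being observed.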

Fig. 1. Plot of the CDF of the PPV for the first 1000 tests using the Winkler and Smith method.

4. Imprecise Probability Approach

It is possible to reconsider the argument made by

Winkler and Smith using a framework provided

by the theory of imprecise probabilities (Walley 1991, 1996, Walley et al. 1996). Under this

perspective, the prevalence p, sensitivity s, and

speciﬁcity tcan each be updated as prescribed

by Winkler and Smith to yield a distribution for

the PPV assuming the patient is actually sick (in

which case the test was a true positive) and a

distribution for the PPV assuming the patient is

actually not sick (in which case the test was a false

positive). However, the appropriate synthesis of

these two contingent estimates of the PPV is not a

weighted mixture as Winkler and Smith conceive

it. Instead, because whether the patient is sick or

not is precisely what is unknown in this problem,

an envelope of the two distributions would be

more appropriate.


Returning to the numerical example in Section 2, with priors for p, s and t implied by pk = 5, pn = 100, sk = 9, sn = 10, tk = 9 and tn = 10, the envelope of the two contingent distributions yields a rather wide probability box (Ferson et al. 2003), shown as the outer bounds in black in Figure 2. The leftmost edge corresponds to the distribution that assumes the positive test result was a false positive, decrementing p and t according to Table 3. The rightmost edge corresponds to the distribution that assumes the test result was a true positive, incrementing p and s according to Table 2. This enveloping calculation is equivalent to a mixture with unknown weights characterised by the vacuous interval [0, 1] for both distributions. The traditional Bayesian result and the Winkler and Smith result are both also shown in the figure. We see that the envelope encloses both the traditional and the Winkler and Smith distributions. The 95% confidence interval using imprecise probabilities is [0.057, 0.848], which as expected encompasses the intervals for both the traditional Bayesian result and the Winkler and Smith result. This envelope is reminiscent of the probabilistic dilation of uncertainty that sometimes accompanies the addition of weakly informative data in probabilistic calculations (Seidenfeld & Wasserman 1993). In this case, the unverified test result is certainly information, but it does not seem to be information that leads to a contraction of uncertainty about what the test result itself means.
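The envelope can be sketched by sampling the two contingent distributions and reading the 95% bounds off the outer edges; this is our construction, and the percentile convention below is our choice:

```python
import random

random.seed(1)

def jeffreys_sample(k, n):
    return random.betavariate(k + 0.5, n - k + 0.5)

def ppv(p, s, t):
    return p * s / (p * s + (1 - p) * (1 - t))

pk, pn, sk, sn, tk, tn = 5, 100, 9, 10, 9, 10
n_draws = 100_000

# True-positive branch (Table 2) and false-positive branch (Table 3).
tp = sorted(ppv(jeffreys_sample(pk + 1, pn + 1), jeffreys_sample(sk + 1, sn + 1),
                jeffreys_sample(tk, tn)) for _ in range(n_draws))
fp = sorted(ppv(jeffreys_sample(pk, pn + 1), jeffreys_sample(sk, sn),
                jeffreys_sample(tk, tn + 1)) for _ in range(n_draws))

# The p-box's 95% interval: 2.5th percentile of the left (false-positive) edge
# to the 97.5th percentile of the right (true-positive) edge.
lo = fp[int(0.025 * n_draws)]
hi = tp[int(0.975 * n_draws)]
print(f"envelope 95% interval ~ [{lo:.3f}, {hi:.3f}]")  # near the published [0.057, 0.848]
```

Because the envelope takes the extremes of both branches rather than averaging them, its interval is wider than either Bayesian answer, reflecting the dilation discussed above.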

Fig. 2. P-box showing the distribution envelope for the PPV, together with the Mossman-Berger and Winkler-Smith results.

4.1. Logical Consistency

As we have said, the Winkler and Smith method becomes logically inconsistent when we consider it in the extreme scenario; we will now show that using imprecise probabilities leads to at least a logical result in the limit. The imprecise probability confusion matrix after X positive tests is shown in Table 6.

Table 6. New imprecise probability confusion matrix after X positive tests.

                Has Problem    No Problem    Total
Positive Test   α + X[0, 1]    β + X[0, 1]   T′+
Negative Test   γ              δ             T′−
Total           T′S            T′W           TN + X

The new prevalence would be

p′ = T′S / (TN + X) = (α + γ + X[0, 1]) / (TN + X),  (32)

the new sensitivity

s′ = (α + X[0, 1]) / (α + γ + X[0, 1]),  (33)

and the new specificity

t′ = δ / (β + δ + X[0, 1]).  (34)

Now at the X → ∞ limit:

lim(X→∞) p′ = [0, 1],  (35)

lim(X→∞) s′ = 1,  (36)

lim(X→∞) t′ = 0.  (37)

Using these results along with Eq. 5 gives

Pr(D|+) = [0, 1]  (38)

as the final value for the PPV. Figure 3 shows the migration to the vacuous p-box using the imprecise probability method, starting from the Mossman & Berger (2001) sample data.
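Equations 32-34 can be checked with elementary interval arithmetic. In the sketch below (ours) the unknown true-positive fraction θ ∈ [0, 1] of the X positives is swept jointly through all three formulas, with the false positives tied to X(1 − θ) so that the row sums of Table 6 are preserved; the paper writes each interval independently, but the vacuous limiting conclusion is the same. The confusion matrix counts are an illustrative choice, not the paper's:

```python
def ppv(p, s, t):
    """Eq. 5."""
    return p * s / (p * s + (1 - p) * (1 - t))

# An illustrative confusion matrix (our numbers, not the paper's):
alpha, beta, gamma, delta = 9, 1, 1, 9
TN = alpha + beta + gamma + delta  # 20 trials

def ppv_at(theta, X):
    """PPV after X positive tests of which a fraction theta are assumed true
    positives (theta = 1 recovers Eqs. 16-18, theta = 0 Eqs. 19-21)."""
    p = (alpha + gamma + X * theta) / (TN + X)
    s = (alpha + X * theta) / (alpha + gamma + X * theta)
    t = delta / (beta + delta + X * (1 - theta))
    return ppv(p, s, t)

for X in (1, 100, 10**6):
    vals = [ppv_at(i / 1000, X) for i in range(1001)]
    print(X, round(min(vals), 4), round(max(vals), 4))
```

As X grows, the bounds on the PPV spread towards the vacuous interval [0, 1] of Eq. 38: the more unverified positives are absorbed, the less the framework claims to know.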

Fig. 3. Plot of the p-boxes for the first 1000 tests using the imprecise probability method.

5. Discussion

Let us first consider the difference between the Winkler and Smith method and the imprecise probability method. Figure 1 shows that as the number of tests increases, the uncertainty on the PPV decreases. This amounts to a reductio ad absurdum, thus proving their method untenable. This uncertainty reduction happens even after one test, as demonstrated in the numerical example in Section 3.

In the imprecise version we have likewise added no new information, but the uncertainty has increased, which we argue is reasonable. Although at the infinite limit the vacuous p-box result is not useful, it at least makes logical sense: it is perfectly reasonable to say 'I don't know' when one does not know.

Equation 15 assumes that the prevalence, sensitivity and specificity are independent of each other. For their data this is a fair assumption, as the values all come from different independent studies; however, this may not always be the case. For example, when conducting non-invasive prenatal screening for foetal aneuploidy conditions, such as Down's syndrome, the prevalence of these conditions changes with the age of the mother, and as the conditions are rare, studies of the test statistics are often focused on higher-risk categories (Badeau et al. 2015, Montgomery et al. 2017). Therefore, it is not unimaginable that there is dependence between p, s and t.

6. Conclusion

We have shown that the Winkler and Smith method for dealing with the lack of a gold standard in a classification test is inappropriate: it leads to the illogical result of the test becoming less uncertain after more trials, even though no new information is added. We have also shown that it is possible to reimagine their method using imprecise probabilities in order to produce logically consistent results.

Acknowledgement

This research is partly funded by the UK Engineering & Physical Sciences Research Council (EPSRC) "Digital twins for improved dynamic design", through grant number EP/R006768/1, and the UK Medical Research Council (MRC) "Treatment According to Response in Giant cEll arTeritis (TARGET)", through grant number MR/N011775/1. The funding and support of EPSRC and MRC are gratefully acknowledged.

This paper benefited from discussion with many people, including Masatoshi Sugeno, Jack Siegrist, Michael Balch and Jason O'Rawe.

References

Albert, P. S. (2009), 'Estimating diagnostic accuracy of multiple binary tests with an imperfect reference standard', Statistics in Medicine 28, 780–797.

Badeau, M., Lindsay, C., Blais, J., Takwoingi, Y., Langlois, S., Légaré, F., Giguère, Y., Turgeon, A. F., William, W. & Rousseau, F. (2015), 'Genomics-based non-invasive prenatal testing for detection of fetal chromosomal aneuploidy in pregnant women', Cochrane Database of Systematic Reviews (7).

Baron, J. A. (1994), 'Uncertainty in Bayes', Medical Decision Making 14(1), 46–51.

Bolstad, W. M. (2007), Introduction to Bayesian Statistics, 2nd edn, Wiley.

Cuevas, J. R. T., Bravo Melo, L. & Achcar, J. (2016), 'Estimación del Valor Predictivo Positivo de la Colangiopancreatografía Magnética utilizando métodos de Bayes (Estimation of the Positive Predictive Value of the Magnetic Resonance Cholangiopancreatography using Bayes methods) (in Spanish)', Revista Médica de Risaralda 22(1), 19–26.

Cuevas, T. (2015), 'Inferencia Bayesiana e Investigación en salud: un caso de aplicación en diagnóstico clínico (Bayesian Inference and Health Research: a case of application in clinical diagnosis) (in Spanish)', Revista Médica de Risaralda 21(3), 9–16.

Ferson, S., Kreinovich, V., Ginzburg, L., Myers, D. S. & Sentz, K. (2003), Constructing Probability Boxes and Dempster-Shafer Structures, Technical report, Sandia National Laboratories, Albuquerque, United States.

Finkel, A. M. (2008), Protecting People in Spite of (or Thanks to) the Veil of Ignorance, in R. R. Sharp & G. E. Marchant, eds, 'Genomics and Environmental Regulation: Science, Ethics, and Law', The Johns Hopkins University Press, Baltimore, pp. 290–342.

Fry, H. (2018), Hello World: How to be Human in the Age of the Machine, WW Norton & Company, New York.

Hunder, G. G., Bloch, D. A., Michel, B. A., Stevens, M. B., Arend, W. P., Calabrese, L. H., Edworthy, S. M., Fauci, A. S., Leavitt, R. Y., Lie, J. T., Lightfoot Jr., R. W., Masi, A. T., McShane, D. J., Mills, J. A., Wallace, S. L. & Zvaifler, N. J. (1990), 'The American College of Rheumatology 1990 criteria for the classification of giant cell arteritis', Arthritis & Rheumatism 33(8), 1122–1128.

Jafar, T. H., Chaturvedi, N., Hatcher, J. & Levey, A. S. (2007), 'Use of albumin creatinine ratio and urine albumin concentration as a screening test for albuminuria in an Indo-Asian population', Nephrology Dialysis Transplantation 22(8), 2194–2200.

Joseph, L., Gyorkos, T. W. & Coupal, L. (1995), 'Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard', American Journal of Epidemiology 141(3), 263–272.

Lesaffre, E. & Lawson, A. B. (2012), Bayesian Biostatistics, John Wiley & Sons, Chichester.

Low-Choy, S., Hammond, N., Penrose, L., Anderson, C. & Taylor, S. (2011), Dispersal in a hurry: Bayesian learning from surveillance to establish area freedom from plant pests with early dispersal, in 'MODSIM2011, 19th International Congress on Modelling and Simulation', Perth, Australia, pp. 2521–2527.

Montgomery, J., Caney, S., Clancy, T., Edwards, J., Gallagher, A., Greenfield, A., Haimes, E., Hughes, J., Jackson, R., Lawrence, D., Pattinson, S. D., Shakespear, T., Siddiqui, M., Watson, C., Widdows, H., Wishart, A. & de Zulueta, P. (2017), Non-invasive prenatal testing: ethical issues, Technical report, Nuffield Council on Bioethics.

Mossman, D. & Berger, J. O. (2001), 'Intervals for posttest probabilities: A comparison of 5 methods', Medical Decision Making 21(6), 498–507.

Proeve, M. (2009), 'Issues in the application of Bayes' Theorem to child abuse decision making', Child Maltreatment 14(1), 114–120.

Raab, S. (2010), Kidney and Urinary Tract, in W. Gray & G. Kocjan, eds, 'Diagnostic Cytopathology', 3rd edn, Churchill Livingstone Elsevier, pp. 365–401.

Rushdi, R. A. & Rushdi, A. M. (2018), 'Karnaugh-Map Utility in Medical Studies: The Case of Fetal Malnutrition', International Journal of Mathematical, Engineering and Management Sciences (IJMEMS) 3(3), 220–244.

Rzepiński, T. (2018), 'Twierdzenie Bayesa w projektowaniu strategii diagnostycznych w medycynie (Making Diagnostic Strategies in Medical Practice with the Use of Bayes' Theorem) (in Polish)', Diametros 57, 39–60.

Seidenfeld, T. & Wasserman, L. (1993), 'Dilation for Sets of Probabilities', The Annals of Statistics 21(3), 1139–1154.

Smith, J. E. & Winkler, R. L. (1999), 'Casey's Problem: Interpreting and Evaluating a New Test', Interfaces 29(3), 63–76.

Smith, J. E., Winkler, R. L. & Fryback, D. G. (2000), 'The First Positive: Computing Positive Predictive Value at the Extremes', Annals of Internal Medicine 132(10), 804.

Walley, P. (1991), Statistical Reasoning with Imprecise Probabilities, Chapman and Hall, London.

Walley, P. (1996), 'Inferences from Multinomial Data: Learning about a Bag of Marbles', Journal of the Royal Statistical Society. Series B (Methodological) 58(1), 3–57.

Walley, P., Gurrin, L. & Burton, P. (1996), 'Analysis of Clinical Data Using Imprecise Prior Probabilities', Journal of the Royal Statistical Society. Series D (The Statistician) 45(4), 457–485.

Weber, K. M. (2009), Making a treatment decision for breast cancer: Associations among marital qualities, couple communication, and breast cancer treatment decision outcomes, PhD thesis, The Pennsylvania State University. URL: https://etda.libraries.psu.edu/files/final_submissions/3502

Winkler, R. L. & Smith, J. E. (2004), 'On Uncertainty in Medical Testing', Medical Decision Making 24(6), 654–658.

Zuk, T. (2008), Visualizing Uncertainty, PhD thesis, The University of Calgary. URL: https://prism.ucalgary.ca/bitstream/handle/1880/46780/Zuk_2008.pdf?sequence=1