Handling Uncertain Priors in Basic Bayesian Reasoning
Norman Fenton, 14 July 2018
Abstract
This short paper shows how the simple case of updating a hypothesis H when observing evidence E
can be extended to handle uncertainty about the priors for H and for E given H. The solution involves
a Bayesian Network (BN) in AgenaRisk that extends the simple model with continuous nodes
corresponding to the uncertain prior probability distributions.
1. The Classic Bayesian Example
Figure 1 shows a classic simple example of Bayes' theorem represented as a Bayesian Network (BN).
It is assumed that the population prior for the disease is 1 in 1000, meaning that a randomly
selected person has a probability of 0.001 of having the disease. The test for the disease is assumed
to have a 0% false negative rate and a 5% false positive rate, as represented by the node probability
table (NPT) for E given H.
Figure 1 Classic example of Bayes
If a randomly selected person tests positive then Bayes' theorem tells us that the revised probability
of H (the person has the disease) is 1.963% (this is the classic Harvard Medical School problem, in
which most participants believed the result was 95%) (Fenton & Neil, 2012). This result is, of course,
automatically computed by entering the observation (i.e. evidence) that E is true in the BN, as shown
in Figure 2.
Figure 2 Entering evidence
This is all fine and standard, but it assumes certainty about the prior probabilities.
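As a minimal sketch (not part of the original model), the result in Figure 2 can be checked directly from Bayes' theorem in a few lines of Python. The function name posterior_disease and its parameter names are illustrative assumptions and have nothing to do with the AgenaRisk API.

```python
def posterior_disease(prior, false_pos, false_neg=0.0):
    """Return P(H | E): probability of disease given a positive test.

    prior     -- P(H), the prior probability of disease
    false_pos -- P(E | not H), the false positive rate
    false_neg -- P(not E | H), the false negative rate
    """
    true_pos = 1.0 - false_neg                                # P(E | H)
    evidence = prior * true_pos + (1.0 - prior) * false_pos   # P(E)
    return prior * true_pos / evidence


# Values from Section 1: prior 1 in 1000, 0% false negatives, 5% false positives.
classic = posterior_disease(prior=0.001, false_pos=0.05)
print(f"P(H | E) = {classic:.3%}")  # about 1.963%
```

The denominator is dominated by the false positives among the 999 in 1000 people without the disease, which is why the posterior stays below 2%.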
2. Extending the simple example when there is uncertainty about the priors
Suppose the prior probability for H is not exactly 0.001, but a distribution whose mean is 0.001 with
a relatively high variance. Indeed, if the prior was based on having observed exactly 1000 people of
whom 1 had the disease then, rather than an exact prior probability of 0.001, a Beta(1, 999)
distribution as shown in Figure 3 (representing 1 success and 999 failures in 1000 trials using
Binomial assumptions) would be more appropriate. This distribution has a mean of 0.001, but there
is a chance that the true value is higher than 0.005.
Figure 3 Beta(1, 999) distribution
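To put a rough number on that chance, the following sketch uses the closed form available when the first Beta parameter is 1: for X ~ Beta(1, b), P(X > x) = (1 - x)^b. The helper name beta_1_b_tail is an illustrative assumption; the calculation is a side check and is not taken from the AgenaRisk model.

```python
def beta_1_b_tail(x, b):
    """P(X > x) for X ~ Beta(1, b), whose density is b * (1 - x)**(b - 1)."""
    return (1.0 - x) ** b


a, b = 1, 999                                   # 1 success, 999 failures
mean = a / (a + b)                              # prior mean of the disease rate
sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
tail = beta_1_b_tail(0.005, b)                  # chance the rate exceeds 0.005

print(f"mean = {mean:.4f}, sd = {sd:.4f}")
print(f"P(rate > 0.005) = {tail:.2%}")          # a little under 1%
```

The standard deviation is about as large as the mean itself, which is what "relatively high variance" amounts to here.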
If we are uncertain about the probabilities then the new piece of evidence (a random person
testing positive) may actually require us to revise those priors. Perhaps the mean rate for the
disease is greater than 1 in 1000, or perhaps the mean false positive rate is higher than 5%.
We can, in fact, incorporate uncertainty about the priors to produce a fully Bayesian solution to this
problem (note: it is very easy to do this in AgenaRisk (Agena Ltd, 2018) as it handles the necessary
continuous nodes using its dynamic discretisation algorithm (Neil, Tailor, & Marquez, 2007)). The
solution is shown in Figure 4, with additional continuous nodes representing the three uncertain
prior probability distributions.
Figure 4 Revised BN model that captures uncertainty about priors
The false positive probability here is defined as Beta(5, 95) (meaning we have observed 5 false
positives in 100 tests) and the false negative probability is defined as Beta(1, 999). The dotted
arrows hide two nodes that are simply the integer equivalent of the two Boolean nodes (the Appendix
provides the full details, while the AgenaRisk BN model is provided as supplementary material).
With this model we get a slightly different updated probability (compared to Figure 2) that a person
has the disease when we observe a positive test result, as shown in Figure 5. Instead of 1.963% it is
1.905%. The explanation for this lies primarily in the uncertainty about the false positive probability
(which was much greater than our uncertainty about the prior disease rate). Indeed, as a result of
the observation of a positive test, the revised false positive probability distribution now has a mean
of 5.9% compared to its prior mean of 5%.
Figure 5 Positive test result observed
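The shift in the false positive probability can be checked with a short moment calculation. This is a sketch under two assumptions that are not stated explicitly in the report: the three uncertain parameters (called p, fp and fn here, following the Appendix) are independent a priori, and the test outcome satisfies P(E | p, fp, fn) = p(1 - fn) + (1 - p)fp. It uses exact Beta moments rather than AgenaRisk's dynamic discretisation, and the helper name beta_moments is illustrative.

```python
def beta_moments(a, b):
    """Return (E[X], E[X^2]) for X ~ Beta(a, b)."""
    mean = a / (a + b)
    second = a * (a + 1) / ((a + b) * (a + b + 1))
    return mean, second


p_mean, _ = beta_moments(1, 999)           # prior disease probability p
fp_mean, fp_second = beta_moments(5, 95)   # false positive probability fp
fn_mean, _ = beta_moments(1, 999)          # false negative probability fn

# P(E): expectation of p*(1 - fn) + (1 - p)*fp under independent priors.
p_evidence = p_mean * (1 - fn_mean) + (1 - p_mean) * fp_mean

# E[fp * P(E | p, fp, fn)]: the second term needs E[fp^2], not E[fp]^2.
fp_times_evidence = fp_mean * p_mean * (1 - fn_mean) + (1 - p_mean) * fp_second

# Posterior mean of fp after observing one positive test.
fp_posterior_mean = fp_times_evidence / p_evidence
print(f"prior mean of fp     = {fp_mean:.3f}")
print(f"posterior mean of fp = {fp_posterior_mean:.3f}")  # about 0.059
```

The increase comes entirely from the variance of the Beta(5, 95) prior: if fp were known exactly, E[fp^2] would equal E[fp]^2 and the posterior mean would stay at 5%.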
We can, of course, still enter exact observations in the full model to get exactly the same result as
Figure 2. This is shown in Figure 6.
Figure 6 Exactly replicating the results of the simple model (we have removed all uncertainty about the priors by
entering exact values as observations)
3. Practical implications
The proposed solution works well because the nodes H and E required just one and two uncertain
parameters (respectively) to define their probability tables. However, suppose a Boolean node A
has two Boolean parents B, C and hence four uncertain parameters defining its probability table,
namely P(A | B, C), P(A | not B, C), P(A | B, not C), P(A | not B, not C). This requires us to define 4
additional continuous node parents to node A. While conceptually simple, this requires significant
computation power. In AgenaRisk (which cannot use its underlying binary factorisation (Neil, Chen,
& Fenton, 2012) in such cases because there are Boolean parents as well as continuous parents) the
computation time is quite long for such cases and it becomes infeasible with more parameters.
It could be argued that the difference in results for the simple example is insufficient to merit the
additional modelling effort. As a rule of thumb this is true if there is reasonable certainty about the
priors; for example, if they are derived from large medical trials and we are making just a single new
observation then it will have minimal impact. However, suppose in our example that the prior mean
probability for the disease is 20% but that it could equally likely be any value between 0 and 40%, so
it is a Uniform[0, 0.4] distribution. Suppose also that the false positive probability is 30%. In this
case the simple model (assuming a constant prior probability of 20% for disease) yields a posterior
probability of 45% for a person testing positive, which would be the same for each of three
independent people testing positive. However, running the model with the uncertain Uniform[0, 0.4]
prior and three independent positive test results (this model is included with the supplementary
material) results in a posterior probability of 51% for each person testing positive. This significant
change is explained by the fact that 3 out of 3 people testing positive provides evidence that the
disease is more prevalent than previously assumed; the posterior distribution for the disease
probability is no longer Uniform[0, 0.4] but a distribution over the range [0, 0.4] that is skewed
towards higher values, with mean 0.26.
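These figures can be approximated without AgenaRisk by integrating over the uncertain disease probability on a fine grid. The sketch below assumes a 0% false negative rate (as in Section 1) and that the three people are independent given the disease probability; the grid size and all names are illustrative choices, and this is plain midpoint integration, not the dynamic discretisation used in the supplied model.

```python
def positive_test_likelihood(p, false_pos=0.3, false_neg=0.0):
    """P(E | p): chance that one random person tests positive."""
    return p * (1 - false_neg) + (1 - p) * false_pos


# Simple model: fixed prior of 20%, no false negatives.
fixed_posterior = 0.2 / positive_test_likelihood(0.2)

# Uncertain prior: p ~ Uniform[0, 0.4], evaluated at grid midpoints.
n_grid = 100_000
width = 0.4 / n_grid
grid = [(i + 0.5) * width for i in range(n_grid)]

# Unnormalised posterior weight of each p after three positive tests
# (the uniform prior density is constant, so it cancels on normalising).
n_positive = 3
weights = [positive_test_likelihood(p) ** n_positive for p in grid]
norm = sum(weights)

# Posterior mean of the disease probability.
prevalence_mean = sum(p * w for p, w in zip(grid, weights)) / norm

# P(H_i | three positives): average of P(H_i | p, E_i) = p / P(E | p)
# over the posterior for p.
person_posterior = sum(
    p / positive_test_likelihood(p) * w for p, w in zip(grid, weights)
) / norm

print(f"fixed prior:     P(H | E)       = {fixed_posterior:.2f}")   # about 0.45
print(f"uncertain prior: mean of p      = {prevalence_mean:.2f}")   # about 0.26
print(f"uncertain prior: P(H_i | 3 pos) = {person_posterior:.2f}")  # about 0.51
```

Setting n_positive to 1 returns a per-person posterior of about 45%, in line with the rule of thumb that a single new observation has little impact on the result.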
The problem of uncertain priors is especially relevant in many legal applications where Bayes is used;
this will be investigated in an accompanying paper, which will also highlight the problem of
modelling as evidence facts which are considered as part of the case.
4. References
Agena Ltd. (2018). AgenaRisk. Retrieved from http://www.agenarisk.com
Fenton, N. E., & Neil, M. (2012). Risk Assessment and Decision Analysis with Bayesian Networks. Boca
Raton: CRC Press. Retrieved from www.bayesianrisk.com
Neil, M., Chen, X., & Fenton, N. E. (2012). Optimizing the Calculation of Conditional Probability
Tables in Hybrid Bayesian Networks using Binary Factorization. IEEE Transactions on
Knowledge and Data Engineering, 24(7), 1306-1312.
http://doi.org/10.1109/TKDE.2011.87
Neil, M., Tailor, M., & Marquez, D. (2007). Inference in hybrid Bayesian networks using dynamic
discretization. Statistics and Computing, 17(3), 219-233. Retrieved from
http://dx.doi.org/10.1007/s11222-007-9018-y
Appendix
The full AgenaRisk BN model is defined in Figure 7. The nodes p, fp and fn are simulation nodes and
can be defined as any distribution. Here we have defined p as Beta(1,999), fp as Beta(5,95) and fn as
Beta(1,999).
Figure 7 Full model fully defined