Score-based algorithms that learn the structure of Bayesian networks can be used for both exact and approximate solutions. While approximate learning scales better with the number of variables, it can be computationally expensive in the presence of high dimensional data. This paper describes an approximate algorithm that performs parallel sampling on Candidate Parent Sets (CPSs), and can be viewed as an extension of MINOBS, which is a state-of-the-art algorithm for structure learning from high dimensional data. The modified algorithm, which we call Parallel Sampling MINOBS (PS-MINOBS), constructs the graph by sampling CPSs for each variable. Sampling is performed in parallel under the assumption that, for each variable, the distribution of CPSs is half-normal when ordered by Bayesian score. Sampling from a half-normal distribution ensures that the CPSs sampled are likely to be those that produce the higher scores. Empirical results show that, in most cases, the proposed algorithm discovers higher-scoring structures than MINOBS when both algorithms are restricted to the same runtime limit.
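The half-normal sampling idea can be illustrated with a minimal sketch. It is not the PS-MINOBS implementation: the function name `sample_cps_index`, the `sigma` parameter, the rejection step for out-of-range ranks, and the toy list of parent sets are all assumptions made for illustration, and the parallel execution described in the abstract is omitted.

```python
import random


def sample_cps_index(n_candidates, sigma, rng):
    """Draw a rank in [0, n_candidates) from a half-normal distribution.

    Rank 0 is assumed to be the highest-scoring candidate parent set,
    so low ranks (high scores) are drawn most often. Ranks that fall
    outside the list are rejected and redrawn (an assumption of this sketch).
    """
    while True:
        rank = int(abs(rng.gauss(0.0, sigma)))
        if rank < n_candidates:
            return rank


# Hypothetical CPSs for one variable, already sorted by descending score.
cps_sorted = [("A", "B"), ("A",), ("B",), (), ("A", "C"), ("C",)]

rng = random.Random(0)
draws = [sample_cps_index(len(cps_sorted), sigma=2.0, rng=rng) for _ in range(2000)]

# How often each rank was selected; the top-ranked CPS should dominate.
counts = [draws.count(i) for i in range(len(cps_sorted))]
```

In this sketch, a smaller `sigma` concentrates the draws on the best-scoring sets, while a larger one spreads them towards lower-ranked sets.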
Discovering and parameterising latent confounders represent important and challenging problems in causal structure learning and density estimation respectively. In this paper, we focus on both discovering and learning the distribution of latent confounders. This task requires solutions that come from different areas of statistics and machine learning. We combine elements of variational Bayesian methods, expectation-maximisation, hill-climbing search, and structure learning under the assumption of causal insufficiency. We propose two learning strategies: one that maximises model selection accuracy, and another that improves computational efficiency in exchange for minor reductions in accuracy. The former strategy is suitable for small networks and the latter for moderate-size networks. Both learning strategies perform well relative to existing solutions.
Causal Bayesian Networks provide an important tool for reasoning under uncertainty with potential application to many complex causal systems. Structure learning algorithms that can tell us something about the causal structure of these systems are becoming increasingly important. In the literature, the validity of these algorithms is often tested for sensitivity over varying sample sizes, hyper-parameters, and occasionally objective functions. In this paper, we show that the order in which the variables are read from data can have much greater impact on the accuracy of the algorithm than these factors. Because the variable ordering is arbitrary, any significant effect it has on learnt graph accuracy is concerning, and this raises questions about the validity of the results produced by algorithms that are sensitive to, but have not been assessed against, different variable orderings.
Learning the structure of a Bayesian Network (BN) with score-based solutions involves exploring the search space of possible graphs and moving towards the graph that maximises a given objective function. Some algorithms offer exact solutions that are guaranteed to return the graph with the highest objective score, while others offer approximate solutions in exchange for reduced computational complexity. This paper describes an approximate BN structure learning algorithm, which we call Model Averaging Hill-Climbing (MAHC), that combines two novel strategies with hill-climbing search. The algorithm starts by pruning the search space of graphs, where the pruning strategy can be viewed as an aggressive version of the pruning strategies that are typically applied to combinatorial optimisation structure learning problems. It then performs model averaging in the hill-climbing search process and moves to the neighbouring graph that maximises the objective function, on average, for that neighbouring graph and over all its valid neighbouring graphs. Comparisons with other algorithms spanning different classes of learning suggest that the combination of aggressive pruning with model averaging is both effective and efficient, particularly in the presence of data noise.
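The model-averaging step can be sketched as follows. This is an illustrative toy, not the MAHC implementation: the three-node graph, the `TOY_WEIGHTS` table standing in for a real objective such as a Bayesian score, and all function names are assumptions, and the pruning phase is left out.

```python
from itertools import permutations

NODES = ("A", "B", "C")

# Hypothetical per-edge contributions standing in for a real objective function.
TOY_WEIGHTS = {("A", "B"): 2.0, ("B", "C"): 1.5, ("A", "C"): -0.5}


def toy_score(graph):
    """Toy objective: sum of edge weights, with unlisted edges penalised."""
    return sum(TOY_WEIGHTS.get(edge, -1.0) for edge in graph)


def is_acyclic(edges):
    """Return True if the directed graph has no cycle (repeatedly peel off roots)."""
    remaining, nodes = set(edges), set(NODES)
    while nodes:
        roots = {n for n in nodes if not any(v == n for _, v in remaining)}
        if not roots:
            return False
        nodes -= roots
        remaining = {(u, v) for u, v in remaining if u not in roots}
    return True


def neighbours(graph):
    """All acyclic graphs one edge addition, removal, or reversal away."""
    result = set()
    for u, v in permutations(NODES, 2):
        if (u, v) in graph:
            result.add(graph - {(u, v)})               # removal
            result.add((graph - {(u, v)}) | {(v, u)})  # reversal
        elif (v, u) not in graph:
            result.add(graph | {(u, v)})               # addition
    return {g for g in result if is_acyclic(g)}


def averaged_score(graph):
    """Average the objective over the graph and all its valid neighbours."""
    pool = [graph, *neighbours(graph)]
    return sum(toy_score(g) for g in pool) / len(pool)


def averaged_hill_climb(start):
    """Move to the neighbour with the best averaged score until none improves."""
    current = start
    while True:
        best = max(neighbours(current), key=averaged_score)
        if averaged_score(best) <= averaged_score(current):
            return current
        current = best


result = averaged_hill_climb(frozenset())
```

Compared with plain hill-climbing, which ranks neighbours by `toy_score` alone, ranking by `averaged_score` favours graphs whose surrounding region of the search space also scores well.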
In Bayesian Networks (BNs), the direction of edges is crucial for causal reasoning and inference. However, Markov equivalence class considerations mean it is not always possible to establish edge orientations, which is why many BN structure learning algorithms cannot orientate all edges from purely observational data. Moreover, latent confounders can lead to false positive edges. Relatively few methods have been proposed to address these issues. In this work, we present the hybrid mFGS-BS (majority rule and Fast Greedy equivalence Search with Bayesian Scoring) algorithm for structure learning from discrete data that involves an observational data set and one or more interventional data sets. The algorithm assumes causal insufficiency in the presence of latent variables and produces a Partial Ancestral Graph (PAG). Structure learning relies on a hybrid approach and a novel Bayesian scoring paradigm that calculates the posterior probability of each directed edge being added to the learnt graph. Experimental results based on well-known networks of up to 109 variables and 10k sample size show that mFGS-BS improves structure learning accuracy relative to the state-of-the-art and it is computationally efficient.
Bayesian Networks (BNs) have become increasingly popular over the last few decades as a tool for reasoning under uncertainty in fields as diverse as medicine, biology, epidemiology, economics and the social sciences. This is especially true in real-world areas where we seek to answer complex questions based on hypothetical evidence to determine actions for intervention. However, determining the graphical structure of a BN remains a major challenge, especially when modelling a problem under causal assumptions. Solutions to this problem include the automated discovery of BN graphs from data, constructing them based on expert knowledge, or a combination of the two. This paper provides a comprehensive review of combinatoric algorithms proposed for learning BN structure from data, describing 61 algorithms including prototypical, well-established and state-of-the-art approaches. The basic approach of each algorithm is described in consistent terms, and the similarities and differences between them highlighted. Methods of evaluating algorithms and their comparative performance are discussed including the consistency of claims made in the literature. Approaches for dealing with data noise in real-world datasets and incorporating expert knowledge into the learning process are also covered.
Learning from data that contain missing values is a common problem in many domains. Relatively few Bayesian Network structure learning algorithms account for missing data, and those that do tend to rely on standard approaches that assume missing data are missing at random, such as the Expectation-Maximisation algorithm. Because missing data are often systematic, there is a need for more pragmatic methods that can effectively deal with data sets containing missing values not missing at random. The absence of approaches that deal with systematic missing data impedes the application of BN structure learning methods to real-world problems where missingness is not random. This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting to maximally leverage the observed data and to limit potential bias caused by missing values. The first two variants can be viewed as sub-versions of the third and best performing variant, but are important in their own right in illustrating the successive improvements in learning accuracy. The empirical investigations show that the proposed approach outperforms the commonly used and state-of-the-art Structural EM algorithm in terms of both learning accuracy and efficiency, whether data are missing at random or not at random.
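The two ingredients named above, pairwise deletion and inverse probability weighting, can be shown in isolation with a small sketch. It does not reproduce the paper's greedy search variants: the data set, the function names, and the choice of a single fully observed covariate `X` for estimating the observation probability are all assumptions.

```python
from collections import defaultdict


def pairwise_rows(rows, variables):
    """Pairwise deletion: keep rows in which all requested variables are observed."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def ipw_weights(rows, target, covariate):
    """Pair each row that has `target` observed with the weight 1 / P(observed | covariate)."""
    total, observed = defaultdict(int), defaultdict(int)
    for r in rows:
        total[r[covariate]] += 1
        if r[target] is not None:
            observed[r[covariate]] += 1
    return [
        (r, total[r[covariate]] / observed[r[covariate]])
        for r in rows
        if r[target] is not None
    ]


# Hypothetical data: Y is always 1 when X == 1, but half of those values are missing,
# so missingness depends on X. The full-data mean of Y would be 6 / 8 = 0.75.
rows = (
    [{"X": 0, "Y": 1}, {"X": 0, "Y": 1}, {"X": 0, "Y": 0}, {"X": 0, "Y": 0}]
    + [{"X": 1, "Y": 1}, {"X": 1, "Y": 1}, {"X": 1, "Y": None}, {"X": 1, "Y": None}]
)

# Only the rows needed for a statistic over (X, Y) are dropped, not for X alone.
xy_rows = pairwise_rows(rows, ["X", "Y"])
x_rows = pairwise_rows(rows, ["X"])

# Unweighted mean over observed rows is biased towards the X == 0 group.
naive_mean = sum(r["Y"] for r in xy_rows) / len(xy_rows)

# Weighted mean restores the contribution of the under-observed X == 1 group.
weighted = ipw_weights(rows, target="Y", covariate="X")
ipw_mean = sum(r["Y"] * w for r, w in weighted) / sum(w for _, w in weighted)
```

In this toy case the unweighted estimate is 4/6, while the weighted estimate recovers 0.75, the value that would be obtained if no data were missing.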
This paper builds on recent developments in Bayesian network (BN) structure learning under the controversial assumption that the input variables are dependent. This assumption can be viewed as a learning constraint geared towards cases where the input variables are known or assumed to be dependent. It addresses the problem of learning multiple disjoint subgraphs that do not enable full propagation of evidence. This problem is highly prevalent in cases where the sample size of the input data is low with respect to the dimensionality of the model, which is often the case when working with real data. The paper presents a novel hybrid structure learning algorithm, called SaiyanH, that addresses this issue. The results show that this constraint helps the algorithm to estimate the number of true edges with higher accuracy compared to the state-of-the-art. Out of the 13 algorithms investigated, the results rank SaiyanH 4th in reconstructing the true DAG, with accuracy scores lower by 8.1% (F1), 10.2% (BSF), and 19.5% (SHD) compared to the top ranked algorithm, and higher by 75.5% (F1), 118% (BSF), and 4.3% (SHD) compared to the bottom ranked algorithm. Overall, the results suggest that the proposed algorithm discovers satisfactorily accurate connected DAGs in cases where other algorithms produce multiple disjoint subgraphs that often underfit the true graph.
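For readers unfamiliar with the metrics quoted above, the following sketch computes F1 and the Structural Hamming Distance (SHD) between a true and a learnt DAG. It uses simple textbook definitions in which a reversed edge counts as one SHD error and earns no F1 credit; the paper's exact scoring conventions may differ, BSF is not covered, and the example graphs are hypothetical.

```python
def compare_dags(true_edges, learnt_edges):
    """Return (F1, SHD) for two DAGs given as collections of (parent, child) edges."""
    true_edges, learnt_edges = set(true_edges), set(learnt_edges)

    # F1 over directed edges: an edge is a true positive only if its direction matches.
    tp = len(true_edges & learnt_edges)
    precision = tp / len(learnt_edges) if learnt_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # SHD: count node pairs whose edge status differs (missing, extra, or reversed).
    pairs = {frozenset(e) for e in true_edges | learnt_edges}
    shd = 0
    for pair in pairs:
        u, v = tuple(pair)
        true_state = ((u, v) in true_edges, (v, u) in true_edges)
        learnt_state = ((u, v) in learnt_edges, (v, u) in learnt_edges)
        if true_state != learnt_state:
            shd += 1
    return f1, shd


# Hypothetical graphs: one correct edge, one reversed, one missing, one extra.
true_dag = [("A", "B"), ("B", "C"), ("C", "D")]
learnt_dag = [("A", "B"), ("C", "B"), ("A", "D")]

f1, shd = compare_dags(true_dag, learnt_dag)
```

Note that F1 is a score where higher is better, whereas SHD is an error count where lower is better.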
Several algorithms have been proposed towards discovering the graphical structure of Bayesian networks. Most of these algorithms are restricted to observational data, and some enable us to incorporate knowledge as constraints in terms of what can and cannot be discovered by an algorithm. A common type of such knowledge involves the temporal order of the variables in the data; for example, knowledge that event B occurs after observing A implies the constraint that B cannot cause A. This paper investigates real-world case studies that incorporate interesting properties of objective temporal variable order, and the impact these temporal constraints have on the learnt graph. The results show that most of the learnt graphs are subject to major modifications after incorporating incomplete temporal objective information. Because temporal information is widely viewed as a form of knowledge that is subjective, rather than as a form of data that tends to be objective, it is generally disregarded and reduced to an optional piece of information that only a few of the structure learning algorithms may consider. The paper argues that objective temporal information should form part of observational data, to reduce the risk of disregarding such information when available and to encourage its reusability across related studies.
Score-based algorithms that learn Bayesian Network (BN) structures provide solutions ranging from different levels of approximate learning to exact learning. Approximate solutions exist because exact learning is generally not applicable to networks of moderate or higher complexity. In general, approximate solutions tend to sacrifice accuracy for speed, where the aim is to minimise the loss in accuracy and maximise the gain in speed. While some approximate algorithms are optimised to handle thousands of variables, these algorithms may still be unable to learn such high dimensional structures. Some of the most efficient score-based algorithms cast the structure learning problem as a combinatorial optimisation of candidate parent sets. This paper explores a strategy for pruning the size of candidate parent sets, which could form part of existing score-based algorithms as an additional pruning phase aimed at high dimensionality problems. The results illustrate how different levels of pruning affect the learning speed relative to the loss in accuracy in terms of model fitting, and show that aggressive pruning may be required to produce approximate solutions for high complexity problems.
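One possible form of such an additional pruning phase is sketched below. It is not the strategy evaluated in the paper: the function name, the `keep_fraction` parameter, the rule of always retaining the empty parent set, and the example scores are assumptions. The first step is the standard rule for decomposable scores (a parent set is discarded when one of its subsets scores at least as well); the second step is the more aggressive, approximate cut.

```python
def prune_candidate_parent_sets(scored_sets, keep_fraction):
    """Prune the candidate parent sets of one variable.

    scored_sets maps frozenset(parents) to a local score (higher is better).
    """
    # Step 1: drop any parent set dominated by one of its proper subsets.
    survivors = {
        parents: score
        for parents, score in scored_sets.items()
        if not any(
            other < parents and other_score >= score
            for other, other_score in scored_sets.items()
        )
    }

    # Step 2 (assumed aggressive phase): keep only the top-scoring fraction.
    ranked = sorted(survivors.items(), key=lambda item: item[1], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    pruned = dict(ranked[:n_keep])

    # Retain the empty parent set so that an acyclic graph always remains feasible.
    if frozenset() in scored_sets:
        pruned[frozenset()] = scored_sets[frozenset()]
    return pruned


# Hypothetical local scores for the candidate parent sets of a variable C.
scores_for_c = {
    frozenset(): -100.0,
    frozenset({"A"}): -90.0,
    frozenset({"B"}): -95.0,
    frozenset({"D"}): -97.0,
    frozenset({"A", "B"}): -92.0,  # dominated by {A}, removed in step 1
    frozenset({"A", "D"}): -85.0,
}

pruned = prune_candidate_parent_sets(scores_for_c, keep_fraction=0.5)
```

Lowering `keep_fraction` shrinks the combinatorial search space further, at the cost of possibly discarding the parent sets that appear in the highest-scoring graph.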
Despite the massive popularity of the Asian Handicap (AH) football betting market, it has not been adequately studied in the relevant literature. This paper combines rating systems with hybrid Bayesian networks and presents the first published model specifically developed for prediction and assessment of the AH betting market. The results are based on 13 English Premier League seasons and are compared to the traditional 1X2 market. Different betting situations have been examined, including a) both average and maximum (best available) market odds, b) all possible betting decision thresholds between predicted and published odds, c) optimisations for both return-on-investment and profit, and d) simple stake adjustments to investigate how the variance of returns changes when targeting equivalent profit in both 1X2 and AH markets. While the AH market is found to share the inefficiencies of the traditional 1X2 market, the findings reveal both interesting differences and similarities between the two.
Judea Pearl, a Turing Award winner, is a true giant of the field of computer science and artificial intelligence. To say that his new book with Dana Mackenzie is timely is, in our view, an understatement. That it comes from somebody of his stature and is written for a general audience (unlike his previous books) means that the concerns we have held about both the limitations of solely data-driven approaches to artificial intelligence (AI) and the need for a causal approach will finally reach a very broad audience.
Scientific research is heavily driven by interest in discovering, assessing, and modelling cause-and-effect relationships as guides for action. Much of the research in discovering relationships between information is based on methods which focus on maximising the predictive accuracy of a target factor of interest from a set of other related factors. However, the best predictors of the target factor are often not its causes; hence the motto "association does not imply causation". Although the distinction between association and causation is nowadays better understood, what has changed over the past few decades is mostly the way in which the results are stated rather than the way they are generated. Bayesian Networks (BNs) offer a framework for modelling relationships between information under causal or influential assumptions, which makes them suitable for modelling real-world situations where we seek to simulate the impact of various interventions. BNs are also widely recognised as the most appropriate method to model uncertainty in situations where data are limited but where human domain experts have a good understanding of the underlying causal mechanisms and/or real-world facts. Despite these benefits, a BN model alone is incapable of determining the optimal decision path for a given problem. To achieve this, a BN needs to be extended to a Bayesian Decision Network (BDN), also known as an Influence Diagram (ID). In brief, BDNs are BNs augmented with additional functionality and knowledge-based assumptions to support the representation of decisions and associated utilities that a decision maker would like to minimise or maximise. As a result, BDNs are suitable for modelling real-world situations where we seek to discover the optimal decision path to maximise utilities of interest and minimise undesirable risk.
Because BNs come from the statistical and computing sciences, whereas BDNs come mainly from decision theory introduced in economics, research in one of these fields only occasionally extends to the other. As a result, it is fair to say that the landscape of these approaches has matured rather incoherently between these two fields of research. It is possible to develop a new generation of algorithms and methods to improve the way we 'construct' BDNs. The overall goal of the project is to develop open-source software that will enable end-users, who may be domain experts and not statisticians, mathematicians, or computer scientists, to quickly and efficiently generate BDNs for optimal real-world decision-making. The proposed system will allow users to incorporate their prior knowledge for information fusion with data, along with relevant decision support requirements for intervention and risk management, but will avoid the levels of manual construction currently required when building BDNs. The system will be evaluated with diverse real-world decision problems including, but not limited to, sports, medicine, forensics, the UK housing market, and the UK financial market.