
Abstract and Figures

This paper describes a class of probabilistic approximation algorithms based on bucket elimination which offer adjustable levels of accuracy and efficiency. We analyze the approximation for several tasks: finding the most probable explanation, belief updating, and finding the maximum a posteriori hypothesis. We identify regions of completeness and provide preliminary empirical evaluation on randomly generated networks.

1 Overview

Bucket elimination is a unifying algorithmic framework that generalizes dynamic programming to enable many complex problem-solving and reasoning activities. Among the algorithms that can be accommodated within this framework are directional resolution for propositional satisfiability, adaptive consistency for constraint satisfaction, Fourier and Gaussian elimination for linear equalities and inequalities, and dynamic programming for combinatorial optimization [7]. Many algorithms for probabilistic inference, such as belief updating, finding the most proba...
[Figure: Accuracy vs. efficiency — M/L and U/M vs. TR, i = 12]
... Recently, a class of parameterized approximation algorithms based on the bucket elimination framework was proposed and analyzed [3]. The approximation scheme uses as a controlling parameter a bound on the size of probabilistic functions created during variable elimination, allowing a trade-off between accuracy and efficiency [7,5]. The algorithms were presented and analyzed for several tasks, such as: finding the most probable explanation (mpe), finding the maximum a posteriori hypothesis (map), and belief updating. ...
... Else, generate the functions h^p: h^p = max_{X_p} ... Since processing a bucket may create functions having a much larger arity than the input functions, we proposed in [7] to approximate these functions by a collection of smaller-arity functions. Let h_1, ..., h_j be the functions in the bucket of X_p, and let S_1, ..., S_j be the variable subsets on which those functions are defined. ...
... A function f subsumes the function g if each argument of g is also an argument of f. It was shown [7] that algorithm approx-mpe(i,m) computes an upper bound to the mpe in time O(m · exp(2i)) and space O(m · exp(i)), where i ≤ n and m ≤ 2^i. Algorithm approx-mpe(i,m) Input: A belief network BN = {P_1, ..., P_n} and an ordering of the variables, d; Output: An upper bound on the most probable assignment, given evidence e. 1. Initialize: Partition into bucket_1, . . ...
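The mini-bucket step just quoted can be illustrated in a few lines of Python. This is a sketch, not the authors' implementation: the function names, the table-based representation of probability functions, and the restriction to binary variables are all assumptions made for the illustration. Each mini-bucket is maximized over the eliminated variable separately, so the product of the resulting functions upper-bounds the true MPE value.

```python
from itertools import product

def multiply(fs):
    """Product of a list of (scope, table) functions over binary variables."""
    scope = tuple(sorted({v for s, _ in fs for v in s}))
    table = {}
    for vals in product([0, 1], repeat=len(scope)):
        assign = dict(zip(scope, vals))
        p = 1.0
        for s, t in fs:
            p *= t[tuple(assign[v] for v in s)]
        table[vals] = p
    return (scope, table)

def max_out(f, var):
    """Maximize a (scope, table) function over one variable."""
    scope, table = f
    idx = scope.index(var)
    new_scope = scope[:idx] + scope[idx + 1:]
    new_table = {}
    for vals, p in table.items():
        key = vals[:idx] + vals[idx + 1:]
        new_table[key] = max(new_table.get(key, 0.0), p)
    return (new_scope, new_table)

def mini_bucket_mpe_upper(functions, order, ibound):
    """Upper-bound the MPE value by mini-bucket elimination: each variable's
    bucket is greedily split into mini-buckets whose joint scope has at most
    `ibound` variables, and each mini-bucket is maximized over the variable
    separately."""
    funcs = list(functions)
    for var in order:
        bucket = [f for f in funcs if var in f[0]]
        funcs = [f for f in funcs if var not in f[0]]
        minibuckets = []
        for f in bucket:
            for mb in minibuckets:
                joint = set(f[0]).union(*(g[0] for g in mb))
                if len(joint) <= ibound:
                    mb.append(f)
                    break
            else:
                minibuckets.append([f])
        for mb in minibuckets:
            funcs.append(max_out(multiply(mb), var))
    bound = 1.0
    for _, table in funcs:
        bound *= table[()]   # all variables eliminated: only constants remain
    return bound

# A 3-variable chain P(A) P(B|A) P(C|B); with ibound = 2 the bound is exact here.
pA  = (('A',), {(0,): 0.6, (1,): 0.4})
pBA = (('A', 'B'), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
pCB = (('B', 'C'), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.5})
upper = mini_bucket_mpe_upper([pA, pBA, pCB], ['C', 'B', 'A'], ibound=2)
```

With a large enough i-bound the partition is always a single mini-bucket and the scheme degenerates to exact bucket elimination, which is the accuracy/efficiency knob the snippet refers to.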
It was recently shown that the problem of decoding messages transmitted through a noisy channel can be formulated as a belief updating task over a probabilistic network [McEliece]. Moreover, it was observed that iterative application of Pearl's (linear-time) belief propagation algorithm, designed for polytrees, outperformed state-of-the-art decoding algorithms, even though the corresponding networks may have many cycles. This paper demonstrates empirically that an approximation algorithm, approx-mpe, for solving the most probable explanation (MPE) problem, developed within the recently proposed mini-bucket elimination framework [Dechter96], outperforms iterative belief propagation on classes of coding networks that have bounded induced width. Our experiments suggest that approximate MPE decoders can be good competitors to approximate belief-updating decoders.
... The scope-based partition heuristic (SCP) proposed in (Dechter and Rish 1997) and used since aims at minimizing the number of mini-buckets in the partition by including in each mini-bucket as many functions as possible, as long as the z-bound is satisfied. First, the single-function mini-buckets are ordered by decreasing arity. ...
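The scope-based heuristic just described can be sketched as a greedy first-fit partition. The names and the (name, scope) representation below are invented for the illustration, not taken from the cited papers:

```python
def scope_based_partition(functions, zbound):
    """Sketch of an SCP-style partition: seed mini-buckets with the
    largest-arity functions first, then absorb each remaining function into
    the first mini-bucket whose joint scope stays within the z-bound.
    `functions` is a list of (name, scope) pairs, scope a set of variables."""
    ordered = sorted(functions, key=lambda f: len(f[1]), reverse=True)
    minibuckets = []  # each entry: [member names, joint scope]
    for name, scope in ordered:
        for mb in minibuckets:
            if len(mb[1] | scope) <= zbound:   # z-bound still satisfied
                mb[0].append(name)
                mb[1] |= scope
                break
        else:                                  # no fit: open a new mini-bucket
            minibuckets.append([[name], set(scope)])
    return minibuckets

parts = scope_based_partition(
    [('f1', {'A', 'B', 'C'}), ('f2', {'A', 'B'}),
     ('f3', {'C', 'D'}), ('f4', {'D'})], zbound=3)
```

Seeding with the largest scopes first lets functions with subsumed scopes (such as f2 above) be absorbed into an existing mini-bucket at no extra cost, which is what keeps the number of mini-buckets small.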
Mini-Bucket Elimination (MBE) is a well-known approximation algorithm deriving lower and upper bounds on quantities of interest over graphical models. It relies on a procedure that partitions a set of functions, called bucket, into smaller subsets, called mini-buckets. The method has been used with a single partitioning heuristic throughout, so the impact of the partitioning algorithm on the quality of the generated bound has never been investigated. This paper addresses this issue by presenting a framework within which partitioning strategies can be described, analyzed and compared. We derive a new class of partitioning heuristics from first-principles geared for likelihood queries, demonstrate their impact on a number of benchmarks for probabilistic reasoning and show that the results are competitive (often superior) to state-of-the-art bounding schemes.
... Of course, introducing identical copies X_i^1, ..., X_i^k of X_i will introduce conflicts of the X_i values and errors in the marginalization. Experimental analyses appear in Chapter 8. Min-Buckets [19] is the first algorithm proposed for applying this idea to solve probability inference problems approximately. In this section, we generalize the idea of Min-Buckets as the approximate VE algorithm in Algorithm 5.2. ...
... Reconstruction algorithms based on variable elimination [23,24] reconstruct the potentials. Since the size of a potential is exponential in the width of the elimination order, certain approximations, such as the mini-bucket algorithm [22], must be used to constrain the size of the potentials. A recent advanced example of this approach is DIS [25,26]. ...
Approximate Bayesian inference by importance sampling derives probabilistic statements from a Bayesian network, an essential part of evidential reasoning with the network and an important aspect of many Bayesian methods. A critical problem in importance sampling on Bayesian networks is the selection of a good importance function to sample a network's prior and posterior probability distribution. The initially optimal importance functions eventually start deviating from the optimal function when sampling a network's posterior distribution given evidence, even when adaptive methods are used that adjust an importance function to the evidence by learning. In this article we propose a new family of Refractor Importance Sampling (RIS) algorithms for adaptive importance sampling under evidential reasoning. RIS applies "arc refractors" to a Bayesian network by adding new arcs and refining the conditional probability tables. The goal of RIS is to optimize the importance function for the posterior distribution and reduce the error variance of sampling. Our experimental results show a significant improvement of RIS over state-of-the-art adaptive importance sampling algorithms.
Multi-dimensional classification is a cutting-edge problem, in which the values of multiple class variables have to be simultaneously assigned to a given example. It is an extension of the well known multi-label subproblem, in which the class variables are all binary. In this article, we review and expand the set of performance evaluation measures suitable for assessing multi-dimensional classifiers. We focus on multi-dimensional Bayesian network classifiers, which directly cope with multi-dimensional classification and consider dependencies among class variables. A comprehensive survey of this state-of-the-art classification model is offered by covering aspects related to their learning and inference process complexities. We also describe algorithms for structural learning, provide real-world applications where they have been used, and compile a collection of related software.
Conference Paper
This paper proposes a flexible framework to work with probabilistic potentials in Probabilistic Graphical Models. The so-called Extended Probability Trees allow the representation of multiplicative and additive factorisations within the structure, along with context-specific independencies, with the aim of providing a way of representing and managing complex distributions. This work gives the details of the structure and develops the basic operations on potentials necessary to perform inference. The three basic operations, namely restriction, combination and marginalisation, are defined so they can take advantage of the defined factorisations within the structure, following a lazy methodology.
The most important function of software metrics is to support quantitative decision-making in software project management. In this paper, we focus on identifying the most critical risk factors in a software project risk management framework based on metrics and Bayesian networks. Sensitivity analysis can be performed to study how sensitive a risk node's probability is to small changes of the probability parameters in the risk BN. For a risk BN of known structure and probability parameters, we first estimate the most probable risk scenario and then perform sensitivity analysis for the risk node. After the critical risk factors are found, we concentrate on them in the risk monitoring and control process.
Conference Paper
Graphical models are one of the most prominent frameworks for modeling complex systems and querying them efficiently. Their underlying algebraic properties are captured by a valuation structure that, most usually, is a semiring. Depending on the semiring of choice, we can capture probabilistic models, constraint networks, cost networks, etc. In this paper we address the partitioning problem which occurs in many approximation techniques, such as mini-bucket elimination and join-graph propagation algorithms. Roughly speaking, subject to complexity bounds, the algorithm needs to find a partition of a set of factors that best approximates the whole set. While this problem has been addressed in the past in a particular case, we present here a general description. Furthermore, we also propose a general partitioning scheme. Our proposal is general in the sense that it is presented in terms of a generic semiring, with the only additional requirements being a division operation and a refinement of its order. The proposed algorithm instantiates to the particular task of computing the probability of evidence, but also applies directly to other important reasoning tasks. We demonstrate its good empirical behaviour on the problem of computing the most probable explanation.
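The semiring view can be made concrete with a single bucket step written against pluggable semiring operations. The sketch below is illustrative (the function name and factor tables are invented): with `+` as the semiring sum it performs sum-product elimination as used for probability of evidence, and with `max` it performs max-product elimination as used for the most probable explanation.

```python
def combine_and_eliminate(f, g, add):
    """One elimination step over a generic semiring: combine the 2x2 factors
    f(x, y) and g(y, z) with the semiring product (here ordinary *) and
    eliminate y with the semiring sum `add`."""
    return [[add(f[x][0] * g[0][z], f[x][1] * g[1][z]) for z in (0, 1)]
            for x in (0, 1)]

f = [[0.9, 0.1], [0.2, 0.8]]
g = [[0.7, 0.3], [0.5, 0.5]]
sum_result = combine_and_eliminate(f, g, lambda a, b: a + b)  # sum-product
max_result = combine_and_eliminate(f, g, max)                 # max-product
```

Only `add` changes between the two reasoning tasks; this is exactly the genericity the paper exploits when stating the partitioning problem once over an abstract semiring.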
Recursive probability trees (RPTs) are a data structure for representing several types of potentials involved in probabilistic graphical models. The RPT structure improves the modeling capabilities of previous structures (like probability trees or conditional probability tables). These capabilities can be exploited to gain savings in memory space and/or computation time during inference. This paper describes the modeling capabilities of RPTs as well as how the basic operations required for making inference on Bayesian networks operate on them. The performance of the inference process with RPTs is examined with some experiments using the variable elimination algorithm.
Probabilistic inference algorithms for belief updating, finding the most probable explanation, the maximum a posteriori hypothesis, and the maximum expected utility are reformulated within the bucket elimination framework. This emphasizes the principles common to many of the algorithms appearing in the probabilistic inference literature and clarifies the relationship of such algorithms to nonserial dynamic programming algorithms. A general method for combining conditioning and bucket elimination is also presented. For all the algorithms, bounds on complexity are given as a function of the problem's structure.
Heuristics stand for strategies using readily accessible information to control problem-solving processes in man and machine. This book presents an analysis of the nature and the power of typical heuristic methods, primarily those used in artificial intelligence and operations research, to solve problems in areas such as reasoning, design, scheduling, planning, signal interpretation, symbolic computation, and combinatorial optimization. It is intended for advanced undergraduate or graduate students in artificial intelligence and for researchers in engineering, mathematics, and operations research.
In this survey I will give an overview of recent developments in methods for solving combinatorially difficult, i.e., NP-hard, problems defined on simple, labelled or unlabelled, graphs or hypergraphs. Special instances of these methods are repeatedly being invented in applications. They can be characterized as table-based reduction methods, because they work by successively eliminating the vertices of the problem graph, while building tables with the information about the eliminated part of the graph required to solve the problem at hand. Bertele and Brioschi [9] give a full account of the state of the art in 1972 of these methods applied to non-serial optimization problems. A preliminary taste of the family of algorithms I consider is given by the most well-known of all graph algorithms: the series-parallel reduction method for computing the resistance between two terminals of a network of resistors. It was invented by Ohm (1789-1854) and is contained in most high-school physics curricula. However, since resistance computation is not combinatorially difficult (the resistance is easily obtained, via Kirchhoff's laws, from the solution to a linear system of equations), I will illustrate the method on a different problem: that of computing the probability that a connection exists between two terminals of a series-parallel network of communication links that are unreliable and fail independently of each other, with given probabilities. This problem is NP-hard for arbitrary graphs, see Garey and Johnson [17, problem ND20], basically because all states of the set of communication links must be considered. On a series-parallel network, however, a simple reduction method will work. The method successively eliminates non-terminal nodes which are adjacent to at most two links, by a local transformation of the network.
The elimination operation can be defined as in Figure 1.1, where the quantities p_i labelling edges are the probabilities that the corresponding links work correctly. If a link of the ...
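The two local transformations behind the reduction are short enough to write down directly. This is a sketch under the stated assumption that links fail independently; the network and its link probabilities are made up for the example:

```python
def series(p1, p2):
    """Series reduction: the connection works only if both links work."""
    return p1 * p2

def parallel(p1, p2):
    """Parallel reduction: the connection fails only if both links fail."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

# Terminal s --(0.9)-- v, then two parallel links from v to terminal t,
# working with probabilities 0.8 and 0.5. Eliminating v leaves one link
# whose probability is the two-terminal reliability.
reliability = series(0.9, parallel(0.8, 0.5))
```

Applied repeatedly, these reductions eliminate every non-terminal node of a series-parallel network in linear time, even though the same quantity is NP-hard to compute on arbitrary graphs.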
Local consistency has proven to be an important concept in the theory and practice of constraint networks. In this paper, we present a new definition of local consistency, called relational consistency. The new definition is relation-based, in contrast with the previous definition of local consistency, which we characterize as variable-based. We show the conceptual power of the new definition by showing how it unifies known elimination operators such as resolution in theorem proving, joins in relational databases, and variable elimination for solving linear inequalities. Algorithms for enforcing various levels of relational consistency are introduced and analyzed. We also show the usefulness of the new definition in characterizing relationships between properties of constraint networks and the level of local consistency needed to ensure global consistency.
Many AI tasks can be formulated as constraint-satisfaction problems (CSP), i.e., the assignment of values to variables subject to a set of constraints. While some CSPs are hard, those that are easy can often be mapped into sparse networks of constraints which, in the extreme case, are trees. This paper identifies classes of problems that lend themselves to easy solutions, and develops algorithms that solve these problems optimally. The paper then presents a method of generating heuristic advice to guide the order of value assignments based on both the sparseness found in the constraint network and the simplicity of tree-structured CSPs. The advice is generated by simplifying the pending subproblems into trees, counting the number of consistent solutions in each simplified subproblem, and comparing these counts to decide among the choices pending in the original problem.
We describe the close connection between the now celebrated iterative turbo decoding algorithm of Berrou et al. (1993) and an algorithm that has been well known in the artificial intelligence community for a decade, but which is relatively unknown to information theorists: Pearl's (1982) belief propagation algorithm. We see that if Pearl's algorithm is applied to the “belief network” of a parallel concatenation of two or more codes, the turbo decoding algorithm immediately results. Unfortunately, however, this belief diagram has loops, and Pearl only proved that his algorithm works when there are no loops, so an explanation of the experimental performance of turbo decoding is still lacking. However, we also show that Pearl's algorithm can be used to routinely derive previously known iterative, but suboptimal, decoding algorithms for a number of other error-control systems, including Gallager's (1962) low-density parity-check codes, serially concatenated codes, and product codes. Thus, belief propagation provides a very attractive general methodology for devising low-complexity iterative decoding algorithms for hybrid coded systems
We present a unified graphical model framework for describing compound codes and deriving iterative decoding algorithms. After reviewing a variety of graphical models (Markov random fields, Tanner graphs, and Bayesian networks), we derive a general distributed marginalization algorithm for functions described by factor graphs. From this general algorithm, Pearl's (1986) belief propagation algorithm is easily derived as a special case. We point out that iterative decoding algorithms for various codes, including "turbo decoding" of parallel-concatenated convolutional codes, may be viewed as probability propagation in a graphical model of the code. We focus on Bayesian network descriptions of codes, which give a natural input/state/output/channel description of a code and channel, and we indicate how iterative decoders can be developed for parallel- and serially concatenated coding systems, product codes, and low-density parity-check codes
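On a chain-structured factor graph the distributed marginalization described above reduces to forward/backward message passing. The minimal sketch below uses binary variables and invented factor tables, and checks the messages against brute-force enumeration:

```python
from itertools import product

def chain_marginal(factors, k):
    """Unnormalized marginal of x_k on the chain x_0 - f_0 - x_1 - ... ,
    where factors[i][a][b] scores (x_i = a, x_{i+1} = b). The forward and
    backward messages are the sum-product updates for this special case."""
    fwd = [1.0, 1.0]
    for i in range(k):                               # messages from the left
        fwd = [sum(fwd[a] * factors[i][a][b] for a in (0, 1)) for b in (0, 1)]
    bwd = [1.0, 1.0]
    for i in range(len(factors) - 1, k - 1, -1):     # messages from the right
        bwd = [sum(factors[i][a][b] * bwd[b] for b in (0, 1)) for a in (0, 1)]
    return [fwd[v] * bwd[v] for v in (0, 1)]

def brute_force(factors, k):
    """Direct enumeration over all assignments, for comparison."""
    n = len(factors) + 1
    marg = [0.0, 0.0]
    for xs in product([0, 1], repeat=n):
        p = 1.0
        for i, f in enumerate(factors):
            p *= f[xs[i]][xs[i + 1]]
        marg[xs[k]] += p
    return marg

factors = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 1.0], [2.0, 0.25]]]
mp = chain_marginal(factors, 1)
bf = brute_force(factors, 1)
```

Message passing costs linear time in the chain length, versus exponential for the enumeration; on graphs with cycles the same updates give the iterative (approximate) decoders the abstract discusses.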