Article

# Faithfulness and learning hypergraphs from discrete distributions

## Abstract

In this paper, we study the concepts of faithfulness and strong-faithfulness for discrete distributions. In the discrete setting, graphs are not sufficient for describing the association structure. So we consider hypergraphs instead, and introduce the concept of parametric (strong-) faithfulness with respect to a hypergraph. Assuming strong-faithfulness, we build uniformly consistent parameter estimators and corresponding procedures for a hypergraph search. The strength of association in a discrete distribution can be quantified with various measures, leading to different concepts of strong-faithfulness. We explore these by computing lower and upper bounds for the proportions of distributions that do not satisfy strong-faithfulness.

... Using the PC algorithm to identify crosstalk structure in a QIP implies some subtle assumptions about the crosstalk errors. To clarify these, we first note that the PC algorithm is known to fail to detect causal network structure when the probability distribution being sampled from is not faithful to the underlying causal graph [43,45,54]. In our context, faithfulness means that if there exists crosstalk between regions $r_i$ and $r_j$, then there exist at least some random variables in $r_i$ that exhibit dependence on some random variables in $r_j$, vice versa, or both. ...
... In our context, faithfulness means that if there exists crosstalk between regions $r_i$ and $r_j$, then there exist at least some random variables in $r_i$ that exhibit dependence on some random variables in $r_j$, vice versa, or both. The classic example [54,55] where the faithfulness assumption is violated and the PC algorithm fails is with three random variables $X_1, X_2, X_3$ that are pairwise independent; e.g., if $X_1, X_2, X_3$ are binary and $X_3 = X_1 \oplus X_2$. This means that $X_i \perp X_j$ for any $i \neq j$, but $X_i \not\perp X_j \mid X_k$ (for distinct $i, j, k$). ...
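The XOR example in the snippet above can be checked by direct enumeration. The following is a minimal sketch, assuming that X1 and X2 are independent fair bits (the snippet only says the variables are binary); the helper names `marginal`, `independent`, and `conditionally_independent` are hypothetical and introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption: X1 and X2 are independent fair bits, and X3 = X1 XOR X2.
joint = {(x1, x2, x1 ^ x2): Fraction(1, 4) for x1, x2 in product((0, 1), repeat=2)}


def marginal(dist, idx):
    """Marginal distribution over the coordinates listed in idx."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def independent(dist, i, j):
    """True if coordinates i and j are independent under dist (exact arithmetic)."""
    pij, pi, pj = marginal(dist, (i, j)), marginal(dist, (i,)), marginal(dist, (j,))
    zero = Fraction(0)
    return all(
        pij.get((a, b), zero) == pi.get((a,), zero) * pj.get((b,), zero)
        for a, b in product((0, 1), repeat=2)
    )


def conditionally_independent(dist, i, j, k):
    """True if coordinates i and j are independent given every value of coordinate k."""
    for (value,), p_value in marginal(dist, (k,)).items():
        conditional = {o: p / p_value for o, p in dist.items() if o[k] == value}
        if not independent(conditional, i, j):
            return False
    return True


pairs = [(0, 1), (0, 2), (1, 2)]
pairwise = {pair: independent(joint, *pair) for pair in pairs}
conditional = {pair: conditionally_independent(joint, pair[0], pair[1], 3 - sum(pair)) for pair in pairs}
print("pairwise independent:", pairwise)
print("conditionally independent given the third variable:", conditional)
```

Every pair is marginally independent, yet no pair is independent given the third variable. A procedure that tests only pairwise dependence therefore finds no structure in this distribution.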
Preprint
Crosstalk occurs in most quantum computing systems with more than one qubit. It can cause a variety of correlated and nonlocal errors, which we call crosstalk errors. They can be especially harmful to fault-tolerant quantum error correction, which generally relies on errors being local and relatively predictable. Mitigating crosstalk errors requires understanding, modeling, and detecting them. In this paper, we introduce a comprehensive framework for crosstalk errors and a protocol for detecting all kinds of crosstalk errors. We begin by giving a rigorous definition of crosstalk errors that captures a wide range of disparate physical phenomena that have been called "crosstalk". The heart of this definition is a concrete error model for crosstalk-free quantum processors. Errors that violate this model are crosstalk errors. Next, we give an equivalent but purely operational (model-independent) definition of crosstalk errors. Finally, using this definition, we construct a protocol for detecting crosstalk errors in a multi-qubit processor. It detects crosstalk errors by evaluating conditional dependencies between observed experimental probabilities, and it is highly efficient in the sense that the number of unique experiments required scales linearly with the number of qubits. We demonstrate the protocol using simulations of 2-qubit and 6-qubit processors.
Article
Crosstalk occurs in most quantum computing systems with more than one qubit. It can cause a variety of correlated and nonlocal crosstalk errors that can be especially harmful to fault-tolerant quantum error correction, which generally relies on errors being local and relatively predictable. Mitigating crosstalk errors requires understanding, modeling, and detecting them. In this paper, we introduce a comprehensive framework for crosstalk errors and a protocol for detecting and localizing them. We give a rigorous definition of crosstalk errors that captures a wide range of disparate physical phenomena that have been called "crosstalk", and a concrete model for crosstalk-free quantum processors. Errors that violate this model are crosstalk errors. Next, we give an equivalent but purely operational (model-independent) definition of crosstalk errors. Using this definition, we construct a protocol for detecting a large class of crosstalk errors in a multi-qubit processor by finding conditional dependencies between observed experimental probabilities. It is highly efficient, in the sense that the number of unique experiments required scales at most cubically, and very often quadratically, with the number of qubits. We demonstrate the protocol using simulations of 2-qubit and 6-qubit processors.
Article
The operations of edge addition and deletion for hierarchical log-linear models are defined, and polynomial-time algorithms for the operations are given.
Book
What assumptions and methods allow us to turn observations into causal knowledge, and how can even incomplete causal knowledge be used in planning and prediction to influence and control our environment? In this book Peter Spirtes, Clark Glymour, and Richard Scheines address these questions using the formalism of Bayes networks, with results that have been applied in diverse areas of research in the social, behavioral, and physical sciences. The authors show that although experimental and observational study designs may not always permit the same inferences, they are subject to uniform principles. They axiomatize the connection between causal structure and probabilistic independence, explore several varieties of causal indistinguishability, formulate a theory of manipulation, and develop asymptotically reliable procedures for searching over equivalence classes of causal models, including models of categorical data and structural equation models with and without latent variables. The authors show that the relationship between causality and probability can also help to clarify such diverse topics in statistics as the comparative power of experimentation versus observation, Simpson's paradox, errors in regression models, retrospective versus prospective sampling, and variable selection. The second edition contains a new introduction and an extensive survey of advances and applications that have appeared since the first edition was published in 1993.
Article
A fundamental question in causal inference is whether it is possible to reliably infer manipulation effects from observational data. There are a variety of senses of asymptotic reliability in the statistical literature, among which the most commonly discussed frequentist notions are pointwise consistency and uniform consistency. Uniform consistency is in general preferred to pointwise consistency because the former allows us to control the worst case error bounds with a finite sample size. In the sense of pointwise consistency, several reliable causal inference algorithms have been established under the Markov and Faithfulness assumptions [Pearl 2000, Spirtes et al. 2001]. In the sense of uniform consistency, however, reliable causal inference is impossible under the two assumptions when time order is unknown and/or latent confounders are present [Robins et al. 2000]. In this paper we present two natural generalizations of the Faithfulness assumption in the context of structural equation models, under which we show that the typical algorithms in the literature (in some cases with modifications) are uniformly consistent even when the time order is unknown. We also discuss the situation where latent confounders may be present and the sense in which the Faithfulness assumption is a limiting case of the stronger assumptions.
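The two notions of consistency contrasted above can be written side by side. This is a sketch in generic notation that is not taken from the paper: the estimator $\hat\theta_n$, the target $\theta(P)$, and the model class $\mathcal{P}$ are symbols assumed here for illustration.

```latex
\begin{align*}
\text{pointwise consistency:}\quad
  & \forall P \in \mathcal{P},\ \forall \varepsilon > 0:\quad
    \lim_{n \to \infty} P\bigl(\lvert \hat\theta_n - \theta(P) \rvert > \varepsilon\bigr) = 0, \\
\text{uniform consistency:}\quad
  & \forall \varepsilon > 0:\quad
    \lim_{n \to \infty} \sup_{P \in \mathcal{P}}
    P\bigl(\lvert \hat\theta_n - \theta(P) \rvert > \varepsilon\bigr) = 0.
\end{align*}
```

The only difference is the supremum over $\mathcal{P}$ taken before the limit. Under uniform consistency, a single sample size $n$ bounds the error probability for every distribution in the class at once, which is what makes worst-case control with a finite sample possible.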
Article
Many algorithms for inferring causality rely heavily on the faithfulness assumption. The main justification for imposing this assumption is that the set of unfaithful distributions has Lebesgue measure zero, since it can be seen as a collection of hypersurfaces in a hypercube. However, due to sampling error the faithfulness condition alone is not sufficient for statistical estimation, and strong-faithfulness has been proposed and assumed to achieve uniform or high-dimensional consistency. In contrast to the plain faithfulness assumption, the set of distributions that is not strong-faithful has nonzero Lebesgue measure and in fact, can be surprisingly large as we show in this paper. We study the strong-faithfulness condition from a geometric and combinatorial point of view and give upper and lower bounds on the Lebesgue measure of strong-faithful distributions for various classes of directed acyclic graphs. Our results imply fundamental limitations for the PC-algorithm and potentially also for other algorithms based on partial correlation testing in the Gaussian case.
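The gap between faithfulness and strong-faithfulness can be illustrated on a three-node Gaussian example. The sketch below rests on assumptions that are not in the abstract: a linear structural equation model X1 = e1, X2 = a·X1 + e2, X3 = b·X1 + c·X2 + e3 with independent unit-variance noise and all of a, b, c nonzero (so the DAG is complete and has no d-separations), and a threshold version of strong-faithfulness in which every marginal and partial correlation must exceed λ in absolute value. The function names are hypothetical.

```python
import math


def sem_correlations(a, b, c):
    """Correlations of X1 = e1, X2 = a*X1 + e2, X3 = b*X1 + c*X2 + e3 (unit-variance noise)."""
    v1 = 1.0
    v2 = a * a + 1.0
    v3 = b * b + c * c * v2 + 2.0 * a * b * c + 1.0
    cov = {(0, 1): a, (0, 2): b + a * c, (1, 2): a * b + c * v2}
    var = (v1, v2, v3)
    return {(i, j): cov[(i, j)] / math.sqrt(var[i] * var[j]) for (i, j) in cov}


def partial_corr(rho, i, j, k):
    """Partial correlation of X_i and X_j given X_k, from pairwise correlations."""
    def r(u, v):
        return rho[(min(u, v), max(u, v))]
    return (r(i, j) - r(i, k) * r(j, k)) / math.sqrt((1 - r(i, k) ** 2) * (1 - r(j, k) ** 2))


def strong_faithfulness_violations(a, b, c, lam):
    """List (i, j, conditioning set) whose correlation magnitude is at most lam.

    Assumes a, b, c are all nonzero, so the DAG is complete and every
    (partial) correlation is required to exceed lam.
    """
    rho = sem_correlations(a, b, c)
    violations = []
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        k = 3 - i - j  # index of the remaining variable
        if abs(rho[(i, j)]) <= lam:
            violations.append((i, j, ()))
        if abs(partial_corr(rho, i, j, k)) <= lam:
            violations.append((i, j, (k,)))
    return violations


# Near-cancellation of the direct effect b and the indirect effect a*c.
print(strong_faithfulness_violations(1.0, -0.98, 1.0, lam=0.05))
# No cancellation.
print(strong_faithfulness_violations(1.0, 0.5, 1.0, lam=0.05))
```

With b = -0.98 the correlation between X1 and X3 is nonzero, so the distribution is faithful, but it falls below the threshold λ = 0.05. Parameter values near the exact-cancellation surface b = -a·c form a set of positive measure, in line with the abstract's point that the distributions violating strong-faithfulness are not negligible.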
Article
Bayesian Networks (BNs) are useful tools giving a natural and compact representation of joint probability distributions. In many applications one needs to learn a Bayesian Network (BN) from data. In this context, it is important to understand the number of samples needed in order to guarantee successful learning. Previous work has studied the sample complexity of BNs, yet it mainly focused on the requirement that the learned distribution be close to the original distribution that generated the data. In this work, we study a different aspect of the learning, namely the number of samples needed in order to learn the correct structure of the network. We give both asymptotic results, valid in the large sample limit, and experimental results, demonstrating the learning behavior for feasible sample sizes. We show that structure learning is a more difficult task than approximating the correct distribution, in the sense that it requires a much larger number of samples, regardless of the computational power available to the learner.
Chapter
The paper discusses statistical models for categorical data based on directed acyclic graphs (DAGs) assuming that only effects associated with the arrows of the graph exist. Graphical models based on DAGs are similar, but allow the existence of effects not directly associated with any of the arrows. Graphical models based on DAGs are marginal models and are best parameterized by using hierarchical marginal log-linear parameters. Path models are defined here by assuming that all hierarchical marginal log-linear parameters not associated with an arrow are zero, providing a parameterization with straightforward interpretation. The paper gives a brief review of log-linear, graphical and marginal models, presents a method for the maximum likelihood estimation of path models and illustrates the use of path models, with special emphasis on the interpretation of estimated parameter values, to real data.
Conference Paper
Bayesian Networks (BNs) are useful tools giving a natural and compact representation of joint probability distributions. In many applications one needs to learn a Bayesian Network (BN) from data. In this context, it is important to understand the number of samples needed in order to guarantee a successful learning. Previous works have studied BNs sample complexity, yet they mainly focused on the requirement that the learned distribution will be close to the original distribution which generated the data. In this work, we study a different aspect of the learning task, namely the number of samples needed in order to learn the correct structure of the network. We give both asymptotic results (lower and upper bounds) on the probability of learning a wrong structure, valid in the large sample limit, and experimental results, demonstrating the learning behavior for feasible sample sizes.
Article
We present a sound and complete graphical criterion for reading dependencies from the minimal undirected independence map G of a graphoid M that satisfies weak transitivity. Here, complete means that it is able to read all the dependencies in M that can be derived by applying the graphoid properties and weak transitivity to the dependencies used in the construction of G and the independencies obtained from G by vertex separation. We argue that assuming weak transitivity is not too restrictive. As an intermediate step in the derivation of the graphical criterion, we prove that for any undirected graph G there exists a strictly positive discrete probability distribution with the prescribed sample spaces that is faithful to G. We also report an algorithm that implements the graphical criterion and whose running time is considered to be at most O(n^2(e + n)) for n nodes
Article
Contents: Structural models for counted data -- Maximum likelihood estimates for complete tables -- Formal goodness of fit: summary statistics and model selection -- Maximum likelihood estimation for incomplete tables -- Estimating the size of a closed population -- Models for measuring change -- Analysis of square tables: symmetry and marginal homogeneity -- Model selection and assessing closeness of fit: practical aspects -- Other methods for estimation and testing in cross-classifications -- Measures of association and agreement -- Pseudo-Bayes estimates of cell probabilities -- Sampling models for discrete data -- Asymptotic methods.
Article
Statistical models defined by imposing restrictions on marginal distributions of contingency tables have received considerable attention recently. This paper introduces a general definition of marginal log-linear parameters and describes conditions for a marginal log-linear parameter to be a smooth parameterization of the distribution and to be variation independent. Statistical models defined by imposing affine restrictions on the marginal log-linear parameters are investigated. These models generalize ordinary log-linear and multivariate logistic models. Sufficient conditions for a log-affine marginal model to be nonempty and to be a curved exponential family are given. Standard large-sample theory is shown to apply to maximum likelihood estimation of log-affine marginal models for a variety of sampling procedures.
Article
There is a long tradition of representing causal relationships by directed acyclic graphs (Wright, 1934). Spirtes (1994), Spirtes et al. (1993) and Pearl & Verma (1991) describe procedures for inferring the presence or absence of causal arrows in the graph even if there might be unobserved confounding variables, and/or an unknown time order, and that under weak conditions, for certain combinations of directed acyclic graphs and probability distributions, are asymptotically, in sample size, consistent. These results are surprising since they seem to contradict the standard statistical wisdom that consistent estimators of causal effects do not exist for nonrandomised studies if there are potentially unobserved confounding variables. We resolve the apparent incompatibility of these views by closely examining the asymptotic properties of these causal inference procedures. We show that the asymptotically consistent procedures are 'pointwise consistent', but 'uniformly consistent' tests do not exist. Thus, no finite sample size can ever be guaranteed to approximate the asymptotic results. We also show the nonexistence of valid, consistent confidence intervals for causal effects and the nonexistence of uniformly consistent point estimators. Our results make no assumption about the form of the tests or estimators. In particular, the tests could be classical independence tests, they could be Bayes tests or they could be tests based on scoring methods such as BIC or AIC. The implications of our results for observational studies are controversial and are discussed briefly in the last section of the paper. The results hinge on the following fact: it is possible to find, for each sample size n, distributions P and Q such that P and Q are empirically indistinguishable and yet P and Q correspond to different causal effects.
Article
Much of the recent work on the epistemology of causation has centered on two assumptions, known as the Causal Markov Condition and the Causal Faithfulness Condition. Philosophical discussions of the latter condition have exhibited situations in which it is likely to fail. This paper studies the Causal Faithfulness Condition as a conjunction of weaker conditions. We show that some of the weaker conjuncts can be empirically tested, and hence do not have to be assumed a priori. Our results lead to two methodologically significant observations: (1) some common types of counterexamples to the Faithfulness condition constitute objections only to the empirically testable part of the condition; and (2) some common defenses of the Faithfulness condition do not provide justification or evidence for the testable parts of the condition. It is thus worthwhile to study the possibility of reliable causal inference under weaker Faithfulness conditions. As it turns out, the modification needed to make standard procedures work under a weaker version of the Faithfulness condition also has the practical effect of making them more robust when the standard Faithfulness condition actually holds. This, we argue, is related to the possibility of controlling error probabilities with finite sample size ("uniform consistency") in causal inference.
Article
We consider the PC-algorithm (Spirtes et al., 2000) for estimating the skeleton of a very high-dimensional directed acyclic graph (DAG) with corresponding Gaussian distribution. The PC-algorithm is computationally feasible for sparse problems with many nodes, i.e. variables, and it has the attractive property of automatically achieving high computational efficiency as a function of the sparseness of the true underlying DAG. We prove consistency of the algorithm for very high-dimensional, sparse DAGs where the number of nodes is allowed to grow quickly with sample size n, as fast as O(n^a) for any 0 < a < ∞. The sparseness assumption is rather minimal, requiring only that the neighborhoods in the DAG are of lower order than sample size n. We empirically demonstrate the PC-algorithm for simulated data and argue that the algorithm is rather insensitive to the choice of its single tuning parameter.
Article
When populations are cross-classified with respect to two or more classifications or polytomies, questions often arise about the degree of association existing between the several polytomies. Most of the traditional measures or indices of association are based upon the standard chi-square statistic or on an assumption of underlying joint normality. In this paper a number of alternative measures are considered, almost all based upon a probabilistic model for activity to which the cross-classification may typically lead. Only the case in which the population is completely known is considered, so no question of sampling or measurement error appears. We hope, however, to publish before long some approximate distributions for sample estimators of the measures we propose, and approximate tests of hypotheses. Our major theme is that the measures of association used by an empirical investigator should not be blindly chosen because of tradition and convention only, although these factors may properly be given some weight, but should be constructed in a manner having operational meaning within the context of the particular problem.
Article
We consider the problem of learning a directed acyclic graph (DAG) model based on conditional independence testing. The most widely used approaches to this problem are variants of the PC algorithm. One of the drawbacks of the PC algorithm is that it requires the strong-faithfulness assumption, which has been shown to be restrictive, especially for graphs with undirected cycles in the skeleton. In this paper, we propose an alternative method based on finding the permutation of the variables that yields the sparsest DAG. We prove that the constraints required for our sparsest permutation (SP) algorithm are strictly weaker than faithfulness and are necessary for any causal inference algorithm based on conditional independence testing. Through specific examples and simulations we show that the SP algorithm has better performance than the PC algorithm. In the Gaussian setting, we prove that our algorithm boils down to finding the permutation of the variables with sparsest Cholesky decomposition for the inverse covariance matrix. Using this connection, we show that in the oracle setting, where the true covariance matrix is known, the SP algorithm is in fact equivalent to $\ell_0$-penalized maximum likelihood estimation.
Article
A certain class of patterns of association can be investigated by fitting multiplicative models to a contingency table or by using covariance selection on a covariance matrix. We show that each multiplicative model for a contingency table corresponds to one particular covariance selection model, and we point at the resulting similarities in the interpretation of patterns, in test statistics for each pattern and in implied marginal associations among variable pairs.
Article
A completeness result for d-separation applied to discrete Bayesian networks is presented and it is shown that in a strong measure-theoretic sense almost all discrete distributions for a given network structure are faithful; i.e. the independence facts true of the distribution are all and only those entailed by the network structure.
Chapter
"At last, after a decade of mounting interest in log-linear and related models for the analysis of discrete multivariate data, particularly in the form of multidimensional tables, we now have a comprehensive text and general reference on the subject. Even a mediocre attempt to organize the extensive and widely scattered literature on discrete multivariate analysis would be welcome; happily, this is an excellent such effort, by a group of Harvard statisticians that has contributed much to the field. Their book ought to serve as a basic guide to the analysis of quantitative data for years to come." -James R. Beninger, Contemporary Sociology

"A welcome addition to multivariate analysis. The discussion is lucid and very leisurely, excellently illustrated with applications drawn from a wide variety of fields. A good part of the book can be understood without very specialized statistical knowledge. It is a most welcome contribution to an interesting and lively subject." -D.R. Cox, Nature

"Discrete Multivariate Analysis is an ambitious attempt to present log-linear models to a broad audience. Exposition is quite discursive, and the mathematical level, except in Chapters 12 and 14, is very elementary. To illustrate possible applications, some 60 different sets of data have been gathered together from diverse fields. To aid the reader, an index of these examples has been provided. ...the book contains a wealth of material on important topics. Its numerous examples are especially valuable." -Shelby J. Haberman, The Annals of Statistics.

© 2007 Springer Science+Business Media, LLC. All rights reserved.
Agresti, A., 2002. Categorical Data Analysis. Wiley, New York.

Barndorff-Nielsen, O.E., 1978. Information and Exponential Families. Wiley, New York.

Bergsma, W.P., Rudas, T., 2002. Marginal models for categorical data. Ann. Statist. 30, 140–159.
Bishop, Y.M.M., Fienberg, S.E., Holland, P.W., 1975. Discrete Multivariate Analysis: Theory and Practice. MIT.

Edwards, D., 2000. Introduction to Graphical Modeling. Springer, London.

Edwards, D., 2012. A note on adding and deleting edges in hierarchical log-linear models. Comput. Statist. 27, 799–803.
Studený, M., 2005. Probabilistic Conditional Independence Structures. Springer, London.

Uhler, C., Raskutti, G., 2013. Learning directed acyclic graphs based on sparsest permutations. arXiv:1307.0366.

Uhler, C., Raskutti, G., Bühlmann, P., Yu, B., 2013. Geometry of faithfulness assumption in causal inference. Ann. Statist. 41, 436–463.