Clark Glymour's research while affiliated with Carnegie Mellon University and other places

Publications (233)

Preprint
Most causal discovery procedures assume that there are no latent confounders in the system, which is often violated in real-world problems. In this paper, we consider a challenging scenario for causal structure identification, where some variables are latent and they form a hierarchical graph structure to generate the measured variables; the childr...
Article
Full-text available
The ultimate focus of the current essay is on methods of "creative abduction" that have some guarantees as reliable guides to the truth, and those that do not. Emphasizing work by Ronald Inglehart using data from the World Values Survey, Gerhard Schurz has analyzed literature surrounding Samuel Huntington's well-known claims that civilization is d...
Preprint
Perceived signals in real-world scenarios are usually high-dimensional and noisy, and finding and using a representation of them that contains the essential and sufficient information required by downstream decision-making tasks will help improve computational efficiency and generalization ability in those tasks. In this paper, we focus on partially observab...
Preprint
Full-text available
We consider the problem of estimating a particular type of linear non-Gaussian model. Without resorting to the overcomplete Independent Component Analysis (ICA), we show that under some mild assumptions, the model is uniquely identified by a hybrid method. Our method leverages the advantages of constraint-based methods and independent noise-based m...
Preprint
Full-text available
Causal discovery aims to recover causal structures or models underlying the observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed ones may not be the underlying causal variables (e.g., image pixels), but are generated by latent causal v...
Article
A number of approaches to causal discovery assume that there are no hidden confounders and are designed to learn a fixed causal model from a single data set. Over the last decade, with closer cooperation across laboratories, we are able to accumulate more variables and data for analysis, while each lab may only measure a subset of them, due to tech...
Preprint
This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., what factors or modules of the data distribution remain invariant or change across domains. To develop an automated way of domain adaptation with multiple source domains, we propose to use...
Article
Full-text available
The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In large-scale complex dynamical systems such as the Earth system, real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal methods...
Preprint
Standard fMRI connectivity analyses depend on aggregating the time series of individual voxels within regions of interest (ROIs). In certain cases, this spatial aggregation implies a loss of valuable functional and anatomical information about smaller subsets of voxels that drive the ROI level connectivity. We use two recently published graphical s...
Article
Full-text available
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical...
Article
In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecastin...
Preprint
In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecastin...
Preprint
It is commonplace to encounter nonstationary or heterogeneous data, whose underlying generating process changes over time or across domains. Such a distribution shift presents both challenges and opportunities for causal discovery. In this paper, we develop a principled framework for causal discovery from such data, called Constraint...
Preprint
Full-text available
Autism spectrum disorder (ASD) is one of the major developmental disorders affecting children. Recently, it has been hypothesized that ASD is associated with atypical brain connectivity. A substantial body of research uses Pearson's correlation coefficients, mutual information, or partial correlation to investigate the differences in brain conne...
Article
Full-text available
A central theme in western philosophy was to find formal methods that can reliably discover empirical relationships and their explanations from data assembled from experience. As a philosophical project, that ambition was abandoned in the 20th century and generally dismissed as impossible. It was replaced in philosophy by neo-Kantian efforts at rec...
Article
Motivation: Integration of data from different modalities is a necessary step for multi-scale data analysis in many fields, including biomedical research and systems biology. Directed graphical models offer an attractive tool for this problem because they can represent both the complex, multivariate probability distributions and the causal pathway...
Article
Full-text available
Modern technologies allow large, complex biomedical datasets to be collected from patient cohorts. These datasets comprise both continuous and categorical data (“Mixed Data”), and essential variables may be unobserved in these data due to the complex nature of biomedical phenomena. Causal inference algorithms can identify important relations...
Conference Paper
Full-text available
Discovery of causal relationships from observational data is a fundamental problem. Roughly speaking, there are two types of methods for causal discovery, constraint-based ones and score-based ones. Score-based methods avoid the multiple testing problem and enjoy certain advantages compared to constraint-based ones. However, most of them need stron...
Article
Full-text available
We test the adequacies of several proposed and two new statistical methods for recovering the causal structure of systems with feedback from synthetic BOLD time series. We compare an adaptation of the first correct method for recovering cyclic linear systems; Granger causal regression; a multivariate autoregressive model with a permutation test; th...
Article
We propose a new generative model for domain adaptation, in which training data (source domain) and test data (target domain) come from different distributions. An essential problem in domain adaptation is to understand how the distribution shifts across domains. For this purpose, we propose a generative domain adaptation network to understand and...
Preprint
Full-text available
We test the adequacies of several proposed and two new statistical methods for recovering the causal structure of systems with feedback that generate noisy time series closely matching real BOLD time series. We compare: an adaptation for time series of the first correct method for recovering the structure of cyclic linear systems; multivariate Gran...
Conference Paper
We address two important issues in causal discovery from nonstationary or heterogeneous data, where parameters associated with a causal structure may change over time or across data sets. First, we investigate how to efficiently estimate the "driving force" of the nonstationarity of a causal mechanism. That is, given a causal mechanism that varies...
Article
Discovering the causal structure of a dynamical system from observed time series is a traditional and important problem. In many practical applications, observed data are obtained by applying subsampling or temporal aggregation to the original causal processes, making it difficult to discover the underlying causal relations. Subsampling refers to the...
Conference Paper
It is commonplace to encounter nonstationary or heterogeneous data, of which the underlying generating process changes over time or across data sets (the data sets may have different experimental conditions or data collection conditions). Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this pape...
Article
Full-text available
Measurement error in the observed values of the variables can greatly change the output of various causal discovery methods. This problem has received much attention in multiple fields, but it is not clear to what extent the causal model for the measurement-error-free variables can be identified in the presence of measurement error with unknown var...
Article
Full-text available
Graphical causal models are an important tool for knowledge discovery because they can represent both the causal relations between variables and the multivariate probability distributions over the data. Once learned, causal graphs can be used for classification, feature selection and hypothesis generation, while revealing the underlying causal netw...
Article
Full-text available
We describe two modifications that parallelize and reorganize caching in the well-known Greedy Equivalence Search algorithm for discovering directed acyclic graphs on random variables from sample values. We apply one of these modifications, the Fast Greedy Equivalence Search (fGES) assuming faithfulness, to an i.i.d. sample of 1000 units to recover...
Preprint
Full-text available
Background: Standard BOLD connectivity analyses depend on aggregating the signals of individual voxels within regions of interest (ROIs). In certain cases, this spatial aggregation implies a loss of valuable functional and anatomical information about subsets of voxels that drive the ROI level connectivity. New Method: We use the FGES algorithm, a d...
Preprint
Full-text available
Scale-free networks (SFN) arise from simple growth processes, which can encourage efficient, centralized and fault-tolerant communication (1). Recently it has been shown that stable network hub structure is governed by a phase transition at exponents (>2.0) causing a dramatic change in network structure including a loss of global connectivity, an i...
Article
Domain adaptation arises in supervised learning when the training (source domain) and test (target domain) data have different distributions. Let X and Y denote the features and target, respectively. Previous work on domain adaptation mainly considers the covariate shift situation where the distribution of the features P(X) changes across domains w...
Article
Full-text available
Scale-free networks (SFN) arise from simple growth processes, which can encourage efficient, centralized and fault-tolerant communication (1). Recently it has been shown that stable network hub structure is governed by a phase transition at exponents (>2.0) causing a dramatic change in network structure including a loss of global connectivity, an incr...
Preprint
Full-text available
Standard BOLD connectivity analyses depend on aggregating the signals of individual voxels within regions of interest (ROIs). In certain cases, this aggregation implies a loss of valuable functional and anatomical information about sub-regions of voxels that drive the ROI level connectivity. We describe a data-driven statistical search method that i...
Chapter
The aim of confirmation theory is to provide a true account of the principles that guide scientific argument in so far as that argument is not, and does not purport to be, of a deductive kind. A confirmation theory should serve as a critical and explanatory instrument quite as much as do theories of deductive inference. Any successful confirmation...
Article
Full-text available
Using Gebharter’s representation, we consider aspects of the problem of discovering the structure of unmeasured submechanisms when the variables in those submechanisms have not been measured. Exploiting an early insight of Sober’s, we provide a correct algorithm for identifying latent, endogenous structure—submechanisms—for a restricted class of st...
Article
Full-text available
The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the...
Article
Reverse inference in cognitive neuropsychology has been characterized as inference to ‘psychological processes’ from ‘patterns of activation’ revealed by functional magnetic resonance or other scanning techniques. Several arguments have been provided against the possibility. Focusing on Machery’s ([2014]) presentation, we attempt to clarify the iss...
Article
Full-text available
Four theories proposing determinate relations of actual causation for Boolean networks are described and applied to 16 cases. All four theories are founded on the idea that actual causation is based on results that appropriate experimental interventions would produce. They differ in their accounts of the relevant kinds of experimental interventions...
Article
Full-text available
Machine learning methods to find graphical models of genetic regulatory networks from cDNA microarray data have become increasingly popular in recent years. We provide three reasons to question the reliability of such methods: (1) a major theoretical challenge to any method using conditional independence relations; (2) a simulation study using real...
Article
Recent literature in philosophy of science has addressed purported notions of explanatory virtues—‘explanatory power’, ‘unification’, and ‘coherence’. In each case, a probabilistic relation between a theory and data is said to measure the power of an explanation, or degree of unification, or degree of coherence. This essay argues that the measures...
Article
We consider several alternative ways of exploiting non-Gaussian distributional features, including some that can in principle identify direct, positive feedback relations (graphically, 2-cycles) and combinations of methods that can identify high dimensional graphs. All of the procedures are implemented in the TETRAD freeware (Ramsey et al., 2013)....
Article
Failing to engage in joint attention is a strong marker of impaired social cognition associated with autism spectrum disorder (ASD). The goal of this study was to localize the source of impaired joint attention in individuals with ASD by examining both behavioral and fMRI data collected during various tasks involving eye gaze, directional cuing, an...
Article
Halvorson argues through a series of examples and a general result due to Myers that the “semantic view” of theories has no available account of formal theoretical equivalence. De Bouvere provides criteria overlooked in Halvorson’s paper that are immune to his counterexamples and to the theorem he cites. Those criteria accord with a modest version...
Article
This paper (1) shows that the best supported current psychological theory (Cheng, 1997) of how human subjects judge the causal power or influence of variations in presence or absence of one feature on another, given data on their covariation, tacitly uses a Bayes network which is either a noisy or gate (for causes that promote the effect) or a noisy...
Article
It is "well known" that in linear models: (1) testable constraints on the marginal distribution of observed variables distinguish certain cases in which an unobserved cause jointly influences several observed variables; (2) the technique of "instrumental variables" sometimes permits an estimation of the influence of one variable on another even whe...
Article
Full-text available
Observed associations in a database may be due in whole or part to variations in unrecorded (latent) variables. Identifying such variables and their causal relationships with one another is a principal goal in many scientific and practical domains. Previous work shows that, given a partition of observed variables such that members of a class share...
Article
In an essay recently published in this journal, Branden Fitelson argues that a variant of Miller’s argument for the language dependence of the accuracy of predictions can be applied to Joyce’s notion of accuracy of credences formulated in terms of scoring rules, resulting in a general potential problem for Joyce’s argument for probabilism. We argue...
Article
Full-text available
We show that if any number of variables are allowed to be simultaneously and independently randomized in any one experiment, log2(N) + 1 experiments are sufficient and in the worst case necessary to determine the causal relations among N >= 2 variables when no latent variables, no sample selection bias and no feedback cycles are present. For all K,...
Conference Paper
We show that if any number of variables are allowed to be simultaneously and independently randomized in any one experiment, log_2(N) + 1 experiments are sufficient and in the worst case necessary to determine the causal relations among N ≥ 2 variables when no latent variables, no sample selection bias and no feedback cycles are present. For all K,...
Article
Various proposals have suggested that an adequate explanatory theory should reduce the number or the cardinality of the set of logically independent claims that need be accepted in order to entail a body of data. A (and perhaps the only) well-formed proposal of this kind is William Kneale’s: an explanatory theory should be finitely axiomatizable bu...
Chapter
Full-text available
Machine learning methods to find graphical models of genetic regulatory networks from cDNA microarray data have become increasingly popular in recent years. We provide three reasons to question the reliability of such methods: (1) a major theoretical challenge to any method using conditional independence relations; (2) a simulation study using real...
Article
Reichenbach states that an inductive logic cannot be built up entirely from logical principles independent of experience, but must develop out of the reasoning practiced in and useful to the natural sciences. An inductive inference system needs to be built on some solid foundation to guide scientific methodology. This chapter describes Reichenbach's reasons for sta...
Article
Bayesian psychology follows an old instrumentalist tradition most infamously illustrated by Osiander's preface to Copernicus's masterpiece. Jones & Love's (J&L's) criticisms are, if anything, understated, and their proposals overoptimistic.
Article
Lindquist and Sobel claim that the graphical causal models they call "agnostic" do not imply any counterfactual conditionals. They doubt that "causal effects" can be discovered using graphical causal models typical of SEMs, DCMs, Bayes nets, Granger causal models, etc. Each of these claims is false or exaggerated. They recommend instead that invest...
Article
Neumann et al. (2010) aim to find directed graphical representations of the independence and dependence relations among activities in brain regions by applying a search procedure to merged fMRI activity records from a large number of contrasts obtained under a variety of conditions. To that end, Neumann et al., obtain three graphical models, justif...
Article
Full-text available
We present evidence of cross-hybridization artifact intrinsic to spotted single-dye cDNA microarrays as a result of cDNA containing 5'-end sequences of consecutive thymidine (dT) residues. These poly(dT) tracts result from the synthesis, via oligo (dT) primed reverse transcription, of expressed sequence tags (EST) cDNA from a polyadenylated mRNA te...
Article
Full-text available
Various proposals have been made for understanding gene regulation through measurements of differential expression in wild type versus strains in which expression of specific genes has been suppressed or enhanced, as well as determining the most informative next experiment in a sequence. While the behavior of these algorithms has been investigated...
Article
Full-text available
Motivation: One strategy for understanding gene regulation involves the differential measurement of expression in wild type versus strains in which expression of specific genes has been suppressed or enhanced. Various proposals have been made for optimizing the information obtained from such experiments a...
Article
Full-text available
We argue that current discussions of criteria for actual causation are ill-posed in several respects. (1) The methodology of current discussions is by induction from intuitions about an infinitesimal fraction of the possible examples and counterexamples; (2) cases with larger numbers of causes generate novel puzzles; (3) “neuron” and causal Bayes n...
Article
We agree with Cramer et al.'s goal of the discovery of causal relationships, but we argue that the authors' characterization of latent variable models (as deployed for such purposes) overlooks a wealth of extant possibilities. We provide a preliminary analysis of their data, using existing algorithms for causal inference and for the specification o...
Article
Nancy Cartwright's recent criticisms of efforts and methods to obtain causal information from sample data using automated search are considered. In addition to reviewing that effort, I argue that almost all of her criticisms are false and rest on misreading, overgeneralization, or neglect of the relevant literature.
Conference Paper
Identifying brain regions that activate when a subject is presented a stimulus or performs a task can be done by analyzing Functional Magnetic Resonance Imaging (fMRI) data. Causal modeling methods can be used to discover causal relationships among activity in brain regions, or which regions of the brain influence which other regions during a task...
Article
Functional magnetic resonance imaging (fMRI) data have been used for identifying brain regions that activate when a subject is presented a stimulus or performs a task. Beyond identifying which regions of the brain are active during a task, it is also of interest to discover causal relationships among activity in those regions, that is, which region...
Article
In the applied statistical literature, causal relations are often described equivocally or euphemistically as 'risk factors', or as part of 'dimension reduction'. The statistical literature also tends to speak of 'statistical models' rather than of causal explanations, and to say that parameters of a model are 'interpretable', often means that the...
Chapter
Although both philosophers and scientists are interested in how to obtain reliable knowledge in the face of error, there is a gap between their perspectives that has been an obstacle to progress. By means of a series of exchanges between the editors and leaders from the philosophy of science, statistics and economics, this volume offers a cumulativ...
Article
Neuroimaging (e.g. fMRI) data are increasingly used to attempt to identify not only brain regions of interest (ROIs) that are especially active during perception, cognition, and action, but also the qualitative causal relations among activity in these regions (known as effective connectivity; Friston, 1994). Previous investigations and anatomical a...
Article
Pointwise consistent, feasible procedures for estimating contemporaneous linear causal structure from time series data have been developed using multiple conditional independence tests, but no such procedures are available for non-linear systems. We describe a feasible procedure for learning a class of non-linear time series structures, which we ca...
Article
Full-text available
The evidence that exposure to media violence causes later aggression derives largely from observational (nonexperimental) studies augmented by short-term experimental studies. The authors review some of the difficulties in causal inference from observational, longitudinal data; examine the extent to which these seem relevant to the empirical work o...
Conference Paper
In many domains, data are distributed among datasets that share only some variables; other recorded variables may occur in only one dataset. While there are asymptotically correct, informative algorithms for discovering causal relationships from a single dataset, even with missing values and hidden variables, there have been no such reliable p...
Article
In everyday matters, as well as in law, we allow that someone's reasons can be causes of her actions, and often are. That correct reasoning accords with Bayesian principles is now so widely held in philosophy, psychology, computer science and elsewhere that the contrary is beginning to seem obtuse, or at best quaint. And that rational agents should...
Article
The conditional intervention principle is a formal principle that relates patterns of interventions and outcomes to causal structure. It is a central assumption of experimental design and the causal Bayes net formalism. Two studies suggest that preschoolers can use the conditional intervention principle to distinguish causal chains, common cause an...
Chapter
This book outlines the recent revolutionary work in cognitive science formulating a “probabilistic model” theory of learning and development. It provides an accessible and clear introduction to probabilistic modeling in psychology, including causal model, Bayes net, and Bayesian approaches. It also outlines new cognitive and developmental psych...
Article
Full-text available
We discuss our concerns regarding the reliability of data generated by spotted cDNA microarrays. Two types of error we highlight are cross-hybridization artifact due to sequence homologies and sequence errors in the cDNA used for spotting on microarrays. We feel that statisticians who analyze microarray data should be aware of these sources of unre...
Chapter
Statistical inference turns on trade-offs among conflicting assumptions that provide stronger or weaker guarantees of convergence to the truth, and stronger or weaker measures of uncertainty of inference. In applied statistics—social statistics, epidemiology, economics—these assumptions are often hidden within computerized data analysis procedures,...