Article

Causality: Models, reasoning, and inference, second edition

Authors: Judea Pearl

Abstract

Written by one of the preeminent researchers in the field, this book provides a comprehensive exposition of modern analysis of causation. It shows how causality has grown from a nebulous concept into a mathematical theory with significant applications in the fields of statistics, artificial intelligence, economics, philosophy, cognitive science, and the health and social sciences. Judea Pearl presents and unifies the probabilistic, manipulative, counterfactual, and structural approaches to causation and devises simple mathematical tools for studying the relationships between causal connections and statistical associations. The book will open the way for including causal analysis in the standard curricula of statistics, artificial intelligence, business, epidemiology, social sciences, and economics. Students in these fields will find natural models, simple inferential procedures, and precise mathematical definitions of causal concepts that traditional texts have evaded or made unduly complicated. The first edition of Causality has led to a paradigmatic change in the way that causality is treated in statistics, philosophy, computer science, social science, and economics. Cited in more than 5,000 scientific publications, it continues to liberate scientists from the traditional molds of statistical thinking. In this revised edition, Judea Pearl elucidates thorny issues, answers readers’ questions, and offers a panoramic view of recent advances in this field of research. Causality will be of interest to students and professionals in a wide variety of fields. Anyone who wishes to elucidate meaningful relationships from data, predict effects of actions and policies, assess explanations of reported events, or form theories of causal understanding and causal speech will find this book stimulating and invaluable.

... Further, we allow for the existence of unobserved variables H ∈ R^m (also known as hidden confounders) affecting both X and Y. Due to the hidden confounders, the observed relationship between X and Y is prone to be biased, even when the sample size approaches infinity [Pearl, 2009]. The instrumental variable methods share the idea of exploiting the existence of exogenous heterogeneity (the instrument Z) to consistently estimate the causal function in the presence of unmeasured confounders. ...
... The IV method requires the instrument Z ∈ R^q to meet the following assumptions [Pearl, 2009], whose precise formulation depends on the specific methodological framework: (A1) relevance: the instrument Z is not independent of the treatment variable X. ...
... Note that the latter assumption is not testable since H is unobserved, and must therefore be made based on scientific considerations and expert knowledge. It is common to assume that the data generating process follows a structural causal model (SCM) [Pearl, 2009], implying that the data distribution is Markovian with respect to the induced graph. ...
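For readers unfamiliar with the IV idea referenced in these excerpts, here is a minimal sketch under an assumed linear SCM with one instrument and one hidden confounder (all variable names and coefficients are illustrative, not from the cited preprint): the naive regression is biased by the confounder, while the instrument-based (Wald/two-stage) estimate recovers the assumed effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative linear SCM with a hidden confounder H and an instrument Z.
H = rng.normal(size=n)                          # hidden confounder (unobserved)
Z = rng.normal(size=n)                          # instrument: affects X, not Y directly
X = 1.5 * Z + 2.0 * H + rng.normal(size=n)      # treatment
Y = 3.0 * X + 4.0 * H + rng.normal(size=n)      # outcome; assumed causal effect = 3.0

# Naive OLS of Y on X absorbs the confounder and is biased.
beta_ols = np.polyfit(X, Y, 1)[0]

# Single-instrument IV (Wald) estimate: cov(Z, Y) / cov(Z, X).
beta_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]

print(f"naive OLS estimate: {beta_ols:.2f}")    # noticeably above 3.0
print(f"IV estimate:        {beta_iv:.2f}")     # close to 3.0
```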
Preprint
Full-text available
The instrumental variable (IV) approach is commonly used to infer causal effects in the presence of unmeasured confounding. Conventional IV models commonly make the additive noise assumption, which is hard to ensure in practice, but also typically lack flexibility if the causal effects are complex. Further, the vast majority of the existing methods aim to estimate the mean causal effects only, while a few others focus on quantile effects. This work aims for estimation of the entire interventional distribution. We propose a novel method called distributional instrumental variables (DIV), which leverages generative modelling in a nonlinear instrumental variable setting. We establish identifiability of the interventional distribution under general assumptions and demonstrate an 'under-identified' case where DIV can identify the causal effects while two-step least squares fails to. Our empirical results show that the DIV method performs well for a broad range of simulated data, exhibiting advantages over existing IV approaches in terms of the identifiability and estimation error of the mean or quantile treatment effects. Furthermore, we apply DIV to an economic data set to examine the causal relation between institutional quality and economic development, and our results closely align with the original study. We also apply DIV to a single-cell data set, where we study the generalizability and stability in predicting gene expression under unseen interventions. The software implementations of DIV are available in R and Python.
... Beyond contextual bandits, the variance reduction properties of the MR estimator make it highly useful in a wide variety of other applications. Here, we show one such application in the field of causal inference, where MR can be used for the estimation of average treatment effect (ATE) [Pearl, 2009] and leads to some desirable properties in comparison to the conventional ATE estimation approaches. Specifically, we illustrate that the MR estimator for ATE utilizes the evaluation data D more efficiently and achieves lower variance than state-of-the-art ATE estimators and consequently provides more accurate ATE estimates. ...
... We formulate twin assessment as a problem of causal inference [Rubin, 1974; Pearl, 2009; Hernán and Robins, 2020]. In particular, we consider a twin to be accurate if it correctly captures the behaviour of a real-world process of interest in response to certain interventions, rather than the behaviour of the process as it evolves on its own. ...
... We begin by providing a causal model of the real-world process that the twin is designed to simulate. We do so in the language of potential outcomes [Rubin, 1974, 2005], although we note that we could have used the alternative framework of directed acyclic graphs and structural causal models [Pearl, 2009] (see also Imbens [2020] for a comparison of the two). We assume the real-world process operates over a fixed time horizon T ∈ {1, 2, . . ...
Preprint
Full-text available
Off-policy evaluation (OPE) is a critical challenge in robust decision-making that seeks to assess the performance of a new policy using data collected under a different policy. However, the existing OPE methodologies suffer from several limitations arising from statistical uncertainty as well as causal considerations. In this thesis, we address these limitations by presenting three different works. Firstly, we consider the problem of high variance in the importance-sampling-based OPE estimators. We introduce the Marginal Ratio (MR) estimator, a novel OPE method that reduces variance by focusing on the marginal distribution of outcomes rather than direct policy shifts, improving robustness in contextual bandits. Next, we propose Conformal Off-Policy Prediction (COPP), a principled approach for uncertainty quantification in OPE that provides finite-sample predictive intervals, ensuring robust decision-making in risk-sensitive applications. Finally, we address causal unidentifiability in off-policy decision-making by developing novel bounds for sequential decision settings, which remain valid under arbitrary unmeasured confounding. We apply these bounds to assess the reliability of digital twin models, introducing a falsification framework to identify scenarios where model predictions diverge from real-world behaviour. Our contributions provide new insights into robust decision-making under uncertainty and establish principled methods for evaluating policies in both static and dynamic settings.
... To achieve this goal, the framework of causal models was formulated for rigorously connecting cause-effect relations and observable data between random variables. Originating in the context of classical statistics, this formalism has found diverse applications in data-driven fields such as machine learning, economics, biological systems and medical trials [RCLH21,KH11,Pea09,Spi05,PL14,AHK20,LKK21]. ...
... Classical causal modeling frameworks involve a representation through directed graphs, whose vertices are associated with random variables and edges with causal relations, and incorporate causal mechanisms, such as functional dependencies between variables. More specifically, these are called functional causal models (fCMs), as well as structural equation/causal models. The majority of the causal modeling literature focuses on acyclic graphs, where there exists a well-defined probability rule to evaluate correlations over observed variables for all causal mechanisms, and where crucial graph-theoretic properties, such as the d-separation theorem [VP90, GVP90, Pea09], hold. This theorem is a cornerstone for causal discovery and inference methods as it allows one to tightly relate observed conditional independences to the connectivity of the causal graph. ...
... In the special case of functional models on acyclic graphs, the probability rule is given as follows. Note that this definition is standard in the literature [Pea09,Spi05] ...
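To make the d-separation logic mentioned in these excerpts concrete, the following toy simulation (an illustrative assumption, not taken from the cited work) uses the collider graph X → W ← Y: X and Y are d-separated marginally, but conditioning on the collider W opens the path and induces dependence.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Toy acyclic fCM: X and Y are independent causes of the collider W.
X = rng.normal(size=n)
Y = rng.normal(size=n)
W = X + Y + 0.1 * rng.normal(size=n)

# Marginally, X and Y are d-separated (no open path), so correlation ~ 0.
print(f"corr(X, Y)         = {np.corrcoef(X, Y)[0, 1]: .3f}")

# Conditioning on the collider W opens the path X -> W <- Y:
# within a narrow slice of W, X and Y become strongly (negatively) correlated.
mask = np.abs(W) < 0.1
print(f"corr(X, Y | W ~ 0) = {np.corrcoef(X[mask], Y[mask])[0, 1]: .3f}")
```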
Preprint
Full-text available
Functional causal models (fCMs) specify functional dependencies between random variables associated to the vertices of a graph. In directed acyclic graphs (DAGs), fCMs are well-understood: a unique probability distribution on the random variables can be easily specified, and a crucial graph-separation result called the d-separation theorem allows one to characterize conditional independences between the variables. However, fCMs on cyclic graphs pose challenges due to the absence of a systematic way to assign a unique probability distribution to the fCM's variables, the failure of the d-separation theorem, and lack of a generalization of this theorem that is applicable to all consistent cyclic fCMs. In this work, we develop a causal modeling framework applicable to all cyclic fCMs involving finite-cardinality variables, except inconsistent ones admitting no solutions. Our probability rule assigns a unique distribution even to non-uniquely solvable cyclic fCMs and reduces to the known rule for uniquely solvable fCMs. We identify a class of fCMs, called averagely uniquely solvable, that we show to be the largest class where the probabilities admit a Markov factorization. Furthermore, we introduce a new graph-separation property, p-separation, and prove this to be sound and complete for all consistent finite-cardinality cyclic fCMs while recovering the d-separation theorem for DAGs. These results are obtained by considering classical post-selected teleportation protocols inspired by analogous protocols in quantum information theory. We discuss further avenues for exploration, linking in particular problems in cyclic fCMs and in quantum causality.
... Causal models provide a rigorous framework for describing correlations arising from a causal structure and identifying which structures are compatible with observed data. Classical causal modelling [Pea09, SGS93] captures causal relationships among random variables and has been widely applied across fields such as machine learning, economics, and clinical trials [RCLH21,KH11,Pea09,Spi05,PL14,AHK20,LKK21]. However, as demonstrated by Bell's theorem [Bel64], this classical framework cannot account for quantum correlations without invoking fine-tuned mechanisms or modifications to the causal structure naturally associated with a Bell scenario [WS15]. ...
... The classical and non-classical causal modelling literature has predominantly focused on acyclic graphs, where there exists a well-defined probability rule for deriving correlations from causal mechanisms [Pea09,HLP14,BLO20]. Moreover, a foundational and powerful result in acyclic causal models (both classical and non-classical) is the d-separation theorem [VP90,GVP90,HLP14], proving the soundness and completeness of a central graph-theoretic notion, d-separation [Pea09,Spi05]. ...
Preprint
Full-text available
Causal modelling frameworks link observable correlations to causal explanations, which is a crucial aspect of science. These models represent causal relationships through directed graphs, with vertices and edges denoting systems and transformations within a theory. Most studies focus on acyclic causal graphs, where well-defined probability rules and powerful graph-theoretic properties like the d-separation theorem apply. However, understanding complex feedback processes and exotic fundamental scenarios with causal loops requires cyclic causal models, where such results do not generally hold. While progress has been made in classical cyclic causal models, challenges remain in uniquely fixing probability distributions and identifying graph-separation properties applicable in general cyclic models. In cyclic quantum scenarios, existing frameworks have focussed on a subset of possible cyclic causal scenarios, with graph-separation properties yet unexplored. This work proposes a framework applicable to all consistent quantum and classical cyclic causal models on finite-dimensional systems. We address these challenges by introducing a robust probability rule and a novel graph-separation property, p-separation, which we prove to be sound and complete for all such models. Our approach maps cyclic causal models to acyclic ones with post-selection, leveraging the post-selected quantum teleportation protocol. We characterize these protocols and their success probabilities along the way. We also establish connections between this formalism and other classical and quantum frameworks to inform a more unified perspective on causality. This provides a foundation for more general cyclic causal discovery algorithms and to systematically extend open problems and techniques from acyclic informational networks (e.g., certification of non-classicality) to cyclic causal structures and networks.
... Relationships between variables can be depicted, using Pearl's approach [5,6], as a time series graph. Each dynamic variable at a specific time is represented by a node and an arrow from one node to another represents a direct causal link. ...
... We generalize the principle of Eq. (2) by stating that X_α is a direct or indirect cause of X_β when the α-component of ŵ_β, denoted by ŵ^α_β, is nonzero. We repeat the optimization in Eq. (5) for several values of τ between 1 and τ_max, where τ_max is a hyperparameter which, in applications, we take to be of the order of the autocorrelation time of X_β. As depicted in Fig. 1 ...
... Using the graphical criterion of d-separation [5,9], conditional independencies of the form given in Eq. (S4) can be directly inferred from the structure of the time series graph. All the methods discussed in the following paragraphs are also based on the assumption that every measurable conditional independence corresponds to d-separation among the variables in the graph. ...
Preprint
Full-text available
Understanding which parts of a dynamical system cause each other is extremely relevant in fundamental and applied sciences. However, inferring causal links from observational data, namely without direct manipulations of the system, is still computationally challenging, especially if the data are high-dimensional. In this study we introduce a framework for constructing causal graphs from high-dimensional time series, whose computational cost scales linearly with the number of variables. The approach is based on the automatic identification of dynamical communities, groups of variables which mutually influence each other and can therefore be described as a single node in a causal graph. These communities are efficiently identified by optimizing the Information Imbalance, a statistical quantity that assigns a weight to each putative causal variable based on its information content relative to a target variable. The communities are then ordered starting from the fully autonomous ones, whose evolution is independent from all the others, to those that are progressively dependent on other communities, building in this manner a community causal graph. We demonstrate the computational efficiency and the accuracy of our approach on time-discrete and time-continuous dynamical systems including up to 80 variables.
... Two important features of DAGs that can inform causal reasoning (compared to other graphical models such as Gaussian graphical models) are: (a) edges (i.e., arrows) are "directed," so a directed edge indicates that variable X has a direct causal effect on variable Y; and (b) since the future cannot cause the past (VanderWeele, 2020), the graph is "acyclic," so the variable X_t (at time t) has a direct causal effect on the variable Y_{t+1} (at time t + 1) but the variable Y_{t+1} cannot have a direct causal effect on the variable X_t (for an introduction to DAGs, please refer to Pearl, 2009; Rohrer, 2018). Furthermore, the acyclicity also requires that there are no paths along the directed edges from a node to itself (Hernán & Robins, 2020). ...
... There are certain limitations regarding establishing the DAG theoretically, relying solely on expert knowledge. First, establishing the DAG is notably challenging due to the large number of possible causal structures (i.e., all causal effects established between a set of variables; Pearl, 2009). For example, when 10 variables are evaluated in a study, there are 4.2 × 10^18 possible DAGs to be considered so that one can be chosen. ...
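The figure of roughly 4.2 × 10^18 DAGs over 10 variables quoted above can be reproduced with Robinson's recurrence for counting labeled DAGs, a_n = Σ_{k=1}^{n} (-1)^{k+1} C(n,k) 2^{k(n-k)} a_{n-k} with a_0 = 1; the short script below is an illustrative check, not code from the cited study.

```python
from math import comb

def num_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    a = [1]  # a_0 = 1
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

print(num_dags(10))   # 4175098976430598143, i.e. about 4.2e18
```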
Article
Full-text available
Adolescence is a period in which peer problems and emotional symptoms markedly increase in prevalence. However, the causal mechanisms regarding how peer problems cause emotional symptoms at a behavioral level and vice versa remain unknown. To address this gap, the present study investigated the longitudinal network of peer problems and emotional symptoms among Australian adolescents aged 12–14 years. Data were from the Longitudinal Study of Australian Children. The complete case samples included adolescents who participated in the Longitudinal Study of Australian Children B (n = 2,694) or K (n = 3,144) Cohorts at two study follow-ups (ages 12 and 14). Peer problems and emotional symptoms were measured with the self-report Strengths and Difficulties Questionnaire. The analytical steps were (a) in Study 1, a causal discovery algorithm, Bayesian structure learning of directed acyclic graphs (DAGs), was used to identify the longitudinal network in the K Cohort; (b) the DAG discovered was evaluated with Bayesian structural equation modeling in an independent sample (the B Cohort) and compared against a DAG established through expert knowledge; and (c) in Study 2, the longitudinal network was again evaluated but considered contemporaneous effects. The empirically discovered DAG provided a better explanation of independent data than the expert DAG. Based on the discovered DAG, several plausible causal effects were identified such as that being bullied at age 12 negatively affected popularity at age 14. This study provides new insights into potential causal effects established between peer problems and emotional symptoms among Australian adolescents aged 12–14 years.
... In fact, this relationship is a very simple one: it can be seen as a counterfactual! The "why," "what if," and other such "wh"-questions can be interpreted as counterfactuals [31]. In the static setting, we are simply interested in an alternative world where some properties are different. ...
Preprint
There has been considerable recent interest in explainability in AI, especially with black-box machine learning models. As correctly observed by the planning community, when the application at hand is not a single-shot decision or prediction, but a sequence of actions that depend on observations, a richer notion of explanation is desirable. In this paper, we look to provide a formal account of "counterfactual explanations," expressed in terms of action sequences. We then show that this naturally leads to an account of model reconciliation, which might take the form of the user correcting the agent's model, or suggesting actions to the agent's plan. For this, we will need to articulate what is true versus what is known, and we appeal to a modal fragment of the situation calculus to formalise these intuitions. We consider various settings: the agent knowing partial truths, weakened truths and having false beliefs, and show that our definitions easily generalize to these different settings.
... The tasks of quantifying and detecting conditional dependence underlie a wide range of statistical problems, including limit theorems, Markov chain theory, sufficiency, and causality [76], among others. Conditional independence also plays a pivotal role in graphical models [77], causal inference [78], and artificial intelligence [79]; see also [80] for more recent developments. The notion of treating conditional independence as an abstract concept with a dedicated calculus was first introduced by [76], who demonstrated that many results and theorems related to statistical ideas (such as ancillarity, sufficiency, and causality) can be viewed as specific applications of the fundamental properties of conditional independence, extended to include both stochastic and non-stochastic variables. ...
Article
Full-text available
The main aim of this paper is to improve the existing limit theorems for set-indexed conditional empirical processes involving functional strong mixing random variables. To achieve this, we propose using the k-nearest neighbor approach to estimate the regression function, as opposed to the traditional kernel method. For the first time, we establish the weak consistency, asymptotic normality, and density of the proposed estimator. Our results are derived under certain assumptions about the richness of the index class C , specifically in terms of metric entropy with bracketing. This work builds upon our previous papers, which focused on the technical performance of empirical process methodologies, and further refines the prior estimator. We highlight that the k-nearest neighbor method outperforms the classical approach due to several advantages.
Article
Real‐world data (RWD) and real‐world evidence (RWE) have been increasingly used in medical product development and regulatory decision‐making, especially for rare diseases. After outlining the challenges and possible strategies to address the challenges in rare disease drug development (see the accompanying paper), the Real‐World Evidence (RWE) Scientific Working Group of the American Statistical Association Biopharmaceutical Section reviews the roles of RWD and RWE in clinical trials for drugs treating rare diseases. This paper summarizes relevant guidance documents and frameworks by selected regulatory agencies and the current practice on the use of RWD and RWE in natural history studies and the design, conduct, and analysis of rare disease clinical trials. A targeted learning roadmap for rare disease trials is described, followed by case studies on the use of RWD and RWE to support a natural history study and marketing applications in various settings.
Article
Full-text available
The sex of children has frequently been used in the field of social sciences to conduct natural experiments. The key hypothesis behind this methodology is that the sex of the firstborn (or the first k births) is an exogenous variable, meaning it is not influenced by family characteristics observed before the birth of the children. Recent analyses have questioned the supposed exogeneity of the sex of children, proposing that stress experienced during pregnancy results in higher male embryo mortality, thereby leading to a higher probability of female births. This hypothesis casts doubt on the results reached by studies that have used the sex of children to conduct natural experiments. However, the analyses supporting this hypothesis have not properly considered the problems arising from stopping rules, specifically the tendency of some families to continue having children until a child of the desired sex is born. In this work, we show, using an indirect approach, that if stopping rules are properly taken into account, the sex of offspring is not associated with parental stress.
Article
Effective management of a firm’s operating cash flow is essential for supporting growth, servicing debt, and maintaining overall financial health. Mismanagement of cash flows can result in severe liquidity challenges and even business failure. However, managing operating cash flow is complex because of its intricate, endogenous relationships with operational variables, like sales, operating costs, inventory, payables, and the impact of exogenous macroeconomic factors on a firm. In this paper, we present a structural model of operating cash flow that untangles this endogeneity, allows us to estimate causal relationships among these variables, and provides a valuable tool for evaluating cash flow management policies. Applying our model to quarterly financial data from S&P’s Compustat database spanning from 1990 to 2020 along with macroeconomic indicators, we provide empirical evidence of the endogenous nature of cash flow with other operational variables. We then showcase the practical value of our model by (i) identifying the characteristics of structural shocks and the new equilibria they induce within the system; (ii) offering a tool for evaluating alternative managerial actions or policy decisions to counteract these shocks; (iii) predicting the impacts of macroeconomic events, such as global recessions and fluctuations in economic sentiment, on firm performance; and (iv) demonstrating superior forecasting performance compared with traditional univariate models. In summary, our structural model of operating cash flow enhances our understanding of its dynamics, enabling better-informed decision making and more effective cash flow management in firms. This paper was accepted by David Simchi Levi, operations management. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2021.03790 .
Article
Full-text available
Many scientific datasets are compositional in nature. Important biological examples include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. First, we crisply articulate potential pitfalls for practitioners regarding the interpretation of compositional causes from the viewpoint of interventions and warn against attributing causal meaning to common summary statistics such as diversity indices in microbiome data analysis. We then advocate for and develop multivariate methods using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account while still yielding scientifically interpretable results. In a comparative analysis on synthetic and real microbiome data we show the advantages and limitations of our proposal. We posit that our analysis provides a useful framework and guidance for valid and informative cause-effect estimation in the context of compositional data.
Article
Full-text available
The study of causal inference has gained significant attention in artificial intelligence (AI) and machine learning (ML), particularly in areas such as explainability, automated diagnostics, reinforcement learning, and transfer learning. This research applies causal inference techniques to analyze student placement data, aiming to establish cause-and-effect relationships rather than mere correlations. Using the DoWhy Python library, the study follows a structured four-step approach—Modeling, Identification, Estimation, and Refutation—and introduces a novel 3D framework (Data Correlation, Causal Discovery, and Domain Knowledge) to enhance causal modeling reliability. Causal discovery algorithms, including Peter Clark (PC), Greedy Equivalence Search (GES), and Linear Non-Gaussian Acyclic Model (LiNGAM), are applied to construct and validate a robust causal model. Results indicate that internships (0.155) and academic branch selection (0.148) are the most influential factors in student placements, while CGPA (0.042), projects (0.035), and employability skills (0.016) have moderate effects, and extracurricular activities (0.004) and MOOC courses (0.012) exhibit minimal impact. This research underscores the significance of causal reasoning in higher education analytics and highlights the effectiveness of causal ML techniques in real-world decision-making. Future work may explore larger datasets, integrate additional educational variables, and extend this approach to other academic disciplines for broader applicability.
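The four-step DoWhy workflow mentioned above can be sketched roughly as follows. The toy data, the variable names (internship, cgpa, placement), the causal graph, and the chosen estimation and refutation methods are assumptions for illustration only, not the study's actual pipeline.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Hypothetical placement data: internship and CGPA both influence placement.
rng = np.random.default_rng(0)
n = 2_000
cgpa = rng.normal(7.0, 1.0, n)
internship = rng.binomial(1, 0.3 + 0.05 * (cgpa - 7.0).clip(-2, 2))
placement = rng.binomial(1, 0.2 + 0.15 * internship + 0.05 * (cgpa - 7.0).clip(-2, 2))
df = pd.DataFrame({"internship": internship, "cgpa": cgpa, "placement": placement})

# 1. Modeling: encode domain knowledge as a causal graph (DOT-format string).
model = CausalModel(
    data=df,
    treatment="internship",
    outcome="placement",
    graph="digraph {cgpa -> internship; cgpa -> placement; internship -> placement;}",
)

# 2. Identification: derive an estimand (here, a back-door adjustment on cgpa).
estimand = model.identify_effect()

# 3. Estimation: fit the identified estimand.
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)

# 4. Refutation: stress-test the estimate, e.g. by adding a random common cause.
refutation = model.refute_estimate(estimand, estimate, method_name="random_common_cause")
print(refutation)
```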
Chapter
Full-text available
Job polarization is a phenomenon observed in developed countries. According to this hypothesis, there is a greater demand for work involving abstract and manual tasks, while routine tasks are displaced due to technological substitution. This research tests the hypothesis that in the case of Mexico, this phenomenon does not occur because the technology used is complementary to routine tasks rather than a substitute. The objective of this study is to analyze the composition of routine employment. The first methodological approach involves describing four-digit occupations that perform routine tasks and examining their growth during the studied period. Subsequently, a shift-share analysis is conducted to observe the growth of these occupations by industry. To achieve these objectives, data from the National Survey of Occupation and Employment (ENOE, for its acronym in Spanish) for the period 2005–2020 is utilized. The main results demonstrate that occupations such as industrial machinery operators, assemblers, drivers, and transportation drivers, which are classified as routine tasks, have experienced significant growth. This contradicts what is observed in developed countries, supporting the existence of a different hypothesis. It is concluded that characteristics of the Mexican labor market, such as informal employment, precarious jobs, technological dependence, and an increase in the supply of professionals, may contribute to a routine-oriented labor market.
Article
Full-text available
Objective This study aimed to explore the complex relationships between self‐identity, affective style, emotion regulation, and intolerance of uncertainty (IU) in predicting anxiety. A model was proposed to integrate these factors, investigating their combined influence on anxiety. Method Involving 608 university students who completed self‐report measures of self‐identity, affective style, emotion regulation, IU, and anxiety. Network analysis and Bayesian network modeling were used to identify direct and mediating effects among these variables. Results Network analysis revealed that self‐identity, affective style, and IU directly predicted trait anxiety, with adjusting affective style emerging as a central factor. Bayesian network modeling further showed that IU and affective style mediated the impact of self‐identity on anxiety. Notably, emotion regulation did not mediate the relationship between affective style and anxiety, suggesting a possible spurious correlation. The model achieved a predictive accuracy of 90.13% for trait anxiety and 88.49% for state anxiety. Conclusion The findings highlight the central role of self‐identity in anxiety interventions, while also emphasizing the importance of addressing affective styles and IU. The results suggest that emotion regulation strategies alone may not directly reduce anxiety, indicating a need for more comprehensive clinical approaches.
Article
Full-text available
The urban digital twin (UDT) is derived from the original digital-twin concept of a representation of physical assets. This has left the social component of the city underrepresented in UDTs. Here, we discuss what this means for the current maturity stage of UDTs and why better representing human behaviour in UDTs may diversify possibilities to support different types of planning. We contemplate operationalizing the representation of human behaviour by means of agent-based models (ABMs) integrated with UDTs and illustrate this with two concrete examples of simulating stress and safety perception in public spaces. One example shows the idea of the UDT as a live data repository for ABMs, with the ABM adding dynamism, and the other of live feedback between the city, the ABM and UDT. We discuss several epistemological, conceptual, technical, and ethical challenges that may be involved in this integration. We conclude with a future agenda to promote (1) the abandonment of the vision of a UDT as the highly detailed mirror of the city, (2) UDTs fit for sectoral (strategic) in addition to operational planning, (3) the inclusion of behavioural and social processes in UDTs by incorporating ABMs, (4) a culture of cumulative research using structured guided frameworks and reusable building blocks, (5) ABMs with explicit purposes to allow fit-for-purpose selection in UDTs, and (6) explicitly addressing epistemic, normative, and moral responsibilities. Thus, though including agents may at some point be a solution for the (currently lacking) perspective on the role of humans in shaping and being shaped by the city, several reconsiderations in the UDT and ABM communities need to take place first.
Article
Full-text available
Investigating causal interactions between entities is a crucial task across various scientific domains. The traditional causal discovery methods often assume a predetermined causal direction, which is problematic when prior knowledge is insufficient. Identifying causal directions from observational data remains a key challenge. Causal discovery typically relies on two priors: the uniform prior and the Solomonoff prior. The Solomonoff prior theoretically outperforms the uniform prior in determining causal directions in bivariate scenarios by using the causal independence mechanism assumption. However, this approach has two main issues: it assumes that no unobserved variables affect the outcome, leading to method failure if violated, and it relies on the uncomputable Kolmogorov complexity (KC). In addition, we employ Kolmogorov’s structure function to analyze the use of the minimum description length (MDL) as an approximation for KC, which shows that the function class used for computing the MDL introduces prior biases, increasing the risk of misclassification. Inspired by the insufficient but necessary part of an unnecessary but sufficient condition (INUS condition), we propose an asymmetry where the expected complexity change in the cause, due to changes in the effect, is greater than the reverse. This criterion supplements the causal independence mechanism when its restrictive conditions are not met under the Solomonoff prior. To mitigate prior bias and reduce misclassification risk, we introduce a multilayer perceptron based on the universal approximation theorem as the backbone network, enhancing method stability. Our approach demonstrates a competitive performance against the SOTA methods on the TCEP real dataset. Additionally, the results on synthetic datasets show that our method maintains stability across various data generation mechanisms and noise distributions. This work advances causal direction determination research by addressing the limitations of the existing methods and offering a more robust and stable approach.
Article
Summary‐data Mendelian randomization (MR), a widely used approach in causal inference, has recently attracted attention for improving causal mediation analysis. Two existing methods corresponding to the difference method and product method of linear mediation analysis have been developed to perform MR‐based mediation analysis using the inverse‐variance weighted method (MR‐IVW). Despite these developments, there is still a need for more rigorous, efficient, and precise MR‐based mediation methodologies. In this study, we develop summary‐data MR‐based frameworks for causal mediation analysis. We improve the accuracy, statistical efficiency, and robustness of the existing MR‐based mediation analysis by implementing novel variance estimators for the mediation effects, deriving rigorous procedures for statistical inference, and accounting for widespread pleiotropic effects. Specifically, we propose Diff‐IVW and Prod‐IVW to improve upon the existing methods and provide the pleiotropy‐robust methods (Diff‐Egger, Diff‐Median, Prod‐Egger, and Prod‐Median), adapted from MR‐Egger and MR‐Median, to enhance the robustness of the MR‐based mediation analysis. We conduct comprehensive simulation studies to compare the existing and proposed methods. The results show that the proposed methods, Diff‐IVW and Prod‐IVW, improve statistical efficiency and type I error control over the existing approaches. Although all IVW‐based methods suffer from directional pleiotropy biases, the median‐based methods (Diff‐Median and Prod‐Median) can mitigate such biases. The differences among the methods can lead to discrepant statistical conclusions, as demonstrated in real data applications. Based on our simulation results, we recommend the three proposed methods in practice: Diff‐IVW, Prod‐IVW, and Prod‐Median, which are complementary under various scenarios.
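For orientation, the difference and product methods referred to above correspond to the following standard linear mediation algebra (a generic summary, not the paper's MR-specific estimators), with exposure X, mediator M, and outcome Y:

```latex
% Linear mediation model, no exposure--mediator interaction:
\begin{align*}
  M &= \alpha_0 + \alpha_1 X + \varepsilon_M \\
  Y &= \beta_0 + \beta_1 X + \beta_2 M + \varepsilon_Y \\
  Y &= \gamma_0 + \gamma_1 X + \varepsilon'_Y \qquad \text{(total-effect regression)} \\[4pt]
  \text{product method:} \quad \text{indirect effect} &= \alpha_1 \beta_2, \\
  \text{difference method:} \quad \text{indirect effect} &= \gamma_1 - \beta_1 .
\end{align*}
% In the linear, no-interaction case the two coincide; MR-based versions
% replace the regression coefficients with instrumental-variable (e.g., IVW) estimates.
```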
Article
Full-text available
open access: https://www.tandfonline.com/doi/full/10.1080/1091367X.2025.2462661 The purpose of this study was to improve measurement of physical activity self-efficacy (PASE) in adults with obesity by evaluating relationships between responses to a new battery of PASE Scales and physical activity (PA). Specifically, relationships between a recently proposed two-dimensional (shorter and longer durations) latent PASE conceptualization and PA was evaluated across intensities (moderate, vigorous) and domains (work, transport, domestic, leisure) measured by the PASE Scales. Longitudinal secondary data (N = 461) from the Well-Being and Physical Activity (ClinicalTrials.gov, identifier: NCT03194854) study, which deployed the Fun For Wellness (FFW) intervention, were analyzed. A positive direct effect (latent PASE→PA) was observed for the shorter (but not the longer) duration in each domain at both intensities. A positive indirect effect (FFW→latent PASE→PA) was observed for the shorter (but not the longer) duration in each domain at a moderate (but not vigorous) intensity. These results have implications for PA-promotion in this at-risk population.
Article
Visual knowledge is a new form of knowledge representation that can encapsulate visual concepts and their relations in a succinct, comprehensive, and interpretable manner, with a deep root in cognitive psychology. As the knowledge of the visual world has been identified as an indispensable component of human cognition and intelligence, visual knowledge is poised to have a pivotal role in establishing machine intelligence. With the recent advance of artificial intelligence (AI) techniques, large AI models (or foundation models) have emerged as a potent tool capable of extracting versatile patterns from broad data as implicit knowledge, and abstracting them into an outrageous amount of numeric parameters. To pave the way for creating visual knowledge empowered AI machines in this coming wave, we present a timely review that investigates the origins and development of visual knowledge in the pre-big-model era, and accentuates the opportunities and unique role of visual knowledge in the big model era.
Conference Paper
Reasoning about cause-and-effect relations is a recurring step in the human decision-making process. It is not uncommon for us to face problems that are consequences of other problems and that can only be considered solved once their causes are solved as well. In Computer Science, several instances of this scenario can be identified. Although existing proposals are able to resolve consequences, as well as adequately address causes, reasoning about causal relations is implemented manually and in an application-specific manner. In this paper, we discuss the use of software agents as a possible alternative for handling such a scenario, and we present a proposal for a domain-independent solution using the BDI (Belief-Desire-Intention) architecture.
Chapter
In natural language, conditionals are frequently used for giving explanations. Thus the antecedent of a conditional is typically understood as being connected to, being relevant for, or providing evidential support for the conditional’s consequent. This aspect has not been adequately mirrored by the logics that are usually offered for the reasoning with conditionals: neither in the logic of the material conditional or the strict conditional, nor in the plethora of logics for suppositional conditionals that have been produced over the past 50 years. In this paper I survey some recent attempts to come to terms with the problem of encoding evidential support or relevance in the logic of conditionals. I present models in a qualitative-modal and in a quantitative-probabilistic setting. Focusing on some particular examples, I show that no perfect match between the two kinds of settings has been achieved yet.
Book
Causation in Physics demonstrates the importance of causation in the physical world. It details why causal mastery of natural phenomena is an important part of the effective strategies of experimental physicists. It develops three novel arguments for the viewpoint that causation is indispensable to the ontology of some of our best physical theories. All three arguments make much of the successes of experimental physics. This title is also available as Open Access on Cambridge Core.
Article
The existing stance detection methods have several limitations. (1) They utilize additional information such as sentiment or linguistic information to construct a multitask framework for achieving performance bottleneck breakthroughs and only use the last encoder layers of their language models for semantic encoding. (2) They may establish spurious correlations between the input and labels based on their statistical dependence, causing the produced text-target representations to contain more biased stance-unrelated noncausal features; these features are used as shortcuts for conducting stance prediction without considering the underlying causal mechanisms, resulting in a performance bottleneck. In this paper, we propose a novel multitask causality-inspired feature enhancement (MTCIFE) method for stance detection, achieving a performance breakthrough based on an original dataset without incorporating additional information. In MTCIFE, we propose two different stance detection tasks, an auxiliary stance detection task (ASDT) and a main stance detection task (MSDT), in which we construct the relation between text and a target in different ways by using different transformer encoder layers of bidirectional encoder representations from transformers (BERT); this augments the diversity of the obtained representations and makes them learn from each other. Then, we explicitly decouple the representations of the stance-related causal and stance-unrelated noncausal features and encourage their independence in both the ASDT and MSDT. Considering the underlying causal mechanisms, we propose a causality-inspired feature enhancement (CIFE) module for implementing causal learning and intervention at the feature level. By integrating the CIFE module into both the ASDT and MSDT via a multitask learning paradigm, we aim to learn the representations of stance-related causal and stance-unrelated noncausal features in a decoupled manner to acquire better causal features while mitigating the confounding effect of noncausal features. Extensive experiments conducted on a stance detection benchmark dataset demonstrate the outstanding performance of our model over other state-of-the-art approaches.
Article
Instrumental variables (IV) analysis is a powerful, but fragile, tool for drawing causal inferences from observational data. Sociologists increasingly turn to this strategy in settings where unmeasured confounding between the treatment and outcome is likely. This paper reviews the assumptions required for IV and the consequences of violating them, focusing on sociological applications. We highlight three methodological problems IV faces: (i) identification bias, an asymptotic bias from assumption violations; (ii) estimation bias, a finite-sample bias that persists even when assumptions hold; and (iii) type-M error, the exaggeration of effect size given statistical significance. In each case, we emphasize how weak instruments exacerbate these problems and make results sensitive to minor violations of assumptions. We survey IV papers from top sociology journals, finding that assumptions often go unstated and robust uncertainty measures are rarely used. We provide a practical checklist to show how IV, despite its fragility, can still be useful when handled with care.
Preprint
Full-text available
There have been several efforts to improve Novelty Detection (ND) performance. However, ND methods often suffer significant performance drops under minor distribution shifts caused by changes in the environment, known as style shifts. This challenge arises from the ND setup, where the absence of out-of-distribution (OOD) samples during training causes the detector to be biased toward the dominant style features in the in-distribution (ID) data. As a result, the model mistakenly learns to correlate style with core features, using this shortcut for detection. Robust ND is crucial for real-world applications like autonomous driving and medical imaging, where test samples may have different styles than the training data. Motivated by this, we propose a robust ND method that crafts an auxiliary OOD set with style features similar to the ID set but with different core features. Then, a task-based knowledge distillation strategy is utilized to distinguish core features from style features and help our model rely on core features for discriminating crafted OOD and ID sets. We verified the effectiveness of our method through extensive experimental evaluations on several datasets, including synthetic and real-world benchmarks, against nine different ND methods.
Chapter
Full-text available
We introduce the concept of Automated Causal Discovery (AutoCD), defined as any system that aims to fully automate the application of causal discovery and causal reasoning methods. AutoCD’s goal is to deliver all causal information that an expert human analyst would provide and answer user’s causal queries. To this goal, we introduce ETIA, a system that performs dimensionality reduction, causal structure learning, and causal reasoning. We present the architecture of ETIA, benchmark its performance on synthetic data sets, and present a use case example. The system is general and can be applied to a plethora of causal discovery problems.
Article
Full-text available
This paper considers the out-of-distribution (OOD) generalization problem under the setting that both style distribution shift and spurious features exist and domain labels are missing. This setting frequently arises in real-world applications and has been overlooked because previous approaches mainly handle only one of these two factors. The critical challenge is decoupling style and spurious features in the absence of domain labels. We propose a structural causal model (SCM) for the image generation process to address this challenge, considering both style distribution shifts and spurious features. The proposed SCM enables us to design a new framework called IRSS, which can gradually separate style distribution and spurious features from images by introducing adversarial neural networks and multi-environment optimization, thus achieving OOD generalization. Moreover, it does not require additional supervision (e.g., domain labels) other than the images and their corresponding labels. Experiments on benchmark datasets demonstrate that IRSS outperforms traditional OOD methods and solves the problem of invariant risk minimization degradation, enabling the extraction of invariant features under distribution shift.
Article
Modern Configurational Comparative Methods (CCMs), such as Qualitative Comparative Analysis (QCA) and Coincidence Analysis (CNA), have gained in popularity among social scientists over the last thirty years. A new CCM called Combinational Regularity Analysis (CORA) has recently joined this family of methods. In this article, we provide a software tutorial for the open-source package CORA, which implements the eponymous method. In particular, we demonstrate how to use CORA to discover shared causes of complex effects and how to interpret corresponding solutions correctly, how to mine configurational data to identify minimum-size tuples of solution-generating inputs, and how to visualize solutions by means of logic diagrams.
Article
Full-text available
The World Meteorological Organization (WMO) has called for more meaningful warnings to help reduce the impacts of weather-related events. Impact-based forecasts and warnings (IBFW) are being developed by forecasting agencies globally to meet this call. However, there are many challenges facing those implementing such systems. The WMO World Weather Research Programme High Impact Weather project sought to understand the future direction of research on IBFW systems. This research involved a virtual workshop series in late 2022 with over 350 international registrants to identify and analyse challenges that people are facing in developing IBFW systems, and potential solutions. We found that challenges relate to ten themes, in addition to defining the measures of success of an IBFW system. Examples of key research gaps are to develop evaluation methods to explore the value of multi-hazard IBFW, in terms of collating data at appropriate scales, and including avoided losses, behavioural responses, and unconventional observations. We need to explore the value of using quantitative approaches in comparison to more efficient qualitative approaches, as well as of dynamic exposure and vulnerability data sets, and tailored warnings. We must investigate how to effectively communicate uncertainty and explore the governance of underpinning data. Further research on these topics will assist with the successful implementation of more meaningful warnings globally, whilst considering the feasibility and effectiveness of the efforts involved. This is our contribution to reducing the impacts of future hazards, at a time when climate-related events are expected to increase in severity.
Article
Full-text available
The subject of investigating causation in ecology has been widely discussed in recent years, especially by advocates of a structural causal model (SCM) approach. Some of these advocates have criticized the use of predictive models and model selection for drawing inferences about causation. We argue that the comparison of model‐based predictions with observations is a key step in hypothetico‐deductive (H‐D) science and remains a valid approach for assessing causation. We draw a distinction between two approaches to inference based on predictive modeling. The first approach is not guided by causal hypotheses and focuses on the relationship between a (typically) single response variable and a potentially large number of covariates. We agree that this approach does not yield useful inferences about causation and is primarily useful for hypothesis generation. The second approach follows a H‐D framework and is guided by specific hypotheses about causal relationships. We believe that this has been, and continues to be, a useful approach to causal inference. Here, we first define different kinds of causation, arguing that a “probability‐raisers‐of‐processes” definition is especially appropriate for many ecological systems. We outline different scientific “designs” for generating the observations used to investigate causation. We briefly outline some relevant components of the SCM and H‐D approaches to investigating causation, emphasizing a H‐D approach that focuses on modeling causal effects on vital rate (e.g., rates of survival, recruitment, local extinction, colonization) parameters underlying system dynamics. We consider criticisms of predictive modeling leveled by some SCM proponents and provide two example analyses of ecological systems that use predictive modeling and avoid these criticisms. We conclude that predictive models have been, and can continue to be, useful for providing inferences about causation.
Article
Full-text available
Racial/ethnic differences are associated with the symptoms and conditions of post-acute sequelae of SARS-CoV-2 infection (PASC) in adults. These differences may exist among children and warrant further exploration. We conducted a retrospective cohort study with difference-in-differences analyses to assess these differences in children and adolescents under the age of 21. The study utilized data from the RECOVER Initiative in the United States, which aims to learn about the long-term effects of COVID-19. The cohort included 225,723 patients with SARS-CoV-2 infection or COVID-19 diagnosis between March 2020 and October 2022. The study compared minority racial/ethnic groups to Non-Hispanic White (NHW) individuals, stratified by severity during the acute phase of COVID-19. Within the severe group, Asian American/Pacific Islanders (AAPI) had a higher prevalence of fever/chills and respiratory signs and symptoms, Hispanic patients showed greater hair loss prevalence in severe COVID-19 cases, while Non-Hispanic Black (NHB) patients had fewer skin symptoms in comparison to NHW patients. Within the non-severe group, AAPI patients had increased POTS/dysautonomia and respiratory symptoms, and NHB patients showed more cognitive symptoms than NHW patients. In conclusion, racial/ethnic differences related to COVID-19 exist among PASC symptoms and conditions in pediatrics, and these differences are associated with the severity of illness during acute COVID-19.
Book
This Element offers a concise introduction to the theory and practice of narrative creativity. It distinguishes narrative creativity from ideation, divergent thinking, design thinking, brainstorming, and other current approaches to explaining and/or cultivating creativity. It explains the biological and neuroscientific origins of narrative creativity. It provides practical exercises, developed and tested in hundreds of classrooms and businesses, and validated independently by the US Army. It details how narrative creativity contributes to technological innovation, scientific progress, cultural growth, and psychological wellbeing. It describes how narrative creativity can be assessed. This title is also available as Open Access on Cambridge Core.
Article
Full-text available
Experiments have long been the gold standard for causal inference in Ecology. As Ecology tackles progressively larger problems, however, we are moving beyond the scales at which randomised controlled experiments are feasible. To answer causal questions at scale, we need to also use observational data —something Ecologists tend to view with great scepticism. The major challenge using observational data for causal inference is confounding variables: variables affecting both a causal variable and response of interest. Unmeasured confounders—known or unknown—lead to statistical bias, creating spurious correlations and masking true causal relationships. To combat this omitted variable bias, other disciplines have developed rigorous approaches for causal inference from observational data that flexibly control for broad suites of confounding variables. We show how ecologists can harness some of these methods—causal diagrams to identify confounders coupled with nested sampling and statistical designs—to reduce risks of omitted variable bias. Using an example of estimating warming effects on snails, we show how current methods in Ecology (e.g., mixed models) produce incorrect inferences due to omitted variable bias and how alternative methods can eliminate it, improving causal inferences with weaker assumptions. Our goal is to expand tools for causal inference using observational and imperfect experimental data in Ecology.
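As a minimal illustration of the omitted-variable-bias argument above (a hypothetical data-generating process loosely echoing the warming-and-snails example, not the paper's analysis): an unmeasured driver that affects both warming exposure and snail abundance biases the naive slope, while adjusting for it, once measured, recovers the assumed effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

# Hypothetical confounder: low-elevation sites are both warmer and more
# productive, independently boosting snail abundance.
elevation = rng.normal(size=n)
warming = -0.8 * elevation + rng.normal(size=n)                      # warmer at low elevation
snails = 1.0 * warming + 2.0 * (-elevation) + rng.normal(size=n)     # assumed warming effect = 1.0

# Naive regression of snails on warming absorbs the elevation effect (biased).
naive = np.polyfit(warming, snails, 1)[0]

# Adjusting for the confounder (multiple regression) recovers ~1.0.
X = np.column_stack([warming, elevation, np.ones(n)])
adjusted = np.linalg.lstsq(X, snails, rcond=None)[0][0]

print(f"naive slope:    {naive:.2f}")     # biased upward (~2.0 here)
print(f"adjusted slope: {adjusted:.2f}")  # close to 1.0
```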
Article
Full-text available
Ecology often seeks to answer causal questions, and while ecologists have a rich history of experimental approaches, novel observational data streams and the need to apply insights across naturally occurring conditions pose opportunities and challenges. Other fields have developed causal inference approaches that can enhance and expand our ability to answer ecological causal questions using observational or experimental data. However, the lack of comprehensive resources applying causal inference to ecological settings and jargon from multiple disciplines creates barriers. We introduce approaches for causal inference, discussing the main frameworks for counterfactual causal inference, how causal inference differs from other research aims and key challenges; the application of causal inference in experimental and quasi‐experimental study designs; appropriate interpretation of the results of causal inference approaches given their assumptions and biases; foundational papers; and the data requirements and trade‐offs between internal and external validity posed by different designs. We highlight that these designs generally prioritise internal validity over generalisability. Finally, we identify opportunities and considerations for ecologists to further integrate causal inference with synthesis science and meta‐analysis and expand the spatiotemporal scales at which causal inference is possible. We advocate for ecology as a field to collectively define best practices for causal inference.
Article
Local causal structure learning (LCS) efficiently identifies a set of direct neighbors of a specified variable from observational data. Additionally, it distinguishes direct causes and direct effects of this variable without learning the entire causal structure. While many LCS algorithms have been proposed, they do not consider the data privacy-preserving problem, which has attracted extensive attention from academia and industry. To address this issue, we propose a federated local causal structure learning (FedLCS) algorithm to learn local causal structures in privacy-preserving data in a federated setting. Specifically, FedLCS introduces a layer-wise federated local skeleton learning algorithm to construct the local skeleton. Based on this skeleton, it introduces a federated local skeleton orientation algorithm and an extension-and-backtracking orientation algorithm to orient the edges. Finally, FedLCS uses a federated local extension-and-backtracking orientation algorithm to orient the remaining edges. Extensive experiments on benchmark, synthetic, and real datasets demonstrate that FedLCS can learn the local causal structure of a given variable in a federated setting.
Article
Full-text available
Representations learned by self-supervised approaches are generally considered to possess sufficient generalizability and discriminability. However, we disclose a nontrivial mutual-exclusion relationship between these critical representation properties through an exploratory demonstration on self-supervised learning. State-of-the-art self-supervised methods tend to enhance either generalizability or discriminability but not both simultaneously. Thus, learning representations jointly possessing strong generalizability and discriminability presents a specific challenge for self-supervised learning. To this end, we revisit the learning paradigm of self-supervised learning from the perspective of evolutionary game theory (EGT) and outline the theoretical roadmap to achieve a desired trade-off between these representation properties. EGT performs well in analyzing the trade-off point in a two-player game by utilizing dynamic system modeling. However, the EGT analysis requires sufficient annotated data, which contradicts the principle of self-supervised learning, i.e., the EGT analysis cannot be conducted without the annotations of the specific target domain for self-supervised learning. Thus, to enhance the methodological generalization, we propose a novel self-supervised learning method that leverages advancements in reinforcement learning to jointly benefit from the general guidance of EGT and sequentially optimize the model to chase the consistent improvement of generalizability and discriminability for specific target domains during pre-training. On top of this, we provide a benchmark to evaluate the generalizability and discriminability of learned representations comprehensively. Theoretically, we establish that the proposed method tightens the generalization error upper bound of self-supervised learning. Empirically, our method achieves state-of-the-art performance on various benchmarks. Our implementation is available at https://github.com/ZangZehua/essl.
Article
Full-text available
Live-cell microscopy routinely provides massive amounts of time-lapse images of complex cellular systems under various physiological or therapeutic conditions. However, this wealth of data remains difficult to interpret in terms of causal effects. Here, we describe CausalXtract, a flexible computational pipeline that discovers causal and possibly time-lagged effects from morphodynamic features and cell–cell interactions in live-cell imaging data. The CausalXtract methodology combines network-based and information-based frameworks and is shown to discover causal effects overlooked by classical Granger and Schreiber causality approaches. We showcase the use of CausalXtract to uncover novel causal effects in a tumor-on-chip cellular ecosystem under therapeutically relevant conditions. In particular, we find that cancer-associated fibroblasts directly inhibit cancer cell apoptosis, independently of anticancer treatment. CausalXtract also uncovers multiple antagonistic effects at different time delays. Hence, CausalXtract provides a unique computational tool to interpret live-cell imaging data for a range of fundamental and translational research applications.
Article
Psychologists are often interested in the effect of an internal state, such as ego depletion, that cannot be directly assigned in an experiment. Instead, they assign participants to a manipulation intended to produce this state and use manipulation checks to assess the manipulation’s effectiveness. In this article, I discuss statistical analyses for experiments in which researchers are primarily interested in the average treatment effect (ATE) of the target internal state rather than that of the manipulation. Often, researchers estimate the association of the manipulation itself with the dependent variable, but this intention-to-treat (ITT) estimator is typically biased for the ATE of the target state, and the bias could be either toward the null (conservative) or away from the null. I discuss the fairly stringent assumptions under which this estimator is conservative. Given this, I argue against the status-quo practice of interpreting the ITT estimate as the effect of the target state without any explicit discussion of whether these assumptions hold. Under a somewhat weaker version of the same assumptions, one can alternatively use instrumental-variables (IVs) analysis to directly estimate the effect of the target state. IVs analysis complements ITT analysis by directly addressing the central question of interest. As a running example, I consider a multisite replication study on the ego-depletion effect, in which the manipulation’s partial effectiveness led to criticism and several reanalyses that arrived at varying conclusions. I use IVs analysis to directly account for the manipulation’s partial effectiveness; this corroborated the replication authors’ reported null results.
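A minimal sketch of the contrast between the ITT and IV (Wald) estimators discussed here, under assumed exclusion and a homogeneous effect, with hypothetical numbers rather than the replication data:

import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical experiment: Z is random assignment to a manipulation, X is the
# internal state it is meant to induce (only partially effective), Y is the outcome.
z = rng.integers(0, 2, n)
u = rng.normal(size=n)                             # unobserved individual differences
x = (rng.random(n) < 0.2 + 0.4 * z).astype(int)    # manipulation raises P(state) by 0.4
y = 2.0 * x + u + rng.normal(size=n)               # true effect of the state is 2.0

itt = y[z == 1].mean() - y[z == 0].mean()           # effect of assignment on Y
first_stage = x[z == 1].mean() - x[z == 0].mean()   # effectiveness of the manipulation
wald = itt / first_stage                            # IV (Wald) estimate of the ATE
print(f"ITT: {itt:.2f}, IV/Wald: {wald:.2f}")       # ITT ~0.8 (attenuated), Wald ~2.0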
Article
Full-text available
The theoretical relationship between social media use and job satisfaction, especially concerning gender-specific mechanisms, remains a subject of ongoing debate in the literature. This divergence reflects our insufficient understanding of the complex relationships among gender, social media use, and job satisfaction. Drawing on Social Role Theory (SRT) and the Theory of Planned Behavior (TPB), this study utilizes 4651 valid samples from the 2020 China Family Panel Studies (CFPS) database to investigate how gender influences interpersonal relationships through social media sharing frequency, thereby enhancing job satisfaction. The findings indicate that women, compared to men, exhibit higher job satisfaction and more frequent social media sharing behavior. Moreover, the frequency of social media sharing positively affects job satisfaction by improving interpersonal relationships. This study employs a chain-mediated causal path analysis to delve into the causal relationships among gender, social media sharing frequency, and interpersonal relationships, effectively addressing previous limitations in handling multiple mediating effects. The findings not only provide new insights into the role of social media in the modern workplace but also offer empirical evidence and practical guidance for organizations on leveraging social media to foster employee relationships and enhance job satisfaction.
Article
Full-text available
We can use structural causal models (SCMs) to help us evaluate the consequences of actions given data. SCMs identify actions with structural interventions. A careful decision maker may wonder whether this identification is justified. We seek such a justification. We begin with decision models, which map actions to distributions over outcomes but avoid additional causal assumptions. We then examine assumptions that could justify causal interventions, with a focus on symmetry. First, we introduce conditionally independent and identical responses (CIIR), a generalisation of the IID assumption to decision models. CIIR justifies identifying actions with interventions, but is often an implausible assumption. We consider an alternative: precedent is the assumption that “what I can do has been done before, and its consequences observed,” and is generally more plausible than CIIR. We show that precedent together with independence of causal mechanisms (ICM) and an observed conditional independence can justify identifying actions with causal interventions. ICM has been proposed as an alternative foundation for causal modelling, but this work suggests that it may in fact justify the interventional interpretation of causal models.
Article
Full-text available
In some fields of artificial intelligence, machine learning and statistics, the validation of new methods and algorithms is often hindered by the scarcity of suitable real-world datasets. Researchers must often turn to simulated data, which yields limited information about the applicability of the proposed methods to real problems. As a step forward, we have constructed two devices that allow us to quickly and inexpensively produce large datasets from non-trivial but well-understood physical systems. The devices, which we call causal chambers, are computer-controlled laboratories that allow us to manipulate and measure an array of variables from these physical systems, providing a rich testbed for algorithms from a variety of fields. We illustrate potential applications through a series of case studies in fields such as causal discovery, out-of-distribution generalization, change point detection, independent component analysis and symbolic regression. For applications to causal inference, the chambers allow us to carefully perform interventions. We also provide and empirically validate a causal model of each chamber, which can be used as ground truth for different tasks. The hardware and software are made open source, and the datasets are publicly available at causalchamber.org or through the Python package causalchamber.
Article
Full-text available
Published studies using the regression discontinuity design have been limited to cases in which linear regression is applied to a categorical treatment indicator and an equal-interval outcome. This is unnecessarily narrow. We show here how a generalization of the usual regression discontinuity design can be applied in a wider range of situations. We focus on the use of categorical treatment and response variables, but we also consider the more general case of any regression relationship. We also show how a resampling sensitivity analysis may be used to address the credibility of the assumed assignment process. The broader formulation is applied to an evaluation of California's inmate classification system, which is used to allocate prisoners to different kinds of confinement.
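As a rough sketch of the basic regression discontinuity logic with a binary response (a simplified local linear-probability fit on simulated data, not the authors' generalization):

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical running variable (e.g., a classification score) and cutoff.
n, cutoff, bw = 5000, 0.0, 0.5
score = rng.uniform(-1, 1, n)
treated = (score >= cutoff).astype(int)
# Binary outcome whose probability jumps by 0.15 at the cutoff.
p = 0.3 + 0.1 * score + 0.15 * treated
y = (rng.random(n) < p).astype(int)

def fit_at_cutoff(mask):
    # linear probability fit of y on (score - cutoff) within the bandwidth;
    # the intercept is the predicted outcome probability at the cutoff
    s = score[mask] - cutoff
    X = np.column_stack([np.ones(s.size), s])
    beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
    return beta[0]

left = (score < cutoff) & (score > cutoff - bw)
right = (score >= cutoff) & (score < cutoff + bw)
print(f"estimated jump at cutoff: {fit_at_cutoff(right) - fit_at_cutoff(left):.3f}")  # ~0.15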
Article
Full-text available
It is shown how a causal ordering can be defined in a complete structure, and how it is equivalent to identifying the mechanisms of a system. Several techniques are shown that may be useful in actually accomplishing such identification. Finally, it is shown how this explication of causal ordering can be used to analyse causal counterfactual conditionals. First, the counterfactual proposition at issue is articulated through the device of a belief-contravening supposition. Then the causal ordering is used to provide modal categories for the factual propositions, and the logical contradiction in the system is resolved by ordering the factual propositions according to these causal categories.
Article
Full-text available
The aim of the paper is to explicate the concept of causal independence between sets of factors and Reichenbach's screening-off-relation in probabilistic terms along the lines of Suppes' probabilistic theory of causality (1970). The probabilistic concept central to this task is that of conditional stochastic independence. The adequacy of the explication is supported by proving some theorems about the explicata which correspond to our intuitions about the explicanda.
Article
Full-text available
We consider contingency tables having one variable specified as a response with just two categories. We look at conditions for collapsibility of a symmetric and a directed measure of association, the odds ratio and the relative risk: situations are discussed under which equal partial associations coincide with the corresponding marginal association. Contrary to the odds ratio, the relative risk is collapsible if there are independencies in the marginal distribution of the influencing variables. This fact is exploited to derive conditions for the lack of a moderating effect, the latter being a much-discussed concept in the social science literature.
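A small numerical check of the collapsibility contrast described here, using hypothetical probabilities in which the influencing variables X and Z are independent and the stratum-specific relative risks are equal:

# P(Y=1 | X, Z) chosen so that the relative risk equals 2 within each level of Z,
# and Z is independent of X with P(Z=0) = P(Z=1) = 0.5.
p_z = {0: 0.5, 1: 0.5}
p_y = {(0, 0): 0.10, (1, 0): 0.20,    # stratum Z = 0: RR = 2
       (0, 1): 0.30, (1, 1): 0.60}    # stratum Z = 1: RR = 2

# Marginal risks, averaging over Z (valid because Z is independent of X).
risk = {x: sum(p_y[(x, z)] * p_z[z] for z in p_z) for x in (0, 1)}
print("marginal RR:", risk[1] / risk[0])   # 2.0 -> the relative risk collapses

# The odds ratio behaves differently: the stratum-specific odds ratios
# (2.25 and 3.5 here) differ from each other and from the marginal odds ratio.
odds = lambda p: p / (1 - p)
print("stratum ORs:", odds(0.20) / odds(0.10), odds(0.60) / odds(0.30))
print("marginal OR:", odds(risk[1]) / odds(risk[0]))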
Article
Full-text available
The main purpose of this paper is to clarify relations and distinctions between several approaches suggested in the statistical literature for analysing structures in correlation matrices, i.e. of relations among observable, quantitative variables having exclusively linear associations. Block-recursive regression equations are derived as the key to understanding the relation between two main approaches, between graphical chain models for continuous variables on the one hand and linear structural equations discussed in the econometric and in the psychometric literature on the other hand. Their relations to other model classes such as covariance selection, multivariate linear regression, and path analysis are discussed.
Article
Full-text available
We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence, when combined with previously made assumptions, implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen (a prior network) and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k = 1 parent. For the general case (k > 1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and apply this approach to a comparison of various approaches.
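A minimal sketch of the likelihood-equivalence property, assuming the standard BDeu marginal-likelihood score and hypothetical binary data (not the authors' implementation): Markov-equivalent structures such as X -> Y and Y -> X receive identical scores.

import numpy as np
from math import lgamma

rng = np.random.default_rng(0)

# Hypothetical binary data in which X influences Y.
n = 500
x = rng.integers(0, 2, n)
y = (rng.random(n) < np.where(x == 1, 0.8, 0.3)).astype(int)
data = np.column_stack([x, y])

def family_score(child, parents, data, ess=1.0):
    # BDeu log marginal-likelihood contribution of one binary node given its parents
    r = 2                                  # number of child states
    q = 2 ** len(parents)                  # number of parent configurations
    if parents:
        idx = data[:, parents] @ (2 ** np.arange(len(parents)))
    else:
        idx = np.zeros(len(data), dtype=int)
    a_j, a_jk = ess / q, ess / (q * r)
    score = 0.0
    for j in range(q):
        rows = data[idx == j, child]
        score += lgamma(a_j) - lgamma(a_j + len(rows))
        for k in range(r):
            score += lgamma(a_jk + (rows == k).sum()) - lgamma(a_jk)
    return score

# X -> Y and Y -> X encode the same independencies, so their total scores coincide.
score_x_to_y = family_score(0, [], data) + family_score(1, [0], data)
score_y_to_x = family_score(1, [], data) + family_score(0, [1], data)
print(score_x_to_y, score_y_to_x)   # equal up to floating-point error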
Article
Full-text available
The author examines the dangers of idealization in science, which consist not in bad images for representing the world but in bad tools for changing it. The author develops two examples drawn from methodology, showing how bad idealizations lead to bad methods, which in turn lead to bad science. From a philosophical point of view, the aim is to move from a Platonic ideal to an ideal contemporary with this world (this-worldly).
Article
[Cause in fact] is a question of fact. It is, furthermore, a fact upon which all the learning, literature and lore of the law are largely lost. It is a matter upon which any layman is quite as competent to sit in judgment as the most experienced court. For that reason, in the ordinary case, it is peculiarly a question for the jury.
Article
This article has no abstract.
Chapter
The basic idea of probabilistic causality, as introduced by Reichenbach (1956), Good (1961–62), Suppes (1970), is that a cause raises the probability of the effect. However, not all events that raise the probability of another event can be counted as causes. In order to sort out those pairs of events that stand in a causal relation to each other, further conditions must be added. Suppes handles this by distinguishing between genuine and spurious causes as well as between direct and indirect causes.
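In the usual formulation (stated here in standard notation, not quoted from the chapter), Suppes' prima facie cause is defined by temporal precedence and probability raising:

\[
  C_t \text{ is a prima facie cause of } E_{t'} \quad\text{iff}\quad
  t < t', \qquad P(C_t) > 0, \qquad P(E_{t'} \mid C_t) > P(E_{t'}).
\]

The further definitions referred to above then rule out spurious causes by requiring that no earlier event screens off the putative cause from the effect.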
Article
Economists and workers in other decision sciences frequently use words such as “causal” or “cause” in their writings, as a glance at the listing of titles in the Social Science Citation Index will easily confirm. A number of papers have been written specifically on these concepts although few formal definitions have been attempted. In Economics, we observe values for our major variables each month, or some similar regular time-period, and because of this it can be claimed that there is some generating mechanism which produces these values. One of the main tasks of the economist is to try to untangle and understand this mechanism and in the search for the fundamentals of this process, the concepts of “law” and “causation” arise very naturally. However, the search for causality may be motivated by more than just academic interest, as will be discussed below.
Article
After clarifying the probabilistic conception of causality suggested by Good (1961-2), Suppes (1970), Cartwright (1979), and Skyrms (1980), we prove a sufficient condition for transitivity of causal chains. The bearing of these considerations on the units of selection problem in evolutionary theory and on the Newcomb paradox in decision theory is then discussed.
Article
[Introduction] Probabilistic theories of causation and probabilistic theories of rational decision both face difficulties from spurious probabilistic correlations. Both types of theory handle these difficulties in the same manner: the spurious correlations are made to disappear by conditionalizing upon the elements of a carefully chosen partition. The structural similarity between the two types of theory suggests a systematic connection between them. One view, the view reflected in the name 'causal decision theory', has it that the theory of causation is conceptually prior to that of decision: causal decision theory has the structure it does because it aims to tell us about the expected effects of our actions. But then we may ask from whence probabilistic theories of causation inherit their mathematical structure. In this paper, I will explore the prospects for a 'decision-theoretic causation' that explains the mathematical structure of a probabilistic theory of causation using a conceptually prior decision theory.
Article
Multidimensional contingency tables can be summed over factors, without affecting the log‐linear parameters describing interactions of other factors, under less restrictive conditions than generally recognized. Examples are given that contradict a theorem of Bishop, Fienberg and Holland (1975). Necessary and sufficient conditions for collapsibility are given, and an example applying the results to categorical data is presented.
Article
While graphs are normally defined in terms of the 2-place relation of adjacency, we take the 3-place relation of interception as the basic primitive of the definition. The paper views graphs as an economic scheme for encoding interception relations, and establishes an axiomatic characterization of relations that lend themselves to representation in terms of graph interception, thus providing a new characterization of graphs.
Article
A linear structural equation model (SEM) without free parameters has two parts: a probability distribution and an associated path diagram corresponding to the causal relations among variables specified by the structural equations and the correlations among the error terms. This article shows how path diagrams can be used to solve a number of important problems in structural equation modeling, for example: How much do sample data underdetermine the correct model specification? Given that there are equivalent models, is it possible to extract the features common to those models? When a modeler draws conclusions about coefficients in an unknown underlying SEM from a multivariate regression, precisely what assumptions are being made about the SEM? The authors explain how the path diagram provides much more than heuristics for special cases; the theory of path diagrams helps to clarify several of the issues just noted.
Article
We consider the dispute between causal decision theorists and evidential decision theorists over Newcomb-like problems. We introduce a framework relating causation and directed graphs developed by P. Spirtes et al. [Causation, prediction and search. (1993; Zbl 0806.62001)] and evaluate several arguments in this context. We argue that much of the debate between the two camps is misplaced; the disputes turn on the distinction between conditioning on an event E as against conditioning on an event I which is an action to bring about E. We give the essential machinery for calculating the effect of an intervention and consider recent work which extends the basic account given here to the case where causal knowledge is incomplete.
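The distinction drawn here between conditioning on an event E and intervening to bring about E can be sketched in a small simulation (hypothetical structural assignments with a common cause U; the numbers are illustrative only):

import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Newcomb-like structure: a common cause U influences both the action-related
# event E and the outcome Y, while E itself has only a modest effect on Y.
u = rng.integers(0, 2, n)
e = (rng.random(n) < np.where(u == 1, 0.9, 0.1)).astype(int)
y = (rng.random(n) < 0.1 + 0.6 * u + 0.1 * e).astype(int)

# Conditioning on E (evidential reading) inherits the correlation through U.
print("P(Y=1 | E=1) - P(Y=1 | E=0) =", y[e == 1].mean() - y[e == 0].mean())

# Intervening to bring about E (causal reading): set E by fiat, leaving U alone.
y_do1 = rng.random(n) < 0.1 + 0.6 * u + 0.1 * 1
y_do0 = rng.random(n) < 0.1 + 0.6 * u + 0.1 * 0
print("P(Y=1 | do(E=1)) - P(Y=1 | do(E=0)) =", y_do1.mean() - y_do0.mean())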
Article
Article
Race is often viewed as a causal variable and “race effects” found from regression analyses are sometimes given causal interpretations. I argue that this is a mistaken way to proceed. Race is not a causal variable in a very important sense of the word, and yet it does have a significant role in causal studies. The key role that race plays is to help our understanding of the effects of causes or interventions through the statistical “interaction” of race with causal variables, rather than as “the main effect of race.” These ideas are briefly illustrated using data from a study of tests constructed to manipulate the distribution of scores of Black and White test takers.
Article
Statisticians commonly make causal inferences from the results of randomized experiments, but usually question causal inferences from observational studies on the grounds that untestable assumptions are required. This paper explains the basis for this situation and examines, within quite a general framework, various assumptions that appear in the literature. It is demonstrated how the role of each assumption is intimately related to the sort of causal inference being considered. Contrary to general belief, it is shown that neither `no confounding' nor `randomization' are sufficient assumptions for testing the fundamental causal hypothesis of `no causation'. They become sufficient, however, if supplemented by a `modelling' assumption for how unobserved covariates affect potential responses. Without supplementary assumptions, no confounding and randomization allow testing of weaker but more important hypotheses, such as `no mean effect'. This follows from the sufficiency of an assumption of `ignorable treatment assignment' for testing the hypothesis of `no distribution effect'. It is argued that these ideas will have some operational relevance if we can quantify our beliefs about how closely assumptions hold. To illustrate, a novel form of sensitivity analysis is contrasted with an analysis presented by Rosenbaum and Rubin.
Article
This paper defends a counterfactual account of explanation, according to which successful explanation requires tracing patterns of counterfactual dependence of a special sort, involving what I call active counterfactuals. Explanations having this feature must appeal to generalizations that are invariant - stable under certain sorts of changes. These ideas are illustrated by examples drawn from physics and econometrics. Copyright 1997 by the Philosophy of Science Association. All rights reserved.
Article
We consider the collapsibility of relative risks in contingency tables with a polytomous response variable. Simple collapsibility of relative risks means that they are consistent over all levels of a collapsed variable and also coincide with the corresponding marginal relative risk. Strong collapsibility of relative risks means that they retain simple collapsibility no matter how the variable is partially collapsed. We present necessary and sufficient conditions for simple and strong collapsibilities of relative risks.
Article
Binary-valued Markov random fields may be used as models for point processes with interactions (e.g. repulsion or attraction) between their points. This paper aims to provide a simple nontechnical introduction to Markov random fields in this context. The underlying spaces on which points occur are taken to be countable (e.g. lattice vertices) or continuous (Euclidean space). The role of Markov random fields as equilibrium processes for the temporal evolution of spatial processes is also discussed and various applications and examples are given.
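As a toy illustration of a binary-valued Markov random field with interaction (an auto-logistic model on a lattice, with hypothetical parameters), a single-site Gibbs sampler can be sketched as follows:

import numpy as np

rng = np.random.default_rng(0)

# beta > 0 encourages attraction (clustering of occupied sites), beta < 0 repulsion.
L, beta, h, sweeps = 30, 0.6, -1.0, 200
x = rng.integers(0, 2, size=(L, L))

def neighbour_sum(x, i, j):
    # sum of the four nearest neighbours on a torus
    return (x[(i - 1) % L, j] + x[(i + 1) % L, j]
            + x[i, (j - 1) % L] + x[i, (j + 1) % L])

for _ in range(sweeps):
    for i in range(L):
        for j in range(L):
            # conditional log-odds of site (i, j) being occupied given its neighbours
            logit = h + beta * neighbour_sum(x, i, j)
            p = 1.0 / (1.0 + np.exp(-logit))
            x[i, j] = int(rng.random() < p)

print("fraction of occupied sites:", x.mean())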
Article
This chapter discusses how statistical contingency models of causal induction have had two major advantages over their power-based rivals. First, statistics-based models are grounded in direct experience and, hence, promise to explicate the evidence and the processes responsible for acquiring cause-effect relationships from raw data. Second, statistics-based models enjoy the symbolic machinery of probability calculus, which enables researchers to posit hypotheses, communicate ideas, and make predictions with mathematical precision. In comparison, as well as skirting the issue of causal induction by presuming the preexistence of a causal structure, power-based theories have lacked an adequate formal language in which to cast assumptions, claims, and predictions. The chapter offers a formal setting, based on mechanisms, structures, and surgeries, which accommodates both the statistical and the power components of causal inference. It discusses how preexisting causal knowledge, cast qualitatively in the form of a graph, can combine with statistical data to produce new causal knowledge that is both qualitative and quantitative in nature. The explication of causal relationships in terms of mechanisms and physical laws is not meant to imply that the induction of physical laws is a solved, trivial task. It implies that the problem of causal induction, once freed of the mysteries and suspicions that normally surround discussions of causality, can be formulated as part of the more familiar problem of scientific induction.
Article
A 2 × 2 table of frequencies can be expressed as the sum of two other tables corresponding to the levels of a real or potential ‘masked’ factor, in which the direction of the apparent association between the marginal factors is opposite to that of the original. This paper analyses the counterpart of this disaggregation obtained by replacing frequencies by probabilities. It is shown that such a disaggregation cannot occur if the original experiment is sufficiently well balanced with respect to this masked factor. When the class of such disaggregations is nonempty, the element that maximizes the average absolute differences of certain marginal probabilities in the resulting tables is found.
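A hypothetical set of counts illustrating the disaggregation in question, where each subtable shows an association opposite in direction to that of the aggregated table:

# Counts are (successes, failures) for the groups X = 1 and X = 0,
# split by the two levels of the masked factor; all numbers are made up.
level0 = {1: (90, 10), 0: (800, 200)}
level1 = {1: (200, 800), 0: (10, 90)}

rate = lambda sf: sf[0] / sum(sf)
for name, tab in [("level 0", level0), ("level 1", level1)]:
    print(name, "success rates:", rate(tab[1]), ">", rate(tab[0]))   # X = 1 does better

# Summing the subtables reverses the apparent association.
agg = {x: tuple(a + b for a, b in zip(level0[x], level1[x])) for x in (0, 1)}
print("aggregated success rates:", rate(agg[1]), "<", rate(agg[0]))  # X = 1 now looks worse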
Article
Graphs consisting of points, and lines or arrows as connections between selected pairs of points, are used to formulate hypotheses about relations between variables. Points stand for variables, connections represent associations. When a missing connection is interpreted as a conditional independence, the graph characterizes a conditional independence structure as well. Statistical models, called graphical chain models, correspond to special types of graphs which are interpreted in this fashion. Examples are used to illustrate how conditional independences are reflected in summary statistics derived from the models and how the graphs help to identify analogies and equivalences between different models. Graphical chain models are shown to provide a unifying concept for many statistical techniques that in the past have proven to be useful in analyses of data. They also provide tools for new types of analysis.
Article
Using a data set on lung ventilation, a number of alternative assumptions governing nondirected paths in structural equation models without latent variables are examined and compared. Some problems with the conventional procedures in path analysis are pointed out, and alternative possibilities are suggested.
Article
The connection between the simplicity of scientific theories and the credence attributed to their predictions seems to permeate the practice of scientific discovery. When a scientist succeeds in explaining a set of n observations using a model M of complexity c, then it is generally believed that the likelihood of finding another explanatory model with similar complexity but leading to opposite predictions decreases with increasing n and decreasing c. This paper derives formal relationships between n, c and the probability of ambiguous predictions by examining three modeling languages under binary classification tasks: perceptrons, Boolean formulae, and Boolean networks. Bounds are also derived for the probability of error associated with the policy of accepting only models of complexity not exceeding c. Human tendency to regard the simpler as the more trustworthy is given a qualified justification.
Article
The notion of a recursive causal graph is introduced, hopefully capturing the essential aspects of the path diagrams usually associated with recursive causal models. We describe the conditional independence constraints which such graphs are meant to embody and prove a theorem relating the fulfilment of these constraints by a probability distribution to a particular sort of factorisation. The relation of our results to the usual linear structural equations on the one hand, and to log-linear models, on the other, is also explained.
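In present-day notation, the factorisation result can be stated as follows (a standard formulation, not a quotation from the paper): for a recursive causal graph over variables \(X_1,\dots,X_n\), ordered so that parents precede children,

\[
  P(x_1,\dots,x_n) \;=\; \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{pa}_i\bigr),
\]

where \(\mathrm{pa}_i\) denotes the values of the parents of \(X_i\) in the graph; equivalently, each \(X_i\) is conditionally independent of its non-descendants given its parents.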
Article
Expected years of life lost is an important concept in public-health and legal issues. We describe conditions under which the expected years of life lost due to hazardous exposure is estimable (identifiable) from epidemiologic data. We show that, in general, the average years of life lost among exposed subjects dying at a given age (the age-specific expected years of life lost) is not identifiable, although the average years of life lost among all exposed subjects (the unconditional expected years of life lost) is identifiable from an unbiased epidemiologic study. We also show that the average years of life lost among all exposed subjects dying of a specific cause (the cause-specific expected years of life lost) is not identifiable. We discuss the implications of these results for compensation schemes based on years of life lost, and compare such schemes with those based on the probability of causation.
Article
[Introduction] What is the relationship between claims of singular causation such as 1. David's smoking caused him to develop lung cancer, and claims of general causation, such as 2. Smoking causes lung cancer? Hume held that the truth of singular causal claims depended upon the existence of universal regularities in nature. In the first Enquiry, for example, Hume wrote that we may define a cause to be an object, followed by another, and where all the objects similar to the first, are followed by objects similar to the second. (Hume [1748] 1955, §VII) Davidson (1980) concurs, while noting that it may not be apparent which generalization is instantiated by any particular episode of singular causation: [I]f 'a caused b' is true, then there are descriptions of a and b such that the result of substituting them for 'a' and 'b' in 'a caused b' is entailed by true premises of the form of (L) and (P) [where (L) provides the form of a causal law, and (P) provides the form of premises describing initial conditions] ... If this is correct, it does not follow that we must be able to dredge up a law if we know a singular causal statement to be true; all that follows is that we know there must be a covering law. (pp. 159-60). Hume and Davidson may be understood as pursuing the following strategy: analyze the truth-conditions of general causal claims in terms of universal regularities in nature, and then treat singular causal claims as describing instantiations of such regularities. Let us call this the Humean strategy.
Article
The primary criterion of adequacy of a probabilistic causal analysis is that the causal variable should render the simultaneous phenomenological data conditionally independent. The intuition behind this idea is that the common cause of the phenomena should factor out the observed correlations. So we label the principle the common cause criterion. If we find that the barometric pressure and temperature are both dropping at the same time, we do not think of one as the cause of the other but look for a common dynamical cause within the physical theory of meteorology. If we find fever and headaches positively correlated, we look for a common disease as the source and do not consider one the cause of the other. But we do not want to suggest that satisfaction of this criterion is the end of the search for causes or probabilistic explanations. It does represent a significant and important milestone in any particular investigation.
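The screening-off requirement invoked here is, in standard notation, the familiar one: a common cause \(C\) of the correlated phenomena \(A\) and \(B\) should render them conditionally independent,

\[
  P(A \cap B \mid C) \;=\; P(A \mid C)\,P(B \mid C),
  \qquad\text{equivalently}\qquad
  P(A \mid B \cap C) \;=\; P(A \mid C),
\]

so that the marginal correlation between \(A\) and \(B\) is fully accounted for by \(C\).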
Article
Discontented people might talk of corruption in the Commons, closeness in the Commons and the necessity of reforming the Commons, said Mr. Spenlow solemnly, in conclusion; but when the price of wheat per bushel had been the highest, the Commons had been the busiest; and a man might lay his hand upon his heart, and say this to the whole world, – ‘Touch the Commons, and down comes the country!’
Article
An analysis of causality has been particularly troublesome, and thus mostly ignored, by those who believe the world is indeterministic. Patrick Suppes has attempted to give an account of causality that would hold in both deterministic and indeterministic worlds. To do this, Suppes uses probability relations to define causal relations. The main problems facing a probabilistic theory of causality are those of distinguishing between genuine and spurious causes as well as direct and indirect causes. Suppes presents several definitions of different types of causes in an attempt to capture the distinction between genuine and spurious causes and direct and indirect causes. It is my claim that Suppes' definitions fail to distinguish among genuine and spurious causes and direct and indirect causes. To support this claim I will give some counterexamples to Suppes' theory. I will then modify some of Suppes' definitions in a natural manner, and show that even with modification they are still prone to counterexamples. The main thrust here is that Suppes' account of causation is intrinsically defective. I believe that there is no way to differentiate genuine from spurious causes or direct from indirect causes using only probability relations; thus no minor modifications of Suppes' definitions will be sufficient to resolve these difficulties. While presenting counterexamples to Suppes' definitions, I will also try to explain in principle why each particular example is a counterexample to Suppes' theory. After presenting these counterexamples, I will introduce the idea of an interactive fork and use it to argue that the basic intuition around which Suppes built his theory is faulty. In the last section of the paper I will discuss the more fundamental issue of whether all positive causes must raise the probabilities of their effects. Although this issue lies at the heart of most probabilistic accounts of causality, it has largely been ignored in the literature. I hope to show that we are not justified in believing that positive causes always raise the probability of their effects and that more discussion is needed on this important subject.
Article
Many aspects of statistical design, modelling, and inference have close and important connections with causal thinking. These are analyzed in the paper against a philosophical background that regards formal mathematical models as having dual interpretations, reflecting both objectivist reality and subjectivist rationality. The latter aspect weakens the need for an objective theory of probabilistic causation, and suggests that a traditional image of causes as deterministic mechanisms should remain primary. It is argued that such causes should guide much preformal thinking about what to include in formal statistical models, especially of dynamic phenomena. The statistical measurement of causal effects is facilitated by good statistical design, including randomization where feasible, and requires other methodologies for controlling and assessing uncertainties, for example in model construction and inference. Illustrative examples include case studies where the problem is to assess retrospectively the causes of observed events and where the task is to assess future risks from controllable factors.
Article
The purpose of this note is to draw attention to certain aspects of causal reasoning which are pervasive in ordinary discourse yet, based on the author's scan of the literature, have not received due treatment by logical formalisms of common-sense reasoning. In a nutshell, it appears that almost every default rule falls into one of two categories: expectation-evoking or explanation-evoking. The former describes association among events in the outside world (e.g., fire is typically accompanied by smoke); the latter describes how we reason about the world (e.g., smoke normally suggests fire). This distinction is consistently recognized by people and serves as a tool for controlling the invocation of new default rules. This note questions the ability of formal systems to reflect common-sense inferences without acknowledging such distinction and outlines a way in which the flow of causation can be summoned within the formal framework of default logic.
Article
Counterfactuals are a form of common-sense nonmonotonic inference that has been of long-term interest to philosophers. In this paper, we begin by describing some of the impact counterfactuals can be expected to have in artificial intelligence, and by reviewing briefly some of the philosophical conclusions which have been drawn about them. We continue by presenting a formal description of counterfactual implication and discussing the issues involved in implementing it. Specific applications in the domains of planning and the automated diagnosis of hardware faults are considered, and we conclude by describing possible extensions to this work involving multi-valued logics and situation semantics.
Article
A careful discussion of the concept of conditional event leads to a sensible use of frequency data as conditional probabilities: as a by-product, the well-known ‘paradoxes’ arising from the so-called confounding effect are avoided.
Article
This paper is a response to Iwasaki and Simon [14] which criticizes de Kleer and Brown [8]. We argue that many of their criticisms, particularly concerning causality, modeling and stability, originate from the difference of concerns between engineering and economics. Our notion of causality arises from considering the interconnections of components, not equations. When no feedback is present, the ordering produced by our qualitative physics is similar to theirs. However, when feedback is present, our qualitative physics determines a causal ordering around feedback loops as well. Causal ordering is a general technique not only applicable to qualitative reasoning. Therefore we also explore the relationship between causal ordering and propagation of constraints upon which the methods of qualitative physics are based.