Lydia Kakampakou’s research while affiliated with Lancaster University and other places


Publications (6)


A hierarchical causal diagram illustrates individual-level causal relationships among five variables (circles are unobserved, i.e., latent; squares are observed; double-edged enclosures are determined variables): $Y$, the outcome; $X$, the exposure; $Z$, a ‘regular’ confounder of the $X$-$Y$ relationship that is observed; $L$, a latent confounder of the $X$-$Y$ relationship that is unobserved but affects individual-level latent variable $N_i$, which manifests as an observed cluster-level feature, $N_j$. The solid single arrows signify causal relationships between variables; dashed lines are bivariate correlations realised among aggregated cluster-level (fully determined) variables; and double-lined arrows indicate deterministic pathways [43]
Table 2 (continued)
A schematic illustration of the algorithm that transforms an individual-level latent variable into a cluster-level measure of cluster size, which is used to produce the data clusters, illustrated using the example of daily mean levels of physical activity ($PA$) in minutes as the exposure and body weight ($Wt$) in kilograms as the outcome. (footer): The algorithm categorises simulated individual-level data into $\boldsymbol{C}$ clusters to convey cross-level associations with causal origins as per the data generating mechanism of Fig. 1. The process involves: (a) sorting individual-level data by ascending latent variable $N_i$ values; (b) rescaling such that, once rounded, $\widehat{N}_i$ are potential cluster sizes with mean $\boldsymbol{N}/\boldsymbol{C}=1000$ and standard deviation $10$; (c) subset selection into $\boldsymbol{C}$ evenly sized subsets – enclosed in the three ellipses; (d) randomly select one $\widehat{N}_i$ value per subset and round to generate $\boldsymbol{C}=100$ cluster size values [alternatively, take subgroup means and round]; (e) undertake value modification to randomly selected cluster size values by adding or subtracting one to ensure all cluster sizes sum to population size; and (f) regroup subsets into unequally sized clusters – enclosed in the two new ellipses – based on the ordered values of $N_i$
of the multilevel and main ecological analyses of simulated data (plotted in black and blue respectively) for all four scenarios for continuous (charts A, C, E, G) and binary outcomes (charts B, D, F, H) – the diamond shaped plots are median estimates (y-axis) plotted against individual-level simulated ‘true’ effect sizes (x-axis); the dotted grey line indicates perfect agreement between simulated and estimated effect sizes; continuous lines are fitted lines to the median estimates. Scenario 1: Estimates of $\rho_7$ with regular confounding only. Scenario 2: Estimates of $\rho_7$ with latent confounding only. Scenario 3: Estimates of $\rho_7$ with regular and latent confounding that are not causally related. Scenario 4: Estimates of $\rho_7$ with regular and latent confounding that are causally related
Plots of multilevel and main ecological estimates of simulated data (plotted in black and orange respectively) for Scenario 4 (where estimates of $\rho_7$ were sought for causally related regular and latent confounding) with additional complexity considerations: (a) low outcome prevalence ($0.1$%); (b) binary $L_i$-confounding ($10$% prevalence) with continuous outcome; and (c) binary $L_i$-confounding with binary outcome (both $10$% prevalence). The diamond shaped plots are individual simulation cluster-level estimates (y-axis) plotted against the individual-level simulated ‘true’ effect sizes (x-axis); the grey dotted line depicts perfect agreement between simulated and estimated effect sizes; continuous lines are linear fitted lines to all $1000$ estimates. Scenario 4a: Estimates of $\rho_7$ with regular and latent confounding that are causally related with low binary prevalence. Scenario 4b: Estimates of $\rho_7$ with regular and latent confounding that are causally related with binary latent confounding and continuous outcome. Scenario 4c: Estimates of $\rho_7$ with regular and latent confounding that are causally related with binary latent confounding and binary outcome


Simulating hierarchical data to assess the utility of ecological versus multilevel analyses in obtaining individual-level causal effects
  • Article
  • Full-text available

March 2025

·

28 Reads

Lydia Kakampakou

·

Andreas Hoehn

·

[...]

Understanding causality, over mere association, is vital for researchers wishing to inform policy and decision making – for example, when seeking to improve population health outcomes. Yet, contemporary causal inference methods have not fully tackled the complexity of data hierarchies, such as the clustering of people within households, neighbourhoods, cities, or regions. However, complex data hierarchies are the rule rather than the exception. Gaining an understanding of these hierarchies is important for complex population outcomes, such as non-communicable disease, which is impacted by various social determinants at different levels of the data hierarchy. The alternative of analysing aggregated data could introduce well-known biases, such as the ecological fallacy or the modifiable areal unit problem. We devise a hierarchical causal diagram that encodes the multilevel data generating mechanism anticipated when evaluating non-communicable diseases in a population. The causal diagram informs data simulation. We also provide a flexible tool to generate synthetic population data that captures all multilevel causal structures, including a cross-level effect due to cluster size. For the very first time, we can then quantify the ecological fallacy within a formal causal framework to show that individual-level data are essential to assess causal relationships that affect the individual. This study also illustrates the importance of causally structured synthetic data for use with other methods, such as Agent Based Modelling or Microsimulation Modelling. Many methodological challenges remain for robust causal evaluation of multilevel data, but this study provides a foundation to investigate these.
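
The cluster-size algorithm summarised in the schematic caption for this article (steps (a) to (f)) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' tool: the function name `make_clusters`, its parameters, the standardise-then-scale form of the rescaling in step (b), and the random-number seed are all hypothetical choices made for this sketch.

```python
import numpy as np


def make_clusters(latent, n_clusters=100, size_sd=10.0, seed=0):
    """Assign individuals to unequally sized clusters ordered by a latent variable.

    Hypothetical sketch of steps (a)-(f); names and defaults are assumptions.
    Returns (labels, sizes): a cluster label per individual and the cluster sizes.
    """
    rng = np.random.default_rng(seed)
    latent = np.asarray(latent, dtype=float)
    n = latent.size

    # (a) sort individuals by ascending latent value
    order = np.argsort(latent, kind="stable")
    sorted_latent = latent[order]

    # (b) rescale to potential cluster sizes with mean n / n_clusters and sd size_sd
    #     (assumed here to be a simple standardise-and-scale transformation)
    z = (sorted_latent - sorted_latent.mean()) / sorted_latent.std()
    n_hat = n / n_clusters + size_sd * z

    # (c) split the ordered values into n_clusters evenly sized subsets
    subsets = np.array_split(n_hat, n_clusters)

    # (d) draw one value per subset at random and round it to a cluster size
    sizes = np.array([int(round(float(rng.choice(s)))) for s in subsets])

    # (e) add or subtract one at randomly chosen clusters until sizes sum to n
    diff = n - sizes.sum()
    while diff != 0:
        j = rng.integers(n_clusters)
        step = 1 if diff > 0 else -1
        if sizes[j] + step >= 1:  # never shrink a cluster to zero
            sizes[j] += step
            diff -= step

    # (f) regroup the ordered individuals into clusters of the chosen sizes
    labels_sorted = np.repeat(np.arange(n_clusters), sizes)
    labels = np.empty(n, dtype=int)
    labels[order] = labels_sorted
    return labels, sizes
```

With 100,000 simulated individuals and `n_clusters=100`, the potential cluster sizes are centred on 1000, matching the mean $N/C = 1000$ quoted in the caption. Because cluster membership follows the ordering of the latent variable, cluster size carries information about that variable, which is the cross-level effect the abstract refers to.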


Heat maps for dependence measures for each pair of variables: Kendall’s $\tau$ (left), $\chi$ (middle) and $\eta$ (right). Note the scale in each plot varies, depending on the support of the measure, and the diagonals are left blank, where each variable is compared against itself
QQ plot for our final model (model 7 in Table 1) on standard exponential margins. The $y=x$ line is given in red and the grey region represents the 95% tolerance bounds (left). Predicted $0.9999$-quantiles against true quantiles for the 100 covariate combinations. The points are the median predicted quantile over 200 bootstrapped samples and the vertical error bars are the corresponding 50% confidence intervals. The $y=x$ line is also shown (right)
Boxplots of empirical $\chi$ estimates obtained for the subsets $G^A_{I,k}$, with $k = 1, \ldots, 10$ and $I=\{1,2,3\}$. The colour transition (from blue to orange) over k illustrates the trend in $\chi$ estimates as the atmospheric values are increased
Final QQ plots for parts 1 (left) and 2 (right) of C3, with the $y=x$ line given in red. In both cases, the grey regions represent the 95% bootstrapped tolerance bounds
Heat map of estimated empirical pairwise $\chi(u)$ extremal dependence coefficients with $u=0.95$
Extreme value methods for estimating rare events in Utopia

November 2024

·

46 Reads

·

1 Citation

Extremes

To capture the extremal behaviour of complex environmental phenomena in practice, flexible techniques for modelling tail behaviour are required. In this paper, we introduce a variety of such methods, which were used by the Lancopula Utopiversity team to tackle the EVA (2023) Conference Data Challenge. This data challenge was split into four challenges, labelled C1-C4. Challenges C1 and C2 comprise univariate problems, where the goal is to estimate extreme quantiles for a non-stationary time series exhibiting several complex features. For these, we propose a flexible modelling technique, based on generalised additive models, with diagnostics indicating generally good performance for the observed data. Challenges C3 and C4 concern multivariate problems where the focus is on estimating joint probabilities. For challenge C3, we propose an extension of available models in the multivariate literature and use this framework to estimate joint probabilities in the presence of non-stationary dependence. Finally, for challenge C4, which concerns a 50-dimensional random vector, we employ a clustering technique to achieve dimension reduction and use a conditional modelling approach to estimate extremal probabilities across independent groups of variables.
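
The figure captions for this paper refer to empirical pairwise $\chi(u)$ extremal dependence coefficients at $u = 0.95$. A minimal sketch of one common empirical estimator is given below, assuming the conditional-exceedance form $\chi(u) = P(F_Y(Y) > u \mid F_X(X) > u)$ with margins transformed to approximately uniform by ranks. The function name `empirical_chi` is hypothetical, and the estimator actually used in the paper may differ.

```python
import numpy as np


def empirical_chi(x, y, u=0.95):
    """Empirical chi(u) = P(F_Y(Y) > u | F_X(X) > u) using rank-based margins.

    Hypothetical helper for illustration; assumes no ties in x or y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    # Rank transform each margin to (0, 1); dividing by n + 1 avoids exact 0 or 1
    ux = (np.argsort(np.argsort(x)) + 1) / (n + 1)
    uy = (np.argsort(np.argsort(y)) + 1) / (n + 1)

    exceed_x = ux > u
    if not exceed_x.any():
        return float("nan")  # no exceedances to condition on

    # Proportion of x-exceedances where y also exceeds its u-quantile
    return float(np.mean(uy[exceed_x] > u))
```

Under this definition, values near 1 indicate that the two variables tend to be extreme together at level $u$, while independent variables give values near $1 - u$. Applying the function to every pair of variables would give the entries of a heat map like the one described in the caption.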


Spatial Extremal Modelling: A Case Study on the Interplay Between Margins and Dependence

November 2024

·

8 Reads

Stat

It is no secret that statistical modelling often involves making simplifying assumptions when attempting to study complex stochastic phenomena. Spatial modelling of extreme values is no exception, with one of the most common such assumptions being stationarity in the marginal and/or dependence features. If non‐stationarity has been detected in the marginal distributions, it is tempting to try to model this while assuming stationarity in the dependence, without necessarily putting this latter assumption through thorough testing. However, margins and dependence are often intricately connected and the detection of non‐stationarity in one feature might affect the detection of non‐stationarity in the other. This work is an in‐depth case study of this interrelationship, with a particular focus on a spatio‐temporal environmental application exhibiting well‐documented marginal non‐stationarity. Specifically, we compare and contrast four different marginal detrending approaches in terms of our post‐detrending ability to detect temporal non‐stationarity in the spatial extremal dependence structure of a sea surface temperature dataset from the Red Sea.


Figure 1: Time series plot at location $s_{35}$ on the original marginal scale.
Figure 4: Boxplots of $\chi_u(h_{jk})$ estimates grouped in 10 equidistant distance blocks. Boxplots in red are based on $\chi_u(h_{jk})$ values for the period 1985-1989, while boxplots in blue are based on $\chi_u(h_{jk})$ values for years 2011-2015. All estimates are calculated using $u = 0.95$. [Left to right and top to bottom:] A-B-C-D margins.
Figure B1: Histogram of automatically selected thresholds used in margins D.
Figure C2: Differences in $\tilde{\eta}_{0.95}(s_k)$ between periods (1985-1989) − (2011-2015) for all spatial locations $s_k$, $k \in \{1, \ldots, D\}$. [Left to right and top to bottom:] A-B-C-D margins.
Spatial extremal modelling: A case study on the interplay between margins and dependence

September 2024

·

22 Reads

It is no secret that statistical modelling often involves making simplifying assumptions when attempting to study complex stochastic phenomena. Spatial modelling of extreme values is no exception, with one of the most common such assumptions being stationarity in the marginal and/or dependence features. If non-stationarity has been detected in the marginal distributions, it is tempting to try to model this while assuming stationarity in the dependence, without necessarily putting this latter assumption through thorough testing. However, margins and dependence are often intricately connected and the detection of non-stationarity in one feature might affect the detection of non-stationarity in the other. This work is an in-depth case study of this interrelationship, with a particular focus on a spatio-temporal environmental application exhibiting well-documented marginal non-stationarity. Specifically, we compare and contrast four different marginal detrending approaches in terms of our post-detrending ability to detect temporal non-stationarity in the spatial extremal dependence structure of a sea surface temperature dataset from the Red Sea.


P58 Assessing the utility of multilevel versus ecological analyses to obtain individual-level causal effect estimates

August 2024

·

7 Reads

Journal of Epidemiology and Community Health

Background Government bodies, private enterprises, and researchers increasingly use ‘big data’ to monitor, evaluate interventions, make future predictions, and seek causal understanding. Such data are often complex in structure (i.e., hierarchical), which creates challenges for methods that work for a single homogeneous population, but which mislead if applied to data with substructure. If causal insights are sought, this usually pertains to the individual, yet most datasets are aggregated due to issues surrounding sensitive personal information, which is why it is common to encounter simulation approaches, such as agent-based modelling (ABM), or ecological analyses that evaluate only marginal (i.e., clustered) information. Contemporary causal inference methods are yet to tackle the full complexities of multilevel data structure, beyond longitudinal repeated measures. There is thus a gap in our understanding and methods capabilities surrounding causal analysis of structured data, which this study examines.
Methods 1) devise a hierarchical causal diagram that encodes a multilevel data generating mechanism with prespecified cross-level causal relationships; 2) simulate multilevel data from the causal diagram and obtain aggregated data; 3) contrast multilevel and ecological estimates of a simulated individual-level causal effect, to assess the presence and extent of potential biases.
Results Unlike a multilevel analysis of the full data, ecological analyses of cluster-level data do not generally yield robust causal effect estimates. While it is known that ecological analyses invoke the ‘ecological fallacy’ (i.e., where attributing features of clusters to units within clusters may mislead), this study quantifies this for the first time within a formal causal framework. An algorithm to simulate causally structured multilevel data is also demonstrated.
Conclusion Insights into the limitations of common analytical practices were made possible by simulating causally structured hierarchical data, demonstrating the value of causal diagrams in both simulation and causal analysis. Methodological challenges remain for robust causal evaluation of big data, but this study shows how to investigate these challenges. Results reveal the need for individual-level data with application of multilevel analyses to achieve robust causal inquiry; ecological analyses do not generally provide sound causal effect estimation. If individual-level data are unavailable, synthetic data (informed by available marginal data) becomes necessary to answer causal questions and this study provides a tool to generate synthetic population data that reflects multilevel causal structures, which in turn will then better inform the use of methods such as ABMs. This study has enormous implications for the use of big data when seeking causal insights.
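
The contrast between individual-level and ecological estimates described in this abstract can be illustrated with a toy simulation. The sketch below is an assumption-laden stand-in rather than the study's data generating mechanism: it uses a single latent confounder, linear effects with made-up coefficients (`beta`, `gamma`), equally sized clusters formed by ordering on a noisy proxy of the latent variable, and a within-cluster (cluster-mean-centred) regression in place of a full multilevel model. The function name `simulate_and_estimate` is hypothetical.

```python
import numpy as np


def simulate_and_estimate(n=100_000, n_clusters=100, beta=0.5, gamma=1.0, seed=1):
    """Compare a within-cluster slope with an ecological (cluster-mean) slope.

    Toy model (all values are assumptions): latent L confounds exposure X and
    outcome Y, and clusters are formed by ordering on a noisy proxy of L.
    Returns (within_slope, ecological_slope); the true individual effect is beta.
    """
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n)                      # L: latent confounder
    exposure = latent + rng.normal(size=n)           # X depends on L
    outcome = beta * exposure + gamma * latent + rng.normal(size=n)  # Y

    # Cluster membership follows the ordering of a noisy proxy of L
    proxy = latent + rng.normal(scale=0.1, size=n)
    order = np.argsort(proxy)
    labels = np.empty(n, dtype=int)
    labels[order] = np.arange(n) * n_clusters // n   # equally sized clusters

    # Aggregate to cluster means (the "ecological" data)
    counts = np.bincount(labels, minlength=n_clusters)
    x_bar = np.bincount(labels, weights=exposure, minlength=n_clusters) / counts
    y_bar = np.bincount(labels, weights=outcome, minlength=n_clusters) / counts
    ecological_slope = np.polyfit(x_bar, y_bar, 1)[0]

    # Within-cluster slope: regress cluster-mean-centred Y on centred X
    x_w = exposure - x_bar[labels]
    y_w = outcome - y_bar[labels]
    within_slope = (x_w @ y_w) / (x_w @ x_w)
    return within_slope, ecological_slope


within, ecological = simulate_and_estimate()
print(f"within-cluster slope: {within:.3f}, ecological slope: {ecological:.3f}")
```

In this toy setting the within-cluster slope stays close to the simulated effect `beta`, because individuals in the same cluster share similar values of the latent confounder. The cluster-mean regression instead absorbs the confounder's effect and lands near `beta + gamma`, which is one simple way to see why aggregated data can mislead about individual-level effects.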


Extreme value methods for estimating rare events in Utopia

December 2023

·

58 Reads

To capture the extremal behaviour of complex environmental phenomena in practice, flexible techniques for modelling tail behaviour are required. In this paper, we introduce a variety of such methods, which were used by the Lancopula Utopiversity team to tackle the data challenge of the 2023 Extreme Value Analysis Conference. This data challenge was split into four sections, labelled C1-C4. Challenges C1 and C2 comprise univariate problems, where the goal is to estimate extreme quantiles for a non-stationary time series exhibiting several complex features. We propose a flexible modelling technique, based on generalised additive models, with diagnostics indicating generally good performance for the observed data. Challenges C3 and C4 concern multivariate problems where the focus is on estimating joint extremal probabilities. For challenge C3, we propose an extension of available models in the multivariate literature and use this framework to estimate extreme probabilities in the presence of non-stationary dependence. Finally, for challenge C4, which concerns a 50 dimensional random vector, we employ a clustering technique to achieve dimension reduction and use a conditional modelling approach to estimate extremal probabilities across independent groups of variables.

Citations (1)


... where * , represents some base 'threshold' quantile level obtained using standard techniques. This corresponds with the model recently proposed by André et al. (2023) for modelling non-stationary dependence. Assuming the formulation of equation (3) for ( ) and a constant shape parameter, i.e., ( ) = , we fit model (11) using the evgam package in the R computing language (Youngman, 2022). ...

Reference:

Modelling non-stationarity in asymptotically independent extremes
Extreme value methods for estimating rare events in Utopia

Extremes