Andrew C Parnell
Maynooth University · Hamilton Institute
PhD Statistics, University of Sheffield
About
164
Publications
55,269
Reads
13,744
Citations
Additional affiliations
November 2016 - March 2017
September 2008 - present
September 2008 - present
Publications (164)
We propose a Bayesian, noisy-input, spatial–temporal generalized additive model to examine regional relative sea-level (RSL) changes over time. The model provides probabilistic estimates of component drivers of regional RSL change via the combination of a univariate spline capturing a common regional signal over time, random slopes and intercepts c...
Modelling growth in student achievement is a significant challenge in the field of education. Understanding how interventions or experiences such as part-time work can influence this growth is also important. Traditional methods like difference-in-differences are effective for estimating causal effects from longitudinal data. Meanwhile, Bayesian no...
Background: Despite its important role in education, significant gaps remain in the literature on homework. Notably, there is a dearth of understanding regarding how homework effects vary across different subjects, how student backgrounds may moderate its effectiveness, what the optimal amount and distribution of homework is, and how the causal imp...
Bayesian Causal Forests (BCF) is a causal inference machine learning model based on the flexible non-parametric regression and classification tool, Bayesian Additive Regression Trees (BART). Motivated by data from the Trends in International Mathematics and Science Study (TIMSS), which includes data on student achievement in both mathematics and sc...
Turbidity is commonly monitored as an important water quality index. Human activities, such as dredging and dumping operations, can disrupt turbidity levels and should be monitored and analysed for possible effects. In this paper, we model the variations of turbidity in Dublin Bay over space and time to investigate the effects of dumping and dredgi...
In this study we investigate the impact of regularly assigning creative mathematical reasoning tasks on student achievement. Using a causal inference machine learning approach applied to Irish eighth grade data from TIMSS 2019, we find that assigning challenging questions requiring students to go beyond the instruction has a clear positive effect o...
We present reslr, an R package to perform Bayesian modelling of relative sea level data. We include a variety of different statistical models previously proposed in the literature, with a unifying framework for loading data, fitting models, and summarising the results. Relative sea-level data often contain measurement error in multiple dimensions a...
Environmental monitoring is crucial to our understanding of climate change, biodiversity loss and pollution. The availability of large-scale spatio-temporal data from sources such as sensors and satellites allows us to develop sophisticated models for forecasting and understanding key drivers. However, the data collected from sensors often contain...
Teacher shortages and attrition are problems of international concern. One of the most frequent reasons for teachers leaving the profession is a lack of job satisfaction. Accordingly, in this study we have adopted a causal inference machine learning approach to identify practical interventions for improving overall levels of job satisfaction. We ap...
We propose a Bayesian, noisy-input, spatial-temporal generalised additive model to examine regional relative sea-level (RSL) changes over time. The model provides probabilistic estimates of component drivers of regional RSL change via the combination of a univariate spline capturing a common regional signal over time, random slopes and intercepts c...
We propose a Bayesian model which produces probabilistic reconstructions of hydroclimatic variability in Queensland, Australia. The model provides a standardized approach to hydroclimate reconstruction using multiple palaeoclimate proxy records derived from natural archives such as speleothems, ice cores and tree rings. The method combines time‐seri...
Lockdowns were widely used to reduce transmission of COVID-19 and prevent health care services from being overwhelmed. While these mitigation measures helped to reduce loss of life, they also disrupted the everyday lives of billions of people. We use data from a survey of Singaporean citizens and permanent residents during the peak of the lockdown...
We present vivid, an R package for visualizing variable importance and variable interactions in machine learning models. The package provides a range of displays including heatmap and graph-based displays for viewing variable importance and interaction jointly and partial dependence plots in both a matrix layout and an alternative layout emphasizin...
Tree-based regression and classification has become a standard tool in modern data science. Bayesian Additive Regression Trees (BART) has in particular gained wide popularity due to its flexibility in dealing with interactions and non-linear effects. BART is a Bayesian tree-based machine learning method that can be applied to both regression and class...
The immense advances in computer power achieved in the last decades have had a significant impact in Earth science, providing valuable research outputs that allow the simulation of complex natural processes and systems, and generating improved forecasts. The development and implementation of innovative geoscientific software is currently evolving t...
We propose a simple yet powerful extension of Bayesian Additive Regression Trees which we name Hierarchical Embedded BART (HE-BART). The model allows for random effects to be included at the terminal node level of a set of regression trees, making HE-BART a non-parametric alternative to mixed effects models which avoids the need for the user to spe...
We propose a Bayesian model which produces probabilistic reconstructions of hydroclimatic variability in Queensland, Australia. The approach uses instrumental records of hydroclimate indices such as rain and evaporation, as well as palaeoclimate proxy records derived from natural archives such as sediment cores, speleothems, ice cores and tree rings...
Stable isotope ratios are used to reconstruct animal diet in trophic ecology via mixing models. Several assumptions of stable isotope mixing models are critical, i.e., constant trophic discrimination factor and isotopic equilibrium between the consumer and its diet. The isotopic turnover rate (λ and its counterpart the half-life) affects the dynami...
Teacher shortages and attrition are problems of international concern. Studies investigating this problem often identify important correlates of these two outcomes, but fail to produce easily implementable recommendations. Accordingly, in this study we have adopted a causal inference machine learning approach to identify practical interventions for...
Process and material uncertainty, particularly real-time process-structure-property checks, are a key obstacle to improved uptake of metal powder bed fusion in industry. Efforts are underway for live process monitoring such as thermal and image-based data gathering for every layer printed. Current crystal plasticity finite element (CPFE) modelling...
Conservation paleobiology seeks to leverage proxy reconstructions of ecological communities and environmental conditions to predict future changes and inform management decisions. Populations of East African megafauna likely changed during the Holocene in response to trends and events in the regional hydroclimate, but reconstructing these populatio...
Palaeoclimate data relating to hydroclimate variability over the past millennia have a vital contribution to make to the water sector globally. The water industry faces considerable challenges accessing climate data sets that extend beyond that of historical gauging stations. Without this, variability around the extremes of floods and droughts is u...
The relative contributions of both copy number variants (CNVs) and single nucleotide polymorphisms (SNPs) to the additive genetic variance of carcass traits in cattle are not well understood. A detailed understanding of the relative importance of CNVs in cattle may have implications for study design of both genomic predictions and genome-wide associ...
We consider the analysis of count data in which the observed frequency of zero counts is unusually large, typically with respect to the Poisson distribution. We focus on two alternative modelling approaches: over‐dispersion (OD) models and zero‐inflation (ZI) models, both of which can be seen as generalisations of the Poisson distribution; we refer...
Background: The carcass value of cattle is a function of carcass weight and quality. Given the economic importance of carcass merit to producers, it is routinely included in beef breeding objectives. A detailed understanding of the genetic variants that contribute to carcass merit is useful to maximize the efficiency of breeding for improved carcass...
Earthquake hazard assessments for the Tokyo Region are complicated by the trench–trench triple junction where the oceanic Philippine Sea Plate not only underthrusts a continental plate but is also being subducted by the Pacific Plate. Great thrust earthquakes and associated tsunamis are historically recognized hazards from the Continental/Philippin...
The epidemic increase in the incidence of Human Papilloma Virus (HPV) related Oropharyngeal Squamous Cell Carcinomas (OPSCCs) in several countries worldwide represents a significant public health concern. Although gender neutral HPV vaccination programmes are expected to cause a reduction in the incidence rates of OPSCCs, these effects will not be...
We propose a new semi-parametric model based on Bayesian Additive Regression Trees (BART). In our approach, the response variable is approximated by a linear predictor and a BART model, where the first component is responsible for estimating the main effects and BART accounts for the non-specified interactions and non-linearities. The novelty in ou...
Variable importance, interaction measures, and partial dependence plots are important summaries in the interpretation of statistical and machine learning models. In this paper we describe new visualization techniques for exploring these model summaries. We construct heatmap and graph-based displays showing variable importance and interaction jointl...
This study investigates the use of a statistical anomaly detection method to analyse in-situ process monitoring data obtained during the Laser-Powder Bed Fusion of Ti-6Al-4V parts. The printing study was carried out on a Renishaw 500M Laser-Powder Bed Fusion system. A photodiode-based system called InfiniAM was used to monitor the melt-pool emissio...
Bayesian additive regression trees (BART) is a tree-based machine learning method that has been successfully applied to regression and classification problems. BART assumes regularisation priors on a set of trees that work as weak learners and is very flexible for predicting in the presence of nonlinearity and high-order interactions. In this paper...
Building robust age–depth models to understand climatic and geologic histories from coastal sedimentary archives often requires composite chronologies consisting of multi-proxy age markers. Pollen chronohorizons derived from a known change in vegetation are important for age–depth models, especially those with other sparse or imprecise age markers....
We develop a new approach for feature selection via gain penalization in tree-based models. First, we show that previous methods do not perform sufficient regularization and often exhibit sub-optimal out-of-sample performance, especially when correlated features are present. Instead, we develop a new gain penalization idea that exhibits a general l...
The North Atlantic Oscillation (NAO) is the major atmospheric mode that controls winter European climate variability because its strength and phase determine regional temperature, precipitation and storm tracks. The NAO spatial structure and associated climatic impacts over Europe are not stationary, making it crucial to understand its past evolu...
While the ecosystem of the Great Barrier Reef (GBR), north-eastern Australia, is being threatened by the elevated levels of sediments and nutrients discharged from adjacent coastal river systems, the sources of these detrimental pollutants are not well understood. Here we used a combined isotopic (δ¹³C, δ¹⁵N) and geochemical (Zn, Pt and S) signature...
We consider models underlying regression analysis of count data in which the observed frequency of zero counts is unusually large, typically with respect to the Poisson distribution. We focus on two alternative modelling approaches: Over-Dispersion (OD) models, and Zero-Inflation (ZI) models, both of which can be seen as generalisations of the Pois...
Modes of climate variability affect global and regional climates on different spatio-temporal scales, and they have important impacts on human activities and ecosystems. As these modes are a useful tool for simplifying the understanding of the climate system, it is crucial that we gain improved knowledge of their long-term past evolution and intera...
Bayesian Additive Regression Trees (BART) is a tree-based machine learning method that has been successfully applied to regression and classification problems. BART assumes regularisation priors on a set of trees that work as weak learners and is very flexible for predicting in the presence of non-linearity and high-order interactions. In this pape...
Background: The trading of individual animal genotype information often involves only the exchange of the called genotypes and not necessarily the additional information required to effectively call structural variants. The main aim here was to determine if it is possible to impute copy number variants (CNVs) using the flanking single nucleotide p...
In this study, we begin a comprehensive characterisation of temperature extremes in Ireland for the period 1981‐2010. We produce return levels of anomalies of daily maximum temperature extremes for an area over Ireland, for the 30‐year period 1981‐2010. We employ extreme value theory (EVT) to model the data using the generalised Pareto distribution...
Intrinsic water use efficiency (iWUE), defined as the ratio of photosynthesis to stomatal conductance, is a key variable in plant physiology and ecology. Yet, how rising atmospheric CO2 concentration affects iWUE at broad species and ecosystem scales is poorly understood. In a field-based study of 244 woody angiosperm species across eight biomes ov...
We review published literature and historical texts to propose that three periods of official Chinese maritime bans impacted the composition and circulation of trade ceramics along Asian trade routes: Ming Ban 1 (1371–1509), Ming Ban 2 (1521–1529), and Qing Ban (1654–1684). We use ceramics collected during a landscape archaeology survey along...
The North Atlantic Oscillation (NAO) is the major atmospheric mode ruling European climate variability during winter and its significance is underpinned by the number of recent studies aimed at reconstructing past NAO variability across different time scales and temporal resolutions. We present a new 2000-year multi-annual, proxy-based, local NAO i...
The detection of anomalies in real time is paramount to maintain performance and efficiency across a wide range of applications including web services and smart manufacturing. This paper presents a novel algorithm to detect anomalies in streaming time series data via statistical learning. We adapt the generalised extreme studentised deviate test [1...
Despite strong selection for athletic traits in Thoroughbred horses, there is marked variation in speed and aptitude for racing performance within the breed. Using global positioning system monitoring during exercise training, we measured speed variables and temporal changes in speed with age to derive phenotypes for GWAS. The aim of the study was...
Direct metal laser sintering (DMLS) is a powder bed fusion (PBF) additive manufacturing process commonly used within the medical device and aerospace industries where regulations drive the requirement for stringent quality control. Using in-situ monitoring, the identification of defects, as well as the geometric and dimensional measurement of the l...
Durability traits in Thoroughbred horses are heritable, economically valuable and may affect horse welfare. The aims of this study were to test the hypotheses that (i) durability traits are heritable and (ii) genetic data may be used to predict a horse's potential to have a racecourse start. Heritability for the phenotype ‘number of 2‐ and 3‐year‐o...
In this study, we begin a comprehensive characterisation of temperature extremes in Ireland for the period 1981-2010. We produce return levels of anomalies of daily maximum temperature extremes for an area over Ireland, for the 30-year period 1981-2010. We employ extreme value theory (EVT) to model the data using the generalised Pareto distribution...
Archaeological evidence shows that a predecessor of the 2004 Indian Ocean tsunami devastated nine distinct communities along a 40-km section of the northern coast of Sumatra in about 1394 CE. Our evidence is the spatial and temporal distribution of tens of thousands of medieval ceramic sherds and over 5,000 carved gravestones, collected and recorde...
Stomatal conductance (gs) in terrestrial vegetation regulates the uptake of atmospheric carbon dioxide for photosynthesis and water loss through transpiration, closely linking the biosphere and atmosphere and influencing climate. Yet, the range and pattern of gs in plants from natural ecosystems across broad geographic, climatic, and taxonomic rang...
We present a novel approach to estimating the effect of control parameters on tool wear rates and related changes in the three force components in turning of medical grade Co-Cr-Mo (ASTM F75) alloy. Co-Cr-Mo is known to be a difficult-to-cut material which, due to a combination of mechanical and physical properties, is used for the critical structu...
Characterizing the spatio-temporal variability of relative sea level (RSL) and estimating local, regional, and global RSL trends requires statistical analysis of RSL data. Formal statistical treatments, needed to account for the spatially and temporally sparse distribution of data and for geochronological and elevational uncertainties, have advance...
Background: Race distance aptitude in Thoroughbred horses is highly heritable and is influenced largely by variation at the myostatin gene (MSTN). Objectives: In addition to MSTN, we hypothesised that other modifying loci contribute to best race distance. Study design: Using 3006 Thoroughbreds, including 835 ‘elite’ horses, which were >3 years old,...
In Maths for Business, a large first-year mathematics module, the continuous assessment component comprises 10 weekly quizzes which combine to contribute 40% of the final module mark. If students did not receive the full five marks on their weekly quiz, they were provided with the opportunity to resubmit their corrected weekly quiz with an explanat...
Previous research has shown that students’ use of module resources strongly relates to the timing of the module’s continuous assessment. In our case study of a large first-year mathematics module for Business students, Maths for Business, we examine this relationship and the resources relied on by students for completing their continuous assessment...
We present archaeological evidence for a trading settlement dating from the 13th to the mid-16th century CE on an elevated headland in Lamreh village about 30 km east of Banda Aceh, on the northern coast of Sumatra, Indonesia. We propose this site was part of historic Lamri, known from documentary sources as an important node in the maritime "silk...
We present archaeological evidence of a trading settlement dating from the 13th to the mid-16th century CE on an elevated headland in the village of Lamreh, about 30 km east of Banda Aceh, on the northern coast of Sumatra, Indonesia. We identify it with the historic site of Lamri, described in textual sources as an important node...