Sylvia Richardson
Medical Research Council (UKRI) · MRC Biostatistics Unit
489 Publications
73,217 Reads
34,745 Citations
Publications (489)
Multistate models provide a useful framework for modelling complex event history data in clinical settings and have recently been extended to the joint modelling framework to appropriately handle endogenous longitudinal covariates, such as repeatedly measured biomarkers, which are informative about health status and disease progression. However, th...
Traumatic brain injury (TBI), a leading global cause of mortality and disability, lacks effective treatments to enhance recovery. Synaptic remodeling has been postulated as one mechanism that influences outcomes after TBI. We sought to investigate whether common mechanisms affecting synapse maintenance are shared between TBI and other neuropsychiat...
The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from...
Recent work has reported that respiratory audio-trained AI classifiers can accurately predict SARS-CoV-2 infection status. However, it has not yet been determined whether such model performance is driven by latent audio biomarkers with true causal links to SARS-CoV-2 infection or by confounding effects, such as recruitment bias, present in observat...
Motivation
In recent years, network models have gained prominence for their ability to capture complex associations. In statistical omics, networks can be used to model and study the functional relationships between genes, proteins, and other types of omics data. If a Gaussian graphical model is assumed, a gene association network can be determined...
There is still limited understanding of how chronic conditions co-occur in patients with multimorbidity and what the consequences are for patients and the health care system. Most reported clusters of conditions have not considered the demographic characteristics of these patients during the clustering process. The study used data for all registere...
The identification of sets of co-regulated genes that share a common function is a key question of modern genomics. Bayesian profile regression is a semi-supervised mixture modelling approach that makes use of a response to guide inference toward relevant clusterings. Previous applications of profile regression have considered univariate continuous...
Background
Multimorbidity, characterised by the coexistence of multiple chronic conditions in an individual, is a rising public health concern. While much of the existing research has focused on cross-sectional patterns of multimorbidity, there remains a need to better understand the longitudinal accumulation of diseases. This includes examining th...
Background:
To inform targeted public health strategies, it is crucial to understand how coexisting diseases develop over time and their associated impacts on patient outcomes and health-care resources. This study aimed to examine how psychosis, diabetes, and congestive heart failure, in a cluster of physical-mental health multimorbidity, develop...
In recent years, network models have gained prominence for their ability to capture complex associations. In statistical omics, networks can be used to model and study the functional relationships between genes, proteins, and other types of omics data. If a Gaussian graphical model is assumed, a gene association network can be determined from the n...
Background:
Multimorbidity presents a growing public health challenge. While existing research has mainly focused on cross-sectional patterns of multimorbidity, there remains a need to better understand the longitudinal accumulation of diseases and the effects of important sociodemographic characteristics on the accumulation and progression of chro...
Clustering is commonly performed as an initial analysis step for uncovering structure in 'omics datasets, e.g. to discover molecular subtypes of disease. The high-throughput, high-dimensional nature of these datasets means that they provide information on a diverse array of different biomolecular processes and pathways. Different groups of variable...
The potential utility of wastewater-based epidemiology as an early warning tool has been explored widely across the globe during the current COVID-19 pandemic. Methods to detect the presence of SARS-CoV-2 RNA in wastewater were developed early in the pandemic, and extensive work has been conducted to evaluate the relationship between viral concentr...
The biology driving individual patient responses to severe acute respiratory syndrome coronavirus 2 infection remains ill understood. Here, we developed a patient-centric framework leveraging detailed longitudinal phenotyping data and covering a year after disease onset, from 215 infected individuals with differing disease severities. Our analyses...
There is an increasing body of work exploring the integration of random projection into algorithms for numerical linear algebra. The primary motivation is to reduce the overall computational cost of processing large datasets. A suitably chosen random projection can be used to embed the original dataset in a lower-dimensional space such...
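The embedding idea in the abstract above can be illustrated with a Gaussian random projection, a standard Johnson-Lindenstrauss-style construction chosen here as an assumption; the abstract is truncated and does not say which projection or downstream linear-algebra algorithm the work uses. In this minimal pure-Python sketch the helper names `gaussian_projection`, `project` and `distance` are hypothetical, and the matrix entries are drawn as N(0, 1/k) so that squared distances are preserved in expectation.

```python
import math
import random


def gaussian_projection(d, k, seed=0):
    """Hypothetical helper: a k x d matrix with i.i.d. N(0, 1/k) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional point x to k dimensions: y = R x."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


if __name__ == "__main__":
    d, k, n = 500, 200, 5  # original dimension, embedded dimension, number of points
    rng = random.Random(1)
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    R = gaussian_projection(d, k)
    low = [project(R, p) for p in points]
    # Ratio of embedded to original pairwise distance; each should be close to 1.
    for i in range(n):
        for j in range(i + 1, n):
            print(i, j, round(distance(low[i], low[j]) / distance(points[i], points[j]), 3))
```

Each printed ratio should sit close to 1; its spread shrinks roughly like 1/sqrt(2k) as the embedded dimension k grows, which is the accuracy-versus-cost trade-off behind choosing k.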
Objective:
To investigate the reproducibility and validity of latent class analysis (LCA), hierarchical cluster analysis (HCA), multiple correspondence analysis followed by k-means (MCA-kmeans), and k-means (kmeans) for multimorbidity clustering.
Study design:
We first investigated clustering algorithms in simulated datasets with 26 diseases o...
Network models are useful tools for modelling complex associations. If a Gaussian graphical model is assumed, conditional independence is determined by the non-zero entries of the inverse covariance (precision) matrix of the data. The Bayesian graphical horseshoe estimator provides a robust and flexible framework for precision matrix inference, as...
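The conditional-independence statement above can be checked numerically. This sketch assumes a known covariance matrix for a hypothetical three-variable chain (X1 drives X2, which drives X3), inverts it to obtain the precision matrix, and reads the graph edges off the non-zero off-diagonal entries. It does not estimate anything from data and is not the Bayesian graphical horseshoe; `invert`, `partial_correlations` and `graph_edges` are illustrative names.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment [A | I] and reduce the left block to the identity.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(precision):
    """rho_ij = -omega_ij / sqrt(omega_ii * omega_jj): correlation of i and j given all others."""
    n = len(precision)
    return [[1.0 if i == j else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
             for j in range(n)] for i in range(n)]


def graph_edges(precision, tol=1e-9):
    """Edges of the conditional independence graph: pairs with a non-zero precision entry."""
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if abs(precision[i][j]) > tol]


if __name__ == "__main__":
    # Hypothetical chain: X1 ~ N(0,1), X2 = X1 + e2, X3 = X2 + e3, independent N(0,1) noise.
    sigma = [[1, 1, 1], [1, 2, 2], [1, 2, 3]]
    omega = invert(sigma)
    print(omega)               # [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
    print(graph_edges(omega))  # [(0, 1), (1, 2)]
```

Here X1 and X3 are marginally correlated (their covariance is 1), yet the corresponding precision entry is exactly zero, so no edge joins them; this is the distinction between marginal and conditional dependence that the graph encodes.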
The biology driving individual patient responses to SARS-CoV-2 infection remains ill understood. Here, we developed a patient-centric framework leveraging detailed longitudinal phenotyping data, covering a year post disease onset, from 215 SARS-CoV-2 infected subjects with differing disease severities. Our analyses revealed distinct “systemic recov...
We present interoperability as a guiding framework for statistical modelling to assist policy makers asking multiple questions using diverse datasets in the face of an evolving pandemic response. Interoperability provides an important set of principles for future pandemic preparedness, through the joint design and deployment of adaptable systems of...
Multimorbidity is defined as the coexistence of two or more chronic health conditions in an individual. The objective of this study was to examine how diseases in a cluster of physical-mental health multimorbidity with a high all-cause mortality (psychosis, diabetes, and congestive heart failure) develop and coexist over time, and to assess the ass...
Background
Factors such as age, pre-injury health, and injury severity account for less than 35% of outcome variability in traumatic brain injury (TBI). While some residual outcome variability may be attributable to genetic factors, published candidate gene association studies have often been underpowered and subject to publication bias.
Methods...
Background
Ethnically diverse and socio-economically deprived communities have been differentially affected by the COVID-19 pandemic in the UK.
Method
Using a multilevel regression model we assessed the time-varying association between SARS-CoV-2 infections and areal level deprivation and ethnicity from 1st of June 2020 to the 19th of September 20...
There is an increasing body of work exploring the integration of random projection into algorithms for numerical linear algebra. The primary motivation is to reduce the overall computational cost of processing large datasets. A suitably chosen random projection can be used to embed the original dataset in a lower-dimensional space such that key pro...
Global and national surveillance of SARS-CoV-2 epidemiology is mostly based on targeted schemes focused on testing individuals with symptoms. These tested groups are often unrepresentative of the wider population and exhibit test positivity rates that are biased upwards compared with the true population prevalence. Such data are routinely used to i...
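The upward bias in test positivity can be made concrete with Bayes' rule. Under the simplifying assumptions of a perfect test and hypothetical testing probabilities a for infected and b for uninfected people, the expected positivity among those tested is p·a / (p·a + (1 - p)·b) for prevalence p, which exceeds p whenever a > b. This only illustrates the mechanism and is not the model used in the paper; `expected_positivity` is a hypothetical name.

```python
def expected_positivity(prevalence, p_test_if_infected, p_test_if_uninfected):
    """Share of tests that are positive when the chance of being tested depends on infection.

    Assumes a perfect test (no false positives or negatives); inputs are hypothetical.
    """
    infected_tested = prevalence * p_test_if_infected
    uninfected_tested = (1.0 - prevalence) * p_test_if_uninfected
    return infected_tested / (infected_tested + uninfected_tested)


if __name__ == "__main__":
    # 1% prevalence; infected people are 15 times more likely to seek a test.
    print(round(expected_positivity(0.01, 0.30, 0.02), 3))  # 0.132
```

With 1% prevalence, roughly 13% of tests come back positive in this toy setting, which is why raw positivity cannot be read directly as prevalence.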
Background: Ethnically diverse and socio-economically deprived communities have been differentially affected by the COVID-19 pandemic in the UK.
Method: Using a multilevel regression model we assess the time-varying association between SARS-CoV-2 infections and areal level deprivation and ethnicity. We separately consider weekly test positivity rat...
In molecular biology, advances in high-throughput technologies have made it possible to study complex multivariate phenotypes and their simultaneous associations with high-dimensional genomic and other omics data, a problem that can be studied with high-dimensional multi-response regression, where the response variables are potentially highly corre...
Background
An Informatics Consult has been proposed in which clinicians request novel evidence from large scale health data resources, tailored to the treatment of a specific patient. However, the availability of such consultations is lacking. We seek to provide an Informatics Consult for a situation where a treatment indication and contraindicatio...
We present "interoperability" as a guiding framework for statistical modelling to assist policy makers asking multiple questions using diverse datasets in the face of an evolving pandemic response. Interoperability provides an important set of principles for future pandemic preparedness, through the joint design and deployment of adaptable systems...
Prominent early features of COVID-19 include severe, often clinically silent, hypoxia and a pronounced reduction in B cells, the latter important in defence against SARS-CoV-2. This brought to mind the phenotype of mice with VHL-deficient B cells, in which Hypoxia-Inducible Factors are constitutively active, suggesting hypoxia might drive B cell ab...
Targeted surveillance testing schemes for SARS-CoV-2 focus on certain subsets of the population, such as individuals experiencing one or more of a prescribed list of symptoms. These schemes have routinely been used to monitor the spread of SARS-CoV-2 in countries across the world. The number of positive tests in a given region can provide local ins...
The placenta is the interface between mother and fetus, and inadequate function contributes to short- and long-term ill-health. The placenta is absent from most large-scale RNA-Seq datasets. We therefore analyze long and small RNAs (~101 and 20 million reads per sample respectively) from 302 human placentas, including 94 cases of preeclampsia (PE) an...
Our work is motivated by the search for metabolite quantitative trait loci (QTL) in a cohort of more than 5000 people. There are 158 metabolites measured by NMR spectroscopy in the 31‐year follow‐up of the Northern Finland Birth Cohort 1966 (NFBC66). These metabolites, as with many multivariate phenotypes produced by high‐throughput biomarker techn...
The kinetics of the immune changes in COVID-19 across severity groups have not been rigorously assessed. Using immunophenotyping, RNA sequencing and serum cytokine analysis, we analyzed serial samples from 207 SARS-CoV-2-infected individuals with a range of disease severities over 12 weeks from symptom onset. An early robust bystander CD8⁺ T cell im...
We present EPISPOT, a fully joint framework which exploits large panels of epigenetic annotations as variant-level information to enhance molecular quantitative trait locus (QTL) mapping. Thanks to a purpose-built Bayesian inferential algorithm, EPISPOT accommodates functional information for both cis and trans actions, including QTL hotspot effect...
During Covid-19 outbreaks, school closures are employed as part of governments' non-pharmaceutical interventions around the world to reduce the number of contacts and keep the reproduction number below 1. Yet, prolonged school closures have a profound negative impact on the future opportunities of pupils, particularly from disadvantaged backgrounds,...
Introduction
Multimorbidity is widely recognised as the presence of two or more concurrent long-term conditions, yet remains a poorly understood global issue despite increasing in prevalence.
We have created the Wales Multimorbidity e-Cohort (WMC) to provide an accessible research ready data asset to further the understanding of multimorbidity. Our...
An Informatics Consult has been proposed in which clinicians request novel evidence from large scale health data resources, tailored to the treatment of a specific patient, with return of results in clinical timescales. However, the availability of such consultations is lacking. We seek to provide an Informatics Consult for a situation where a trea...
In a study of 207 SARS-CoV2-infected individuals with a range of severities followed over 12 weeks from symptom onset, we demonstrate that an early robust immune response, without systemic inflammation, is characteristic of asymptomatic or mild disease. Those presenting to hospital had delayed adaptive responses and systemic inflammation already ev...
Loss to follow-up and missing outcomes data are important issues for longitudinal observational studies and clinical trials in traumatic brain injury. One popular solution to missing 6-month outcomes has been to use the last observation carry forward (LOCF). The purpose of the current study was to compare the performance of model-based single-imput...
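For reference, the LOCF rule mentioned above fills a missing later assessment with the most recent observed value. A minimal sketch with hypothetical outcome scores (`locf` is an illustrative name; the model-based single-imputation methods the study compares it against are not shown):

```python
from typing import List, Optional


def locf(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: replace each missing visit with the latest observed value.

    Visits missing before the first observation remain None.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


if __name__ == "__main__":
    # Hypothetical scores at three follow-up visits; the final (6-month) one is missing.
    print(locf([5, 6, None]))  # [5, 6, 6]
```

The carried-forward 6-month value implicitly assumes no change after the last observed visit, a strong assumption when outcomes may still be evolving.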
When using Markov chain Monte Carlo (MCMC) algorithms to perform inference for Bayesian clustering models, such as mixture models, the output is typically a sample of clusterings (partitions) drawn from the posterior distribution. In practice, a key challenge is how to summarise this output. Here we build upon the notion of the posterior similarity...
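The posterior similarity matrix referred to above has a simple definition: entry (i, j) is the fraction of sampled partitions in which items i and j share a cluster. A minimal sketch, assuming the MCMC output is a list of integer label vectors (the function name is hypothetical, and the summary the paper builds on top of this matrix is not reproduced):

```python
from typing import List, Sequence


def posterior_similarity_matrix(samples: Sequence[Sequence[int]]) -> List[List[float]]:
    """Entry (i, j): proportion of sampled clusterings placing items i and j in the same cluster."""
    n_samples = len(samples)
    n_items = len(samples[0])
    counts = [[0.0] * n_items for _ in range(n_items)]
    for labels in samples:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / n_samples for c in row] for row in counts]


if __name__ == "__main__":
    # Three hypothetical posterior draws for three items.
    psm = posterior_similarity_matrix([[0, 0, 1], [0, 0, 0], [1, 1, 0]])
    print(psm[0][1], psm[0][2])  # items 0 and 1 always together; 0 and 2 together in 1 of 3 draws
```

Because only co-membership is counted, the matrix is unaffected by label switching across iterations: the labelings [0, 0, 1] and [1, 1, 0] contribute identically.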
We present EPISPOT, a fully joint framework which exploits large panels of epigenetic marks as variant-level information to enhance molecular quantitative trait locus (QTL) mapping. Thanks to a purpose-built Bayesian inferential algorithm, our approach effectively couples simultaneous QTL analysis of thousands of genetic variants and molecular trai...
Most patients with rare diseases do not receive a molecular diagnosis and the aetiological variants and causative genes for more than half such disorders remain to be discovered¹. Here we used whole-genome sequencing (WGS) in a national health system to streamline diagnosis and to discover unknown aetiological variants in the coding and non-coding...
Study designs where data have been aggregated by geographical areas are popular in environmental epidemiology. These studies are commonly based on administrative databases and, because they provide complete spatial coverage, are particularly appealing for making inference on the entire population. However, the resulting estimates are often biased and difficult...
We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, that is, predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expressio...
Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical inves...
We propose a general method for distributed Bayesian model choice, where each worker has access only to non-overlapping subsets of the data. Our approach approximates the model evidence for the full data set through Monte Carlo sampling from the posterior on every subset generating a model evidence per subset. The model evidences per worker are the...
High-throughput technology for molecular biomarkers is increasingly producing multivariate phenotype data exhibiting strong correlation structures. Existing approaches for combining such data with genetic variants for multivariate Quantitative Trait Loci analysis generally either ignore correlation structure or make other restrictive assumptions ab...
We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, i.e., predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expression o...
Table S2. Differential Testing Results between Naive and Activated CD4+ T cells, Related to Figure 5
For each gene highlighted to have a statistically significant change in variability: estimated difference in residual over-dispersion parameters between active and naive CD4+ T cells (DistEpsilon), estimated log2 fold change in mean expression para...
Linear shrinkage estimators of a covariance matrix (defined by a weighted average of the sample covariance matrix and a pre-specified shrinkage target matrix) are popular when analysing high-throughput molecular data. However, their performance strongly relies on an appropriate choice of target matrix. This paper introduces a more flexible cl...
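The linear shrinkage estimator defined above, λT + (1 - λ)S, is straightforward to write down. This sketch fixes the shrinkage intensity λ by hand and uses a diagonal target built from the sample variances; both are illustrative assumptions, since the more flexible class of targets and the choice of λ are not described in the truncated abstract. Function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def sample_covariance(data: Sequence[Sequence[float]]) -> Matrix:
    """Unbiased sample covariance of an n x p data matrix (rows are observations)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
             for b in range(p)] for a in range(p)]


def diagonal_target(S: Matrix) -> Matrix:
    """One common target choice: keep the sample variances, set covariances to zero."""
    p = len(S)
    return [[S[i][i] if i == j else 0.0 for j in range(p)] for i in range(p)]


def linear_shrinkage(S: Matrix, T: Matrix, lam: float) -> Matrix:
    """Weighted average lam * T + (1 - lam) * S with shrinkage intensity 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("shrinkage intensity must lie in [0, 1]")
    return [[lam * t + (1.0 - lam) * s for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(S, T)]


if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]  # hypothetical 3 x 2 data set
    S = sample_covariance(data)
    print(linear_shrinkage(S, diagonal_target(S), lam=0.5))
```

With a diagonal target the variances are left unchanged and every covariance is multiplied by (1 - λ); provided all sample variances are positive, the result is positive definite for any λ > 0 even when S itself is singular.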
Penalized likelihood methods are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well-developed, the relative efficacy of different methods in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigati...
Cell-to-cell transcriptional variability in otherwise homogeneous cell populations plays an important role in tissue function and development. Single-cell RNA sequencing can characterize this variability in a transcriptome-wide manner. However, technical variation and the confounding between variability and mean expression estimates hinder meaningf...
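The confounding between variability and mean expression mentioned above can be seen in a standard negative binomial count model, used here purely as an illustrative assumption (the abstract does not state its model). With mean μ and over-dispersion φ:

```latex
\operatorname{Var}(X) = \mu + \phi\,\mu^{2}
\quad\Longrightarrow\quad
\mathrm{CV}^{2} = \frac{\operatorname{Var}(X)}{\mu^{2}} = \frac{1}{\mu} + \phi .
```

Even with φ held fixed, CV² falls as μ rises, so a raw variability summary partly restates mean expression; measuring variability through a quantity adjusted for the mean avoids ranking genes on that basis alone.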
Recently, large scale genome‐wide association study (GWAS) meta‐analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one‐at‐a‐time. This complicates the ability of fine‐mapping to identify a small set of SNPs for further functional follow‐up. We describe a new a...
Despite major methodological developments, Bayesian inference in Gaussian graphical models remains challenging in high dimension due to the tremendous size of the model space. This article proposes a method to infer the marginal and conditional independence structures between variables by multiple testing, which bypasses the exploration of the mode...