Adrian E. Raftery
University of Washington Seattle | UW · Department of Statistics
353
Publications
67,457
Reads
88,814
Citations
Publications (353)
Most population projection models require age-specific information on net migration totals as a key demographic component of population change. Existing methods for predicting future patterns of net migration by age have proven inadequate. The main reason is that methods applied to model net migration are unable to distinguish factors influencing t...
Population projections provide predictions of future population sizes for an area. Historically, most population projections have been produced using deterministic or scenario-based approaches and have not assessed uncertainty about future population change. Starting in 2015, however, the United Nations (UN) has produced probabilistic population pr...
In this chapter, we present a review of latent position models for networks. We review the recent literature in this area and illustrate the basic aspects and properties of this modeling framework. Through several illustrative examples we highlight how the latent position model is able to capture important features of observed networks. We emphasiz...
The bayesTFR package for R provides a set of functions to produce probabilistic projections of the total fertility rates for all countries, and is widely used, including as part of the basis for the United Nations official population projections for all countries. Liu and Raftery (2020) extended the theoretical model by adding a layer that accounts...
Bayesian model averaging (BMA) provides a coherent way to account for model uncertainty in statistical inference tasks. BMA requires specification of model space priors and parameter space priors. In this article we focus on comparing different model space priors in the presence of model uncertainty. We consider eight reference model space priors u...
The social cost of carbon dioxide (SC-CO2) measures the monetized value of the damages to society caused by an incremental metric tonne of CO2 emissions and is a key metric informing climate policy. Used by governments and other decision-makers in benefit-cost analysis for over a decade, SC-CO2 estimates draw on climate science, economics, demograp...
Boundaries on spatial fields divide regions with particular features from surrounding background areas. Methods to identify boundary lines from interpolated spatial fields are well established. Less attention has been paid to how to model sequences of connected spatial points. Such models are needed for physical boundaries. For example, in the Arct...
We propose a method for forecasting global human migration flows. A Bayesian hierarchical model is used to make probabilistic projections of the 39,800 bilateral migration flows among the 200 most populous countries. We generate out-of-sample forecasts for all bilateral flows for the 2015 to 2020 period, using models fitted to bilateral migration f...
The Heat Index is a metric that quantifies heat exposure in human beings. Here, using probabilistic emission projections, we show that changes in the Heat Index driven by anthropogenic CO2 emissions will increase global exposure to dangerous environments in the coming decades. Even if the Paris Agreement goal of limiting global warming to 2 °C is...
The climate change projections of the Intergovernmental Panel on Climate Change are based on scenarios for future emissions, but these are not statistically-based and do not have a full probabilistic interpretation. Raftery et al. (Nat Clim Change 7:637–641, 2017) and Liu and Raftery (Commun Earth Environ 2:1–10, 2021) developed probabilistic forec...
The bayesTFR package for R provides a set of functions to produce probabilistic projections of the total fertility rates (TFR) for all countries, and is widely used, including as part of the basis for the UN's official population projections for all countries. Liu and Raftery (2020) extended the theoretical model by adding a layer that accounts for...
Significance
Choosing a statistical model and accounting for uncertainty about this choice are important parts of the scientific process and are required for common statistical tasks such as parameter estimation, interval estimation, statistical inference, point prediction, and interval prediction. A canonical example is the choice of variables in...
The record for oldest human being was set in 1997 by Jeanne Calment of France at 122 years and 164 days. Michael Pearce and Adrian E. Raftery expect that record will be broken in the coming decades.
Population forecasts are used by governments and the private sector for planning, with horizons up to about three generations (around 2100) for different purposes. The traditional methods are deterministic using scenarios, but probabilistic forecasts are desired to get an idea of accuracy, assess changes, and make decisions involving risks. In a si...
Projecting mortality for subnational units, or regions, is of great interest to practicing demographers. We seek a probabilistic method for projecting subnational life expectancy that is based on the national Bayesian hierarchical model used by the United Nations, and at the same time is easy to use. We propose three methods of this kind. Two of th...
Background: We consider the problem of quantifying the human lifespan using a statistical approach that probabilistically forecasts the maximum reported age at death (MRAD) through 2100. Objective: We seek to quantify the probability that any person attains various extreme ages, such as those above 120, by the year 2100. Methods: We use the exponen...
There is a growing expectation that data collected by government-funded studies should be openly available to ensure research reproducibility, which also increases concerns about data privacy. A strategy to protect individuals’ identity is to release multiply imputed (MI) synthetic datasets with masked sensitivity values (Rubin, 1993). However, inf...
Smoking is one of the main risk factors that has affected human mortality and life expectancy over the past century. Smoking accounts for a large part of the nonlinearities in the growth of life expectancy and of the geographic and sex differences in mortality. As Bongaarts (2006) and Janssen (2018) suggested, accounting for smoking could improve t...
Boundaries on spatial fields divide regions with particular features from surrounding background areas. These boundaries are often described with contour lines. To measure and record these boundaries, contours are often represented as ordered sequences of spatial points that connect to form a line. Methods to identify boundary lines from interpolat...
Respondent-driven sampling is an approach for estimating features of populations that are difficult to access using standard survey tools, e.g., the fraction of injection drug users who are HIV positive. Baraff et al. (2016) introduced an approach to estimating uncertainty in population proportion estimates from respondent-driven sampling using the...
Since the 1940s, population projections have in most cases been produced using the deterministic cohort component method. However, in 2015, for the first time, in a major advance, the United Nations issued official probabilistic population projections for all countries based on Bayesian hierarchical models for total fertility and life expectancy. T...
Smoking is one of the leading preventable threats to human health and a major risk factor for lung cancer, upper aero-digestive cancer, and chronic obstructive pulmonary disease. Estimating and forecasting the smoking attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortal...
Sea ice, or frozen ocean water, annually freezes and melts in the Arctic. The need for accurate forecasts of where sea ice will be located weeks to months in advance has increased as the amount of sea ice reduces due to climate change. Typical sea ice forecasts are made with ensemble models, physics-based deterministic models of sea ice and the sur...
Cambridge Core - Pattern Recognition and Machine Learning - Model-Based Clustering and Classification for Data Science - by Charles Bouveyron
The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we as...
Smoking is one of the preventable threats to human health and is a major risk factor for lung cancer, upper aero-digestive cancer, and chronic obstructive pulmonary disease. Estimating and forecasting the smoking attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortality a...
Significance
Despite the importance of international migration, estimates of between-country migration flows are still imprecise. Reliable record keeping of migration events is typically available only in the developed world, and the best existing methods to produce global migration flow estimates are burdened by strong assumptions. We produce esti...
Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for datasets where the number of variables p is large the algorithm can become inefficient and computationally expensive. Another...
Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady-state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression ex...
The Schwarz or Bayesian information criterion (BIC) is one of the most widely used tools for model comparison in social science research. The BIC however is not suitable for evaluating models with order constraints on the parameters of interest. This paper explores two extensions of the BIC for evaluating order constrained models, one where a trunc...
The United Nations (UN) issued official probabilistic population projections for all countries to 2100 in July 2015. This was done by simulating future levels of total fertility and life expectancy from Bayesian hierarchical models, and combining the results using a standard cohort-component projection method. The 40 countries with generalized HIV/...
A new method, called contour shifting, is proposed for correcting the bias in forecasts of contours such as sea ice concentration above certain thresholds. Retrospective comparisons of observations and dynamical model forecasts are used to build a statistical spatiotemporal model of how predicted contours typically differ from observed contours. Fo...
Background: Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the ne...
The recently published Intergovernmental Panel on Climate Change (IPCC) projections to 2100 give likely ranges of global temperature increase in four scenarios for population, economic growth and carbon use. However, these projections are not based on a fully statistical approach. Here we use a country-specific version of Kaya's identity to develop...
Background
The inference of gene regulatory networks is of great interest and has various applications. The recent advances in high-throughput biological data collection have facilitated the construction and understanding of gene regulatory networks in many model organisms. However, the inference of gene networks from large-scale human genomic data...
Background:
We consider the problem of probabilistic projection of the total fertility rate (TFR) for subnational regions.
Objective:
We seek a method that is consistent with the UN's recently adopted Bayesian method for probabilistic TFR projections for all countries and works well for all countries.
Methods:
We assess various possible method...
BACKGROUND
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a novel and computationally efficient method for eliminating redundant indirect edges...
Significance
Some hidden or hard-to-reach populations of interest to researchers are difficult to study with standard statistical methods because there is not a reliable list of members from which samples can be drawn. Respondent-driven sampling (RDS) is a common way to reach members of these populations by allowing a small number of respondents to...
We derive properties of latent variable models for networks, a broad class of models that includes the widely used latent position models. We characterize several features of interest, with particular focus on the degree distribution, clustering coefficient, average path length, and degree correlations. We introduce the Gaussian latent position mod...
We describe bayesPop, an R package for producing probabilistic population projections for all countries. This uses probabilistic projections of total fertility and life expectancy generated by Bayesian hierarchical models. It produces a sample from the joint posterior predictive distribution of future age- and sex-specific population counts, fertili...
Background
While probabilistic projection methods for projecting life expectancy exist, few account for covariates related to life expectancy. Generalized HIV/AIDS epidemics have a large, immediate negative impact on the life expectancy in a country, but this impact can be mitigated by widespread use of antiretroviral therapy (ART). Thus, projectio...
Finite mixture models are being used increasingly to model a wide variety of random phenomena for clustering, classification and density estimation. mclust is a powerful and popular package which allows modelling of data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components, for a variety of p...
Significance
We develop a statistical model for the evolution of the network of leading Irish company directorates over 11 years, before and after the financial crisis of 2008. We focus on company interlocks, whereby a director simultaneously sits on more than one company board. Our analysis indicates that the level of director interlockingness inc...
The United Nations is the major organization producing and regularly updating probabilistic population projections for all countries. International migration is a critical component of such projections, and between-country correlations are important for forecasts of regional aggregates. However, there are 200 countries and only 12 data points, each...
Significance
Projected populations to the end of this century are an important factor in many policy decisions. Population forecasts become less reliable as we look farther into the future, suggesting a probabilistic approach to convey uncertainty. Migration projections have been largely deterministic until now, even in probabilistic population pro...
We propose Adaptive Incremental Mixture Markov chain Monte Carlo (AIMM), a novel approach to sample from challenging probability distributions defined on a general state-space. Typically, adaptive MCMC methods recursively update a parametric proposal kernel with a global rule; by contrast AIMM locally adapts a non-parametric kernel. AIMM is based o...
Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown...
We show that Bayesian population reconstruction, a recent method for estimating past populations by age, works for data of widely varying quality. Bayesian reconstruction simultaneously estimates age-specific population counts, fertility rates, mortality rates, and net international migration flows from fragmentary data, while formally accounting f...
The UN released official probabilistic population projections (PPP) for all countries for the first time in July 2014. These were obtained by projecting the period total fertility rate (TFR) and life expectancy at birth ($e_0$) using Bayesian hierarchical models, yielding a large set of future trajectories of TFR and $e_0$ for all countries and future...
The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes measured, and the data for the resulting pairs of genes are deconvolved. The raw data are sometime...
Demographic forecasts are inherently uncertain. Nevertheless, an appropriate description of this uncertainty is a key underpinning of informed decision making. In recent decades various methods have been developed to describe the uncertainty of future populations and their structures, but the uptake of such tools amongst the practitioners of offici...
Background
Inference of gene networks from expression data is an important problem in computational biology. Many algorithms have been proposed for solving the problem efficiently. However, many of the available implementations are programming libraries that require users to write code, which limits their accessibility.
Results
We have developed a...
Initialisation of the EM algorithm in model-based clustering is often
crucial. Various starting points in the parameter space often lead to different
local maxima of the likelihood function and so to different clustering
partitions. Among the several approaches available in the literature,
model-based agglomerative hierarchical clustering is used...
Bayesian Additive Regression Trees (BART) is a statistical sum of trees
model. It can be considered a Bayesian version of machine learning tree
ensemble methods where the individual trees are the base learners. However for
data sets where the number of variables $p$ is large (e.g. $p>5,000$) the
algorithm can become prohibitively expensive, computa...
We propose Bayesian model averaging (BMA) as a method for postprocessing the
results of model-based clustering. Given a number of competing models,
appropriate model summaries are averaged, using the posterior model
probabilities, instead of being taken from a single "best" model. We
demonstrate the use of BMA in model-based clustering for a number...
The United Nations issued probabilistic population projections for all countries for the first time in July 2014. This was done by simulating future levels of fertility and life expectancy from Bayesian hierarchical models, and combining the results using a standard cohort-component projection method. The 40 countries with generalized HIV/AIDS epid...
Paul Deheuvels is best known internationally as a theoretical statistician, but he has made many other contributions. Here I give a brief overview of his work as a mentor of many doctoral students, as an advocate for the discipline of statistics, particularly in the context of his work as the only statistician member of the French Académie des Scie...
In most countries in the world outside of sub-Saharan Africa, HIV is largely concentrated in sub-populations whose behavior puts them at higher risk of contracting and transmitting HIV, such as people who inject drugs, sex workers and men who have sex with men. Estimating the size of these sub-populations is important for assessing overall HIV prev...
The United Nations released official probabilistic population projections
(PPP) for all countries for the first time in July 2014. These were obtained by
projecting the period total fertility rate (TFR) and life expectancy at birth
($e_0$) using Bayesian hierarchical models, yielding a large set of future
trajectories of TFR and $e_0$ for all count...
Finite mixture modelling provides a framework for cluster analysis based on
parsimonious Gaussian mixture models. Variable or feature selection is of
particular importance in situations where only a subset of the available
variables provide clustering information. This enables the selection of a more
parsimonious model, yielding more efficient esti...
Bayesian model averaging has become a widely used approach to accounting for
uncertainty about the structural form of the model generating the data. When
data arrive sequentially and the generating model can change over time, Dynamic
Model Averaging (DMA) extends model averaging to deal with this situation.
Often in macroeconomics, however, many ca...
The United Nations (UN) recently released population projections based on data until 2012 and a Bayesian probabilistic methodology.
Analysis of these data reveals that, contrary to previous literature, the world population is unlikely to stop growing this century. There is an 80% probability that world population, now 7.2 billion people, will incre...
Probabilistic forecasts are becoming more and more available. How should they
be used and communicated? What are the obstacles to their use in practice? I
review experience with five problems where probabilistic forecasting played an
important role. This leads me to identify five types of potential users: Low
Stakes Users, who don't need probabilis...
Background:
In a given population the age pattern of mortality is an important determinant of total number of deaths, age structure, and through effects on age structure, the number of births and thereby growth. Good mortality models exist for most populations except those experiencing generalized HIV epidemics and some developing country populati...
The United Nations regularly publishes projections of the populations of all
the world's countries broken down by age and sex. These projections are the de
facto standard and are widely used by international organizations, governments
and researchers. Like almost all other population projections, they are
produced using the standard deterministic c...
Background: The United Nations (UN) produces population projections for all countries every two years. These are used by international organizations, governments, the private sector and researchers for policy planning, for monitoring development goals, as inputs to economic and environmental models, and for social and health research. The UN is con...
Genome-wide time-series data provide a rich set of information for discovering gene regulatory relationships. As genome-wide data for mammalian systems are being generated, it is critical to develop network inference methods that can handle tens of thousands of genes efficiently, provide a systematic framework for the integration of multiple data sour...
Economic modeling in the presence of endogeneity is subject to model uncertainty at both the instrument and covariate level. We propose a Two-Stage Bayesian Model Averaging (2SBMA) methodology that extends the Two-Stage Least Squares (2SLS) estimator. By constructing a Two-Stage Unit Information Prior in the endogenous variable model, we are able t...