Adrian E. Raftery

University of Washington Seattle | UW · Department of Statistics

About

353
Publications
67,457
Reads
88,814
Citations


Publications (353)
Preprint
Full-text available
Most population projection models require age-specific information on net migration totals as a key demographic component of population change. Existing methods for predicting future patterns of net migration by age have proven inadequate. The main reason is that methods applied to model net migration are unable to distinguish factors influencing t...
Article
Population projections provide predictions of future population sizes for an area. Historically, most population projections have been produced using deterministic or scenario-based approaches and have not assessed uncertainty about future population change. Starting in 2015, however, the United Nations (UN) has produced probabilistic population pr...
Preprint
In this chapter, we present a review of latent position models for networks. We review the recent literature in this area and illustrate the basic aspects and properties of this modeling framework. Through several illustrative examples we highlight how the latent position model is able to capture important features of observed networks. We emphasiz...
Article
Full-text available
The bayesTFR package for R provides a set of functions to produce probabilistic projections of the total fertility rates for all countries, and is widely used, including as part of the basis for the United Nations official population projections for all countries. Liu and Raftery (2020) extended the theoretical model by adding a layer that accounts...
Article
Full-text available
Bayesian model averaging (BMA) provides a coherent way to account for model uncertainty in statistical inference tasks. BMA requires specification of model space priors and parameter space priors. In this article we focus on comparing different model space priors in the presence of model uncertainty. We consider eight reference model space priors u...
Article
Full-text available
The social cost of carbon dioxide (SC-CO2) measures the monetized value of the damages to society caused by an incremental metric tonne of CO2 emissions and is a key metric informing climate policy. Used by governments and other decision-makers in benefit-cost analysis for over a decade, SC-CO2 estimates draw on climate science, economics, demograp...
Article
Boundaries on spatial fields divide regions with particular features from surrounding background areas. Methods to identify boundary lines from interpolated spatial fields are well established. Less attention has been paid to how to model sequences of connected spatial points. Such models are needed for physical boundaries. For example, in the Arct...
Article
Full-text available
We propose a method for forecasting global human migration flows. A Bayesian hierarchical model is used to make probabilistic projections of the 39,800 bilateral migration flows among the 200 most populous countries. We generate out-of-sample forecasts for all bilateral flows for the 2015 to 2020 period, using models fitted to bilateral migration f...
Article
Full-text available
The Heat Index is a metric that quantifies heat exposure in human beings. Here, using probabilistic emission projections, we show that changes in the Heat Index driven by anthropogenic CO2 emissions will increase global exposure to dangerous environments in the coming decades. Even if the Paris Agreement goal of limiting global warming to 2 °C is...
Article
Full-text available
The climate change projections of the Intergovernmental Panel on Climate Change are based on scenarios for future emissions, but these are not statistically-based and do not have a full probabilistic interpretation. Raftery et al. (Nat Clim Change 7:637–641, 2017) and Liu and Raftery (Commun Earth Environ 2:1–10, 2021) developed probabilistic forec...
Preprint
Full-text available
The bayesTFR package for R provides a set of functions to produce probabilistic projections of the total fertility rates (TFR) for all countries, and is widely used, including as part of the basis for the UN's official population projections for all countries. Liu and Raftery (2020) extended the theoretical model by adding a layer that accounts for...
Article
Full-text available
Significance: Choosing a statistical model and accounting for uncertainty about this choice are important parts of the scientific process and are required for common statistical tasks such as parameter estimation, interval estimation, statistical inference, point prediction, and interval prediction. A canonical example is the choice of variables in...
Article
The record for oldest human being was set in 1997 by Jeanne Calment of France at 122 years and 164 days. Michael Pearce and Adrian E. Raftery expect that record will be broken in the coming decades.
Article
Full-text available
Population forecasts are used by governments and the private sector for planning, with horizons up to about three generations (around 2100) for different purposes. The traditional methods are deterministic using scenarios, but probabilistic forecasts are desired to get an idea of accuracy, assess changes, and make decisions involving risks. In a si...
Article
Full-text available
Projecting mortality for subnational units, or regions, is of great interest to practicing demographers. We seek a probabilistic method for projecting subnational life expectancy that is based on the national Bayesian hierarchical model used by the United Nations, and at the same time is easy to use. We propose three methods of this kind. Two of th...
Article
Full-text available
Background: We consider the problem of quantifying the human lifespan using a statistical approach that probabilistically forecasts the maximum reported age at death (MRAD) through 2100. Objective: We seek to quantify the probability that any person attains various extreme ages, such as those above 120, by the year 2100. Methods: We use the exponen...
Article
Full-text available
There is a growing expectation that data collected by government-funded studies should be openly available to ensure research reproducibility, which also increases concerns about data privacy. A strategy to protect individuals’ identity is to release multiply imputed (MI) synthetic datasets with masked sensitivity values (Rubin, 1993). However, inf...
Article
Smoking is one of the main risk factors that has affected human mortality and life expectancy over the past century. Smoking accounts for a large part of the nonlinearities in the growth of life expectancy and of the geographic and sex differences in mortality. As Bongaarts (2006) and Janssen (2018) suggested, accounting for smoking could improve t...
Preprint
Boundaries on spatial fields divide regions with particular features from surrounding background areas. These boundaries are often described with contour lines. To measure and record these boundaries, contours are often represented as ordered sequences of spatial points that connect to form a line. Methods to identify boundary lines from interpolat...
Article
Full-text available
Respondent-driven sampling is an approach for estimating features of populations that are difficult to access using standard survey tools, e.g., the fraction of injection drug users who are HIV positive. Baraff et al. (2016) introduced an approach to estimating uncertainty in population proportion estimates from respondent-driven sampling using the...
Article
Since the 1940s, population projections have in most cases been produced using the deterministic cohort component method. However, in 2015, for the first time, in a major advance, the United Nations issued official probabilistic population projections for all countries based on Bayesian hierarchical models for total fertility and life expectancy. T...
Article
Smoking is one of the leading preventable threats to human health and a major risk factor for lung cancer, upper aero-digestive cancer, and chronic obstructive pulmonary disease. Estimating and forecasting the smoking attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortal...
Preprint
Smoking is one of the main risk factors that has affected human mortality and life expectancy over the past century. Smoking accounts for a large part of the nonlinearities in the growth of life expectancy and of the geographic and sex differences in mortality. As Bongaarts (2006) and Janssen (2018) suggested, accounting for smoking could improve t...
Preprint
Sea ice, or frozen ocean water, annually freezes and melts in the Arctic. The need for accurate forecasts of where sea ice will be located weeks to months in advance has increased as the amount of sea ice reduces due to climate change. Typical sea ice forecasts are made with ensemble models, physics-based deterministic models of sea ice and the sur...
Book
Cambridge Core - Pattern Recognition and Machine Learning - Model-Based Clustering and Classification for Data Science - by Charles Bouveyron
Article
The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we as...
Preprint
Smoking is one of the preventable threats to human health and is a major risk factor for lung cancer, upper aero-digestive cancer, and chronic obstructive pulmonary disease. Estimating and forecasting the smoking attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortality a...
Article
Full-text available
Significance: Despite the importance of international migration, estimates of between-country migration flows are still imprecise. Reliable record keeping of migration events is typically available only in the developed world, and the best existing methods to produce global migration flow estimates are burdened by strong assumptions. We produce esti...
Conference Paper
Full-text available
The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we as...
Article
Full-text available
Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for datasets where the number of variables p is large the algorithm can become inefficient and computationally expensive. Another...
Article
Full-text available
Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady-state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression ex...
Preprint
Since the 1940s, population projections have in most cases been produced using the deterministic cohort component method. However, in 2015, for the first time, in a major advance, the United Nations issued official probabilistic population projections for all countries based on Bayesian hierarchical models for total fertility and life expectancy. T...
Preprint
The Schwarz or Bayesian information criterion (BIC) is one of the most widely used tools for model comparison in social science research. The BIC however is not suitable for evaluating models with order constraints on the parameters of interest. This paper explores two extensions of the BIC for evaluating order constrained models, one where a trunc...
Article
Full-text available
The United Nations (UN) issued official probabilistic population projections for all countries to 2100 in July 2015. This was done by simulating future levels of total fertility and life expectancy from Bayesian hierarchical models, and combining the results using a standard cohort-component projection method. The 40 countries with generalized HIV/...
Article
A new method, called contour shifting, is proposed for correcting the bias in forecasts of contours such as sea ice concentration above certain thresholds. Retrospective comparisons of observations and dynamical model forecasts are used to build a statistical spatiotemporal model of how predicted contours typically differ from observed contours. Fo...
Article
Full-text available
Background: Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the ne...
Article
The recently published Intergovernmental Panel on Climate Change (IPCC) projections to 2100 give likely ranges of global temperature increase in four scenarios for population, economic growth and carbon use. However, these projections are not based on a fully statistical approach. Here we use a country-specific version of Kaya's identity to develop...
Preprint
Full-text available
Background: The inference of gene regulatory networks is of great interest and has various applications. The recent advances in high-throughput biological data collection have facilitated the construction and understanding of gene regulatory networks in many model organisms. However, the inference of gene networks from large-scale human genomic data...
Article
Full-text available
Background: We consider the problem of probabilistic projection of the total fertility rate (TFR) for subnational regions. Objective: We seek a method that is consistent with the UN's recently adopted Bayesian method for probabilistic TFR projections for all countries and works well for all countries. Methods: We assess various possible method...
Preprint
Full-text available
Background: Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a novel and computationally efficient method for eliminating redundant indirect edges...
Article
Significance: Some hidden or hard-to-reach populations of interest to researchers are difficult to study with standard statistical methods because there is not a reliable list of members from which samples can be drawn. Respondent-driven sampling (RDS) is a common way to reach members of these populations by allowing a small number of respondents to...
Article
We derive properties of latent variable models for networks, a broad class of models that includes the widely used latent position models. We characterize several features of interest, with particular focus on the degree distribution, clustering coefficient, average path length, and degree correlations. We introduce the Gaussian latent position mod...
Article
Full-text available
We describe bayesPop, an R package for producing probabilistic population projections for all countries. This uses probabilistic projections of total fertility and life expectancy generated by Bayesian hierarchical models. It produces a sample from the joint posterior predictive distribution of future age- and sex-specific population counts, fertili...
Article
Full-text available
Background: While probabilistic projection methods for projecting life expectancy exist, few account for covariates related to life expectancy. Generalized HIV/AIDS epidemics have a large, immediate negative impact on the life expectancy in a country, but this impact can be mitigated by widespread use of antiretroviral therapy (ART). Thus, projectio...
Article
Full-text available
Finite mixture models are being used increasingly to model a wide variety of random phenomena for clustering, classification and density estimation. mclust is a powerful and popular package which allows modelling of data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components, for a variety of p...
Article
Full-text available
Significance: We develop a statistical model for the evolution of the network of leading Irish company directorates over 11 years, before and after the financial crisis of 2008. We focus on company interlocks, whereby a director simultaneously sits on more than one company board. Our analysis indicates that the level of director interlockingness inc...
Article
The United Nations is the major organization producing and regularly updating probabilistic population projections for all countries. International migration is a critical component of such projections, and between-country correlations are important for forecasts of regional aggregates. However, there are 200 countries and only 12 data points, each...
Article
Significance: Projected populations to the end of this century are an important factor in many policy decisions. Population forecasts become less reliable as we look farther into the future, suggesting a probabilistic approach to convey uncertainty. Migration projections have been largely deterministic until now, even in probabilistic population pro...
Article
Full-text available
We propose Adaptive Incremental Mixture Markov chain Monte Carlo (AIMM), a novel approach to sample from challenging probability distributions defined on a general state-space. Typically, adaptive MCMC methods recursively update a parametric proposal kernel with a global rule; by contrast AIMM locally adapts a non-parametric kernel. AIMM is based o...
Article
Full-text available
Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown...
Article
We show that Bayesian population reconstruction, a recent method for estimating past populations by age, works for data of widely varying quality. Bayesian reconstruction simultaneously estimates age-specific population counts, fertility rates, mortality rates, and net international migration flows from fragmentary data, while formally accounting f...
Chapter
The UN released official probabilistic population projections (PPP) for all countries for the first time in July 2014. These were obtained by projecting the period total fertility rate (TFR) and life expectancy at birth ($e_0$) using Bayesian hierarchical models, yielding a large set of future trajectories of TFR and $e_0$ for all countries and future...
Article
The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes measured, and the data for the resulting pairs of genes are deconvolved. The raw data are sometime...
Article
Full-text available
Demographic forecasts are inherently uncertain. Nevertheless, an appropriate description of this uncertainty is a key underpinning of informed decision making. In recent decades various methods have been developed to describe the uncertainty of future populations and their structures, but the uptake of such tools amongst the practitioners of offici...
Article
Full-text available
Background: Inference of gene networks from expression data is an important problem in computational biology. Many algorithms have been proposed for solving the problem efficiently. However, many of the available implementations are programming libraries that require users to write code, which limits their accessibility. Results: We have developed a...
Article
Full-text available
Initialisation of the EM algorithm in model-based clustering is often crucial. Various starting points in the parameter space often lead to different local maxima of the likelihood function and so to different clustering partitions. Among the several approaches available in the literature, model-based agglomerative hierarchical clustering is used...
Article
Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However for data sets where the number of variables $p$ is large (e.g. $p>5,000$) the algorithm can become prohibitively expensive, computa...
Article
Full-text available
We propose Bayesian model averaging (BMA) as a method for postprocessing the results of model-based clustering. Given a number of competing models, appropriate model summaries are averaged, using the posterior model probabilities, instead of being taken from a single "best" model. We demonstrate the use of BMA in model-based clustering for a number...
Conference Paper
The United Nations issued probabilistic population projections for all countries for the first time in July 2014. This was done by simulating future levels of fertility and life expectancy from Bayesian hierarchical models, and combining the results using a standard cohort-component projection method. The 40 countries with generalized HIV/AIDS epid...
Chapter
Paul Deheuvels is best known internationally as a theoretical statistician, but he has made many other contributions. Here I give a brief overview of his work as a mentor of many doctoral students, as an advocate for the discipline of statistics, particularly in the context of his work as the only statistician member of the French Académie des Scie...
Article
In most countries in the world outside of sub-Saharan Africa, HIV is largely concentrated in sub-populations whose behavior puts them at higher risk of contracting and transmitting HIV, such as people who inject drugs, sex workers and men who have sex with men. Estimating the size of these sub-populations is important for assessing overall HIV prev...
Article
Full-text available
The United Nations released official probabilistic population projections (PPP) for all countries for the first time in July 2014. These were obtained by projecting the period total fertility rate (TFR) and life expectancy at birth ($e_0$) using Bayesian hierarchical models, yielding a large set of future trajectories of TFR and $e_0$ for all count...
Article
Full-text available
Finite mixture modelling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provide clustering information. This enables the selection of a more parsimonious model, yielding more efficient esti...
Article
Bayesian model averaging has become a widely used approach to accounting for uncertainty about the structural form of the model generating the data. When data arrive sequentially and the generating model can change over time, Dynamic Model Averaging (DMA) extends model averaging to deal with this situation. Often in macroeconomics, however, many ca...
Article
The United Nations (UN) recently released population projections based on data until 2012 and a Bayesian probabilistic methodology. Analysis of these data reveals that, contrary to previous literature, the world population is unlikely to stop growing this century. There is an 80% probability that world population, now 7.2 billion people, will incre...
Article
Probabilistic forecasts are becoming more and more available. How should they be used and communicated? What are the obstacles to their use in practice? I review experience with five problems where probabilistic forecasting played an important role. This leads me to identify five types of potential users: Low Stakes Users, who don't need probabilis...
Article
Full-text available
Background: In a given population the age pattern of mortality is an important determinant of total number of deaths, age structure, and through effects on age structure, the number of births and thereby growth. Good mortality models exist for most populations except those experiencing generalized HIV epidemics and some developing country populati...
Article
Full-text available
The United Nations regularly publishes projections of the populations of all the world's countries broken down by age and sex. These projections are the de facto standard and are widely used by international organizations, governments and researchers. Like almost all other population projections, they are produced using the standard deterministic c...
Article
Full-text available
Background: The United Nations (UN) produces population projections for all countries every two years. These are used by international organizations, governments, the private sector and researchers for policy planning, for monitoring development goals, as inputs to economic and environmental models, and for social and health research. The UN is con...
Article
Full-text available
Genome-wide time-series data provide a rich set of information for discovering gene regulatory relationships. As genome-wide data for mammalian systems are being generated, it is critical to develop network inference methods that can handle tens of thousands of genes efficiently, provide a systematic framework for the integration of multiple data sour...
Article
Economic modeling in the presence of endogeneity is subject to model uncertainty at both the instrument and covariate level. We propose a Two-Stage Bayesian Model Averaging (2SBMA) methodology that extends the Two-Stage Least Squares (2SLS) estimator. By constructing a Two-Stage Unit Information Prior in the endogenous variable model, we are able t...