Adrian E. Raftery

University of Washington Seattle | UW · Department of Statistics

About

329 Publications
50,335 Reads
72,534 Citations


Publications (329)
Preprint
Full-text available
The bayesTFR package for R provides a set of functions to produce probabilistic projections of the total fertility rates (TFR) for all countries, and is widely used, including as part of the basis for the UN's official population projections for all countries. Liu and Raftery (2020) extended the theoretical model by adding a layer that accounts for...
Article
Full-text available
Significance: Choosing a statistical model and accounting for uncertainty about this choice are important parts of the scientific process and are required for common statistical tasks such as parameter estimation, interval estimation, statistical inference, point prediction, and interval prediction. A canonical example is the choice of variables in...
Article
The record for oldest human being was set in 1997 by Jeanne Calment of France at 122 years and 164 days. Michael Pearce and Adrian E. Raftery expect that record will be broken in the coming decades.
Article
Population forecasts are used by governments and the private sector for planning, with horizons up to about three generations (around 2100) for different purposes. The traditional methods are deterministic using scenarios, but probabilistic forecasts are desired to get an idea of accuracy, assess changes, and make decisions involving risks. In a si...
Article
Full-text available
Projecting mortality for subnational units, or regions, is of great interest to practicing demographers. We seek a probabilistic method for projecting subnational life expectancy that is based on the national Bayesian hierarchical model used by the United Nations, and at the same time is easy to use. We propose three methods of this kind. Two of th...
Article
Full-text available
There is a growing expectation that data collected by government-funded studies should be openly available to ensure research reproducibility, which also increases concerns about data privacy. A strategy to protect individuals’ identity is to release multiply imputed (MI) synthetic datasets with masked sensitivity values (Rubin, 1993). However, inf...
Article
Smoking is one of the main risk factors that have affected human mortality and life expectancy over the past century. Smoking accounts for a large part of the nonlinearities in the growth of life expectancy and of the geographic and sex differences in mortality. As Bongaarts (2006) and Janssen (2018) suggested, accounting for smoking could improve t...
Preprint
Boundaries on spatial fields divide regions with particular features from surrounding background areas. These boundaries are often described with contour lines. To measure and record these boundaries, contours are often represented as ordered sequences of spatial points that connect to form a line. Methods to identify boundary lines from interpolat...
Article
Full-text available
Respondent-driven sampling is an approach for estimating features of populations that are difficult to access using standard survey tools, e.g., the fraction of injection drug users who are HIV positive. Baraff et al. (2016) introduced an approach to estimating uncertainty in population proportion estimates from respondent-driven sampling using the...
Article
Since the 1940s, population projections have in most cases been produced using the deterministic cohort component method. However, in 2015, for the first time, in a major advance, the United Nations issued official probabilistic population projections for all countries based on Bayesian hierarchical models for total fertility and life expectancy. T...
Article
Smoking is one of the leading preventable threats to human health and a major risk factor for lung cancer, upper aero-digestive cancer, and chronic obstructive pulmonary disease. Estimating and forecasting the smoking attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortal...
Preprint
Smoking is one of the main risk factors that have affected human mortality and life expectancy over the past century. Smoking accounts for a large part of the nonlinearities in the growth of life expectancy and of the geographic and sex differences in mortality. As Bongaarts (2006) and Janssen (2018) suggested, accounting for smoking could improve t...
Preprint
Sea ice, or frozen ocean water, annually freezes and melts in the Arctic. The need for accurate forecasts of where sea ice will be located weeks to months in advance has increased as the amount of sea ice reduces due to climate change. Typical sea ice forecasts are made with ensemble models, physics-based deterministic models of sea ice and the sur...
Book
Model-Based Clustering and Classification for Data Science, by Charles Bouveyron (Cambridge Core, Pattern Recognition and Machine Learning)
Article
The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we as...
Preprint
Smoking is one of the preventable threats to human health and is a major risk factor for lung cancer, upper aero-digestive cancer, and chronic obstructive pulmonary disease. Estimating and forecasting the smoking attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortality a...
Article
Full-text available
We propose a method for estimating migration flows between all pairs of countries that allows for decomposition of migration into emigration, return, and transit components. Current state-of-the-art estimates of bilateral migration flows rely on the assumption that the number of global migrants is as small as possible. We relax this assumption, pro...
Conference Paper
Full-text available
The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we as...
Article
Full-text available
Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another...
Article
Full-text available
Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady-state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression ex...
Preprint
Since the 1940s, population projections have in most cases been produced using the deterministic cohort component method. However, in 2015, for the first time, in a major advance, the United Nations issued official probabilistic population projections for all countries based on Bayesian hierarchical models for total fertility and life expectancy. T...
Preprint
The Schwarz or Bayesian information criterion (BIC) is one of the most widely used tools for model comparison in social science research. The BIC, however, is not suitable for evaluating models with order constraints on the parameters of interest. This paper explores two extensions of the BIC for evaluating order-constrained models, one where a trunc...
Article
Full-text available
The United Nations (UN) issued official probabilistic population projections for all countries to 2100 in July 2015. This was done by simulating future levels of total fertility and life expectancy from Bayesian hierarchical models, and combining the results using a standard cohort-component projection method. The 40 countries with generalized HIV/...
Article
A new method, called contour shifting, is proposed for correcting the bias in forecasts of contours such as sea ice concentration above certain thresholds. Retrospective comparisons of observations and dynamical model forecasts are used to build a statistical spatiotemporal model of how predicted contours typically differ from observed contours. Fo...
Article
Full-text available
Background: Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the ne...
Article
The recently published Intergovernmental Panel on Climate Change (IPCC) projections to 2100 give likely ranges of global temperature increase in four scenarios for population, economic growth and carbon use. However, these projections are not based on a fully statistical approach. Here we use a country-specific version of Kaya's identity to develop...
Preprint
Full-text available
Background: The inference of gene regulatory networks is of great interest and has various applications. The recent advances in high-throughput biological data collection have facilitated the construction and understanding of gene regulatory networks in many model organisms. However, the inference of gene networks from large-scale human genomic data...
Article
Full-text available
Background: We consider the problem of probabilistic projection of the total fertility rate (TFR) for subnational regions. Objective: We seek a method that is consistent with the UN's recently adopted Bayesian method for probabilistic TFR projections for all countries and works well for all countries. Methods: We assess various possible method...
Preprint
Full-text available
Background: Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a novel and computationally efficient method for eliminating redundant indirect edges...
Article
Respondent-driven sampling (RDS) is a network-based form of chain-referral sampling used to estimate attributes of populations that are difficult to access using standard survey tools. Although it has grown quickly in popularity since its introduction, the statistical properties of RDS estimates remain elusive. In particular, the sampling variabili...
Article
We derive properties of latent variable models for networks, a broad class of models that includes the widely used latent position models. We characterize several features of interest, with particular focus on the degree distribution, clustering coefficient, average path length, and degree correlations. We introduce the Gaussian latent position mod...
Article
Full-text available
We describe bayesPop, an R package for producing probabilistic population projections for all countries. This uses probabilistic projections of total fertility and life expectancy generated by Bayesian hierarchical models. It produces a sample from the joint posterior predictive distribution of future age- and sex-specific population counts, fertili...
Article
Full-text available
Background: While probabilistic projection methods for projecting life expectancy exist, few account for covariates related to life expectancy. Generalized HIV/AIDS epidemics have a large, immediate negative impact on the life expectancy in a country, but this impact can be mitigated by widespread use of antiretroviral therapy (ART). Thus, projectio...
Article
Full-text available
Finite mixture models are being used increasingly to model a wide variety of random phenomena for clustering, classification and density estimation. mclust is a powerful and popular package which allows modelling of data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components, for a variety of p...
Article
Full-text available
We analyze the temporal bipartite network of the leading Irish companies and their directors from 2003 to 2013, encompassing the end of the Celtic Tiger boom and the ensuing financial crisis in 2008. We focus on the evolution of company interlocks, whereby a company director simultaneously sits on two or more boards. We develop a statistical model...
Article
The United Nations is the major organization producing and regularly updating probabilistic population projections for all countries. International migration is a critical component of such projections, and between-country correlations are important for forecasts of regional aggregates. However, there are 200 countries and only 12 data points, each...
Article
We produce probabilistic projections of population for all countries based on probabilistic projections of fertility, mortality, and migration. We compare our projections to those from the United Nations' Probabilistic Population Projections, which uses similar methods for fertility and mortality but deterministic migration projections. We find tha...
Article
Full-text available
We propose Adaptive Incremental Mixture Markov chain Monte Carlo (AIMM), a novel approach to sample from challenging probability distributions defined on a general state-space. Typically, adaptive MCMC methods recursively update a parametric proposal kernel with a global rule; by contrast AIMM locally adapts a non-parametric kernel. AIMM is based o...
Article
Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown...
Article
We show that Bayesian population reconstruction, a recent method for estimating past populations by age, works for data of widely varying quality. Bayesian reconstruction simultaneously estimates age-specific population counts, fertility rates, mortality rates, and net international migration flows from fragmentary data, while formally accounting f...
Chapter
The UN released official probabilistic population projections (PPP) for all countries for the first time in July 2014. These were obtained by projecting the period total fertility rate (TFR) and life expectancy at birth ($e_0$) using Bayesian hierarchical models, yielding a large set of future trajectories of TFR and $e_0$ for all countries and future...
Article
The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes measured, and the data for the resulting pairs of genes are deconvolved. The raw data are sometime...
Article
Full-text available
Demographic forecasts are inherently uncertain. Nevertheless, an appropriate description of this uncertainty is a key underpinning of informed decision making. In recent decades various methods have been developed to describe the uncertainty of future populations and their structures, but the uptake of such tools amongst the practitioners of offici...
Article
Full-text available
Background: Inference of gene networks from expression data is an important problem in computational biology. Many algorithms have been proposed for solving the problem efficiently. However, many of the available implementations are programming libraries that require users to write code, which limits their accessibility. Results: We have developed a...
Article
Full-text available
Initialisation of the EM algorithm in model-based clustering is often crucial. Various starting points in the parameter space often lead to different local maxima of the likelihood function and so to different clustering partitions. Among the several approaches available in the literature, model-based agglomerative hierarchical clustering is used...
Article
Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for data sets where the number of variables $p$ is large (e.g. $p>5,000$) the algorithm can become prohibitively expensive, computa...
Article
Full-text available
We propose Bayesian model averaging (BMA) as a method for postprocessing the results of model-based clustering. Given a number of competing models, appropriate model summaries are averaged, using the posterior model probabilities, instead of being taken from a single "best" model. We demonstrate the use of BMA in model-based clustering for a number...
Conference Paper
The United Nations issued probabilistic population projections for all countries for the first time in July 2014. This was done by simulating future levels of fertility and life expectancy from Bayesian hierarchical models, and combining the results using a standard cohort-component projection method. The 40 countries with generalized HIV/AIDS epid...
Chapter
Paul Deheuvels is best known internationally as a theoretical statistician, but he has made many other contributions. Here I give a brief overview of his work as a mentor of many doctoral students, as an advocate for the discipline of statistics, particularly in the context of his work as the only statistician member of the French Académie des Scie...
Article
In most countries in the world outside of sub-Saharan Africa, HIV is largely concentrated in sub-populations whose behavior puts them at higher risk of contracting and transmitting HIV, such as people who inject drugs, sex workers and men who have sex with men. Estimating the size of these sub-populations is important for assessing overall HIV prev...
Article
Full-text available
The United Nations released official probabilistic population projections (PPP) for all countries for the first time in July 2014. These were obtained by projecting the period total fertility rate (TFR) and life expectancy at birth ($e_0$) using Bayesian hierarchical models, yielding a large set of future trajectories of TFR and $e_0$ for all count...
Article
Full-text available
Finite mixture modelling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provide clustering information. This enables the selection of a more parsimonious model, yielding more efficient esti...
Article
Bayesian model averaging has become a widely used approach to accounting for uncertainty about the structural form of the model generating the data. When data arrive sequentially and the generating model can change over time, Dynamic Model Averaging (DMA) extends model averaging to deal with this situation. Often in macroeconomics, however, many ca...
Article
The United Nations (UN) recently released population projections based on data until 2012 and a Bayesian probabilistic methodology. Analysis of these data reveals that, contrary to previous literature, the world population is unlikely to stop growing this century. There is an 80% probability that world population, now 7.2 billion people, will incre...
Article
Probabilistic forecasts are becoming more and more available. How should they be used and communicated? What are the obstacles to their use in practice? I review experience with five problems where probabilistic forecasting played an important role. This leads me to identify five types of potential users: Low Stakes Users, who don't need probabilis...
Article
Full-text available
Background: In a given population the age pattern of mortality is an important determinant of total number of deaths, age structure, and through effects on age structure, the number of births and thereby growth. Good mortality models exist for most populations except those experiencing generalized HIV epidemics and some developing country populati...
Article
Full-text available
The United Nations regularly publishes projections of the populations of all the world's countries broken down by age and sex. These projections are the de facto standard and are widely used by international organizations, governments and researchers. Like almost all other population projections, they are produced using the standard deterministic c...
Article
Full-text available
Background: The United Nations (UN) produces population projections for all countries every two years. These are used by international organizations, governments, the private sector and researchers for policy planning, for monitoring development goals, as inputs to economic and environmental models, and for social and health research. The UN is con...
Article
Full-text available
Genome-wide time-series data provide a rich set of information for discovering gene regulatory relationships. As genome-wide data for mammalian systems are being generated, it is critical to develop network inference methods that can handle tens of thousands of genes efficiently, provide a systematic framework for the integration of multiple data sour...
Article
Economic modeling in the presence of endogeneity is subject to model uncertainty at both the instrument and covariate level. We propose a Two-Stage Bayesian Model Averaging (2SBMA) methodology that extends the Two-Stage Least Squares (2SLS) estimator. By constructing a Two-Stage Unit Information Prior in the endogenous variable model, we are able t...
Article
Full-text available
The original version of Bayesian reconstruction, a method for estimating age-specific fertility, mortality, migration and population counts of the recent past with uncertainty, produced estimates for female-only populations. Here we show how two-sex populations can be similarly reconstructed and probabilistic estimates of various sex ratio quantiti...
Conference Paper
A Bayesian approach for probabilistic population projections has recently been used by the United Nations Population Division in the preparation of the 2012 revision of the World Population Prospects. The methods have been implemented in publicly available open-source software as a collection of R packages. In this paper, we demonstrate how to easi...
Article
Full-text available
We propose a method for obtaining joint probabilistic projections of migration rates for all countries, broken down by age and sex. Joint trajectories for all countries are constrained to satisfy the requirement of zero global net migration. We evaluate our model using out-of-sample validation and compare point projections to the projected migratio...
Article
Full-text available
We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state of the art model selection method. We select the method of Witten and Tibshirani (2010) as a curre...
Article
We develop methods for estimating hard-to-reach populations from data collected using network-based questions on standard surveys. Such data arise by asking respondents how many people they know in a specific group (e.g. people named Michael, intravenous drug users). The Network Scale-up Method (NSUM) is a tool for producing population size estimat...
Conference Paper
We extend Bayesian population reconstruction, a recent method for estimating past populations by age with fully probabilistic statements of uncertainty. It simultaneously estimates age-specific population counts, vital rates and net migration from fragmentary data while formally accounting for measurement error. As inputs, it takes initial bias-red...
Article
Full-text available
We propose a Bayesian hierarchical model for producing probabilistic forecasts of male period life expectancy at birth for all the countries of the world to 2100. Such forecasts would be an input to the production of probabilistic population projections for all countries, which is currently being considered by the United Nations. To evaluate the me...
Article
Full-text available
Current methods for reconstructing human populations of the past by age and sex are deterministic or do not formally account for measurement error. We propose a method for simultaneously estimating age-specific population counts, fertility rates, mortality rates, and net international migration flows from fragmentary data that incorporates measurem...
Article
The United Nations (UN) Population Division is considering producing probabilistic projections for the total fertility rate (TFR) using the Bayesian hierarchical model of Alkema et al. (2011), which produces predictive distributions of TFR for individual countries. The UN is interested in publishing probabilistic projections for aggregates of count...