
Adrian E. Raftery
University of Washington Seattle | UW · Department of Statistics
329
Publications
50,335
Reads
72,534
Citations
Publications (329)
The bayesTFR package for R provides a set of functions to produce probabilistic projections of the total fertility rates (TFR) for all countries, and is widely used, including as part of the basis for the UN's official population projections for all countries. Liu and Raftery (2020) extended the theoretical model by adding a layer that accounts for...
Significance
Choosing a statistical model and accounting for uncertainty about this choice are important parts of the scientific process and are required for common statistical tasks such as parameter estimation, interval estimation, statistical inference, point prediction, and interval prediction. A canonical example is the choice of variables in...
The record for oldest human being was set in 1997 by Jeanne Calment of France at 122 years and 164 days. Michael Pearce and Adrian E. Raftery expect that record will be broken in the coming decades...
Population forecasts are used by governments and the private sector for planning, with horizons up to about three generations (around 2100) for different purposes. The traditional methods are deterministic using scenarios, but probabilistic forecasts are desired to get an idea of accuracy, assess changes, and make decisions involving risks. In a si...
Projecting mortality for subnational units, or regions, is of great interest to practicing demographers. We seek a probabilistic method for projecting subnational life expectancy that is based on the national Bayesian hierarchical model used by the United Nations, and at the same time is easy to use. We propose three methods of this kind. Two of th...
There is a growing expectation that data collected by government-funded studies should be openly available to ensure research reproducibility, which also increases concerns about data privacy. A strategy to protect individuals’ identity is to release multiply imputed (MI) synthetic datasets with masked sensitivity values (Rubin, 1993). However, inf...
Smoking is one of the main risk factors that has affected human mortality and life expectancy over the past century. Smoking accounts for a large part of the nonlinearities in the growth of life expectancy and of the geographic and sex differences in mortality. As Bongaarts (2006) and Janssen (2018) suggested, accounting for smoking could improve t...
Boundaries on spatial fields divide regions with particular features from surrounding background areas. These boundaries are often described with contour lines. To measure and record these boundaries, contours are often represented as ordered sequences of spatial points that connect to form a line. Methods to identify boundary lines from interpolat...
Respondent-driven sampling is an approach for estimating features of populations that are difficult to access using standard survey tools, e.g., the fraction of injection drug users who are HIV positive. Baraff et al. (2016) introduced an approach to estimating uncertainty in population proportion estimates from respondent-driven sampling using the...
Since the 1940s, population projections have in most cases been produced using the deterministic cohort component method. However, in 2015, for the first time, in a major advance, the United Nations issued official probabilistic population projections for all countries based on Bayesian hierarchical models for total fertility and life expectancy. T...
Smoking is one of the leading preventable threats to human health and a major risk factor for lung cancer, upper aero-digestive cancer, and chronic obstructive pulmonary disease. Estimating and forecasting the smoking attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortal...
Sea ice, or frozen ocean water, annually freezes and melts in the Arctic. The need for accurate forecasts of where sea ice will be located weeks to months in advance has increased as the amount of sea ice reduces due to climate change. Typical sea ice forecasts are made with ensemble models, physics-based deterministic models of sea ice and the sur...
Cambridge Core - Pattern Recognition and Machine Learning - Model-Based Clustering and Classification for Data Science - by Charles Bouveyron
The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we as...
Smoking is one of the preventable threats to human health and is a major risk factor for lung cancer, upper aero-digestive cancer, and chronic obstructive pulmonary disease. Estimating and forecasting the smoking attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortality a...
We propose a method for estimating migration flows between all pairs of countries that allows for decomposition of migration into emigration, return, and transit components. Current state-of-the-art estimates of bilateral migration flows rely on the assumption that the number of global migrants is as small as possible. We relax this assumption, pro...
Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another...
Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady-state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression ex...
The Schwarz or Bayesian information criterion (BIC) is one of the most widely used tools for model comparison in social science research. The BIC, however, is not suitable for evaluating models with order constraints on the parameters of interest. This paper explores two extensions of the BIC for evaluating order-constrained models, one where a trunc...
The United Nations (UN) issued official probabilistic population projections for all countries to 2100 in July 2015. This was done by simulating future levels of total fertility and life expectancy from Bayesian hierarchical models, and combining the results using a standard cohort-component projection method. The 40 countries with generalized HIV/...
A new method, called contour shifting, is proposed for correcting the bias in forecasts of contours such as sea ice concentration above certain thresholds. Retrospective comparisons of observations and dynamical model forecasts are used to build a statistical spatiotemporal model of how predicted contours typically differ from observed contours. Fo...
Background: Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the ne...
The recently published Intergovernmental Panel on Climate Change (IPCC) projections to 2100 give likely ranges of global temperature increase in four scenarios for population, economic growth and carbon use. However, these projections are not based on a fully statistical approach. Here we use a country-specific version of Kaya's identity to develop...
Background
The inference of gene regulatory networks is of great interest and has various applications. The recent advances in high-throughput biological data collection have facilitated the construction and understanding of gene regulatory networks in many model organisms. However, the inference of gene networks from large-scale human genomic data...
Background:
We consider the problem of probabilistic projection of the total fertility rate (TFR) for subnational regions.
Objective:
We seek a method that is consistent with the UN's recently adopted Bayesian method for probabilistic TFR projections for all countries and works well for all countries.
Methods:
We assess various possible method...
BACKGROUND
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a novel and computationally efficient method for eliminating redundant indirect edges...
Respondent-driven sampling (RDS) is a network-based form of chain-referral sampling used to estimate attributes of populations that are difficult to access using standard survey tools. Although it has grown quickly in popularity since its introduction, the statistical properties of RDS estimates remain elusive. In particular, the sampling variabili...
We derive properties of latent variable models for networks, a broad class of models that includes the widely used latent position models. We characterize several features of interest, with particular focus on the degree distribution, clustering coefficient, average path length, and degree correlations. We introduce the Gaussian latent position mod...
We describe bayesPop, an R package for producing probabilistic population projections for all countries. This uses probabilistic projections of total fertility and life expectancy generated by Bayesian hierarchical models. It produces a sample from the joint posterior predictive distribution of future age-and sex-specific population counts, fertili...
Background
While probabilistic projection methods for projecting life expectancy exist, few account for covariates related to life expectancy. Generalized HIV/AIDS epidemics have a large, immediate negative impact on the life expectancy in a country, but this impact can be mitigated by widespread use of antiretroviral therapy (ART). Thus, projectio...
Finite mixture models are being used increasingly to model a wide variety of random phenomena for clustering, classification and density estimation. mclust is a powerful and popular package which allows modelling of data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components, for a variety of p...
We analyze the temporal bipartite network of the leading Irish companies and their directors from 2003 to 2013, encompassing the end of the Celtic Tiger boom and the ensuing financial crisis in 2008. We focus on the evolution of company interlocks, whereby a company director simultaneously sits on two or more boards. We develop a statistical model...
The United Nations is the major organization producing and regularly updating probabilistic population projections for all countries. International migration is a critical component of such projections, and between-country correlations are important for forecasts of regional aggregates. However, there are 200 countries and only 12 data points, each...
We produce probabilistic projections of population for all countries based on probabilistic projections of fertility, mortality, and migration. We compare our projections to those from the United Nations' Probabilistic Population Projections, which uses similar methods for fertility and mortality but deterministic migration projections. We find tha...
We propose Adaptive Incremental Mixture Markov chain Monte Carlo (AIMM), a novel approach to sample from challenging probability distributions defined on a general state-space. Typically, adaptive MCMC methods recursively update a parametric proposal kernel with a global rule; by contrast AIMM locally adapts a non-parametric kernel. AIMM is based o...
Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown...
We show that Bayesian population reconstruction, a recent method for estimating past populations by age, works for data of widely varying quality. Bayesian reconstruction simultaneously estimates age-specific population counts, fertility rates, mortality rates, and net international migration flows from fragmentary data, while formally accounting f...
The UN released official probabilistic population projections (PPP) for all countries for the first time in July 2014. These were obtained by projecting the period total fertility rate (TFR) and life expectancy at birth ($e_0$) using Bayesian hierarchical models, yielding a large set of future trajectories of TFR and $e_0$ for all countries and future...
The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes measured, and the data for the resulting pairs of genes are deconvolved. The raw data are sometime...
Demographic forecasts are inherently uncertain. Nevertheless, an appropriate description of this uncertainty is a key underpinning of informed decision making. In recent decades various methods have been developed to describe the uncertainty of future populations and their structures, but the uptake of such tools amongst the practitioners of offici...
Background
Inference of gene networks from expression data is an important problem in computational biology. Many algorithms have been proposed for solving the problem efficiently. However, many of the available implementations are programming libraries that require users to write code, which limits their accessibility.
Results
We have developed a...
Initialisation of the EM algorithm in model-based clustering is often
crucial. Various starting points in the parameter space often lead to different
local maxima of the likelihood function and so to different clustering
partitions. Among the several approaches available in the literature,
model-based agglomerative hierarchical clustering is used...
Bayesian Additive Regression Trees (BART) is a statistical sum of trees
model. It can be considered a Bayesian version of machine learning tree
ensemble methods where the individual trees are the base learners. However, for
data sets where the number of variables $p$ is large (e.g. $p>5,000$) the
algorithm can become prohibitively expensive, computa...
We propose Bayesian model averaging (BMA) as a method for postprocessing the
results of model-based clustering. Given a number of competing models,
appropriate model summaries are averaged, using the posterior model
probabilities, instead of being taken from a single "best" model. We
demonstrate the use of BMA in model-based clustering for a number...
The United Nations issued probabilistic population projections for all countries for the first time in July 2014. This was done by simulating future levels of fertility and life expectancy from Bayesian hierarchical models, and combining the results using a standard cohort-component projection method. The 40 countries with generalized HIV/AIDS epid...
Paul Deheuvels is best known internationally as a theoretical statistician, but he has made many other contributions. Here I give a brief overview of his work as a mentor of many doctoral students, as an advocate for the discipline of statistics, particularly in the context of his work as the only statistician member of the French Académie des Scie...
In most countries in the world outside of sub-Saharan Africa, HIV is largely concentrated in sub-populations whose behavior puts them at higher risk of contracting and transmitting HIV, such as people who inject drugs, sex workers and men who have sex with men. Estimating the size of these sub-populations is important for assessing overall HIV prev...
The United Nations released official probabilistic population projections
(PPP) for all countries for the first time in July 2014. These were obtained by
projecting the period total fertility rate (TFR) and life expectancy at birth
($e_0$) using Bayesian hierarchical models, yielding a large set of future
trajectories of TFR and $e_0$ for all count...
Finite mixture modelling provides a framework for cluster analysis based on
parsimonious Gaussian mixture models. Variable or feature selection is of
particular importance in situations where only a subset of the available
variables provide clustering information. This enables the selection of a more
parsimonious model, yielding more efficient esti...
Bayesian model averaging has become a widely used approach to accounting for
uncertainty about the structural form of the model generating the data. When
data arrive sequentially and the generating model can change over time, Dynamic
Model Averaging (DMA) extends model averaging to deal with this situation.
Often in macroeconomics, however, many ca...
The United Nations (UN) recently released population projections based on data until 2012 and a Bayesian probabilistic methodology.
Analysis of these data reveals that, contrary to previous literature, the world population is unlikely to stop growing this century. There is an 80% probability that world population, now 7.2 billion people, will incre...
Probabilistic forecasts are becoming more and more available. How should they
be used and communicated? What are the obstacles to their use in practice? I
review experience with five problems where probabilistic forecasting played an
important role. This leads me to identify five types of potential users: Low
Stakes Users, who don't need probabilis...
Background:
In a given population the age pattern of mortality is an important determinant of total number of deaths, age structure, and through effects on age structure, the number of births and thereby growth. Good mortality models exist for most populations except those experiencing generalized HIV epidemics and some developing country populati...
The United Nations regularly publishes projections of the populations of all
the world's countries broken down by age and sex. These projections are the de
facto standard and are widely used by international organizations, governments
and researchers. Like almost all other population projections, they are
produced using the standard deterministic c...
Background: The United Nations (UN) produces population projections for all countries every two years. These are used by international organizations, governments, the private sector and researchers for policy planning, for monitoring development goals, as inputs to economic and environmental models, and for social and health research. The UN is con...
Genome-wide time-series data provide a rich set of information for discovering gene regulatory relationships. As genome-wide data for mammalian systems are being generated, it is critical to develop network inference methods that can handle tens of thousands of genes efficiently, provide a systematic framework for the integration of multiple data sour...
Economic modeling in the presence of endogeneity is subject to model uncertainty at both the instrument and covariate level. We propose a Two-Stage Bayesian Model Averaging (2SBMA) methodology that extends the Two-Stage Least Squares (2SLS) estimator. By constructing a Two-Stage Unit Information Prior in the endogenous variable model, we are able t...
The original version of Bayesian reconstruction, a method for estimating
age-specific fertility, mortality, migration and population counts of the
recent past with uncertainty, produced estimates for female-only populations.
Here we show how two-sex populations can be similarly reconstructed and
probabilistic estimates of various sex ratio quantiti...
A Bayesian approach for probabilistic population projections has recently been used by the United Nations Population Division in the preparation of the 2012 revision of the World Population Prospects.
The methods have been implemented in publicly available open-source software as a collection of R packages. In this paper, we demonstrate how to easi...
We propose a method for obtaining joint probabilistic projections of
migration rates for all countries, broken down by age and sex. Joint
trajectories for all countries are constrained to satisfy the requirement of
zero global net migration. We evaluate our model using out-of-sample validation
and compare point projections to the projected migratio...
We compare two major approaches to variable selection in clustering: model
selection and regularization. Based on previous results, we select the method
of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006),
as a current state-of-the-art model selection method. We select the method of
Witten and Tibshirani (2010) as a curre...
We develop methods for estimating hard-to-reach populations from data
collected using network-based questions on standard surveys. Such data arise by
asking respondents how many people they know in a specific group (e.g. people
named Michael, intravenous drug users). The Network Scale-up Method (NSUM) is a
tool for producing population size estimat...
We extend Bayesian population reconstruction, a recent method for estimating past populations by age with fully probabilistic statements of uncertainty. It simultaneously estimates age-specific population counts, vital rates and net migration from fragmentary data while formally accounting for measurement error. As inputs, it takes initial bias-red...
We propose a Bayesian hierarchical model for producing probabilistic forecasts of male period life expectancy at birth for all the countries of the world to 2100. Such forecasts would be an input to the production of probabilistic population projections for all countries, which is currently being considered by the United Nations. To evaluate the me...
Current methods for reconstructing human populations of the past by age and sex are deterministic or do not formally account for measurement error. We propose a method for simultaneously estimating age-specific population counts, fertility rates, mortality rates, and net international migration flows from fragmentary data that incorporates measurem...
The United Nations (UN) Population Division is considering producing
probabilistic projections for the total fertility rate (TFR) using the Bayesian
hierarchical model of Alkema et al. (2011), which produces predictive
distributions of TFR for individual countries. The UN is interested in
publishing probabilistic projections for aggregates of count...