Article

Using the Bayesian Model Averaging Approach for Genomic Selection by Considering Skewed Error Distributions

Article
Unbiased estimation of causal effects with observational data requires adjustment for confounding variables that are related to both the outcome and treatment assignment. Standard variable selection techniques aim to maximize predictive ability of the outcome model, but they ignore covariate associations with treatment and may not adjust for important confounders weakly associated with the outcome. We propose a novel method for estimating causal effects that simultaneously considers models for both outcome and treatment, which we call the bilevel spike and slab causal estimator (BSSCE). By using a Bayesian formulation, BSSCE estimates the posterior distribution of all model parameters and provides straightforward and reliable inference. Spike and slab priors that aim to minimize the mean squared error of the treatment effect estimator are used on each covariate coefficient. Theoretical properties of the treatment effect estimator are derived justifying the prior used in BSSCE. Simulations show that BSSCE can substantially reduce mean squared error over numerous methods and performs especially well with large numbers of covariates, including situations where the number of covariates is greater than the sample size. We illustrate BSSCE by estimating the causal effect of vasoactive therapy vs. fluid resuscitation on hypotensive episode length for patients in the Multiparameter Intelligent Monitoring in Intensive Care III critical care database.
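For readers unfamiliar with the terminology, a generic spike and slab prior (an illustrative form, not necessarily the exact BSSCE specification) places on each regression coefficient \beta_j a two-component normal mixture governed by a latent inclusion indicator \gamma_j:

    \beta_j \mid \gamma_j \;\sim\; \gamma_j \, N(0, \tau_1^2) \;+\; (1 - \gamma_j) \, N(0, \tau_0^2), \qquad \gamma_j \in \{0, 1\}, \quad \tau_0^2 \ll \tau_1^2 .

The narrow "spike" \tau_0^2 shrinks effectively excluded coefficients toward zero, while the diffuse "slab" \tau_1^2 leaves included coefficients nearly unpenalized; the abstract indicates that BSSCE places such priors on the covariate coefficients of both the outcome and treatment models.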
Article
A crucial problem in building a multiple regression model is the selection of predictors to include. The main thrust of this article is to propose and develop a procedure that uses probabilistic considerations for selecting promising subsets. This procedure entails embedding the regression setup in a hierarchical normal mixture model where latent variables are used to identify subset choices. In this framework the promising subsets of predictors can be identified as those with higher posterior probability. The computational burden is then alleviated by using the Gibbs sampler to indirectly sample from this multinomial posterior distribution on the set of possible subset choices. Those subsets with higher probability—the promising ones—can then be identified by their more frequent appearance in the Gibbs sample.
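Concretely, writing \gamma = (\gamma_1, \dots, \gamma_p) for the vector of latent inclusion indicators and \gamma^{(1)}, \dots, \gamma^{(T)} for the draws produced by the Gibbs sampler, the posterior probability of any particular subset is estimated by its relative frequency in the sample,

    \hat{P}(\gamma \mid y) \;=\; \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\{\gamma^{(t)} = \gamma\},

so the promising subsets are simply those visited most often by the chain.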
Article
The use of the Gibbs sampler for Bayesian computation is reviewed and illustrated in the context of some canonical examples. Other Markov chain Monte Carlo simulation methods are also briefly described, and comments are made on the advantages of sample‐based approaches for Bayesian inference summaries.
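As an illustration of the kind of canonical example such a review has in mind, the following minimal Python sketch (names and constants are illustrative) draws from a standard bivariate normal with correlation rho by alternating its two univariate-normal full conditionals:

    import numpy as np

    def gibbs_bivariate_normal(rho, n_samples=5000, rng=None):
        """Gibbs sampler for a standard bivariate normal with correlation rho.

        Both full conditionals are univariate normal:
            x | y ~ N(rho * y, 1 - rho**2)
            y | x ~ N(rho * x, 1 - rho**2)
        """
        rng = np.random.default_rng() if rng is None else rng
        sd = np.sqrt(1.0 - rho ** 2)
        x = y = 0.0
        draws = np.empty((n_samples, 2))
        for t in range(n_samples):
            x = rng.normal(rho * y, sd)   # update x given the current y
            y = rng.normal(rho * x, sd)   # update y given the new x
            draws[t] = (x, y)
        return draws

    samples = gibbs_bivariate_normal(rho=0.8)
    print(samples.mean(axis=0), np.corrcoef(samples.T)[0, 1])  # near (0, 0) and 0.8

Posterior summaries (means, quantiles, correlations) are then read directly off the draws, which is the advantage of sample-based approaches noted above.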
Article
In most examples of inference and prediction, the expression of uncertainty about unknown quantities y on the basis of known quantities x is based on a model M that formalizes assumptions about how x and y are related. M will typically have two parts: structural assumptions S, such as the form of the link function and the choice of error distribution in a generalized linear model, and parameters θ whose meaning is specific to a given choice of S. It is common in statistical theory and practice to acknowledge parametric uncertainty about θ given a particular assumed structure S; it is less common to acknowledge structural uncertainty about S itself. A widely used approach involves enlisting the aid of x to specify a plausible single ‘best’ choice S* for S, and then proceeding as if S* were known to be correct. In general this approach fails to assess and propagate structural uncertainty fully and may lead to miscalibrated uncertainty assessments about y given x. When miscalibration occurs it will often result in understatement of inferential or predictive uncertainty about y, leading to inaccurate scientific summaries and overconfident decisions that do not incorporate sufficient hedging against uncertainty. In this paper I discuss a Bayesian approach to solving this problem that has long been available in principle but is only now becoming routinely feasible, by virtue of recent computational advances, and examine its implementation in examples that involve forecasting the price of oil and estimating the chance of catastrophic failure of the US space shuttle.
Article
The prediction of cancer prognosis and metastatic potential immediately after the initial diagnosis is a major challenge in current clinical research. The relevance of such a signature is clear, as it will free many patients from the agony and toxic side-effects associated with the adjuvant chemotherapy automatically and sometimes carelessly prescribed to them. Motivated by this issue, several previous works presented a Bayesian model which led to the following conclusion: thousands of samples are needed to generate a robust gene list for predicting outcome. This conclusion is based on the existence of some statistical assumptions, including asymptotic independence of sample correlations. The current work makes two main contributions: (1) It shows that while the assumptions of the Bayesian model discussed by previous papers seem to be non-restrictive, they are quite strong. To demonstrate this point, it is shown that some standard sparse and Gaussian models are not included in the set of models which are mathematically consistent with these assumptions. (2) It is shown that the empirical Bayes methodology which was applied in order to test the relevant assumptions does not detect severe violations, and consequently an overestimation of the required sample size might be incurred. Finally, we suggest that, under some regularity conditions, the current theoretical results can be used for the development of a new method to test the asymptotic independence assumption.
Article
Interest in the skew-normal and related families of distributions has grown enormously over recent years, as theory has advanced, challenges of data have grown, and computational tools have made substantial progress. This comprehensive treatment, blending theory and practice, will be the standard resource for statisticians and applied researchers. Assuming only basic knowledge of (non-measure-theoretic) probability and statistical inference, the book is accessible to the wide range of researchers who use statistical modelling techniques. Guiding readers through the main concepts and results, it covers both the probability and the statistics sides of the subject, in the univariate and multivariate settings. The theoretical development is complemented by numerous illustrations and applications to a range of fields including quantitative finance, medical statistics, environmental risk studies, and industrial and business efficiency. The author’s freely available R package sn, available from CRAN, equips readers to put the methods into action with their own data.
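The univariate density at the core of this family is obtained by perturbing the standard normal density \phi with its distribution function \Phi,

    f(z; \alpha) \;=\; 2\,\phi(z)\,\Phi(\alpha z), \qquad z \in \mathbb{R},

where the shape parameter \alpha controls the skewness and \alpha = 0 recovers the normal distribution. (Readers working in Python rather than R will find the same density implemented as scipy.stats.skewnorm.)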
Article
Computational algorithms for selecting subsets of regression variables are discussed. Only linear models and the least-squares criterion are considered. The use of planar-rotation algorithms, instead of Gauss-Jordan methods, is advocated. The advantages and disadvantages of a number of "cheap" search methods are described for use when it is not feasible to carry out an exhaustive search for the best-fitting subsets. Hypothesis testing for three purposes is considered, namely (i) testing for zero regression coefficients for remaining variables, (ii) comparing subsets and (iii) testing for any predictive value in a selected subset. Three small data sets are used to illustrate these tests. Spjøtvoll's (1972a) test is discussed in detail, though an extension to this test appears desirable. Estimation problems have largely been overlooked in the past. Three types of bias are identified, namely that due to the omission of variables, that due to competition for selection and that due to the stopping rule. The emphasis here is on competition bias, which can be of the order of two or more standard errors when coefficients are estimated from the same data as were used to select the subset. Five possible ways of handling this bias are listed. This is the area most urgently requiring further research. Mean squared errors of prediction and stopping rules are briefly discussed. Competition bias invalidates the use of existing stopping rules as they are commonly applied to try to produce optimal prediction equations.
Article
The use of truncated distributions arises often in a wide variety of scientific problems. In the literature, there are a lot of sampling schemes and proposals developed for various specific truncated distributions. So far, however, the study of the truncated multivariate t (TMVT) distribution is rarely discussed. In this paper, we first present general formulae for computing the first two moments of the TMVT distribution under the double truncation. We formulate the results as analytic matrix expressions, which can be directly computed in existing software. Results for the left and right truncation can be viewed as special cases. We then apply the slice sampling algorithm to generate random variates from the TMVT distribution by introducing auxiliary variables. This strategic approach can result in a series of full conditional densities that are of uniform distributions. Finally, several examples and practical applications are given to illustrate the effectiveness and importance of the proposed results.
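In one dimension the auxiliary-variable idea reduces to ordinary slice sampling, which the following hedged Python sketch illustrates for a univariate Student-t density truncated to [lower, upper] (an illustration of the construction only, not the multivariate TMVT algorithm of the paper):

    import numpy as np
    from scipy import stats

    def slice_sample_truncated_t(df, lower, upper, n_samples, x0=None, rng=None):
        """Slice sampler for a Student-t density (df degrees of freedom)
        truncated to the interval [lower, upper].

        Given the current point x, draw an auxiliary variable u ~ Uniform(0, f(x));
        the next point is drawn uniformly from the horizontal slice
        {x in [lower, upper] : f(x) > u}, located here by shrinking the bracket.
        """
        rng = np.random.default_rng() if rng is None else rng
        f = lambda z: stats.t.pdf(z, df)            # unnormalised target on the interval
        x = 0.5 * (lower + upper) if x0 is None else x0
        out = np.empty(n_samples)
        for i in range(n_samples):
            u = rng.uniform(0.0, f(x))              # auxiliary (slice) variable
            lo, hi = lower, upper                   # initial bracket contains the slice
            while True:
                prop = rng.uniform(lo, hi)
                if f(prop) > u:                     # landed on the slice: accept
                    x = prop
                    break
                if prop < x:                        # otherwise shrink toward x
                    lo = prop
                else:
                    hi = prop
            out[i] = x
        return out

    draws = slice_sample_truncated_t(df=5, lower=-1.0, upper=2.5, n_samples=2000)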
Article
It is argued that P-values and the tests based upon them give unsatisfactory results, especially in large samples. It is shown that, in regression, when there are many candidate independent variables, standard variable selection procedures can give very misleading results. Also, by selecting a single model, they ignore model uncertainty and so underestimate the uncertainty about quantities of interest. The Bayesian approach to hypothesis testing, model selection, and accounting for model uncertainty is presented. Implementing this is straightforward through the use of the simple and accurate BIC approximation, and it can be done using the output from standard software. Specific results are presented for most of the types of model commonly used in sociology. It is shown that this approach overcomes the difficulties with P-values and standard model selection procedures based on them. It also allows easy comparison of nonnested models, and permits the quantification of the evidence for a null hypothesis of interest, such as a convergence theory or a hypothesis about societal norms.
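The approximation referred to is the familiar one: for model M_k with maximised log-likelihood \hat\ell_k, d_k free parameters and n observations,

    \mathrm{BIC}_k \;=\; -2\,\hat\ell_k + d_k \log n, \qquad P(M_k \mid \text{data}) \;\approx\; \frac{\exp(-\tfrac{1}{2}\mathrm{BIC}_k)}{\sum_j \exp(-\tfrac{1}{2}\mathrm{BIC}_j)},

where the second expression assumes equal prior probabilities across the candidate models; both quantities can be computed from the output of standard fitting software.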
Article
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to over-confident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of currently available BMA software.
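The coherent mechanism in question is averaging rather than selection: for a quantity of interest \Delta (a future observation, an effect size) and candidate models M_1, \dots, M_K,

    p(\Delta \mid D) \;=\; \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D), \qquad p(M_k \mid D) \;=\; \frac{p(D \mid M_k)\, p(M_k)}{\sum_{j} p(D \mid M_j)\, p(M_j)},

so each model contributes to inference in proportion to its posterior probability, instead of a single selected model being treated as if it were true.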
Article
Bayesian statistical methods have recently made great inroads into many areas of science, and this advance is now extending to the assessment of association between genetic variants and disease or other phenotypes. We review these methods, focusing on single-SNP tests in genome-wide association studies. We discuss the advantages of the Bayesian approach over classical (frequentist) approaches in this setting and provide a tutorial on basic analysis steps, including practical guidelines for appropriate prior specification. We demonstrate the use of Bayesian methods for fine mapping in candidate regions, discuss meta-analyses and provide guidance for refereeing manuscripts that contain Bayesian analyses.
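In this setting the Bayesian counterpart of the single-SNP P-value is typically a Bayes factor comparing the association model H_1 against the null H_0,

    \mathrm{BF} \;=\; \frac{p(\text{data} \mid H_1)}{p(\text{data} \mid H_0)}, \qquad \frac{P(H_1 \mid \text{data})}{P(H_0 \mid \text{data})} \;=\; \mathrm{BF} \times \frac{P(H_1)}{P(H_0)},

which is one reason the prior specification discussed in the tutorial matters: it enters both the marginal likelihood under H_1 and the prior odds.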
Article
A fairly general procedure is studied to perturb a multivariate density satisfying a weak form of multivariate symmetry, and to generate a whole set of non-symmetric densities. The approach is general enough to encompass a number of recent proposals in the literature, variously related to the skew normal distribution. The special case of skew elliptical densities is examined in detail, establishing connections with existing similar work. The final part of the paper specializes further to a form of multivariate skew t density. Likelihood inference for this distribution is examined, and it is illustrated with numerical examples.
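A standard statement of this construction: if f_0 is a d-dimensional density symmetric about the origin, G is a scalar distribution function whose density is symmetric about zero, and w is an odd real-valued function on R^d, then

    f(z) \;=\; 2\, f_0(z)\, G\{w(z)\}

is again a density. Taking f_0 multivariate normal, G = \Phi and w linear recovers the skew-normal; an elliptical f_0 gives the skew-elliptical case, and a Student-t choice leads to the multivariate skew t examined in the final part of the paper.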
Article
The aim of this article is to show how the Bayesian graphical framework unifies and greatly simplifies many standard discrete data problems such as Bayesian log linear modeling with either complete or incomplete data, model selection and accounting for model uncertainty, closed population estimation, multinomial estimation with misclassification, double sampling and database error prediction. This list by no means exhausts the possible applications. This is a methodological article. Our objective is to demonstrate the diverse range of potential applications, alert the reader to an exciting new methodology, and hopefully stimulate further development.
Madigan D, Raftery AE. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association. 1994; 89(428):1535-46. [DOI: 10.1080/01621459.1994.10476894]
Raftery AE, Madigan D, Hoeting JA. Bayesian model averaging for linear regression models. Journal of the American Statistical Association. 1997; 92(437):179-91. [DOI: 10.1080/01621459.1997.10473615]