Bayesian Nonparametric Hidden Markov Models with application to the analysis of copy-number-variation in mammalian genomes.

Department of Statistics and the Oxford-Man Institute for Quantitative Finance, University of Oxford.
Journal of the Royal Statistical Society Series B (Statistical Methodology) (Impact Factor: 5.72). 01/2011; 73(1):37-57. DOI: 10.1111/j.1467-9868.2010.00756.x
Source: PubMed

ABSTRACT: We consider the development of Bayesian nonparametric methods for product partition models such as hidden Markov models and change-point models. Our approach uses a Mixture of Dirichlet Process (MDP) model for the unknown sampling distribution (likelihood) of the observations arising in each state, together with a computationally efficient data augmentation scheme to aid inference. The method uses novel MCMC methodology that combines recent retrospective sampling methods with slice sampler variables. The methodology is computationally efficient, both in terms of MCMC mixing properties and in terms of robustness to the length of the time series being investigated. Moreover, the method is easy to implement, requiring little or no user interaction. We apply our methodology to the analysis of genomic copy number variation.
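
The combination of slice variables with retrospectively generated mixture weights can be illustrated, under simplifying assumptions, for an ordinary Dirichlet process mixture rather than the authors' HMM setting. The minimal Python sketch below assumes a stick-breaking representation with concentration alpha, Gaussian components with a known common standard deviation sigma, and a N(0, base_sd^2) base measure; the function names extend_sticks and slice_sweep and all of their parameters are hypothetical. Each observation gets a slice variable u_i ~ Uniform(0, w_{z_i}), and sticks are drawn lazily only until the unassigned mass is at most min u_i, because no later component can then have weight above any u_i. This keeps the number of instantiated components finite even though the mixture is infinite.

```python
import numpy as np


def extend_sticks(weights, remaining, alpha, u_min, rng):
    """Lazily ("retrospectively") draw stick-breaking weights
    w_k = V_k * prod_{j<k} (1 - V_j), with V_k ~ Beta(1, alpha),
    until the unassigned mass `remaining` is at most u_min.
    Every undrawn weight is then < remaining <= u_min, so no undrawn
    component can satisfy w_k > u_i for any slice variable u_i."""
    weights = list(weights)
    while remaining > u_min:
        v = rng.beta(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return np.array(weights), remaining


def slice_sweep(x, z, weights, remaining, means, alpha, sigma, rng, base_sd=3.0):
    """One slice/reallocation update for a DP mixture of N(mu_k, sigma^2)
    components, holding the weights and the existing means fixed
    (hypothetical helper; not the paper's HMM sampler)."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    z = np.asarray(z, dtype=int).copy()

    # 1. Slice variables: u_i ~ Uniform(0, w_{z_i}).
    u = rng.uniform(0.0, weights[z])

    # 2. Instantiate just enough components for the smallest slice.
    weights, remaining = extend_sticks(weights, remaining, alpha, u.min(), rng)

    # New components get atoms drawn from the base measure N(0, base_sd^2).
    n_new = len(weights) - len(means)
    if n_new > 0:
        means = np.concatenate([means, rng.normal(0.0, base_sd, size=n_new)])

    # 3. Reallocate: P(z_i = k) is proportional to 1{w_k > u_i} * N(x_i | mu_k, sigma^2).
    for i in range(len(x)):
        active = np.flatnonzero(weights > u[i])  # always contains the current z_i
        loglik = -0.5 * ((x[i] - means[active]) / sigma) ** 2
        p = np.exp(loglik - loglik.max())
        z[i] = rng.choice(active, p=p / p.sum())

    return z, weights, remaining, means
```

A complete sampler would also redraw the stick proportions V_k given the allocations and the component means given their allocated data; this sketch covers only the slice and reallocation steps.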

  • ABSTRACT: We consider a Bayesian hierarchical model for the integration of gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. The approach defines a measurement error model that relates the gene expression levels to latent copy number states. In turn, the latent states are related to the observed surrogate CGH measurements via a hidden Markov model. The model further incorporates variable selection with a spatial prior based on a probit link that exploits dependencies across adjacent DNA segments. Posterior inference is carried out via Markov chain Monte Carlo stochastic search techniques. We study the performance of the model in simulations and show better results than those achieved with recently proposed alternative priors. We also show an application to data from a genomic study on lung squamous cell carcinoma, where we identify potential candidate associations between copy number variants and the transcriptional activity of target genes. Gene Ontology (GO) analyses of our findings reveal enrichment in genes that code for proteins involved in cancer. Our model also identifies a number of potential candidate biomarkers for further experimental validation.
    Cancer Informatics 09/2014; 13(S2):29-37.
  • ABSTRACT: Copy number variants (CNVs) may play an important part in the development of common birth defects such as oral clefts, and individual patients with multiple birth defects (including clefts) have been shown to carry small and large chromosomal deletions. In this paper we investigate de novo deletions, defined as DNA segments missing in an oral cleft proband but present in both unaffected parents. We compare de novo deletion frequencies in children of European ancestry with an isolated, nonsyndromic oral cleft to frequencies in European-ancestry children from randomly sampled trios. We identified a genome-wide significant 62 kilobase (kb) non-coding region on chromosome 7p14.1 where de novo deletions occur more frequently among oral cleft cases than controls. We also observed wider de novo deletions among cleft lip and palate (CLP) cases than among cleft palate (CP) and cleft lip (CL) cases. This study presents a region where de novo deletions appear to be involved in the etiology of oral clefts, although the underlying biological mechanisms are still unknown. Larger de novo deletions are more likely to interfere with normal craniofacial development and may result in more severe clefts. Study protocol and sample DNA source can severely affect estimates of de novo deletion frequencies. Follow-up studies are needed to further validate these findings and to potentially identify additional structural variants underlying oral clefts.
    BMC Genetics 02/2014; 15(1):24. · 2.36 Impact Factor
  • ABSTRACT: Hyperuricemia is associated with multiple diseases, including gout, cardiovascular disease, and renal disease. Serum urate is highly heritable, yet association studies of single nucleotide polymorphisms (SNPs) and serum uric acid explain only a small fraction of the heritability. Whether copy number polymorphisms (CNPs) contribute to uric acid levels is unknown.
    BMC Genetics 07/2014; 15(1):81.