ABSTRACT: The detection of local genomic signals using high-throughput DNA sequencing
data can be cast as a problem of scanning a Poisson random field for local
changes in the rate of the process. We propose a likelihood-based framework
for such scans, and derive formulas for false positive rate control and power
calculations. The framework can also accommodate mixtures of Poisson processes
to deal with over-dispersion. As a specific, detailed example, we consider the
detection of insertions and deletions by paired-end DNA-sequencing. We propose
several statistics for this problem, compare their power under current
experimental designs, and illustrate their application on an Illumina Platinum
Genomes data set.
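
To make the idea of scanning Poisson counts for a local rate change concrete, here is a minimal sketch of a fixed-width, one-sided likelihood-ratio scan. It is an illustration under stated assumptions, not the statistic proposed in the paper: the function name `poisson_scan`, the fixed `window` width, and the known null rate `rate0` are all hypothetical choices made for this example.

```python
import math

def poisson_scan(counts, window, rate0):
    """Fixed-width, one-sided Poisson scan (illustrative sketch only).

    counts : per-position event counts (e.g. read starts per base)
    window : window width in positions (hypothetical parameter)
    rate0  : null rate per position, assumed known here
    Returns (start index of the best window, its log-likelihood ratio),
    or (None, 0.0) if no window exceeds its expected null count.
    """
    mu0 = rate0 * window  # expected count in a window under the null
    best_start, best_stat = None, 0.0
    total = sum(counts[:window])
    for start in range(len(counts) - window + 1):
        if start > 0:
            # Slide the window: add the entering count, drop the leaving one.
            total += counts[start + window - 1] - counts[start - 1]
        if total > mu0:
            # Poisson log-likelihood ratio with the window rate set to its MLE.
            stat = total * math.log(total / mu0) - (total - mu0)
            if stat > best_stat:
                best_start, best_stat = start, stat
    return best_start, best_stat
```

A threshold for the returned statistic would still have to be chosen so that the false positive rate of the whole scan is controlled, which is the part the abstract says the paper derives formulas for.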
ABSTRACT: Long range dependence in stationary processes of increments corresponds to situations where the variance of cumulative sums is dominated by the accumulation of the covariances between increments. The Hurst parameter, the exponent of the standard deviation of the sum as a function of the number of increments involved, is a characteristic of long range dependence. Models of long range dependence, that is, models that involve a Hurst parameter 0·5<H<1, are frequently used to model the incoming workload in computer networks and communication. Consider a Gaussian arrival process with long range dependence, a buffer, and a departure process bounded by the bandwidth. This paper presents an analytical approximation of the probability of a buffer overflow within a given time interval. The analysis uses and demonstrates a measure-transformation technique.
ABSTRACT: This paper gives a new representation of Pickands' constants, which arise in
the study of extremes for a variety of Gaussian processes. Using this
representation, we resolve the long-standing problem of devising a reliable
algorithm for estimating these constants. A detailed error analysis illustrates
the strength of our approach.
ABSTRACT: Recombination events are not uniformly distributed and often cluster in narrow regions known as recombination hotspots. Several studies using different approaches have dramatically advanced our understanding of recombination hotspot regulation. Population genetic data have been used to map and quantify hotspots in the human genome. Genetic variation in recombination rates and hotspot usage has been explored in human pedigrees, mouse intercrosses, and by sperm typing. These studies pointed to the central role of the PRDM9 gene in hotspot modulation. In this study, we used single nucleotide polymorphisms (SNPs) from whole-genome resequencing and genotyping studies of mouse inbred strains to estimate recombination rates across the mouse genome and identified 47,068 historical hotspots, an average of over 2,477 per chromosome. We show by simulation that inbred mouse strains can be used to identify positions of historical hotspots. Recombination hotspots were found to be enriched for the predicted binding sequences for different alleles of the PRDM9 protein. Recombination rates were on average lower near transcription start sites (TSS). Comparing the inferred historical recombination hotspots with the recent genome-wide mapping of double-strand breaks (DSBs) in mouse sperm revealed a significant overlap, especially toward the telomeres. Our results suggest that inbred strains can be used to characterize and study the dynamics of historical recombination hotspots. They also strengthen previous findings on mouse recombination hotspots, and specifically the impact of sequence variants in Prdm9.
ABSTRACT: The false discovery rate is a criterion for controlling Type I error in simultaneous testing of multiple hypotheses. For scanning statistics, due to local dependence, clusters of neighbouring hypotheses are likely to be rejected together. In such situations, it is more intuitive and informative to group neighbouring rejections together and count them as a single discovery, with the false discovery rate defined as the proportion of clusters that are falsely declared among all declared clusters. Assuming that the number of false discoveries, under this broader definition of a discovery, is approximately Poisson and independent of the number of true discoveries, we examine approaches for estimating and controlling the false discovery rate, and provide examples from biological applications. Copyright 2011, Oxford University Press.
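
The cluster-level definition of a discovery described above can be sketched as follows. This is only an illustration under assumptions: the merging rule based on a hypothetical `max_gap`, the function names, and the plug-in ratio of an externally supplied expected number of false clusters to the number of declared clusters are choices made for this example, not the estimators examined in the paper.

```python
def cluster_rejections(positions, max_gap):
    """Group rejected scan positions into clusters (illustrative sketch).

    Positions at most max_gap apart from the previous rejected position
    are merged into the same cluster; max_gap is a hypothetical tuning
    parameter.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # extend the current cluster
        else:
            clusters.append([pos])     # start a new cluster
    return clusters

def estimated_cluster_fdr(n_clusters, expected_false_clusters):
    """Plug-in estimate: expected false clusters / declared clusters.

    expected_false_clusters must be supplied by the caller, for example
    from a Poisson approximation under the null; it is an assumption of
    this sketch, not something computed here.
    """
    if n_clusters == 0:
        return 0.0
    return min(1.0, expected_false_clusters / n_clusters)
```

For instance, rejections at positions 10, 11, 12, 50, 52 and 200 with `max_gap=5` form three clusters, so they count as three discoveries and not six.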
ABSTRACT: Given a set of aligned sequences of independent noisy observations, we are
concerned with detecting intervals where the mean values of the observations
change simultaneously in a subset of the sequences. The intervals of changed
means are typically short relative to the length of the sequences, the subset
where the change occurs, the "carriers," can be relatively small, and the sizes
of the changes can vary from one sequence to another. This problem is motivated
by the scientific problem of detecting inherited copy number variants in
aligned DNA samples. We suggest a statistic based on the assumption that for
any given interval of changed means there is a given fraction of samples that
carry the change. We derive an analytic approximation for the false positive
error probability of a scan, which is shown by simulations to be reasonably
accurate. We show that the new method usually improves on methods that analyze
a single sample at a time and on our earlier multi-sample method, which is most
efficient when the carriers form a large fraction of the set of sequences. The
proposed procedure is also shown to be robust with respect to the assumed
fraction of carriers of the changes.
Full-text · Article · Aug 2011 · The Annals of Applied Statistics
ABSTRACT: Because of their somatic cell origin, human induced pluripotent stem cells (HiPSCs) are assumed to carry a normal diploid genome, and adaptive chromosomal aberrations have not been fully evaluated. Here, we analyzed the chromosomal integrity of 66 HiPSC and 38 human embryonic stem cell (HESC) samples from 18 different studies by global gene expression meta-analysis. We report identification of a substantial number of cell lines carrying full and partial chromosomal aberrations, half of which were validated at the DNA level. Several aberrations resulted from culture adaptation, and others are suspected to originate from the parent somatic cell. Our classification revealed a third type of aneuploidy already evident in early passage HiPSCs, suggesting considerable selective pressure during the reprogramming process. The analysis indicated high incidence of chromosome 12 duplications, resulting in significant enrichment for cell cycle-related genes. Such aneuploidy may limit the differentiation capacity and increase the tumorigenicity of HiPSCs.
ABSTRACT: Chronic neuropathic pain is affected by specifics of the precipitating neural pathology, psychosocial factors, and by genetic predisposition. Little is known about the identity of predisposing genes. Using an integrative approach, we discovered that CACNG2 significantly affects susceptibility to chronic pain following nerve injury. CACNG2 encodes stargazin, a protein intimately involved in the trafficking of glutamatergic AMPA receptors. The protein might also be a Ca(2+) channel subunit. CACNG2 has previously been implicated in epilepsy. Initially, using two fine-mapping strategies in a mouse model (recombinant progeny testing [RPT] and recombinant inbred segregation test [RIST]), we mapped a pain-related quantitative trait locus (QTL) (Pain1) into a 4.2-Mb interval on chromosome 15. This interval includes 155 genes. Subsequently, bioinformatics and whole-genome microarray expression analysis were used to narrow the list of candidates and ultimately to pinpoint Cacng2 as a likely candidate. Analysis of stargazer mice, a Cacng2 hypomorphic mutant, provided electrophysiological and behavioral evidence for the gene's functional role in pain processing. Finally, we showed that human CACNG2 polymorphisms are associated with chronic pain in a cohort of cancer patients who underwent breast surgery. Our findings provide novel information on the genetic basis of neuropathic pain and new insights into pain physiology that may ultimately enable better treatments.
ABSTRACT: The likelihood ratio method for dealing with change-point problems of B. Yakir and M. Pollak [Ann. Appl. Probab. 8, No. 3, 749–774 (1998; Zbl 0937.60082)], which has subsequently been extended to deal with a wide variety of problems involving maxima of random fields, has as a key ingredient a conditional local limit theorem for a log-likelihood ratio, given an almost independent “local” sigma-algebra. This article contains a general version of that theorem, illustrated by several examples.
No preview · Article · Jul 2010 · Sequential Analysis
ABSTRACT: In his interesting paper Y. Mei [Sequential Anal. 27, No. 4, 354–376 (2008; Zbl 1149.62070)] criticized the use of the expected run length as the means for controlling the rate of false detection and proposed alternative measures. In this paper we join Mei’s attack on the traditional constraint by claiming that the rate of detection is a local phenomenon, and hence should be balanced against local constraints on false detection. We propose local probabilistic constraints on the rate of false detection and demonstrate their usefulness in the detection of a shift in a normal mean with an unknown baseline.
Preview · Article · Oct 2008 · Sequential Analysis
ABSTRACT: Until last year, type 2 diabetes (T2D) susceptibility loci had hardly been identified, despite great effort. Recently, however, several whole-genome association (WGA) studies jointly uncovered 10 robustly replicated loci. Here, we examine these loci in the Ashkenazi Jewish (AJ) population in a sample of 1,131 cases versus 1,147 controls. Genetic predisposition to T2D in the AJ population was found to be similar to that established in the previous studies. One SNP, rs7754840 in the CDKAL1 gene, presented a significantly stronger effect in the AJ population as compared to the general Caucasian population. This may possibly be due to the increased homogeneity of the AJ population. The use of the SNPs considered in this study to identify individuals at high (or low) risk of developing T2D was found to be of limited value. Our study, however, strongly supports the robustness of WGA studies for the identification of genes affecting complex traits in general and T2D in particular.
ABSTRACT: The result of Pollak [1985. Optimal detection of a change in distribution. Ann. Statist. 13, 206–227] proving the asymptotic optimality in sequential change-point detection of a suitable Shiryayev–Roberts stopping rule up to terms that vanish in the limit is generalized from the case of two completely specified distributions to that of a composite alternative hypothesis in a multidimensional exponential family. An explicit asymptotic lower bound on the expected Kullback–Leibler information required to detect a change-point is derived and is shown to be attained by a Shiryayev–Roberts stopping rule.
Full-text · Article · Sep 2008 · Journal of Statistical Planning and Inference
ABSTRACT: We give a unified treatment of the statistical foundations of population based association mapping and of family based linkage mapping of quantitative traits in humans. A central ingredient in the unification involves the efficient score statistic. The discussion focuses on generalized linear models with an additional illustration of the Cox (proportional hazards) model for age of onset data. We give analytic expressions for noncentrality parameters and show how they give qualitative insight into the loss of power that occurs if the scientist's assumed genetic model differs from nature's "true" genetic model. Issues to be studied in detail in the future development of this approach are discussed.
Preview · Article · Jan 2008 · Proceedings of the National Academy of Sciences
ABSTRACT: We study sequential change-point detection when observations form a sequence of independent Gaussian random fields, and the change-point is the time at which a signal of known functional form involving a finite number of unknown parameters appears. Building on D.O. Siegmund and B. Yakir, J. Stat. Plann. Inference 138, No. 9, 2815–2825 (2008; Zbl 05287891), which identifies in a simpler problem a detection procedure of Shiryayev-Roberts type that is asymptotically minimax up to terms that vanish as the false detection rate converges to zero, we compare easily computed approximations to the Shiryayev-Roberts detection procedure with similar approximations to CUSUM type procedures. Although the CUSUM type procedures are suboptimal, our studies indicate that they compare favorably to the asymptotically optimal procedures.
Full-text · Article · Jan 2008 · Statistics and its interface
ABSTRACT: Motivated by the problem of testing for the existence of a signal of known parametric structure and unknown "location" (as explained below) against a noisy background, we obtain for the maximum of a centered, smooth random field an approximation for the tail of the distribution. For the motivating class of problems this gives approximately the significance level of the maximum score test. The method is based on an application of a likelihood-ratio-identity followed by approximations of local fields. Numerical examples illustrate the accuracy of the approximations.
Full-text · Article · Nov 2007 · The Annals of Statistics
ABSTRACT: Sex and environment may dramatically affect genetic studies, and thus should be carefully considered. Beginning with two inbred mouse strains with contrasting phenotypes in the neuroma model of neuropathic pain (autotomy), we established a backcross population on which we conducted a genome-wide scan. The backcross population was partially maintained in small social groups and partially in isolation. The genome scan detected one previously reported quantitative trait locus (QTL) on chromosome 15 (pain1), but no additional QTLs were found. Interestingly, group caging introduced phenotypic noise large enough to completely mask the genetic effect of the chromosome 15 QTL. The reason appears to be that group-caging animals from the low-autotomy strain together with animals from the high-autotomy strain dramatically increases autotomy in the otherwise low-autotomy mice (males or females). The converse, suppression of pain behaviour in the high-autotomy strain when caged with the low-autotomy strain, was also observed, but only in females. Even in isolated mice, the genetic effect of the chromosome 15 QTL was significant only in females. To determine why, we evaluated autotomy levels of females in 12 different inbred strains of mice and compared them to previously reported levels for males. Strikingly larger environmental variation was observed in males than in females for this pain phenotype. The high baseline variance in males can explain the difficulty in detecting the genetic effect, which was readily seen in females. Our study emphasizes the importance of sex and environment in the genetic analysis of pain.
No preview · Article · Sep 2007 · European Journal of Neuroscience